The Worlds of Data Science & Software Development Are Converging

Extracting value from data is a massive undertaking that requires full buy-in and collaboration across teams – and companies are responding by hiring data professionals en masse. According to an IBM report, annual demand for the roles of data scientists, data developers, and data engineers will reach nearly 700,000 openings by 2020.

But the hiring figures alone don’t tell the full story of companies embracing a data-first approach. The rise of data has also reshaped existing roles across industries and continues to do so. Data scientists and developers are working more closely together than ever before – a reflection of the fact that the worlds of software production and data science are converging.

Thanks to emerging tools for working with data delivered through the cloud, data scientists no longer need to be outright experts in software and programming. Using the cloud as a shared platform helps deliver the right data to the right people at the right time, without the inefficiency of workflows passing through many hands via repeated hand-overs.

Using cloud services to build a user-friendly and productive collaboration platform means that data scientists can focus on their core skills to derive insights and useful knowledge from data. Using these new tools, data scientists can begin to think of their role as one running parallel to their developer colleagues – both working to deliver business value to end users keen to take advantage of the amount of data being produced.

A marriage between technique and technology

The Technique

Data scientists and developers work on different parts of the same workflow. Data scientists explore the data for new insights, and developers use these insights to automate the workflow and create apps. However, they are both working toward the common goal of delivering well-constructed apps, and that goal is best realised through structured, close collaboration.

Margriet Groenendijk, Developer Advocate at IBM Watson Data Platform.

The mechanics of app creation involve an element of experimentation. For example, to create an application, data scientists work with raw data, building analytic models to draw useful and applicable insights from it. These insights are fed back to the development team, which translates the resulting data models into functionality for the end user in the programming language best suited to the app. This is a continual process, aimed at producing the most functional app possible.

However, closer collaboration is vital to capitalise fully on the potential of the vast quantities of data now available and to make app creation as efficient as possible. Firstly, this collaboration can be achieved by employing agile working methodologies and using a cloud platform to share data scientist and developer project workspaces. This gives both sides shared visibility into work in progress and early results, and allows quicker turnaround on feedback and co-creation of the project as it unfolds. Secondly, communication is key. Whilst having the tools to share information effectively is critical, there has to be a concerted effort to open lines of communication and share information and feedback as regularly as possible.

This still leaves the question of how data scientists and developers actually digest complex data using cloud services to ensure end users, whether in business or on a consumer app, receive accurate data as fast as possible.

The Technology

A cloud-ready tool that encourages quicker delivery of data through collaboration is the Jupyter notebook. Notebooks allow users to write and share code in one place, in languages and runtimes such as Python, R, Scala and Node.js. Data can be loaded from and saved to cloud databases, cleaned and processed for use in prediction with machine learning models, and finally the results can be published directly from a notebook as visualisations and APIs. Team members can work on the same datasets at the same time, saving time on the traditional feedback loop, which requires translating code into different programming languages and passing findings back and forth.
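The load, clean, predict and publish steps described above can be sketched in a few notebook-style cells. This is a minimal illustration using only the Python standard library, not a definitive implementation: the inline CSV stands in for a table read from a cloud database, the column names are invented for the example, and the straight-line fit is a deliberately simple substitute for a real machine learning model.

```python
import csv
import io
import json

# Hypothetical raw data standing in for a table read from a cloud database.
RAW_CSV = """day,visitors,sales
1,100,20
2,150,30
3,,25
4,200,40
5,250,50
"""

def load_rows(text):
    """Load: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Clean: drop rows with missing values and convert fields to numbers."""
    cleaned = []
    for row in rows:
        if all(value and value.strip() for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned

def fit_line(xs, ys):
    """Model: ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def publish(slope, intercept, visitors):
    """Publish: build a JSON payload of the kind an API endpoint might return."""
    prediction = slope * visitors + intercept
    return json.dumps({"visitors": visitors, "predicted_sales": round(prediction, 2)})

rows = clean_rows(load_rows(RAW_CSV))
slope, intercept = fit_line([r["visitors"] for r in rows], [r["sales"] for r in rows])
payload = publish(slope, intercept, 300)
print(payload)
```

Because every step lives in one shared place, a developer can take the final payload format as the contract for an app while the data scientist keeps refining the cleaning and modelling cells.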

PixieDust is the magic open-source ingredient to add to notebooks to speed up the exploration of data. PixieDust allows both data scientists and developers to create data visualisations quickly without writing plotting code, and to publish these as standalone web apps. This means that data becomes accessible even to non-technical end users. Data presented visually, as opposed to in code and numbers, lends itself more readily to the identification of business opportunities.

Harnessing data to inform business decisions

With the skills of both data scientists and developers combined, and increasingly sophisticated tools such as Jupyter notebooks and PixieDust, the potential for innovation is significantly enhanced. Let’s take one example – weather data.

Weather data can be combined and analysed with many other data sets gathered from a range of sources to inform business decisions. For example, weather influences traffic, so it can be used to build a system that predicts the likelihood of traffic congestion and collisions. Historic weather data can be related to traffic flow, collision and road quality data to build a predictive machine learning model. Published as an API and combined with weather forecast data, that model can power a road safety app that gives authorities insight into how to improve safety on the roads.
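As a rough sketch of the idea, the example below joins historic weather and collision records by date, derives a very simple model, and applies it to a forecast. All dates, conditions and counts are invented for illustration, and the per-condition average is a stand-in for a real machine learning model, which would also draw on traffic flow and road quality data.

```python
from collections import defaultdict

# Hypothetical historic records; a real project would load these from shared cloud storage.
weather = {
    "2017-01-01": "rain", "2017-01-02": "clear", "2017-01-03": "rain",
    "2017-01-04": "snow", "2017-01-05": "clear", "2017-01-06": "clear",
}
collisions = {
    "2017-01-01": 4, "2017-01-02": 1, "2017-01-03": 6,
    "2017-01-04": 9, "2017-01-05": 2, "2017-01-06": 0,
}

def train(weather_by_day, collisions_by_day):
    """Join both datasets on date and average the collisions per weather condition."""
    totals = defaultdict(lambda: [0, 0])  # condition -> [collision sum, day count]
    for day, condition in weather_by_day.items():
        if day in collisions_by_day:
            totals[condition][0] += collisions_by_day[day]
            totals[condition][1] += 1
    return {condition: total / days for condition, (total, days) in totals.items()}

def predict(model, forecast):
    """Score forecast conditions; conditions never seen in training yield None."""
    return [
        {"day": day, "expected_collisions": model.get(condition)}
        for day, condition in forecast
    ]

model = train(weather, collisions)
forecast = [("2017-01-07", "snow"), ("2017-01-08", "clear"), ("2017-01-09", "fog")]
for row in predict(model, forecast):
    print(row)
```

Returning None for an unseen condition such as fog, rather than guessing, keeps the gap in the historic data visible to whoever builds the app on top of the predictions.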

It is easy to see the potential for creativity and innovation when we make it easy to gain insights from data and to collaborate on ways to use them, exploring the power of data for both consumer and business insights.

Collaboration results in innovation

Cloud adoption continues to rise, and with it comes the power to explore more data and extract value faster. This cloud-facilitated potential is bringing the roles of data scientists and developers closer together. Historically, the two have operated with a significant degree of separation, driven largely by their use of different tools and programming languages. That separation is becoming less of an obstacle, thanks to tools that can easily be employed to streamline working processes and encourage more agile working. With this new efficiency, data scientists and developers can deliver creative, task-focused products faster.