June 4, 2019

McKinsey Pops Its Open Source Cherry

Management consultancy releases free ETL tool

By CBR Staff Writer

Management consultancy McKinsey has created and released its first piece of open source software: Kedro, a Python-based data pipeline development tool that McKinsey says it has used on more than 50 of its own projects.

Kedro, designed for use by data scientists (source code here), is the brainchild of two QuantumBlack engineers, Nikolaos Tsaousis and Aris Valtazanos, who created it to manage their workstreams at the analytics firm. (McKinsey bought QuantumBlack in 2015.)

Kedro lets users structure analytics code in a uniform way and deliver it production-ready, as well as build modular, versioned data pipelines.
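In practice, that means plain Python functions are wrapped as nodes, and string dataset names wire their inputs and outputs into a pipeline. The sketch below is illustrative only, written against the Kedro API as released at the time; the function names, dataset names and sample data are invented for the example, not taken from McKinsey's projects.

from kedro.pipeline import Pipeline, node
from kedro.io import DataCatalog, MemoryDataSet
from kedro.runner import SequentialRunner
import pandas as pd

def clean(raw: pd.DataFrame) -> pd.DataFrame:
    # Drop incomplete rows from the raw data.
    return raw.dropna()

def summarise(cleaned: pd.DataFrame) -> pd.DataFrame:
    # Aggregate the cleaned data by category.
    return cleaned.groupby("category", as_index=False).sum()

# Nodes declare their inputs and outputs by dataset name; Kedro uses these
# names to connect the steps into a modular, testable pipeline.
pipeline = Pipeline([
    node(clean, inputs="raw_data", outputs="cleaned_data"),
    node(summarise, inputs="cleaned_data", outputs="summary"),
])

# In a real project the datasets would be defined in YAML catalog config;
# an in-memory catalog keeps this example self-contained.
catalog = DataCatalog({
    "raw_data": MemoryDataSet(pd.DataFrame(
        {"category": ["a", "a", "b"], "value": [1.0, None, 3.0]})),
})

print(SequentialRunner().run(pipeline, catalog))  # returns the free "summary" output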

The release represents a “big step” for the firm, said Jeremy Palmer, CEO of QuantumBlack, “as we continue to balance the value of proprietary assets with opportunities to engage as part of the developer community.”

Rise of Open Source

McKinsey joins companies as diverse as dedicated software firms, retailers such as Walmart and accommodation marketplace Airbnb in releasing open source tools for popular consumption, amid a resurgence in enterprise use of open source software (albeit one that has come hand-in-hand with ongoing challenges surrounding licence types, the cloud and concerns about "asset stripping" of code bases).

Read this: Mark Shuttleworth on Taking Canonical Public, Legacy IT and Ubuntu, and his Botanical Garden

For businesses, becoming part of the open source community brings several benefits: it helps attract developers (who would often rather learn a foundational technology than one vendor's proprietary system); it opens up the possibility of commercialising the project in future by offering a managed service; it offers a way to avoid vendor lock-in by creating and nurturing a free tool (for which, famously, "with enough eyeballs, all bugs are shallow"); and, not least, there is the warm glow of building a good tool and letting others use it for free.


McKinsey on Kedro: What’s Special?

In a Q&A published alongside an installation guide and other documentation, the project team explained how the tool differs from other workflow schedulers and extract-transform-load (ETL) tools.

“Data pipelines consist of extract-transform-load (ETL) workflows. If we understand that data pipelines must be scaleable, monitored, versioned, testable and modular then this introduces us to a spectrum of tools that can be used to construct such data pipelines. Pipeline abstraction is implemented in workflow schedulers like Luigi and Airflow, as well as in ETL frameworks like Bonobo ETL and Bubbles.”

“We see Airflow and Luigi as complementary frameworks: Airflow and Luigi are tools that handle deployment, scheduling, monitoring and alerting. Kedro is the worker that should execute a series of tasks, and report to the Airflow and Luigi managers. We are building integrations for both tools and intend these integrations to offer a faster prototyping time and reduce the barriers to entry associated with moving pipelines to both workflow schedulers.”
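For illustration, the division of labour the team describes could look something like the sketch below, in which Airflow owns scheduling, monitoring and alerting while each task simply shells out to the Kedro command-line runner. This is not the official integration McKinsey says it is building; the project path, schedule and DAG name are assumptions.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Airflow (the "manager") schedules and monitors the run; Kedro (the "worker")
# executes the pipeline when the task fires.
dag = DAG(
    dag_id="kedro_pipeline",
    start_date=datetime(2019, 6, 1),
    schedule_interval="@daily",
)

run_kedro = BashOperator(
    task_id="run_kedro_pipeline",
    bash_command="cd /opt/projects/my-kedro-project && kedro run",
    dag=dag,
)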

Kedro vs Other ETL Frameworks

McKinsey said the primary differences from Bonobo ETL and Bubbles are as follows (a short illustration of the dependency-resolution point appears after the quoted list):

“Ability to support big data operations. Kedro supports big data operations by allowing you to use PySpark on your projects. We also look at processing dataframes differently to both tools as we consider entire dataframes and do not make use of the slower line-by-line data stream processing.

Project structure. Kedro provides a built-in project structure from the beginning of your project configured for best-practice project management.

Automatic dependency resolution for pipelines. The Pipeline module also maps out dependencies between nodes and displays the results of this in a sophisticated but easy to understand directed acyclic graph; and extensibility.”
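To illustrate the dependency-resolution point: because every node declares its inputs and outputs by name, Kedro can infer the execution order itself, even when the nodes are declared out of order. The sketch below uses invented function and dataset names.

from kedro.pipeline import Pipeline, node

def transform(loaded):
    # Depends on the output of load(), so it must run second.
    return [x * 2 for x in loaded]

def load():
    # Has no inputs, so it can run first.
    return [1, 2, 3]

# Declared "backwards" on purpose: Kedro resolves the order from the
# dataset names, not from the order of declaration.
pipeline = Pipeline([
    node(transform, inputs="loaded_data", outputs="transformed_data"),
    node(load, inputs=None, outputs="loaded_data"),
])

# pipeline.nodes returns the nodes in the resolved (topological) order.
for n in pipeline.nodes:
    print(n)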

Project manager Yetunde Dada said: “Data scientists are trained in mathematics, statistics and modelling—not necessarily in the software engineering principles required to write production code. Often, converting a pilot project into production code can add weeks to a timeline, a pain point with clients. Now, they can spend less time on the code, and more time focused on applying analytics.”

Read this: Use of Enterprise Open Source Software is Surging
