For better or for worse: How has the data science industry changed, and what effect has machine learning had?
One of the biggest changes in the industry that I’ve noticed over the last few years is that more and more companies are embracing open source, for example by sharing parts of their toolchain on GitHub. The availability of these tools is really great for making the most out of machine learning. I think data science and open source-related conferences are also growing, which means more people are not only getting interested in data science but are also considering working together as open source contributors in their free time, which is a good thing.
Another shift in the industry that I’ve witnessed is that deep learning is becoming more and more popular. However, this isn’t necessarily a positive change. The willingness to embrace deep learning over the last few years is great, but there seems to be an urge to apply it to problems even where it doesn’t make sense, and sometimes it feels like lots of companies are using deep learning just for the sake of it.
The positive thing to take away from this cultural shift is that people are getting excited about new and creative approaches to problem-solving, which can drive the field forward. One of the great things is that this excitement is driving communication and collaboration across different areas. For example, I’ve noticed that more and more people from other domains are familiar with the techniques used in statistical modeling and machine learning. Good communication in collaborations and teams is important, and shared knowledge of the basics makes this communication easier.
Looking forward: What’s the most exciting trend in data science and machine learning?
One trend I’m really interested in is the development of libraries that make machine learning even more accessible. Popular examples include TPOT and AutoML/auto-sklearn. These libraries further automate the building of machine learning pipelines. However, interpreting the outcomes of predictive modeling tasks and evaluating the results appropriately will always require a certain amount of knowledge. These tools don’t aim to replace experts in the field, but they may be able to make machine learning accessible to a broader audience of non-programmers. I see them as assistants for data scientists that help automate tedious tasks such as hyperparameter tuning.
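To give a rough idea of what automating hyperparameter tuning means, here is a minimal sketch using only the Python standard library. It is not how TPOT or auto-sklearn work internally and it uses none of their APIs. It simply tries several values of k for a toy nearest-neighbour classifier and keeps the one with the best leave-one-out cross-validation accuracy. The function names and the toy dataset are made up for illustration.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy of k-NN for a given k."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out the i-th point
        if knn_predict(train, x, k) == label:
            correct += 1
    return correct / len(data)


def grid_search(data, candidate_ks):
    """Score every candidate k and return the best one together with all scores."""
    scores = {k: loo_accuracy(data, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best_k, scores


# Hypothetical toy data: (feature, label) pairs forming two well-separated groups.
TOY_DATA = [
    (0.0, "a"), (0.2, "a"), (0.4, "a"), (0.6, "a"),
    (5.0, "b"), (5.2, "b"), (5.4, "b"), (5.6, "b"),
]

best_k, scores = grid_search(TOY_DATA, [1, 3, 7])
print(best_k, scores)
```

Even in this tiny example, the search only reports numbers. Understanding why a very large k performs badly here (every held-out point is outvoted by the other group) still takes the kind of knowledge that these tools don’t replace.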