January 4, 2017

5 Ways to Avoid Common Pitfalls in Large-Scale Analytics Projects

Elastic's Steve Mayzak gives CBR his top five tips for selecting the right products for large-scale analytics projects.

By James Nunns

Data now means more and does more within the enterprise than ever before. From mitigating financial risk by detecting fraud to creating recommendation engines and optimising the customer experience, data is helping companies solve increasingly complex problems.

What, then, have we learned over the past few years as data has moved to the forefront of organisations? With options ranging from proprietary software to cloud-based software and open source tools, today’s developers, architects, and IT professionals have many choices when it comes to large-scale analytics projects. Some require an expensive up-front investment. Others require many resources. And then there are tools that hit the sweet spot: they’re easy to implement and provide extensive features to prototype at scale.

Finding tools that increase project success and help you avoid common pitfalls is key. Here are five tips for selecting the right products for your large-scale analytics projects.


Start Small and Simple

The biggest mistake companies make when embarking on an analytics project is to go too big, too soon. Often, especially when projects are driven from the top down, the temptation is to start by building a complex solution with no clearly defined outcome. This results in expensive and time-intensive projects.

Instead, start small and focus on quick, early ‘wins’ to build confidence with end users. Leverage modern open source technologies that don’t require large, up-front financial commitments and that enable your developers to get started quickly. A desired outcome is an application or prototype built in days or weeks.



Model Scalability Early

Even though you may only be building a prototype, it is critical that you test for scalability as early as possible. Many projects fail because the application wasn’t built or tested with scalability in mind, or because the technologies selected were not designed to handle large data volumes.

Make sure performance testing is not an afterthought. Model out how much data you think you’ll be capturing over time. Test it, reference it, and build the right architecture to enable horizontal scaling with zero performance degradation as data volumes grow.
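This kind of modelling can start as simple back-of-the-envelope arithmetic. The sketch below projects raw storage growth from daily event volume; the figures (5 million events a day at roughly 1 KB each, one replica copy) are hypothetical placeholders to be replaced with measurements from your own pilot.

```python
# A minimal capacity-modelling sketch. All input numbers are illustrative
# assumptions -- substitute measured values from your prototype.

def projected_storage_gb(events_per_day, bytes_per_event, days, replication=2):
    """Estimate raw storage needs, including replica copies."""
    total_bytes = events_per_day * bytes_per_event * days * replication
    return total_bytes / 1e9  # convert bytes to gigabytes

# 5 million events/day at ~1 KB each, kept for a year, with one replica
year_one = projected_storage_gb(5_000_000, 1_000, 365)
print(f"Projected year-one footprint: {year_one:,.0f} GB")
# → Projected year-one footprint: 3,650 GB
```

Even a rough figure like this tells you early whether a single node will do or whether the architecture must shard data horizontally from day one.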


Prioritise Real-Time Data Availability

We’ve all experienced what happens when an application or website is unresponsive or slow. Today, users expect responses in real time. If a query response isn’t perceived as immediate, users’ patience quickly runs out, and in many cases this results in lost revenue and customers.

Ensure that the software you are using can not only handle large data volumes, but also has capabilities to perform fast queries and return results in real time. Use software that has built-in analytical features like aggregations and geo-capabilities combined with real-time search.
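To make this concrete, the sketch below builds an Elasticsearch-style search request that combines a full-text query with aggregations in a single call. The index layout and field names (`message`, `region`, `latency_ms`) are invented for illustration; the shape of the request body follows Elasticsearch's search API.

```python
import json

# A hypothetical query body combining full-text search with analytics.
# Field names are assumptions, not from the article.
query = {
    "query": {"match": {"message": "checkout error"}},   # full-text search
    "aggs": {
        # bucket matching documents by region
        "errors_per_region": {"terms": {"field": "region.keyword"}},
        # compute the 95th-percentile latency across matches
        "p95_latency_ms": {"percentiles": {"field": "latency_ms",
                                           "percents": [95]}},
    },
    "size": 0,  # return only aggregation results, not individual hits
}
print(json.dumps(query, indent=2))
```

A client would POST this body to the index's `_search` endpoint; search and analytics come back together in one round trip, which is what keeps perceived latency low.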


Utilise Flexible Data Models

Today’s systems contain both structured and unstructured data. Don’t be constrained by relational databases: they were built around structured tables of columns and rows, which makes it incredibly difficult to index, parse, search, and analyse large volumes of mixed data collected over time.

Use software with a versatile data structure. Many modern technologies used for analytics projects such as NoSQL databases and Elasticsearch use JSON, supporting both structured and unstructured data types like text, numbers, strings, Boolean values, arrays, and hashes.
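A single JSON document can mix all of those types without a table schema to design or migrate first. The record below is a sketch with an invented schema, showing structured fields sitting alongside free text in one document:

```python
import json

# An illustrative JSON document mixing structured and unstructured fields.
# The schema here is hypothetical.
event = {
    "order_id": 48213,                      # number
    "customer": "acme-co",                  # string
    "fraud_flag": False,                    # Boolean
    "tags": ["priority", "repeat-buyer"],   # array
    "shipping": {"city": "Leeds", "lat": 53.8, "lon": -1.55},  # nested hash
    "notes": "Customer asked for weekend delivery.",           # free text
}

# The document round-trips cleanly through JSON serialisation.
restored = json.loads(json.dumps(event))
assert restored == event
```

Adding a new field later means simply including it in the next document, rather than altering a table definition across billions of existing rows.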


Choose Developer-Friendly Tools

With the volume of data being collected today, it’s very difficult to fulfil large-scale analytics projects using software that does not contain open APIs. APIs are used to ingest, index, and analyse data, often from multiple sources or systems.

Empower your developers with software that has a rich set of open and well-documented APIs. This allows them to solve the problem quickly and efficiently. Over time, it also enables your developers to continually innovate and enhance the application as it scales.
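As a sketch of what "open API" means in practice, the function below prepares a batch of documents in the newline-delimited format used by Elasticsearch's bulk API. The index name and documents are hypothetical; the payload shape follows the `_bulk` endpoint's action-line/document-line convention.

```python
import json

# Hypothetical sample documents to ingest.
docs = [
    {"user": "alice", "action": "login"},
    {"user": "bob", "action": "purchase"},
]

def to_bulk_payload(index, documents):
    """Interleave action lines with document lines, newline-delimited."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # bulk payloads end with a newline

payload = to_bulk_payload("events", docs)
print(payload)
# A client would POST this to the cluster's /_bulk endpoint with
# Content-Type: application/x-ndjson.
```

Because the API is plain HTTP and JSON, the same ingestion code works from any language or system that can make a web request, which is what lets developers wire in new data sources quickly.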



Using these five criteria to help you choose the right tools for your large-scale analytics project will improve the project’s time to value and ensure that your organisation is set up for long-term success. Many enterprises like the BBC, Goldman Sachs, and The Guardian have adopted this approach, choosing open source software like the Elastic Stack to solve their critical use cases. With the right approach, you may find it faster, simpler, and less expensive than you think to make data do what your organisation needs it to do.
