Google Cloud Platform (GCP) has added five petabytes (5PB) of public dataset storage to its BigQuery enterprise data warehouse, which already hosts more than 100 machine learning-ready public datasets.
The Google Cloud Public Datasets programme, launched in 2016, works with public data providers to store copies of high-value, high-demand public datasets in GCP to make them more accessible and discoverable.
Shane Glass, program manager for the Google Cloud Public Datasets Program, said in a blog post: “We’re also continuing to curate and host datasets in BigQuery so users can leverage BigQuery Machine Learning to analyze data with machine learning using standard SQL queries… so that our users can JOIN their private data and the world’s public data with as little time and effort as possible.”
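As a sketch of the JOIN workflow Glass describes, a query along the following lines could combine a private table with one of the hosted public datasets. The `my_project.my_dataset.daily_sales` table and its columns are hypothetical; `bigquery-public-data.noaa_gsod.gsod2016` is one of the NOAA weather tables hosted under the programme.

```sql
-- Join a (hypothetical) private sales table against the public
-- NOAA GSOD weather data hosted in BigQuery, using standard SQL.
SELECT
  s.sale_date,
  s.sales_total,
  w.temp AS mean_temp_f
FROM `my_project.my_dataset.daily_sales` AS s
JOIN `bigquery-public-data.noaa_gsod.gsod2016` AS w
  ON s.station_id = w.stn
  AND FORMAT_DATE('%Y%m%d', s.sale_date) = CONCAT(w.year, w.mo, w.da)
ORDER BY s.sale_date;
```

Because the public copy lives in the same warehouse as the private data, the join runs as an ordinary query with no data movement on the user's part.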
Public datasets on the Google Cloud Platform provide a diverse collection of datasets that are freely hosted and maintained. These datasets can be accessed and analysed with a range of analytics tools: researchers can use open-source software such as Apache Spark, or managed services such as Google Cloud Dataflow or BigQuery.
Google BigQuery is an enterprise data warehouse that lets people run fast Structured Query Language (SQL) queries on Google Cloud's infrastructure. Users can access BigQuery through a web user interface or through a command-line tool.
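For example, a hosted public table such as `bigquery-public-data.samples.shakespeare` (part of BigQuery's public samples dataset) can be queried directly with standard SQL from either interface; the query below is purely illustrative.

```sql
-- Ten most frequent words across Shakespeare's works,
-- run against a publicly hosted BigQuery sample table.
SELECT word, SUM(word_count) AS total_occurrences
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY total_occurrences DESC
LIMIT 10;
```

From the command line, the same statement can be submitted with the `bq` tool, e.g. `bq query --use_legacy_sql=false '<query>'`.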
BigQuery ML lets users create and execute machine learning models to analyse the large datasets hosted in BigQuery. Currently BigQuery ML supports two types of models: binary logistic regression and linear regression.
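A model of either supported type is created with a `CREATE MODEL` statement in standard SQL. The sketch below shows the general shape for a binary logistic regression; the model name `my_dataset.visit_model`, the source table and its columns are assumptions for illustration.

```sql
-- Train a (hypothetical) binary logistic regression model over a
-- table of site visits; the label must be 0 or 1 for logistic_reg.
CREATE OR REPLACE MODEL `my_dataset.visit_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  device_type,
  country,
  pages_viewed,
  IF(made_purchase, 1, 0) AS label
FROM `my_project.my_dataset.visits`;
```

Once trained, the model can score new rows from within SQL via the `ML.PREDICT` table function, keeping the whole workflow inside the warehouse.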
Google Cloud's blog states: “BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. This enables business decision making through predictive analytics across the organization.”