August 31, 2018

GCP Adds 5PB of Storage to Public Datasets Programme

The programme aims to make public datasets, with an emphasis on geospatial material, available and ready for analysis with machine learning tools

By CBR Staff Writer

Google Cloud Platform (GCP) has added five petabytes (5PB) of storage for public datasets to its BigQuery enterprise data warehouse, which already hosts more than 100 machine learning-ready public datasets.

The Google Cloud Public Datasets programme, launched in 2016, works with public data providers to store copies of high-value, high-demand public datasets in GCP to make them more accessible and discoverable.

It currently hosts some 3PB of data, including Landsat data from the United States Geological Survey (USGS), Bitcoin blockchain transactions, GitHub activity data and human genome variants.

The additional storage will be available for the next five years.

Shane Glass, Program Manager for the Google Cloud Public Datasets Program, said in a blog post: “We [are] also continuing to curate and host datasets in BigQuery so users can leverage BigQuery Machine Learning to analyze data with machine learning using standard SQL queries… so that our users can JOIN their private data and the world’s public data with as little time and effort as possible.”

Public Datasets

Public datasets on the Google Cloud Platform provide a diverse collection of data that is hosted and maintained free of charge. These datasets can be accessed and analysed with a range of analytics tools: researchers can use open source software such as Apache Spark, or Google's own services, Cloud Dataflow and BigQuery.

Google BigQuery is an enterprise data warehouse that lets users run fast SQL (Structured Query Language) queries on Google Cloud infrastructure. Users can access BigQuery through a web user interface or a command-line tool.


The service is fully managed, so companies do not need to set up resources such as virtual machines or disks before using it.

BigQuery ML lets users create and run machine learning models with standard SQL, and use them to analyse the large datasets held in BigQuery. It currently supports two types of model: binary logistic regression and linear regression.
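To illustrate what "machine learning with standard SQL" looks like in practice, the Python snippet below assembles a BigQuery ML `CREATE MODEL` statement as a string. It is a minimal sketch, assuming the BigQuery ML `model_type` option values `linear_reg` and `logistic_reg` for the two supported model types. The helper `create_model_sql`, the model name `my_dataset.weight_model`, and the choice of public table and columns in the example call are hypothetical illustrations, not details from the article.

```python
# Minimal sketch: build a BigQuery ML CREATE MODEL statement as a string.
# Assumption: model_type values 'linear_reg' and 'logistic_reg' map to the
# two model types described above. All names in the example call are
# hypothetical placeholders chosen for illustration.
from typing import List

MODEL_TYPES = {"linear": "linear_reg", "logistic": "logistic_reg"}


def create_model_sql(model_name: str, kind: str, label_col: str,
                     feature_cols: List[str], source_table: str) -> str:
    """Return a standard SQL CREATE MODEL statement for BigQuery ML."""
    if kind not in MODEL_TYPES:
        raise ValueError(f"unsupported model kind: {kind!r}")
    if not feature_cols:
        raise ValueError("at least one feature column is required")
    # The label column and the feature columns are selected together.
    columns = ", ".join([label_col, *feature_cols])
    return (
        f"CREATE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{MODEL_TYPES[kind]}', "
        f"input_label_cols=['{label_col}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`\n"
        f"WHERE {label_col} IS NOT NULL"  # skip rows with no training label
    )


if __name__ == "__main__":
    sql = create_model_sql(
        "my_dataset.weight_model",                 # hypothetical model name
        "linear",
        "weight_pounds",                           # assumed label column
        ["is_male", "gestation_weeks", "mother_age"],  # assumed features
        "bigquery-public-data.samples.natality",   # assumed public table
    )
    print(sql)
```

The resulting string could then be submitted with the google-cloud-bigquery client, for example `bigquery.Client().query(sql).result()`. Keeping statement generation separate from execution means the SQL can be inspected or tested without cloud credentials.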

On the Google Cloud blog, the company states: “BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. This enables business decision making through predictive analytics across the organization.”

Shane Glass added: “We are particularly focused on making available datasets that can support BigQuery’s new GIS capabilities like BigQuery Geo Viz.”

