Eighteen new public datasets are now available on a growing AWS registry, ranging from an encyclopedia of DNA elements to African soil chemistry data, via meteorological conditions and turbine power for more than 126,000 wind power sites.
The data was added to the public cloud giant’s Public Datasets programme, which provides free cloud storage for public datasets. AWS users can then choose to build services on top of it using a broad range of its commercial tools.
Amazon says it hopes the programme will help developers create “new cloud-native techniques, formats, and tools that lower the cost of working with data”.
New Public Datasets: A Snapshot
Among the newly added datasets are nine years’ worth of georeferenced soil sample data collected through the Africa Soil Information Service (AfSIS) project between 2009 and 2018. (Researchers have already used this data to train machine learning algorithms that predict crop yield.)
Another newly added dataset, submitted by the University of Washington, contains 2PB of observations from the Murchison Widefield Array (MWA) radio telescope in Western Australia. The observations were made to help detect signatures of the formation of the first stars and galaxies, which could deepen our understanding of how the universe evolved.
The University of Pennsylvania, meanwhile, has added a large-scale multilingual dataset of images paired with words. It matches each word with its equivalents in 97 other languages, and the words in each language are stored alongside the images that represent them.
Organisations that contribute data to the Public Datasets programme take on certain responsibilities, including maintaining and managing the quality of all the data they submit. Contributors are also required to make “reasonable efforts” to optimise the end-user experience.
12 Trillion LIDAR Point Cloud Records
Last February the United States Geological Survey (USGS) uploaded its 3D Elevation Program (3DEP) dataset to AWS. The dataset contains some 12 trillion LIDAR point cloud records from more than 1,200 projects across the United States.
The 3DEP initiative collects three-dimensional elevation data across the US using LIDAR. A laser-based remote sensing instrument mounted on an aircraft records billions of LIDAR pulse returns, which are used to build a 3D map of the area surveyed.
Kevin Gallagher, USGS Associate Director for Core Science Systems, commented: “The 3D Elevation Program was founded on the concept that high resolution elevation data should be provided unlicensed, free and open to the public.”
“This agreement with Amazon helps to fulfill that promise by providing cloud-access to the trillions of data points collected through the Program.”
“The democratization of elevation data is a tremendous achievement by the community of partners leading this effort and promises to revolutionize approaches to applications from flood forecasting and geologic assessments to precision agriculture and infrastructure development.”
There are now 109 datasets available under the programme.
The full registry is available in AWS’s GitHub repository.