Google has belatedly brought its Cloud Data Catalog to market in a public beta release announced this week. Google Data Catalog is a data management service for searching stored data and its metadata; for example, users can find data by table or column name.
Google built the catalog service on the same search infrastructure that powers Gmail and Google Drive. With Data Catalog, users can search tables stored in BigQuery and can also search across all of the cloud projects they run.
Isaac Kuek, an interaction designer at Google Cloud, wrote in a blog post: “Integration with access controls defined in Cloud Identity and Access Management (IAM) returns data that you have access to, reducing the need to configure additional permissions within Data Catalog.”
At its core, Data Catalog lets users manage data by tagging data assets with metadata, which makes those assets easier to find later. Business metadata can be defined with tag templates, and a single template can be applied to many data sets.
“Data Catalog extends the traditional business glossary concept by supporting doubles, booleans, and enumerated type in addition to storing metadata as strings. For example, you can assign a business category as an enumerated type to a data asset from a pre-set list of categories, ensuring consistent categories are used when capturing metadata,” writes Kuek.
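To make the idea of typed tag templates concrete, here is a minimal Python sketch. The classes `FieldType`, `TemplateField`, and `TagTemplate`, the `validate_tag` method, and the example template and category names are all hypothetical; this is not the Data Catalog client library. It only models how an enumerated field can restrict values to a pre-set list, while other fields are typed as strings, doubles, or booleans.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldType(Enum):
    """Hypothetical field types mirroring those Kuek lists."""
    STRING = "string"
    DOUBLE = "double"
    BOOL = "bool"
    ENUM = "enum"


@dataclass
class TemplateField:
    field_type: FieldType
    allowed_values: tuple = ()  # only consulted for ENUM fields


@dataclass
class TagTemplate:
    name: str
    fields: dict = field(default_factory=dict)

    def validate_tag(self, values: dict) -> list:
        """Return a list of problems; an empty list means the tag is valid."""
        errors = []
        for key, value in values.items():
            spec = self.fields.get(key)
            if spec is None:
                errors.append(f"unknown field: {key}")
            elif spec.field_type is FieldType.STRING and not isinstance(value, str):
                errors.append(f"{key} must be a string")
            elif spec.field_type is FieldType.DOUBLE and (
                isinstance(value, bool) or not isinstance(value, (int, float))
            ):
                errors.append(f"{key} must be a number")
            elif spec.field_type is FieldType.BOOL and not isinstance(value, bool):
                errors.append(f"{key} must be a boolean")
            elif spec.field_type is FieldType.ENUM and value not in spec.allowed_values:
                # Enumerated fields keep categories consistent across assets
                errors.append(f"{key} must be one of {spec.allowed_values}")
        return errors


# Hypothetical template; the category list is illustrative only
governance = TagTemplate(
    name="data_governance",
    fields={
        "business_category": TemplateField(FieldType.ENUM, ("Finance", "Marketing", "Sales")),
        "contains_pii": TemplateField(FieldType.BOOL),
        "quality_score": TemplateField(FieldType.DOUBLE),
        "owner": TemplateField(FieldType.STRING),
    },
)

print(governance.validate_tag({"business_category": "Finance", "contains_pii": True}))  # → []
print(governance.validate_tag({"business_category": "finance"}))  # rejected: not in the pre-set list
```

In this sketch, a misspelled category such as `"finance"` is rejected rather than silently stored as a new free-text value, which is the consistency benefit Kuek describes.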
Not the Only Cloud Data Catalog
Google's beta follows similar services on other cloud platforms, such as Amazon Web Services (AWS), which introduced AWS Glue in 2017.
The AWS Glue Data Catalog is an index to the location, schema, and runtime metrics of data stored on AWS. Information in the catalog is stored as metadata tables, and each table specifies a single data store.
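The "one metadata table per data store" structure can be sketched in a few lines of Python. This is a hypothetical model, not the AWS Glue API: `MetadataTable`, `find_tables_with_column`, and the example bucket paths are invented for illustration, and runtime metrics are left out. It shows each entry recording where one data store lives and what its schema is, which is what makes a search by column name possible.

```python
from dataclasses import dataclass


@dataclass
class MetadataTable:
    """Hypothetical catalog entry describing exactly one data store."""
    name: str
    location: str   # where the underlying data lives (hypothetical S3 prefix)
    columns: dict   # schema: column name -> column type


def find_tables_with_column(catalog: list, column: str) -> list:
    """Return the names of catalog tables whose schema contains the column."""
    return [table.name for table in catalog if column in table.columns]


catalog = [
    MetadataTable("orders", "s3://example-bucket/orders/",
                  {"order_id": "bigint", "customer_id": "bigint"}),
    MetadataTable("customers", "s3://example-bucket/customers/",
                  {"customer_id": "bigint", "email": "string"}),
]

print(find_tables_with_column(catalog, "customer_id"))  # → ['orders', 'customers']
```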
Vendors other than the big cloud providers also supply data cataloging software that can run natively on AWS, Azure, and Google Cloud.
California-based Waterline Data uses an AI-driven algorithm to automatically tag data stored in data warehouses. Its service brings data from sources such as public clouds and on-premises environments into a single dashboard.
“As more organizations realize the value in shifting their data to the Cloud and hybrid environments, the need for a cloud-native enterprise data catalog that can provide single truth visibility into their data assets, regardless of their location, has become increasingly critical,” Waterline Data CEO Kailash Ambwani said in a release.