Machine learning data catalogs allow companies to categorize, access, interpret, and collaborate around company data across multiple data sources, while maintaining a high level of governance and access management. Artificial intelligence underpins many of their features, including machine learning recommendations, natural language querying, and dynamic data masking for enhanced security.
Companies can use machine learning data catalogs to maintain data sets in a single location, so that searching for and discovering data is simple for everyday business users and analysts alike. Users can comment on, share, and recommend data sets so that colleagues immediately understand what they are querying. Additionally, IT administrators can set up user provisioning so that unauthorized employees cannot access sensitive data.
Machine learning data catalogs are most frequently implemented by companies that have multiple data sources, are searching for one source of truth, and are attempting to scale data usage company-wide. These products are generally administered by IT departments, which maintain organization and security, while the data itself can be accessed by data scientists, analysts, and average business users. The data can then be transformed, modeled, and visualized either directly in the machine learning data catalog or through an integration with business intelligence software.
Not all machine learning data catalogs provide data preparation capabilities; those that do not may require an integration with a business intelligence platform. Additionally, these tools differ from master data management software in their enhanced governance, collaboration, and machine learning functionality.
To qualify for inclusion in the Machine Learning Data Catalog category, a product must:
Cloudera Navigator is a complete data governance solution for Hadoop, offering critical capabilities such as data discovery, continuous optimization, audit, lineage, metadata management, and policy enforcement. As part of Cloudera Enterprise, Cloudera Navigator enables high-performance agile analytics, supports continuous data architecture optimization, and helps meet regulatory compliance requirements.
IBM Watson Knowledge Catalog powers intelligent, self-service discovery of data, models and more, activating them for artificial intelligence, machine learning and deep learning. Access, curate, categorize and share data, knowledge assets and their relationships, wherever they reside. Learn More: https://ibm.co/2QIK2d0
A semantic layer for the enterprise, enabling connected data access and analytics on demand. Anzo Smart Data Lake (ASDL) connects to both internal and external data sources, including cloud or on-premises Hadoop-based data lakes, to rapidly ingest and catalog large volumes of structured and unstructured data. It does so through horizontally scaled, automated Extract, Transform and Load (ETL) processes that can be mapped to establish a semantic layer of business meaning.
Data Catalog automatically crawls, profiles, organizes, links, and enriches all your metadata. Up to 80% of the information associated with the data is documented automatically and kept up-to-date through smart relationships and machine learning, continually delivering the most meaningful data to the user.
Waterline Data Fingerprinting works by analyzing the data values in each data set and profiling the data. Waterline Data then uses that information to create a fingerprint for each column of data, and applies machine learning to automatically tag and match those fingerprints to glossary terms and populate the data catalog. Users can then refine the matched terms, and resolve any remaining unmatched terms, through crowdsourcing.
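The general idea of column fingerprinting can be illustrated with a small sketch. This is not Waterline Data's actual algorithm, which is not described here. It is a minimal illustration that assumes a fingerprint is a short vector of profile statistics and that matching means finding the nearest labeled example column. The names fingerprint, match_term, and GLOSSARY_EXAMPLES, the choice of features, and the 0.5 distance threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labeled example columns, one per glossary term.
GLOSSARY_EXAMPLES = {
    "phone_number": ["555-867-5309", "555-123-4567", "555-000-1111"],
    "email_address": ["ann@example.com", "bob@example.org", "cy@example.net"],
    "us_zip_code": ["94105", "10001", "60614"],
}


def char_shape(value):
    """Collapse a value into a coarse shape: digits become 9, letters become a."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "a", shape)


def fingerprint(values):
    """Profile a column of strings into a small numeric feature vector."""
    values = [v for v in values if v]  # ignore empty cells
    if not values:
        return (0.0, 0.0, 0.0, 0.0)
    total_chars = sum(len(v) for v in values)
    digit_ratio = sum(c.isdigit() for v in values for c in v) / total_chars
    alpha_ratio = sum(c.isalpha() for v in values for c in v) / total_chars
    distinct_ratio = len(set(values)) / len(values)
    # Share of values that follow the single most common shape.
    shapes = Counter(char_shape(v) for v in values)
    shape_consistency = shapes.most_common(1)[0][1] / len(values)
    return (digit_ratio, alpha_ratio, distinct_ratio, shape_consistency)


def match_term(values, threshold=0.5):
    """Return the nearest glossary term, or None if nothing is close enough."""
    column_fp = fingerprint(values)
    best_term, best_distance = None, float("inf")
    for term, examples in GLOSSARY_EXAMPLES.items():
        distance = math.dist(column_fp, fingerprint(examples))
        if distance < best_distance:
            best_term, best_distance = term, distance
    # Columns left unmatched would go to users for manual tagging.
    return best_term if best_distance <= threshold else None


if __name__ == "__main__":
    print(match_term(["555-222-3333", "555-444-5555"]))  # phone_number
    print(match_term(["yes", "no", "yes", "no", "yes", "no"]))  # None
```

In this sketch, a column that matches no term returns None, which corresponds to the unmatched terms that users resolve by hand. A production system would likely use far richer features and a trained model rather than a fixed distance threshold.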