Metadata exploration and measurement tools suite
Managing more than 200 000 tables with over 4 million columns across the data lake.
Our client is one of the largest retailers in the world, delivering billions of items every year and constantly innovating and disrupting the shopping experience.
The client held very large amounts of data from various systems, but most of it had no metadata. As a result, analyzing this data and using it in other systems was arduous and sometimes impossible. The customer was already using a commercial metadata management system that gave data stewards basic functionality for manipulating metadata, but it lacked an easy way to measure the results of their work. Given the overwhelming amount of metadata that needed to be supplied, the stewards often found it difficult to determine which data in the system needed more curation.
The VL team integrated with this metadata management system. After extracting the metadata stored in it, we defined a parameterizable set of criteria that allowed us to measure the quality of the metadata. We matched the metadata quality scores with the data stewards responsible for supplying the metadata and with the areas of the system they were supposed to describe. This let us determine exactly which data (down to a single table and column within a database) needed better curation, and by whom.
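To illustrate the idea, here is a minimal Python sketch of parameterizable quality criteria applied at column level and grouped by steward. It is not the project's actual implementation: the `ColumnMetadata` fields, the two criteria, the weights, and the threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ColumnMetadata:
    """Hypothetical record for one column extracted from the metadata system."""
    database: str
    table: str
    column: str
    steward: str
    description: Optional[str] = None
    business_term: Optional[str] = None


# A criterion maps a column's metadata to a score between 0.0 and 1.0.
Criterion = Callable[[ColumnMetadata], float]


def has_description(min_length: int) -> Criterion:
    """Parameterizable criterion: the description must have at least min_length characters."""
    def check(col: ColumnMetadata) -> float:
        text = (col.description or "").strip()
        return 1.0 if len(text) >= min_length else 0.0
    return check


def has_business_term(col: ColumnMetadata) -> float:
    """Criterion: the column is linked to a business term."""
    return 1.0 if col.business_term else 0.0


def quality_score(col: ColumnMetadata,
                  weighted_criteria: List[Tuple[Criterion, float]]) -> float:
    """Weighted average of all criteria for a single column."""
    total_weight = sum(weight for _, weight in weighted_criteria)
    weighted_sum = sum(criterion(col) * weight
                       for criterion, weight in weighted_criteria)
    return weighted_sum / total_weight


def columns_needing_curation(columns: List[ColumnMetadata],
                             weighted_criteria: List[Tuple[Criterion, float]],
                             threshold: float) -> Dict[str, List[Tuple[str, float]]]:
    """Group columns scoring below the threshold by the responsible steward."""
    result: Dict[str, List[Tuple[str, float]]] = {}
    for col in columns:
        score = quality_score(col, weighted_criteria)
        if score < threshold:
            path = f"{col.database}.{col.table}.{col.column}"
            result.setdefault(col.steward, []).append((path, score))
    return result
```

Calling `columns_needing_curation(columns, [(has_description(10), 0.75), (has_business_term, 0.25)], threshold=0.5)` would return, per steward, the fully qualified columns that fall short. Changing the weights or the minimum description length adjusts the criteria without touching the scoring logic.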
The resulting metrics were then visualized as interactive dashboards with diagrams and presented to data stewards and their managers. This helped the client allocate their limited human resources more effectively to tackle the problem of incomplete metadata. This mattered greatly because the amount of data to be cataloged was so large that metadata could be provided for only a small subset of it in a reasonable time, so choosing the right focus was crucial.
Extracting data from the external metadata management system also structured and exposed the metadata in a better way. As a positive side effect, the client’s analysts could easily run custom queries on the metadata stored in the system to answer important business questions.
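As a rough illustration of such a custom query, the Python sketch below loads extracted column metadata into an in-memory SQLite table and computes description coverage at database or table granularity. The `column_metadata` schema, the function names, and the use of SQLite are assumptions made for this example, not a description of the client's system.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, Optional[str]]


def build_catalog(rows: List[Row]) -> sqlite3.Connection:
    """Load (database, table, column, description) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE column_metadata ("
        "database_name TEXT, table_name TEXT, column_name TEXT, description TEXT)"
    )
    conn.executemany("INSERT INTO column_metadata VALUES (?, ?, ?, ?)", rows)
    return conn


def description_coverage(conn: sqlite3.Connection, granularity: str) -> List[tuple]:
    """Return the share of columns with a non-empty description per group."""
    # Grouping columns come from a fixed whitelist, so the f-string below is safe.
    group_cols = {
        "database": "database_name",
        "table": "database_name, table_name",
    }[granularity]
    query = f"""
        SELECT {group_cols},
               AVG(CASE WHEN description IS NOT NULL AND TRIM(description) <> ''
                        THEN 1.0 ELSE 0.0 END) AS coverage
        FROM column_metadata
        GROUP BY {group_cols}
        ORDER BY {group_cols}
    """
    return conn.execute(query).fetchall()
```

With the metadata in a relational form like this, an analyst could switch between `description_coverage(conn, "database")` and `description_coverage(conn, "table")` to see the same metric at a coarser or finer level.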
Making the client aware of the scale of the problem is our greatest success
Our solutions allowed the users of the client’s system to gain more confidence in the data and made their work easier, more efficient, and less error-prone
Identifying business entities whose metadata do not fulfill given requirements
Answering complex business queries based on metadata
Metadata metrics covering more than 200 000 tables with over 4 million columns across the data lake
Integration with other commercial and open-source metadata management tools
Visualization of metadata coverage within different parts of the managed system with adjustable granularity