Data Catalog Explanation & Detailed Feature Walkthrough
Organisations generate huge volumes of data from various sources, and this data is stored in varied locations: files, relational databases, data warehouses and the cloud.
Sprinkle’s no-code Data Integration & Data Analytics solutions enable organisations to bring all their data to one place and perform a holistic, 360-degree analysis.
Sprinkle presents the Data Catalog to organize, discover, manage and use data with ease.
Catalog: Explanation & Feature Walkthrough
A Data Catalog is an inventory of all the organization's data assets, along with tools that help users locate the data they need for analysis.
It is the go-to place for all of a user's data-related needs. The Data Catalog enables users to “Search & Discover” data, “Understand” its context through Technical & Business Metadata, and “Manage” data and access to it.
To open the Data Catalog, click the Catalog option in the left navigation panel.
The Catalog homepage is a table listing page that gives brief information about each table in the data warehouse.
It presents this information, along with the Tags, in tabular form. The business metadata columns are highlighted in the image below. To choose which columns appear on the listing page, use the Customize Column Display icon on the right.
Table Listing Page: Business Metadata Columns & Customize Column Display Options
Use the search bar to search the schema by keyword. To view the details of a table, click on it.
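The walkthrough does not describe how keyword matching works internally. As a rough mental model only, the Python sketch below filters a hypothetical in-memory list of table metadata and keeps tables whose name, description or tags contain every keyword, ignoring case. `TableInfo` and `search_tables` are illustrative names, not part of any Sprinkle API.

```python
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    # Hypothetical metadata record; a real catalog stores many more fields.
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


def search_tables(tables: list[TableInfo], query: str) -> list[TableInfo]:
    """Return tables whose name, description or tags contain every keyword."""
    keywords = [k.lower() for k in query.split()]
    results = []
    for table in tables:
        # Combine all searchable text into one lowercase string.
        haystack = " ".join([table.name, table.description, *table.tags]).lower()
        if all(k in haystack for k in keywords):
            results.append(table)
    return results


tables = [
    TableInfo("sales_orders", "Daily orders from the web store", ["Verified"]),
    TableInfo("customers", "Customer master data", ["WIP"]),
]
print([t.name for t in search_tables(tables, "orders verified")])  # ['sales_orders']
```

Requiring every keyword to match (AND semantics) narrows the results as more words are typed; a production catalog may instead rank partial matches.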
Clicking a table opens its Overview section. To view column-level stats, click Refresh Stats.
Stats are displayed per column: the column name, its data type, a description (provided by the user), the total number of distinct values and the number of missing values.
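In SQL terms, the two counts correspond roughly to `COUNT(DISTINCT column)` (which ignores NULLs) and the number of NULL values in the column. The sketch below computes the same pair of numbers over a hypothetical list of row dictionaries. Treating `None` and absent keys as missing is an assumption, since the walkthrough does not define what the catalog counts as a missing value.

```python
from typing import Any


def column_stats(rows: list[dict[str, Any]], column: str) -> dict[str, int]:
    """Count distinct non-missing values and missing values for one column."""
    values = [row.get(column) for row in rows]
    # Assumption: None (SQL NULL) and absent keys both count as missing.
    present = [v for v in values if v is not None]
    return {
        "distinct": len(set(present)),
        "missing": len(values) - len(present),
    }


rows = [
    {"city": "Pune", "amount": 120},
    {"city": "Delhi", "amount": None},
    {"city": "Pune", "amount": 80},
    {"amount": 45},
]
print(column_stats(rows, "city"))    # {'distinct': 2, 'missing': 1}
print(column_stats(rows, "amount"))  # {'distinct': 3, 'missing': 1}
```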
Enable the Advance Stats toggle to generate stats such as distinct values, missing values and frequency histograms, which show the spread of the data.
After enabling Advance Stats, run Refresh Stats to display the stats and the frequency charts.
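A frequency histogram counts how often each value occurs. The sketch below builds one for a categorical column with `collections.Counter` and draws it as a text bar chart. It illustrates the idea only and is not how Sprinkle computes or renders its charts; the `top_n` cut-off is an assumed parameter, and numeric columns would usually be grouped into ranges (bins) first, which this sketch leaves out.

```python
from collections import Counter
from typing import Any


def frequency_histogram(values: list[Any], top_n: int = 5) -> list[tuple[Any, int]]:
    """Return the top_n most frequent non-missing values with their counts."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(top_n)


def render(histogram: list[tuple[Any, int]]) -> str:
    """Draw a simple text bar chart, one '#' per occurrence."""
    return "\n".join(f"{str(value):<8}{'#' * count}" for value, count in histogram)


cities = ["Pune", "Delhi", "Pune", None, "Mumbai", "Pune", "Delhi"]
hist = frequency_histogram(cities)
print(hist)  # [('Pune', 3), ('Delhi', 2), ('Mumbai', 1)]
print(render(hist))
```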
Advance Stats & Refresh Stats
The Jobs tab lets users run jobs and keep track of previous jobs.
To refresh stats periodically, enable Autorun and set the frequency as required.
The Preview tab shows the top rows of the table. Use Show Entries to view up to 500 rows.
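Previewing the top rows of a table typically maps to a `LIMIT` query. The sketch below builds such a query and clamps the requested number of entries to the 500-row maximum mentioned above; `preview_query` and the identifier check are assumptions for illustration. Two caveats apply: without an `ORDER BY`, SQL does not guarantee which rows come back as the "top" rows, and some dialects limit rows differently (for example, `TOP` in SQL Server).

```python
import re

MAX_PREVIEW_ROWS = 500  # Show Entries upper limit described in this walkthrough

# Plain or schema-qualified identifiers only, e.g. "sales_orders" or "analytics.sales_orders".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def preview_query(table: str, entries: int) -> str:
    """Build a query returning at most `entries` rows, capped at 500."""
    if not _IDENTIFIER.fullmatch(table):
        # Table names cannot be passed as bound parameters, so reject anything
        # that is not a simple identifier to avoid SQL injection.
        raise ValueError(f"invalid table name: {table!r}")
    if entries < 1:
        raise ValueError("entries must be a positive integer")
    limit = min(entries, MAX_PREVIEW_ROWS)
    return f"SELECT * FROM {table} LIMIT {limit}"


print(preview_query("sales_orders", 100))   # SELECT * FROM sales_orders LIMIT 100
print(preview_query("sales_orders", 1000))  # SELECT * FROM sales_orders LIMIT 500
```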
Pipeline and Lineage help identify this table's dependencies and support impact analysis. The Pipeline graph shows the sequence of processes that create or update this table.
Lineage shows all the input tables used to create this table.
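Lineage can be modeled as a directed graph in which each table points to the input tables used to build it. Under that assumption, the sketch below walks the graph breadth-first to collect every upstream table, and inverts the edges to find downstream tables, which is the question impact analysis asks. The `inputs` mapping and table names are hypothetical, and the walkthrough does not say whether Lineage shows only direct inputs or the full transitive set.

```python
from collections import deque


def upstream_tables(inputs: dict[str, list[str]], table: str) -> set[str]:
    """Return every table that `table` depends on, directly or indirectly."""
    seen: set[str] = set()
    queue = deque(inputs.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # Already visited; this also guards against cycles.
        seen.add(current)
        queue.extend(inputs.get(current, []))
    return seen


def downstream_tables(inputs: dict[str, list[str]], table: str) -> set[str]:
    """Return every table built, directly or indirectly, from `table`."""
    # Invert the edges so each table points to the tables built from it.
    children: dict[str, list[str]] = {}
    for child, parents in inputs.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    return upstream_tables(children, table)


inputs = {
    "sales_summary": ["sales_orders", "customers"],
    "sales_orders": ["raw_orders"],
    "customers": ["raw_customers"],
}
print(sorted(upstream_tables(inputs, "sales_summary")))
# ['customers', 'raw_customers', 'raw_orders', 'sales_orders']
print(sorted(downstream_tables(inputs, "raw_orders")))
# ['sales_orders', 'sales_summary']
```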
The owner or the data stewards can use the Status drop-down at the top to tag data assets accurately. This helps other users check the status of the data before using it. The available tags are WIP, Verified, Deprecated and Has Issues.
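To reason about these tags in your own scripts, you could model them as an enumeration. The sketch below does this with a Python `Enum` and applies a hypothetical policy that treats only Verified tables as ready for reporting. The `Status` class, the `safe_to_use` policy and the table names are illustrative assumptions, not Sprinkle features.

```python
from enum import Enum


class Status(Enum):
    # Values mirror the four tag labels available in the Status drop-down.
    WIP = "WIP"
    VERIFIED = "Verified"
    DEPRECATED = "Deprecated"
    HAS_ISSUES = "Has Issues"


def safe_to_use(status: Status) -> bool:
    """Hypothetical policy: only Verified tables are ready for reporting."""
    return status is Status.VERIFIED


catalog = {
    "sales_orders": Status.VERIFIED,
    "legacy_orders": Status.DEPRECATED,
    "customers": Status("WIP"),  # Look up a member by its label.
}
ready = [name for name, status in catalog.items() if safe_to_use(status)]
print(ready)  # ['sales_orders']
```

Looking members up by label, as with `Status("WIP")`, raises a `ValueError` for any unknown tag, which catches typos early.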