Hey everyone I m still exploring the full capacity of what D DataHub #troubleshoot

cool-painting-92220

04/04/2022, 10:57 PM

Hey everyone! I'm still exploring the full capacity of what DataHub can offer and had a few questions on system performance and possible capabilities:

1. Based on how the platform's architecture is designed, which component of the system (the data source, the API connection, the server hosting DataHub, etc.) bears the largest computational burden when querying for metadata during ingestion? I'm thinking specifically of databases, e.g. when a table's statistics are calculated to provide min, max, range, and so on.
2. Curious about a scenario: is it possible to create a documentation page within DataHub that isn't linked to an asset? Something like a centralized wiki where additional information could be stored and links to related or important assets could be provided. The use case is new data consumers who have just entered the platform for the first time and don't know where to start orienting themselves; the proposed solution would be starter guides that we set up to kick off their data discovery journey.

better-orange-49102

04/05/2022, 3:08 AM

For (2), you can insert a link in the top-right corner menu (mouse over the user avatar). The place to configure it is inside datahub-web-react/src/conf/themeXX.config.json

better-orange-49102

04/05/2022, 3:11 AM

but you need to recompile the image, unfortunately
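
A minimal sketch of the menu-link edit described above, written in Python so it can be applied to whichever theme file is in use. The `content.menu.items` layout and the `label`, `path`, and `shouldOpenInNewTab` keys are assumptions about the theme JSON and should be checked against the actual file; the link label and the wiki URL are hypothetical.

```python
import copy
import json


def add_menu_link(theme, label, url, new_tab=True):
    """Return a copy of a theme config dict with one extra menu link.

    Assumes the theme keeps its menu entries under content.menu.items,
    each entry having "label", "path" and "shouldOpenInNewTab" keys.
    """
    updated = copy.deepcopy(theme)
    items = (
        updated.setdefault("content", {})
        .setdefault("menu", {})
        .setdefault("items", [])
    )
    items.append({"label": label, "path": url, "shouldOpenInNewTab": new_tab})
    return updated


# Stand-in for the parsed contents of the theme file (hypothetical values).
theme = {"content": {"title": "DataHub", "menu": {"items": []}}}

updated_theme = add_menu_link(
    theme,
    "Getting started guide",                    # hypothetical label
    "https://wiki.example.com/data-onboarding",  # hypothetical URL
)
print(json.dumps(updated_theme, indent=2))
```

In practice the dict would be loaded from and written back to the theme file with `json.load` and `json.dump`, and the frontend image rebuilt afterwards.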

dazzling-judge-80093

04/05/2022, 10:55 AM

1. It depends on your source, but in general metadata extraction is a lightweight process. As you pointed out, dataset profiling can be computationally heavy, although you can control which tables/schemas you would like to profile and which profilers you would like to run. On BigQuery, Redshift, and Snowflake we try to run approximate queries wherever possible to make these queries less expensive.
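
As a rough illustration of limiting which tables are profiled and which profilers run, here is a sketch of an ingestion recipe expressed as a Python dict. The option names (`profile_pattern`, `profiling.enabled`, `profile_table_level_only`, and the `include_field_*` flags) are assumptions based on the SQL profiling config and may differ between DataHub versions; the table pattern and server URL are hypothetical, and connection settings are omitted.

```python
# Sketch of a recipe that restricts profiling scope (option names are
# assumptions; verify them against the docs for your DataHub version).
recipe = {
    "source": {
        "type": "snowflake",
        "config": {
            # Connection settings (account, credentials, ...) omitted.
            # Only profile tables matching this (hypothetical) pattern.
            "profile_pattern": {"allow": [r"analytics\.public\..*"]},
            "profiling": {
                "enabled": True,
                # False means column-level statistics are collected too.
                "profile_table_level_only": False,
                # Keep the cheap column statistics ...
                "include_field_min_value": True,
                "include_field_max_value": True,
                # ... and switch off the more expensive ones.
                "include_field_histogram": False,
                "include_field_quantiles": False,
                "include_field_distinct_value_frequencies": False,
            },
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},  # hypothetical URL
    },
}


def enabled_profilers(recipe):
    """List the include_field_* flags that are switched on, sorted by name."""
    profiling = recipe["source"]["config"].get("profiling", {})
    return sorted(
        name
        for name, value in profiling.items()
        if name.startswith("include_field_") and value is True
    )


print(enabled_profilers(recipe))

# With acryl-datahub installed and connection settings filled in, the
# recipe could be run programmatically along these lines:
#   from datahub.ingestion.run.pipeline import Pipeline
#   Pipeline.create(recipe).run()
```

With this setup the profiling queries run against the source database, so the heavier the enabled profilers, the more load lands on the warehouse rather than on the DataHub server.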
