# getting-started
k
Hi All! I just started playing with DataHub in the last few days and I have a few newbie questions:
1. Search apparently works at the dataset level. Is there no direct way to search for all datasets that have a specific column?
2. In my quickstart Docker instance, I see that the Lineage, Queries and Stats tabs are disabled. Is that because I am not using a Neo4j backend? If not, is there a way I can push this information?
3. Currently I am using the postgres source to ingest metadata from Greenplum (since it is based on Postgres). I would like to tweak this source to say Greenplum, with the Greenplum icon, rather than Postgres, to avoid confusion for my users. What is a quick way to tweak this behavior: modify the postgres source, or create a new Greenplum source based on the postgres source?
Thanks for your inputs!
g
Hey @kind-dawn-17532! Welcome to the DataHub Slack 🙂
1. Search indexes a number of features of a dataset, including its name, columns, the tags & terms applied to it, its description, etc! You can search for a specific column name and all related datasets should appear. Let me know if you are having trouble with this! It may be a bug.
2. All functionality is available whether you use Elasticsearch or Neo4j as your graph backend. The Lineage, Queries, and Stats tabs will show as enabled if you have lineage, usage, or data profile data about your dataset. Which sources are you connecting with? Your source's ingestion docs will indicate if and how you can enable usage and data profiles. Often, lineage is acquired by integrating with a system that provides edges, like Airflow, dbt, Looker, Superset, etc!
3. I would recommend creating a new Greenplum source. Ideally you could refactor the code a bit to let some logic be shared between the two of them. This way you could contribute your source back to open source! Then, you can make sure the Greenplum data platform is ingested by emitting a Data Platform snapshot, which would contain a URL to Greenplum's icon.
👍 1
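The Data Platform snapshot mentioned in point 3 could look roughly like the following minimal sketch. It assumes the MCE JSON layout used in DataHub's example metadata files (`proposedSnapshot` with `com.linkedin.pegasus2avro...` keys); the `DataPlatformInfo` field names and the `RELATIONAL_DB` type should be checked against the schema of your DataHub version, and the logo URL is a hypothetical placeholder.

```python
import json
from typing import Any, Dict

# Hypothetical placeholder: point this at wherever a Greenplum logo is actually hosted.
GREENPLUM_LOGO_URL = "https://example.com/greenplum-logo.png"


def greenplum_platform_mce(logo_url: str) -> Dict[str, Any]:
    """Build a DataPlatformSnapshot MCE, in JSON form, describing Greenplum.

    Layout modeled on DataHub's example MCE files; verify field names
    against the DataPlatformInfo schema of your DataHub version.
    """
    platform_info = {
        "name": "greenplum",
        "type": "RELATIONAL_DB",
        "datasetNameDelimiter": ".",  # dataset names look like db.schema.table
        "logoUrl": logo_url,
    }
    return {
        "auditHeader": None,
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.DataPlatformSnapshot": {
                "urn": "urn:li:dataPlatform:greenplum",
                "aspects": [
                    {"com.linkedin.pegasus2avro.dataplatform.DataPlatformInfo": platform_info}
                ],
            }
        },
        "proposedDelta": None,
    }


# Print a one-element list of MCEs; saved to a .json file, this is the kind of input
# a file-based ingestion source reads.
print(json.dumps([greenplum_platform_mce(GREENPLUM_LOGO_URL)], indent=2))
```

Because the platform entity is just metadata, emitting it this way (for example through DataHub's `file` source) should not require rebuilding any DataHub packages.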
k
Thanks a lot for the pointers @green-football-43791
1. Let me play with search some more to understand the behavior better.
2. Right now I have only ingested Postgres (the Greenplum source). But regardless, I think that for any database source with a notion of tables and views, every view should have one or more tables or views as its first-level lineage. To start with, I was thinking of parsing the view SQL to publish this lineage to DataHub. Am I thinking about this right, or is there a better way?
3. I would ❤️ to contribute back. I will get back to you in case I have any questions.
🙌 1
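A minimal sketch of the view-SQL parsing idea from point 2, assuming view definitions are read from PostgreSQL's `pg_views` catalog (`schemaname`, `viewname`, `definition`) and that unqualified names resolve to a hypothetical default schema. It is a regex heuristic, not a SQL parser: quoted identifiers, CTE names, comma-separated `FROM` lists and `FROM` inside functions such as `EXTRACT(... FROM ...)` are not handled correctly.

```python
import re
from typing import List

# Naive heuristic: capture an identifier, optionally schema-qualified, that
# directly follows FROM or JOIN. "FROM (" subqueries do not match.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*)?)",
    re.IGNORECASE,
)


def upstream_tables(view_sql: str, default_schema: str = "public") -> List[str]:
    """Return schema-qualified relations referenced by a view, in first-seen order."""
    result: List[str] = []
    for name in _TABLE_REF.findall(view_sql):
        # Unquoted identifiers are case-insensitive in Postgres, so compare in lower case.
        qualified = (name if "." in name else f"{default_schema}.{name}").lower()
        if qualified not in result:
            result.append(qualified)
    return result


# Hypothetical view definition, e.g. the `definition` column of pg_views.
definition = """
    SELECT o.id, c.name
    FROM sales.orders o
    JOIN sales.customers c ON o.customer_id = c.id
"""
print(upstream_tables(definition))  # ['sales.orders', 'sales.customers']
```

PostgreSQL also exposes `information_schema.view_table_usage`, which lists the tables a view uses directly (subject to its ownership-based visibility rules); if your Greenplum version provides it, it may avoid parsing for first-level lineage.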
w
@kind-dawn-17532 Please search for the string "@searchable" on this page https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model/#searchable for details on search capabilities.
👍 1
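As a toy illustration of why a column-name search can surface datasets: if schema field names are indexed alongside each dataset, a lookup by column name maps back to every dataset containing it. DataHub itself does this with Elasticsearch indexes driven by the `@Searchable` annotations described at the link above; this sketch is only an in-memory analogy with hypothetical dataset names, not DataHub's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_column_index(schemas: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased column name to the datasets that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for dataset, columns in schemas.items():
        for column in columns:
            index[column.lower()].add(dataset)
    return index


def datasets_with_column(index: Dict[str, Set[str]], column: str) -> List[str]:
    """Case-insensitive exact-match lookup; unknown columns return an empty list."""
    return sorted(index.get(column.lower(), set()))


# Hypothetical schemas standing in for ingested datasets.
index = build_column_index({
    "greenplum.sales.orders": ["id", "customer_id", "amount"],
    "greenplum.sales.customers": ["id", "name"],
})
print(datasets_with_column(index, "customer_id"))  # ['greenplum.sales.orders']
print(datasets_with_column(index, "ID"))  # ['greenplum.sales.customers', 'greenplum.sales.orders']
```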
k
Thanks so much @witty-keyboard-20400! Great stuff. I will go through this.
m
Hey @kind-dawn-17532: there is a generic sqlalchemy source where you can configure the platform directly (https://datahubproject.io/docs/metadata-ingestion/source_docs/sqlalchemy)
thanks 1
this might work for you
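A minimal sketch of a recipe for the generic `sqlalchemy` source, built as a Python dict. It assumes the source's `platform` and `connect_uri` options, a `datahub-rest` sink pointed at the quickstart GMS on `http://localhost:8080`, and a `postgresql+psycopg2` URI (Greenplum speaks the Postgres protocol); the host, database and credentials are hypothetical.

```python
from typing import Any, Dict
from urllib.parse import quote_plus


def greenplum_recipe(host: str, port: int, database: str, user: str, password: str,
                     gms_server: str = "http://localhost:8080") -> Dict[str, Any]:
    """Recipe for the generic sqlalchemy source, reporting datasets under 'greenplum'."""
    # URL-quote credentials so characters such as '@' or '/' do not break the URI.
    uri = (f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
           f"@{host}:{port}/{database}")
    return {
        "source": {
            "type": "sqlalchemy",
            "config": {"platform": "greenplum", "connect_uri": uri},
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


# Hypothetical connection details.
recipe = greenplum_recipe("gp-master", 5432, "analytics", "reader", "p@ss")
print(recipe["source"]["config"]["connect_uri"])
# postgresql+psycopg2://reader:p%40ss@gp-master:5432/analytics
```

The same structure can be written out as YAML and run with `datahub ingest -c recipe.yml`, or passed to `Pipeline.create(...)` from `datahub.ingestion.run.pipeline` followed by `run()`.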
k
Thanks so much @mammoth-bear-12532
m
Also, w.r.t. Views -> Tables lineage and SQL parsing, would love to collaborate on that
❤️ 1
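Once a view's upstream tables are known, Views -> Tables lineage would be published as an `UpstreamLineage` aspect on the view's dataset. This sketch builds one in the same MCE JSON layout as DataHub's example files; the `VIEW` lineage type, the placeholder audit stamp, the hypothetical `dataset_urn` helper and the `database.schema.table` names are assumptions to verify against your DataHub version.

```python
import json
from typing import Any, Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical helper producing urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def view_lineage_mce(platform: str, view: str, upstreams: List[str]) -> Dict[str, Any]:
    """DatasetSnapshot MCE (JSON form) with an UpstreamLineage aspect for one view."""
    upstream_entries = [
        {
            # Placeholder audit stamp; a real emitter would record who/when.
            "auditStamp": {"time": 0, "actor": "urn:li:corpuser:unknown"},
            "dataset": dataset_urn(platform, table),
            "type": "VIEW",
        }
        for table in upstreams
    ]
    return {
        "auditHeader": None,
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
                "urn": dataset_urn(platform, view),
                "aspects": [
                    {"com.linkedin.pegasus2avro.dataset.UpstreamLineage": {"upstreams": upstream_entries}}
                ],
            }
        },
        "proposedDelta": None,
    }


# Hypothetical view and its upstream tables.
mce = view_lineage_mce(
    "greenplum",
    "analytics.sales.order_summary",
    ["analytics.sales.orders", "analytics.sales.customers"],
)
print(json.dumps(mce, indent=2))
```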
To get those nice logos, it would be great to add Greenplum to the platforms.json
👍 1
k
Sure thing!
k
I am not a Java person, and I am thinking DataPlatformInfo.json is embedded in a war/jar file in datahub-gms or datahub-frontend-react. Is that correct? So unless I build the related packages, the logos may not appear...