# getting-started
f
Hi friends, I was wondering if you could comment on these features, using the Python API:
• Can I get a specific version of an entity? By default the latest is returned.
• How can I relate two entities? Is it via lineage?
• Can I add custom fields to the ingestion? For example, I have a field "sensitive" (Y/N or T/F) to flag security-sensitive fields.
l
Hey there 👋 I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first:
1️⃣ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution?
2️⃣ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues?
b
Hi @full-beach-33961!
• No, you cannot fetch a specific version of an entity. What were you hoping to pull off?
• Yes, entities are related to one another via lineage edges. Which types of entities are you trying to connect?
• Yes, we typically suggest using the Business Glossary feature for defining and adding sensitivity labels and classifications. You can apply the labels via the API, via a custom Transformer, or after ingestion in the UI.
f
I am trying to connect an upstream and downstream entity, part of a data transformation pipeline.
b
Yes, that's very possible. You can connect DataJob entities to Dataset entities via lineage edges by emitting the correct aspects to DataHub.
f
Regarding versions, we will have multiple versions of a schema, installed for different accounts/customers, so I was wondering whether it is possible to specify a version when fetching a schema.
Could you also comment on custom properties: can I search DataHub based on a custom property (or a tag)? Thanks.
b
Yes, you can.
You can use tags or custom properties as an alternative to the Business Glossary. The restriction with custom properties is that you cannot edit them via the UI.
But in all of these cases, search is supported.
f
Awesome.
b
There is one additional grouping tool: Domains
But that is more for establishing logical collections of related datasets, pipelines, etc.
(Think of it like a folder)
f
@big-carpet-38439 - one more question. Here's our use case:
1. We write metadata to DataHub (a schema/table structure).
2. We will have multiple versions of our schema, with our own versioning system.
3. Can we query by URN and version number?
In case there is no direct method, could you comment on my options? (We are using the Python-based APIs and would prefer not to use GraphQL.)
1. Can I use a custom property to store my version number? (This version originates in a different system, so we can't use DataHub's version.)
2. I guess lineage would be one way to do this (each new version has the older versions as upstream). I would use custom properties on each entity, storing my version number (different from the DataHub-assigned version) as a custom property, so I would need to be able to search on a custom property.
s
@big-carpet-38439 Hi there, I was trying to create lineage like [dataset -> datajob -> dataset] using the Python API, but it is not working as I expected. The upstream and downstream datasets are BigQuery tables and the datajob would be an Airflow task. I tried the following link but got the following error.
raise AvroTypeException(writers_schema, data_obj)
Is there an easier way to add lineage (dataset - datajob - dataset) using Python? Thanks.
d
Hi @big-carpet-38439, is there a way to bulk ingest custom properties using a CSV, like we do for tags, owners, and other fields that exist in the UI?