ancient-queen-15575
06/16/2023, 1:14 PM
Setup: source DB (e.g. MongoDB) > S3 > Snowflake, where a schema or a whole DB is copied between the storage platforms. The tables keep the same names across the platforms, but lineage doesn’t appear automatically, so I want to add it.
I know I could use file-based lineage to specify individual connections between entities, but there are hundreds of tables, so this would be very tedious. Instead, I’m looking at a way to give two containers and have lineage connections added for all entities with matching names, e.g. `table_x` in the S3 container gets connected to `table_x` in the Snowflake container.
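A minimal sketch of the name-matching step, assuming the dataset names inside each container are already known and that the table name is the last `.`- or `/`-separated segment, compared case-insensitively (S3 paths vs. upper-case Snowflake names). `dataset_urn`, `table_name`, `match_by_table_name` and `lineage_edges` are hypothetical helpers, not DataHub APIs; the URN string follows DataHub's dataset URN shape, and `make_dataset_urn` from `datahub.emitter.mce_builder` could be used instead of the hand-rolled formatter.

```python
import re
from typing import Dict, Iterable, List, Tuple


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Builds DataHub's dataset URN shape:
    # urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def table_name(qualified: str) -> str:
    # Assumption: the table is the last "."- or "/"-separated segment
    # (no file extensions such as ".parquet" in the dataset name).
    return re.split(r"[./]", qualified.rstrip("/"))[-1].lower()


def match_by_table_name(
    upstream: Iterable[str], downstream: Iterable[str]
) -> List[Tuple[str, str]]:
    # Index upstream datasets by lower-cased table name; if two upstream
    # datasets share a table name, the first one seen wins.
    by_table: Dict[str, str] = {}
    for name in upstream:
        by_table.setdefault(table_name(name), name)
    return [
        (by_table[table_name(d)], d)
        for d in downstream
        if table_name(d) in by_table
    ]


def lineage_edges(
    up_platform: str,
    up_names: Iterable[str],
    down_platform: str,
    down_names: Iterable[str],
    env: str = "PROD",
) -> List[Tuple[str, str]]:
    # Hypothetical helper: one (upstream_urn, downstream_urn) pair per name match.
    return [
        (dataset_urn(up_platform, u, env), dataset_urn(down_platform, d, env))
        for u, d in match_by_table_name(up_names, down_names)
    ]
```

Tables with no counterpart are simply skipped, so a partially copied schema only produces edges for what actually exists on both sides. The resulting pairs could then be written out in the existing file-based lineage format or emitted with the DataHub SDK.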
I hope to make configuration files that look similar to file-based lineage but allow for specifying containers, e.g. something like:
```yaml
lineage:
  - entity:
      name: database_name.schema_name
      type: container
      env: PROD
      platform: Snowflake
    upstream:
      - entity:
          name: bucket_name.database_name.schema_name
          type: container
          env: PROD
          platform: s3
        upstream:
          - entity:
              name: database_name
              type: container
              env: PROD
              platform: mongodb
```
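A rough sketch of how a config like this could be flattened into container-to-container pairs, assuming it has already been loaded into plain dicts and lists (for example with PyYAML's `yaml.safe_load`) and that each `upstream` entry is nested under the entity it feeds, as in the example. `container_edges` is a hypothetical name, and `env` defaults to `PROD` only as an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple

# (platform, name, env) for one container entry in the config
ContainerRef = Tuple[str, str, str]


def container_edges(config: Dict[str, Any]) -> List[Tuple[ContainerRef, ContainerRef]]:
    """Flatten the nested `lineage` config into (upstream, downstream) pairs."""
    edges: List[Tuple[ContainerRef, ContainerRef]] = []

    def ref(entity: Dict[str, Any]) -> ContainerRef:
        return (entity["platform"], entity["name"], entity.get("env", "PROD"))

    def walk(node: Dict[str, Any], downstream: Optional[ContainerRef]) -> None:
        current = ref(node["entity"])
        if downstream is not None:
            # The current entry feeds the entry it is nested under.
            edges.append((current, downstream))
        for child in node.get("upstream", []):
            walk(child, current)

    for item in config.get("lineage", []):
        walk(item, None)
    return edges
```

For the example above this yields two hops, S3 to Snowflake and MongoDB to S3; each pair would then be expanded into its datasets and fed to the name-matching step.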
An issue I’m having trouble with is how to find a container’s URN from its name. Once I have a URN, I know I can use the container search to find the dataset entities within it, but for usability it would be better if the readable names could be specified.
ancient-queen-15575
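06/16/2023, 1:15 PM
```graphql
# errored when run, complaining about `entity`: BrowseResults exposes a list field `entities`, and `path` takes a list of path segments
{
  browse(
    input: {
      count: 100,
      type: DATASET,
      path: ["prod", "snowflake", "db_name", "schema_name"]
    }
  ) {
    total
    entities {
      ... on Dataset {
        properties { name }
      }
    }
  }
}
```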
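One way to let users specify readable names instead of URNs is to turn a dotted container name into browse path segments and read the dataset names out of the response. The sketch below only builds the request payload and parses a response dict; it assumes datasets follow the default browse path of env / platform / database / schema (custom browse paths or platform instances would change this), and `browse_request` and `dataset_names` are hypothetical helpers. The payload would be POSTed to DataHub's GraphQL endpoint; paging with `start` is not shown.

```python
from typing import Any, Dict, List

# Same fields as the corrected query above, with the path passed as a variable.
BROWSE_QUERY = """
query browseContainer($path: [String!], $count: Int) {
  browse(input: {type: DATASET, path: $path, count: $count}) {
    total
    entities {
      urn
      ... on Dataset { properties { name } }
    }
  }
}
"""


def browse_request(env: str, platform: str, container_name: str, count: int = 100) -> Dict[str, Any]:
    # e.g. ("PROD", "snowflake", "db_name.schema_name")
    #   -> path ["prod", "snowflake", "db_name", "schema_name"]
    path = [env.lower(), platform.lower()] + container_name.split(".")
    return {"query": BROWSE_QUERY, "variables": {"path": path, "count": count}}


def dataset_names(response: Dict[str, Any]) -> List[str]:
    # Skip entities that came back without properties.
    entities = response["data"]["browse"]["entities"]
    return [e["properties"]["name"] for e in entities if e.get("properties")]
```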
delightful-ram-75848
06/19/2023, 11:31 PM
some-actor-27079
09/08/2023, 11:25 AM
dazzling-rainbow-96194
09/28/2023, 5:42 PM