# ingestion
  • w

    white-vegetable-93125

    03/15/2023, 4:09 PM
    Also, can I have more than one property in the Neo4j nodes?
  • c

    crooked-carpet-28986

    03/15/2023, 4:36 PM
Hey folks, we need to build an auto-tagging and cross-referencing solution that is applied at ingestion time. I am planning to use the Actions Framework for that: I will listen for metadata changes, do transformations (assign tags and glossary terms), and then send the enriched metadata back to the metadata store. However, I am not sure how to do that, since the existing action types don't help me achieve this goal. I saw this example https://github.com/acryldata/datahub-actions/blob/main/examples/metadata_change_sync.yaml; it seems to be a brand-new action type that is not covered in the documentation. How does it work?
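For reference, a minimal, untested sketch of a custom action that listens for change events and writes a tag back through the REST emitter, loosely modeled on the hello_world example in the datahub-actions repo; the tag URN, the GMS address, and the "tag every dataset" rule are placeholders, not a recommended implementation.

```python
from typing import Optional

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass
from datahub_actions.action.action import Action
from datahub_actions.event.event_envelope import EventEnvelope
from datahub_actions.pipeline.pipeline_context import PipelineContext


class AutoTagAction(Action):
    """Listens to change events and re-emits an enriched tag aspect."""

    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "Action":
        return cls(ctx)

    def __init__(self, ctx: PipelineContext) -> None:
        self.ctx = ctx
        # Assumed GMS address; in practice read it from the action config.
        self.emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    def act(self, event: EventEnvelope) -> None:
        # Hypothetical rule: tag every dataset we see a change event for.
        urn: Optional[str] = getattr(event.event, "entityUrn", None)
        if urn and urn.startswith("urn:li:dataset:"):
            # Note: emitting globalTags like this overwrites existing tags;
            # a real action would read-modify-write the aspect.
            tags = GlobalTagsClass(tags=[TagAssociationClass(tag="urn:li:tag:auto-tagged")])
            self.emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=tags))

    def close(self) -> None:
        pass
```

The class is then referenced from an actions config file; the exact wiring for custom actions is described in the datahub-actions documentation.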
  • r

    rich-daybreak-77194

    03/16/2023, 4:06 AM
Why can't I copy the URN from the share button? I use DataHub v0.10.0.
  • f

    flat-yak-44699

    03/16/2023, 7:43 AM
Hi everyone, I am using DataHub for the first time. I am trying to extract metadata from Hive, which requires Kerberos authentication. I used the DataHub Docker quickstart and executed kinit on the Linux server, but I get an error during ingestion. Can someone help explain why this happens, and whether I am missing any actions or settings?

    ERROR {datahub.entrypoints:213} - Command failed: Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: FILE:/tmp/krb5cc_102))'

My DataHub version: DataHub CLI version 0.9.5, Python version 3.7.2 (default, Jan 5 2023, 08:56:11) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]. My ingestion source:

    source:
      type: hive
      config:
        host_port: 'hive_host:10000'
        database: comm_out
        options:
          connect_args:
            auth: KERBEROS
            kerberos_service_name: hive
    sink:
      type: console
  • g

    gorgeous-psychiatrist-31553

    03/16/2023, 10:29 AM
Hi everyone! I don't know how to fix this error, can someone help? It happened when I created a new ingestion connection to a database. The server that hosts DataHub on Docker is isolated from the internet, but I created and loaded working images into Docker. The screenshot is attached.
  • b

    bumpy-activity-74405

    03/16/2023, 10:52 AM
Hi, I am trying to update DataHub 0.8.44 -> 0.10.0; datahub-upgrade went fine. But then I was also trying to upgrade the ingestion libs (hive, looker, lookml) that were even older (0.8.32). I was feeling optimistic about my endeavor and tried running without changing anything else. Hive and lookml seemed to work fine, but while ingesting Looker dashboards I got:

    [2023-03-16 10:22:37,384] ERROR    {datahub.entrypoints:188} - Command failed: Failed to initialize: Bad git executable.

I am a bit confused as to why the recipe wants to do something with git. Could not find any configuration options that refer to this in the docs either. Does anyone have a clue on what's happening here?
  • g

    glamorous-gigabyte-97530

    03/16/2023, 1:00 PM
Hello, I am relatively new here. At the moment I am trying to test the functionality of DataHub on localhost in GCP. Is there any way to create a virtual data source where I can build my own data lineage using the API? I have heard that some other data catalogs provide such a facility; does DataHub offer this as well? I already have some data on localhost, but I want to create and test my own use cases with my own data. I am a bit confused because I don't think the documentation goes as far as my requirements and needs. Is there a simple setup for this?
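A minimal sketch of what "creating your own lineage through the API" can look like with the DataHub Python emitter; the platform, dataset names, and GMS address below are made-up placeholders.

```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    DatasetPropertiesClass,
    UpstreamClass,
    UpstreamLineageClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

upstream_urn = make_dataset_urn(platform="postgres", name="demo.public.raw_orders", env="PROD")
downstream_urn = make_dataset_urn(platform="postgres", name="demo.public.orders_clean", env="PROD")

# Create (or update) the downstream "virtual" dataset with some properties.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=downstream_urn,
        aspect=DatasetPropertiesClass(
            name="orders_clean",
            description="Synthetic dataset for lineage testing",
        ),
    )
)

# Declare table-level lineage: raw_orders -> orders_clean.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=downstream_urn,
        aspect=UpstreamLineageClass(
            upstreams=[UpstreamClass(dataset=upstream_urn, type=DatasetLineageTypeClass.TRANSFORMED)]
        ),
    )
)
```

Chaining a few of these upstreamLineage aspects across several synthetic datasets is enough to exercise the lineage UI without touching a real source.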
  • s

    steep-laptop-41463

    03/16/2023, 1:06 PM
Hello! Can you please help me with ingesting column stats using the API? I have tried everything I can find, with no result. Maybe you have a working curl or Python script, or some pointers? Thanks.
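Column stats live in the datasetProfile timeseries aspect. A rough sketch of emitting one from Python, assuming the default local GMS and made-up dataset and column names; the exact field set on DatasetFieldProfileClass should be checked against the installed SDK version.

```python
import time

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetFieldProfileClass, DatasetProfileClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
dataset_urn = make_dataset_urn(platform="postgres", name="demo.public.orders", env="PROD")

# One profile snapshot: table-level counts plus per-column statistics.
profile = DatasetProfileClass(
    timestampMillis=int(time.time() * 1000),
    rowCount=1000,
    columnCount=2,
    fieldProfiles=[
        DatasetFieldProfileClass(
            fieldPath="order_id",
            uniqueCount=1000,
            nullCount=0,
            min="1",
            max="1000",
        ),
        DatasetFieldProfileClass(fieldPath="status", uniqueCount=3, nullCount=12),
    ],
)

emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=profile))
```

The emitter ultimately posts to GMS's /aspects?action=ingestProposal endpoint, so curl also works once the proposal is serialized to JSON.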
  • w

    wide-optician-47025

    03/16/2023, 3:10 PM
Hello, I am trying to ingest S3 using an AWS profile. I have the AWS CLI configured with a named profile and have exported AWS_PROFILE=my-profile, but the ingestion run fails with:

    416, in get_scoped_config
        raise ProfileNotFound(profile=profile_name)

    ProfileNotFound: The config profile (dev-sphinx) could not be found
  • a

    acceptable-morning-73148

    03/16/2023, 5:02 PM
Hello there. I'm working on ingesting some metadata from a source that's not supported out of the box. I need to be able to delete datasets that are no longer in the source system. I know the URNs of the datasets to be deleted. What is the recommended way to delete them from DataHub programmatically so that they are neither visible nor searchable in the UI?
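For reference, the usual programmatic route is a soft delete: emitting a Status aspect with removed=True hides the entity from the UI and search while keeping its history. A minimal sketch assuming a local GMS and a placeholder URN.

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import StatusClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Placeholder URNs for the datasets that disappeared from the source system.
stale_urns = [
    "urn:li:dataset:(urn:li:dataPlatform:custom,db.schema.old_table,PROD)",
]

for urn in stale_urns:
    # Soft delete: hidden from UI and search, but aspects are kept and the
    # entity can be restored by emitting removed=False later.
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=StatusClass(removed=True)))
```

The CLI equivalent is `datahub delete --urn <urn>`, which is a soft delete by default; a hard delete removes the stored aspects entirely.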
  • h

    hallowed-lizard-92381

    03/16/2023, 5:17 PM
A team we support wants to keep a large assortment of queries (.sql files) in DataHub so they can easily view them. Any ideas on the best way to handle this? CSV?
  • q

    quiet-lawyer-86356

    03/16/2023, 6:20 PM
Hello, while running the recipe for S3 I am getting an error; please help, recipe attached.

    File "C:\Projects\datahub\datahub_envi\Lib\site-packages\datahub\ingestion\run\pipeline.py", line 117, in _add_init_error_context
        raise PipelineInitError(f"Failed to {step}: {e}") from e
    datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type s3: code() argument 13 must be str, not int
    s3_movie_data.yaml
  • a

    agreeable-cricket-61480

    03/17/2023, 4:00 AM
Hi, while ingesting metadata from a source into DataHub, can I assign glossary terms automatically? I am adding glossary terms manually for now, but I want to know if there is any way to do it while ingesting metadata.
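The usual hook for this is an ingestion transformer, which runs between the source and the sink. A rough sketch using the programmatic pipeline API; the MySQL connection details and the glossary term URN are placeholders.

```python
from datahub.ingestion.run.pipeline import Pipeline

# Hypothetical recipe: a MySQL source plus a transformer that attaches a fixed
# glossary term to every dataset before it reaches the DataHub sink.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "mysql",
            "config": {
                "host_port": "localhost:3306",
                "username": "datahub",
                "password": "datahub",
            },
        },
        "transformers": [
            {
                "type": "simple_add_dataset_terms",
                "config": {"term_urns": ["urn:li:glossaryTerm:Classification.Sensitive"]},
            }
        ],
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```

The same transformers block can live in a YAML recipe, and there is a pattern-based variant (pattern_add_dataset_terms) that assigns different terms based on regexes over dataset names.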
  • r

    rich-daybreak-77194

    03/17/2023, 4:18 AM
Why is the Queries tab disabled? My data source is Snowflake and DataHub v0.10.0.
  • a

    agreeable-cricket-61480

    03/17/2023, 6:21 AM
Hi, I want to create a view for a Data Analyst user so that they can see only particular datasets in DataHub. I am able to create a view for myself and make it visible to everyone, but how can I create a view for another user and restrict them to certain assets?
  • a

    adorable-computer-92026

    03/17/2023, 10:14 AM
Hello everyone, I tried to ingest data via the CLI from a MySQL container that I created on Docker and that contains a database, and I get this error: "sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: https://sqlalche.me/e/14/e3q8)". What could the problem be?

    [2023-03-17 10:37:14,906] INFO     {datahub.cli.ingest_cli:163} - DataHub CLI version: 0.10.0.2
    [2023-03-17 10:37:14,960] INFO     {datahub.ingestion.run.pipeline:184} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
    [2023-03-17 10:37:15,769] INFO     {datahub.ingestion.run.pipeline:201} - Source configured successfully.
    [2023-03-17 10:37:15,773] INFO     {datahub.cli.ingest_cli:120} - Starting metadata ingestion
  • m

    microscopic-leather-94537

    03/17/2023, 10:16 AM
Hi folks! I am using DataHub and want to restore my DataHub information. I followed the commands and steps to create a backup.sql file. When I installed DataHub on a new system and used the command to restore that SQL backup file, I expected to get the same information and restored data, but I didn't. Has anyone done this, or can someone help me out?
  • b

    bitter-evening-61050

    03/17/2023, 10:39 AM
Hi team, I am using Unity Catalog ingestion to DataHub and have successfully ingested the data assets and their lineage. The problem I am currently facing: I added two extra columns to one of the tables, ran the job in Databricks, and ingested the metadata into DataHub. In the table's schema two versions were created, but in the lineage I am not able to see the lineage history from before my changes. Can anyone please tell me how to see the lineage history? Another issue: I ran one job in Databricks successfully, ingested it into DataHub, and can see the details correctly, but another job failed after two steps. Is there any way to show this kind of job-failure lineage in DataHub? Please help me resolve these issues.
  • l

    little-park-33017

    03/17/2023, 1:30 PM
Hello all, do you know if the column-level lineage feature is supported for PowerBI? Thank you and happy Friday!
  • l

    lively-dusk-19162

    03/17/2023, 6:28 PM
Hello all, I have built the datahub-gms module and tried to redeploy datahub-gms, and I got the following error:

    (cd docker && COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub -f docker-compose-without-neo4j.yml -f docker-compose-without-neo4j.override.yml -f docker-compose.dev.yml up -d --no-deps --force-recreate datahub-gms)
    [+] Running 8/8
     ⠿ datahub-gms Pulled           54.9s
     ⠿ 552d1f2373af Pull complete    3.3s
     ⠿ 6aee4541058c Pull complete   52.2s
     ⠿ 579c214a743a Pull complete   53.5s
     ⠿ 7186ff38e0e0 Pull complete   53.5s
     ⠿ 549fd62cd3b6 Pull complete   53.6s
     ⠿ 477867e3580d Pull complete   53.7s
     ⠿ 13023bc52a23 Pull complete   53.8s
    [+] Building 3.0s (13/17)
     => [internal] load build definition from Dockerfile                     0.0s
     => => transferring dockerfile: 115B                                     0.0s
     => [internal] load .dockerignore                                        0.0s
     => => transferring context: 2B                                          0.0s
     => [internal] load metadata for docker.io/library/alpine:3              2.0s
     => [internal] load metadata for docker.io/library/golang:1-alpine3.17   1.9s
     => [auth] library/golang:pull token for registry-1.docker.io            0.0s
     => [auth] library/alpine:pull token for registry-1.docker.io            0.0s
     => [binary 1/5] FROM docker.io/library/golang:1-alpine3.17@sha256:1db127655b32aa559e32ed3754ed2ea735204d967a433e4b605aed1dd44c5084  0.0s
     => CACHED [base 1/3] FROM docker.io/library/alpine:3@sha256:ff6bdca1701f3a8a67e328815ff2346b0e4067d32ec36b7992c1fdc001dc8517        0.0s
     => => resolve docker.io/library/alpine:3@sha256:ff6bdca1701f3a8a67e328815ff2346b0e4067d32ec36b7992c1fdc001dc8517                    0.0s
     => CACHED [binary 2/5] WORKDIR /go/src/github.com/jwilder                                                                           0.0s
     => CACHED [binary 3/5] RUN apk --no-cache --update add openssl git tar curl                                                         0.0s
     => CACHED [binary 4/5] WORKDIR /go/src/github.com/jwilder/dockerize                                                                 0.0s
     => ERROR [binary 5/5] RUN go install github.com/jwilder/dockerize@v0.6.1                                                            0.8s
     => CANCELED [base 2/3] RUN apk --no-cache --update-cache --available upgrade && apk --no-cache add curl bash coreutils gcompat && apk --no-cache add openjdk  0.8s
    ------
     > [binary 5/5] RUN go install github.com/jwilder/dockerize@v0.6.1:
    #0 0.735 go: github.com/jwilder/dockerize@v0.6.1: github.com/jwilder/dockerize@v0.6.1: Get "https://proxy.golang.org/github.com/jwilder/dockerize/@v/v0.6.1.info": tls: failed to verify certificate: x509: certificate signed by unknown authority
    ------
    failed to solve: executor failed running [/bin/sh -c go install github.com/jwilder/dockerize@$DOCKERIZE_VERSION]: exit code: 1
  • l

    lively-dusk-19162

    03/17/2023, 6:29 PM
    Could anyone please help me out in resolving this error?
  • b

    breezy-honey-91751

    03/17/2023, 8:48 PM
Hi all! I am quite new to DataHub and want to integrate Glue, Athena, and Redshift with DataHub. I want all catalog tables to belong to Athena and Glue jobs to come under Glue. But when I set the platform property to athena during ingestion, all datasets along with the jobs come under Athena. Is there any way to segregate these: datasets to Athena and jobs to Glue?
  • l

    little-breakfast-38102

    03/17/2023, 11:46 PM
Hello! I am doing S3 ingestion using the following include pattern: include: 's3://bucket-name/foo/{table}/{partition_key[0]}={partition[0]}/*.parquet', and the exclude pattern exclude: '*/bar/*' (with ** preceding and at the end). Are there options to filter or pass in a table name, say only folders starting with "temp*", in place of {table}? The objective is to create one dataset per entity that satisfies the include pattern; the overhead of reading all the files matched by the include statement is too high for my use case.
  • g

    great-painter-51951

    03/19/2023, 9:57 AM
Hello, I'm trying to send Great Expectations validation results to DataHub, but http://localhost:9002/aspects?action=ingestProposal cannot be found. I'm running DataHub locally using the Docker containers and all images seem to be running fine. Any support would be much appreciated.
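One detail worth noting: in the default quickstart, the /aspects?action=ingestProposal endpoint is served by GMS on port 8080 rather than by the frontend on 9002. A quick way to sanity-check the endpoint from Python, assuming default quickstart ports:

```python
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Point at GMS (default quickstart port 8080), not the frontend on 9002.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.test_connection()  # raises if GMS is unreachable or misconfigured
print("GMS is reachable")
```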
  • g

    great-painter-51951

    03/19/2023, 2:55 PM
Hello again. I am running DataHub straight from Docker and am struggling to mount a new volume for accessing certificates used to set up connections with sources. Any insights on how this can be achieved?
  • m

    microscopic-room-90690

    03/20/2023, 3:18 AM
Hi team, can we use File Based Lineage to define column lineage? If it works, is there a demo? Any help will be appreciated. Thank you!
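Aside from the file-based lineage source, column-level (fine-grained) lineage can also be emitted directly with the Python SDK; a rough sketch with made-up dataset and column names, assuming a local GMS.

```python
from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    FineGrainedLineageClass,
    FineGrainedLineageDownstreamTypeClass,
    FineGrainedLineageUpstreamTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)

upstream = make_dataset_urn("hive", "demo.raw_orders", "PROD")
downstream = make_dataset_urn("hive", "demo.orders_clean", "PROD")

lineage = UpstreamLineageClass(
    upstreams=[UpstreamClass(dataset=upstream, type=DatasetLineageTypeClass.TRANSFORMED)],
    fineGrainedLineages=[
        # Map upstream column raw_orders.amount to downstream column orders_clean.total.
        FineGrainedLineageClass(
            upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
            upstreams=[make_schema_field_urn(upstream, "amount")],
            downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
            downstreams=[make_schema_field_urn(downstream, "total")],
        )
    ],
)

DatahubRestEmitter(gms_server="http://localhost:8080").emit(
    MetadataChangeProposalWrapper(entityUrn=downstream, aspect=lineage)
)
```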
  • m

    microscopic-room-90690

    03/20/2023, 7:42 AM
Hi team, can anyone help show me the difference between Hive and presto-on-hive? If presto-on-hive ingests metadata from the metastore DB, what does hive do?
  • f

    fierce-restaurant-41034

    03/20/2023, 9:59 AM
Hi all, we are using Snowflake and dbt in DataHub, and I wanted to know how I can bring the column-level descriptions from dbt over to the Snowflake datasets (tables). I see the table descriptions from dbt but not the column-level ones. Is it configurable? Thanks