# getting-started
  • hallowed-truck-92074

    04/12/2023, 10:59 AM
    Hello team, what Linux distributions did you guys use for the DataHub development environment?
  • busy-ghost-93490

    04/12/2023, 12:07 PM
    I'm trying to create lineage to Hive tables and already have the PySpark code for them. How do I create the lineage, and what are the recommended steps?
  • full-toddler-1726

    04/12/2023, 2:42 PM
    Hello team. I'm trying to deploy DataHub on my Ubuntu server with Docker, but I hit an issue when I execute "*datahub docker quickstart*". Can someone please help me?
  • full-toddler-1726

    04/12/2023, 2:43 PM
    Note that I'm able to run Docker without sudo.
  • bland-orange-13353

    04/12/2023, 4:52 PM
    This message was deleted.
  • bland-orange-13353

    04/12/2023, 5:31 PM
    This message was deleted.
  • steep-alligator-93593

    04/12/2023, 6:55 PM
    Hi, I have deployed DataHub on AWS EKS using Helm. I'm looking to use our own Kafka cluster that we have set up here. Which pods rely on the Kafka connection, so I can troubleshoot?
  • brave-judge-32701

    04/13/2023, 2:45 AM
    I'm trying to use spark-lineage, testing with spark-shell:
    bin/spark-shell --master local[*] --deploy-mode client
    I can't see any log output from DatahubSparkListener, and I get this WARN message:
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    My log4j.properties config is:
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    log4j.logger.org.apache.spark.repl.Main=WARN
    log4j.logger.org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver=WARN

    log4j.logger.datahub.spark=DEBUG
    log4j.logger.datahub.client.rest=DEBUG
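For later readers: the SLF4J NOP warning only means no logging binding was found on the classpath; separately, the listener produces no output unless it is actually registered. A minimal sketch of a spark-shell invocation that registers the listener, assuming the io.acryl:datahub-spark-lineage package and a DataHub GMS at http://localhost:8080 (adjust both for your setup):

```python
# Sketch: assemble a spark-shell command that registers the DataHub listener.
# The package version and the GMS URL below are assumptions, not fixed values.
listener_confs = {
    "spark.extraListeners": "datahub.spark.DatahubSparkListener",
    "spark.datahub.rest.server": "http://localhost:8080",
}

cmd = ["bin/spark-shell", "--master", "local[*]",
       "--packages", "io.acryl:datahub-spark-lineage:0.10.1-1"]
for key, value in sorted(listener_confs.items()):
    cmd += ["--conf", f"{key}={value}"]

print(" ".join(cmd))
```

With the listener registered, the log4j.logger.datahub.spark=DEBUG setting above should start producing output.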
  • brave-judge-32701

    04/13/2023, 8:36 AM
    When I delete a Data Flow (Pipeline), will its Data Jobs (Tasks) be deleted in cascade?
  • bland-orange-13353

    04/13/2023, 8:52 AM
    This message was deleted.
  • helpful-van-67650

    04/13/2023, 9:08 AM
    Hello everyone, when I execute
    datahub docker quickstart --quickstart-compose-file ./docker/quickstart/docker-compose.quickstart.yml
    it lists the following error. What should I do next?
  • dry-thailand-78553

    04/13/2023, 4:58 PM
    I'm new to DataHub, but my first impression is WOW! Very nice. I have a simple problem, but I'm not sure if it's a feature limitation, on purpose, or user error.
  • dry-thailand-78553

    04/13/2023, 4:58 PM
    How do you search for a column?
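In case it helps later readers: DataHub's search supports structured field queries, so a column can usually be found by typing a `fieldPaths:<column_name>` query in the search bar. A sketch of issuing the same query over GraphQL, assuming the quickstart's GraphQL endpoint at localhost:8080; the column name `customer_id` is just an example:

```python
import json
from urllib import request

# Assumed endpoint: the quickstart exposes GraphQL at localhost:8080.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# fieldPaths is the schema-field search attribute; customer_id is illustrative.
query = """
query columnSearch($q: String!) {
  search(input: {type: DATASET, query: $q, start: 0, count: 10}) {
    total
    searchResults { entity { urn } }
  }
}
"""
payload = json.dumps({"query": query,
                      "variables": {"q": "fieldPaths:customer_id"}}).encode()

req = request.Request(GRAPHQL_URL, data=payload,
                      headers={"Content-Type": "application/json"})
# Uncomment against a running instance:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```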
  • fast-midnight-10167

    04/13/2023, 7:44 PM
    How do you create DataJobs? There is no content in the documentation about how to do this. I found this sample, which does not work (the DataFlow class expects an attribute that does not exist), the input arguments as documented in the DataJob source are not honored, and they don't even match what's in the file.
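For reference while the sample is broken: a DataJob always hangs off a DataFlow, and both are addressed by URN. The orchestrator/flow-id/env triple below (airflow, my_flow, PROD) and the task id my_task are example values; this sketch only shows how the URNs compose, which is the part the SDK helpers ultimately build:

```python
# Sketch of DataFlow/DataJob URN composition (DataHub's addressing scheme).
def data_flow_urn(orchestrator: str, flow_id: str, env: str) -> str:
    """A DataFlow is keyed by (orchestrator, flow id, environment)."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{env})"

def data_job_urn(flow_urn: str, job_id: str) -> str:
    """A DataJob is keyed by its parent DataFlow URN plus a job id."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"

flow = data_flow_urn("airflow", "my_flow", "PROD")
job = data_job_urn(flow, "my_task")
print(job)  # urn:li:dataJob:(urn:li:dataFlow:(airflow,my_flow,PROD),my_task)
```

With those URNs in hand, job metadata and lineage are emitted as aspects (e.g. dataJobInfo, dataJobInputOutput) through an emitter.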
  • proud-dusk-671

    04/14/2023, 6:47 AM
    Hi, I'm looking to use Google authentication for the React app. I did not understand how to implement https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react-google#4-configure-datahub-frontend-to-enable-oidc-authentication. I am running DataHub locally (via docker-compose) for PoC purposes and therefore wouldn't be able to access the file at
    docker/datahub-frontend/env/docker.env
    Secondly, I do not want to kill my Docker container for fear of data loss. Can you confirm that killing the frontend Docker container will not result in any data loss? Finally, is there a better way to test Google authentication for the app?
  • future-holiday-32084

    04/14/2023, 5:06 PM
    Hi folks, I'm new to DataHub. When using DataHub Spark lineage (io.acryl:datahub-spark-lineage:0.10.1-1) with a Spark job, it ingests lineage perfectly. However, in the MySQL DataHub database, the "createdby" field shows "urn:li:corpuser:__datahub_system". As a result, I cannot remove the lineage manually through the DataHub UI. Could anyone please provide a solution? Additionally, when executing this write command
    spark.sql("select * from <database>.<table_source>").write.mode("append").format("parquet").saveAsTable("<database>.<table_sink>")
    the lineage, as shown in the image below, is inferred perfectly for the sink table. However, the source table displays the location on my Hadoop data lake, even though I'm reading from a table, not a path.
  • early-hydrogen-27542

    04/15/2023, 6:34 PM
    👋 everyone! I'm looking for a way to let folks retrieve lineage at will for any number of datasets, both upstream and downstream. From what I can tell, the UI is limited to one dataset at a time. In addition, GraphQL does not seem to be able to do this efficiently when the number of hierarchy levels is not known. I started exploring the metadata store itself to potentially help with this challenge. To clarify my understanding: is it accurate to say the UI pulls from the metadata store (specifically metadata_aspect_v2) for any versioned aspects? If so, the metadata store should theoretically hold all the lineage information, right? Looking at that table, it looks like the upstreamLineage aspect accomplishes that. My initial plan is to:
    • Ingest metadata_aspect_v2 into Redshift
    • Parse the JSON in the metadata column
    • Model the results to show lineage relationships (e.g. one row per relationship)
    Am I missing something in terms of native DataHub capabilities that might make this smoother?
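As a sanity check on that plan: the metadata column of an upstreamLineage row is JSON whose upstreams array carries one entry per lineage edge, so the "one row per relationship" modeling step is a straightforward flatten. A parsing sketch (the sample payload below is illustrative and trimmed; real rows also carry audit-stamp fields per upstream):

```python
import json

# Illustrative metadata-column payload for an upstreamLineage aspect.
row_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.sink,PROD)"
metadata = json.dumps({
    "upstreams": [
        {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.src_a,PROD)",
         "type": "TRANSFORMED"},
        {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.src_b,PROD)",
         "type": "TRANSFORMED"},
    ]
})

def lineage_edges(downstream_urn: str, metadata_json: str):
    """Flatten one upstreamLineage aspect into (upstream, downstream) rows."""
    aspect = json.loads(metadata_json)
    return [(up["dataset"], downstream_urn) for up in aspect.get("upstreams", [])]

edges = lineage_edges(row_urn, metadata)
for upstream, downstream in edges:
    print(upstream, "->", downstream)
```

The same flatten expressed in SQL (against the raw JSON ingested into Redshift) would give the per-relationship table described above.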
  • bland-orange-13353

    04/17/2023, 11:07 AM
    This message was deleted.
  • early-area-1276

    04/18/2023, 12:20 AM
    This content can't be displayed.
  • colossal-football-58924

    04/18/2023, 12:20 AM
    Hello, I am trying to execute the quickstart but am getting the following error:
    INFO {datahub.cli.quickstart_versioning:78} - Unable to connect to GitHub, using default quickstart version mapping config.
    [2023-04-17 19:07:13,390] INFO {datahub.cli.docker_cli:638} - Using quickstart plan: composefile_git_ref='master' docker_tag='head'
    Docker doesn't seem to be running. Did you start it?
  • brave-judge-32701

    04/18/2023, 2:49 AM
    How do I configure the spark-lineage timezone?
  • full-car-89338

    04/18/2023, 3:28 AM
    Hello, how do I sort out this CORS error?
    Access to fetch at 'http://localhost:8080/api/graphql' from origin 'http://localhost:5174' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
  • colossal-autumn-78301

    04/18/2023, 2:42 PM
    Hey everyone, I am trying to add a custom aspect to an entity using metadata-models-custom. I defined a PDL file with the namespace
    namespace be.kqe.ste
    and these files are in the same directory:
    metadata-models-custom/src/main/pegasus/be.kqe.ste/CustomAspect_1.pdl
    The command
    ../gradlew build
    fails with the following exception. It also does not work with the existing custom model DataQualityRules. But if I remove the dots in the namespace, e.g. bekqeste, then it works. Any ideas?
    /Users/idris52/Git/datahub/datahub/metadata-models-custom/src/main/pegasus/be.kqe.ste/CustomAspect_1.pdl cannot be resolved.
    13,15: Type not found: CustomAspect_1

            at com.linkedin.data.schema.generator.AbstractGenerator.parseSources(AbstractGenerator.java:170)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:191)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.run(AvroSchemaGenerator.java:165)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.main(AvroSchemaGenerator.java:123)
    Caused by: java.lang.IllegalArgumentException: be.kqe.ste.CustomAspect_1 has namespace that does not match file path '../metadata-models-custom/src/main/pegasus/be.kqe.ste/CustomAspect_1.pdl'
            at com.linkedin.data.schema.generator.AbstractGenerator.validateSchemaWithFilepath(AbstractGenerator.java:221)
            at com.linkedin.data.schema.generator.AbstractGenerator.parseFile(AbstractGenerator.java:194)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.parseFile(AvroSchemaGenerator.java:221)
            at com.linkedin.data.schema.generator.AbstractGenerator.parseSources(AbstractGenerator.java:132)
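The exception itself points at the likely fix: the Pegasus generator expects each dot-separated namespace segment to become a directory level, so a namespace of be.kqe.ste should live under be/kqe/ste/, not a single be.kqe.ste/ folder (which is why a dot-free namespace "works"). A small sketch of the expected mapping:

```python
from pathlib import PurePosixPath

# Pegasus maps namespace segments to nested directories, Java-package style.
def expected_pdl_path(root: str, namespace: str, type_name: str) -> str:
    return str(PurePosixPath(root, *namespace.split("."), f"{type_name}.pdl"))

path = expected_pdl_path("metadata-models-custom/src/main/pegasus",
                         "be.kqe.ste", "CustomAspect_1")
print(path)
# metadata-models-custom/src/main/pegasus/be/kqe/ste/CustomAspect_1.pdl
```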
  • important-bear-9390

    04/18/2023, 6:34 PM
    Hi there! Can I query the info from DataHub Analytics via GraphQL? For example, get the total number of dashboards, charts, etc.?
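Entity totals, at least, can usually be pulled over GraphQL with a wildcard search that asks only for the total. A sketch, assuming the quickstart's GraphQL endpoint on localhost:8080 and the searchAcrossEntities query (the Analytics charts themselves are not all exposed this way):

```python
import json
from urllib import request

GRAPHQL_URL = "http://localhost:8080/api/graphql"  # assumed quickstart default

# count: 0 asks for the total without returning any hits.
query = """
query entityTotal($types: [EntityType!]) {
  searchAcrossEntities(input: {types: $types, query: "*", start: 0, count: 0}) {
    total
  }
}
"""

def total_payload(entity_type: str) -> bytes:
    """Build the request body for one entity type, e.g. DASHBOARD or CHART."""
    return json.dumps({"query": query,
                       "variables": {"types": [entity_type]}}).encode()

req = request.Request(GRAPHQL_URL, data=total_payload("DASHBOARD"),
                      headers={"Content-Type": "application/json"})
# Uncomment against a running instance:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["data"]["searchAcrossEntities"]["total"])
```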
  • adamant-sugar-42217

    04/19/2023, 7:36 AM
    How do I get the lineage details in DataHub?
  • enough-football-92033

    04/19/2023, 6:15 PM
    👋 How can I work around this problem: https://github.com/datahub-project/datahub/issues/7287? I run
    datahub docker quickstart
    on v0.10.2 using this guide: https://datahubproject.io/docs/quickstart/
  • bland-orange-13353

    04/19/2023, 11:27 PM
    This message was deleted.
  • rich-crowd-33361

    04/19/2023, 11:27 PM
    I am facing that error when I try to quickstart.
  • rich-crowd-33361

    04/20/2023, 12:33 AM
    I am running on Windows and the error says Linux.
  • rich-crowd-33361

    04/20/2023, 12:33 AM
    What is going wrong here?