# getting-started
  • hallowed-truck-92074

    04/12/2023, 10:59 AM
    Hello team, what Linux distributions did you guys use for the DataHub development environment?
  • busy-ghost-93490

    04/12/2023, 12:07 PM
    I'm trying to create lineage to Hive tables and already have the PySpark code for them. How do I create the lineage, and what are the recommended steps?
  • full-toddler-1726

    04/12/2023, 2:42 PM
    Hello team. I'm trying to deploy DataHub on my Ubuntu server with Docker, but I hit an issue when I execute "*datahub docker quickstart*". Can someone please help me?
  • full-toddler-1726

    04/12/2023, 2:43 PM
    Note that I'm able to run Docker without sudo.
  • bland-orange-13353

    04/12/2023, 4:52 PM
    This message was deleted.
  • bland-orange-13353

    04/12/2023, 5:31 PM
    This message was deleted.
  • steep-alligator-93593

    04/12/2023, 6:55 PM
    Hi, I have deployed DataHub on AWS EKS using Helm. I'm looking to use our own Kafka cluster that we have set up here. Which pods rely on the Kafka connection, so I can troubleshoot?
  • brave-judge-32701

    04/13/2023, 2:45 AM
    I'm trying to use spark-lineage, testing with spark-shell:
    bin/spark-shell --master local[*] --deploy-mode client
    I can't see any log output from DatahubSparkListener, and I get this WARN message:
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    My log4j.properties config is:
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    log4j.logger.org.apache.spark.repl.Main=WARN
    log4j.logger.org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver=WARN

    log4j.logger.datahub.spark=DEBUG
    log4j.logger.datahub.client.rest=DEBUG
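For later readers: the SLF4J NOP warning only means no logging binding was found on the classpath; separately, the listener produces no output unless it is actually registered. A minimal sketch of a spark-shell invocation that registers the listener, assuming the io.acryl:datahub-spark-lineage package and a DataHub GMS at http://localhost:8080 (adjust both for your setup):

```python
# Sketch: assemble a spark-shell command that registers the DataHub listener.
# The package version and the GMS URL below are assumptions, not fixed values.
listener_confs = {
    "spark.extraListeners": "datahub.spark.DatahubSparkListener",
    "spark.datahub.rest.server": "http://localhost:8080",
}

cmd = ["bin/spark-shell", "--master", "local[*]",
       "--packages", "io.acryl:datahub-spark-lineage:0.10.1-1"]
for key, value in sorted(listener_confs.items()):
    cmd += ["--conf", f"{key}={value}"]

print(" ".join(cmd))
```

With the listener registered, the log4j.logger.datahub.spark=DEBUG setting above should start producing output.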
  • brave-judge-32701

    04/13/2023, 8:36 AM
    When I delete a Data Flow (Pipeline), will its Data Jobs (Tasks) be deleted in cascade?
  • bland-orange-13353

    04/13/2023, 8:52 AM
    This message was deleted.
  • helpful-van-67650

    04/13/2023, 9:08 AM
    Hello everyone, when I execute
    datahub docker quickstart --quickstart-compose-file ./docker/quickstart/docker-compose.quickstart.yml
    it lists the following error. What should I do next?
  • dry-thailand-78553

    04/13/2023, 4:58 PM
    I'm new to DataHub, but my first impression is WOW! Very nice. I have a simple problem, but I'm not sure if it's a feature limitation, on purpose, or user error.
  • dry-thailand-78553

    04/13/2023, 4:58 PM
    How do you search for a column?
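In case it helps later readers: DataHub's search supports structured field queries, so a column can usually be found by typing a `fieldPaths:<column_name>` query in the search bar. A sketch of issuing the same query over GraphQL, assuming the quickstart's GraphQL endpoint at localhost:8080; the column name `customer_id` is just an example:

```python
import json
from urllib import request

# Assumed endpoint: the quickstart exposes GraphQL at localhost:8080.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# fieldPaths is the schema-field search attribute; customer_id is illustrative.
query = """
query columnSearch($q: String!) {
  search(input: {type: DATASET, query: $q, start: 0, count: 10}) {
    total
    searchResults { entity { urn } }
  }
}
"""
payload = json.dumps({"query": query,
                      "variables": {"q": "fieldPaths:customer_id"}}).encode()

req = request.Request(GRAPHQL_URL, data=payload,
                      headers={"Content-Type": "application/json"})
# Uncomment against a running instance:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```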
  • fast-midnight-10167

    04/13/2023, 7:44 PM
    How do you create DataJobs? There is no content in the documentation about how to do this. I found this sample, which does not work (the DataFlow class expects an attribute that does not exist), the input arguments as documented in the DataJob source are not honored, and they don't even match what's in the file.
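For reference while the sample is broken: a DataJob always hangs off a DataFlow, and both are addressed by URN. The orchestrator/flow-id/env triple below (airflow, my_flow, PROD) and the task id my_task are example values; this sketch only shows how the URNs compose, which is the part the SDK helpers ultimately build:

```python
# Sketch of DataFlow/DataJob URN composition (DataHub's addressing scheme).
def data_flow_urn(orchestrator: str, flow_id: str, env: str) -> str:
    """A DataFlow is keyed by (orchestrator, flow id, environment)."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{env})"

def data_job_urn(flow_urn: str, job_id: str) -> str:
    """A DataJob is keyed by its parent DataFlow URN plus a job id."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"

flow = data_flow_urn("airflow", "my_flow", "PROD")
job = data_job_urn(flow, "my_task")
print(job)  # urn:li:dataJob:(urn:li:dataFlow:(airflow,my_flow,PROD),my_task)
```

With those URNs in hand, job metadata and lineage are emitted as aspects (e.g. dataJobInfo, dataJobInputOutput) through an emitter.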
  • proud-dusk-671

    04/14/2023, 6:47 AM
    Hi, I'm looking to use Google authentication for the React app. I did not understand how to implement https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react-google#4-configure-datahub-frontend-to-enable-oidc-authentication. I am running DataHub locally (via docker-compose) for PoC purposes and therefore wouldn't be able to access the file at
    docker/datahub-frontend/env/docker.env
    Secondly, I do not want to kill my Docker container for fear of data loss. Can you confirm that killing the frontend Docker container will not result in any data loss? Finally, is there a better way to test Google authentication for the app?
  • future-holiday-32084

    04/14/2023, 5:06 PM
    Hi folks, I'm new to DataHub. When using DataHub Spark lineage (io.acryl:datahub-spark-lineage:0.10.1-1) with a Spark job, it ingests lineage perfectly. However, in the MySQL DataHub database, the "createdby" field shows "urn:li:corpuser:__datahub_system". As a result, I cannot remove the lineage manually through the DataHub UI. Could anyone please provide a solution? Additionally, when executing this write command
    spark.sql("select * from <database>.<table_source>").write.mode("append").format("parquet").saveAsTable("<database>.<table_sink>")
    the lineage, as shown in the image below, is inferred perfectly for the sink table. However, the source table displays the location on my Hadoop data lake, even though I'm reading from a table, not a path.
  • early-hydrogen-27542

    04/15/2023, 6:34 PM
    👋 everyone! I'm looking for a way to let folks retrieve lineage at will for any number of datasets, both upstream and downstream. From what I can tell, the UI is limited to one dataset at a time. In addition, GraphQL does not seem to be able to do this efficiently when the number of hierarchy levels is not known. I started exploring the metadata store itself to potentially help with this challenge. To clarify my understanding: is it accurate to say the UI pulls from the metadata store (specifically metadata_aspect_v2) for any versioned aspects? If so, the metadata store should theoretically hold all the lineage information, right? Looking at that table, it looks like the upstreamLineage aspect accomplishes that. My initial plan is to:
    • Ingest metadata_aspect_v2 into Redshift
    • Parse the JSON in the metadata column
    • Model the results to show lineage relationships (e.g. one row per relationship)
    Am I missing something in terms of native DataHub capabilities that might make this smoother?
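As a sanity check on that plan: the metadata column of an upstreamLineage row is JSON whose upstreams array carries one entry per lineage edge, so the "one row per relationship" modeling step is a straightforward flatten. A parsing sketch (the sample payload below is illustrative and trimmed; real rows also carry audit-stamp fields per upstream):

```python
import json

# Illustrative metadata-column payload for an upstreamLineage aspect.
row_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.sink,PROD)"
metadata = json.dumps({
    "upstreams": [
        {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.src_a,PROD)",
         "type": "TRANSFORMED"},
        {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.src_b,PROD)",
         "type": "TRANSFORMED"},
    ]
})

def lineage_edges(downstream_urn: str, metadata_json: str):
    """Flatten one upstreamLineage aspect into (upstream, downstream) rows."""
    aspect = json.loads(metadata_json)
    return [(up["dataset"], downstream_urn) for up in aspect.get("upstreams", [])]

edges = lineage_edges(row_urn, metadata)
for upstream, downstream in edges:
    print(upstream, "->", downstream)
```

The same flatten expressed in SQL (against the raw JSON ingested into Redshift) would give the per-relationship table described above.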
  • bland-orange-13353

    04/17/2023, 11:07 AM
    This message was deleted.
  • early-area-1276

    04/18/2023, 12:20 AM
    This content can't be displayed.
  • colossal-football-58924

    04/18/2023, 12:20 AM
    Hello, I am trying to execute the quickstart but am getting the following error:
    INFO {datahub.cli.quickstart_versioning:78} - Unable to connect to GitHub, using default quickstart version mapping config.
    [2023-04-17 19:07:13,390] INFO {datahub.cli.docker_cli:638} - Using quickstart plan: composefile_git_ref='master' docker_tag='head'
    Docker doesn't seem to be running. Did you start it?
  • brave-judge-32701

    04/18/2023, 2:49 AM
    How do I configure the spark-lineage timezone?
  • full-car-89338

    04/18/2023, 3:28 AM
    Hello, how do I sort out this CORS error?
    Access to fetch at 'http://localhost:8080/api/graphql' from origin 'http://localhost:5174' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
  • colossal-autumn-78301

    04/18/2023, 2:42 PM
    Hey everyone, I am trying to add a custom aspect to an entity using metadata-models-custom. I defined a PDL file with the namespace
    namespace be.kqe.ste
    and these files are in the same directory:
    metadata-models-custom/src/main/pegasus/be.kqe.ste/CustomAspect_1.pdl
    The command
    ../gradlew build
    fails with the following exception. It also does not work with the existing custom model DataQualityRules. But if I remove the dots in the namespace, e.g. bekqeste, then it works. Any ideas?
    /Users/idris52/Git/datahub/datahub/metadata-models-custom/src/main/pegasus/be.kqe.ste/CustomAspect_1.pdl cannot be resolved.
    13,15: Type not found: CustomAspect_1

            at com.linkedin.data.schema.generator.AbstractGenerator.parseSources(AbstractGenerator.java:170)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:191)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.run(AvroSchemaGenerator.java:165)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.main(AvroSchemaGenerator.java:123)
    Caused by: java.lang.IllegalArgumentException: be.kqe.ste.CustomAspect_1 has namespace that does not match file path '../metadata-models-custom/src/main/pegasus/be.kqe.ste/CustomAspect_1.pdl'
            at com.linkedin.data.schema.generator.AbstractGenerator.validateSchemaWithFilepath(AbstractGenerator.java:221)
            at com.linkedin.data.schema.generator.AbstractGenerator.parseFile(AbstractGenerator.java:194)
            at com.linkedin.data.avro.generator.AvroSchemaGenerator.parseFile(AvroSchemaGenerator.java:221)
            at com.linkedin.data.schema.generator.AbstractGenerator.parseSources(AbstractGenerator.java:132)
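The exception itself points at the likely fix: the Pegasus generator expects each dot-separated namespace segment to become a directory level, so a namespace of be.kqe.ste should live under be/kqe/ste/, not a single be.kqe.ste/ folder (which is why a dot-free namespace "works"). A small sketch of the expected mapping:

```python
from pathlib import PurePosixPath

# Pegasus maps namespace segments to nested directories, Java-package style.
def expected_pdl_path(root: str, namespace: str, type_name: str) -> str:
    return str(PurePosixPath(root, *namespace.split("."), f"{type_name}.pdl"))

path = expected_pdl_path("metadata-models-custom/src/main/pegasus",
                         "be.kqe.ste", "CustomAspect_1")
print(path)
# metadata-models-custom/src/main/pegasus/be/kqe/ste/CustomAspect_1.pdl
```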
  • important-bear-9390

    04/18/2023, 6:34 PM
    Hi there! Can I query the info from DataHub Analytics via GraphQL? For example, get the total number of dashboards, charts, etc.?
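Entity totals, at least, can usually be pulled over GraphQL with a wildcard search that asks only for the total. A sketch, assuming the quickstart's GraphQL endpoint on localhost:8080 and the searchAcrossEntities query (the Analytics charts themselves are not all exposed this way):

```python
import json
from urllib import request

GRAPHQL_URL = "http://localhost:8080/api/graphql"  # assumed quickstart default

# count: 0 asks for the total without returning any hits.
query = """
query entityTotal($types: [EntityType!]) {
  searchAcrossEntities(input: {types: $types, query: "*", start: 0, count: 0}) {
    total
  }
}
"""

def total_payload(entity_type: str) -> bytes:
    """Build the request body for one entity type, e.g. DASHBOARD or CHART."""
    return json.dumps({"query": query,
                       "variables": {"types": [entity_type]}}).encode()

req = request.Request(GRAPHQL_URL, data=total_payload("DASHBOARD"),
                      headers={"Content-Type": "application/json"})
# Uncomment against a running instance:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["data"]["searchAcrossEntities"]["total"])
```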
  • adamant-sugar-42217

    04/19/2023, 7:36 AM
    How do I get the lineage details in DataHub?
  • enough-football-92033

    04/19/2023, 6:15 PM
    👋 How can I work around this problem: https://github.com/datahub-project/datahub/issues/7287? I run
    datahub docker quickstart
    on v0.10.2 using this guide: https://datahubproject.io/docs/quickstart/
  • bland-orange-13353

    04/19/2023, 11:27 PM
    This message was deleted.
  • rich-crowd-33361

    04/19/2023, 11:27 PM
    I am facing that error when I try to quickstart.
  • rich-crowd-33361

    04/20/2023, 12:33 AM
    I am running on Windows and the error says Linux.
  • rich-crowd-33361

    04/20/2023, 12:33 AM
    What is going wrong here?