# getting-started
  • e

    early-hydrogen-27542

    02/23/2023, 8:37 PM
    👋 everyone! I have a
    search
    input that retrieves dataset entities by platform and query text that looks for a specific schema name:
    Copy code
    search(
        input: {type: DATASET, query: "schema_1.", orFilters: [{and: [{field: "platform", values: ["urn:li:dataPlatform:redshift"]}]}], start: 0, count: 10}
      )
    This works well for one schema, and returns nearly the same number of datasets I see through the UI. However, when I try with a second schema name (e.g.
    schema_2
    ), it returns 10k+ datasets, which is way more than what actually exists. Is there a better way to look for specific schemas of tables?
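    One possible workaround (a sketch, not from this thread): since dataset URNs embed the platform and the fully qualified name, you can fetch the platform-filtered dataset URNs and narrow to one schema on the client side. This assumes a recent acryl-datahub Python SDK and a GMS at an illustrative localhost:8080.
    Copy code
    # Sketch: narrow platform-filtered search results to one schema client-side.
    # The server address is an illustrative placeholder.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    schema_prefix = "schema_2."  # the schema we actually want

    # Over-fetch all Redshift dataset URNs, then keep only those whose fully
    # qualified name (embedded in the URN) contains the schema prefix.
    matching = [
        urn
        for urn in graph.get_urns_by_filter(entity_types=["dataset"], platform="redshift")
        if schema_prefix in urn
    ]
    print(f"{len(matching)} datasets found for {schema_prefix}*")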
    b
    • 2
    • 4
  • c

    colossal-autumn-78301

    02/24/2023, 10:51 AM
    Hey guys, I am trying to add custom entities and aspects using the tutorial guide at https://datahubproject.io/docs/metadata-models-custom/ ; however, I get the error that 'com.linkedin.common.CustomProperties' cannot be resolved, even though the most recent build and deployment using `./gradlew quickstartDebug` successfully deployed DataHub locally. @gray-shoe-75895
    a
    • 2
    • 4
  • r

    rough-lamp-22858

    02/26/2023, 12:35 PM
    Hello, I am new to DataHub. I am trying to install DataHub and I get the following errors in the browser console when I log in to the UI: • 9002/track — Failed to load resource: the server responded with a status of 401 (Unauthorized), and later net::ERR_CONNECTION_RESET • react-dom.production.min.js:216 — Error: Could not fetch logged in user from cache. + Exception while fetching data (/corpUser): java.lang.RuntimeException: Failed to retrieve entities of type CorpUser (the stack trace points into main.3d3f7a94.chunk.js and react-dom.production.min.js; the same error is also reported as Uncaught (in promise) from main.3d3f7a94.chunk.js:1) • main.3d3f7a94.chunk.js:1 — Uncaught (in promise) TypeError: Failed to fetch, raised from the analytics track call (analytics.browser.es.js). Can you point me to my issue? Thank you
    b
    l
    • 3
    • 18
  • s

    salmon-angle-92685

    02/27/2023, 10:38 AM
    Hello, how would you define the difference between a Tag and a Glossary Term? Thanks!
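    Not from the thread, but the distinction also shows up in the API: a tag is a lightweight, free-form label, while a glossary term is a reference into a curated business vocabulary. A minimal sketch with the Python emitter; the URNs and GMS address below are illustrative placeholders.
    Copy code
    # Sketch: attach a tag vs. a glossary term to the same dataset.
    # Note: emitting these aspects overwrites existing tags/terms on the dataset.
    import time

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        GlobalTagsClass,
        GlossaryTermAssociationClass,
        GlossaryTermsClass,
        TagAssociationClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")
    dataset_urn = make_dataset_urn(platform="hive", name="db.my_table", env="PROD")

    # A tag: an ad-hoc, free-form label anyone can create.
    tags_aspect = GlobalTagsClass(tags=[TagAssociationClass(tag="urn:li:tag:Deprecated")])

    # A glossary term: a pointer to a governed business definition.
    terms_aspect = GlossaryTermsClass(
        terms=[GlossaryTermAssociationClass(urn="urn:li:glossaryTerm:Classification.PII")],
        auditStamp=AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:datahub"),
    )

    for aspect in (tags_aspect, terms_aspect):
        emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=aspect))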
    ✅ 1
    b
    • 2
    • 1
  • b

    best-umbrella-88325

    02/27/2023, 12:21 PM
    Hello community! I'm trying to run GMS locally, without using Docker. As per the documentation around local development, I've built GMS using the command ./gradlew :metadata-service:war:build. When I run the app, I come across this error.
    Copy code
    Error connecting to node prerequisites-kafka-0.prerequisites-kafka-headless.default.svc.cluster.local:9092 (id: 0 rack: null)
    It looks like it's trying to connect to the internal service name of the Kafka pod. However, I couldn't find this name mentioned anywhere in the code or in the properties files. Checking application.yml, it has the value localhost:9092 (I've port-forwarded the Kafka pod, so this should work). Can someone point me to the location where I should be changing the URL? Thanks in advance.
    b
    • 2
    • 7
  • b

    bitter-translator-92563

    02/28/2023, 2:47 PM
    Hi all. I'm wondering whether development of a "Product" type of entity is on the roadmap or in any future plans? It would be good to use this type of entity to manage users in DataHub and to manage groups of users, policies, etc., so that particular users would be able to add new users, generate tokens, and grant privileges to other users.
    a
    • 2
    • 3
  • i

    incalculable-needle-41145

    03/01/2023, 1:08 AM
    Hello all, what is the best way to extract multiple datasets and their corresponding schemas from DataHub? I need this metadata to develop an ML algorithm. I can get a list of datasets from the metadata_aspect_v2 table, but I wonder if there is a better and more efficient way to get schema metadata for these datasets.
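    One alternative to querying metadata_aspect_v2 directly (a rough sketch, assuming a recent acryl-datahub Python SDK and a GMS at an illustrative localhost:8080) is to page through dataset URNs and fetch each schemaMetadata aspect over REST:
    Copy code
    # Sketch: pull dataset URNs and their schema metadata via the GMS REST API
    # instead of reading the metadata_aspect_v2 table directly.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import SchemaMetadataClass

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    for urn in graph.get_urns_by_filter(entity_types=["dataset"]):
        schema = graph.get_aspect(entity_urn=urn, aspect_type=SchemaMetadataClass)
        if schema is None:
            continue  # this dataset has no schemaMetadata aspect
        print(urn)
        for field in schema.fields:
            # fieldPath and nativeDataType are convenient features for ML over schemas
            print(f"  {field.fieldPath}: {field.nativeDataType}")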
    b
    • 2
    • 2
  • c

    colossal-autumn-78301

    03/02/2023, 12:12 PM
    Hello all, is the GitHub repo
    datahub-gma
    https://github.com/linkedin/datahub-gma used/referenced (or used in deployment) by any of the code in the main DataHub repository at https://github.com/datahub-project/datahub ? I could not find any references to it in the main repo. Any hints will be appreciated.
    plus1 1
    m
    • 2
    • 1
  • r

    rough-journalist-49506

    03/02/2023, 1:33 PM
    The user walkthrough shown when I log in to DataHub for the first time is an awesome feature. Can I trigger it on demand as well?
    ✅ 1
    a
    • 2
    • 1
  • g

    green-activity-32141

    03/02/2023, 2:19 PM
    hey folks, I'm trying to use the quickstart script to get Datahub up and running, and it seems to be sort of looping when it tries to start services.
    ✅ 1
  • g

    green-activity-32141

    03/02/2023, 2:28 PM
    Here's the log from the quickstart script run
    tmp6ulg5z8v.log
    b
    • 2
    • 4
  • a

    adorable-computer-92026

    03/02/2023, 2:39 PM
    Hello, I want to know what the schema of DataHub is in the backend. When adding metadata to a dataset, DataHub validates it against the GMS schema to ensure that it meets the defined structure and data types, so I suppose GMS holds the schema of DataHub. If so, where exactly can I find it in the DataHub source code? I want to understand things better. Thanks!
    ✅ 1
    a
    • 2
    • 2
  • r

    rich-salesmen-77587

    03/02/2023, 3:55 PM
    Hi @green-football-43791, can you please help me identify the correct API call that I should make to get the lineage graph of a dataset via the REST API?
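    Not an authoritative answer, but one common route is the GraphQL endpoint served over plain HTTP at /api/graphql; searchAcrossLineage returns related entities by lineage degree. A hedged sketch using the Python client as a thin wrapper (the URN and server address are placeholders, and the exact query shape should be checked against your DataHub version):
    Copy code
    # Sketch: fetch downstream lineage for a dataset via the GraphQL API,
    # using the Python client as a thin HTTP wrapper.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.my_table,PROD)"

    query = """
    query lineage($urn: String!) {
      searchAcrossLineage(
        input: {urn: $urn, direction: DOWNSTREAM, query: "*", start: 0, count: 50}
      ) {
        searchResults {
          degree
          entity { urn type }
        }
      }
    }
    """

    result = graph.execute_graphql(query, variables={"urn": dataset_urn})
    for hit in result["searchAcrossLineage"]["searchResults"]:
        print(hit["degree"], hit["entity"]["urn"])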
    ✅ 1
    a
    b
    +2
    • 5
    • 5
  • b

    busy-action-2524

    03/02/2023, 6:34 PM
    Hello folks, I guess a similar question has already popped up, but I didn't find any specific answer to it. Can DataHub use GCP Data Catalog as a source for all the information (to avoid duplicating the source of truth)?
    a
    • 2
    • 1
  • g

    green-activity-32141

    03/02/2023, 7:59 PM
    question: How do I specify that I'm using SSL mode for a Postgres ingestion source?
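    For SQLAlchemy-based sources such as Postgres, SSL settings are usually passed through to the driver via the source's options; below is a rough programmatic sketch of a recipe (host, credentials, and the sslmode value are assumptions to verify against the Postgres source docs):
    Copy code
    # Sketch: a programmatic ingestion recipe that forces SSL for Postgres by
    # passing driver connect_args through the source's `options`.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "postgres",
                "config": {
                    "host_port": "my-db.example.com:5432",
                    "database": "analytics",
                    "username": "datahub",
                    "password": "example-password",
                    # Extra kwargs forwarded to SQLAlchemy's create_engine();
                    # connect_args goes straight to the psycopg2 driver.
                    "options": {"connect_args": {"sslmode": "require"}},
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()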
    ✅ 1
    b
    • 2
    • 1
  • s

    straight-policeman-77814

    03/06/2023, 6:34 AM
    How can I add lineage for an existing dataset?
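    One way to do this (a sketch along the lines of the lineage emitter examples in the repo; all URNs and the GMS address are placeholders) is to emit an upstreamLineage aspect for the downstream dataset:
    Copy code
    # Sketch: declare that an existing dataset is derived from another dataset
    # by emitting an upstreamLineage aspect. Note: this replaces any existing
    # upstreamLineage aspect on the downstream dataset.
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")

    upstream_urn = make_dataset_urn(platform="hive", name="db.source_table", env="PROD")
    downstream_urn = make_dataset_urn(platform="hive", name="db.derived_table", env="PROD")

    lineage = UpstreamLineageClass(
        upstreams=[UpstreamClass(dataset=upstream_urn, type=DatasetLineageTypeClass.TRANSFORMED)]
    )

    # The aspect is attached to the *downstream* dataset.
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=downstream_urn, aspect=lineage))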
    ✅ 2
    a
    • 2
    • 2
  • p

    purple-terabyte-64712

    03/06/2023, 12:29 PM
    Are you following some metadata standards?
    d
    b
    • 3
    • 3
  • b

    bland-orange-13353

    03/06/2023, 6:22 PM
    This message was deleted.
    a
    m
    • 3
    • 3
  • m

    many-nest-43191

    03/06/2023, 6:56 PM
    Hi all, after running ./gradlew quickstart I get an error:
    * What went wrong: A problem occurred evaluating project ':buildSrc'.
    > Could not find method compile() for arguments [io.acryl:json-schema-avro:0.1.5, build_3txm9qv85o1lfzkn2hmnfzpka$_run_closure1$_closure2@31ca483a] on object of type org.gradle.api.internal.artifacts.dsl.dependencies.DefaultDependencyHandler.
    Can someone help me solve this?
    teamwork 1
    a
    a
    • 3
    • 3
  • f

    fierce-forest-92066

    03/06/2023, 10:03 PM
    Hi! Sorry if this has been asked a lot, but how do you get external users (a friend's PC, for example) to be able to find your self-hosted Docker version of DataHub?
    ✅ 1
    a
    • 2
    • 1
  • b

    brave-judge-32701

    03/07/2023, 7:37 AM
    I'm on Spark 3.2.3 and used spark-shell to run a SQL statement:
    create table test.testtable4 as select * from test.testtable3
    , but table testtable4's upstream shows as
    sql at <console>:23
    rather than
    testtable3
    . Is this a compatibility issue? Also, Spark runs on Hive, and Hive tables created by Spark do not show up in DataHub immediately; I need to run a batch ingestion task to ingest the Hive metastore data.
    a
    a
    • 3
    • 12
  • b

    big-postman-38407

    03/07/2023, 10:45 AM
    Hello! Is it possible to change sorting in the filter menu? I want to place
    Sub Type
    right under
    Type
    , because they are connected, and I do not understand why the sorting works differently.
    ✅ 1
    a
    f
    • 3
    • 6
  • e

    early-airline-85277

    03/07/2023, 5:11 PM
    Hello! Is there any way to save this search predicate as a View? The search looks for objects with the metadata property defined as
    Kafka
    . It looks like the filter options do not support this in "Edit View".
    ✅ 1
    a
    • 2
    • 1
  • r

    refined-football-89019

    03/07/2023, 9:15 PM
    I'm new to DataHub and prototyping a java client that forwards events to DataHub. I believe I'm using the latest library
    Copy code
    <dependency>
      <groupId>io.acryl</groupId>
      <artifactId>datahub-client</artifactId>
      <version>0.10.0-4</version>
    </dependency>
    ...but most of the code examples are Python. For example: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_job_dataflow_new_api_simple.py. This sample Python code contains the classes DataJob, DataFlow, etc., which don't appear to have counterparts in the Java client library. For example, DataFlow: the Java client library does contain DataFlowInfo, DataFlowKey, etc., but not DataFlow (with a 3-parameter constructor). Can you point me to sample Java code corresponding to the Python example above? Thanks
    ✅ 1
    🩺 1
    a
    a
    g
    • 4
    • 5
  • h

    handsome-flag-16272

    03/07/2023, 10:26 PM
    Hello, for the command below, where can I find the definition of the quickstartDebug task? I searched the entire project and couldn't find it.
    Copy code
    ./gradlew quickstartDebug
    I plan to change the Elasticsearch port from 9200 to something else, like 39200, to avoid a security scan failure on port 9200 for HTTP GET and DELETE methods. Could anybody tell me which files I should change? Currently, I have made changes in the following files:
    • docker/elasticsearch-setup/env/docker.env
    • docker/quickstart/docker-compose.quickstart.yml
    • docker/docker-compose.yml
    b
    a
    • 3
    • 8
  • t

    tall-eye-41335

    03/07/2023, 11:44 PM
    @astonishing-answer-96712 added a workflow to this channel: *Community Support Bot*.
  • b

    bland-appointment-45659

    03/08/2023, 3:03 AM
    Team, is it possible to show the column lineage of the input and output DataFrames for a Spark pipeline task? Any example?
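    Whether the Spark agent captures column-level lineage automatically should be checked against the current docs, but column mappings can also be emitted by hand as fine-grained lineage. A rough sketch modeled on the fine-grained lineage examples, with made-up dataset and field names:
    Copy code
    # Sketch: emit column-level (fine-grained) lineage between two datasets,
    # e.g. the input and output DataFrames of a Spark task.
    from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")

    input_urn = make_dataset_urn("hive", "db.orders_raw")
    output_urn = make_dataset_urn("hive", "db.orders_clean")

    # Map one upstream column to one downstream column.
    column_lineage = FineGrainedLineageClass(
        upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
        upstreams=[make_schema_field_urn(input_urn, "order_amount")],
        downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
        downstreams=[make_schema_field_urn(output_urn, "amount_usd")],
    )

    lineage = UpstreamLineageClass(
        upstreams=[UpstreamClass(dataset=input_urn, type=DatasetLineageTypeClass.TRANSFORMED)],
        fineGrainedLineages=[column_lineage],
    )

    # Attach table- and column-level lineage to the downstream dataset.
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=output_urn, aspect=lineage))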
    a
    • 2
    • 1
  • t

    tall-butcher-30509

    03/08/2023, 5:54 AM
    I have a question regarding lineage lookup. We can’t seem to get any up/downstream lineage beyond 1 degree. What do we need to check/change?
    ✅ 1
    a
    e
    +2
    • 5
    • 14
  • a

    adorable-computer-92026

    03/08/2023, 10:56 AM
    Hello, I want to know: when I run the command 'datahub docker ingest-sample-data', where is the data ingested from, and where is it stored (in which database)? Is it in MySQL? Thank you!
    ✅ 1
    a
    b
    • 3
    • 4
  • p

    polite-tent-71027

    03/08/2023, 12:52 PM
    Hi, I'm trying to befriend
    s3
    and
    dbt
    and can't make them work together. Maybe someone can point me in the right direction. So: • I have a Delta table in S3 at the path
    <s3a://core-data/Features/ssns__lagermetrics>
    • There is an external table in Hive created as
    Copy code
    CREATE EXTERNAL TABLE `features`.`ssns__lagermetrics` USING DELTA LOCATION '<s3a://core-data/Features/ssns__lagermetrics>';
    • I use the external table as a source in dbt:
    Copy code
    sources:
      - name: features
        description: features schema
        tables:
          - name: ssns__lagermetrics
    ...
    • After ingestion there are two separate, disconnected entities for ssns__lagermetrics which I can't connect =( I tried to use transformers, but that seems to be the wrong direction... I'd appreciate any help. ----- PS. I just scanned through the history and found the
    dbt-labs/dbt_external_tables
    package. I'll try it and update this message afterwards.
    a
    • 2
    • 1