# getting-started
  • square-solstice-69079

    03/01/2022, 12:59 PM
    Hello! What is the workflow in DataHub to remove entities that are no longer in the source data, or that are being filtered out by edits to the ingestion? Using DataHub to get a better overview of the data works great, and we are using it to gradually delete tables/reports. We are also adding filters to the ingestion so that certain schemas are no longer pulled in. But how can these entities be removed from DataHub?
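    One pattern that may help (a sketch, not official guidance): soft-delete the stale entities with the DataHub CLI, and, if your CLI version supports it, enable stateful ingestion (stateful_ingestion.enabled plus remove_stale_metadata in the recipe) so entities dropped by filters are soft-deleted automatically on the next run. The URN and platform below are placeholders:
    # Soft-delete one entity by URN (hides it in the UI; the metadata rows are kept):
    datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_db.my_schema.my_table,PROD)" --soft
    # Hard-delete everything ingested for a platform (irreversible):
    datahub delete --platform snowflake --hard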
  • billowy-rocket-47022

    03/01/2022, 11:30 PM
    hi
  • clean-nightfall-92007

    03/02/2022, 10:09 AM
    Hello, everyone. May I ask a question about front-end packaging? Is this build error caused by WSL2?
    gyp ERR! build error
    gyp ERR! stack Error: `make` failed with exit code: 2
    gyp ERR! stack     at ChildProcess.onExit (/mnt/c/Users/ITC210011/eclipse-workspace/datahub/docs-website/.gradle/nodejs/node-v16.8.0-linux-x64/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:194:23)
    gyp ERR! stack     at ChildProcess.emit (node:events:394:28)
    gyp ERR! stack     at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)
    gyp ERR! System Linux 5.10.60.1-microsoft-standard-WSL2
    gyp ERR! command "/mnt/c/Users/ITC210011/eclipse-workspace/datahub/docs-website/.gradle/nodejs/node-v16.8.0-linux-x64/bin/node" "/mnt/c/Users/ITC210011/eclipse-workspace/datahub/docs-website/.gradle/nodejs/node-v16.8.0-linux-x64/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
    gyp ERR! cwd /mnt/c/Users/ITC210011/eclipse-workspace/datahub/docs-website/node_modules/sharp
    gyp ERR! node -v v16.8.0
    gyp ERR! node-gyp -v v7.1.2
    gyp ERR! not ok
  • billowy-rocket-47022

    03/02/2022, 6:27 PM
    I ran datahub docker ingest-sample-data. How can I see which data was part of that ingestion? I mean the raw data that was actually ingested, so I can compare it with what shows up in DataHub.
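    (A sketch of how to inspect it, assuming the sample data is still the bundled Metadata Change Event file checked into the repo; the path below reflects the v0.8.x layout:)
    # ingest-sample-data loads a checked-in file of MCEs; viewing that file
    # shows exactly what the quickstart ingested.
    curl -s https://raw.githubusercontent.com/linkedin/datahub/master/metadata-ingestion/examples/mce_files/bootstrap_mce.json | less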
  • stocky-midnight-78204

    03/03/2022, 5:56 AM
    How do I use Neo4j instead of Elasticsearch for lineage? When I run the quickstart there is no Neo4j container.
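    (A hedged sketch: the graph backend is selected by a GMS environment variable, so a docker-compose override along these lines, with a neo4j service added, may work; the variable names are assumptions based on the GMS env files in the repo:)
    datahub-gms:
      environment:
        - GRAPH_SERVICE_IMPL=neo4j        # the default quickstart uses elasticsearch
        - NEO4J_HOST=http://neo4j:7474
        - NEO4J_URI=bolt://neo4j
        - NEO4J_USERNAME=neo4j
        - NEO4J_PASSWORD=datahub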
  • bitter-ram-53961

    03/03/2022, 3:34 PM
    Hello! I'm new to DataHub and I'm testing the Kafka metadata extraction. I'm getting the following error:
    'space-info': ["failed to get value schema: Subject 'space-info-value' not found.\n"
                   "io.confluent.rest.exceptions.RestNotFoundException: Subject 'space-info-value' not found.\n"]
    I added the following parameter in my configuration file:
  • bitter-ram-53961

    03/03/2022, 3:34 PM
    but I get the following error:
    1 validation error for KafkaSourceConfig
    topic_subject_map
      extra fields not permitted (type=value_error.extra)
    Is my file badly formatted, or is it the parameter?
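    (For what it's worth: topic_subject_map was added in a later CLI release than some 2022 versions, so an older CLI rejects it as an extra field. If your version does support it, a rough sketch of the expected shape, mapping "<topic>-key" / "<topic>-value" to registry subject names; the subject names here are placeholders:)
    source:
      type: kafka
      config:
        connection:
          bootstrap: "localhost:9092"
          schema_registry_url: "http://localhost:8081"
        topic_subject_map:
          "space-info-key": "my-key-subject"
          "space-info-value": "my-value-subject"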
  • little-caravan-7716

    03/03/2022, 3:53 PM
    Hi all, I am new to DataHub and am wondering if it is possible to use your metadata model as an RDBMS and Elastic index and write our own GraphQL API and Elasticsearch queries? To be clear, I'd just want to use Spring to write the API layer and leverage the ingestion scripts you provide to pull the metadata, but do the mutations in a manner that makes sense with what we have so far. Please direct me to the right channel if need be.
  • adventurous-apple-98365

    03/03/2022, 4:15 PM
    Hi all! Not sure if this is the right channel, but here goes anyway: is there an easy way I'm missing to assign read-only access to the GMS API? We want people to be able to directly query for metadata but not have the ability to write or ingest. I have tried making a token with a user that has no edit access via a policy, but it can still write using the ingest endpoint. I also don't see anything around read-only access. Thanks in advance!
  • brave-nail-85388

    03/03/2022, 8:56 PM
    Hi team, I am new to DataHub. After installing DataHub on our Azure VM, I'm trying to integrate with a Snowflake dataset, but I noticed that the Lineage, Queries, and Stats tabs in the DataHub UI are disabled. Based on the docs I installed the plugins that provide lineage, but I can still see that it's disabled. Any idea?
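    (A sketch that may apply here: the Queries and Stats tabs are populated by the separate snowflake-usage source rather than the snowflake one, so a second recipe along these lines is needed; the values are placeholders, and the plugin comes from pip install 'acryl-datahub[snowflake-usage]':)
    source:
      type: snowflake-usage
      config:
        host_port: "<account>.snowflakecomputing.com"
        warehouse: "COMPUTE_WH"
        username: "<user>"
        password: "<password>"
        role: "<role with access to snowflake.account_usage>"
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"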
  • full-cartoon-72793

    03/03/2022, 9:18 PM
    How do I get access to the DataHub CLI when DataHub is running on a cloud Kubernetes environment (Azure Kubernetes Service in my case)?
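    (A minimal sketch using kubectl port-forwarding; the service name assumes the helm chart's default release naming:)
    # Install the CLI locally:
    pip install acryl-datahub
    # Tunnel to the GMS service inside the cluster:
    kubectl port-forward svc/datahub-datahub-gms 8080:8080
    # In another shell, point the CLI at the tunnel (datahub init prompts for the host, http://localhost:8080):
    datahub init
    datahub ingest -c recipe.yml   # recipe.yml is a placeholder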
  • clean-nightfall-92007

    03/04/2022, 2:25 AM
    I want to know where this class is generated:
    com.linkedin.pegasus2avro.mxe.MetadataAuditEvent.SCHEMA$;
  • fierce-account-79227

    03/04/2022, 8:42 AM
    I need help with the FineGrainedLineage modules from the datahub package. The highlighted modules (in the picture) throw an error when I try to import them. A screenshot is attached for reference.
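    (A sketch of the imports, assuming the classes live where the rest of the generated model does; if these fail, the installed acryl-datahub likely predates fine-grained lineage and needs an upgrade first:)
    # The fine-grained lineage classes are generated into datahub.metadata.schema_classes:
    from datahub.metadata.schema_classes import (
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
    )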
  • fierce-account-79227

    03/04/2022, 8:44 AM
    Please help with a sample GraphQL query for getting the lineage information of a dataset. I have gone through a lot of sites, but none of them helped.
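    (A hedged starting point via the generic relationships field; the URN below is the quickstart sample, and a dataset's upstreams are the OUTGOING side of its DownstreamOf edges — flip the direction to INCOMING for downstreams:)
    {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)") {
        relationships(input: { types: ["DownstreamOf"], direction: OUTGOING, start: 0, count: 100 }) {
          total
          relationships {
            type
            entity {
              urn
            }
          }
        }
      }
    }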
  • eager-florist-67924

    03/04/2022, 12:04 PM
    Hi team, I am trying to set up DataHub ingestion by running the image as a cron job on OpenShift. My recipe is set up as follows:
    run_id: sixsense-service-kafka
    source:
      type: "kafka"
      config:
        connection:
          bootstrap: {{$kafka_bootstrap}}
          schema_registry_url: {{$schema_registry}}/apis/ccompat/v6
    sink:
      type: "datahub-rest"
      config:
        server: {{$datahub}}
    As the schema registry I use Apicurio, and I have JSON schemas there, like this one for a key:
    {
      "$schema": <http://json-schema.org/draft-07/schema>,
      "type": "object",
      "required": [
        "key"
      ],
      "properties": {
        "key": {
          "type": "string"
        }
      },
      "additionalProperties": false
    }
    but when it runs I am not able to consume the schemas, and I get the following error:
    [2022-03-04 11:51:26,603] ERROR    {datahub.ingestion.extractor.schema_util:476} - Failed to parse {"$schema":"http://json-schema.org/draft-07/schema#","type":"object","required":["value","key","timestamp"],"properties":{"value":{"type":"string"},"key":{"type":"string"},"timestamp":{"type":"number "}}} to mce_fields.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/extractor/schema_util.py", line 470, in avro_schema_to_mce_fields
        schema_fields = list(
      File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/extractor/schema_util.py", line 450, in to_mce_fields
        avro_schema = avro.schema.parse(avro_schema_string)
      File "/usr/local/lib/python3.8/site-packages/avro/schema.py", line 1148, in parse
        return make_avsc_object(json_data, names, validate_enum_symbols)
      File "/usr/local/lib/python3.8/site-packages/avro/schema.py", line 1113, in make_avsc_object
        raise SchemaParseException('Undefined type: %s' % type)
    avro.schema.SchemaParseException: Undefined type: object
    It looks like it tries to parse the schema as an Avro schema. I found some reference to
    consumer_config:
      key.deserializer:
      value.deserializer:
    but I'm not sure if that is what I'm missing, or what the deserializer value would be in such a case. Thanks in advance.
  • acoustic-cartoon-67906

    03/04/2022, 4:05 PM
    Hi all, I'm following the instructions to deploy DataHub to Google Kubernetes Engine from https://datahubproject.io/docs/deploy/gcp and https://datahubproject.io/docs/deploy/kubernetes/, but I'm running into an error with the datahub-acryl-datahub-actions pod after a helm install of the datahub chart itself. The pod is in CrashLoopBackOff, with the error message in the logs reported as:
    InvalidURL: Failed to parse: http://${GMS_HOST:-localhost}:${GMS_PORT:-8080}/config
    From my limited knowledge, it would appear that the GMS_HOST and GMS_PORT environment variables aren't being interpolated correctly. Does anyone know how to resolve this issue? I've not overridden any of the defaults in the values.yaml files, and I followed the steps exactly as described in the documentation. Any assistance gratefully received.
  • rhythmic-kitchen-64860

    03/07/2022, 2:26 AM
    Hi all, I'm trying to get the primary keys from a GraphQL query, but it returns null:
    {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)") {
        schemaMetadata(version: 0) {
          name
          platformUrn
          primaryKeys
          foreignKeys {
            name
          }
        }
      }
    }
    Can anyone help me? TIA!
  • gentle-teacher-84472

    03/07/2022, 7:13 AM
    Can somebody please help with the "resource" input in the UsageStats query for a dataset? What should its value be? A sample query would be helpful.
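    (A sketch, assuming the top-level usageStats query the UI itself uses; "resource" is the dataset URN, and the URN below is the quickstart sample:)
    {
      usageStats(resource: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", range: MONTH) {
        aggregations {
          totalSqlQueries
          uniqueUserCount
        }
      }
    }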
  • able-rain-74449

    03/07/2022, 12:14 PM
    Hi all, does DataHub support federated identity (i.e. SAML / OIDC), and can it authenticate against Okta?
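    (DataHub's frontend supports OIDC, which covers Okta; SAML is not supported directly. A rough sketch of the documented environment variables on datahub-frontend, with placeholder values taken from the Okta app:)
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=<okta-client-id>
    AUTH_OIDC_CLIENT_SECRET=<okta-client-secret>
    AUTH_OIDC_DISCOVERY_URI=https://<your-okta-domain>.okta.com/.well-known/openid-configuration
    AUTH_OIDC_BASE_URL=https://<your-datahub-host>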
  • witty-butcher-82399

    03/07/2022, 5:39 PM
    Hi! How can I fetch the number of views for a dataset with GraphQL? I noticed there is a chart in the UI for the “Top Viewed Datasets”, but I just want to fetch the number of views for a given dataset. Is that possible? Thanks!
  • bored-dress-52175

    03/07/2022, 6:30 PM
    What do the properties signify here, and how will they be populated?
  • thankful-army-34119

    03/08/2022, 4:08 AM
    Oracle source config
  • clean-nightfall-92007

    03/08/2022, 6:31 AM
    I want to know what effect "hasvaluesfieldname": "hastags" has.
  • able-rain-74449

    03/08/2022, 9:13 AM
    Hi all, does DataHub support Amazon OpenSearch Service, the successor to Elasticsearch?
  • some-pizza-26257

    03/08/2022, 9:47 AM
    Hi all, not sure if this is the right channel, so feel free to redirect me. How easy is it to replace Elasticsearch in DataHub with Apache Solr? How would one go about swapping it in if needed?
  • brave-businessperson-3969

    03/08/2022, 8:16 PM
    In the demo installation there is a nice overview/documentation of all the DataHub entities: https://demo.datahubproject.io/browse/dataset/prod/datahub/entities. Is it possible to ingest this information into a local installation too?
  • sparse-account-96468

    03/09/2022, 2:13 AM
    Hey, I realise this is more of a github-pages question, or maybe a Jekyll one (if that's what's used?), but is there an easy way to go back to older docs for, say, < 0.8.28, given the changes from transform_one to transform_aspect in ingestion? Or do I just need to point my local repo back at an earlier gh-pages?
  • clean-nightfall-92007

    03/09/2022, 2:44 AM
    I ran into this problem when upgrading to 0.8.28:
    Failed to create channel, remote=localhost/127.0.0.1:8080
    Failed to get response from server for URI http://localhost:8080/entities
    In addition, what is the actions component? Does it have an impact?
  • green-pencil-45127

    03/09/2022, 11:55 AM
    We have just started with DataHub, ingesting our dbt metadata and attaching it to our Snowflake database. We have very rich descriptions in our dbt documentation, but when I search for keywords or exact descriptions, DataHub does not find matching columns. It seems like the UI is really focused on dataset/table exploration and documentation and less on column-level discovery. Is this an issue for anyone else? Is the search engine set for some upgrades soon? Regarding visibility and discoverability, is there any plan to ingest dbt documentation more closely, such that the model-generating SQL could be surfaced? Bonus points if it could be displayed at the column level.
  • stocky-midnight-78204

    03/09/2022, 1:27 PM
    Is there any documentation for integrating with LDAP for authentication?
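    (For reference: datahub-frontend authenticates through JAAS, so LDAP can be wired in with the JDK's LdapLoginModule. A sketch of a jaas.conf, assuming the frontend's WHZ-Authentication realm name and placeholder LDAP settings:)
    WHZ-Authentication {
      com.sun.security.auth.module.LdapLoginModule sufficient
        userProvider="ldap://ldap.example.com:389/ou=people,dc=example,dc=com"
        authIdentity="{USERNAME}"
        useSSL=false;
    };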