# ingestion

    limited-forest-73733

    09/06/2022, 8:10 AM
    Hi everyone! I am getting a compatibility issue while integrating great_expectations with Airflow. Can someone please help me out? @dazzling-judge-80093

    enough-monitor-24292

    09/06/2022, 11:48 AM
    Hi,

    enough-monitor-24292

    09/06/2022, 11:48 AM
    I need help fetching all users from DataHub using the API. Can anyone please help? Thanks!
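For fetching all users, DataHub's GraphQL endpoint (`/api/graphql`) exposes a `listUsers` query. A minimal stdlib sketch of the request, assuming a local GMS at `localhost:8080` and a personal access token (both placeholders to adjust for your deployment):

```python
import json
import urllib.request

GMS = "http://localhost:8080"          # assumption: local GMS endpoint
TOKEN = "<personal-access-token>"       # generate one under Settings > Access Tokens

LIST_USERS_QUERY = """
query listUsers($start: Int!, $count: Int!) {
  listUsers(input: { start: $start, count: $count }) {
    total
    users { urn username }
  }
}
"""

def build_payload(start: int = 0, count: int = 100) -> dict:
    """Build the GraphQL request body for one page of users."""
    return {"query": LIST_USERS_QUERY, "variables": {"start": start, "count": count}}

def fetch_users_page(start: int = 0, count: int = 100) -> dict:
    """POST one page of the listUsers query to the GMS GraphQL endpoint."""
    req = urllib.request.Request(
        f"{GMS}/api/graphql",
        data=json.dumps(build_payload(start, count)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Page through with increasing `start` until you have collected `total` users.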

    microscopic-mechanic-13766

    09/06/2022, 12:05 PM
    Hello again, I have ingested data from PostgreSQL with profiling enabled. I was expecting the "Queries" tab to be enabled as well as "Stats", but the former wasn't. Is anything else needed to enable that tab? Thanks in advance!

    helpful-london-56362

    09/06/2022, 12:18 PM
    Hello, I'm getting these errors when I try to ingest some data. I'm using the UI form. This is on version 0.8.44.

    agreeable-army-26750

    09/06/2022, 12:48 PM
    Hi everyone! I am trying to extend the metadata entity model by forking the repository, following this guide: https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model/ (For testing I wanted to create a GlossaryTerm-like entity, called DatasetMetadata, with the same fields and aspects; only the Key and the Info aspect were recreated as separate files with the same body.) I have done the following steps:
    1. Created the Key aspect pdl based on GlossaryTerm
    2. Created the new Info pdl based on GlossaryTerm
    3. Created the entity in the registry yml
    4. Ran ./gradlew build
    5. Redeployed the gms service docker image
    I tried to call the OpenAPI endpoint to create a DatasetMetadata object with Postman (it works with GlossaryTerm aspects), but it fails with a 400 error code, and the docker image logs a detailed error. How should I resolve this issue? Is there any step I am missing in order to achieve my goal? Thank you very much for helping!

    full-chef-85630

    09/06/2022, 1:14 PM
    Hi, when I ingest BigQuery and add sharded_table_pattern with the default value, an error is reported. Has anyone encountered this? @dazzling-judge-80093
    source:
      type: bigquery
      config:
        project_id: ${SOCIAL_INSIGHTS_BIGQUERY_ID}
        credential:
          project_id: ${SOCIAL_INSIGHTS_BIGQUERY_ID}
          private_key_id: ${SOCIAL_INSIGHTS_PRIVATE_KEY_ID}
          private_key: ${SOCIAL_INSIGHTS_PRIVATE_KEY}
          client_email: ${SOCIAL_INSIGHTS_CLIENT_EMAIL}
          client_id: ${SOCIAL_INSIGHTS_CLIENT_ID}
        sharded_table_pattern: ((.+)[_$])?(\d{4,10})$
        
    sink: 
      type: datahub-rest
      config:
        server: ${DATAHUB_GMS}
        token: ${TOKEN}

    alert-fall-82501

    09/06/2022, 6:17 PM
    Hi team - getting this error message when ingesting atlas and olympus metadata from Redshift to DataHub:

    alert-fall-82501

    09/06/2022, 6:17 PM
    psycopg2.errors.InsufficientPrivilege: permission denied for relation svv_table_info

    alert-fall-82501

    09/06/2022, 6:17 PM
    Can anybody advise on this?

    some-hairdresser-53679

    09/06/2022, 6:50 PM
    Hello, do you have an example of a Python emitter for data lineage?
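The acryl-datahub SDK is the usual route for this: `datahub.emitter.rest_emitter.DatahubRestEmitter` plus the `mce_builder` helpers (see the lineage emitter examples in the metadata-ingestion repo). As a stdlib-only sketch, the payload below builds an `upstreamLineage` proposal of the kind POSTed to GMS; the exact REST envelope shape and the TRANSFORMED lineage type are assumptions to verify against your DataHub version:

```python
import json

def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Mirrors the SDK helper datahub.emitter.mce_builder.make_dataset_urn.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

def make_lineage_proposal(upstream_urns: list, downstream_urn: str) -> dict:
    """Build an upstreamLineage change proposal: downstream <- upstreams."""
    aspect = {
        "upstreams": [
            {"dataset": urn, "type": "TRANSFORMED"} for urn in upstream_urns
        ]
    }
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": downstream_urn,
            "changeType": "UPSERT",
            "aspectName": "upstreamLineage",
            # The aspect body is sent as a JSON string inside the proposal.
            "aspect": {
                "contentType": "application/json",
                "value": json.dumps(aspect),
            },
        }
    }
```

With the SDK installed, the equivalent is roughly `DatahubRestEmitter("http://localhost:8080").emit_mce(builder.make_lineage_mce([upstream_urn], downstream_urn))`.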

    delightful-barista-90363

    09/06/2022, 7:56 PM
    Hello, I am currently implementing the DatahubSparkListener. We have a token set up, but it gets logged in the spark-submit command. Is there any way to hide the token or configure it through environment variables?
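One way to keep the token out of the spark-submit command line is to put the listener's configuration in spark-defaults.conf with restricted file permissions, rather than passing it via --conf. A sketch, assuming the spark.datahub.rest.* keys from the DataHub Spark lineage integration (verify the exact key names against your listener version):

```
# spark-defaults.conf -- chmod 600, owned by the job user
spark.extraListeners        datahub.spark.DatahubSparkListener
spark.datahub.rest.server   http://datahub-gms:8080
spark.datahub.rest.token    <paste-token-here>
```

Note that spark-defaults.conf does not expand environment variables, so the token value has to be written into the file (or templated in at deploy time).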

    faint-translator-23365

    06/07/2022, 7:14 AM
    Hi, I am trying to ingest LDAP as a source and I'm getting this error. Attached are the logs and recipe.yaml.

    miniature-journalist-76345

    09/07/2022, 6:58 AM
    Hi, team. Is there a way to replace the queries section for a dataset when ingesting it? When you ingest a new query, it doesn't replace the existing queries but appends to them.

    bumpy-journalist-41369

    09/07/2022, 7:10 AM
    Hi, team. Is there a way to ingest data from all databases in an S3 data lake with one recipe, instead of making a recipe for each DB?

    bland-orange-13353

    09/07/2022, 7:50 AM
    This message was deleted.

    rich-policeman-92383

    09/07/2022, 8:03 AM
    Hello, datahub version: v0.8.41. While ingesting glossary terms (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml) using the command
    # datahub ingest -c business_glossary.yml
    we are getting the below error:
    source
      value is not a valid dict (type=type_error.dict)
    nodes
      extra fields not permitted (type=value_error.extra)
    owners
      extra fields not permitted (type=value_error.extra)
    url
      extra fields not permitted (type=value_error.extra)
    version
      extra fields not permitted (type=value_error.extra)
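The "extra fields not permitted" errors for nodes/owners/url/version suggest the glossary file itself was passed to `datahub ingest` as the recipe. The recipe should instead reference the glossary through the datahub-business-glossary source; a minimal sketch, with the file path and sink server as placeholders:

```yaml
# glossary_recipe.yml -- pass this file (not the glossary itself) to `datahub ingest -c`
source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary.yml

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```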

    melodic-beach-18239

    09/07/2022, 9:30 AM
    Hi, another question about MySQL ingestion. When I set include_views to true, why does the ingestion pick up information_schema as my database's views?

    rhythmic-nest-54679

    09/07/2022, 9:37 AM
    Why does UI-based ingestion need to install Python dependencies when executing? Is there any way to install those packages manually? I just tested the latest version with the quickstart compose file (the one without neo4j).

    rich-policeman-92383

    09/07/2022, 11:11 AM
    Hello, does DataHub support ES 8 or above? While trying to ingest glossary terms with ES 8.2.3 as the graph backend, we are getting the below error on MAE: "unable to feed bulk request. no retries left. unable to parse response body for response." datahub version: v0.8.41

    plain-farmer-27314

    09/07/2022, 2:13 PM
    Hey all - recently updated to a newer version (0.8.41) and am seeing the following error when trying to ingest LookML now (this wasn't happening on a previous version):
    ValueError: '/Users/zachary.bluhm/Dev/datahub/discord-looker-data-reporting/views/my_view.view.lkml' does not start with 'discord-looker-data-reporting'
    Any pro tips to get my config updated so that we can carry on processing LookML? EDIT: Looks like our Looker ingestion is broken as well. We were previously on version 0.8.39.

    modern-monitor-68945

    09/07/2022, 4:49 PM
    Hi! Does anybody know if there is a way to set datahub.cluster in Bitnami Airflow? It is configured via env vars, and it seems to ignore the datahub options.

    rich-policeman-92383

    09/08/2022, 8:25 AM
    Hello, how can we add glossary term groups using business_glossary.yml?

    prehistoric-dream-67257

    09/08/2022, 8:51 AM
    Does DataHub support ingesting Kafka topic tags, like Lenses describes here (we don't use Lenses)? https://lenses.io/blog/2021/04/apache-kafka-metadata-management/

    many-hairdresser-79517

    09/08/2022, 10:09 AM
    Hello team, I get this error when ingesting metadata from ClickHouse:
    'default.exchange_rate2eur': ["Ingestion error: Orig exception: Code: 47, e.displayText() = DB::Exception: Missing columns: 'comment' while processing query: 'SELECT database, name AS table_name, comment, formatRow('JSONEachRow', engine, partition_key, sorting_key, primary_key, sampling_key, storage_policy, metadata_modification_time, total_rows, total_bytes, data_paths, metadata_path) AS properties FROM system.tables WHERE name NOT LIKE '.inner%'', required columns: 'comment' 'primary_key' 'engine' 'data_paths' 'name' 'metadata_modification_time' 'metadata_path' 'partition_key' 'sampling_key' 'storage_policy' 'total_bytes' 'sorting_key' 'database' 'total_rows', maybe you meant: ['primary_key','engine','data_paths','name','metadata_modification_time','metadata_path','partition_key','sampling_key','storage_policy','total_bytes','sorting_key','database','total_rows'] (version 21.3.2.5 (official build))\n"]
    My yml file:
    source:
      type: clickhouse
      config:
        host_port: "xxxxxxxxxxxxxxxxx"
        username: xxxxxxxxxxxxxxxxxxx
        password: xxxxxxxxxxxxxxxx
    Hope you can take a look, thank you so much!

    chilly-potato-57465

    09/08/2022, 11:07 AM
    Hello! I am trying to ingest a .csv from a bucket on a local S3 with DataHub deployed on K8s. I am running the ingestion from the CLI (v0.8.43.4) and can ping the S3 server from the machine running the CLI. I have a very simple recipe:
    source:
      type: s3
      config:
        path_specs:
          - include: "{s3-server}/{bucket-name}/*.csv"
        env: "PROD"
        profiling:
          enabled: false
    # sink configs
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    The pipeline finishes successfully but generates no events. The source and sink reports from the ingest are empty except for the timestamps. Even though profiling is disabled in my recipe, I only see this error when I run with --debug:
    ERROR {logger:26} - Please set env variable SPARK_VERSION
    Any advice on how to resolve this would be greatly appreciated! Thank you in advance!
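A common cause of "pipeline finishes but no events" with the s3 source is a path_spec that never matches: path_specs expect s3:// bucket URIs, while a custom/local S3 endpoint goes into aws_config instead of the path. A sketch under those assumptions (bucket, endpoint, and credentials are placeholders; verify aws_endpoint_url against your CLI version):

```yaml
source:
  type: s3
  config:
    path_specs:
      - include: "s3://my-bucket/*.csv"   # bucket path, not the server hostname
    aws_config:
      aws_endpoint_url: "http://my-s3-host:9000"   # local / non-AWS S3 endpoint
      aws_access_key_id: "..."
      aws_secret_access_key: "..."
      aws_region: "us-east-1"
    env: "PROD"
    profiling:
      enabled: false

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```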

    lemon-engine-23512

    09/08/2022, 11:26 AM
    Hello, does anyone have a reference for a dashboard push model?

    chilly-potato-57465

    09/08/2022, 11:34 AM
    Hello! I am ingesting from a MySQL source where I have a table used in several views. After the ingestion, the lineage tab for the table and views is empty, while I would expect it to show them. How can I ingest/show the lineage?

    bland-balloon-48379

    09/08/2022, 12:27 PM
    Hi everyone! I have a few ingestion runs I'd like to roll back and had a couple of quick questions. I read through the documentation on rolling back ingestion batch runs and it's pretty easy; I was able to roll back two ingestions last week using those commands. However, it seems to only do a soft delete, since the rows are still present in MySQL and the --hard option does not appear to be compatible with the rollback command. Is there a way to do a hard delete when rolling back an ingestion run? If not, is this something we could expect in a future release? Thanks!

    chilly-potato-57465

    09/08/2022, 12:29 PM
    Hello again! Another question when ingesting from MySQL. I am interested in dataset schema evolution and read about the Timeline API. I modified a table's schema and ingested it again; I can see the modified schema in the UI, but not the history of changes as shown here: https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,testTimelineDataset,PROD)/Schema?is_lineage_mode=false I can see the changes for the table via the OpenAPI, so how do I show them in the UI? Am I missing some plugin?