# ingestion
  • a

    acoustic-dusk-3739

    07/21/2022, 6:58 PM
    Hi community. Is there a DataHub CLI command I can use to delete all datasets that have been marked as Deprecated?
    c
    • 2
    • 1
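    (There isn't a single built-in command for this as far as I know. A hedged sketch of one approach, assuming you can export the list of deprecated dataset URNs separately, e.g. from search; the file name is hypothetical and the loop just drives the documented datahub delete command per entity.)
    Copy code
    # Sketch: delete every dataset URN listed in deprecated_urns.txt (hypothetical input file).
    # `datahub delete --urn ... --hard` may ask for interactive confirmation.
    import subprocess

    with open("deprecated_urns.txt") as f:
        for urn in (line.strip() for line in f):
            if urn:
                subprocess.run(["datahub", "delete", "--urn", urn, "--hard"], check=True)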
  • m

    mysterious-eye-58423

    07/22/2022, 5:08 AM
    Hi folks, have we considered log compaction of MetadataChangeLog events, and replaying the log-compacted MetadataChangeLog events, as a mechanism to rebuild/recover search indexes, particularly for time-series metadata that we don't persist in the DataHub MySQL table?
    m
    • 2
    • 3
  • l

    late-bear-87552

    07/22/2022, 9:08 AM
    Hi team, can anyone help me with how to use the datahub delete --env PROD --entity_type dataset --platform bigquery command against a specific hostname? By default it tries to connect to localhost.
    c
    • 2
    • 3
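    (A hedged note: the CLI reads the GMS endpoint from its local config, so pointing it at a remote host can be done with datahub init, or, I believe, via an environment variable before running the delete; treat the exact variable name and URL format as assumptions and check the CLI docs.)
    Copy code
    # One-time interactive configuration of the GMS endpoint (writes ~/.datahubenv)
    datahub init
    # Or, assumed environment variable for a single invocation:
    export DATAHUB_GMS_HOST="http://your-gms-host:8080"   # placeholder host
    datahub delete --env PROD --entity_type dataset --platform bigquery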
  • l

    lemon-zoo-63387

    07/22/2022, 9:43 AM
    Hello everyone, I used a JSON file to import metadata. The first time it succeeded, and then I deleted it. When I executed it again, the client reported success, but there was no data on the UI page; I had only modified the source. Why is that?
    python3 -m datahub delete --env QA --entity_type dataset --platform hive
    c
    b
    • 3
    • 22
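    (One hedged guess at the cause, unconfirmed: if the earlier delete was a soft delete, the entities may still be marked as removed and stay hidden in the UI even after re-ingestion. A hard delete of the old rows before re-running the ingestion is one thing to try; an interactive confirmation prompt may appear.)
    Copy code
    python3 -m datahub delete --env QA --entity_type dataset --platform hive --hard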
  • g

    glamorous-library-1322

    07/22/2022, 11:31 AM
    Hey all, I'm trying to run a custom transformer. It works well via the CLI, but I cannot make it work via the ingestion GUI; I get a ModuleNotFoundError. I would like to know how I can install packages in the environment this ingestion runs in.
    c
    m
    • 3
    • 5
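    (For context, the UI-scheduled ingestion runs inside the actions container rather than on your workstation, so the transformer package has to be installed there. A hedged sketch for a quick test; the container name datahub-actions and the package name are assumptions, and baking the package into a custom actions image is the more durable fix.)
    Copy code
    docker exec -it datahub-actions pip install my-custom-transformer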
  • l

    late-bear-87552

    07/22/2022, 1:19 PM
    Hi team, does anyone have an idea which is cheaper to use for BigQuery metadata ingestion: the cloudaudit_googleapis_com_data_access table or the Cloud Logging APIs? Thank you
  • g

    gentle-camera-33498

    07/22/2022, 1:37 PM
    Hello everyone, I'm using DataHub on K8s with 3 replicas of GMS. When I start the ingestion process with Airflow, the frontend appears to be stuck (from a user-experience standpoint). Has this happened to anyone? Are there benefits to using Kafka ingestion compared to REST ingestion?
    l
    • 2
    • 2
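    (On the Kafka vs REST point: the REST sink writes synchronously to GMS, while the Kafka sink just publishes events for asynchronous consumption, which can smooth out load spikes on GMS during large ingestions. A hedged sketch of the sink block of a recipe; the broker and schema-registry addresses are placeholders.)
    Copy code
    sink:
      type: datahub-kafka
      config:
        connection:
          bootstrap: broker:9092                              # placeholder Kafka bootstrap server
          schema_registry_url: http://schema-registry:8081    # placeholder schema registry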
  • b

    brave-tomato-16287

    07/22/2022, 3:03 PM
    Hello all, is it possible to get and show the custom_sql_query from Tableau datasets?
    h
    • 2
    • 3
  • f

    full-chef-85630

    07/23/2022, 8:07 AM
    Hi all, I'm trying to configure lineage. Can anyone answer this question?
    [root@VM-4-16-centos airflow]# airflow connections add  --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host '<http://localhost:8080>'
    [2022-07-23 161208,690] {cli_action_loggers.py:105} WARNING - Failed to log action with (sqlite3.OperationalError) no such table: log [SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?, ?, ?, ?, ?, ?, ?)] [parameters: ('2022-07-23 081208.687171', None, None, 'cli_connections_add', None, 'root', '{"host_name": "VM-4-16-centos", "full_command": "[\'/usr/local/bin/airflow\', \'connections\', \'add\', \'--conn-type\', \'datahub_rest\', \'datahub_rest_default\', \'--conn-host\', \'http://localhost:8080\']"}')] (Background on this error at: http://sqlalche.me/e/13/e3q8) Traceback (most recent call last): File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context cursor, statement, parameters, context File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute cursor.execute(statement, parameters) sqlite3.OperationalError: no such table: connection The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/bin/airflow", line 8, in <module> sys.exit(main()) File "/usr/local/lib/python3.6/site-packages/airflow/__main__.py", line 48, in main args.func(args) File "/usr/local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command return func(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/utils/cli.py", line 92, in wrapper return f(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/cli/commands/connection_command.py", line 196, in connections_add if not session.query(Connection).filter(Connection.conn_id == new_conn.conn_id).first(): File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3429, in first ret = list(self[0:1]) File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3203, in getitem return list(res) File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3535, in iter return self._execute_and_instances(context) File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances result = conn.execute(querycontext.statement, self._params) File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1011, in execute return meth(self, multiparams, params) File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement distilled_params, File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context e, statement, parameters, cursor, context File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception sqlalchemy_exception, with_traceback=exc_info[2], from_=e File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_ raise exception File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context cursor, statement, parameters, context File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection [SQL: SELECT connection.password AS connection_password, 
connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.description AS connection_description, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted FROM connection WHERE connection.conn_id = ? LIMIT ? OFFSET ?] [parameters: ('datahub_rest_default', 1, 0)] (Background on this error at: http://sqlalche.me/e/13/e3q8)
    d
    • 2
    • 2
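    (The "no such table: log" / "no such table: connection" errors suggest the Airflow metadata database was never initialized, so the CLI has nothing to insert into. A hedged first step, standard Airflow rather than anything DataHub-specific:)
    Copy code
    # Initialize the Airflow metadata database, then retry adding the connection.
    airflow db init
    airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'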
  • l

    lemon-zoo-63387

    07/25/2022, 12:22 AM
    Hi DataHub team, I forgot to follow up here. I'm getting the error message KeyError: 'domain'. https://datahubproject.io/docs/generated/ingestion/sources/csv https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/demo_data/csv_enricher_demo_data.csv
  • l

    lemon-zoo-63387

    07/25/2022, 12:51 AM
    Hi everyone, I use the CSV source to link entities with a glossary and a domain. I built the glossary and domain first, but the CSV ingestion created duplicates of the same ones, which cannot be changed... Why doesn't it use the ones I created? Thanks in advance for your help. https://datahubproject.io/docs/generated/ingestion/sources/csv
    c
    • 2
    • 2
  • b

    bitter-tent-3827

    07/25/2022, 5:58 AM
    Hi everyone, I am using Delta Lake with S3. There is only one way to verify AWS credentials, so if we are using role-based credentials, it creates an issue. I got my ingestion to succeed by commenting out the storage_options.
    c
    • 2
    • 1
  • s

    square-hair-99480

    07/25/2022, 6:51 AM
    Hello friends, while playing with a Snowflake ingestion and the profiling configuration (it was making the ingestion super slow), I hit the usual question that comes up with automated profiling: can we configure things so that columns containing IDs (nominal variables encoded as numbers) or categorical variables encoded as numbers are excluded, so we do not calculate profiling statistics on them? Moreover, are there plans to support the profiling.profile_table_size_limit and profiling.profile_table_row_limit configs for Snowflake as well? I see they currently work only for BigQuery.
    c
    • 2
    • 3
  • c

    cool-vr-73109

    07/25/2022, 6:53 AM
    Hi team, is it possible to delete metadata from DataHub? I tried the datahub hard delete command with a URN and got a Java unsupported-operation error saying only the upsert operation is supported. How can I delete with this command? Any other suggestions, please...
    m
    e
    • 3
    • 3
  • s

    square-hair-99480

    07/25/2022, 9:49 AM
    Hello again friends, another question, this time about Snowflake lineage. I have seen that it is based on a view named access_history, but when I check this view in my Snowflake Enterprise account it is empty, and hence no lineage appears in DataHub. Has anyone faced something similar?
    c
    h
    • 3
    • 5
  • g

    gentle-camera-33498

    07/25/2022, 2:33 PM
    Hello everyone, I'm using Airflow to ingest metadata into DataHub, but Airflow 2.3.3 uses SQLAlchemy >1.4, which is not compatible with the sql_common ingestion dependencies. Are there any tasks to resolve this problem? sql_common dependencies: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/setup.py#L92 This dependency problem breaks my BigQuery ingestion process 😥
    s
    • 2
    • 1
  • c

    colossal-sandwich-50049

    07/25/2022, 3:50 PM
    Hello, is this Python graph client available in the Java API? I'm trying to reproduce the following example in Java, but I am having trouble implementing it. EDIT: never mind, I found that you don't need that graph library for Java 🙂 https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py
  • n

    nice-country-99675

    07/25/2022, 5:47 PM
    👋 Hi team! Just two quick questions regarding Redshift ingestion... it stopped ingesting view comments as Documentation... as far as I remember, they were ingested automatically in the past... is there something new I need to do in the recipe? And also, is there a way to hide view definitions? Thanks!
    • 1
    • 1
  • d

    dazzling-insurance-83303

    07/26/2022, 3:49 AM
    Postgres ingestion - allow_deny_pattern. Hello! Starting this thread to discuss Postgres ingestion profiling. Under profiling, what does allow_deny_pattern signify?
    Copy code
    profiling:
          enabled: true 
          allow_deny_patterns:
            allow: 
              - .*
            deny:
              - 
            ignoreCase: True
            alphabet: '[A-Za-z0-9 .-]'
    Is that filtering for data within the columns? If so, are there any examples to refer to? I am interested in knowing if those can be regexes to do Luhn algorithm checks.
    b
    m
    +3
    • 6
    • 22
  • a

    able-evening-90828

    07/26/2022, 5:08 AM
    Hi DataHub ingestion experts, I followed the instructions to configure and build the ingestion CLI. Specifically I ran the following:
    Copy code
    cd metadata-ingestion
    ../gradlew :metadata-ingestion:installDev
    source venv/bin/activate
    Then I tried to ingest something from mysql using the command below
    Copy code
    python3 -m datahub ingest -c ../test.mysql.localhost.dhub.yml
    And I got the following mysterious error.
    Copy code
    Failed to create source due to mysql is disabled due to an error in initialization
    Some small instrumentation of code revealed the exception to be
    Copy code
    dlopen(/Users/jinlin/Code/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/greenlet/_greenlet.cpython-39-darwin.so, 0x0002): tried: '/Users/jinlin/Code/datahub/metadata-ingestion/venv/lib/python3.9/site-packages/greenlet/_greenlet.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
    I am on a Mac with an M1 chip, and this looks like a mismatch between an arm64 binary and an x86_64 binary. What should I do to make this work?
    c
    m
    • 3
    • 13
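    (For what it's worth, the usual fix for that kind of wheel/architecture mismatch is to force-reinstall the offending package from source under the native arm64 interpreter inside the venv; standard pip commands, not DataHub-specific. If the venv itself was created with an x86_64 Python running under Rosetta, recreating it with a native arm64 Python may be needed instead.)
    Copy code
    # Inside the activated venv: rebuild greenlet for the interpreter's architecture.
    pip uninstall -y greenlet
    pip install --no-cache-dir --no-binary :all: greenlet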
  • c

    cool-vr-73109

    07/26/2022, 9:26 AM
    Hi team, I tried S3 file-based lineage ingestion from the CLI, and I could see that a newly ingested source had its lineage tab enabled, but the lineage tab was not enabled for my existing ingestion. Which part of the S3 ingestion name should I give as the entity name in the file lineage ingestion? Just the dataset name, or bucketname/folder/S3_filename?
    c
    m
    • 3
    • 3
  • l

    lemon-terabyte-66903

    07/26/2022, 3:04 PM
    Hello, https://github.com/datahub-project/datahub/blob/efc5602493e66c83fa0ffe8cf9f9998fe9[…]bd/metadata-ingestion/src/datahub/ingestion/source/s3/source.py The SAMPLE_SIZE value here limits the number of directories/files that boto3 can browse in S3, and after that limit, if there is no match, the ingestion doesn't happen. What if the matching directory is beyond this limit?
  • a

    adamant-mouse-7290

    07/26/2022, 4:10 PM
    Hi, I have a few specific tables (10) that I would like to exclusively ingest. Can you please help me set up the recipe to ingest them all at once?
    Copy code
    type: athena
        config:
            aws_region: xxx
            work_group: xxx
            username: '${xxx}'
            password: '${xxx}'
            s3_staging_dir: 'xxx'
            include_views: true
            include_tables: true
            database: table1, table2 ... table10
    sink:
        type: datahub-rest
        config:
            server: 'xxx'
            token: xxx
    m
    • 2
    • 2
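    (A hedged sketch of how I'd express "only these tables", assuming the Athena source supports the table_pattern allow/deny config that the SQL-based sources share; database should be the database name rather than a table list, and the exact pattern format, e.g. database.table, is an assumption worth checking against the source docs.)
    Copy code
    source:
      type: athena
      config:
        aws_region: xxx
        work_group: xxx
        username: '${xxx}'
        password: '${xxx}'
        s3_staging_dir: 'xxx'
        database: my_database            # the Athena database name, not a list of tables
        include_views: true
        include_tables: true
        table_pattern:
          allow:                         # one anchored regex per table to keep
            - 'my_database\.table1$'
            - 'my_database\.table2$'
            # ... up to table10
    sink:
      type: datahub-rest
      config:
        server: 'xxx'
        token: xxx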
  • b

    busy-analyst-8258

    07/26/2022, 7:24 PM
    Hello everyone, I am working on ingesting stats data from the Greenplum database. Is there any sample code available for this process? I am looking to use the usageStats aspect value. Thank you, Geetha
    b
    c
    • 3
    • 6
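    (I don't know of Greenplum-specific sample code, but a hedged sketch of emitting a datasetUsageStatistics aspect with the Python emitter might look like the following; the URN, counts, and GMS address are placeholders, and the class/field names are my recollection of the SDK, so double-check them against the metadata-ingestion package.)
    Copy code
    import time

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        CalendarIntervalClass,
        ChangeTypeClass,
        DatasetUsageStatisticsClass,
        TimeWindowSizeClass,
    )

    # Placeholder dataset URN for a Greenplum table.
    dataset_urn = make_dataset_urn(platform="greenplum", name="public.my_table", env="PROD")

    # Daily usage bucket with placeholder numbers.
    usage = DatasetUsageStatisticsClass(
        timestampMillis=int(time.time() * 1000),
        eventGranularity=TimeWindowSizeClass(unit=CalendarIntervalClass.DAY, multiple=1),
        uniqueUserCount=5,
        totalSqlQueries=42,
    )

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="datasetUsageStatistics",
        aspect=usage,
    )

    DatahubRestEmitter("http://localhost:8080").emit(mcp)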
  • k

    kind-whale-32412

    07/26/2022, 11:49 PM
    How can I debug my lineage emit request? I'm getting results like this on ingestion:
    Copy code
    ] ERROR    {datahub.ingestion.run.pipeline:273} - Failed to extract some records due to: source produced an invalid metadata work unit: MetadataChangeEventClass...
    c
    • 2
    • 2
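    (One hedged suggestion: rerun the ingestion with the CLI's debug logging turned on so the offending work unit is printed in full; I believe --debug is a top-level CLI flag, and the recipe file name below is a placeholder.)
    Copy code
    datahub --debug ingest -c my_recipe.yml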
  • c

    cuddly-arm-8412

    07/27/2022, 3:30 AM
    Hi team, I can set lineage by URN in the ingestion module. If that lineage relationship is later deleted (no longer exists), how can I remove the relationship in DataHub?
    c
    • 2
    • 1
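    (A hedged sketch of one way to drop an upstream relationship: emit an upstreamLineage aspect for the downstream dataset that no longer contains the removed upstream; emitting it with an empty upstream list overwrites the previously emitted lineage edges. URNs and the GMS address are placeholders.)
    Copy code
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, UpstreamLineageClass

    # Placeholder downstream dataset whose lineage should be cleared.
    downstream = make_dataset_urn(platform="hive", name="db.downstream_table", env="PROD")

    # An empty upstream list replaces (i.e. removes) the existing lineage for this dataset.
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=downstream,
        aspectName="upstreamLineage",
        aspect=UpstreamLineageClass(upstreams=[]),
    )

    DatahubRestEmitter("http://localhost:8080").emit(mcp)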
  • w

    wooden-chef-22394

    07/27/2022, 6:31 AM
    Although I executed pip install 'acryl-datahub[clickhouse]', datahub ingest still reports:
    Failed to create source due to clickhouse is disabled; try running: pip install 'acryl-datahub[clickhouse]'
    c
    b
    • 3
    • 5
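    (A hedged thought: this often happens when the pip that installed the plugin belongs to a different Python environment than the one running datahub. Installing via the same interpreter and then listing the detected plugins usually clears it up.)
    Copy code
    # Install into the same interpreter that runs datahub, then confirm the plugin is picked up.
    python3 -m pip install 'acryl-datahub[clickhouse]'
    python3 -m datahub check plugins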
  • g

    gifted-knife-16120

    07/27/2022, 7:33 AM
    I already removed the datasets related to this container manually.
  • g

    gifted-knife-16120

    07/27/2022, 7:32 AM
    Hi, I tried to remove by container, but I get this error. My command is:
    datahub delete --urn "urn:li:container:(13c86013c4ae5a2027b9e2f2b9443a91)" --soft
    c
    s
    • 3
    • 15
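    (A hedged observation: container URNs normally don't have parentheses around the GUID, so the quoted URN may simply be malformed; the form I'd expect is shown below, with the GUID copied from the command above.)
    Copy code
    datahub delete --urn "urn:li:container:13c86013c4ae5a2027b9e2f2b9443a91" --soft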
  • f

    faint-advantage-18690

    07/27/2022, 7:49 AM
    Hi all, it looks like the number of tables in DataHub is increasing after each ingestion run, even though I did not create any new tables. Is that normal? I am currently ingesting metadata from BigQuery tables, as well as BigQuery usage.
    • 1
    • 1