# ingestion
  • billowy-rocket-47022

    03/10/2022, 5:37 PM
    spark = SparkSession.builder.master("local").appName("flight") \
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.23") \
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener") \
        .config("spark.datahub.rest.server", "http://localhost:8080") \
        .enableHiveSupport() \
        .getOrCreate()
  • lemon-terabyte-66903

    03/10/2022, 5:55 PM
    Hi, can anyone please look into this? https://datahubspace.slack.com/archives/C033H1QJ28Y/p1646928683937019
  • mysterious-australia-30101

    03/11/2022, 8:25 AM
    How do I upgrade from version 0.8.26.3 to v0.8.27?
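    For the Docker quickstart deployment, the commonly suggested path is to upgrade the CLI and re-run quickstart so it pulls newer images; this is a sketch under that assumption, not an upgrade guide for Helm/Kubernetes deployments (those upgrade by bumping the chart's image tags):
    pip install --upgrade acryl-datahub==0.8.27
    datahub docker quickstart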
  • prehistoric-optician-40107

    03/16/2022, 1:34 PM
    I tried
    datahub-gms:8080
    and got the error again:
    OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused\n'
               '\tIs the server running on that host and accepting TCP/IP connections?\n'
               'connection to server at "localhost" (::1), port 5432 failed: Cannot assign requested address\n'
               '\tIs the server running on that host and accepting TCP/IP connections?\n'
               '\n'
               '(Background on this error at: <http://sqlalche.me/e/13/e3q8>)\n',
               "2022-03-16 13:05:46.799054 [exec_id=980fe2a1-816d-4970-95ec-09727322c3ea] INFO: Failed to execute 'datahub ingest'",
               '2022-03-16 13:05:46.799641 [exec_id=980fe2a1-816d-4970-95ec-09727322c3ea] INFO: Caught exception EXECUTING '
               'task_id=980fe2a1-816d-4970-95ec-09727322c3ea, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
               '    self.event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
               '    return f.result()\n'
               '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
               '    raise self._exception\n'
               '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
               '    result = coro.send(None)\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
    I can connect to PostgreSQL normally, so I couldn't figure out why it gives this error.
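    This error usually means the ingestion executor is running in its own container, so "localhost" resolves to the executor container rather than the machine where PostgreSQL is listening. A minimal recipe sketch, assuming a postgres source and a hostname reachable from the executor's network (host, database, and credentials below are placeholders):
    source:
        type: postgres
        config:
            host_port: my-postgres-host:5432    # not "localhost" when the executor runs in Docker
            database: mydb
            username: datahub_user
            password: example-password
    sink:
        type: datahub-rest
        config:
            server: http://datahub-gms:8080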
  • billowy-book-26360

    03/17/2022, 1:23 AM
    Hey all, has anyone successfully ingested Cloudera Hive? I've tried various YAML configs without success. Any sample YAML would be appreciated!
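    For reference, a minimal Hive recipe sketch; host, port, and credentials are placeholders, and Kerberos-secured Cloudera clusters need additional connection options beyond what is shown here:
    source:
        type: hive
        config:
            host_port: cloudera-hive-host:10000
            database: default
            username: hive_user
            password: example-password
    sink:
        type: datahub-rest
        config:
            server: http://localhost:8080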
  • red-napkin-59945

    03/24/2022, 11:59 PM
    Hey team, when ingesting a TimeseriesAspect, should I always populate eventGranularity?
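    For context, eventGranularity is an optional field on timeseries aspects. A sketch of setting it while emitting a DatasetProfileClass over REST; the URN and server address are placeholders, and whether your particular aspect needs it is worth checking against the TimeseriesAspect docs:
    import time
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        CalendarIntervalClass,
        ChangeTypeClass,
        DatasetProfileClass,
        TimeWindowSizeClass,
    )

    profile = DatasetProfileClass(
        timestampMillis=int(time.time() * 1000),
        rowCount=1000,
        columnCount=5,
        # Optional: declares that each data point covers a one-day window
        eventGranularity=TimeWindowSizeClass(unit=CalendarIntervalClass.DAY, multiple=1),
    )
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
        aspectName="datasetProfile",
        aspect=profile,
    )
    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)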
  • numerous-cricket-19689

    03/29/2022, 12:02 AM
    I am wondering what the best way is to dynamically create a class with a default value. E.g. `pydantic_resolve_key` (https://github.com/datahub-project/datahub/blob/55357783f330950408e4624b3f1421594c[…]metadata-ingestion/src/datahub/configuration/import_resolver.py) allows us to create a dynamic class based on a property value; I am trying to figure out how to use a default value with it.
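    One generic pydantic (v1) pattern for this, sketched independently of DataHub's import_resolver: a validator with pre=True and always=True runs even when the field is omitted, so it can inject a default before any resolution happens. The field name and default value below are purely illustrative:
    from typing import Optional

    import pydantic

    class MyConfig(pydantic.BaseModel):
        # Dotted path that a later validator could resolve to a class
        handler_class: Optional[str] = None

        @pydantic.validator("handler_class", pre=True, always=True)
        def _apply_default(cls, v):
            # Runs even when the field is missing, so the default is in place
            # before any class-resolution validator sees the value
            return v or "datahub.example.DefaultHandler"

    print(MyConfig().handler_class)  # -> datahub.example.DefaultHandler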
  • bitter-toddler-42943

    03/29/2022, 2:15 AM
    Hello, team. I am having trouble with ingestion from MSSQL. After I execute, I get the errors below.
  • bitter-toddler-42943

    03/29/2022, 2:17 AM
    I already installed
    pip install 'acryl-datahub[datahub-rest]'
    Is there anything else that I have to check?
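    Worth checking: the datahub-rest extra only installs the REST sink. The MSSQL source ships as its own extra, so it likely also needs:
    pip install 'acryl-datahub[mssql]'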
  • bitter-toddler-42943

    03/29/2022, 2:19 AM
    One more thing: is Elasticsearch essential to DataHub? (Surely yes.) Does anybody know how I can set up a security option so DataHub supplies a password for Elasticsearch? After I changed some Elasticsearch options, I can no longer connect to DataHub.
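    Yes, Elasticsearch is required; it backs search and, by default, the graph index. For a secured Elasticsearch, datahub-gms reads credentials from environment variables. A docker-compose sketch, assuming the variable names from the GMS configuration docs (values are placeholders):
    datahub-gms:
        environment:
            - ELASTICSEARCH_HOST=elasticsearch
            - ELASTICSEARCH_PORT=9200
            - ELASTICSEARCH_USERNAME=elastic
            - ELASTICSEARCH_PASSWORD=example-password
            - ELASTICSEARCH_USE_SSL=true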
  • fresh-memory-10355

    03/29/2022, 4:15 PM
    Thanks
  • average-france-59117

    03/30/2022, 8:37 AM
    I encountered ParserError while ingesting BigQuery profiling. This error occurs only when profiling a monthly partitioned table. The error message follows:
    ...
    File "/Users/seb.kim/github/datahub/dh/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 181, in run
        for wu in itertools.islice(
    File "/Users/seb.kim/github/datahub/dh/lib/python3.8/site-packages/datahub/ingestion/source/sql/bigquery.py", line 655, in get_workunits
        for wu in super().get_workunits():
    File "/Users/seb.kim/github/datahub/dh/lib/python3.8/site-packages/datahub/ingestion/source/sql/sql_common.py", line 656, in get_workunits
        profile_requests += list(
    File "/Users/seb.kim/github/datahub/dh/lib/python3.8/site-packages/datahub/ingestion/source/sql/sql_common.py", line 1157, in loop_profiler_requests
        (partition, custom_sql) = self.generate_partition_profiler_query(
    File "/Users/seb.kim/github/datahub/dh/lib/python3.8/site-packages/datahub/ingestion/source/sql/bigquery.py", line 585, in generate_partition_profiler_query
        partition_datetime = parser.parse(partition.partition_id)
    File "/Users/seb.kim/github/datahub/dh/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 1368, in parse
        return DEFAULTPARSER.parse(timestr, **kwargs)
    File "/Users/seb.kim/github/datahub/dh/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 651, in parse
        six.raise_from(ParserError(str(e) + ": %s", timestr), e)
    File "<string>", line 3, in raise_from
    
    ParserError: month must be in 1..12: 202203
    When I ingest a daily partitioned table like the first, there's no issue at all. However, for the monthly partitioned table like the second, it emits ParserError. FYI, for the first figure the timestamp column's TYPE is TIMESTAMP, and for the second figure record_date's TYPE is DATE in BigQuery. Any advice?
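    The traceback shows dateutil guessing the format of the partition id: an eight-digit daily id like 20220301 parses cleanly, but the six-digit monthly id 202203 gets misread, producing the month-out-of-range error. A quick illustration, including the explicit-format parse that would succeed:
    from datetime import datetime
    from dateutil import parser

    parser.parse("20220301")             # OK: read as YYYYMMDD -> 2022-03-01
    datetime.strptime("202203", "%Y%m")  # OK: explicit format -> 2022-03-01
    parser.parse("202203")               # ParserError: month must be in 1..12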
  • quick-engine-33953

    03/31/2022, 1:16 PM
    In addition, the configuration is straightforward:
    source:
        type: bigquery
        config:
            project_id: awesome-project
            env: prod
            credential:
                project_id: awesome-project
                private_key_id: REDACTED
                private_key: "REDACTED"
                client_email: REDACTED
                client_id: 'REDACTED'
            include_tables: true
            include_views: true
            include_table_lineage: true
            start_time: '2022-01-01T00:00:00.000Z'
            end_time: '2022-12-31T00:00:00.000Z'
            use_exported_bigquery_audit_metadata: true
            profile_pattern:
                allow:
                    - schema.table.column
                deny:
                    - '*.*.*'
    sink:
        type: datahub-rest
        config:
            server: 'http://REDACTED:8080'
    All the permissions in Google Cloud's roles are configured as well.
  • red-smartphone-15526

    03/31/2022, 1:20 PM
    Hi! Using acryl-datahub (0.8.17), I'm getting an issue when trying to ingest Redash metadata into DataHub: I just get 500 responses. It seems the generated URL is incorrect, as it tries to connect to http://redash-url.com/dashboard/dashboard_name, but in reality there is a sequential number prefix on the dashboard name, so the URL actually looks like http://redash-url.com/dashboard/123-dashboard_name
    RetryError: HTTPSConnectionPool(host='redashserver.org', port=443): Max retries exceeded with url: /api/dashboards/overview (Caused by ResponseError('too many 500 error responses'))
    Has anyone faced similar issues?
  • nutritious-bird-77396

    03/31/2022, 9:08 PM
    @dazzling-judge-80093 Great demo today on Airflow! Do you have feature parity with MWAA? From the testing that I did, MWAA is able to send lineage of the datasets using the inlets and outlets specified in the DAG with the DatahubEmitterOperator, but it doesn't work with the BashOperator. As a result, only the dataset lineage is sent and not the pipeline/task info... Do you have any insights on this?
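    For reference, the lineage backend reads inlets/outlets declared on any operator, so in principle a BashOperator can carry them too; whether MWAA's restricted plugin support actually loads the backend is the open question. A sketch, assuming the datahub_provider package from acryl-datahub[airflow] (platform and dataset names are placeholders):
    from airflow.operators.bash import BashOperator
    from datahub_provider.entities import Dataset

    # Inside a DAG definition:
    transform = BashOperator(
        task_id="transform",
        bash_command="echo run-my-job",
        inlets=[Dataset("snowflake", "mydb.schema.source_table")],
        outlets=[Dataset("snowflake", "mydb.schema.target_table")],
    )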
  • fresh-memory-10355

    04/01/2022, 6:47 AM
    Azure Kubernetes
  • incalculable-apartment-22203

    04/01/2022, 12:49 PM
    @mammoth-bear-12532 I ran into this issue on CentOS. How do I solve it?
  • incalculable-apartment-22203

    04/01/2022, 1:11 PM
    Please ignore my question; the problem has been solved: https://datahubspace.slack.com/archives/CV2KB471C/p1648814245336819
  • better-orange-49102

    04/05/2022, 9:42 AM
    Any confirmation from the team?
  • stocky-midnight-78204

    04/05/2022, 11:23 AM
    Is there any way to get the data lineage or source tables from a Presto view?
  • brave-forest-5974

    04/05/2022, 1:26 PM
    Is there an easy solution for LookML includes that reference other projects? Checking in here before I go tweaking the code base.
  • red-napkin-59945

    04/07/2022, 4:48 PM
    Hey team, when ingesting a TimeseriesAspect, should I always populate eventGranularity?
  • bitter-toddler-42943

    04/12/2022, 1:37 AM
    I think I am lost... Does anybody know what the problem with my DataHub ingestion is? 🤕
  • bitter-toddler-42943

    04/12/2022, 6:23 AM
    Does anyone know what to do about the above error message?
  • cold-hydrogen-10513

    04/12/2022, 9:15 AM
    Hi, could you please help me with this question: https://datahubspace.slack.com/archives/CUMUWQU66/p1649667128236889?thread_ts=1648808017.738699&cid=CUMUWQU66?
  • swift-breakfast-25077

    04/12/2022, 6:33 PM
    Hi team, I want to ingest from a MySQL source. I ran this recipe to add only the "exo" database with the 'emails' table, but when I ran the ingest command it added all the tables, all the databases, the views, and even the information_schema tables, which is not what I wanted 😭. Why did this happen?
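    A sketch of the scoping that usually achieves this, with the regexes anchored so they don't partially match other names. Depending on the version, database-level filtering on MySQL goes through schema_pattern or database_pattern, so check the source docs (host and credentials are placeholders):
    source:
        type: mysql
        config:
            host_port: localhost:3306
            username: datahub_user
            password: example-password
            schema_pattern:          # newer releases call this database_pattern
                allow:
                    - '^exo$'
            table_pattern:
                allow:
                    - '^exo\.emails$'
            include_views: false
    sink:
        type: datahub-rest
        config:
            server: http://localhost:8080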
  • curved-crayon-1929

    04/12/2022, 7:54 PM
    image.png
  • creamy-van-28626

    04/13/2022, 6:00 AM
    This is my recipe.yaml
  • creamy-van-28626

    04/14/2022, 7:57 AM
    But when I ingest from the pod itself, the same recipe.yaml runs fine.
  • famous-match-44342

    04/14/2022, 1:32 PM
    How do I configure an AWS RDS source?
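    There is no dedicated RDS source; you point the SQL source matching the engine (mysql, postgres, mssql, mariadb) at the RDS endpoint. A sketch for RDS MySQL (endpoint, database, and credentials are placeholders):
    source:
        type: mysql
        config:
            host_port: mydb.abc123xyz.us-east-1.rds.amazonaws.com:3306
            database: mydb
            username: datahub_user
            password: example-password
    sink:
        type: datahub-rest
        config:
            server: http://localhost:8080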