# ingestion

  • purple-terabyte-64712 (05/10/2023, 7:31 AM)
    Hi, I have an issue with Parquet ingestion. When I execute an ingestion, every Parquet file becomes a standalone dataset, and the {table}, {partition_key} and {partition} markers have no effect: "c:/users/szger/PARQUET/{table}/{partition_key[0]}={partition[0]}/*.parquet". It creates several hundred entries in the output file, one for every Parquet file. I attach the sample Parquet folders too. What I expect is that every table has only one schema definition in the file.
    Attachments: parquet_discovery_output.json, PARQUET.7z
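    For reference, a minimal sketch of what a local-Parquet recipe with table/partition grouping could look like, expressed as a Python dict of the same shape as the YAML recipe. The path is the one from this message; the source type, sink, and server URL are assumptions:
    ```python
    from datahub.ingestion.run.pipeline import Pipeline

    # {table} marks the dataset boundary; {partition_key[i]}={partition[i]} tells the
    # source that the files underneath are partitions of that one table, so each
    # table should surface as a single dataset with one schema definition.
    recipe = {
        "source": {
            "type": "s3",  # assumption: the s3 source, which also reads local paths
            "config": {
                "path_specs": [
                    {
                        "include": "c:/users/szger/PARQUET/{table}/{partition_key[0]}={partition[0]}/*.parquet",
                    }
                ],
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # hypothetical GMS address
        },
    }

    Pipeline.create(recipe).run()
    ```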

  • colossal-tent-57599 (05/10/2023, 8:07 AM)
    Hello all, I see that ingestion with the Snowflake source is always stuck in a pending state. It used to work before but not anymore. Attached is the GMS log. I am running version v0.9.6 on EC2 via docker quickstart. Thanks for the help!
    Attachment: gms_log_0905.log

  • loud-librarian-93625 (05/10/2023, 1:29 PM)
    Hi all, I'm trying to ingest from Tableau server
    ```
    datahub ingest -c 'C:\Users\matt.evans\.datahub\tableau\tableau.dhub.yaml' --dry-run
    ```
    but am getting the following error
    ```
    File "C:\Users\matt.evans\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\configuration\config_loader.py", line 101, in load_config_file
        raise ConfigurationError(
    datahub.configuration.common.ConfigurationError: Cannot read remote file C:\Users\matt.evans\.datahub\tableau\tableau.dhub.yaml, error:No connection adapters were found for 'C:\\Users\\matt.evans\\.datahub\\tableau\\tableau.dhub.yaml'
    ```
    Any idea what I'm doing wrong? Seems to be something in the yaml file it doesn't like.
    Attachments: Console Output.txt, tableau.dhub.yaml
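    One hedged workaround sketch: the error suggests the CLI's config loader mistakes the `C:` drive prefix for a URL scheme and tries to fetch the recipe remotely. Loading the YAML ourselves and handing the dict to the SDK sidesteps that heuristic. The path and dry-run flag are from this message; everything else is an assumption:
    ```python
    import yaml

    from datahub.ingestion.run.pipeline import Pipeline

    # Read the recipe locally so the remote-vs-local detection never sees the
    # Windows path with its drive-letter colon.
    with open(r"C:\Users\matt.evans\.datahub\tableau\tableau.dhub.yaml") as f:
        recipe = yaml.safe_load(f)

    pipeline = Pipeline.create(recipe, dry_run=True)
    pipeline.run()
    pipeline.pretty_print_summary()
    ```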

  • rapid-spoon-75609 (05/10/2023, 9:25 PM)
    Does anyone know if it's possible for DataHub to include Avro schema metadata (tags) as a DataHub tag on the Kafka topic resource? For example, taking the schema in the thread and using the `team` tag in its metadata as a tag on the Kafka topic.
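    As a hedged sketch of the mechanism only (not of reading the tag out of the Avro schema, which would likely need a custom transformer): recipes accept transformers, and the built-in `simple_add_dataset_tags` can stamp a static tag such as `team` on every topic the Kafka source emits. Connection values are hypothetical:
    ```python
    # A recipe (as a Python dict) that tags every ingested Kafka topic with
    # urn:li:tag:team. Deriving the tag value from the schema's metadata field
    # dynamically would require a custom add_dataset_tags-style transformer.
    recipe = {
        "source": {
            "type": "kafka",
            "config": {
                "connection": {
                    "bootstrap": "localhost:9092",  # hypothetical broker
                    "schema_registry_url": "http://localhost:8081",  # hypothetical
                }
            },
        },
        "transformers": [
            {
                "type": "simple_add_dataset_tags",
                "config": {"tag_urns": ["urn:li:tag:team"]},
            }
        ],
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
    ```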

  • powerful-answer-39247 (05/11/2023, 2:33 AM)
    Postgres ingestion run fails with this error:
    ```
    File "/tmp/datahub/ingest/venv-postgres-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state_provider/datahub_ingestion_checkpointing_provider.py", line 76, in get_latest_checkpoint
        ] = self.graph.get_latest_timeseries_value(
      File "/tmp/datahub/ingest/venv-postgres-0.10.2/lib/python3.10/site-packages/datahub/ingestion/graph/client.py", line 299, in get_latest_timeseries_value
        assert len(values) == 1
    AssertionError
    ```
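    The assertion fires inside the stateful-ingestion checkpoint lookup, so one hedged way to unblock the run while investigating is to disable stateful ingestion for this source. Connection details are hypothetical:
    ```python
    recipe = {
        "source": {
            "type": "postgres",
            "config": {
                "host_port": "localhost:5432",  # hypothetical connection details
                "database": "mydb",
                "username": "user",
                "password": "pass",
                # Skips the get_latest_checkpoint call that trips the assertion.
                "stateful_ingestion": {"enabled": False},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
    ```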

  • important-area-90857 (05/11/2023, 5:30 AM)
    Hi guys, how should I use `fldUrn`? Where can I find some examples or documentation?
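    For context, `fldUrn` in the fine-grained lineage docs example is just a small local helper that builds a schema-field URN out of a dataset URN plus a column name. A minimal sketch (platform and names are hypothetical):
    ```python
    import datahub.emitter.mce_builder as builder

    def fldUrn(table: str, field: str) -> str:
        # Produces urn:li:schemaField:(<dataset urn>,<field path>)
        dataset_urn = builder.make_dataset_urn(platform="hive", name=table, env="PROD")
        return builder.make_schema_field_urn(dataset_urn, field)

    print(fldUrn("db.my_table", "my_column"))
    # urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:hive,db.my_table,PROD),my_column)
    ```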

  • loud-hospital-37195 (05/11/2023, 7:15 AM)
    Hi, I am testing file-based lineage, and the lineage that appears in the DataHub interface only shows me two tables. I attach the .yaml and what appears in DataHub. Does anyone know what I am doing wrong?
    ```yaml
    version: 1
    lineage:
      - entity:
          name: son_1
          type: dataset
          env: UAT
          platform: snowflake
        upstream:
          - entity:
              name: dad
              type: dataset
              env: UAT
              platform: oracle
          - entity:
              name: mom
              type: dataset
              env: UAT
              platform: kafka
      - entity:
          name: dad_1
          type: dataset
          env: UAT
          platform: oracle
        upstream:
          - entity:
              name: grandpa_of_dad
              type: dataset
              env: UAT
              platform: snowflake
          - entity:
              name: grandma_of_dad
              type: dataset
              env: UAT
              platform: oracle
    ```

  • numerous-refrigerator-15664 (05/11/2023, 7:19 AM)
    Hi team, I'm testing ingesting column-level lineage between my Hive datasets. I've ingested fine-grained lineage in dataset-datajob-dataset form, following the example here (https://datahubproject.io/docs/generated/metamodel/entities/dataset/#fine-grained-lineage). I expected to see column lineage, but the column lineage toggle shows nothing. I can see table-level lineage though (dataset - datajob - dataset). Also, I can see a new record with aspect='dataJobInputOutput' created in the MySQL metadata_aspect_v2 table. Should I make dataset-dataset form lineage to see column-level lineage, or did I miss something in making the dataset-datajob-dataset lineage?
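    A hedged sketch of the dataset-datajob-dataset form from the linked docs page: the column-level edges go into `fineGrainedLineages` on the datajob's `dataJobInputOutput` aspect, so no extra dataset-dataset `upstreamLineage` should be needed. All URNs and column names below are hypothetical:
    ```python
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DataJobInputOutputClass,
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
    )

    in_ds = builder.make_dataset_urn("hive", "db.src_table", "PROD")
    out_ds = builder.make_dataset_urn("hive", "db.dst_table", "PROD")
    job = builder.make_data_job_urn(orchestrator="airflow", flow_id="my_flow", job_id="my_job")

    # One column-level edge: src_table.col_a -> dst_table.col_a, attached to the job.
    fgl = FineGrainedLineageClass(
        upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
        upstreams=[builder.make_schema_field_urn(in_ds, "col_a")],
        downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
        downstreams=[builder.make_schema_field_urn(out_ds, "col_a")],
    )

    mcp = MetadataChangeProposalWrapper(
        entityUrn=job,
        aspect=DataJobInputOutputClass(
            inputDatasets=[in_ds],
            outputDatasets=[out_ds],
            inputDatasetFields=[builder.make_schema_field_urn(in_ds, "col_a")],
            outputDatasetFields=[builder.make_schema_field_urn(out_ds, "col_a")],
            fineGrainedLineages=[fgl],
        ),
    )
    DatahubRestEmitter("http://localhost:8080").emit(mcp)
    ```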

  • mysterious-table-75773 (05/11/2023, 9:02 AM)
    Is there a way to run DataHub without datahub-actions? It contains tens of critical vulnerabilities.

  • delightful-painter-8227 (05/11/2023, 10:04 AM)
    Hello! 👋 Can someone help me understand why my ingestion is stuck in pending status even after I restart the actions container? Thanks.

  • lemon-scooter-69730 (05/11/2023, 12:48 PM)
    I have discovered a versioning issue in the programmatic pipelines feature. It's as follows: when running a pipeline with the SDK
    ```python
    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.pretty_print_summary()
    ```
    it throws this exception:
    ```
    if regex("LATERAL VIEW EXPLODE(col)"):
    TypeError: 'str' object is not callable
    ```
    This error comes from `sqllineage`, because it picks up the latest version of `sqlparse==0.4.4`; pinning my version to `0.4.3` fixed the problem. I also noticed the same in `sqllineage==1.3.6`; I resolved it by moving my version of sqllineage to `1.4.2`. I am just putting this here in case anyone runs into this issue... I spent the better part of an hour or two getting to the bottom of this.

  • lemon-scooter-69730 (05/11/2023, 1:30 PM)
    When you set up a programmatic pipeline, should it show up in the UI?

  • damp-orange-46267 (05/11/2023, 3:01 PM)
    Hi guys, I'm trying to ingest data from Tableau, but I'm getting this error:
    ```
    ~~~~ Execution Summary - RUN_INGEST ~~~~
    Execution finished with errors.
    {'exec_id': '7985f351-d346-4713-b683-f256a1b24b0d',
     'infos': ['2023-05-11 14:55:13.610978 INFO: Starting execution for task with name=RUN_INGEST',
               "2023-05-11 14:55:17.687276 INFO: Failed to execute 'datahub ingest'",
               '2023-05-11 14:55:17.687583 INFO: Caught exception EXECUTING task_id=7985f351-d346-4713-b683-f256a1b24b0d, name=RUN_INGEST, '
               'stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
     'errors': []}
    
    ~~~~ Ingestion Logs ~~~~
    Obtaining venv creation lock...
    Acquired venv creation lock
    venv setup time = 0
    This version of datahub supports report-to functionality
    datahub  ingest run -c /tmp/datahub/ingest/7985f351-d346-4713-b683-f256a1b24b0d/recipe.yml --report-to /tmp/datahub/ingest/7985f351-d346-4713-b683-f256a1b24b0d/ingestion_report.json
    [2023-05-11 14:55:16,653] INFO     {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.10.0
    1 validation error for PipelineConfig
    source -> sink
      extra fields not permitted (type=value_error.extra)
    ```
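    A hedged reading of the validation error: `source -> sink ... extra fields not permitted` usually means `sink` is indented under `source` in the recipe rather than being a sibling top-level key. A sketch of the expected shape (all values are hypothetical):
    ```python
    recipe = {
        "source": {
            "type": "tableau",
            "config": {"connect_uri": "https://tableau.example.com"},  # hypothetical
        },
        # sink must sit at the top level, next to source, not inside source.config.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
    ```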

  • limited-forest-73733 (05/11/2023, 4:13 PM)
    Hey team, I am not able to integrate Airflow with DataHub 0.10.2 using datahub-kafka. Can anyone suggest which DataHub version I should use with Airflow 2.4.3? Thanks in advance.

  • little-refrigerator-78584 (05/12/2023, 1:21 PM)
    Hi guys, I was trying to ingest only the jobs from AWS Glue and used this recipe:
    ```yaml
    source:
        type: glue
        config:
            aws_region: eu-central-1
            platform: glue
            extract_transforms: True
            database_pattern: {'deny': ['.*']}
            table_pattern: {'deny': ['.*']}
    ```
    It successfully pulled 2 jobs from AWS, and Glue shows up on the home page under the Platform section. But when I click on it, it shows: No results found for "". If I want to pull just my jobs from Glue, and not the tables and databases, will it not show them?

  • purple-terabyte-64712 (05/13/2023, 3:58 AM)
    Hi guys, can you help me with this issue? https://datahubspace.slack.com/archives/CUMUWQU66/p1683703900400409

  • miniature-ghost-14229 (05/13/2023, 1:20 PM)
    Hi everyone! I need some help. I am getting this error:
    ```
    Dataset query failed with error: 400 INFORMATION_SCHEMA.PARTITIONS query attempted to read too many tables. Please add more restrictive filters. Location: EU Job ID: fb2b9691-cb6bb7
    ```
    I tried to filter this query and reduce the amount of data to fetch, but it looks like it didn't work. Does DataHub parse the entire project? I need to ingest only a specific dataset, so I added a filter and included my dataset name in the allow patterns, but it seems that it is not working or not being taken into consideration. Thank you.
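    A hedged sketch of scoping the BigQuery source down: the INFORMATION_SCHEMA.PARTITIONS query comes from profiling, so restricting `dataset_pattern` and disabling (or narrowing) profiling are the two levers to try. All names are hypothetical:
    ```python
    recipe = {
        "source": {
            "type": "bigquery",
            "config": {
                "project_id": "my-project",  # hypothetical
                # Anchored regex so only the one dataset is considered.
                "dataset_pattern": {"allow": ["^my_dataset$"]},
                # Profiling issues the INFORMATION_SCHEMA.PARTITIONS query; turn it
                # off first to confirm, then re-enable it narrowly.
                "profiling": {"enabled": False},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
    ```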

  • brave-room-48783 (05/14/2023, 9:00 AM)
    Hey, getting this error while trying to ingest Metabase 0.41.2 (the version suggested in the docs - https://datahubproject.io/docs/generated/ingestion/sources/metabase/#compatibility). I have created a local Metabase instance at localhost:3000 with this YAML recipe:
    ```yaml
    source:
      type: metabase
      config:
        connect_uri: 'localhost:3000'
        username: admin_username
        password: admin_password
    ```
    DataHub CLI version: 0.10.2.3. Python version: 3.9.6 (default, Mar 10 2023, 20:16:38) [Clang 14.0.3 (clang-1403.0.22.14.1)]. Need a nudge on what I might be doing wrong here?
    ```
    ~~~~ Execution Summary - RUN_INGEST ~~~~
    Execution finished with errors.
    {'exec_id': '0de3d15c-4e8d-45bf-8877-46e9c8c66de8',
     'infos': ['2023-05-14 08:45:07.246534 INFO: Starting execution for task with name=RUN_INGEST',
               "2023-05-14 08:45:11.476157 INFO: Failed to execute 'datahub ingest'",
               '2023-05-14 08:45:11.486188 INFO: Caught exception EXECUTING task_id=0de3d15c-4e8d-45bf-8877-46e9c8c66de8, name=RUN_INGEST, '
               'stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
     'errors': []}
    
    ~~~~ Ingestion Logs ~~~~
    Obtaining venv creation lock...
    Acquired venv creation lock
    venv setup time = 0
    This version of datahub supports report-to functionality
    datahub --debug ingest run -c /tmp/datahub/ingest/0de3d15c-4e8d-45bf-8877-46e9c8c66de8/recipe.yml --report-to /tmp/datahub/ingest/0de3d15c-4e8d-45bf-8877-46e9c8c66de8/ingestion_report.json
    [2023-05-14 08:45:08,814] DEBUG    {datahub.telemetry.telemetry:219} - Sending init Telemetry
    [2023-05-14 08:45:10,004] DEBUG    {datahub.telemetry.telemetry:248} - Sending telemetry for function-call
    [2023-05-14 08:45:10,417] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.2
    [2023-05-14 08:45:10,582] DEBUG    {datahub.ingestion.sink.datahub_rest:116} - Setting env variables to override config
    [2023-05-14 08:45:10,582] DEBUG    {datahub.ingestion.sink.datahub_rest:118} - Setting gms config
    [2023-05-14 08:45:10,583] DEBUG    {datahub.ingestion.run.pipeline:203} - Sink type datahub-rest (<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'>) configured
    [2023-05-14 08:45:10,583] INFO     {datahub.ingestion.run.pipeline:204} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-gms:8080>
    [2023-05-14 08:45:10,595] DEBUG    {datahub.ingestion.run.pipeline:278} - Reporter type:file,<class 'datahub.ingestion.reporting.file_reporter.FileReporter'> configured.
    [2023-05-14 08:45:10,630] DEBUG    {datahub.telemetry.telemetry:248} - Sending telemetry for function-call
    [2023-05-14 08:45:11,034] ERROR    {datahub.entrypoints:195} - Command failed: Failed to find a registered source for type metabase: 'str' object is not callable
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 119, in _add_init_error_context
        yield
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 214, in __init__
        source_class = source_registry.get(source_type)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 173, in get
        tp = self._ensure_not_lazy(key)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 117, in _ensure_not_lazy
        plugin_class = import_path(path)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 48, in import_path
        item = importlib.import_module(module_name)
      File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
      File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
      File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/metabase.py", line 10, in <module>
        from sqllineage.runner import LineageRunner
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/sqllineage/__init__.py", line 41, in <module>
        _monkey_patch()
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/sqllineage/__init__.py", line 35, in _monkey_patch
        _patch_updating_lateral_view_lexeme()
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/sqllineage/__init__.py", line 24, in _patch_updating_lateral_view_lexeme
        if regex("LATERAL VIEW EXPLODE(col)"):
    TypeError: 'str' object is not callable
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/entrypoints.py", line 182, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 187, in run
        pipeline = Pipeline.create(
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 328, in create
        return cls(
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 211, in __init__
        with _add_init_error_context(
      File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
        self.gen.throw(typ, value, traceback)
      File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 121, in _add_init_error_context
        raise PipelineInitError(f"Failed to {step}: {e}") from e
    datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type metabase: 'str' object is not callable
    [2023-05-14 08:45:11,040] DEBUG    {datahub.entrypoints:197} - DataHub CLI version: 0.10.2 at /tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/__init__.py
    [2023-05-14 08:45:11,040] DEBUG    {datahub.entrypoints:200} - Python version: 3.10.10 (main, Mar 14 2023, 03:08:22) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-metabase-0.10.2/bin/python3 on Linux-5.15.49-linuxkit-aarch64-with-glibc2.31
    [2023-05-14 08:45:11,040] DEBUG    {datahub.entrypoints:205} - GMS config {'models': {}, 'patchCapable': True, 'versions': {'linkedin/datahub': {'version': 'v0.10.2', 'commit': '0fa983adc7370862371b4c0786aac0e3b81a563a'}}, 'managedIngestion': {'defaultCliVersion': '0.10.2', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'timeZone': 'GMT', 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
    ```

  • wide-ghost-47822 (05/14/2023, 8:33 PM)
    Hi, I've just been playing with a script with which we can ingest data into DataHub programmatically, following this link -> https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/programatic_pipeline.py At some point, I figured out there is a method called `log_ingestion_stats` on the pipeline object, and I wondered if I could get some metrics about the pipeline that was run. I saw a code block inside this method which sends some statistics data using a telemetry object. It is like this:
    ```python
    telemetry.telemetry_instance.ping(
        "ingest_stats",
        {
            "source_type": self.config.source.type,
            "sink_type": self.config.sink.type,
            "records_written": stats.discretize(
                self.sink.get_report().total_records_written
            ),
            "source_failures": stats.discretize(source_failures),
            "source_warnings": stats.discretize(source_warnings),
            "sink_failures": stats.discretize(sink_failures),
            "sink_warnings": stats.discretize(sink_warnings),
            "global_warnings": global_warnings,
            "failures": stats.discretize(source_failures + sink_failures),
            "warnings": stats.discretize(
                source_warnings + sink_warnings + global_warnings
            ),
        },
    )
    ```
    Inside the ping method, the code sends this data to an external API called Mixpanel. It seems you are collecting data about the pipeline from my machine. I don't like this way of collecting data. Why are you collecting it?
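    For what it's worth, the client-side telemetry can be opted out of. A minimal sketch, assuming the `DATAHUB_TELEMETRY_ENABLED` environment variable is honored at import time (there is also a `datahub telemetry disable` CLI command):
    ```python
    import os

    # Must be set before any datahub import so the telemetry instance starts disabled.
    os.environ["DATAHUB_TELEMETRY_ENABLED"] = "false"

    from datahub.ingestion.run.pipeline import Pipeline  # ping() is now a no-op
    ```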

  • colossal-waitress-83487 (05/15/2023, 1:57 AM)
    Hi everyone, I want to use Java to add a new dataset (a MySQL table). I found the following code, but I can't find how to add a table field. Does anyone know how to add one?
    ```java
    MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
        .entityType("dataset")
        .entityUrn("urn:li:dataset:(urn:li:dataPlatform:mysql,test.test5,PROD)")
        .upsert()
        .aspect(new DatasetProperties().setDescription("test").setName("test5"))
        .build();
    ```
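    The question is about the Java emitter; as a hedged sketch in the Python SDK used elsewhere in this channel, table fields live in the `schemaMetadata` aspect, emitted alongside `datasetProperties`. The Java `MetadataChangeProposalWrapper` should accept a `SchemaMetadata` aspect analogously, but treat that mapping as an assumption; the column below is hypothetical:
    ```python
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:mysql,test.test5,PROD)"
    schema = SchemaMetadataClass(
        schemaName="test5",
        platform="urn:li:dataPlatform:mysql",
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
        fields=[
            SchemaFieldClass(
                fieldPath="id",  # hypothetical column name
                type=SchemaFieldDataTypeClass(type=StringTypeClass()),
                nativeDataType="VARCHAR(50)",  # hypothetical native type
            )
        ],
    )
    DatahubRestEmitter("http://localhost:8080").emit(
        MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=schema)
    )
    ```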

  • clever-author-65853 (05/15/2023, 1:19 PM)
    Hello! I'm trying to understand how the Airflow ingestion works. Does DataHub ingest logs from Airflow, or do we need to send events from the task itself?

  • miniature-hair-20451 (05/15/2023, 6:10 PM)
    Hi, I added a new bug report for the Delta Lake ingestor and a PR to resolve it. Please review: https://github.com/datahub-project/datahub/issues/8049

  • silly-intern-25190 (05/16/2023, 5:12 AM)
    Hi, during profiling we faced this error; it would be helpful if someone could explain it and the possible cause:
    ```
    {'error': 'Unable to emit metadata to DataHub GMS',
     'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
              'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Cannot parse request entity\n'
                            '\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n'
                            '\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:202)',
              'message': 'Cannot parse request entity',
              'status': 400,
              'id': 'urn:li:dataset:(urn:li:dataPlatform:vertica_fresh,public.test_data1,PROD)'}},
    ```

  • silly-nest-50341 (05/16/2023, 5:41 AM)
    Hi there, I am trying to add lineage of the form (dataset -> datajob -> dataset), but I keep failing (I referred to this link). Adding lineage using the Python SDK was successful using mce_builder.make_lineage_mce, but it seems this function only supports dataset entities, not datajob. Does the Python SDK currently have one easy API for adding (dataset -> datajob -> dataset) lineage, or can you give me another way around it? Thanks.
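    A hedged sketch: `make_lineage_mce` indeed only emits dataset-to-dataset `upstreamLineage`. For (dataset -> datajob -> dataset), one route is to emit a `dataJobInputOutput` aspect on the datajob listing its input and output datasets; the URNs below are hypothetical:
    ```python
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DataJobInputOutputClass

    job = builder.make_data_job_urn("airflow", "my_flow", "my_job")
    aspect = DataJobInputOutputClass(
        inputDatasets=[builder.make_dataset_urn("hive", "db.input_table", "PROD")],
        outputDatasets=[builder.make_dataset_urn("hive", "db.output_table", "PROD")],
    )
    # The job-in-the-middle shape comes from this single aspect on the datajob.
    DatahubRestEmitter("http://localhost:8080").emit(
        MetadataChangeProposalWrapper(entityUrn=job, aspect=aspect)
    )
    ```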

  • damp-orange-46267 (05/16/2023, 9:50 AM)
    Hi guys, do you have any experience with this error:
    ```
    PipelineInitError: Failed to find a registered source for type bigquery: 'str' object is not callable
    ```