red-pizza-28006
05/19/2022, 2:30 PM
[2022-05-19 16:27:27,391] WARNING {datahub.ingestion.source.confluent_schema_registry:47} - Failed to get subjects from schema registry: ('Connection aborted.', BadStatusLine('\x15\x03\x03\x00\x02\x02P'))
Looking at the documentation, I understand that I need an implementation of KafkaSchemaRegistryBase,
but where do I implement and deploy this?
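For reference, the usual pattern is to write the subclass in Python, pip-install it into the same environment that runs the ingestion, and reference it by import path from the recipe. A minimal sketch, assuming the kafka source's schema_registry_class config key (the key and the base-class interface can differ between versions, and my_company.ingestion.MyCustomSchemaRegistry is a hypothetical import path):

source:
  type: "kafka"
  config:
    connection:
      bootstrap: "broker:9092"
    # hypothetical import path; the package containing this subclass of
    # KafkaSchemaRegistryBase must be pip-installed wherever `datahub ingest` runs
    schema_registry_class: "my_company.ingestion.MyCustomSchemaRegistry"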
brash-sundown-77702
05/19/2022, 2:33 PM
brash-sundown-77702
05/19/2022, 2:36 PM
some-shoe-34751
05/19/2022, 2:55 PM
dbt ingestion -
we've ingested the dbt JSON files into DataHub, but it has also ingested some of the upstream Glue
schemas as dbt datasets. Ideally it should just link to the existing Glue schema metadata.
This has caused duplicates in the datasets. Is there a way to dedupe, or to skip the upstream schema ingestion for dbt?
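One hedged option: some versions of the dbt source support a node_type_pattern filter, and denying "source" nodes should stop dbt from re-emitting upstream (e.g. Glue-backed) schemas as dbt datasets. A sketch, where the paths and target_platform are placeholders and the key name should be checked against your CLI version's dbt source docs:

source:
  type: dbt
  config:
    manifest_path: ./target/manifest.json    # placeholder paths
    catalog_path: ./target/catalog.json
    target_platform: glue
    node_type_pattern:
      deny:
        - source    # skip dbt "source" nodes that mirror existing Glue tables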
chilly-gpu-46080
05/20/2022, 5:36 AM
Using v3 (Batch Request) API
Calculating Metrics: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:01<00:00, 24.01it/s]
Validation succeeded!
Suite Name Status Expectations met
- incident_data_table_expectations ✔ Passed 9 of 9 (100.0 %)
and I'm certain that the results are being pushed correctly, since when I try to push them without a token I get a 401 error when running the GE checkpoint.
I've added this configuration to my checkpoint:
- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: 'http://host_name:8080'
    token: 'really_long_token'
Any help will be greatly appreciated!
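For comparison, a sketch of the same action block with the token supplied through Great Expectations' variable substitution instead of hard-coded (assumes a DATAHUB_TOKEN entry in config_variables.yml or the shell environment; substitution support depends on your GE version):

- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: http://host_name:8080
    token: ${DATAHUB_TOKEN}    # substituted by GE at runtime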
nice-mechanic-83147
05/20/2022, 6:57 AM
miniature-sandwich-75434
05/20/2022, 7:08 AM
red-pizza-28006
05/20/2022, 11:49 AM
[2022-05-20 13:39:39,780] WARNING {datahub.ingestion.source.confluent_schema_registry:47} - Failed to get subjects from schema registry: HTTPSConnectionPool(host='endpoint', port=8081): Max retries exceeded with url: /subjects (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1125)')))
whereas when I run this, it works fine:
curl --cacert cert.pem <endpoint>:8081/subjects
My config looks like this:
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "<>"
      consumer_config:
        sasl.mechanism: "PLAIN"
        sasl.username: "<>"
        sasl.password: "<>"
        security.protocol: "sasl_ssl"
      schema_registry_url: "https://<>:8081"
      schema_registry_config:
        basic.auth.user.info: <>:<>
        ssl.ca.location: "cert.pem"
# see https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for complete documentation
sink:
  type: "datahub-rest"
  config:
    server: "http://gms-datacatalog.data-ing-prod-eks-eu-west-1.sam-app.ro"
Any ideas?
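Two hedged things worth checking, since curl honors --cacert but the Python-side verification fails: make ssl.ca.location an absolute path rather than a relative one, and if the registry REST calls go through the requests library rather than the confluent client, point requests at the same CA bundle before running the ingestion (an assumption; behavior varies by version):

export REQUESTS_CA_BUNDLE=/full/path/to/cert.pem    # assumption: registry calls use requests
datahub ingest -c recipe.yaml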
salmon-angle-92685
05/20/2022, 12:53 PM
nice-mechanic-83147
05/20/2022, 5:36 PM
steep-thailand-61363
05/22/2022, 11:42 AM
astonishing-dusk-99990
05/23/2022, 4:40 AM
source:
  type: superset
  config:
    connect_uri: 'http://xx.xx.xx.xx:8088'
    username: admin
    password: admin
    provider: db
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:8080'
But I'm getting an error on this:
---- (full traceback above) ----
'File "/tmp/datahub/ingest/venv-2bd5bc52-6ba1-45c3-b58a-830ae5dd0254/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 103, in '
'run\n'
' pipeline = Pipeline.create(pipeline_config, dry_run, preview, preview_workunits)\n'
'File "/tmp/datahub/ingest/venv-2bd5bc52-6ba1-45c3-b58a-830ae5dd0254/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line '
'203, in create\n'
' return cls(\n'
'File "/tmp/datahub/ingest/venv-2bd5bc52-6ba1-45c3-b58a-830ae5dd0254/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line '
'151, in __init__\n'
' self.source: Source = source_class.create(\n'
'File "/tmp/datahub/ingest/venv-2bd5bc52-6ba1-45c3-b58a-830ae5dd0254/lib/python3.9/site-packages/datahub/ingestion/source/superset.py", '
'line 157, in create\n'
' return cls(ctx, config)\n'
'File "/tmp/datahub/ingest/venv-2bd5bc52-6ba1-45c3-b58a-830ae5dd0254/lib/python3.9/site-packages/datahub/ingestion/source/superset.py", '
'line 137, in __init__\n'
' self.access_token = login_response.json()["access_token"]\n'
'\n'
"KeyError: 'access_token'\n"
'[2022-05-23 04:34:04,688] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.34.1 at '
'/tmp/datahub/ingest/venv-2bd5bc52-6ba1-45c3-b58a-830ae5dd0254/lib/python3.9/site-packages/datahub/__init__.py\n'
'[2022-05-23 04:34:04,688] INFO {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) \n'
'[GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-2bd5bc52-6ba1-45c3-b58a-830ae5dd0254/bin/python3 on '
'Linux-4.14.262-200.489.amzn2.x86_64-x86_64-with-glibc2.31\n'
"[2022-05-23 04:34:04,688] INFO {datahub.entrypoints:182} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': "
"'v0.8.35', 'commit': 'f0756460483e84a121410ad16d7acf6f34986978'}}, 'managedIngestion': {'defaultCliVersion': '0.8.34.1', 'enabled': "
"True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': False, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, "
"'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}\n",
"2022-05-23 04:34:05.851851 [exec_id=2bd5bc52-6ba1-45c3-b58a-830ae5dd0254] INFO: Failed to execute 'datahub ingest'",
'2022-05-23 04:34:05.852216 [exec_id=2bd5bc52-6ba1-45c3-b58a-830ae5dd0254] INFO: Caught exception EXECUTING '
'task_id=2bd5bc52-6ba1-45c3-b58a-830ae5dd0254, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
Does anyone know how to fix it?
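The KeyError: 'access_token' means Superset's login response carried no token, i.e. the login itself failed before any ingestion started. A quick way to isolate it is to call Superset's standard login endpoint directly with the same credentials (host is the placeholder from the recipe above):

curl -X POST 'http://xx.xx.xx.xx:8088/api/v1/security/login' \
  -H 'Content-Type: application/json' \
  -d '{"username": "admin", "password": "admin", "provider": "db", "refresh": true}'

If the JSON that comes back has no access_token field, the credentials or provider are wrong on the Superset side.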
nutritious-bird-77396
05/23/2022, 2:59 PM
mysterious-lamp-91034
05/23/2022, 6:24 PM
clever-machine-43182
05/24/2022, 5:24 AM
polite-application-51650
05/24/2022, 6:33 AM
rapid-fireman-19686
05/24/2022, 10:07 AM
quick-motorcycle-57957
05/24/2022, 2:12 PM
nutritious-bird-77396
05/24/2022, 4:51 PM
orange-coat-2879
05/24/2022, 9:48 PM
best-umbrella-24804
05/25/2022, 1:15 AM
source:
  type: snowflake-usage
  config:
    host_port: xxxxx.snowflakecomputing.com
    warehouse: DEVELOPER_X_SMALL
    username: DATAHUB_DEV_USER
    password: '${SNOWFLAKE_DEV_PASSWORD}'
    role: DATAHUB_DEV_ACCESS
    top_n_queries: 10
sink:
  type: datahub-rest
  config:
    server: 'xxxxxx'
When running the ingestion, it errors out, and it looks like it's having trouble installing packages. I've attached the full log.
Thanks in advance!
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '897f0639-d673-433a-aab5-76460957d26b',
'infos': ['2022-05-25 00:32:52.050959 [exec_id=897f0639-d673-433a-aab5-76460957d26b] INFO: Starting execution for task with name=RUN_INGEST',
'2022-05-25 00:35:00.631829 [exec_id=897f0639-d673-433a-aab5-76460957d26b] INFO: stdout=Requirement already satisfied: pip in '
'/tmp/datahub/ingest/venv-897f0639-d673-433a-aab5-76460957d26b/lib/python3.9/site-packages (21.2.4)\n'
'Collecting pip\n'
' Using cached pip-22.1.1-py3-none-any.whl (2.1 MB)\n'
'Collecting wheel\n'
' Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)\n'
'Requirement already satisfied: setuptools in /tmp/datahub/ingest/venv-897f0639-d673-433a-aab5-76460957d26b/lib/python3.9/site-packages '
'(58.1.0)\n'
'Collecting setuptools\n'
' Using cached setuptools-62.3.2-py3-none-any.whl (1.2 MB)\n'
'Installing collected packages: wheel, setuptools, pip\n'
' Attempting uninstall: setuptools\n'
' Found existing installation: setuptools 58.1.0\n'
' Uninstalling setuptools-58.1.0:\n'
' Successfully uninstalled setuptools-58.1.0\n'
' Attempting uninstall: pip\n'
' Found existing installation: pip 21.2.4\n'
' Uninstalling pip-21.2.4:\n'
' Successfully uninstalled pip-21.2.4\n'
'Successfully installed pip-22.1.1 setuptools-62.3.2 wheel-0.37.1\n'
'Collecting acryl-datahub[datahub-rest,snowflake-usage]==0.8.33\n'
' Using cached acryl_datahub-0.8.33-py3-none-any.whl (756 kB)\n'
'Collecting mixpanel>=4.9.0\n'
' Using cached mixpanel-4.9.0-py2.py3-none-any.whl (8.9 kB)\n'
'Collecting types-termcolor>=1.0.0\n'
' Using cached types_termcolor-1.1.4-py3-none-any.whl (2.1 kB)\n'
'Collecting avro<1.11,>=1.10.2\n'
' Using cached avro-1.10.2-py3-none-any.whl\n'
'Collecting docker\n'
' Using cached docker-5.0.3-py2.py3-none-any.whl (146 kB)\n'
'Collecting markupsafe<=2.0.1,>=1.1.1\n'
' Using cached MarkupSafe-2.0.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (30 kB)\n'
'Collecting typing-extensions>=3.10.0.2\n'
' Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)\n'
'Collecting types-Deprecated\n'
' Using cached types_Deprecated-1.2.8-py3-none-any.whl (3.1 kB)\n'
'Collecting toml>=0.10.0\n'
' Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)\n'
'Collecting click>=6.0.0\n'
' Using cached click-8.1.3-py3-none-any.whl (96 kB)\n'
'Collecting termcolor>=1.0.0\n'
' Using cached termcolor-1.1.0-py3-none-any.whl\n'
'Collecting python-dateutil>=2.8.0\n'
' Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n'
'Collecting click-default-group\n'
' Using cached click_default_group-1.2.2-py3-none-any.whl\n'
'Collecting tabulate\n'
' Using cached tabulate-0.8.9-py3-none-any.whl (25 kB)\n'
'Collecting typing-inspect\n'
' Using cached typing_inspect-0.7.1-py3-none-any.whl (8.4 kB)\n'
'Collecting mypy-extensions>=0.4.3\n'
' Using cached mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)\n'
'Collecting PyYAML\n'
' Using cached PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (661 kB)\n'
'Collecting entrypoints\n'
' Using cached entrypoints-0.4-py3-none-any.whl (5.3 kB)\n'
'Collecting psutil>=5.8.0\n'
' Using cached psutil-5.9.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (281 '
'kB)\n'
'Collecting progressbar2\n'
' Using cached progressbar2-4.0.0-py2.py3-none-any.whl (26 kB)\n'
'Collecting expandvars>=0.6.5\n'
' Using cached expandvars-0.9.0-py3-none-any.whl (6.6 kB)\n'
'Collecting pydantic>=1.5.1\n'
' Using cached pydantic-1.9.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)\n'
'Collecting stackprinter\n'
' Using cached stackprinter-0.2.6-py3-none-any.whl (28 kB)\n'
'Collecting Deprecated\n'
' Using cached Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)\n'
'Collecting avro-gen3==0.7.2\n'
' Using cached avro_gen3-0.7.2-py3-none-any.whl (26 kB)\n'
'Collecting requests\n'
' Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB)\n'
'Collecting sqlalchemy==1.3.24\n'
' Using cached SQLAlchemy-1.3.24-cp39-cp39-manylinux2010_x86_64.whl (1.3 MB)\n'
'Collecting Jinja2<3.1.0\n'
' Using cached Jinja2-3.0.3-py3-none-any.whl (133 kB)\n'
'Collecting more-itertools>=8.12.0\n'
' Downloading more_itertools-8.13.0-py3-none-any.whl (51 kB)\n'
' ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.6/51.6 kB 8.3 MB/s eta 0:00:00\n'
'Collecting snowflake-sqlalchemy<=1.2.4\n'
' Using cached snowflake_sqlalchemy-1.2.4-py2.py3-none-any.whl (29 kB)\n'
'Collecting cryptography\n'
' Using cached cryptography-37.0.2-cp36-abi3-manylinux_2_24_x86_64.whl (4.0 MB)\n'
'Collecting greenlet\n'
' Using cached greenlet-1.1.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (153 kB)\n'
'Collecting great-expectations>=0.14.11\n'
' Using cached great_expectations-0.15.6-py3-none-any.whl (5.1 MB)\n'
'Collecting sqlparse\n'
' Downloading sqlparse-0.4.2-py3-none-any.whl (42 kB)\n'
' ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.3/42.3 kB 7.8 MB/s eta 0:00:00\n'
'Collecting six\n'
' Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)\n'
'Collecting pytz\n'
' Using cached pytz-2022.1-py2.py3-none-any.whl (503 kB)\n'
'Collecting tzlocal\n'
' Using cached tzlocal-4.2-py3-none-any.whl (19 kB)\n'
'Collecting colorama>=0.4.3\n'
' Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB)\n'
'Collecting importlib-metadata>=1.7.0\n'
' Using cached importlib_metadata-4.11.4-py3-none-any.whl (18 kB)\n'
'Collecting jsonpatch>=1.22\n'
' Using cached jsonpatch-1.32-py2.py3-none-any.whl (12 kB)\n'
'Collecting pandas>=0.23.0\n'
' Using cached pandas-1.4.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)\n'
'Collecting nbformat>=5.0\n'
' Using cached nbformat-5.4.0-py3-none-any.whl (73 kB)\n'
'Collecting ruamel.yaml<0.17.18,>=0.16\n'
' Using cached ruamel.yaml-0.17.17-py3-none-any.whl (109 kB)\n'
'Collecting urllib3<1.27,>=1.25.4\n'
' Using cached urllib3-1.26.9-py2.py3-none-any.whl (138 kB)\n'
'Collecting Ipython>=7.16.3\n'
' Using cached ipython-8.3.0-py3-none-any.whl (750 kB)\n'
'Collecting pyparsing<3,>=2.4\n'
' Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)\n'
'Collecting packaging\n'
' Using cached packaging-21.3-py3-none-any.whl (40 kB)\n'
'Collecting tqdm>=4.59.0\n'
' Using cached tqdm-4.64.0-py2.py3-none-any.whl (78 kB)\n'
'Collecting altair<5,>=4.0.0\n'
' Using cached altair-4.2.0-py3-none-any.whl (812 kB)\n'
'Collecting notebook>=6.4.10\n'
' Using cached notebook-6.4.11-py3-none-any.whl (9.9 MB)\n'
'Collecting cryptography\n'
' Using cached cryptography-36.0.2-cp36-abi3-manylinux_2_24_x86_64.whl (3.6 MB)\n'
'Collecting jsonschema>=2.5.1\n'
' Using cached jsonschema-4.5.1-py3-none-any.whl (72 kB)\n'
'Collecting mistune>=0.8.4\n'
' Using cached mistune-2.0.2-py2.py3-none-any.whl (24 kB)\n'
'Collecting scipy>=0.19.0\n'
'/usr/local/bin/run_ingest.sh: line 16: 424 Killed pip install -r $req_file\n'
'/tmp/datahub/ingest/venv-897f0639-d673-433a-aab5-76460957d26b/bin/python3: No module named datahub\n',
"2022-05-25 00:35:00.632102 [exec_id=897f0639-d673-433a-aab5-76460957d26b] INFO: Failed to execute 'datahub ingest'",
'2022-05-25 00:35:00.632580 [exec_id=897f0639-d673-433a-aab5-76460957d26b] INFO: Caught exception EXECUTING '
'task_id=897f0639-d673-433a-aab5-76460957d26b, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
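The telling line is near the end of that log: run_ingest.sh reports the pip install was Killed, which on Linux typically means the kernel's OOM killer stopped the sandboxed venv build, so the datahub module was never installed. If the executor runs via the Helm chart, raising the actions container's memory is the usual remedy; a sketch with hypothetical value names (check your chart version for the exact keys):

acryl-datahub-actions:
  resources:
    limits:
      memory: 1Gi    # hypothetical figure; size to the ingestion workload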
clever-machine-43182
05/25/2022, 6:47 AM
polite-application-51650
05/25/2022, 8:07 AME0524 21:41:14.098937000 123145790181376 <http://completion_queue.cc:1052]|completion_queue.cc:1052]> Completion queue next failed: {"created":"@1653408674.098904000","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_pipe.cc","file_line":40,"os_error":"Too many open files","syscall":"pipe"}
E0524 21:41:14.100019000 123145571917824 <http://completion_queue.cc:1052]|completion_queue.cc:1052]> Completion queue next failed: {"created":"@1653408674.099962000","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_pipe.cc","file_line":40,"os_error":"Too many open files","syscall":"pipe"}
E0524 21:41:14.100510000 123146495340544 <http://wakeup_fd_pipe.cc:39]|wakeup_fd_pipe.cc:39]> pipe creation failed (24): Too many open files
E0524 21:41:14.100272000 123146478551040 <http://completion_queue.cc:1052]|completion_queue.cc:1052]> Completion queue next failed: {"created":"@1653408674.099987000","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_pipe.cc","file_line":40,"os_error":"Too many open files","syscall":"pipe"}
@dazzling-judge-80093
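errno 24 here means the process ran out of file descriptors rather than anything DataHub-specific; a common mitigation is raising the open-file limit for the shell or container that runs it (the limits below are illustrative):

ulimit -n 65536                                # shell that launches the process
docker run --ulimit nofile=65536:65536 ...     # Docker equivalent, if containerized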
clean-piano-28976
05/25/2022, 10:37 AM
Is there a way to use a curl request to delete all metadata related to a specific platform? In the documentation I only see an example using URNs.
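Not via a single documented curl endpoint as far as I know, but the CLI supports filtered deletes; a sketch assuming a CLI version with the --platform filter (soft-deletes by default; --hard purges):

datahub delete --entity_type dataset --platform <platform>
datahub delete --entity_type dataset --platform <platform> --hard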
steep-painter-66054
05/25/2022, 11:02 AM
handsome-football-66174
05/25/2022, 1:15 PM
salmon-angle-92685
05/25/2022, 1:53 PM
numerous-camera-74294
05/25/2022, 2:17 PM
The DataHub SDK for Python defines the schema field URN as
urn:li:schemaField:(...,fieldName)
and the DataHub SDK for Java defines it as
urn:li:datasetField:(...,fieldName)
Which one is the correct one? I am using v0.8.34 for both SDKs.
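For what it's worth, the Python emitter's builders produce the schemaField form, which suggests that is the canonical one; a quick sanity check, assuming the mce_builder helpers available around v0.8.x:

from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn

dataset_urn = make_dataset_urn(platform="hive", name="db.table", env="PROD")
field_urn = make_schema_field_urn(dataset_urn, "fieldName")
# expected: urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD),fieldName)
print(field_urn)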
echoing-alligator-70530
05/25/2022, 3:38 PM
billions-twilight-48559
05/25/2022, 8:44 PM