# ingestion

    late-arm-1146

    11/15/2022, 3:15 AM
    Hi All, I am trying to ingest data from an Oracle database. Profiling is off, and the connecting user only has 'SELECT_CATALOG_ROLE'. DataHub version 0.8.45. I am able to view the schema and the names of the tables within the schema, but no details on the table columns, i.e. column name or data type. Do I need a broader permission to retrieve the table-level metadata?

    silly-finland-62382

    11/15/2022, 7:10 AM
    Hello All, is there any way to delete all tasks, or the complete Spark platform, from DataHub? According to the docs (https://datahubproject.io/docs/how/delete-metadata/) there is no command to delete the complete Spark platform from the DataHub dashboard, and there are a lot of tasks/pipelines that are very hard to delete one URN at a time; the other commands from the doc are not working either.
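    (Note: recent CLI versions can delete by platform in one go. A hedged sketch — flags have changed over time, so check "datahub delete --help" on your version, and task/pipeline entities may additionally need an --entity_type filter:)
    # soft-delete everything ingested for the spark platform (illustrative)
    datahub delete --platform spark --soft
    # or roll back everything written by a single ingestion run
    datahub ingest rollback --run-id <run-id>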

    ancient-policeman-73437

    11/15/2022, 8:36 AM
    Dear support, could somebody please help me with this? It is one of the key features of DataHub and it doesn't work well. https://datahubspace.slack.com/archives/CUMUWQU66/p1668187176982779?thread_ts=1646688583.497739&cid=CUMUWQU66

    swift-judge-22731

    11/15/2022, 3:57 PM
    Hello, we have DataHub hosted on Kubernetes and are trying to ingest metadata from MSSQL using a custom recipe in the UI, but we cannot get it to work. Our current recipe looks like this:
    sink:
        type: datahub-rest
    source:
        type: mssql
        config:
            use_odbc: 'True'
            host_port: '----:1433'
            password: ----
            database: ----
            username: ----
            uri_args:
                driver: 'ODBC Driver 17 for SQL Server'
                Encrypt: yes
                TrustServerCertificate: Yes
                ssl: 'True'
    And we get the following error:
    RUN_INGEST - {'errors': [],
     'exec_id': '75ff2262-fcbb-47c3-8a60-bcf0445697e0',
     'infos': ['2022-11-15 15:37:57.369176 [exec_id=75ff2262-fcbb-47c3-8a60-bcf0445697e0] INFO: Starting execution for task with name=RUN_INGEST',
               '2022-11-15 15:38:01.452989 [exec_id=75ff2262-fcbb-47c3-8a60-bcf0445697e0] INFO: stdout=venv setup time = 0\n'
               'This version of datahub supports report-to functionality\n'
               'datahub --debug ingest run -c /tmp/datahub/ingest/75ff2262-fcbb-47c3-8a60-bcf0445697e0/recipe.yml --report-to '
               '/tmp/datahub/ingest/75ff2262-fcbb-47c3-8a60-bcf0445697e0/ingestion_report.json\n'
               '[2022-11-15 15:37:58,617] DEBUG    {datahub.telemetry.telemetry:210} - Sending init Telemetry\n'
               '[2022-11-15 15:37:59,245] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry\n'
               '[2022-11-15 15:37:59,501] INFO     {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.9.0\n'
               "[2022-11-15 15:37:59,504] DEBUG    {datahub.cli.ingest_cli:196} - Using config: {'pipeline_name': "
               "'urn:li:dataHubIngestionSource:79107ae9-94bc-4d08-82f8-7e3769edae25', 'run_id': '75ff2262-fcbb-47c3-8a60-bcf0445697e0', 'sink': {'type': "
               "'datahub-rest'}, 'source': {'config': {'database': '--------', 'host_port': "
               "'---------', 'password': '--------', 'uri_args': {'Encrypt': 'yes', 'TrustServerCertificate': "
               "'Yes', 'driver': 'ODBC Driver 17 for SQL Server', 'ssl': 'True'}, 'use_odbc': 'True', 'username': '--------'}, 'type': 'mssql'}}\n"
               '[2022-11-15 15:37:59,504] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry\n'
               '[2022-11-15 15:37:59,764] ERROR    {datahub.entrypoints:165} - 1 validation error for PipelineConfig\n'
               'datahub_api -> __root__\n'
               '  DataHubGraphConfig expected dict not NoneType (type=type_error)\n'
               '[2022-11-15 15:37:59,765] DEBUG    {datahub.entrypoints:198} - DataHub CLI version: 0.9.0 at '
               '/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/__init__.py\n'
               '[2022-11-15 15:37:59,765] DEBUG    {datahub.entrypoints:201} - Python version: 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 '
               '20210110] at /tmp/datahub/ingest/venv-mssql-0.9.0/bin/python3 on Linux-5.4.0-1091-azure-x86_64-with-glibc2.31\n'
               '[2022-11-15 15:37:59,766] DEBUG    {datahub.entrypoints:204} - GMS config {}\n',
               "2022-11-15 15:38:01.453254 [exec_id=75ff2262-fcbb-47c3-8a60-bcf0445697e0] INFO: Failed to execute 'datahub ingest'",
               '2022-11-15 15:38:01.453461 [exec_id=75ff2262-fcbb-47c3-8a60-bcf0445697e0] INFO: Caught exception EXECUTING '
               'task_id=75ff2262-fcbb-47c3-8a60-bcf0445697e0, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 168, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
    Does anyone have any tips for ingesting metadata from MSSQL? Using the CLI we get a pyodbc missing-driver error, so we can't figure out a way there either. Appreciate any suggestions!
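    (Note: the validation error above, "DataHubGraphConfig expected dict not NoneType", points at the empty sink section rather than at the mssql source. A hedged workaround that has helped on 0.9.0 is giving the datahub-rest sink an explicit config block; the server address below is a placeholder for your GMS endpoint:)
    sink:
        type: datahub-rest
        config:
            server: 'http://datahub-datahub-gms:8080'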

    ancient-jordan-41401

    11/16/2022, 8:17 AM
    Hi, I'm trying to ingest data from an Azure SQL Server into my DataHub instance, which is deployed on K8s, but I'm getting "pyodbc not found". I've installed pyodbc in "datahub-acryl-datahub-actions". As a follow-up: how do I ingest using the CLI on Kubernetes? Error log:
    ~~~~ Execution Summary ~~~~
    
    RUN_INGEST - {'errors': [],
     'exec_id': '889d2fc9-8e50-4605-b9b9-a5af6db7fb08',
     'infos': ['2022-11-16 07:58:54.589510 [exec_id=889d2fc9-8e50-4605-b9b9-a5af6db7fb08] INFO: Starting execution for task with name=RUN_INGEST',
               '2022-11-16 07:58:58.677408 [exec_id=889d2fc9-8e50-4605-b9b9-a5af6db7fb08] INFO: stdout=venv setup time = 0\n'
               'This version of datahub supports report-to functionality\n'
               'datahub  ingest run -c /tmp/datahub/ingest/889d2fc9-8e50-4605-b9b9-a5af6db7fb08/recipe.yml --report-to '
               '/tmp/datahub/ingest/889d2fc9-8e50-4605-b9b9-a5af6db7fb08/ingestion_report.json\n'
               '[2022-11-16 07:58:56,721] INFO     {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.9.0\n'
               '[2022-11-16 07:58:56,754] INFO     {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured '
               'to talk to http://datahub-datahub-gms:8080\n'
               '[2022-11-16 07:58:57,527] ERROR    {datahub.entrypoints:192} - \n'
               'Traceback (most recent call last):\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 196, in __init__\n'
               '    self.source: Source = source_class.create(\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 177, in create\n'
               '    return cls(config, ctx)\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 123, in __init__\n'
               '    for inspector in self.get_inspectors():\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 213, in '
               'get_inspectors\n'
               '    url = self.config.get_sql_alchemy_url()\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 75, in '
               'get_sql_alchemy_url\n'
               '    import pyodbc  # noqa: F401\n'
               "ModuleNotFoundError: No module named 'pyodbc'\n"
               '\n'
               'The above exception was the direct cause of the following exception:\n'
               '\n'
               'Traceback (most recent call last):\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run\n'
               '    pipeline = Pipeline.create(\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create\n'
               '    return cls(\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__\n'
               '    self._record_initialization_failure(\n'
               '  File "/tmp/datahub/ingest/venv-mssql-0.9.0/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 129, in '
               '_record_initialization_failure\n'
               '    raise PipelineInitError(msg) from e\n'
               'datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure source (mssql)\n'
               '[2022-11-16 07:58:57,528] ERROR    {datahub.entrypoints:195} - Command failed: \n'
               '\tFailed to configure source (mssql) due to \n'
               "\t\t'No module named 'pyodbc''.\n"
               '\tRun with --debug to get full stacktrace.\n'
               "\te.g. 'datahub --debug ingest run -c /tmp/datahub/ingest/889d2fc9-8e50-4605-b9b9-a5af6db7fb08/recipe.yml --report-to "
               "/tmp/datahub/ingest/889d2fc9-8e50-4605-b9b9-a5af6db7fb08/ingestion_report.json'\n",
               "2022-11-16 07:58:58.677623 [exec_id=889d2fc9-8e50-4605-b9b9-a5af6db7fb08] INFO: Failed to execute 'datahub ingest'",
               '2022-11-16 07:58:58.677807 [exec_id=889d2fc9-8e50-4605-b9b9-a5af6db7fb08] INFO: Caught exception EXECUTING '
               'task_id=889d2fc9-8e50-4605-b9b9-a5af6db7fb08, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 168, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
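    (Note: the UI executor builds a fresh virtualenv per run — see venv-mssql-0.9.0 in the paths above — so a pyodbc installed by hand in the actions container does not carry over; if you need ODBC, pyodbc plus the Microsoft ODBC driver have to be baked into a custom actions image. A hedged alternative is to avoid ODBC entirely, since the mssql plugin ships the pure-Python pytds driver used when use_odbc is false. Host and credentials below are placeholders:)
    source:
        type: mssql
        config:
            use_odbc: 'False'
            host_port: 'myserver.database.windows.net:1433'
            database: mydb
            username: myuser
            password: mypass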

    billowy-pilot-93812

    11/16/2022, 10:54 AM
    Hi all, I'm ingesting metadata from Superset and getting this error. Any idea on this? Thank you
    '[2022-11-16 10:16:23,427] ERROR    {datahub.entrypoints:185} - File '
               '"/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/entrypoints.py", line 164, in main\n'
               '    161  def main(**kwargs):\n'
               '    162      # This wrapper prevents click from suppressing errors.\n'
               '    163      try:\n'
               '--> 164          sys.exit(datahub(standalone_mode=False, **kwargs))\n'
               '    165      except click.Abort:\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1130, in __call__\n'
               '    1128  def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:\n'
               ' (...)\n'
               '--> 1130      return self.main(*args, **kwargs)\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1055, in main\n'
               '    rv = self.invoke(ctx)\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke\n'
               '    return _process_result(sub_ctx.command.invoke(sub_ctx))\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke\n'
               '    return _process_result(sub_ctx.command.invoke(sub_ctx))\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1404, in invoke\n'
               '    return ctx.invoke(self.callback, **ctx.params)\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 760, in invoke\n'
               '    return __callback(*args, **kwargs)\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func\n'
               '    return f(get_current_context(), *args, **kwargs)\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper\n'
               '    290  def wrapper(*args: Any, **kwargs: Any) -> Any:\n'
               ' (...)\n'
               '    343                  "status": "error",\n'
               '    344                  "error": get_full_class_name(e),\n'
               '    345              },\n'
               '    346          )\n'
               '--> 347          raise e\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper\n'
               '    290  def wrapper(*args: Any, **kwargs: Any) -> Any:\n'
               ' (...)\n'
               '    295      telemetry_instance.ping(\n'
               '    296          "function-call", {"function": function, "status": "start"}\n'
               '    297      )\n'
               '    298      try:\n'
               '--> 299          res = func(*args, **kwargs)\n'
               '    300          telemetry_instance.ping(\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in '
               'wrapper\n'
               '    86   def wrapper(ctx: click.Context, *args: P.args, **kwargs: P.kwargs) -> Any:\n'
               ' (...)\n'
               '    91           )\n'
               '    92           _init_leak_detection()\n'
               '    93   \n'
               '    94       try:\n'
               '--> 95           return func(ctx, *args, **kwargs)\n'
               '    96       finally:\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 192, in run\n'
               '    103  def run(\n'
               '    104      ctx: click.Context,\n'
               '    105      config: str,\n'
               '    106      dry_run: bool,\n'
               '    107      preview: bool,\n'
               '    108      strict_warnings: bool,\n'
               '    109      preview_workunits: int,\n'
               '    110      test_source_connection: bool,\n'
               '    111      report_to: str,\n'
               '    112      no_default_report: bool,\n'
               '    113      no_spinner: bool,\n'
               '    114  ) -> None:\n'
               ' (...)\n'
               '    188          raw_pipeline_config,\n'
               '    189      )\n'
               '    190  \n'
               '    191      loop = asyncio.get_event_loop()\n'
               '--> 192      loop.run_until_complete(run_func_check_upgrade(pipeline))\n'
               '\n'
               'File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    610  def run_until_complete(self, future):\n'
               ' (...)\n'
               '    642          future.remove_done_callback(_run_until_complete_cb)\n'
               '    643      if not future.done():\n'
               "    644          raise RuntimeError('Event loop stopped before Future completed.')\n"
               '    645  \n'
               '--> 646      return future.result()\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 151, in '
               'run_func_check_upgrade\n'
               '    146  async def run_func_check_upgrade(pipeline: Pipeline) -> None:\n'
               '    147      version_stats_future = asyncio.ensure_future(\n'
               '    148          upgrade.retrieve_version_stats(pipeline.ctx.graph)\n'
               '    149      )\n'
               '    150      the_one_future = asyncio.ensure_future(run_pipeline_async(pipeline))\n'
               '--> 151      ret = await the_one_future\n'
               '    152  \n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 142, in run_pipeline_async\n'
               '    140  async def run_pipeline_async(pipeline: Pipeline) -> int:\n'
               '    141      loop = asyncio._get_running_loop()\n'
               '--> 142      return await loop.run_in_executor(\n'
               '    143          None, functools.partial(run_pipeline_to_completion, pipeline)\n'
               '\n'
               'File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run\n'
               '    53   def run(self):\n'
               '    54       if not self.future.set_running_or_notify_cancel():\n'
               '    55           return\n'
               '    56   \n'
               '    57       try:\n'
               '--> 58           result = self.fn(*self.args, **self.kwargs)\n'
               '    59       except BaseException as exc:\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 133, in '
               'run_pipeline_to_completion\n'
               '    117  def run_pipeline_to_completion(\n'
               '    118      pipeline: Pipeline, structured_report: Optional[str] = None\n'
               '    119  ) -> int:\n'
               ' (...)\n'
               '    129              )\n'
               '    130              logger.info(\n'
               '    131                  f"Sink ({pipeline.config.sink.type}) report:\\n{pipeline.sink.get_report().as_string()}"\n'
               '    132              )\n'
               '--> 133              raise e\n'
               '    134          else:\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 125, in '
               'run_pipeline_to_completion\n'
               '    117  def run_pipeline_to_completion(\n'
               '    118      pipeline: Pipeline, structured_report: Optional[str] = None\n'
               '    119  ) -> int:\n'
               ' (...)\n'
               '    121      with click_spinner.spinner(\n'
               '    122          beep=False, disable=no_spinner, force=False, stream=sys.stdout\n'
               '    123      ):\n'
               '    124          try:\n'
               '--> 125              pipeline.run()\n'
               '    126          except Exception as e:\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 344, in run\n'
               '    332  def run(self) -> None:\n'
               ' (...)\n'
               '    340              else DeadLetterQueueCallback(\n'
               '    341                  self.ctx, self.config.failure_log.log_config\n'
               '    342              )\n'
               '    343          )\n'
               '--> 344          for wu in itertools.islice(\n'
               '    345              self.source.get_workunits(),\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 354, in '
               'get_workunits\n'
               '    353  def get_workunits(self) -> Iterable[MetadataWorkUnit]:\n'
               '--> 354      yield from self.emit_dashboard_mces()\n'
               '    355      yield from self.emit_chart_mces()\n'
               '\n'
               'File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 263, in '
               'emit_dashboard_mces\n'
               '    247  def emit_dashboard_mces(self) -> Iterable[MetadataWorkUnit]:\n'
               ' (...)\n'
               '    259  \n'
               '    260          current_dashboard_page += 1\n'
               '    261  \n'
               '    262          payload = dashboard_response.json()\n'
               '--> 263          for dashboard_data in payload["result"]:\n'
               '    264              dashboard_snapshot = self.construct_dashboard_from_api_data(\n'
               '\n'
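    (Note: the traceback stops at payload["result"], i.e. the Superset /api/v1/dashboard response had no "result" key; that usually means the API returned an error body instead, most often an authentication or permissions problem with the configured user. A minimal recipe sketch for comparison; all values are placeholders:)
    source:
        type: superset
        config:
            connect_uri: 'http://superset:8088'
            username: admin
            password: admin
            provider: db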

    polite-ghost-91039

    11/16/2022, 12:20 PM
    Hi folks, I tried ingesting Trino, dbt, and Airflow, and it's working flawlessly. I am new to DataHub and am looking at custom source ingestion; even something like foo:bar would work. I tried the documentation for creating a custom source but couldn't follow much of it. I would appreciate it if someone could help me out with this, i.e. how to create a custom package. TIA
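    (Note: a minimal custom-source sketch; the module and class names are illustrative, not official. Save it as e.g. my_source.py somewhere on the CLI's PYTHONPATH, then reference it in a recipe as type: my_source.FooBarSource — verify the API against the custom-source docs for your version:)
    from typing import Iterable

    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.api.common import PipelineContext
    from datahub.ingestion.api.source import Source, SourceReport
    from datahub.ingestion.api.workunit import MetadataWorkUnit
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    class FooBarSource(Source):
        def __init__(self, config: dict, ctx: PipelineContext):
            super().__init__(ctx)
            self.config = config
            self.report = SourceReport()

        @classmethod
        def create(cls, config_dict: dict, ctx: PipelineContext) -> "FooBarSource":
            return cls(config_dict or {}, ctx)

        def get_workunits(self) -> Iterable[MetadataWorkUnit]:
            # Emit a single dataset with a description, just to prove the wiring works.
            mcp = MetadataChangeProposalWrapper(
                entityType="dataset",
                changeType=ChangeTypeClass.UPSERT,
                entityUrn=builder.make_dataset_urn("foo", "bar", "PROD"),
                aspectName="datasetProperties",
                aspect=DatasetPropertiesClass(description="Hello from a custom source"),
            )
            wu = MetadataWorkUnit(id="foo-bar-1", mcp=mcp)
            self.report.report_workunit(wu)
            yield wu

        def get_report(self) -> SourceReport:
            return self.report

        def close(self) -> None:
            pass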

    steep-family-13549

    11/16/2022, 12:52 PM
    Hi, I tried to integrate Great Expectations with DataHub. I have done all the steps but still get the error "Datasource my_datasource is not present in platform_instance_map". Two days ago my code was working fine and results showed up in the DataHub UI under Validations. Please help me with this issue
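    (Note: the DataHubValidationAction entry in the GE checkpoint takes a platform_instance_map that must mention each datasource by name; a hedged sketch, with the instance value as a placeholder:)
    action_list:
        - name: datahub_action
          action:
              module_name: datahub.integrations.great_expectations.action
              class_name: DataHubValidationAction
              server_url: 'http://localhost:8080'
              platform_instance_map:
                  my_datasource: my_platform_instance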

    steep-family-13549

    11/16/2022, 12:54 PM
    image.png

    microscopic-mechanic-13766

    11/16/2022, 1:28 PM
    Hello, so I am trying out transformers. This is my recipe:
    transformers:
        type: pattern_add_dataset_terms
        config:
            term_patter:
                rules:
                    '*metadata*':
                        - 'urn:li:glossaryTerm:metadata'
    source:
        type: postgres
        config:
            include_tables: true
            database: knoxdb
            password: <password>
            profiling:
                enabled: false
            host_port: 'postgresql:5432'
            include_views: true
            username: <username>
    And the error:
    ERROR    {datahub.entrypoints:182} - 1 validation error for PipelineConfig\n'
               'transformers\n'
               '  value is not a valid list (type=type_error.list)\n'
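    (Note: the PipelineConfig error is literal — transformers must be a YAML list, so the entry needs a leading dash; the key also looks misspelled (term_patter vs term_pattern). A hedged corrected sketch:)
    transformers:
        - type: pattern_add_dataset_terms
          config:
              term_pattern:
                  rules:
                      '*metadata*':
                          - 'urn:li:glossaryTerm:metadata'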

    colossal-smartphone-90274

    11/16/2022, 4:20 PM
    Hi, I am currently working on ingesting some on-premise Power BI data using the "powerbi-report-server" module. My problem is with the "graphql_url" attribute, since from what I have seen, on-premise Power BI doesn't have an endpoint for it. Thanks very much 🙂

    rich-state-73859

    11/16/2022, 6:55 PM
    Hi channel, I got
    Caused by: java.lang.ClassNotFoundException: datahub.shaded.org.apache.http.ssl.TrustStrategy
    when using datahub-protobuf-0.9.2.jar, but it works with datahub-protobuf-0.8.45.jar. Is there any solution for this?

    chilly-truck-63841

    11/16/2022, 9:40 PM
    Hey team - is support for dbt cloud available yet or coming soon? I recall an early Q4 timeline referenced in a previous town hall. Thanks!

    wonderful-egg-79350

    11/17/2022, 1:40 AM
    Hello all. I have a question about the csv-enricher. How do I ingest owners, tags, domains, etc. using the csv-enricher? I wrote the YAML and CSV and tried to ingest with the "datahub ingest -c test.yml" CLI, but an error pops up like the one below. My yml source file is also like this.
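    (Note: for comparison, a minimal csv-enricher sketch per the docs of that era; the file path and row values are placeholders, and the CSV header is the documented column set — arrays use the | delimiter:)
    source:
        type: csv-enricher
        config:
            filename: ./enrichment.csv
            write_semantics: PATCH
    # enrichment.csv
    resource,subresource,glossary_terms,tags,owners,ownership_type,description,domain
    "urn:li:dataset:(urn:li:dataPlatform:mysql,db.table,PROD)",,[urn:li:glossaryTerm:Metric],[urn:li:tag:PII],[urn:li:corpuser:jdoe],TECHNICAL_OWNER,new description,urn:li:domain:Marketing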

    thousands-branch-81757

    11/17/2022, 5:18 AM
    How can I set up and push metadata to a custom env? Currently there are only the pre-defined envs {'UAT', 'TEST', 'NON_PROD', 'STG', 'PROD', 'CORP', 'QA', 'EI', 'PRE', 'DEV'}, but I want to set up multiple envs for multiple tenants.

    bumpy-journalist-41369

    11/17/2022, 10:28 AM
    Hello. I am wondering whether it is possible to enrich data ingested from a source like S3 or Glue with the tags set on those buckets. For example, if an S3 bucket is tagged with a key-value pair like owner: foo@bar.gz, how do you tell the recipe to include this data and then display it in the Tags column? Attaching the current state of the ingested data (the column is currently empty, as you can see):
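    (Note: the glue source has flags for exactly this; a hedged sketch — flag availability depends on your CLI version, so verify against the glue/s3 source docs:)
    source:
        type: glue
        config:
            aws_region: us-east-1
            use_s3_bucket_tags: true
            use_s3_object_tags: true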

    best-wire-59738

    11/17/2022, 1:25 PM
    Hello team, I am facing a problem with ingestion from the UI. We recently upgraded our DataHub from 0.8.41 to 0.9.2. After creating a new ingestion pipeline, the corresponding pipeline and status do not show up in the UI, and if we try to run a previously successful ingestion pipeline, it goes into the Pending state. I have attached a screenshot. The actions pod is healthy, and we host DataHub on an EKS cluster in AWS.

    better-fireman-33387

    11/17/2022, 1:43 PM
    Hi, is it safe to run helm upgrade (specifically to increase GMS resources) while ingestion is running? Will it harm the running ingestion?

    lively-dusk-19162

    11/17/2022, 6:15 PM
    Hi, can anyone please help me with the following issue I faced while ingesting fine-grained lineage to DataHub: Failed to validate record with class com.linkedin.dataset.UpstreamLineage, ERROR :: /fineGrainedLineages/495/downstreams/0 :: "Provided urn urn:li:schemaField:(urn:li:dataPlatform:teradata,orders.agent_id,PROD) is invalid"
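    (Note: a schemaField urn must wrap a full dataset urn, e.g. urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:teradata,orders,PROD),agent_id), not a bare dataPlatform urn. A hedged sketch using the SDK urn builders; the dataset and field names are illustrative:)
    import datahub.emitter.mce_builder as builder

    dataset_urn = builder.make_dataset_urn(platform="teradata", name="orders", env="PROD")
    field_urn = builder.make_schema_field_urn(dataset_urn, "agent_id")
    # -> urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:teradata,orders,PROD),agent_id)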

    aloof-art-29270

    11/17/2022, 8:18 PM
    Hello team, I am trying to add different aspects at the dataset and field level, but I couldn't find the Python SDK calls to add a column-level description and a domain for a dataset. Can someone share a link or path to Python SDK examples for ingesting column descriptions and domains?
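    (Note: a hedged sketch using the emitter SDK; the class and builder names come from datahub.metadata.schema_classes and datahub.emitter.mce_builder, but verify them against the SDK docs for your version. Server, dataset, column, and domain values are placeholders:)
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DomainsClass,
        EditableSchemaFieldInfoClass,
        EditableSchemaMetadataClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")
    dataset_urn = builder.make_dataset_urn(platform="postgres", name="db.schema.table", env="PROD")

    # Column-level description, via the editableSchemaMetadata aspect.
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="editableSchemaMetadata",
            aspect=EditableSchemaMetadataClass(
                editableSchemaFieldInfo=[
                    EditableSchemaFieldInfoClass(fieldPath="my_column", description="What this column means")
                ]
            ),
        )
    )

    # Dataset-level domain, via the domains aspect.
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="domains",
            aspect=DomainsClass(domains=[builder.make_domain_urn("marketing")]),
        )
    )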

    billowy-pilot-93812

    11/18/2022, 9:29 AM
    Hi team, I'm ingesting metadata from Superset and Redshift. My Superset chart uses a Redshift table, but the Superset chart lineage can't map the two identical Redshift datasets together, and the Redshift dataset lineage doesn't show the Superset chart. Is there any solution? Thank you

    little-spring-72943

    11/18/2022, 10:47 AM
    Is there a way to have this text replaced with custom business-friendly text?

    bright-receptionist-94235

    11/18/2022, 1:36 PM
    Hi, if I need to ingest a few different MySQL instances, what flag needs to be used in the recipe so that each instance keeps its own metadata?
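    (Note: the usual answer is platform_instance in the source config — one distinct value per MySQL instance, so urns don't collide. A hedged sketch with placeholder values:)
    source:
        type: mysql
        config:
            host_port: 'mysql-a.internal:3306'
            platform_instance: instance_a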

    polite-monitor-47621

    11/18/2022, 2:54 PM
    Hi all! First-time caller, long-time listener. I had a question about dataFlows (or dataJobs, not sure). We use AWS Lambda for a lot of our batch extraction, writing to Redshift where the data is further processed. What we'd like in the lineage is a node for the Lambda upstream of Redshift (probably with the code in the description and a link to GitHub). I was trying to use this example, which gives me the nodes, but I'm not sure how to make the display name nice / add an icon + docs / make it searchable (and maybe eventually add runs as well). I'm not even sure this is the proper way to do it, or whether there's something super easy I'm missing; any help or examples from anyone who has done something similar would be appreciated!
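    (Note: a hedged sketch of one way to do this — model the Lambda as a dataJob and attach a display name, an external link, and output lineage to Redshift. Names, URLs, and urns are illustrative; verify the aspect classes against your SDK version:)
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DataJobInfoClass,
        DataJobInputOutputClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")
    job_urn = builder.make_data_job_urn(
        orchestrator="lambda", flow_id="batch_extract", job_id="extract_orders", cluster="PROD"
    )

    # Display name plus a link to the code; externalUrl renders as a link in the UI.
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataJob",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=job_urn,
            aspectName="dataJobInfo",
            aspect=DataJobInfoClass(
                name="Extract Orders (Lambda)",
                type="COMMAND",
                externalUrl="https://github.com/org/repo/blob/main/extract_orders.py",
            ),
        )
    )

    # Lineage: this job writes to a Redshift table.
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataJob",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=job_urn,
            aspectName="dataJobInputOutput",
            aspect=DataJobInputOutputClass(
                inputDatasets=[],
                outputDatasets=[builder.make_dataset_urn("redshift", "db.schema.orders", "PROD")],
            ),
        )
    )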

    lively-dusk-19162

    11/18/2022, 4:25 PM
    Hello team, I have a question about ingestion. Ingestion succeeds, but no assets are ingested after the success message.

    lively-dusk-19162

    11/18/2022, 4:25 PM
    For all of the platforms I am facing the same issue

    lively-dusk-19162

    11/18/2022, 4:26 PM
    Can anyone please quickly help me resolve this?

    melodic-book-17939

    11/18/2022, 9:36 PM
    Hi all, I am raulminon, new to Slack and to DataHub. I have been testing DataHub for two weeks in a local environment with docker-compose. Now, after recreating the containers and volumes (with all the required containers up), when I try to ingest the previously successful recipes using the CLI, I get a 401 Unauthorized message against DataHub GMS. Any clue? Thanks in advance. The message, the same for three distinct data sources, is along this line: "{'error': 'Unable to emit metadata to DataHub GMS', 'info': {'message': '401 Client Error: Unauthorized for url: http://localhost:8080/aspects?action=ingestProposal', 'id': 'urn:li:dataset:(urn:li:dataPlatform:postgres,illu_local.public.storage,PROD)'}}"
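    (Note: a 401 from GMS usually means metadata service authentication is enabled, in which case the recipe's sink needs a personal access token; a hedged sketch, with the token as a placeholder for one generated under Settings > Access Tokens:)
    sink:
        type: datahub-rest
        config:
            server: 'http://localhost:8080'
            token: <personal-access-token>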