# getting-started
a
Hello, we have set up DataHub in AWS as a POC so we can evaluate it using our own metadata. Other than some permissions issues that I am working on with my SRE team for other sources, I was able to get an MSSQL ingestion source set up to connect to one of our databases. It ran successfully a few times. I then added configuration to deny some schemas from being included. That ran successfully the first time, but as I added more schemas it eventually failed, and now it fails consistently even with that configuration removed. This is all done via the UI.
DataHub Version: v0.10.2
Configuration YAML:
source:
    type: mssql
    config:
        host_port: 'some.database:1433'
        env: DEV
        database: mydatabase
        include_views: true
        include_tables: true
        profiling:
            enabled: false
        stateful_ingestion:
            enabled: true
        username: DataHub_App
        password: '${PASSWORD_DEV}'
Error Log:
[2023-04-25 14:58:07,151] INFO     {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
{'total_records_written': 1153,
 'records_written_per_second': 98,
 'warnings': [],
 'failures': [],
 'start_time': '2023-04-25 14:57:55.502886 (11.65 seconds ago)',
 'current_time': '2023-04-25 14:58:07.151328 (now)',
 'total_duration_in_seconds': 11.65,
 'gms_version': 'v0.10.2',
 'pending_requests': 0}
[2023-04-25 14:58:07,339] ERROR    {datahub.entrypoints:195} - Command failed: 
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/entrypoints.py", line 182, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
    raise e
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
    ret = await the_one_future
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
    return await loop.run_in_executor(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
    raise e
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
    pipeline.run()
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 359, in run
    for wu in itertools.islice(
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/utilities/source_helpers.py", line 104, in auto_stale_entity_removal
    yield from stale_entity_removal_handler.gen_removed_entity_workunits()
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state/stale_entity_removal_handler.py", line 267, in gen_removed_entity_workunits
    last_checkpoint: Optional[Checkpoint] = self.source.get_last_checkpoint(
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state/stateful_ingestion_base.py", line 320, in get_last_checkpoint
    self.last_checkpoints[job_id] = self._get_last_checkpoint(
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state/stateful_ingestion_base.py", line 295, in _get_last_checkpoint
    self.ingestion_checkpointing_state_provider.get_latest_checkpoint(
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state_provider/datahub_ingestion_checkpointing_provider.py", line 76, in get_latest_checkpoint
    ] = self.graph.get_latest_timeseries_value(
  File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/site-packages/datahub/ingestion/graph/client.py", line 299, in get_latest_timeseries_value
    assert len(values) == 1
AssertionError
Thank you for the help.
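The traceback above ends at `assert len(values) == 1` inside `get_latest_timeseries_value`, which suggests the checkpoint lookup returned something other than exactly one value (for example, none at all). The sketch below is not DataHub's code: `latest_value_strict` and `latest_value_tolerant` are hypothetical stand-ins that only illustrate why a bare assertion on the result count aborts the whole run, and how a lookup that tolerates an empty result would behave instead.

```python
from typing import List, Optional


def latest_value_strict(values: List[dict]) -> dict:
    # Mirrors the shape of the failing check in the traceback:
    # anything other than exactly one value raises AssertionError.
    assert len(values) == 1
    return values[0]


def latest_value_tolerant(values: List[dict]) -> Optional[dict]:
    # Hypothetical alternative: treat "no value" as "no previous checkpoint"
    # instead of failing the run.
    if not values:
        return None
    return values[0]


if __name__ == "__main__":
    try:
        latest_value_strict([])
    except AssertionError:
        print("strict lookup failed on an empty result")
    print(latest_value_tolerant([]))
```

Whether the real failure was an empty result or multiple results cannot be told from the log alone, since the assertion carries no message.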
l
Hey there 👋 I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first:
✅ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution?
✅ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues?
Did you find a solution to your issue?
❌ Sorry you weren't able to find a solution. I'm sending you some tips on info you can provide to help the community troubleshoot. Whenever you feel your issue is solved, please react ✅ to your original message to let us know!
d
Hello Bill, thank you for reporting this. We're addressing this issue and fixing it in the next patch. In the meantime, we would appreciate it if you could open an issue ticket on this! 🙂
h
This is fixed in ingestion CLI version v0.10.2.1 and above. Could you please try the newer CLI version, as mentioned here, and see if it solves the problem for you?
f
Hi, sorry for the interruption, but I am facing the same issue. I changed the CLI version from 0.10.2 to 0.10.2.1, but it failed again with the same error. Thanks for your help 🙂.
a
Issue submitted: #7933
Like Yair, I tried 0.10.2.1 and still get the same error.
b
@hundreds-photographer-13496 Looks like we'll need another pass
h
Apologies, I mentioned the incorrect version earlier. This is fixed in version 0.10.2.2 by this PR.
a
Thanks, that worked.
r
@delightful-ram-75848 Hi Hyejin, I have changed the DataHub CLI version to 0.10.2.2 and 0.10.2.3, but I still get the same error. Here is my DataHub environment:
DataHub CLI version: 0.10.2.3
Python version: 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:49:35) 
[GCC 10.4.0]
Can you tell me what I can do to figure it out?
a
Have you verified that that's the actual CLI version being used in ingestion? You should see it printed in the ingestion logs.
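One way to check is the virtual environment path in the traceback: in the log earlier in this thread the frames come from `/tmp/datahub/ingest/venv-mssql-0.10.2/...`, which appears to carry the CLI version the run actually used. A minimal sketch, assuming that `venv-<source>-<version>` naming holds for your deployment; `cli_versions_in_log` is a hypothetical helper, not part of DataHub.

```python
import re
from typing import Set

# Assumption: UI-based ingestion runs from a venv named venv-<source>-<version>,
# as seen in the traceback paths in this thread.
VENV_PATTERN = re.compile(r"venv-[^/]+?-(\d+(?:\.\d+)+)/")


def cli_versions_in_log(log_text: str) -> Set[str]:
    """Return every version string found in venv paths in the log text."""
    return set(VENV_PATTERN.findall(log_text))


sample = (
    'File "/tmp/datahub/ingest/venv-mssql-0.10.2/lib/python3.10/'
    'site-packages/datahub/entrypoints.py", line 182, in main'
)
print(cli_versions_in_log(sample))
```

If the set still shows the old version after you changed the CLI version on the ingestion source, the new setting is probably not being picked up by the executor.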