# ingestion
s
Hi, I am trying to use the Druid ingestion library, but I am getting this error. (Python version 3.8 / DataHub version 0.8.6)
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/datahub/ingestion/source/sql_common.py", line 62, in make_sqlalchemy_uri
    40   def make_sqlalchemy_uri(
    41       scheme: str,
    42       username: Optional[str],
    43       password: Optional[str],
    44       at: Optional[str],
    45       db: Optional[str],
    46       uri_opts: Optional[Dict[str, Any]] = None,
    47   ) -> str:
 (...)
    58       if uri_opts is not None:
    59           if db is None:
    60               url += "/"
    61           params = "&".join(
--> 62               f"{key}={quote_plus(value)}" for (key, value) in uri_opts.items() if value
    63           )

AttributeError: 'DruidConfig' object has no attribute 'items'
b
Seems like the recipe you're using is missing a value somewhere? Though I'm not sure what the proper recipe is.
s
source:
  type: druid
  config:
    env: PROD
    host_port: druid-broker-endpoint:8082
    schema_pattern:
      deny:
        - "^(lookup|sys).*"

sink:
  type: "datahub-rest"
  config:
    server: "http://datahub-datahub-gms.catalog-production.svc.cluster.local:8080"
Hi, thanks for the answer @better-orange-49102. I just checked the recipe I used again, but it looks the same as the one in the documentation.
AttributeError: 'DruidConfig' object has no attribute 'items'
To me, it looks like a language-level or class-level (DruidConfig) problem, since the error says it failed to find an 'items' method on the 'DruidConfig' object.
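The failure mode can be reproduced in isolation. The sketch below is not the DataHub source: `build_params` is a hypothetical helper modeled only on the lines visible in the traceback, and `FakeConfig` is a made-up stand-in for `DruidConfig`. It assumes the config object itself was passed where a dict of URI options was expected, which is what the error message suggests but the thread does not confirm.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote_plus


def build_params(uri_opts: Optional[Dict[str, Any]]) -> str:
    """Hypothetical helper mirroring the join seen in the traceback."""
    if uri_opts is None:
        return ""
    # .items() only exists on mapping types, so a plain object fails here.
    return "&".join(
        f"{key}={quote_plus(value)}" for (key, value) in uri_opts.items() if value
    )


class FakeConfig:
    """Made-up stand-in for a config object that is not a dict."""

    host_port = "druid-broker-endpoint:8082"


# A real dict is joined into a query string.
print(build_params({"header": "true"}))

# A non-dict object raises the same kind of AttributeError as in the traceback.
try:
    build_params(FakeConfig())  # type: ignore[arg-type]
except AttributeError as err:
    print(f"AttributeError: {err}")
```

If that assumption holds, the recipe itself would be fine and the fix would belong in the library code that builds the URI.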
g
@salmon-cricket-21860 I’ve found the cause of the issue here and will put up a PR to fix it shortly
❤️ 1
s
Thanks! :)
g
@salmon-cricket-21860 this PR https://github.com/linkedin/datahub/pull/2882 has been merged and is included in the 0.8.6.1 release of acryl-datahub
🙌 1
s
Thanks for the quick patch!! 🙂
Hi, I tried 0.8.6.1 and was able to fetch the datasources on the Druid cluster, but SQLAlchemy is now throwing an error like this:
...

[2021-07-15 23:29:26,537] ERROR    {datahub.entrypoints:106} - File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1215, in _fetchone_impl
    1213  def _fetchone_impl(self):
    1214      try:
--> 1215          return self.cursor.fetchone()
    1216      except AttributeError as err:
    ..................................................
     self = <sqlalchemy.engine.result.ResultProxy object at 0x7fc7b0f612e0>
     self.cursor.fetchone = # AttributeError
          self.cursor = None
    ..................................................

AttributeError: 'NoneType' object has no attribute 'fetchone'

---- (full traceback above) ----
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/datahub/entrypoints.py", line 98, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
File "/home/jovyan/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
File "/home/jovyan/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
File "/home/jovyan/.local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/jovyan/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
File "/home/jovyan/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/datahub/entrypoints.py", line 85, in ingest
    pipeline.run()
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
    for wu in self.source.get_workunits():
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/datahub/ingestion/source/sql_common.py", line 280, in get_workunits
    yield from self.loop_tables(inspector, schema, sql_config)
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/datahub/ingestion/source/sql_common.py", line 300, in loop_tables
    columns = inspector.get_columns(table, schema)
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 390, in get_columns
    col_defs = self.dialect.get_columns(
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/pydruid/db/sqlalchemy.py", line 178, in get_columns
    return [
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/pydruid/db/sqlalchemy.py", line 178, in <listcomp>
    return [
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1010, in __iter__
    row = self.fetchone()
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1343, in fetchone
    self.connection._handle_dbapi_exception(
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1336, in fetchone
    row = self._fetchone_impl()
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1217, in _fetchone_impl
    return self._non_result(None, err)
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1236, in _non_result
    util.raise_(
File "/home/jovyan/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception

ResourceClosedError: This result object does not return rows. It has been closed automatically.
The Druid cluster version is 0.20.0. Please let me know if you need more information.
g
Could you run with
datahub --debug ingest ...
s
Hi, I tested again today with the debug option. Here are more logs:
[2021-07-16 09:13:28,664] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit prod_web_expr_event_stream_r0
2021-07-16 09:13:28,664 INFO sqlalchemy.engine.base.Engine 
            SELECT COLUMN_NAME,
                   DATA_TYPE,
                   IS_NULLABLE,
                   COLUMN_DEFAULT
              FROM INFORMATION_SCHEMA.COLUMNS
             WHERE TABLE_NAME = 'test'
         AND TABLE_SCHEMA = 'druid'
[2021-07-16 09:13:28,664] INFO     {sqlalchemy.engine.base.Engine:110} - 
            SELECT COLUMN_NAME,
                   DATA_TYPE,
                   IS_NULLABLE,
                   COLUMN_DEFAULT
              FROM INFORMATION_SCHEMA.COLUMNS
             WHERE TABLE_NAME = 'test'
         AND TABLE_SCHEMA = 'druid'
2021-07-16 09:13:28,664 INFO sqlalchemy.engine.base.Engine {}
[2021-07-16 09:13:28,664] INFO     {sqlalchemy.engine.base.Engine:110} - {}
It seems to be failing at this part, on the 'test' table. Our Druid cluster doesn't have any 'test' datasource. Maybe it was registered in the Druid RDS and removed previously. Anyway, after modifying the recipe to deny the table pattern 'test', I was able to ingest the Druid tables 🙂 Thanks Harshal Sheth!
Source (druid) report:
{'failures': {},
'filtered': ['test', 'test2', 'lookup.*', 'sys.*'],
'tables_scanned': 13,
'views_scanned': 0,
'warnings': {},
'workunit_ids': ['prod_app_event_cancel_stream_r0',
'prod_app_event_order_stream_r0',
'prod_app_event_view_stream_r0',
'prod_app_experiment_event_stream_r0',
'prod_server_event_view_stream_r0',
'prod_server_inventory_payload_stream_r0',
'prod_web_event_cancel_stream_r0',
'prod_web_event_order_stream_r0',
'prod_web_event_view_stream_r0',
'prod_web_experiment_event_stream_r0',
'prod_web_expr_event_stream_r0'],
'workunits_produced': 11}
Sink (console) report:
{'failures': [], 'records_written': 11, 'warnings': []}
Pipeline finished successfully
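For readers who want to check a deny pattern before running the pipeline, here is a minimal sketch. It assumes the patterns are applied as regexes matched from the start of the name (Python `re.match` semantics). The first regex is the schema deny pattern from the recipe above; `^test` is an assumed pattern for the stray tables, since the modified recipe was not posted. `is_allowed` is a hypothetical helper, not a DataHub function.

```python
import re
from typing import Iterable

# First pattern: from the recipe in this thread.
# Second pattern: an assumption, standing in for the unposted table deny rule.
DENY_PATTERNS = (r"^(lookup|sys).*", r"^test")


def is_allowed(name: str, deny: Iterable[str] = DENY_PATTERNS) -> bool:
    """Return False when any deny regex matches at the start of the name."""
    return not any(re.match(pattern, name) for pattern in deny)


names = ["test", "test2", "sys", "prod_web_event_view_stream_r0"]
kept = [name for name in names if is_allowed(name)]
print(kept)
```

Note that `^test` would also drop any real datasource whose name starts with "test", so a stricter pattern such as `^test$` may be safer if that is a concern.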
One suggestion: if a datasource that is looked up does not return any columns, it would be better to raise a more specific error in this case, like
test table doesn't have column information
g
@salmon-cricket-21860 the druid test table issue is quite odd - not sure where it got that from
I’ll make it issue a warning/error message if it finds a table with no columns - thanks for the feedback!
❤️ 1
🙂 1
@salmon-cricket-21860 added this PR https://github.com/linkedin/datahub/pull/2912 - I kept it a warning since it’s possible to have a valid table with no columns
🙌 1
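The idea of warning instead of crashing can be sketched roughly as follows. This is not the code from the PR: `columns_or_warn` is a hypothetical wrapper, and `get_columns` is any callable with the same `(table, schema)` shape as the `inspector.get_columns` call in the traceback. Catching a broad `Exception` is an assumption made to keep the sketch independent of the driver's error types.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def columns_or_warn(
    get_columns: Callable[[str, str], List[Dict[str, Any]]],
    table: str,
    schema: str,
) -> List[Dict[str, Any]]:
    """Fetch columns; log a warning and return [] if none can be read."""
    try:
        columns = list(get_columns(table, schema))
    except Exception as err:  # broad on purpose, see the note above
        logger.warning(
            "unable to get column information for %s.%s: %s", schema, table, err
        )
        return []
    if not columns:
        # A table with no columns may still be valid, so warn rather than fail.
        logger.warning("%s.%s has no column information", schema, table)
    return columns


def failing_lookup(table: str, schema: str) -> List[Dict[str, Any]]:
    """Stub that imitates a lookup which returns no rows."""
    raise RuntimeError("This result object does not return rows.")


print(columns_or_warn(failing_lookup, "test", "druid"))
```

With a wrapper like this, one broken table is reported and skipped while the remaining tables are still ingested.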
s
Okay, Thanks for the PR.
m
Hi @salmon-cricket-21860, good morning. I am facing the same issue you described above, with the error
Tables error: This result object does not return rows
In my case, the Druid SQL API (or DataHub) is only able to query system tables. The query below doesn't return any tables from the druid schema:
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'druid'
But these queries work fine and return results:
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'sys'
Did you have to do any additional configuration or setup on Druid for DataHub to be able to query the druid schema tables?
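One way to narrow this down is to send the same query straight to Druid's SQL HTTP endpoint, bypassing DataHub and SQLAlchemy. The sketch below uses only the standard library. It assumes the broker is reachable at the `host_port` from the recipe earlier in this thread over plain HTTP with no authentication; `build_sql_request` and `run_sql` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, List


def build_sql_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build a POST request for Druid's SQL endpoint (/druid/v2/sql)."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(base_url: str, sql: str) -> List[Any]:
    """Send the query and decode the JSON response."""
    request = build_sql_request(base_url, sql)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


QUERY = (
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = 'druid'"
)

# Example call, left commented out because it needs a reachable broker:
# print(run_sql("http://druid-broker-endpoint:8082", QUERY))
```

If this also returns an empty list, the cause is more likely on the Druid side (for example, no datasources visible to that broker) than in the DataHub recipe.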