witty-butcher-82399
04/21/2022, 9:59 AM0.8.33
and we have found this exception recurring across different connectors:
[2022-04-21 09:40:45,039] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
Any idea what it could be?
dazzling-judge-80093
04/21/2022, 10:00 AM
witty-butcher-82399
04/21/2022, 10:03 AM
witty-butcher-82399
04/21/2022, 10:05 AM
witty-butcher-82399
04/21/2022, 10:20 AM--debug flag
[2022-04-21 10:15:46,579] DEBUG {datahub.ingestion.source.sql.snowflake:481} - Upstream lineage of 'avalanche_dev.dwh_bridge.b_xiti_traffic': ['urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_core_green.f_xiti_daily_by_level2,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_core_green.f_xiti_daily_by_site,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_core_green.f_xiti_monthly_by_level2,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_core_green.f_xiti_monthly_by_site,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_core_green.f_xiti_weekly_by_level2,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_core_green.f_xiti_weekly_by_site,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_parameter.lu_vertical,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_parameter.p_xiti_daily_by_level2_corrections,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_parameter.p_xiti_daily_by_site_corrections,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_parameter.p_xiti_monthly_by_level2_corrections,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_parameter.p_xiti_monthly_by_site_corrections,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_parameter.p_xiti_weekly_by_level2_corrections,DEV)', 'urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_parameter.p_xiti_weekly_by_site_corrections,DEV)']
[2022-04-21 10:15:46,626] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit snowflake-urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_bridge.b_xiti_traffic,DEV)-upstreamLineage
[2022-04-21 10:15:46,676] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit avalanche_dev.dwh_bridge.b_xiti_traffic
[2022-04-21 10:15:46,722] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit avalanche_dev.dwh_bridge.b_xiti_traffic-subtypes
2022-04-21 10:15:46,722 INFO sqlalchemy.engine.base.Engine SHOW /* sqlalchemy:get_view_names */ VIEWS IN dwh_bridge
[2022-04-21 10:15:46,722] INFO {sqlalchemy.engine.base.Engine:110} - SHOW /* sqlalchemy:get_view_names */ VIEWS IN dwh_bridge
2022-04-21 10:15:46,722 INFO sqlalchemy.engine.base.Engine {}
[2022-04-21 10:15:46,722] INFO {sqlalchemy.engine.base.Engine:110} - {}
[2022-04-21 10:15:46,857] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 10:15:46,858] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 10:15:46,858] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 10:15:46,859] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
2022-04-21 10:15:46,859 INFO sqlalchemy.engine.base.Engine SHOW /* sqlalchemy:get_table_names */ TABLES IN dwh_core_ad_content_red
[2022-04-21 10:15:46,859] INFO {sqlalchemy.engine.base.Engine:110} - SHOW /* sqlalchemy:get_table_names */ TABLES IN dwh_core_ad_content_red
2022-04-21 10:15:46,859 INFO sqlalchemy.engine.base.Engine {}
[2022-04-21 10:15:46,859] INFO {sqlalchemy.engine.base.Engine:110} - {}
2022-04-21 10:15:46,968 INFO sqlalchemy.engine.base.Engine SHOW /* sqlalchemy:_get_schema_primary_keys */PRIMARY KEYS IN SCHEMA avalanche_dev.dwh_core_ad_content_red
2022-04-21 10:15:46,968 INFO sqlalchemy.engine.base.Engine {}
[2022-04-21 10:15:46,968] INFO {sqlalchemy.engine.base.Engine:110} - SHOW /* sqlalchemy:_get_schema_primary_keys */PRIMARY KEYS IN SCHEMA avalanche_dev.dwh_core_ad_content_red
[2022-04-21 10:15:46,968] INFO {sqlalchemy.engine.base.Engine:110} - {}
2022-04-21 10:15:47,071 INFO sqlalchemy.engine.base.Engine
SELECT /* sqlalchemy:_get_schema_columns */
Not sure if this can tell you where it comes from 😅
modern-artist-55754
04/21/2022, 12:11 PM
modern-artist-55754
04/21/2022, 12:22 PMSHOW /* sqlalchemy:get_view_names */ VIEWS IN dwh_bridge
in your Snowflake using the same account you use for DataHub ingestion?
witty-butcher-82399
04/21/2022, 12:45 PM
witty-butcher-82399
04/21/2022, 1:51 PM>>> from sqlalchemy import create_engine
>>> engine = create_engine('snowflake://XXXX:YYY@ZZZZ')
>>> connect = engine.connect()
>>> results = connect.execute("SHOW /* sqlalchemy:get_view_names */ VIEWS IN dwh_bridge").fetchone()
>>> print(results)
None
and got no error when running the SHOW VIEWS query, so there is no permission issue
witty-butcher-82399
04/21/2022, 1:54 PMFailed to extract some records due to: 'NoneType' object has no attribute 'group'
This error looks to me like calling the `group` method on the match result of a regular expression (the `NoneType` suggests there was no match)
witty-butcher-82399
04/21/2022, 2:02 PMdatahub@demo-ingestion-snowflake-willhaben-manual-wgn-r2dwn:/$ /usr/local/bin/python -m pdb /usr/local/bin/datahub ingest -c /etc/recipe/recipe.yaml
> /usr/local/bin/datahub(3)<module>()
-> import re
(Pdb) b /datahub-ingestion/src/datahub/ingestion/run/pipeline.py:210
Breakpoint 1 at /datahub-ingestion/src/datahub/ingestion/run/pipeline.py:210
(Pdb) b /datahub-ingestion/build/lib/datahub/ingestion/run/pipeline.py:210
Breakpoint 2 at /datahub-ingestion/build/lib/datahub/ingestion/run/pipeline.py:210
(Pdb) c
[2022-04-21 13:58:15,270] INFO {datahub.cli.ingest_cli:96} - DataHub CLI version: 0.8.33.post1.dev0+b84ccb6
[2022-04-21 13:58:20,916] INFO {datahub.ingestion.source_config.sql.snowflake:107} - using authenticator type 'DEFAULT_AUTHENTICATOR'
/usr/local/lib/python3.8/site-packages/datahub/ingestion/transformer/add_dataset_browse_path.py:33: DeprecationWarning: Call to deprecated class DatasetTransformer. (Legacy transformer that supports transforming MCE-s using transform_one method. Use BaseTransformer directly and implement the transform_aspect method)
return cls(config, ctx)
/usr/local/lib/python3.8/site-packages/datahub/ingestion/transformer/add_dataset_ownership.py:174: DeprecationWarning: Call to deprecated class DatasetTransformer. (Legacy transformer that supports transforming MCE-s using transform_one method. Use BaseTransformer directly and implement the transform_aspect method)
return cls(config, ctx)
[2022-04-21 13:58:21,057] INFO {datahub.cli.ingest_cli:112} - Starting metadata ingestion
[2022-04-21 13:58:21,062] INFO {datahub.ingestion.source.sql.snowflake:89} - Checking current version
[2022-04-21 13:58:23,923] INFO {datahub.ingestion.source.sql.snowflake:106} - Current role is META_DATA_READER
[2022-04-21 13:58:23,923] INFO {datahub.ingestion.source.sql.snowflake:110} - Checking grants for role META_DATA_READER
[2022-04-21 13:58:33,597] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,598] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,599] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,681] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,683] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,684] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,685] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,875] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,877] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,879] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:33,881] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
[2022-04-21 13:58:36,485] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit container-urn:li:container:fdb05ecbd6619a97ff103dafe85caf0b-to-urn:li:dataset:(urn:li:dataPlatform:snowflake,avalanche_dev.dwh_bridge.b_ad_active_ads,DEV)
[2022-04-21 13:58:44,387] INFO {datahub.ingestion.source.sql.snowflake:406} - A total of 12395 Table->Table edges found for 4102 downstream tables.
^C
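The log line reports only the exception message, which is why the output above does not show where the error originates. A minimal sketch (not DataHub's code; `failing_transform` and the logger name `traceback_demo` are hypothetical) of how logging the traceback alongside the message would expose the failing frame:

```python
import io
import logging
import re

# Capture log output in memory so the two styles can be compared.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("traceback_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.ERROR)
logger.propagate = False


def failing_transform(urn: str) -> str:
    # Hypothetical stand-in for a transform step: the pattern never matches,
    # so re.search returns None and .group raises AttributeError.
    return re.search(r"no-match", urn).group(0)


try:
    failing_transform("urn:li:dataset:(example)")
except Exception as e:
    # Message only: same shape as the ERROR lines in the log above.
    logger.error(f"Failed to extract some records due to: {e}")
    # Message plus traceback: includes the file, line and function that raised.
    logger.exception("Failed to extract some records")

log_output = log_stream.getvalue()
print(log_output)
```

The second record carries a `Traceback (most recent call last):` section naming `failing_transform`, which is the information the message-only line leaves out.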
witty-butcher-82399
04/21/2022, 2:21 PMpython -m trace
It seems the problem is a custom transform that we apply in most of our recipes; for some reason it is failing with the new version. CC: @quick-pizza-8906
add_custom_dataplatform.py(77): full_name = result.group(2)
Sharing here the trick in case someone is in a similar situation. Thanks @modern-artist-55754 @dazzling-judge-80093 for the support and sorry for the false alarm.
datahub@demo-ingestion-snowflake-willhaben-manual-77v-fxsj6:/$ python -m trace -t /usr/local/bin/datahub ingest -c /etc/recipe/recipe.yaml | grep -C 10 "Failed to extract some records due to:"
[2022-04-21 14:13:54,824] INFO {datahub.cli.ingest_cli:96} - DataHub CLI version: 0.8.33.post1.dev0+b84ccb6
[2022-04-21 14:14:31,074] INFO {datahub.ingestion.source_config.sql.snowflake:107} - using authenticator type 'DEFAULT_AUTHENTICATOR'
/usr/local/lib/python3.8/site-packages/datahub/ingestion/transformer/add_dataset_browse_path.py:33: DeprecationWarning: Call to deprecated class DatasetTransformer. (Legacy transformer that supports transforming MCE-s using transform_one method. Use BaseTransformer directly and implement the transform_aspect method)
return cls(config, ctx)
/usr/local/lib/python3.8/site-packages/datahub/ingestion/transformer/add_dataset_ownership.py:174: DeprecationWarning: Call to deprecated class DatasetTransformer. (Legacy transformer that supports transforming MCE-s using transform_one method. Use BaseTransformer directly and implement the transform_aspect method)
return cls(config, ctx)
[2022-04-21 14:14:31,321] INFO {datahub.cli.ingest_cli:112} - Starting metadata ingestion
[2022-04-21 14:14:31,375] INFO {datahub.ingestion.source.sql.snowflake:89} - Checking current version
%6|1650550495.245|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://kafka-rapidpaper-internal.storage.mpi-internal.com:9]: sasl_ssl://kafka-rapidpaper-internal.storage.mpi-internal.com:9094/bootstrap: Disconnected (after 59623ms in state UP)
%6|1650550495.246|FAIL|rdkafka#producer-2| [thrd:sasl_ssl://kafka-rapidpaper-internal.storage.mpi-internal.com:9]: sasl_ssl://kafka-rapidpaper-internal.storage.mpi-internal.com:9094/bootstrap: Disconnected (after 59619ms in state UP)
[2022-04-21 14:14:55,966] INFO {datahub.ingestion.source.sql.snowflake:106} - Current role is META_DATA_READER
[2022-04-21 14:14:55,967] INFO {datahub.ingestion.source.sql.snowflake:110} - Checking grants for role META_DATA_READER
enum.py(635): if type(value) is cls:
enum.py(640): try:
enum.py(641): return cls._value2member_map_[value]
re.py(306): if len(_cache) >= _MAXCACHE:
re.py(308): try:
re.py(309): del _cache[next(iter(_cache))]
re.py(312): _cache[type(pattern), pattern, flags] = p
re.py(313): return p
add_custom_dataplatform.py(77): full_name = result.group(2)
pipeline.py(209): except Exception as e:
pipeline.py(210): logger.error(f"Failed to extract some records due to: {e}")
--- modulename: __init__, funcname: error
__init__.py(1474): if self.isEnabledFor(ERROR):
--- modulename: __init__, funcname: isEnabledFor
__init__.py(1693): if self.disabled:
__init__.py(1696): try:
__init__.py(1697): return self._cache[level]
__init__.py(1698): except KeyError:
__init__.py(1699): _acquireLock()
--- modulename: __init__, funcname: _acquireLock
__init__.py(224): if _lock:
[2022-04-21 14:15:24,773] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
--
--- modulename: add_custom_dataplatform, funcname: extract_dataset_from_urn
add_custom_dataplatform.py(76): result = re.search(r"(^urn:li:dataset:)\(([^)]+)\)", urn)
--- modulename: re, funcname: search
re.py(201): return _compile(pattern, flags).search(string)
--- modulename: re, funcname: _compile
re.py(291): if isinstance(flags, RegexFlag):
re.py(293): try:
re.py(294): return _cache[type(pattern), pattern, flags]
add_custom_dataplatform.py(77): full_name = result.group(2)
pipeline.py(209): except Exception as e:
pipeline.py(210): logger.error(f"Failed to extract some records due to: {e}")
--- modulename: __init__, funcname: error
__init__.py(1474): if self.isEnabledFor(ERROR):
--- modulename: __init__, funcname: isEnabledFor
__init__.py(1693): if self.disabled:
__init__.py(1696): try:
__init__.py(1697): return self._cache[level]
__init__.py(1475): self._log(ERROR, msg, args, **kwargs)
--- modulename: __init__, funcname: _log
__init__.py(1571): sinfo = None
__init__.py(1572): if _srcfile:
[2022-04-21 14:15:24,783] ERROR {datahub.ingestion.run.pipeline:210} - Failed to extract some records due to: 'NoneType' object has no attribute 'group'
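The two traced lines (76 and 77) are enough to reproduce the message in isolation. The sketch below is a hypothetical reconstruction: only the `re.search` call and the `result.group(2)` access come from the trace, and the rest of the function body is assumed. The URNs are copied from the logs earlier in this thread. Whether non-dataset URNs such as the container URN are what actually reaches the transform in the new version is an assumption, not something the thread confirms.

```python
import re

# Pattern copied from the trace line add_custom_dataplatform.py(76).
DATASET_URN_PATTERN = r"(^urn:li:dataset:)\(([^)]+)\)"


def extract_dataset_from_urn(urn: str) -> str:
    # Hypothetical reconstruction of lines 76-77 of the custom transform;
    # the real function body is not shown in the thread.
    result = re.search(DATASET_URN_PATTERN, urn)
    return result.group(2)  # raises AttributeError when result is None


# Both URNs appear in the ingestion logs above.
dataset_urn = (
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
    "avalanche_dev.dwh_bridge.b_xiti_traffic,DEV)"
)
container_urn = "urn:li:container:fdb05ecbd6619a97ff103dafe85caf0b"

# A dataset URN matches, so group(2) is the text inside the parentheses.
print(extract_dataset_from_urn(dataset_urn))

# Any URN that does not start with "urn:li:dataset:(" gives result = None.
try:
    extract_dataset_from_urn(container_urn)
except AttributeError as e:
    print(f"Failed to extract some records due to: {e}")
```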
big-carpet-38439
04/21/2022, 4:20 PM
square-activity-64562
04/21/2022, 4:36 PM
```
--- modulename: add_custom_dataplatform, funcname: extract_dataset_from_urn
add_custom_dataplatform.py(76): result = re.search(r"(^urn:li:dataset:)\(([^)]+)\)", urn)
```
I will try to see if I can add some unit tests to reliably reproduce it and change the regex
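A minimal sketch of what such unit tests could look like, assuming a guarded variant of the function. The function body, the `Optional[str]` return type and the test class name are hypothetical. The pattern is left as it appears in the trace, because the thread does not say how the regex should change.

```python
import re
import unittest
from typing import Optional

# Pattern as shown in the trace; any change to it is still an open question.
DATASET_URN_PATTERN = re.compile(r"(^urn:li:dataset:)\(([^)]+)\)")


def extract_dataset_from_urn(urn: str) -> Optional[str]:
    """Return the text inside the dataset URN parentheses, or None if no match."""
    result = DATASET_URN_PATTERN.search(urn)
    if result is None:
        # Guard: report "no match" to the caller instead of raising AttributeError.
        return None
    return result.group(2)


class ExtractDatasetFromUrnTest(unittest.TestCase):
    def test_dataset_urn_is_parsed(self):
        urn = (
            "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
            "avalanche_dev.dwh_bridge.b_xiti_traffic,DEV)"
        )
        self.assertEqual(
            extract_dataset_from_urn(urn),
            "urn:li:dataPlatform:snowflake,avalanche_dev.dwh_bridge.b_xiti_traffic,DEV",
        )

    def test_non_dataset_urn_returns_none(self):
        urn = "urn:li:container:fdb05ecbd6619a97ff103dafe85caf0b"
        self.assertIsNone(extract_dataset_from_urn(urn))


# Run the tests in-process without calling sys.exit.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExtractDatasetFromUrnTest)
test_result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Returning `None` is only one option; the caller then has to decide whether to skip the record or pass it through unchanged.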
square-activity-64562
04/21/2022, 4:55 PM
square-activity-64562
04/21/2022, 4:59 PM