# ingestion

better-insurance-34701 (09/29/2022, 4:19 AM)
Hi, I'm on 0.8.45 and encountered this error when ingesting a dbt source, please give me some help:
```
datahub --debug ingest -c /git/dwh_dev/datahub.yml
[2022-09-29 11:13:51,202] DEBUG    {datahub.telemetry.telemetry:210} - Sending init Telemetry
[2022-09-29 11:13:52,261] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-29 11:13:52,726] INFO     {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.8.45
[2022-09-29 11:13:52,746] DEBUG    {datahub.cli.ingest_cli:196} - Using config: {'source': {'type': 'dbt', 'config': {'manifest_path': '/git/dwh_dev/target/manifest.json', 'catalog_path': '/git/dwh_dev/target/catalog.json', 'test_results_path': '/git/dwh_dev/target/run_results.json', 'target_platform': 'postgres', 'load_schemas': False, 'meta_mapping': {'business_owner': {'match': '.*', 'operation': 'add_owner', 'config': {'owner_type': 'user', 'owner_category': 'BUSINESS_OWNER'}}, 'data_steward': {'match': '.*', 'operation': 'add_owner', 'config': {'owner_type': 'user', 'owner_category': 'DATA_STEWARD'}}, 'technical_owner': {'match': '.*', 'operation': 'add_owner', 'config': {'owner_type': 'user', 'owner_category': 'TECHNICAL_OWNER'}}, 'has_pii': {'match': True, 'operation': 'add_tag', 'config': {'tag': 'has_pii'}}, 'data_governance.team_owner': {'match': 'Finance', 'operation': 'add_term', 'config': {'term': 'Finance_test'}}, 'source': {'match': '.*', 'operation': 'add_tag', 'config': {'tag': '{{ $match }}'}}}, 'query_tag_mapping': {'tag': {'match': '.*', 'operation': 'add_tag', 'config': {'tag': '{{ $match }}'}}}}}}
[2022-09-29 11:13:52,814] DEBUG    {datahub.ingestion.sink.datahub_rest:116} - Setting env variables to override config
[2022-09-29 11:13:52,814] DEBUG    {datahub.ingestion.sink.datahub_rest:118} - Setting gms config
[2022-09-29 11:13:52,814] DEBUG    {datahub.ingestion.run.pipeline:174} - Sink type:datahub-rest,<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'> configured
[2022-09-29 11:13:52,814] INFO     {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080
[2022-09-29 11:13:52,818] DEBUG    {datahub.ingestion.sink.datahub_rest:116} - Setting env variables to override config
[2022-09-29 11:13:52,818] DEBUG    {datahub.ingestion.sink.datahub_rest:118} - Setting gms config
[2022-09-29 11:13:52,818] DEBUG    {datahub.ingestion.reporting.datahub_ingestion_run_summary_provider:120} - Ingestion source urn = urn:li:dataHubIngestionSource:cli-151c2b7711eb626e440af8c75a9082e9
[2022-09-29 11:13:52,819] DEBUG    {datahub.emitter.rest_emitter:247} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.28.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubIngestionSource", "entityUrn": "urn:li:dataHubIngestionSource:cli-151c2b7711eb626e440af8c75a9082e9", "changeType": "UPSERT", "aspectName": "dataHubIngestionSourceInfo", "aspect": {"value": "{\"name\": \"[CLI] dbt\", \"type\": \"dbt\", \"platform\": \"urn:li:dataPlatform:unknown\", \"config\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"dbt\\\", \\\"config\\\": {\\\"manifest_path\\\": \\\"${DBT_PROJECT_ROOT}/target/manifest.json\\\", \\\"catalog_path\\\": \\\"${DBT_PROJECT_ROOT}/target/catalog.json\\\", \\\"test_results_path\\\": \\\"${DBT_PROJECT_ROOT}/target/run_results.json\\\", \\\"target_platform\\\": \\\"postgres\\\", \\\"load_schemas\\\": false, \\\"meta_mapping\\\": {\\\"business_owner\\\": {\\\"match\\\": \\\".*\\\", \\\"operation\\\": \\\"add_owner\\\", \\\"config\\\": {\\\"owner_type\\\": \\\"user\\\", \\\"owner_category\\\": \\\"BUSINESS_OWNER\\\"}}, \\\"data_steward\\\": {\\\"match\\\": \\\".*\\\", \\\"operation\\\": \\\"add_owner\\\", \\\"config\\\": {\\\"owner_type\\\": \\\"user\\\", \\\"owner_category\\\": \\\"DATA_STEWARD\\\"}}, \\\"technical_owner\\\": {\\\"match\\\": \\\".*\\\", \\\"operation\\\": \\\"add_owner\\\", \\\"config\\\": {\\\"owner_type\\\": \\\"user\\\", \\\"owner_category\\\": \\\"TECHNICAL_OWNER\\\"}}, \\\"has_pii\\\": {\\\"match\\\": true, \\\"operation\\\": \\\"add_tag\\\", \\\"config\\\": {\\\"tag\\\": \\\"has_pii\\\"}}, \\\"data_governance.team_owner\\\": {\\\"match\\\": \\\"Finance\\\", \\\"operation\\\": \\\"add_term\\\", \\\"config\\\": {\\\"term\\\": \\\"Finance_test\\\"}}, \\\"source\\\": {\\\"match\\\": \\\".*\\\", \\\"operation\\\": \\\"add_tag\\\", \\\"config\\\": {\\\"tag\\\": \\\"{{ $match }}\\\"}}}, \\\"query_tag_mapping\\\": {\\\"tag\\\": {\\\"match\\\": \\\".*\\\", \\\"operation\\\": \\\"add_tag\\\", \\\"config\\\": {\\\"tag\\\": \\\"{{ $match }}\\\"}}}}}}\", \"version\": \"0.8.45\", \"executorId\": \"__datahub_cli_\"}}", "contentType": "application/json"}}}' '<http://localhost:8080/aspects?action=ingestProposal>'
[2022-09-29 11:13:52,849] DEBUG    {datahub.ingestion.run.pipeline:269} - Reporter type:datahub,<class 'datahub.ingestion.reporting.datahub_ingestion_run_summary_provider.DatahubIngestionRunSummaryProvider'> configured.
[2022-09-29 11:13:52,982] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-29 11:13:53,555] DEBUG    {datahub.entrypoints:168} - File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 196, in __init__
131  def __init__(
132      self,
133      config: PipelineConfig,
134      dry_run: bool = False,
135      preview_mode: bool = False,
136      preview_workunits: int = 10,
137      report_to: Optional[str] = None,
138      no_default_report: bool = False,
139  ):
(...)
192          self._record_initialization_failure(e, "Failed to create source")
193          return
194
195      try:
--> 196          self.source: Source = source_class.create(
197              self.config.source.dict().get("config", {}), self.ctx
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/source/dbt.py", line 1001, in create
999  @classmethod
1000  def create(cls, config_dict, ctx):
--> 1001      config = DBTConfig.parse_obj(config_dict)
1002      return cls(config, ctx, "dbt")
File "pydantic/main.py", line 526, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
ValidationError: 1 validation error for DBTConfig
load_schemas
extra fields not permitted (type=value_error.extra)
The above exception was the direct cause of the following exception:
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run
111  def run(
112      ctx: click.Context,
113      config: str,
114      dry_run: bool,
115      preview: bool,
116      strict_warnings: bool,
117      preview_workunits: int,
118      suppress_error_logs: bool,
119      test_source_connection: bool,
120      report_to: str,
121      no_default_report: bool,
122      no_spinner: bool,
123  ) -> None:
(...)
193          _test_source_connection(report_to, pipeline_config)
194
195      try:
196          logger.debug(f"Using config: {pipeline_config}")
--> 197          pipeline = Pipeline.create(
198              pipeline_config,
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
306  def create(
307      cls,
308      config_dict: dict,
309      dry_run: bool = False,
310      preview_mode: bool = False,
311      preview_workunits: int = 10,
312      report_to: Optional[str] = None,
313      no_default_report: bool = False,
314      raw_config: Optional[dict] = None,
315  ) -> "Pipeline":
316      config = PipelineConfig.from_dict(config_dict, raw_config)
--> 317      return cls(
318          config,
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
131  def __init__(
132      self,
133      config: PipelineConfig,
134      dry_run: bool = False,
135      preview_mode: bool = False,
136      preview_workunits: int = 10,
137      report_to: Optional[str] = None,
138      no_default_report: bool = False,
139  ):
(...)
198          )
199          logger.debug(f"Source type:{source_type},{source_class} configured")
200          logger.info("Source configured successfully.")
201      except Exception as e:
--> 202          self._record_initialization_failure(
203              e, f"Failed to configure source ({source_type})"
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
128  def _record_initialization_failure(self, e: Exception, msg: str) -> None:
--> 129      raise PipelineInitError(msg) from e
---- (full traceback above) ----
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run
pipeline = Pipeline.create(
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
return cls(
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
self._record_initialization_failure(
File "/home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
raise PipelineInitError(msg) from e
PipelineInitError: Failed to configure source (dbt)
[2022-09-29 11:13:53,555] DEBUG    {datahub.entrypoints:198} - DataHub CLI version: 0.8.45 at /home/thinh/datahub_venv/lib/python3.10/site-packages/datahub/__init__.py
[2022-09-29 11:13:53,556] DEBUG    {datahub.entrypoints:201} - Python version: 3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0] at /home/thinh/datahub_venv/bin/python3 on Linux-5.15.0-48-generic-x86_64-with-glibc2.35
[2022-09-29 11:13:53,556] DEBUG    {datahub.entrypoints:204} - GMS config {'models': {}, 'patchCapable': True, 'versions': {'linkedin/datahub': {'version': 'v0.8.45', 'commit': '21a8718b1093352bc1e3a566d2ce0297d2167434'}}, 'managedIngestion': {'defaultCliVersion': '0.8.42', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
```

famous-florist-7218 (09/29/2022, 4:22 AM)
Hey Thinh, please check your dbt config against the latest docs. It seems like there are some conflicts:
```
ValidationError: 1 validation error for DBTConfig
load_schemas
extra fields not permitted (type=value_error.extra)
```
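For context on the error itself: the dbt source config rejects unknown keys (that is what "extra fields not permitted" means), and newer CLI versions no longer declare a load_schemas field, so a recipe that still sets it fails validation before ingestion even starts. Below is a minimal sketch of that mechanism using a hypothetical pydantic v1 model, not DataHub's real DBTConfig class:

```python
# Illustrative sketch only: a made-up model, not DataHub's actual DBTConfig.
# It shows how a pydantic v1 model that forbids extra fields rejects a recipe
# key (here load_schemas) that the model no longer declares.
from pydantic import BaseModel, Extra, ValidationError


class DbtConfigSketch(BaseModel):
    manifest_path: str
    catalog_path: str
    target_platform: str

    class Config:
        extra = Extra.forbid  # unknown keys raise "extra fields not permitted"


try:
    DbtConfigSketch.parse_obj(
        {
            "manifest_path": "/git/dwh_dev/target/manifest.json",
            "catalog_path": "/git/dwh_dev/target/catalog.json",
            "target_platform": "postgres",
            "load_schemas": False,  # option removed from the real config model
        }
    )
except ValidationError as e:
    print(e)  # 1 validation error ... extra fields not permitted
```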

better-insurance-34701 (09/29/2022, 4:44 AM)
dbt run/test runs fine with my models
it was ingested OK with version 0.8.40
I don't know how to fix it in this version

famous-florist-7218 (09/29/2022, 5:15 AM)
Could you please post your dbt config here? Just hide the sensitive info and I'll take a look at it.
```
The dbt ingestion source's disable_dbt_node_creation and load_schema options have been removed. They were no longer necessary due to the recently added sibling entities functionality.
```
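In other words, the fix is to delete the load_schemas setting from the recipe. As a rough sketch (assuming that dropping that one key is the only change needed, and reusing the paths and sink address from the log above, with meta_mapping / query_tag_mapping omitted for brevity), the trimmed recipe can also be run programmatically through the same Pipeline.create entry point that appears in the traceback:

```python
# Sketch under the assumption that removing the retired load_schemas key is the
# only change needed; paths and sink address are copied from the log above.
from datahub.ingestion.run.pipeline import Pipeline

recipe = {
    "source": {
        "type": "dbt",
        "config": {
            "manifest_path": "/git/dwh_dev/target/manifest.json",
            "catalog_path": "/git/dwh_dev/target/catalog.json",
            "test_results_path": "/git/dwh_dev/target/run_results.json",
            "target_platform": "postgres",
            # "load_schemas": False,  # removed option: delete it from the recipe
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}

pipeline = Pipeline.create(recipe)  # same entry point the CLI uses (see traceback)
pipeline.run()
pipeline.raise_from_status()
```

Equivalently, just remove the load_schemas line from the YAML recipe (/git/dwh_dev/datahub.yml in the log) and rerun datahub ingest.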

better-insurance-34701 (09/29/2022, 6:35 AM)
That solved my issue. Appreciate all your help. Thanks a lot!

little-megabyte-1074 (09/29/2022, 10:34 PM)
Hi @better-insurance-34701! Gentle reminder to please follow our Slack Guidelines and post large blocks of code/stack traces in message threads; it's a HUGE help for us to keep track of which questions still need attention across our various support channels.

bland-orange-13353 (09/29/2022, 10:34 PM)
You can view our Slack Guidelines here: https://datahubproject.io/docs/slack/