most-airplane-91939
10/12/2021, 7:05 PM

stale-jewelry-2440
10/13/2021, 8:11 AM
The metadata folder in datahub/metadata-ingestion/src/datahub/ has been deleted, but modules in there are still used in the ingestion process. For example:
metadata-ingestion/src/datahub/ingestion/extractor/schema_util.py:from datahub.metadata.com.linkedin.pegasus2avro.schema import (
metadata-ingestion/src/datahub/ingestion/extractor/mce_extractor.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
metadata-ingestion/src/datahub/ingestion/extractor/mce_extractor.py:from datahub.metadata.schema_classes import UsageAggregationClass
metadata-ingestion/src/datahub/ingestion/api/workunit.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
metadata-ingestion/src/datahub/ingestion/api/workunit.py:from datahub.metadata.schema_classes import UsageAggregationClass
metadata-ingestion/src/datahub/emitter/kafka_emitter.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
metadata-ingestion/src/datahub/emitter/kafka_emitter.py:from datahub.metadata.schemas import (
Is this a partial code refactoring, or should we restore that folder?
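A note that may explain this: the datahub.metadata package appears to be generated from the metadata models during the build (and so removed from version control) rather than deleted outright, so regenerating it should restore the modules. A quick import check, assuming a built metadata-ingestion environment; both imports are taken from the file list above:
```
# Quick import check: these modules exist only after the codegen step has run,
# since the datahub.metadata package is generated from the models rather than
# checked in (an assumption based on the repo's build setup).
from datahub.metadata.schema_classes import UsageAggregationClass
from datahub.metadata.com.linkedin.pegasus2avro.mxe import MetadataChangeEvent

print(UsageAggregationClass, MetadataChangeEvent)
```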
stale-printer-44316
10/13/2021, 1:43 PM

gentle-father-80172
10/13/2021, 3:05 PM
DataHubGraphQLError{path=[dataset, upstreamLineage, entities, 3, entity, downstreamLineage, entities, 0, entity], code=SERVER_ERROR, locations=[SourceLocation{line=743, column=11}]}
@big-carpet-38439 - This might be related to the issue I was seeing last week.

handsome-belgium-11927
10/14/2021, 2:33 PM
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.DatasetSnapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
        "aspects": [
          {
            "com.linkedin.dataset.DatasetProperties": {
              "description": "Hello",
              "customProperties": {
                "hello": "world"
              }
            }
          }
        ]
      }
    }
  }
}'
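For comparison, the same snapshot can be emitted from Python with the DataHub REST emitter; a minimal sketch, assuming a local GMS at localhost:8080 as in the curl call, using the generated classes from the datahub package discussed above:
```
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetPropertiesClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
)

# Build the same DatasetSnapshot as the curl payload above.
mce = MetadataChangeEventClass(
    proposedSnapshot=DatasetSnapshotClass(
        urn="urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
        aspects=[
            DatasetPropertiesClass(
                description="Hello",
                customProperties={"hello": "world"},
            )
        ],
    )
)

# Assumes GMS is reachable on localhost:8080, as in the curl call above.
DatahubRestEmitter("http://localhost:8080").emit_mce(mce)
```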
broad-crowd-13788
10/15/2021, 1:43 PM
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.RuntimeException: Failed to validate DataHub PDL models
at com.linkedin.metadata.model.validation.ModelValidationTask.main(ModelValidationTask.java:50)
Caused by: com.linkedin.metadata.models.ModelValidationException: Found invalid relationship with name OwnedBy at path /orgOwner. Invalid entityType(s) provided.
at com.linkedin.metadata.models.EntitySpecBuilder.failValidation(EntitySpecBuilder.java:323)
at com.linkedin.metadata.models.EntitySpecBuilder.buildEntitySpecs(EntitySpecBuilder.java:74)
at com.linkedin.metadata.model.validation.ModelValidationTask.main(ModelValidationTask.java:48)
Not sure why the validation task fails. We are trying to extend the model and use the 'OwnedBy' relationship between other entities, but the build fails with the above error.
Is there a specific rule regarding the 'OwnedBy' relationship that we are missing?

blue-zoo-89533
10/16/2021, 4:58 PM

crooked-midnight-76614
10/18/2021, 8:21 AM
07:51:34.891 [generic-mce-consumer-job-client-0-C-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Authorization Exception
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: generic-mce-consumer-job-client
Which I think in turn is causing:
07:51:34.903 [generic-mce-consumer-job-client-0-C-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Fatal consumer exception; stopping container
And perhaps finally causing the following error when trying to access the analytics tab:
08:17:15.790 [Thread-13] ERROR c.l.d.g.a.service.AnalyticsService - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
08:17:15.792 [Thread-13] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:245)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:93)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:59)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:39)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:27)
at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
at graphql.execution.Execution.executeOperation(Execution.java:165)
at graphql.execution.Execution.execute(Execution.java:104)
at graphql.GraphQL.execute(GraphQL.java:557)
at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
at graphql.GraphQL.executeAsync(GraphQL.java:446)
at graphql.GraphQL.execute(GraphQL.java:377)
at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:88)
at com.datahub.metadata.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:82)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:240)
... 17 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://search-datahub-cjfy6-y4e4cliwdivgdm3iou375f7d4a.eu-central-1.es.amazonaws.com:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"},"status":404}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 21 common frames omitted
08:17:15.798 [Thread-15] ERROR c.l.d.g.a.service.AnalyticsService - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
08:17:15.802 [Thread-13] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getAnalyticsCharts {\n getAnalyticsCharts {\n title\n charts {\n ... on TimeSeriesChart {\n title\n lines {\n name\n data {\n x\n y\n __typename\n }\n __typename\n }\n dateRange {\n start\n end\n __typename\n }\n interval\n __typename\n }\n ... on BarChart {\n title\n bars {\n name\n segments {\n label\n value\n __typename\n }\n __typename\n }\n __typename\n }\n ... on TableChart {\n title\n columns\n rows {\n values\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getAnalyticsCharts], extensions={code=500, classification=DataFetchingException}}], data=null}, errors: [DataHubGraphQLError{path=[getAnalyticsCharts], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
08:17:15.799 [Thread-15] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:245)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:216)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.getHighlights(GetHighlightsResolver.java:50)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:29)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:19)
at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
at graphql.execution.Execution.executeOperation(Execution.java:165)
at graphql.execution.Execution.execute(Execution.java:104)
at graphql.GraphQL.execute(GraphQL.java:557)
at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
at graphql.GraphQL.executeAsync(GraphQL.java:446)
at graphql.GraphQL.execute(GraphQL.java:377)
at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:88)
at com.datahub.metadata.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:82)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:240)
... 17 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://search-datahub-cjfy6-y4e4cliwdivgdm3iou375f7d4a.eu-central-1.es.amazonaws.com:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"},"status":404}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 21 common frames omitted
08:17:15.807 [Thread-15] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getHighlights {\n getHighlights {\n value\n title\n body\n __typename\n }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getHighlights], extensions={code=500, classification=DataFetchingException}}], data=null}, errors: [DataHubGraphQLError{path=[getHighlights], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
I've tried setting the following environment variables:
DATAHUB_USAGE_EVENT_NAME
DATAHUB_USAGE_EVENT_KAFKA_CONSUMER_GROUP_ID
DATAHUB_TRACKING_TOPIC
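A quick way to confirm the missing index directly against the Elasticsearch domain, sketched with a placeholder hostname (a 404 here reproduces the index_not_found_exception above):
```
import requests

# Placeholder host; substitute the Elasticsearch endpoint GMS is configured with.
ES = "https://search-datahub-example.eu-central-1.es.amazonaws.com:443"

resp = requests.head(f"{ES}/datahub_usage_event")
print(resp.status_code)  # 404 reproduces the index_not_found_exception above
```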
some-hospital-42166
10/18/2021, 8:45 AM

brief-lizard-77958
10/18/2021, 1:29 PM
[system.out] venv/lib/python3.9/site-packages/airflow/_vendor/connexion/spec.py:169: error: invalid syntax
See thread for the picture of where this points to.
I could build previous versions without a problem on the same system (Ubuntu 21.04).

witty-butcher-82399
10/18/2021, 2:15 PM
• When using the profile feature, I got the following error: Cannot perform CREATE TEMPTABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
Have you seen this error before? Is there a way to set which schema the profiler should use when creating its temporary tables?
• When using the snowflake-usage connector, I got the error below.
"Failed to parse usage line {'query_start_time': datetime.datetime(2021, 10, 17, 3, 41, 14, 560000, "
'tzinfo=datetime.timezone.utc), \'query_text\': "create temporary table '
'avalanche.dwh_stage_iad.accepted_values_stg_ad_images_source_system__IAD__tmp as\\n select '
'test_run_start_ts,\\n row_count,\\n failure_row_count,\\n case when failure_row_count > '
"0\\n then 'ERROR'\\n else 'PASS'\\n end as test_status\\n from "
'(\\n select current_timestamp as test_run_start_ts,\\n count_all_sql.row_count,\\n '
'case when count_all_sql.row_count > 0 \\n then (\\n -- begin of data test '
'query\\n select count(1) as row_count\\n from avalanche.dwh_stage_iad.stg_ad_images as '
"model\\n \\n where (\\n source_system not in ('IAD')\\n "
')\\n -- and of data test query\\n )\\n else '
'0\\n end as failure_row_count\\n from (\\n -- begin of count all '
'query\\n select count(1) as row_count\\n from avalanche.dwh_stage_iad.stg_ad_images as model\\n '
'\\n -- end of count all query\\n ) as count_all_sql\\n );", \'query_type\': '
"'CREATE_TABLE_AS_SELECT', 'base_objects_accessed': [{'columns': [{'columnId': 82388016, 'columnName': 'SOURCE_SYSTEM'}], "
"'objectDomain': 'Table', 'objectId': 19718154, 'objectName': 'AVALANCHE.DWH_STAGE_IAD.STG_AD_IMAGES'}], 'user_name': "
"'SERVICE_AVALANCHE', 'first_name': 'Avalanche', 'last_name': 'Service Account', 'display_name': 'SERVICE_AVALANCHE', "
"'email': None, 'role_name': 'SERVICE_DBT'}",
• Also, regarding the snowflake-usage connector, it caught my attention that it is handled as an independent connector instead of just a property on the snowflake connector. Because of that, while I can filter (`allow`/`deny`) tables and schemas with the snowflake connector, I can't with the snowflake-usage one. This results in snowflake-usage producing events for tables that I don't want in the catalog. Is there a reason for this split of the connector? Or how can I keep both connectors aligned on which tables are processed? (A recipe sketch follows at the end of this message.)
Thanks in advance!
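Regarding the filtering asymmetry above, a sketch of the allow/deny patterns the base snowflake source accepts (credentials and patterns are placeholders; the snowflake-usage source did not expose equivalent patterns at the time, which is the gap being described):
```
from datahub.ingestion.run.pipeline import Pipeline

# Sketch of table/schema filtering on the base `snowflake` source; all
# account, credential, and pattern values below are placeholders.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "snowflake",
            "config": {
                "host_port": "my_account",
                "username": "datahub",
                "password": "***",
                "schema_pattern": {"allow": ["dwh_stage_iad"]},
                "table_pattern": {"deny": [".*__tmp$"]},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
```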
witty-actor-87329
10/18/2021, 6:28 PM
Caused by: com.nimbusds.oauth2.sdk.ParseException: The scope must include an "openid" value
at com.nimbusds.openid.connect.sdk.AuthenticationRequest.parse(AuthenticationRequest.java:1378)
at com.nimbusds.openid.connect.sdk.AuthenticationRequest.parse(AuthenticationRequest.java:1312)
at org.pac4j.oidc.redirect.OidcRedirectActionBuilder.buildAuthenticationRequestUrl(OidcRedirectActionBuilder.java:110)
... 49 common frames omitted
Below is my configuration:
- AUTH_OIDC_ENABLED=true
- AUTH_OIDC_CLIENT_ID=xxxxxxxxxxxxxxx
- AUTH_OIDC_CLIENT_SECRET=zzzzzzzzzzzzzzzz
- AUTH_OIDC_DISCOVERY_URI=https://xyz.okta.com/.well-known/openid-configuration
- AUTH_OIDC_BASE_URL=https://datahub-prod.xyz.io
- AUTH_OIDC_SCOPE="openid profile email groups"
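One hedged guess, not a confirmed diagnosis: if the surrounding quotes are passed through literally (for example, from a docker env file, which does not strip quotes), the first scope token would carry a leading quote, so the parser would not find a bare openid value. A toy illustration:
```
# Toy illustration of the suspected quoting problem (an assumption about the
# deployment, not a confirmed diagnosis).
scope_with_quotes = '"openid profile email groups"'  # quotes passed through literally
scope_clean = "openid profile email groups"

print("openid" in scope_with_quotes.split())  # False: first token is '"openid'
print("openid" in scope_clean.split())        # True
```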
Can anyone help with this? cc: @gentle-father-80172

red-pizza-28006
10/19/2021, 6:52 PM

handsome-football-66174
10/19/2021, 7:13 PM
extraEnvs:
- name: AUTH_OIDC_ENABLED
value: "true"
- name: AUTH_OIDC_CLIENT_ID
value: MMSOauthClient
- name: AUTH_OIDC_CLIENT_SECRET
value: "<value>"
- name: AUTH_OIDC_DISCOVERY_URI
value: https://<saml-host>/.well-known/openid-configuration
- name: AUTH_OIDC_BASE_URL
value: https://<host>/
It is redirecting to this:
https://<hostname>/#error_description=The+global+default+access+token+manager+is+not+available+for+the+selected+client+and+authentication+context&error=invalid_request

microscopic-elephant-47912
10/19/2021, 8:29 PM
---- (full traceback above) ----
File "/usr/local/lib/python3.8/site-packages/datahub/entrypoints.py", line 91, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 52, in run
pipeline = Pipeline.create(pipeline_config)
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 120, in create
return cls(config)
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 88, in __init__
self.source: Source = source_class.create(
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 788, in create
return cls(config, ctx)
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 245, in __init__
self.client = LookerAPI(self.source_config).get_client()
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 84, in __init__
raise ConfigurationError(
ConfigurationError: Failed to initialize Looker client. Please check your configuration.
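For reference, a minimal sketch of the Looker recipe fields that error is validating (base_url, client_id, client_secret; all values here are placeholders). The ConfigurationError above is raised when the Looker client cannot authenticate with these values:
```
from datahub.ingestion.run.pipeline import Pipeline

# Minimal Looker recipe sketch; every value below is a placeholder.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "looker",
            "config": {
                "base_url": "https://company.looker.com",
                "client_id": "<looker-api-client-id>",
                "client_secret": "<looker-api-client-secret>",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
```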
modern-nail-74015
10/20/2021, 1:55 AM

high-notebook-40979
10/20/2021, 4:46 AM

modern-nail-74015
10/20/2021, 8:10 AM
Caused by: java.lang.RuntimeException: Failed to resolve user name claim from profile provided by Identity Provider. Missing attribute. Attribute: 'preferred_username', Regex: '(.*)', Profile: {at_hash=froRaU5vpSNyY0aMxVEXmw, token_expiration_advance=-1, aud=[WCca1QCzMPQ6HDgOthv0UvB6WtuMUjHC], id_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ0ZWxldHJhYW4iLCJzdWIiOiIxIiwiYXVkIjoiV0NjYTFRQ3pNUFE2SERnT3RodjBVdkI2V3R1TVVqSEMiLCJleHAiOjE2MzQ3MjQyOTgsImlhdCI6MTYzNDcxNzA5OCwidXNlcm5hbWUiOiJydWljb3JlIiwiYXRfaGFzaCI6ImZyb1JhVTV2cFNOeVkwYU14VkVYbXcifQ.EQeE8S12nrwIj5FeoJDcAOXT3nzcQCrdpyrmBVXJmTs, iss=teletraan, exp=Wed Oct 20 10:04:58 GMT 2021, iat=Wed Oct 20 08:04:58 GMT 2021, username=ruicore}
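The profile above carries a username claim but no preferred_username, so one hedged pointer: DataHub's OIDC settings include AUTH_OIDC_USER_NAME_CLAIM (and a companion regex setting) to pick a different claim. A sketch of the extraction that setting implies:
```
import re

# The profile above has a 'username' claim but no 'preferred_username'.
# DataHub's AUTH_OIDC_USER_NAME_CLAIM / AUTH_OIDC_USER_NAME_CLAIM_REGEX
# settings choose which claim is extracted; this mirrors that extraction
# for claim="username" and the default regex "(.*)".
profile = {"username": "ruicore"}

claim, regex = "username", r"(.*)"
match = re.match(regex, str(profile[claim]))
print(match.group(1))  # -> ruicore
```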
tall-forest-65335
10/20/2021, 10:44 AM

red-pizza-28006
10/20/2021, 11:46 AM

melodic-helmet-78607
10/21/2021, 6:08 AM

modern-nail-74015
10/21/2021, 8:25 AM

nice-planet-17111
10/21/2021, 8:26 AM
We have datahub-gms and datahub-upgrade-job deployments, and when I helm upgrade, the upgrade job fails and I get an Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade' error 😞 Is there something I can do? (The error log and helm chart are in the thread because they are too long.)

red-pizza-28006
10/21/2021, 6:37 PM

dry-policeman-74195
10/22/2021, 4:23 PM

curved-jordan-15657
10/24/2021, 8:49 PM
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
args.func(args, dag=self.dag)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
_run_raw_task(args, ti)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
ti._run_raw_task(
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1324, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1443, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1494, in _execute_task
result = execute_callable(context=context)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/operators/python.py", line 151, in execute
return_value = self.execute_callable()
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/operators/python.py", line 162, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/home/airflow/airflow/dags/datahub_ingestion_dag.py", line 39, in datahub_redshift
pipeline.run()
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 125, in run
for wu in self.source.get_workunits():
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/source/sql/redshift.py", line 253, in get_workunits
lineage_mcp = self.get_lineage_mcp(wu.metadata.proposedSnapshot.urn)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/source/sql/redshift.py", line 272, in get_lineage_mcp
tablename = dataset_params[2]
IndexError: list index out of range
After digging into the problem, I realized that in the redshift.py file the dataset_params array is expected to contain 3 parts, produced by dataset_params = dataset_key.name.split("."). And dataset_key doesn't pick up our db-name from our recipe file, since we wrote `host: <endpoint>/<db-name>`. If I give the database name explicitly as `database: <db-name>` in recipe.yml, it resolves this problem, but then I get another error, shown below (a sketch of the first failure follows after that traceback):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 58, in run
pipeline.run()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 125, in run
for wu in self.source.get_workunits():
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/redshift.py", line 248, in get_workunits
for wu in super().get_workunits():
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 364, in get_workunits
yield from self.loop_tables(inspector, schema, sql_config)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 435, in loop_tables
columns = inspector.get_columns(table, schema)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 390, in get_columns
col_defs = self.dialect.get_columns(
File "<string>", line 2, in get_columns
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 52, in cache
ret = fn(self, con, *args, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy_redshift/dialect.py", line 454, in get_columns
cols = self._get_redshift_columns(connection, table_name, schema, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy_redshift/dialect.py", line 705, in _get_redshift_columns
return all_columns[key]
KeyError: '<our-schema-name>.<our-table-name>'
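As referenced above, a minimal illustration of the first failure mode (the dataset name is made up; the indexing mirrors the split logic in redshift.py quoted in the first traceback):
```
# Made-up dataset name illustrating the IndexError above: with no database in
# the recipe, the key has only two dot-separated parts instead of three.
dataset_name = "our_schema.our_table"      # missing the leading db name
dataset_params = dataset_name.split(".")   # ['our_schema', 'our_table']

tablename = dataset_params[2]              # IndexError: list index out of range
```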
Everything was perfect until now. The DataHub version is 0.8.16. I need a solution for the first problem.

fresh-carpet-31048
10/25/2021, 2:05 PM
Is the difference between DataFetcher<CompletableFuture<String>> and DataFetcher<CompletableFuture<Boolean>> just the return type (the first returns a string and the second returns a boolean)?
handsome-football-66174
10/25/2021, 8:50 PM

future-hamburger-62563
10/26/2021, 1:19 AM
Using docker/dev.sh, when I try to build I get this error:
org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.elastic.co/elasticsearch/elasticsearch:7.9.3, imagePullPolicy=DefaultPullPolicy())
at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1286)
at org.testcontainers.containers.GenericContainer.logger(GenericContainer.java:615)
* What went wrong:
Execution failed for task ':metadata-io:test'.
I tried running docker container prune and the command cleared some disk space, but the build is still failing. Any ideas?
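One thing worth trying, sketched under the assumption that the docker Python SDK is installed: pre-pull the image Testcontainers failed to fetch, so the build can reuse the local copy instead of pulling during the test run.
```
import docker  # pip install docker

# Pre-pull the Elasticsearch image that Testcontainers could not fetch; the
# test run should then find it in the local image cache.
client = docker.from_env()
client.images.pull("docker.elastic.co/elasticsearch/elasticsearch", tag="7.9.3")
```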
ancient-hair-10877
10/26/2021, 10:59 AM