numerous-address-22061
05/25/2023, 5:23 PM
I have a question about the browse path of my ingested Kafka topics. Some are getting a nice, fully qualified browse path, and some are just not. I am not explicitly defining the browse path in my ingestion; here is an example...
Ingestion
pipeline_name: ${PIPELINE_NAME}
source:
  type: "kafka"
  config:
    platform_instance: ${CLUSTER_NAME}
    connection:
      bootstrap: ${BOOTSTRAP_BROKERS}
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "SCRAM-SHA-512"
        sasl.username: "${KAFKA_USERNAME}"
        sasl.password: "${KAFKA_PASSWORD}"
      schema_registry_url: ${SCHEMA_REGISTRY_URL}
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_ENDPOINT}
First topic
(queried using GraphQL)
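The responses below are consistent with a query along these lines (reconstructed here for readability; these are standard fields on DataHub's GraphQL dataset type):

{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:kafka,platform-instance.org.db.app.topic_name,PROD)") {
    urn
    platform { name }
    browsePaths { path }
    properties { name }
  }
}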
{
  "data": {
    "dataset": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,platform-instance.org.db.app.topic_name,PROD)",
      "platform": {
        "name": "kafka"
      },
      "browsePaths": [
        {
          "path": [
            "prod",
            "kafka",
            "platform-instance",
            "org",
            "db",
            "app"
          ]
        }
      ],
      "properties": {
        "name": "org.db.app.topic_name"
      }
    }
  }
}
Second topic
(note this is undesired, and I can't figure out why it is getting a different browse path than the topic above)
{
  "data": {
    "dataset": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,platform-instance.org.db.app.topic_name_2,PROD)",
      "platform": {
        "name": "kafka"
      },
      "browsePaths": [
        {
          "path": [
            "prod",
            "kafka",
            "platform-instance"
          ]
        }
      ],
      "properties": {
        "name": "org.db.app.topic_name_2"
      }
    }
  }
}
Why is the second browse path so short? It is very unfortunate for discovery in the UI.
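A workaround while the source behavior is investigated: DataHub's set_dataset_browse_path transformer overrides whatever path the source computes, so both topics would land in the same folder. A minimal sketch against the recipe above (the fixed segments mirror the first topic and are an illustrative assumption):

transformers:
  - type: "set_dataset_browse_path"
    config:
      replace_existing: true  # replace the source-computed path rather than appending
      path_templates:
        # ENV and PLATFORM are expanded by the transformer
        - /ENV/PLATFORM/platform-instance/org/db/app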
creamy-ram-28134
05/25/2023, 7:56 PM
brainy-balloon-97302
05/25/2023, 9:38 PM
'failures': {'<s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled> job.py': ['Unable to download DAG for Glue job from <s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled> job.py, so job subtasks and lineage will be missing: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.', 'Unable to download DAG for Glue job from <s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled> job.py, so job subtasks and lineage will be missing: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.']}
I don't have that file in S3, nor a Glue job called "Untitled job.py", so I am trying to see what I can do to resolve this. The rest of the metadata is being pulled over, but it's annoying that it's marked as a failure.
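If the immediate goal is to stop the source from fetching scripts it cannot find, the Glue source exposes extract_transforms (default true), which controls whether job scripts are downloaded from S3 at all; turning it off trades away script-derived subtasks and lineage. A sketch, with the region as an assumption:

source:
  type: glue
  config:
    aws_region: us-west-2       # assumption for illustration
    extract_transforms: false   # skip downloading job scripts from S3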
hundreds-airline-29192
05/26/2023, 5:18 AM
hundreds-airline-29192
05/26/2023, 7:55 AM
botocore.exceptions.PaginationError: Error during pagination: The same next token was received twice: {'Marker': 'dwh/dev/fact/fact_gross_profit/order_date_key_07%3D20230109/part-00018-f1470254-2c8b-4a23-aaad-0260cdca7054.c000.snappy.parquet'}
hundreds-airline-29192
05/26/2023, 7:55 AM
gifted-bird-57147
05/26/2023, 10:28 AM
freezing-fall-69290
05/26/2023, 10:32 AM
brainy-needle-61527
05/26/2023, 12:46 PM
brainy-intern-50400
05/26/2023, 4:34 PM
late-addition-48515
05/26/2023, 5:07 PM
rapid-controller-60841
05/29/2023, 8:29 AM
source:
  type: hive
  config:
    env: PROD
    platform: databricks
    host_port: 'http://JD-in-us.cloud.databricks.com/published'
    username: token
    password: '${databricks_token}'
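Note that the hive source expects host_port to be a Thrift-style host:port rather than an HTTP URL. For Databricks, the hive source docs describe a shape roughly like the following (scheme and port are worth verifying against your workspace):

source:
  type: hive
  config:
    host_port: JD-in-us.cloud.databricks.com:443  # workspace host + HTTPS port
    scheme: 'databricks+pyhive'                   # Databricks SQLAlchemy dialect
    username: token
    password: '${databricks_token}'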
many-rocket-80549
05/29/2023, 10:36 AM
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '06b7698c-048e-470e-bf2c-1ff4fca75bd0',
'infos': ['2023-05-29 10:18:33.415813 INFO: Starting execution for task with name=RUN_INGEST',
"2023-05-29 10:18:37.476974 INFO: Failed to execute 'datahub ingest'",
'2023-05-29 10:18:37.477118 INFO: Caught exception EXECUTING task_id=06b7698c-048e-470e-bf2c-1ff4fca75bd0, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Report ~~~~
{
"cli": {
"cli_version": "0.10.0.7",
"cli_entry_location": "/usr/local/lib/python3.10/site-packages/datahub/__init__.py",
"py_version": "3.10.10 (main, Mar 14 2023, 02:37:11) [GCC 10.2.1 20210110]",
"py_exec_path": "/usr/local/bin/python",
"os_details": "Linux-5.15.0-72-generic-x86_64-with-glibc2.31",
"peak_memory_usage": "57.82 MB",
"mem_info": "57.82 MB"
},
"source": {
"type": "file",
"report": {
"events_produced": 0,
"events_produced_per_sec": 0,
"entities": {},
"aspects": {},
"warnings": {},
"failures": {},
"total_num_files": 0,
"num_files_completed": 0,
"files_completed": [],
"percentage_completion": "0%",
"estimated_time_to_completion_in_minutes": -1,
"total_bytes_read_completed_files": 0,
"total_parse_time_in_seconds": 0,
"total_count_time_in_seconds": 0,
"total_deserialize_time_in_seconds": 0,
"aspect_counts": {},
"entity_type_counts": {},
"start_time": "2023-05-29 10:18:35.206188 (now)",
"running_time": "0 seconds"
}
},
"sink": {
"type": "datahub-rest",
"report": {
"total_records_written": 0,
"records_written_per_second": 0,
"warnings": [],
"failures": [],
"start_time": "2023-05-29 10:18:35.161225 (now)",
"current_time": "2023-05-29 10:18:35.208860 (now)",
"total_duration_in_seconds": 0.05,
"gms_version": "v0.10.3",
"pending_requests": 0
}
}
}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub ingest run -c /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/recipe.yml --report-to /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json
[2023-05-29 10:18:35,113] INFO {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.0.7
No ~/.datahubenv file found, generating one for you...
[2023-05-29 10:18:35,164] INFO {datahub.ingestion.run.pipeline:184} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-gms:8080>
[2023-05-29 10:18:35,206] INFO {datahub.ingestion.run.pipeline:201} - Source configured successfully.
[2023-05-29 10:18:35,207] INFO {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-05-29 10:18:35,209] INFO {datahub.ingestion.reporting.file_reporter:52} - Wrote UNKNOWN report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json' mode='w' encoding='UTF-8'>
[2023-05-29 10:18:35,209] INFO {datahub.cli.ingest_cli:134} - Source (file) report:
{'events_produced': 0,
'events_produced_per_sec': 0,
'entities': {},
'aspects': {},
'warnings': {},
'failures': {},
'total_num_files': 0,
'num_files_completed': 0,
'files_completed': [],
'percentage_completion': '0%',
'estimated_time_to_completion_in_minutes': -1,
'total_bytes_read_completed_files': 0,
'total_parse_time_in_seconds': 0,
'total_count_time_in_seconds': 0,
'total_deserialize_time_in_seconds': 0,
'aspect_counts': {},
'entity_type_counts': {},
'start_time': '2023-05-29 10:18:35.206188 (now)',
'running_time': '0 seconds'}
[2023-05-29 10:18:35,210] INFO {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
{'total_records_written': 0,
'records_written_per_second': 0,
'warnings': [],
'failures': [],
'start_time': '2023-05-29 10:18:35.161225 (now)',
'current_time': '2023-05-29 10:18:35.210294 (now)',
'total_duration_in_seconds': 0.05,
'gms_version': 'v0.10.3',
'pending_requests': 0}
[2023-05-29 10:18:35,809] ERROR {datahub.entrypoints:188} - Command failed: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 175, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
raise e
File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
res = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
return func(ctx, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
loop.run_until_complete(run_func_check_upgrade(pipeline))
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
ret = await the_one_future
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
return await loop.run_in_executor(
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
raise e
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
pipeline.run()
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 339, in run
for wu in itertools.islice(
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 196, in get_workunits
for f in self.get_filenames():
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 193, in get_filenames
raise Exception(f"Failed to process {self.config.path}")
Exception: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
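The traceback bottoms out in the file source's get_filenames, which raises exactly this when the configured path is neither a file nor a directory from the point of view of the process running datahub ingest. For UI-scheduled ingestion that process lives inside the actions/executor container, so a workstation path like /home/miquelp/... will not resolve there. A sketch of the idea, with a hypothetical mount path:

source:
  type: file
  config:
    # must be visible to the executor container, e.g. via a mounted volume
    path: /mounted/ingest/test_containers.json
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:8080'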
acceptable-helmet-19082
05/29/2023, 10:53 AM
astonishing-father-13229
05/29/2023, 5:15 PM
hundreds-airline-29192
05/30/2023, 2:21 AM
hundreds-airline-29192
05/30/2023, 2:24 AM
com.linkedin.restli.server.RestLiServiceException: com.datahub.util.exception.ESQueryException: Search query failed:
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)
at com.linkedin.metadata.resources.usage.UsageStats.query(UsageStats.java:320)
at com.linkedin.metadata.resources.usage.UsageStats.queryRange(UsageStats.java:386)
at jdk.internal.reflect.GeneratedMethodAccessor375.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177)
at com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:333)
at com.linkedin.restli.internal.server.filter.FilterChainDispatcherImpl.onRequestSuccess(FilterChainDispatcherImpl.java:47)
at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:86)
at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.lambda$onRequest$0(RestLiFilterChainIterator.java:73)
at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753)
at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731)
at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108)
at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:72)
at com.linkedin.restli.internal.server.filter.RestLiFilterChain.onRequest(RestLiFilterChain.java:55)
at com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:262)
at com.linkedin.restli.server.RestRestLiServer.handleResourceRequestWithRestLiResponse(RestRestLiServer.java:294)
at com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:262)
at com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:232)
at com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)
at com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)
at com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)
at com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)
at com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)
at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)
at com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)
at com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)
at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)
at com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)
at com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)
at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)
at com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)
at com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)
at com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)
at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)
at com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at com.linkedin.restli.server.RestliHandlerServlet.service(RestliHandlerServlet.java:21)
at com.linkedin.restli.server.RestliHandlerServlet.handleRequest(RestliHandlerServlet.java:26)
at org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)
at com.datahub.auth.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:102)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.datahub.util.exception.ESQueryException: Search query failed:
at com.linkedin.metadata.timeseries.elastic.query.ESAggregatedStatsDAO.getAggregatedStats(ESAggregatedStatsDAO.java:375)
at com.linkedin.metadata.timeseries.elastic.ElasticSearchTimeseriesAspectService.getAggregatedStats(ElasticSearchTimeseriesAspectService.java:216)
at com.linkedin.metadata.resources.usage.UsageStats.getBuckets(UsageStats.java:182)
at com.linkedin.metadata.resources.usage.UsageStats.lambda$query$1(UsageStats.java:348)
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)
... 89 common frames omitted
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
at com.linkedin.metadata.timeseries.elastic.query.ESAggregatedStatsDAO.getAggregatedStats(ESAggregatedStatsDAO.java:371)
... 93 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host
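For context, "all shards failed" from the UsageStats endpoint usually points at the Elasticsearch side (a missing or unhealthy usage timeseries index) rather than at GMS itself. A first diagnostic step, using standard Elasticsearch APIs (the host below is an assumption):

curl -s 'http://elasticsearch:9200/_cluster/health?pretty'
curl -s 'http://elasticsearch:9200/_cat/indices/*usage*?v'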
bland-orange-13353
05/30/2023, 2:24 AM
hundreds-airline-29192
05/30/2023, 2:27 AM
hundreds-airline-29192
05/30/2023, 2:53 AM
Imagine you are demoing DataHub to the company and boom! "Unable to load description of tables."
bitter-evening-61050
05/30/2023, 5:50 AM
microscopic-room-90690
05/30/2023, 6:06 AM
"**/*test*/**" works, while "**/(^|_)(tmp|temp|test)(_|$)/**" does not. Can anyone help?
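The first pattern is plain glob syntax; the second embeds regex constructs in a glob, where (, ^, | and $ are matched literally, so it can never match. A quick Python sketch of the difference (assuming these excludes are glob-matched, as path_spec excludes are):

from fnmatch import fnmatch
import re

path = "warehouse/dev/tmp_orders/part-0000.parquet"

# Glob: '*' matches anything, so any path containing 'tmp' is excluded.
print(fnmatch(path, "*tmp*"))                        # True

# In a glob, the regex metacharacters are literal characters:
print(fnmatch(path, "*(^|_)(tmp|temp|test)(_|$)*"))  # False

# The regex itself is fine -- it just needs a regex engine:
print(bool(re.search(r"(^|_)(tmp|temp|test)(_|$)", "tmp_orders")))  # True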
lemon-scooter-69730
05/30/2023, 11:06 AM
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (bigquery): Missing provider configuration.
This is what the recipe looks like:
pipeline_name: analytics
source:
  type: bigquery
  config:
    env: DEV
    include_table_lineage: true
    include_usage_statistics: true
    include_tables: true
    include_views: true
    profiling:
      enabled: true
      profile_table_level_only: false
    stateful_ingestion:
      enabled: true
    credential:
      project_id: <redacted>
      private_key: <redacted>
      private_key_id: <redacted>
      client_email: <redacted>
      client_id: <redacted>
sink:
  type: datahub-kafka
  config:
    connection:
      bootstrap: 'datahub-prerequisites-kafka:9092'
      schema_registry_url: '<http://datahub-prerequisites-cp-schema-registry:8081>'
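For what it's worth, "Missing provider configuration" with stateful_ingestion enabled commonly comes from the ingestion-state provider: state is stored in DataHub, which is configured implicitly when the sink is datahub-rest but not when the sink is datahub-kafka. The usual fix (worth verifying against your version) is to point the provider at GMS explicitly, at the top level of the recipe:

datahub_api:
  server: 'http://datahub-gms:8080'  # assumption: your GMS address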
microscopic-elephant-47912
05/30/2023, 12:08 PM
microscopic-elephant-47912
05/30/2023, 12:09 PM
narrow-bear-42430
05/30/2023, 2:57 PM
astonishing-father-13229
05/30/2023, 8:37 PM
great-rainbow-70545
05/30/2023, 9:52 PM
Command failed: TSocket read 0 bytes. Not finding much via Google other than a possibly wrong Thrift version, but I figured that would be coming up for other people as well. Ring any bells?
few-air-34037
05/31/2023, 4:44 AM
A question about platform_instance tags for lineage: we haven't used platform_instance yet, but we have added a lot of metadata to objects... If we now add platform_instance, then we get a new hierarchy and new objects. What would be the easiest way to migrate metadata from the old objects without platform_instance to the new ones with it?
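One tool that may help here (worth verifying the flags against your CLI version): the DataHub CLI ships a migration utility that rewrites instance-less URNs to platform-instance URNs, copying their aspects across:

# 'kafka' and 'my_instance' are hypothetical placeholders;
# --dry-run previews the rewrite before anything is written
datahub migrate dataplatform2instance --platform kafka --instance my_instance --dry-run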
cool-architect-34612
05/31/2023, 5:01 AM