numerous-account-62719
02/16/2023, 4:35 AM
important-afternoon-19755
02/16/2023, 5:54 AM
best-napkin-60434
02/16/2023, 7:04 AM
%6|1676528381.357|FAIL|rdkafka#consumer-1| [thrd:{server}.]: {server}/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 0ms in state APIVERSION_QUERY, 3 identical error(s) suppressed)
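If the listener really is SSL, the clients have to be told so explicitly. A minimal sketch of doing that through the DataHub Helm chart's Kafka configuration overrides, assuming the chart's springKafkaConfigurationOverrides block (verify the exact key names against your chart version):
```yaml
# Sketch only: point DataHub's Kafka clients at an SSL listener.
# The override block below is assumed from the DataHub Helm chart's values;
# check the exact key names for your chart version.
global:
  springKafkaConfigurationOverrides:
    security.protocol: SSL
```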
late-bear-87552
02/16/2023, 7:09 AM
Caused by:
org.elasticsearch.client.ResponseException: method [HEAD], host [http://******:9200], URI [/graph_service_v1?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false], status line [HTTP/1.1 503 Service Unavailable]
Warnings: [Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html> to enable security., [ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.]
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
fierce-baker-1392
02/16/2023, 7:56 AM
Error: UPGRADE FAILED: YAML parse error on datahub/charts/datahub-ingestion-cron/templates/cron.yaml: error converting YAML to JSON: yaml: line 36: found unexpected end of stream
helm.go:84: [debug] error converting YAML to JSON: yaml: line 36: found unexpected end of stream
YAML parse error on datahub/charts/datahub-ingestion-cron/templates/cron.yaml
image:
  repository: linkedin/datahub-ingestion
  tag:
  pullPolicy: IfNotPresent
imagePullSecrets: []
crons:
  glossary:
    schedule: "0 1 * * *"
    recipe:
      configmapName: recipe-conf
      fileName: business_glossary.recipe.yaml
    command: ["/bin/sh", "-c", "pip install 'acryl-datahub[datahub-business-glossary]'; datahub ingest -c business_glossary.recipe.yaml"]
    extraVolumes:
      - name: recipe-conf-volume
        configMap:
          name: recipe-conf
    extraVolumeMounts:
      - name: recipe-conf-volume
        mountPath: /etc/recipe/data/business_glossary_dimension.yaml
        subPath: business_glossary_dimension.yaml
        readOnly: true
global:
  datahub:
    version: head
enough-bear-93481
02/16/2023, 11:29 AM
billowy-flag-4217
02/16/2023, 11:38 AM
many-solstice-66904
02/16/2023, 3:54 PM
avro_schema_to_mce_fields
exists in Python, but I see no corresponding functionality in the Java library.
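For context, a minimal sketch of the Python helper in question; the import path is the one recent acryl-datahub releases use, so verify it against your installed version:
```python
import json

# Assumed import path for recent acryl-datahub releases.
from datahub.ingestion.extractor.schema_util import avro_schema_to_mce_fields

avro_schema = json.dumps(
    {
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "email", "type": ["null", "string"], "default": None},
        ],
    }
)

# Converts the Avro schema into a list of SchemaField objects
# suitable for a SchemaMetadata aspect.
fields = avro_schema_to_mce_fields(avro_schema)
for field in fields:
    print(field.fieldPath, field.nativeDataType)
```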
Thanks in advance!
purple-printer-15193
02/16/2023, 4:20 PM
cuddly-kite-88848
02/16/2023, 5:29 PM
dazzling-microphone-98929
02/16/2023, 6:48 PM
lively-dusk-19162
02/16/2023, 7:52 PM
lively-dusk-19162
02/16/2023, 9:08 PM
best-wire-59738
02/17/2023, 5:58 AM
numerous-computer-7054
02/17/2023, 8:06 AM
source:
  type: mssql
  config:
    database: AdventureWorksLT2019
    username: datahub_test
    password: '${mssql_password}'
    host_port: '172.17.189.15:1433'
I've tried using the host's IP address, localhost, and the SQL server name, but nothing works.
I keep getting this error when using an IP address:
PipelineInitError: Failed to configure the source (mssql): (pytds.tds_base.LoginError) ("Cannot connect to server '172.17.189.15': timed out", TimeoutError('timed out'))
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)
or this if using the server name:
PipelineInitError: Failed to configure the source (mssql): (pytds.tds_base.LoginError) ("Cannot connect to server 'DESKTOP-9LGDO7K': [Errno 22] Invalid argument", OSError(22, 'Invalid argument'))
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)
The SQL Server instance is configured correctly: port 1433 is open, and I can connect to it successfully from clients other than DataHub, such as from Python.
What am I missing / doing wrong?
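One thing that might help narrow it down: a minimal connectivity check using the same driver stack the mssql source uses by default (SQLAlchemy + python-tds), run from the exact host or container that executes the ingestion, since that is where the timeout happens. The password below is a placeholder:
```python
# Sketch only: test connectivity with the same SQLAlchemy/pytds stack the
# DataHub mssql source uses by default. Replace the placeholder password.
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pytds://datahub_test:<password>@172.17.189.15:1433/AdventureWorksLT2019"
)
with engine.connect() as conn:
    print(conn.execute("SELECT @@VERSION").scalar())
```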
Thanks!
creamy-van-28626
02/17/2023, 10:17 AM
red-waitress-53338
02/17/2023, 12:35 PM
dazzling-microphone-98929
02/17/2023, 12:51 PM
blue-crowd-84759
02/17/2023, 4:37 PM
red-waitress-53338
02/17/2023, 5:13 PM
blue-agency-87812
02/17/2023, 6:38 PM
adorable-river-99503
02/17/2023, 7:19 PM
gifted-bear-4760
02/19/2023, 10:43 AM
acceptable-rain-30599
02/20/2023, 1:08 AM
acceptable-rain-30599
02/20/2023, 1:10 AM
numerous-account-62719
02/20/2023, 5:27 AM
``` 'OperationalError: (pyhive.exc.OperationalError) TOpenSessionResp(status=TStatus(statusCode=3, '
"infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Failed to open new session: org.apache.hadoop.fs.s3a.AWSS3IOException: "
'getFileStatus on s3a://hivemr3-6054f65b-976f-4d25-8a0e-ea0a33898569/workdirtest: com.amazonaws.services.s3.model.AmazonS3Exception: '
'Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: '
'null; Proxy: null), S3 Extended Request ID: null:504 Gateway Time-out: Gateway Time-out (Service: Amazon S3; Status Code: 504; Error '
"Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: null)1413', "
"'org.apache.hive.service.cli.session.SessionManagercreateSessionSessionManager.java:434', "
"'org.apache.hive.service.cli.session.SessionManageropenSessionSessionManager.java:373', "
"'org.apache.hive.service.cli.CLIServiceopenSessionCLIService.java:187', "
"'org.apache.hive.service.cli.thrift.ThriftCLIServicegetSessionHandleThriftCLIService.java:480', "
"'org.apache.hive.service.cli.thrift.ThriftCLIServiceOpenSessionThriftCLIService.java:322', "
"'org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSessiongetResultTCLIService.java:1497', "
"'org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSessiongetResultTCLIService.java:1482', "
"'org.apache.thrift.ProcessFunctionprocessProcessFunction.java:39', 'org.apache.thrift.TBaseProcessorprocessTBaseProcessor.java:39', "
"'org.apache.hive.service.auth.TSetIpAddressProcessorprocessTSetIpAddressProcessor.java:56', "
"'org.apache.thrift.server.TThreadPoolServer$WorkerProcessrunTThreadPoolServer.java:286', "
"'java.util.concurrent.ThreadPoolExecutorrunWorkerThreadPoolExecutor.java:1149', "
"'java.util.concurrent.ThreadPoolExecutor$WorkerrunThreadPoolExecutor.java:624', 'java.lang.ThreadrunThread.java:748', "
"'*java.lang.RuntimeExceptionorg.apache.hadoop.fs.s3a.AWSS3IOException getFileStatus on "
's3a://hivemr3-6054f65b-976f-4d25-8a0e-ea0a33898569/workdirtest: com.amazonaws.services.s3.model.AmazonS3Exception: Gateway Time-out '
'(Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: null), '
'S3 Extended Request ID: null:504 Gateway Time-out: Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway '
"Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: null)173', "
"'org.apache.hadoop.hive.ql.session.SessionStatestartSessionState.java:652', "
"'org.apache.hadoop.hive.ql.session.SessionStatestartSessionState.java:593', "
"'org.apache.hive.service.cli.session.HiveSessionImplopenHiveSessionImpl.java:171', "
"'org.apache.hive.service.cli.session.SessionManagercreateSessionSessionManager.java:425', "
"'*org.apache.hadoop.fs.s3a.AWSS3IOException:getFileStatus on s3a://hivemr3-6054f65b-976f-4d25-8a0e-ea0a33898569/workdirtest: "
'com.amazonaws.services.s3.model.AmazonS3Exception: Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway '
'Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null:504 Gateway Time-out: Gateway '
'Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: '
"null)2710', 'org.apache.hadoop.fs.s3a.S3AUtilstranslateExceptionS3AUtils.java:265', "
"'org.apache.hadoop.fs.s3a.S3AUtilstranslateExceptionS3AUtils.java:145', "
"'org.apache.hadoop.fs.s3a.S3AFileSystems3GetFileStatusS3AFileSystem.java:2248', "
"'org.apache.hadoop.fs.s3a.S3AFileSysteminnerGetFileStatusS3AFileSystem.java:2149', "
"'org.apache.hadoop.fs.s3a.S3AFileSystemgetFileStatusS3AFileSystem.java:2088', "
"'org.apache.hadoop.fs.FileSystemexistsFileSystem.java:1683', 'org.apache.hadoop.fs.s3a.S3AFileSystemexistsS3AFileSystem.java:2976', "
"'org.apache.hadoop.hive.ql.exec.UtilitiesensurePathIsWritableUtilities.java:4484', "
"'org.apache.hadoop.hive.ql.session.SessionStatecreateRootHDFSDirSessionState.java:731', "
"'org.apache.hadoop.hive.ql.session.SessionStatecreateSessionDirsSessionState.java:672', "
"'org.apache.hadoop.hive.ql.session.SessionStatestartSessionState.java:628', "
"'*com.amazonaws.services.s3.model.AmazonS3Exception:Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway "
"Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: null)4419', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutorhandleErrorResponseAmazonHttpClient.java:1811', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutorhandleServiceErrorResponseAmazonHttpClient.java:1395', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutorexecuteOneRequestAmazonHttpClient.java:1371', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutorexecuteHelperAmazonHttpClient.java:1145', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutordoExecuteAmazonHttpClient.java:802', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutorexecuteWithTimerAmazonHttpClient.java:770', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutorexecuteAmazonHttpClient.java:744', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutor:access$500AmazonHttpClient.java704', "
"'com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImplexecuteAmazonHttpClient.java:686', "
"'com.amazonaws.http.AmazonHttpClientexecuteAmazonHttpClient.java:550', "
"'com.amazonaws.http.AmazonHttpClientexecuteAmazonHttpClient.java:530', "
"'com.amazonaws.services.s3.AmazonS3ClientinvokeAmazonS3Client.java:5062', "
"'com.amazonaws.services.s3.AmazonS3ClientinvokeAmazonS3Client.java:5008', "
"'com.amazonaws.services.s3.AmazonS3ClientinvokeAmazonS3Client.java:5002', "
"'com.amazonaws.services.s3.AmazonS3ClientlistObjectsV2AmazonS3Client.java:941', "
"'org.apache.hadoop.fs.s3a.S3AFileSystem:lambda$listObjects$5S3AFileSystem.java1262', "
"'org.apache.hadoop.fs.s3a.InvokerretryUntranslatedInvoker.java:322', "
"'org.apache.hadoop.fs.s3a.InvokerretryUntranslatedInvoker.java:285', "
"'org.apache.hadoop.fs.s3a.S3AFileSystemlistObjectsS3AFileSystem.java:1255', "
"'org.apache.hadoop.fs.s3a.S3AFileSystems3GetFileStatusS3AFileSystem.java:2223'], sqlState=None, errorCode=0, errorMessage='Failed to "
'open new session: org.apache.hadoop.fs.s3a.AWSS3IOException: getFileStatus on '
's3a://hivemr3-6054f65b-976f-4d25-8a0e-ea0a33898569/workdirtest: com.amazonaws.services.s3.model.AmazonS3Exception: Gateway Time-out '
'(Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: null), '
'S3 Extended Request ID: null:504 Gateway Time-out: Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway '
"Time-out; Request ID: null; S3 Extended Request ID: null; Proxy: null)'), serverProtocolVersion=9, sessionHandle=None, "
'configuration=None)\n'
'(Background on this error at: http://sqlalche.me/e/13/e3q8)\n'
'[2023-02-17 135342,619] INFO {datahub.entrypoints:187} - DataHub CLI version: 0.8.41 at '
'/tmp/datahub/ingest/venv-8fa49a4b-8775-4d62-874f-cbe22e5a07c8/lib/python3.9/site-packages/datahub/__init__.py\n'
'[2023-02-17 135342,619] INFO {datahub.entrypoints:190} - Python version: 3.9.9 (main, Dec 21 2021, 100334) \n'
'[GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-8fa49a4b-8775-4d62-874f-cbe22e5a07c8/bin/python3 on '
'Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.31\n'
"[2023-02-17 135342,619] INFO {datahub.entrypoints:193} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': "
"'v0.8.41', 'commit': '6e07ec59242abf53e237183319a01ef3b1f708a9'}}, 'managedIngestion': {'defaultCliVersion': '0.8.41', 'enabled': True}, "
"'statefulIngestionCapable': True, 'supportsImpactAnalysis': False, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, "
"'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'prod'}, 'noCode': 'true'}\n",
"2023-02-17 135343.873169 [exec_id=8fa49a4b-8775-4d62-874f-cbe22e5a07c8] INFO: Failed to execute 'datahub ingest'",
'2023-02-17 135343.877338 [exec_id=8fa49a4b-8775-4d62-874f-cbe22e5a07c8] INFO: Caught exception EXECUTING '
'task_id=8fa49a4b-8775-4d62-874f-cbe22e5a07c8, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}```
better-table-69560
02/20/2023, 12:31 PM
quaint-appointment-83049
02/20/2023, 12:36 PM
def __init__(self, **data: Any):
    super().__init__(**data)
    if self.credential:
        self._credentials_path = self.credential.create_credential_temp_file()
        logger.debug(
            f"Creating temporary credential file at {self._credentials_path}"
        )
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self._credentials_path
Please avoid replacing the GOOGLE_APPLICATION_CREDENTIALS environment variable, as it is used internally and propagated to other services; overwriting it is not a good approach.
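For illustration, a minimal sketch of the alternative being asked for here: passing the credentials object to the client explicitly instead of mutating the process-wide environment variable. The temp-file path is hypothetical:
```python
# Sketch only: build a BigQuery client from an explicit credentials object
# rather than overwriting GOOGLE_APPLICATION_CREDENTIALS for the whole process.
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/tmp/datahub-bigquery-credential.json"  # hypothetical temp-file path
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
```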
Can someone help us to resolve this issue ASAP? Thank you in advance.
ambitious-shoe-92590
02/20/2023, 8:08 PM
Struct
type with "pull"-based ingestion. I have a nested field called data
which contains a number of child key:value pairs. When I ingest this data with DataHub, the resulting dataset has the data field, but it cannot be expanded.
I've read up on Field Paths and the differences between v1 and v2 paths, but I am a bit confused about how to actually get to the point of being able to "expand" the nested struct. It seems like emitters are used in some examples, but my understanding is that those are for manually adding fields to the schema?
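In case it helps, a minimal sketch (using the Python SDK's schema classes, not necessarily what the S3 source emits internally) of how a nested struct shows up as separate, dot-delimited fieldPath entries; it is the presence of the child entries alongside the parent that lets the UI expand the field:
```python
# Sketch only: a parent struct field plus a child field, expressed as separate
# SchemaField entries with dot-delimited (v1-style) field paths.
from datahub.metadata.schema_classes import (
    RecordTypeClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    StringTypeClass,
)

fields = [
    SchemaFieldClass(
        fieldPath="data",
        type=SchemaFieldDataTypeClass(type=RecordTypeClass()),
        nativeDataType="struct",
    ),
    SchemaFieldClass(
        fieldPath="data.child_key",  # hypothetical child key name
        type=SchemaFieldDataTypeClass(type=StringTypeClass()),
        nativeDataType="string",
    ),
]
```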
Any help would be appreciated; the data is coming from an S3 source, if that makes a difference.
lively-dusk-19162
02/20/2023, 8:15 PM