breezy-portugal-43538
04/28/2022, 1:30 PM
message:"No root resource defined for path '/datasets'","status":404}
appears. Is it possible to update the properties of datasets ingested from S3, and if so, how?
my curl command:
curl --location --request POST 'http://localhost:8080/datasets?action=ingest' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"snapshot": {
"aspects": [
{
"com.linkedin.dataset.DatasetProperties": {
"customProperties": {
"SuperProperty": "over 9000"
}
}
}
],
"urn": "urn:li:dataset:(urn:li:dataset:(urn:li:dataPlatform:s3,origin_file_src%2Fdata%2Ftest%2Fother_timeZ%2Ftime%2other_folder%2Fsome_folder%2Fexample.csv,DEV)
}
}'
The issue might be that my urn is incorrect - I copied it from the webpage url. I tried to find the correct urn at http://localhost:9200/datasetindex_v2/_search?=pretty but for some reason dataplatform:s3 is not visible there. Do you know how I can get my s3 urn, to be sure that I have set it up correctly?
Thanks in advance for the help!
*EDIT: changing the urn to use . instead of %2F did not help
dazzling-judge-80093
04/28/2022, 1:41 PM
hundreds-photographer-13496
04/28/2022, 1:47 PM
/aspects?action=ingestProposal
endpoint with appropriate payload - https://datahubproject.io/docs/metadata-service/#ingesting-aspects
Alternatively, to save the trouble of creating a serialized JSON string, you can use the Python emitter to create the DatasetProperties aspect and emit it.
https://datahubproject.io/docs/metadata-ingestion/as-a-library/#example-usage
breezy-portugal-43538
04/28/2022, 1:56 PM
curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"proposal" : {
"entityType": "dataset",
"entityUrn" : "urn:li:dataset:(urn:li:dataset:(urn:li:dataPlatform:s3,origin_file_src%2Fdata%2Ftest%2Fother_timeZ%2Ftime%2other_folder%2Fsome_folder%2Fexample.csv,DEV)",
"changeType" : "UPSERT",
"aspectName" : "DatasetProperties",
"aspect" : {
"customProperties" : "{{
"SuperProperty": "over 9000"
}",
"contentType": "application/json"
}
}
}'
hundreds-photographer-13496
04/29/2022, 8:58 AM
curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"proposal" : {
"entityType": "dataset",
"entityUrn" : "<dataset urn>",
"changeType" : "UPSERT",
"aspectName" : "datasetProperties",
"aspect" : {
"value":"{\"customProperties\": {\"SuperProperty\": \"over 9000\"}}",
"contentType": "application/json"
}
}
}'
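The tricky part of the payload above is that the aspect "value" is itself a JSON string, so the quotes inside it have to be escaped. A minimal Python sketch of building the same body without hand-escaping, using only the standard library; the function name build_ingest_proposal is hypothetical and not part of DataHub, and "<dataset urn>" is the same placeholder as in the curl command:

```python
import json


def build_ingest_proposal(entity_urn: str, custom_properties: dict) -> dict:
    """Hypothetical helper: build the body for /aspects?action=ingestProposal.

    Mirrors the curl payload above. The aspect itself is serialized to a
    JSON string first, then embedded as the "value" field.
    """
    aspect_value = json.dumps({"customProperties": custom_properties})
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": entity_urn,
            "changeType": "UPSERT",
            "aspectName": "datasetProperties",
            "aspect": {
                "value": aspect_value,  # a string, not a nested object
                "contentType": "application/json",
            },
        }
    }


payload = build_ingest_proposal("<dataset urn>", {"SuperProperty": "over 9000"})
# Serializing the whole payload escapes the inner quotes automatically.
body = json.dumps(payload, indent=2)
print(body)
```

The printed body can be passed to curl via --data-raw, with the same headers as in the command above.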
hundreds-photographer-13496
04/29/2022, 9:01 AM
breezy-portugal-43538
04/29/2022, 10:26 AM
[2022-04-29 12:51:54,146] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit <s3://testing/folder1/test/iwinskiTest1/iwinskiTest2/iwinskiTest3/results2575/somestats.csv>
[2022-04-29 12:51:54,169] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit container-urn:li:container:3ca95115310858747c3e3993be56c861-to-urn:li:dataset:(urn:li:dataPlatform:s3,testing/folder1/test/iwinskiTest1/iwinskiTest2/iwinskiTest3/results2575/somestats.csv,DEV)
[2022-04-29 12:51:54,170] INFO {datahub.cli.ingest_cli:106} - Finished metadata ingestion
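The urn in the log line above has the shape urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>), with literal slashes in the name, whereas a urn copied from the browser address bar is percent-encoded (%2F for /). A minimal sketch of both cases using only the Python standard library; the helper names build_dataset_urn and decode_browser_urn are hypothetical, and the encoded string is a shortened illustration rather than a real address:

```python
from urllib.parse import unquote


def build_dataset_urn(platform: str, name: str, env: str = "DEV") -> str:
    """Hypothetical helper: assemble a dataset urn in the shape the log prints."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def decode_browser_urn(encoded: str) -> str:
    """Percent-decode a urn copied from a URL (assumes standard URL encoding)."""
    return unquote(encoded)


# Path taken from the ingestion log above.
urn = build_dataset_urn(
    "s3",
    "testing/folder1/test/iwinskiTest1/iwinskiTest2/iwinskiTest3/results2575/somestats.csv",
)
print(urn)

# Illustrative, shortened example of a percent-encoded urn.
encoded = "urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3As3%2Ctesting%2Ffolder1%2Fsomestats.csv%2CDEV%29"
print(decode_browser_urn(encoded))
```

Comparing the result against the urn printed by the sink is a quick way to spot a doubled prefix or a leftover escape sequence.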
After running the command:
$ datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:s3,testing/folder1/test/iwinskiTest1/iwinskiTest2/iwinskiTest3/results2575/somestats.csv,DEV)"
I receive the following error:
..................................................
entity_urn = 'urn:li:dataset:(urn:li:dataPlatform:s3,testing/folder1/test/iwinskiTest1/iwinskiTest2/iwinskiTest3/results2575/somestats.cs
v,DEV)'
aspects = ()
List = typing.List
typed = False
cached_session_host = None
Optional = typing.Optional
Tuple = typing.Tuple
Session = <class 'requests.sessions.Session'>
Dict = typing.Dict
Union = typing.Union
DictWrapper = <class 'avrogen.dict_wrapper.DictWrapper'>
entity_response = {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.Res
tLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.bui
ldPreRoutingError(BaseRestLiServer.java:202)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestExcept
ion(RestRestLiServer.java:254)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.
java:228)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedi
n.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.
handleRequest(RestLiServer.java:130)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(De
legatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(Dispat
cherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com
.linkedin.r2.filter.FilterChainIterator$FilterCh...
non_timeseries_aspects = []
..................................................
---- (full traceback above) ----
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 138, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 304, in wrapper
raise e
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 256, in wrapper
res = func(*args, **kwargs)
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/src/datahub/cli/get_cli.py", line 38, in get
get_aspects_for_entity(entity_urn=urn, aspects=aspect, typed=False),
File "/sharedvolume/datahub_tbd/datahub/metadata-ingestion/src/datahub/cli/cli_utils.py", line 673, in get_aspects_for_entity
aspect_list: Dict[str, dict] = entity_response["aspects"]
For simplicity I pasted only the last part of the log - if required, I can paste the full output of the get command. I'm not sure, but it looks like the urn is somehow incorrect... could you advise on further steps to resolve the issue?
breezy-portugal-43538
04/29/2022, 10:32 AM
hundreds-photographer-13496
04/29/2022, 10:49 AM
breezy-portugal-43538
04/29/2022, 11:39 AM
hundreds-photographer-13496
04/29/2022, 12:19 PM
breezy-portugal-43538
04/29/2022, 12:34 PM
../gradlew :metadata-ingestion:installDev
source venv/bin/activate
here is the output from datahub version:
$ datahub version
/home/mluser/.local/lib/python3.8/site-packages/cryptography/hazmat/backends/openssl/x509.py:14: CryptographyDeprecationWarning: This version of cryptography contains a temporary pyOpenSSL fallback path. Upgrade pyOpenSSL now.
warnings.warn(
DataHub CLI version: 0.8.31.6
Python version: 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0]
hundreds-photographer-13496
04/29/2022, 12:40 PM
docker logs datahub-gms
to view datahub-gms container logs.
(More details here - https://datahubproject.io/docs/how/extract-container-logs/)
breezy-portugal-43538
04/29/2022, 1:31 PM
hundreds-photographer-13496
05/02/2022, 6:31 AM
[2022-04-29 14:31:17,749] ERROR
when the error occurred. Alternatively, just re-execute the command and share the most recent logs.
hundreds-photographer-13496
05/02/2022, 11:08 AM
datahub get --urn
? Please include the cli command, the cli response and the relevant datahub-gms log in that thread. The original issue on this thread (adding custom properties) is already resolved.
breezy-portugal-43538
05/02/2022, 11:24 AM
hundreds-photographer-13496
05/02/2022, 11:28 AM
breezy-portugal-43538
05/02/2022, 11:39 AM