great-cpu-72376
05/16/2022, 2:32 PM
delightful-barista-90363
05/16/2022, 8:53 PM
late-zoo-31017
05/16/2022, 9:51 PM
bored-dress-52175
05/17/2022, 2:34 AM
shy-refrigerator-3266
05/17/2022, 9:10 AM
I set AUTH_POLICIES_ENABLED=true in datahub-gms of my docker-compose.quickstart.yml. However, the UI is still showing the error "Token based authentication is currently disabled. Contact your DataHub administrator to enable this feature." on the settings page. Am I doing something wrong here?
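A likely cause, assuming the standard quickstart compose file: AUTH_POLICIES_ENABLED gates the policies feature, while token-based authentication is controlled by a separate flag, METADATA_SERVICE_AUTH_ENABLED, which has to be set on both datahub-gms and datahub-frontend-react. A minimal sketch of the relevant environment entries (merge into the existing service definitions):

datahub-gms:
  environment:
    # hypothetical excerpt; keep the other variables already defined here
    - METADATA_SERVICE_AUTH_ENABLED=true
datahub-frontend-react:
  environment:
    - METADATA_SERVICE_AUTH_ENABLED=true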
abundant-receptionist-6114
05/17/2022, 1:33 PM
alert-football-80212
05/17/2022, 1:56 PM
cool-actor-73767
05/17/2022, 2:30 PM
great-cpu-72376
05/18/2022, 1:41 PM
best-wolf-3369
05/18/2022, 2:34 PM
import requests
import json

# Ingest a glossary term through the GMS REST endpoint
url = "http://host:port/entities?action=ingest"

payload = json.dumps({
    "entity": {
        "value": {
            "com.linkedin.metadata.snapshot.GlossaryTermSnapshot": {
                "urn": "urn:li:glossaryTerm:camelCaseObject",
                "aspects": [
                    {
                        "com.linkedin.glossary.GlossaryTermInfo": {
                            "definition": "Object definition",
                            "parentNode": "urn:li:glossaryTerm:camelCaseObjectParent",
                            "sourceRef": "DataHub",
                            "sourceUrl": "https://github.com/linkedin/datahub/",
                            "termSource": "INTERNAL"
                        }
                    }
                ]
            }
        }
    }
})
headers = {"Content-Type": "application/json"}
response = requests.post(url, headers=headers, data=payload)
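A quick way to confirm the call landed; a 2xx status from GMS means the snapshot was accepted:

print(response.status_code, response.text)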
great-cpu-72376
05/18/2022, 3:05 PM
source:
  type: file
  config:
    filename: '/home/afmul/PRODUCTION/datasetAnalysis/positioning/RAW/positioningRaw.csv'
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:9090'
I try to execute the ingestion with the command: datahub ingest run -c ingestion_file_receipe.yml
But I receive a big error:
---- (full traceback above) ----
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/entrypoints.py", line 149, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 317, in wrapper
raise e
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 269, in wrapper
res = func(*args, **kwargs)
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
res = func(*args, **kwargs)
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 128, in run
raise e
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 114, in run
pipeline.run()
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 214, in run
for wu in itertools.islice(
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/source/file.py", line 77, in get_workunits
for i, obj in enumerate(iterate_generic_file(self.config.filename)):
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/source/file.py", line 42, in iterate_generic_file
for i, obj in enumerate(_iterate_file(path)):
File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/source/file.py", line 25, in _iterate_file
obj_list = json.load(f)
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
What does it mean?
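For context, the bottom of the traceback shows the file source handing the file straight to json.load, so it expects a JSON file of metadata events rather than a CSV; pointing it at positioningRaw.csv produces exactly this "Expecting value" failure. A minimal sketch of the same error, assuming any non-JSON input:

import json

# json.load raises JSONDecodeError("Expecting value", ...) as soon as the
# first bytes are not a JSON value (e.g. a CSV header row)
with open("positioningRaw.csv") as f:
    data = json.load(f)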
calm-jackal-26275
05/18/2022, 6:08 PM
datahub ingest -c business_glossary.yml
But I get some syntax errors:
[2022-05-18 14:02:49,315] INFO {datahub.cli.ingest_cli:96} - DataHub CLI version: 0.8.34.2
5 validation errors for PipelineConfig
source
value is not a valid dict (type=type_error.dict)
nodes
extra fields not permitted (type=value_error.extra)
owners
extra fields not permitted (type=value_error.extra)
url
extra fields not permitted (type=value_error.extra)
version
extra fields not permitted (type=value_error.extra)
cool-actor-73767
05/18/2022, 11:41 PM
Task :metadata-ingestion:docGen FAILED
Caching disabled for task ':metadata-ingestion:docGen' because: Build cache is disabled
Task ':metadata-ingestion:docGen' is not up-to-date because: Task has not declared any outputs despite executing actions.
Starting process 'command 'bash''. Working directory: /home/ubuntu/datahub/metadata-ingestion Command: bash -c source venv/bin/activate && ./scripts/docgen.sh
Successfully started process 'command 'bash''
rm: cannot remove '../docs/generated/ingestion': No such file or directory
Traceback (most recent call last):
  File "scripts/docgen.py", line 7, in <module>
    from importlib.metadata import metadata, requires
ModuleNotFoundError: No module named 'importlib.metadata'
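If I'm reading that traceback right, the venv is running a Python older than 3.8; importlib.metadata only joined the standard library in 3.8. The usual guard is to fall back to the importlib-metadata backport on older interpreters, sketched here:

# importlib.metadata is stdlib from Python 3.8 onward; earlier versions
# need the backport (pip install importlib-metadata)
try:
    from importlib.metadata import metadata, requires
except ImportError:
    from importlib_metadata import metadata, requires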
curved-truck-53235
05/19/2022, 12:19 PM
gentle-camera-33498
05/19/2022, 12:47 PM
cool-actor-73767
05/19/2022, 6:44 PM
future-student-30987
05/19/2022, 8:36 PM
bright-beard-86474
05/19/2022, 8:40 PM
gentle-umbrella-84426
05/20/2022, 2:25 PM
breezy-noon-83306
05/20/2022, 7:22 PM
sparse-raincoat-42898
05/20/2022, 10:02 PM
clever-machine-43182
05/21/2022, 7:13 AM
I have a .json.gz file on S3. Can DataHub infer the schema in this case? There seem to be no options for compressed file types.
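I'm not sure off-hand what the S3 source does with compression, but as a workaround you can inspect the schema yourself; a minimal sketch, assuming a local copy of the file with one JSON object per line:

import gzip
import json

# Decompress a .json.gz sample and look at the top-level fields
with gzip.open("sample.json.gz", "rt") as f:
    record = json.loads(f.readline())  # assumes newline-delimited JSON
print(sorted(record.keys()))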
clever-machine-43182
05/22/2022, 10:43 AM
cool-carpet-74662
05/23/2022, 10:16 AM
steep-sandwich-72508
05/23/2022, 2:04 PM
steep-sandwich-72508
05/23/2022, 2:40 PM
breezy-noon-83306
05/23/2022, 5:13 PM
full-raincoat-68234
05/23/2022, 8:33 PM
helpful-librarian-40144
05/24/2022, 2:03 AM
bitter-dusk-52400
05/24/2022, 5:19 AM