# troubleshoot
q
Hello everyone, I'm a DataHub beginner and I want to try to ingest a business glossary with the CLI, but I get this error message:
apache@apache-VirtualBox:~$ python3.9 -m datahub ingest -c business_glossary.yml
[2022-03-24 15:32:13,017] INFO     {datahub.cli.ingest_cli:75} - DataHub CLI version: 0.8.31.2
[2022-03-24 15:32:13,164] ERROR    {datahub.entrypoints:152} - File "/home/apache/.local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 82, in run
70   def run(
71       ctx: click.Context, config: str, dry_run: bool, preview: bool, strict_warnings: bool
72   ) -> None:
(...)
78       pipeline_config = load_config_file(config_file)
79
80       try:
81           logger.debug(f"Using config: {pipeline_config}")
--> 82           pipeline = Pipeline.create(pipeline_config, dry_run, preview)
83       except ValidationError as e:
File "/home/apache/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 174, in create
170  @classmethod
171  def create(
172      cls, config_dict: dict, dry_run: bool = False, preview_mode: bool = False
173  ) -> "Pipeline":
--> 174      config = PipelineConfig.parse_obj(config_dict)
175      return cls(config, dry_run=dry_run, preview_mode=preview_mode)
File "pydantic/main.py", line 511, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 329, in pydantic.main.BaseModel.__init__
File "pydantic/main.py", line 1022, in pydantic.main.validate_model
File "pydantic/fields.py", line 837, in pydantic.fields.ModelField.validate
File "pydantic/fields.py", line 1118, in pydantic.fields.ModelField._apply_validators
File "pydantic/class_validators.py", line 278, in pydantic.class_validators._generic_validator_cls.lambda2
File "/home/apache/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 56, in run_id_should_be_semantic
52   def run_id_should_be_semantic(
53       cls, v: Optional[str], values: Dict[str, Any], **kwargs: Any
54   ) -> str:
55       if v == "__DEFAULT_RUN_ID":
--> 56           if values["source"] is not None:
57               if values["source"].type is not None:
KeyError: 'source'
[2022-03-24 15:32:13,165] INFO     {datahub.entrypoints:161} - DataHub CLI version: 0.8.31.2 at /home/apache/.local/lib/python3.9/site-packages/datahub/__init__.py
[2022-03-24 15:32:13,165] INFO     {datahub.entrypoints:164} - Python version: 3.9.11 (main, Mar 16 2022, 17:19:28)
[GCC 9.4.0] at /usr/bin/python3.9 on Linux-5.13.0-35-generic-x86_64-with-glibc2.31
[2022-03-24 15:32:13,165] INFO     {datahub.entrypoints:167} - GMS config {}
s
Have you gone through https://datahubproject.io/docs/metadata-ingestion/source_docs/business_glossary? If yes, can you please share your recipe file? It seems like you are not using the recipe file correctly.
plus1 1
m
@quick-student-61408 I suspect you are passing the business glossary directly to the ingest command... you should be passing the recipe for the business glossary ingestion to the ingest command. A recipe looks like this: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/recipes/business_glossary_to_datahub.yml.
plus1 1
In the recipe you will be referring to the file that contains the actual business glossary YAML.
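For reference, a minimal recipe for that source looks roughly like the following (the file path and server URL are placeholders, adjust them to your setup):
source:
  type: datahub-business-glossary
  config:
    # path to the YAML file that contains the glossary itself
    file: ./business_glossary.yml
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'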
q
Hello @mammoth-bear-12532 and @square-activity-64562, thanks for your answers. I found the issue: I had just copy/pasted the example from the DataHub website, and it references a user that does not exist... However I now have a new error: I've built a Postgres server and I can't connect it to DataHub. Do you know what needs to be enabled on Postgres for the connection to be authorized? (it's an Ubuntu VM)
s
I suggest checking connectivity issues first. Use something like the postgres CLI or any other SQL tool to connect to Postgres with the credentials you are using. That will ensure there are no connectivity issues. Once you have confirmed connectivity, you can use the DataHub CLI and retry the ingestion.
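For example, something along these lines (the database name and user here are just placeholders for whatever your recipe uses):
psql -h localhost -p 5432 -U postgres -d datahub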
q
I am on the same machine (it serves as a test environment).
I am trying to determine whether DataHub is a good data governance solution or not.
s
Are you able to use postgres CLI to connect to the database?
q
Yes, with different users and different passwords.
s
Can you use the same username and password which you tested with postgres CLI to connect and check if it works or not?
q
I tried with the default user: postgres
s
Can you share the following:
• the full logs in text format (instead of screenshots) from the ingestion that fails. Please do not remove any parts of the log (mask the secret if any secret is being shown)
• the recipe in text format (instead of screenshots), masking the secrets
q
I'll start my VM to get you the logs.
I have another urgent concern, I will come back to you later.
Hello @square-activity-64562, you can find my log in this txt file
s
This is all running in docker compose?
Instead of localhost, please use the internal Docker name of GMS. Probably datahub-gms will work.
q
I've installed docker-compose and DataHub with Docker, but Postgres is simply installed on my VM.
I think I've installed DataHub with Docker but not with docker compose (but I've installed it)...
s
The logs are from UI-based ingestion, correct? Change the sink from localhost to datahub-gms.
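For example, the sink section would look roughly like this (in the standard quickstart compose setup GMS is usually reachable on port 8080 from other containers, but adjust the URL to your deployment):
sink:
  type: datahub-rest
  config:
    # internal Docker service name of GMS instead of localhost
    server: 'http://datahub-gms:8080'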
q
With datahub-gms
Here is my recipe:
source:
  type: postgres
  config:
    host_port: 'localhost:5432'
    database: datahub
    username: postgres
    password: rootroot
    include_tables: true
    include_views: true
    profiling:
      enabled: false
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:9002/api/gms'
I've already created the datahub database.
s
q
Hello @square-activity-64562: you can find my new output log here šŸ˜…. Thank you very much for your time.
s
The important part is
'OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused\n'
           '\tIs the server running on that host and accepting TCP/IP connections?\n'
           'connection to server at "localhost" (::1), port 5432 failed: Cannot assign requested address\n'
           '\tIs the server running on that host and accepting TCP/IP connections?\n'
From within the container running actions, it won't be localhost for postgres.
plus1 1
Is postgres running in docker or on the machine itself?
q
On the machine itself. I'm not really familiar with Docker... Do I need to write my local address?
s
Machine IP address should work
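In other words, the source section of the recipe should point at the host machine's address rather than localhost, roughly like this (the IP below is a placeholder for your VM's actual address):
source:
  type: postgres
  config:
    # use the machine's IP so the DataHub containers can reach Postgres on the host
    host_port: '192.168.1.10:5432'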
q
Thank you @square-activity-64562, it works!
šŸ‘ 1