# troubleshoot
s
Hey Team, I’ve been trying to run ingestion with a Python script like this one - https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/programatic_pipeline.py Does it work when config_dict contains env variables instead of explicitly inserted values? Something like this?
from datahub.ingestion.run.pipeline import Pipeline

# The pipeline configuration is similar to the recipe YAML files provided to the CLI tool.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "mysql",
            "config": {
                "username": "user",
                "password": "pass",
                "database": "db_name",
                "host_port": "localhost:3306",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "${DATAHUB_GMS_URL}"},
        },
    }
)

# Run the pipeline and report the results.
pipeline.run()
pipeline.pretty_print_summary()
I’ve been trying to run it like this, but it always throws an error, and I’m wondering if by any chance there’s a way to do it?
b
What's wrong with using os.environ["myvar"] to pull in the variable values in this case?
s
I think that way is clunky - I don’t want my configs to mix in Python calls. If anyone is looking for a solution, I recommend using
from datahub.configuration.config_loader import load_config_file
and loading the config from a file path. Then even if the file contains env variables written as $SOME_KEY, they will be substituted with the environment values. Topic closed.