# ingestion
Hello! I am trying to ingest Elasticsearch metadata. I am also using some transformers to put it in the right browse path and add a tag. This worked really well with MongoDB and Postgres, but it doesn't work at all for Elasticsearch; it's as if the transformers were being ignored entirely. This is driving me crazy. I tried both the programmatic pipeline and a YAML recipe… The code is literally copy-pasted from the MongoDB and Postgres versions. Is there a chance that for ES the transformers need to be written differently, or that I am missing something? Thank you!
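```python
from datahub.ingestion.run.pipeline import Pipeline

# ENV, es_connection_host_port, es_connection_login, es_connection_password,
# es_deny_index_pattern and datahub_server are defined elsewhere in the script.
pipeline = Pipeline.create(
    # This configuration is analogous to a recipe configuration.
    {
        "source": {
            "type": "elasticsearch",
            "config": {
                "env": ENV,
                "host": es_connection_host_port,
                "username": es_connection_login,
                "password": es_connection_password,
                "index_pattern": {
                    "deny": [es_deny_index_pattern]
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": datahub_server},
        },
        "transformers": [
            {
                # ENV, PLATFORM and DATASET_PARTS are tokens filled in by the
                # transformer, not Python placeholders, so no f-string is needed.
                "type": "set_dataset_browse_path",
                "config": {
                    "path_templates": ["/ENV/PLATFORM/EsComments/DATASET_PARTS"]
                },
            },
            {
                "type": "simple_add_dataset_tags",
                "config": {
                    "tag_urns": ["urn:li:tag:EsComments"]
                },
            },
        ],
    }
)

pipeline.run()
# Fail loudly if the run reported errors.
pipeline.raise_from_status()
```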
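For context on what the browse-path template above is meant to produce: the transformer fills the `ENV`, `PLATFORM` and `DATASET_PARTS` tokens per dataset. The sketch below is a rough, hypothetical approximation of that substitution, not the `set_dataset_browse_path` source; it assumes `ENV` is lowercased and dots in the dataset name become `/`, and the real transformer may differ in those details.

```python
import re
from typing import Dict

# Tokens the template may contain (as used in the recipe above).
_TOKENS = re.compile(r"ENV|PLATFORM|DATASET_PARTS")


def expand_browse_path(template: str, env: str, platform: str, dataset_name: str) -> str:
    """Hypothetical approximation of browse-path template expansion.

    Assumptions: ENV is lowercased and dots in the dataset name become "/".
    """
    values: Dict[str, str] = {
        "ENV": env.lower(),
        "PLATFORM": platform,
        "DATASET_PARTS": dataset_name.replace(".", "/"),
    }
    # Single pass over the template, so substituted values are never re-scanned.
    return _TOKENS.sub(lambda m: values[m.group(0)], template)
```

Under these assumptions, an index named `comments` in a `PROD` environment would land under `/prod/elasticsearch/EsComments/comments`.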
@helpful-optician-78938 ^
Transformers should be orthogonal to the source. @helpful-optician-78938, any ideas why certain transformers wouldn't apply to the ES source?
Hi @witty-painting-90923, I'll look into this and get back to you.
Hi @helpful-optician-78938, does it work on your side?
Hi @witty-painting-90923, I haven't had a chance to get to this yet. I'll try my best to get to it ASAP.
Hi @helpful-optician-78938, I was able to debug the issue on our side. Apparently the transformers were not applied to the Elasticsearch ingestion because the connector was emitting separate MCP objects for the same dataset; refactoring the connector to emit a single MCE with all aspects appended solved the issue for us. I'm still not sure why this happens; maybe it makes more sense to you. Could it be because of the order of emission? Anyway, I opened a PR with the fix that worked for us, if you want to have a look at it 🙂
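The behavior described above can be modeled in a few lines. In this sketch, `MCE` (a MetadataChangeEvent carrying many aspects) and `MCP` (a MetadataChangeProposal carrying one aspect) are hypothetical stand-in dataclasses, not DataHub's real classes. The transformer is an assumption based on this thread: it only edits MCEs and passes MCPs through unchanged. That would explain both why the tag never appeared and why bundling the aspects into one MCE worked around it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Union


@dataclass
class MCE:
    """Hypothetical stand-in: one event carrying many aspects for a URN."""
    urn: str
    aspects: Dict[str, object] = field(default_factory=dict)


@dataclass
class MCP:
    """Hypothetical stand-in: one proposal carrying a single aspect."""
    urn: str
    aspect_name: str
    aspect: object


Record = Union[MCE, MCP]


def mce_only_tag_transformer(records: Iterable[Record], tag: str) -> Iterator[Record]:
    """Adds a tag, but (as assumed from the thread) only looks at MCEs."""
    for record in records:
        if isinstance(record, MCE):
            record.aspects.setdefault("globalTags", []).append(tag)
        # MCPs are passed through untouched, so they never receive the tag.
        yield record


def bundle_into_mce(records: Iterable[MCP]) -> List[MCE]:
    """Workaround: merge per-aspect MCPs for the same URN into one MCE."""
    by_urn: Dict[str, MCE] = {}
    for mcp in records:
        mce = by_urn.setdefault(mcp.urn, MCE(urn=mcp.urn))
        mce.aspects[mcp.aspect_name] = mcp.aspect
    return list(by_urn.values())
```

Feeding the same two MCPs through `mce_only_tag_transformer` directly yields no tags, while feeding `bundle_into_mce(...)` yields one MCE that carries the tag.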
Hey @witty-painting-90923 @numerous-application-54063: we are actually working to ensure transformers can process MCPs.
The PR should land today,
so we shouldn't have to change the Elastic source.
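One way a transformer could handle MCPs, sketched with hypothetical stand-in classes (this is not DataHub's real API or its actual implementation): patch an existing `globalTags` proposal if one passes by, and at the end of the stream emit a new one for every dataset that never had one.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Set, Union


@dataclass
class MCE:
    """Hypothetical stand-in: one event carrying many aspects for a URN."""
    urn: str
    aspects: Dict[str, object] = field(default_factory=dict)


@dataclass
class MCP:
    """Hypothetical stand-in: one proposal carrying a single aspect."""
    urn: str
    aspect_name: str
    aspect: object


Record = Union[MCE, MCP]


def mcp_aware_tag_transformer(records: Iterable[Record], tag: str) -> Iterator[Record]:
    """Adds `tag` to every dataset, whether it arrives as an MCE or as MCPs."""
    seen: Dict[str, None] = {}  # insertion-ordered set of URNs seen via MCPs
    tagged: Set[str] = set()
    for record in records:
        if isinstance(record, MCE):
            record.aspects.setdefault("globalTags", []).append(tag)
            tagged.add(record.urn)
        else:
            seen.setdefault(record.urn, None)
            if record.aspect_name == "globalTags":
                # Extend an existing tags aspect instead of replacing it.
                record.aspect = list(record.aspect) + [tag]
                tagged.add(record.urn)
        yield record
    # End of stream: no single MCP holds every aspect, so emit a tags
    # proposal for each dataset that never carried one.
    for urn in seen:
        if urn not in tagged:
            yield MCP(urn=urn, aspect_name="globalTags", aspect=[tag])
```

The end-of-stream step is the key design choice. With MCPs, no single record is guaranteed to hold the aspect being changed, so the transformer has to wait until it has seen everything for a dataset before deciding to add one.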
@witty-painting-90923: Somewhat unrelated, but I just looked at what you are trying to do with the browse path transformer. Now that Elastic supports `platform_instance`, you can just assign that in your recipe to make your ingestion "instance specific".
For example, you can achieve the same thing as what you are doing by setting `platform_instance` to `EsComments` in your recipe.
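A sketch of that suggestion as a programmatic recipe. Here `platform_instance` replaces the browse-path transformer, while the tag transformer stays. `build_es_recipe` is a hypothetical helper whose parameters mirror the variables in the original snippet. `platform_instance` is the key named above, but it is worth checking the Elasticsearch source docs for your CLI version.

```python
from typing import Any, Dict, List, Optional


def build_es_recipe(
    env: str,
    host: str,
    username: str,
    password: str,
    deny_patterns: List[str],
    datahub_server: str,
    platform_instance: str = "EsComments",
    tag_urns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: recipe dict using platform_instance instead of a browse-path transformer."""
    if tag_urns is None:
        tag_urns = ["urn:li:tag:EsComments"]
    return {
        "source": {
            "type": "elasticsearch",
            "config": {
                "env": env,
                "host": host,
                "username": username,
                "password": password,
                # The instance name takes the place of the hard-coded
                # "EsComments" segment in the old browse-path template.
                "platform_instance": platform_instance,
                "index_pattern": {"deny": deny_patterns},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": datahub_server}},
        # Only the tag transformer is still needed.
        "transformers": [
            {"type": "simple_add_dataset_tags", "config": {"tag_urns": tag_urns}},
        ],
    }
```

The dict would then be passed to `Pipeline.create` as in the original snippet, e.g. `Pipeline.create(build_es_recipe(ENV, es_connection_host_port, es_connection_login, es_connection_password, [es_deny_index_pattern], datahub_server)).run()`.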
@mammoth-bear-12532 Hey, OK, thanks for the feedback, great! I'll keep an eye out for this new transformers feature. I'll close my PR then 🙂
@witty-painting-90923 @numerous-application-54063: this is now supported in the latest CLI release (0.8.28.0). Please try it out and let us know!
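A small stdlib-only check that the installed CLI is at least 0.8.28.0. It assumes the CLI comes from the `acryl-datahub` PyPI distribution, the usual package name, though you should verify this for your setup. The comparison is deliberately coarse: it keeps only the leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def numeric_version(version: str) -> Tuple[int, ...]:
    """Leading numeric components only, e.g. '0.8.28.1rc1' -> (0, 8, 28, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group(0)))
        if match.end() != len(piece):
            break  # stop at a pre-release/local suffix such as "rc1"
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.8.28.0") -> bool:
    """Coarse check that `installed` >= `minimum`, padding with zeros."""
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_datahub_version(dist_name: str = "acryl-datahub") -> Optional[str]:
    """Installed version of the distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

If `installed_datahub_version()` returns `None` or `meets_minimum(...)` is false, upgrading with `pip install --upgrade acryl-datahub` should pick up the release mentioned above.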