# ingestion
s
Hi! I would like to add metadata obtained from API calls to the catalog. Let's say I have a web service that gives me the data I need via REST calls; there is a Swagger page that describes the APIs, and I don't have direct access to the service's database. As an example, one of the APIs returns a list of user data in response to a GET on <https://test_web_service.com/api/user_data>. Let's say the result of this call is a JSON document containing name, address, and telephone number. I would like to see a dataset in the catalog such as test_web_service.user_names, with name, address, and telephone number as its fields. Has anyone out there already done something similar?
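To make the idea concrete, here is a minimal sketch of the mapping described above: take a JSON response and derive a dataset name plus a field list from it. The sample payload, the key `telephone_number`, and the helper names `infer_fields` and `describe_dataset` are all made up for illustration; nothing here calls the real service or any DataHub API.

```python
import json

# Hypothetical sample of what GET /api/user_data might return (assumed shape).
SAMPLE_RESPONSE = (
    '[{"name": "Ada", "address": "1 Main St", "telephone_number": "555-0100"}]'
)


def infer_fields(records):
    """Map each field name to the Python type name of its first seen value."""
    fields = {}
    for record in records:
        for key, value in record.items():
            fields.setdefault(key, type(value).__name__)
    return fields


def describe_dataset(name, response_text):
    """Build a plain dict describing the dataset; the name is chosen by the caller."""
    records = json.loads(response_text)
    return {"dataset": name, "fields": infer_fields(records)}


schema = describe_dataset("test_web_service.user_names", SAMPLE_RESPONSE)
print(schema)
```

The resulting dict would still need to be translated into whatever model the catalog expects; that part is deliberately left out.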
l
This is a great use case. My understanding is that LinkedIn has done something similar (@mammoth-bear-12532 can confirm). We have to add models for this and support ingestion. It would be great if you want to contribute it; someone can help you through the process.
m
@stale-jewelry-2440: the easiest way to do this right now is to implement a Transformer (https://datahubproject.io/docs/metadata-ingestion/#transformations) to enrich the metadata as you are ingesting the main user corpus.
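As a rough illustration of the general pattern only (this is not DataHub's actual Transformer interface; see the linked docs for that), an enrichment step can be thought of as a function that receives the stream of records being ingested and yields each one with extra metadata merged in. The names `enrich` and `fake_api_lookup` and the record shape below are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]


def enrich(
    records: Iterable[Record],
    lookup: Callable[[str], Dict[str, object]],
) -> Iterator[Record]:
    """Yield each record with the metadata returned by `lookup` merged in."""
    for record in records:
        extra = lookup(str(record["dataset"]))
        yield {**record, **extra}


def fake_api_lookup(dataset: str) -> Dict[str, object]:
    """Stand-in for a REST call; returns canned metadata for known datasets."""
    catalog = {
        "test_web_service.user_names": {
            "fields": ["name", "address", "telephone number"],
        },
    }
    return catalog.get(dataset, {})


incoming = [{"dataset": "test_web_service.user_names"}, {"dataset": "other.table"}]
enriched = list(enrich(incoming, fake_api_lookup))
print(enriched)
```

Records with no matching metadata pass through unchanged, so the step can safely sit in the middle of an ingestion pipeline.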
s
This is a very interesting feature. It would be very useful for questions like "Where does this endpoint get its data from?" and also for impact analysis. Perhaps, if the API uses some kind of ORM with model-based declarations, one could add a hook or render the graphs based on the code.
s
I'm proceeding to code a solution and will ask if I have any doubts. Thank you!
m
Cool @stale-jewelry-2440, you can reach out to @gray-shoe-75895, as he is working on some reference implementations for transformers that should be checked into the repo soon.