# ingestion
Hi all 👋, one of my customers has just started to import datasets into their brand new lakehouse. They'd like to prioritise their ingestion roadmap by request/popularity. The current plan is to connect DataHub up to SwaggerHub and ingest "tables" from the API GET methods, rather than ingesting the operational database schemas. Long-term we plan to stream operational data in via Kafka rather than cloning the DBs each day, relying on the designed API/topic schemas rather than the ORM-controlled ones, so this feels like the best place to start. The aim is to give the data community visibility of the available data without having to ingest everything first. Has anyone else done something similar? I spotted a few Swagger-related comments, but it wasn't clear they were quite the same thing.
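To make the idea concrete, here's a rough sketch of the extraction step: fetch the OpenAPI/Swagger definition and treat each GET endpoint as a candidate "table"/dataset. The spec URL is hypothetical, and emitting the results into DataHub is left out; this is just a sketch, not how DataHub's ingestion framework does it.

```python
# Minimal sketch: pull an OpenAPI/Swagger definition and list the GET
# endpoints as candidate "tables". Follows the standard OpenAPI layout
# (paths -> method -> summary/description); nothing DataHub-specific yet.
import requests

SPEC_URL = "https://example.com/api-docs/openapi.json"  # hypothetical spec location


def list_get_datasets(spec_url: str) -> list[dict]:
    spec = requests.get(spec_url, timeout=30).json()
    candidates = []
    for path, methods in spec.get("paths", {}).items():
        get_op = methods.get("get")
        if not get_op:
            continue
        candidates.append(
            {
                # Treat each GET endpoint as a dataset-like entity.
                "name": get_op.get("operationId") or path.strip("/").replace("/", "."),
                "description": get_op.get("summary") or get_op.get("description", ""),
                "path": path,
            }
        )
    return candidates


if __name__ == "__main__":
    for ds in list_get_datasets(SPEC_URL):
        print(f"{ds['name']}: {ds['description']} ({ds['path']})")
```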
This is a great idea @mammoth-sugar-1353. Happy to brainstorm on this
How do you want to collate ideas?
Right now I'm not sure we need to go as far as adding an API entity, as we'll be treating them as feature-equivalent to datasets.
Regarding push vs pull: either would be possible. Some of their APIs are connected to the hub (auto-generated docs via CI/CD), others are manually updated (😱). I guess it could make sense to attach a webhook to the SwaggerHub publish event and push the MCE, but then we'd have to provide an endpoint/app. SwaggerHub also has an API for reading the API docs, which includes the entity descriptions, so a pull version may be simpler.
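For the pull side, something like the sketch below could run on a schedule instead of us standing up a webhook endpoint. The registry URL pattern, the APIs.json-style response shape, and the auth header are assumptions to verify against the SwaggerHub Registry API docs.

```python
# Pull-based sketch: list an owner's published APIs in the SwaggerHub
# registry, then fetch each raw definition. Endpoint paths and response
# fields are assumptions, not confirmed SwaggerHub behaviour.
import requests

REGISTRY = "https://api.swaggerhub.com"
OWNER = "my-org"   # hypothetical SwaggerHub owner/organisation
API_KEY = "..."    # SwaggerHub API key, if the registry is private


def fetch_definitions(owner: str) -> list[dict]:
    headers = {"Authorization": API_KEY}
    # Assumed: GET /apis/{owner} returns an APIs.json-style listing.
    listing = requests.get(f"{REGISTRY}/apis/{owner}", headers=headers, timeout=30).json()
    specs = []
    for api in listing.get("apis", []):
        # Assumed: each entry carries property links, one pointing at the raw definition.
        props = {p.get("type"): p.get("url") for p in api.get("properties", [])}
        spec_url = props.get("Swagger")
        if spec_url:
            specs.append(requests.get(spec_url, headers=headers, timeout=30).json())
    return specs
```

Each fetched definition could then go through the GET-endpoint extraction sketched earlier before any MCEs are emitted.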
Right.. you could look at the open PR here (https://github.com/linkedin/datahub/pull/2706) and provide a modified PR that does what you are thinking of.
This would be a pull-based ingestion to begin with.