Hi all 👋, one of my customers has just started importing datasets into their brand new lakehouse. They'd like to prioritise their ingestion roadmap based on demand (requests/popularity).
The current plan is to connect DataHub up to SwaggerHub and ingest "tables" derived from the API GET methods, rather than ingesting the operational database schemas (rough sketch of the mapping below). Long-term we plan to stream operational data in via Kafka rather than cloning the DBs each day, relying on the designed API/topic schemas rather than the ORM-controlled ones, so this feels like the best place to start.
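For what it's worth, here's a minimal sketch of the mapping I have in mind: walk the spec's GET operations and treat each 200-response object schema as a candidate "table", with its properties as the columns. The SPEC_URL and org/API names are made up, and it doesn't resolve `$ref` pointers into `components/schemas`, which a real spec would need. I believe DataHub also ships an OpenAPI ingestion source that may cover some of this out of the box, so worth checking the docs first.

```python
import requests

# Hypothetical SwaggerHub spec URL -- swap in your own org/API/version.
SPEC_URL = "https://api.swaggerhub.com/apis/my-org/orders-api/1.0.0"

def tables_from_openapi(spec: dict) -> dict:
    """Map each GET operation's 200-response schema to a candidate 'table'."""
    tables = {}
    for path, operations in spec.get("paths", {}).items():
        get_op = operations.get("get")
        if not get_op:
            continue
        ok = get_op.get("responses", {}).get("200", {})
        # OpenAPI 3.x nests the schema under content/<media-type>;
        # Swagger 2.0 puts it directly on the response object.
        schema = (
            ok.get("content", {}).get("application/json", {}).get("schema")
            or ok.get("schema")
            or {}
        )
        # An array of objects is the closest analogue to a table of rows.
        if schema.get("type") == "array":
            schema = schema.get("items", {})
        # NOTE: $ref pointers are not resolved here; a real spec usually needs that.
        tables[path] = list(schema.get("properties", {}).keys())
    return tables

if __name__ == "__main__":
    spec = requests.get(SPEC_URL, timeout=30).json()
    for path, columns in tables_from_openapi(spec).items():
        print(f"{path}: {columns or '(no object schema found)'}")
```

The idea would then be to push each candidate as a dataset into DataHub (e.g. via its Python emitter) so the catalogue shows what's available before any actual data lands in the lakehouse.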
The aim is to give the data community visibility of available data, without having to ingest everything first.
Has anyone else done something similar? I spotted a few Swagger-related comments, but it wasn't clear whether they were quite the same thing.