# ingestion
a
Are ETL scripts enough for data ingestion? Is that how LinkedIn uses them internally? Or is there a heavyweight framework like Gobblin that could also proactively scan datasets and populate/push them to DataHub?
b
Internally we mainly rely on various systems (including Gobblin) to emit messages to us directly. There are also crawler scripts here and there for systems that can't be instrumented, though they're written in Java and run on Azkaban. The ETL scripts on GitHub mainly serve as a "demo" and are not meant to be used in production verbatim.
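(For context, here is a minimal sketch of what "emitting messages directly" can look like. It assumes a local Kafka broker and a `MetadataChangeEvent` topic; real DataHub MCEs are Avro-encoded records with a much richer schema, so the JSON payload, URN format, and helper function below are simplifications for illustration only.)

```python
# Sketch of a push-based metadata emitter. Assumes a Kafka broker at
# localhost:9092 and a "MetadataChangeEvent" topic. DataHub's actual
# MCEs are Avro-encoded; plain JSON keeps this example short.
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def emit_dataset_metadata(platform: str, name: str, owners: list) -> None:
    """Publish a (simplified) metadata change event for one dataset."""
    event = {
        "urn": f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},PROD)",
        "aspects": {"ownership": {"owners": owners}},
    }
    producer.produce("MetadataChangeEvent", value=json.dumps(event).encode())

emit_dataset_metadata("hive", "tracking.page_views", ["datahub"])
producer.flush()  # block until the broker acknowledges delivery
```

A crawler script for a source that can't be instrumented would do essentially the same thing on a schedule: scan the source's catalog (e.g. its `information_schema`) and call an emitter like this for each table it finds.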
a
Thanks, Mars, for the info. Do you plan to make "discovery" part of DataHub's feature set, or will you keep it strictly a catalog/lineage tool?
b
Discovery is certainly something that we intend to grow into over time. Internally we have already started showing "insights" for datasets, which will lead to features like "most popular dataset" or "datasets you may also be interested in" in the future.
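(As a rough illustration of how a "most popular dataset" insight could be derived, a ranking might simply aggregate dataset access events. This is not DataHub's actual implementation; the event shape and field names below are assumptions.)

```python
# Hypothetical sketch: rank datasets by access count from usage logs.
# The "dataset_urn" event field and the log source are assumptions,
# not DataHub's real usage schema.
from collections import Counter

def most_popular_datasets(usage_events, top_n=10):
    """Return the top-N dataset URNs by number of access events."""
    counts = Counter(event["dataset_urn"] for event in usage_events)
    return counts.most_common(top_n)

events = [
    {"dataset_urn": "urn:li:dataset:(urn:li:dataPlatform:hive,a,PROD)"},
    {"dataset_urn": "urn:li:dataset:(urn:li:dataPlatform:hive,a,PROD)"},
    {"dataset_urn": "urn:li:dataset:(urn:li:dataPlatform:hive,b,PROD)"},
]
print(most_popular_datasets(events, top_n=2))
```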