# getting-started
b
Hi All. I've been spending some time digging into datahub's backend and I had a quick question. I noticed that the MAEs have an internal Java representation that can be serialized into Avro, but no part of them seems to get put into any formal query intermediate representation (calcite, for example). I thought that pegasus was this, but it looks like pegasus is just an object format to help decorate the rest layer. Does this mean that datahub is meant to be strictly a federated metadata discovery tool, unlike a tool like Dremio, which is meant to be more like a federated query or execution engine? If so (apologies in advance if I overlooked something), is the long term plan to collaborate with the coral / dali community to start to get the execution side? Since coral only supports hive view definitions, what is the interim plan to get things like pushdown optimization into queries before it supports more of the backends that datahub currently supports? Or is datahub meant to avoid approaching query execution altogether and only focus on metadata query?
m
Great question @billions-scientist-31934! We are definitely first focusing on metadata storage and query. Querying data does come up a lot in discussions with folks, though. I'd say the long-term plan is to explore interop with query execution tech like coral or maybe even dremio, depending on what the community wants.
b
Thanks for the quick reply! From exploring the code I figured this was the case, but I wanted to make sure I hadn't missed something.