This message was deleted Apache Druid #dev

# dev

Slackbot

10/24/2023, 8:17 AM

This message was deleted.

Gian Merlino

10/24/2023, 4:26 PM

I assume you're talking about the new async queries backed by MSQ

Gian Merlino

10/24/2023, 4:27 PM

For those, joins are possible, both broadcast and sort-merge. For details, see https://druid.apache.org/docs/latest/multi-stage-query/reference/#joins

Gian Merlino

10/24/2023, 4:35 PM

About retention duration, I'm not sure, possibly @Karan Kumar knows

Karan Kumar

10/25/2023, 1:15 AM

We can probably do a better job documenting result retention in one go. Since async queries run with the MSQ engine, which runs on tasks, result retention is controlled by the task log retention policy: https://druid.apache.org/docs/latest/configuration/#log-retention-policy

You will also need to set another property on the Overlord, assuming you are using durable storage (recommended) for query results.

How to configure durable storage: https://druid.apache.org/docs/latest/multi-stage-query/reference#durable-storage-configurations
Durable storage clean-up: https://druid.apache.org/docs/latest/operations/durable-storage#durable-storage-clean-up

Mohit Jain

10/25/2023, 3:46 AM

Thanks guys for the reply 👍 Another suggestion: I was going through the documentation for joins. From the documentation alone, it is a little hard for me to visualize how the join actually works. Reading about joins, I imagine it works like a Spark join, but details such as the following are missing:
• how the data is shuffled
• what the equivalent of a Spark executor is in this case
• how join performance can be profiled
Not sure if I am thinking in the right direction.

Gian Merlino

10/25/2023, 3:14 PM

@Mohit Jain basically there are two query engines: native and MSQ (multi-stage query). Native is designed for quick interactive queries; MSQ is designed for long-running (minutes+) queries. Native uses persistent JVMs. MSQ uses dedicated JVMs (one per query) that are spun up for that query specifically, so there is some overhead to each query. We are still working on harmonizing the docs so it is clearer which doc applies to which engine (MSQ is much newer).

This doc is about how joins are executed in native: https://druid.apache.org/docs/latest/querying/query-execution#join. They are always broadcast hash joins, so it is more of a "broadcast" than a "shuffle".

This doc is about joins in MSQ: https://druid.apache.org/docs/latest/multi-stage-query/reference#joins. Joins can be broadcast or sort-merge. The sort-merge join is similar to Spark's sort-merge join: it shuffles based on the join key.

With MSQ you can use the web console to understand the performance of a query. It shows you each stage, how much time is spent in that stage, and how much data that stage processes.
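
(Illustrative note, not from the thread.) The difference between the two join strategies described above can be sketched in a few lines of plain Python. This is not Druid code and makes no claim about how Druid implements either join. The function names, the `key` parameter, and the hash-based partitioning are all assumptions chosen for illustration. The broadcast version copies the small side to every partition of the large side, while the sort-merge version first shuffles both sides by join key, then sorts and merges each partition.

```python
from collections import defaultdict
from operator import itemgetter


def broadcast_hash_join(left_partitions, right_rows, key):
    """Inner join: build one hash table from the small side and reuse it
    for every partition of the large side (no shuffle of the large side)."""
    table = defaultdict(list)
    for row in right_rows:
        table[row[key]].append(row)
    joined = []
    for partition in left_partitions:  # each partition sees the same table
        for left_row in partition:
            for right_row in table.get(left_row[key], []):
                joined.append({**left_row, **right_row})
    return joined


def shuffle_by_key(rows, key, num_partitions):
    """Hypothetical shuffle: route each row to a partition by hashing its key,
    so equal keys from both inputs land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def merge_sorted(left, right, key):
    """Merge two key-sorted lists, emitting all pairs with equal keys."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # Pair every left row in the run with every right row in the run.
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:j_end]:
                    joined.append({**left[i], **right_row})
                i += 1
            j = j_end
    return joined


def sort_merge_join(left_rows, right_rows, key, num_partitions=2):
    """Inner join: shuffle both sides by key, then sort and merge per partition."""
    joined = []
    left_parts = shuffle_by_key(left_rows, key, num_partitions)
    right_parts = shuffle_by_key(right_rows, key, num_partitions)
    for left_part, right_part in zip(left_parts, right_parts):
        left_sorted = sorted(left_part, key=itemgetter(key))
        right_sorted = sorted(right_part, key=itemgetter(key))
        joined.extend(merge_sorted(left_sorted, right_sorted, key))
    return joined
```

Both functions return the same set of joined rows for the same inputs. The trade-off in this sketch is that the broadcast version needs the whole small side in memory on every worker, while the sort-merge version moves both sides across partitions but never needs a full copy of either input in one place.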

🙌 2

Gian Merlino

10/25/2023, 3:14 PM

Hope this helps

Sharath S

10/30/2023, 4:58 AM

Thank you very much, Gian, for the detailed explanation.
