# dev
s
This message was deleted.
g
I assume you're talking about the new async queries backed by MSQ.
For those, joins are possible, both broadcast and sort-merge. For details, see https://druid.apache.org/docs/latest/multi-stage-query/reference/#joins
About retention duration, I'm not sure, possibly @Karan Kumar knows
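For anyone following along, here is a rough sketch of what a request body for such an async join query might look like. It assumes the SQL statements endpoint (`POST /druid/v2/sql/statements`) with the `executionMode` and `sqlJoinAlgorithm` context parameters from the MSQ docs. The table and column names and the helper `build_async_join_request` are hypothetical, and the snippet only builds the payload without sending it.

```python
import json


def build_async_join_request(sql: str, join_algorithm: str = "sortMerge") -> dict:
    """Build a JSON-serializable body for an async MSQ query (hypothetical helper)."""
    # Assumption: these are the two join algorithms MSQ accepts in the query context.
    if join_algorithm not in ("broadcast", "sortMerge"):
        raise ValueError(f"unknown join algorithm: {join_algorithm}")
    return {
        "query": sql,
        "context": {
            # Assumption: ASYNC execution mode selects the async (MSQ-backed) path.
            "executionMode": "ASYNC",
            "sqlJoinAlgorithm": join_algorithm,
        },
    }


# Hypothetical tables and columns, for illustration only.
SQL = (
    "SELECT o.order_id, c.country "
    "FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id"
)

payload = build_async_join_request(SQL)
print(json.dumps(payload, indent=2))
```

Check the context parameter names against the docs for your Druid version before relying on them.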
k
We can probably do a better job of documenting result retention in one place. Since async queries run with the MSQ engine, which runs on tasks, result retention is controlled by the task log retention policy: https://druid.apache.org/docs/latest/configuration/#log-retention-policy
You will also need to set another property on the Overlord, assuming you are using durable storage (recommended) for query results.
How to configure durable storage: https://druid.apache.org/docs/latest/multi-stage-query/reference#durable-storage-configurations
Durable storage clean-up: https://druid.apache.org/docs/latest/operations/durable-storage#durable-storage-clean-up
m
Thanks guys for the reply 👍
Another suggestion: I was going through the documentation for joins. From the documentation alone, it is a little hard for me to visualize how the join actually works. After reading up on joins, I imagine it works like a Spark join, but details such as the following are missing:
• how the data is shuffled
• what the equivalent of a Spark executor is in this case
• how join performance can be profiled
Not sure if I am thinking in the right direction.
g
@Mohit Jain basically there are two query engines: native and MSQ (multi-stage query). Native is designed for quick interactive queries; MSQ is designed for long-running (minutes+) queries. Native uses persistent JVMs. MSQ uses dedicated JVMs that are spun up specifically for each query, so there is some overhead per query. We are still working on harmonizing the docs so it is clearer which doc applies to which engine (MSQ is much newer).
This doc is about how joins are executed in native: https://druid.apache.org/docs/latest/querying/query-execution#join. They are always broadcast hash joins, so it is more of a "broadcast" than a "shuffle".
This doc is about joins in MSQ: https://druid.apache.org/docs/latest/multi-stage-query/reference#joins. Joins can be broadcast or sort-merge. The sort-merge join is similar to Spark's sort-merge join: it shuffles based on the join key.
With MSQ you can use the web console to understand the performance of a query. It shows you each stage, how much time is spent in that stage, and how much data that stage processes.
Hope this helps
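To help visualize the two join strategies described above, here is a minimal in-memory sketch. It is not Druid's implementation: the hash-based partitioning, the `partitions` parameter, and the row layout are assumptions made for illustration. The broadcast variant builds a hash table from the small side and probes it with the large side. The sort-merge variant first "shuffles" both sides by join key, then sorts and merges each partition.

```python
from collections import defaultdict


def broadcast_hash_join(left, right, key):
    """Join by hashing the (small) right side, as if it were copied to every worker."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    # Probe the hash table with each left row; no shuffle of the left side is needed.
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def sort_merge_join(left, right, key, partitions=2):
    """Join by shuffling both sides on the key, then sorting and merging per partition."""

    def shuffle(rows):
        # Rows with equal keys always land in the same partition on both sides.
        buckets = [[] for _ in range(partitions)]
        for row in rows:
            buckets[hash(row[key]) % partitions].append(row)
        return buckets

    out = []
    for lpart, rpart in zip(shuffle(left), shuffle(right)):
        lpart.sort(key=lambda row: row[key])
        rpart.sort(key=lambda row: row[key])
        i = j = 0
        while i < len(lpart) and j < len(rpart):
            lk, rk = lpart[i][key], rpart[j][key]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Find the run of right rows sharing this key.
                j_end = j
                while j_end < len(rpart) and rpart[j_end][key] == lk:
                    j_end += 1
                # Pair every left row in the run with every right row in the run.
                while i < len(lpart) and lpart[i][key] == lk:
                    out.extend({**lpart[i], **r} for r in rpart[j:j_end])
                    i += 1
                j = j_end
    return out


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 2, "a": "z"}, {"id": 3, "a": "w"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 3, "b": "r"}, {"id": 4, "b": "s"}]

bhj = broadcast_hash_join(left, right, "id")
smj = sort_merge_join(left, right, "id")
```

Both functions return the same set of joined rows, possibly in a different order. The difference is in data movement: the broadcast join copies the whole small side everywhere, while the sort-merge join moves both sides but never needs either side to fit in one worker's memory.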
s
Thank you very much, Gian, for the detailed explanation.