# general
c
hmm, I think it's more of a disclaimer about how complete the functionality is
and there are also some issues with using them on realtime tasks, I think, though I believe there is an active PR trying to fix that
the broadcast join functionality is a bit rough/unfinished, but it's eventually planned to be a replacement for lookups, I think
https://github.com/apache/druid/pull/10224 has some details of what is essentially the unfinished reference implementation we have for integration tests
heh.. from 3 years ago, you can see we’re a bit slow on finishing this stuff up 😅
ah this is the fix for realtime tasks https://github.com/apache/druid/pull/14239
j
So... Should I be using a `loadForever` rule instead?
c
well, if you're planning on doing joins from this to other datasources, it's probably worth investigating whether the broadcast stuff works well enough for your use case, since it should be a fair bit more performant
I don't think Druid can automatically do anything fancy just because the segments are loaded everywhere, beyond having a larger pool of nodes to process those tables; it will still need to run subqueries to compute the join
the broadcast join stuff, besides loading the segment everywhere, also lets the join be pushed down to the data nodes so that it can be computed without a subquery
it has some limitations though: the datasource can only be a single segment, and it has to be ingested with extra settings in the indexSpec to specify the key columns (which should be unique, IIRC)
the broadcast rule itself should work about as well as a `loadForever` rule with many replicas, AFAIK, so that part is less experimental than the docs suggest; there's just not a ton of advantage beyond having more data nodes to compute for that table
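fwiw, the broadcast rule is just another retention rule; going from memory here (so double-check the exact field names against the retention-rules docs), the two options look roughly like:

```json
[
  { "type": "broadcastForever" }
]
```

vs. something like `{ "type": "loadForever", "tieredReplicants": { "_default_tier": 2 } }` for the plain load-everywhere approach (the tier name and replica count are just example values)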
https://github.com/apache/druid/pull/10020 has some more details about how the broadcast join stuff works
g
IMO right now it's best, from an operational perspective, to use either:
• a regular table with a `loadForever` rule, or
• a lookup, rather than a regular table: https://druid.apache.org/docs/latest/querying/lookups.html
the perf implication of a regular table with `loadForever` is that the broker will need to gather and broadcast the table for each query, which, if the table is small, is probably not a big deal
lookups will perform better, as they're pre-broadcast, at the cost of needing to be managed separately from your regular tables
the broadcast stuff is a not-yet-complete story that aims to enable pre-broadcasting of regular tables. when it's complete we'd recommend it for this case; however, right now I wouldn't want you to have to wrestle with it unless you really have to, due to the incompleteness 🙂
👍 1
c
yeah, that's fair; I was more trying to point out that there isn't really a benefit to having it on `loadForever` with num replicas = num data servers, since it's still going to have to subquery
j
Oh, I didn't know you had to do something special to create a lookup table.
My team uses the UI. They copied and pasted the data from a simple CSV file. Is there functionality there to turn that table into a lookup table instead?
Btw, it's all in a single segment. When the team needs to update the data, they kill the old segment file (just in case).
c
lookup tables are a sort of special construct which we eventually plan to replace with the broadcast stuff once we finish it, but https://druid.apache.org/docs/latest/querying/lookups.html has the details
they're limited to a single key-to-value mapping, so if the table has multiple columns it probably won't work
they aren't real segments, though we did retrofit them so you can query the key and value columns as a table via SQL when we added some of the broadcast stuff
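as a toy illustration (plain Python, not anything Druid-specific), a lookup is conceptually just this, which is why a multi-column table doesn't fit:

```python
# Toy model of what a Druid lookup is conceptually: one key column
# mapped to one value column. Not Druid code, just an illustration.
country_names = {
    "US": "United States",
    "FR": "France",
}

def lookup(key, table, default=None):
    # Mimics the shape of a key-to-value lookup: missing keys come
    # back as a default (null in SQL) instead of erroring.
    return table.get(key, default)

print(lookup("US", country_names))  # United States
print(lookup("DE", country_names))  # None
```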
👍 1
there are a couple of extensions that provide the lookups; I think https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html is the more commonly used one
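if the data is small and static (like a pasted CSV), I think the simplest flavor is the plain `map` lookup type; roughly like this, going from memory (the keys and values here are made up, check the lookups docs for the exact spec):

```json
{
  "type": "map",
  "map": {
    "some_key": "some value",
    "another_key": "another value"
  }
}
```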
the web console does have stuff to manage the lookups. if the limitation of a single key-to-value replacement works for your use case, it will be the most performant option, at least when using the `LOOKUP` function, especially if the lookup is a 1:1 replacement, since Druid can do some optimizations to only do the replacement when it really needs to
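for example, assuming a lookup named `country_names` and a column `country_code` (both made-up names for illustration), the SQL usage looks something like:

```sql
SELECT
  country_code,
  LOOKUP(country_code, 'country_names') AS country_name
FROM my_table
```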