# general
c
hmm, I think it's more of a disclaimer about how complete the functionality is
and there are also some issues with using them on realtime tasks, I think, though I believe there is an active PR trying to fix that
the broadcast join functionality is a bit rough/unfinished, but it's eventually planned to be a replacement for lookups, I think
https://github.com/apache/druid/pull/10224 has some details of what is essentially the unfinished reference implementation we have for integration tests
heh.. from 3 years ago, you can see we’re a bit slow on finishing this stuff up 😅
ah this is the fix for realtime tasks https://github.com/apache/druid/pull/14239
j
So... Should I be using a `loadForever` rule instead?
c
well, if you're planning on doing joins from this to other datasources, it's probably worth investigating whether the broadcast stuff works well enough for your use case, since it should be a fair bit more performant
I don't think Druid can automatically do anything fancy just because the segments are loaded everywhere, beyond having a larger pool of nodes to process those tables; it will still need to run subqueries to compute the join
the broadcast join stuff, besides loading the segment everywhere, also lets the join be pushed down to the data nodes so that it can be computed without a subquery
it has some limitations though: the datasource can only be a single segment, and it has to be ingested with extra settings in the indexSpec to specify the key columns (which should be unique, IIRC)
the broadcast rule itself should work about as well as a `loadForever` rule with many replicas, AFAIK, so that part is less experimental than the docs suggest; there's just not a ton of advantage beyond having more data nodes to compute for that table
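fwiw, the broadcast rule is just another retention rule; going from memory here (so double-check the exact field names against the retention-rules docs), the two options look roughly like:

```json
[
  { "type": "broadcastForever" }
]
```

vs. something like `{ "type": "loadForever", "tieredReplicants": { "_default_tier": 2 } }` for the plain load-everywhere approach (the tier name and replica count are just example values)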
https://github.com/apache/druid/pull/10020 has some more details about how the broadcast join stuff works
g
IMO right now it's best, from an operational perspective, to use either:
• a regular table with a `loadForever` rule, or
• a lookup, rather than a regular table: https://druid.apache.org/docs/latest/querying/lookups.html
the perf implication of a regular table with `loadForever` is that the broker will need to gather and broadcast the table for each query, which, if the table is small, is probably not a big deal
lookups will perform better, as they're pre-broadcast, at the cost of needing to be managed separately from your regular tables
the broadcast stuff is a not-yet-complete story that aims to enable pre-broadcasting of regular tables. when it's complete we'd recommend it for this case; however, right now I wouldn't want you to have to wrestle with it unless you really have to, due to the incompleteness 🙂
👍 1
c
yeah, that's fair; I was more trying to point out that there isn't really a benefit to having it on `loadForever` with num replicas = num data servers, since it's still going to have to subquery
j
Oh, I didn't know you had to do something special to create a lookup table.
My team uses the UI. They copied and pasted the data from a simple CSV file. Is there functionality there to turn that table into a lookup table instead?
Btw, it's all in a single segment. When the team needs to update the data, they kill the old segment file (just in case).
c
lookup tables are a sort of special construct which we eventually plan to replace with the broadcast stuff once we finish it, but https://druid.apache.org/docs/latest/querying/lookups.html has the details
they're limited to a single key-to-value mapping, so if the table has multiple columns it probably won't work
they aren't real segments, though we did retrofit them so you can query the key and value columns as a table via SQL when we added some of the broadcast stuff
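as a toy illustration (plain Python, not anything Druid-specific), a lookup is conceptually just this, which is why a multi-column table doesn't fit:

```python
# Toy model of what a Druid lookup is conceptually: one key column
# mapped to one value column. Not Druid code, just an illustration.
country_names = {
    "US": "United States",
    "FR": "France",
}

def lookup(key, table, default=None):
    # Mimics the shape of a key-to-value lookup: missing keys come
    # back as a default (null in SQL) instead of erroring.
    return table.get(key, default)

print(lookup("US", country_names))  # United States
print(lookup("DE", country_names))  # None
```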
👍 1
there are a couple of extensions that provide the lookups; I think https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html is the more commonly used one
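if the data is small and static (like a pasted CSV), I think the simplest flavor is the plain `map` lookup type; roughly like this, going from memory (the keys and values here are made up, check the lookups docs for the exact spec):

```json
{
  "type": "map",
  "map": {
    "some_key": "some value",
    "another_key": "another value"
  }
}
```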
the web console does have stuff to manage the lookups. if the limitation of a single key-to-value replacement works for your use case, it will be the most performant option, at least when using the `LOOKUP` function, especially if the lookup is a 1:1 replacement, since Druid can do some optimizations to only do the replacement when it really needs to
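for example, assuming a lookup named `country_names` and a column `country_code` (both made-up names for illustration), the SQL usage looks something like:

```sql
SELECT
  country_code,
  LOOKUP(country_code, 'country_names') AS country_name
FROM my_table
```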