Kyle Larose

05/25/2023, 3:09 PM
We're looking to switch our ingestion from Kafka to a new RabbitMQ super-streams-based model (the plugin is being developed right now by my colleague). To avoid losing data when switching sources, we expect there will be a period where we ingest from both Kafka and the new source, which will produce exact duplicates. I'm wondering if there is a simple way to remove those duplicates some time after we turn off the Kafka source, knowing that there is a fairly constrained time window to search. Searching this Slack, I've seen a few posts about using rollup to do it, but I'm not quite sure how to get started. Any pointers would be helpful.
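One possible way to do that kind of dedup, sketched below under assumptions rather than taken from this thread: reindex the affected time window with a Druid SQL-based ingestion (MSQ) task that groups on every column, so exact duplicate rows collapse into one. The datasource name "events", the column names, the time bounds, and the router URL are hypothetical placeholders, and the sketch assumes the table holds raw rows rather than pre-aggregated metrics; a native reindex spec with rollup enabled could achieve a similar effect.

```
# Minimal sketch (assumption-laden): submit a Druid SQL-based ingestion (MSQ)
# task that rewrites one time window of a datasource, grouping on every column
# so exact duplicate rows collapse into a single row.
# "events", the column names, the time bounds, and the router URL are
# hypothetical placeholders.
import requests

DRUID_ROUTER = "http://localhost:8888"  # assumption: default Druid router address

dedup_sql = """
REPLACE INTO "events"
OVERWRITE WHERE __time >= TIMESTAMP '2023-05-01' AND __time < TIMESTAMP '2023-07-01'
SELECT __time, "service", "value"
FROM "events"
WHERE __time >= TIMESTAMP '2023-05-01' AND __time < TIMESTAMP '2023-07-01'
GROUP BY __time, "service", "value"
PARTITIONED BY DAY
"""

# POST to the SQL task endpoint; the response contains the task id to monitor.
resp = requests.post(f"{DRUID_ROUTER}/druid/v2/sql/task", json={"query": dedup_sql}, timeout=30)
resp.raise_for_status()
print(resp.json())
```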

Vadim

05/25/2023, 8:51 PM
How are you planning on getting the rabbitmq data into Druid? What is "rabbitmq super streams based model"? By "plugin is being developed right now by my colleague" are you saying you are writing a rabbitmq supervisor?

Kyle Larose

05/25/2023, 9:06 PM
Yes. RabbitMQ recently released a preview feature called "super streams" which has much of the same functionality as Kafka or Kinesis. In particular, a RabbitMQ stream is an append-only log, so the same ingestion model used by Kafka and Kinesis works here.
So, we're writing a plugin that will ingest data from those streams. We based much of it off the Kinesis one, for what it's worth. We're planning on open-sourcing it. I know my colleague has discussed it, though I'm not sure if it was with the RabbitMQ community or the Druid one.

Vadim

05/25/2023, 9:24 PM
That is spectacular. Let me know how it goes. If you open source it and contribute it to Druid (and make sure that the sampler API works with it - I don't know if that happens automatically or needs a bit of extra work), then ping me and I would be more than happy to make a tile for it in the streaming data loader in the console...
As for your original question: you can only have 1 supervisor per datasource, so I assume you will have another (new) datasource for the rabbitmq data. Does it make sense to let them both run for some time and then just point your app at the new datasource? (You can also do a batch backfill by copying from the old datasource to the new one.)
We recently did something like that where I work. We went from an OSS Kafka cluster to a Confluent Cloud cluster (shoutout to Confluent Cloud!). We did exactly what I said in my comment. In our use case people can only access several weeks of data in the app, so we just let it run until the new datasource had that much data in it. We did not end up needing the backfill; we had discussed it, but natural delays to the schedule from other projects obviated the need.
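A minimal sketch of the batch backfill mentioned above, again with hypothetical names (the datasources "events_kafka" and "events_rabbit", the columns, the cutover timestamp, and the router URL are all placeholders): copy everything older than the moment the new datasource started receiving data from the old datasource into the new one, so the app can then be pointed at the new datasource alone.

```
# Minimal sketch of the batch backfill idea (hypothetical names throughout):
# copy rows from the old Kafka-fed datasource into the new rabbitmq-fed one
# for all time before the new stream started arriving.
import requests

DRUID_ROUTER = "http://localhost:8888"  # assumption: default Druid router address
CUTOVER = "2023-06-01"                  # assumption: when the new datasource began receiving data

backfill_sql = f"""
INSERT INTO "events_rabbit"
SELECT __time, "service", "value"
FROM "events_kafka"
WHERE __time < TIMESTAMP '{CUTOVER}'
PARTITIONED BY DAY
"""

# POST to the SQL task endpoint; the response contains the task id to monitor.
resp = requests.post(f"{DRUID_ROUTER}/druid/v2/sql/task", json={"query": backfill_sql}, timeout=30)
resp.raise_for_status()
print(resp.json())
```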

Kyle Larose

05/25/2023, 9:34 PM
We definitely plan on contributing it to Druid, so that's great to hear. 🙂 I hadn't considered pointing it at a different datasource. So, basically, we could identify when data started arriving at the new datasource and copy in all the old data from before that point. Or keep the old one around and stitch the two together in the app. Unfortunately for us we have a few years' worth of data, though it's not a huge amount, so copying it around is likely feasible.
Thanks for the advice!