# general
s
This message was deleted.
v
How are you planning on getting the RabbitMQ data into Druid? What is "rabbitmq super streams based model"? By "plugin is being developed right now by my colleague" are you saying you are writing a RabbitMQ supervisor?
k
Yes. RabbitMQ recently released a preview feature called "super streams" which has much of the same functionality as Kafka or Kinesis. In particular, a RabbitMQ stream is an append-only log, so the same ingestion model used by Kafka and Kinesis works here.
So, we're writing a plugin that will ingest data from them. We based much of it off the Kinesis one, for what it's worth. We're planning on open-sourcing it. I know my colleague has discussed it, though I'm not sure if it was with the RabbitMQ community or the Druid one.
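For reference, here is a rough sketch of how submitting such a supervisor might look once the plugin exists. The "rabbit" type and all of its ioConfig field names are guesses modeled on the shape of the Kafka/Kinesis supervisor specs; only the Overlord's `/druid/indexer/v1/supervisor` endpoint (reachable through the router) is standard Druid.

```python
import requests

# Hypothetical supervisor spec for a RabbitMQ super stream, modeled on the
# shape of the Kafka/Kinesis supervisor specs. The "rabbit" type and the
# ioConfig fields are assumptions -- the real plugin may name things differently.
supervisor_spec = {
    "type": "rabbit",
    "spec": {
        "dataSchema": {
            "dataSource": "events_rabbit",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": []},
            "granularitySpec": {"segmentGranularity": "hour", "queryGranularity": "none"},
        },
        "ioConfig": {
            "stream": "my-super-stream",                 # assumed field: super stream name
            "uri": "rabbitmq-stream://localhost:5552",   # assumed field: stream endpoint
            "inputFormat": {"type": "json"},
            "taskCount": 1,
            "taskDuration": "PT1H",
        },
        "tuningConfig": {"type": "rabbit"},
    },
}

# Submitting to the Overlord's supervisor endpoint works the same as for Kafka/Kinesis.
resp = requests.post(
    "http://localhost:8888/druid/indexer/v1/supervisor",  # router proxies to the Overlord
    json=supervisor_spec,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": "events_rabbit"}
```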
v
That is spectacular. Let me know how it goes. If you open source it and contribute it to Druid (and make sure that the sampler API works with it - I don't know if that happens automatically or needs a bit of extra work), then ping me and I would be more than happy to make a tile for it in the streaming data loader in the console...
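On the sampler point, one way to check is to POST a sampler spec of the new type to the standard sampler endpoint. The `/druid/indexer/v1/sampler` path and the `samplerConfig` block are standard Druid; the "rabbit" type and its fields are hypothetical, and whether it works out of the box depends on how the plugin's ioConfig is wired up.

```python
import requests

# Hypothetical sampler request for the new stream type. The outer shape
# (type + spec + samplerConfig) mirrors how the Kafka/Kinesis samplers are
# typically called; the "rabbit"-specific fields are assumptions.
sampler_request = {
    "type": "rabbit",                      # assumed: matches the supervisor type
    "spec": {
        "dataSchema": {
            "dataSource": "events_rabbit",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": []},
            "granularitySpec": {"segmentGranularity": "hour", "queryGranularity": "none"},
        },
        "ioConfig": {
            "stream": "my-super-stream",   # assumed field names, as above
            "uri": "rabbitmq-stream://localhost:5552",
            "inputFormat": {"type": "json"},
        },
    },
    "samplerConfig": {"numRows": 500, "timeoutMs": 15000},
}

resp = requests.post(
    "http://localhost:8888/druid/indexer/v1/sampler",
    json=sampler_request,
    timeout=60,
)
resp.raise_for_status()
for row in resp.json().get("data", []):
    print(row)   # each entry shows the raw input and how it parsed
```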
As for your original question: you can only have 1 supervisor per datasource, so I assume you will have another (new) datasource for the RabbitMQ data. Does it make sense to let both run for some time and then just point your app at the new datasource? (You can also do a batch backfill by copying from the old datasource to the new datasource.)
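A sketch of that batch backfill, assuming a Druid version with SQL-based ingestion available; the datasource names are placeholders:

```python
import requests

# Copy everything from the old (Kafka-fed) datasource into the new one using
# SQL-based ingestion. Datasource names are placeholders for this sketch.
backfill_sql = """
INSERT INTO "events_rabbit"
SELECT *
FROM "events_kafka"
PARTITIONED BY DAY
"""

resp = requests.post(
    "http://localhost:8888/druid/v2/sql/task",   # submits the SQL as an ingestion task
    json={"query": backfill_sql},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # includes the taskId to watch in the console
```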
We recently did something like that where I work. We went from an OSS Kafka cluster to a Confluent Cloud cluster (shoutout to Confluent Cloud!). We did exactly what I said in my comment. In our use case we only have several weeks of data that people can access in the app, so we just let it run until the new datasource had that much data in it. We did not need to do the backfill, but we had discussed it before natural delays to the schedule from other projects obviated the need.
k
We definitely plan on contributing it to Druid, so that's great to hear. 🙂 I hadn't considered pointing it at a different datasource. So, basically, we could identify when data started arriving at the new datasource and copy all the old data in prior to that. Or keep the old one around and stitch it together in the app. Unfortunately for us we have a few years' worth of data (not a huge amount, though, so copying it around is likely feasible).
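To make the "prior to that" cutoff concrete, here is a small sketch (datasource names are placeholders; `/druid/v2/sql` is the standard SQL query endpoint):

```python
import requests

# Find the earliest event the new (RabbitMQ-fed) datasource has ingested, then
# bound the backfill from the old datasource to everything before that instant.
resp = requests.post(
    "http://localhost:8888/druid/v2/sql",
    json={"query": 'SELECT MIN(__time) AS cutoff FROM "events_rabbit"'},
    timeout=30,
)
resp.raise_for_status()
cutoff = resp.json()[0]["cutoff"]   # ISO timestamp string, e.g. "2023-05-01T12:34:56.000Z"
print("new datasource starts at", cutoff)

# The copy from the old datasource then becomes a bounded INSERT, e.g.:
backfill_sql = f"""
INSERT INTO "events_rabbit"
SELECT *
FROM "events_kafka"
WHERE __time < TIMESTAMP '{cutoff[:19].replace("T", " ")}'
PARTITIONED BY DAY
"""
# Submit backfill_sql to /druid/v2/sql/task as in the earlier snippet.
```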
Thanks for the advice!