# troubleshooting
John Kowtko
Hi Daniel, two things come to mind (not 100% sure on either of these):
• Is there an active Supervisor task on this datasource? If so, I would expect the DAY time chunk lock from that Supervisor to prevent an auto-compaction from being scheduled for the current year.
• The auto-compaction offset itself might be preventing auto-compaction from checking the current year, since the year is not completely older than P2D (i.e. on Jan 3 this would kick in to compact the prior year). If this is the case, try P0D or P0Y offsets to see if that makes a difference.
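For reference, a rough sketch of the per-datasource auto-compaction config with the offset zeroed out (posted to the Coordinator's compaction config API; "my_datasource" is a placeholder and any other fields you already have would stay as they are):

```json
{
  "dataSource": "my_datasource",
  "skipOffsetFromLatest": "P0D"
}
```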
Daniel
Hey @John Kowtko, no, this is not a datasource with live ingestion. It's batch only, always writing or rewriting D and D-1, which is why I set the P2D offset delay, so there are no time chunk locks or interference. The second possibility is exactly what I was afraid of. I wrote the spec thinking the offset was used by Druid to decide which segments to ignore when checking which ones are compacted. But if it actually means compaction only runs after the end of the segment + offset… then I need to trigger this via periodic reindexes (via Airflow), or change auto-compaction to WEEK granularity (+2d offset) and manually force a YEAR merge every now and then (see the sketch below).
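Something like this manual compaction task is what I had in mind for the forced YEAR merge (just a sketch; "my_datasource" and the interval are placeholders, submitted to the Overlord task endpoint):

```json
{
  "type": "compact",
  "dataSource": "my_datasource",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2023-01-01/2024-01-01"
    }
  },
  "granularitySpec": {
    "segmentGranularity": "YEAR"
  }
}
```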
John Kowtko
Hi Daniel, if you want to try to trick the auto-compaction process, ingest a dummy record with a __time value of "2024-01-01" ... since the auto-compaction offset is measured from the LATEST time period, not the current date/time, a P1Y offset should then try to operate on 2023 ... I have wrestled with auto-compaction contention plenty of times when "future" data was inadvertently added to a customer's datasource, causing auto-compaction to try to compact current and even future time periods when we didn't want that ... in this case it might be a benefit ... 🤞 You just need to insert a record that doesn't disrupt your application.
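A minimal native-batch spec for that kind of dummy record might look roughly like this (a sketch only: "my_datasource", "placeholder_dim" and the schema details are placeholders you'd adapt to the real datasource):

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "inline",
        "data": "{\"timestamp\": \"2024-01-01T00:00:00Z\", \"placeholder_dim\": \"dummy\"}"
      },
      "inputFormat": { "type": "json" },
      "appendToExisting": true
    },
    "dataSchema": {
      "dataSource": "my_datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["placeholder_dim"] },
      "granularitySpec": { "segmentGranularity": "DAY", "queryGranularity": "HOUR" }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```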
Daniel
But adding a ‘2024-01-01’ record would make my segments from today and yesterday considered ‘compactable’, wouldn’t it? That would hurt us, as these segments aren’t final.
John Kowtko
Are you replacing today's/yesterday's segments with overwrite, or append? If overwrite, segment overshadowing allows overwriting data with mixed segment granularity, e.g. you can ingest/overwrite a DAY segment on top of a YEAR segment; for query purposes the new DAY's data will overshadow/replace the old DAY's data contained in the YEAR segment, and the next time compaction runs it will merge the DAY segment into the YEAR segment. If append, the granularity of the segments must match ... so in that case you would just change your ingestion to use YEAR granularity instead of DAY. Would one of those two scenarios work for you?
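For the append scenario, the change would just be in the ingestion spec's granularitySpec, roughly like this (a sketch, with the rest of your spec unchanged):

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "YEAR"
}
```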
Daniel
I see a job running multiple times daily, overwriting data for D and D-1 (with corrections and new data). If it ran only once (or at least only wrote new data), I'd be OK with the merge. But as it is, it looks like auto-compaction will be continuously merging partial data into a block where it won't be replaced if needed.
John Kowtko
It's definitely your call, and based on what I know today I don't see a way around that particular scheduling issue. By default compaction is kicked off every 30 minutes, so if you have several batch updates a day, then this datasource would be compacted that many times. And if compacting a YEAR is very resource intensive, then that is a reason not to do it. However if that is the case and MONTH granularity would work better with multiple daily compactions, maybe that's something to consider.
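If MONTH works better, the auto-compaction config would look roughly like this (a sketch only: "my_datasource" and "some_dimension" are placeholders, the range partitionsSpec is just one option, and its exact fields depend on your Druid version):

```json
{
  "dataSource": "my_datasource",
  "skipOffsetFromLatest": "P2D",
  "granularitySpec": {
    "segmentGranularity": "MONTH"
  },
  "tuningConfig": {
    "partitionsSpec": {
      "type": "range",
      "partitionDimensions": ["some_dimension"],
      "targetRowsPerSegment": 5000000
    }
  }
}
```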
Daniel
That’s exactly what I did yesterday: I changed the auto-compaction granularity to MONTH, with good results. It’s already a huge improvement over the small daily ones, and I can set the partitioning and the offset. There will be more daily indexes close to EOM than I’d like, but still huge gains compared to the previous situation. For now it’s more than good enough; I’ll explore more in the future. Thanks @John Kowtko
👍 1
John Kowtko
Glad to hear 🙂
Daniel
In a parallel problem… can auto-compaction be used for deeper rollups? I have a datasource with HOUR query granularity and DAY segment granularity. Can I set auto-compaction to, after a month, compact/roll up the data into DAY query granularity and MONTH segment granularity? I imagine it’s doable, but I couldn’t find examples of it.
John Kowtko
Here is the doc page that mentions rollup: https://druid.apache.org/docs/latest/data-management/compaction.html#rollup ... I haven't tried it myself, but the docs state that compaction will continue to roll up if the original segments had rollup enabled ... which makes sense. If you don't know whether the original segments were ingested with rollup: true, then I don't know if it is possible to force rollup during compaction ...
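Untested, but going by that doc page, the auto-compaction config for your case would look roughly like this ("my_datasource" is a placeholder, and whether rollup actually deepens depends on how the original segments were ingested):

```json
{
  "dataSource": "my_datasource",
  "skipOffsetFromLatest": "P1M",
  "granularitySpec": {
    "segmentGranularity": "MONTH",
    "queryGranularity": "DAY",
    "rollup": true
  }
}
```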