Slackbot
05/24/2023, 2:32 PM
John Kowtko
05/24/2023, 2:44 PM
Kyle Hoondert
05/24/2023, 2:56 PM
appendToExisting=true?
Denis Ott
05/24/2023, 3:04 PM
{
  "type": "compact",
  "dataSource": "dev.do.overshadow-test",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "segments",
      "segments": [
        "dev.do.overshadow-test_2023-01-01T00:00:00.000Z_2023-02-01T00:00:00.000Z_2023-05-24T13:27:31.235Z",
        "dev.do.overshadow-test_2023-01-01T00:00:00.000Z_2023-02-01T00:00:00.000Z_2023-05-24T13:15:52.256Z"
      ]
    }
  }
}
Since the older *T13:15 segment was unused, I got an error.
I tried it now using a time interval:
{
  "type": "compact",
  "dataSource": "dev.do.overshadow-test",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2023-01-01/2023-01-02"
    }
  }
}
That produced a third segment, but it is identical to the second one (the one with less data).
John Kowtko
05/24/2023, 3:14 PM
John Kowtko
05/24/2023, 3:17 PM
John Kowtko
05/24/2023, 3:20 PM
John Kowtko
05/24/2023, 3:20 PM
Denis Ott
05/24/2023, 3:21 PM
appendToExisting because it didn't use the dynamic partitioning type
@John Kowtko the actual data has segmentGranularity=day; this is a smaller example with segmentGranularity=month
John Kowtko
05/24/2023, 3:23 PM
appendToExisting is only relevant to the ingestion task, not the existing datasource. So it doesn't matter how the original segments got there; this parameter only affects what happens when you ingest the new data.
I guess there is a small caveat to this: if the original segments were already the result of multiple ingestions into the same time period and you haven't compacted that time period yet, then yes, the existing segments may be marked as having been ingested in overwrite or append mode, and that will affect what queries see and what compaction will do.
John Kowtko
05/24/2023, 3:29 PM
> I have a datasource where several days=segments with millions of rows are overshadowed by segments that were added later with only a tiny fraction of rows
"overshadowed" implies that the newer segments were ingested with appendToExisting=false ... and if all segments involved were at the DAY level, then compaction (using your second spec) should merge them.
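To make the overshadowing point concrete, here is a rough Python sketch that splits segment ids into their parts and reports which ones are overshadowed. The helper names parse_segment_id and split_visible are made up for illustration and are not part of any Druid API. It assumes ids of the form dataSource_start_end_version with an optional trailing partition number, and it only handles segments that cover exactly the same interval; Druid's real timeline also deals with partially overlapping intervals.

```python
from collections import defaultdict
from typing import NamedTuple


class SegmentId(NamedTuple):
    datasource: str
    start: str
    end: str
    version: str
    partition: int


def parse_segment_id(segment_id: str) -> SegmentId:
    """Hypothetical helper: split a segment id into its parts."""
    parts = segment_id.split("_")
    # Assumption: a trailing all-digit field is the partition number,
    # and an id without one is treated as partition 0.
    partition = 0
    if parts[-1].isdigit():
        partition = int(parts.pop())
    if len(parts) < 4:
        raise ValueError(f"not a recognizable segment id: {segment_id}")
    # Take the last three fields from the right, because the datasource
    # name itself may contain underscores.
    start, end, version = parts[-3:]
    datasource = "_".join(parts[:-3])
    return SegmentId(datasource, start, end, version, partition)


def split_visible(segment_ids):
    """Hypothetical helper: return (visible, overshadowed) segments.

    Simplification: only segments with the same datasource and the exact
    same interval are compared. Within such a group the highest version
    string wins, and every older version counts as overshadowed.
    """
    by_interval = defaultdict(list)
    for sid in segment_ids:
        seg = parse_segment_id(sid)
        by_interval[(seg.datasource, seg.start, seg.end)].append(seg)

    visible, overshadowed = [], []
    for segs in by_interval.values():
        newest = max(s.version for s in segs)  # versions compared as strings
        for s in segs:
            (visible if s.version == newest else overshadowed).append(s)
    return visible, overshadowed


# The two segment ids from the compaction spec posted earlier in this thread.
ids = [
    "dev.do.overshadow-test_2023-01-01T00:00:00.000Z_2023-02-01T00:00:00.000Z_2023-05-24T13:27:31.235Z",
    "dev.do.overshadow-test_2023-01-01T00:00:00.000Z_2023-02-01T00:00:00.000Z_2023-05-24T13:15:52.256Z",
]
visible, overshadowed = split_visible(ids)
print("visible:", [s.version for s in visible])
print("overshadowed:", [s.version for s in overshadowed])
```

With the two ids above, the T13:27 version ends up visible and the T13:15 version ends up overshadowed, which matches the older segment being reported as unused.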
Denis Ott
05/24/2023, 3:40 PM
"data": "time,id,value\n2023-01-01,1.01,1.01\n2023-02-01,2.01,2.01\n2023-03-01,3.01,3.01"
to overshadow the previous segment and then ran the compactions on it
John Kowtko
05/24/2023, 5:57 PM