Jonathan Meyer
05/14/2021, 10:47 AM
Segment_1 & Segment_2) -> Segment_3(date=01/01/2021, data=[DATA 1, DATA 2])
Or do we have to regenerate the 2 segments entirely? (If so, we need to identify what they contain.) Possibly after merging them?
Mayank
Jonathan Meyer
05/14/2021, 11:53 AM
Mayank
Jonathan Meyer
05/14/2021, 4:11 PM
> If you are generating daily segments then replacing one day's segments should be straightforward
The difficulty is that not all of a day's data may / will arrive at the same time but ingestion
Hence
• Batch 1: _01/01/2021 (data date) - DATA 1, DATA 3, DATA 6 -> `Segment_1(date=01/01/2021, data=[DATA 1, DATA 3, DATA 6])`_
• Batch 2: _01/01/2021 (data date) - DATA 2, DATA 4, DATA 5 -> `Segment_2(date=01/01/2021, data=[DATA 2, DATA 4, DATA 5])`_
In the end, I feel like my question is "how can we update part of a segment?" I feel like it's not possible.
It looks like there are only 2 ways to reach my goal, then:
1. Only have a single segment per day at a time, so
   a. Drop the day's segment
   b. Regenerate the segment with updated data [99% of the data may not have changed, so pretty inefficient]
2. Identify the impacted segments & regenerate only those (in their entirety)
What do you think? 🙂
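Option 2 above depends on knowing which segment holds which records. A minimal Python sketch of that lookup, assuming a hypothetical manifest (segment name -> set of record keys) is kept outside Pinot at segment-generation time; `impacted_segments` and the manifest shape are illustrative, not a Pinot API:

```python
from typing import Dict, List, Set


def impacted_segments(manifest: Dict[str, Set[str]], updated: Set[str]) -> List[str]:
    """Return the names of segments that contain at least one updated record key."""
    return sorted(name for name, keys in manifest.items() if keys & updated)


# Hypothetical manifest mirroring the two batches in the message above.
manifest = {
    "Segment_1": {"DATA 1", "DATA 3", "DATA 6"},
    "Segment_2": {"DATA 2", "DATA 4", "DATA 5"},
}

# A correction to DATA 4 only touches Segment_2, so only that one needs regenerating.
print(impacted_segments(manifest, {"DATA 4"}))
```

The trade-off versus option 1 is the extra bookkeeping: the manifest has to stay in sync with whatever segments are actually loaded.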
Mayank
Jonathan Meyer
05/14/2021, 4:24 PM
Mayank
Jonathan Meyer
05/14/2021, 4:44 PM
Mayank
Jonathan Meyer
05/14/2021, 4:55 PM
> What is the max delay for data to arrive? Does one day's worth of data settle in a day or so? Or can it take several days / weeks?
Typically much less than a month, but it is technically unbounded: customer data could theoretically be corrected months after first ingestion
Mayank
Jonathan Meyer
05/14/2021, 4:59 PM
> Also, even if your incoming data is not partitioned, you can always generate segments to guarantee the data belongs to one day (e.g. pick several folders to scan and select data only for a single day to generate input for the Pinot segment)
If I understand correctly, you're saying that after every batch, we regenerate the whole Pinot segment? For example, we've got a single file per batch, and after every batch we could regenerate a single Pinot segment from all of these files.
Meaning we always keep a single Pinot segment (per day) at a time, and replacing it is straightforward.
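The "one segment per day, rebuilt after every batch" idea can be sketched roughly as below. This is only an illustration under assumptions: the record shape `(data date, record key, payload)` and the name `build_daily_input` are hypothetical, later batches are assumed to override earlier ones for the same key, and the actual Pinot segment generation step is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (data date, record key, payload)
Record = Tuple[str, str, str]


def build_daily_input(batches: Iterable[List[Record]], day: str) -> List[Record]:
    """Collect one day's records across all batches, in arrival order.

    When the same key appears in several batches, the latest batch wins.
    """
    latest: Dict[str, Record] = {}
    for batch in batches:
        for record in batch:
            if record[0] == day:
                latest[record[1]] = record
    # Sort by key so the generated input is deterministic.
    return [latest[key] for key in sorted(latest)]


batches = [
    [("01/01/2021", "DATA 1", "v1"), ("01/01/2021", "DATA 3", "v1"), ("01/02/2021", "DATA 7", "v1")],
    [("01/01/2021", "DATA 2", "v1"), ("01/01/2021", "DATA 1", "v2")],  # DATA 1 corrected
]
daily_input = build_daily_input(batches, "01/01/2021")
```

The resulting `daily_input` would then feed a single segment build for that day, which replaces the previous segment for the same day.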
Mayank
Jonathan Meyer
05/14/2021, 5:26 PM
Mayank
reallyonthemove tous
10/13/2022, 12:32 AM
> a new feature will allow generating a single Pinot segment out of multiple input files, dropping the need for file concatenation
Are there any updates on this? I'm looking for something similar.