Evan Galpin
06/22/2021, 8:32 PM

Mayank
Kenny Bastani
06/22/2021, 8:36 PM

Kenny Bastani
06/22/2021, 8:38 PM

Evan Galpin
06/22/2021, 8:45 PM
`posts`, `comments`, `likes`, etc. as distinct data sets, then later make use of Groovy scripts to effectively join those data sets at ingestion time (rather than at search time)?
Kenny Bastani
06/22/2021, 8:53 PM

Kenny Bastani
06/22/2021, 8:54 PM

Kenny Bastani
06/22/2021, 8:56 PM
`posts`, `comments`, `likes`, as they sit in different tables... whenever they are updated, deleted, or created, a Kafka event is sent per table to a respective topic. Then the rest is as I said before with Flink.
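A toy sketch of the pattern described here (plain Python, not Flink or Kafka code): one change-event stream per source table, folded downstream into a single denormalized record per post. All names (`post_id`, `title`, etc.) are illustrative assumptions.

```python
from collections import defaultdict

def join_change_events(events):
    """Fold per-table CDC-style events into one denormalized view per post_id."""
    view = defaultdict(lambda: {"post": None, "comments": [], "likes": 0})
    for topic, payload in events:
        key = payload["post_id"]
        if topic == "posts":
            view[key]["post"] = payload["title"]
        elif topic == "comments":
            view[key]["comments"].append(payload["text"])
        elif topic == "likes":
            view[key]["likes"] += 1
    return dict(view)

# One event per table change, as if read from per-table topics:
events = [
    ("posts", {"post_id": 1, "title": "Hello Pinot"}),
    ("comments", {"post_id": 1, "text": "Nice post"}),
    ("likes", {"post_id": 1}),
    ("likes", {"post_id": 1}),
]
print(join_change_events(events)[1])
# {'post': 'Hello Pinot', 'comments': ['Nice post'], 'likes': 2}
```

In the real pipeline, Flink would do this join continuously and emit the denormalized records to the topic Pinot ingests from.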
Kenny Bastani
06/22/2021, 8:56 PM

Kenny Bastani
06/22/2021, 8:57 PM

Evan Galpin
06/22/2021, 9:09 PM
`posts`. To then start making use of the new dimension, the new table can be joined; it might not be very efficient, but it can start answering questions about the data right away. And testing/local development can be done relatively cheaply by inserting data and joining at query time.
What does the developer workflow look like to support the same kind of feature evolution in Pinot? It seems complex to mimic an ETL pipeline for local development, for example.
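A minimal sketch of the RDBMS workflow this message describes, using in-memory SQLite as the cheap local-development stand-in; table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'Hello Pinot')")

# Later, a new dimension arrives as its own table...
conn.execute("CREATE TABLE post_stats (post_id INTEGER, views INTEGER)")
conn.execute("INSERT INTO post_stats VALUES (1, 42)")

# ...and can start answering questions immediately via a query-time
# join, with no changes to any ingestion pipeline.
row = conn.execute(
    """SELECT p.title, s.views
       FROM posts p JOIN post_stats s ON s.post_id = p.id"""
).fetchone()
print(row)  # ('Hello Pinot', 42)
```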
Kenny Bastani
06/22/2021, 9:13 PM

Kenny Bastani
06/22/2021, 9:20 PM
• `fooTable` has columns `a, b, c` in the Pinot schema configuration, as well as `primaryKey`
• Upsert is enabled and partitioned on the `primaryKey`
• The table is real-time and has been populated with 1,000 records
• Now I change the Pinot schema to `a, b, c, d`
• The Kafka payload has been modified to stream in the new column `d`
• To make queries return correctly after making this change in Pinot, you need to issue a reload on the segments of the `fooTable`
• This will populate the `d` column with the value `null` for the 1,000 existing rows
• Because upsert is enabled, when you populate the `d` column in your RDBMS, the old 1,000 rows will be updated with the current version of the `d` value
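The sequence in the list above can be modeled as a toy in Python (this simulates the behavior being described, not Pinot internals): an upsert table keyed on `primaryKey`, a "reload" that backfills the new column with null, and new events that replace old rows because upsert keeps the latest record per key.

```python
def reload_with_new_column(table, column):
    """Simulate a segment reload: backfill the new column with null (None)."""
    for row in table.values():
        row.setdefault(column, None)

def upsert(table, record):
    """Upsert semantics: the latest record for a primaryKey wins."""
    table[record["primaryKey"]] = record

# Real-time table populated with 1,000 records of columns a, b, c.
table = {k: {"primaryKey": k, "a": 1, "b": 2, "c": 3} for k in range(1000)}

# Schema changes to a, b, c, d; the reload populates d as null everywhere.
reload_with_new_column(table, "d")
assert table[0]["d"] is None

# The RDBMS populates d and the Kafka payload now streams it in;
# the upsert replaces the old version of that row.
upsert(table, {"primaryKey": 0, "a": 1, "b": 2, "c": 3, "d": 99})
print(table[0]["d"])  # 99
```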
Kenny Bastani
06/22/2021, 9:25 PM

Kenny Bastani
06/22/2021, 9:25 PM

Evan Galpin
06/23/2021, 1:15 PM

Evan Galpin
06/23/2021, 1:35 PM
`upsert`. What are the semantics of upserts in Pinot? Is it partial upsert or full overwrite?
Ex. in the example you gave above re: adding column `d` to an existing schema, could the value for `d` be added to existing entries with the same `primaryKey` and values for `a, b, c` already present by sending a payload with only the `primaryKey` and value for `d`? Or would the payload need to include `a, b, c, d` with a given `primaryKey`?
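The two semantics this question contrasts can be illustrated generically (this is a sketch of the concepts only, not a statement of which one Pinot implements):

```python
def full_overwrite(table, payload):
    """Full upsert: the payload replaces the whole row for that key."""
    table[payload["primaryKey"]] = payload

def partial_upsert(table, payload):
    """Partial upsert: only the fields present in the payload change."""
    row = table.setdefault(payload["primaryKey"], {})
    row.update(payload)

existing = {"primaryKey": 7, "a": 1, "b": 2, "c": 3}
payload = {"primaryKey": 7, "d": 99}  # only the key and the new column

t1 = {7: dict(existing)}
full_overwrite(t1, dict(payload))
print(t1[7])  # {'primaryKey': 7, 'd': 99} -- a, b, c are lost

t2 = {7: dict(existing)}
partial_upsert(t2, dict(payload))
print(t2[7])  # {'primaryKey': 7, 'a': 1, 'b': 2, 'c': 3, 'd': 99}
```

Under full-overwrite semantics the producer must send `a, b, c, d` every time; under partial-upsert semantics a payload carrying only `primaryKey` and `d` would suffice.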
Kenny Bastani
06/23/2021, 8:30 PM

Kenny Bastani
06/24/2021, 5:44 PM

Kenny Bastani
06/24/2021, 5:45 PM

Kenny Bastani
06/24/2021, 5:46 PM

Kenny Bastani
06/24/2021, 5:47 PM

Evan Galpin
06/25/2021, 8:26 PM

Mayank