# random
s
hi guys, does anyone know the status of "upsert stream - Table" on the roadmap? Is there any ticket I can follow for updates?
g
What are you trying to implement?
s
I have a simple continuous query of the form "select * from table_a join table_b on (...)". To bootstrap this table, I wanted to first run it in batch mode by connecting directly to the source DBs. Once the bootstrapping is complete, I wanted to switch to streaming mode via a pubsub connector. But some data in pubsub might duplicate what was already loaded in batch mode. Am I even on the right track? How do people usually bootstrap big queries? Do they just run in streaming mode right from the beginning?
g
Bootstrapping through the hybrid source? A few things might also depend on the downstream sink. You could also try running an event deduplication query if duplicates are the concern. You can also look into https://paimon.apache.org/, which has strong capabilities for such use cases.
s
cool, I was just checking out hybrid source. I think I can use that, but the few duplicate events during the switch are a concern, hence the question about upsert stream. Thanks, I will check out Apache Paimon. Also, can you elaborate on what you meant by
event deduplication query
?
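For context, a deduplication query usually keeps one row per key by ranking rows with `ROW_NUMBER()` and filtering on the rank. Below is a minimal sketch of that shape, not anything taken from this thread: the table `events` and the columns `id`, `payload`, and `event_time` are hypothetical, and SQLite stands in for the real engine only so the query can be run locally. In Flink SQL the same pattern applies, but the `ORDER BY` column has to be a time attribute.

```python
import sqlite3

# Hypothetical table/column names. SQLite (3.25+ for window functions) is only
# a local stand-in used to show the shape of the query.
DEDUP_QUERY = """
SELECT id, payload, event_time
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY event_time DESC) AS row_num
    FROM events
)
WHERE row_num = 1
ORDER BY id
"""


def latest_per_key(rows):
    """Load rows into an in-memory table and keep the newest row per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER, payload TEXT, event_time INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(DEDUP_QUERY).fetchall()
    conn.close()
    return result


# Key 1 arrives twice: once from the batch bootstrap, once again from pubsub.
rows = [
    (1, "from batch", 100),
    (2, "from batch", 100),
    (1, "from pubsub", 105),
]
deduped = latest_per_key(rows)
print(deduped)
```

Ordering `DESC` keeps the last row seen per key, ordering `ASC` keeps the first. Which one fits depends on whether the pubsub copy should win over the batch copy.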
s
wow thanks, this might work. Much appreciated 🙇