# troubleshooting
m
Hey guys, is there an example or a clear explanation anywhere of streaming POJOs to Parquet without Avro?
d
m
thank you very much!
I managed to do it with the Table API. My problem was that I hadn’t enabled checkpointing (I was lucky enough to stumble upon one of your answers on SO and figure that out). Do you know why the Parquet files are written at the checkpoints, instead of being rolled at 128MB or at the rolling interval configured in the sink?
d
With bulk (column-based) formats, it would be awkward to provide exactly-once guarantees without synchronizing the bulk writes with checkpointing.
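To make that concrete, here is a toy model in plain Java. It is not Flink code, and every name in it (`CheckpointRollingDemo`, `write`, `onCheckpoint`, `onFailure`) is made up for illustration. It assumes a part file only becomes visible when a checkpoint happens, and that anything written since the last checkpoint is thrown away on failure and replayed by the source. Under those assumptions each record lands in exactly one finished part.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the Flink API. All names here are hypothetical.
class CheckpointRollingDemo {
    // Records buffered in the current in-progress part file.
    private final List<String> inProgress = new ArrayList<>();
    // Part files that have been finalized and are visible to readers.
    private final List<List<String>> finished = new ArrayList<>();

    void write(String record) {
        inProgress.add(record);
    }

    // On a checkpoint, the in-progress part is closed and committed as a whole.
    void onCheckpoint() {
        if (!inProgress.isEmpty()) {
            finished.add(new ArrayList<>(inProgress));
            inProgress.clear();
        }
    }

    // On failure, uncommitted data is discarded; the source is assumed to replay it.
    void onFailure() {
        inProgress.clear();
    }

    List<List<String>> finishedParts() {
        return finished;
    }

    public static void main(String[] args) {
        CheckpointRollingDemo sink = new CheckpointRollingDemo();
        sink.write("a");
        sink.write("b");
        sink.onCheckpoint();   // part 1 = [a, b]
        sink.write("c");
        sink.onFailure();      // "c" was never committed, so it is dropped
        sink.write("c");       // replayed after recovery
        sink.onCheckpoint();   // part 2 = [c]
        System.out.println(sink.finishedParts()); // prints [[a, b], [c]]
    }
}
```

The point of the sketch is that a columnar file cannot simply be truncated back to the last checkpointed offset the way a row-encoded file can, so tying part boundaries to checkpoints is the simple way to avoid duplicated or half-written parts. The real sink is more involved than this (for example, pending files are only finished once the checkpoint completes).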
m
NOTE: For bulk formats (parquet, orc, avro), the rolling policy in combination with the checkpoint interval (pending files become finished on the next checkpoint) control the size and number of these parts.
just saw this. Thanks