1. You can have a scheduled batch job that incrementally pushes data to Pinot as it arrives in HDFS. Curious, though: if the data lands every 30 min, do you have a stream pipeline that Pinot could ingest from directly?
2. Historical segments can be overwritten in Pinot. Any segment pushed to Pinot with the same name as an existing segment will overwrite the existing one. You just need to make sure the new segment covers the same time period as the one it replaces.
3. I haven't looked at Spark DataFrames, but to generate segments from any input format you just need to implement the RecordReader interface. @User, do we have this in OSS?
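To illustrate the RecordReader idea from point 3: segment generation pulls rows from a reader one at a time until it's exhausted, so supporting a new format mostly means wrapping your data in that iterate/rewind contract. The sketch below does **not** use Pinot. `SimpleRecordReader`, `InMemoryRecordReader`, and `countRows` are hypothetical names, and the in-memory row list just stands in for data you might have collected from a Spark DataFrame. Pinot's real interface has more to it (for example an initialization step and its own row type), so check the version you're on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a record-reader contract:
// the consumer calls next() until hasNext() is false, and may rewind().
interface SimpleRecordReader extends AutoCloseable {
  boolean hasNext();
  Map<String, Object> next();
  void rewind();
  @Override
  void close();
}

// Reads rows from an in-memory list (assumption: e.g. rows already
// collected from a DataFrame; not how Pinot itself reads Spark data).
class InMemoryRecordReader implements SimpleRecordReader {
  private final List<Map<String, Object>> rows;
  private Iterator<Map<String, Object>> it;

  InMemoryRecordReader(List<Map<String, Object>> rows) {
    this.rows = rows;
    this.it = rows.iterator();
  }

  public boolean hasNext() { return it.hasNext(); }

  // Return a copy so callers can't mutate the source rows.
  public Map<String, Object> next() { return new HashMap<>(it.next()); }

  // Restart iteration from the first row (segment builders may do several passes).
  public void rewind() { it = rows.iterator(); }

  public void close() { }
}

public class RecordReaderSketch {
  // Hypothetical stand-in for a segment builder: consumes every record.
  static int countRows(SimpleRecordReader reader) {
    int n = 0;
    while (reader.hasNext()) {
      reader.next();
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(Map.of("userId", 1, "ts", 1000L));
    rows.add(Map.of("userId", 2, "ts", 2000L));
    try (InMemoryRecordReader reader = new InMemoryRecordReader(rows)) {
      System.out.println(countRows(reader)); // 2
      reader.rewind();
      System.out.println(countRows(reader)); // 2 again after rewind
    }
  }
}
```

The rewind step is there because a builder may want more than one pass over the input (e.g. one to collect stats, one to write), so the reader has to be restartable rather than a one-shot stream.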