# general
n
Also, is there an easy way to just sink a Spark DataFrame to Pinot segments?
Looks like I’d just need to manually use SegmentIndexCreationDriverImpl inside a foreachPartition or something, then manually do the copying that SparkSegmentGenerationJobRunner does. I might make a PR to just have a method that takes a DataFrame, does a foreachPartition, and maps each row to a GenericRow.
Would be useful to have this feature since you can read from whatever custom spark readers you have, repartition as necessary, etc.
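A rough sketch of the shape of that idea, in plain Python with no Spark or Pinot dependency so it runs anywhere. Everything here is an assumption for illustration: the `GenericRow` alias is just a dict standing in for Pinot's class, `write_segment` is a hypothetical placeholder for driving SegmentIndexCreationDriverImpl, and the list of lists stands in for the DataFrame's partitions.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple

# Stand-in for Pinot's GenericRow: a simple field -> value mapping (assumption).
GenericRow = Dict[str, Any]


def to_generic_row(columns: Sequence[str], values: Iterable[Any]) -> GenericRow:
    """Map one DataFrame-style row (a tuple of values) to a field -> value record."""
    return dict(zip(columns, values))


def write_segment(partition_id: int, rows: Iterator[GenericRow], out_dir: Path) -> Path:
    """Hypothetical placeholder for the segment build: one output file per partition."""
    path = out_dir / f"segment_{partition_id}.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    return path


def sink_partitions(
    columns: Sequence[str],
    partitions: Sequence[Sequence[Tuple[Any, ...]]],
    out_dir: Path,
) -> List[Path]:
    """Emulate foreachPartition: convert and write each partition independently."""
    return [
        write_segment(pid, (to_generic_row(columns, r) for r in part), out_dir)
        for pid, part in enumerate(partitions)
    ]


if __name__ == "__main__":
    columns = ["id", "name"]
    partitions = [[(1, "a"), (2, "b")], [(3, "c")]]
    with tempfile.TemporaryDirectory() as tmp:
        for p in sink_partitions(columns, partitions, Path(tmp)):
            print(p.name, p.read_text().count("\n"), "rows")
```

In a real job the per-partition function would run on the executors and the resulting segment files would still need to be copied to the output location, which is the part SparkSegmentGenerationJobRunner is described as handling above.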
j
@Xiang Fu Thoughts?
x
This is a useful feature
I can help review the PR
and let me know if you have any questions about the segment generation task
n
https://github.com/apache/incubator-pinot/pull/5787/files looks like someone already did this 😄
Ah dang it’s read only. That sucks.
x
Yes, it only reads from Pinot into Spark; the sink isn’t there yet