#general

Noah Prince

11/05/2020, 4:52 PM
Also, is there an easy way to sink a Spark DataFrame to Pinot segments?
Looks like I’d need to use `SegmentIndexCreationDriverImpl` manually inside a `foreachPartition` or something, then manually do the copying that `SparkSegmentGenerationJobRunner` does. I might make a PR that adds a method which takes a DataFrame, does a `foreachPartition`, and maps each row to a `GenericRow`.
This would be a useful feature, since you can read from whatever custom Spark readers you have, repartition as necessary, etc.
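A rough sketch of the shape of that idea, in plain Python with no Spark or Pinot calls. Everything here is a stand-in and an assumption: `to_generic_row` imitates mapping a row to a `GenericRow`, `build_segment` is a hypothetical placeholder for the segment creation and copy step (it only writes a JSON-lines file), and `sink_partitions` imitates what a `foreachPartition` would do over already-partitioned data.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List, Sequence

# Stand-in for Pinot's GenericRow: a plain field-name -> value mapping.
GenericRowLike = Dict[str, Any]


def to_generic_row(columns: Sequence[str], row: Iterable[Any]) -> GenericRowLike:
    """Map one positional row onto named fields (hypothetical helper)."""
    return dict(zip(columns, row))


def build_segment(partition_id: int,
                  rows: Iterator[GenericRowLike],
                  out_dir: Path) -> Path:
    """Hypothetical placeholder for segment creation.

    A real sink would hand the rows to Pinot's segment creation driver and
    then copy the result to the output location. This stand-in only writes
    one JSON-lines file per partition.
    """
    segment_path = out_dir / f"segment_{partition_id}.jsonl"
    with segment_path.open("w", encoding="utf-8") as fh:
        for record in rows:
            fh.write(json.dumps(record) + "\n")
    return segment_path


def sink_partitions(columns: Sequence[str],
                    partitions: Sequence[Iterable[Iterable[Any]]],
                    out_dir: Path) -> List[Path]:
    """Imitate a foreachPartition: one output "segment" per partition."""
    written = []
    for partition_id, partition in enumerate(partitions):
        # Lazily convert each row, so a partition is streamed, not copied.
        generic_rows = (to_generic_row(columns, row) for row in partition)
        written.append(build_segment(partition_id, generic_rows, out_dir))
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = sink_partitions(
            ["id", "name"],
            [[(1, "a"), (2, "b")], [(3, "c")]],  # two pre-made partitions
            Path(tmp),
        )
        for path in paths:
            print(path.name)
```

The point of the per-partition split is that the number of outputs follows however the data was repartitioned upstream, which is the flexibility the message above is after.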

Jackie

11/05/2020, 6:28 PM
@Xiang Fu Thoughts?

Xiang Fu

11/05/2020, 6:29 PM
This is a useful feature.
I can help review the PR.
And let me know if you have any questions about the segment generation task.

Noah Prince

11/12/2020, 8:30 PM
https://github.com/apache/incubator-pinot/pull/5787/files Looks like someone already did this 😄
Ah dang, it’s read-only. That sucks.

Xiang Fu

11/12/2020, 10:49 PM
Yes, that PR reads from Pinot into Spark; the sink isn’t there yet.