# troubleshooting
j
Hello 🙂 When ingesting batch data (Parquet) that is partitioned on a key, that key is "missing" from the Parquet file parts (makes sense). However, from what I've seen, Pinot then cannot find that key and fails to generate the segments. My current workaround is to duplicate the partition column. Is that a known issue, or is it possible to adjust settings?
x
do you have the stack trace for the job? The key should be a column in your data, even on the batch side
j
The schema contains several columns, including dateString, which the data is partitioned on. This creates Parquet partitions without this key.
Actually, now that I look at it again, I'm seeing
file:/kpi-data/raw/date=2020-11-30/ab331a05255849bf811a173a380aaf1d.parquet
Not dateString=XXX
Curious but I'll check that
x
oic, cause the default null string caused the parsing failure
➕ 1
this date has to be a column in your Parquet file
if you generated this Parquet from Spark, you can add the partition key as a column as well
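A minimal sketch of that suggestion, assuming the files are written with PySpark. The helper name and the `partition_date` column name are hypothetical; only `withColumn`, `write.mode`, `partitionBy` and `parquet` are PySpark calls. The idea is to partition on a copy of the key, so that the original `dateString` column stays inside each Parquet part.

```python
def write_partitioned_keeping_key(df, output_path, key="dateString",
                                  partition_col="partition_date"):
    """Write `df` as Parquet, partitioned on a copy of `key`.

    `df` is expected to be a PySpark DataFrame. `partition_col` is a
    hypothetical name chosen for this sketch.
    """
    # Duplicate the key under a second name.
    with_copy = df.withColumn(partition_col, df[key])
    # Spark moves `partition_col` into the directory names
    # (partition_date=...), while `key` remains a regular column
    # inside every Parquet part.
    with_copy.write.mode("overwrite").partitionBy(partition_col).parquet(output_path)


# Usage, assuming an existing SparkSession and DataFrame:
# write_partitioned_keeping_key(kpi_df, "/kpi-data/raw")
```

This is the same workaround as duplicating the partition column by hand, just done at write time.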
j
If the Parquet partitioning key was the one expected by Pinot (dateString), it would have worked, right? (Pulling dateString values from the file paths)
x
yes
the error says the job tried to generate the partition key but got a null value
so it failed
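To make the naming mismatch concrete, here is a small sketch that reads the `key=value` directories out of the path quoted earlier. It is plain Python, not Pinot code, and `partition_values` is a hypothetical helper. It only shows that the path carries `date`, not `dateString`.

```python
from urllib.parse import unquote, urlparse


def partition_values(path):
    """Return {key: value} for each `key=value` directory in a path."""
    values = {}
    # Drop the scheme (e.g. "file:") and the last component (the file name).
    directories = urlparse(path).path.split("/")[:-1]
    for part in directories:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = unquote(value)
    return values


path = "file:/kpi-data/raw/date=2020-11-30/ab331a05255849bf811a173a380aaf1d.parquet"
found = partition_values(path)
print(found)                  # {'date': '2020-11-30'}
print("dateString" in found)  # False
```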
j
Makes sense, thanks @Xiang Fu :)
On an unrelated note, I've opened an issue on Python pinot-db driver, let me know what you think when you've got the time ;)
x
sounds good!