Hello slightly smiling face When ingesting Batch data + data Apache Pinot #troubleshooting

Jonathan Meyer

05/26/2021, 8:40 AM

Hello 🙂 When ingesting batch data from Parquet files partitioned on a key, that key is "missing" from the Parquet file parts (makes sense). However, from what I've seen, Pinot then cannot find that key and fails to generate the segments. My current workaround is to duplicate the partition column. Is that a known issue, or is it possible to adjust settings?

Xiang Fu

05/26/2021, 9:56 AM

Do you have a stacktrace for the job? The key should be a column in your data, even on the batch side.

Jonathan Meyer

05/26/2021, 1:03 PM

The schema contains several columns, including dateString, which the data is partitioned on. This creates Parquet partitions without this key.

Jonathan Meyer

05/26/2021, 2:10 PM

Actually, now that I look at it again, I'm seeing

file:/kpi-data/raw/date=2020-11-30/ab331a05255849bf811a173a380aaf1d.parquet

Not dateString=XXX

Curious but I'll check that
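
For reference, a minimal sketch of how a Hive-style partitioned path like the one above encodes its partition key in the directory name. This is not Pinot's code; the helper name hive_partition_values is made up for illustration. It only shows that this path carries a key called date, so nothing named dateString can be recovered from it:

```python
from urllib.parse import unquote, urlparse


def hive_partition_values(path: str) -> dict:
    """Return the key=value pairs encoded in the directory part of a path.

    Hypothetical helper for illustration only, not part of Pinot or Spark.
    """
    directories = urlparse(path).path.split("/")[:-1]  # drop the file name
    values = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments shaped like key=value
            values[key] = unquote(value)
    return values


path = "file:/kpi-data/raw/date=2020-11-30/ab331a05255849bf811a173a380aaf1d.parquet"
partitions = hive_partition_values(path)
print(partitions)                    # {'date': '2020-11-30'}
print(partitions.get("dateString"))  # None
```

The directory is named date=..., so a reader looking for dateString finds no value there, which is consistent with the null partition key discussed below.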

Xiang Fu

05/26/2021, 6:24 PM

oic, because the default null string caused the parsing failure

Xiang Fu

05/26/2021, 6:25 PM

this date has to be one column in your parquet file

Xiang Fu

05/26/2021, 6:25 PM

if you generated this parquet file from Spark, you can add the partition key as a column as well

Jonathan Meyer

05/26/2021, 6:26 PM

If the parquet partitioning key was the one expected by Pinot (dateString), it would have worked, right? (Pulling dateString values from the file paths)

Xiang Fu

05/26/2021, 6:28 PM

yes

Xiang Fu

05/26/2021, 6:28 PM

the error says the job tried to generate the partition key but got a null value

Xiang Fu

05/26/2021, 6:28 PM

so it failed

Jonathan Meyer

05/26/2021, 6:29 PM

Makes sense, thanks @Xiang Fu :)

Jonathan Meyer

05/26/2021, 6:29 PM

On an unrelated note, I've opened an issue on Python pinot-db driver, let me know what you think when you've got the time ;)

Xiang Fu

05/26/2021, 6:35 PM

sounds good!
