Hello! does pinot support importing gzipped data? ...
# general
d
Hello! Does Pinot support importing gzipped data? We have gzipped JSON files in a GCS bucket. Can those be imported directly into Pinot, or do we have to serve uncompressed files in GCS?
m
At present, segment generation takes uncompressed data. Accepting gzipped input would be a good enhancement. Could you please open an enhancement issue?
It should be very straightforward to add.
d
Thanks @Mayank, I'll create an enhancement issue.
m
@Dovydas Sabonis Could you try this change: https://github.com/apache/incubator-pinot/pull/6321
k
I currently import .gz files (OFFLINE) and it seems to work fine for me…is that what you were asking about?
d
@Ken Krugler, yes, that's what I meant. @Mayank, many thanks for such a quick turnaround! I'll try to test these changes next week.
m
@Ken Krugler are your gzipped input files JSON, or some other format? I do see that the JSON record reader did not support gzipped files, hence the PR. If gzipped JSON is working for you, perhaps the unzip is happening somewhere at the flow level. Can you share how you are importing?
k
Hi @Mayank - I’m importing CSV, so that explains why it worked for me. Side note - in other frameworks I’ve used there’s a generalized concept of an input stream, so decompression is implicitly handled for all input formats.
m
@Ken Krugler yes, that’s how it is implemented in Pinot as well. However, the JSON record reader wasn’t really calling the right API.
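(Editor's sketch of the "generalized input stream" idea discussed above, in Java since Pinot is a Java project. This is not Pinot's actual code: the class `GzipAwareInput` and its methods are hypothetical names chosen for illustration. It assumes compression is detected from the two gzip magic bytes, so any record reader that opens its file through the helper handles both plain and gzipped input.)

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not a Pinot class): opens a file and transparently
// decompresses it when it starts with the gzip magic bytes 0x1f 0x8b.
final class GzipAwareInput {
    private GzipAwareInput() {}

    static InputStream open(Path file) throws IOException {
        BufferedInputStream in = new BufferedInputStream(Files.newInputStream(file));
        // Peek at the first two bytes, then rewind so the caller sees the whole file.
        in.mark(2);
        int first = in.read();
        int second = in.read();
        in.reset();
        boolean gzipped = first == 0x1f && second == 0x8b;
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Convenience wrapper for line-oriented formats such as CSV or
    // newline-delimited JSON; assumes UTF-8 text.
    static BufferedReader openReader(Path file) throws IOException {
        return new BufferedReader(new InputStreamReader(open(file), StandardCharsets.UTF_8));
    }
}
```

(A reader that bypasses a shared helper like this and opens the file directly would see raw compressed bytes, which is consistent with gzipped CSV working while gzipped JSON did not.)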
k
Thanks for the explanation!
d
@Mayank I’ve tested compressed JSON import and just wanted to let you know that it worked nicely. Impressed by how quickly you got that implemented!
m
Thanks @Dovydas Sabonis for confirming 👍