# pinot-dev
a
Hello, is there an upper limit on how big a bitmap inverted index file can be? It seems that it can't exceed 2GB for now due to integer offsets. I am wondering whether this is accidental or intentional, and whether we could consider larger index files. One potential fix may be to change BitmapInvertedIndexWriter._bytesWritten to a long?
Error: java.lang.RuntimeException: java.lang.IllegalArgumentException: Negative position
    at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:310)
    at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:66)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863)
    at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.mapBitmapBuffer(BitmapInvertedIndexWriter.java:102)
    at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.resizeIfNecessary(BitmapInvertedIndexWriter.java:95)
    at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.add(BitmapInvertedIndexWriter.java:73)
    at org.apache.pinot.segment.local.segment.creator.impl.inv.json.OnHeapJsonIndexCreator.seal(OnHeapJsonIndexCreator.java:57)
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:560)
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:266)
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:238)
    at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:277)
    ... 9 more
    Suppressed: java.lang.IllegalArgumentException: Negative size
        at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:324)
        at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.close(BitmapInvertedIndexWriter.java:118)
        at org.apache.pinot.segment.local.segment.creator.impl.inv.json.OnHeapJsonIndexCreator.seal(OnHeapJsonIndexCreator.java:50)
        ... 13 more
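For illustration, here is a minimal sketch of how an int byte counter could wrap to a negative value once it passes 2GB, which would be consistent with the "Negative position" and "Negative size" messages in the trace. The class and method names are hypothetical and this is not the actual BitmapInvertedIndexWriter code; it only shows the arithmetic.

```java
public class IntOffsetOverflowDemo {
    // Hypothetical stand-in for an int byte counter: the addition silently
    // wraps around once the sum exceeds Integer.MAX_VALUE.
    public static int advance(int bytesWritten, int chunkSize) {
        return bytesWritten + chunkSize;
    }

    // The same bookkeeping with a long counter does not wrap at 2GB.
    public static long advanceLong(long bytesWritten, int chunkSize) {
        return bytesWritten + chunkSize;
    }

    public static void main(String[] args) {
        int nearLimit = Integer.MAX_VALUE - 10;
        // The int counter becomes negative after crossing the 2GB boundary.
        System.out.println("int counter:  " + advance(nearLimit, 100));
        // The long counter keeps the true byte count.
        System.out.println("long counter: " + advanceLong(nearLimit, 100));
    }
}
```

A negative value like this is rejected by FileChannel.map and FileChannel.truncate with an IllegalArgumentException.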
r
The file format has 32-bit offsets, so the suggested fix wouldn't work.
To double the limit to 4GB in a backward-compatible way, the offsets could be treated as uint32, which would require reading the 32 bits into a long and masking with 0xFFFFFFFFL.
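A rough sketch of the uint32 reading described above, assuming the offset is stored as 4 bytes in a ByteBuffer. The class name, helper names, and the big-endian byte order are assumptions for illustration and are not taken from the Pinot reader code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class UnsignedOffsetDemo {
    // Reads 4 bytes at the given index and interprets them as an unsigned
    // 32-bit value by widening to long and masking off the sign extension.
    public static long readUnsignedOffset(ByteBuffer buffer, int index) {
        return buffer.getInt(index) & 0xFFFFFFFFL;
    }

    // Hypothetical helper: stores an offset below 4GB in 4 bytes and reads it back.
    public static long roundTrip(long offset) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.BIG_ENDIAN);
        buffer.putInt(0, (int) offset); // the narrowing cast keeps the low 32 bits
        return readUnsignedOffset(buffer, 0);
    }

    public static void main(String[] args) {
        long threeGiB = 3L << 30;
        // Read as a signed int, the stored value would be negative.
        System.out.println("signed view:   " + (int) threeGiB);
        // Read as uint32, the original offset is recovered.
        System.out.println("unsigned view: " + roundTrip(threeGiB));
    }
}
```

Integer.toUnsignedLong(buffer.getInt(index)) is an equivalent standard-library way to do the same widening. Existing files with offsets below 2GB read identically under either interpretation, which is what would keep the change backward compatible.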
a
I see, thanks. I just wanted to check for now; otherwise I don't have a strong reason to do this right now.
r
more than 2GB is a very large bitmap index
a
True, it was more of a test that I was running than a production use case.
r
but this is a limitation which has existed since bitmap indexes were added to pinot