Regarding the <IdSet filtering> feature, is there ...
# general
b
Regarding the IdSet filtering feature, is there a way for an external program to generate the base64 encoded IdSet object?
r
It depends on the type
for ints it is a RoaringBitmap, so you can just do
Copy code
RoaringBitmap bitmap = RoaringBitmap.bitmapOf(...);
ByteBuffer buffer = ByteBuffer.allocate(bitmap.serializedSizeInBytes());
bitmap.serialize(buffer);
buffer.flip();
ByteBuffer idSet = Base64.getEncoder().encode(buffer);
I'm not sure if you can specify the base64 encoded idset in the query or not, I don't think so
I guess you want to do that so it's smaller than an in clause?
b
I guess you want to do that so it’s smaller than an in clause?
Not necessarily to make the clause smaller in size, but more to make the comparison faster.
The type I’m looking into is STRING, but I’m curious in general. If I can generate the base64 string, seems like one can pass it to the query in a literal way as in the example (from wiki page):
Copy code
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_ID_SET(
 yearID,   
 'ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc='
  ) = 1 
GROUP BY yearID
r
yes, you're right
it's implemented as a guava bloom filter for strings so I guess you could copy the code from
BloomFilterIdSet
take a look at the classes
IdSets
as well - you need a type byte=3 and funnel type byte=2 and then you can just use guava
b
Yes thanks for the pointers, let me take a look at the implementation details to see how it is generated. Another related question: would it be possible to use the ID_SET function with a literal id set. i.e. I have a list of 100 ids I know I’d like to filter by. Can I generate an ID_SET for them to be used subsequently in a filter operation? I mean, can I pass this literal list to Pinot to generate the ID_SET, rather than passing a subquery or column.
r
sorry I missed this last night, you can do it with a subquery
e.g. this query on airlinestats counts all the origin airports which are also destinations
select count(*) from airlineStats where insubquery(OriginAirportID, 'select idset(DestAirportID) from airlineStats') = 1