I’m getting slow regexp_like performance, for 0.3 ...
# general
t
I’m getting slow regexp_like performance: for 0.3 billion rows, it takes nearly 2 seconds to match a prefix on a column, but in Druid the same data using the like operator returned instantly. Are there any configs I can apply to speed up this kind of query?
s
Have you tried a text index?
t
A text index will require a raw value column, but I previously got an array index out of bounds exception when using a raw value index.
s
Text index is supported on both raw and dictionary columns
What error do you see when creating/using the text index?
t
I only looked at the documentation, which says only raw value columns are supported. Is this feature from the last release?
s
Sorry, that's my bad. Text index on dictionary columns has been supported for quite some time (I think for 1 or 2 releases now). I will update the documentation.
Can you share how you are setting up the text index in the table config?
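For reference, a minimal sketch of what a text index setup could look like, built as a Python dict and printed as JSON. The column name `container_name`, the table name `myTable`, and the search term `web` are hypothetical; the `fieldConfigList` / `noDictionaryColumns` layout follows the Pinot text index documentation as I understand it, so please check it against the docs for your release. Note that `TEXT_MATCH` with `web*` matches tokens starting with `web`, which may not be identical to a whole-value prefix match with regexp_like.

```python
import json

# Hypothetical column and table names, for illustration only.
COLUMN = "container_name"
TABLE = "myTable"


def text_index_field_config(column, raw=True):
    """Build one fieldConfigList entry that enables a text index on `column`."""
    return {
        "name": column,
        # RAW skips dictionary creation; DICTIONARY keeps the dictionary.
        "encodingType": "RAW" if raw else "DICTIONARY",
        "indexType": "TEXT",
    }


# Fragment of a table config: only the parts relevant to the text index.
table_config_fragment = {
    "tableIndexConfig": {
        # Assumption: a raw-encoded column is also listed here.
        "noDictionaryColumns": [COLUMN],
    },
    "fieldConfigList": [text_index_field_config(COLUMN, raw=True)],
}

# Token-prefix query against the text index (Lucene-style wildcard).
query = f"SELECT COUNT(*) FROM {TABLE} WHERE TEXT_MATCH({COLUMN}, 'web*')"

print(json.dumps(table_config_fragment, indent=2))
print(query)
```

For a dictionary-encoded column, the same sketch would pass `raw=False` and leave the column out of `noDictionaryColumns`.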
t
I will try the text index in a test table and see if there are any errors.
s
Sure. Also, it should work for raw columns as well. Please share the call stack for the out of bounds error. The only error we have seen in the past with raw columns is an integer overflow, which was fixed with the new segment format that supports larger string column values.
t
Will text index have a memory overhead?
s
It should not. We are running it on raw data where each string value can be up to 2 million characters long. However, for such cases, disabling the dictionary is preferable, since dictionary creation will increase heap usage and GC pressure. The text index itself should not introduce any significant memory overhead.
t
Looks like the text index is not using consuming segment data? Is the text index only built when generating a segment?
s
It uses consuming segments as well. Let me know and we can jump on a call to see what's going on.
t
Turns out we have an older version which doesn’t have the lag issue fix in the text index. I will try upgrading Pinot first.
s
Yes please. The fix was merged 4-5 weeks ago and isn't part of the 0.6.0 release you are using.
k
I spoke to @User. Looks like he deleted all indexes, so he was looking at Pinot numbers without indexes.
He added an index on container name and it's down to sub-second now.