https://pinot.apache.org/ logo
c

Chundong Wang

02/26/2021, 11:16 PM
Ran into
IllegalStateException
when using string functions in where clause. 😢
Copy code
select
currency_code,
from orders 
where SUBSTR(currency_code, 0, 2) <> 'US'
limit 10
Exception:
Copy code
"message": "QueryExecutionError:\njava.lang.IllegalStateException: Caught exception while invoking method: public static java.lang.String org.apache.pinot.common.function.scalar.StringFunctions.substr(java.lang.String,int,int) with arguments: [, 0, 2]\n\tat org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:148)\n\tat org.apache.pinot.core.operator.transform.function.ScalarTransformFunctionWrapper.transformToStringValuesSV(ScalarTransformFunctionWrapper.java:209)\n\tat org.apache.pinot.core.operator.dociditerators.ExpressionScanDocIdIterator.processProjectionBlock(ExpressionScanDocIdIterator.java:167)\n\tat org.apache.pinot.core.operator.dociditerators.ExpressionScanDocIdIterator.next(ExpressionScanDocIdIterator.java:81)
j

Jackie

02/26/2021, 11:19 PM
Seems the problem is caused by calling
substring
on an empty string
c

Chundong Wang

02/26/2021, 11:20 PM
It works if
substr
is in the select statement
Copy code
select
SUBSTR(currency_code, 0, 2)
from order
--where SUBSTR(currency_code, 0, 2) <> 'US'
limit 10
Is there way to tell if it’s related to some of the segments?
j

Jackie

02/26/2021, 11:21 PM
Can you try adding a filter
where currency_code = ''
?
c

Chundong Wang

02/26/2021, 11:21 PM
yep! that’d go wrong too
j

Jackie

02/26/2021, 11:22 PM
Javadoc for
String.substring()
Copy code
Returns a string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
Examples:
       "hamburger".substring(4, 8) returns "urge"
       "smiles".substring(1, 5) returns "mile"
       
Params:
beginIndex – the beginning index, inclusive.
endIndex – the ending index, exclusive.
Returns:
the specified substring.
Throws:
IndexOutOfBoundsException – if the beginIndex is negative, or endIndex is larger than the length of this String object, or beginIndex is larger than endIndex.
c

Chundong Wang

02/26/2021, 11:22 PM
hmm… but it seems I can’t exclude empty strings…
Copy code
where currency_code<>''
and SUBSTR(currency_code, 0, 2) <> 'US'
ah i see
SUBSTR(RPAD(currency_code, 5, ' '), 0, 2) <> 'US'
would work 😅
j

Jackie

02/26/2021, 11:24 PM
You can probably try
substr(currency_code, 0, min(2, length(currency_code)))
🙌 1
Yeah, that works too
c

Chundong Wang

02/26/2021, 11:24 PM
that works too!
probably min/length is better than RPAD
Would such string be intern’ed?
j

Jackie

02/26/2021, 11:26 PM
What does
intern'ed
mean here?
c

Chundong Wang

02/26/2021, 11:26 PM
Single copy of immutable string that kind of intern (wiki)?
I’m wonder if
RPAD
would create more string in memory than arithmetic operations like min/strlen
j

Jackie

02/26/2021, 11:28 PM
Could be. We don't use
intern
explicitly
🙏 1
You can try both and see which one runs faster
c

Chundong Wang

02/26/2021, 11:29 PM
Got it. Thanks for the help!