https://pinot.apache.org/ logo
Join Slack
Powered by
# fix-numerical-predicate
  • a

    Amrish Lal

    04/13/2021, 11:19 PM
    FYI: this is the change that I am looking at for keeping track of precomputed predicates on the Broker side (
    QueryOptimizer.optimize
    function for
    PinotQuery
    ). Basically, if we can precompute the value of an
    Expression
    or
    Function
    , then
    precomputed
    will contain that value. This will help in adding type support as discussed and also further predicate pruning and optimization in future. Let me know if there are suggestions or alternate ideas for this?
    Copy code
    alal@alal-mn1 amrish-pinot-1 % git diff pinot-common/src/thrift/query.thrift
    diff --git a/pinot-common/src/thrift/query.thrift b/pinot-common/src/thrift/query.thrift
    index 98329f5ac..5445ce645 100644
    --- a/pinot-common/src/thrift/query.thrift
    +++ b/pinot-common/src/thrift/query.thrift
    @@ -47,6 +47,7 @@ struct Expression {
      2: optional Function functionCall;
      3: optional Literal literal;
      4: optional Identifier identifier;
    + 5: optional Literal precomputed;
     }
     
     struct Identifier {
    @@ -67,4 +68,5 @@ union Literal {
     struct Function {
      1: required string operator;
      2: optional list<Expression> operands;
    + 3: optional Literal precomputed;
     }
  • j

    Jackie

    04/13/2021, 11:29 PM
    No, we should not have this
    precomputed
    field
  • j

    Jackie

    04/13/2021, 11:30 PM
    If the result can be pre-computed, we should either remove the predicate or short-circuit the query to directly return
  • a

    Amrish Lal

    04/13/2021, 11:48 PM
    ok, I was planning to do a two-pass expression tree traversal to do mark and remove, but let me see if it can be done in a single pass.
  • a

    Amrish Lal

    04/13/2021, 11:54 PM
    For short-circuit return, we would need to know whether the predicate is evaluating to false, so we would need to keep that information somewhere right?
  • a

    Amrish Lal

    04/13/2021, 11:58 PM
    So for example if the query is
    SELECT * from mytable WHERE intColumn = 5.5
    , the predicate here will always evaluate to false and hence the query will become
    SELECT * from mytable
    .
  • a

    Amrish Lal

    04/14/2021, 12:02 AM
    If we don't want to store precomputed values, then one option may be to rewrite the query to
    SELECT * from mytable WHERE FALSE
    and then short-circuit the query if predicate is
    FALSE
    ? There could be more complicated cases such as
    SELECT * FROM mytable WHERE FALSE AND (intColumn > 5 OR FALSE)
    . I think this approach will avoid adding precomputed to
    Expression
    and
    Function
    , etc while ensuring that we have the information that we need to short-circuit the query. Sounds ok?
  • a

    Amrish Lal

    04/14/2021, 12:04 AM
    The query
    SELECT * from mytable WHERE FALSE
    isn't a valid query, but its just a temporary form that we are using for optimization so should be ok.
  • j

    Jackie

    04/14/2021, 12:24 AM
    You can remove the predicate only when it evaluates to
    true
    or it evaluates to
    false
    as a child under
    AND
    a
    • 2
    • 3
  • j

    Jackie

    04/14/2021, 12:24 AM
    Short-circuit means we don't even need to send the query to the servers
  • j

    Jackie

    04/14/2021, 12:27 AM
    I'm thinking adding this info (always
    true
    or
    false
    ) into the return value of the
    FilterOptimizer
    to pass it back to the caller
  • j

    Jackie

    04/14/2021, 12:42 AM
    We can take small steps. To unblock the issue of parsing error, we can first only convert the integral value to a format without decimal part (
    1.0
    ->
    1
    )
    a
    • 2
    • 1
  • a

    Amrish Lal

    04/14/2021, 12:46 AM
    Let's go with rewriting predicates to TRUE and FALSE, so that
    SELECT * from mytable WHERE intColumn = 5.5
    , is rewritten to
    SELECT * from mytable WHERE FALSE
    and that will allow us to do everything including circuiting without adding precomputed.
  • a

    Amrish Lal

    04/14/2021, 12:47 AM
    I think this is a good solution and give us a clean way forward. Later on if someone comes with a better solution, then the existing code will be generic and clean enough to allow for easy refactoring.
  • j

    Jackie

    04/14/2021, 12:47 AM
    My question here is why do we need to send such query to the server?
  • a

    Amrish Lal

    04/14/2021, 12:47 AM
    We don't
  • a

    Amrish Lal

    04/14/2021, 12:47 AM
    🙂
  • a

    Amrish Lal

    04/14/2021, 12:47 AM
    if the predicate is known to be false, then we short-circuit as you mentioned earlier.
  • j

    Jackie

    04/14/2021, 12:48 AM
    You mean add a special filter expression as
    FALSE
    ?
  • a

    Amrish Lal

    04/14/2021, 12:48 AM
    Yes sir, for example in case of query
    SELECT * FROM mytable WHERE FALSE
    and if it the query gets rewritten to
    SELECT * FROM mytable WHERE TRUE
    then we drop the WHERE clause.
  • j

    Jackie

    04/14/2021, 12:50 AM
    Yeah, that works
  • j

    Jackie

    04/14/2021, 12:50 AM
    Currently we don't support this syntax, but I think it is valid sql
  • a

    Amrish Lal

    04/14/2021, 12:50 AM
    Yes, I think in some database (mysql if I am not mistaken) its valid.
  • a

    Amrish Lal

    04/14/2021, 12:51 AM
    cool 🙂 I think we are on the same page @User ^^
  • j

    Jackie

    04/14/2021, 12:53 AM
    To summarize, we want to do the followings: 1. Introduce
    TRUE
    and
    FALSE
    as valid predicate 2. Implement the filter optimizer to rewrite values 3. Short circuit the query if the predicate is
    FALSE
  • j

    Jackie

    04/14/2021, 12:54 AM
    I like the idea of adding predicate
    TRUE
    and
    FALSE
    to pass around the information
    👍 1
  • a

    Amrish Lal

    04/14/2021, 12:56 AM
    Yes, but I would qualify point 1 slightly to read
    Introduce TRUE and FALSE as valid predicate in Broker optimizer.
    As a separate ticket item, we could add SQL querying support for queries that contain TRUE/FALSE in WHERE clause.
  • j

    Jackie

    04/14/2021, 12:57 AM
    Sounds good
  • a

    Amrish Lal

    04/14/2021, 12:58 AM
    ok cool 🙂
  • k

    Kavin Kuppusamy

    11/06/2021, 5:55 PM
    @User has left the channel