https://pinot.apache.org/ logo
j

Jonathan Meyer

06/30/2021, 1:58 PM
Hello I'm seeing odd behavior with filtering (WHERE predicate) A column contains LONG values (single valued), and filtering only works for some of the values in that column Here's the data:
Here's a query that works:
m

Mayank

06/30/2021, 1:59 PM
IIRC, there was a UI issue that was causing this. Do you see the same issue when making curl calls to Pinot from shell?
The UI issue was resolved, but may not have made it to 0.7.1
j

Jonathan Meyer

06/30/2021, 2:00 PM
Here's one that doesn't (the id is a copy paste from the data, no doubt its valid)
@Mayank Thanks for the pointer, I'll check the API output
👍 1
Output really is empty from the API
m

Mayank

06/30/2021, 2:05 PM
Hmm, the UI calls the api internally
j

Jonathan Meyer

06/30/2021, 2:05 PM
Yes, and the API really returns no results for some of the ids
m

Mayank

06/30/2021, 2:06 PM
I see, are you saying that both api and UI don't work in the same way?
what's the query?
j

Jonathan Meyer

06/30/2021, 2:06 PM
They do, I'm saying it doesn't look like a UI bug
(FYI I'm on version
0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb
)
what's the query?
NOK:
select * from test_communities where entityId = 47923728534576490
OK:
select * from test_communities where entityId = 47923728366804330
Really odd...
@Mayank If you don't mind, it may be easier if I can do a short demo whenever you've got time
k

Ken Krugler

06/30/2021, 2:09 PM
What happens with
select * from test_communities where entityId = '47923728534576490'
?
j

Jonathan Meyer

06/30/2021, 2:09 PM
@Ken Krugler Same result (as in, no change in behavior for both queries)
m

Mayank

06/30/2021, 2:13 PM
Where did you get the id to query from? If the id was from UI, it might have been incorrect to begin with due to UI issue
j

Jonathan Meyer

06/30/2021, 2:13 PM
More context: • Table contains 17 documents • Table was created (definition, schema) using the API, data was fed via the
ingestFromFile
API endpoint (unit testing)
Where did you get the id to query from? If the id was from UI, it might have been incorrect to begin with due to UInissue
Indeed, I copy pasted from the UI Which works for one ID, but not the others Do you recommend me another way ?
m

Mayank

06/30/2021, 2:14 PM
JavaScript truncates values in wierd ways
Can you get the id using select * from curl command and then try to repro?
👍 1
j

Jonathan Meyer

06/30/2021, 2:20 PM
curl localhost:8099/query/sql -d '{"sql": "select * from test_communities"}' | jq .
Copy code
{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "communityId",
        "entityId",
        "entityType",
        "parentDepartmentId"
      ],
      "columnDataTypes": [
        "STRING",
        "LONG",
        "STRING",
        "STRING"
      ]
    },
    "rows": [
      [
        "604795e710fc330f65e9100e",
        47923728366804330,   <<< TEST ID.1
        "user",
        "null"
      ],
      [
        "604795e710fc330f65e9100e",
        47923728534576490,   <<< TEST ID.2
        "user",
        "null"
      ],
      [
        "604795e710fc330f65e9100c",
        62535436887531140,
        "department",
        "de2ba60543be995f958e1e80"
      ],
      [
        "604795e710fc330f65e9100c",
        62535436887531080,
        "department",
        "de2ba60543be995f958e1e80"
      ],
      [
        "604795e710fc330f65e9100c",
        62535436954639944,
        "department",
        "de2ba60543be995f958e1e80"
      ],
      [
        "604795e710fc330f65e9100c",
        47923728685571440,
        "user",
        "de2ba60543be995f958e1e80"
      ],
      [
        "604795e710fc330f65e9100c",
        47923728803011950,
        "user",
        "de2ba60543be995f958e1e80"
      ],
      [
        "604795e710fc330f65e9100c",
        62535436887531080,
        "department",
        "de2ba60543be995f958e1e46"
      ],
      [
        "604795e710fc330f65e9100c",
        47923728685571440,
        "user",
        "de2ba60543be995f958e1e46"
      ],
      [
        "604795e710fc330f65e9100c",
        62535436954639944,
        "department",
        "de2ba60943be995f958e1e47"
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 1,
  "numSegmentsProcessed": 1,
  "numSegmentsMatched": 1,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 10,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 40,
  "numGroupsLimitReached": false,
  "totalDocs": 17,
  "timeUsedMs": 4,
  "offlineThreadCpuTimeNs": 398035,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0,
  "numRowsResultSet": 10
}
Now take the first returned ID (
entityid
) ->
47923728366804330
Query :
curl localhost:8099/query/sql -d '{"sql": "select * from test_communities WHERE entityId = 47923728366804330"}' | jq .
Copy code
{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "communityId",
        "entityId",
        "entityType",
        "parentDepartmentId"
      ],
      "columnDataTypes": [
        "STRING",
        "LONG",
        "STRING",
        "STRING"
      ]
    },
    "rows": [
      [
        "604795e710fc330f65e9100e",
        47923728366804330,
        "user",
        "null"
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 1,
  "numSegmentsProcessed": 1,
  "numSegmentsMatched": 1,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 1,
  "numEntriesScannedInFilter": 17,
  "numEntriesScannedPostFilter": 4,
  "numGroupsLimitReached": false,
  "totalDocs": 17,
  "timeUsedMs": 4,
  "offlineThreadCpuTimeNs": 342519,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0,
  "numRowsResultSet": 1
}
-> OK Now take the second ID ->
47923728534576490
Query :
curl localhost:8099/query/sql -d '{"sql": "select * from test_communities WHERE entityId = 47923728534576490"}' | jq .
Copy code
{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "communityId",
        "entityId",
        "entityType",
        "parentDepartmentId"
      ],
      "columnDataTypes": [
        "STRING",
        "LONG",
        "STRING",
        "STRING"
      ]
    },
    "rows": []
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 1,
  "numSegmentsProcessed": 1,
  "numSegmentsMatched": 0,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 0,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 17,
  "timeUsedMs": 10,
  "offlineThreadCpuTimeNs": 383400,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0,
  "numRowsResultSet": 0
}
-> NOK
Pretty surprising 😮
Logs don't show much but here they are
m

Mayank

06/30/2021, 2:26 PM
I missed where did you get
47923728366804330
and
47923728534576490
from?
j

Jonathan Meyer

06/30/2021, 2:26 PM
From the query results of the first query, column
entityId
(so, 2sd element from the arrays)
Here ->
Copy code
"rows": [
      [
        "604795e710fc330f65e9100e",
        >>>>> 47923728366804330 <<<<<,
        "user",
        "null"
      ],
I can repro the issue after dropping the table, schema & data and setting things back up I'll try to restart the cluster, start it from scratch, using the latest Docker image => Same behavior 😕
m

Mayank

06/30/2021, 3:03 PM
Can you file an issue with steps to reproduce? I can get to it later today/tomorrow
j

Jonathan Meyer

06/30/2021, 3:04 PM
@Mayank Will do, okay
m

Mayank

06/30/2021, 3:12 PM
Cc @Jackie
j

Jackie

06/30/2021, 4:44 PM
@Jonathan Meyer Where did you get the json response? If it is from the query console "showing json", then the value is already truncated (javascript only preserve 52 precision bits)
Here is the issue: https://github.com/apache/incubator-pinot/issues/5829 It is not fixed yet
✔️ 1
Can you try querying broker directly using curl: https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql
It should give the accurate result
j

Jonathan Meyer

06/30/2021, 4:47 PM
@Jackie That's what I've tried too
j

Jackie

06/30/2021, 4:51 PM
Hmm, how about without
jq
?
j

Jonathan Meyer

06/30/2021, 4:53 PM
It's only for formatting the result Not sure I see then, which query do you want me to try without it ?
j

Jackie

06/30/2021, 4:54 PM
Copy code
jackie@MacBook-Pro ~ % echo '12345543211234554321' | jq .
12345543211234554000
I think it also truncates the value
j

Jonathan Meyer

06/30/2021, 4:54 PM
Ooooh interesting !
Let me try, nice catch
@Jackie @Mayank @Ken Krugler Sorry for the disturbance, @Jackie is right,
jq
was having the same issue as Pinot's UI, altering the LONG values...
So, no issue, case solved 🙂
Thanks again @Jackie
😉 1