#troubleshooting

Anish Nair

05/16/2022, 10:31 AM
Hi Team, regarding the lookup / dimension table and array data type use case: we have created a dimension table with the following schema:
Copy code
{
  "schemaName": "test_dim_tags",
  "dimensionFieldSpecs": [
    {
      "name": "id",
      "dataType": "INT"
    },
    {
      "name": "tag_name",
      "dataType": "STRING",
      "singleValueField": false
    }
  ],
  "primaryKeyColumns": [
    "id"
  ]
}
Now when we use this table in a lookup with the fact table, the query returns no data or throws a NullPointerException. We wanted to use Pinot's array explode functionality along with lookup. Can someone please help us understand?
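For reference, this is roughly the shape of the query we are trying to run (the fact table fact_events and its join key tag_id are placeholder names here):
Copy code
-- look up the multi-value tag_name from the dimension table and group by it,
-- relying on Pinot's implicit array explode of multi-value columns in GROUP BY
SELECT lookup('test_dim_tags', 'tag_name', 'id', tag_id) AS tags,
       COUNT(*) AS cnt
FROM fact_events
GROUP BY lookup('test_dim_tags', 'tag_name', 'id', tag_id)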

Richard Startin

05/16/2022, 10:43 AM
I believe this is a feature gap in lookup
I'll take a look and see if there are barriers to adding it

Anish Nair

05/16/2022, 10:50 AM
Sure, thanks @Richard Startin

Richard Startin

05/16/2022, 12:06 PM
Feature-wise it looks good. Do you have a stack trace for the NPE?

Anish Nair

05/16/2022, 12:19 PM
Copy code
[
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:242)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  }
]

Richard Startin

05/16/2022, 12:26 PM
OK, this is most likely caused by the query being slow.
Are these lookups in unfiltered group-bys?

Anish Nair

05/16/2022, 12:34 PM
We had a few filter conditions, if that's what you are asking.

Richard Startin

05/16/2022, 12:39 PM
can you remove the lookup from the query and post the response metadata (numDocsScanned etc.) please?
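i.e. the same aggregation, but grouping on the raw join key instead of the looked-up column (placeholder names as before):
Copy code
-- lookup removed; group directly on the fact table's join key
SELECT tag_id,
       COUNT(*) AS cnt
FROM fact_events
GROUP BY tag_id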

Anish Nair

05/16/2022, 12:41 PM
Copy code
"exceptions": [],
  "numServersQueried": 12,
  "numServersResponded": 12,
  "numSegmentsQueried": 569,
  "numSegmentsProcessed": 32,
  "numSegmentsMatched": 32,
  "numConsumingSegmentsQueried": 4,
  "numDocsScanned": 37273560,
  "numEntriesScannedInFilter": 88491445,
  "numEntriesScannedPostFilter": 260914920,
  "numGroupsLimitReached": false,
  "totalDocs": 5011102229,
  "timeUsedMs": 595,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "offlineSystemActivitiesCpuTimeNs": 0,
  "realtimeSystemActivitiesCpuTimeNs": 0,
  "offlineResponseSerializationCpuTimeNs": 0,
  "realtimeResponseSerializationCpuTimeNs": 0,
  "offlineTotalCpuTimeNs": 0,
  "realtimeTotalCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1652704731377,
  "numRowsResultSet": 350

Richard Startin

05/16/2022, 12:46 PM
OK, so it's quite a heavy query, and the lookup will make that worse because the approach it employs is not very efficient. That makes a timeout, rather than feature incompleteness, the more likely diagnosis.
All I can say is that lookup isn't powerful enough for anything but the simplest and lightest-weight join use cases, but the multi-stage query engine will solve problems like this one.
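For example, with the multi-stage engine the same thing could be written as a standard SQL join rather than a lookup (again with the placeholder fact table and key names from above):
Copy code
-- a plain equi-join, executed by the multi-stage query engine;
-- note tag_name is a multi-value column, so depending on the version
-- it may need to be unnested before grouping
SELECT d.tag_name,
       COUNT(*) AS cnt
FROM fact_events f
JOIN test_dim_tags d ON f.tag_id = d.id
GROUP BY d.tag_name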

Anish Nair

05/16/2022, 12:55 PM
That's great, looking forward to it.
@Richard Startin one more thing with the dimension table: lookups start to return null after some time, and we have to rerun the ingestion job to fix it. Any known reason?