Hi Team, Regarding lookup/Dimension Table and arr...
# troubleshooting
a
Hi Team, regarding the lookup/dimension table and array data type use case: we have created a dimension table with the following schema:
```json
{
  "schemaName": "test_dim_tags",
  "dimensionFieldSpecs": [
    {
      "name": "id",
      "dataType": "INT"
    },
    {
      "name": "tag_name",
      "dataType": "STRING",
      "singleValueField": false
    }
  ],
  "primaryKeyColumns": [
    "id"
  ]
}
```
Now when we use this table in a lookup with the fact table, the query returns no data or throws a NullPointerException. We wanted to use Pinot's array explode functionality along with lookup. Can someone please help us understand?
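For context, a sketch of the kind of query being attempted might look like the following. The fact table and its columns (`fact_events`, `tag_ids`, `event_time`) are hypothetical placeholders, not names from the thread:

```sql
-- Hypothetical sketch of the failing query shape.
-- tag_ids is assumed to be a multi-value INT column on the fact side;
-- grouping by a multi-value column in Pinot produces one group per array
-- element (the "explode" behaviour), and the lookup is expected to resolve
-- each id to its tag_name in the test_dim_tags dimension table.
SELECT
  LOOKUP('test_dim_tags', 'tag_name', 'id', tag_ids) AS tag,
  COUNT(*) AS cnt
FROM fact_events
WHERE event_time > 1652000000000
GROUP BY LOOKUP('test_dim_tags', 'tag_name', 'id', tag_ids)
ORDER BY cnt DESC
LIMIT 10
```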
r
I believe this is a feature gap in lookup
I'll take a look and see if there are barriers to adding it
a
sure, thanks @Richard Startin
r
feature-wise it looks good, do you have a stack trace for the NPE?
a
```json
[
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:242)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  }
]
```
r
ok, this is most likely caused by the query being slow
are these lookups in unfiltered group bys?
a
we had a few filter conditions, if that's what you are asking about.
r
can you remove the lookup from the query and post the response metadata (numDocsScanned etc.) please?
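For illustration, that would mean running something like the sketch below (same hypothetical names as above) and reading the stats block of the broker response:

```sql
-- The same hypothetical query with the LOOKUP stripped out, to isolate the
-- cost of the base scan and group-by from the cost of the lookup itself.
SELECT
  tag_ids,
  COUNT(*) AS cnt
FROM fact_events
WHERE event_time > 1652000000000
GROUP BY tag_ids
ORDER BY cnt DESC
LIMIT 10
```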
a
```json
{
  "exceptions": [],
  "numServersQueried": 12,
  "numServersResponded": 12,
  "numSegmentsQueried": 569,
  "numSegmentsProcessed": 32,
  "numSegmentsMatched": 32,
  "numConsumingSegmentsQueried": 4,
  "numDocsScanned": 37273560,
  "numEntriesScannedInFilter": 88491445,
  "numEntriesScannedPostFilter": 260914920,
  "numGroupsLimitReached": false,
  "totalDocs": 5011102229,
  "timeUsedMs": 595,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "offlineSystemActivitiesCpuTimeNs": 0,
  "realtimeSystemActivitiesCpuTimeNs": 0,
  "offlineResponseSerializationCpuTimeNs": 0,
  "realtimeResponseSerializationCpuTimeNs": 0,
  "offlineTotalCpuTimeNs": 0,
  "realtimeTotalCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1652704731377,
  "numRowsResultSet": 350
}
```
r
ok, so it's quite a heavy query, and the lookup will make that worse because the approach it uses is not very efficient. That makes a timeout, rather than feature incompleteness, the more likely diagnosis.
all I can say is that lookup isn't powerful enough to handle anything but the simplest and lightest-weight join use cases, but the multi-stage query engine will solve problems like this one
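For reference, the multi-stage engine expresses this as an explicit SQL join rather than the LOOKUP UDF. A sketch with the same hypothetical names as above (how the engine is enabled is deployment- and version-specific):

```sql
-- Sketch only: an explicit join on the multi-stage query engine in place of
-- the LOOKUP UDF. fact_events and tag_id are hypothetical; a multi-value
-- tag_ids column would still need to be exploded before the join.
SELECT
  d.tag_name,
  COUNT(*) AS cnt
FROM fact_events f
JOIN test_dim_tags d
  ON f.tag_id = d.id
GROUP BY d.tag_name
ORDER BY cnt DESC
LIMIT 10
```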
a
that's great, looking forward to it.
@Richard Startin one more thing with the dimension table: lookups start to return null after some time, and we have to rerun the ingestion job to fix this. Any known reason?