Hi Team, Regarding lookup/Dimension Table and arr...
# troubleshooting
a
Hi Team, regarding the lookup/dimension table and array data type use case: we have created a dimension table with the following schema:
```json
{
  "schemaName": "test_dim_tags",
  "dimensionFieldSpecs": [
    {
      "name": "id",
      "dataType": "INT"
    },
    {
      "name": "tag_name",
      "dataType": "STRING",
      "singleValueField": false
    }
  ],
  "primaryKeyColumns": [
    "id"
  ]
}
```
Now when we use this table in a lookup with the fact table, the query returns no data or throws a NullPointerException. We wanted to use Pinot's array explode functionality along with lookup. Can someone please help us understand?
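For context, a sketch of the kind of query being attempted might look like the following. The fact table and its columns (`fact_events`, `tag_ids`, `event_time`) are hypothetical placeholders, not names from the thread:

```sql
-- Hypothetical sketch of the failing query shape.
-- tag_ids is assumed to be a multi-value INT column on the fact side;
-- grouping by a multi-value column in Pinot produces one group per array
-- element (the "explode" behaviour), and the lookup is expected to resolve
-- each id to its tag_name in the test_dim_tags dimension table.
SELECT
  LOOKUP('test_dim_tags', 'tag_name', 'id', tag_ids) AS tag,
  COUNT(*) AS cnt
FROM fact_events
WHERE event_time > 1652000000000
GROUP BY LOOKUP('test_dim_tags', 'tag_name', 'id', tag_ids)
ORDER BY cnt DESC
LIMIT 10
```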
r
I believe this is a feature gap in lookup
I'll take a look and see if there are barriers to adding it
a
sure, thanks @Richard Startin
r
feature-wise it looks good, do you have a stack trace for the NPE?
a
```json
[
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:242)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  }
]
```
r
ok, this is most likely caused by the query being slow
are these lookups in unfiltered group bys?
a
we had a few filter conditions, if that's what you are asking about.
r
can you remove the lookup from the query and post the response metadata (numDocsScanned etc.) please?
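For illustration, that would mean running something like the sketch below (same hypothetical names as above) and reading the stats block of the broker response:

```sql
-- The same hypothetical query with the LOOKUP stripped out, to isolate the
-- cost of the base scan and group-by from the cost of the lookup itself.
SELECT
  tag_ids,
  COUNT(*) AS cnt
FROM fact_events
WHERE event_time > 1652000000000
GROUP BY tag_ids
ORDER BY cnt DESC
LIMIT 10
```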
a
```json
{
  "exceptions": [],
  "numServersQueried": 12,
  "numServersResponded": 12,
  "numSegmentsQueried": 569,
  "numSegmentsProcessed": 32,
  "numSegmentsMatched": 32,
  "numConsumingSegmentsQueried": 4,
  "numDocsScanned": 37273560,
  "numEntriesScannedInFilter": 88491445,
  "numEntriesScannedPostFilter": 260914920,
  "numGroupsLimitReached": false,
  "totalDocs": 5011102229,
  "timeUsedMs": 595,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "offlineSystemActivitiesCpuTimeNs": 0,
  "realtimeSystemActivitiesCpuTimeNs": 0,
  "offlineResponseSerializationCpuTimeNs": 0,
  "realtimeResponseSerializationCpuTimeNs": 0,
  "offlineTotalCpuTimeNs": 0,
  "realtimeTotalCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1652704731377,
  "numRowsResultSet": 350
}
```
r
ok, so it's quite a heavy query, and the lookup will make that worse because the approach it uses is not very efficient. That makes a timeout, rather than feature incompleteness, the more likely diagnosis.
all I can say is that lookup isn't powerful enough to handle anything but the simplest and lightest-weight join use cases, but the multi-stage query engine will solve problems like this one
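For reference, the multi-stage engine expresses this as an explicit SQL join rather than the LOOKUP UDF. A sketch with the same hypothetical names as above (how the engine is enabled is deployment- and version-specific):

```sql
-- Sketch only: an explicit join on the multi-stage query engine in place of
-- the LOOKUP UDF. fact_events and tag_id are hypothetical; a multi-value
-- tag_ids column would still need to be exploded before the join.
SELECT
  d.tag_name,
  COUNT(*) AS cnt
FROM fact_events f
JOIN test_dim_tags d
  ON f.tag_id = d.id
GROUP BY d.tag_name
ORDER BY cnt DESC
LIMIT 10
```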
a
that's great, looking forward to it.
@Richard Startin one more thing with the dimension table: lookups start to return null after some time, and we have to rerun the ingestion job to fix this. Any known reason?