Why the datetimeconvert transform function is much slower than time_floor in Druid
# general
t
Why is the datetimeconvert transform function much slower than time_floor in Druid? We are migrating our Druid table to Pinot, but found that the datetime transform and granularity queries are very slow compared to Druid.
k
what is the conversion? simple date format to epoch?
t
Epoch to epoch with 15-minute granularity.
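(As a rough illustration only: a 15-minute epoch-to-epoch bucket in Pinot would look something like the expression below; the __time column name is taken from the queries later in this thread.)

DATETIMECONVERT(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')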
k
that should be fast, what's the query?
t
SELECT DATETIMECONVERT(datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'), '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES'),
       COUNT(*)
FROM product_log
WHERE datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') >= 1620830640000
  AND datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') < 1620917040000
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY DATETIMECONVERT(datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'), '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES')
ORDER BY COUNT(*) DESC
SELECT datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'),
       COUNT(*)
FROM product_log
WHERE datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') >= 1620830760000
  AND datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') < 1620917160000
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES')
ORDER BY COUNT(*) DESC
The cost is similar if I deselect the 1-minute time grain in Superset.
Druid returns the result in under 1 second, but Pinot takes 5-6 seconds.
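(The Druid side isn't shown in the thread; a rough sketch of the equivalent Druid SQL, assuming the datasource is also named product_log and using Druid's TIME_FLOOR and MILLIS_TO_TIMESTAMP functions, would be:)

SELECT TIME_FLOOR(__time, 'PT30M'),
       COUNT(*)
FROM product_log
WHERE __time >= MILLIS_TO_TIMESTAMP(1620830760000)
  AND __time < MILLIS_TO_TIMESTAMP(1620917160000)
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY 1
ORDER BY COUNT(*) DESC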
k
can you paste the response
I think segment pruning is not happening.
t
{
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "LONG",
        "LONG"
      ],
      "columnNames": [
        "datetimeconvert(__time,'1:MILLISECONDS:EPOCH','1:MILLISECONDS:EPOCH','30:MINUTES')",
        "count(*)"
      ]
    },
    "rows": [
      [
        1620873000000,
        180
      ],
      [
        1620869400000,
        179
      ],
      [
        1620871200000,
        178
      ],
      [
        1620894600000,
        172
      ],
      [
        1620892800000,
        166
      ],
      [
        1620874800000,
        164
      ],
      [
        1620876600000,
        163
      ],
      [
        1620896400000,
        163
      ],
      [
        1620867600000,
        162
      ],
      [
        1620885600000,
        161
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 41,
  "numSegmentsProcessed": 41,
  "numSegmentsMatched": 12,
  "numConsumingSegmentsQueried": 3,
  "numDocsScanned": 7706,
  "numEntriesScannedInFilter": 195554753,
  "numEntriesScannedPostFilter": 7706,
  "numGroupsLimitReached": false,
  "totalDocs": 165272282,
  "timeUsedMs": 2335,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1620917392724
}
k
"numSegmentsProcessed": 41,
    "numSegmentsMatched": 12,
    "numConsumingSegmentsQueried": 3,
    "numDocsScanned": 7706,
    "numEntriesScannedInFilter": 195554753,
    "numEntriesScannedPostFilter": 7706,
this should be processed in milliseconds
@Jackie looks like if the UDF is in the predicate, the pruning is not efficient
@troywinter can you file an issue
t
ok, I will file an issue for this.
k
SELECT datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'),
       COUNT(*)
FROM product_log
WHERE __time >= 1620830760000
  AND __time < 1620917160000
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES')
ORDER BY COUNT(*) DESC
what's the time for this query?
t
{
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "LONG",
        "LONG"
      ],
      "columnNames": [
        "datetimeconvert(__time,'1:MILLISECONDS:EPOCH','1:MILLISECONDS:EPOCH','30:MINUTES')",
        "count(*)"
      ]
    },
    "rows": [
      [
        1620873000000,
        180
      ],
      [
        1620869400000,
        179
      ],
      [
        1620871200000,
        178
      ],
      [
        1620894600000,
        172
      ],
      [
        1620892800000,
        166
      ],
      [
        1620874800000,
        164
      ],
      [
        1620876600000,
        163
      ],
      [
        1620896400000,
        163
      ],
      [
        1620867600000,
        162
      ],
      [
        1620865800000,
        161
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 41,
  "numSegmentsProcessed": 12,
  "numSegmentsMatched": 12,
  "numConsumingSegmentsQueried": 3,
  "numDocsScanned": 7770,
  "numEntriesScannedInFilter": 68503679,
  "numEntriesScannedPostFilter": 7770,
  "numGroupsLimitReached": false,
  "totalDocs": 165381107,
  "timeUsedMs": 647,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1620917833431
}
This one only takes 647ms to finish.