Why the datetimeconvert transform function is much slower than time_floor in Druid
# general
t
Why is the datetimeconvert transform function much slower than time_floor in Druid? We are migrating our Druid table to Pinot, but found that the datetime transform and granularity queries are very slow compared to Druid.
k
what is the conversion? simple date format to epoch?
t
Epoch to epoch with 15-minute granularity.
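(As a rough illustration only: a 15-minute epoch-to-epoch bucket in Pinot would look something like the expression below; the __time column name is taken from the queries later in this thread.)

DATETIMECONVERT(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')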
k
that should be fast, what's the query?
t
SELECT DATETIMECONVERT(datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'), '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES'),
       COUNT(*)
FROM product_log
WHERE datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') >= 1620830640000
  AND datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') < 1620917040000
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY DATETIMECONVERT(datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'), '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES')
ORDER BY COUNT(*) DESC
SELECT datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'),
       COUNT(*)
FROM product_log
WHERE datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') >= 1620830760000
  AND datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') < 1620917160000
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES')
ORDER BY COUNT(*) DESC
The cost is similar if I deselect the 1-minute time grain in Superset.
Druid returns the result in under 1 second, but Pinot takes 5-6 seconds.
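(The Druid side isn't shown in the thread; a rough sketch of the equivalent Druid SQL, assuming the datasource is also named product_log and using Druid's TIME_FLOOR and MILLIS_TO_TIMESTAMP functions, would be:)

SELECT TIME_FLOOR(__time, 'PT30M'),
       COUNT(*)
FROM product_log
WHERE __time >= MILLIS_TO_TIMESTAMP(1620830760000)
  AND __time < MILLIS_TO_TIMESTAMP(1620917160000)
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY 1
ORDER BY COUNT(*) DESC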
k
can you paste the response
I think segment pruning is not happening.
t
{
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "LONG",
        "LONG"
      ],
      "columnNames": [
        "datetimeconvert(__time,'1:MILLISECONDS:EPOCH','1:MILLISECONDS:EPOCH','30:MINUTES')",
        "count(*)"
      ]
    },
    "rows": [
      [
        1620873000000,
        180
      ],
      [
        1620869400000,
        179
      ],
      [
        1620871200000,
        178
      ],
      [
        1620894600000,
        172
      ],
      [
        1620892800000,
        166
      ],
      [
        1620874800000,
        164
      ],
      [
        1620876600000,
        163
      ],
      [
        1620896400000,
        163
      ],
      [
        1620867600000,
        162
      ],
      [
        1620885600000,
        161
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 41,
  "numSegmentsProcessed": 41,
  "numSegmentsMatched": 12,
  "numConsumingSegmentsQueried": 3,
  "numDocsScanned": 7706,
  "numEntriesScannedInFilter": 195554753,
  "numEntriesScannedPostFilter": 7706,
  "numGroupsLimitReached": false,
  "totalDocs": 165272282,
  "timeUsedMs": 2335,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1620917392724
}
k
"numSegmentsProcessed": 41,
    "numSegmentsMatched": 12,
    "numConsumingSegmentsQueried": 3,
    "numDocsScanned": 7706,
    "numEntriesScannedInFilter": 195554753,
    "numEntriesScannedPostFilter": 7706,
this should be processed in milliseconds
@Jackie looks like if the UDF is in the predicate, the pruning is not efficient
@troywinter can you file an issue
t
ok, I will file an issue for this.
k
SELECT datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'),
       COUNT(*)
FROM product_log
WHERE __time >= 1620830760000
  AND __time < 1620917160000
  AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
  AND container_name = 'whale-device'
  AND error > '0'
GROUP BY datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES')
ORDER BY COUNT(*) DESC
what's the time for this query?
t
{
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "LONG",
        "LONG"
      ],
      "columnNames": [
        "datetimeconvert(__time,'1:MILLISECONDS:EPOCH','1:MILLISECONDS:EPOCH','30:MINUTES')",
        "count(*)"
      ]
    },
    "rows": [
      [
        1620873000000,
        180
      ],
      [
        1620869400000,
        179
      ],
      [
        1620871200000,
        178
      ],
      [
        1620894600000,
        172
      ],
      [
        1620892800000,
        166
      ],
      [
        1620874800000,
        164
      ],
      [
        1620876600000,
        163
      ],
      [
        1620896400000,
        163
      ],
      [
        1620867600000,
        162
      ],
      [
        1620865800000,
        161
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 41,
  "numSegmentsProcessed": 12,
  "numSegmentsMatched": 12,
  "numConsumingSegmentsQueried": 3,
  "numDocsScanned": 7770,
  "numEntriesScannedInFilter": 68503679,
  "numEntriesScannedPostFilter": 7770,
  "numGroupsLimitReached": false,
  "totalDocs": 165381107,
  "timeUsedMs": 647,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1620917833431
}
This one only takes 647ms to finish.