Machhindra
05/26/2021, 3:41 AMMetric.avsc
===========
{
"namespace": "com.blah",
"name": "Metric",
"type": "record",
"fields": [{
"name": "product",
"type": ["string", "null"]
},{
"name": "productVersion",
"type": ["string", "null"]
},{
"name": "MetricSource",
"type": ["com.blah.MetricSource", "null"]
},{
"name": "metricPath",
"type":{
"type": "array",
"items": ["string", "null"]
}
}]
}
MetricSource.avsc
===========
{
"namespace": "com.blah",
"name": "MetricSource",
"type": "record",
"fields": [{
"name": "metric",
"type": ["string", "null"]
},{
"name": "metricValue",
"type": ["string", "null"]
},{
"name": "time",
"type": "long"
},{
"name": "timeOffset",
"type": "double"
},{
"name": "category",
"order": "ignore",
"type": ["null", "string"],
"default": null
},{
"name": "subCategory",
"order": "ignore",
"type": ["null", "string"],
"default": null
}
]
}
Neha Pawar
transformFunction: jsonPathString(MetricSource, '$.time', <default value>)
Machhindra
05/26/2021, 4:12 AMRK
05/26/2021, 5:06 AMMachhindra
05/26/2021, 6:02 PM{
"name": "metric",
"dataType": "STRING",
"transformFunction": "jsonPathString(MetricSource, '$.metricClass', 'null')"
}
Neha Pawar
Machhindra
05/26/2021, 6:06 PMNeha Pawar
Machhindra
05/26/2021, 6:07 PMmetric metricValue entityMap metricTime
======================================================================================
CPUTIME 2210 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622076960000
WAITTIME 12 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622076960000
MQBPUSE% 50 {"MqBPool":"2","MqQmgr":"MQA3"} 1622076960000
MQBPUSE% 110 {"MqBPool":"3","MqQmgr":"MQA3"} 1622076960000
WLMPI 0.5 {"ZosWmSysSum":"jobs"} 1622076960000
MQQFULL% 10 {"MqQueue":"myqueue","MqQmgr":"CSQ4"} 1622076960000
....
....
CPUTIME 2220 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622083650000
WAITTIME 12 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622083650000
MQBPUSE% 50 {"MqBPool":"2","MqQmgr":"MQA3"} 1622083650000
MQBPUSE% 110 {"MqBPool":"3","MqQmgr":"MQA3"} 1622083650000
WLMPI 0.5 {"ZosWmSysSum":"jobs"} 1622083650000
MQQFULL% 10 {"MqQueue":"myqueue","MqQmgr":"CSQ4"} 1622083650000
....
This is operational metric data. Just like Prometheus. 300K/Minute. There are different metrics, being collected at regular intervals. The metrics are measured for given entity in the entityMap. e.g CPUTIME metric is collected for a particular plan, package and Db2 server in the first row.
Here are some of the queries, I wish to fire on this table.
# Get me timeseries for the given metric, plan, package and server
select * from metricTable where metric='CPUTIME' and Plan='myplan' and Pkg='mypkg' and Server='myDb2' and metrictime < 1622076960000
# Get me two timeseries for the given two metric, plan, package and server
select * from metricTable where metric IN ['CPUTIME', 'WAITTIME'] and Plan='myplan' and Pkg='mypkg' and Server='myDb2' and metricTime < 1622076960000
# Get me all timeseries for the given plan irrespective of package or server
select * from metricTable where metric='CPUTIME' and Plan='myplan' and metricTime < 1622076960000
# Get me top 10 highest CPUTIME package for a given Server
# Get me average CPUTIME for packages for a given Server with time window of 10 minutes
# Get me list of Plans for a given Server
# Get total CPUTIME used for a given Server
How should I flatten the “entityMap” column so that I can satisfy my queries? Not sure if multi-valued column would be helpful.RK
05/27/2021, 5:38 AMNeha Pawar
Kishore G
Neha Pawar
Machhindra
05/27/2021, 4:21 PMSELECT ... FROM metricTable WHERE metric='CPUTIME' and JSON_MATCH(entityMap, '"$.Plan"=''myplan''')
Instead of
select * from metricTable where metric='CPUTIME' and Plan='myplan'
Neha Pawar
Machhindra
05/27/2021, 5:41 PMJackie
05/27/2021, 6:19 PMJSON
data type and simplify the query from SELECT ... FROM metricTable WHERE metric='CPUTIME' and JSON_MATCH(entityMap, '"$.Plan"=''myplan''')
to SELECT ... FROM metricTable WHERE metric='CPUTIME' and entityMap.Plan='myplan')
JSON_MATCH
within the superset?Xiang Fu
Machhindra
06/03/2021, 2:33 PM"jsonIndexColumns": [
"entityMap"
],
"autoGeneratedInvertedIndex": false,
"createInvertedIndexDuringSegmentGeneration": false,
"sortedColumn": [
"metric"
],
Here is how my table looks -
select * from metric_v6 where entityMap!='null' limit 10
category entityMap groupId metric
Type74Subtype1 {"Volser":"HFSMND"} 1aeadda1-86e1-4ee4-aa4e-d067a381db0a DCTAVG_VOLSER
Type74Subtype1 {"Volser":"NMD008"} 632afe50-a42c-4c15-b0cd-9c6a4e0e08b1 DRTAVG_VOLSER
Type74Subtype1 {"DeviceNo":"0x6E06"} eadee40b-3c02-49f8-a295-0b059606a3f5 DQTAVG_DEVNUM
Db2Subsys {"Db2Bufp":"BP42","Db2Ssid":"DH3G"} f2a9ccc7-6aa6-4713-8382-f7b96f62c124 IDBBREAD
Notice that the entityMap JSON object differs for each row. Following query fails -
SELECT * FROM metric_v6 WHERE JSON_MATCH(entityMap, '"$.Volser"=''HFSMND''')
[
{
"errorCode": 200,
"message": "QueryExecutionError:\njava.lang.RuntimeException: Caught exception while running CombinePlanNode.\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:157)\n\tat org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33)\n\tat org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:294)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:215)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:157)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:141)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)\n\tat shaded.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)\n\tat shaded.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)"
}
]
After facing these challenges, I decided to create new REALTIME table with the indexes on from the very beginning. The number of rows are 2.8M vs 93M on the previous table. Now I dont receive the ERRORS but no result is shown either.'SELECT * FROM metric_v7 WHERE JSON_MATCH(entityMap, 'Db2Bufp=''BP20''')'
works now. Team Pinot recently changed the syntax of the JSON_MATCH which is not included in 0.7.1
select * from metric_v7 limit 10
[
{
"errorCode": 200,
"message": "QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat java.nio.Buffer.checkIndex(Buffer.java:551)\n\tat java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:684)\n\tat org.apache.pinot.core.segment.memory.PinotByteBuffer.getInt(PinotByteBuffer.java:144)\n\tat org.apache.pinot.core.io.util.FixedByteValueReaderWriter.getInt(FixedByteValueReaderWriter.java:35)\n\tat org.apache.pinot.core.segment.index.readers.sorted.SortedIndexReaderImpl.getDictId(SortedIndexReaderImpl.java:83)\n\tat org.apache.pinot.core.segment.index.readers.sorted.SortedIndexReaderImpl.readDictIds(SortedIndexReaderImpl.java:108)\n\tat org.apache.pinot.core.segment.index.readers.sorted.SortedIndexReaderImpl.readDictIds(SortedIndexReaderImpl.java:33)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:439)\n\tat org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)\n\tat org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:82)"
}
]
Jackie
06/04/2021, 5:23 PMMachhindra
06/04/2021, 5:25 PMJackie
06/08/2021, 4:20 PMMachhindra
06/08/2021, 4:53 PMWe don’t usually put sorted index on metricsCan you please explain why? here is my table. There are about 1500 unique metrics. These metrics are collected at 1 min+ interval. Most of the queries will involve metric in predicate.
metric metricValue entityMap metricTime
======================================================================================
CPUTIME 2210 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622076960000
WAITTIME 12 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622076960000
MQBPUSE% 50 {"MqBPool":"2","MqQmgr":"MQA3"} 1622076960000
MQBPUSE% 110 {"MqBPool":"3","MqQmgr":"MQA3"} 1622076960000
WLMPI 0.5 {"ZosWmSysSum":"jobs"} 1622076960000
MQQFULL% 10 {"MqQueue":"myqueue","MqQmgr":"CSQ4"} 1622076960000
....
....
CPUTIME 2220 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622083650000
WAITTIME 12 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622083650000
MQBPUSE% 50 {"MqBPool":"2","MqQmgr":"MQA3"} 1622083650000
MQBPUSE% 110 {"MqBPool":"3","MqQmgr":"MQA3"} 1622083650000
WLMPI 0.5 {"ZosWmSysSum":"jobs"} 1622083650000
MQQFULL% 10 {"MqQueue":"myqueue","MqQmgr":"CSQ4"} 1622083650000
....
Jackie
06/08/2021, 5:11 PMmetric
is actually a dimension columnMachhindra
06/08/2021, 5:17 PMCPUTIME WAITTIME MQBPUSE% ...'1400 metrics..' Plan Pkg Server MQBPool MqQmgr ZosWmSysSum .. metricTime
========================================================================================================================
2210 myplan mypkg myDb2 1622076960000
12 myplan mypkg myDb2 1622076960000
50 2 MQA3 1622076960000
This table has METRICS, DIMENSIONS and TIMESTAMP. Data is similar to prometheus metrics.Jackie
06/08/2021, 5:50 PMNeha Pawar
jsonPathString
from the initial suggestions for getting the fields out of entityMap? this is a finite set right Plan Pkg Server MQBPool MqQmgr ZosWmSysSum
?Machhindra
06/08/2021, 6:06 PMget me average of CPUTIME for PLAN Blah and PKG Blah for a time range
get me top 10 PLANS which has highest CPUTIME usage
get me list of PKGS where PLAN Blah
I need to expose metrics vs dimensions to user, so that with Superset they can easily create charts.
With Short Table and JSON_PATH and JSON Index
metric metricValue entityMap metricTime
======================================================================================
CPUTIME 2210 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622076960000
WAITTIME 12 {"Plan":"myplan","Pkg":"mypkg", "Server":"myDb2"} 1622076960000
Query is going to be -
SELECT AVG(metricValue) from metricTable where JSON_PATH(entityMap, 'Plan=''myplan''') and metric='CPUTIME';
With Wide Table. No JSON
CPUTIME WAITTIME MQBPUSE% ...'1400 metrics..' Plan Pkg Server MQBPool MqQmgr ZosWmSysSum .. metricTime
========================================================================================================================
2210 myplan mypkg myDb2 1622076960000
12 myplan mypkg myDb2 1622076960000
Query is going to be -
SELECT AVG(CPUTIME) from metricTable where Plan='myplan';
Neha Pawar
Jackie
06/08/2021, 6:10 PMSELECT AVG(metricValue) FROM metricTable WHERE JSON_MATCH(entityMap, 'Plan=''myplan''') and metric='CPUTIME'
with JSON indexMachhindra
06/08/2021, 6:12 PMhow many distinct “metric” do you expect? and how many distinct fields inside “entityMap” ?Distinct Metrics = 1500 Distinct fields inside “entityMap” = 80
Neha Pawar
Machhindra
06/08/2021, 6:24 PMThe query for short table should be:Yes. That’s what I am using now. In Superset, this is how user uses it -with JSON indexSELECT AVG(metricValue) FROM metricTable WHERE JSON_MATCH(entityMap, 'Plan=''myplan''') and metric='CPUTIME'
Xiang Fu
Plan
the expression is JSON_MATCH(entityMap, 'Plan=''myplan''')
Machhindra
06/09/2021, 4:47 AMif that is more natural, then you can extract them all, or maybe the important ones?Following query works. It allows me to filter the dimensions and metric and gets me the average metric value.
SELECT AVG(metricValue) FROM metricTable WHERE JSON_MATCH(entityMap, 'Plan=''myplan''') and metric='CPUTIME'
However above method does not help, if I want to create a new metric (CPUTIME * WAITTIME/10) . What will be the query?
The wide table enables that possibility. e.g -
SELECT CPUTIME * WAITTIME/1 AS newMetric FROM metricTable WHERE Plan='myplan';
OR
SELECT CPUTIME * WAITTIME/1 AS newMetric FROM metricTable WHERE JSON_MATCH(entityMap, 'Plan=''myplan''')
If we keep the dimensions in the JSON format then how would I list all the Plans available?
SELECT JSON_MATCH(entityMap, 'Plan=''myplan''') AS Plans FROM metricTable
Xiang Fu
CPUTIME * WAITTIME
, as they are not in the same record, so you need to align the docs then do join to make them same line then multiplier. You can use presto to achieve thisSELECT distinct(JSON_MATCH(entityMap, 'Plan=''myplan''')) AS Plans FROM metricTable
or SELECT JSON_MATCH(entityMap, 'Plan=''myplan''') AS Plans, count(*) as cnt FROM metricTable GROUP BY JSON_MATCH(entityMap, 'Plan=''myplan''') AS Plans
Machhindra
06/10/2021, 9:19 PMXiang Fu