Grace Lu
02/07/2022, 8:33 PM
usage_test
which shows a good state in Pinot and is queryable through the Pinot console, but when I run a simple select query in Trino, like
select * from pinot_1.default.usage_test limit 10
the query fails with an error like
null value in entry: Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098=null
but when I run an aggregation query like
select count(*), hostname from pinot_1.default.usage_test group by hostname
the query succeeds in Trino without issue.
Does anyone have a clue about this issue? Thanks in advance! 🙏
Sergii Balganbaiev
02/07/2022, 8:36 PM
Luis Fernandez
02/07/2022, 10:22 PM
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant",
"tagOverrideConfig": {
"realtimeCompleted": "DefaultTenant_OFFLINE"
}
}
2. Is there anywhere I can see a log that this is in fact working? I have set up the configs but I'm unsure how to tell it's doing what it's supposed to be doing.
3. The documentation is a little misleading given the recent updates to Pinot, and different examples do different things that are not explained in the documentation.
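For question 2, one way to check that the tag override is taking effect is a sketch against the standard controller REST API (host, port, table, and instance names below are placeholders):
# list all instances registered with the controller
curl -s localhost:9000/instances
# show a single server's tags; completed realtime segments should sit on servers tagged DefaultTenant_OFFLINE
curl -s localhost:9000/instances/Server_pinot-server-0_8098
# map of servers to the segments they host for the table
curl -s localhost:9000/segments/myTable_REALTIME/servers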
Ravishankar Nair
02/08/2022, 4:54 PM
Shadab Anwar
02/09/2022, 10:01 AM
Ayush Kumar Jha
02/09/2022, 11:44 AM
Sandeep R
02/09/2022, 11:04 PM
select count(*) from uapi-testing;
Awadesh Kumar
02/10/2022, 7:11 AM
Anish Nair
02/10/2022, 8:25 AM
select b.entity_name, sum(metric1) metric1
from pinot_hybrid_table a
join dim_test_table b
ON a.entity_id = b.entity_id
WHERE a.time_column = '2022020800'
group by b.entity_name;
Getting the following error on the Presto CLI: Query 20220210_075959_00003_9hwvs failed: Unsupported data table version: 3
Can someone help?
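A likely cause, offered as a pointer rather than a confirmed fix: Pinot 0.8+ servers answer with the DataTable v3 wire format, which older Presto Pinot connectors cannot parse. If upgrading the connector is not an option, the servers can be pinned back to v2 via the server config below, assuming this key is available in your Pinot version:
pinot.server.instance.currentDataTableVersion=2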
Grace Walkuski
02/10/2022, 4:57 PM
-Infinity is returned?
Anand Sainath
02/10/2022, 5:57 PM
Luis Fernandez
02/10/2022, 8:48 PM
smallest and largest options affect pinot. What if data was corrupted for some reason and we needed to reprocess some of it? Would the better route here be to go through an offline table and somehow sync it from another source? Are there any capabilities for realtime tables around this?
James Mnatzaganian
02/10/2022, 9:03 PM
mvn install package -DskipTests -Pbin-dist -Djdk.version=8
After building, I took the contents of pinot-distribution/target/apache-pinot-0.9.3-bin.tar.gz and put them on HDFS (enabling all nodes to have access to the jars). I've been following this doc for general guidance. Running spark-submit results in
Exception in thread "main" java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main
Any advice? Is there a way to validate that my build is valid? My current thought is that it's either a bad build or I need to push the jars to each node and reference them locally instead of through HDFS.
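For reference, the general shape the Pinot docs use for Spark batch ingestion is roughly the sketch below (PINOT_DIR, the master/deploy-mode settings, and the job-spec path are placeholders; the plugins.dir property and the jar-with-dependencies on the driver classpath are the usual suspects when the main class cannot be resolved):
spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DIR}/plugins" \
  --conf "spark.driver.extraClassPath=${PINOT_DIR}/lib/pinot-all-0.9.3-jar-with-dependencies.jar" \
  ${PINOT_DIR}/lib/pinot-all-0.9.3-jar-with-dependencies.jar \
  -jobSpecFile /path/to/sparkIngestionJobSpec.yaml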
troywinter
02/11/2022, 4:20 AM
WITH tmp_table_vmrss_inc_steps AS (
WITH t1 AS (
SELECT device_sn, event_time, ts, vmrss
, row_number() OVER (PARTITION BY device_sn, event_time ORDER BY ts ASC) AS rank
FROM (
SELECT format_datetime(from_unixtime((ts - ts % (60 * 60 * 1000)) / 1000 + 8 * 60 * 60), 'yyyy-MM-dd HH:mm:00') AS event_time
, ts, device_sn, vmrss
FROM pad_streamer_track_info
WHERE ts >= 1644494400000
AND ts < 1644498000000
) t
),
t2 AS (
SELECT device_sn, event_time, ts, vmrss
, rank - 1 AS rank
FROM t1
WHERE rank > 1
)
SELECT device_sn, event_time
, sum(CASE
WHEN diff > 0 THEN 1
ELSE 0
END) AS vmrss_inc_steps
FROM (
SELECT t2.ts, t2.device_sn, t2.event_time, t2.vmrss - t1.vmrss AS diff
FROM t2
LEFT JOIN t1
ON t1.device_sn = t2.device_sn
AND t1.event_time = t2.event_time
AND t2.rank = t1.rank
) a
GROUP BY device_sn, event_time
),
tmp_table_device_report_num AS (
SELECT device_sn, event_time, count(vmrss) AS device_report_num
FROM (
SELECT format_datetime(from_unixtime((ts - ts % (60 * 60 * 1000)) / 1000 + 8 * 60 * 60), 'yyyy-MM-dd HH:mm:00') AS event_time
, ts, device_sn, vmrss
FROM pad_streamer_track_info
WHERE ts >= 1644494400000
AND ts < 1644498000000
) t
GROUP BY device_sn, event_time
),
all_table AS (
SELECT device_sn, event_time
FROM (
SELECT device_sn, event_time
FROM tmp_table_vmrss_inc_steps
UNION ALL
SELECT device_sn, event_time
FROM tmp_table_device_report_num
)
GROUP BY device_sn, event_time
)
SELECT all_table.device_sn, all_table.event_time, coalesce(tmp_table_vmrss_inc_steps.vmrss_inc_steps, 0) AS vmrss_inc_steps
, coalesce(tmp_table_device_report_num.device_report_num, 0) AS device_report_num
FROM all_table
LEFT JOIN tmp_table_vmrss_inc_steps
ON all_table.device_sn = tmp_table_vmrss_inc_steps.device_sn
AND all_table.event_time = tmp_table_vmrss_inc_steps.event_time
LEFT JOIN tmp_table_device_report_num
ON all_table.device_sn = tmp_table_device_report_num.device_sn
AND all_table.event_time = tmp_table_device_report_num.event_time
LIMIT 1000
Shivam Sajwan
02/11/2022, 7:41 AM
Diana Arnos
02/11/2022, 10:42 AM
#13 256.9 Downloading from central: <https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-all/2.4.21/groovy-all-2.4.21.jar>
Progress (4): 0.8/1.4 MB | 349 kB | 426/588 kB | 565/632 kB
#13 257.0 [output clipped, log limit 1MiB reached]
------
executor failed running [/bin/sh -c git clone ${PINOT_GIT_URL} ${PINOT_BUILD_DIR} && cd ${PINOT_BUILD_DIR} && git checkout ${PINOT_BRANCH} && mvn install package -DskipTests -Pbin-dist -Pbuild-shaded-jar -Dkafka.version=${KAFKA_VERSION} -Djdk.version=${JDK_VERSION} && mkdir -p ${PINOT_HOME}/configs && mkdir -p ${PINOT_HOME}/data && cp -r pinot-distribution/target/apache-pinot-*-bin/apache-pinot-*-bin/* ${PINOT_HOME}/. && chmod +x ${PINOT_HOME}/bin/*.sh]: exit code: 1
Anand Sainath
02/14/2022, 6:02 AM
Anand Sainath
02/14/2022, 6:44 AM
SELECT Count(*), SUM(COUNT(*)) FROM ... GROUP BY COL_1
The above syntax seems invalid for Pinot/Calcite. What's the best way to express this?
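A common way to get a per-group count alongside the grand total is a window over the aggregate; a sketch assuming the query runs through an engine with window-function support, such as Presto/Trino on top of Pinot (my_table is a placeholder):
SELECT COL_1,
       COUNT(*) AS cnt,
       SUM(COUNT(*)) OVER () AS total_cnt -- grand total across all groups
FROM my_table
GROUP BY COL_1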
Ali Atıl
02/14/2022, 1:11 PM
sum, max, min, etc) for each column defined in metricFieldsSpec for offline table segments? Can I achieve pre-aggregation for a single column without a Star-tree index?
Diana Arnos
02/14/2022, 2:26 PM
Ayush Kumar Jha
02/14/2022, 4:12 PM
Partial Responses in my Prometheus metrics that are getting emitted. Can anyone tell me what I am doing wrong here??
Prashant Korade
02/14/2022, 4:58 PM
James Mnatzaganian
02/14/2022, 8:05 PM
Failed to generate Pinot segment for file - <s3://REDACTED.snappy.parquet>
java.lang.NullPointerException: null
at shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
at org.apache.pinot.segment.local.utils.CrcUtils.getAllNormalFiles(CrcUtils.java:63) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
at org.apache.pinot.segment.local.utils.CrcUtils.forAllFilesInFolder(CrcUtils.java:52) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:314) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:119) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.9.3-shaded.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Jagannath Timma
02/15/2022, 10:13 PM
Peter Pringle
02/16/2022, 11:18 AM
Jacob M
02/16/2022, 7:45 PM
com.clearspring.analytics and generating my HLL like this:
val hll = new HyperLogLog(12)
hll.offer(rows.filter(p => p.customer.isDefined).map(p => p.customer.get))
val serializedHll = ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hll)
It seems to ~work OK, but when I generate segments I'm getting some errors that I'm not really sure how to debug. Any thoughts?
java.lang.RuntimeException: Caught exception while de-serializing HyperLogLog
Suppressed: java.lang.NullPointerException at org.apache.pinot.segment.local.startree.v2.builder.OffHeapSingleTreeBuilder.close(OffHeapSingleTreeBuilder.java:346)
at org.apache.pinot.segment.local.startree.v2.builder.MultipleTreesBuilder.build(MultipleTreesBuilder.java:143)
and when I look at where the null pointer is coming from, it seems to be from _starTreeRecordBuffer.close(), which seems odd?
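One detail worth checking in the snippet above (an observation, not a confirmed diagnosis): clearspring's HyperLogLog.offer takes a single item, so passing the whole mapped collection offers it as one element rather than one per value. A sketch offering each value individually, with the rest assumed unchanged:
import com.clearspring.analytics.stream.cardinality.HyperLogLog

// log2m = 12 as in the original; assumed to match what the Pinot column config expects
val hll = new HyperLogLog(12)
rows.filter(_.customer.isDefined)
  .foreach(r => hll.offer(r.customer.get)) // offer one value at a time
val serializedHll = ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hll)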
Sumit Lakra
02/17/2022, 5:43 AM
kaivalya apte
02/17/2022, 10:08 AM
The group.id I see in the Pinot server logs is null. Is there a config I could set to have an assigned group id? I tried setting stream.kafka.consumer.prop.group.id in the streamConfigs, but it didn't work. Thanks
kaivalya apte
02/17/2022, 10:25 AM
[
{
"message": "null:\n6 segments [pinotemails__72__62__20220217T0806Z, pinotemails__89__59__20220217T0716Z, pinotemails__38__61__20220217T0817Z, pinotemails__4__61__20220217T0817Z, pinotemails__21__62__20220217T0737Z, pinotemails__55__63__20220217T0811Z] unavailable",
"errorCode": 305
}
]
I could see the segments from the rebalance API mapped to 2 servers. Any pointers?
kaivalya apte
02/17/2022, 11:21 AM
DISTINCTCOUNT: I have a use case where I want a distinct count based on a column which has very high cardinality (900 million rows, out of which 890 million might be distinct). The DISTINCTCOUNT function fails because of OOM. I see that DISTINCTCOUNT is implemented using a HashSet, which loads all the distinct values into memory. I was thinking a Bloom filter with a low false-positive rate might help here. Thoughts?
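If an approximate answer is acceptable, Pinot's HyperLogLog-based aggregation keeps memory bounded regardless of cardinality; a sketch (my_table and user_id are placeholders):
-- approximate distinct count with bounded memory
SELECT DISTINCTCOUNTHLL(user_id) FROM my_table
-- the optional second argument sets log2m, trading memory for accuracy
SELECT DISTINCTCOUNTHLL(user_id, 12) FROM my_table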