Ali Atıl
01/25/2022, 12:20 PM
Mark Needham
select competitorId, max(distance) AS distanceCovered
from parkrun
group by competitorId
order by distanceCovered DESC
And I want to return only records where distanceCovered is greater than, say, 1,000, but it doesn't like that.
What's the proper way to solve this type of problem?
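One common way to express this kind of filter, as a sketch only (assuming the engine accepts a HAVING clause on the aggregation), is to apply the condition after grouping rather than in the WHERE clause:
select competitorId, max(distance) AS distanceCovered
from parkrun
group by competitorId
having max(distance) > 1000
order by distanceCovered DESC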
Sahar
01/25/2022, 3:28 PM
When a record is deleted, the after part of the payload on Kafka will be null, as expected. I am extracting the fields from $.after of the payload (attached my table def and schema def). But when the record is deleted, I want to mark the deleted record with a deleted_at column and extract the fields from the $.before part of the payload. I haven't used Groovy before and I'm not sure I'm doing it right; as a test I'm trying to set the my_id field to payload.after.id if after is not null, and otherwise set it to payload.before.id:
{
  "columnName": "my_id",
  "transformFunction": "Groovy({JSONPATHLONG(payload, '$.after.id', '0') == 0 ? JSONPATHLONG(payload, '$.before.id', '0') : JSONPATHLONG(payload, '$.after.id', '0')}, payload)"
},
but it fails creating the table with a 400 "invalid table config" error. Any help or hints would be appreciated, or pointers on how others deal with deleted records coming from the source.
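As a sketch only (not a confirmed fix from the thread): one alternative is to let a plain Groovy script parse the payload itself rather than nesting the built-in JSONPATHLONG calls inside Groovy(...). This assumes payload is the raw JSON string column, with the same field names as above:
{
  "columnName": "my_id",
  "transformFunction": "Groovy({def p = new groovy.json.JsonSlurper().parseText(payload); p.after != null ? p.after.id : p.before.id}, payload)"
}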
xtrntr
01/26/2022, 6:22 AM
I'm seeing groupLimitReached=true in my queries, and this is even with the option pinot.server.query.executor.num.groups.limit=N, where N is 10M; the number of groups I'm expecting to see is between 5-8M, but it's around 1M instead.
Vibhor Jaiswal
01/26/2022, 10:55 PM
Vibhor Jaiswal
01/26/2022, 11:00 PM
Sadim Nadeem
01/27/2022, 6:07 AM
Diogo Baeder
01/27/2022, 2:36 PM
Alexander Vivas
01/27/2022, 2:49 PM
Aditya
01/27/2022, 3:02 PM
Julien Picard
01/27/2022, 8:27 PM
Server: Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098 returned error: 404
I am wondering why there is a "_" in front of the pod name. I don't think a name like that can be resolved, which looks like the reason for the 404 error.
Do you know anything about it?
EDIT: just saw there is another underscore before the port number, so maybe that's normal.
Jeff Moszuti
01/27/2022, 8:49 PM
{
  "_code": 400,
  "_error": "java.lang.IllegalArgumentException: Time column can be only INT or LONG: TIMESTAMP"
}
My input JSON is:
{
  "schema": {
    "dimensionFieldSpecs": [
      {
        "averageLength": 36,
        "cardinality": 10000,
        "dataType": "STRING",
        "name": "event_id"
      },
      {
        "averageLength": 36,
        "cardinality": 10000,
        "dataType": "STRING",
        "name": "app_id"
      },
      {
        "averageLength": 36,
        "cardinality": 10000,
        "dataType": "STRING",
        "name": "user_id"
      }
    ],
    "dateTimeFieldSpecs": [
      {
        "cardinality": 10000,
        "dataType": "TIMESTAMP",
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS",
        "name": "event_at"
      }
    ],
    "metricFieldSpecs": [],
    "schemaName": "app_downloads"
  },
  "queriesWithWeights": {
    " select count(event_id) as num_downloads from app_downloads where event_at between '2021-01-01 00:00:00' and '2021-01-31 00:00:00' ": 1
  },
  "tableType": "OFFLINE",
  "numRecordsPerPush": 10000,
  "qps": 5,
  "latencySLA": 5000,
  "rulesToExecute": {
    "recommendRealtimeProvisioning": false
  }
}
What could be the problem?
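The error message itself points at the time column type. As a sketch of what it seems to ask for (an assumption, not a confirmed answer from the thread), the same dateTimeFieldSpec with a LONG epoch-millis column would look like:
"dateTimeFieldSpecs": [
  {
    "cardinality": 10000,
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS",
    "name": "event_at"
  }
]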
Aditya
01/28/2022, 2:39 PM
select sum(amount) as amt from reward
where user_id = 'some_id'
The following exception occurred:
[
{
"message": "InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.util.trace.TraceContext.getTraceInfo(TraceContext.java:188)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:223)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:151)",
"errorCode": 450
},
{
"message": "InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.util.trace.TraceContext.getTraceInfo(TraceContext.java:188)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:223)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:151)",
"errorCode": 450
},
{
"message": "1 servers [11.4.7.172_O] not responded",
"errorCode": 427
}
]
What's more interesting, the servers become unstable and randomly time out with "errorCode": 427, and queries (without tracing) are randomly processed successfully or time out.
Timeout log on the server:
ERROR [BaseCombineOperator] [pqr-1] Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='reward_OFFLINE', _selectExpressions=[*], _aliasList=[null], _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=10000}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:reward_OFFLINE), pinotQuery:PinotQuery(dataSource:DataSource(tableName:reward_OFFLINE), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))], orderByList:[], limit:10, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=10000}))})
Restarting all the servers fixed the issue.
Using a recent nightly Docker image (digest a6c14285abf4).
Diogo Baeder
01/28/2022, 6:56 PM
I have the following in my pinot.yaml file, in the controller config, for deploying via the official Helm chart:
extra:
  # Note: Extra configs will be appended to pinot-controller.conf file
  configs: |-
    pinot.set.instance.id.to.hostname=true
    controller.task.scheduler.enabled=true
    # Note: change this to the real bucket, after creating it in S3
    controller.data.dir=s3://<redacted>
    controller.local.temp.dir=/tmp/pinot-tmp-data/
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=<redacted>
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
Is this the correct way to configure it - appending these configs to the existing ones? Or should the approach have been different? I'm asking because I found a weird directory in one of the Controller local filesystems:
root@pinot-controller-1:/opt/pinot# du -sh /var/pinot/controller/data\,s3\:/
0 /var/pinot/controller/data,s3:/
By the looks of it, it seems like a configuration mistake somewhere
[SOLVED! - SOLUTION:]
Because the official Helm chart already defines that configuration, the value from the extras and the default value end up being concatenated when the final file is loaded, since the setting ends up defined twice in pinot-controller.conf inside the Controller container. The solution is, instead of putting controller.data.dir as a one-line string in the extra configs, to define that setting under the controller options in the YAML file, nesting data and then dir, so that the option replaces the default value.
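For illustration, a minimal sketch of that nesting in the Helm values file (bucket redacted as in the original):
controller:
  data:
    dir: s3://<redacted>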
Syed Akram
01/31/2022, 5:09 AM
Alexander Vivas
01/31/2022, 8:32 AM
Prashant Korade
01/31/2022, 8:09 PM
Luis Fernandez
01/31/2022, 9:33 PM
Anish Nair
02/01/2022, 2:21 PM
SELECT
lookUp('dim_tabl1', 'display_name', 'dim_joinkey', fact_joinkey),
sum(metric1) AS sum_1
FROM reporting_aggregations
WHERE stats_date_hour between '2022012000' and '2022012223'
GROUP BY lookUp('dim_tabl1', 'display_name', 'dim_joinkey', fact_joinkey)
ORDER BY sum(metric1)
DESC LIMIT 10000
Aditya
02/01/2022, 4:45 PM
Query: select sum(amount) as amt from transactions where from_user_id = "some id"
Initially created the table with 13 monthly segments, and achieved 3.37K qps with p99 143 ms.
Further partitioned the data into 4 partitions on column from_user_id (13 * 4 segments), and achieved 3.50K qps with p99 146 ms.
There is not much difference in throughput.
Is the data size too small to realise the benefit of partitioning? Any thoughts on further tuning?
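As a sketch only (the Murmur partition function is an assumption, and none of this is taken from the thread), partition-based pruning generally relies on both a segment partition config and the partition segment pruner in the table config, along these lines:
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "from_user_id": {
        "functionName": "Murmur",
        "numPartitions": 4
      }
    }
  }
},
"routing": {
  "segmentPrunerTypes": ["partition"]
}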
Sowmiya
02/02/2022, 2:27 PM
Diogo Baeder
02/02/2022, 10:17 PM
When using toDateTime I'm getting unexpected times, like 7 hours ago. When I saw this, I thought it was an issue with the published timestamps, but then I fetched some data without doing the conversion, pasted a timestamp into my Python shell, converted it to a timezone-naive datetime object, and there it's just fine; it's the expected datetime. I tried to use 'UTC' as the third argument for toDateTime, but to no avail; it still gives me the wrong datetime.
This is not a super critical issue, since the vast majority of the analyses we'll do will be based on a Python application, where that conversion will be done, but still, it would be nice to get the expected values right out of the Controller web UI. Any ideas what I can do to achieve that?
[SOLVED] I was mistakenly using hh to convert the hours, while I should have been using HH instead.
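As a sketch (the column event_time_ms and table my_table are hypothetical names): in these date-time patterns, hh is the 1-12 clock hour while HH is the 0-23 hour of day, so the corrected conversion looks like:
select toDateTime(event_time_ms, 'yyyy-MM-dd HH:mm:ss') as event_time
from my_table
limit 10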
Salai Vasan
02/03/2022, 5:04 AM
Shadab Anwar
02/03/2022, 7:02 AM
Anish Nair
02/03/2022, 8:18 AM
2022/02/03 00:11:06.064 INFO [CrcUtils] [pool-6-thread-1] Computed crc = 1828318080, based on files [/tmp/pinot-b263f2fa-8bad-4a49-9511-508fc14c50e2/output/dim_testtable_OFFLINE_0/v3/columns.psf, /tmp/pinot-b263f2fa-8bad-4a49-9511-508fc14c50e2/output/dim_testtable_OFFLINE_0/v3/index_map, /tmp/pinot-b263f2fa-8bad-4a49-9511-508fc14c50e2/output/dim_testtable_OFFLINE_0/v3/metadata.properties]
2022/02/03 00:11:06.065 INFO [SegmentIndexCreationDriverImpl] [pool-6-thread-1] Driver, record read time : 13
2022/02/03 00:11:06.065 INFO [SegmentIndexCreationDriverImpl] [pool-6-thread-1] Driver, stats collector time : 0
2022/02/03 00:11:06.065 INFO [SegmentIndexCreationDriverImpl] [pool-6-thread-1] Driver, indexing time : 8
2022/02/03 00:11:06.065 INFO [SegmentGenerationJobRunner] [pool-6-thread-1] Tarring segment from: /tmp/pinot-b263f2fa-8bad-4a49-9511-508fc14c50e2/output/dim_testtable_OFFLINE_0 to: /tmp/pinot-b263f2fa-8bad-4a49-9511-508fc14c50e2/output/dim_testtable_OFFLINE_0.tar.gz
2022/02/03 00:11:06.090 INFO [SegmentGenerationJobRunner] [pool-6-thread-1] Size for segment: dim_testtable_OFFLINE_0, uncompressed: 217.24K, compressed: 70.93K
2022/02/03 00:11:06.618 INFO [IngestionJobLauncher] [main] Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
2022/02/03 00:11:06.619 INFO [PinotFSFactory] [main] Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
2022/02/03 00:11:06.620 INFO [PinotFSFactory] [main] Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
2022/02/03 00:11:06.632 INFO [HadoopPinotFS] [main] successfully initialized HadoopPinotFS
2022/02/03 00:11:06.819 INFO [SegmentPushUtils] [main] Start pushing segments: [<hdfs://nameservice1/data/max/poc/pinot-ingestion/dimension_segments/dim_testtable/dim_testtable_OFFLINE_0.tar.gz>]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@51827393] for table dim_testtable
2022/02/03 00:11:06.819 INFO [SegmentPushUtils] [main] Pushing segment: dim_testtable_OFFLINE_0 to location: <http://d9-max-insert-2.srv.net:9000> for table dim_testtable
2022/02/03 00:11:07.164 INFO [FileUploadDownloadClient] [main] Sending request: <http://d9-max-insert-2.srv.net:9000/v2/segments?tableName=dim_testtable&tableName=dim_testtable&tableType=OFFLINE> to controller: <http://d9-max-insert-2.srv.net|d9-max-insert-2.srv.net>, version: Unknown
2022/02/03 00:11:07.168 WARN [SegmentPushUtils] [main] Caught temporary exception while pushing table: dim_testtable segment: dim_testtable_OFFLINE_0 to <http://d9-max-insert-2.srv.net:9000>, will retry
org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 500 (Internal Server Error) with reason: "Exception while uploading segment: null" while sending request: <http://d9-max-insert-2.srv.net:9000/v2/segments?tableName=dim_testtable&tableName=dim_testtable&tableType=OFFLINE> to controller: <http://d9-max-insert-2.srv.net|d9-max-insert-2.srv.net>, version: Unknown
at org.apache.pinot.common.utils.FileUploadDownloadClient.sendRequest(FileUploadDownloadClient.java:531) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-ea2f0aa641e17301293662c8e79dfd94d8568438]
at org.apache.pinot.common.utils.FileUploadDownloadClient.uploadSegment(FileUploadDownloadClient.java:838) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-ea2f0aa641e17301293662c8e79dfd94d8568438]
at org.apache.pinot.segment.local.utils.SegmentPushUtils.lambda$pushSegments$0(SegmentPushUtils.java:122) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-ea2f0aa641e17301293662c8e79dfd94d8568438]
at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:50) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-ea2f0aa641e17301293662c8e79dfd94d8568438]
at org.apache.pinot.segment.local.utils.SegmentPushUtils.pushSegments(SegmentPushUtils.java:119) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-ea2f0aa641e17301293662c8e79dfd94d8568438]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner.run(SegmentTarPushJobRunner.java:88) [pinot-batch-ingestion-standalone-0.10.0-SNAPSHOT-shaded.jar:0.10.0-SNAPSHOT-ea2f0aa641e17301293662c8e79dfd94d8568438]
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:146) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-ea2f0aa641e17301293662c8e79dfd94d8568438]
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:118)
Aditya
02/03/2022, 1:34 PM
{"Tags":["TAG1","TAG3", "TAG2"]}
Running a query like the one below:
select sum(amount) as amt from transactions
where from_user_id = 'some id'
and JSON_MATCH(tag, '"$.Tags[*]"=''FD''')
Created a sorted index on from_user_id (string) and a JSON index on tag.
The explain plan shows both indexes are used.
The queries take ~500 ms.
Is there a way to improve this? Is there some obvious optimisation I am missing?
Peter Pringle
02/04/2022, 7:01 AM
We set controller.admin.access.control.principals and now the controller prompts for login; however, the Swagger endpoint doesn't seem to be passed these credentials and returns 403 for all operations. How do we pass the credentials into Swagger?
Ali Atıl
02/07/2022, 8:00 AM
André Siefken
02/07/2022, 9:08 AM
Regarding the TransformFunction implementations for the geospatial support functionality: many of them are designed to compare a given static value against table column values, and their current implementations allow passing the static parameter in either of the two arguments of the function signature. My question is: is there an easy way to identify a static value passed as an argument to a TransformFunction, e.g. from a ProjectionBlock?
Anish Nair
02/07/2022, 12:22 PM