Weixiang Sun
09/28/2021, 6:07 PM
Carl
09/30/2021, 2:16 AM
Dunith Dhanushka
Dan DC
09/30/2021, 12:49 PM
Shishpal Vishnoi
10/01/2021, 3:09 AM
Prabhakar Reddy
10/01/2021, 3:19 AM
Dan DC
10/01/2021, 2:29 PM
Romeo
10/03/2021, 10:12 PM
Karin Wolok
Dan DC
10/04/2021, 10:45 AM
Subin T P
10/05/2021, 1:38 PM
Prashant Pandey
10/06/2021, 7:42 AM
./docker-build.sh pinot:new-range-index master https://github.com/apache/incubator-pinot.git
This gives me an error:
executor failed running [/bin/sh -c git clone ${PINOT_GIT_URL} ${PINOT_BUILD_DIR} && cd ${PINOT_BUILD_DIR} && git checkout ${PINOT_BRANCH} && mvn install package -DskipTests -Pbin-dist -Pbuild-shaded-jar -Dkafka.version=${KAFKA_VERSION} -Djdk.version=${JDK_VERSION} && mkdir -p ${PINOT_HOME}/configs && mkdir -p ${PINOT_HOME}/data && cp -r pinot-distribution/target/apache-pinot-*-bin/apache-pinot-*-bin/* ${PINOT_HOME}/. && chmod +x ${PINOT_HOME}/bin/*.sh]: exit code: 1
Is there anything I need to do or configure to fix this?
Shubham Dhal
10/10/2021, 12:12 PM
Manish Soni
10/11/2021, 7:31 AM
suraj kamath
10/12/2021, 6:38 AM
Charles
10/12/2021, 9:13 AM
Luis Fernandez
10/12/2021, 2:56 PM
select * from table where user_id = x
When we first run a query like this, it takes more than 500 ms; when we run it again, latency is good. I guess that's because the segment gets loaded into memory. I was wondering why this would happen, since 500 ms is definitely outside our expectations for query latency. The table is a real-time table and has indexing configured.
Our current config for noDictionaryColumns:
"noDictionaryColumns": [
  "click_count",
  "impression_count"
],
This is so that we can aggregate on our dimensions using "aggregateMetrics": true.
Segment flush configuration:
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.segment.size": "250M"
We have a range index on serve_time, which is an epoch timestamp truncated to the hour.
We have an inverted index and a sorted column on user_id, as well as a partition map with 4 partitions using modulo.
We chose 4 partitions because the consuming topic has 4 partitions.
The consuming topic receives around 5k messages per second.
Finally, we currently have 2 servers with 4 GB of Java heap, 10 GB of memory on the machine itself, 4 CPUs, and 500 GB of disk space.
At the time of writing this message, we have 96 segments in this table.
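The modulo partitioning described above can be illustrated with a short Python sketch. It assumes the partition column is user_id and that user_id is numeric (neither is stated explicitly in the message); the function names and segment names are made up for illustration, and this is not Pinot's own code.

```python
# Hypothetical illustration of modulo partitioning; not Pinot's implementation.
NUM_PARTITIONS = 4  # from the message: the topic and the partition map both use 4


def modulo_partition(user_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Return the partition a numeric user_id maps to under plain modulo."""
    return user_id % num_partitions


def segments_for_user(user_id: int, segment_partitions: dict) -> list:
    """Keep only the segments whose partition id matches the user's partition."""
    target = modulo_partition(user_id)
    return [name for name, part in segment_partitions.items() if part == target]


# Made-up segment names mapped to made-up partition ids.
segments = {"seg_a": 0, "seg_b": 1, "seg_c": 2, "seg_d": 3, "seg_e": 1}
print(segments_for_user(12345, segments))
```

With one partition id recorded per segment, a lookup on a single user_id only needs the segments for that one partition, which is the idea behind partition-based pruning.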
Metrics when we issue a query like the one above:
timeUsedMs: 264
numDocsScanned: 40
totalDocs: 401325330
numServersQueried: 2
numServersResponded: 2
numSegmentsQueried: 93
numSegmentsProcessed: 93
numSegmentsMatched: 4
numConsumingSegmentsQueried: 1
numEntriesScannedInFilter: 0
numEntriesScannedPostFilter: 320
numGroupsLimitReached: false
partialResponse: -
minConsumingFreshnessTimeMs: 1634050463550
offlineThreadCpuTimeNs: 0
realtimeThreadCpuTimeNs: 159743463
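For a quick read of these numbers, the metric names and values can be paired in a dictionary and a couple of ratios computed. This is a minimal Python sketch: the helper name parse_metrics is made up for illustration, and the numbers are the ones reported in the message.

```python
# Metric names and values exactly as reported in the message above.
HEADER = (
    "timeUsedMs numDocsScanned totalDocs numServersQueried numServersResponded "
    "numSegmentsQueried numSegmentsProcessed numSegmentsMatched "
    "numConsumingSegmentsQueried numEntriesScannedInFilter "
    "numEntriesScannedPostFilter numGroupsLimitReached partialResponse "
    "minConsumingFreshnessTimeMs offlineThreadCpuTimeNs realtimeThreadCpuTimeNs"
)
VALUES = "264 40 401325330 2 2 93 93 4 1 0 320 false - 1634050463550 0 159743463"


def parse_metrics(header: str, values: str) -> dict:
    """Pair whitespace-separated metric names with their values (kept as strings)."""
    names = header.split()
    vals = values.split()
    if len(names) != len(vals):
        raise ValueError(f"{len(names)} names but {len(vals)} values")
    return dict(zip(names, vals))


metrics = parse_metrics(HEADER, VALUES)

# Share of queried segments that actually contained matching rows.
matched_ratio = int(metrics["numSegmentsMatched"]) / int(metrics["numSegmentsQueried"])
# Entries read after filtering, per matching document.
entries_per_doc = int(metrics["numEntriesScannedPostFilter"]) / int(metrics["numDocsScanned"])

print(f"segments matched / queried: {matched_ratio:.3f}")
print(f"post-filter entries per doc: {entries_per_doc:.1f}")
```

Read this way, the query touched 93 segments but only 4 of them contained matching rows, while the scan counts themselves are small.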
Could anyone point me to what I should look into? Based on the troubleshooting steps, these queries don't seem to have a high numDocsScanned or numEntriesScannedPostFilter.
lalit bhagtani
10/13/2021, 3:40 PM
Prateek Singhal
10/13/2021, 11:43 PM
suraj kamath
10/18/2021, 6:17 AM
Manish Soni
10/18/2021, 6:22 AM
Vibhor Jain
10/18/2021, 12:05 PM
Kamal Chavda
10/18/2021, 4:32 PM
1970-01-01 in Pinot (e.g. date of birth)? In my real-time table schema, I have the date defined as below under dateTimeFieldSpecs:
{
"name": "date_of_birth",
"dataType": "TIMESTAMP",
"format": "1:DAYS:TIMESTAMP",
"granularity": "1:DAYS"
}
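If the question concerns dates earlier than 1970-01-01 (an assumption, since the start of the message is cut off), it may help to see that such dates simply map to negative offsets from the Unix epoch. Below is a minimal Python sketch using only the standard library; the helper names and the sample date are made up, and the sketch does not establish how any particular Pinot version treats negative values for this field.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
MILLIS_PER_DAY = 86_400_000


def to_epoch_days(d: date) -> int:
    """Days since 1970-01-01; negative for earlier dates."""
    return (d - EPOCH).days


def to_epoch_millis(d: date) -> int:
    """Milliseconds since 1970-01-01 at midnight UTC; negative for earlier dates."""
    return to_epoch_days(d) * MILLIS_PER_DAY


def from_epoch_days(days: int) -> date:
    """Inverse of to_epoch_days."""
    return EPOCH + timedelta(days=days)


dob = date(1965, 3, 14)  # made-up date of birth before the epoch
print(to_epoch_days(dob), to_epoch_millis(dob))
print(from_epoch_days(to_epoch_days(dob)))
```

The round trip shows that no information is lost: a pre-1970 date is just a negative count of days or milliseconds.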
Ali Atıl
10/20/2021, 8:41 AM
kauts shukla
10/20/2021, 8:43 AM
Alexander Vivas
10/20/2021, 12:02 PM
Arpit
10/20/2021, 1:59 PM
Map
10/20/2021, 4:19 PM
Grant Sherrick
10/22/2021, 3:02 PM
KafkaThriftMessageDecoder a try? I haven't seen anyone clamoring for it, but I thought it might be worth giving it a go.
Arpit
10/25/2021, 3:17 PM