Apache Pinot #troubleshooting

Luis Fernandez

05/31/2022, 2:33 PM

another question kinda related to the above, we are currently running on gke, and our deep storage is configured with gcs, we have liveness and readiness probes configured in these machines, i think that the server when it starts it tries to pull the data available from gcs, however, i think this may take longer as more data gets ingested, how do you all manage this? we had 10min configure for all the data to get into the server but now we more data being in the machines it seems like we need even more wait time for the data to be ready in the machines, any suggestions?

Raluca Lazar

06/01/2022, 2:05 PM

hi all, I need to dynamically enable/disable the

peerSegmentDownloadScheme

on a realtime table. If I do this manually by either adding this line or removing it on the table config, it works, but doing it via an environment variable does not (passing an empty env var vs passing this string to be replaced:

, "peerSegmentDownloadScheme": "http"

) . My question is, is there any other way to disable the peer segment download scheme other than removing the setting entirely? I tried this

"peerSegmentDownloadScheme": ""

and it failed with this message:

Copy code

{
  "code": 400,
  "error": "Invalid value '' for peerSegmentDownloadScheme. Must be one of http or https"
}

Luis Fernandez

06/01/2022, 2:16 PM

hey friends, coming with a sql optimization question today, we are trying to query pinot with larger timespans now that we are migrating our data, for example, this year, last year filters, for our use case we see that query execution time is way slower now and are trying to figure out ways to gain some performance given this, do you have any recommendations?

Copy code

SELECT product_id, SUM(impression_count) as impression_count, SUM(click_count) as click_count, SUM(cost) as spent_total FROM metrics 
WHERE user_id = xx AND serve_time BETWEEN 1641013200 AND 1654092017  
GROUP BY product_id 
LIMIT 100000

this is an example of a query we are running

Copy code

"numServersQueried": 2,
  "numServersResponded": 2,
  "numSegmentsQueried": 1317,
  "numSegmentsProcessed": 168,
  "numSegmentsMatched": 117,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 69212,
  "numEntriesScannedInFilter": 1165155303,
  "numEntriesScannedPostFilter": 415272,
  "numGroupsLimitReached": false,
  "totalDocs": 10362679599,
  "timeUsedMs": 4623,

this is some of the stats that come back. our data resolution is hourly for this data. Do you all have any idea how to make a query like this perform better? We have an idea of making the records not have hourly resolution after certain period of time but have daily resolution so that the data is compressed even further but wanted to ask if there are any methods we could use for this and if the method we can implement makes sense to you all.

Diogo Baeder

06/01/2022, 4:39 PM

Hey guys! Is there any plan to support UNION queries in Pinot, to unite results from two or more queries that yield the same result structure?

Bruno Brandão

06/01/2022, 9:36 PM

Hello!! I’m building an application on Apache Pinot and I need to customize a few features, but some errors are happening. I think the controller can’t associate to the configurations. I’m receiving null results. The following configurations were executed, leading to this problem: The file pinot-controller.conf has this configuration:

pinot.service.role=CONTROLLER

controller.port=9001

controller.zk.str=localhost:2181

controller.access.protocols.http.port=9001

pinot.cluster.name=MyClusterName

controller.vip.host=localhost

controller.vip.port=9001

controller.data.dir=/tmp/pinot/data/controller

controller.helix.cluster.name=MyClusterName

pinot.set.instance.id.to.hostname=true

controller.admin.access.control.principals=admin,user

controller.admin.access.control.principals.user.password=admin

controller.admin.access.control.principals.user.permissions=READ

controller.admin.access.control.principals.admin.password=admin

controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory

I’m executing with Docker compose and the following call is the one that I use to iniciate the controller:

StartController -configFileName /tmp/conf/pinot-controller.conf

The following mistake appears:

2022/06/01 154850.463 INFO [StartControllerCommand] [main] Executing command: StartController -configFileName /tmp/conf/pinot-controller.conf

pinot-controller | 2022/06/01 154850.541 ERROR [StartControllerCommand] [main] Caught exception while starting controller, exiting.

pinot-controller | java.lang.NullPointerException: null

pinot-controller | at org.apache.pinot.tools.admin.command.StartControllerCommand.getControllerConf(StartControllerCommand.java:207) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:183) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:161)

[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller | at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:192) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]

pinot-controller exited with code 255

Does that mistake happen due some account configuration problem or is it some sort of internal bug from Apache Pinot?

Sumit Lakra

06/02/2022, 11:04 AM

Hello team, how do I switch a Pinot cluster to use a different Zookeeper cluster ? I have a Pinot cluster setup that uses a single node Zookeeper. The zk came with the Pinot package and was started with the command ‘bin/pinot-admin.sh StartZookeeper’. I couldn’t find a configuration file for this zk process. Is there a way to modify it’s configuration ? Like restart it as part of a zk cluster and connect it to other zk servers ? Also, I have a separate 3 node zk cluster ready which I want to use in place of the existing single zk server. What would be the best way to make this switch without loosing any data in the pinot cluster ?

Stuart Millholland

06/02/2022, 4:04 PM

Before I re-invent the wheel here has anyone created a kubernetes Init Container to check for all servers to be available before running table creation scripts? I've got an Init Container in my server statefulsets that waits for the controller and that works great, but my init table script now needs to wait for the servers to be available. My current plan is to get my expected replica count and compare that to my pinot instances array (after parsing out only my servers).

Abhijeet Kushe

06/03/2022, 6:36 PM

<!here> I am implementing the pagination use-case based on https://pinot.apache.org/docs/user-guide/pql/#pagination-on-selection .I found out that pagination only works without distinct clause but not when distinct is included.Is that a limitation or a bug ?

Priyank Bagrecha

06/03/2022, 10:54 PM

Hello, I am trying to use trino connector and running into following error while trying to query pinot via trino

Copy code

>>> import trino
>>> conn = trino.dbapi.connect(host='<redacted>', port=8443, catalog='pinot', schema='default', http_scheme='https', auth=trino.auth.BasicAuthentication("xxx", "yyyy"))
>>> cur = conn.cursor()
>>> cur.execute('SELECT * FROM mytable LIMIT 10')
<trino.client.TrinoResult object at 0x10428d160>
>>> rows = cur.fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/trino/dbapi.py", line 558, in fetchall
    return list(self.genall())
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 509, in __iter__
    rows = self._query.fetch()
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 677, in fetch
    status = self._request.process(response)
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 440, in process
    raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Failed communicating with server: <http://pinot-broker-1.pinot-broker-headless.pinot-dev-ns.svc.cluster.local:8099/debug/routingTable/mytable>", query_id=20220603_211510_00025_9srer)

I am using the external ip of the loadbalancer i.e

service/pinot-controller-external

with port 9000 for

pinot.controller-urls

. If it helps, I am using community provided helm chart to stand up the pinot infrastructure on AWS EKS.

Alice

06/05/2022, 12:50 AM

Hi, I noticed that I a server instance can be set with multiply tenant tags. But is it not recommended to make a server serve two tenants?

Alice

06/05/2022, 3:50 AM

Hi, what’s the common reason for errorCode 235? [ { “message”: “ServerSegmentMissing:\n17 segments [table_name__1__22__20220527T1518Z, table_name__1__10__20220516T1300Z, table_name__1__14__20220517T1900Z, missing on server: Server_pinot-server-28.pinot-server-headless.pinot.svc.cluster.local_8098", “errorCode”: 235 } ]

Ali Atıl

06/06/2022, 8:03 AM

Hello everyone, Is there a reason why GroovyFunctionEvaluator returns null on bindings with null values? Would it cause any side effects to run the script with null bindings? Thanks in advance

Tommaso Peresson

06/06/2022, 10:23 AM

Hi everybody, I have a question for you. I have a table/schema configured like:

Copy code

{
  "OFFLINE": {
    "tableName": "DailyUniqHll_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeType": "DAYS",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "365",
      "replication": "1",
      "timeColumnName": "partition",
      "allowNullTimeValue": false
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "enableDefaultStarTree": false,
      "starTreeIndexConfigs": [
        {
          "dimensionsSplitOrder": [
            "partition",
            "fields.1",
            "fields.2",
            "fields.3",
            "fields.4",
            "fields.5",
            "fields.6",
            "fields.7",
            "fields.8",
            "fields.9"
          ],
          "functionColumnPairs": [
            "SUM__counters.c",
            "DISTINCTCOUNTHLL__hllState"
          ],
          "maxLeafRecords": 1000
        }
      ],
      "enableDynamicStarTreeCreation": true,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false,
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false
    },
    "metadata": {},
    "ingestionConfig": {
      "batchIngestionConfig": {
        "segmentIngestionType": "APPEND",
        "segmentIngestionFrequency": "DAILY"
      },
      "complexTypeConfig": {
        "fieldsToUnnest": [
          "fields",
          "counters"
        ],
        "delimiter": ".",
        "collectionNotUnnestedToJson": "NON_PRIMITIVE"
      }
    },
    "isDimTable": false
  }
}

Schema:

Copy code

{
  "schemaName": "ViewElementDailyUniqHll",
  "dimensionFieldSpecs": [
    {
      "name": "fields.1",
      "dataType": "STRING"
    },
    {
      "name": "fields.2",
      "dataType": "STRING"
    },
    {
      "name": "fields.3",
      "dataType": "STRING"
    },
    {
      "name": "fields.4",
      "dataType": "STRING"
    },
    {
      "name": "fields.5",
      "dataType": "STRING"
    },
    {
      "name": "fields.6",
      "dataType": "STRING"
    },
    {
      "name": "fields.7",
      "dataType": "STRING"
    },
    {
      "name": "fields.8",
      "dataType": "STRING"
    },
    {
      "name": "fields.9",
      "dataType": "STRING"
    },
    {
      "name": "cubeName",
      "dataType": "STRING"
    },
    {
      "name": "list",
      "dataType": "LONG",
      "singleValueField": false
    },
    {
      "name": "hllState",
      "dataType": "BYTES"
    },
    {
      "name": "counters.c",
      "dataType": "INT"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "partition",
      "dataType": "STRING",
      "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
      "granularity": "1:DAYS"
    }
  ]
}

When I ingest some data I get a ~10x size increase because of

DISTINCTCOUNTHLL__hllState

in the star tree index. Is this expected? Is there something misconfigured?

Luis Fernandez

06/06/2022, 1:25 PM

hey friends, it’s me again, this time around with a question around partitions, we have an offline job that uploads data to pinot that we are testing in our sandbox env to the offline tables, when that data is ingested and i look at the segments it generates i see the partitions being created like this: in the metadata from the ui

Copy code

{\"numPartitions\":8,\"partitions\":[0,1,2,3,4,5,6,7]

however in our prod system, which has a hybrid setup I always see one number in the partitions column,

Copy code

{\"numPartitions\":8,\"partitions\":[1]

is this something I should be concerned about?

Mayank

06/06/2022, 2:27 PM

Prod looks good, dev is not partitioned

Varagini Karthik

06/06/2022, 4:02 PM

<!here>, facing some problem to load the data to an existing offline table data getting overwrite iam using the following command, If i tried to load the new data I'm loosing the old data any suggestions

Copy code

sudo docker run --rm -ti \
    --network=pinot-demo_default \
    -v /home/XXXX/dna/pinot/lookup2/pinot-quick-start:/home/XXXX/dna/pinot/lookup2/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-schema.json \
    -tableConfigFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-table-offline.json \
    -controllerHost manual-pinot-controller \
    -controllerPort 9000 -exec


sudo docker run --rm -ti \
    --network=pinot-demo_default \
    -v /home/XXXX/dna/pinot/lookup/pinot-quick-start:/home/XXXX/dna/pinot/lookup/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /home/XXXX/dna/pinot/lookup/pinot-quick-start/docker-job-spec.yml

orders-schema.json orders-table-offline.json

Mathieu Druart

06/06/2022, 4:30 PM

Hello everyone, I have an offline Pinot table with a STRING multi valued column and when I try this request :

Copy code

select distinct myMultiValuedColumn from MyTable where otherColumn in ('MY_VALUE') limit 1000

I have this error :

Copy code

"message": "QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:418)\n\tat org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:89)\n\tat org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:109)",
    "errorCode": 200

If I remove the distinct or the where clause, I have no issue. Am I missing something ? Thank you !

Alice

06/07/2022, 2:26 PM

Hi team, I’ve a question about RealtimeToOfflineSegmentsTask. If I configure this task like this: If I’m not misunderstanding the properties, it means every time this task is executed, 1 hour data older than 24h will be removed from realtime table to offline table. If it’s the case, when I keep bucketTimePeriod the same, and change the task to be executed every 2 hours, will there be longer and longer time data in the realtime table not be moved to offline table? If stream data constantly comes in this realtime table.

Diogo Baeder

06/07/2022, 4:07 PM

Hey guys, what sort of architecture would you choose to go for if you wanted to have separated handling of queries according to the client needs - whether "user interface" or "reporting"? More on this thread.

Priyank Bagrecha

06/07/2022, 6:17 PM

What is the recommended way for authentication and authorization for programmatic query access to a pinot table? We are thinking of having multiple tables per tenant and would like to be able to control access at a table level. What is the recommended mechanism for access logging?

Prashant Pandey

06/08/2022, 5:04 AM

Hi team, if I add an index to my table’s indexing config, what happens to the old segments? Will the new index be created when the segment is loaded during query? Do I need to reload all the segments?

Sowmya Gowda

06/08/2022, 6:57 AM

Hi @Xiang Fu @Xiaobing, I have a trouble in load data from s3 to pinot offline table. Sharing tableconfig and segment which is created

Copy code

{
  "OFFLINE": {
    "tableName": "test_transcript_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "test_transcript",
      "replication": "1",
      "timeColumnName": "timestamp",
      "segmentPushFrequency": "HOURLY",
      "segmentPushType": "APPEND",
      "replicasPerPartition": "1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "invertedIndexColumns": [],
      "noDictionaryColumns": [],
      "rangeIndexColumns": [],
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "sortedColumn": [],
      "bloomFilterColumns": [],
      "loadMode": "MMAP",
      "onHeapDictionaryColumns": [],
      "varLengthDictionaryColumns": [],
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false
    },
    "metadata": {},
    "quota": {},
    "task": {
      "taskTypeConfigsMap": {
        "SegmentGenerationAndPushTask": {
          "schedule": "/5 * * * * ?",
          "tableMaxNumTasks": "10"
        }
      }
    },
    "routing": {},
    "query": {},
    "ingestionConfig": {
      "batchIngestionConfig": {
        "batchConfigMaps": [
          {
            "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
            "input.fs.prop.region": "us-east-1",
            "input.fs.prop.secretKey": "*****",
            "input.fs.prop.accessKey": "*****",
            "inputDirURI": "<s3://pp-airflow-qa/dremio_test_files/jsonfiles/>",
            "includeFileNamePattern": "glob:**/*.json",
            "excludeFileNamePattern": "glob:**/*.tmp",
            "inputFormat": "json"
          }
        ],
        "segmentIngestionType": "APPEND",
        "segmentIngestionFrequency": "HOURLY"
      }
    },
    "isDimTable": false
  }
}

✅ 1

Kevin Liu

06/08/2022, 8:09 AM

I configure replication=2 in TableConfig.segmentsConfig, when I call “v2/segments” to upload segment.tar.gz file to the pinot controller server, I often find that some segments are not loaded correctly on some pinot server servers.

Luis Fernandez

06/08/2022, 5:21 PM

Hello my friends it’s me again. I want to make the data that I have in pinot accessible also in bigquery so basically dump that data just in case people want to look at data that beats the retention that we have in pinot for analytics purposes this would be internal, are there ways that you recommend to synch data that we have in pinot to a data warehouse solution like bigquery?

abhinav wagle

06/08/2022, 6:48 PM

Hellos, when I issue a query from Pinot Controller UI

Query console

and check logs on Pinot Broker Pod. I see following log. Is there a way to identify

source_ip

on who/which server issued the query. My goal is to have trace of logs on broker which can provide info on which user/host is querying Pinot.

Copy code

requestId=130,table=<redacted>,timeMs=23,docs=72/108615312,entries=2741105/792,segments(queried/processed/matched/consuming/unavailable):256/253/8/32/0,consumingFreshnessTimeMs=1654707653215,servers=6/6,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-2_R=0,4,1817,0,1;pinot-server-5_R=0,5,1817,1,1;pinot-server-1_R=0,20,1820,0,1;pinot-server-4_R=1,21,1820,0,1;pinot-server-0_R=1,4,1818,0,1;pinot-server-3_R=1,5,1817,0,1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,query=<redacted>

Luis Fernandez

06/08/2022, 7:02 PM

another question: if I import data into pinot thru the standalone job, and say I have a retention of 2 years, will it remove data as it ages from the data I imported? or how does that work with the retention manager.

Luis Fernandez

06/08/2022, 7:39 PM

another question related to importing data: we have imported our last 2 years worth of data into pinot using the standalone job in Dev however we are observing things between our 2 different behavior for the same query in prod/dev (prod doesn’t have this historical data yet but it does have data for this particular time range). Performance in dev is way slower. this query:

Copy code

SELECT product_id, SUM(impression_count) as impression_count, SUM(click_count) as click_count, SUM(cost) as spent_total FROM metrics WHERE user_id = xxx AND serve_time BETWEEN 1651363200 AND 1654012799 GROUP BY product_id LIMIT 6000

production metadata response:

Copy code

"numServersQueried": 4,
  "numServersResponded": 4,
  "numSegmentsQueried": 97,
  "numSegmentsProcessed": 31,
  "numSegmentsMatched": 31,
  "numConsumingSegmentsQueried": 1,
  "numDocsScanned": 15109,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 60436,
  "numGroupsLimitReached": false,
  "totalDocs": 493642793,
  "timeUsedMs": 32,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "offlineSystemActivitiesCpuTimeNs": 0,
  "realtimeSystemActivitiesCpuTimeNs": 0,
  "offlineResponseSerializationCpuTimeNs": 0,
  "realtimeResponseSerializationCpuTimeNs": 0,
  "offlineTotalCpuTimeNs": 0,
  "realtimeTotalCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1654715649414,
  "numRowsResultSet": 9708

dev metadata response:

Copy code

"exceptions": [],
  "numServersQueried": 4,
  "numServersResponded": 4,
  "numSegmentsQueried": 11703,
  "numSegmentsProcessed": 31,
  "numSegmentsMatched": 31,
  "numConsumingSegmentsQueried": 1,
  "numDocsScanned": 15117,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 60468,
  "numGroupsLimitReached": false,
  "totalDocs": 51283295726,
  "timeUsedMs": 580,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "offlineSystemActivitiesCpuTimeNs": 0,
  "realtimeSystemActivitiesCpuTimeNs": 0,
  "offlineResponseSerializationCpuTimeNs": 0,
  "realtimeResponseSerializationCpuTimeNs": 0,
  "offlineTotalCpuTimeNs": 0,
  "realtimeTotalCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1654716958681,
  "numRowsResultSet": 9708

amount of segments in prod: 1600 amount of segments in dev: 13000 I guess my question is that I see segments queried be way higher in dev and I’m wondering why and if that’s the reason why the query is just performing slower in dev it’s almost equal to the amount of segments that exist in the cluster while prod is only querying a tiny portion. Do you have an idea as to what may be happening?

Alice

06/09/2022, 12:16 AM

Hi team, I see a doc mentioning “The time boundary is computed based on the value of

ingestionConfig.batchIngestionConfig.segmentIngestionFrequency

in the offline table”. I’m wondering how is time boundary computed for offline table without this config configured, if RealtimeToOfflineSegmentsTask is configured in the corresponding realtime table?

Alice

06/09/2022, 3:05 AM

Hi team, I’m using grafana to monitoring pinot cluster. What does it really mean by Table consuming latency?

Luis Fernandez

06/09/2022, 4:05 PM

hey my friends question, has anyone of you gotten spark to partition your data and then upload it to pinot successfully? I’m trying to get spark to partition my data, but, in pinot data keeps on appearing not partitioned