# troubleshooting

    Luis Fernandez

    05/31/2022, 2:33 PM
    another question, kinda related to the above: we are currently running on GKE, and our deep storage is configured with GCS. We have liveness and readiness probes configured on these machines, and I think that when the server starts it tries to pull the available data from GCS, which takes longer as more data gets ingested. How do you all manage this? We had 10 minutes configured for all the data to get onto the server, but now that more data is on the machines it seems like we need even more wait time before the data is ready. Any suggestions?
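    For reference, one way to handle data-dependent startup time on Kubernetes is a startupProbe rather than an ever-growing readiness timeout: it holds off the regular probes until the server has finished pulling segments from deep storage, without loosening steady-state checks. A minimal sketch (pod-spec fragment in JSON form), assuming the server's admin endpoint is /health on port 8097:
    Copy code
    {
      "startupProbe": {
        "httpGet": { "path": "/health", "port": 8097 },
        "periodSeconds": 10,
        "failureThreshold": 180
      }
    }
    With these (hypothetical) numbers the server gets up to 180 × 10s = 30 minutes to come up; scaling failureThreshold with data volume beats hard-coding a wait.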

    Raluca Lazar

    06/01/2022, 2:05 PM
    hi all, I need to dynamically enable/disable the
    peerSegmentDownloadScheme
    on a realtime table. If I do this manually by either adding this line or removing it on the table config, it works, but doing it via an environment variable does not (passing an empty env var vs passing this string to be replaced:
    , "peerSegmentDownloadScheme": "http"
    ). My question is: is there any other way to disable the peer segment download scheme other than removing the setting entirely? I tried this
    "peerSegmentDownloadScheme": ""
    and it failed with this message:
    Copy code
    {
      "code": 400,
      "error": "Invalid value '' for peerSegmentDownloadScheme. Must be one of http or https"
    }
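    Judging by that validation message, only the two values below are accepted, so "disabled" can only be expressed by omitting the key entirely; an env-var template therefore has to drop the whole line, comma included, rather than substitute an empty string:
    Copy code
    "peerSegmentDownloadScheme": "http"
    "peerSegmentDownloadScheme": "https"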

    Luis Fernandez

    06/01/2022, 2:16 PM
    hey friends, coming with a SQL optimization question today. We are trying to query Pinot with larger timespans now that we are migrating our data, for example "this year" and "last year" filters. For our use case we see that query execution time is way slower now, and we are trying to figure out ways to gain some performance. Do you have any recommendations?
    Copy code
    SELECT product_id, SUM(impression_count) as impression_count, SUM(click_count) as click_count, SUM(cost) as spent_total FROM metrics 
    WHERE user_id = xx AND serve_time BETWEEN 1641013200 AND 1654092017  
    GROUP BY product_id 
    LIMIT 100000
    this is an example of a query we are running
    Copy code
    "numServersQueried": 2,
      "numServersResponded": 2,
      "numSegmentsQueried": 1317,
      "numSegmentsProcessed": 168,
      "numSegmentsMatched": 117,
      "numConsumingSegmentsQueried": 0,
      "numDocsScanned": 69212,
      "numEntriesScannedInFilter": 1165155303,
      "numEntriesScannedPostFilter": 415272,
      "numGroupsLimitReached": false,
      "totalDocs": 10362679599,
      "timeUsedMs": 4623,
    these are some of the stats that come back. Our data resolution for this table is hourly. Do you all have any idea how to make a query like this perform better? Our own idea is to roll records up from hourly to daily resolution after a certain period of time, so the data is compressed even further, but I wanted to ask whether there are other methods we could use and whether the approach we have in mind makes sense to you all.
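    A note on the stats above: numEntriesScannedInFilter ≈ 1.17B against numDocsScanned ≈ 69k suggests the user_id filter is scanning whole columns inside each matched segment. A hedged sketch of tableIndexConfig additions that usually cut this down (column names taken from the query; sorting on user_id is only possible if the ingestion can produce sorted segments):
    Copy code
    "tableIndexConfig": {
      "sortedColumn": ["user_id"],
      "rangeIndexColumns": ["serve_time"]
    }
    An inverted index on user_id is the usual alternative when the data cannot be sorted on it.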

    Diogo Baeder

    06/01/2022, 4:39 PM
    Hey guys! Is there any plan to support UNION queries in Pinot, to unite results from two or more queries that yield the same result structure?

    Bruno Brandão

    06/01/2022, 9:36 PM
    Hello!! I'm building an application on Apache Pinot and I need to customize a few features, but some errors are happening. I think the controller can't pick up the configuration: I'm receiving null results. The following configuration leads to the problem. The file pinot-controller.conf contains:
    pinot.service.role=CONTROLLER
    controller.port=9001
    controller.zk.str=localhost:2181
    controller.access.protocols.http.port=9001
    pinot.cluster.name=MyClusterName
    controller.vip.host=localhost
    controller.vip.port=9001
    controller.data.dir=/tmp/pinot/data/controller
    controller.helix.cluster.name=MyClusterName
    pinot.set.instance.id.to.hostname=true
    controller.admin.access.control.principals=admin,user
    controller.admin.access.control.principals.user.password=admin
    controller.admin.access.control.principals.user.permissions=READ
    controller.admin.access.control.principals.admin.password=admin
    controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
    I'm running it with Docker Compose, and the following is the command I use to start the controller:
    StartController -configFileName /tmp/conf/pinot-controller.conf
    The following error appears:
    2022/06/01 15:48:50.463 INFO [StartControllerCommand] [main] Executing command: StartController -configFileName /tmp/conf/pinot-controller.conf
    pinot-controller | 2022/06/01 15:48:50.541 ERROR [StartControllerCommand] [main] Caught exception while starting controller, exiting.
    pinot-controller | java.lang.NullPointerException: null
    pinot-controller | at org.apache.pinot.tools.admin.command.StartControllerCommand.getControllerConf(StartControllerCommand.java:207) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:183) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:161) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller | at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:192) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    pinot-controller exited with code 255
    Does this error happen due to some configuration problem on my side, or is it some sort of internal bug in Apache Pinot?

    Sumit Lakra

    06/02/2022, 11:04 AM
    Hello team, how do I switch a Pinot cluster to use a different Zookeeper cluster? I have a Pinot cluster set up that uses a single-node Zookeeper. The ZK came with the Pinot package and was started with the command 'bin/pinot-admin.sh StartZookeeper'. I couldn't find a configuration file for this ZK process. Is there a way to modify its configuration, e.g. restart it as part of a ZK cluster and connect it to other ZK servers? Also, I have a separate 3-node ZK cluster ready which I want to use in place of the existing single ZK server. What would be the best way to make this switch without losing any data in the Pinot cluster?
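    The bundled StartZookeeper is meant for quickstarts and exposes no user-editable config file, so the usual route is: stand up the new 3-node ensemble, copy the cluster's znode tree (the path named by pinot.cluster.name) to it, then restart every component against the new ensemble. A sketch with hypothetical hostnames:
    Copy code
    bin/pinot-admin.sh StartController -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
    bin/pinot-admin.sh StartBroker -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
    bin/pinot-admin.sh StartServer -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
    Segments themselves live in deep storage and on the servers, so no table data is lost as long as the Helix state in ZK is carried over intact.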

    Stuart Millholland

    06/02/2022, 4:04 PM
    Before I re-invent the wheel here has anyone created a kubernetes Init Container to check for all servers to be available before running table creation scripts? I've got an Init Container in my server statefulsets that waits for the controller and that works great, but my init table script now needs to wait for the servers to be available. My current plan is to get my expected replica count and compare that to my pinot instances array (after parsing out only my servers).
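    A sketch of what such an init-container loop can look like, assuming curl and jq are available in the image and that CONTROLLER and EXPECTED_SERVERS are hypothetical env vars supplied by the chart (the controller's /instances endpoint returns an "instances" array whose server entries are prefixed with "Server_"):
    Copy code
    #!/bin/sh
    # Block until the controller reports at least EXPECTED_SERVERS live server instances.
    until [ "$(curl -s "http://${CONTROLLER}:9000/instances" \
          | jq '[.instances[] | select(startswith("Server_"))] | length')" -ge "${EXPECTED_SERVERS}" ]; do
      echo "waiting for ${EXPECTED_SERVERS} Pinot servers..."
      sleep 5
    done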

    Abhijeet Kushe

    06/03/2022, 6:36 PM
    <!here> I am implementing the pagination use case based on https://pinot.apache.org/docs/user-guide/pql/#pagination-on-selection. I found that pagination only works without a DISTINCT clause, not when DISTINCT is included. Is that a limitation or a bug?
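    For context, the difference shows up in a pair of queries like this sketch (PQL-style offset pagination; table and column names hypothetical). A plain selection accepts the offset form of LIMIT, while DISTINCT is executed as an aggregation under the hood, and aggregation results do not support an offset, so the second form cannot paginate:
    Copy code
    -- paginates: offset 50, page size 100
    SELECT product_id FROM metrics LIMIT 50, 100
    -- does not paginate: DISTINCT runs as an aggregation
    SELECT DISTINCT product_id FROM metrics LIMIT 50, 100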

    Priyank Bagrecha

    06/03/2022, 10:54 PM
    Hello, I am trying to use the Trino connector and running into the following error while trying to query Pinot via Trino:
    Copy code
    >>> import trino
    >>> conn = trino.dbapi.connect(host='<redacted>', port=8443, catalog='pinot', schema='default', http_scheme='https', auth=trino.auth.BasicAuthentication("xxx", "yyyy"))
    >>> cur = conn.cursor()
    >>> cur.execute('SELECT * FROM mytable LIMIT 10')
    <trino.client.TrinoResult object at 0x10428d160>
    >>> rows = cur.fetchall()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/site-packages/trino/dbapi.py", line 558, in fetchall
        return list(self.genall())
      File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 509, in __iter__
        rows = self._query.fetch()
      File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 677, in fetch
        status = self._request.process(response)
      File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 440, in process
        raise self._process_error(response["error"], response.get("id"))
    trino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Failed communicating with server: http://pinot-broker-1.pinot-broker-headless.pinot-dev-ns.svc.cluster.local:8099/debug/routingTable/mytable", query_id=20220603_211510_00025_9srer)
    I am using the external IP of the load balancer, i.e. service/pinot-controller-external, with port 9000 for pinot.controller-urls. If it helps, I am using the community-provided helm chart to stand up the Pinot infrastructure on AWS EKS.

    Alice

    06/05/2022, 12:50 AM
    Hi, I noticed that a server instance can be set with multiple tenant tags. But is it not recommended to have a server serve two tenants?
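    For reference, multiple tags can be applied to an instance through the controller API; whether to co-host tenants is mostly an isolation question, since both tenants then compete for the same server's resources. A hedged example (instance and tenant names hypothetical):
    Copy code
    curl -X PUT "http://localhost:9000/instances/Server_pinot-server-0_8098/updateTags?tags=tenantA_OFFLINE,tenantB_OFFLINE"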

    Alice

    06/05/2022, 3:50 AM
    Hi, what's the common reason for errorCode 235? [ { "message": "ServerSegmentMissing:\n17 segments [table_name__1__22__20220527T1518Z, table_name__1__10__20220516T1300Z, table_name__1__14__20220517T1900Z, missing on server: Server_pinot-server-28.pinot-server-headless.pinot.svc.cluster.local_8098", "errorCode": 235 } ]

    Ali Atıl

    06/06/2022, 8:03 AM
    Hello everyone, Is there a reason why GroovyFunctionEvaluator returns null on bindings with null values? Would it cause any side effects to run the script with null bindings? Thanks in advance

    Tommaso Peresson

    06/06/2022, 10:23 AM
    Hi everybody, I have a question for you. I have a table/schema configured like:
    Copy code
    {
      "OFFLINE": {
        "tableName": "DailyUniqHll_OFFLINE",
        "tableType": "OFFLINE",
        "segmentsConfig": {
          "timeType": "DAYS",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "365",
          "replication": "1",
          "timeColumnName": "partition",
          "allowNullTimeValue": false
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "enableDefaultStarTree": false,
          "starTreeIndexConfigs": [
            {
              "dimensionsSplitOrder": [
                "partition",
                "fields.1",
                "fields.2",
                "fields.3",
                "fields.4",
                "fields.5",
                "fields.6",
                "fields.7",
                "fields.8",
                "fields.9"
              ],
              "functionColumnPairs": [
                "SUM__counters.c",
                "DISTINCTCOUNTHLL__hllState"
              ],
              "maxLeafRecords": 1000
            }
          ],
          "enableDynamicStarTreeCreation": true,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false,
          "rangeIndexVersion": 2,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false
        },
        "metadata": {},
        "ingestionConfig": {
          "batchIngestionConfig": {
            "segmentIngestionType": "APPEND",
            "segmentIngestionFrequency": "DAILY"
          },
          "complexTypeConfig": {
            "fieldsToUnnest": [
              "fields",
              "counters"
            ],
            "delimiter": ".",
            "collectionNotUnnestedToJson": "NON_PRIMITIVE"
          }
        },
        "isDimTable": false
      }
    }
    Schema:
    Copy code
    {
      "schemaName": "ViewElementDailyUniqHll",
      "dimensionFieldSpecs": [
        {
          "name": "fields.1",
          "dataType": "STRING"
        },
        {
          "name": "fields.2",
          "dataType": "STRING"
        },
        {
          "name": "fields.3",
          "dataType": "STRING"
        },
        {
          "name": "fields.4",
          "dataType": "STRING"
        },
        {
          "name": "fields.5",
          "dataType": "STRING"
        },
        {
          "name": "fields.6",
          "dataType": "STRING"
        },
        {
          "name": "fields.7",
          "dataType": "STRING"
        },
        {
          "name": "fields.8",
          "dataType": "STRING"
        },
        {
          "name": "fields.9",
          "dataType": "STRING"
        },
        {
          "name": "cubeName",
          "dataType": "STRING"
        },
        {
          "name": "list",
          "dataType": "LONG",
          "singleValueField": false
        },
        {
          "name": "hllState",
          "dataType": "BYTES"
        },
        {
          "name": "counters.c",
          "dataType": "INT"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "partition",
          "dataType": "STRING",
          "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
          "granularity": "1:DAYS"
        }
      ]
    }
    When I ingest some data I get a ~10x size increase because of
    DISTINCTCOUNTHLL__hllState
    in the star tree index. Is this expected? Is there something misconfigured?
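    One likely factor: every aggregation node in the star-tree stores its own serialized HLL for DISTINCTCOUNTHLL__hllState, and a nine-column dimensionsSplitOrder with maxLeafRecords at 1000 materializes a lot of nodes, hence a lot of HLL copies. A hedged variant that shrinks the tree (which columns to keep in the split order depends on the actual query patterns):
    Copy code
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["partition", "fields.1", "fields.2", "fields.3"],
        "functionColumnPairs": ["SUM__counters.c", "DISTINCTCOUNTHLL__hllState"],
        "maxLeafRecords": 10000
      }
    ]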

    Luis Fernandez

    06/06/2022, 1:25 PM
    hey friends, it's me again, this time around with a question about partitions. We have an offline job that uploads data to Pinot, which we are testing against the offline tables in our sandbox env. When that data is ingested and I look at the segments it generates, I see the partitions being created like this in the metadata from the UI:
    Copy code
    {\"numPartitions\":8,\"partitions\":[0,1,2,3,4,5,6,7]
    however, in our prod system, which has a hybrid setup, I always see one number in the partitions column:
    Copy code
    {\"numPartitions\":8,\"partitions\":[1]
    is this something I should be concerned about?

    Mayank

    06/06/2022, 2:27 PM
    Prod looks good, dev is not partitioned

    Varagini Karthik

    06/06/2022, 4:02 PM
    <!here>, I'm facing a problem loading data into an existing offline table: the data is getting overwritten. I'm using the following commands; if I try to load new data, I lose the old data. Any suggestions?
    Copy code
    sudo docker run --rm -ti \
        --network=pinot-demo_default \
        -v /home/XXXX/dna/pinot/lookup2/pinot-quick-start:/home/XXXX/dna/pinot/lookup2/pinot-quick-start \
        --name pinot-batch-table-creation \
        apachepinot/pinot:latest AddTable \
        -schemaFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-schema.json \
        -tableConfigFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-table-offline.json \
        -controllerHost manual-pinot-controller \
        -controllerPort 9000 -exec
    
    
    sudo docker run --rm -ti \
        --network=pinot-demo_default \
        -v /home/XXXX/dna/pinot/lookup/pinot-quick-start:/home/XXXX/dna/pinot/lookup/pinot-quick-start \
        --name pinot-data-ingestion-job \
        apachepinot/pinot:latest LaunchDataIngestionJob \
        -jobSpecFile /home/XXXX/dna/pinot/lookup/pinot-quick-start/docker-job-spec.yml
    Attachments: orders-schema.json, orders-table-offline.json

    Mathieu Druart

    06/06/2022, 4:30 PM
    Hello everyone, I have an offline Pinot table with a STRING multi-valued column, and when I try this request:
    Copy code
    select distinct myMultiValuedColumn from MyTable where otherColumn in ('MY_VALUE') limit 1000
    I have this error :
    Copy code
    "message": "QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:418)\n\tat org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:89)\n\tat org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:109)",
        "errorCode": 200
    If I remove the DISTINCT or the WHERE clause, I have no issue. Am I missing something? Thank you!

    Alice

    06/07/2022, 2:26 PM
    Hi team, I have a question about RealtimeToOfflineSegmentsTask. If I configure this task as shown below, then, if I'm not misunderstanding the properties, every time the task executes, one hour of data older than 24h is moved from the realtime table to the offline table. If that's the case, and I keep bucketTimePeriod the same but change the task to execute every 2 hours, will more and more data pile up in the realtime table without being moved to the offline table, given that stream data constantly comes in?
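    Since the config itself didn't survive in this archive, here is the shape of task config being described, with hypothetical values:
    Copy code
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1h",
          "bufferTimePeriod": "24h",
          "schedule": "0 0 * * * ?"
        }
      }
    }
    As far as the semantics go, each execution moves roughly one bucketTimePeriod window that has aged past bufferTimePeriod, so a stream producing an hour of data per hour while the task runs only every two hours would indeed fall further and further behind, which is the behavior the question anticipates.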

    Diogo Baeder

    06/07/2022, 4:07 PM
    Hey guys, what sort of architecture would you choose to go for if you wanted to have separated handling of queries according to the client needs - whether "user interface" or "reporting"? More on this thread.

    Priyank Bagrecha

    06/07/2022, 6:17 PM
    What is the recommended way for authentication and authorization for programmatic query access to a pinot table? We are thinking of having multiple tables per tenant and would like to be able to control access at a table level. What is the recommended mechanism for access logging?
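    For table-level control with basic auth, the broker can be configured with principals restricted to specific tables. A sketch in the same .conf style used elsewhere on this page (names and passwords hypothetical; property names as in the Pinot basic-auth docs):
    Copy code
    pinot.broker.access.control.class=org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
    pinot.broker.access.control.principals=admin,reporting
    pinot.broker.access.control.principals.admin.password=verysecret
    pinot.broker.access.control.principals.reporting.password=secret
    pinot.broker.access.control.principals.reporting.tables=tenantA_metrics,tenantA_events
    For access logging, the broker's per-query request log (like the requestId=... line further down this page) is the usual starting point.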

    Prashant Pandey

    06/08/2022, 5:04 AM
    Hi team, if I add an index to my table’s indexing config, what happens to the old segments? Will the new index be created when the segment is loaded during query? Do I need to reload all the segments?
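    Adding the index to the table config alone does not rewrite existing segments; a reload makes each server rebuild the new index from the data it already holds. A hedged example against the controller API (host and table name hypothetical):
    Copy code
    curl -X POST "http://localhost:9000/segments/myTable_OFFLINE/reload"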

    Sowmya Gowda

    06/08/2022, 6:57 AM
    Hi @Xiang Fu @Xiaobing, I'm having trouble loading data from S3 into a Pinot offline table. Sharing the table config and the segment that gets created:
    Copy code
    {
      "OFFLINE": {
        "tableName": "test_transcript_OFFLINE",
        "tableType": "OFFLINE",
        "segmentsConfig": {
          "schemaName": "test_transcript",
          "replication": "1",
          "timeColumnName": "timestamp",
          "segmentPushFrequency": "HOURLY",
          "segmentPushType": "APPEND",
          "replicasPerPartition": "1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "invertedIndexColumns": [],
          "noDictionaryColumns": [],
          "rangeIndexColumns": [],
          "rangeIndexVersion": 2,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "sortedColumn": [],
          "bloomFilterColumns": [],
          "loadMode": "MMAP",
          "onHeapDictionaryColumns": [],
          "varLengthDictionaryColumns": [],
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false
        },
        "metadata": {},
        "quota": {},
        "task": {
          "taskTypeConfigsMap": {
            "SegmentGenerationAndPushTask": {
              "schedule": "/5 * * * * ?",
              "tableMaxNumTasks": "10"
            }
          }
        },
        "routing": {},
        "query": {},
        "ingestionConfig": {
          "batchIngestionConfig": {
            "batchConfigMaps": [
              {
                "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
                "input.fs.prop.region": "us-east-1",
                "input.fs.prop.secretKey": "*****",
                "input.fs.prop.accessKey": "*****",
                "inputDirURI": "<s3://pp-airflow-qa/dremio_test_files/jsonfiles/>",
                "includeFileNamePattern": "glob:**/*.json",
                "excludeFileNamePattern": "glob:**/*.tmp",
                "inputFormat": "json"
              }
            ],
            "segmentIngestionType": "APPEND",
            "segmentIngestionFrequency": "HOURLY"
          }
        },
        "isDimTable": false
      }
    }

    Kevin Liu

    06/08/2022, 8:09 AM
    I configured replication=2 in TableConfig.segmentsConfig. When I call "v2/segments" to upload a segment.tar.gz file to the Pinot controller, I often find that some segments are not loaded correctly on some of the Pinot servers.

    Luis Fernandez

    06/08/2022, 5:21 PM
    Hello my friends, it's me again. I want to make the data that I have in Pinot accessible in BigQuery as well: basically dump that data so people can look at data beyond the retention we have in Pinot, for analytics purposes (this would be internal). Are there ways you'd recommend to sync data from Pinot to a data warehouse solution like BigQuery?

    abhinav wagle

    06/08/2022, 6:48 PM
    Hellos, when I issue a query from the Pinot Controller UI Query console and check the logs on the Pinot broker pod, I see the following log. Is there a way to identify the source_ip of who/which host issued the query? My goal is to have a trace in the broker logs that provides info on which user/host is querying Pinot.
    Copy code
    requestId=130,table=<redacted>,timeMs=23,docs=72/108615312,entries=2741105/792,segments(queried/processed/matched/consuming/unavailable):256/253/8/32/0,consumingFreshnessTimeMs=1654707653215,servers=6/6,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-2_R=0,4,1817,0,1;pinot-server-5_R=0,5,1817,1,1;pinot-server-1_R=0,20,1820,0,1;pinot-server-4_R=1,21,1820,0,1;pinot-server-0_R=1,4,1818,0,1;pinot-server-3_R=1,5,1817,0,1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,query=<redacted>

    Luis Fernandez

    06/08/2022, 7:02 PM
    another question: if I import data into Pinot through the standalone job, and say I have a retention of 2 years, will Pinot remove that imported data as it ages? How does that work with the retention manager?
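    As far as retention goes, the retention manager looks only at the time range recorded in each segment's metadata, not at when or how the segment was uploaded, so backfilled segments age out the same way as any others. A sketch of the relevant segmentsConfig fields (values hypothetical):
    Copy code
    "segmentsConfig": {
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "730"
    }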

    Luis Fernandez

    06/08/2022, 7:39 PM
    another question related to importing data: we have imported our last 2 years' worth of data into Pinot using the standalone job in dev; however, we are observing different behavior between our two environments for the same query (prod doesn't have this historical data yet, but it does have data for this particular time range). Performance in dev is way slower. This query:
    Copy code
    SELECT product_id, SUM(impression_count) as impression_count, SUM(click_count) as click_count, SUM(cost) as spent_total FROM metrics WHERE user_id = xxx AND serve_time BETWEEN 1651363200 AND 1654012799 GROUP BY product_id LIMIT 6000
    production metadata response:
    Copy code
    "numServersQueried": 4,
      "numServersResponded": 4,
      "numSegmentsQueried": 97,
      "numSegmentsProcessed": 31,
      "numSegmentsMatched": 31,
      "numConsumingSegmentsQueried": 1,
      "numDocsScanned": 15109,
      "numEntriesScannedInFilter": 0,
      "numEntriesScannedPostFilter": 60436,
      "numGroupsLimitReached": false,
      "totalDocs": 493642793,
      "timeUsedMs": 32,
      "offlineThreadCpuTimeNs": 0,
      "realtimeThreadCpuTimeNs": 0,
      "offlineSystemActivitiesCpuTimeNs": 0,
      "realtimeSystemActivitiesCpuTimeNs": 0,
      "offlineResponseSerializationCpuTimeNs": 0,
      "realtimeResponseSerializationCpuTimeNs": 0,
      "offlineTotalCpuTimeNs": 0,
      "realtimeTotalCpuTimeNs": 0,
      "segmentStatistics": [],
      "traceInfo": {},
      "minConsumingFreshnessTimeMs": 1654715649414,
      "numRowsResultSet": 9708
    dev metadata response:
    Copy code
    "exceptions": [],
      "numServersQueried": 4,
      "numServersResponded": 4,
      "numSegmentsQueried": 11703,
      "numSegmentsProcessed": 31,
      "numSegmentsMatched": 31,
      "numConsumingSegmentsQueried": 1,
      "numDocsScanned": 15117,
      "numEntriesScannedInFilter": 0,
      "numEntriesScannedPostFilter": 60468,
      "numGroupsLimitReached": false,
      "totalDocs": 51283295726,
      "timeUsedMs": 580,
      "offlineThreadCpuTimeNs": 0,
      "realtimeThreadCpuTimeNs": 0,
      "offlineSystemActivitiesCpuTimeNs": 0,
      "realtimeSystemActivitiesCpuTimeNs": 0,
      "offlineResponseSerializationCpuTimeNs": 0,
      "realtimeResponseSerializationCpuTimeNs": 0,
      "offlineTotalCpuTimeNs": 0,
      "realtimeTotalCpuTimeNs": 0,
      "segmentStatistics": [],
      "traceInfo": {},
      "minConsumingFreshnessTimeMs": 1654716958681,
      "numRowsResultSet": 9708
    number of segments in prod: 1600; number of segments in dev: 13000. I guess my question is why numSegmentsQueried is so much higher in dev, and whether that's why the query performs slower there: it's almost equal to the total number of segments in the cluster, while prod only queries a tiny portion. Do you have an idea as to what may be happening?
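    One reading of these numbers: numSegmentsQueried counts the segments selected after broker-side pruning, so in dev nearly all 13000 segments are being handed to the servers, and even though only 31 survive server-side pruning, the per-segment pruning and planning overhead adds up. A hedged config addition that lets servers skip non-matching segments cheaply on the equality filter (column taken from the query; broker-side pruning additionally needs the partitioning config shown further down this page):
    Copy code
    "tableIndexConfig": {
      "bloomFilterColumns": ["user_id"]
    }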

    Alice

    06/09/2022, 12:16 AM
    Hi team, I see a doc mentioning “The time boundary is computed based on the value of
    ingestionConfig.batchIngestionConfig.segmentIngestionFrequency
    in the offline table". I'm wondering how the time boundary is computed for an offline table without this config set, if RealtimeToOfflineSegmentsTask is configured on the corresponding realtime table?

    Alice

    06/09/2022, 3:05 AM
    Hi team, I'm using Grafana to monitor a Pinot cluster. What does "Table consuming latency" actually mean?

    Luis Fernandez

    06/09/2022, 4:05 PM
    hey my friends, a question: has any of you gotten Spark to partition your data and then upload it to Pinot successfully? I'm trying to get Spark to partition my data, but in Pinot the data keeps showing up as not partitioned.
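    For the segment metadata to show a single partition per segment (as in the prod example above), the Spark job's shuffle has to match the table's segmentPartitionConfig exactly: same column, same hash function, same partition count, and each output segment must contain rows from only one partition. A sketch of the table side (column name hypothetical, taken from the earlier queries):
    Copy code
    "tableIndexConfig": {
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "user_id": {
            "functionName": "Murmur",
            "numPartitions": 8
          }
        }
      }
    }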