# pinot-perf-tuning
  • t

    Tony Requist

    11/10/2021, 3:45 PM
    Question about server disk size - do server nodes need enough disk space to store all segments? Or will segments get dropped from local disk and re-read from deep storage as needed if the disk gets full?
  • k

    Kishore G

    11/10/2021, 3:46 PM
    it needs enough disk space to store all the segments assigned to it
  • t

    Tony Requist

    11/10/2021, 3:55 PM
    Thanks. So deep storage is just a backup. Is this the use case tiered storage is meant to address? We have an AWS/EKS deployment and our cost is driven by server storage (EBS) - it would be ideal to have older data in S3
  • s

    Subbu Subramaniam

    11/10/2021, 6:31 PM
    @User perhaps you are looking for a solution being worked on in this issue: https://github.com/apache/pinot/issues/7229
  • s

    Subbu Subramaniam

    11/10/2021, 6:31 PM
    Tiered storage just moves some segments to a different set of servers, but those servers still need enough storage to host them.
  • s

    Subbu Subramaniam

    11/10/2021, 6:33 PM
    Even in the issue that I mention, it is expected that storage use temporarily bumps up on the servers and is then reclaimed when the segments "age". Pinot does not handle the case of serving data from segments that cannot be stored on servers.
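    For context, a minimal sketch of the tierConfigs block that the tiered-storage feature uses in the table config; the tier name, age threshold, and server tag below are illustrative, and (per the point above) the servers tagged for the tier still need enough local disk for the segments moved to them:
    "tierConfigs": [
      {
        "name": "coldTier",
        "segmentSelectorType": "time",
        "segmentAge": "30d",
        "storageType": "pinot_server",
        "serverTag": "cold_OFFLINE"
      }
    ]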
  • p

    Priyank Bagrecha

    06/03/2022, 11:42 PM
    @Priyank Bagrecha has left the channel
  • a

    Anish Nair

    07/05/2022, 7:51 AM
    Hey guys, regarding realtime table memory usage: I posted a thread in the troubleshoot group. https://apache-pinot.slack.com/archives/C011C9JHN7R/p1656747644771759 Can anyone advise on it?
  • k

    Kartik Khare

    09/14/2022, 3:37 PM
    @Abhishek Gupta
  • a

    Abhishek Gupta

    09/14/2022, 5:26 PM
    Hey everyone, I'm trying to run the query below but it errors out with "upstream request timeout"; when I use a single month of data in the filter, it works. I'm not sure why the error message doesn't give a valid reason for the failure. Can you suggest the right and necessary table config?
    SELECT
        DISTINCTCOUNT( CASE WHEN data_source = 'web' THEN master_id ELSE 'a' END) as activity_account_count,
        DISTINCTCOUNT( CASE WHEN data_source in ('b2bn', 'b2bn_excluded') THEN master_id ELSE 'a' END) as kw_account_count,
        DISTINCTCOUNT( CASE WHEN data_source = 'fpm' THEN master_id ELSE 'a' END) as fpm_account_count,
        DISTINCTCOUNT( CASE WHEN data_source = 'tpm' THEN master_id ELSE 'a' END) as tpm_account_count,
        DISTINCTCOUNT( CASE WHEN data_source = 'crm' THEN master_id ELSE 'a' END) as crm_account_count,
        DISTINCTCOUNT( CASE WHEN data_source = 'map' THEN master_id ELSE 'a' END) as map_account_count,
        DISTINCTCOUNT( master_id) as all_account_count
    FROM
        six_sense_dapm
    WHERE
        dt BETWEEN '2022-01-01' AND '2022-08-30'
        AND data_source IN ('web', 'b2bn', 'b2bn_excluded', 'fpm', 'tpm', 'crm', 'map')
        AND product='__all__'
    Table config:
    {
      "OFFLINE": {
        "tableName": "sumologic_dapm_OFFLINE",
        "tableType": "OFFLINE",
        "segmentsConfig": {
          "timeType": "DAYS",
          "schemaName": "sumologic_dapm",
          "replication": "2",
          "segmentPushType": "APPEND",
          "timeColumnName": "dt",
          "allowNullTimeValue": false
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "invertedIndexColumns": [
            "data_source",
            "product",
            "source_activity_name"
          ],
          "noDictionaryColumns": [
            "external_id",
            "master_id",
            "secondary_id",
            "source_activity_desc",
            "source_activity_url",
            "source_activity_referrer_url",
            "source_activity_desc",
            "source_activity_url_r",
            "metric_value",
            "source_id"
          ],
          "rangeIndexColumns": [
            "dt"
          ],
          "optimizeDictionaryForMetrics": true,
          "enableDefaultStarTree": true,
          "enableDynamicStarTreeCreation": true,
          "aggregateMetrics": true,
          "nullHandlingEnabled": true,
          "rangeIndexVersion": 2,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "sortedColumn": [
            "data_source",
            "master_id"
          ],
          "loadMode": "MMAP"
        },
        "metadata": {},
        "isDimTable": false
      }
    }
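    For context, two levers that are often tried when an exact DISTINCTCOUNT over a long time range times out are (a) the approximate DISTINCTCOUNTHLL aggregation and (b) a larger per-query timeout via queryOptions. A minimal sketch of a broker request payload; the timeout value is purely illustrative and switching to approximate counts is only valid if the use case tolerates it:
    {
      "sql": "SELECT DISTINCTCOUNTHLL(CASE WHEN data_source = 'web' THEN master_id ELSE 'a' END) AS activity_account_count, DISTINCTCOUNTHLL(master_id) AS all_account_count FROM six_sense_dapm WHERE dt BETWEEN '2022-01-01' AND '2022-08-30' AND data_source IN ('web', 'b2bn', 'b2bn_excluded', 'fpm', 'tpm', 'crm', 'map') AND product = '__all__'",
      "queryOptions": "timeoutMs=120000"
    }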
  • p

    Prakhar Pande

    11/07/2022, 8:16 PM
    This query does not scale beyond 240 QPS:
    {"sql":"select event_time , sum(count) as event_counts from nrt_app_open where year=year(now()) and month=month(now()) and day>=(day(now())-1) and FromDateTime(event_time, 'yyyy-MM-dd HHmmss') > cast((now() - 86400000) as long) group by 1 limit 1000000000","trace":false,"queryOptions":""}
    Any ideas on how I can optimize this? CPU and memory of the Pinot servers are also under-utilized. Thanks in advance.
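    For context, the FromDateTime(event_time, 'yyyy-MM-dd HHmmss') > cast(...) predicate applies a transform to every row before comparing, which is expensive at high QPS. A hedged sketch of one commonly suggested rewrite, assuming event_time strings in 'yyyy-MM-dd HHmmss' format sort chronologically so the stored column can be compared against a single converted literal (ToDateTime converts epoch millis back to that format); whether this helps depends on the indexes on event_time:
    {"sql":"select event_time, sum(count) as event_counts from nrt_app_open where year=year(now()) and month=month(now()) and day>=(day(now())-1) and event_time > ToDateTime(now() - 86400000, 'yyyy-MM-dd HHmmss') group by 1 limit 1000000000","trace":false,"queryOptions":""}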
  • l

    Lee Wei Hern Jason

    03/03/2023, 4:31 AM
    Hi Team, we would like recommendations for resource allocation. Right now we use the default configuration, which builds our segments off-heap, and I know that indexes are kept on heap. We want some advice on how much memory we should allocate to our servers/broker/controller. Right now we are using i3.2xlarge. We want to fix our heap/non-heap allocation (currently we are using a percentage, which is not ideal).
  • s

    Shreeram Goyal

    03/21/2023, 4:30 PM
    Hi, I am facing an issue with offline servers when running heavy queries via Presto. I have 6 offline servers, each with 32G RAM, and I have configured my tables to use 2 replica groups with 3 servers each. We have the major chunk of our data (some tables of size 29G) on offline servers, and that will keep increasing over time. When I run queries, most of the time a server goes down due to OOM or the query gets aborted with some exception. Can I get some insights on the configuration for heap/non-heap allocation?
  • e

    Eric Liu

    05/09/2023, 11:59 PM
    Is it NOT recommended, from a performance perspective, to use a primary key created by the SHA-256 algorithm for an upsert table? I want to avoid collisions as much as possible.
  • e

    Eric Liu

    08/16/2023, 3:02 AM
    What are the recommended routing configs (more specifically, the segmentPrunerTypes config) for an upsert table? The partition key (say pk) in my upsert table is the hashed value of two concatenated id fields (let's say A and B), and the most common pattern of queries against that table is filtering on column A instead of the pk.
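    For context, the routing block of an upsert table typically looks like the sketch below; strictReplicaGroup instance selection is what upsert requires, and the partition pruner only prunes when the query filters on the partition column itself, so with filters on A rather than pk it would not prune segments (values here are a sketch, not a verified recommendation):
    "routing": {
      "segmentPrunerTypes": ["partition"],
      "instanceSelectorType": "strictReplicaGroup"
    }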
  • e

    Eric Liu

    08/24/2023, 3:34 PM
    What's the recommended memory setup for a broker? If a node has 16GB of memory, how much should I configure for the pod's memory request (assuming a single pod on the node), heap, and off-heap?
  • l

    Lee Wei Hern Jason

    08/25/2023, 5:20 AM
    Hi Team, can I get some advice on whether our star-tree index is configured correctly for this query? Query:
    SELECT start_timestamp_10mins, SUM(online_seconds)/3600.0 AS value
    FROM m_driver_supply_derived
    WHERE country_id = 6
      AND city_id IN (84, 130, 20, 340, 369, 79, 360, 308, 98, 81, 230, 34, 349, 350, 370, 63, 372, 346, 231, 60, 86, 61, 218, 99, 43, 363, 222, 64, 367, 80, 15, 101, 348, 62, 361, 28, 75, 373, 356, 352, 69, 41, 362, 55, 347, 341, 374, 365, 219, 65, 275, 10, 66, 364, 40, 225, 366, 146, 35, 337, 354, 342, 344, 102, 345, 96, 132, 78, 18, 359, 36, 256, 77, 358, 343, 371, 100, 357, 368, 215, 26, 255, 44, 144, 351, 85, 353, 76)
      AND business_vertical = 'BUSINESS_VERTICAL_TYPE_TRANSPORT'
      AND granularity = 'GRANULARITY_BUSINESS'
      AND start_timestamp_10mins >= '2023-08-15 17:00:00.0' AND start_timestamp_10mins < '2023-08-25 17:00:00.0'
    GROUP BY start_timestamp_10mins
    HAVING value >= 0
    ORDER BY start_timestamp_10mins ASC
    LIMIT 10000000
    Star Tree Index: We are configuring it in the order of highest cardinality to lowest.
    "starTreeIndexConfigs": [
            {
              "dimensionsSplitOrder": [
                "vehicle_type_id",
                "city_id",
                "country_id",
                "vehicle_mode",
                "granularity",
                "start_timestamp_10mins",
                "start_timestamp_1hour",
                "start_timestamp_4hour"
              ],
              "skipStarNodeCreationForDimensions": [],
              "functionColumnPairs": [
                "SUM__online_seconds",
                "SUM__online_count",
                "SUM__in_transit_seconds"
              ],
              "maxLeafRecords": 10000
            }
          ],
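    One thing worth noting: the query filters on business_vertical, but business_vertical is not in dimensionsSplitOrder, and a star-tree can only serve a query whose filter and group-by columns are all covered by the tree. A sketch that adds it, with the rest of the config unchanged (whether this ordering is ideal for your cardinalities is not verified here):
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [
          "vehicle_type_id",
          "city_id",
          "country_id",
          "vehicle_mode",
          "business_vertical",
          "granularity",
          "start_timestamp_10mins",
          "start_timestamp_1hour",
          "start_timestamp_4hour"
        ],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [
          "SUM__online_seconds",
          "SUM__online_count",
          "SUM__in_transit_seconds"
        ],
        "maxLeafRecords": 10000
      }
    ]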
  • v

    Venkat Boina(VB)

    12/01/2023, 6:05 AM
    @Elon When can we start using lookup function and other functions in passthrough queries?
  • s

    Sumitra Saksham

    01/04/2024, 5:07 AM
    Good morning everyone, I have a doubt related to scaling Apache Pinot. When the volume of data is continuously growing, should we focus on scaling the realtime servers rather than the brokers?
  • j

    Julius Kabugu

    04/03/2024, 9:33 PM
    Hi There. Does anyone have a sizing guide for a Pinot cluster, if we know the daily number of transactions and raw data size received, as well as the number of queries per day?
  • s

    somanath joglekar

    08/20/2024, 8:09 PM
    Hey everyone, we have issues with performance, specifically using EMR for ingestion with Spark 3. Initially we had an issue with file size: anything greater than 13 MB (parquet files) resulted in a 'No space left on device' error (segmentCreationJobParallelism = 1), so we reduced the file size to under 5 MB. We can now run the ingestion with segmentCreationJobParallelism = 4, but it takes ~25 sec/segment. Questions:
    1) Is there a way to avoid the memory error when using parquet files?
    2) Is there any config change required in EMR to increase ingestion speed?
    3) What is the ideal EMR cluster size for (parquet) files up to 100 MB?
    Note: ingestion runs at ~10 sec/segment on my laptop (M3 processor, 18 GB RAM).
    More details: Pinot version 1.0.0; EMR (Hadoop 3.3.6, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, Livy 0.8.0, Spark 3.5.0); Core: "m5.12xlarge", gp2, 64 GB; Primary: "m5.2xlarge", gp2, 64 GB.
  • b

    Benjamin Greene

    08/28/2024, 7:46 PM
    Hi All, I'm having some unexpected behaviour with batch ingestion. I'm using Spark on EMR with S3 as our filepath. This is my run command:
    spark-submit -v \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      --master yarn \
      --deploy-mode cluster \
      --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
      --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
      --conf "spark.executor.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
      --jars "${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-${PINOT_VERSION}-shaded.jar,${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar" \
      --files s3://our-s3-path/raw-pinot/spec_file/ingestion_job_spec.yaml \
      local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
      -jobSpecFile ingestion_job_spec.yaml
    Currently the job succeeds and I'm able to generate 1 segment in approximately 20 seconds. However, the segmentCreationJobParallelism flag in the ingestion job spec YAML file seems to have no effect. Are there any additional steps I can take to increase the parallelism?
  • s

    somanath joglekar

    09/10/2024, 9:24 PM
    Hi All, I am trying to run ingestion on EMR using spark-submit and need to ingest around 9000 parquet files of approximately 100 MB each. I have also set up a staging folder on S3, with jobType: SegmentCreationAndMetaDataPush and segmentCreationJobParallelism = 3. The issue is that the job runs for a very long time (more than 6 hours) without producing any segments (in the staging or output folder).
    Question 1: Does it usually take longer to ingest parquet files? I am using 10 * m5d.12xlarge instances.
    Question 2: How do I set up a consistent logging channel for EMR?
    Attaching the ingestion spec file for more reference; EMR specs are listed below.
    EMR: m5d.12xlarge (Primary) + 9 * m5d.12xlarge (Core), 48 vCore, 192 GiB memory, 1800 GB SSD storage, emr-7.1.0. Installed apps: Hadoop 3.3.6, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, Livy 0.8.0, Spark 3.5.0
    Ingestion_Spec_Slack.yaml
  • s

    somanath joglekar

    09/24/2024, 7:06 PM
    Hi All, We are trying to load segments (SegmentMetadataPush) onto the Pinot cluster, but scaling is not working as expected. What would be the recommended number of controllers, servers, zookeepers and brokers to load about 460 GB of data? Also, please recommend memory for each if possible. Thanks in advance 😃
  • s

    somanath joglekar

    09/25/2024, 3:53 PM
    Hi Everyone, I am trying to run an ingestion job on EMR for about 8000 files with a total size of 460 GB in a single folder, but I get a timeout error on the S3 list (error below; check the attachment for detailed errors). Each file is ~50 MB in Avro format.
    24/09/25 145955 INFO S3PinotFS: Listed 8000 files from URI: s3://location/8000_files/, is recursive: true
    24/09/25 150133 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
    The same setup (including EMR computing resources and commands) works for 6600 files with a total size of 190 GB of the same data.
    Questions:
    1. Is there a limit on memory when listing files for ingestion on S3?
    2. Is there a limit on the number of files or the size of files when ingesting data?
    s3_timeout_error
  • b

    Benjamin Greene

    10/16/2024, 9:03 PM
    Hi Everyone, I have a question about performing lots of calculations. We have a few queries where we need to calculate the sum of each column after applying a series of WHERE conditions (10k+ columns with boolean values). When we have many AND conditions the query is quite fast, as we collect the data and retrieve all the sums over a relatively flat dataset. However, when we have very few AND conditions the query slows way down, as for each column we need to calculate the SUM over millions of rows. This tends to take a very long time or, worse, throw an out-of-memory error for the Java heap space. I don't think that pre-aggregations will work here, as they are for an entire column and not an entire column with a condition (please correct me if I'm wrong). Furthermore, our data filtering and retrieval seem quite fast; it doesn't feel like an issue of retrieving the data so much as aggregating it. Any thoughts on increasing performance? Any ideas are welcome.
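    For context, Pinot's star-tree index pre-aggregates metric columns per combination of filter dimensions, which is the "SUM under a WHERE condition" case described above. A heavily hedged sketch with placeholder names (filter_a, filter_b, bool_col_1, bool_col_2 are illustrative, not the real schema); with 10k+ boolean columns the functionColumnPairs list would become very large, so this may only be practical for the hottest columns:
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["filter_a", "filter_b"],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [
          "SUM__bool_col_1",
          "SUM__bool_col_2"
        ],
        "maxLeafRecords": 10000
      }
    ]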
  • h

    Hoàng Giang

    12/21/2024, 6:18 AM
    Hi, can anyone help teach me how Pinot works and how to tune it to improve performance and meet requirements?
  • s

    somanath joglekar

    01/22/2025, 8:48 PM
    Hi Everyone, I have a requirement to condense smaller segments into larger ones, but I do not have time-based partitioning or a time column in my tables (all my tables are OFFLINE tables). I am trying to configure Merge/Rollup Segments (https://docs.pinot.apache.org/operators/cli#merge-rollup-segments) to achieve this. My questions:
    1. Can we use merge on offline table segments (with segments on S3)?
    2. How do I use the SegmentProcessorFramework?
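    For context, the minion-driven merge path is configured as a MergeRollupTask block in the table config, sketched below with illustrative values; as documented it buckets segments by the table's time column, so for tables without a time column this mechanism may not apply as-is, and the SegmentProcessorFramework question would need a separate answer:
    "task": {
      "taskTypeConfigsMap": {
        "MergeRollupTask": {
          "1day.mergeType": "concat",
          "1day.bucketTimePeriod": "1d",
          "1day.bufferTimePeriod": "3d",
          "1day.maxNumRecordsPerSegment": "5000000"
        }
      }
    }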
  • k

    Kezheng Xiang

    02/18/2025, 6:15 PM
    Hi team, I am trying to run an ingestion job on EMR for about 87k parquet files with a total size of 4.4 TiB in a single S3 folder, but I got a timeout error on the S3 list (error below; check the attachment for detailed errors). I believe the computational resources I assigned to Pinot and to the EMR cluster (used for the spark-submit ingestion job) are adequate. The ingestion job works fine when loading a smaller dataset.
    24/09/25 145955 INFO S3PinotFS: Listed 8000 files from URI: s3://location/8000_files/, is recursive: true
    24/09/25 150133 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    This is similar to the issue @somanath joglekar hit last September, and I have the same questions he asked:
    1. Is there a limit on memory when listing files for ingestion on S3?
    2. Is there a limit on the number of files or the size of files when ingesting data?
    Can anyone help me with this?
  • s

    Sepehr

    02/23/2025, 10:13 AM
    Hi everyone, I'm trying to test Pinot on large data. My first test has a table with 4 billion rows (around 10 minutes' worth of my data) and my cluster has 3 controllers, 3 brokers, 9 servers and 8 minions. I'm ingesting data using Flink and didn't change any of the default configs (50,000 rows per segment and an executor pool of size 5). Most of my WHERE clauses are on three columns which are IP addresses. I have also tried the Merge Rollup task to create segments of 5,000,000 rows. I have indexed those three columns using a bloom filter, dictionary, forward and inverted index. Currently I'm trying some random-access and aggregation queries on this architecture and not getting the performance I want (the easiest queries take around 130 seconds). Can anyone help me tune my cluster?
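    For context, if the ingested data can be partitioned on one of the IP columns, segment partition metadata plus the partition pruner lets the broker skip most segments for point lookups on that column. A sketch of the relevant table-config fragments, with src_ip as a placeholder column name and the partition count illustrative (the producer/stream must actually partition data this way for the segment metadata to be set):
    "tableIndexConfig": {
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "src_ip": {
            "functionName": "Murmur",
            "numPartitions": 32
          }
        }
      }
    },
    "routing": {
      "segmentPrunerTypes": ["partition"]
    }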