Alice
04/12/2022, 1:37 AM
Kevin Liu
04/12/2022, 2:08 AM
Alice
04/12/2022, 3:21 AM
Chengxuan Wang
04/12/2022, 3:34 AM
sunny
04/12/2022, 5:51 AM
"tableIndexConfig": {
"segmentPartitionConfig": {
"columnPartitionMap": {
"subject": {
"functionName": "murmur",
"numPartitions": 3
}
}
},
Then I increased the Kafka topic's partition count (3 -> 4) and produced data to the new partition.
But no new segment appears in Pinot, so the data in the new Kafka partition doesn't show up. Even after changing numPartitions (3 -> 4) in the Pinot config and rebalancing the servers, the result is the same.
A realtime table without partitioning seems to have no such problem: after adding a Kafka partition and producing data to it, a new segment is added in Pinot and the new partition's data shows up.
Is this expected behavior? If not, what should I check?
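For reference, a minimal sketch of the numPartitions change described above (only the value changes; the rest of the segmentPartitionConfig stays as posted):
"segmentPartitionConfig": {
  "columnPartitionMap": {
    "subject": {
      "functionName": "murmur",
      "numPartitions": 4
    }
  }
}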
Thanks :)
Satyam Raj
04/12/2022, 7:52 AM
export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=/Users/satyam.raj/dataplatform/pinot-dist/apache-pinot-0.10.0-bin
bin/spark-submit \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master "local[8]" \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-${PINOT_VERSION}-shaded.jar" \
${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
-jobSpecFile '/Users/satyam.raj/dataplatform/pinot-dist/batchjob-spec/batch-job-spec.yaml'
Getting this weird error:
Exception in thread "main" java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/spark/metrics/sink/MetricsServlet.<init>(Ljava/util/Properties;Lcom/codahale/metrics/MetricRegistry;Lorg/apache/spark/SecurityManager;)V @116: invokevirtual
Reason:
Type 'com/codahale/metrics/json/MetricsModule' (current frame, stack[2]) is not assignable to 'com/fasterxml/jackson/databind/Module'
Current Frame:
bci: @116
flags: { }
locals: { 'org/apache/spark/metrics/sink/MetricsServlet', 'java/util/Properties', 'com/codahale/metrics/MetricRegistry', 'org/apache/spark/SecurityManager' }
stack: { 'org/apache/spark/metrics/sink/MetricsServlet', 'com/fasterxml/jackson/databind/ObjectMapper', 'com/codahale/metrics/json/MetricsModule' }
Bytecode:
0000000: 2a2b b500 2a2a 2cb5 002f 2a2d b500 5c2a
0000010: b700 7e2a 1280 b500 322a 1282 b500 342a
0000020: 03b5 0037 2a2b 2ab6 0084 b600 8ab5 0039
0000030: 2ab2 008f 2b2a b600 91b6 008a b600 95bb
0000040: 0014 592a b700 96b6 009c bb00 1659 2ab7
0000050: 009d b600 a1b8 00a7 b500 3b2a bb00 7159
0000060: b700 a8bb 00aa 59b2 00b0 b200 b32a b600
0000070: b5b7 00b8 b600 bcb5 003e b1
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:200)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:196)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:196)
at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:104)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:514)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2550)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner.run(SparkSegmentGenerationJobRunner.java:196)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:146)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:125)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:121)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:153)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
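This VerifyError usually indicates two incompatible Jackson/metrics versions on the driver classpath: Spark's MetricsServlet picks up com.codahale.metrics.json.MetricsModule from a jar whose Jackson does not match Spark's own. A quick way to confirm which jars carry the conflicting class (a hedged sketch in plain shell; adjust SPARK_HOME and the glob to your install):

# List every jar on the driver classpath that bundles the conflicting class.
# MetricsModule normally comes from metrics-json; seeing it in more than one
# jar (e.g. a Spark jar and a Pinot shaded jar) confirms the version clash.
for j in ${SPARK_HOME}/jars/*.jar \
         ${PINOT_DISTRIBUTION_DIR}/lib/*.jar \
         ${PINOT_DISTRIBUTION_DIR}/plugins/*/*/*.jar; do
  if unzip -l "$j" 2>/dev/null | grep -q 'com/codahale/metrics/json/MetricsModule.class'; then
    echo "$j"
  fi
done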
Padma Malladi
04/12/2022, 2:55 PM
Lars-Kristian Svenøy
04/13/2022, 12:37 PM
francoisa
04/13/2022, 2:23 PM
java.lang.RuntimeException: shaded.com.fasterxml.jackson.databind.JsonMappingException: Infinite recursion (StackOverflowError) (through reference chain: org.apache.pinot.spi.data.readers.GenericRow["fieldToValueMap"]->java.util.Collections$UnmodifiableMap["$MULTIPLE_RECORDS_KEY$"]->java.util.ArrayList[0]->org.apache.pinot.spi.data.readers.GenericRow["fieldTo>
at org.apache.pinot.spi.data.readers.GenericRow.toString(GenericRow.java:247) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at java.util.Formatter$FormatSpecifier.printString(Formatter.java:3031) ~[?:?]
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2908) ~[?:?]
at java.util.Formatter.format(Formatter.java:2673) ~[?:?]
at java.util.Formatter.format(Formatter.java:2609) ~[?:?]
at java.lang.String.format(String.java:2897) ~[?:?]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:543) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:420) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:598) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at java.lang.Thread.run(Thread.java:829) [?:?]
TransformConfig as follows ->
"complexTypeConfig": {
"fieldsToUnnest": [
"data.attributes.regularTimes"
],
"delimiter": ".",
"collectionNotUnnestedToJson": "NON_PRIMITIVE"
}
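To illustrate what this config does (a sketch with a hypothetical record shape, since the actual schema isn't shown): each element of data.attributes.regularTimes becomes its own row, and nested field names are flattened with the "." delimiter.

Input record (hypothetical):
  {"data": {"attributes": {"regularTimes": [{"day": "MON"}, {"day": "TUE"}]}}}
Rows after unnesting:
  {"data.attributes.regularTimes.day": "MON"}
  {"data.attributes.regularTimes.day": "TUE"}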
The other table has the same complexTypeConfig but is based on another field. Any idea?
Bodu Janardhan
04/13/2022, 2:25 PM
Alice
04/14/2022, 4:37 AM
Alice
04/14/2022, 5:29 AM
coco
04/14/2022, 8:13 AM
Monica
04/14/2022, 8:47 AM
ERROR StatusLogger Unrecognized format specifier [d]
ERROR StatusLogger Unrecognized conversion specifier [d] starting at position 16 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [thread]
ERROR StatusLogger Unrecognized conversion specifier [thread] starting at position 25 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [level]
ERROR StatusLogger Unrecognized conversion specifier [level] starting at position 35 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [logger]
ERROR StatusLogger Unrecognized conversion specifier [logger] starting at position 47 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [msg]
ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [n]
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.
ERROR StatusLogger Reconfiguration failed: No configuration found for '533ddba' at 'null' in 'null'
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.pinot.tools.admin.command.StartKafkaCommand.<init>(StartKafkaCommand.java:51)
at org.apache.pinot.tools.admin.PinotAdministrator.<clinit>(PinotAdministrator.java:98)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:237)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:813)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.NoSuchElementException
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:365)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.pinot.tools.utils.KafkaStarterUtils.getKafkaConnectorPackageName(KafkaStarterUtils.java:54)
at org.apache.pinot.tools.utils.KafkaStarterUtils.<clinit>(KafkaStarterUtils.java:46)
... 12 more
It seems like Spark couldn't find org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory from the Kafka plugin. I built Pinot from source on the master branch using this command (because we use JDK 8 on our machines):
mvn clean install -DskipTests -Pbin-dist -T 4 -Djdk.version=8
My Spark job uses the commands below, where I've set -Dplugins.dir according to the documentation:
export PINOT_VERSION=0.10.0-SNAPSHOT
export PINOT_DISTRIBUTION_DIR=/home/xxx/apache-pinot-0.10.0-SNAPSHOT-bin
echo ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar
cd ${PINOT_DISTRIBUTION_DIR}
${SPARK_HOME}/bin/spark-submit \
--class org.apache.pinot.tools.admin.PinotAdministrator \
--master "local[2]" \
--deploy-mode client \
--conf "spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.x86_64/jre" \
--conf "spark.yarn.appMasterEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.x86_64/jre" \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
LaunchDataIngestionJob \
-jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/transcriptData/sparkIngestionJobSpec.yml
Is it because Spark couldn't find my plugins' jars from plugins.dir? I'm not familiar with Spark; do I need to add all the plugin jars to the Spark classpath using --jars or something? Could you help me?
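One thing worth trying (a hedged sketch only; the plugin jar paths are assumed from the standard Pinot distribution layout, in the spirit of the spark-submit example earlier in this thread): put the plugin jars the job needs on the driver classpath explicitly, instead of relying on plugins.dir alone, e.g.
${SPARK_HOME}/bin/spark-submit \
--class org.apache.pinot.tools.admin.PinotAdministrator \
--master "local[2]" \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-stream-ingestion/pinot-kafka-2.0/pinot-kafka-2.0-${PINOT_VERSION}-shaded.jar" \
${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
LaunchDataIngestionJob \
-jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/transcriptData/sparkIngestionJobSpec.yml
Harish Bohara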
04/14/2022, 10:50 PM
Harish Bohara
04/15/2022, 8:07 AM
Nikhil Varma
04/16/2022, 4:14 AM
Diogo Baeder
04/16/2022, 4:45 AM
inputDirURI: '/foo/bar'
includeFileNamePattern: 'glob:baz/**/*.json'
doesn't work if you want to ingest from JSON files inside /foo/bar/baz. Instead, this should be used:
inputDirURI: '/foo/bar/baz'
includeFileNamePattern: 'glob:**/*.json'
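For example, with a hypothetical layout like this, the spec above picks up both files, while the earlier one matched nothing:
/foo/bar/baz/2022/01/events.json
/foo/bar/baz/2022/02/events.json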
Notice how inputDirURI goes to the deepest possible fixed subdirectory, and then the pattern starts from there.
Kevin Xu
04/18/2022, 9:27 AM
coco
04/19/2022, 4:28 AM
Harish Bohara
04/20/2022, 9:37 AM
Yahya Zuberi
04/20/2022, 12:53 PM
Saumya Upadhyay
04/20/2022, 1:34 PM
Joshua Seagroves
04/21/2022, 5:08 PM
Diana Arnos
04/21/2022, 6:06 PM
Mesut Özen
04/21/2022, 8:48 PM
Nizar Hejazi
04/22/2022, 9:29 AM
erik bergsten
04/22/2022, 2:28 PM
Carl
04/22/2022, 2:44 PM
Tejaswini Edara
04/25/2022, 11:51 AM