# troubleshooting
  • j

    jose farfan

    02/13/2021, 3:38 AM
    But: select * from transaction_line limit 2147483645 works.
    k
    k
    x
    • 4
    • 28
  • t

    Tamás Nádudvari

    02/13/2021, 11:09 AM
    Hi, we have a realtime table that consumes a Kafka topic and creates new segments every hour (
    realtime.segment.flush.threshold.time: "1h"
    ). Its replication is set to 2, and when I query the number of documents over a not-too-recent interval, I can see two different numbers alternating. I understand that the Kafka offsets of the two servers consuming the same partition can drift. But when I select an interval that is a couple of hours before the current time, so presumably I'm querying a finished/closed segment, I still see the same issue. According to the docs, shouldn't the replica that has consumed fewer records download the committed segment (the one with more documents) after the segment is closed?
    k
    x
    s
    • 4
    • 13
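    For concreteness, a minimal sketch of the setup described above (replication 2, hourly flush); the topic name and other values are placeholders, not the actual table config from this thread:
    "segmentsConfig": {
      "replicasPerPartition": "2"
    },
    "tableIndexConfig": {
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.topic.name": "my-topic",
        "realtime.segment.flush.threshold.time": "1h"
      }
    }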
  • r

    Ravi Singal

    02/15/2021, 5:11 PM
    Hi, how should we manage the idealstate znode for a table with a large number of segments? One of our realtime tables has more than 18k segments, and the ZooKeeper node size is greater than 3.2 MB. ZooKeeper nodes are supposed to be smaller than 1 MB. Will this negatively impact Pinot controller performance when it reads or writes the table's ideal state (during segment completion)?
    k
    • 2
    • 1
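    A commonly used workaround for znodes larger than ZooKeeper's default 1 MB limit (standard ZooKeeper tuning, not a Pinot-specific recommendation) is raising jute.maxbuffer consistently on the ZooKeeper servers and on every client JVM (controller, broker, server). A sketch using the jvmOpts style seen elsewhere in this channel, with an example 4 MB value in bytes:
    # set the same -Djute.maxbuffer value on the ZooKeeper servers and all Pinot components
    jvmOpts: "-Xms256M -Xmx1G -Djute.maxbuffer=4194304"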
  • n

    Nick Bowles

    02/17/2021, 2:49 AM
    Perfect, thank you. I'm happy to help write some of that documentation. I know the documentation is here, I believe. Is there a list of any outstanding items, like the minion docs, that need to be added, and their priority?
    x
    • 2
    • 2
  • m

    minwoo jung

    02/18/2021, 4:05 AM
    Hello~ When ThirdEye is launched using Helm, it fails and displays the following message. The same problem occurs with both the master branch and the 0.6.0 release branch. ------------------------------------------------ ------------------------------------------------ Running Thirdeye frontend config: ./config/pinot-quickstart log4j:WARN No appenders could be found for logger (org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardApplication). log4j:WARN Please initialize the log4j system properly. [2021-02-18 122539] INFO [main] o.h.v.i.u.Version - HV000001: Hibernate Validator null io.dropwizard.configuration.ConfigurationParsingException: ./config/pinot-quickstart/dashboard.yml has an error: * Failed to parse configuration at: logging; Cannot construct instance of
    io.dropwizard.logging.DefaultLoggingFactory
    , problem: Unable to acquire the logger context at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardConfiguration["logging"]) at io.dropwizard.configuration.ConfigurationParsingException$Builder.build(ConfigurationParsingException.java:279) at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:156) at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:89) at io.dropwizard.cli.ConfiguredCommand.parseConfiguration(ConfiguredCommand.java:126) at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:74) at io.dropwizard.cli.Cli.run(Cli.java:78) at io.dropwizard.Application.run(Application.java:93) at org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardApplication.main(ThirdEyeDashboardApplication.java:200) Caused by: com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of
    io.dropwizard.logging.DefaultLoggingFactory
    , problem: Unable to acquire the logger context at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardConfiguration["logging"]) at com.fasterxml.jackson.databind.exc.ValueInstantiationException.from(ValueInstantiationException.java:47) at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:1732) at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:491) at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:514) at com.fasterxml.jackson.module.afterburner.deser.OptimizedValueInstantiator._handleInstantiationProblem(OptimizedValueInstantiator.java:59) at io.dropwizard.logging.DefaultLoggingFactory$Creator4JacksonDeserializer53fd30f2.createUsingDefault(io/dropwizard/logging/DefaultLoggingFactory$Creator4JacksonDeserializer.java) at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:277) at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:189) at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:120) at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:178) at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:105) at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:138) at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:252) at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:155) at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4173) at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2467) at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:127) ... 6 more Caused by: java.lang.IllegalStateException: Unable to acquire the logger context at io.dropwizard.logging.LoggingUtil.getLoggerContext(LoggingUtil.java:46) at io.dropwizard.logging.DefaultLoggingFactory.<init>(DefaultLoggingFactory.java:77) ... 19 more ------------------------------------------------ ------------------------------------------------ When I analyzed the problem, it seems to be a logging-related issue, but I do not know how to fix it. Can I get guidance on how to fix it?
    k
    s
    • 3
    • 5
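    For reference, a plain Dropwizard logging section in dashboard.yml generally looks like the sketch below. Whether ThirdEye accepts it depends on the SLF4J binding on the classpath (the "Unable to acquire the logger context" error usually means logback-classic is not the active binding), so treat this as a starting point to verify rather than a confirmed fix:
    logging:
      level: INFO
      loggers:
        org.apache.pinot.thirdeye: DEBUG
      appenders:
        - type: console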
  • m

    Matt

    02/18/2021, 5:33 PM
    Is there a way to spread the replicas per partition across different AZs? I would like each replica to be on a different host in a different AZ for HA.
    n
    s
    • 3
    • 11
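    One possible direction, hedged since the exact keys should be checked against the instance assignment docs for the Pinot version in use: replica-group based instance assignment, with the servers in each AZ tagged/pooled separately so that each replica group maps to one AZ. A rough table config sketch:
    "instanceAssignmentConfigMap": {
      "CONSUMING": {
        "tagPoolConfig": {
          "tag": "DefaultTenant_REALTIME",
          "poolBased": true
        },
        "replicaGroupPartitionConfig": {
          "replicaGroupBased": true,
          "numReplicaGroups": 2,
          "numInstancesPerReplicaGroup": 3
        }
      }
    },
    "routing": {
      "instanceSelectorType": "replicaGroup"
    }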
  • n

    Nick Bowles

    02/18/2021, 5:50 PM
    Hey team I created a table with this in it to attempt to use the
    minion
    component to ingest data. When doing a POST to /tasks/schedule, it looks like the minions are doing something (the logs mention AVRO), but they either just hang or error out. Any insights? I also made these changes: controller.task.scheduler.enabled=true, plus the minion config:
    Copy code
    pinot.set.instance.id.to.hostname=true
    pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
    pinot.minion.storage.factory.gs.projectId=REDACTED
    pinot.minion.storage.factory.gs.gcpKey=REDACTED
    pinot.minion.segment.fetcher.protocols=file,http,gs
    pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    plugins.include=pinot-gcs
    Added auth key to controller, server, and minion (auth worked before ssh’ing into server and running a job)
    Untitled
    d
    t
    • 3
    • 42
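    For the scheduling side, the table config also needs a task block for the minion to pick up; a minimal sketch (the task type name and cron expression below come from the minion batch ingestion docs rather than this thread, so verify them against the running version):
    "task": {
      "taskTypeConfigsMap": {
        "SegmentGenerationAndPushTask": {
          "schedule": "0 */10 * * * ?"
        }
      }
    }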
  • m

    Matt

    02/19/2021, 12:34 AM
    Hello, I set the controller config as per the documentation. However, the controller is not starting up and throws the following error.
    Copy code
    controller.realtime.segment.validation.frequencyInSeconds=900
    controller.broker.resource.validation.frequencyInSeconds=900
    
    2021/02/18 14:46:44.389 ERROR [StartServiceManagerCommand] [main] Failed to start a Pinot [CONTROLLER] at 39.246 since launch
    java.lang.NumberFormatException: For input string: "[300, 900]"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_282]
    at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_282]
    d
    x
    k
    • 4
    • 21
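    The value [300, 900] in the NumberFormatException looks like Apache Commons Configuration merging a duplicated key into a list, i.e. the same frequencyInSeconds property appearing twice in the controller conf (a guess based on that library's behavior, not confirmed in the thread). A sketch of what would trigger it and the fix:
    # duplicate keys are read back as a list ("[300, 900]") and then fail integer parsing
    controller.realtime.segment.validation.frequencyInSeconds=300
    controller.realtime.segment.validation.frequencyInSeconds=900

    # keep only a single occurrence of the key
    controller.realtime.segment.validation.frequencyInSeconds=900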
  • a

    Aaron Wishnick

    02/19/2021, 5:30 PM
    Is anybody using Pinot with an on-prem S3-like filesystem rather than AWS' S3? I am doing this and trying to run a batch ingest, and I get this error:
    Copy code
    Got exception to kick off standalone data ingestion job -
    java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
    Caused by: java.io.IOException: software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 0306422796023ADB, Extended Request ID: njXFdh82iDAWK78LUjRq1SCfJDgSD0Dcr9EhworrYh4CT7X0ZsPFVmHl2TUSmLK9eP/EyAwhAm8=)
            at org.apache.pinot.plugin.filesystem.S3PinotFS.mkdir(S3PinotFS.java:308) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:127) ~[pinot-batch-ingestion-standalone-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            ... 4 more
    n
    x
    • 3
    • 16
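    For an S3-compatible store, the ingestion job spec's pinotFSSpecs section typically needs the custom endpoint alongside the credentials; a hedged sketch, with the endpoint URL and credentials as placeholders:
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: 'us-east-1'
          endpoint: 'http://my-onprem-s3.example.com:9000'  # placeholder for the on-prem endpoint
          accessKey: 'REDACTED'
          secretKey: 'REDACTED'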
  • n

    Nick Bowles

    02/19/2021, 9:35 PM
    OK, an update on using the minions to ingest: after small changes I see this in the logs. The tar.gz file exists in the bucket, but it looks like it tries to push anyway to the path
    /segments/blah.tar.gz
    . Not sure if this is a path on the controller, or if it’s supposed to be the bucket. Any ideas?
    Untitled
    x
    • 2
    • 22
  • k

    Ken Krugler

    02/19/2021, 11:05 PM
    I ran a query designed to cause problems for the cluster (
    select distinctcount(<super-high cardinality column>) from table
    ), and it did. The request timed out, even though I gave it a 100,000ms timeout, and now all queries (e.g. select * from crawldata limit 20) time out. I’ve looked at the controller/broker/sample of server logs, and don’t see any errors. In the broker log it looks like it’s getting no responses from servers:
    Copy code
    2021/02/19 22:21:53.860 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-59] requestId=41163,table=crawldata_OFFLINE,timeMs=10000,docs=0/0,entries=0/0,segments(queried/processed/matched/consuming/unavailable):0/0/0/0/0,consumingFreshnessTimeMs=0,servers=0/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);116.202.83.208_O=0,-1,0,0;168.119.147.123_O=0,-1,0,0;168.119.147.125_O=1,-1,0,0;168.119.147.124_O=1,-1,0,0;116.202.52.154_O=1,-1,0,0,query=select * from crawldata limit 20
    But an example server log has this:
    Copy code
    2021/02/19 22:21:43.864 INFO [QueryScheduler] [pqr-11] Processed requestId=41163,table=crawldata_OFFLINE,segments(queried/processed/matched/consuming)=213/1/1/-1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=2,resSerMs=1,totalTimeMs=3,minConsumingFreshnessMs=-1,broker=Broker_168.119.147.124_8099,numDocsScanned=20,scanInFilter=0,scanPostFilter=620,sched=fcfs
    Trying to figure out which process or processes are borked because of the query, and why. Any ideas? Thanks!
    x
    j
    • 3
    • 19
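    As an aside, for super-high-cardinality columns the approximate variant is usually far cheaper than the exact distinct count; a sketch using a placeholder column name:
    SELECT DISTINCTCOUNTHLL(high_cardinality_column)
    FROM crawldata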
  • f

    Fabrício Dutra

    02/21/2021, 11:17 PM
    Hi team, I'm trying to set up MinIO as the deep store for Pinot. After including the extra configs for the controller and server, I'm getting this kind of error on the controller: the connection to port 9000 is refused. Does anyone have any idea how to fix it? controller:
    Copy code
    name: controller
      port: 9000
      replicaCount: 1
    
      persistence:
        enabled: true
        accessMode: ReadWriteOnce
        size: 1G
        mountPath: /var/pinot/controller/data
        storageClass: "csi-cinder-high-speed"
    
      data: 
        #dir: /var/pinot/controller/data
    #dir: http://minio-svc.deepstorage.svc.cluster.local:9000/pinot/segment-store
        dir: pinot/segment-store
    
    
      vip:
        enabled: false
        host: pinot-controller
        port: 9000
    
      # with monitoring
      #jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8080:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"
    
      #without monitoring
      jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log"
    
      log4j2ConfFile: /opt/pinot/conf/pinot-controller-log4j2.xml
      pluginsDir: /opt/pinot/plugins
      pluginsInclude: pinot-s3,kafka-2.0
    
      service:
        annotations:
          "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
          "<http://prometheus.io/port|prometheus.io/port>": "8080"
        clusterIP: ""
        externalIPs: []
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        type: ClusterIP
        port: 9000
        nodePort: ""
    
      external:
        enabled: false
        type: LoadBalancer
        port: 9000
    
      resources: {}
    
      nodeSelector: {}
    
      tolerations: []
    
      affinity: {}
    
      podAnnotations:
        "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
        "<http://prometheus.io/port|prometheus.io/port>": "8080"
    
      updateStrategy:
        type: RollingUpdate
    
      # Extra configs will be appended to pinot-controller.conf file
      extra:
        configs: |-
          pinot.set.instance.id.to.hostname=true
          controller.local.temp.dir=/tmp/pinot-tmp-data/
          pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
          pinot.controller.storage.factory.s3.endpoint=10.3.120.223:9000
          pinot.controller.storage.factory.s3.accessKey=***
          pinot.controller.storage.factory.s3.secretKey=***
          pinot.controller.segment.fetcher.protocols=file,http,s3
          pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    server:
    Copy code
    name: server
    
      ports:
        netty: 8098
        admin: 8097
    
      replicaCount: 1
    
      dataDir: /var/pinot/server/data/index
      segmentTarDir: /var/pinot/server/data/segment
    
      persistence:
        enabled: true
        accessMode: ReadWriteOnce
        size: 4G
        mountPath: /var/pinot/server/data
        storageClass: "csi-cinder-high-speed"
        #storageClass: "ssd"
    
      jvmOpts: "-Xms512M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-server.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8080:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"
    
      log4j2ConfFile: /opt/pinot/conf/pinot-server-log4j2.xml
      pluginsDir: /opt/pinot/plugins
      pluginsInclude: pinot-s3,kafka-2.0
    
      service:
        annotations: 
          "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
          "<http://prometheus.io/port|prometheus.io/port>": "8080"
        clusterIP: ""
        externalIPs: []
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        type: ClusterIP
        port: 8098
        nodePort: ""
    
      resources: {}
    
      nodeSelector: {}
    
      affinity: {}
    
      tolerations: []
    
      podAnnotations: 
        "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
        "<http://prometheus.io/port|prometheus.io/port>": "8080"
    
      updateStrategy:
        type: RollingUpdate
    
      # Extra configs will be appended to pinot-server.conf file
      # Do not remove pinot.server.storage.factory.s3.region=us-east-1
      extra:
        configs: |-
          pinot.set.instance.id.to.hostname=true
          pinot.server.instance.realtime.alloc.offheap=true
          pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
          pinot.controller.storage.factory.s3.endpoint=10.3.120.223:9000
          pinot.server.storage.factory.s3.region=us-east-1
          pinot.server.segment.fetcher.protocols=file,http,s3
          pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    x
    • 2
    • 47
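    Two things worth double-checking in configs like the above (guesses, not confirmed in this thread): the S3PinotFS endpoint usually wants an explicit http:// scheme plus a region, and the controller data dir needs an s3:// URI for segments to actually land in MinIO. A sketch of the controller extra configs under those assumptions:
    controller.data.dir=s3://pinot/segment-store
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-east-1
    pinot.controller.storage.factory.s3.endpoint=http://10.3.120.223:9000
    pinot.controller.storage.factory.s3.accessKey=***
    pinot.controller.storage.factory.s3.secretKey=***
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher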
  • j

    Jai Patel

    02/23/2021, 6:04 PM
    I’m having some trouble with upserts where a query through the Pinot UI will sometimes return the latest row, sometimes it’ll return all rows. Query:
    Copy code
    select * from enriched_customer_orders_jp_upsert_realtime_streaming_v1
    where normalized_order_id='62:1221247' and ofo_slug='fofo' and store_id='73f6975b-07e8-407a-97a1-580043094a68'
    limit 10
    Table Spec:
    Copy code
    {
      "REALTIME": {
        "tableName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
          "timeColumnName": "updated_at_seconds",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "30",
          "segmentPushType": "APPEND",
          "replicasPerPartition": "3",
          "schemaName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "createInvertedIndexDuringSegmentGeneration": true,
          "bloomFilterColumns": [
            "Filter1",
            "Filter2"
          ],
          "loadMode": "MMAP",
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "LowLevel",
            "stream.kafka.topic.name": "topic-topic-topic-topic-topic",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.broker.list": "kafka-host:9092",
            "realtime.segment.flush.threshold.size": "1000",
            "realtime.segment.flush.threshold.rows": "1000",
            "realtime.segment.flush.threshold.time": "6h",
            "realtime.segment.flush.desired.size": "200M",
            "isolation.level": "read_committed",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.prop.group.id": "enriched_customer_orders_jp_upsert_realtime_streaming_v1_8F6C7BAF-EEA7-441F-ABE3-50BF5F2C4F0A",
            "stream.kafka.consumer.prop.client.id": "v1_732F3C29-4CDA-45AA-85F1-740A0176C6A5",
            "stream.kafka.decoder.prop.schema.registry.rest.url": "<http://confluent-host:8081>"
          },
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": true,
          "nullHandlingEnabled": false,
          "invertedIndexColumns": [
            "store_id"
          ],
          "autoGeneratedInvertedIndex": false
        },
        "metadata": {},
        "routing": {
          "instanceSelectorType": "strictReplicaGroup"
        },
        "upsertConfig": {
          "mode": "FULL"
        }
      }
    }
    A simplification of our schema: there are a lot of other columns, but I trimmed it to something that would fit (kept all keys).
    Copy code
    {
      "schemaName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1",
      "dimensionFieldSpecs": [
        {
          "name": "store_id",
          "dataType": "STRING"
        },
        {
          "name": "updated_at",
          "dataType": "LONG",
          "defaultNullValue": 0
        },
        {
          "name": "normalized_order_id",
          "dataType": "STRING"
        },
        {
          "name": "ofo_slug",
          "dataType": "STRING"
        }
      ],
      "metricFieldSpecs": [
        {
          "name": "usd_exchange_rate",
          "dataType": "DOUBLE"
        },
        {
          "name": "total",
          "dataType": "DOUBLE"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "updated_at_seconds",
          "dataType": "LONG",
          "defaultNullValue": 0,
          "transformFunction": "toEpochSeconds(updated_at)",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:SECONDS"
        }
      ],
      "primaryKeyColumns": [
        "ofo_slug",
        "store_id",
        "normalized_order_id"
      ]
    }
    Our kafka key is:
    store_id::ofo_slug::normalized_order_id
    as a concatenation.
    🙏 1
    k
    y
    +2
    • 5
    • 71
  • m

    Matt

    02/23/2021, 6:51 PM
    Hello, I have 3 Pinot servers with 4 cores and 48Gi each, using a realtime table. I noticed that when the load/flow increases, there is a lag in the search results (inverted index). Once the load is reduced, Pinot catches up. CPU and memory usage look normal. Wondering why this is happening. Are there any settings to make the Pinot servers process faster?
    s
    • 2
    • 15
  • n

    Nick Bowles

    02/24/2021, 12:08 AM
    Trying to run a fairly simple query, borrowing from the docs, and anytime I try to do any sort of grouping on the date field I get an error. Grouping by other fields works. Thanks in advance!
    Copy code
    SELECT COUNT(*)
    FROM mytable
    GROUP BY DATETIMECONVERT(item_date, '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:WEEKS:EPOCH', '1:WEEKS')
    error
    Copy code
    "errorCode": 200,
        "message": "QueryExecutionError:\norg.apache.pinot.core.query.exception.BadQueryRequestException: Caught exception while initializing transform function: datetimeconvert\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:207)\n\tat
    table date config
    Copy code
    "dateTimeFieldSpecs": [
        {
          "name": "item_date",
          "dataType": "STRING",
          "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
          "granularity": "1:DAYS"
        }
      ]
    x
    • 2
    • 11
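    If DATETIMECONVERT keeps failing to initialize, one workaround that stays within scalar functions already used elsewhere in this channel is converting the date string to epoch millis first and grouping on the derived week; a hedged sketch (note that week() gives a week-of-year number rather than the 1:WEEKS:EPOCH bucket of the original query):
    SELECT COUNT(*)
    FROM mytable
    GROUP BY week(fromDateTime(item_date, 'yyyy-MM-dd'))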
  • r

    Ricardo Bordón

    02/24/2021, 11:19 AM
    Hi folks! Quick question: while reading the contribution guidelines I found a link to https://pinot.readthedocs.io/en/latest/dev_env.html#dev-setup which seems to be broken (the same happens with https://pinot.readthedocs.io/en/latest/code_modules.html#code-modules). Where can I find these references? Thanks!
    j
    • 2
    • 3
  • g

    Gergely Lendvai

    02/24/2021, 4:54 PM
    I saw this other thread, which may be related, since I'm also using a JDK 11 based Docker image. Could you help me with this?
    j
    x
    • 3
    • 7
  • n

    Nick Bowles

    02/24/2021, 6:27 PM
    Running a query and I get different results for one field each time I run it. Any explanation for why this might happen?
    Copy code
    SELECT id, week(fromDateTime(datefield, 'yyyy-MM-dd')) as week, SUM(f1 * f2 * f3) AS "ftotal" ****THIS IS THE FIELD THAT RETURNS DIFFERENT RESULTS****
    FROM mytable
    WHERE year(fromDateTime(datefield, 'yyyy-MM-dd')) >= cast(year(now())-2 as long)
      AND id in (123, 1234, 12345) AND f4 = 'blah' AND f5 in ('1-2', '3-4') AND f6 in ('foo', 'bar')
    GROUP BY 1, 2
    j
    • 2
    • 5
  • j

    Jai Patel

    02/25/2021, 6:54 PM
    Is it a known behavior for deletes of realtime-only upsert tables to be long-running in Pinot 0.6? Every realtime upsert table I've tried to delete takes a long time and eventually times out.
    j
    y
    c
    • 4
    • 11
  • j

    Josh Highley

    02/26/2021, 5:21 PM
    New to Pinot. We're using the Pinot Docker images. We've created offline tables successfully, but can't create a realtime table: the segment status is 'Bad'. There are no error messages in the logs for the broker, controller, or server, so I'm stuck on how to debug this.
    n
    c
    +2
    • 5
    • 9
  • n

    Nick Bowles

    02/26/2021, 8:53 PM
    Untitled
    Untitled
    x
    • 2
    • 20
  • c

    Chundong Wang

    02/26/2021, 11:16 PM
    Ran into
    IllegalStateException
    when using string functions in a WHERE clause. 😢
    j
    • 2
    • 23
  • p

    Phúc Huỳnh

    03/01/2021, 5:31 AM
    Hello, I'm having some trouble with minions based on the following docs: https://docs.pinot.apache.org/operators/operating-pinot/pinot-managed-offline-flows. I want to convert a realtime table to an offline table, but the minions show errors. Here are the error logs:
    x
    n
    • 3
    • 8
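    For reference, the managed offline flow from those docs is driven by a RealtimeToOfflineSegmentsTask entry in the realtime table config (plus controller.task.scheduler.enabled=true on the controller); a minimal sketch with example time periods:
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1d",
          "bufferTimePeriod": "1d"
        }
      }
    }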
  • j

    Josh Highley

    03/01/2021, 7:56 PM
    After looking at the source some more, I found that highlevel tables copy properties at a very different level than lowlevel tables do. For example, a highlevel table prop "stream.kafka.consumer.prop.security.protocol":"SASL_SSL" has to be "security.protocol":"SASL_SSL" in streamConfigs { } for lowlevel.
    w
    • 2
    • 3
  • s

    Slackbot

    03/02/2021, 10:25 PM
    This message was deleted.
    x
    j
    n
    • 4
    • 31
  • j

    Josh Highley

    03/03/2021, 1:16 AM
    I have a realtime table configured for upsert, i.e. with a primary key in the schema. If I delete the table and then re-create it with the SAME name, records inserted into the NEW table will not be returned by a query if they have an earlier timestamp than the same records (by primary key) had in the deleted table. The records in the new table are reflected in the query stats (and only the new records), but they aren't returned by the query if they have an earlier timestamp. Is there more I need to delete besides the table? New segments are being created when I create the new table.
    x
    j
    • 3
    • 6
  • e

    Elon

    03/03/2021, 6:53 PM
    Hi, we have an issue where the Pinot servers are in a crash loop and cannot start up. The servers are spewing tons of messages like:
    Copy code
    [HelixTaskExecutor] [ZkClient-EventThread-23-pinot-us-central1-zookeeper:2181] SessionId does NOT match. expected sessionId: 300000c69e5009a, tgtSessionId in message: 300000c69e50099, messageId: 9d191304-00cc-4138-bb57-7997a960fab0
    j
    s
    • 3
    • 11
  • x

    Xiang Fu

    03/04/2021, 12:30 AM
    I think the IN clause should use single quotes.
    m
    j
    • 3
    • 8
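    That is, string literals in SQL take single quotes, while double quotes are treated as identifiers. A quick example with placeholder names:
    -- correct: string literals in single quotes
    SELECT * FROM mytable WHERE tag IN ('foo', 'bar')
    -- incorrect: "foo" and "bar" would be parsed as column identifiers
    SELECT * FROM mytable WHERE tag IN ("foo", "bar")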
  • m

    Mohammed Galalen

    03/04/2021, 11:21 AM
    Hi, I ran into this error when trying to do batch ingestion from the local file system:
    Failed to generate Pinot segment for file - file:data/orders.csv
    java.lang.NumberFormatException: For input string: "2019-05-02 17:49:53"
    Here is the dateTimeFieldSpecs section in the schema file:
    Copy code
    "dateTimeFieldSpecs": [
            {
                "dataType": "STRING",
                "name": "start_date",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            },
            {
                "dataType": "STRING",
                "name": "end_date",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            },
            {
                "dataType": "STRING",
                "name": "created_at",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            },
            {
                "dataType": "STRING",
                "name": "updated_at",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            }
        ]
    k
    • 2
    • 4
  • f

    Fabrício Dutra

    03/04/2021, 3:03 PM
    Hi all, I'm trying to ingest data from Kafka using a topic that doesn't have a datetime column, and I'm receiving this error:
    Copy code
    {
      "code": 400,
      "error": "Schema should not be null for REALTIME table"
    }
    I'm using this spec:
    Copy code
    curl -X POST "http://localhost:9000/tables" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"tableName\": \"realtime_strimzi_dev_acks\", \"tableType\": \"REALTIME\", \"segmentsConfig\": {  \"segmentPushType\": \"REFRESH\", \"schemaName\": \"sch_strimzi_acks\", \"replication\": \"1\", \"replicasPerPartition\": \"1\" }, \"tenants\": {}, \"tableIndexConfig\": { \"loadMode\": \"MMAP\", \"invertedIndexColumns\": [ \"column1\" ], \"streamConfigs\": { \"streamType\": \"kafka\", \"stream.kafka.consumer.type\": \"lowlevel\", \"stream.kafka.topic.name\": \"producer-test-strimzi-dev-acks-0\", \"stream.kafka.decoder.class.name\": \"org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder\", \"stream.kafka.consumer.factory.class.name\": \"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory\", \"stream.kafka.broker.list\": \"edh-kafka-brokers.ingestion.svc.Cluster.local:9092\", \"realtime.segment.flush.threshold.time\": \"3600000\", \"realtime.segment.flush.threshold.size\": \"50000\", \"stream.kafka.consumer.prop.auto.offset.reset\": \"smallest\" } }, \"metadata\": { \"customConfigs\": {} }}"
    Is there a way to create a realtime table that auto-fills/creates a datetime column?
    k
    n
    c
    • 4
    • 12
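    The 400 error generally just means no schema named sch_strimzi_acks has been uploaded yet; the schema is also where a datetime column is declared. Pinot will not invent a timestamp, so the value still has to come from the payload or an ingestion transform, but a minimal schema sketch (column names are placeholders) would look like:
    {
      "schemaName": "sch_strimzi_acks",
      "dimensionFieldSpecs": [
        { "name": "column1", "dataType": "STRING" }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "event_time",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS",
          "defaultNullValue": 0
        }
      ]
    }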