# troubleshooting
  • j

    jose farfan

    02/13/2021, 3:38 AM
    But: select * from transaction_line limit 2147483645 works.
    k
    k
    x
    • 4
    • 28
  • t

    Tamás Nádudvari

    02/13/2021, 11:09 AM
    Hi, we have a realtime table that consumes a Kafka topic and creates new segments every hour (
    realtime.segment.flush.threshold.time: "1h"
    ). Its replication is set to 2, and when I query the number of documents over a not-too-recent interval, I can see two different numbers alternating. I understand that the Kafka offsets of the two servers consuming the same partition can drift. But when I select an interval that is a couple of hours before the current time, so presumably I'm querying a finished/closed segment, I still see the same issue. According to the docs, shouldn't the replica that has consumed fewer records download the committed segment (the one with more documents) after the segment is closed?
    k
    x
    s
    • 4
    • 13
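    For concreteness, a minimal sketch of the setup described above (replication 2, hourly flush); the topic name and other values are placeholders, not the actual table config from this thread:
    "segmentsConfig": {
      "replicasPerPartition": "2"
    },
    "tableIndexConfig": {
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.topic.name": "my-topic",
        "realtime.segment.flush.threshold.time": "1h"
      }
    }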
  • r

    Ravi Singal

    02/15/2021, 5:11 PM
    Hi, how should we manage the idealstate znode for a table with a large number of segments? One of our realtime tables has more than 18k segments, and the ZooKeeper node size is greater than 3.2 MB. ZooKeeper nodes are supposed to be smaller than 1 MB. Will this negatively impact Pinot controller performance when it reads or writes the table's ideal state (during segment completion)?
    k
    • 2
    • 1
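    A commonly used workaround for znodes larger than ZooKeeper's default 1 MB limit (standard ZooKeeper tuning, not a Pinot-specific recommendation) is raising jute.maxbuffer consistently on the ZooKeeper servers and on every client JVM (controller, broker, server). A sketch using the jvmOpts style seen elsewhere in this channel, with an example 4 MB value in bytes:
    # set the same -Djute.maxbuffer value on the ZooKeeper servers and all Pinot components
    jvmOpts: "-Xms256M -Xmx1G -Djute.maxbuffer=4194304"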
  • n

    Nick Bowles

    02/17/2021, 2:49 AM
    Perfect, thank you. I'm happy to help write some of that documentation. I know the documentation is here, I believe. Is there a list of any outstanding items, like the minion docs, that need to be added, and their priority?
    x
    • 2
    • 2
  • m

    minwoo jung

    02/18/2021, 4:05 AM
    Hello~ When ThirdEye is launched using Helm, it fails and displays the following message. The same problem occurs with both the master branch and the 0.6.0 release branch. ------------------------------------------------ ------------------------------------------------ Running Thirdeye frontend config: ./config/pinot-quickstart log4j:WARN No appenders could be found for logger (org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardApplication). log4j:WARN Please initialize the log4j system properly. [2021-02-18 122539] INFO [main] o.h.v.i.u.Version - HV000001: Hibernate Validator null io.dropwizard.configuration.ConfigurationParsingException: ./config/pinot-quickstart/dashboard.yml has an error: * Failed to parse configuration at: logging; Cannot construct instance of
    io.dropwizard.logging.DefaultLoggingFactory
    , problem: Unable to acquire the logger context at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardConfiguration["logging"]) at io.dropwizard.configuration.ConfigurationParsingException$Builder.build(ConfigurationParsingException.java:279) at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:156) at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:89) at io.dropwizard.cli.ConfiguredCommand.parseConfiguration(ConfiguredCommand.java:126) at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:74) at io.dropwizard.cli.Cli.run(Cli.java:78) at io.dropwizard.Application.run(Application.java:93) at org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardApplication.main(ThirdEyeDashboardApplication.java:200) Caused by: com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of
    io.dropwizard.logging.DefaultLoggingFactory
    , problem: Unable to acquire the logger context at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardConfiguration["logging"]) at com.fasterxml.jackson.databind.exc.ValueInstantiationException.from(ValueInstantiationException.java:47) at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:1732) at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:491) at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:514) at com.fasterxml.jackson.module.afterburner.deser.OptimizedValueInstantiator._handleInstantiationProblem(OptimizedValueInstantiator.java:59) at io.dropwizard.logging.DefaultLoggingFactory$Creator4JacksonDeserializer53fd30f2.createUsingDefault(io/dropwizard/logging/DefaultLoggingFactory$Creator4JacksonDeserializer.java) at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:277) at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:189) at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:120) at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:178) at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:105) at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:138) at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:252) at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:155) at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4173) at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2467) at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:127) ... 6 more Caused by: java.lang.IllegalStateException: Unable to acquire the logger context at io.dropwizard.logging.LoggingUtil.getLoggerContext(LoggingUtil.java:46) at io.dropwizard.logging.DefaultLoggingFactory.<init>(DefaultLoggingFactory.java:77) ... 19 more ------------------------------------------------ ------------------------------------------------ When I analyzed the problem, it seems to be a logging-related issue, but I do not know how to fix it. Can I get guidance on how to fix it?
    k
    s
    • 3
    • 5
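    For reference, a plain Dropwizard logging section in dashboard.yml generally looks like the sketch below. Whether ThirdEye accepts it depends on the SLF4J binding on the classpath (the "Unable to acquire the logger context" error usually means logback-classic is not the active binding), so treat this as a starting point to verify rather than a confirmed fix:
    logging:
      level: INFO
      loggers:
        org.apache.pinot.thirdeye: DEBUG
      appenders:
        - type: console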
  • m

    Matt

    02/18/2021, 5:33 PM
    Is there a way to spread the replicas per partition across different AZs? I would like each replica to be on a different host in a different AZ for HA.
    n
    s
    • 3
    • 11
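    One possible direction, hedged since the exact keys should be checked against the instance assignment docs for the Pinot version in use: replica-group based instance assignment, with the servers in each AZ tagged/pooled separately so that each replica group maps to one AZ. A rough table config sketch:
    "instanceAssignmentConfigMap": {
      "CONSUMING": {
        "tagPoolConfig": {
          "tag": "DefaultTenant_REALTIME",
          "poolBased": true
        },
        "replicaGroupPartitionConfig": {
          "replicaGroupBased": true,
          "numReplicaGroups": 2,
          "numInstancesPerReplicaGroup": 3
        }
      }
    },
    "routing": {
      "instanceSelectorType": "replicaGroup"
    }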
  • n

    Nick Bowles

    02/18/2021, 5:50 PM
    Hey team I created a table with this in it to attempt to use the
    minion
    component to ingest data. When doing a POST to /tasks/schedule, it looks like the minions are doing something (the logs mention AVRO), but they either just hang or error out. Any insights? I also made these changes: controller.task.scheduler.enabled=true, plus the minion config:
    Copy code
    pinot.set.instance.id.to.hostname=true
    pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
    pinot.minion.storage.factory.gs.projectId=REDACTED
    pinot.minion.storage.factory.gs.gcpKey=REDACTED
    pinot.minion.segment.fetcher.protocols=file,http,gs
    pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    plugins.include=pinot-gcs
    Added auth key to controller, server, and minion (auth worked before ssh’ing into server and running a job)
    Untitled
    d
    t
    • 3
    • 42
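    For the scheduling side, the table config also needs a task block for the minion to pick up; a minimal sketch (the task type name and cron expression below come from the minion batch ingestion docs rather than this thread, so verify them against the running version):
    "task": {
      "taskTypeConfigsMap": {
        "SegmentGenerationAndPushTask": {
          "schedule": "0 */10 * * * ?"
        }
      }
    }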
  • m

    Matt

    02/19/2021, 12:34 AM
    Hello, I set the controller config as per the documentation. However, the controller is not starting up and throws the following error.
    Copy code
    controller.realtime.segment.validation.frequencyInSeconds=900
    controller.broker.resource.validation.frequencyInSeconds=900
    
    2021/02/18 14:46:44.389 ERROR [StartServiceManagerCommand] [main] Failed to start a Pinot [CONTROLLER] at 39.246 since launch
    java.lang.NumberFormatException: For input string: "[300, 900]"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_282]
    at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_282]
    d
    x
    k
    • 4
    • 21
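    The value [300, 900] in the NumberFormatException looks like Apache Commons Configuration merging a duplicated key into a list, i.e. the same frequencyInSeconds property appearing twice in the controller conf (a guess based on that library's behavior, not confirmed in the thread). A sketch of what would trigger it and the fix:
    # duplicate keys are read back as a list ("[300, 900]") and then fail integer parsing
    controller.realtime.segment.validation.frequencyInSeconds=300
    controller.realtime.segment.validation.frequencyInSeconds=900

    # keep only a single occurrence of the key
    controller.realtime.segment.validation.frequencyInSeconds=900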
  • a

    Aaron Wishnick

    02/19/2021, 5:30 PM
    Is anybody using Pinot with an on-prem S3-like filesystem rather than AWS' S3? I am doing this and trying to run a batch ingest, and I get this error:
    Copy code
    Got exception to kick off standalone data ingestion job -
    java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
    Caused by: java.io.IOException: software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 0306422796023ADB, Extended Request ID: njXFdh82iDAWK78LUjRq1SCfJDgSD0Dcr9EhworrYh4CT7X0ZsPFVmHl2TUSmLK9eP/EyAwhAm8=)
            at org.apache.pinot.plugin.filesystem.S3PinotFS.mkdir(S3PinotFS.java:308) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:127) ~[pinot-batch-ingestion-standalone-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
            ... 4 more
    n
    x
    • 3
    • 16
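    For an S3-compatible store, the ingestion job spec's pinotFSSpecs section typically needs the custom endpoint alongside the credentials; a hedged sketch, with the endpoint URL and credentials as placeholders:
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: 'us-east-1'
          endpoint: 'http://my-onprem-s3.example.com:9000'  # placeholder for the on-prem endpoint
          accessKey: 'REDACTED'
          secretKey: 'REDACTED'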
  • n

    Nick Bowles

    02/19/2021, 9:35 PM
    OK, an update on using the minions to ingest: after small changes I see this in the logs. The tar.gz file exists in the bucket, but it looks like it tries to push anyway to the path
    /segments/blah.tar.gz
    . Not sure if this is a path on the controller, or if it’s supposed to be the bucket. Any ideas?
    Untitled
    x
    • 2
    • 22
  • k

    Ken Krugler

    02/19/2021, 11:05 PM
    I ran a query designed to cause problems for the cluster (
    select distinctcount(<super-high cardinality column>) from table
    ), and it did. The request timed out, even though I gave it a 100,000ms timeout, and now all queries (e.g. select * from crawldata limit 20) time out. I’ve looked at the controller/broker/sample of server logs, and don’t see any errors. In the broker log it looks like it’s getting no responses from servers:
    Copy code
    2021/02/19 22:21:53.860 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-59] requestId=41163,table=crawldata_OFFLINE,timeMs=10000,docs=0/0,entries=0/0,segments(queried/processed/matched/consuming/unavailable):0/0/0/0/0,consumingFreshnessTimeMs=0,servers=0/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);116.202.83.208_O=0,-1,0,0;168.119.147.123_O=0,-1,0,0;168.119.147.125_O=1,-1,0,0;168.119.147.124_O=1,-1,0,0;116.202.52.154_O=1,-1,0,0,query=select * from crawldata limit 20
    But an example server log has this:
    Copy code
    2021/02/19 22:21:43.864 INFO [QueryScheduler] [pqr-11] Processed requestId=41163,table=crawldata_OFFLINE,segments(queried/processed/matched/consuming)=213/1/1/-1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=2,resSerMs=1,totalTimeMs=3,minConsumingFreshnessMs=-1,broker=Broker_168.119.147.124_8099,numDocsScanned=20,scanInFilter=0,scanPostFilter=620,sched=fcfs
    Trying to figure out which process or processes are borked because of the query, and why. Any ideas? Thanks!
    x
    j
    • 3
    • 19
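    As an aside, for super-high-cardinality columns the approximate variant is usually far cheaper than the exact distinct count; a sketch using a placeholder column name:
    SELECT DISTINCTCOUNTHLL(high_cardinality_column)
    FROM crawldata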
  • f

    Fabrício Dutra

    02/21/2021, 11:17 PM
    Hi team, I'm trying to set up MinIO as the deep store for Pinot. After including the extra configs for the controller and server, I'm getting this kind of error on the controller: the connection to port 9000 is refused. Does anyone have any idea how to fix it? controller:
    Copy code
    name: controller
      port: 9000
      replicaCount: 1
    
      persistence:
        enabled: true
        accessMode: ReadWriteOnce
        size: 1G
        mountPath: /var/pinot/controller/data
        storageClass: "csi-cinder-high-speed"
    
      data: 
        #dir: /var/pinot/controller/data
    #dir: http://minio-svc.deepstorage.svc.cluster.local:9000/pinot/segment-store
        dir: pinot/segment-store
    
    
      vip:
        enabled: false
        host: pinot-controller
        port: 9000
    
      # with monitoring
      #jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8080:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"
    
      #without monitoring
      jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log"
    
      log4j2ConfFile: /opt/pinot/conf/pinot-controller-log4j2.xml
      pluginsDir: /opt/pinot/plugins
      pluginsInclude: pinot-s3,kafka-2.0
    
      service:
        annotations:
          "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
          "<http://prometheus.io/port|prometheus.io/port>": "8080"
        clusterIP: ""
        externalIPs: []
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        type: ClusterIP
        port: 9000
        nodePort: ""
    
      external:
        enabled: false
        type: LoadBalancer
        port: 9000
    
      resources: {}
    
      nodeSelector: {}
    
      tolerations: []
    
      affinity: {}
    
      podAnnotations:
        "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
        "<http://prometheus.io/port|prometheus.io/port>": "8080"
    
      updateStrategy:
        type: RollingUpdate
    
      # Extra configs will be appended to pinot-controller.conf file
      extra:
        configs: |-
          pinot.set.instance.id.to.hostname=true
          controller.local.temp.dir=/tmp/pinot-tmp-data/
          pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
          pinot.controller.storage.factory.s3.endpoint=10.3.120.223:9000
          pinot.controller.storage.factory.s3.accessKey=***
          pinot.controller.storage.factory.s3.secretKey=***
          pinot.controller.segment.fetcher.protocols=file,http,s3
          pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    server:
    Copy code
    name: server
    
      ports:
        netty: 8098
        admin: 8097
    
      replicaCount: 1
    
      dataDir: /var/pinot/server/data/index
      segmentTarDir: /var/pinot/server/data/segment
    
      persistence:
        enabled: true
        accessMode: ReadWriteOnce
        size: 4G
        mountPath: /var/pinot/server/data
        storageClass: "csi-cinder-high-speed"
        #storageClass: "ssd"
    
      jvmOpts: "-Xms512M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-server.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8080:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"
    
      log4j2ConfFile: /opt/pinot/conf/pinot-server-log4j2.xml
      pluginsDir: /opt/pinot/plugins
      pluginsInclude: pinot-s3,kafka-2.0
    
      service:
        annotations: 
          "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
          "<http://prometheus.io/port|prometheus.io/port>": "8080"
        clusterIP: ""
        externalIPs: []
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        type: ClusterIP
        port: 8098
        nodePort: ""
    
      resources: {}
    
      nodeSelector: {}
    
      affinity: {}
    
      tolerations: []
    
      podAnnotations: 
        "<http://prometheus.io/scrape|prometheus.io/scrape>": "true"
        "<http://prometheus.io/port|prometheus.io/port>": "8080"
    
      updateStrategy:
        type: RollingUpdate
    
      # Extra configs will be appended to pinot-server.conf file
      # Do not remove pinot.server.storage.factory.s3.region=us-east-1
      extra:
        configs: |-
          pinot.set.instance.id.to.hostname=true
          pinot.server.instance.realtime.alloc.offheap=true
          pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
          pinot.controller.storage.factory.s3.endpoint=10.3.120.223:9000
          pinot.server.storage.factory.s3.region=us-east-1
          pinot.server.segment.fetcher.protocols=file,http,s3
          pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    x
    • 2
    • 47
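    Two things worth double-checking in configs like the above (guesses, not confirmed in this thread): the S3PinotFS endpoint usually wants an explicit http:// scheme plus a region, and the controller data dir needs an s3:// URI for segments to actually land in MinIO. A sketch of the controller extra configs under those assumptions:
    controller.data.dir=s3://pinot/segment-store
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-east-1
    pinot.controller.storage.factory.s3.endpoint=http://10.3.120.223:9000
    pinot.controller.storage.factory.s3.accessKey=***
    pinot.controller.storage.factory.s3.secretKey=***
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher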
  • j

    Jai Patel

    02/23/2021, 6:04 PM
    I’m having some trouble with upserts where a query through the Pinot UI will sometimes return the latest row, sometimes it’ll return all rows. Query:
    Copy code
    select * from enriched_customer_orders_jp_upsert_realtime_streaming_v1
    where normalized_order_id='62:1221247' and ofo_slug='fofo' and store_id='73f6975b-07e8-407a-97a1-580043094a68'
    limit 10
    Table Spec:
    Copy code
    {
      "REALTIME": {
        "tableName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
          "timeColumnName": "updated_at_seconds",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "30",
          "segmentPushType": "APPEND",
          "replicasPerPartition": "3",
          "schemaName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "createInvertedIndexDuringSegmentGeneration": true,
          "bloomFilterColumns": [
            "Filter1",
            "Filter2"
          ],
          "loadMode": "MMAP",
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "LowLevel",
            "stream.kafka.topic.name": "topic-topic-topic-topic-topic",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.broker.list": "kafka-host:9092",
            "realtime.segment.flush.threshold.size": "1000",
            "realtime.segment.flush.threshold.rows": "1000",
            "realtime.segment.flush.threshold.time": "6h",
            "realtime.segment.flush.desired.size": "200M",
            "isolation.level": "read_committed",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.prop.group.id": "enriched_customer_orders_jp_upsert_realtime_streaming_v1_8F6C7BAF-EEA7-441F-ABE3-50BF5F2C4F0A",
            "stream.kafka.consumer.prop.client.id": "v1_732F3C29-4CDA-45AA-85F1-740A0176C6A5",
            "stream.kafka.decoder.prop.schema.registry.rest.url": "<http://confluent-host:8081>"
          },
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": true,
          "nullHandlingEnabled": false,
          "invertedIndexColumns": [
            "store_id"
          ],
          "autoGeneratedInvertedIndex": false
        },
        "metadata": {},
        "routing": {
          "instanceSelectorType": "strictReplicaGroup"
        },
        "upsertConfig": {
          "mode": "FULL"
        }
      }
    }
    A simplification of our schema: there are a lot of other columns, but I trimmed it to something that would fit (kept all keys).
    Copy code
    {
      "schemaName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1",
      "dimensionFieldSpecs": [
        {
          "name": "store_id",
          "dataType": "STRING"
        },
        {
          "name": "updated_at",
          "dataType": "LONG",
          "defaultNullValue": 0
        },
        {
          "name": "normalized_order_id",
          "dataType": "STRING"
        },
        {
          "name": "ofo_slug",
          "dataType": "STRING"
        }
      ],
      "metricFieldSpecs": [
        {
          "name": "usd_exchange_rate",
          "dataType": "DOUBLE"
        },
        {
          "name": "total",
          "dataType": "DOUBLE"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "updated_at_seconds",
          "dataType": "LONG",
          "defaultNullValue": 0,
          "transformFunction": "toEpochSeconds(updated_at)",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:SECONDS"
        }
      ],
      "primaryKeyColumns": [
        "ofo_slug",
        "store_id",
        "normalized_order_id"
      ]
    }
    Our kafka key is:
    store_id::ofo_slug::normalized_order_id
    as a concatenation.
    🙏 1
    k
    y
    +2
    • 5
    • 71
  • m

    Matt

    02/23/2021, 6:51 PM
    Hello, I have 3 Pinot servers with 4 cores and 48Gi each, using a realtime table. I noticed that when the load/flow increases, there is a lag in the search results (inverted index). Once the load is reduced, Pinot catches up. CPU and memory usage look normal. Wondering why this is happening. Are there any settings to make the Pinot servers process faster?
    s
    • 2
    • 15
  • n

    Nick Bowles

    02/24/2021, 12:08 AM
    Trying to run a fairly simple query, borrowing from the docs, and anytime I try to do any sort of grouping on the date field I get an error. Grouping by other fields works. Thanks in advance!
    Copy code
    SELECT COUNT(*)
    FROM mytable
    GROUP BY DATETIMECONVERT(item_date, '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:WEEKS:EPOCH', '1:WEEKS')
    error
    Copy code
    "errorCode": 200,
        "message": "QueryExecutionError:\norg.apache.pinot.core.query.exception.BadQueryRequestException: Caught exception while initializing transform function: datetimeconvert\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:207)\n\tat
    table date config
    Copy code
    "dateTimeFieldSpecs": [
        {
          "name": "item_date",
          "dataType": "STRING",
          "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
          "granularity": "1:DAYS"
        }
      ]
    x
    • 2
    • 11
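    If DATETIMECONVERT keeps failing to initialize, one workaround that stays within scalar functions already used elsewhere in this channel is converting the date string to epoch millis first and grouping on the derived week; a hedged sketch (note that week() gives a week-of-year number rather than the 1:WEEKS:EPOCH bucket of the original query):
    SELECT COUNT(*)
    FROM mytable
    GROUP BY week(fromDateTime(item_date, 'yyyy-MM-dd'))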
  • r

    Ricardo Bordón

    02/24/2021, 11:19 AM
    Hi folks! Quick question: while reading the contribution guidelines I found a link to https://pinot.readthedocs.io/en/latest/dev_env.html#dev-setup which seems to be broken (the same happens with https://pinot.readthedocs.io/en/latest/code_modules.html#code-modules). Where can I find these references? Thanks!
    j
    • 2
    • 3
  • g

    Gergely Lendvai

    02/24/2021, 4:54 PM
    I saw this other thread, which may be related, since I'm also using a JDK 11 based Docker image. Could you help me with this?
    j
    x
    • 3
    • 7
  • n

    Nick Bowles

    02/24/2021, 6:27 PM
    Running a query and I get different results for one field each time I run it. Any explanation for why this might happen?
    Copy code
    SELECT id, week(fromDateTime(datefield, 'yyyy-MM-dd')) as week, SUM(f1 * f2 * f3) AS "ftotal" ****THIS IS THE FIELD THAT RETURNS DIFFERENT RESULTS****
    FROM mytable
    WHERE year(fromDateTime(datefield, 'yyyy-MM-dd')) >= cast(year(now())-2 as long)
      AND id in (123, 1234, 12345) AND f4 = 'blah' AND f5 in ('1-2', '3-4') AND f6 in ('foo', 'bar')
    GROUP BY 1, 2
    j
    • 2
    • 5
  • j

    Jai Patel

    02/25/2021, 6:54 PM
    Is it a known behavior for deletes of realtime-only upsert tables to be long-running in Pinot 0.6? Every realtime upsert table I've tried to delete takes a long time and eventually times out.
    j
    y
    c
    • 4
    • 11
  • j

    Josh Highley

    02/26/2021, 5:21 PM
    New to Pinot. We're using the Pinot Docker images. We've created offline tables successfully, but can't create a realtime table: the segment status is 'Bad'. There are no error messages in the logs for the broker, controller, or server, so I'm stuck on how to debug this.
    n
    c
    +2
    • 5
    • 9
  • n

    Nick Bowles

    02/26/2021, 8:53 PM
    Untitled
    Untitled
    x
    • 2
    • 20
  • c

    Chundong Wang

    02/26/2021, 11:16 PM
    Ran into
    IllegalStateException
    when using string functions in a WHERE clause. 😢
    j
    • 2
    • 23
  • p

    Phúc Huỳnh

    03/01/2021, 5:31 AM
    Hello, I'm having some trouble with minions based on the following docs: https://docs.pinot.apache.org/operators/operating-pinot/pinot-managed-offline-flows. I want to convert a realtime table to an offline table, but the minions show errors. Here are the error logs:
    x
    n
    • 3
    • 8
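    For reference, the managed offline flow from those docs is driven by a RealtimeToOfflineSegmentsTask entry in the realtime table config (plus controller.task.scheduler.enabled=true on the controller); a minimal sketch with example time periods:
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1d",
          "bufferTimePeriod": "1d"
        }
      }
    }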
  • j

    Josh Highley

    03/01/2021, 7:56 PM
    After looking at the source some more, I found that highlevel tables copy properties at a very different level than lowlevel tables do. For example, a highlevel table prop "stream.kafka.consumer.prop.security.protocol":"SASL_SSL" has to be "security.protocol":"SASL_SSL" in streamConfigs { } for lowlevel.
    w
    • 2
    • 3
  • s

    Slackbot

    03/02/2021, 10:25 PM
    This message was deleted.
    x
    j
    n
    • 4
    • 31
  • j

    Josh Highley

    03/03/2021, 1:16 AM
    I have a realtime table configured for upsert, i.e. with a primary key in the schema. If I delete the table and then re-create it with the SAME name, records inserted into the NEW table will not be returned by a query if they have an earlier timestamp than the same records (by primary key) had in the deleted table. The records in the new table are reflected in the query stats (and only the new records), but they aren't returned by the query if they have an earlier timestamp. Is there more I need to delete besides the table? New segments are being created when I create the new table.
    x
    j
    • 3
    • 6
  • e

    Elon

    03/03/2021, 6:53 PM
    Hi, we have an issue where the Pinot servers are in a crash loop and cannot start up. The servers are spewing tons of messages like:
    Copy code
    [HelixTaskExecutor] [ZkClient-EventThread-23-pinot-us-central1-zookeeper:2181] SessionId does NOT match. expected sessionId: 300000c69e5009a, tgtSessionId in message: 300000c69e50099, messageId: 9d191304-00cc-4138-bb57-7997a960fab0
    j
    s
    • 3
    • 11
  • x

    Xiang Fu

    03/04/2021, 12:30 AM
    I think the IN clause should use single quotes.
    m
    j
    • 3
    • 8
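    That is, string literals in SQL take single quotes, while double quotes are treated as identifiers. A quick example with placeholder names:
    -- correct: string literals in single quotes
    SELECT * FROM mytable WHERE tag IN ('foo', 'bar')
    -- incorrect: "foo" and "bar" would be parsed as column identifiers
    SELECT * FROM mytable WHERE tag IN ("foo", "bar")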
  • m

    Mohammed Galalen

    03/04/2021, 11:21 AM
    Hi, I ran into this error when trying to do batch ingestion from the local file system:
    Failed to generate Pinot segment for file - file:data/orders.csv
    java.lang.NumberFormatException: For input string: "2019-05-02 17:49:53"
    Here is the dateTimeFieldSpecs section in the schema file:
    Copy code
    "dateTimeFieldSpecs": [
            {
                "dataType": "STRING",
                "name": "start_date",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            },
            {
                "dataType": "STRING",
                "name": "end_date",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            },
            {
                "dataType": "STRING",
                "name": "created_at",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            },
            {
                "dataType": "STRING",
                "name": "updated_at",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
                "granularity": "1:DAYS"
            }
        ]
    k
    • 2
    • 4
  • f

    Fabrício Dutra

    03/04/2021, 3:03 PM
    Hi all, I'm trying to ingest data from Kafka using a topic that doesn't have a datetime column, and I'm receiving this error:
    Copy code
    {
      "code": 400,
      "error": "Schema should not be null for REALTIME table"
    }
    I'm using this spec:
    Copy code
    curl -X POST "http://localhost:9000/tables" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"tableName\": \"realtime_strimzi_dev_acks\", \"tableType\": \"REALTIME\", \"segmentsConfig\": {  \"segmentPushType\": \"REFRESH\", \"schemaName\": \"sch_strimzi_acks\", \"replication\": \"1\", \"replicasPerPartition\": \"1\" }, \"tenants\": {}, \"tableIndexConfig\": { \"loadMode\": \"MMAP\", \"invertedIndexColumns\": [ \"column1\" ], \"streamConfigs\": { \"streamType\": \"kafka\", \"stream.kafka.consumer.type\": \"lowlevel\", \"stream.kafka.topic.name\": \"producer-test-strimzi-dev-acks-0\", \"stream.kafka.decoder.class.name\": \"org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder\", \"stream.kafka.consumer.factory.class.name\": \"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory\", \"stream.kafka.broker.list\": \"edh-kafka-brokers.ingestion.svc.Cluster.local:9092\", \"realtime.segment.flush.threshold.time\": \"3600000\", \"realtime.segment.flush.threshold.size\": \"50000\", \"stream.kafka.consumer.prop.auto.offset.reset\": \"smallest\" } }, \"metadata\": { \"customConfigs\": {} }}"
    Is there a way to create a realtime table that auto-fills/creates a datetime column?
    k
    n
    c
    • 4
    • 12
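    The 400 error generally just means no schema named sch_strimzi_acks has been uploaded yet; the schema is also where a datetime column is declared. Pinot will not invent a timestamp, so the value still has to come from the payload or an ingestion transform, but a minimal schema sketch (column names are placeholders) would look like:
    {
      "schemaName": "sch_strimzi_acks",
      "dimensionFieldSpecs": [
        { "name": "column1", "dataType": "STRING" }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "event_time",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS",
          "defaultNullValue": 0
        }
      ]
    }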