# troubleshooting
  • e

    Ehsan Irshad

    12/06/2022, 7:15 AM
    Hi, is the limitation mentioned at the end of the Star-Tree Index page resolved in version 0.11.0? https://docs.pinot.apache.org/basics/indexing/star-tree-index
  • a

    Aaron Weiss

    12/06/2022, 7:06 PM
    We made some schema changes and ran Reload All Segments after that, but still getting this message:
    There are 3 invalid segment/s. This usually means that they were created with an older schema. Please reload the table in order to refresh these segments to the new schema.
    Is there any way to determine which segments these are?
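    One way to narrow this down is via the controller REST API: list the table's segments, then compare each segment's column list against the current schema; segments missing the newly added columns are the ones counted as invalid. A rough sketch (the table name "myTable" is a placeholder, and the exact metadata response shape varies by Pinot version):
    CONTROLLER=http://localhost:9000
    TABLE=myTable
    # 1. Current schema (the authoritative column list)
    curl -s "$CONTROLLER/tables/$TABLE/schema"
    # 2. All segments of the table
    curl -s "$CONTROLLER/segments/$TABLE"
    # 3. Metadata for a suspect segment; compare its column list with step 1
    curl -s "$CONTROLLER/segments/$TABLE/mySegmentName/metadata?columns=*"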
  • p

    Pyry Kovanen

    12/06/2022, 9:21 PM
    (SOLVED) Hi all, I'm using Azure Data Lake as the file system to ingest batch data into Pinot. I have followed this example: https://docs.pinot.apache.org/basics/data-import/pinot-file-system/import-from-adls-azure While I have been able to solve a bunch of problems along the way, there is one exception that leaves me clueless. In my storage account there is one CSV file called data3.csv that matches the glob in the job spec. Also, based on the exception, it seems the ADLS plugin is able to see the file there. This is my job spec YAML:
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: 'adl2://my-example-storage.blob.core.windows.net/my-beatiful-fs'
    includeFileNamePattern: 'glob:**/*.csv'
    outputDirURI: 'adl2://my-example-storage.blob.core.windows.net/my-beatiful-fs/segments'
    overwriteOutput: true
    cleanUpOutputDir: true
    pinotFSSpecs:
      - scheme: adl2
        className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
        configs:
          accountName: 'my-example-storage'
          accessKey: 'xxxx'
          fileSystemName: 'my-beatiful-fs'
          enableChecksum: true
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
      configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
    tableSpec:
      tableName: 'foo_data'
    pinotClusterSpecs:
      - controllerURI: 'http://localhost:9000'
    And this is the stack trace from the exception I get:
    ADLSGen2PinotFS is initialized (accountName=my-example-storage, fileSystemName=my-beatiful-fs, dfsServiceEndpointUrl=<https://my-example-storage.dfs.core.windows.net>, blobServiceEndpointUrl=<https://my-example-storage.blob.core.windows.net>, enableChecksum=true)
    Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
    Got exception to kick off standalone data ingestion job - 
    java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    Caused by: java.io.FileNotFoundException: /tmp/pinot-69425772-9d5a-4b3c-b9bc-1d812becb5b3/input/09e20700-f285-44a1-81f9-9914aa28e6ac/data3.csv (No such file or directory)
            at java.io.FileOutputStream.open0(Native Method) ~[?:?]
            at java.io.FileOutputStream.open(FileOutputStream.java:298) ~[?:?]
            at java.io.FileOutputStream.<init>(FileOutputStream.java:237) ~[?:?]
            at java.io.FileOutputStream.<init>(FileOutputStream.java:187) ~[?:?]
            at org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS.copyToLocalFile(ADLSGen2PinotFS.java:451) ~[pinot-adls-0.11.0-shaded.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.submitSegmentGenTask(SegmentGenerationJobRunner.java:258) ~[pinot-batch-ingestion-standalone-0.11.0-shaded.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:224) ~[pinot-batch-ingestion-standalone-0.11.0-shaded.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:150) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
            ... 13 more
    java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152)
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121)
            at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130)
            at org.apache.pinot.tools.Command.call(Command.java:33)
            at org.apache.pinot.tools.Command.call(Command.java:29)
            at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
            at picocli.CommandLine.access$1300(CommandLine.java:145)
            at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
            at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
            at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
            at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
            at picocli.CommandLine.execute(CommandLine.java:2078)
            at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
            at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
    Caused by: java.io.FileNotFoundException: /tmp/pinot-69425772-9d5a-4b3c-b9bc-1d812becb5b3/input/09e20700-f285-44a1-81f9-9914aa28e6ac/data3.csv (No such file or directory)
            at java.base/java.io.FileOutputStream.open0(Native Method)
            at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
            at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
            at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
            at org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS.copyToLocalFile(ADLSGen2PinotFS.java:451)
            at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.submitSegmentGenTask(SegmentGenerationJobRunner.java:258)
            at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:224)
            at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:150)
            ... 13 more
    Does anyone have an idea what might be wrong here and how to move forward? Thanks in advance!
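    For context on the stack trace: java.io.FileOutputStream throws exactly this FileNotFoundException when the destination file's parent directory does not exist, which can happen even though data3.csv is clearly visible in ADLS, for example if the matched path is nested and the local temp sub-directory was never created. A minimal Java illustration of that failure mode (the path is made up; this is not Pinot code):
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class FileOutputStreamParentDirDemo {
        public static void main(String[] args) throws IOException {
            File dest = new File("/tmp/pinot-demo/input/some-uuid/data3.csv");
            // Without this mkdirs() call, the FileOutputStream constructor below fails with
            // "FileNotFoundException: ... (No such file or directory)" whenever the parent
            // directory is missing -- the same signature as in the stack trace above.
            dest.getParentFile().mkdirs();
            try (FileOutputStream out = new FileOutputStream(dest)) {
                out.write("a,b,c\n".getBytes());
            }
        }
    }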
  • c

    Caleb Shei

    12/06/2022, 10:15 PM
    I'm trying to run a Hadoop batch ingestion job and I get this error:
    Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
    	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
    	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
    	at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:104)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:354)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:325)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:306)
    	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:143)
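    This ClassNotFoundException usually just means the pinot-batch-ingestion-hadoop plugin jar was not picked up by the PluginManager. A sketch of pointing the launcher at the plugins directory (the paths and the plugin list are assumptions based on a standard distribution layout):
    export PINOT_DISTRIBUTION_DIR=/opt/pinot/apache-pinot-0.11.0-bin
    # plugins.dir / plugins.include tell the PluginManager which plugin jars to load
    export JAVA_OPTS="-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dplugins.include=pinot-hdfs,pinot-batch-ingestion-hadoop"
    ${PINOT_DISTRIBUTION_DIR}/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/hadoopIngestionJobSpec.yaml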
  • g

    Gaurav Pant

    12/07/2022, 3:58 PM
    What is the best way to query Apache Pinot from a Java Spring app? 1. Java client jar 2. JDBC, or Hibernate/Spring Data JPA (if supported). If there are any other ways possible, please suggest.
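    For what it's worth, a minimal sketch of option 1, the Java client (org.apache.pinot:pinot-java-client); the ZooKeeper address, cluster path and table name are placeholders, and the JDBC route is only noted in comments:
    import org.apache.pinot.client.Connection;
    import org.apache.pinot.client.ConnectionFactory;
    import org.apache.pinot.client.ResultSetGroup;

    public class PinotClientExample {
        public static void main(String[] args) {
            // Option 1: connect through ZooKeeper (cluster path is an example) or directly to brokers.
            Connection connection = ConnectionFactory.fromZookeeper("localhost:2181/PinotCluster");
            ResultSetGroup results = connection.execute("SELECT COUNT(*) FROM myTable");
            System.out.println(results.getResultSet(0).getString(0, 0));
            connection.close();

            // Option 2: JDBC via org.apache.pinot:pinot-jdbc-client (URL shape jdbc:pinot://<controller-host>:9000),
            // which plugs into Spring's JdbcTemplate. As far as I know, Pinot does not ship a Hibernate
            // dialect, so Spring Data JPA support would have to be custom.
        }
    }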
  • s

    Shubham Kumar

    12/07/2022, 4:49 PM
    Hi team, we deployed Pinot a while back using Helm. Today I was trying to redeploy Pinot with a deep store configured. These are the controller configs provided:
    extraEnv:
        - name: AWS_ACCESS_KEY_ID
          value: XAWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          value: XAWS_SECRET_ACCESS_KEY
        - name: LOG4J_CONSOLE_LEVEL
          value: all
      extra:
        configs: |-
          pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
          pinot.controller.storage.factory.s3.region=ap-south-1
          pinot.controller.segment.fetcher.protocols=file,http,s3
          pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
      controller.data.dir=s3://test-data/pinot-data/pinot-default/controller-data/
          controller.local.temp.dir=/tmp/pinot-tmp-data/
    Controller pod creation is failing with:
    Failed to start a Pinot [CONTROLLER] at 5.321 since launch
    java.lang.NullPointerException: null
    at org.apache.pinot.common.utils.helix.HelixHelper.updateHostnamePort(HelixHelper.java:630) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.controller.BaseControllerStarter.updateInstanceConfigIfNeeded(BaseControllerStarter.java:623) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.controller.BaseControllerStarter.registerAndConnectAsHelixParticipant(BaseControllerStarter.java:599) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.controller.BaseControllerStarter.setUpPinotController(BaseControllerStarter.java:392) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.controller.BaseControllerStarter.start(BaseControllerStarter.java:322) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:118) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:87) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:251) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:304) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:250) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:196) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:187) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    Shutting down Pinot Service Manager with all running Pinot instances...
    Can you please help with what I am missing here?
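    For what it's worth, the NPE in HelixHelper.updateHostnamePort points at the controller failing to resolve its own host/instance id when registering with Helix; whether that is the root cause here is only a guess. A sketch of the extra configs that address it in Kubernetes (the hostname value is a placeholder):
      configs: |-
        pinot.set.instance.id.to.hostname=true
        controller.host=pinot-controller-0.pinot-controller-headless   # placeholder hostname
        # ...existing storage.factory / segment.fetcher / data.dir settings unchanged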
  • l

    Leon Liu

    12/07/2022, 8:12 PM
    Hello, I'm relatively new to Apache Pinot and ran into some issues (or a misunderstanding) while batch ingesting data. I'm following the document here: https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot to batch load multiple tables from CSV (table1 and table2; table1 has 1000 rows and table2 has 2000 rows). After loading table1, the query "select count(*) from table1" returns 1000, but after loading table2, the query "select count(*) from table2" returns 3000 records. After looking into it, a lot of rows with null values are returned if I run "select * from table2". Is this normal, or did I do something wrong? Thanks a lot in advance.
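    One hedged guess at what is happening: if both CSV files sit under the same inputDirURI and match the same glob, the ingestion job for table2 also picks up table1's file, and its rows land in table2 with nulls for the columns table1 doesn't have. A job-spec fragment that keeps the inputs separate per table (paths are made up):
    # ingestion job spec for table2 only (illustrative paths)
    inputDirURI: 'file:///data/csv/table2/'      # keep table1's and table2's CSVs in separate directories
    includeFileNamePattern: 'glob:**/*.csv'
    tableSpec:
      tableName: 'table2'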
  • g

    Gerrit van Doorn

    12/07/2022, 9:48 PM
    How can I verify what index my query is using? For example, I'm trying to use a star-tree index but have no idea if it's actually being used
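    On 0.11+ you can ask the broker for the query plan and also watch the response stats; a sketch with placeholder table/column names:
    EXPLAIN PLAN FOR
    SELECT country, SUM(impressions)
    FROM myStarTreeTable
    GROUP BY country
    -- A star-tree-specific filter/aggregation operator in the returned plan (rather than a
    -- full-scan filter plus plain aggregation) indicates the star-tree index was used.
    -- Independently, numDocsScanned in the regular query response drops sharply when the
    -- star-tree serves the aggregation.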
  • a

    Abhishek Dubey

    12/08/2022, 11:27 AM
    Hi Team, while creating a table via the UI, I get this error even though the table config doesn't already exist. Also, sometimes I see a message with "heap space" as the error. Any reason why it's not allowing me to add a new table config?
  • a

    Abhishek Dubey

    12/08/2022, 11:28 AM
    Where can I see details of this error, if it's part of the server log?
  • g

    Gaurav Pant

    12/08/2022, 5:06 PM
    Hi team, Apache Pinot's docs say the JDBC client is not completely ANSI SQL-92 compliant. Does that mean we cannot use joins in the select clause? Or are there other limitations of the JDBC client?
  • n

    Neeraja Sridharan

    12/08/2022, 9:20 PM
    Hey Team! For partition-based segment pruning in offline Pinot tables, is it sufficient that the partition implementation logic (murmur) is the same on the source and in Pinot, or should the number of partitions also match?
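    For reference, this is where both the partition function and the partition count are declared on the Pinot side; the column name and count below are placeholders:
    "tableIndexConfig": {
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "memberId": {
            "functionName": "Murmur",
            "numPartitions": 8
          }
        }
      }
    }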
  • p

    Pratik Bhadane

    12/09/2022, 5:27 AM
    Hello Team, Issue: losing tables after every server (EC2) reboot. I have just started exploring Pinot. We have configured S3 as deep storage. When we restart our EC2 instance we lose all our tables; however, if we just restart the Pinot services we do not lose the tables. Could you please help me understand what is wrong with my configuration, and what changes we need to make to fix this issue?
    controller.conf
    ###################################################
    # Pinot Role
    pinot.service.role=CONTROLLER
    # Pinot Cluster name
    pinot.cluster.name=pinot-dev
    controller.data.dir=s3://mybucket/pinot-data/controller-data
    controller.local.temp.dir=/opt/pinot/controller-data/pinot-tmp-data/
    controller.zk.str=localhost:2181
    controller.host=127.0.0.1
    controller.port=9000
    controller.helix.cluster.name=pinot-dev
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=ap-south-1
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.controller.storage.factory.s3.disableAcl=false
    pinot.controller.storage.factory.s3.accessKey=A****
    pinot.controller.storage.factory.s3.secretKey=W****
    ###################################################
    server.conf
    ###################################################
    # Pinot Role
    pinot.service.role=SERVER
    # Pinot Cluster name
    pinot.cluster.name=pinot-dev
    pinot.server.netty.port=8098
    pinot.server.adminapi.port=8097
    pinot.server.instance.dataDir=/opt/pinot/server-data/server/index
    pinot.server.instance.segmentTarDir=/opt/pinot/server-data/server/segmentTars
    pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.storage.factory.s3.region=ap-south-1
    pinot.server.segment.fetcher.protocols=file,http,s3
    pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.controller.storage.factory.s3.disableAcl=false
    ###################################################
    Commands used to start the services
    ###################################################
    Zookeeper: /opt/pinot/apache-pinot-0.11.0-bin/bin/pinot-admin.sh StartZookeeper -zkPort 2181
    Controller: /opt/pinot/apache-pinot-0.11.0-bin/bin/pinot-admin.sh StartController -configFileName conf/controller.conf
    Broker: /opt/pinot/apache-pinot-0.11.0-bin/bin/pinot-admin.sh StartBroker -zkAddress localhost:2181 -clusterName pinot-dev
    Server: /opt/pinot/apache-pinot-0.11.0-bin/bin/pinot-admin.sh StartServer -configFileName conf/server.conf -zkAddress localhost:2181 -clusterName pinot-dev
    ###################################################
    transcript-schema.json
    ###################################################
    {
      "schemaName": "studentfour",
      "dimensionFieldSpecs": [
        { "name": "studentID", "dataType": "INT" },
        { "name": "firstName", "dataType": "STRING" },
        { "name": "lastName", "dataType": "STRING" },
        { "name": "gender", "dataType": "STRING" },
        { "name": "subject", "dataType": "STRING" }
      ],
      "metricFieldSpecs": [
        { "name": "score", "dataType": "FLOAT" }
      ],
      "dateTimeFieldSpecs": [{
        "name": "timestampInEpoch",
        "dataType": "LONG",
        "format": "1MILLISECONDSEPOCH",
        "granularity": "1:MILLISECONDS"
      }]
    }
    ###################################################
    transcript-table-offline.json
    ###################################################
    {
      "tableName": "studentfour",
      "segmentsConfig": {
        "timeColumnName": "timestampInEpoch",
        "timeType": "MILLISECONDS",
        "replication": "1",
        "schemaName": "studentfour",
        "peerSegmentDownloadScheme": "https"
      },
      "tableIndexConfig": { "invertedIndexColumns": [], "loadMode": "MMAP" },
      "tenants": {},
      "ingestionConfig": {
        "batchIngestionConfig": {
          "segmentIngestionType": "APPEND",
          "segmentIngestionFrequency": "DAILY",
          "batchConfigMaps": [
            {
              "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
              "input.fs.prop.region": "ap-south-1",
              "input.fs.prop.secretKey": "W****",
              "input.fs.prop.accessKey": "A***",
              "inputDirURI": "s3://mybucket/batch/student/rawdata/",
              "includeFileNamePattern": "glob:**/*.csv",
              "excludeFileNamePattern": "glob:**/*.tmp",
              "inputFormat": "csv"
            }
          ]
        }
      },
      "task": {
        "taskTypeConfigsMap": {
          "SegmentGenerationAndPushTask": {
            "schedule": "0 */10 * * * ?",
            "tableMaxNumTasks": 10
          }
        }
      },
      "tableType": "OFFLINE",
      "metadata": {}
    }
    ###################################################
    transcriptIngestionJobSpec.yaml
    ###################################################
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
      segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
    jobType: SegmentCreationAndMetadataPush
    inputDirURI: 's3://mybucket/batch/student/rawdata/'
    includeFileNamePattern: 'glob:**/*.csv'
    outputDirURI: 's3://mybucket/pinot-data/controller-data/studentfour/segments'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: 'ap-south-1'
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    tableSpec:
      tableName: 'studentfour'
      schemaURI: 'http://localhost:9000/tables/studentfour/schema'
      tableConfigURI: 'http://localhost:9000/tables/studentfour'
    pinotClusterSpecs:
      - controllerURI: 'http://localhost:9000'
    pushJobSpec:
      copyToDeepStoreForMetadataPush: true
      pushFileNamePattern: 'glob:**/*.tar.gz'
      pushAttempts: 2
      pushRetryIntervalMillis: 1000
    ###################################################
  • h

    harnoor

    12/09/2022, 7:10 PM
    Hi team. I have one doubt: as per the code, the logic for numSegmentsPrunedInvalid during query execution is:
    private static boolean isInvalidSegment(IndexSegment segment, QueryContext query) {
      return !segment.getColumnNames().containsAll(query.getColumns());
    }
    A few days ago I added a new column to my realtime table and reloaded all the segments. I am not even running queries on the new column, but I can see my query returning an incorrect response because all the completed segments are getting pruned for the query. I saw that numSegmentsPrunedInvalid = numSegmentsQueried - numConsumingSegmentsQueried in the query result, so it looks like the query is only working correctly on the consuming segments. Shouldn't the query run fine unless the new column is being selected? Pinot version: 0.11.0
  • t

    Tony Requist

    12/09/2022, 11:41 PM
    Hi - We had an issue where a table with
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "90",
    had two segments that were much older, one 112 days old and one ~260 days old. We have 6 tables with varying retention, and this is the only case where old segments were not properly deleted when they passed the retention time. Any ideas why this might happen?
  • j

    Jatin Kumar

    12/10/2022, 10:57 AM
    Hello, we are trying to insert data into Pinot (offline mode, through the INSERT feature of Trino), but the job keeps failing with an error that it cannot replace segments. After looking into segment_lineage, some segments are stuck in IN_PROGRESS forever, and the next time we run the job it fails. 1. What is the reason that a segment stays in the IN_PROGRESS state and never moves to the COMPLETED state? 2. How can we solve this issue? cc: @Elon
  • l

    Lee Wei Hern Jason

    12/12/2022, 4:34 AM
    Hi Team, I noticed that my broker (and at times controller) throughput has been spiky, which at times makes my instance unresponsive. The IOPS is > 2k and read throughput is about 100 MB/s. I have configured my EBS volumes as gp3, which provides 3k IOPS and 125 MB/s. I was wondering what the broker reads from EBS. Does the broker get segments from the server and write them to EBS, then read from them, which would incur the high read throughput? I compared my write and read throughput; read throughput is significantly higher than write. What could be the reason for high read throughput on the broker and controller? Thank you 🙏
  • s

    Shreeram Goyal

    12/12/2022, 7:01 AM
    Hi, I have been trying to ingest data into realtime tables from Kafka topics. I have been able to ingest data for two tables but somehow couldn't ingest data for one other table. Here are the logs for both the controller and the server. Could you please help with the issue? Controller logs:
    2022/12/12 12:16:11.538 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-6] Processing segmentConsumed:Offset: -1,Segment name: order_items__0__0__20221212T0636Z,Instance Id: Server_i43592-a14160_8098,Reason: timeLimit,NumRows: 49392,BuildTimeMillis: -1,WaitTimeMillis: -1,ExtraTimeSec: -1,SegmentLocation: null,MemoryUsedBytes: 18772108,SegmentSizeBytes: -1,StreamPartitionMsgOffset: 49392
    2022/12/12 12:16:11.542 INFO [SegmentCompletionManager] [grizzly-http-server-6] Created FSM {order_items__0__0__20221212T0636Z,HOLDING,1670827571540,null,null,true,<http://localhost:9001>}
    2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] Processing segmentConsumed(Server_i43592-a14160_8098, 49392)
    2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] HOLDING:Picking winner time=2 size=1
    2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] HOLDING:Committer notified winner instance=Server_i43592-a14160_8098 offset=49392
    2022/12/12 12:16:11.542 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-6] HOLDING:COMMIT for instance=Server_i43592-a14160_8098 offset=49392 buldTimeSec=126
    2022/12/12 12:16:11.542 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-6] Response to segmentConsumed for segment:order_items__0__0__20221212T0636Z is :{"offset":49392,"status":"COMMIT","isSplitCommitType":true,"controllerVipUrl":"<http://localhost:9001>","streamPartitionMsgOffset":"49392","buildTimeSec":126}
    2022/12/12 12:16:11.543 INFO [ControllerResponseFilter] [grizzly-http-server-6] Handled request from 10.64.14.160 GET <http://i40790-a46135:9001/segmentConsumed?reason=timeLimit&streamPartitionMsgOffset=49392&instance=Server_i43592-a14160_8098&offset=-1&name=order_items__0__0__20221212T0636Z&rowCount=49392&memoryUsedBytes=18772108>, content-type null status code 200 OK
    2022/12/12 12:16:12.979 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-1] Processing segmentCommitStart:Offset: -1,Segment name: order_items__0__0__20221212T0636Z,Instance Id: Server_i43592-a14160_8098,Reason: null,NumRows: 49392,BuildTimeMillis: 995,WaitTimeMillis: 0,ExtraTimeSec: -1,SegmentLocation: null,MemoryUsedBytes: 18772108,SegmentSizeBytes: 15179507,StreamPartitionMsgOffset: 49392
    2022/12/12 12:16:12.981 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-1] Processing segmentCommitStart(Server_i43592-a14160_8098, 49392)
    2022/12/12 12:16:12.981 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-1] COMMITTER_NOTIFIED:Uploading for instance=Server_i43592-a14160_8098 offset=49392
    2022/12/12 12:16:12.981 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-1] Response to segmentCommitStart for segment:order_items__0__0__20221212T0636Z is:{"offset":-1,"status":"COMMIT_CONTINUE","isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1}
    2022/12/12 12:16:12.981 INFO [ControllerResponseFilter] [grizzly-http-server-1] Handled request from 10.64.14.160 GET <http://i40790-a46135:9001/segmentCommitStart?segmentSizeBytes=15179507&buildTimeMillis=995&streamPartitionMsgOffset=49392&instance=Server_i43592-a14160_8098&offset=-1&name=order_items__0__0__20221212T0636Z&rowCount=49392&memoryUsedBytes=18772108>, content-type null status code 200 OK
    2022/12/12 12:16:13.015 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-4] Processing segmentCommitEndWithMetadata:Offset: -1,Segment name: order_items__0__0__20221212T0636Z,Instance Id: Server_i43592-a14160_8098,Reason: null,NumRows: 49392,BuildTimeMillis: 995,WaitTimeMillis: 0,ExtraTimeSec: -1,SegmentLocation: file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z.tmp.d86245bc-1c04-4b81-99fc-9becd3bea891,MemoryUsedBytes: 18772108,SegmentSizeBytes: 15179507,StreamPartitionMsgOffset: 49392
    2022/12/12 12:16:13.020 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-4] Processing segmentCommitEnd(Server_i43592-a14160_8098, 49392)
    2022/12/12 12:16:13.020 INFO [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-4] Committing segment order_items__0__0__20221212T0636Z at offset 49392 winner Server_i43592-a14160_8098
    2022/12/12 12:16:13.020 INFO [PinotLLCRealtimeSegmentManager] [grizzly-http-server-4] Committing segment file for segment: order_items__0__0__20221212T0636Z
    2022/12/12 12:16:13.021 WARN [BasePinotFS] [grizzly-http-server-4] Source file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z.tmp.d86245bc-1c04-4b81-99fc-9becd3bea891 does not exist
    2022/12/12 12:16:13.021 ERROR [SegmentCompletionFSM_order_items__0__0__20221212T0636Z] [grizzly-http-server-4] Caught exception while committing segment file for segment: order_items__0__0__20221212T0636Z
    java.lang.IllegalStateException: Failed to move segment file for segment: order_items__0__0__20221212T0636Z from: file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z.tmp.d86245bc-1c04-4b81-99fc-9becd3bea891 to: file:/data/pinot/controller/data/order_items/order_items__0__0__20221212T0636Z
    	at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:738) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.commitSegmentFile(PinotLLCRealtimeSegmentManager.java:486) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager$SegmentCompletionFSM.commitSegment(SegmentCompletionManager.java:1085) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager$SegmentCompletionFSM.segmentCommitEnd(SegmentCompletionManager.java:660) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.segmentCommitEnd(SegmentCompletionManager.java:326) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.controller.api.resources.LLCSegmentCompletionHandlers.segmentCommitEndWithMetadata(LLCSegmentCompletionHandlers.java:430) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at jdk.internal.reflect.GeneratedMethodAccessor146.invoke(Unknown Source) ~[?:?]
    	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:356) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at java.lang.Thread.run(Thread.java:829) [?:?]
  • s

    Shubham Kumar

    12/12/2022, 1:03 PM
    Hi team, we deployed Pinot using Helm with a deep store configured a few days ago. We are trying to batch ingest data using a Spark job with the SegmentCreationAndMetadataPush jobType, but we are observing that the Spark job succeeds while the segments it creates end up in BAD state. This is the spec YAML file:
    executionFrameworkSpec:
      name: 'spark'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SegmentUriPushJobRunner'
      segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SegmentMetadataPushJobRunner'
    
    # Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured  
    jobType: SegmentCreationAndMetadataPush
    
    inputDirURI: 's3://test-data/tpch-data/lineitem_dummy/parquet/'
    includeFileNamePattern: 'glob:**/*.parquet'
    excludeFileNamePattern: 'glob:**/*.tmp'
    outputDirURI: 's3://test-data/pinot/segment_stage/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: ap-south-1
          accessKey: *****************
          secretKey: SdF************************HAd
    recordReaderSpec:
      dataFormat: 'parquet'
      className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
    tableSpec:
      tableName: 'lineitem_spark_92'
      schemaURI: 'https://pinot.np.tech.in/schemas/li_spark_append'
      tableConfigURI: 'https://pinot.np.tech.in/tables/li_spark_append'
    pinotClusterSpecs:
      - controllerURI: 'https://pinot.np.tech.in/'
    pushJobSpec:
      pushAttempts: 2
      pushRetryIntervalMillis: 1000
    In the minion logs I am able to see:
    Copied segment: li_spark_append_OFFLINE_1993-09-17_1993-11-08_5 of table: li_spark_append_OFFLINE to final location: file:/var/pinot/controller/data,<s3://test-data/pinot-data/pinot-default/controller-data//li_spark_append/li_spark_append_OFFLINE_1993-09-17_1993-11-08_5>
    but there is no data present in the deep-store controller directory. However, segments do get pushed to the segment store (outputDirURI) provided in the job spec file. In the minion log above we can see both the local controller data path and the deep-store path as the copy destination, but upon verifying, there aren't any segment files there. Extra controller configs:
    configs: |-
          pinot.set.instance.id.to.hostname=true
          pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
          pinot.controller.storage.factory.s3.region=ap-south-1
          pinot.controller.segment.fetcher.protocols=file,http,s3
          pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
      controller.data.dir=s3://test-data/pinot-data/pinot-default/controller-data/
          controller.local.temp.dir=/tmp/pinot-tmp-data/
  • j

    Jatin Kumar

    12/12/2022, 7:56 PM
    Hello, we are facing an issue with segment lineage. A lineage entry got stuck in the IN_PROGRESS state and never completed; I am attaching one lineage example in the thread. Any pointers on how to debug this further? cc: @Xiang Fu
  • v

    vishal

    12/13/2022, 5:43 AM
    Hi Team, I am trying to push data to an offline table from an S3 bucket. I pushed one file, then pushed a 2nd file after some time; sometimes the old data gets overlapped and sometimes it doesn't. Can somebody help me understand, at a deeper level, on what basis the data overlaps? Is it timestamp based, primary-key based, or something else?
  • v

    Venkatesh Radhakrishnan

    12/13/2022, 10:02 AM
    Pinot segments going into bad state and we lose all the data - looking for help. Hi All, below is the summary. Kindly review and let us know if we are missing something, or whether this is expected behaviour with Pinot.
    1. Infra - AWS, 8GB RAM, 30GB storage.
    2. We set up Pinot and ingest data into Pinot through Kafka.
    3. All is well until we reach 1 million records.
    4. At around that record count (total in the Pinot DB), we executed some read queries against Pinot and the servers went into an unresponsive state.
    5. The only option we had was to restart the servers from the AWS console.
    6. After restarting, we re-ran Docker to restart the Pinot cluster.
    7. We see that all the segments are in a bad state and we have lost all the million records that were ingested earlier.
    Importantly, we cannot afford to lose data as Pinot is our primary DB source.
  • c

    Carl

    12/13/2022, 10:27 PM
    Hi team, we are trying to use the lastwithtime aggregation, but we run into a QueryExecutionException when the data column is a STRING and has Unicode characters in it. Do we have a fix for this?
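    For reference, the aggregation takes the value column, the time column, and the value's data type; the table and column names below are placeholders:
    SELECT deviceId,
           LASTWITHTIME(statusNote, eventTimeMillis, 'STRING') AS lastStatus
    FROM myTable
    GROUP BY deviceId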
  • a

    Apoorv

    12/14/2022, 12:59 PM
    Hi team, we are working with a realtime partial-upsert table where multiple records are pushed to Kafka for a given primary key. We are observing that for many records only the data from the latest record pushed to Kafka for a given key is reflected (behaving like a full upsert), although for other records partial upsert works fine. Is there any known edge condition with respect to partial-upsert tables? My config is as below:
    "upsertConfig": {
          "mode": "PARTIAL",
          "partialUpsertStrategies": {},
          "defaultPartialUpsertStrategy": "OVERWRITE",
          "hashFunction": "NONE"
        },
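    With partialUpsertStrategies left empty, every column falls back to the defaultPartialUpsertStrategy (OVERWRITE here). A sketch of pinning explicit per-column strategies for columns that should merge rather than overwrite (column names are placeholders):
    "upsertConfig": {
      "mode": "PARTIAL",
      "partialUpsertStrategies": {
        "tags": "UNION",
        "totalAmount": "INCREMENT",
        "firstSeenTs": "IGNORE"
      },
      "defaultPartialUpsertStrategy": "OVERWRITE",
      "hashFunction": "NONE"
    }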
  • f

    francoisa

    12/14/2022, 3:49 PM
    Hi 🙂 I may have found a way to reproduce my error @Mark Needham -> https://apache-pinot.slack.com/archives/C011C9JHN7R/p1669909874305169 Quite simple: add a derived column on a realtime table with a given formula, let's say existingVal*1000. Apply all the changes and reload segments; everything works well. Then change the formula of the derived column: it no longer takes effect, and I keep getting existingVal*1000 everywhere except on consuming segments. Am I the only one having this issue?
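    For reference, a derived column of this kind normally lives in the table's ingestionConfig as a transform; the column name below is a placeholder and the formula mirrors the example above:
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "derivedVal",
          "transformFunction": "existingVal * 1000"
        }
      ]
    }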
  • p

    Padma Malladi

    12/14/2022, 7:14 PM
    Hi, all of our servers are crashing with a SIGSEGV error at some point or other. This is in a Kubernetes env.
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f64cd30b3f6, pid=1, tid=5336
    #
    # JRE version: OpenJDK Runtime Environment 18.9 (11.0.14.1+1) (build 11.0.14.1+1)
    # Java VM: OpenJDK 64-Bit Server VM 18.9 (11.0.14.1+1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # J 19588 c2 java.nio.DirectByteBuffer.getInt(I)I java.base@11.0.14.1 (28 bytes) @ 0x00007f64cd30b3f6 [0x00007f64cd30b3a0+0x0000000000000056]
    #
    # Core dump will be written. Default location: /opt/pinot/core.1
    #
    # An error report file with more information is saved as:
    # /opt/pinot/hs_err_pid1.log
  • b

    Bobby Richard

    12/14/2022, 8:39 PM
    Can the Pinot Flink connector be used with flink checkpointing enabled? I see PinotSinkFunction implements CheckpointedFunction but just throws an UnsupportedOperationException in the snapshotState function
  • s

    Shubham Kumar

    12/15/2022, 7:03 AM
    Hi team, we are testing both APPEND and REFRESH modes for offline batch ingestion. We have carried out some evaluations with the overwriteOutput: true flag. I have some fundamental doubts. Suppose I have an input S3 directory with a few partitions, let's say /1 and /2:
    1. For both APPEND and REFRESH, I ingested historical data from the input path using a Spark job and then, before the next run, removed a file from partition /1. In this case, after the next run I observed that (a) the older segment files were not removed from the staging directory or the deep store; moreover, segments corresponding to all the remaining files in partition /1 were created again, and (b) this resulted in duplicated segments.
    2. Everything was as expected for REFRESH mode when we added a file to a partition, added a new partition, or removed an entire partition.
    3. For APPEND mode, when we added a file to a partition, all the segments corresponding to that partition were refreshed, though we were expecting only one new segment. It seems segment refresh works at the partition level?
    Is there any documentation on how it is decided which files to pick for segment refresh/creation? I have observed this behaviour for both APPEND and REFRESH modes. I doubt this is the expected behaviour; can somebody please explain what is happening here? Is this use case supported by Pinot? We are using this job spec:
    executionFrameworkSpec:
      name: 'spark'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentUriPushJobRunner'
      segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentMetadataPushJobRunner'
    
    # Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured
    jobType: SegmentCreationAndMetadataPush
    
    inputDirURI: 's3://test-data/tpch-data/lineitem_dummy/parquet/'
    includeFileNamePattern: 'glob:**/*.parquet'
    excludeFileNamePattern: 'glob:**/*.tmp'
    outputDirURI: 's3://test-data/pinot/segment_stageII/li/append/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: ap-south-1
          accessKey: *******
          secretKey: ********************+5M+EX3
    recordReaderSpec:
      dataFormat: 'parquet'
      className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
    tableSpec:
      tableName: 'li_spark_inc'
      schemaURI: .............
      tableConfigURI: .............
    pinotClusterSpecs:
      - controllerURI: 'https://pinot.np.tech.in/'
    pushJobSpec:
      pushAttempts: 2
      copyToDeepStoreForMetadataPush: true
      pushRetryIntervalMillis: 1000
  • m

    Mathieu Alexandre

    12/15/2022, 2:32 PM
    Hello, unfortunately our Pinot 0.10.0 can't pause stream ingestion like this. In your opinion, what would be the best way to get the same result (stopping the Kafka broker, maybe)? I'd like to keep reads on Pinot active but stop modifications, in order to copy data without side effects.
  • a

    Aaron Weiss

    12/15/2022, 3:03 PM
    Hey, we're having a recurring issue while performing schema evolution. In the last case, I added 2 new columns to the immutable_events schema. We have a hybrid table for immutable_events, but after applying the schema changes, we get two problems when querying immutable_events_REALTIME (see screenshot below):
    1. Segments unavailable message: This one is a showstopper as you can't query the table at all. In this case, 4 segments are in this state, but I can't see a pattern for why those have an issue. The segment files are in our deep store in GCS just like all the other segments. These segments show as "Bad" in the Pinot UI. I can temporarily "fix" this issue by running a segment reset on each of the unavailable segments, but that only keeps the table queryable temporarily (5-10 minutes), then the error comes back. After running a reset, I get the following message in the server logs:
    Failed to download segment immutable_events__1__1075__20221212T1720Z from deep store:
    However, as I said, these segments are definitely in the deep store.
    2. Invalid segment(s) / older schema message: This one isn't horrible because you can still query the table, but that specific segment is unavailable as I understand it. Based on the schema evolution doc, reloading all segments should fix this. But after running and completing a reload of all segments, this error message persists. In addition, I have found no way in the Pinot UI or via a Swagger command to determine which segment(s) are impacted.