# troubleshooting
  • Larry Meadors
    10/13/2022, 4:58 PM
    this may be a bit of a stretch - i am running pinot with docker locally and trying to connect to it via jdbc and getting a SQLException
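    A minimal client-side sketch, assuming the pinot-jdbc-client jar is on the classpath and the usual Docker port mappings (controller on 9000, broker on 8099; both are assumptions, and the exact SQLException text would narrow this down further):
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PinotJdbcSmokeTest {
      public static void main(String[] args) throws Exception {
        // Load the Pinot driver explicitly in case it is not auto-registered.
        Class.forName("org.apache.pinot.client.PinotDriver");
        // The JDBC URL points at the controller, and the driver then discovers
        // brokers through it, so with Docker both the controller (9000) and
        // broker (8099) ports must be published to the host.
        try (Connection conn = DriverManager.getConnection("jdbc:pinot://localhost:9000");
             Statement stmt = conn.createStatement();
             // "myTable" is a placeholder; use any table that exists.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM myTable")) {
          while (rs.next()) {
            System.out.println(rs.getLong(1));
          }
        }
      }
    }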
  • Mahesh babu
    10/17/2022, 11:06 AM
    Hi Team, we are creating a Pinot docker image and we want to store Pinot logs in JSON format. May I know if this is possible, and if so, how?
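    Pinot's logging goes through log4j2, so one possible approach is to switch the appender layout to log4j2's JsonLayout. A minimal sketch in log4j2 properties syntax, assuming a console appender (the bundled image normally configures logging via conf/log4j2.xml, so adapt accordingly; JsonLayout needs jackson on the classpath, which Pinot already bundles):
    # Sketch of a log4j2 configuration emitting one JSON object per log event
    appender.console.type = Console
    appender.console.name = console
    appender.console.layout.type = JsonLayout
    appender.console.layout.compact = true
    appender.console.layout.eventEol = true

    rootLogger.level = info
    rootLogger.appenderRef.console.ref = console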
  • Bruno Mendes
    10/17/2022, 5:38 PM
    Hi team 👋 I want to keep the realtime table configuration in a git repo but don't want to expose the Kafka login credentials. Is it possible to pass these in a JAAS-like configuration file?
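    One approach that leans on standard Kafka client behaviour rather than anything Pinot-specific (a sketch; the login module depends on your SASL mechanism, and the path and credentials below are placeholders): omit sasl credentials from the committed streamConfigs and let the Kafka consumer fall back to the JVM-wide JAAS file, supplied per server via -Djava.security.auth.login.config=/opt/pinot/etc/kafka_client_jaas.conf in the server JVM options:
    KafkaClient {
      org.apache.kafka.common.security.plain.PlainLoginModule required
      username="ingest-user"
      password="changeme";
    };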
  • Thomas Steinholz
    10/17/2022, 6:40 PM
    Hi all, is there a way to use an ISO 8601 datetime string format? I have the following datetime field spec defined:
    "dateTimeFieldSpecs" : [ {
        "name" : "time_string",
        "dataType" : "STRING",
        "format" : "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS",
        "granularity" : "1:MILLISECONDS"
      } ]
    however, I am getting the following error when running a Realtime to Offline segment task in the minion-stateless pod:
    java.lang.IllegalArgumentException: Invalid minTimeValue: 2022-09-26T14:46:40.760 for SimpleSegmentNameGenerator
    From the code, it seems like it is automatically creating a segment name based on the .toString method of the datetime field spec, which outputs "2022-09-26T14:46:40.760" and then gets rejected by the segment name validator for matching the following regex
    .*[\\\\/:\\*?\"<>|].*
    Is there a way to specify different name generation logic, or do I have to ETL my offline data and update my realtime data to publish with a different format? The data I loaded into OFFLINE tables separately automatically created the following example segment name
    <table name>_OFFLINE_2021-03-15-06_2022-08-01-14_11
    seemingly converting the colons ":" to hyphens "-". This logic does not appear to be consistent between the batch load and Realtime to Offline segment jobs.
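    One possible workaround, as a sketch (time_millis is a hypothetical new column, and ingestion transforms only apply to newly ingested rows): keep time_string as-is but make the primary time column epoch millis, so that generated segment names contain only digits and never trip the ':' check:
    "dateTimeFieldSpecs" : [ {
        "name" : "time_millis",
        "dataType" : "LONG",
        "format" : "1:MILLISECONDS:EPOCH",
        "granularity" : "1:MILLISECONDS"
      } ]
    with a matching transform in the table's ingestionConfig (fromDateTime converts a formatted string to epoch millis):
    "transformConfigs" : [ {
      "columnName" : "time_millis",
      "transformFunction" : "fromDateTime(time_string, 'yyyy-MM-dd''T''HH:mm:ss.SSS')"
    } ]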
  • Matthew Kerian
    10/17/2022, 6:45 PM
    Hello, I tried to add a table through the webpage and get an error that the table already exists, but it isn’t showing up in the list of tables. Any possible reasons for this?
  • Kevin Xu
    10/18/2022, 6:12 AM
    Hi all, could someone help me figure out what could cause a RealtimeToOfflineSegmentsTask minion task to get canceled when running for more than 2 hours?
    INFO [BaseMultipleSegmentsConversionExecutor] [TaskStateModelFactory-task_thread-0] RealtimeToOfflineSegmentsTask on table got canceled
  • Prakhar Pande
    10/18/2022, 11:18 AM
    Hi! What could be the possible reason for the below error in the controller logs?
    Caught 'java.net.ConnectException: Connection timed out (Connection timed out)' while executing: GET on URL: <http://100.64.49.183:8097/table/catalog_views_test_REALTIME/size>
    Caught 'java.net.ConnectException: Connection timed out (Connection timed out)' while executing: GET on URL: <http://100.64.4.104:8097/table/catalog_views_test_REALTIME/size>
    Connection error
    Thanks in advance.
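    To narrow this down (the URL is taken verbatim from the log): reproducing the controller's call from the controller host separates a network or security-group problem from a server-side one; a timeout, as seen here, usually points at networking rather than at the server process:
    curl -v "http://100.64.49.183:8097/table/catalog_views_test_REALTIME/size"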
  • Thomas Steinholz
    10/18/2022, 4:02 PM
    Hi all, what is the best way to deal with bad datetimes in REALTIME pinot tables? I have events that are sending invalid timestamps close to the epoch (1970), which is being flagged as an invalid date time value by the Realtime to Offline Segment task as it is using it to calculate the segments. The old time column that has unsanitized data in it is
    time_string
    while the new (empty) column I added is
    message_time
    . I have set the new
    message_time
    column as the time column of the table, and defined table transforms and filters in both the realtime and offline tables for the new time column. These detect whether a datetime is valid: invalid timestamps are replaced with a secondary timestamp from the data row, or with a default value if all others are null; otherwise the provided value is used as-is.
    "transformFunction": "Groovy({
        def default_time = '2014-01-01 00:00:00.000000';
        def t_ingest = new groovy.json.JsonSlurper().parseText(message_str).get('__prop.t_ingest');
        !time_string || time_string < default_time ?
            (!t_ingest ? default_time : new Date(Long.valueOf(t_ingest)).format(\"yyyy-MM-dd'T'HH:mm:ss.SSS\")) :
            time_string;
    }, time_string, message_str)"
    
    "filterFunction": "Groovy({
        message_time=='2014-01-01 00:00:00.000000'
    }, message_time)"
    however, when I run the Realtime to Offline job I get the following warnings/error:
    Default time: null does not comply with format: 1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS, using current time: 2022-10-18T15:49:34.760 as the default time for table: uplinkpayloadevent_OFFLINE
    
    Caught exception while executing task: Task_RealtimeToOfflineSegmentsTask_a2250009-a3b5-47c0-b495-085d15b405cf_1666108161685_0
    java.lang.IllegalArgumentException: Invalid format: "null"
    	at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.spi.data.DateTimeFormatSpec.fromFormatToMillis(DateTimeFormatSpec.java:303) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.core.segment.processing.timehandler.EpochTimeHandler.handleTime(EpochTimeHandler.java:56) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.core.segment.processing.mapper.SegmentMapper.writeRecord(SegmentMapper.java:143) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.core.segment.processing.mapper.SegmentMapper.map(SegmentMapper.java:126) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.core.segment.processing.framework.SegmentProcessorFramework.process(SegmentProcessorFramework.java:96) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.plugin.minion.tasks.realtimetoofflinesegments.RealtimeToOfflineSegmentsTaskExecutor.convert(RealtimeToOfflineSegmentsTaskExecutor.java:163) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:165) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:62) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:113) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:89) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    	at java.lang.Thread.run(Thread.java:829) [?:?]
    I cannot tell if the error occurs because the transform is running and producing a null value (which shouldn't be possible), or because the new time column is actually null and the transforms are not being run on these rows, since they are only processed by the task rather than re-ingested
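    One way to tell those two cases apart, as a sketch: ingestion transforms are not applied retroactively, and for STRING columns Pinot backfills missing values with the default null placeholder, which is the literal string "null" unless defaultNullValue is overridden; that would match the Invalid format: "null" in the trace. A query like
    SELECT COUNT(*) FROM uplinkpayloadevent
    WHERE message_time = 'null'
    returning a non-zero count would suggest older rows still carry the placeholder rather than a transformed value.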
  • Thomas Steinholz
    10/18/2022, 5:57 PM
    Not sure if anyone is familiar with the Pinot Trino connector, but I also have this problem querying the Pinot table through Trino after making my Pinot table hybrid (instead of just realtime). The issues started after ingesting offline segments, going from a realtime-only table to a hybrid realtime and offline table. I am able to query Pinot directly; however, as soon as segments become available in the offline table, the Trino Pinot connector starts to raise SQL exceptions. Example: Pinot passthrough works with Trino:
    > SELECT * from pinot.default."SELECT * FROM uplinkpayloadevent WHERE time_string < '2022-01-01T00:00:00' ORDER BY time_string DESC LIMIT 300"
    [2022-10-18 13:45:13] 300 rows retrieved starting from 1 in 1 s 339 ms (execution: 1 s 138 ms, fetching: 201 ms)
    However, using trino itself fails:
    > SELECT * from pinot.default.uplinkpayloadevent WHERE time_string < '2022-01-01T00:00:00' ORDER BY time_string DESC
    [2022-10-18 13:46:13] 0 rows retrieved in 398 ms (execution: 333 ms, fetching: 65 ms)
    [2022-10-18 13:46:13] [65536] Query failed (#20221018_174613_00013_q2hjp): Caught exception while parsing query: SELECT "app_tok", "gatewayaddress", "message_str", "key_hash", "net_tok", "acctid", "id", "moduleaddress", "time_string", "key_range" FROM uplinkpayloadevent_REALTIME  WHERE time_string >= 2022-10-18T09:12:39.768 AND (("time_string" < '2022-01-01T00:00:00')) LIMIT 2147483647
    [2022-10-18 13:46:13] org.apache.calcite.sql.parser.babel.ParseException: Encountered "T09" at line 1, column 200.
    ... stack trace continues....
  • Andy Cooper
    10/19/2022, 8:20 PM
    Hello - I need to build pinot v0.11.0 from source using JDK8 for the ingestion job on our EMR cluster. error:
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project pinot-fmpp-maven-plugin: Compilation failure -> [Help 1]
    org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project pinot-fmpp-maven-plugin: Compilation failure
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
        at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:200)
        at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:196)
        at java.util.concurrent.FutureTask.run (FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511)
        at java.util.concurrent.FutureTask.run (FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
        at java.lang.Thread.run (Thread.java:750)
    Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
        at org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute (AbstractCompilerMojo.java:1219)
        at org.apache.maven.plugin.compiler.CompilerMojo.execute (CompilerMojo.java:188)
        at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
        at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:200)
        at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:196)
        at java.util.concurrent.FutureTask.run (FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511)
        at java.util.concurrent.FutureTask.run (FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
        at java.lang.Thread.run (Thread.java:750)
    [ERROR]
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] <http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException>
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR]   mvn <goals> -rf :pinot-fmpp-maven-plugin
    environment:
    openjdk version "11.0.16" 2022-07-19
    OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu118.04)
    OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu118.04, mixed mode, sharing) Maven home: /usr/share/maven
    
    Java version: 11.0.16, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
    Default locale: en, platform encoding: UTF-8
    OS name: "linux", version: "5.4.0-1088-aws", arch: "amd64", family: "unix"
    build command:
    mvn clean install -DskipTests -Pbin-dist -T 4  -Djdk.version=8 -X
    --- <edited to remove mvn version troubleshooting that was a red herring> I've tested compiling for both java8 and java11 at this point and always receive the same error. It makes me think there is something wrong with the fmpp plugin, but I'm struggling to get past this since I'm lacking in java experience.
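    For what it's worth, the environment dump above shows Maven itself running on JDK 11 while the build targets 8, so one sanity check (a sketch; the JDK 8 path is an assumption for an Ubuntu box) is to point Maven at an actual JDK 8 before building:
    # Point Maven at a real JDK 8 install and confirm before building
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    mvn -version    # should now report Java version: 1.8.x
    mvn clean install -DskipTests -Pbin-dist -T 4 -Djdk.version=8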
  • Alice
    10/20/2022, 3:58 AM
    Hi team, I have a question. Does a table schema with a string field whose “maxLength” is set to 10240 make the table size much larger than one without this field, even if the value of this field is null for all records and the two tables share all other table configs?
  • Shaun Sawyer
    10/20/2022, 5:30 PM
    Hi everyone, I would like to spin up the most basic, lightweight running pinot cluster to use for things like testing in CI. I was trying to run it in a single docker container where I can control the mem/cpu usage. I came across
    pinot-admin.sh StartServiceManager
    but have been unsuccessful in getting this up and running, and the errors are not helpful. I know I need zookeeper running ahead of time. Can someone just provide a way to do this? It must be a common thing, and yet I cannot find anything in the docs which brings up the controller, broker and server. I would expect something like
    pinot-admin.sh StartServiceManager -zkAddress localhost:2181 -clusterName PinotCluster -bootstrapServices CONTROLLER BROKER SERVER
    to just work, perhaps I am missing something.
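    For a throwaway all-in-one cluster, the QuickStart launcher may be a simpler starting point than StartServiceManager, since it brings up Zookeeper, controller, broker, and server in a single process. A minimal sketch with the published image (tag and ports are the usual defaults):
    docker run --rm --name pinot-ci -p 9000:9000 -p 8099:8099 \
      apachepinot/pinot:0.11.0 QuickStart -type batch
    It loads a small sample table, which can simply be ignored in CI.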
  • harnoor
    10/20/2022, 6:19 PM
    Hi Experts. Need some help. Our Pinot queries are aggregation heavy and I have observed a lot of them are quite slow. All of the queries have a range filter in it like -
    ( start_time_millis >= 1666256876000 AND start_time_millis < 1666260935000 )
    where
    start_time_millis
    is the timeColumnName. Most of the queries have a range filter to get data for <last 6 hours. We added the Startree index to improve latency, however, we cannot leverage it since the segment size is big.
    max(start_time_millis) - min(start_time_millis)
    for a segment comes out to be > ~6 hours. All the segments have around ~6 hours gap for
    start_time_millis
    . If we don’t add
    start_time_millis
    in dimension split order, the startree index doesn’t get picked (as the segment’s time range is not the subset of the queried time range in most of the cases). And we cannot add
    start_time_millis
    in dimension split order due to its high cardinality, which consumes a lot of disk space. We are looking to fix this problem: we want to leverage the startree index, and hence are looking to reduce the number of Kafka partitions in order to reduce segment size, so that each segment spans close to ~1 hour. Our tables have around 40050 segments. I wanted to know whether decreasing the number of partitions is the right path, and what other action items we can perform to solve this problem.
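    One pattern that keeps the split-order cardinality bounded, as a sketch (start_time_hours is a hypothetical derived column and the other dimensions are placeholders): derive a coarser time column at ingestion and put that, rather than the raw millis, into dimensionsSplitOrder; queries then need an equivalent filter on the derived column for the star-tree to be picked:
    "ingestionConfig" : {
      "transformConfigs" : [ {
        "columnName" : "start_time_hours",
        "transformFunction" : "toEpochHours(start_time_millis)"
      } ]
    },
    "tableIndexConfig" : {
      "starTreeIndexConfigs" : [ {
        "dimensionsSplitOrder" : ["start_time_hours", "<other dims>"],
        "functionColumnPairs" : ["COUNT__*"],
        "maxLeafRecords" : 10000
      } ]
    }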
  • Matthew Kerian
    10/20/2022, 6:45 PM
    Hello, running into an issue when trying to implement a realtime table. The table is created successfully but is not ingesting. Logs show some errors:
    2022/10/20 18:38:57.780 WARN [AbstractDataCache] [HelixController-pipeline-task-pinot-1-(2a076887_TASK)] stat is null for key: /pinot-1/INSTANCES/Server_pinot-server-9.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/1400000600f604bd/table_REALTIME
    2022/10/20 18:38:57.781 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-1-(2a076887_TASK)] Fail to read record for paths: {/pinot-1/INSTANCES/Server_pinot-server-9.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/1400000600f604bd/table_REALTIME=-101}
    2022/10/20 18:38:57.781 WARN [AbstractDataCache] [HelixController-pipeline-task-pinot-1-(2a076887_TASK)] znode is null for key: /pinot-1/INSTANCES/Server_pinot-server-9.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/1400000600f604bd/tble_REALTIME
    2022/10/20 18:39:52.048 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-1-(c50faee9_DEFAULT)] Event c50faee9_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
    2022/10/20 18:39:52.136 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-1-(05e06be1_DEFAULT)] Event 05e06be1_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
    Curious about any debugging steps I can take here.
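    One place to start, as a sketch (host and table name are placeholders): the controller's consuming-segments endpoint reports per-partition consumer state and offsets, which usually separates an ingestion failure (auth, topic, deserialization) from the Helix warnings above:
    curl "http://<controller-host>:9000/tables/<tableName>/consumingSegmentsInfo"
    Server logs for the CONSUMING segment are the other usual source of the actual ingestion error.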
  • suraj sheshadri
    10/20/2022, 9:26 PM
    i am using apache-pinot-0.11.0-SNAPSHOT-bin. According to the documentation https://docs.pinot.apache.org/v/release-0.11.0/users/user-guide-query/scalar-functions I configured "queryConfig": { "disableGroovy": false }, but I am seeing this response: {"unrecognizedProperties": {"/queryConfig/disableGroovy": false}, "status": "Table offlinebookingwide_poc_OFFLINE successfully added"}. Does this property work, or is there an issue?
  • Viper
    10/21/2022, 8:58 AM
    Hey folks, wanted to get some insights on SQL injection mitigation while using some user inputs. Has anyone come across this? Considering prepared statements / parameterized queries are not supported on Pinot.
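    In the absence of server-side bind parameters, a common mitigation is strict allow-listing of identifiers plus escaping of string literals before interpolation. A minimal sketch (class and method names are illustrative, and this reduces rather than eliminates risk):
    import java.util.regex.Pattern;

    public final class PinotQuerySanitizer {
      private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

      // For identifiers (table/column names), reject anything not allow-listed.
      public static String identifier(String s) {
        if (!IDENTIFIER.matcher(s).matches()) {
          throw new IllegalArgumentException("bad identifier: " + s);
        }
        return s;
      }

      // For string literals, double any embedded single quotes.
      public static String literal(String s) {
        return "'" + s.replace("'", "''") + "'";
      }
    }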
  • Thomas Steinholz
    10/21/2022, 2:59 PM
    Hello team, having some difficulties with the
    pinot-minion-stateless
    pod in kubernetes, seems to be getting evicted after starting a Realtime to Offline Segment Job, with the following error:
    The node was low on resource: ephemeral-storage. Container minion-stateless was using 5774600Ki, which exceeds its request of 0.
    I am using S3 as a Deep Store and have the following pinot minion config:
    pinot.minion.port=9514
    dataDir=/var/pinot/minion/data
    pinot.set.instance.id.to.hostname=true
    pinot.minion.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.minion.storage.factory.s3.region=us-east-1
    pinot.minion.segment.fetcher.protocols=file,http,s3
    pinot.minion.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    Should my dataDir also be an S3 directory? I do not see any information about this in the S3 deep store documentation page (only for the config data dir): https://docs.pinot.apache.org/users/tutorials/use-s3-as-deep-store-for-pinot It does not look like the helm chart supplies any volume to the stateless minion, so it doesn't seem like I can just increase the volume size for the pod
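    Independent of where dataDir points, minion task execution downloads and converts segments on local disk first, so one stopgap is an explicit ephemeral-storage request/limit on the container, letting the scheduler place it on a node with room. A sketch of the Kubernetes resources stanza (where exactly this nests in the chart's values for the stateless minion is an assumption to verify against your chart version):
    resources:
      requests:
        ephemeral-storage: "10Gi"
      limits:
        ephemeral-storage: "20Gi"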
  • Thomas Steinholz
    10/24/2022, 7:37 PM
    Hi, having a similar but different issue with the ‘standalone’ BatchIngestionJob, where ultimately the pod runs out of ephemeral storage (requires downloading more than 10G of data:
    The node was low on resource: ephemeral-storage. Container pinot-job-batch-ingestion was using 1944724Ki, which exceeds its request of 0.
    ). I have actually mounted a persistent volume to the pod executing this job, but it does not seem to be using it. I am currently mounting it at
    /var/pinot/minion/data
    and
    /var/pinot/server/data
    but this is not working for either. To which directory should this volume be mounted so that the BatchIngestionJob uses the volume instead of the ephemeral storage? As a secondary question, is there a simpler way to do this within the Kubernetes cluster running Pinot, or is the standard way to utilize an external Spark cluster with a custom-compiled Pinot image?
  • Alice
    10/25/2022, 3:40 AM
    Hi team, I’ve a question about startree index size. Does the order of columns listed in dimensionsSplitOrder make a large difference in startree index size?
  • Alice
    10/25/2022, 5:35 AM
    Hi team, what does this warning usually mean? WARN [ClientCnxn] [Start a Pinot [SERVER]-SendThread(pinot-zookeeper:2181)] Client session timed out, have not heard from server in 29068ms for sessionid 0x1005d5cdfda0003
  • Sukesh Boggavarapu
    10/25/2022, 5:16 PM
    What is the default
    acks
    for real time tables reading from kafka? Is it
    acks=all
    ?
  • Ajay Chintala
    10/25/2022, 6:27 PM
    Hello team.. we are using
    BIG_DECIMAL
    metric fields in our schema with a hybrid table and have a
    RealtimeToOfflineSegmentsTask
    to move segments from realtime to offline table. We are hitting an exception
    java.lang.IllegalStateException: Unsupported SV stored type: BIG_DECIMAL at org.apache.pinot.core.segment.processing.genericrow.GenericRowSerializer.serialize(GenericRowSerializer.java:108)
    in the job.. Looking at the code https://github.com/apache/pinot/blob/6fef2108098dfae4173b104aa5e5e221cc89dc9e/pino[…]ot/core/segment/processing/genericrow/GenericRowSerializer.java, I don't see support for
    BIG_DECIMAL
    .. any idea if we are missing some config to not hit this?
  • Thomas Steinholz
    10/27/2022, 2:22 PM
    Hi all, I am using the Pinot connector for Trino and for the exact same query pinot is able to query in 53 ms but it takes trino over 46 seconds to return the same result. Would anyone know why there is such a discrepancy there?
  • Thomas Steinholz
    10/27/2022, 6:02 PM
    I seem to have issues querying a table batch ingested into OFFLINE segments. I have data for the past year, but can only query data back a few days. When I do a
    select count(*)
    for the whole table, it says there are 12,391,295 records, yet even in the query stats it says the total number of docs is 333,029,029. The query stats also say that only 344 of the segments match
    *
    yet that is not even half of the realtime segments, and I am assuming it includes none of the offline segments
  • reallyonthemove tous
    10/27/2022, 6:52 PM
    Hi folks, I am seeing unexpected behaviour while testing the MergeAndRollup task. I see that the table originally had 3 segments, and after the MergeAndRollup task ran, it now has 5. The query results before and after the merge are correct, but I was expecting the segment count to go down automatically. Is that the right assumption? I see the following exception in the minion logs:
    Caught exception while executing task: Task_MergeRollupTask_3735cc4d-aea0-4bb3-915d-aaeee735b4f1_1666896120043_0
    org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 500 (Internal Server Error) with reason: "Any segments from 'segmentsTo' should not be available in the table at this point. (tableName = 'videocollection_OFFLINE', segmentsFrom = '[merged_1d_1666896060010_0_videocollection_2022-09-19_2022-09-19_0]', segmentsTo = '[merged_1d_1666896120028_0_videocollection_2022-09-19_2022-09-19_0]', segmentsFromTable = '[merged_1d_1666896120028_0_videocollection_2022-09-19_2022-09-19_0, merged_1d_1666896060010_0_videocollection_2022-09-19_2022-09-19_0]')" while sending request: http://pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local:9000/segments/videocollection/startReplaceSegments?type=OFFLINE&forceCleanup=true to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
    	at org.apache.pinot.common.utils.http.HttpClient.wrapAndThrowHttpException(HttpClient.java:442) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.common.utils.FileUploadDownloadClient.startReplaceSegments(FileUploadDownloadClient.java:945) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.plugin.minion.tasks.SegmentConversionUtils.startSegmentReplace(SegmentConversionUtils.java:144) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.plugin.minion.tasks.SegmentConversionUtils.startSegmentReplace(SegmentConversionUtils.java:130) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.preUploadSegments(BaseMultipleSegmentsConversionExecutor.java:117) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:228) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:65) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:121) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:95) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-6ffb9718e40dcc65a50b1c8854904a04a0f241b8]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    	at java.lang.Thread.run(Thread.java:829) [?:?]
  • Priyank Bagrecha
    10/28/2022, 7:45 AM
    seeing this in server logs
    Slow query: request handler processing time: 518, send response latency: 0, total time to handle request: 518
    
    Processed requestId=1198,table=offlinebookingwide_main_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=9/9/9/-1/0/0/0/0/0,schedulerWaitMs=508,reqDeserMs=0,totalExecMs=63,resSerMs=0,totalTimeMs=571,minConsumingFreshnessMs=-1,broker=Broker_pinot-offline-broker-1.pinot-offline-broker-headless.de-nrt-pinot.svc.cluster.local_8099,numDocsScanned=15836,scanInFilter=0,scanPostFilter=981832,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=0/0/0/0
    
    Processed requestId=1199,table=offlinebookingwide_main_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=6/6/6/-1/0/0/0/0/0,schedulerWaitMs=510,reqDeserMs=1,totalExecMs=59,resSerMs=0,totalTimeMs=570,minConsumingFreshnessMs=-1,broker=Broker_pinot-offline-broker-1.pinot-offline-broker-headless.de-nrt-pinot.svc.cluster.local_8099,numDocsScanned=10476,scanInFilter=0,scanPostFilter=649512,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=0/0/0/0
    
    Slow query: request handler processing time: 571, send response latency: 0, total time to handle request: 571
    
    Slow query: request handler processing time: 570, send response latency: 0, total time to handle request: 570
    
    Processed requestId=1197,table=offlinebookingwide_main_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=10/10/10/-1/0/0/0/0/0,schedulerWaitMs=501,reqDeserMs=1,totalExecMs=72,resSerMs=0,totalTimeMs=574,minConsumingFreshnessMs=-1,broker=Broker_pinot-offline-broker-1.pinot-offline-broker-headless.de-nrt-pinot.svc.cluster.local_8099,numDocsScanned=17615,scanInFilter=0,scanPostFilter=1092130,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=0/0/0/0
    
    Slow query: request handler processing time: 574, send response latency: 0, total time to handle request: 574
    and seeing query latency in seconds on the client side. neither broker nor server cpu is more than 10-15%. what should i look at to try and debug slow query performance?
  • Kishore G
    10/28/2022, 7:50 AM
    do you have presto/trino or is the response too big?
  • Priyank Bagrecha
    10/28/2022, 7:51 AM
    i am issuing queries via http calls to broker sql endpoint
  • Priyank Bagrecha
    10/28/2022, 7:51 AM
    "select count(*) from offlinebookingwide_main where hour > 14"
  • Priyank Bagrecha
    10/28/2022, 7:51 AM
    query is literally that