# troubleshooting

    Grace Lu

    02/07/2022, 8:33 PM
    Hi team, I ran into a weird issue when querying Pinot with Trino. I am on Pinot 0.10.0 (I tried switching to 0.9.2 and 0.8.0 and the error remains) and Trino v369. I have a table called
    usage_test
    which shows a good state in Pinot and is queryable through the Pinot console, but when I run a simple select query in Trino, like
    select * from pinot_1.default.usage_test limit 10
    , the query fails with an error like
    null value in entry: Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098=null
    , whereas aggregation queries like
    select count(*), hostname from pinot_1.default.usage_test group by hostname
    succeed in Trino without issue. Does anyone have a clue about this? Thanks in advance! 🙏

    Sergii Balganbaiev

    02/07/2022, 8:36 PM
    Hi, it seems like there is a regression bug when executing queries under load. I created an issue about it: https://github.com/apache/pinot/issues/8156

    Luis Fernandez

    02/07/2022, 10:22 PM
    does anyone have experience setting up the Pinot managed offline flows? I have a few questions… 1. is this the same as moving completed realtime segments to offline, and is this config required for it to work:
    },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant",
        "tagOverrideConfig": {
          "realtimeCompleted": "DefaultTenant_OFFLINE"
        }
      }
    2. is there anywhere I can see a log confirming this is in fact working? I have set up the configs but I'm unsure how to tell whether it's doing what it's supposed to do. 3. the documentation is a little misleading given the recent updates to Pinot, and different examples do different things that aren't explained in the docs
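For what it's worth, my understanding from the docs is that the tag override shown above only controls which tenant completed segments land on; the managed offline flow itself is driven by the RealtimeToOfflineSegmentsTask, configured in the realtime table config and executed by a minion. A minimal sketch of that table-config fragment — the period values here are illustrative assumptions, not a verified config:

```json
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "2d"
    }
  }
}
```

Task scheduling and completion should then show up in the controller and minion logs, which may also help with question 2.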

    Ravishankar Nair

    02/08/2022, 4:54 PM
    2022/02/08 11:48:43.990 INFO [SecurityManager] [main] SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ravishankar); groups with view permissions: Set(); users with modify permissions: Set(ravishankar); groups with modify permissions: Set()
    Exception in thread "main" java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main([Ljava.lang.String;)
        at java.base/java.lang.Class.getMethod(Class.java:2108)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:42)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    2022/02/08 11:48:44.084 INFO [ShutdownHookManager] [Thread-0] Shutdown hook called

    Shadab Anwar

    02/09/2022, 10:01 AM
    I am trying to set up the Pinot deep store. The controller config is:
    controller.data.dir=s3://shipment-pinot-dev/pinot-data/pinot-s3-dev/controller-data
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-west-2
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.controller.storage.factory.s3.accessKey=************************
    pinot.controller.storage.factory.s3.secretKey=***************************
    The server config is:
    pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.storage.factory.s3.region=us-west-2
    pinot.server.segment.fetcher.protocols=file,http,s3
    pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.server.storage.factory.s3.accessKey=*****************************
    pinot.server.storage.factory.s3.secretKey=2q****************************
    I restarted my controller, and I have some tables created that contain data, but I see no data being pushed to S3 and no logs related to this. Is there anything I need to add?
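One property that appears in the S3 deep-store examples in the Pinot docs but not in the list above is a local temp dir for the controller. A hedged sketch, in case it helps (the path is an arbitrary assumption):

```
controller.local.temp.dir=/tmp/pinot-controller-tmp
```

Also note that for realtime tables, segments reach the deep store only when they complete and commit, so a table that only has CONSUMING segments so far may legitimately show nothing in S3 yet.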

    Ayush Kumar Jha

    02/09/2022, 11:44 AM
    Hi, can anyone help me set up alerts for when any Pinot node goes down, based on the Prometheus metrics that the Pinot components emit, as well as for any query exceptions?
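For the node-down part, a generic Prometheus alerting rule on the scrape target's built-in `up` metric works regardless of which Pinot metrics are emitted. A sketch — the `job` label and thresholds are assumptions about your scrape config, not from the thread:

```yaml
groups:
  - name: pinot
    rules:
      - alert: PinotNodeDown
        # up{...} == 0 means Prometheus failed to scrape the target
        expr: up{job="pinot"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Pinot instance {{ $labels.instance }} is down"
```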

    Sandeep R

    02/09/2022, 11:04 PM
    Hi, any idea why a simple count is failing to print output, and why the screen closes automatically?
    select count(*) from uapi-testing;

    Awadesh Kumar

    02/10/2022, 7:11 AM
    Hi Team, we are onboarding onto Pinot. We faced a strange issue where two tables out of 4 in Pinot got deleted for an unknown reason. We might have done something wrong. Any idea about the possible reasons for table deletion? Also, are there any logs I can refer to to check this deletion operation? Any help would be appreciated. Thanks

    Anish Nair

    02/10/2022, 8:25 AM
    Hey guys, regarding presto-pinot: I'm trying to run the following query:
    select b.entity_name, sum(metric1) metric1
    from pinot_hybrid_table a
    join dim_test_table b
    ON a.entity_id = b.entity_id
    WHERE a.time_column = '2022020800'
    group by b.entity_name;
    and I'm getting the following error on the Presto CLI: Query 20220210_075959_00003_9hwvs failed: Unsupported data table version: 3. Can someone help?

    Grace Walkuski

    02/10/2022, 4:57 PM
    Hello! What does it mean when
    -Infinity
    is returned?

    Anand Sainath

    02/10/2022, 5:57 PM
    Hey folks, excited to try out Pinot. I'm having trouble understanding how I can ingest data (specifically Parquet data). I'm trying out a POC where I bring up a vanilla Pinot cluster (based on these docs) and ingest a single Parquet file using the ingestFromFile API.

    Luis Fernandez

    02/10/2022, 8:48 PM
    hey friends, I have a question regarding offset consumption in Pinot. We use the low-level consumer setup. If we, say, disabled the table for some time, would Pinot resume from where it left off? How do the
    smallest
    and
    largest
    options affect Pinot? And if we needed to reprocess some data that was corrupted, would the better route be to go through an offline table and somehow sync it from another source, or are there any capabilities for realtime tables around this?

    James Mnatzaganian

    02/10/2022, 9:03 PM
    Similar to an above thread - I'm trying to convert Parquet to a segment. The data is in S3. I'm using Amazon EMR 6.4, which has Spark 3.1.2 (note: not OSS Spark, but an Amazon fork), and EMR runs Java 8. I built Pinot 0.9.3 from source to target Java 8:
    mvn install package -DskipTests -Pbin-dist -Djdk.version=8
    . After building, I took the contents of
    pinot-distribution/target/apache-pinot-0.9.3-bin.tar.gz
    and put them on HDFS (enabling all nodes to have access to the jars). I've been following this doc for general guidance. Running
    spark-submit
    results in
    Exception in thread "main" java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main
    . Any advice? Is there a way to validate that my build is valid? My current thought is that it's either a bad build or I need to push the jars to each node and reference them locally instead of through HDFS.

    troywinter

    02/11/2022, 4:20 AM
    Will the Presto Pinot connector not get updates anymore? We are trying to switch from Presto to Trino, but some SQL queries use much more memory than on Presto. This one, for example, costs only 140M of memory on Presto, but on Trino it OOMs.
    WITH tmp_table_vmrss_inc_steps AS (
    		WITH t1 AS (
    				SELECT device_sn, event_time, ts, vmrss
    					, row_number() OVER (PARTITION BY device_sn, event_time ORDER BY ts ASC) AS rank
    				FROM (
    					SELECT format_datetime(from_unixtime((ts - ts % (60 * 60 * 1000)) / 1000 + 8 * 60 * 60), 'yyyy-MM-dd HH:mm:00') AS event_time
    						, ts, device_sn, vmrss
    					FROM pad_streamer_track_info
    					WHERE ts >= 1644494400000
    						AND ts < 1644498000000
    				) t
    			), 
    			t2 AS (
    				SELECT device_sn, event_time, ts, vmrss
    					, rank - 1 AS rank
    				FROM t1
    				WHERE rank > 1
    			)
    		SELECT device_sn, event_time
    			, sum(CASE 
    				WHEN diff > 0 THEN 1
    				ELSE 0
    			END) AS vmrss_inc_steps
    		FROM (
    			SELECT t2.ts, t2.device_sn, t2.event_time, t2.vmrss - t1.vmrss AS diff
    			FROM t2
    				LEFT JOIN t1
    				ON t1.device_sn = t2.device_sn
    					AND t1.event_time = t2.event_time
    					AND t2.rank = t1.rank
    		) a
    		GROUP BY device_sn, event_time
    	), 
    	tmp_table_device_report_num AS (
    		SELECT device_sn, event_time, count(vmrss) AS device_report_num
    		FROM (
    			SELECT format_datetime(from_unixtime((ts - ts % (60 * 60 * 1000)) / 1000 + 8 * 60 * 60), 'yyyy-MM-dd HH:mm:00') AS event_time
    				, ts, device_sn, vmrss
    			FROM pad_streamer_track_info
    			WHERE ts >= 1644494400000
    				AND ts < 1644498000000
    		) t
    		GROUP BY device_sn, event_time
    	), 
    	all_table AS (
    		SELECT device_sn, event_time
    		FROM (
    			SELECT device_sn, event_time
    			FROM tmp_table_vmrss_inc_steps
    			UNION ALL
    			SELECT device_sn, event_time
    			FROM tmp_table_device_report_num
    		)
    		GROUP BY device_sn, event_time
    	)
    SELECT all_table.device_sn, all_table.event_time, coalesce(tmp_table_vmrss_inc_steps.vmrss_inc_steps, 0) AS vmrss_inc_steps
    	, coalesce(tmp_table_device_report_num.device_report_num, 0) AS device_report_num
    FROM all_table
    	LEFT JOIN tmp_table_vmrss_inc_steps
    	ON all_table.device_sn = tmp_table_vmrss_inc_steps.device_sn
    		AND all_table.event_time = tmp_table_vmrss_inc_steps.event_time
    	LEFT JOIN tmp_table_device_report_num
    	ON all_table.device_sn = tmp_table_device_report_num.device_sn
    		AND all_table.event_time = tmp_table_device_report_num.event_time
    LIMIT 1000

    Shivam Sajwan

    02/11/2022, 7:41 AM
    does anyone know how to change the default values of dimension field specs in Pinot?

    Diana Arnos

    02/11/2022, 10:42 AM
    Has anyone ever had trouble building the Docker images through the scripts as explained here in the docs? Every time I try, it fails while still downloading Maven packages, but it does not give me a reason. And it always fails at a different moment - so it's not a problem with a specific repo or package. I suspect the problem lies in my own laptop, but I can't figure out exactly what it is. Example from the last time I ran `docker-build.sh`:
    #13 256.9 Downloading from central: <https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-all/2.4.21/groovy-all-2.4.21.jar>
    Progress (4): 0.8/1.4 MB | 349 kB | 426/588 kB | 565/632 kB
    #13 257.0 [output clipped, log limit 1MiB reached]
    ------
    executor failed running [/bin/sh -c git clone ${PINOT_GIT_URL} ${PINOT_BUILD_DIR} &&     cd ${PINOT_BUILD_DIR} &&     git checkout ${PINOT_BRANCH} &&     mvn install package -DskipTests -Pbin-dist -Pbuild-shaded-jar -Dkafka.version=${KAFKA_VERSION} -Djdk.version=${JDK_VERSION} &&     mkdir -p ${PINOT_HOME}/configs &&     mkdir -p ${PINOT_HOME}/data &&     cp -r pinot-distribution/target/apache-pinot-*-bin/apache-pinot-*-bin/* ${PINOT_HOME}/. &&     chmod +x ${PINOT_HOME}/bin/*.sh]: exit code: 1

    Anand Sainath

    02/14/2022, 6:02 AM
    Is there a way to only select the first value of a multi-valued column?

    Anand Sainath

    02/14/2022, 6:44 AM
    I have a query where I'm bucketing by the values in a column. I want to calculate the percentage of rows that contain a given value for that column, compared to all the rows. Essentially, I want to do something like:
    SELECT Count(*), SUM(COUNT(*)) FROM ... GROUP BY COL_1
    The above syntax seems to be invalid for Pinot/Calcite. What's the best way to express this?
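Since that nested aggregate is rejected, one workaround (a sketch of client-side post-processing, not a Pinot feature) is to run the plain GROUP BY count query and derive the percentages from its result rows:

```python
# Rows as they might come back from:
#   SELECT COL_1, COUNT(*) FROM ... GROUP BY COL_1
# (the values here are made up for illustration)
group_counts = {"a": 30, "b": 50, "c": 20}

# Total over all groups, then each group's share of that total.
total = sum(group_counts.values())
percentages = {k: 100.0 * v / total for k, v in group_counts.items()}
```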

    Ali Atıl

    02/14/2022, 1:11 PM
    Hello everyone, Does Pinot store pre-aggregated results (for
    sum
    ,
    max
    ,
    min
    , etc.) for each column defined in metricFieldSpecs for offline table segments? Can I achieve pre-aggregation for a single column without a star-tree index?
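As a mental model only (this sketches the general idea of metric pre-aggregation, not Pinot's actual segment code): rows that share identical dimension values are collapsed into one row, with the metric column combined by its aggregation function:

```python
from collections import defaultdict

# Toy rollup: collapse rows with the same dimension key, summing the metric.
rows = [
    ("us", "web", 10),  # (country, channel, clicks)
    ("us", "web", 5),
    ("eu", "app", 7),
]
rolled = defaultdict(int)
for country, channel, clicks in rows:
    rolled[(country, channel)] += clicks
# rolled now holds one row per distinct (country, channel) pair
```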

    Diana Arnos

    02/14/2022, 2:26 PM
    Aaaand here I am again 😄 While deploying to staging, we found a problem that blocks a helm chart upgrade - this is bad because it means we have to uninstall and re-install the deployment every time we want to upgrade things on k8s. It looks like someone already opened a PR to fix this, but it never got merged. Can someone take a look at it, or tell us how we can help? https://github.com/apache/pinot/pull/7177

    Ayush Kumar Jha

    02/14/2022, 4:12 PM
    Hello everyone, I am not able to see metrics data like
    Partial Responses
    in the Prometheus metrics that are getting emitted. Can anyone tell me what I am doing wrong here?

    Prashant Korade

    02/14/2022, 4:58 PM
    Hey Team, I need some guidance on rebalancing realtime servers. We are trying some operational test cases in a dev environment. We have a 4-server cluster with one realtime table consuming from a Kafka topic that has 4 partitions; the table has replication factor 2. Now say that, as part of a test case, one of our servers goes down (we stopped it for testing purposes) and we are not able to restart it, but we can add one additional server to the cluster. The table goes into a BAD state (External View != Ideal State). How can we rebalance the realtime table, i.e. can we move completed segments to the newly added server to rebalance it (changing the Ideal State to include the new server and drop the dead one)? And how can we recreate the CONSUMING segment (I hope it will be recreated from the checkpoint saved as part of the replica)? I'd appreciate any suggestions on rebalancing realtime servers/tables.
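The operation described maps to the controller's table rebalance endpoint. A sketch of building that request — the host, port, and table name are placeholders, not from the thread, and this only constructs the URL rather than calling a live cluster:

```python
from urllib.parse import urlencode

# Query parameters for POST /tables/{tableName}/rebalance
params = urlencode({
    "type": "REALTIME",
    "dryRun": "false",
    "includeConsuming": "true",  # also reassign CONSUMING segments
    "downtime": "false",         # keep serving while segments move
    "minAvailableReplicas": 1,   # replicas to keep up during the move
})
url = f"http://controller:9000/tables/myTable/rebalance?{params}"
# POST this URL (e.g. requests.post(url)) to trigger the rebalance
```

Running with dryRun=true first shows the target assignment without changing anything.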

    James Mnatzaganian

    02/14/2022, 8:05 PM
    Some of my snappy-compressed Parquet files are failing to convert to segments. It seems to succeed on most of the files, but some hit a null pointer exception. Any ideas on what could cause this? I'm not sure if this is a bug, an issue with my underlying data, or an issue with my schema. The stack trace is below:
    Failed to generate Pinot segment for file - <s3://REDACTED.snappy.parquet>
    java.lang.NullPointerException: null
    	at shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
    	at org.apache.pinot.segment.local.utils.CrcUtils.getAllNormalFiles(CrcUtils.java:63) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
    	at org.apache.pinot.segment.local.utils.CrcUtils.forAllFilesInFolder(CrcUtils.java:52) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
    	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:314) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
    	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
    	at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:119) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
    	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.9.3-shaded.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    	at java.lang.Thread.run(Thread.java:833) [?:?]

    Jagannath Timma

    02/15/2022, 10:13 PM
    From what I understand, Pinot does not have the ability to recover from this condition by itself.

    Peter Pringle

    02/16/2022, 11:18 AM
    Any guidance on how to fix a missing segment? The table shows up as bad, and the debug endpoint gives this error: "Did not get any response from servers for segment:". The data is coming in realtime from Kafka, so I assume there must be a way to reprocess this segment.

    Jacob M

    02/16/2022, 7:45 PM
    hi! i am trying to store a column with a serialized HLL. i wrote a small spark app for an offline table and i'm using
    com.clearspring.analytics
    and generating my HLL like this:
    val hll = new HyperLogLog(12)
    hll.offer(rows.filter(p => p.customer.isDefined).map(p => p.customer.get))
    val serializedHll = ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hll)
    it seems to ~work ok but when i generate segments i'm getting some errors that I'm not really sure how to debug. any thoughts?
    java.lang.RuntimeException: Caught exception while de-serializing HyperLogLog
    Suppressed: java.lang.NullPointerException
        at org.apache.pinot.segment.local.startree.v2.builder.OffHeapSingleTreeBuilder.close(OffHeapSingleTreeBuilder.java:346)
        at org.apache.pinot.segment.local.startree.v2.builder.MultipleTreesBuilder.build(MultipleTreesBuilder.java:143)
    and when i look at where the null pointer is coming from, it seems to be from
    _starTreeRecordBuffer.close()
    which seems odd?

    Sumit Lakra

    02/17/2022, 5:43 AM
    Hi! I want to understand how Pinot decides whether to use hostnames or IP addresses in a component's name.

    kaivalya apte

    02/17/2022, 10:08 AM
    Hey 👋 , I am new to Apache Pinot and playing around with it atm. I found that listing the consumer groups on my Kafka broker doesn't show the Pinot consumer. The
    group.id
    I see in the pinot server logs is
    null
    . Is there a config I could set to have an assigned group id? I tried setting
    stream.kafka.consumer.prop.group.id
    on the
    streamConfigs
    , but it didn’t work. Thanks

    kaivalya apte

    02/17/2022, 10:25 AM
    Another question. I keep getting the following error while querying.
    [
      {
        "message": "null:\n6 segments [pinotemails__72__62__20220217T0806Z, pinotemails__89__59__20220217T0716Z, pinotemails__38__61__20220217T0817Z, pinotemails__4__61__20220217T0817Z, pinotemails__21__62__20220217T0737Z, pinotemails__55__63__20220217T0811Z] unavailable",
        "errorCode": 305
      }
    ]
    I could see the segment from the rebalance API mapped to 2 servers. Any pointers?

    kaivalya apte

    02/17/2022, 11:21 AM
    I have a question around
    DISTINCTCOUNT
    : I have a use case where I want a distinct count based on a column with very high cardinality - 900 million rows, of which 890 million might be distinct. The
    DISTINCTCOUNT
    function fails because of OOM. I see that
    DISTINCTCOUNT
    is implemented using a HashSet, which loads all the distinct values into memory. I was thinking a bloom filter with a low false-positive rate might help here. Thoughts?
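On the memory point: fixed-size sketches are the trade Pinot's approximate functions (e.g. DISTINCTCOUNTHLL) make. As a toy illustration of the idea only — this uses linear counting rather than HLL, with arbitrary sizes — memory stays at m slots no matter how many values stream through:

```python
import hashlib
import math

def linear_counting(values, m=4096):
    """Estimate the distinct count with a fixed m-slot bitmap
    instead of materializing every value in a HashSet."""
    bitmap = bytearray(m)
    for v in values:
        slot = int(hashlib.md5(str(v).encode()).hexdigest(), 16) % m
        bitmap[slot] = 1
    zeros = m - sum(bitmap)
    if zeros == 0:
        return float("inf")  # bitmap saturated; need a larger m (or HLL)
    # Linear-counting estimator: -m * ln(fraction of empty slots)
    return -m * math.log(zeros / m)

estimate = linear_counting(range(1000))  # close to 1000, within a few percent
```

A bloom filter could be used the same way (count insertions that flip at least one bit), but it has the same exactness trade-off, which is why the sketch-based aggregations are usually the practical answer at this cardinality.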