# troubleshooting
  • Kishore G (08/10/2020, 4:30 PM)
    https://docs.pinot.apache.org/basics/components/broker
  • Kishore G (08/10/2020, 4:30 PM)
    any time a segment is pushed the maxTime is re-calculated
  • Kishore G (08/10/2020, 4:31 PM)
    then the broker rewrites the query into two different queries
  • Kishore G (08/10/2020, 4:32 PM)
    select sum(metric) from table_REALTIME where date >= time_boundary
    select sum(metric) from table_OFFLINE where date < time_boundary
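    To make the split concrete, here is a minimal Python sketch of the rewrite described above (illustrative only; the broker's actual rewrite logic is in Java and handles far more than this):

    ```python
    def split_hybrid_query(table, time_boundary):
        # Rewrite one logical query into the two physical-table queries,
        # mirroring the REALTIME/OFFLINE split around the time boundary.
        realtime = (f"select sum(metric) from {table}_REALTIME "
                    f"where date >= {time_boundary}")
        offline = (f"select sum(metric) from {table}_OFFLINE "
                   f"where date < {time_boundary}")
        return realtime, offline

    rt, off = split_hybrid_query("table", 1597104000000)
    print(rt)
    print(off)
    ```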
  • Dan Hill (08/10/2020, 5:06 PM)
    Ah sorry. Let's say I already have offline data for a previous date and I onboard a new customer who wants to backfill for that date range. Without modifying the segment name structure in Pinot, what happens if I run a data ingestion job for that date?
  • Neha Pawar (08/10/2020, 5:09 PM)
    E.g. after the whole batch ingestion job completes? Or is it after each segment gets uploaded?
    - you will see new data after each segment gets uploaded.
  • Dan Hill (08/10/2020, 5:25 PM)
    Cool. I'm assuming this can cause temporary removal or duplicate metric data.
  • Neha Pawar (08/10/2020, 5:27 PM)
    yes. the only way to avoid that for now is to push a single segment, but that may not always be practical
  • Kishore G (08/10/2020, 6:00 PM)
    another option is to partition the data based on customerId so that impact is minimal
  • Kishore G (08/10/2020, 6:01 PM)
    you can go as far as creating a segment per day per customer but that will result in a big skew
  • Dan Hill (08/10/2020, 7:18 PM)
    Yea. I figured that's something we could do longer term. I was also curious about having other virtual tables that reference most of the same segments but can be used as a transition
  • Pradeep (08/11/2020, 1:17 AM)
    QQ, is there a way to optimize/improve the queries of following format (basically time series queries)?
    SELECT DATETIMECONVERT(timestampMillis, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:HOURS'), count(*) as count_0 
    FROM table 
    WHERE   timestampMillis < 1597104168752 and <some filters>
    GROUP BY DATETIMECONVERT(timestampMillis, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:HOURS') 
    ORDER BY count(*) desc
    LIMIT 100
    numEntriesScannedInFilter: 18325029
    numEntriesScannedPostFilter: 10665158
    I guess caching on the client side is a simple way for us to decrease the latency; wondering if there are any alternatives.
  • Kishore G (08/11/2020, 2:20 AM)
    How long does it take?
  • Kishore G (08/11/2020, 2:23 AM)
    You can create another column for hour and add star tree if you want something really fast
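    A sketch of what that extra hour column could look like as an ingestion transform in the table config (the column name hourBucket is invented; toEpochHours is a Pinot scalar function, but verify the key names against the docs for your version, since in some older versions the transform was declared in the schema instead):

    ```json
    {
      "ingestionConfig": {
        "transformConfigs": [
          {
            "columnName": "hourBucket",
            "transformFunction": "toEpochHours(timestampMillis)"
          }
        ]
      }
    }
    ```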
  • Pradeep (08/11/2020, 3:12 AM)
    it takes close to 30 seconds for a week's worth of data, but I am being a bit miserly on CPU (doubling it to 16 cores cut the latency in half). Yeah, will probably add an additional column or cache on the client side.
  • Pradeep (08/11/2020, 3:12 AM)
    thanks
  • Kishore G (08/11/2020, 3:15 AM)
    A range index on the time column might also help, but not sure how much, since the data is already time partitioned
  • Pradeep (08/11/2020, 3:26 AM)
    yeah, true, star tree with (date, hour) might be easier and faster it seems
  • Xiang Fu (08/11/2020, 10:03 AM)
    how many groups do you have here?
  • Pradeep (08/11/2020, 4:31 PM)
    For time range queries should be less than ~1000, for other group by queries ~10000
  • Elon (08/11/2020, 7:22 PM)
    qq - are there any performance differences between dimensionFields and metricFields? We had a user create a table with only dimensionFields and star tree indexes, etc. and didn't see any problems. Should the fields that were aggregated on (i.e. sum__<column>) have been metric fields?
  • Pradeep (08/14/2020, 6:43 PM)
    QQ on star tree index: does specifying dimensionsSplitOrder alone limit the tree to the dimensions in that list, or would we also have to specify “skipStarNodeCreationForDimensions” to keep other dimensions from getting included in the star tree?
  • Kishore G (08/14/2020, 6:47 PM)
    thats right
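    For reference, a star-tree definition along those lines might look like the sketch below (dimension names are invented, and the key names follow the star-tree index docs, so worth double-checking for your version):

    ```json
    "tableIndexConfig": {
      "starTreeIndexConfigs": [
        {
          "dimensionsSplitOrder": ["customerId", "hourBucket"],
          "skipStarNodeCreationForDimensions": [],
          "functionColumnPairs": ["COUNT__*"],
          "maxLeafRecords": 10000
        }
      ]
    }
    ```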
  • Pradeep (08/14/2020, 6:51 PM)
    got it, thanks
  • Oguzhan Mangir (08/14/2020, 7:22 PM)
    In batch ingestion, the table is created automatically if it does not exist, right?
  • Xiang Fu (08/14/2020, 7:25 PM)
    no
  • Laxman Ch (08/17/2020, 4:10 PM)
    Hey folks. Facing some issues while trying to enable Pinot controller HA. Earlier, multiple controllers were running, but they were using a local directory. Now we have changed this to a GCS path. After this, we notice segments are getting uploaded without any issues. However, we see errors while trying to download these segments from GCS on demand.
  • Mayank (08/17/2020, 4:11 PM)
    what's the segment download url you see in the segmentZkMetadata?
  • Laxman Ch (08/17/2020, 4:11 PM)
    I see http url
  • Laxman Ch (08/17/2020, 4:12 PM)
    2020/08/17 15:49:55.661 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONLINE for resource: rawServiceView_REALTIME, partition: rawServiceView__3__31__20200815T1540Z
    java.lang.RuntimeException: org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 3 attempts
    	at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.downloadAndReplaceSegment(RealtimeTableDataManager.java:285) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:251) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:132) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) [pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source) ~[?:?]
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_265]
    	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_265]
    	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) [pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_265]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
    Caused by: org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 3 attempts
    	at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:61) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.pinot.common.utils.fetcher.HttpSegmentFetcher.fetchSegmentToLocal(HttpSegmentFetcher.java:40) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:108) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:116) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.downloadAndReplaceSegment(RealtimeTableDataManager.java:277) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-889889e2020f0fcbd2ef316b7fd7fe3eb985c65a]
    	... 14 more
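    For comparison, when the deep store is on GCS, the controller (and servers) generally need the GCS filesystem plugin wired in so gs:// URIs can be resolved and fetched; below is a hedged sketch of controller-side properties (the bucket, project, and key path are placeholders, and the exact property names should be verified against the Pinot GCS plugin docs for your build):

    ```properties
    controller.data.dir=gs://my-pinot-bucket/controller-data
    pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
    pinot.controller.storage.factory.gs.projectId=my-gcp-project
    pinot.controller.storage.factory.gs.gcpKey=/path/to/credentials.json
    pinot.controller.segment.fetcher.protocols=file,http,gs
    pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    ```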