# troubleshooting
  • k

    Kishore G

    09/22/2020, 7:52 PM
    Are you looking at the JVM graph or the system memory graph?
  • s

    Subbu Subramaniam

    09/22/2020, 8:01 PM
    what is the retention of your table? By default, completed segments are kept on the same servers that consumed them. Is this a hybrid or realtime-only table? It will be useful to run the
    RealtimeProvisioningHelper
    to get an idea of memory usage
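    A rough invocation looks something like the following; exact flag names vary between Pinot versions, so treat the options as illustrative:
        # estimate per-host memory for the realtime table across a few host counts and consumption durations
        bin/pinot-admin.sh RealtimeProvisioningHelper \
          -tableConfigFile /path/to/myTable_REALTIME.json \
          -numPartitions 8 \
          -numHosts 2,4,6 \
          -numHours 2,6,12 \
          -sampleCompletedSegmentDir /path/to/one/completed/segment
    It prints a matrix of estimated memory usage per host, which helps pick a host count and consumption duration.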
  • p

    Pradeep

    09/22/2020, 8:03 PM
    This is the memory usage graph, I am looking at system memory (no other active processes live on the system)
  • p

    Pradeep

    09/22/2020, 8:04 PM
    retention is set to more than 30 days I believe, it's a hybrid table
  • p

    Pradeep

    09/22/2020, 8:04 PM
    I can try running that
  • s

    Subbu Subramaniam

    09/22/2020, 8:06 PM
    If it is a hybrid table with a frequent push on the offline side, your retention for the realtime table should be short, e.g., 5 days for a daily offline push
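    In the realtime table's segmentsConfig that is roughly (values illustrative):
        "segmentsConfig": {
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "5",
          ...
        }
    so realtime segments older than that get dropped, on the assumption that the daily offline push already covers them.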
  • k

    Kishore G

    09/22/2020, 8:08 PM
    that's expected and is the right thing, the OS is pretty good at managing the system memory
  • p

    Pradeep

    09/22/2020, 8:17 PM
    I did observe query latencies going up when the system memory usage is high. I believe if I use off-heap memory then the page cache should be cleared up once the mmapped memory backing the realtime segments is deleted. Let me try that and see (don't want to touch the system now, will try this in off-peak hours)
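    If I read the docs right, that means setting the server config along these lines (property name from memory, worth double-checking for our version):
        pinot.server.instance.realtime.alloc.offheap=true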
  • k

    Kishore G

    09/22/2020, 8:19 PM
    can you do lsof on the realtime process
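    e.g. something like (pid and data dir are placeholders):
        # list files the Pinot server process has open / mmapped
        lsof -p <pinot-server-pid>
        # or just count segment files under the data dir
        lsof -p <pinot-server-pid> | grep <dataDir> | wc -l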
  • y

    Yash Agarwal

    09/23/2020, 7:22 AM
    Is there a way to configure which servers should be part of a single replica group? Or will Pinot assign them randomly?
  • x

    Xiang Fu

    09/23/2020, 7:31 AM
    it should be random
  • s

    Subbu Subramaniam

    09/23/2020, 5:20 PM
    @Yash Agarwal you should be able to create a znode with specific assignments of each replica group if desired. @Jackie have we documented this?
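    As a rough sketch, that znode maps each partition/replica-group pair to an explicit list of servers, along these lines (field names from memory, please verify against your cluster):
        {
          "instancePartitionsName": "myTable_OFFLINE",
          "partitionToInstancesMap": {
            "0_0": ["Server_host-a_8098", "Server_host-b_8098"],
            "0_1": ["Server_host-c_8098", "Server_host-d_8098"]
          }
        }
    where each key is <partitionId>_<replicaGroupId>.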
    👍 1
  • p

    Pradeep

    09/30/2020, 12:12 AM
    @Neha Pawar wondering if you know what's going on here, Jackie referred me to you. I have a segment which has been in
    consuming
    state for close to 20h. From zk metadata:
        "segment.creation.time": "1601350869326",
    But my table config has the segment rolling config as:
        "realtime.segment.flush.threshold.size": "0",
        "realtime.segment.flush.threshold.time": "2h",
        "realtime.segment.flush.desired.size": "500M",
  • p

    Pradeep

    09/30/2020, 5:47 PM
    Just a note on the star-tree documentation: the supported functions list DISTINCT_COUNT_HLL, but it seems the correct way to specify it is DISTINCTCOUNTHLL__<colname>
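    i.e. in the star-tree config it ends up looking roughly like this (column names are just examples):
        "starTreeIndexConfigs": [
          {
            "dimensionsSplitOrder": ["colA"],
            "functionColumnPairs": ["DISTINCTCOUNTHLL__colB"],
            "maxLeafRecords": 10000
          }
        ]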
    👍 1
  • j

    Jackie

    09/30/2020, 5:50 PM
    @Pradeep Good point, we should also support
    DISTINCT_COUNT_HLL__<colname>
  • p

    Pradeep

    09/30/2020, 5:51 PM
    I tried that but it threw an exception
  • j

    Jackie

    09/30/2020, 5:51 PM
    Yeah, will submit a fix for that
  • p

    Pradeep

    09/30/2020, 5:55 PM
    thanks
  • j

    Jackie

    09/30/2020, 6:03 PM
    @Pradeep Here is the fix: https://github.com/apache/incubator-pinot/pull/6079. Once it's merged, it should accept both formats
    👍 1
  • n

    Neha Pawar

    10/01/2020, 1:13 AM
    @Chinmay Soman ^^
  • p

    Pradeep

    10/01/2020, 5:31 PM
    Hi, I am trying to optimize a query of the format:
        select colA, distinctCountHll(colB)
        from table
        where timestamp > X
        group by colA
    We added a star-tree with dimensionsSplitOrder: ["colA"] and aggregation function DistinctCountHLL__colB. I am not seeing much query-time improvement; comparing against an aggregation grouped by colC, which is not part of the star-tree, I see very similar times. I see that the star-tree index is getting generated. Wondering if I am missing something?
  • k

    Kishore G

    10/01/2020, 5:41 PM
    @Pradeep output stats?
  • p

    Pradeep

    10/01/2020, 5:43 PM
    timeMs=8541,docs=361440597/16479407613,entries=336003488/722881194,segments(queried/processed/matched/consuming/unavailable):5229/227/227/8/0,consumingFreshnessTimeMs=1601574112264,servers=4/4,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);172.31.17.90_R=1,8536,68556,0;172.31.30.139_O=1,3,372,0;172.31.34.149_O=1,4,372,0;172.31.24.127_R=1,8470,69174,0,
  • p

    Pradeep

    10/01/2020, 5:44 PM
    2 servers have old data, so they don't match anything
  • k

    Kishore G

    10/01/2020, 5:44 PM
    it's still scanning a lot
  • p

    Pradeep

    10/01/2020, 5:44 PM
    yeah
  • k

    Kishore G

    10/01/2020, 5:44 PM
    @Jackie ^^
  • y

    Yash Agarwal

    10/01/2020, 5:45 PM
    Also, I think timestamp should be part of the dimensionsSplitOrder, right?
  • j

    Jackie

    10/01/2020, 5:46 PM
    Yes, Yash has the answer
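    For the star-tree to be used, every column the query filters or groups on has to be in the split order, so roughly (a sketch, not the exact config):
        "starTreeIndexConfigs": [
          {
            "dimensionsSplitOrder": ["colA", "timestamp"],
            "functionColumnPairs": ["DISTINCTCOUNTHLL__colB"]
          }
        ]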
  • j

    Jackie

    10/01/2020, 5:47 PM
    Can you try removing the filter on
    timestamp
    and see the latency?