# general
  • c

    Chinmay Soman

    02/25/2019, 5:49 PM
    that makes sense
  • c

    Chinmay Soman

    02/25/2019, 5:49 PM
    but just to close the loop on that one: if the query hits columns that are common across such segments, then there shouldn't be an issue?
  • c

    Chinmay Soman

    02/25/2019, 5:50 PM
    since the 2 blocks should have the same schema at that point
  • m

    Mayank

    02/25/2019, 5:50 PM
    I believe so
  • c

    Chinmay Soman

    02/25/2019, 5:50 PM
    ok great. That answers it. Thanks a lot Mayank !
  • c

    Chinmay Soman

    02/25/2019, 5:50 PM
    I'll try it out
  • m

    Mayank

    02/25/2019, 5:50 PM
    Sure. Let me know if you find anything contrary to what I mentioned.
  • c

    Chinmay Soman

    02/25/2019, 5:51 PM
    but on a related note, somehow we got a lot of timeout exceptions:
    Copy code
    2019-02-25 07:53:17 ERROR CombineOperator:172 - Caught TimeoutException
    java.util.concurrent.TimeoutException
    ...
    Not sure if it's related, but this actually caused a perf hit on the servers. Will update with more info as I find it.
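For context on where a `java.util.concurrent.TimeoutException` like the one in this log usually comes from: a combine step that fans per-segment work out to a thread pool and then waits with a deadline gets this exception from `Future.get(timeout, unit)` whenever a task runs too long, which would fit the "expensive query" explanation. The sketch below is a generic illustration using only `java.util.concurrent`; the class and method names are hypothetical, and this is not Pinot's `CombineOperator` code.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not Pinot's CombineOperator.
class CombineTimeoutDemo {
  // Runs one simulated per-segment task and waits at most timeoutMillis for it.
  static String runWithDeadline(long workMillis, long timeoutMillis) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Future<String> task = pool.submit(() -> {
      Thread.sleep(workMillis); // stands in for expensive per-segment query work
      return "done";
    });
    try {
      return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      task.cancel(true); // request interruption of the slow task
      return "timed out";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return "interrupted";
    } catch (ExecutionException e) {
      return "failed: " + e.getCause();
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(runWithDeadline(10, 1000)); // fast task finishes within the deadline
    System.out.println(runWithDeadline(2000, 50)); // slow task exceeds the deadline
  }
}
```

Note that timing out the wait does not by itself stop the work: `cancel(true)` only interrupts the task, and CPU-bound code that never checks for interruption keeps running. That is one generic way a burst of timeouts can show up as extra load on the servers.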
  • c

    Chinmay Soman

    02/25/2019, 5:52 PM
    from the code, it looks like a schema incompatibility should fail the query (not time out)
  • m

    Mayank

    02/25/2019, 5:52 PM
    I guess expensive query?
  • m

    Mayank

    02/25/2019, 5:52 PM
    Those might be unrelated
  • c

    Chinmay Soman

    02/25/2019, 5:52 PM
    well these 2 exceptions seem strongly correlated from my brief reading
  • c

    Chinmay Soman

    02/25/2019, 5:52 PM
    yeah lemme debug more
  • m

    Mayank

    02/25/2019, 5:52 PM
    Ok, let me know what you find
  • c

    Chinmay Soman

    02/25/2019, 5:52 PM
    sure
  • m

    Mayank

    02/26/2019, 10:59 PM
    The precondition is to ensure that the size to be allocated off-heap for indexing is > 0. Looking at the code, it seems that this could happen if cardinality is < 10:
    ```
    public static MutableDictionary getMutableDictionary(FieldSpec.DataType dataType, boolean isOffHeapAllocation,
        PinotDataBufferMemoryManager memoryManager, int avgLength, int cardinality, String allocationContext) {
      if (isOffHeapAllocation) {
        // OnHeap allocation
        int maxOverflowSize = cardinality / 10;
        switch (dataType) {
          case INT:
    ```
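The "cardinality < 10" observation comes down to Java integer division: `cardinality / 10` truncates toward zero, so every cardinality from 0 to 9 yields a `maxOverflowSize` of 0. A minimal sketch, assuming a hypothetical `OverflowSizeDemo` class that mirrors only that one expression from the quoted snippet (it is not Pinot's factory code):

```java
// Hypothetical demo class; only the `cardinality / 10` expression comes from the quoted snippet.
class OverflowSizeDemo {
  // Integer division truncates toward zero, so 0..9 all map to 0.
  static int maxOverflowSize(int cardinality) {
    return cardinality / 10;
  }

  public static void main(String[] args) {
    for (int cardinality : new int[] {0, 5, 9, 10, 25}) {
      System.out.println(cardinality + " -> " + maxOverflowSize(cardinality));
    }
  }
}
```

Running it prints `0 -> 0`, `5 -> 0`, `9 -> 0`, `10 -> 1` and `25 -> 2`.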
  • m

    Mayank

    02/26/2019, 10:59 PM
    cc: @User
  • a

    Ananth Packkildurai

    02/26/2019, 11:01 PM
    hmm, that field is in noDictionaryColumns and has worked well so far 🤔
  • m

    Mayank

    02/26/2019, 11:02 PM
    The stack trace suggests it is trying to create a dictionary: MutableDictionaryFactory.getMutableDictionary
  • a

    Ananth Packkildurai

    02/26/2019, 11:03 PM
    yea,
    Copy code
    "noDictionaryColumns": [
          "clog_json"
        ],
    This is my config; I'm not sure why it tries to create a dict.
  • s

    Subbu Subramaniam

    02/26/2019, 11:34 PM
    @User the code you are pointing to is just the overflow size. That can be 0. What may be happening is that there are segments in which the value does not appear at all.
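Subbu's point can be made concrete: a zero overflow size is fine on its own, but if a segment never sees the value, the cardinality estimate is itself 0, and so is any primary buffer size derived from it. The sketch below assumes a hypothetical size formula of `cardinality * avgLength` and a hand-rolled precondition check; Pinot's real sizing logic and its precondition may differ.

```java
// Hypothetical sketch: the size formula and names are illustrative assumptions,
// not Pinot's MutableDictionaryFactory code.
class OffHeapSizeCheck {
  // Assumed estimate of the primary off-heap buffer for a mutable dictionary.
  static long estimateBufferSize(int cardinality, int avgLength) {
    long size = (long) cardinality * avgLength;
    // Mirrors the kind of "size > 0" precondition discussed in this thread.
    if (size <= 0) {
      throw new IllegalArgumentException("Off-heap allocation size must be > 0, got " + size);
    }
    return size;
  }

  public static void main(String[] args) {
    System.out.println(estimateBufferSize(100, 8)); // segment where the value appears
    try {
      estimateBufferSize(0, 8); // segment where the value never appears
    } catch (IllegalArgumentException e) {
      System.out.println("precondition failed: " + e.getMessage());
    }
  }
}
```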
  • s

    Subbu Subramaniam

    02/26/2019, 11:38 PM
    @User was there an issue in this regard before? I recollect seeing it before and concluding it was a config error
  • m

    Mayank

    02/26/2019, 11:42 PM
    @User is this from the realtime table?
  • a

    Ananth Packkildurai

    02/26/2019, 11:42 PM
    I don’t think so. It worked fine. The only difference I can think of is that in the upstream pipeline we switched JSON libraries (from minimal-json to Gson). I’m not sure that causes the issue.
  • a

    Ananth Packkildurai

    02/26/2019, 11:42 PM
    Yes it’s the real-time table
  • m

    Mayank

    02/26/2019, 11:43 PM
    Hmm, the only way that could cause an issue is if this field is not deserialized correctly 🤔?
  • m

    Mayank

    02/26/2019, 11:44 PM
    I think there are two issues here:
    Copy code
    1. Why is the noDict config not picked up? 
    2. Even if dictionary was being created, why did it see zero size?
  • s

    Subbu Subramaniam

    02/26/2019, 11:44 PM
    Even though you configure no-dictionary, it is possible that not all conditions are met for the consuming segment, so we create a dictionary (or attempt to create one). Can you check if there are logs that say ? They should be printed when the segment first starts consuming.
  • m

    Mayank

    02/26/2019, 11:45 PM
    Copy code
    // Check whether to generate raw index for the column while consuming
          // Only support generating raw index on single-value non-string columns that do not have inverted index while
          // consuming. After consumption completes and the segment is built, all single-value columns can have raw index
  • m

    Mayank

    02/26/2019, 11:45 PM
    non-string columns
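The quoted comment amounts to a simple predicate for consuming segments. Mayank's emphasis on "non-string columns" would explain the dictionary creation if clog_json is a STRING column; that type is an assumption, since the thread does not state it. Below is a minimal sketch of the rule as the comment describes it. The `RawIndexDecision` class, its local `DataType` enum, and the method signature are hypothetical and not Pinot's API, and the behavior may differ across Pinot versions.

```java
import java.util.Set;

// Hypothetical sketch of the consuming-segment rule from the quoted comment;
// names and types are illustrative, not Pinot's API.
class RawIndexDecision {
  // Local stand-in for a subset of column data types.
  enum DataType { INT, LONG, FLOAT, DOUBLE, STRING }

  // Raw (no-dictionary) index while consuming only for single-value,
  // non-string columns listed in noDictionaryColumns without an inverted index.
  static boolean useRawIndexWhileConsuming(String column, DataType type, boolean singleValue,
      Set<String> noDictionaryColumns, Set<String> invertedIndexColumns) {
    return noDictionaryColumns.contains(column)
        && singleValue
        && type != DataType.STRING
        && !invertedIndexColumns.contains(column);
  }

  public static void main(String[] args) {
    Set<String> noDict = Set.of("clog_json");
    Set<String> inverted = Set.of();
    // Assuming clog_json is a single-value STRING column, it falls back to a dictionary.
    System.out.println(useRawIndexWhileConsuming("clog_json", DataType.STRING, true, noDict, inverted));
  }
}
```

Per the same comment, this restriction applies only while consuming; once the segment is built, all single-value columns can get a raw index.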