# general
  • m

    Mayank

    02/26/2019, 11:46 PM
    @User When it was working before, do we know if it was being treated as noDict? I am guessing not.
  • a

    Ananth Packkildurai

    02/26/2019, 11:48 PM
    oh, it was fine until last week, since last week the error started showing up.
  • m

    Mayank

    02/26/2019, 11:48 PM
    My guess is it was working with dictionary. But something is triggering the size to be zero
  • m

    Mayank

    02/26/2019, 11:49 PM
    So the latter still needs to be investigated
  • a

    Ananth Packkildurai

    02/26/2019, 11:49 PM
    ☝️ I remember there was an issue with trying to sort in the heap for the no-dict columns.
  • m

    Mayank

    02/26/2019, 11:49 PM
    As per the comment we don't support no-dict for String columns for the consuming segments
  • a

    Ananth Packkildurai

    02/26/2019, 11:51 PM
    oh `&& dataType != FieldSpec.DataType.STRING`!!!! 😕
  • m

    Mayank

    02/26/2019, 11:52 PM
    Right. So the dictionary was always created I am guessing. Not sure what is causing the size to allocate to be zero.
  • m

    Mayank

    02/26/2019, 11:56 PM
    @User Did a new deployment trigger this?
  • a

    Ananth Packkildurai

    02/26/2019, 11:56 PM
    it seems so.
  • a

    Ananth Packkildurai

    02/26/2019, 11:56 PM
    nope, I can see the code all the way back to 2017.
  • m

    Mayank

    02/26/2019, 11:57 PM
    MutableDictionary dictionary = MutableDictionaryFactory
                .getMutableDictionary(dataType, _offHeap, _memoryManager, dictionaryColumnSize,
                    Math.min(_statsHistory.getEstimatedCardinality(column), _capacity), allocationContext);
            _dictionaryMap.put(column, dictionary);
  • m

    Mayank

    02/26/2019, 11:57 PM
    _statsHistory.getEstimatedCardinality(column)
  • m

    Mayank

    02/26/2019, 11:58 PM
    The stats are written into a file on the server
  • m

    Mayank

    02/26/2019, 11:58 PM
    Maybe the stats file is corrupted?
  • m

    Mayank

    02/26/2019, 11:58 PM
    @User What do you think?
  • m

    Mayank

    02/26/2019, 11:58 PM
    IIRC there's a tool for viewing this stats file
  • a

    Ananth Packkildurai

    02/26/2019, 11:59 PM
    to step back, any reason why string data type is not supported for no dict? I thought that is a major differentiator
  • m

    Mayank

    02/27/2019, 12:00 AM
    It is not supported only for consuming segments. For consumed segments it is supported.
  • s

    Subbu Subramaniam

    02/27/2019, 12:02 AM
    @User i just didn't code it up for string datatype. Contributions welcome 😉
  • a

    Ananth Packkildurai

    02/27/2019, 12:03 AM
    ah okay, there are no design issues?
  • s

    Subbu Subramaniam

    02/27/2019, 12:03 AM
    is it possible that the column had 0 cardinality in many segments? (i.e. it was never populated in the stream)?
  • a

    Ananth Packkildurai

    02/27/2019, 12:06 AM
    hmm, I can think of some edge condition that could happen (maybe a parsing exception). We usually set an empty string for it. Let me double check that as well.
  • a

    Ananth Packkildurai

    02/27/2019, 12:10 AM
    yeah, I've got a few good points to investigate further. 🙇 @User can you please send me a reference on how to add no-dict for strings when you get a chance?
  • a

    Ananth Packkildurai

    02/27/2019, 12:12 AM
    tbh, the low cardinality check seems a little rigid, since we usually don't have much control over the input stream. It would be great if we could fail safe in these scenarios.
  • s

    Subbu Subramaniam

    02/27/2019, 12:15 AM
    the following code gets the estimated cardinality:
    public synchronized int getEstimatedCardinality(@Nonnull String columnName) {
        int numEntriesToScan = getNumntriesToScan();
        if (numEntriesToScan == 0) {
          return DEFAULT_EST_CARDINALITY;
        }
        int totalCardinality = 0;
        int numValidValues = 0;
        for (int i = 0; i < numEntriesToScan; i++) {
          SegmentStats segmentStats = getSegmentStatsAt(i);
          ColumnStats columnStats = segmentStats.getColumnStats(columnName);
          if (columnStats != null) {
            totalCardinality += columnStats.getCardinality();
            numValidValues++;
          }
        }
        if (numValidValues > 0) {
          int avgEstimatedCardinality = totalCardinality / numValidValues;
          if (avgEstimatedCardinality > 0) {
            return avgEstimatedCardinality;
          }
        }
        return DEFAULT_EST_CARDINALITY;
      }
  • s

    Subbu Subramaniam

    02/27/2019, 12:16 AM
    from what i can see, this always returns DEFAULT_EST_CARDINALITY
  • s

    Subbu Subramaniam

    02/27/2019, 12:17 AM
    which is 5000
  • s

    Subbu Subramaniam

    02/27/2019, 12:17 AM
    has been since https://github.com/apache/incubator-pinot/pull/1999 (2017 PR)
  • s

    Subbu Subramaniam

    02/27/2019, 12:19 AM
    the code to create the dictionary is:
    MutableDictionary dictionary = MutableDictionaryFactory
                .getMutableDictionary(dataType, _offHeap, _memoryManager, dictionaryColumnSize,
                    Math.min(_statsHistory.getEstimatedCardinality(column), _capacity), allocationContext);
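
To make the arithmetic in the two snippets above easier to follow, here is a minimal standalone sketch. It is not Pinot's code: the class `DictionarySizingSketch`, the methods `estimateCardinality` and `dictionaryEntries`, and the list of per-segment cardinalities are hypothetical stand-ins for the stats history. Only the averaging logic, the fallback to `DEFAULT_EST_CARDINALITY` (5000, as stated in the thread), and the `Math.min` cap come from the pasted code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the stats-history logic quoted in the thread; not Pinot's actual class.
final class DictionarySizingSketch {
  // Value taken from the thread ("which is 5000").
  static final int DEFAULT_EST_CARDINALITY = 5000;

  // Each element is the recorded cardinality of the column in one past segment,
  // or null when that segment has no stats for the column (assumed representation).
  static int estimateCardinality(List<Integer> perSegmentCardinality) {
    if (perSegmentCardinality.isEmpty()) {
      return DEFAULT_EST_CARDINALITY;
    }
    long totalCardinality = 0;
    int numValidValues = 0;
    for (Integer cardinality : perSegmentCardinality) {
      if (cardinality != null) {
        totalCardinality += cardinality;
        numValidValues++;
      }
    }
    if (numValidValues > 0) {
      int avgEstimatedCardinality = (int) (totalCardinality / numValidValues);
      if (avgEstimatedCardinality > 0) {
        return avgEstimatedCardinality;
      }
    }
    // No usable history, or an average of zero: fall back to the default.
    return DEFAULT_EST_CARDINALITY;
  }

  // Mirrors the Math.min(...) argument passed when the dictionary is created.
  static int dictionaryEntries(int estimatedCardinality, int capacity) {
    return Math.min(estimatedCardinality, capacity);
  }

  public static void main(String[] args) {
    System.out.println(estimateCardinality(Arrays.asList(100, 300))); // 200
    System.out.println(estimateCardinality(Arrays.asList(0, 0)));     // 5000
    System.out.println(dictionaryEntries(200, 100000));               // 200
    System.out.println(dictionaryEntries(5000, 0));                   // 0
  }
}
```

In this sketch the estimate is never below 1, so the capped value only reaches zero when the capacity argument itself is zero. Whether that corresponds to the zero size reported in the thread is not established here.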