# general
  • t

    troywinter

    05/10/2021, 3:56 AM
    Can multiple tables consume from the same Kafka topic using the low-level consumer in Pinot?
    m
    • 2
    • 3
  • r

    RK

    05/10/2021, 5:16 AM
    Hi Team, I am trying to use HDFS as deep storage and have created controller.conf, server.conf, and broker.conf files. Kindly review the attached file and suggest anything that needs to be changed or added.
    a
    • 2
    • 3
  • r

    RK

    05/10/2021, 5:17 AM
    @User @User @User @User @User kindly help.
    t
    • 2
    • 1
  • s

    Syed Akram

    05/10/2021, 7:18 AM
    Hi, I am running a query that combines AND and OR with filters on string and long values. The table has about 34 million rows, and selecting a few columns for one ID takes almost 2 seconds; numEntriesScannedInFilter (89 million) and numEntriesScannedPostFilter are both large. Can someone help me understand why so many entries are scanned in the filter when I am using an inverted index?
    m
    • 2
    • 60
  • p

    Pedro Silva

    05/10/2021, 9:17 AM
    Hello, is there a way to check the current Kafka offset that a realtime table is reading at a given point in time?
    m
    n
    • 3
    • 11
  • p

    Pedro Silva

    05/10/2021, 2:53 PM
    Hello, Can I combine built-in json functions within groovy scripts?
    n
    k
    • 3
    • 3
  • t

    troywinter

    05/11/2021, 3:43 AM
    What kind of index should I use on a datetime string column to speed up range queries? Will a range index help?
    x
    k
    • 3
    • 38
  • p

    Pedro Silva

    05/11/2021, 10:08 AM
    Hello, Does Pinot support defining a computed field (metric) based on a field that does not appear in the schema but exists in the ingestion message? This is a realtime table if that makes a difference.
    n
    • 2
    • 4
  • a

    Ambika

    05/11/2021, 2:58 PM
    Hi Team -- a basic question: if we use S3 for storing the segments, how does Pinot handle query latency, given that a network call will be involved?
    p
    k
    m
    • 4
    • 14
  • p

    Pedro Silva

    05/11/2021, 3:41 PM
    Hello everyone, what does Pinot store in ZooKeeper metadata? I currently have 2GB out of 2.5GB of disk used up (78.5%) in my ZooKeeper instance. Should this be a cause for concern?
    m
    • 2
    • 7
  • s

    Santhi Kollipara

    05/11/2021, 4:53 PM
    Hello Guys! I am checking out the Pinot repo and I noticed the code for thirdeye is not in incubator-pinot anymore and all the references to thirdeye are broken in the docs 😞. Is this intentional?
    k
    • 2
    • 3
  • v

    Vengatesh Babu

    05/11/2021, 5:15 PM
    Does Pinot support window functions like Presto? https://prestodb.io/docs/current/functions/window.html
    m
    k
    j
    • 4
    • 7
  • a

    Ambika

    05/12/2021, 2:32 AM
    Hi Team -- how do you recommend handling cases where we need to delete a record due to GDPR/CCPA?
    j
    • 2
    • 4
  • a

    Ambika

    05/12/2021, 8:41 AM
    I would expect the aggregation for the fact table to happen in Pinot, and only the mapping of the IDs to the values from the dim table to happen in Presto. Let me know if it's not clear and I will post an example.
    x
    • 2
    • 17
  • t

    troywinter

    05/12/2021, 12:06 PM
    What are the limitations when using noDictionaryColumns? I got the following exception when doing an ORDER BY on a noDictionaryColumn:
    [
      {
        "errorCode": 200,
        "message": "QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat java.nio.Buffer.checkBounds(Buffer.java:571)\n\tat java.nio.DirectByteBuffer.get(DirectByteBuffer.java:264)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:80)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:60)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:34)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:465)\n\tat org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.computePartiallyOrdered(SelectionOrderByOperator.java:237)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:178)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:73)"
      }
    ]
    m
    j
    • 3
    • 6
  • r

    Ricardo Bernardino

    05/12/2021, 1:29 PM
    Hi everyone! When using the realtime table with upsert, is there any compaction mechanism on segments? Or will they just keep on being created and kept forever? Thanks!
    m
    • 2
    • 3
  • r

    RK

    05/12/2021, 5:04 PM
    Is there any way to generate a schema JSON file for a Pinot table from sample JSON data? I have data with 250+ columns in a Kafka topic, and I am currently writing the JSON schema file for the Pinot table manually. Kindly suggest whether there is a way to generate it directly from sample data and use the result as the schema file for Pinot.
    m
    • 2
    • 2
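A rough sketch of the idea in RK's question above: derive a Pinot schema skeleton from one sample JSON record. The top-level keys (`schemaName`, `dimensionFieldSpecs`, `dateTimeFieldSpecs`) follow the Pinot schema format, but the helper names `infer_pinot_type` and `infer_schema`, the type mapping, the choice to treat every non-time field as a dimension, and the epoch-milliseconds time column are all assumptions made for illustration. The output would still need manual review, for example to move numeric fields into `metricFieldSpecs`.

```python
import json


def infer_pinot_type(value):
    """Map a decoded JSON value to a Pinot data type name (assumed mapping)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "STRING"
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    # Strings, nulls, and nested objects/arrays all fall back to STRING here.
    return "STRING"


def infer_schema(schema_name, sample_record, time_column=None):
    """Build a schema dict from a single sample record (hypothetical helper)."""
    dimensions = []
    date_times = []
    for name, value in sample_record.items():
        if name == time_column:
            # Assumption: the time column holds epoch milliseconds.
            date_times.append({
                "name": name,
                "dataType": "LONG",
                "format": "1:MILLISECONDS:EPOCH",
                "granularity": "1:MILLISECONDS",
            })
        else:
            dimensions.append({"name": name, "dataType": infer_pinot_type(value)})
    schema = {"schemaName": schema_name, "dimensionFieldSpecs": dimensions}
    if date_times:
        schema["dateTimeFieldSpecs"] = date_times
    return schema


if __name__ == "__main__":
    sample = json.loads('{"user_id": 1, "price": 2.5, "country": "PT", "ts": 1620000000000}')
    print(json.dumps(infer_schema("demo", sample, time_column="ts"), indent=2))
```

With 250+ columns, running this over several sample records and merging the results would reduce the chance of mistyping a field that happens to be null or integral in one record.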
  • a

    Ambika

    05/12/2021, 5:36 PM
    Question -- is there any limit to the number of tenants we can have on a single cluster? E.g., are 5000 tenants too many?
    m
    • 2
    • 6
  • v

    Vengatesh Babu

    05/12/2021, 6:50 PM
    For most time series/audit data, the time criterion is the basic filter. For example, one year of data with daily segments produces 365 segments. Even queries that access only the last month or last week of data will be scheduled to scan all segments, including unnecessary ones. Is it possible to maintain min/max values of the primary time column in the table metadata? Maintaining time column metadata would help broker-side segment pruning, similar to partitioning.
    m
    • 2
    • 9
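The pruning idea in the message above can be sketched as an interval-overlap check. This is only an illustration of the concept, assuming each segment's min and max time values are available as metadata; `SegmentTimeRange` and `prune_by_time` are hypothetical names, not Pinot APIs.

```python
from dataclasses import dataclass
from typing import List

DAY_MS = 86_400_000  # milliseconds in one day


@dataclass(frozen=True)
class SegmentTimeRange:
    """Min/max time of one segment, as it might be kept in table metadata."""
    name: str
    start_ms: int  # inclusive
    end_ms: int    # inclusive


def prune_by_time(segments: List[SegmentTimeRange],
                  query_start_ms: int,
                  query_end_ms: int) -> List[SegmentTimeRange]:
    """Keep only segments whose time range overlaps the query's time range."""
    return [
        s for s in segments
        if s.end_ms >= query_start_ms and s.start_ms <= query_end_ms
    ]


if __name__ == "__main__":
    # One segment per day for a year, as in the example from the message.
    year = [
        SegmentTimeRange(f"seg_{d}", d * DAY_MS, (d + 1) * DAY_MS - 1)
        for d in range(365)
    ]
    # A query over the last 7 days only needs the last 7 segments.
    last_week = prune_by_time(year, 358 * DAY_MS, 365 * DAY_MS - 1)
    print(len(last_week), "of", len(year), "segments selected")
```

The check needs only two comparisons per segment, so it is cheap enough to run on every query before any segment is scanned.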
  • a

    Aaron Wishnick

    05/12/2021, 7:07 PM
    If data ingestion jobs take a lot of memory to create a star tree index, how can I tune that? Does maxLeafRecords affect the memory usage of the segment creation job at all?
    j
    • 2
    • 36
  • a

    Akash

    05/12/2021, 7:48 PM
    Need some feedback on the star tree index.
    "tableIndexConfig": {
      "starTreeIndexConfigs": [{
        "maxLeafRecords": 1000,
        "functionColumnPairs": ["DISTINCT_COUNT_HLL__user_id", "COUNT__dt"],
        "dimensionsSplitOrder": ["dt", "dim1", "dim2", "dim3", "dim4"]
      }],
      "enableDynamicStarTreeCreation": true
    },
    This is to optimise the following queries.
    select dt,DISTINCT_COUNT_HLL(user_id) FROM TABLE GROUP BY dt
    select dt,count(1) FROM TABLE GROUP BY dt
    select dt,dim2,DISTINCT_COUNT_HLL(user_id) FROM TABLE where dim1 = 3 GROUP BY dt, dim2 
    select dt,dim2,count(1) FROM TABLE where dim1 = 3 GROUP BY dt, dim2
    dim1, dim2, dim3, and dim4 do not have very high cardinality. user_id has the highest cardinality.
    m
    j
    • 3
    • 3
  • y

    Yupeng Fu

    05/12/2021, 10:33 PM
    @User Nice talk at Kafka summit today! A Pinot table of PB size is amazing..
    🍷 6
    👍 5
    🎉 4
    m
    x
    • 3
    • 7
  • t

    troywinter

    05/13/2021, 3:12 AM
    How do I cast a string value to int or long using sql in pinot?
    j
    • 2
    • 3
  • v

    Vengatesh Babu

    05/13/2021, 12:08 PM
    Does Pinot support partitioning only for realtime tables? For the offline table, all partition data is written to the same segment file. From the segment metadata.properties:
    column.RELATEDID.partitionFunction = Murmur
    column.RELATEDID.numPartitions = 10
    column.RELATEDID.partitionValues = 0,1,2,3,4,5,6,7,8,9
    Note: Running Data Ingestion using pinot-admin.sh LaunchDataIngestionJob
    m
    • 2
    • 3
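To illustrate why the metadata in the message above matters, here is a sketch of partition-based segment pruning under stated assumptions. The idea is that a segment can be skipped only when the partition of the filtered value is not among the segment's `partitionValues`. The functions `partition_of` and `can_prune` are hypothetical, and CRC32 is used purely as a deterministic stand-in hash; it is not Pinot's Murmur partition function and will not produce the same partition ids.

```python
import zlib
from typing import Set


def partition_of(value: str, num_partitions: int) -> int:
    """Stand-in partition function (CRC32 modulo), NOT Pinot's Murmur."""
    return zlib.crc32(value.encode("utf-8")) % num_partitions


def can_prune(segment_partitions: Set[int], value: str, num_partitions: int) -> bool:
    """A segment can be skipped only if it holds no rows from the value's partition."""
    return partition_of(value, num_partitions) not in segment_partitions


if __name__ == "__main__":
    num_partitions = 10
    # Segment as described in the message: it contains all ten partitions.
    mixed_segment = set(range(num_partitions))
    # Segment that holds a single partition.
    single_segment = {partition_of("abc", num_partitions)}

    for related_id in ["abc", "42", "some-other-id"]:
        print(related_id,
              "mixed prunable:", can_prune(mixed_segment, related_id, num_partitions),
              "single prunable:", can_prune(single_segment, related_id, num_partitions))
```

Under this model, a segment listing partitions 0 through 9 can never be skipped for an equality filter on RELATEDID, which is why the input data would need to be split by partition before segments are built for pruning to have any effect.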
  • t

    troywinter

    05/13/2021, 2:22 PM
    Why is the datetimeconvert transform function much slower than time_floor in Druid? We are migrating our Druid table to Pinot, but found the datetime transform and granularity handling to be very slow compared to Druid.
    k
    • 2
    • 17
  • p

    Pedro Silva

    05/13/2021, 3:24 PM
    Hello, If updating an existing realtime table with a new transformConfig, is the transformed field automatically computed for existing segments or is there some endpoint I need to call to force that computation? The same question but for altering an existing transformConfig.
    🙌 1
    m
    n
    • 3
    • 11
  • p

    Pedro Silva

    05/13/2021, 4:50 PM
    What are the performance implications of defining a dimension field in a schema as a string with a max length of Integer.MaxValue (2GB if all space is fully used)?
    m
    • 2
    • 29
  • a

    Arun Vasudevan

    05/13/2021, 6:13 PM
    I have added a few new columns to the Pinot table and Pinot schema. In order for the new columns to be populated, I did Reload All Segments for the table in the UI. Two questions here:
    • I see Reload All Segments is meant to re-index data; is this the right approach to re-populate new columns?
    • I don't see the progress of the Reload All Segments operation. I see this PR is completed - https://github.com/apache/incubator-pinot/issues/5390 - which release is this part of?
    m
    • 2
    • 5
  • a

    Aaron Wishnick

    05/14/2021, 5:28 PM
    I got some data ingested and am using a star tree index and I'm running a query like select foo, percentiletdigest(bar, 0.5) from mytable group by foo. I've got foo in my dimensionsSplitOrder and I've got PERCENTILE_TDIGEST__bar as well as AVG__bar in my functionColumnPairs. My query takes about 700 ms but if I switch it to avg(bar) it takes 15 ms. Is it expected that the t-digest would be that much slower? Anything I can do to speed it up?
    x
    j
    m
    • 4
    • 49
  • v

    Vishnu

    05/16/2021, 2:14 PM
    Hi, I'm new to Pinot and find the star tree concept very interesting. Can anyone explain how it handles upserts? Is it reconstructed every time?
    k
    a
    e
    • 4
    • 8