Apache Pinot #general

troywinter

05/10/2021, 3:56 AM

Can multiple tables consume from the same Kafka topic using the low-level consumer in Pinot?

05/10/2021, 5:16 AM

Hi Team, I am trying to use HDFS as deep storage and have created controller.conf, server.conf, and broker.conf files. Kindly review the attached file and suggest anything that needs to be changed or added.

05/10/2021, 5:17 AM

@User @User @User @User @User kindly help.

Syed Akram

05/10/2021, 7:18 AM

Hi, I am running a query that combines AND and OR with filters on string and long values. The table has about 34 million rows, and selecting a few columns for an ID takes almost 2 seconds. numEntriesScannedInFilter (89 million) and numEntriesScannedPostFilter are both large. Can someone help me understand why so many entries are scanned in the filter when I am using an inverted index?

Pedro Silva

05/10/2021, 9:17 AM

Hello, is there a way to check the current Kafka offset that a realtime table is reading at a given point in time?

Pedro Silva

05/10/2021, 2:53 PM

Hello, Can I combine built-in json functions within groovy scripts?

troywinter

05/11/2021, 3:43 AM

What kind of index should I use on a datetime string column to enable faster range queries? Will a range index help?

Pedro Silva

05/11/2021, 10:08 AM

Hello, Does Pinot support defining a computed field (metric) based on a field that does not appear in the schema but exists in the ingestion message? This is a realtime table if that makes a difference.

Ambika

05/11/2021, 2:58 PM

Hi Team -- A basic question: if we use S3 for storing the segments, how does Pinot take care of query latency, since there will be a network call involved?

Pedro Silva

05/11/2021, 3:41 PM

Hello everyone, what does Pinot store in ZooKeeper metadata? I currently have 2GB out of 2.5GB of disk used up (78.5%) in my ZooKeeper instance. Should this be a cause for concern?

Santhi Kollipara

05/11/2021, 4:53 PM

Hello guys! I am checking out the Pinot repo and noticed that the code for ThirdEye is no longer in incubator-pinot, and all the references to ThirdEye in the docs are broken 😞. Is this intentional?

Vengatesh Babu

05/11/2021, 5:15 PM

Does Pinot support window functions like Presto? https://prestodb.io/docs/current/functions/window.html

Ambika

05/12/2021, 2:32 AM

Hi Team -- How do you recommend handling cases where we need to delete a record due to GDPR/CCPA?

Ambika

05/12/2021, 8:41 AM

I would expect the aggregation for the fact table to happen in Pinot, and then only the mapping of the IDs to the values from the dim table to happen in Presto. Let me know if it's not clear and I will post an example.

troywinter

05/12/2021, 12:06 PM

What are the limitations when using noDictionaryColumns? I got the following exception when doing an ORDER BY on a noDictionaryColumn:

[
  {
    "errorCode": 200,
    "message": "QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat java.nio.Buffer.checkBounds(Buffer.java:571)\n\tat java.nio.DirectByteBuffer.get(DirectByteBuffer.java:264)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:80)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:60)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:34)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:465)\n\tat org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.computePartiallyOrdered(SelectionOrderByOperator.java:237)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:178)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:73)"
  }
]

Ricardo Bernardino

05/12/2021, 1:29 PM

Hi everyone! When using the realtime table with upsert, is there any compaction mechanism on segments? Or will they just keep on being created and kept forever? Thanks!

05/12/2021, 5:04 PM

Is there any way to generate the schema JSON file for a Pinot table from sample JSON data? I have data with 250+ columns in a Kafka topic, and I am currently writing the JSON schema file for the Pinot table manually. Kindly suggest whether there is a way to generate it directly from sample data so it can be used as the schema file for Pinot.

Ambika

05/12/2021, 5:36 PM

Question -- Is there any limit to the number of tenants we can have on a single cluster? E.g., is 5000 tenants too many?

Vengatesh Babu

05/12/2021, 6:50 PM

For most time-series/audit data, time is the basic filter criterion. For example, with one year of data and segments created daily, there will be 365 segments per year. Even queries that access only the last month or last week of data will be scheduled to scan all segments, including unnecessary ones. Is it possible to maintain the min/max values of the primary time column in the table metadata? Maintaining this time column metadata would help broker-side segment pruning, similar to partitioning.
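To make the idea concrete, here is a minimal sketch of time-based pruning over per-segment min/max metadata. It is plain Python and not Pinot code. The names SegmentMeta and prune_by_time are hypothetical, and time values are simplified to day numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentMeta:
    """Hypothetical per-segment metadata: min/max of the primary time column."""
    name: str
    min_time: int  # smallest time value in the segment (day number here)
    max_time: int  # largest time value in the segment (day number here)


def prune_by_time(segments: List[SegmentMeta], query_start: int, query_end: int) -> List[SegmentMeta]:
    """Keep only segments whose [min_time, max_time] overlaps [query_start, query_end]."""
    return [
        s for s in segments
        if s.max_time >= query_start and s.min_time <= query_end
    ]


# One segment per day for a year: 365 segments in total.
segments = [SegmentMeta(f"seg_{day}", day, day) for day in range(365)]

# A "last week" query covering days 358..364 (inclusive) only needs 7 segments.
last_week = prune_by_time(segments, 358, 364)
print(len(segments), len(last_week))
```

With this kind of metadata, a query over the last week would touch 7 of the 365 segments instead of all of them.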

Aaron Wishnick

05/12/2021, 7:07 PM

If data ingestion jobs take a lot of memory to create a star tree index, how can I tune that? Does maxLeafRecords affect the memory usage of the segment creation job at all?

Akash

05/12/2021, 7:48 PM

Need some feedback on the star tree index.

"tableIndexConfig" : {
    "starTreeIndexConfigs":[{
      "maxLeafRecords": 1000,
      "functionColumnPairs": ["DISTINCT_COUNT_HLL__user_id","COUNT__dt"],
      "dimensionsSplitOrder": ["dt","dim1","dim2","dim3","dim4"]
    }],
    "enableDynamicStarTreeCreation" : true
  },

This is to optimise the following queries.

select dt,DISTINCT_COUNT_HLL(user_id) FROM TABLE GROUP BY dt
select dt,count(1) FROM TABLE GROUP BY dt
select dt,dim2,DISTINCT_COUNT_HLL(user_id) FROM TABLE where dim1 = 3 GROUP BY dt, dim2 
select dt,dim2,count(1) FROM TABLE where dim1 = 3 GROUP BY dt, dim2

dim1 through dim4 do not have very high cardinality. user_id has the highest cardinality.

Yupeng Fu

05/12/2021, 10:33 PM

@User Nice talk at Kafka summit today! A Pinot table of PB size is amazing..

🍷 6

👍 5

🎉 4

troywinter

05/13/2021, 3:12 AM

How do I cast a string value to int or long using SQL in Pinot?

Vengatesh Babu

05/13/2021, 12:08 PM

Does Pinot support partitioning only for realtime tables? For the offline table, data for all partitions is written into the same segment file. From the segment metadata.properties:

column.RELATEDID.partitionFunction = Murmur
column.RELATEDID.numPartitions = 10
column.RELATEDID.partitionValues = 0,1,2,3,4,5,6,7,8,9

Note: I am running data ingestion using pinot-admin.sh LaunchDataIngestionJob.

troywinter

05/13/2021, 2:22 PM

Why is the datetimeconvert transform function much slower than time_floor in Druid? We are migrating our Druid table to Pinot, but found that the datetime transform with granularity is very slow compared to Druid.
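For reference, the bucketing that this kind of function performs can be sketched as flooring an epoch-millisecond timestamp to a granularity boundary. This is a plain-Python illustration only, not how Pinot or Druid implement it. The function name floor_to_bucket is hypothetical.

```python
HOUR_MILLIS = 60 * 60 * 1000


def floor_to_bucket(ts_millis: int, bucket_millis: int) -> int:
    """Round an epoch-millisecond timestamp down to the start of its bucket."""
    return ts_millis - (ts_millis % bucket_millis)


# 1620915720000 ms is 2021-05-13 14:22:00 UTC; its hourly bucket starts at 14:00:00 UTC.
ts = 1620915720000
bucket_start = floor_to_bucket(ts, HOUR_MILLIS)
print(bucket_start)
```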

Pedro Silva

05/13/2021, 3:24 PM

Hello, when updating an existing realtime table with a new transformConfig, is the transformed field automatically computed for existing segments, or is there some endpoint I need to call to force that computation? The same question applies to altering an existing transformConfig.

🙌 1

Pedro Silva

05/13/2021, 4:50 PM

What are the performance implications of defining a dimension field in a schema as a string with a max length of Integer.MaxValue (2GB if all space is fully used)?

Arun Vasudevan

05/13/2021, 6:13 PM

I have added a few new columns to the Pinot table and schema. In order for the new columns to be populated, I ran Reload All Segments for the table in the UI. Two questions:
• I see that Reload All Segments re-indexes the data. Is this the right approach to populate the new columns?
• I don't see the progress of Reload All Segments. I see this PR is completed: https://github.com/apache/incubator-pinot/issues/5390. Which release is it part of?

Aaron Wishnick

05/14/2021, 5:28 PM

I got some data ingested and am using a star tree index, and I'm running a query like select foo, percentiletdigest(bar, 0.5) from mytable group by foo. I've got foo in my dimensionsSplitOrder, and I've got PERCENTILE_TDIGEST__bar as well as AVG__bar in my functionColumnPairs. My query takes about 700 ms, but if I switch it to avg(bar) it takes 15 ms. Is it expected that the t-digest would be that much slower? Anything I can do to speed it up?

Vishnu

05/16/2021, 2:14 PM

Hi, I'm new to Pinot and find the star tree concept very interesting. Can anyone explain how it handles upserts? Is it reconstructed every time?