Apache Pinot #general

EO W

11/30/2022, 5:43 AM

Hi all, I have a question about the timestamp index. I have a dataset of 1 billion records representing user behavior logs. It contains fields such as request_timestamp, url, action, and section. I want to run 5-minute aggregations on this dataset. If a timestamp index with "MINUTE" granularity is applied to a timestamp column, can that index be used for 5-minute aggregation? For example, if I apply the round function to the $ts$MINUTE column and use it as a group-by condition as shown below, will the timestamp index be used?

select 
	toDateTime(round($request_timestamp$MINUTE, 300000), 'yyyy-MM-dd HH:mm', 'Asia/Seoul') AS time_bucket
	, url
	, action
	, count(1)
from user_behavior_log
group by time_bucket, url, action

How should I query for the 5 minute aggregation?
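
For reference, the bucketing arithmetic behind a 5-minute aggregation can be sketched outside Pinot. The snippet below is a minimal illustration in Python, assuming epoch-millisecond timestamps and round-down semantics. The names `floor_to_bucket` and `bucket_label` are hypothetical, the label is rendered in UTC rather than Asia/Seoul, and nothing here indicates whether the timestamp index is used by the query above.

```python
from datetime import datetime, timezone

FIVE_MINUTES_MS = 5 * 60 * 1000  # 300000, the bucket size passed to round() in the query above


def floor_to_bucket(epoch_ms: int, bucket_ms: int = FIVE_MINUTES_MS) -> int:
    """Round an epoch-millisecond timestamp down to the start of its bucket (assumed semantics)."""
    return epoch_ms - (epoch_ms % bucket_ms)


def bucket_label(epoch_ms: int) -> str:
    """Format the bucket start as 'yyyy-MM-dd HH:mm' in UTC (hypothetical helper)."""
    start_seconds = floor_to_bucket(epoch_ms) // 1000
    return datetime.fromtimestamp(start_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


sample_ms = 1669786980000          # 2022-11-30 05:43:00 UTC, a made-up sample value
print(floor_to_bucket(sample_ms))  # 1669786800000
print(bucket_label(sample_ms))     # 2022-11-30 05:40
```

Every timestamp between 05:40:00.000 and 05:44:59.999 maps to the same bucket start, which is what makes the value usable as a group-by key.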

Rohit Anilkumar

12/01/2022, 6:13 AM

Hey, quick question. If Pinot stops consuming from a Kinesis data stream, is there an API we need to call to restore normal consumption? Is there any documentation on this?

Rostan TABET

12/01/2022, 2:06 PM

Are there plans to support GAPFILL in the Multi-Stage Query engine?

Damon

12/02/2022, 6:23 AM

Hello, does anyone have experience with monitoring Pinot on Grafana with Prometheus? I am using prometheus-community/kube-prometheus-stack, but the CPU usage panels show no data because I am missing the metric `container_cpu_user_seconds_total`; I only have `container_cpu_usage_seconds_total`. Is there a way to enable the missing user metric?

Ajay Chintala

12/02/2022, 8:22 PM

Hi team, quick question: in a hybrid table, can we upload offline segments with overlapping time ranges but unique names? We have a use case where we need to backfill a hybrid table, but the new data is not organized in time buckets; it is a single CSV spanning a large time range across existing segments. I'm wondering if we can simply create a new segment from the backfill data with a unique name, upload it to the offline table, and let the mergeAndRollup job handle recreating segments with the correct ranges.

Alice

12/05/2022, 7:44 AM

Hi team, does Pinot support top-n-per-group queries? For example, a query that selects a, b, c from a table grouped by f1, f2, f3. How can I get the top n rows for each combination of f1 and f2?
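
To make the requested result concrete, here is a minimal Python sketch of "top n rows per group" computed on in-memory rows. It only illustrates the expected output shape and is not a Pinot query or API. The helper `top_n_per_group`, the sample rows, and the choice of column `a` as the ranking column are all assumptions, since the question does not say how rows should be ranked.

```python
import heapq
from collections import defaultdict


def top_n_per_group(rows, group_keys, rank_by, n):
    """Return, for each combination of group_keys, the n rows with the largest rank_by value."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is the tuple of values for the grouping columns, e.g. (f1, f2).
        groups[tuple(row[k] for k in group_keys)].append(row)
    # heapq.nlargest returns the selected rows sorted from largest to smallest.
    return {
        key: heapq.nlargest(n, members, key=lambda r: r[rank_by])
        for key, members in groups.items()
    }


# Hypothetical sample data.
rows = [
    {"f1": "x", "f2": "p", "a": 5},
    {"f1": "x", "f2": "p", "a": 9},
    {"f1": "x", "f2": "p", "a": 7},
    {"f1": "y", "f2": "q", "a": 1},
]
result = top_n_per_group(rows, ("f1", "f2"), "a", 2)
print([r["a"] for r in result[("x", "p")]])  # [9, 7]
```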

Sonit Rathi

12/05/2022, 8:28 AM

Hi team, we have a requirement to purge data by brand ID (a column in a table). How should I go about this? Should I create tenants and a table for each brand ID?

vishal

12/05/2022, 11:09 AM

Hi team, I am trying to create multiple tenants, which requires untagged servers and brokers. I defined 6 servers while installing Pinot, but all of them are being added to DefaultTenant. How can I prevent this? I even tried removing them from the default tenant, but the logs still show them as tagged.

✅ 1

Mohit Garg4628

12/05/2022, 11:36 AM

Hi team, can you please tell me when the v2 version of Pinot will be more stable? We have a use case for nested and join queries. Thanks

Jeff Bolle

12/05/2022, 3:33 PM

Hi everyone, I am trying to evaluate Kafka + Pinot as a replacement for OpenSearch on a 15.5bn document dataset. There are a number of aggregations we run on the events that group by a user ID and then compute some statistics (min and max for a number of timestamps, counts of events that match additional specific criteria, etc.) as well as some text-based analytics. I think solving the statistics part of the aggregations will be straightforward (if very different from what we are doing in OpenSearch). The core of my question is whether there is any capability for running an aggregation similar to Significant Terms in the Kafka / ksqlDB / Pinot suite of tools. What we are trying to produce is a list of the top X results from a given field (email addresses, URL domain names, etc.) that are most unique in this user's subset of data, compared to all data.
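
As a rough illustration of the kind of result described here, the Python sketch below ranks terms by how over-represented they are in one user's subset relative to the full dataset. It uses a plain frequency-ratio heuristic chosen for illustration only; it is not the scoring OpenSearch uses for Significant Terms, and it is not a Kafka, ksqlDB, or Pinot feature. The function name `distinctive_terms` and the sample data are hypothetical, and the subset is assumed to be drawn from the full dataset.

```python
from collections import Counter


def distinctive_terms(subset_terms, all_terms, top_x):
    """Rank terms by (share in subset) / (share in all data), highest ratio first."""
    subset_counts = Counter(subset_terms)
    all_counts = Counter(all_terms)
    subset_total = sum(subset_counts.values())
    all_total = sum(all_counts.values())

    scores = {}
    for term, count in subset_counts.items():
        if all_counts[term] == 0:
            continue  # Skip terms absent from the background to avoid division by zero.
        subset_share = count / subset_total
        background_share = all_counts[term] / all_total
        scores[term] = subset_share / background_share

    # Sort by descending score, breaking ties alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_x]


# Hypothetical data: "b.org" is rare overall but common for this user.
all_terms = ["a.com"] * 90 + ["b.org"] * 10
user_terms = ["a.com"] * 5 + ["b.org"] * 5
print(distinctive_terms(user_terms, all_terms, 1))  # ['b.org']
```

A raw ratio like this favors very rare terms, so a real implementation would likely need a minimum-count threshold or a different scoring function.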

Kevin Xu

12/06/2022, 2:44 AM

Hi team, if a server is missing some segments but the segment metadata and download URL are still present, how can I trigger Pinot to download the missing segments from the deep store?

Jeff Bolle

12/06/2022, 4:20 PM

What are the best practices for handling IP addresses in Pinot? Storing as ints? How would joins or lookups between IP addresses and CIDR blocks be done?
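
One way to picture the integer approach mentioned in the question: an IPv4 address maps to a 32-bit integer, and a CIDR block maps to an inclusive integer range, so membership becomes a range comparison. The sketch below uses Python's standard `ipaddress` module. The helper names are hypothetical, it covers IPv4 only, and it is not a statement about Pinot's recommended practice or built-in functions.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def cidr_to_range(cidr: str) -> tuple:
    """Return the (lowest, highest) integer addresses covered by a CIDR block."""
    network = ipaddress.IPv4Network(cidr)
    return int(network.network_address), int(network.broadcast_address)


def in_cidr(ip: str, cidr: str) -> bool:
    """Check membership with a plain integer range comparison."""
    low, high = cidr_to_range(cidr)
    return low <= ip_to_int(ip) <= high


print(ip_to_int("10.0.0.1"))              # 167772161
print(cidr_to_range("10.0.0.0/8"))        # (167772160, 184549375)
print(in_cidr("10.1.2.3", "10.0.0.0/8"))  # True
```

With addresses stored as integers and blocks stored as low/high pairs, a lookup reduces to a range predicate of the form low <= ip AND ip <= high.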

Marco Ndoping

12/07/2022, 4:22 AM

Hi, I'm getting an error when running these queries

SELECT ToDateTime(1639137263000, 'yyyy-MM-dd') AS dateTimeString FROM ignoreMe

and

SELECT FromDateTime('2019-08-07', 'yyyy-MM-dd') AS epochMillis FROM ignoreMe

in Pinot's multi stage engine from the Pinot UI. Here's the error message: { "message": "SQLParsingError\njava.lang.RuntimeException Error composing query plan for: SELECT FromDateTime('2019-08-07', 'yyyy-MM-dd') AS epochMillis\nFROM ignoreMe\n\tat org.apache.pinot.query.QueryEnvironment.planQuery(QueryEnvironment.java:137)\n\tat org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:153)\n\tat org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:128)\n...\nCaused by: java.lang.IllegalArgumentException: Could not find schema for table: 'ignoreMe'. This is likely indicative of some kind of corruption and should not happen! If you are running this via the a test environment, check to make sure you're specifying the correct tables.\n\tat org.apache.pinot.query.catalog.PinotCatalog.getTable(PinotCatalog.java:68)\n\tat org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:126)\n\tat org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:295)\n\tat org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:145)", "errorCode": 150 } Is this a known issue? Can you please investigate and confirm whether we need to create an issue here?

Saqlain Khan

12/07/2022, 9:58 AM

Hi, I recently deployed Apache Pinot on my cluster. I want to host the Pinot controller behind a reverse proxy. Could you please help me with the configuration or any environment variable needed to run the controller behind a reverse proxy?

Vishnu Ghanta

12/07/2022, 11:19 PM

Hi team, I deployed Pinot and Kafka in Kubernetes and was trying to integrate Kafka as a source for Pinot. I am facing the error below. These are the logs from pinot-controller:

Transient Exception: Could not get partition count for topic test1

org.apache.pinot.spi.stream.TransientConsumerException: org.apache.pinot.shaded.org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 5002ms

. . .

Caused by: org.apache.pinot.shaded.org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 5002ms

If I describe the topic using the Kafka CLI, I am able to see the partition count. Kafka is deployed in another namespace and its version is 2.13-2.6.3. Please help if anyone has faced this error. TIA

Kevin Xu

12/08/2022, 2:33 AM

Hi team, I tried to add dependencies between jobs, and the job DAG has been constructed. But Pinot doesn't seem to execute the tasks the way I expect, so I hope someone can explain the mechanism by which Pinot acquires tasks. Below is the code that acquires a task in Pinot, but it is a callback method, so I don't know the pipeline.

TaskFactory taskFactory = context -> {
        try {
          return new Task() {
            private final TaskConfig _taskConfig = context.getTaskConfig();
            private final PinotTaskExecutor _taskExecutor = taskExecutorFactory.create();
            private final MinionEventObserver _eventObserver = eventObserverFactory.create();
            private final MinionMetrics _minionMetrics = MinionContext.getInstance().getMinionMetrics();

            @Override
            public TaskResult run() {
              HelixManager helixManager = context.getManager();
              JobContext jobContext = TaskDriver.getJobContext(helixManager, context.getJobConfig().getJobId());
              // jobContext.getStartTime() return the time in milliseconds of job being put into helix queue.
              long jobInQueueTimeMs = jobContext.getStartTime();
              long jobDequeueTimeMs = System.currentTimeMillis();

Xiang Fu

12/08/2022, 6:48 AM

Hi all, I plan to cut a new Pinot release on Monday. Please reply to this thread or just ping me if there are any pending PRs or features you want to get in.

Yarden Rokach

12/08/2022, 2:00 PM

Check out the #C03N1JNHXLY channel to stay updated on upcoming events. The next one is happening today 😉

vishal

12/09/2022, 6:59 AM

Hi team, do we have a more in-depth article on overlapping data in offline data pushes?

Rohit Anilkumar

12/09/2022, 10:42 AM

As per the documentation (https://docs.pinot.apache.org/basics/indexing/inverted-index), inverted index columns can be specified using the invertedIndexColumns keyword, but how do we specify sorted inverted index columns?

Timothy Spann

12/11/2022, 10:01 PM

https://dzone.com/articles/building-real-time-weather-dashboards-with-apache

🍷 2

💥 2

👍 3

Chengxuan Wang

12/13/2022, 4:02 AM

Hey, I'm wondering if there is a way to get the first row of each group after a group by?

Yarden Rokach

12/13/2022, 5:55 PM

Hi everyone! Join us in 5 min for Apache Pinot- 2022 year in review 🍷 https://www.meetup.com/apache-pinot/events/290226108/

Yarden Rokach

12/13/2022, 6:00 PM

https://startreedata.zoom.us/j/85899658760?pwd=RXgvN2FPY1RwYlp0Y2EwYklXWHVGUT09

Shaun Sawyer

12/15/2022, 2:47 PM

We are using Pinot for near-real-time analytics, and we would also like to support exporting query results to CSV and uploading them to GCS. Is anyone doing something similar? What is the currently recommended approach for this? Happy to provide more details about our current use case.

🦗 1

Yarden Rokach

12/15/2022, 7:14 PM

Great blog post by @Timothy Spann about Building Real-Time Weather Dashboards With Apache Pinot ☂️ More relevant than ever these days 🌍 https://dzone.com/articles/building-real-time-weather-dashboards-with-apache

🍷 1

Aaron Weiss

12/16/2022, 3:53 PM

Is it possible to use SegmentTarPush Ingestion job to upload an existing segment file to a REALTIME table? If not, is there another way to do this? I get the following error which I think is because the table spec starts with REALTIME and the process is expecting OFFLINE.

Caused by: org.apache.pinot.shaded.com.fasterxml.jackson.databind.exc.MismatchedInputException: Missing required creator property 'tableName' (index 0)
 at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.spi.config.table.TableConfig["tableName"])

Arpan Samariya

12/17/2022, 6:38 AM

How can Pinot help in CDC from production MySQL tables?

vishal

12/19/2022, 12:51 PM

Hi team, we would like to contribute to open-source Pinot. Can somebody point me to the procedure for getting started?

chandarasekaran m

12/20/2022, 5:45 AM

Hi team, can we read fields from Kafka headers now? Has that change been pushed to master? Cc: @Kishore G