# getting-started
  • m

    Mamlesh

    10/26/2022, 11:26 AM
Hi All, I have some questions, mostly related to queries in Pinot. 1. Is case-sensitive and case-insensitive searching available in queries? 2. Is wildcard searching available (? and *)? 3. Is pagination supported? Thank you in advance. :)
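For later readers, Pinot's SQL dialect covers most of these questions. A hedged sketch (the table and column names here are hypothetical, and pagination/regex support varies by Pinot version and query engine):

```sql
-- Wildcard search: LIKE uses SQL wildcards (% and _) rather than ? and *
SELECT * FROM products WHERE productName LIKE 'shoe%';

-- Case-insensitive matching by normalizing case on both sides
SELECT * FROM products WHERE LOWER(productName) = 'shoes';

-- Pagination on selection queries
SELECT * FROM products ORDER BY productId LIMIT 10 OFFSET 20;
```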
  • m

    Machhindra

    10/26/2022, 1:27 PM
Team, I am experimenting with Apache Pinot, using realtime tables to store metrics data. The data streaming is live. After a few days, I had to update the table schema to add new columns. I used the UI to add the new columns, updated the table config, and reloaded all segments. The query console is complaining. What else should I be doing? “There are 82 invalid segment/s. This usually means that they were created with an older schema. Please reload the table in order to refresh these segments to the new schema.”
  • r

    Rohit Anilkumar

    10/26/2022, 2:22 PM
Is there any restriction on using an on-demand Kinesis data stream for streaming ingestion? Asking because the partitions/shards in an on-demand Kinesis stream scale up and down based on load. Since the partitions keep changing, do we have to run a rebalance every now and then, or how is this handled by Pinot? Or should one use a provisioned Kinesis stream, where the number of shards is set during its creation?
  • m

    Mamlesh

    10/27/2022, 3:04 PM
Hi All, can anyone explain the exact meaning of, and difference between, the 'controller.data.dir' configuration on the Controller and the 'pinot.server.instance.dataDir' and 'pinot.server.instance.segmentTarDir' configurations on the Server? Thank you in advance :)
  • s

    SSD

    10/28/2022, 3:19 AM
Hi, I am curious to learn how partial upsert works in Pinot. Wondering where would be a good place to start. 😀
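To make this thread more useful later: partial upsert is configured per table via the `upsertConfig` block of the table config. A hedged sketch (the column names are hypothetical; the schema also needs a primary key, and the input stream must be partitioned by that key — see the Pinot upsert docs for the full list of merge strategies):

```json
"upsertConfig": {
  "mode": "PARTIAL",
  "partialUpsertStrategies": {
    "clicks": "INCREMENT",
    "status": "OVERWRITE"
  }
}
```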
  • l

    Larry Meadors

    11/03/2022, 12:53 PM
I have a ksqlDB table without a defined time field; how can I create a table in Pinot that represents that? I tried defining the schema like this, but when I tried to create the table, it failed (message below):
```json
{
  "schemaName": "ecommerce_customer",
  "dimensionFieldSpecs": [
    {
      "name": "NAME",
      "dataType": "STRING"
    },
    {
      "name": "SAPID",
      "dataType": "STRING"
    },
    {
      "name": "COUNTRYID",
      "dataType": "LONG"
    },
    {
      "name": "ENABLED",
      "dataType": "BOOLEAN"
    }
  ]
}
```
    creating a table for this causes this error:
```json
{
  "_code": 400,
  "_error": "Cannot find valid fieldSpec for timeColumn: rowtime from the table config: ecommerce_customer_REALTIME, in the schema: ecommerce_customer"
}
```
Which kinda makes sense, I think. I just don't know how to define that time column (yet). Any tips?
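One possible way out (an assumption inferred from the error, not a fix confirmed in this thread): declare the ksqlDB `ROWTIME` as an epoch-millis time column in the Pinot schema, and point the realtime table config's `timeColumnName` at it:

```json
"dateTimeFieldSpecs": [
  {
    "name": "rowtime",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]
```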
  • d

    Dhruvil Shah

    11/03/2022, 7:45 PM
    Hello
  • d

    Dhruvil Shah

    11/03/2022, 7:45 PM
I am trying to run the Docker container but it is not starting.
  • d

    Dhruvil Shah

    11/03/2022, 7:47 PM
    Screen Shot 2022-11-03 at 3.44.47 PM.png
  • d

    Dhruvil Shah

    11/03/2022, 7:47 PM
It has been showing this screen for a long time.
  • g

    Gerrit van Doorn

    11/07/2022, 8:49 PM
Hi folks, what is the best way to add existing segments to an offline table? Say I created an offline table in the past and its segments are in the deep store, then deleted the offline table to start from scratch. How would I go about adding those segments (or a subset of them) back to an offline table?
  • a

    Abdelhakim Bendjabeur

    11/08/2022, 11:37 AM
Hi folks, is there a Terraform provider that allows the management of Pinot clusters in Kubernetes?
  • s

    Shubham Kumar

    11/11/2022, 6:36 AM
Hi Team, we have deployed Pinot using Helm on Kubernetes. We have seen that increasing the PVC size in Helm does not update the existing persistent volume size. Is there a way to update the existing PVC, or to attach additional volumes to an existing PVC?
  • s

    Sonit Rathi

    11/12/2022, 10:12 AM
Hi team, I am trying to build an upsert realtime table with a hashing function. I have a composite key. Do I have to implement the same hashing function for partitioning in Kafka, or can I use one individual key out of the composite key for partitioning?
  • a

    Aly Ibrahim

    11/14/2022, 5:07 PM
Hello, I installed Pinot using the Helm chart and it works well. Is there an initiative to build a Kubernetes Operator for Pinot? This is one of the projects that would benefit a lot from the Operator pattern.
  • d

    Dhar Rawal

    11/14/2022, 5:50 PM
Is there a way to check that the Pinot controller is up and running before starting the broker, that the broker is up and running before starting the server, and only then run "AddTable"? I am setting up a Kafka/Pinot/Streamlit Kubernetes cluster and the "AddTable" script is inside the YAML. Without this dependency setup, I currently end up with dead servers and brokers, and the "AddTable" script leaves tables in a bad state, as they are associated with these dead servers and brokers.
  • r

    rajkumar

    11/23/2022, 2:13 AM
Is it the right choice to replace Apache Hive with Pinot?
  • a

    Abdelhakim Bendjabeur

    11/24/2022, 2:10 PM
Hello 👋 We are considering self-managing Pinot, and I am wondering what the main problems are that one can face when doing so. If someone has experience with it, I would really appreciate the input 🙏
  • l

    Leon Liu

    12/07/2022, 8:16 PM
Hello, I'm relatively new to Apache Pinot and have run into some issues (or a misunderstanding) while batch-ingesting data. I'm following the documentation here: https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot to batch-load multiple tables from CSV (table1 has 1000 rows and table2 has 2000 rows). After loading table1, the query "select count(*) from table1" returns 1000, but after loading table2, the query "select count(*) from table2" returns 3000 records. Looking into it, a lot of rows come back with null values when I run "select * from table2". Is this normal, or did I do something wrong? Thanks a lot in advance. (edited)
  • c

    Caleb Shei

    12/08/2022, 2:33 PM
I plan to set up a number of Pinot servers with HDFS as the deep store. Any recommended local disk space for a Pinot server to cache indices and data locally? 500MB? 1TB? Or more?
  • s

    Shreeram Goyal

    01/04/2023, 7:33 AM
Hi, I was exploring indexing in Pinot and was wondering if it is possible to have columns without any indexing at all. From the docs, I had the impression that a column would have at least one type of index. I am currently using release v0.11.0.
  • s

    Shreeram Goyal

    01/04/2023, 4:14 PM
For data types like INT, LONG, etc., there is a default value given in the docs. We have a use case where columns of these data types can be null. Is it somehow possible to keep them null instead of Integer.MIN_VALUE and so on? We are ingesting data from a Kafka topic into a realtime table.
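For later readers: Pinot can track nulls at ingestion time when null handling is enabled in the table's index config; queries can then use IS NULL / IS NOT NULL predicates. A minimal sketch of the relevant table-config fragment:

```json
"tableIndexConfig": {
  "nullHandlingEnabled": true
}
```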
  • a

    Avi Zrachya

    01/09/2023, 9:58 AM
    Hi All
  • a

    Avi Zrachya

    01/09/2023, 9:58 AM
I'm getting this error when adding a table:
```
Executing command: AddTable -tableConfigFile /import/daily-table-offline.json -offlineTableConfigFile null -realtimeTableConfigFile null -schemaFile /import/daily-schema.json -controllerProtocol http -controllerHost 10.52.0.7 -controllerPort 9000 -user null -password [hidden] -exec
{"code":400,"error":"Invalid TableConfigs. Invalid TableConfigs: daily4. SIMPLE_DATE_FORMAT pattern dd/MM/yyyy has to be sorted by both lexicographical and datetime order"}
```
  • a

    Avi Zrachya

    01/09/2023, 9:58 AM
```json
"dateTimeFieldSpecs": [
    {
      "name": "date",
      "dataType": "STRING",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:dd/MM/yyyy",
      "granularity": "1:DAYS"
    }
  ]
```
  • a

    Avi Zrachya

    01/09/2023, 9:59 AM
This is the date config. Any idea what's wrong?
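The error is saying that the date pattern must sort the same way lexicographically as chronologically, which dd/MM/yyyy does not (for example, "01/02/2023" sorts before "02/01/2022" as a string but is later in time). A year-first pattern avoids this; a sketch of the same field spec with that change (the incoming values would also need to arrive in, or be transformed to, this format):

```json
"dateTimeFieldSpecs": [
  {
    "name": "date",
    "dataType": "STRING",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
    "granularity": "1:DAYS"
  }
]
```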
  • a

    Amol

    01/10/2023, 7:23 AM
Hello Team, very basic question. I understand we can ingest data into Pinot from MySQL using Kafka CDC streams, but that only covers ongoing data. Any recommendations/guides on how we can bulk-import the initial dump?
  • a

    Amol

    01/10/2023, 7:23 AM
Also, a second question: if we decide to dump from MySQL every week, how will Pinot handle duplicates?
  • l

    Luis Fernandez

    01/11/2023, 6:59 PM
Hey friends, long time no chat! We currently have a realtime table with search terms for products, and we keep the data for 7 days. We have a project to increase retention from 7 days to 30 days; however, the further back we query this data, the longer Pinot takes to return results. One idea: we currently store at hourly resolution, but we don't really need that — daily would do. So we were wondering if we could leverage the RealtimeToOfflineSegments task or the MergeRollup task, making this a hybrid table. Do you have any recommendations on how to achieve this with these two tools, or is there a better way? Also, is there a way to tell these tasks to move data only from a certain period? For example, we would like the data from the oldest day in the realtime table to be moved to the offline servers, and also rolled up from hourly to daily. Thoughts, prayers, concerns?
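Since no config made it into the thread, here is a hedged sketch of what a RealtimeToOfflineSegmentsTask with rollup might look like in the table config. The period values are illustrative only; the per-metric aggregation settings and the exact knobs for rollup granularity are documented with the task:

```json
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "1d",
      "roundBucketTimePeriod": "1d",
      "mergeType": "rollup"
    }
  }
}
```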
  • c

    Carlos

    01/17/2023, 4:19 PM
Hi everyone. I just started a data analytics project. I'm putting raw data into an S3 bucket, then transforming and cleaning it so it can be ingested by Pinot via an offline table. I loaded some data, then made some updates on the data-origin side, expecting the corresponding row in Pinot to be updated. What I got instead was a duplicated row, so I'm afraid there is something I didn't understand about Pinot. My goal is to build a Superset instance for the marketing team, consuming data from Pinot. The data is mainly application entities that I pass through the mentioned pipeline, so on the other side I expect to query the final state of each entity. Is Pinot a suitable option for that, or should I keep investigating?