Apache Pinot #general

Tejaswini Edara

04/26/2022, 11:52 AM

Hi Team, is anyone aware of a dbt integration with Pinot? I am trying to use dbt to transform data and push it to Pinot.

Diana Arnos

04/26/2022, 12:44 PM

Hello, everyone 👋 Does Pinot have (or plan to have) something available inside AWS marketplace?

KISHORE B R

04/27/2022, 12:52 PM

Hi all, I have a question regarding historical data. How does Pinot handle data that has existed for several years? Will query performance change when such historical data is queried after a very long time?

Nisheet

04/27/2022, 3:06 PM

Hi team, I am trying to bootstrap a realtime upsert-enabled table. I have around 2-3 years of data that I want to upload to this realtime table. I was trying to use the Spark segment generation job to create segments and then upload those segments to the realtime table. But the initial segment creation job itself fails, as it tries to look for an OFFLINE table in the table config. I couldn't find a guide or documentation on how to do this. I was going through the changes in this PR https://github.com/apache/pinot/pull/6567 and trying accordingly.

Alice

04/28/2022, 2:12 AM

Hi team, can Pinot ingest only one partition of a Kafka topic that has many partitions?

Chengxuan Wang

04/28/2022, 9:36 AM

wondering if it works for kafka streaming ingestion:

"ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "brand_name_facility_id_tuple",
          "transformFunction": "concat(brand_name, facility_id, ':')"
        }
      ]
    },

Not sure if the concat function works here. The examples here are mostly Groovy functions: https://docs.pinot.apache.org/developers/advanced/ingestion-level-transformations#column-transformation
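For reference, here is a minimal Python sketch of the value this transform would be expected to write per record. It assumes the built-in concat takes two inputs followed by a separator, in that order. The helper name `concat_like` and the sample record are hypothetical, and the sketch says nothing about whether the function is available for stream ingestion in a given Pinot version.

```python
def concat_like(first, second, separator):
    """Hypothetical stand-in for the assumed concat(a, b, separator) semantics."""
    return f"{first}{separator}{second}"


def apply_transform(record):
    """Return a copy of the record with the derived column added."""
    out = dict(record)
    out["brand_name_facility_id_tuple"] = concat_like(
        record["brand_name"], record["facility_id"], ":"
    )
    return out


# Hypothetical sample record, for illustration only.
sample = {"brand_name": "acme", "facility_id": "42"}
transformed = apply_transform(sample)
```

If the ingested column does not match what this sketch produces for the same input, the argument order of the transform is the first thing to check.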

Alice

04/28/2022, 11:10 AM

Hi team, is it possible for two tables that belong to different server and broker tenants to have the same table name?

Alice

04/29/2022, 3:14 AM

Hi team, is it a requirement to enable partitioning in Pinot to use upsert feature?

Joe Lane

04/29/2022, 10:21 PM

I’m interested in building a segment fetcher that builds virtual segments on the fly from an OLTP transaction log.

francoisa

05/02/2022, 8:44 AM

Hi. Is there any way to retrieve monitoring information from the REST API, such as the number of messages read by the consumer and the number of messages indexed? The goal is to monitor the ingestion and ensure we are not missing messages. I've found messages like that in pinot-all.log, but I want them from the API if possible. Is there a recommended way?

Vishnu Ghanta

05/02/2022, 12:22 PM

Hey guys, I am trying to establish a JDBC connection to execute queries on a Pinot cluster. The Pinot cluster is deployed in a production environment and I am connecting from my local machine (with the Pinot controller port-forwarded) to test the JDBC feature. I think that while executing the query, the controller resolves the broker by its name rather than its IP, hence the UnknownHostException.

Caused by: org.apache.pinot.client.PinotClientException: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.net.UnknownHostException: pinot-broker-0.pinot-broker-headless.xxxxx-v2.svc.cluster.local: nodename nor servname provided, or not known
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport.executeQuery(JsonAsyncHttpPinotClientTransport.java:104)
	at org.apache.pinot.client.Connection.execute(Connection.java:127)
	at org.apache.pinot.client.Connection.execute(Connection.java:96)
	at org.apache.pinot.client.PinotStatement.executeQuery(PinotStatement.java:63)
	... 1 more

Is there a way I can avoid this error? The same might happen when I move to production (the application is in a different k8s cluster). TIA
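One way to narrow this down is to bypass broker discovery and send the query straight to a broker address that is reachable from the client, for example a port-forwarded broker. The Python sketch below posts SQL to the broker's `/query/sql` REST endpoint. The address `http://localhost:8099` and the table name `myTable` are assumptions for illustration, and this is not the JDBC driver's own code path.

```python
import json
import urllib.request


def build_sql_request(broker_url, sql):
    """Build a POST request for the broker's /query/sql endpoint."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url.rstrip('/')}/query/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(broker_url, sql, timeout=10):
    """Send the query and return the decoded JSON response."""
    request = build_sql_request(broker_url, sql)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed address of a port-forwarded broker and a hypothetical table name.
request = build_sql_request("http://localhost:8099", "SELECT COUNT(*) FROM myTable")
```

Calling `run_query("http://localhost:8099", "SELECT COUNT(*) FROM myTable")` against a reachable broker would show whether the problem is limited to resolving the broker hostname returned by the controller.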

Aswini Nellimarla

05/02/2022, 12:28 PM

Hi, can Apache Pinot talk directly to datastores like Cassandra or Cosmos DB NoSQL stores?

Jinal Panchal

05/02/2022, 12:42 PM

Hello, I've started exploring Pinot. Is there any way to define primary key and foreign key relationships so that we can maintain a mapping between tables? Without such relationships, how does it support joins?

erik bergsten

05/02/2022, 1:06 PM

We started using the "latest" tagged Docker image so we can use timestamp indexes, but in this version Kafka SASL_PLAIN authentication doesn't work (class not found). Is it broken, or will we just have to wait for an official release to get timestamp indexes and full Kafka support in one image?

Alice

05/02/2022, 3:05 PM

Hi team, I noticed Timestamp Index is supported and tried to use it. But there is this error:

{"code":400,"error":"Cannot deserialize value of type org.apache.pinot.spi.config.table.FieldConfig$IndexType from String \"TIMESTAMP\": not one of the values accepted for Enum class: [INVERTED, FST, JSON, H3, TEXT, SORTED, RANGE]\n at [Source: (String)\"{\"tableName\":\"test time index\",\"tableType\":\"REALTIME\",\"segmentsConfig\":{\"schemaName\":\"test_time_index\",\"timeColumnName\":\"created on\",\"timeType\":\"MILLISECONDS\",\"allowNullTimeValue\":true,\"replicasPerPartition\":\"1\",\"retentionTimeUnit\":\"DAYS\",\"retentionTimeValue\":\"30\",\"segmentPushType\":\"APPEND\",\"completionConfig\":{\"completionMode\":\"DOWNLOAD\"}},\"tenants\":{},\"fieldConfigList\":[{\"name\":\"timestamp\",\"encodingType\":\"DICTIONARY\",\"indexTypes\":[\"TIMESTAMP\"],\"time\"[truncated 3199 chars]; line: 1, column: 483] (through reference chain: org.apache.pinot.spi.config.table.TableConfig[\"fieldConfigList\"]->java.util.ArrayList[0]->org.apache.pinot.spi.config.table.FieldConfig[\"indexTypes\"]->java.util.ArrayList[0])"}

Part of my table schema is:

"dateTimeFieldSpecs": [
  {
    "name": "timestamp",
    "dataType": "TIMESTAMP",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]

And part of my table config is:

"fieldConfigList": [
  {
    "name": "timestamp",
    "encodingType": "DICTIONARY",
    "indexTypes": ["TIMESTAMP"],
    "timestampConfig": {
      "granularities": ["DAY", "WEEK", "MONTH"]
    }
  }
]

Any idea how to fix it?
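The error lists the index types that the controller's enum accepts, and TIMESTAMP is not among them. That suggests, as an assumption, that the running controller predates the timestamp index. A minimal Python sketch for checking a table config against that list before submitting it follows. The accepted set is copied from the error message, and `unsupported_index_types` is a hypothetical helper, not a Pinot API.

```python
# Values copied from the error message; a newer controller may accept more.
ACCEPTED_INDEX_TYPES = {"INVERTED", "FST", "JSON", "H3", "TEXT", "SORTED", "RANGE"}


def unsupported_index_types(table_config, accepted=ACCEPTED_INDEX_TYPES):
    """Return (field name, index type) pairs the controller would reject."""
    problems = []
    for field in table_config.get("fieldConfigList", []):
        for index_type in field.get("indexTypes", []):
            if index_type not in accepted:
                problems.append((field.get("name"), index_type))
    return problems


# The fieldConfigList fragment from the message.
table_config = {
    "fieldConfigList": [
        {
            "name": "timestamp",
            "encodingType": "DICTIONARY",
            "indexTypes": ["TIMESTAMP"],
            "timestampConfig": {"granularities": ["DAY", "WEEK", "MONTH"]},
        }
    ]
}
problems = unsupported_index_types(table_config)
```

A non-empty result means the config uses an index type that this particular controller build does not know about, which points at the version rather than at the config syntax.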

Padma Malladi

05/02/2022, 11:01 PM

Hi all, I am working on improving the query latency for my realtime time series table. There is no corresponding offline table; all the data is realtime data. It has about 61 billion records with 3.5 million unique ids and a size of 2.7 TB. I have a range index on the timestamp and an inverted index on the unique id. The incoming streaming data from Kafka is partitioned. I have the segmentation strategy set to the default of balanced segmentation. Stats say that 2 servers were queried, 34 segments matched and 34 segments processed. I am getting a query response time of ~2 seconds, sometimes 4 seconds, and repeated querying gives me 50 ms. Would the following changes improve the query performance?

1. Changing the segmentation strategy to Partitioned Replica-Group Segment Assignment
2. Bloom filter (does it improve the performance for individual queries or aggregate queries only?)
3. I am assuming star tree index helps with aggregation and not independent records
4. We have the partitioning set as murmur in the table config
5. How can I allocate or increase the hot/warm memory?
6. Tenants are set to DefaultTenant for both server and broker. Would changing this help? If so, what should be changed?
7. Would enabling default star tree and dynamic star tree creation help?
8. Would disabling null handling affect the performance? It's currently set to true, but I don't expect null values for the indexed id and timestamp fields
9. Should I set autoGeneratedInvertedIndex and createInvertedIndexDuringSegmentGeneration to true? They are false currently

Weixiang Sun

05/04/2022, 4:04 AM

What is the difference between timeColumnName and sortedColumn inside the tableConfig from a query performance perspective? If my queries mainly filter on timeColumnName, should I use the same column as sortedColumn?
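As a conceptual illustration only, and not Pinot's actual implementation, the Python sketch below shows why physically sorting rows by the filtered column can help range filters: the matching rows form one contiguous block that two binary searches can locate. The sample timestamps and the helper name `matching_doc_range` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def matching_doc_range(sorted_values, low, high):
    """Return the half-open row range [start, end) with low <= value <= high."""
    start = bisect_left(sorted_values, low)
    end = bisect_right(sorted_values, high)
    return start, end


# Hypothetical time column values, already sorted within one segment.
timestamps = [100, 105, 105, 110, 120, 130, 130, 145]
doc_range = matching_doc_range(timestamps, 105, 130)
matched = timestamps[doc_range[0]:doc_range[1]]
```

On unsorted data the same filter needs either a full scan or a separate index structure, which is the trade-off the question is about.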

BUNTY kumar

05/04/2022, 10:19 AM

Hi All, is it possible to launch a Pinot cluster on Kubernetes and point it to an already deployed ZooKeeper that holds one-month-old metadata? This is more of a migration of all the components except ZooKeeper to another Kubernetes cluster within the same VPN.

Saumya Upadhyay

05/04/2022, 11:33 AM

Hi All, if we increase the number of Kafka partitions later as requirements change, how will Pinot behave? Do we need to change some config in Pinot to avoid any issues, or will Pinot simply create new segments as soon as the new partitions are added to the Kafka topic?

Karin Wolok

05/04/2022, 4:45 PM

Just a reminder! 📣 StarTree's FIRST in-person conference is scheduled and we're looking for speakers!!! 📣 Real Time Analytics Summit (August 16/17 in San Francisco) You can submit a session or register here: https://www.startree.ai/real-time-analytics-summit Sponsorship opps also available. If interested, please shoot me a message! 🙂

Xiang Fu

05/04/2022, 7:50 PM

Dear Community, TL;DR: Pinot removed the PQL query endpoint and response format from the current master branch. Only the SQL endpoint is supported starting from the 0.11.0 release. More info: https://github.com/apache/pinot/issues/7430 Thanks @Jackie for all the work!

👍 7

Ryan Ruane

05/05/2022, 11:16 AM

Pinot Client Rust Hi there. I wrote in the other day about multi-value column ingestion jobs, and at the request of @Mayank, I created the issue: https://github.com/apache/pinot/issues/8635. The reason I was trying to create a table with ingestion of all possible types is because I am writing a rustlang client modelled after https://github.com/startreedata/pinot-client-go. Here is the repo, if anyone is interested: https://github.com/yougov/pinot-client-rust

Tonya Moore

05/05/2022, 5:32 PM

Hi, folks! 👋 StarTree and Cisco Webex are co-hosting a virtual MeetUp on 12 May at 7p CDT called WebEx: Real-Time Observability and Analytics with Apache Pinot

Presenters are Sachin Joshi, Vaibhav Mittal, and Tim Berglund.

Please join us! 💻

🆒 1

🍷 4

❤️ 10

Mohemmad Zaid Khan

05/06/2022, 4:56 AM

Hi, I have started PinotController, PinotBroker and PinotServer using the code from the git branch multi_stage_query_engine, but the join query is still not working. Do I need to do something else?

Jinal Panchal

05/06/2022, 12:04 PM

Hello, I didn't quite get the concept of dimension columns in Pinot. If we have datatypes well-defined for the columns, then what's the significance of specifying Pinot field specification, like metricsField, dimensionFields, etc?

ashutosh singh

05/06/2022, 2:35 PM

👋 Hi everyone!

Diogo Baeder

05/06/2022, 3:12 PM

So, I just created a table with >40k rows, but with daily segments, 318 segments in total - not good, I want to rollup to monthly segments later -, and defined a JSON index for my main columns which contain dynamic data (data that just can't be defined as static columns). Even trying to brutalize this thing by querying all the data with a limit that surpasses the amount of rows I still get ~600ms queries! Geez, this thing is fast! 🙂

Mathieu Druart

05/06/2022, 10:52 PM

Hi! This PR: https://github.com/apache/pinot/pull/7272/files removed the Pulsar plug-in from the Pinot build because of this issue: https://github.com/apache/pinot/issues/7270. Now that the issue is marked as closed, does anyone know if the plug-in will be added back to the build? Thank you!

Alice

05/07/2022, 1:22 AM

Hi team, I have a question and don't know how to solve it. How can I extract numOfStas.Policy from a Kafka message and save it to a Pinot table field? When I use transformFunction, it doesn't work.

{
  "columnName": "stas_policy",
  "transformFunction": "jsonPathString(stats, '$.text_body.fields.numOfStas.Policy')"
}

And a sample Kafka message is like this:

{
  "name": "telemetry_signal_gfw_api_usage",
  "stats": {
    "text_body": {
      "fields": {
        "numOfStas": 0,
        "numOfStas.Policy": 21
      }
    }
  }
}
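A likely cause is that the key itself contains a dot, so a dot-notation path is read as `numOfStas` followed by a child `Policy`. The Python sketch below reproduces that ambiguity on the sample message using a hypothetical `walk` helper. In JsonPath, bracket notation such as `$.text_body.fields['numOfStas.Policy']` is the usual way to address such a key. Whether that form works inside a Pinot transformFunction string, including how the inner quotes need to be escaped, is an assumption to verify.

```python
import json

# The sample Kafka message from the question.
message = json.loads("""
{
  "name": "telemetry_signal_gfw_api_usage",
  "stats": {"text_body": {"fields": {"numOfStas": 0, "numOfStas.Policy": 21}}}
}
""")


def walk(document, keys):
    """Follow a list of keys through nested dicts; return None on a miss."""
    current = document
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Splitting on every dot treats "numOfStas.Policy" as two path steps.
dotted_path = "text_body.fields.numOfStas.Policy"
naive_result = walk(message["stats"], dotted_path.split("."))

# Treating the dotted key as a single step finds the value.
explicit_result = walk(message["stats"], ["text_body", "fields", "numOfStas.Policy"])
```

Here the naive lookup lands on the integer stored under `numOfStas` and cannot descend further, while the explicit key list reaches the intended field.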

Diogo Baeder

05/09/2022, 12:45 PM

Hey guys, I'd like to ask a question which is not really a problem, but rather a curiosity about how an aspect of the system works: every time I spin up Pinot with my docker-compose, create the tables, add data and query it for the first time, it doesn't query as fast as I'd like, but from the second query onward it gets blazing fast, even if I change many constraints in my query. I know that Pinot doesn't do "caching", so why is there such a big difference in query times? For example, it may drop from 900ms on the first query to 40ms, 30ms or even lower on the second, third, fourth, etc. queries.