Apache Pinot #general

Deepak Kumar Mishra

03/16/2021, 4:07 AM

Is an update query possible using Pinot?

Ravikumar Maddi

03/16/2021, 6:11 AM

Is this correct? I have a column that contains a list of integers ("madIds": [1111, 2222, 3444]), and I am writing it like this in the schema config file. Please correct me and confirm.

{
        "name": "madIds",
        "datatype": "INT",
        "delimiter":",",
        "singleValueField":false
},

Ravikumar Maddi

03/16/2021, 7:43 AM

@All - how do I write the schema for a date column? I have a column with a date: "startDate": "2021-01-04 000000". Need help 🙂
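
One way to sanity-check a format pattern for a value like this is to parse a sample outside Pinot first. The sketch below is plain Python, not Pinot configuration. It assumes the trailing 000000 means hours, minutes and seconds with no separators, which the message does not confirm. Under that assumption, the equivalent Java SimpleDateFormat pattern would be yyyy-MM-dd HHmmss.

```python
from datetime import datetime

# Sample value copied from the message above.
SAMPLE = "2021-01-04 000000"

# Assumption: the six trailing digits are HHmmss with no separators.
PYTHON_PATTERN = "%Y-%m-%d %H%M%S"


def parse_start_date(value: str) -> datetime:
    """Parse a startDate string; raises ValueError if it does not match."""
    return datetime.strptime(value, PYTHON_PATTERN)


parsed = parse_start_date(SAMPLE)
print(parsed.isoformat())
```

If the sample fails to parse, the pattern (or the assumption about the trailing digits) is wrong, and the same mismatch would likely show up at ingestion time.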

Ravikumar Maddi

03/16/2021, 7:50 AM

@All - I added a table using the addTable Pinot command, but I have since changed the schema. How do I update the existing table that was already added? How do I update and delete a table here?

Vibhor Jain

03/16/2021, 8:39 AM

Hi All, what is the generally preferred approach for retrofitting old data in Pinot? I see that MS Teams uses Pinot. Now if I sent a msg via Teams and later updated it, how can such a use case be handled in Pinot, where updates are not supported? Suggestions welcome.

Ravikumar Maddi

03/16/2021, 2:59 PM

Hi All, I have three date columns, so I wrote them like this:

"dateTimeFieldSpecs": [
  {
    "name": "_source.startDate",
    "dataType": "STRING",
    "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
    "granularity": "1:DAYS"
  },
  {
    "name": "_source.lastUpdate",
    "dataType": "STRING",
    "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS"
  },
  {
    "name": "_source.sDate",
    "dataType": "STRING",
    "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS"
  }
]

Can you please correct it? I am getting this error:

{
  "code": 400,
  "error": "Cannot find valid fieldSpec for timeColumn: timestamp from the table config: eventflow_REALTIME, in the schema: eventflowstats"
}

Need your help 🙂
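
The error text says the table config names a time column called timestamp, while none of the dateTimeFieldSpecs shown above has that name. A minimal Python sketch of that cross-check follows. It is not Pinot's validation code: the helper name find_missing_time_column is hypothetical, it assumes the table config keeps the column under segmentsConfig.timeColumnName, and it only looks at dateTimeFieldSpecs.

```python
from typing import Optional


def find_missing_time_column(table_config: dict, schema: dict) -> Optional[str]:
    """Return the time column name if the schema does not define it, else None."""
    # Assumption: the time column is configured under segmentsConfig.timeColumnName.
    time_column = table_config.get("segmentsConfig", {}).get("timeColumnName")
    if time_column is None:
        return None
    defined = {spec["name"] for spec in schema.get("dateTimeFieldSpecs", [])}
    return None if time_column in defined else time_column


# Names taken from the schema and error message above.
schema = {
    "schemaName": "eventflowstats",
    "dateTimeFieldSpecs": [
        {"name": "_source.startDate"},
        {"name": "_source.lastUpdate"},
        {"name": "_source.sDate"},
    ],
}
table_config = {"segmentsConfig": {"timeColumnName": "timestamp"}}

print(find_missing_time_column(table_config, schema))
```

A non-None result means either the schema needs a field with that name or the table config should point at one of the existing date columns.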

Karin Wolok

03/16/2021, 5:08 PM

👋 Welcome all the new Pinot 🍷 community members! How did you find out about Pinot? What are you working on?

👋 2

Ron Kitay

03/16/2021, 5:12 PM

Hi, what data types does Pinot support out-of-the-box? I’m guessing String, numerics (integers and floating points), date and boolean - are there any others supported? For example - ip-address

Ron Kitay

03/16/2021, 5:19 PM

Is there any limitation on the size of a single record written into Pinot? Our average records are about 6KB when stored in AVRO, but can reach up to ~50K in edge cases

Chad Preisler

03/17/2021, 2:40 AM

I need to transform an encrypted Kafka message before Pinot processes it. Right now for our stream apps we use a custom serde to do it. How can I do it in Pinot? Looks like it would be fairly easy to change Pinot to allow a deserializer to be plugged in. Thoughts?

Chad Preisler

03/17/2021, 2:46 AM

Seems like Pinot is stuck on an older version of the JDK due to its use of off-heap memory APIs that no longer exist. The code does not compile on JDK 15. Also, the “shade” plugin does not work on JDK 15. I read that JDK 16 has some new methods for using off-heap memory. Is there a plan to move to a modern JDK? Is off-heap even necessary now that ZGC can handle 16TB of heap with little to no pause time?

troywinter

03/17/2021, 5:52 AM

Is there a way to specify the group id for Kafka realtime ingestion? What should the ingestion config key be?

Ronak

03/17/2021, 4:18 PM

I was exploring TEXT_MATCH functionality with pinot-0.7.0/0.6.0 and had configured one of the columns for it. Is there any configuration for the refresh time interval for the index? https://docs.pinot.apache.org/basics/indexing/text-search-support After enabling indexing (with index type: Text and encoding type: RAW) on the column and doing TEXT_MATCH, I was first getting an empty result, but after some time, I was getting the result. So, what is the initial delay for such a column to be searchable? Are there any settings/configuration (e.g. num of docs, indexed size, etc.) for this?

Brian Olsen

03/17/2021, 5:00 PM

Hey all 👋 Just jumping into this awesome tech called Pinot! I'm a developer advocate from the Trino project (formerly PrestoSQL). Tomorrow we're having an episode of the Trino Community Broadcast with @User and @User about the Pinot Connector. We're covering the benefits of Trino + Pinot and why you really need Pinot to speed up your common aggregation queries for predictable response times while also gaining the benefit of federated queries over your data lake or other data sources. We'll cover a bit of the specific limitations and current work going on in the Trino-Pinot connector, and finally I'll run a simple demo with the connector! Come watch me crash my docker containers @11am EDT on https://www.twitch.tv/trinodb.

👍 2

🍷 4

🥳 4

Brian Olsen

03/17/2021, 7:19 PM

@User We'll be discussing this PR tomorrow and @User has a pretty neat solution coming in future versions of Trino. See you all tomorrow @11am EDT! 🐇🐇 https://www.twitch.tv/trinodb https://apache-pinot.slack.com/archives/CDRCA57FC/p1616005851043300?thread_ts=1616000429.041000&cid=CDRCA57FC

🍷 1

Ravikumar Maddi

03/18/2021, 2:07 PM

Hi All, one basic doubt: I ran the quick start stream, and I understand all the ports and components behind it, but I am not able to understand 2191. What is running on port 2191?

Josh Highley

03/18/2021, 4:58 PM

Will upsert work with hybrid tables? Will a realtime record become active over an offline record having the same primary key value?

Ken Krugler

03/19/2021, 5:59 PM

OK - but it’s in Maven Central 🙂 Should we avoid upgrading to that version?

Aaron Wishnick

03/19/2021, 6:33 PM

Does Pinot's batch insert have any way to avoid inserting duplicate data? Say that every day I want to batch-insert the previous day of data, and I have multiple batches of data per day (say each batch of data corresponds to data from a different ice cream flavor). If I'm generating + batch inserting yesterday's data for each ice cream flavor in parallel, and the "strawberry" job fails, so I rerun it, how do I make sure I'm not batch-inserting "strawberry" data that was already inserted?
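
One common way to make a rerun safe is to give each batch a deterministic name, so a retry overwrites the earlier attempt rather than adding to it. The Python sketch below only models that idea with an in-memory dict. The store, push and segment_name names are hypothetical, and whether a Pinot segment upload with an existing name replaces the old segment is an assumption to verify for your version.

```python
from datetime import date
from typing import Dict, List

# Stand-in for a segment store keyed by name (hypothetical, not a Pinot API).
store: Dict[str, List[str]] = {}


def segment_name(table: str, day: date, flavor: str) -> str:
    """Build a name that depends only on the batch identity, never on run time."""
    return f"{table}_{day.isoformat()}_{flavor}"


def push(table: str, day: date, flavor: str, rows: List[str]) -> None:
    """Write a batch under its deterministic name, replacing any earlier attempt."""
    store[segment_name(table, day, flavor)] = rows


yesterday = date(2021, 3, 18)  # arbitrary example date
push("sales", yesterday, "strawberry", ["row1"])           # first attempt, partial
push("sales", yesterday, "vanilla", ["row1", "row2"])
push("sales", yesterday, "strawberry", ["row1", "row2"])   # rerun after failure

print(sorted(store))
```

Because the key is derived from (table, day, flavor), the rerun leaves exactly one strawberry entry no matter how many times the job is retried.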

Ken Krugler

03/19/2021, 7:16 PM

My ops guy is trying to validate JMX metrics, and he asked me how to trigger NUM_MISSING_SEGMENTS. Any suggestions?

Oguzhan Mangir

03/20/2021, 3:52 PM

Does Pinot store min/max values for dimensions in segment metadata, or does it only store min/max values for date time fields? And can we create inverted or any other indices on a date time column?

Oguzhan Mangir

03/21/2021, 10:22 AM

The first question: When we enable aggregateMetrics to pre-aggregate data as it is consumed, Pinot aggregates data based on the fields defined in dimensionFieldSpecs and dateTimeFieldSpecs. Can Pinot aggregate data based only on the fields defined in dimensionFieldSpecs while applying pre-aggregation using aggregateMetrics?

The second question: We can set the time to generate segments for a real-time table using the realtime.segment.flush.threshold.time config. Let's assume the current time is 10:25. When I set realtime.segment.flush.threshold.time to 1 hour, Pinot creates a segment with startTime 10:25, and it will close this segment at 11:25. As a result, the start/end time of that segment is 10:25/11:25. But I want Pinot to close the segment when the new hour starts, so that the start/end time of that segment is 10:00/11:00. How can I achieve that?
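
The difference between the two behaviours is just arithmetic on the start time. A small Python sketch, independent of Pinot: elapsed_window models closing one hour after consumption began, and hour_aligned_window models the desired wall-clock boundaries. Both function names are made up for illustration, the date is arbitrary, and nothing here implies that Pinot exposes a setting for the second behaviour.

```python
from datetime import datetime, timedelta
from typing import Tuple


def elapsed_window(start: datetime, threshold: timedelta) -> Tuple[datetime, datetime]:
    """Window that closes a fixed duration after consumption started."""
    return start, start + threshold


def hour_aligned_window(start: datetime) -> Tuple[datetime, datetime]:
    """Window snapped to the wall-clock hour that contains `start`."""
    floor = start.replace(minute=0, second=0, microsecond=0)
    return floor, floor + timedelta(hours=1)


started = datetime(2021, 3, 21, 10, 25)  # consumption begins at 10:25
print(elapsed_window(started, timedelta(hours=1)))
print(hour_aligned_window(started))
```

The first call yields the 10:25 to 11:25 window described above. The second yields 10:00 to 11:00, which is the window being asked for.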

Dan Hill

03/22/2021, 3:14 AM

Any design recommendations for Pinot setups that need to deal with data protection requirements of different locations, where certain personal data should remain within location boundaries (e.g. GDPR)? Do people try to set up global tables and use Server and Segment definitions to limit scope? Or do people create separate tables?

Ravikumar Maddi

03/22/2021, 6:58 AM

Hi All, I have a doubt. If there is nested JSON (very large nested entities, at least 5 to 7 levels of embedded JSON entries), which is the better way of designing the schema for it?
1. Flatten the JSON: the schema becomes unscalable.
2. Store the embedded JSON (the JSON indexing concept) and use JSON Evolution functions, but this appears to take a very long time. I saw a technical session on nested indexing where they said that, with one million records, a JSON evolution function might take 10 to 15 seconds to return a result.
Could you please tell me which is the better way? How should I design the schema for nested JSON?
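
To make option 1 concrete, here is a minimal Python sketch of flattening, run outside Pinot. The flatten helper and the dotted naming scheme are assumptions for illustration only. It shows why the column count grows with nesting depth, since every leaf becomes its own column.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single level with dotted column names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that each nested leaf becomes its own column.
            flat.update(flatten(value, name, sep))
        else:
            # Lists and scalars are kept as-is; arrays of objects need their own policy.
            flat[name] = value
    return flat


nested = {"a": {"b": {"c": 1, "d": [1, 2]}}, "e": "x"}
print(flatten(nested))
```

A middle ground some schemas use is to flatten only the handful of paths that are filtered or grouped on and keep the rest as a single JSON string column.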

Karin Wolok

03/22/2021, 6:02 PM

Welcome new 🍷 Pinot slack members!!! Curious who you are and how you found the Pinot community! Want to share what you're working on?

🐇 2

👋 4

🍷 6

virtualandy

03/23/2021, 4:20 AM

Hi! I’m Andy. I came across Pinot late last year (I think in a blog but I have Neha’s https://youtu.be/mRkWT_EU99M as my earliest bookmark haha). I’m an engineering manager at Handshake where I help a team focused on building features with (you guessed it) data and analytics. We use a lot of Elastic, Postgres and BigQuery and I’m always personally looking to expand my 🧠 with projects like Pinot. Only just learning but excited to be part of this community.

🍷 1

👋 3

👍 8

Karin Wolok

03/24/2021, 1:11 AM

📣 If you're new to Pinot, 🍷 and interested to learn the basic fundamentals (Pinot 101), we invite you to join us this Thursday for 💡 Intro to Apache Pinot! 🧠 Presented by Apache Pinot committer, @User https://www.meetup.com/apache-pinot/events/275991991/

🎉 3

Charles

03/25/2021, 10:10 AM

Hi all, when Pinot is consuming from Kafka, how do I parse nested JSON such as { "data": { "name": "cc", "age": 3 } }? I just need "name" and "age" in the table.
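
The mapping being asked for amounts to two path lookups. A minimal Python sketch of the extraction itself, outside Pinot: extract and the dotted-path notation are hypothetical helpers, and this says nothing about which Pinot ingestion setting performs the equivalent step.

```python
import json
from typing import Any, Dict


def extract(message: str, paths: Dict[str, str]) -> Dict[str, Any]:
    """Map each output column to the value found at its dotted path in the message."""
    doc = json.loads(message)
    row: Dict[str, Any] = {}
    for column, path in paths.items():
        node: Any = doc
        for part in path.split("."):
            node = node[part]  # raises KeyError if the path is absent
        row[column] = node
    return row


# Message shape taken from the question above.
msg = '{"data": {"name": "cc", "age": 3}}'
print(extract(msg, {"name": "data.name", "age": "data.age"}))
```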

Oguzhan Mangir

03/25/2021, 10:46 AM

Hi, is there any article about Kubernetes production experience with Pinot? We want to learn things like the optimal server count, number of segments per server, optimal resources for realtime and offline servers, etc. I've found a few articles, but I want to know if there are others.

Charles

03/26/2021, 12:42 AM

Hi all, if my Kafka topic has 32 partitions, can we control the number of Pinot consuming threads ourselves?