# general
  • d

    Deepak Kumar Mishra

    03/16/2021, 4:07 AM
    Is an update query possible using Pinot?
  • r

    Ravikumar Maddi

    03/16/2021, 6:11 AM
    Is it correct? I have a column that contains a list of integers ("madIds": [1111, 2222, 3444]). For that, I am writing the following in the schema config file. Please correct me and confirm.
    {
            "name": "madIds",
            "datatype": "INT",
            "delimiter":",",
            "singleValueField":false
    },
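A minimal sketch of how a multi-value INT column is usually declared, assuming the key spelling used in the Pinot schema documentation (`dataType` with a capital T, and `singleValueField` set to false). The helper name `multi_value_int_dimension` is hypothetical, and whether `delimiter` is needed is not addressed here.

```python
import json


def multi_value_int_dimension(name: str) -> dict:
    """Build a dimension field spec for a multi-value INT column (hypothetical helper)."""
    return {
        "name": name,
        "dataType": "INT",          # key is spelled dataType, not datatype
        "singleValueField": False,  # False marks the column as multi-value
    }


# Place the field under dimensionFieldSpecs and print it as JSON.
schema_fragment = {"dimensionFieldSpecs": [multi_value_int_dimension("madIds")]}
print(json.dumps(schema_fragment, indent=2))
```

The printed fragment can be compared against the snippet above to spot the differing key.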
  • r

    Ravikumar Maddi

    03/16/2021, 7:43 AM
    @All - how do I write the schema for a date column? I have a column with a date: "startDate": "2021-01-04 000000". Need help 🙂
  • r

    Ravikumar Maddi

    03/16/2021, 7:50 AM
    @All - I added a table using the addTable Pinot command, but I have since changed the schema. How do I update the existing table that was already added? How do I update and delete a table here?
  • v

    Vibhor Jain

    03/16/2021, 8:39 AM
    Hi All, what is the generally preferred approach for retrofitting old data in Pinot? I see that MS Teams uses Pinot. Now if I send a message via Teams and later update it, how can such a use case be handled in Pinot, where updates are not supported? Suggestions welcome.
  • r

    Ravikumar Maddi

    03/16/2021, 2:59 PM
    Hi All, I have three date columns, so I wrote this:
    "dateTimeFieldSpecs": [
      {
        "name": "_source.startDate",
        "dataType": "STRING",
        "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
        "granularity": "1:DAYS"
      },
      {
        "name": "_source.lastUpdate",
        "dataType": "STRING",
        "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
        "granularity": "1:DAYS"
      },
      {
        "name": "_source.sDate",
        "dataType": "STRING",
        "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
        "granularity": "1:DAYS"
      }
    ]
    Can you please correct it? I am getting this error:
    {
      "code": 400,
      "error": "Cannot find valid fieldSpec for timeColumn: timestamp from the table config: eventflow_REALTIME, in the schema: eventflowstats"
    }
    Need your help 🙂
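The error message says the table config names a time column, timestamp, that has no field spec in the schema. A rough sketch of that consistency check follows, assuming the time column is set under `segmentsConfig.timeColumnName` in the table config. The helper `time_column_in_schema` is hypothetical and simplified; it is not Pinot's own validation code, and the table config shown is reconstructed from the error text only.

```python
def time_column_in_schema(table_config: dict, schema: dict) -> bool:
    """Return True if the table's time column has a dateTimeFieldSpec in the schema."""
    time_column = table_config["segmentsConfig"]["timeColumnName"]
    names = {spec["name"] for spec in schema.get("dateTimeFieldSpecs", [])}
    return time_column in names


# Schema column names taken from the message above.
schema = {
    "schemaName": "eventflowstats",
    "dateTimeFieldSpecs": [
        {"name": "_source.startDate"},
        {"name": "_source.lastUpdate"},
        {"name": "_source.sDate"},
    ],
}

# Assumption: the table config points at "timestamp", as the error text suggests.
table_config = {"tableName": "eventflow", "segmentsConfig": {"timeColumnName": "timestamp"}}

print(time_column_in_schema(table_config, schema))  # False
```

Under this reading, either the table config's time column or one of the schema's date-time column names would need to change so that the two agree.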
  • k

    Karin Wolok

    03/16/2021, 5:08 PM
    👋 Welcome all the new Pinot 🍷 community members! How did you find out about Pinot? What are you working on?
    👋 2
  • r

    Ron Kitay

    03/16/2021, 5:12 PM
    Hi, what data types does Pinot support out-of-the-box? I’m guessing String, numerics (integers and floating points), date and boolean - are there any others supported? For example - ip-address?
  • r

    Ron Kitay

    03/16/2021, 5:19 PM
    Is there any limitation on the size of a single record written into Pinot? Our average records are about 6KB when stored in AVRO, but can reach up to ~50K in edge cases
  • c

    Chad Preisler

    03/17/2021, 2:40 AM
    I need to transform an encrypted Kafka message before Pinot processes it. Right now for our stream apps we use a custom serde to do it. How can I do it in Pinot? Looks like it would be fairly easy to change Pinot to allow a deserializer to be plugged in. Thoughts?
  • c

    Chad Preisler

    03/17/2021, 2:46 AM
    Seems like Pinot is stuck on an older version of the JDK due to its use of off-heap memory APIs that no longer exist. The code does not compile on JDK 15. Also the “shade” plugin does not work on JDK 15. I read JDK 16 has some new methods for using off-heap memory. Is there a plan to move to a modern JDK? Is off-heap even necessary now that ZGC can handle 16TB of heap with little to no pause time?
  • t

    troywinter

    03/17/2021, 5:52 AM
    Is there a way to specify the group id for Kafka realtime ingestion? What should the ingestion config key be?
  • r

    Ronak

    03/17/2021, 4:18 PM
    I was exploring TEXT_MATCH functionality with pinot-0.7.0/0.6.0 and had configured one of the columns for it. Is there any configuration for the refresh time interval for the index? https://docs.pinot.apache.org/basics/indexing/text-search-support After enabling indexing (with index type: Text and encoding type: RAW) on the column and running TEXT_MATCH, I first got an empty result, but after some time I got the result. So, what is the initial delay before such a column becomes searchable? Are there any settings/configuration (e.g. number of docs, indexed size, etc.) for this?
  • b

    Brian Olsen

    03/17/2021, 5:00 PM
    Hey all 👋 Just jumping into this awesome tech called Pinot! I'm a developer advocate from the Trino project (formerly PrestoSQL). Tomorrow we're having an episode of the Trino Community Broadcast with @User and @User about the Pinot Connector. We're covering the benefits of Trino + Pinot and why you really need Pinot to speed up your common aggregation queries for predictable response times while also gaining the benefit of federated queries over your data lake or other data sources. We'll cover a bit of the specific limitations and current work going on in the Trino-Pinot connector, and finally I'll run a simple demo with the connector! Come watch me crash my Docker containers @11am EDT on https://www.twitch.tv/trinodb.
    👍 2
    🍷 4
    🥳 4
  • b

    Brian Olsen

    03/17/2021, 7:19 PM
    @User We'll be discussing this PR tomorrow and @User has a pretty neat solution coming in future versions of Trino. See you all tomorrow @11am EDT! 🐇🐇 https://www.twitch.tv/trinodb https://apache-pinot.slack.com/archives/CDRCA57FC/p1616005851043300?thread_ts=1616000429.041000&cid=CDRCA57FC
    🍷 1
  • r

    Ravikumar Maddi

    03/18/2021, 2:07 PM
    Hi All, one basic question: I ran the quick start stream and I understand all the ports and components behind it, but I am not able to understand port 2191. What is running on port 2191?
  • j

    Josh Highley

    03/18/2021, 4:58 PM
    Will upsert work with hybrid tables? Will a realtime record become active over an offline record having the same primary key value?
  • k

    Ken Krugler

    03/19/2021, 5:59 PM
    OK - but it’s in Maven Central 🙂 Should we avoid upgrading to that version?
  • a

    Aaron Wishnick

    03/19/2021, 6:33 PM
    Does Pinot's batch insert have any way to avoid inserting duplicate data? Say that every day I want to batch-insert the previous day of data, and I have multiple batches of data per day (say each batch of data corresponds to data from a different ice cream flavor). If I'm generating + batch inserting yesterday's data for each ice cream flavor in parallel, and the "strawberry" job fails, so I rerun it, how do I make sure I'm not batch-inserting "strawberry" data that was already inserted?
  • k

    Ken Krugler

    03/19/2021, 7:16 PM
    My ops guy is trying to validate JMX metrics, and he asked me how to trigger NUM_MISSING_SEGMENTS. Any suggestions?
  • o

    Oguzhan Mangir

    03/20/2021, 3:52 PM
    Does Pinot store min/max values for dimensions in segment metadata? Or does it just store min/max values for date time fields? And can we create inverted or other indices on a date time column?
  • o

    Oguzhan Mangir

    03/21/2021, 10:22 AM
    The first question: when we enable aggregateMetrics to pre-aggregate data as it is consumed, Pinot aggregates data based on the fields defined in dimensionFieldSpecs and dateTimeFieldSpecs. Can Pinot aggregate data based only on the fields defined in dimensionFieldSpecs while applying pre-aggregation using aggregateMetrics?
    The second question: we can set the time to generate segments for a real-time table using the realtime.segment.flush.threshold.time config. Let's assume the current time is 10:25. When I set realtime.segment.flush.threshold.time to 1 hour, Pinot creates a segment with start time 10:25 and closes it at 11:25. As a result, the start/end time of that segment is 10:25/11:25. But I want Pinot to close the segment when the new hour starts, so that the start/end time of the segment is 10:00/11:00. How can I achieve that?
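The arithmetic behind the second question can be sketched as follows. This only illustrates the difference between a window that starts when consumption starts and a window aligned to the clock hour; both helper names are hypothetical, the date is arbitrary, and nothing here is a Pinot setting or API.

```python
from datetime import datetime, timedelta


def flush_window(start: datetime, threshold: timedelta) -> tuple:
    """Window that opens at the consumption start and closes one threshold later."""
    return start, start + threshold


def hour_aligned_window(start: datetime) -> tuple:
    """Window snapped to the clock hour that contains the start time."""
    aligned = start.replace(minute=0, second=0, microsecond=0)
    return aligned, aligned + timedelta(hours=1)


start = datetime(2021, 3, 21, 10, 25)  # consumption starts at 10:25
observed = flush_window(start, timedelta(hours=1))
desired = hour_aligned_window(start)

print([t.strftime("%H:%M") for t in observed])  # ['10:25', '11:25']
print([t.strftime("%H:%M") for t in desired])   # ['10:00', '11:00']
```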
  • d

    Dan Hill

    03/22/2021, 3:14 AM
    Any design recommendations for Pinot setups that need to deal with data protection requirements of different locations, where certain personal data should remain within location boundaries (e.g. GDPR)? Do people try to set up global tables and use Server and Segment definitions to limit scope? Or do people create separate tables?
  • r

    Ravikumar Maddi

    03/22/2021, 6:58 AM
    Hi All, I have a question. If there is nested JSON (very large nested entities, at least 5 to 7 levels of embedded JSON entries), which is the better way to design the schema?
    1. Flatten the JSON: the schema becomes unscalable.
    2. Store embedded JSONs (the JSON indexing concept) and use JSON Evolution functions, but this is very time-consuming.
    I saw a technical session on nested indexing where they said that with one million records, a JSON evolution function might take 10 to 15 seconds to return a result. Could you please tell me which is the better way? How should I design the schema for nested JSON?
  • k

    Karin Wolok

    03/22/2021, 6:02 PM
    Welcome new 🍷 Pinot slack members!!! Curious who you are and how you found the Pinot community! Want to share what you're working on?
    🐇 2
    👋 4
    🍷 6
  • v

    virtualandy

    03/23/2021, 4:20 AM
    Hi! I’m Andy. I came across Pinot late last year (I think in a blog, but I have Neha’s https://youtu.be/mRkWT_EU99M as my earliest bookmark haha). I’m an engineering manager at Handshake where I help a team focused on building features with (you guessed it) data and analytics. We use a lot of Elastic, Postgres and BigQuery and I’m always personally looking to expand my 🧠 with projects like Pinot. Only just learning but excited to be part of this community.
    🍷 1
    👋 3
    👍 8
  • k

    Karin Wolok

    03/24/2021, 1:11 AM
    📣 If you're new to Pinot, 🍷 and interested to learn the basic fundamentals (Pinot 101), we invite you to join us this Thursday for 💡 Intro to Apache Pinot! 🧠 Presented by Apache Pinot committer, @User https://www.meetup.com/apache-pinot/events/275991991/
    🎉 3
  • c

    Charles

    03/25/2021, 10:10 AM
    Hi all, when Pinot is consuming from Kafka, how do I parse nested JSON such as { "data": { "name": "cc", "age": 3 } }? I just need "name" and "age" in the table.
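To make the intended mapping concrete, here is a small sketch in plain Python of pulling the two nested values out into a flat row. It only shows the desired input and output shapes; the helper `extract_name_and_age` is hypothetical, and how to express the same mapping in Pinot's ingestion config is not shown.

```python
import json


def extract_name_and_age(message: str) -> dict:
    """Flatten {"data": {"name": ..., "age": ...}} into a row with two columns."""
    record = json.loads(message)
    data = record["data"]  # the nested object from the example message
    return {"name": data["name"], "age": data["age"]}


kafka_message = '{ "data": { "name": "cc", "age": 3 } }'
print(extract_name_and_age(kafka_message))  # {'name': 'cc', 'age': 3}
```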
  • o

    Oguzhan Mangir

    03/25/2021, 10:46 AM
    Hi, are there any articles about Kubernetes production experience with Pinot? We want to learn things like optimal server count, number of segments per server, optimal resources for realtime and offline servers, etc. I've found a few articles, but I want to know if there are others.
  • c

    Charles

    03/26/2021, 12:42 AM
    Hi all, if my Kafka topic has 32 partitions, can we control the number of Pinot consuming threads ourselves?