# general
  • t

    Tiger Zhao

    12/08/2021, 5:54 PM
    Is there a way to set up the access configurations to easily limit tables to certain users? It looks like right now we can only limit users to certain tables?
    m
    • 2
    • 5
  • s

    Suraj

    12/08/2021, 9:06 PM
    Hello - we are exploring storing metrics at a coarser granularity by rolling up data collected at a finer granularity. Ex: 1s metrics rolled up and stored at 1 min granularity. Does Pinot support percentile aggregations?
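A minimal sketch of why the percentile question matters for rollups, in plain Python rather than Pinot. The sample values, the `percentile` helper, and the choice of averaging as the rollup are all assumptions made for illustration; the thread does not say how the metrics would be rolled up.

```python
def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    # Integer ceiling division avoids floating-point surprises in the rank.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

# Hypothetical 1s samples for two one-minute buckets (shortened to 5 samples each).
minute_1 = [10, 10, 10, 10, 100]
minute_2 = [10, 10, 10, 10, 10]

# Percentile computed over the raw 1s samples.
raw_p95 = percentile(minute_1 + minute_2, 95)

# Percentile computed over per-minute averages (one possible rollup).
avg_rollup = [sum(minute_1) / len(minute_1), sum(minute_2) / len(minute_2)]
p95_of_averages = percentile(avg_rollup, 95)

print(raw_p95, p95_of_averages)
```

In this toy data the raw p95 is 100 while the p95 of the per-minute averages is 28.0, so an averaged rollup would hide the spike.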
    m
    • 2
    • 9
  • n

    Nicholas Yu

    12/09/2021, 4:20 AM
    hello friends, i’m looking for information around running spark batch ingestion jobs using AWS EMR. thanks
    👍 1
    m
    • 2
    • 1
  • t

    Ty Brooks

    12/09/2021, 11:08 PM
    In the docs, there are references to the “Filesystem backend” and “Deep Storage”… are those meant to be conceptually synonymous?
    m
    • 2
    • 1
  • m

    Map

    12/10/2021, 2:37 PM
    Is there a way or an API to get the latest offset consumed for a real-time table/segment?
    n
    m
    • 3
    • 7
  • l

    Lars-Kristian Svenøy

    12/10/2021, 5:02 PM
    Hey guys. Regarding https://nvd.nist.gov/vuln/detail/CVE-2021-44228 (The Log4j vulnerability) when can we expect a release of Pinot to mitigate that? I see you just recently merged a PR to deal with it: https://github.com/apache/pinot/pull/7889
    m
    p
    +2
    • 5
    • 11
  • d

    Diogo Baeder

    12/11/2021, 6:02 PM
    Hi folks, I've got a question about publishing events for Pinot realtime tables. I have this situation where I have tons of analytics logs backed up, and I want to send all that to Pinot, and also start sending logs in realtime. I'm setting my table's segment threshold time to 24h and size to 200M, however it's not clear how I can set up the tables so that I get a cleaner "1 day of data per segment" kind of deal. Should I perhaps use hybrid tables, where I would publish the old logs to the offline table, and live logs to the realtime table? What do you recommend I do in this case, where the logs from my backups would be uploaded out of order?
    m
    k
    • 3
    • 5
  • d

    Diogo Baeder

    12/12/2021, 2:34 PM
    One more question, folks: when it comes to segments of ~200M in size, what segment storage technology would you recommend using when running a cluster in AWS? HDFS? S3? EFS mounted?
    k
    m
    +2
    • 5
    • 16
  • a

    Ashish

    12/12/2021, 9:07 PM
    Is there any way to extract more than one field from a json column? jsonextractscalar only allows one field at a time. So, if I do select jsonextractscalar(jsonColumn, 'field1'), jsonextractscalar(jsonColumn, 'field2'), will it result in parsing the json document twice for each doc/row?
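The behavior being asked for (parse once, read several fields) can be sketched in plain Python. This is not Pinot's jsonextractscalar; `extract_fields` is a hypothetical helper and the sample row is made up.

```python
import json

def extract_fields(json_text, field_names):
    """Parse the document once and return the requested top-level fields in order."""
    doc = json.loads(json_text)  # single parse per row
    return [doc.get(name) for name in field_names]

# Hypothetical row value stored in a JSON column.
row = '{"field1": "a", "field2": 42, "field3": true}'
values = extract_fields(row, ["field1", "field2"])
print(values)
```

Calling a single-field extractor twice would run the `json.loads` step twice per row; whether Pinot caches the parsed document between calls is not answered in this message.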
    k
    • 2
    • 2
  • a

    Ashish

    12/13/2021, 12:52 AM
    There does not seem to be a way to exclude properties in the json path expression used by jsonextractscalar. I guess the only way is to write my own jsonextractscalar variant that calls json parser.delete(propertiesToDelete).read(propertiesToFetch). Is my understanding right? Any other suggestions?
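A rough sketch of the "delete, then read" idea, using only the Python standard library. `read_excluding` is a hypothetical helper, it handles top-level properties only, and it is not the parser API named in the message.

```python
import json

def read_excluding(json_text, properties_to_delete):
    """Parse once, drop the unwanted top-level properties, return the rest."""
    excluded = set(properties_to_delete)
    doc = json.loads(json_text)
    return {key: value for key, value in doc.items() if key not in excluded}

# Hypothetical document with one property to exclude.
remaining = read_excluding('{"a": 1, "b": 2, "secret": "x"}', ["secret"])
print(remaining)
```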
    k
    • 2
    • 2
  • x

    Xiang Fu

    12/13/2021, 9:53 PM
    @here
    Hello Community,
    
    We are pleased to announce that Apache Pinot 0.9.1 is released!
    
    Apache Pinot is a realtime distributed OLAP datastore, designed to answer OLAP queries with low latency use-cases.
    
    This is a bug fix release that includes the upgrade to the latest log4j library, v2.15.0. This is our response to CVE-2021-44228.
    
    The release can be downloaded at https://pinot.apache.org/download

    The release note is available at https://docs.pinot.apache.org/basics/releases/0.9.1

    Additional resources -
    Project website: https://pinot.apache.org
    Getting started: https://docs.pinot.apache.org/getting-started
    Pinot developer blogs: https://medium.com/apache-pinot-developer-blog
    Intro to Pinot Video: https://www.youtube.com/watch?v=T70jTTYhYyM

    Join Pinot Community -
    Twitter: https://twitter.com/ApachePinot
    Meetup: https://www.meetup.com/apache-pinot/
    Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
    
    Best Regards,
    
    Apache Pinot Team
    ❤️ 5
    👍 22
    m
    • 2
    • 4
  • w

    Weixiang Sun

    12/14/2021, 5:29 PM
    We are working on offline segment ingestion. Currently we are using TarPush, but the problem is that the controller needs to get involved in the data path by downloading the segment. Just curious, how does metadata push keep the controller out of the data path?
    k
    e
    • 3
    • 13
  • c

    Chris Theodore Jayakumar

    12/14/2021, 11:30 PM
    Hello folks, what are the recommended system specs for each of the services required for a Pinot cluster? Is there a formula to calculate this based on the size of the data?
    🍷 1
    m
    • 2
    • 4
  • x

    Xiang Fu

    12/15/2021, 8:32 AM
    Hello @here,

    We are pleased to announce that Apache Pinot 0.9.2 is released!

    Apache Pinot is a realtime distributed OLAP datastore, designed to answer OLAP queries with low latency use-cases.

    This is a bug fix release that contains:
    - Upgrade log4j to 2.16.0 to fix CVE-2021-45046 (#7903)
    - Upgrade swagger-ui to 3.23.11 to fix CVE-2019-17495 (#7902)
    - Fix the bug that RealtimeToOfflineTask failed to progress with large time bucket gaps (#7814)

    The release can be downloaded at https://pinot.apache.org/download

    The release note is available at https://docs.pinot.apache.org/basics/releases/0.9.2

    Additional resources -
    Project website: https://pinot.apache.org
    Getting started: https://docs.pinot.apache.org/getting-started
    Pinot developer blogs: https://medium.com/apache-pinot-developer-blog
    Intro to Pinot Video: https://www.youtube.com/watch?v=T70jTTYhYyM

    Join Pinot Community -
    Twitter: https://twitter.com/ApachePinot
    Meetup: https://www.meetup.com/apache-pinot/
    Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot

    Best Regards,

    Apache Pinot Team
    🙌 17
    s
    • 2
    • 1
  • m

    Map

    12/16/2021, 5:05 PM
    Hi, what would be the easiest way to clean up all the Pinot configs for a cluster in Zookeeper?
    s
    • 2
    • 4
  • j

    Jeff Moszuti

    12/16/2021, 8:21 PM
    I'd like to try out tag-based instance assignment. Which file do I need to edit to set the TAG_LIST for a server?
    n
    j
    • 3
    • 6
  • a

    Ashish

    12/16/2021, 10:03 PM
    PQL support is being deprecated, but is the PQL result format going to be supported for SQL queries? The PQL format seems to be more efficient for group by/aggregate queries.
    m
    • 2
    • 3
  • w

    Weixiang Sun

    12/18/2021, 1:30 AM
    What is the difference between dimensionFieldSpecs and metricFieldSpecs? When should we use them?
    m
    • 2
    • 3
  • p

    Prashant Pandey

    12/20/2021, 6:37 AM
    We are planning to migrate Pinot to a new kafka cluster. Our plan is to point Pinot to the new endpoints, update segment.realtime.startOffset of each CONSUMING segment to 0, and restart the servers. Do we need to take care of anything else?
    m
    • 2
    • 4
  • s

    Slackbot

    12/21/2021, 3:19 PM
    This message was deleted.
    m
    k
    • 3
    • 2
  • e

    Evan Galpin

    12/21/2021, 10:15 PM
    nvm, I think I found my answer in code [1]: Yes, all values in the MV column (array) are taken into account. It would be interesting to be able to filter at that level as well. Ex. an equality check to count only elements in the column equal to an input value:
        COUNTMATCHMV(my_column, "foo")
    where a string MV column containing:
        ["foo", "bar", "foo", "baz"]
    might return 2. Thoughts on the feasibility?
    [1] https://github.com/apache/pinot/blob/f8c7e1fc8603f4091e418f3841dcb6bc2d75d5d8/pino[…]core/query/aggregation/function/CountMVAggregationFunction.java
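The semantics proposed above can be sketched in a few lines of Python. COUNTMATCHMV is only the function name suggested in this message, not an existing Pinot function, and `count_match_mv` is a hypothetical stand-in that treats each row's multi-value cell as a list.

```python
def count_match_mv(rows, target):
    """Count elements equal to `target` across every multi-value cell."""
    return sum(1 for values in rows for value in values if value == target)

# One row whose MV column holds the array from the message.
rows = [["foo", "bar", "foo", "baz"]]
count = count_match_mv(rows, "foo")
print(count)
```

For the single row above this yields 2, matching the result suggested in the message.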
    m
    j
    • 3
    • 5
  • a

    Anshu Jalan

    12/22/2021, 5:10 AM
    In rollup, it's mentioned in the doc as (perform metrics aggregations across common dimensions + time), so will it treat all dimension and time columns as the primary key to aggregate the metrics? Also, in dedup, what is meant by duplicate rows (which columns are used)?
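One reading of the quoted doc sentence, sketched in plain Python: every dimension column plus the time column forms the grouping key, and metrics are aggregated per key. This is an illustration of that interpretation only. The thread does not confirm it, SUM is an assumed aggregation, and the `rollup` helper and column names are hypothetical.

```python
from collections import defaultdict

def rollup(rows, dimension_cols, time_col, metric_cols):
    """Sum metric columns for rows that share all dimension values and the time value."""
    key_cols = list(dimension_cols) + [time_col]
    totals = defaultdict(lambda: [0] * len(metric_cols))
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        for i, metric in enumerate(metric_cols):
            totals[key][i] += row[metric]
    return [
        dict(zip(key_cols + list(metric_cols), key + tuple(sums)))
        for key, sums in totals.items()
    ]

# Hypothetical rows: the first two share (country, ts) and collapse into one.
rows = [
    {"country": "US", "ts": 1, "clicks": 2},
    {"country": "US", "ts": 1, "clicks": 3},
    {"country": "IN", "ts": 1, "clicks": 1},
]
rolled = rollup(rows, ["country"], "ts", ["clicks"])
print(rolled)
```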
    n
    • 2
    • 1
  • s

    Sunil Chaurasia

    12/22/2021, 7:36 AM
    Hey guys, I am Sunil. My organisation is planning to use Pinot for some of our use cases, and we are currently in a sort of POC phase. I would like to get some information about benchmarking, if anyone in this group has done it. Also, I would like to know your opinion on using a managed service vs. self-managing. It would be really great if anyone could help me with this.
    a
    m
    k
    • 4
    • 8
  • p

    Prashant Pandey

    12/22/2021, 12:13 PM
    Hi folks, we wanted to change our table names (from camel case to snake case) in Pinot. For this, we supplied the existing table configs to the create table API with the changed table name (all other configs remained unchanged), and disabled the old tables. But we observed that the new tables contained quite old data (that wasn't present in kafka). For example, our kafka retention is 2h, but the new table still contained data as old as 6h! Is there some sort of data migration happening from old segments to new segments?
    m
    • 2
    • 4
  • a

    Anshu Jalan

    12/24/2021, 9:19 AM
    As per the design doc: UpsertConfig can also include customMergeStrategies if Groovy mergers is enabled.
    {
       "upsertConfig":{
          "mode":"PARTIAL",
          "globalUpsertStategy": "OVERWRITE",
          "customMergeStrategies":{
             "field3":"Groovy({firstName+' '+lastName}, firstName, lastName)"
          }
       }
    }
    So are these customMergeStrategies executed before or after transformConfigs?
    k
    y
    • 3
    • 3
  • p

    Priyank Bagrecha

    12/30/2021, 1:19 AM
    wget https://downloads.apache.org/pinot/apache-pinot-0.9.3/apache-pinot-0.9.3-bin.tar.gz
    seems to be timing out. Tried locally as well as from AWS EC2 instances.
    x
    • 2
    • 2
  • v

    Vinod Adwani

    12/30/2021, 8:41 AM
    Hi folks! I am facing some issues in Kafka stream ingestion in pinot. Pinot is able to connect to Kafka but not able to consume any records or create segments. Can someone please help me?
    Attachments: sample_kafka_message.txt, table_schema.json, table_config.json
    k
    m
    • 3
    • 7
  • s

    Syed Akram

    12/30/2021, 9:00 AM
    Hi folks, when can we expect a Pinot release with log4j 2.17.1? @User https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44832
    k
    x
    • 3
    • 3
  • a

    Abhishek Kedia

    01/03/2022, 10:36 AM
    Hi everyone, my team is facing an error reading data from Confluent Kafka into Pinot. Does anyone here have experience with Confluent Kafka -> Pinot? Would appreciate any help here.
    m
    l
    • 3
    • 4
  • x

    xtrntr

    01/04/2022, 7:45 AM
    2nd question: if i plan to use lookup table for joins, i can only use it for decorating the query results - if i use lookup joins in the WHERE clause, queries will be very slow because the lookup join cannot benefit from indexing. is my understanding correct here?
    k
    • 2
    • 23