# getting-started
  • s

    Saoirse Amarteifio

    09/11/2021, 1:58 PM
    Hello - I'm just getting set up and would like some advice. I have added Pinot via Helm on EKS and I want to (a) ingest Parquet from S3 and (b) stream data from MSK (Kafka 2.2.1 on the same VPC), probably requiring SSL.
    1. I added a simple schema and table spec - they look OK.
    2. I (think) I configured deep storage for S3.
    3. Batch ingestion - really I am interested in any recommended (standalone) way to ingest data from S3 (watching folders on some interval), and I'm not sure if the docs provide what I need. Is there an example of posting directly to the controller?
    4. I would like to try with Kafka too; in this case I'm just wondering about the configs (in thread). I am feeling my way through, so if anyone has a sample config for this MSK setup that would be nice to see, but I expect I'll get there with some trial and error.
       a. I was a little put off by the mention of needing to update the pom for Kafka version 2.2.1, and I was not really sure if that was indeed needed or how I would do that via Helm.
  • s

    Slackbot

    09/14/2021, 8:34 AM
    This message was deleted.
  • d

    Dan DC

    09/14/2021, 3:41 PM
    Hey, is there a way to tag servers upon deployment/startup? I.e. via config files instead of using the API?
  • r

    RZ

    09/16/2021, 10:44 AM
    Do you have any idea?
  • t

    Tiger Zhao

    09/16/2021, 6:22 PM
    Any tips for debugging slow queries? I was stress testing my cluster, and noticed a behavior where when I send a bunch of queries at once, the query latency goes from ~100ms to 4-5 seconds. The latency then stays relatively high for a few minutes after the stress test and then returns back to ~100ms. I also noticed behavior where sometimes a single server would take significantly longer to process a query, which ends up increasing the overall latency by a lot. That one slow server also stays consistently slow for a while, so every query is bottlenecked by that server. Thanks!
  • x

    xtrntr

    09/21/2021, 6:05 AM
    hello, I'd like to clarify the usage of dimension tables - can I use the columns that are in dimTable but not in factTable to filter in the WHERE clause? https://docs.google.com/document/d/1InWmxbRqwcqIakzvoEWHLxtX4XR9H5L01256EbAUHV8/edit#
    Table factTable:
    string    uuid
    int       metric
    timestamp event_time
    string    status
    Table dimTable:
    string uuid
    string name 
    string country
    SELECT
      f.uuid,
      d.name,
      d.country,
      abs(sum(f.metric)) as sum_metric
    FROM
      factTable f join dimTable d on f.uuid = d.uuid
    WHERE 
      d.country in ('USA')
    GROUP BY
      1,
      2,
      3
    ORDER BY
      2
  • a

    arun muralidharan

    09/21/2021, 3:47 PM
    Hello folks, can someone point me to a document about how segments are read from both local storage and deep storage? Can the cluster automatically recover from deep storage when the local segment store is cleared? I basically want to know what the read/write path is in the presence and absence of deep storage.
  • t

    Tiger Zhao

    09/22/2021, 9:14 PM
    Does Pinot support features like the WITH clause, or views?
  • t

    Tiger Zhao

    10/01/2021, 8:20 PM
    Is there a way to view the number of nodes that are generated for a star tree? (I'm exploring various indexing configs and was wondering how different setups affect storage and performance.)
  • d

    Dan DC

    10/07/2021, 2:50 PM
    Hey, I've seen somewhere that Pinot has some special columns with metadata about the row segment path and other stuff. I can't seem to find that anywhere and I wonder if someone could kindly point me to where they are documented.
  • n

    Neha Pawar

    10/08/2021, 8:22 PM
    It has to be a new name; you cannot transform a column and write the result back under the same name.
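A minimal sketch of checking a table config for this, assuming the transforms live under `ingestionConfig.transformConfigs` with `columnName` and `transformFunction` keys. The helper name `same_name_transforms` is hypothetical, and the regex is only a rough approximation of how a transform expression is parsed, not Pinot's own logic:

```python
import re


def same_name_transforms(table_config):
    """Return output column names that also appear as inputs of their own transform."""
    transforms = table_config.get("ingestionConfig", {}).get("transformConfigs", [])
    offenders = []
    for transform in transforms:
        # Drop quoted string literals so their contents are not mistaken for columns.
        expression = re.sub(r"'[^']*'", "", transform["transformFunction"])
        # Assumption: identifiers not followed by "(" are input column names.
        inputs = set(re.findall(r"\b[A-Za-z_]\w*\b(?!\s*\()", expression))
        if transform["columnName"] in inputs:
            offenders.append(transform["columnName"])
    return offenders


# Writes back into the source column "ts": flagged.
bad_config = {"ingestionConfig": {"transformConfigs": [
    {"columnName": "ts", "transformFunction": "toEpochDays(ts)"}]}}

# Writes into a new column "tsDays": accepted.
good_config = {"ingestionConfig": {"transformConfigs": [
    {"columnName": "tsDays", "transformFunction": "toEpochDays(ts)"}]}}

print(same_name_transforms(bad_config))
print(same_name_transforms(good_config))
```

Running a check like this before posting the table config would catch the same-name case early instead of at ingestion time.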
  • s

    Saoirse Amarteifio

    10/11/2021, 5:12 PM
    I'm running my first batch ingestion job from S3 Parquet files - the task was kicked off and the 8 rows of the input sample are read, but then it fails and I'm not sure what the error message is telling me ... what is the illegal argument in this context? I did not get any closer by looking at the source for Segment Name Generator...
    RecordReader initialized will read a total of 8 records.
    at row 0. reading next block
    block read in memory in 1 ms. row count = 8
    Start building IndexCreator!
    Finished records indexing in IndexCreator!
    Failed to generate Pinot segment for file - <s3://bucket/samples/data/myData/test.parquet>
    java.lang.IllegalArgumentException: null
            at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-11f8550b9b2881ede4d105416ed970a5dd708463]
            at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:53) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-11f8550b9b2881ede4d105416ed970a5dd708463]
    Can anyone suggest, based on this error message, what illegal thing I am doing? Adding jobSpec in thread...
  • s

    Saoirse Amarteifio

    10/11/2021, 8:43 PM
    When I query Presto and there is a column named with a reserved keyword like timestamp, I cannot seem to submit a query that includes "timestamp", even though the Presto spec suggests that it can be escaped with double quotes. It might be specific to the clients I am using; I have tried a freshly downloaded presto-cli and a Python client, and both result in a PQLParsingError. What should I do in this situation? (This is testing the presto-pinot connector, so maybe not a Pinot question for this channel.)
  • s

    Sharon Akinyi

    10/13/2021, 8:18 AM
    Hello, I am new to Apache Pinot. I am trying to learn more about Pinot operators. Could anyone help me understand how they work and how to go about using them?
  • c

    Courage Noko

    10/13/2021, 7:56 PM
    hey, I deployed Pinot on Kubernetes. Is there a way to set Google Cloud Storage configs such as pinot.controller.storage.factory.gs.projectId on the server/controller during deployment, or to update these?
  • n

    Neha Pawar

    11/02/2021, 6:22 PM
    if you don't have a schema registry, you need to provide the schema as a config in the stream config @User
  • p

    Priyank Bagrecha

    11/02/2021, 6:25 PM
    one more question - do I need to keep ports 8098 and 8099 open on server and broker nodes? I am setting everything up manually right now.
  • p

    Priyank Bagrecha

    11/02/2021, 10:34 PM
    I finally got it working. Thanks a ton for all the help. Had to wrangle with the schema json a bit but finally victory!
  • t

    tyler dobbs

    11/03/2021, 4:54 AM
    Been trying to just start Pinot locally in a docker container. I'm using Pinot version 0.8.0 and openjdk:11. I'm on a mac. I'm trying to start the cluster by using the pinot admin commands StartZookeeper, StartController, StartBroker and StartServer as shown in the getting started. However, inevitably the controller will go down before I can start the Broker and the Server, with this error:
    Expiring session 0x100080c84b20005, timeout of 30000ms exceeded
    Is there a way to avoid this?
  • p

    Priyank Bagrecha

    11/09/2021, 12:56 AM
    Can someone please point me to documentation for the enableDefaultStarTree and enableDynamicStarTreeCreation fields in the table config? I want to understand what a default / dynamic star-tree index means.
  • p

    Priyank Bagrecha

    11/09/2021, 9:56 PM
    Oh no theta sketch either?
  • p

    Priyank Bagrecha

    11/11/2021, 9:11 PM
    The link for "Optimizing Scatter and Gather" is broken on https://docs.pinot.apache.org/operators/operating-pinot/tuning
  • n

    Neha Pawar

    11/15/2021, 3:55 PM
    You don't need the group id or any of the properties that say "hlc". Your tables might be out of sync because you've set the offset criteria to "largest". Each table will start consuming from the last message in the topic, so if your rate of events is high, the second table will miss events that were emitted between the creation of the first and second tables.
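A toy illustration of why two tables created at different times can drift apart. It assumes an in-memory list standing in for the topic; the function `consume` and its parameters are hypothetical and are not Pinot or Kafka APIs:

```python
def consume(topic, created_at, offset_criteria):
    """Return the payloads a table sees, given when it was created.

    topic: list of (publish_time, payload) tuples in offset order.
    created_at: time at which the table starts consuming.
    offset_criteria: "largest" skips everything published before creation,
                     "smallest" starts from the beginning of the topic.
    """
    if offset_criteria == "smallest":
        return [payload for _, payload in topic]
    if offset_criteria == "largest":
        return [payload for publish_time, payload in topic if publish_time >= created_at]
    raise ValueError(f"unknown offset criteria: {offset_criteria}")


# Ten events published at times 0..9.
topic = [(t, f"event-{t}") for t in range(10)]

# The first table is created at time 3, the second at time 6.
table_a = consume(topic, created_at=3, offset_criteria="largest")
table_b = consume(topic, created_at=6, offset_criteria="largest")

# Events emitted between the two creation times reach only the first table.
missed_by_b = sorted(set(table_a) - set(table_b))
print(missed_by_b)

# Starting both tables from the beginning of the topic keeps them in sync.
in_sync = consume(topic, 3, "smallest") == consume(topic, 6, "smallest")
print(in_sync)
```

In this sketch the gap is exactly the set of events published between the two creation times, which is the window described above.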
  • p

    Priyank Bagrecha

    11/15/2021, 9:40 PM
    The link for "Transform Function in Aggregation Grouping" is broken on https://docs.pinot.apache.org/users/user-guide-query/querying-pinot#udf. I am guessing it should be pointing to https://docs.pinot.apache.org/users/user-guide-query/supported-transformations.
  • x

    xtrntr

    11/16/2021, 6:09 AM
    Will using IdSet with a “NOT IN” clause have any unintended performance impact? e.g.
    select * from table where userid not in IDSET(...)
  • p

    Priyank Bagrecha

    11/18/2021, 9:02 AM
    both https://downloads.apache.org/pinot/apache-pinot-0.8.0/apache-pinot-0.8.0-bin.tar.gz and https://downloads.apache.org/pinot/apache-pinot-incubating-0.7.1/apache-pinot-incubating-0.7.1-bin.tar.gz are returning 404
  • p

    Priyank Bagrecha

    11/19/2021, 7:25 AM
    I am noticing that the disk on a controller instance starts filling up pretty fast. What can I do to slow it down?
  • d

    Diana Arnos

    11/19/2021, 2:10 PM
    Hello there 👋 I'm developing something that uses Pinot, consuming straight from a new Kafka topic. I was able to run everything I need and it is beautiful (thanks for the work on this project 💪 ) Now I'm trying to improve some things on my project and wondered if there is a way to use a schema registry instead of leaving the table schema inside the project itself. What I would like to happen: I have a JSON schema related to the topic Pinot will consume from and instead of manually editing/creating the table schema (as explained here in the docs), I would like for Pinot to read the JSON schema from my registry and automagically use it when ingesting. I'm not sure if the configs stream.kafka.decoder.prop.schema.registry.rest.url and stream.kafka.decoder.prop.schema.registry.schema.name could help me achieve this.
  • p

    Pavel Stejskal

    11/29/2021, 7:29 PM
    Hello! I've got a question related to a simple use case. Currently we have a Hadoop cluster for netflow ingestion, ~320 TB of data. Ingestion is from Kafka via a Spark app directly to Hive (external table - simple Parquet files). Searching the stored data is done via Spark. The table is partitioned by hour, but we're still missing indexes. I'd like to replace the current flow with Apache Pinot, but I'm not sure about the segment store. We need to keep HDFS as the data backend, and from the documentation it seems like Pinot needs to store data locally. We're targeting a hybrid table, e.g. keep 1 hour from the real-time Kafka topics and have older data pulled from HDFS. My questions are:
    a) The real-time part of the data needs local disks - every Pinot server holds a part of the data from Kafka (consumer in group), right?
    b) Data older than one hour is stored "optimized" and indexed locally, and pushed to HDFS?
    c) When I query data, current data is pulled from local segments and older data is pulled lazily from HDFS/S3?
    d) Is it possible to host a 200 TB table with ~12 columns (half numeric, half strings) on about 6 Pinot servers and get some benefit from indexes, i.e. just be more efficient than Spark with partition pruning?
  • l

    Luis Fernandez

    12/01/2021, 7:27 PM
    Has anyone tried moving the segments folder to BigQuery (if you use Google), to do big data computations that may not be possible in Pinot?