# getting-started
  • Kishore G

    07/01/2021, 7:53 PM
    added instructions to quickstart pinot from IDE as well
    👍 1
  • Bruce Ritchie

    07/13/2021, 4:26 PM
    Question on cluster and node sizing. I have 30B rows with ~90 columns (5+ TB of Parquet files) to ingest into Pinot. QPS once ingested is likely < 10/sec. Is there a document outlining sizing recommendations for various node types?
  • Matt Landers

    08/30/2021, 4:10 PM
    set the channel topic: New to Pinot? Start here: https://www.youtube.com/playlist?list=PLihIrF0tCXdeimVCZwuejXb7FkjsyN9_k
    👍 3
  • Luis Fernandez

    09/01/2021, 5:01 PM
    hey friends, I need to compute stats for ads (impressions, click_count, click_spent, etc.) in my current project. My client has many dimensions they may want to slice by (locale, user_id, search query, device, etc.). We currently track all of this data through Kafka, and I was thinking about using Pinot to make it queryable. The user-facing dashboard looks at this data by set time ranges and also custom time ranges. I was wondering if Pinot is a good candidate for this problem. Right now I'm working on a POC with Pinot, so I would appreciate any insights 🙂 thank you!
  • xtrntr

    09/05/2021, 6:27 PM
    Do dimension tables support upsert? I plan to update the dimension table on a daily basis.
  • Kishore G

    09/05/2021, 6:31 PM
    If it’s small enough, use refresh and update the entire table
    👍 1
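    For reference, a refresh-style dimension table is just an OFFLINE table whose segments are replaced wholesale on each push. A minimal sketch of such a table config (the table name is hypothetical; isDimTable applies if the table is used as a lookup dimension table):
    {
      "tableName": "myDimTable",
      "tableType": "OFFLINE",
      "isDimTable": true,
      "segmentsConfig": {
        "schemaName": "myDimTable",
        "segmentPushType": "REFRESH",
        "replication": "1"
      },
      "tenants": {},
      "tableIndexConfig": {},
      "metadata": {}
    }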
  • xtrntr

    09/07/2021, 10:31 PM
    if I wish to use the native Java client but can only have my broker/controller exposed outside of the cluster, is my only option to use
    ConnectionFactory.fromHostList(brokerUrl)
    ? I'm not that familiar with ZK, and I don't see a way in the API to retrieve broker addresses from the zookeeper category of APIs exposed by the controller https://docs.pinot.apache.org/users/clients/java
  • Xiang Fu

    09/08/2021, 1:46 AM
    you need to expose the brokers externally, then use the broker list to query
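    A minimal sketch of that approach with the Java client (broker hostnames and the table name are placeholders); fromHostList talks to the listed brokers directly, so the client needs no Zookeeper access:
    import org.apache.pinot.client.Connection;
    import org.apache.pinot.client.ConnectionFactory;
    import org.apache.pinot.client.ResultSetGroup;

    // Connect straight to the externally exposed brokers, bypassing Zookeeper discovery
    Connection connection = ConnectionFactory.fromHostList("broker-1.example.com:8099", "broker-2.example.com:8099");
    ResultSetGroup resultSetGroup = connection.execute("SELECT COUNT(*) FROM myTable");
    // First result set, first row, first column
    System.out.println(resultSetGroup.getResultSet(0).getString(0, 0));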
  • RZ

    09/16/2021, 10:44 AM
    Hello friends, I want to test the ThirdEye solution for Pinot anomaly detection, so I followed the documentation https://docs.pinot.apache.org/integrations/thirdeye, but I failed to connect to http://localhost:1426/
  • arun muralidharan

    09/21/2021, 3:47 PM
    Thanks in advance.
  • Kamal Chavda

    10/08/2021, 8:11 PM
    When using ingestionConfig > transformConfigs, does the transformFunction HAVE to write to a new column? I would like to transform an existing column from the source and keep the same column name in the Pinot table instead of creating a new column.
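    For context, a transformConfigs sketch (the field names are hypothetical): columnName is the destination column in the Pinot schema, and the transformFunction reads from the source field, which is why the documented examples use a destination name that differs from the source field it reads:
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "event_time",
          "transformFunction": "fromDateTime(event_time_str, 'yyyy-MM-dd HH:mm:ss')"
        }
      ]
    }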
  • Priyank Bagrecha

    11/02/2021, 6:09 AM
    hello, I am just getting started. I am trying to consume Avro records from a 2.x Kafka stream which doesn't use a schema registry. Does this look correct?
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.KafkaAvroMessageDecoder",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory"
    The table status says bad in the cluster manager, and I am trying to figure out what I am missing. Looking at the code on GitHub, it seems like I need to provide a schema for parsing; however, there is a comment saying not to use schema as it will be dropped in a future release. Any pointers will be greatly appreciated. Thanks in advance!
  • Priyank Bagrecha

    11/02/2021, 6:18 AM
    should I use SimpleAvroMessageDecoder? Even that one has the same comment:
    Do not use schema in the implementation, as schema will be removed from the params
  • Priyank Bagrecha

    11/02/2021, 6:34 AM
    I am using version 0.7.1 with Java 8
  • Niteesh Hegde

    11/02/2021, 10:35 AM
    Hi, I am new to Pinot. Can I ingest data into Pinot from Postgres logs?
  • Priyank Bagrecha

    11/02/2021, 6:07 PM
    this one didn't work either. :(
  • Neha Pawar

    11/02/2021, 6:24 PM
    "stream.kafka.decoder.prop.schema" : "<your avro schema here>"
  • Priyank Bagrecha

    11/02/2021, 6:24 PM
    got it. thanks!
  • Priyank Bagrecha

    11/09/2021, 1:14 AM
    Also, what happens when I update star-tree index configs, e.g. adding a new dimension to dimensionsSplitOrder or removing one - what happens to the index and the segments? Same question for functionColumnPairs. I am thinking of treating an edit as adding a new config and dropping the old one.
  • Priyank Bagrecha

    11/09/2021, 7:27 PM
    Does the query console only show limited results for a query? I am wondering why I am seeing only some rows in the results of a query like
    SELECT col1, col2, col3, DISTINCTCOUNT(col4) AS distinct_col4
    FROM   table
    GROUP  BY col1, col2, col3
    The star-tree index config looks like
    "starTreeIndexConfigs": [
          {
            "dimensionsSplitOrder": [
              "col1",
              "col2",
              "col3"
            ],
            "skipStarNodeCreationForDimensions": [],
            "functionColumnPairs": [
              "DISTINCTCOUNT__col4"
            ],
            "maxLeafRecords": 1
          }
        ],
    Can I also add DistinctCountHLL__col4 and DistinctCountThetaSketch__col4 to functionColumnPairs and evaluate the performance of all 3 for this query?
  • Jackie

    11/09/2021, 9:05 PM
    Star-tree only supports distinctcounthll because its intermediate result size is bounded
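    In other words, of the three pairs listed above, only the HLL one can be pre-aggregated in the star-tree. A sketch of the earlier config with the bounded-size pair swapped in (an illustration based on Jackie's note, not a verified config):
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["col1", "col2", "col3"],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [
          "DISTINCTCOUNTHLL__col4"
        ],
        "maxLeafRecords": 1
      }
    ],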
  • Jackie

    11/09/2021, 9:05 PM
    You need to add a limit to the query, or it defaults to 10
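    For example, the query above with an explicit limit:
    SELECT col1, col2, col3, DISTINCTCOUNT(col4) AS distinct_col4
    FROM   table
    GROUP  BY col1, col2, col3
    LIMIT  1000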
  • Priyank Bagrecha

    11/09/2021, 9:56 PM
    And thank you Jackie!
  • Priyank Bagrecha

    11/15/2021, 9:46 AM
    hello. I started two Pinot clusters, both consuming from the same Kafka cluster and the same topic. They are basically two Pinot tables whose only difference is that the first uses an inverted index on the same set of fields that the second uses for a star-tree index. I created the tables at the same time, so I assume both started consuming from the Kafka topic at the same time. When I issue the same query to both tables one after another, I see that totalDocs is 2x-3x higher for the table with the inverted index compared to the table with the star-tree index. If it matters, I started querying the tables ~5-10 mins after creating them. I also confirmed this by running
    select count(*) from <table_name>
    Is this expected?
  • Priyank Bagrecha

    11/15/2021, 10:11 AM
    I noticed that group.id = (basically empty) in the logs, so maybe both Pinot tables are using the same group id.
  • Priyank Bagrecha

    11/15/2021, 12:05 PM
    I tried using
    "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "lowLevel",
          "stream.kafka.topic.name": <topic_name>,
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": <broker_list>,
          "realtime.segment.flush.threshold.size": "0",
          "realtime.segment.flush.threshold.time": "24h",
          "realtime.segment.flush.desired.size": "50M",
          "stream.kafka.consumer.prop.auto.offset.reset": "largest",
          "stream.kafka.consumer.prop.group.id": <group_id>,
          "stream.kafka.decoder.prop.schema": <schema>
        }
    and
    "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "highLevel",
          "stream.kafka.topic.name": <topic_name>,
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.hlc.bootstrap.server": <broker_list>,
          "realtime.segment.flush.threshold.size": "0",
          "realtime.segment.flush.threshold.time": "24h",
          "realtime.segment.flush.desired.size": "50M",
          "stream.kafka.consumer.prop.auto.offset.reset": "largest",
          "stream.kafka.consumer.prop.group.id": <group_id>,
          "stream.kafka.decoder.prop.schema": <schema>
        }
    and
    "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "highLevel",
          "stream.kafka.topic.name": <topic_name>,
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.hlc.bootstrap.server": <broker_list>,
          "realtime.segment.flush.threshold.size": "0",
          "realtime.segment.flush.threshold.time": "24h",
          "realtime.segment.flush.desired.size": "50M",
          "stream.kafka.consumer.prop.auto.offset.reset": "largest",
          "stream.kafka.consumer.prop.hlc.group.id": <group_id>,
          "stream.kafka.decoder.prop.schema": <schema>
        }
    None of those worked. Finally, after looking at the code, I tried
    "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "lowLevel",
            "stream.kafka.topic.name": <topic_name>,
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.broker.list": <broker_list>,
            "stream.kafka.consumer.prop.auto.offset.reset": "largest",
            "stream.kafka.group.id": <group_id>,
            "stream.kafka.decoder.prop.schema": <schema>,
            "realtime.segment.flush.threshold.size": "0",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.desired.size": "50M"
          },
    That one was able to consume from Kafka, but I don't see it in the list of Kafka consumer groups, and the logs still say group.id is empty. Any help / pointers appreciated.
  • Priyank Bagrecha

    11/15/2021, 12:24 PM
    I also tried
    "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "highLevel",
            "stream.kafka.topic.name": <topic_name>,
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.hlc.bootstrap.server": <broker_list>,
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.hlc.group.id": <group_id>,
            "stream.kafka.decoder.prop.schema": <schema>,
            "realtime.segment.flush.threshold.size": "0",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.desired.size": "50M"
          },
    but it doesn't consume any events from Kafka at all.
  • Neha Pawar

    11/15/2021, 5:09 PM
    @User ^
  • Caesar Yao

    06/06/2023, 2:29 AM
    Hello everyone, does pinot support NFS as the deep store?