# getting-started
  • Amit Chopra (12/11/2020, 4:50 PM)
    Then I changed the config as described in https://docs.pinot.apache.org/operators/operating-pinot/decoupling-controller-from-the-data-path. Now segments are no longer being written to S3. I do see segments being created, as they show up in the query browser, but they show up with status BAD. Can someone help point out what is wrong with the configuration?
    controller.conf:
    Copy code
    controller.helix.cluster.name=pinot-quickstart
    controller.port=9000
    controller.enable.split.commit=true
    controller.allow.hlc.tables=false
    controller.data.dir=/tmp/pinot-tmp-data/
    controller.local.temp.dir=/tmp/pinot-tmp-data/
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-west-2
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    controller.zk.str=pinot-zookeeper:2181
    pinot.set.instance.id.to.hostname=true
    server.conf:
    Copy code
    pinot.server.netty.port=8098
    pinot.server.instance.enable.split.commit=true
    pinot.server.adminapi.port=8097
    pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index
    pinot.server.instance.segment.store.uri=s3://pinot-quickstart-s3/pinot-data/pinot-s3-example/controller-data
    pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars
    pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.storage.factory.s3.region=us-west-2
    pinot.server.segment.fetcher.protocols=file,http,s3
    pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.set.instance.id.to.hostname=true
    pinot.server.instance.realtime.alloc.offheap=true
    table config:
    Copy code
    {
      "REALTIME": {
        "tableName": "demo1_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "timeType": "MILLISECONDS",
          "schemaName": "demo1",
          "timeColumnName": "mergedTimeMillis",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "60",
          "replication": "1",
          "replicasPerPartition": "1",
          "completionConfig": {
            "completionMode": "DOWNLOAD"
          },
          "peerSegmentDownloadScheme": "http"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.topic.name": "demo1",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.zk.broker.url": "z-1.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181,z-3.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181,z-2.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:2181",
            "stream.kafka.broker.list": "b-2.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:9092,b-1.pinot-quickstart-msk-d.9sahwk.c7.kafka.us-west-2.amazonaws.com:9092",
            "realtime.segment.flush.threshold.time": "10m",
            "realtime.segment.flush.threshold.size": "10000",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
          },
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "loadMode": "MMAP",
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false
        },
        "metadata": {
          "customConfigs": {}
        }
      }
    }
  • Ting Chen (12/11/2020, 8:00 PM)
    In our setup, controller.data.dir points to the deep store and is also consistent with the server upload destination.
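    As a reference point, a minimal sketch of that pairing, assuming an S3 deep store (the bucket and paths are illustrative, not from this thread):
    Copy code
    # controller.conf -- controller.data.dir points at the deep store
    controller.data.dir=s3://my-deepstore-bucket/pinot-segments
    controller.local.temp.dir=/tmp/pinot-controller-tmp
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS

    # server.conf -- the server upload destination matches controller.data.dir
    pinot.server.instance.segment.store.uri=s3://my-deepstore-bucket/pinot-segments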
  • Mahesh Yeole (12/14/2020, 9:04 PM)
    @User @User I see a lot of files written to S3 under the same timestamp, but I also see errors on both the controller and the server. On the cluster manager console, the segment keeps showing CONSUMING. We are trying to use the split commit feature, but even with split.commit set to true for both controller and server, the server error shows "isSplitCommitType":false.
    Error in server logs:
    Copy code
    [LLRealtimeSegmentDataManager_pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] [pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] CommitEnd failed with response {"isSplitCommitType":false,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"status":"FAILED","offset":-1}
    Error in controller logs:
    Copy code
    [SegmentCompletionFSM_pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z] [grizzly-http-server-1] Caught exception while committing segment file for segment: pullRequestMergedEventsAwsMskDemo__0__1__20201214T1851Z
    java.io.IOException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: E62169F11317304B, Extended Request ID: 3dlRY25FjPWIVJsA82PfQnhwlyp/26Nw1VM2xZCzlqEUvNSIXpFSexbvMewbLTR3ZuaDSHE6rq8=)
    This is my controller.conf:
    Copy code
    controller.helix.cluster.name=pinot-cluster
    controller.port=9000
    controller.local.temp.dir=/var/pinot/controller/data
    controller.data.dir=s3://pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data/
    controller.zk.str=pinot-zookeeper:2181
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-west-2
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    controller.allow.hlc.tables=false
    controller.enable.split.commit=true
    pinot.set.instance.id.to.hostname=true
    This is my server.conf:
    Copy code
    pinot.server.netty.port=8098
    pinot.server.adminapi.port=8097
    pinot.server.instance.dataDir=/var/pinot/server/data/index
    pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
    pinot.set.instance.id.to.hostname=true
    pinot.server.instance.realtime.alloc.offheap=true
    pinot.server.instance.segment.store.uri=s3://pinot-cluster-segment-s3/pinot-data/pinot-s3-example/controller-data/
    pinot.server.instance.enable.split.commit=true
    pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.storage.factory.s3.region=us-west-2
    pinot.server.segment.fetcher.protocols=file,http,s3
    pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
  • Amit Chopra (01/13/2021, 6:46 PM)
    ok, thanks. Let me try and see what difference it makes
  • Jackie (01/13/2021, 6:53 PM)
    Do you have it configured explicitly? The config key is pinot.server.query.executor.num.groups.limit
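    If it isn't set yet, a one-line sketch of how it would go into server.conf (the value here is purely illustrative, not a recommendation):
    Copy code
    pinot.server.query.executor.num.groups.limit=200000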
  • Zac Farrell (01/20/2021, 8:12 PM)
    Hey folks - I'm trying to get the JDBC client working but am running into an issue:
    Copy code
    java.lang.NoClassDefFoundError: org/apache/pinot/client/JsonAsyncHttpPinotClientTransportFactory
    I've tried running both v0.6.0 (latest) and 0.5.0 (the version called out in the docs), but both produce the same error. I've also tried compiling the jar from source, as well as including it as an explicit dependency in Maven. Any help is appreciated, thanks!
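    The missing class ships in the pinot-java-client module, so one hedged guess is that the JDBC artifact was pulled in without it. A sketch of the Maven dependencies under that assumption (the version is illustrative):
    Copy code
    <dependency>
      <groupId>org.apache.pinot</groupId>
      <artifactId>pinot-jdbc-client</artifactId>
      <version>0.6.0</version>
    </dependency>
    <!-- assumed needed: provides org.apache.pinot.client.JsonAsyncHttpPinotClientTransportFactory -->
    <dependency>
      <groupId>org.apache.pinot</groupId>
      <artifactId>pinot-java-client</artifactId>
      <version>0.6.0</version>
    </dependency>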
  • Mohit Singh (05/23/2021, 2:42 PM)
    Hello everyone, I am trying to ingest data from a Kafka topic into Apache Pinot, but I don't see any data loaded. Am I missing anything in the config related to Avro? Schema:
    Copy code
    {
      "schemaName": "test_schema",
      "dimensionFieldSpecs": [
        {
          "name": "client_id",
          "dataType": "STRING"
        },
        {
          "name": "master_property_id",
          "dataType": "INT"
        },
        {
          "name": "business_unit",
          "dataType": "STRING"
        },
        {
          "name": "error_info_str",
          "dataType": "STRING"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "timestamp",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
    Table:
    Copy code
    {
      "REALTIME": {
        "tableName": "test_schema_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "schemaName": "test_schema",
          "replication": "1",
          "replicasPerPartition": "1",
          "timeColumnName": "timestamp"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant",
          "tagOverrideConfig": {}
    },
    "tableIndexConfig": {
      "bloomFilterColumns": [],
      "noDictionaryColumns": [],
      "onHeapDictionaryColumns": [],
      "varLengthDictionaryColumns": [],
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false,
      "invertedIndexColumns": [],
      "rangeIndexColumns": [],
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "sortedColumn": [],
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.topic.name": "TestTopic",
        "stream.kafka.broker.list": "localhost:9092",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "schema.registry.url": "http://localhost:8081",
            "realtime.segment.flush.threshold.rows": "0",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.segment.size": "100M"
          }
        },
        "metadata": {},
        "quota": {},
        "routing": {},
        "query": {},
        "ingestionConfig": {
          "transformConfigs": [
            {
              "columnName": "error_info_str",
              "transformFunction": "json_format(error_info)"
            }
          ]
        },
        "isDimTable": false
      }
    }
    Kafka Avro Schema:
    Copy code
    {
      "type": "record",
      "name": "TestRecord",
      "namespace": "com.test.ns",
      "fields": [
        {
          "name": "client_id",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "master_property_id",
          "type": "int"
        },
        {
          "name": "business_unit",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "error_info",
          "type": {
            "type": "record",
            "name": "ErrorInfo",
            "fields": [
              {
                "name": "code",
                "type": [
                  "null",
                  "string"
                ]
              },
              {
                "name": "description",
                "type": [
                  "null",
                  "string"
                ]
              }
            ]
          }
        },
        {
          "name": "timestamp",
          "type": [
            "null",
            "long"
          ],
          "default": null
        }
      ]
    }
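    One thing worth double-checking against the Pinot docs: the Confluent schema registry URL is usually passed to the decoder as a decoder property rather than a bare schema.registry.url key. A sketch of the streamConfigs entry, assuming that key applies to this Pinot version:
    Copy code
    "stream.kafka.decoder.prop.schema.registry.rest.url": "http://localhost:8081"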
  • Kaushik Ranganath (06/07/2021, 4:01 AM)
    When I do a kubectl get all -n pinot-quickstart, I see this has brought up classic load balancers exposing both the broker and the controller on TCP ports. When I make a curl/browser request to the DNS names, I expect to see the UI for the broker and the Swagger UI for the controller, but the request eventually times out without bringing up the UI. I am a beginner in AWS networking, but the security groups created by these setup instructions (which I have followed exactly) allow TCP requests from 0.0.0.0/0. Any inputs on bringing up the UIs for the broker and controller are much appreciated!
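    While the load balancer side is being debugged, a port-forward can at least confirm the pods themselves are serving; a sketch, assuming the quickstart's default controller service name:
    Copy code
    kubectl port-forward service/pinot-controller 9000:9000 -n pinot-quickstart
    # then open http://localhost:9000 for the controller UI / Swagger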
  • Kamal Chavda (07/09/2021, 3:24 PM)
    Hello! I am trying to figure out if there are any restrictions on column naming conventions for the schema file. Can names be snake_case, or do they have to be camelCase?
  • Bruce Ritchie (07/09/2021, 6:27 PM)
    Hello all. Quick question before I install Pinot: is it possible to alter a table to change the indices on columns after data is loaded?
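    For context, the usual flow described in the docs is to update the table config and then trigger a segment reload through the controller REST API; a sketch, with the host and table name illustrative:
    Copy code
    # upload the revised table config containing the new index settings
    curl -X PUT "http://localhost:9000/tables/myTable" \
      -H "Content-Type: application/json" -d @table-config.json

    # ask servers to reload segments so the new indices get built
    curl -X POST "http://localhost:9000/segments/myTable/reload"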
  • Bruce Ritchie (07/09/2021, 6:56 PM)
    For batch ingestion in standalone mode, where is the work being performed?
  • Bruce Ritchie (08/01/2021, 6:31 PM)
    Is it possible to add S3 deep storage after a cluster has ingested data? The documentation I found uses an S3 URL for the controller.data.dir property, which in my POC currently points to a directory on the controller's filesystem.
  • xtrntr (08/10/2021, 2:30 AM)
    hello, I'm just curious how queries run faster when you issue the exact same query a second time. How does caching work in Pinot? If it matters, I'm using an inverted index with no partition pruning (only 1 broker running). The broker log looks something like this:
    Copy code
    Processed requestId=34,table=sorted_events_OFFLINE,segments(queried/processed/matched/consuming)=198/198/198/-1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=426,resSerMs=0,totalTimeMs=426,minConsumingFreshnessMs=-1,broker=Broker_172.26.0.4_8099,numDocsScanned=259467,scanInFilter=619119085,scanPostFilter=259467,sched=fcfs
    
    Slow query: request handler processing time: 427, send response latency: 3, total time to handle request: 430
    Processed requestId=35,table=events_OFFLINE,segments(queried/processed/matched/consuming)=198/198/118/-1,schedulerWaitMs=0,reqDeserMs=5,totalExecMs=221,resSerMs=0,totalTimeMs=226,minConsumingFreshnessMs=-1,broker=Broker_172.26.0.4_8099,numDocsScanned=657,scanInFilter=346815,scanPostFilter=657,sched=fcfs
    also, i’m wondering what is considered prompts
    "Slow query: …"
    to show up in logs? does this mean pinot is suggesting that some optimization is possible to speed up my queries?
  • Tiger Zhao (08/16/2021, 3:19 PM)
    Hi, I'm trying to batch ingest a lot of data in some ORC files. What is the recommended way of doing this? I'm currently using the SegmentCreationAndMetadataPush job with the command-line interface.
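    For reference, a minimal sketch of a standalone SegmentCreationAndMetadataPush job spec for ORC input; the bucket, table, and controller details are illustrative:
    Copy code
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
    jobType: SegmentCreationAndMetadataPush
    inputDirURI: 's3://my-bucket/orc-input/'
    includeFileNamePattern: 'glob:**/*.orc'
    outputDirURI: 's3://my-bucket/pinot-segments/myTable/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: us-west-2
    recordReaderSpec:
      dataFormat: 'orc'
      className: 'org.apache.pinot.plugin.inputformat.orc.ORCRecordReader'
    tableSpec:
      tableName: 'myTable'
    pinotClusterSpecs:
      - controllerURI: 'http://localhost:9000'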
  • xtrntr (08/20/2021, 5:14 AM)
    Am I missing something? If you have
    Copy code
    # table1
    s3://bucket/pinot-segments/table1/

    # table2
    s3://bucket/pinot-segments/table2/
    do you need to tell the controller where the segments for each table go? I only see controller.data.dir
  • Tiger Zhao (08/20/2021, 2:50 PM)
    When using S3 as the deep store with SegmentCreationAndMetadataPush, should the controller.data.dir (from the controller conf) be the same as the outputDirURI (from the ingestion job spec)?
  • Tiger Zhao (08/24/2021, 9:26 PM)
    Is there a way to enable/disable individual segments?
  • Tiger Zhao (08/25/2021, 2:18 PM)
    What does the process look like for changing the table config for an existing table with segments in deepstore?
  • Tiger Zhao (08/26/2021, 7:16 PM)
    By default, is the broker supposed to limit queries to 10 results?
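    As far as the docs describe, Pinot does apply a default LIMIT 10 when a query doesn't specify one, so the cap has to be raised explicitly; for example (table name illustrative):
    Copy code
    SELECT * FROM myTable LIMIT 1000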
  • Thiago Pereira (08/28/2021, 12:51 PM)
    Does anyone have a good tutorial to help me?
  • J K (08/31/2021, 2:04 PM)
    I'm having issues running the basic scripts in the 0.8.0 Apache Pinot release on Windows 7 with Java 8, using git-bash as the terminal. I'm following this link for setup: https://docs.pinot.apache.org/v/release-0.4.0/basics/getting-started/running-pinot-locally It seems like it cannot find the Java class files correctly. I currently only have JAVA_HOME set, pointing to JDK 8. I noticed in the 0.3.0 release notes there was something regarding Java 8 (see attached image). Is there something special I need to do to get this to work?
  • Luis Fernandez (08/31/2021, 3:02 PM)
    question: if we have to scale Pinot servers horizontally (where data is stored), do we rebalance the segments within those server hosts? How does that work?
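    For background, segments are not moved automatically when servers are added; the controller's rebalance endpoint redistributes them. A sketch, with the host and table name illustrative:
    Copy code
    curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=REALTIME"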
  • Tiger Zhao (09/01/2021, 5:39 PM)
    Is there a way to specify the SegmentPush job to only push a single segment instead of a directory?
  • Tiger Zhao (09/02/2021, 9:49 PM)
    Is Pinot able to efficiently run queries that use REGEXP_LIKE? I'm not sure if there is any indexing or pre-aggregation that would make that fast.
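    Plain REGEXP_LIKE is evaluated as a scan over the column values, so one option described in the docs is a Lucene text index queried via TEXT_MATCH instead; a sketch of the fieldConfigList entry (the column name is illustrative):
    Copy code
    "fieldConfigList": [
      {
        "name": "logLine",
        "encodingType": "RAW",
        "indexType": "TEXT"
      }
    ]
    Queries would then use TEXT_MATCH(logLine, '...') with Lucene query syntax rather than REGEXP_LIKE.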
  • Luis Fernandez (09/07/2021, 4:38 PM)
    when we insert data into Pinot, how is replication achieved? Is it when a segment is completed that the data is made available to other nodes?
  • xtrntr (09/08/2021, 2:52 AM)
    that's what I did, but it says in the document:
    "This should only be used in standalone setups or for POC."
  • Tiger Zhao (09/08/2021, 2:53 PM)
    If I set enableDefaultStarTree=true, is it possible to also specify extra aggregations in functionColumnPairs or change the maxLeafRecords (or any other config)? I think having it automatically generate and sort the dimensionsSplitOrder list is very helpful, but I also want to add more aggregations on top of the default.
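    For comparison, explicitly configured star-trees go under tableIndexConfig.starTreeIndexConfigs, where functionColumnPairs and maxLeafRecords can be set directly; a sketch with illustrative column names:
    Copy code
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["country", "deviceType"],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": ["SUM__clicks", "PERCENTILE_EST__latencyMs"],
        "maxLeafRecords": 10000
      }
    ]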
  • Tiger Zhao (09/08/2021, 10:17 PM)
    Is pinot able to do PERCENTILEs and PERCENTILE aggregates in the star tree on columns with NULL values?
  • sina (09/10/2021, 6:51 AM)
    Hi team, I am working on a project for realtime speed test calculation. I get the speed test data from devices via Kafka ingestion. Once the data is in Pinot, the following calculations need to be performed:
    - Select the peak-hour data (7 pm to 11 pm).
    - The data arrives with varying timestamps; the average speed needs to be calculated for each hour between 7 pm and 11 pm every day, e.g. 7-8 pm average, 8-9 average, 9-10 average, and 10-11 pm average. (The average for each hour should be available as soon as that 1-hour window is completed.)
    - The 4 averages need to be stored in another table, so we would have 4 sample data points per day.
    - From that second table, the past 14 days of data need to be selected, and the 3rd-worst speed should be reported and stored in another table.
    Both of these two tables would be my reports. The question is whether Pinot is a suitable platform for these sorts of calculations. What would be the best way to run ETL jobs or tasks to run the queries that do the calculations? I have already done this with InfluxDB, but I would like to design/implement this with Pinot. Note that I also have other use cases with the same data where the data needs to be reported in realtime. Thank you in advance for your help.
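    For what it's worth, the hourly peak-hour averages map onto a single Pinot query; a sketch, assuming epoch-millis timestamps, UTC hours, and illustrative table/column names:
    Copy code
    SELECT DATETIMECONVERT(ts, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS') AS hourBucket,
           AVG(speed) AS avgSpeed
    FROM speed_tests
    WHERE hour(ts) BETWEEN 19 AND 22
    GROUP BY DATETIMECONVERT(ts, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')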
  • Tiger Zhao (09/10/2021, 8:24 PM)
    Does Pinot automatically delete the generated indices from the servers after deleting a segment? I'm running into an issue where I delete a segment through the REST API, but it leaves the index files under PinotServer/index behind. The indices have built up over time from my test tables, and now the servers are out of disk space.