# getting-started
  • m

    Michael Latta

    06/22/2022, 3:11 PM
Questions: 1) Do table names need to end in _REALTIME in the spec when they are real-time tables? 2) When submitting a spec using pinot-admin.sh, should the table spec be prefixed with REALTIME: as shown in the UI?
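For context on 1): the suffix is derived rather than declared. In the table config the tableName is the bare name and the tableType field supplies REALTIME; Pinot registers the table internally as name_REALTIME, which is what the UI displays. A minimal sketch (table name illustrative):
Copy code
{
  "tableName": "events",
  "tableType": "REALTIME"
}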
  • m

    Michael Latta

    06/25/2022, 6:24 PM
    I would like to say that the community here has been very helpful and welcoming and has made it possible to go from watching videos to having a working PoC in 7 days.
    ❤️ 4
  • k

    K.N. Bhargav

    06/28/2022, 5:18 AM
👋 Hello, team! I’m trying out the transcript example and uploading the table config via the following curl call
    Copy code
    curl -X POST "<http://localhost:9000/tableConfigs>" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"tableName\": \"transcript\", \"segmentsConfig\" : { \"timeColumnName\": \"timestampInEpoch\", \"timeType\": \"MILLISECONDS\", \"replication\" : \"1\", \"schemaName\" : \"transcript\" }, \"tableIndexConfig\" : { \"invertedIndexColumns\" : [], \"loadMode\" : \"MMAP\" }, \"tenants\" : { \"broker\":\"DefaultTenant\", \"server\":\"DefaultTenant\" }, \"tableType\":\"OFFLINE\", \"metadata\": {}}"
    but getting the following error
    Copy code
    {
      "_code": 400,
      "_error": "Invalid TableConfigs. Missing required creator property 'schema' (index 1)\n at [Source: (String)\"{\n \"tableName\": \"transcript\",\n \"segmentsConfig\" : {\n \"timeColumnName\": \"timestampInEpoch\",\n \"timeType\": \"MILLISECONDS\",\n \"replication\" : \"1\",\n \"schemaName\" : \"transcript\"\n },\n \"tableIndexConfig\" : {\n \"invertedIndexColumns\" : [],\n \"loadMode\" : \"MMAP\"\n },\n \"tenants\" : {\n \"broker\":\"DefaultTenant\",\n \"server\":\"DefaultTenant\"\n },\n \"tableType\":\"OFFLINE\",\n \"metadata\": {}\n}\"; line: 19, column: 1] (through reference chain: org.apache.pinot.spi.config.TableConfigs[\"schema\"])"
    }
I don’t want to execute stuff from the CLI to add a table config as shown here. Can someone help me understand where I’m going wrong and what the right request structure / way to do this via the REST API is?
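The 400 above occurs because /tableConfigs (backed by org.apache.pinot.spi.config.TableConfigs, per the reference chain in the error) expects the schema and the table config wrapped together, not a bare table config. A hedged sketch of the expected shape, reusing the fields from the curl call and abbreviating the schema body:
Copy code
{
  "tableName": "transcript",
  "schema": {
    "schemaName": "transcript",
    "dimensionFieldSpecs": [ ... ],
    "dateTimeFieldSpecs": [ ... ]
  },
  "offline": {
    "tableName": "transcript",
    "tableType": "OFFLINE",
    "segmentsConfig": { "timeColumnName": "timestampInEpoch", "timeType": "MILLISECONDS", "replication": "1", "schemaName": "transcript" },
    "tableIndexConfig": { "invertedIndexColumns": [], "loadMode": "MMAP" },
    "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" },
    "metadata": {}
  }
}
Alternatively, a bare table config like the one in the curl call can be posted to the older /tables endpoint, with the schema uploaded separately via /schemas.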
  • m

    Michael Latta

    06/29/2022, 12:55 AM
    Is there a way to configure a retention interval for a real time table?
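For reference, retention on a real-time table is set in segmentsConfig, and the controller's retention manager then purges segments older than the window. A hedged sketch (values illustrative):
Copy code
"segmentsConfig": {
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "5"
}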
  • d

    Deepika Eswar

    07/14/2022, 11:19 AM
hello all, I am new to Apache Pinot. I want to know how to perform DML operations (like INSERT) on an offline table. Can anyone help? I have ingested data from a NiFi server into Pinot through a batch ingestion job, and I want to move data from one table to another like in other databases. Does Pinot support this? P.S. I am using Windows. Since most of the documentation is for Linux and Mac, I couldn't dig very deep into how to use Pinot locally on Windows.
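For context: Pinot is not a transactional database and, as of this writing, has no row-level INSERT/UPDATE; offline tables are populated by building and pushing segments, so "inserting" more data means re-running an ingestion job over new input files. A hedged sketch (spec path illustrative; on Windows, running Pinot under Docker or WSL is the common route):
Copy code
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/ingestionJobSpec.yaml
Table-to-table copies are typically done by exporting query results and re-ingesting them, rather than with an INSERT ... SELECT statement.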
  • j

    Jeff Behl

    07/14/2022, 5:34 PM
howdy all - just starting to get familiar with Pinot. I’ve got a REALTIME table feeding off of Kafka for testing purposes, and that’s it. I’m pumping a lot of data at it for stress testing. One thing I’m not understanding is why my controllers have their disk space filled. I know controllers are somehow involved in dealing with offline segments, but I’ve not configured this yet. From the controller logs:
    Copy code
    Caught exception while uploading segment: aws_flowlogs__0__331__20220714T1655Z from instance: Server_pinot-server-5.pinot-server-headless.pinot-eng.svc.cluster.local_8098
    java.nio.file.FileSystemException: /var/pinot/controller/data/pinot-controller-1.pinot-controller-headless.pinot-eng.svc.cluster.local_9000/fileUploadTemp/aws_flowlogs__0__331__20220714T1655Z.ef130a54-cefe-4619-91b4-e1370c086d4f -> /var/pinot/controller/data/aws_flowlogs/aws_flowlogs__0__331__20220714T1655Z.tmp.73e9073d-2d5d-4e92-899d-8a246177f759: No space left on device
    thanks for any explanation to this newbie!
    👀 1
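What fills the controller disk: when a consuming segment completes, the server uploads it to the controller's data directory, which by default is local disk acting as the cluster's segment deep store, so a heavy stress test fills it quickly. A hedged sketch of pointing the deep store at S3 instead (bucket, region, and paths are illustrative):
Copy code
controller.data.dir=s3://my-bucket/pinot/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher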
  • d

    Deepika Eswar

    07/15/2022, 6:38 AM
How do I read an offline table in Pinot using Python?
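Queries go through the broker regardless of table type, so reading an offline table from Python is an ordinary SQL query. A minimal sketch using the pinotdb client (host and port assume a local quickstart; the table name is illustrative):
Copy code
# pip install pinotdb
from pinotdb import connect

# 8099 is the quickstart broker's query port
conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute('SELECT * FROM transcript LIMIT 10')
for row in curs:
    print(row)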
  • k

    karsen53

    07/15/2022, 9:53 AM
Hello 👋 I just started learning Pinot 😄 I have a MySQL database with a dataset; how can I populate Pinot with this data? I do not see any MySQL integration here: https://docs.pinot.apache.org/basics/data-import. Could you advise on it? 😅
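There is no direct MySQL connector, so the usual route is exporting the table to a supported file format and running a batch ingestion job over it. A hedged sketch (query, credentials, and paths are illustrative):
Copy code
# export the MySQL table as delimited text, then batch-ingest the file
mysql -u user -p --batch -e "SELECT * FROM mydb.mytable" > /tmp/input/mytable.tsv
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/ingestionJobSpec.yaml
The job spec would use a CSV recordReaderSpec configured with a tab delimiter; a fuller job-spec sketch appears further down this page under the UploadSegment question.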
  • a

    Alice

    07/18/2022, 8:02 AM
Hi team, NaN is returned by the query below. But how do I locate the records with value = NaN? I tried select * from table_name where value = NaN, and it didn’t work.
    Copy code
    select min(value) from table_name
  • j

    Jeff Behl

    07/19/2022, 2:01 PM
hey folks - I’ve got two fields that are epoch with second granularity at input time. I have them configured as such:
    Copy code
    "dateTimeFieldSpecs": [
        {
          "name": "start",
          "dataType": "LONG",
          "format": "1:SECONDS:EPOCH",
          "granularity": "1:SECONDS"
        },
        {
          "name": "end",
          "dataType": "LONG",
          "format": "1:SECONDS:EPOCH",
          "granularity": "1:SECONDS"
        }
      ]
but am now thinking they should be the TIMESTAMP dataType? I’m not clear on the implications. We will be doing sum aggregations grouped into buckets like minute/hour/day/etc. Thanks in advance.
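For what it's worth, bucketed aggregations work fine on a LONG epoch-seconds column; the bucketing happens at query time via conversion functions, so TIMESTAMP is mainly a convenience for readability and timestamp literals. A hedged sketch of an hourly rollup ("start" comes from the spec above; the metric column is illustrative):
Copy code
SELECT DATETIMECONVERT("start", '1:SECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS') AS hourBucket,
       SUM(bytesSent) AS totalBytes
FROM myTable
GROUP BY DATETIMECONVERT("start", '1:SECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')
ORDER BY hourBucket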
  • m

    Marc Kriguer

    07/22/2022, 10:18 PM
Hi, folks - I am fairly new to Pinot, and I am just trying to understand what needs to be packaged into the directory (along with the data elements) for the
UploadSegment
command to work. I am invoking the following command from the "pinot" directory (where the "git clone" command brought in all the Pinot source code):
    Copy code
    ./build/bin/pinot-admin.sh UploadSegment -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment
    where A.B.C.D is a Linux machine (provisioned through Google Cloud) that our team has set up to be our initial instance of Pinot (we are initially just provisioning one instance, until we need to scale up). The
    july-13-segment
    directory just contains 3 files: two data files that are meant to wind up in the same segment; the files are named 2022-07-13T22_02_50.179274000Z.json and 2022-07-13T22_02_52.770718122Z.json, and each contains a single JSON string that we were able to successfully import into Pinot via Kafka (until we determined that Kafka seemed to be our performance bottleneck). The third file, named
    schema.json
    , is the definition of the schema of the table I want the segments to go into. I'll attach it to this message, in case the contents do matter. When I run the above command, the output/error message are:
    Copy code
    ...   [Lots of messages about plugins]
    Uploading segment tar file: ./july-13-segment/schema.json
    Sending request: <http://A.B.C.D:9000/v2/segments?tableName> to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
    org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 500 (Internal Server Error) with reason: "Exception while uploading segment: Input is not in the .gz format" while sending request: <http://A.B.C.D:9000/v2/segments?tableName> to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
    	at org.apache.pinot.common.utils.http.HttpClient.wrapAndThrowHttpException(HttpClient.java:442)
    	at org.apache.pinot.common.utils.FileUploadDownloadClient.uploadSegment(FileUploadDownloadClient.java:597)
    	at org.apache.pinot.tools.admin.command.UploadSegmentCommand.execute(UploadSegmentCommand.java:176)
    	at org.apache.pinot.tools.Command.call(Command.java:33)
    	at org.apache.pinot.tools.Command.call(Command.java:29)
    	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    	at picocli.CommandLine.access$1300(CommandLine.java:145)
    	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    	at picocli.CommandLine.execute(CommandLine.java:2078)
    	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
    	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
If, instead of specifying the directory with the 2 data files and the schema.json file, I create a
july-13-segment.tar.gz
file (the same directory, tarred and gzipped), and specify that filename instead of the directory, namely
    Copy code
build/bin/pinot-admin.sh UploadSegment -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment.tar.gz
I get nearly the same error message (just without the "Input is not in the .gz format" part of the error):
    Copy code
    ...
    Executing command: UploadSegment -controllerProtocol http -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment.tar.gz
    java.lang.NullPointerException
    	at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770)
    	at org.apache.pinot.tools.admin.command.UploadSegmentCommand.execute(UploadSegmentCommand.java:158)
    	at org.apache.pinot.tools.Command.call(Command.java:33)
    	at org.apache.pinot.tools.Command.call(Command.java:29)
    	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    	at picocli.CommandLine.access$1300(CommandLine.java:145)
    	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    	at picocli.CommandLine.execute(CommandLine.java:2078)
    	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
    	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
My question: the "UploadSegment" documentation is rather incomplete; it does not really spell out what needs to be included in the directory (or in the tarred-and-gzipped version of the directory), except that the files should have a suffix indicating their type; I am using "json" for all 3. Do I need to include additional files (if so, what is needed?), or rename any of the files? (Thanks in advance!)
    schema.json
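The root cause: UploadSegment expects a directory of already-built Pinot segments (the tar.gz files a segment-generation step produces), not raw JSON records plus a schema, which is why the controller complains the input "is not in the .gz format". The usual path is a segment creation + push job over the raw files. A hedged sketch of such a job spec (paths, table name, and controller address are illustrative; the schema must already be registered for the table):
Copy code
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: './july-13-segment'
includeFileNamePattern: 'glob:**/*.json'
outputDirURI: './segments/july-13'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://A.B.C.D:9000'
Run with:
Copy code
./build/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile ./ingestionJobSpec.yaml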
  • n

    Nathan Maves

    07/23/2022, 1:13 PM
Is there a quickstart guide for connecting to Confluent Cloud with authentication?
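Not a full quickstart, but the usual approach is to put the standard Confluent Cloud SASL_SSL client settings into the table's streamConfigs. A hedged sketch (bootstrap server, API key, and secret are placeholders):
Copy code
"streamConfigs": {
  "stream.kafka.broker.list": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"API_KEY\" password=\"API_SECRET\";"
}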
  • c

    Cheguri Vinay Goud

    07/25/2022, 10:12 AM
Hello, is there any way to push data from a REALTIME table to an OFFLINE table when the segments are full, so that there is no stop in consuming records?
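Consumption itself does not stop when a segment completes; a new consuming segment starts immediately. For moving completed data into the OFFLINE table there is a Minion task built for this. A hedged sketch of the task block in the real-time table config (periods are illustrative; a Minion must be running and the controller's task scheduler enabled):
Copy code
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "1d"
    }
  }
}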
  • j

    James Kelleher

    07/25/2022, 5:19 PM
hello! trying to test out Kafka ingest, but the documentation seems really outdated, has inconsistencies between instructions, and refers to files that no longer exist. Is there a more recent, up-to-date guide on how to do this?
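For reference, a hedged sketch of the streamConfigs block a minimal Kafka real-time table needs (topic, broker, and threshold values are illustrative; the consumer factory shown is the Kafka 2.x plugin):
Copy code
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "transcript-topic",
  "stream.kafka.broker.list": "localhost:9092",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
  "realtime.segment.flush.threshold.rows": "100000"
}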
  • a

    Abhinav Rai

    07/27/2022, 7:31 AM
Hey guys, my team and I are looking to use Pinot for serving data to different teams, and I was wondering if there is a comparison chart somewhere with data size vs latency vs pods, or something along those lines. Basically, what I would like to understand is how much time data extraction would take with different configurations using Presto.
  • d

    Deepika Eswar

    08/03/2022, 7:11 AM
hello all, how do I do incremental loading in Pinot?
  • r

    Romil Punetha

    08/09/2022, 11:22 AM
Is there an example showing how SimpleAvroMessageDecoder is used? I’m pushing an encoded byte[] into Kafka. When I consume it in a service to recreate the object from the byte array, it works. However, when I connect Pinot to the same topic and provide a schema, it says Index out of bounds.
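Two hedged notes. First, SimpleAvroMessageDecoder expects plain binary-encoded Avro with no Confluent wire-format prefix (magic byte plus schema ID); if the producer uses the Confluent serializer, KafkaConfluentSchemaRegistryAvroMessageDecoder is the right decoder, and that mismatch is a classic cause of index-out-of-bounds errors during decoding. Second, a sketch of the decoder config; the decoder prop name "schema" (carrying the writer's Avro schema) is my recollection of the plugin and worth verifying against your version:
Copy code
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.decoder.prop.schema": "{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":[ ... ]}"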
  • a

    Abdelhakim Bendjabeur

    08/11/2022, 12:43 PM
Hello everyone 👋 I have recently started experimenting with Pinot in order to run it in production. I have a question about disaster recovery, in case metrics are wrong for some reason and we'd like to recompute them. In the case of a Kafka real-time table, is there a way to pause the Kafka consumers, truncate the table, and then restart consumption from scratch? Or should I just consider a blue/green approach here (create a new table with correct metrics and redirect the queries to it when it's up to date)?
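Recent Pinot releases add pause/resume endpoints on the controller for exactly this kind of surgery (verify your version has them). A hedged sketch (host and table name illustrative):
Copy code
# pause consumption; the table stays queryable
curl -X POST "http://localhost:9000/tables/myTable/pauseConsumption"
# ...drop or rebuild segments as needed...
curl -X POST "http://localhost:9000/tables/myTable/resumeConsumption"
Before these endpoints existed, the blue/green approach described above was the common practice.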
  • a

    austin macciola

    08/11/2022, 6:53 PM
Hello 👋 Just getting started with Pinot. I was reading the documentation, and it looks like when creating ingest and table specs for ingesting from something like Kafka, you can only have 1 table created per topic you are ingesting from? Just wanted to confirm this. Is there no way to have many topics ingested into the same table?
  • d

    Devang Shah

    08/23/2022, 7:53 PM
    Hello,
    👋 3
  • v

    ValarieR

    08/24/2022, 8:34 PM
    Hello! I am a new Developer Advocate with StarTree (2nd week!) and am attempting to follow the startup guide HERE for an M1 Mac. I ran into errors on the execution of
    mvn install package -DskipTests -Pbin-dist
    , and after asking around for help, was told to remove the
    ~/.m2/settings.xml
    configuration. After doing so, I ran into the following error, and am kind of at a loss:
    Copy code
    [INFO] Pinot Service Provider Interface ................... FAILURE [  0.570 s]
    
    [ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.9.0:check (default) on project pinot-spi: Execution default of goal com.diffplug.spotless:spotless-maven-plugin:2.9.0:check failed: java.lang.reflect.InvocationTargetException: class com.google.googlejavaformat.java.RemoveUnusedImports (in unnamed module @0x3ba015b1) cannot access class com.sun.tools.javac.util.Context (in module jdk.compiler) because module jdk.compiler does not export com.sun.tools.javac.util to unnamed module @0x3ba015b1 -> [Help 1]
    Has anyone seen this before?
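This is the well-known google-java-format versus JDK 16+ module-access issue, surfacing via the spotless check. Two hedged workarounds: build with JDK 11, or open the jdk.compiler packages via MAVEN_OPTS (the flags below are the standard --add-exports set for google-java-format). Skipping the spotless check also unblocks a local build:
Copy code
export MAVEN_OPTS="--add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED"
mvn install package -DskipTests -Pbin-dist
# or, for a local build:
mvn install package -DskipTests -Pbin-dist -Dspotless.check.skip=true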
  • h

    Hari

    08/30/2022, 4:05 AM
Hi all, we are just evaluating Pinot and I have a few questions about it. 1. Is there any ready-made plugin or connector available to ingest data from Kafka into Pinot? 2. As ZooKeeper is being removed from Kafka, is there any plan to remove the dependency from Pinot? 3. Is there any solution for cross-data-centre replication for DR?
  • n

    Naveen Nagarajan

    09/02/2022, 1:18 AM
Hi all, I am new to Pinot. I have my Pinot cluster running in a Kind cluster, with the Kafka broker running outside Kind. I keep seeing “Failed to construct kafka consumer” when adding a real-time table. How can I debug the consumer logs?
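Consumption happens on the servers, so that is where the consumer logs live. A hedged sketch of the usual first steps (pod, namespace, and table names are illustrative):
Copy code
# tail the server logs, where the Kafka consumer is constructed
kubectl logs -f pinot-server-0 -n pinot

# the controller's debug endpoint also surfaces per-segment errors
curl "http://localhost:9000/debug/tables/myTable?verbosity=1"
With Kind, “Failed to construct kafka consumer” is often a broker address that is not resolvable from inside the cluster.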
  • p

    Prabhagaran Ks

    09/02/2022, 3:42 AM
Hi all, I'm getting the below error while running the V2 engine. I followed the config provided (https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine) to enable it. Please point me to any documentation for setting up the V2 engine.
Copy code
[
  {
    "message": "SQLParsingError\njava.lang.RuntimeException Error composing query plan for: select * from test limit 10\n\tat org.apache.pinot.query.QueryEnvironment.planQuery(QueryEnvironment.java:136)\n\tat org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:146)\n\tat org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:127)\n\tat org.apache.pinot.broker.requesthandler.BrokerRequestHandlerDelegate.handleRequest(BrokerRequestHandlerDelegate.java:102)\n...\nCaused by: java.lang.NullPointerException\n\tat org.apache.pinot.query.routing.WorkerManager.assignWorkerToStage(WorkerManager.java:63)\n\tat org.apache.pinot.query.planner.logical.StagePlanner.makePlan(StagePlanner.java:99)\n\tat org.apache.pinot.query.QueryEnvironment.toDispatchablePlan(QueryEnvironment.java:202)\n\tat org.apache.pinot.query.QueryEnvironment.planQuery(QueryEnvironment.java:134)",
    "errorCode": 150
  }
]
  • m

    Matt Fysh

    09/06/2022, 5:12 AM
hi everyone, I’m wondering how I can assess whether Pinot is the right choice for me. I have a very large table with ~1GB of data ingested each day. I want to run several thousand queries on top of this data continuously, transform the data in Python, then write out a table (each active query has its own table/sink). There may then be other queries continuously querying the table that was just written to. I’ve started looking into Pinot and read the streaming pages in the docs; I see a large amount of support for streaming on the ingestion side, but not so much for continuously running queries. I have looked into Delta Live Tables on Databricks, which seems fairly close to what I have in mind, but I couldn’t work out whether running thousands of continuous queries on one large table would cause issues.
  • e

    Eaugene Thomas

    09/07/2022, 8:34 AM
Hi, I was working on a PoC for using encryption in transit in Pinot. In my case the Pinot nodes are distributed across systems, so if, say, I want to use self-signed certificates for TLS, how does that work with Pinot? I got some answers in https://stackoverflow.com/questions/2893819/accept-servers-self-signed-ssl-certificate-in-java-client which say to modify the trust manager; are there any other options for accepting self-signed certificates between Pinot nodes?
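A common pattern that avoids custom trust managers: import the self-signed certificate into a truststore shipped to every node, then point each component's TLS config at it. A hedged sketch (paths and passwords are illustrative, and the exact property names should be checked against the TLS docs for your Pinot version):
Copy code
# build a truststore containing the self-signed certificate
keytool -importcert -alias pinot-selfsigned -file selfsigned.crt \
  -keystore /opt/pinot/tls/truststore.jks -storepass changeit -noprompt
Copy code
# e.g. in the server config; analogous keys exist for controller and broker
pinot.server.tls.truststore.path=/opt/pinot/tls/truststore.jks
pinot.server.tls.truststore.password=changeit
pinot.server.tls.keystore.path=/opt/pinot/tls/keystore.jks
pinot.server.tls.keystore.password=changeit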
  • a

    Abdelhakim Bendjabeur

    09/07/2022, 1:16 PM
Hello 👋 Does anyone have insight into how much of a bottleneck Presto can become when plugged in on top of Pinot to unlock full SQL syntax? i.e. how much will Presto hurt the latency and high-load resilience that make Pinot a suitable solution for user-facing analytics?
  • d

    Deena Dhayalan

    09/14/2022, 7:50 AM
Hi, can anyone explain the intermediary stage in detail? And what are intermediary servers 1, 2, 3? Is it the multi-stage engine working in the background when it's enabled?
  • j

    Josh Clum

    09/19/2022, 3:26 PM
    Hello, is it possible to combine multiple indexing techniques (e.g. star tree and native text search indexes) on a single table?
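Yes, multiple index types can coexist on one table, each applied to a suitable column. A hedged sketch combining a star-tree index with a text index (shown with the Lucene text index; the native text index uses additional fieldConfig properties, so check the text-search docs). Column names are illustrative:
Copy code
"tableIndexConfig": {
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["country", "browser"],
      "functionColumnPairs": ["SUM__clicks", "COUNT__*"],
      "maxLeafRecords": 10000
    }
  ]
},
"fieldConfigList": [
  {
    "name": "description",
    "encodingType": "RAW",
    "indexType": "TEXT"
  }
]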
  • j

    Josh Clum

    09/19/2022, 8:42 PM
Do ingest-level aggregations happen across segments? Also, do the aggregations occur across partitions? I saw the requirement for the low-level Kafka API, so I'm guessing aggregations only happen at the partition level.
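For reference, ingestion-time aggregation is configured per table under ingestionConfig, and the rollup is applied within each consuming segment (hence per partition) as rows arrive. A hedged sketch (column names illustrative; the feature requires the low-level consumer, as noted above):
Copy code
"ingestionConfig": {
  "aggregationConfigs": [
    { "columnName": "totalSales", "aggregationFunction": "SUM(sales)" }
  ]
}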