# getting-started
  • m

    Michael Latta

    06/22/2022, 3:11 PM
Questions: 1) Do table names need to end in _REALTIME in the spec when they are real-time tables? 2) When submitting a spec using pinot-admin.sh, should the table spec be prefixed with REALTIME: as shown in the UI?
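For context on 1): the suffix is derived rather than declared. In the table config the tableName is the bare name and the tableType field supplies REALTIME; Pinot registers the table internally as name_REALTIME, which is what the UI displays. A minimal sketch (table name illustrative):
Copy code
{
  "tableName": "events",
  "tableType": "REALTIME"
}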
  • m

    Michael Latta

    06/25/2022, 6:24 PM
    I would like to say that the community here has been very helpful and welcoming and has made it possible to go from watching videos to having a working PoC in 7 days.
    ❤️ 4
  • k

    K.N. Bhargav

    06/28/2022, 5:18 AM
👋 Hello, team! I’m trying out the transcript example and uploading the table config via the following curl call
    Copy code
    curl -X POST "<http://localhost:9000/tableConfigs>" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"tableName\": \"transcript\", \"segmentsConfig\" : { \"timeColumnName\": \"timestampInEpoch\", \"timeType\": \"MILLISECONDS\", \"replication\" : \"1\", \"schemaName\" : \"transcript\" }, \"tableIndexConfig\" : { \"invertedIndexColumns\" : [], \"loadMode\" : \"MMAP\" }, \"tenants\" : { \"broker\":\"DefaultTenant\", \"server\":\"DefaultTenant\" }, \"tableType\":\"OFFLINE\", \"metadata\": {}}"
    but getting the following error
    Copy code
    {
      "_code": 400,
      "_error": "Invalid TableConfigs. Missing required creator property 'schema' (index 1)\n at [Source: (String)\"{\n \"tableName\": \"transcript\",\n \"segmentsConfig\" : {\n \"timeColumnName\": \"timestampInEpoch\",\n \"timeType\": \"MILLISECONDS\",\n \"replication\" : \"1\",\n \"schemaName\" : \"transcript\"\n },\n \"tableIndexConfig\" : {\n \"invertedIndexColumns\" : [],\n \"loadMode\" : \"MMAP\"\n },\n \"tenants\" : {\n \"broker\":\"DefaultTenant\",\n \"server\":\"DefaultTenant\"\n },\n \"tableType\":\"OFFLINE\",\n \"metadata\": {}\n}\"; line: 19, column: 1] (through reference chain: org.apache.pinot.spi.config.TableConfigs[\"schema\"])"
    }
I don’t want to execute stuff from the CLI to add a table config as shown here. Can someone help me understand where I’m going wrong and what the right request structure / way to do this via the REST API is?
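The 400 above occurs because /tableConfigs (backed by org.apache.pinot.spi.config.TableConfigs, per the reference chain in the error) expects the schema and the table config wrapped together, not a bare table config. A hedged sketch of the expected shape, reusing the fields from the curl call and abbreviating the schema body:
Copy code
{
  "tableName": "transcript",
  "schema": {
    "schemaName": "transcript",
    "dimensionFieldSpecs": [ ... ],
    "dateTimeFieldSpecs": [ ... ]
  },
  "offline": {
    "tableName": "transcript",
    "tableType": "OFFLINE",
    "segmentsConfig": { "timeColumnName": "timestampInEpoch", "timeType": "MILLISECONDS", "replication": "1", "schemaName": "transcript" },
    "tableIndexConfig": { "invertedIndexColumns": [], "loadMode": "MMAP" },
    "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" },
    "metadata": {}
  }
}
Alternatively, a bare table config like the one in the curl call can be posted to the older /tables endpoint, with the schema uploaded separately via /schemas.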
  • m

    Michael Latta

    06/29/2022, 12:55 AM
    Is there a way to configure a retention interval for a real time table?
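For reference, retention on a real-time table is set in segmentsConfig, and the controller's retention manager then purges segments older than the window. A hedged sketch (values illustrative):
Copy code
"segmentsConfig": {
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "5"
}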
  • d

    Deepika Eswar

    07/14/2022, 11:19 AM
hello all, I am new to Apache Pinot. I want to know how to perform DML operations (like INSERT) on an offline table. Can anyone help? I have ingested data from a NiFi server into Pinot through a batch ingestion job, and I want to move data from one table to another like in other databases. Does Pinot support this? P.S. I am using Windows. Since most of the documentation is for Linux and Mac, I couldn't dig very deep into how to use Pinot locally on Windows.
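For context: Pinot is not a transactional database and, as of this writing, has no row-level INSERT/UPDATE; offline tables are populated by building and pushing segments, so "inserting" more data means re-running an ingestion job over new input files. A hedged sketch (spec path illustrative; on Windows, running Pinot under Docker or WSL is the common route):
Copy code
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/ingestionJobSpec.yaml
Table-to-table copies are typically done by exporting query results and re-ingesting them, rather than with an INSERT ... SELECT statement.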
  • j

    Jeff Behl

    07/14/2022, 5:34 PM
howdy all - just starting to get familiar with Pinot. I’ve got a REALTIME table feeding off of Kafka for testing purposes, and that’s it. I’m pumping a lot of data at it for stress testing. One thing I’m not understanding is why my controllers have their disk space filled. I know controllers are somehow involved in dealing with offline segments, but I’ve not configured this yet. From the controller logs:
    Copy code
    Caught exception while uploading segment: aws_flowlogs__0__331__20220714T1655Z from instance: Server_pinot-server-5.pinot-server-headless.pinot-eng.svc.cluster.local_8098
    java.nio.file.FileSystemException: /var/pinot/controller/data/pinot-controller-1.pinot-controller-headless.pinot-eng.svc.cluster.local_9000/fileUploadTemp/aws_flowlogs__0__331__20220714T1655Z.ef130a54-cefe-4619-91b4-e1370c086d4f -> /var/pinot/controller/data/aws_flowlogs/aws_flowlogs__0__331__20220714T1655Z.tmp.73e9073d-2d5d-4e92-899d-8a246177f759: No space left on device
    thanks for any explanation to this newbie!
    👀 1
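What fills the controller disk: when a consuming segment completes, the server uploads it to the controller's data directory, which by default is local disk acting as the cluster's segment deep store, so a heavy stress test fills it quickly. A hedged sketch of pointing the deep store at S3 instead (bucket, region, and paths are illustrative):
Copy code
controller.data.dir=s3://my-bucket/pinot/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher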
  • d

    Deepika Eswar

    07/15/2022, 6:38 AM
How do I read an offline table in Pinot using Python?
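Queries go through the broker regardless of table type, so reading an offline table from Python is an ordinary SQL query. A minimal sketch using the pinotdb client (host and port assume a local quickstart; the table name is illustrative):
Copy code
# pip install pinotdb
from pinotdb import connect

# 8099 is the quickstart broker's query port
conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute('SELECT * FROM transcript LIMIT 10')
for row in curs:
    print(row)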
  • k

    karsen53

    07/15/2022, 9:53 AM
Hello 👋 I just started learning Pinot 😄 I have a MySQL database with a dataset; how can I populate Pinot with this data? I do not see any MySQL integration here: https://docs.pinot.apache.org/basics/data-import. Could you advise on it? 😅
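There is no direct MySQL connector, so the usual route is exporting the table to a supported file format and running a batch ingestion job over it. A hedged sketch (query, credentials, and paths are illustrative):
Copy code
# export the MySQL table as delimited text, then batch-ingest the file
mysql -u user -p --batch -e "SELECT * FROM mydb.mytable" > /tmp/input/mytable.tsv
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/ingestionJobSpec.yaml
The job spec would use a CSV recordReaderSpec configured with a tab delimiter; a fuller job-spec sketch appears further down this page under the UploadSegment question.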
  • a

    Alice

    07/18/2022, 8:02 AM
Hi team, NaN is returned by the query below. But how do I locate the records with value = NaN? I tried select * from table_name where value = NaN, and it didn’t work.
    Copy code
    select min(value) from table_name
  • j

    Jeff Behl

    07/19/2022, 2:01 PM
hey folks - I’ve got two fields that are epoch with second granularity at input time. I have them configured as such:
    Copy code
    "dateTimeFieldSpecs": [
        {
          "name": "start",
          "dataType": "LONG",
          "format": "1:SECONDS:EPOCH",
          "granularity": "1:SECONDS"
        },
        {
          "name": "end",
          "dataType": "LONG",
          "format": "1:SECONDS:EPOCH",
          "granularity": "1:SECONDS"
        }
      ]
but am now thinking they should be the TIMESTAMP dataType? I’m not clear on the implications. We will be doing sum aggregations grouped into buckets like minute/hour/day/etc. Thanks in advance.
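For what it's worth, bucketed aggregations work fine on a LONG epoch-seconds column; the bucketing happens at query time via conversion functions, so TIMESTAMP is mainly a convenience for readability and timestamp literals. A hedged sketch of an hourly rollup ("start" comes from the spec above; the metric column is illustrative):
Copy code
SELECT DATETIMECONVERT("start", '1:SECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS') AS hourBucket,
       SUM(bytesSent) AS totalBytes
FROM myTable
GROUP BY DATETIMECONVERT("start", '1:SECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')
ORDER BY hourBucket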
  • m

    Marc Kriguer

    07/22/2022, 10:18 PM
Hi, folks - I am fairly new to Pinot, and I am just trying to understand what needs to be packaged into the directory (along with the data elements) for the
UploadSegment
command to work. I am invoking the following command from the "pinot" directory (where the "git clone" command brought in all the Pinot source code):
    Copy code
    ./build/bin/pinot-admin.sh UploadSegment -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment
    where A.B.C.D is a Linux machine (provisioned through Google Cloud) that our team has set up to be our initial instance of Pinot (we are initially just provisioning one instance, until we need to scale up). The
    july-13-segment
    directory just contains 3 files: two data files that are meant to wind up in the same segment; the files are named 2022-07-13T22_02_50.179274000Z.json and 2022-07-13T22_02_52.770718122Z.json, and each contains a single JSON string that we were able to successfully import into Pinot via Kafka (until we determined that Kafka seemed to be our performance bottleneck). The third file, named
    schema.json
    , is the definition of the schema of the table I want the segments to go into. I'll attach it to this message, in case the contents do matter. When I run the above command, the output/error message are:
    Copy code
    ...   [Lots of messages about plugins]
    Uploading segment tar file: ./july-13-segment/schema.json
    Sending request: <http://A.B.C.D:9000/v2/segments?tableName> to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
    org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 500 (Internal Server Error) with reason: "Exception while uploading segment: Input is not in the .gz format" while sending request: <http://A.B.C.D:9000/v2/segments?tableName> to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
    	at org.apache.pinot.common.utils.http.HttpClient.wrapAndThrowHttpException(HttpClient.java:442)
    	at org.apache.pinot.common.utils.FileUploadDownloadClient.uploadSegment(FileUploadDownloadClient.java:597)
    	at org.apache.pinot.tools.admin.command.UploadSegmentCommand.execute(UploadSegmentCommand.java:176)
    	at org.apache.pinot.tools.Command.call(Command.java:33)
    	at org.apache.pinot.tools.Command.call(Command.java:29)
    	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    	at picocli.CommandLine.access$1300(CommandLine.java:145)
    	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    	at picocli.CommandLine.execute(CommandLine.java:2078)
    	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
    	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
If, instead of specifying the directory with the 2 data files and the schema.json file, I create a
july-13-segment.tar.gz
file (the same directory, tarred and gzipped), and specify that filename instead of the directory, namely
    Copy code
build/bin/pinot-admin.sh UploadSegment -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment.tar.gz
I get nearly the same error message (just without the "Input is not in the .gz format" part of the error):
    Copy code
    ...
    Executing command: UploadSegment -controllerProtocol http -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment.tar.gz
    java.lang.NullPointerException
    	at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770)
    	at org.apache.pinot.tools.admin.command.UploadSegmentCommand.execute(UploadSegmentCommand.java:158)
    	at org.apache.pinot.tools.Command.call(Command.java:33)
    	at org.apache.pinot.tools.Command.call(Command.java:29)
    	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    	at picocli.CommandLine.access$1300(CommandLine.java:145)
    	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    	at picocli.CommandLine.execute(CommandLine.java:2078)
    	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
    	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
My question: the "UploadSegment" documentation is rather incomplete; it does not really spell out what needs to be included in the directory (or in the tarred-and-gzipped version of the directory), except that the files should have a suffix indicating their type; I am using "json" for all 3. Do I need to include additional files (if so, what is needed?), or rename any of the files? (Thanks in advance!)
    schema.json
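The root cause: UploadSegment expects a directory of already-built Pinot segments (the tar.gz files a segment-generation step produces), not raw JSON records plus a schema, which is why the controller complains the input "is not in the .gz format". The usual path is a segment creation + push job over the raw files. A hedged sketch of such a job spec (paths, table name, and controller address are illustrative; the schema must already be registered for the table):
Copy code
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: './july-13-segment'
includeFileNamePattern: 'glob:**/*.json'
outputDirURI: './segments/july-13'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://A.B.C.D:9000'
Run with:
Copy code
./build/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile ./ingestionJobSpec.yaml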
  • n

    Nathan Maves

    07/23/2022, 1:13 PM
Is there a quickstart guide for connecting to Confluent Cloud with authentication?
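Not a full quickstart, but the usual approach is to put the standard Confluent Cloud SASL_SSL client settings into the table's streamConfigs. A hedged sketch (bootstrap server, API key, and secret are placeholders):
Copy code
"streamConfigs": {
  "stream.kafka.broker.list": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"API_KEY\" password=\"API_SECRET\";"
}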
  • c

    Cheguri Vinay Goud

    07/25/2022, 10:12 AM
Hello, is there any way to push data from a REALTIME table to an OFFLINE table when the segments are full, so that there is no stop in consuming records?
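Consumption itself does not stop when a segment completes; a new consuming segment starts immediately. For moving completed data into the OFFLINE table there is a Minion task built for this. A hedged sketch of the task block in the real-time table config (periods are illustrative; a Minion must be running and the controller's task scheduler enabled):
Copy code
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "1d"
    }
  }
}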
  • j

    James Kelleher

    07/25/2022, 5:19 PM
hello! trying to test out Kafka ingest, but the documentation seems really outdated, has inconsistencies between instructions, and refers to files that no longer exist. Is there a more recent, up-to-date guide on how to do this?
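For reference, a hedged sketch of the streamConfigs block a minimal Kafka real-time table needs (topic, broker, and threshold values are illustrative; the consumer factory shown is the Kafka 2.x plugin):
Copy code
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "transcript-topic",
  "stream.kafka.broker.list": "localhost:9092",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
  "realtime.segment.flush.threshold.rows": "100000"
}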
  • a

    Abhinav Rai

    07/27/2022, 7:31 AM
Hey guys, my team and I are looking to use Pinot for serving data to different teams, and I was wondering if there is a comparison chart somewhere with data size vs latency vs pods, or something along those lines. Basically, what I would like to understand is how much time data extraction would take with different configurations using Presto.
  • d

    Deepika Eswar

    08/03/2022, 7:11 AM
hello all, how do I do incremental loading in Pinot?
  • r

    Romil Punetha

    08/09/2022, 11:22 AM
Is there an example showing how SimpleAvroMessageDecoder is used? I’m pushing an encoded byte[] into Kafka. When I consume it in a service to recreate the object from the byte array, it works. However, when I connect Pinot to the same topic and provide a schema, it says Index out of bounds.
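Two hedged notes. First, SimpleAvroMessageDecoder expects plain binary-encoded Avro with no Confluent wire-format prefix (magic byte plus schema ID); if the producer uses the Confluent serializer, KafkaConfluentSchemaRegistryAvroMessageDecoder is the right decoder, and that mismatch is a classic cause of index-out-of-bounds errors during decoding. Second, a sketch of the decoder config; the decoder prop name "schema" (carrying the writer's Avro schema) is my recollection of the plugin and worth verifying against your version:
Copy code
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.decoder.prop.schema": "{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":[ ... ]}"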
  • a

    Abdelhakim Bendjabeur

    08/11/2022, 12:43 PM
Hello everyone 👋 I have recently started experimenting with Pinot in order to run it in production. I have a question about disaster recovery, in case metrics are wrong for some reason and we'd like to recompute them. In the case of a Kafka real-time table, is there a way to pause the Kafka consumers, truncate the table, and then restart consumption from scratch? Or should I just consider a blue/green approach here (create a new table with correct metrics and redirect the queries to it when it's up to date)?
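Recent Pinot releases add pause/resume endpoints on the controller for exactly this kind of surgery (verify your version has them). A hedged sketch (host and table name illustrative):
Copy code
# pause consumption; the table stays queryable
curl -X POST "http://localhost:9000/tables/myTable/pauseConsumption"
# ...drop or rebuild segments as needed...
curl -X POST "http://localhost:9000/tables/myTable/resumeConsumption"
Before these endpoints existed, the blue/green approach described above was the common practice.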
  • a

    austin macciola

    08/11/2022, 6:53 PM
Hello 👋 Just getting started with Pinot. I was reading the documentation, and it looks like when creating ingest and table specs for ingesting from something like Kafka, you can only have 1 table created per topic you are ingesting from? Just wanted to confirm this. Is there no way to have many topics ingested into the same table?
  • d

    Devang Shah

    08/23/2022, 7:53 PM
    Hello,
    👋 3
  • v

    ValarieR

    08/24/2022, 8:34 PM
    Hello! I am a new Developer Advocate with StarTree (2nd week!) and am attempting to follow the startup guide HERE for an M1 Mac. I ran into errors on the execution of
    mvn install package -DskipTests -Pbin-dist
    , and after asking around for help, was told to remove the
    ~/.m2/settings.xml
    configuration. After doing so, I ran into the following error, and am kind of at a loss:
    Copy code
    [INFO] Pinot Service Provider Interface ................... FAILURE [  0.570 s]
    
    [ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.9.0:check (default) on project pinot-spi: Execution default of goal com.diffplug.spotless:spotless-maven-plugin:2.9.0:check failed: java.lang.reflect.InvocationTargetException: class com.google.googlejavaformat.java.RemoveUnusedImports (in unnamed module @0x3ba015b1) cannot access class com.sun.tools.javac.util.Context (in module jdk.compiler) because module jdk.compiler does not export com.sun.tools.javac.util to unnamed module @0x3ba015b1 -> [Help 1]
    Has anyone seen this before?
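This is the well-known google-java-format versus JDK 16+ module-access issue, surfacing via the spotless check. Two hedged workarounds: build with JDK 11, or open the jdk.compiler packages via MAVEN_OPTS (the flags below are the standard --add-exports set for google-java-format). Skipping the spotless check also unblocks a local build:
Copy code
export MAVEN_OPTS="--add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
  --add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED"
mvn install package -DskipTests -Pbin-dist
# or, for a local build:
mvn install package -DskipTests -Pbin-dist -Dspotless.check.skip=true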
  • h

    Hari

    08/30/2022, 4:05 AM
Hi all, we are just evaluating Pinot and I have a few questions about it. 1. Is there any ready-made plugin or connector available to ingest data from Kafka into Pinot? 2. As ZooKeeper is being removed from Kafka, is there any plan to remove the dependency from Pinot? 3. Is there any solution for cross-data-centre replication for DR?
  • n

    Naveen Nagarajan

    09/02/2022, 1:18 AM
Hi all, I am new to Pinot. I have my Pinot cluster running in a Kind cluster, with the Kafka broker running outside Kind. I keep seeing “Failed to construct kafka consumer” when adding a real-time table. How can I debug the consumer logs?
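Consumption happens on the servers, so that is where the consumer logs live. A hedged sketch of the usual first steps (pod, namespace, and table names are illustrative):
Copy code
# tail the server logs, where the Kafka consumer is constructed
kubectl logs -f pinot-server-0 -n pinot

# the controller's debug endpoint also surfaces per-segment errors
curl "http://localhost:9000/debug/tables/myTable?verbosity=1"
With Kind, “Failed to construct kafka consumer” is often a broker address that is not resolvable from inside the cluster.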
  • p

    Prabhagaran Ks

    09/02/2022, 3:42 AM
Hi all, I'm getting the below error while running the V2 engine. I followed the config provided (https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine) to enable it. Please point me to any documentation for setting up the V2 engine.
Copy code
[
  {
    "message": "SQLParsingError\njava.lang.RuntimeException Error composing query plan for: select * from test limit 10\n\tat org.apache.pinot.query.QueryEnvironment.planQuery(QueryEnvironment.java:136)\n\tat org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:146)\n\tat org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:127)\n\tat org.apache.pinot.broker.requesthandler.BrokerRequestHandlerDelegate.handleRequest(BrokerRequestHandlerDelegate.java:102)\n...\nCaused by: java.lang.NullPointerException\n\tat org.apache.pinot.query.routing.WorkerManager.assignWorkerToStage(WorkerManager.java:63)\n\tat org.apache.pinot.query.planner.logical.StagePlanner.makePlan(StagePlanner.java:99)\n\tat org.apache.pinot.query.QueryEnvironment.toDispatchablePlan(QueryEnvironment.java:202)\n\tat org.apache.pinot.query.QueryEnvironment.planQuery(QueryEnvironment.java:134)",
    "errorCode": 150
  }
]
  • m

    Matt Fysh

    09/06/2022, 5:12 AM
hi everyone, I’m wondering how I can assess whether Pinot is the right choice for me. I have a very large table with ~1GB of data ingested each day. I want to run several thousand queries on top of this data continuously, transform the data in Python, then write out a table (each active query has its own table/sink). There may then be other queries continuously querying the table that was just written to. I’ve started looking into Pinot and read the streaming pages in the docs; I see a large amount of support for streaming on the ingestion side, but not so much for continuously running queries. I have looked into Delta Live Tables on Databricks, which seems fairly close to what I have in mind, but I couldn’t work out whether running thousands of continuous queries on one large table would cause issues.
  • e

    Eaugene Thomas

    09/07/2022, 8:34 AM
Hi, I was working on a PoC for using encryption in transit in Pinot. In my case the Pinot nodes are distributed across systems, so if, say, I want to use self-signed certificates for TLS, how does that work with Pinot? I got some answers in https://stackoverflow.com/questions/2893819/accept-servers-self-signed-ssl-certificate-in-java-client which say to modify the trust manager; are there any other options for accepting self-signed certificates between Pinot nodes?
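A common pattern that avoids custom trust managers: import the self-signed certificate into a truststore shipped to every node, then point each component's TLS config at it. A hedged sketch (paths and passwords are illustrative, and the exact property names should be checked against the TLS docs for your Pinot version):
Copy code
# build a truststore containing the self-signed certificate
keytool -importcert -alias pinot-selfsigned -file selfsigned.crt \
  -keystore /opt/pinot/tls/truststore.jks -storepass changeit -noprompt
Copy code
# e.g. in the server config; analogous keys exist for controller and broker
pinot.server.tls.truststore.path=/opt/pinot/tls/truststore.jks
pinot.server.tls.truststore.password=changeit
pinot.server.tls.keystore.path=/opt/pinot/tls/keystore.jks
pinot.server.tls.keystore.password=changeit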
  • a

    Abdelhakim Bendjabeur

    09/07/2022, 1:16 PM
Hello 👋 Does anyone have insight into how much of a bottleneck Presto can become when plugged in on top of Pinot to unlock full SQL syntax? i.e. how much will Presto hurt the latency and high-load resilience that make Pinot a suitable solution for user-facing analytics?
  • d

    Deena Dhayalan

    09/14/2022, 7:50 AM
Hi, can anyone explain the intermediary stage in detail? And what are intermediary servers 1, 2, 3? Is it the multi-stage engine working in the background when it's enabled?
  • j

    Josh Clum

    09/19/2022, 3:26 PM
    Hello, is it possible to combine multiple indexing techniques (e.g. star tree and native text search indexes) on a single table?
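Yes, multiple index types can coexist on one table, each applied to a suitable column. A hedged sketch combining a star-tree index with a text index (shown with the Lucene text index; the native text index uses additional fieldConfig properties, so check the text-search docs). Column names are illustrative:
Copy code
"tableIndexConfig": {
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["country", "browser"],
      "functionColumnPairs": ["SUM__clicks", "COUNT__*"],
      "maxLeafRecords": 10000
    }
  ]
},
"fieldConfigList": [
  {
    "name": "description",
    "encodingType": "RAW",
    "indexType": "TEXT"
  }
]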
  • j

    Josh Clum

    09/19/2022, 8:42 PM
Do ingest-level aggregations happen across segments? Also, do the aggregations occur across partitions? I saw the requirement for the low-level Kafka API, so I'm guessing aggregations only happen at the partition level.
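For reference, ingestion-time aggregation is configured per table under ingestionConfig, and the rollup is applied within each consuming segment (hence per partition) as rows arrive. A hedged sketch (column names illustrative; the feature requires the low-level consumer, as noted above):
Copy code
"ingestionConfig": {
  "aggregationConfigs": [
    { "columnName": "totalSales", "aggregationFunction": "SUM(sales)" }
  ]
}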