Marc Kriguer
07/22/2022, 10:18 PM
I am trying to get the UploadSegment command to work.
I am invoking the following command from the "pinot" directory (where the "git clone" command brought in all the Pinot source code):
./build/bin/pinot-admin.sh UploadSegment -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment
where A.B.C.D is a Linux machine (provisioned through Google Cloud) that our team has set up as our initial instance of Pinot (we are starting with just one instance, until we need to scale up). The july-13-segment
directory just contains 3 files: two data files that are meant to wind up in the same segment, named 2022-07-13T22_02_50.179274000Z.json and 2022-07-13T22_02_52.770718122Z.json, each containing a single JSON string that we were able to successfully import into Pinot via Kafka (until we determined that Kafka seemed to be our performance bottleneck). The third file, named schema.json,
is the definition of the schema of the table I want the segments to go into. I'll attach it to this message, in case the contents matter. When I run the above command, the output and error messages are:
... [Lots of messages about plugins]
Uploading segment tar file: ./july-13-segment/schema.json
Sending request: http://A.B.C.D:9000/v2/segments?tableName to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 500 (Internal Server Error) with reason: "Exception while uploading segment: Input is not in the .gz format" while sending request: http://A.B.C.D:9000/v2/segments?tableName to controller: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local, version: Unknown
at org.apache.pinot.common.utils.http.HttpClient.wrapAndThrowHttpException(HttpClient.java:442)
at org.apache.pinot.common.utils.FileUploadDownloadClient.uploadSegment(FileUploadDownloadClient.java:597)
at org.apache.pinot.tools.admin.command.UploadSegmentCommand.execute(UploadSegmentCommand.java:176)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
If, instead of specifying the directory with the 2 data files and the schema.json file, I create a july-13-segment.tar.gz
file (the same directory, tarred and gzipped), and specify that filename instead of the directory, namely
build/bin/pinot-admin.sh UploadSegment -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment.tar.gz
I get nearly the same error message (just without the "Input is not in the .gz format" part of the error):
...
Executing command: UploadSegment -controllerProtocol http -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./july-13-segment.tar.gz
java.lang.NullPointerException
at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770)
at org.apache.pinot.tools.admin.command.UploadSegmentCommand.execute(UploadSegmentCommand.java:158)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:165)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:196)
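Reading the second stack trace, my guess (hedged, since the docs don't say) is that -segmentDir must point at a directory of segment tarballs rather than at a single tarball; listing the files of a regular file would return null, which would trip the Preconditions.checkNotNull shown above. A sketch of what I would try next, assuming that reading is right (the directory name is made up, and the final command is only printed here, since it needs a live controller):

```shell
# Assumption: -segmentDir wants a directory containing segment tarballs.
mkdir -p segment-upload-dir
# Stand-in for the real july-13-segment.tar.gz built earlier:
touch segment-upload-dir/july-13-segment.tar.gz
# The upload command would then be (printed, not executed here):
CMD="./build/bin/pinot-admin.sh UploadSegment -controllerHost A.B.C.D -controllerPort 9000 -segmentDir ./segment-upload-dir"
echo "$CMD"
```

Note the tarball inside that directory presumably still has to be a built Pinot segment, not just raw JSON files tarred up, or the "Input is not in the .gz format" / 500 path would likely recur.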
My question: The "UploadSegment" documentation is rather incomplete; it really does not spell out what needs to be included in the directory (or in the tarred-and-gzipped version of the directory), except that the files should have a suffix indicating their type (I am using "json" for all 3). Do I need to include additional files (if so, what is needed?), or rename any of the files?
(Thanks in advance!)

Mayank
Marc Kriguer
07/25/2022, 5:10 PM

Marc Kriguer
07/25/2022, 8:32 PM
I was able to get the /ingestFromFile
curl command to work -- now it's just a matter of getting the individual files that "add up" to the segment together into one file. The documentation indicates that API is "NOT meant for ... large input files"; how large is "large"? My two test files were 10K and 7K in size, each containing 2 seconds of data. I believe we may want 1 segment to represent 15 minutes of data; is 4.5MB of data hitting that "large" threshold? Or does "large" mean gigabytes (or larger)?

Mayank
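For reference, the rough shape of the /ingestFromFile call I mean — the table name and batch config here are placeholder assumptions, not our real values, and the command is only printed, since running it needs a live controller:

```shell
# Hedged sketch of the controller's /ingestFromFile endpoint.
CONTROLLER="http://A.B.C.D:9000"
TABLE="myTable_OFFLINE"                 # assumption: the OFFLINE table name
BATCH_CONFIG='{"inputFormat":"json"}'   # assumption: minimal batch config
# Built but not executed here:
CMD="curl -X POST -F file=@./2022-07-13T22_02_50.179274000Z.json \"${CONTROLLER}/ingestFromFile?tableNameWithType=${TABLE}&batchConfigMapStr=${BATCH_CONFIG}\""
echo "$CMD"
```

(In practice the batchConfigMapStr JSON may need URL-encoding depending on the shell and curl version.)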
Marc Kriguer
07/27/2022, 5:58 PM