# troubleshooting
s
Hi @Mayank @Neha Pawar @Daniel Lavoie We have segments backed up in GCS. How can we reuse those segment files? How does backup and restore work? Do we have any documentation for this? CC: @Sadim Nadeem @Mohamed Sultan
k
You don't need to do anything special.. backup/restore is built into Pinot. You can add/remove nodes and segments will be pulled from the deep store when needed.
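For reference, the deep store wiring is just controller config. A minimal sketch, assuming the documented GcsPinotFS settings (bucket, project, and key paths here are placeholders):
    # controller.conf (sketch; values are placeholders)
    controller.data.dir=gs://your-bucket/pinot-segments
    pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
    pinot.controller.storage.factory.gs.projectId=your-project
    pinot.controller.storage.factory.gs.gcpKey=/path/to/gcs-key.json
    pinot.controller.segment.fetcher.protocols=file,http,gs
    pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
Similar pinot.server.* keys exist on the server side. With this in place, any node that needs a segment it doesn't have locally downloads it from the gs:// location.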
s
Thanks for the response @Kishore G. Can we deploy Pinot in a new GKE cluster and have the same data (which is already in the GCS deep store) in the new Pinot cluster? If yes, can you suggest what can be done for this case? Thanks
m
I think I discussed this in detail with one of the team members?
s
Yes Mayank, I also had a discussion with you about offline and realtime tables and came up with a hybrid table. But I am not able to get a proper answer on reusing the GCS backed-up segment files in case we have to move Pinot to a different GKE cluster. CC: @Sadim Nadeem
d
GCS segment files are managed by the Helix cluster of Pinot (Zookeeper).
You can’t share segments between 2 Pinot clusters.
You could use the segment download and upload APIs to copy them between clusters.
s
"You can’t share segments between 2 Pinot clusters."--> Not at the same time. If one pinot or GKE cluster it self crashed. can't we reuse the same segment files to create pinot in different GKE Cluster??
k
Yes.. you can
s
"You could use segment download and upload apis to copy then between clusters"--> can you help me with download and upload apis. How it could be done??
k
You can simply invoke the REST API calls to upload segments from the deep store into any cluster.
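A sketch of what that looks like with curl (verify the endpoint paths against the Swagger UI of your controller at /help; host names and segment names here are placeholders):
    # Download a segment tar from the source cluster's controller:
    curl -o myTable_segment_0.tar.gz "http://source-controller:9000/segments/myTable/myTable_segment_0"
    # Upload it to the target cluster's controller:
    curl -X POST -F "file=@myTable_segment_0.tar.gz" "http://target-controller:9000/v2/segments?tableName=myTable"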
d
You can’t share segments between 2 Pinot clusters.
I meant in an active/active fashion.
s
No.. we don't need active/active.
s
"You can simply call invoke the rest api calls to upload segment from deep store into any cluster" @Kishore G Do we have any docs or link for this??
m
Most questions have answers in docs.pinot.apache.org and the search is also good. For example, https://docs.pinot.apache.org/operators/cli#upload-segments
I can summarize our previous discussion here, if you can help create a FAQ for it?
@Shailesh Jha
s
Thanks Mayank, will check this link and update you.
s
Sure Mayank.. I will create the FAQ.. please summarize.
s
I am inside the pinot-controller pod, trying to run this cmd:
pinot-admin.sh UploadSegment -controllerHost localhost -controllerPort 9000 -segmentDir /path/to/local/dir -tableName myTable
But I am getting an "Address already in use" error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:461)
        at sun.nio.ch.Net.bind(Net.java:453)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
        at sun.net.httpserver.ServerImpl.bind(ServerImpl.java:133)
        at sun.net.httpserver.HttpServerImpl.bind(HttpServerImpl.java:54)
        at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:145)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
        ... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted (core dumped)
d
You should run that command from a new pod which is not already running Pinot. The default configuration tries to set up the Prometheus JMX exporter on the JVM of the segment uploader, and the same port is already used by the Pinot controller's JMX exporter.
or
Just run
JAVA_OPTS="" && pinot-admin.sh UploadSegment -controllerHost localhost -controllerPort 9000 -segmentDir /path/to/local/dir -tableName myTable
This will remove the exporter configuration from the segment uploader JVM.
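For context, the standard Pinot image starts every JVM with a -javaagent flag for the exporter. Roughly (an illustration; the exact jar path, port, and config file are assumptions that vary by deployment):
    # What JAVA_OPTS typically carries in the pod environment (illustrative):
    # -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
    # pinot-admin.sh spawns a second JVM that inherits this agent and tries to
    # bind the same exporter port, which is what produces the BindException above.
Clearing JAVA_OPTS for just that one command sidesteps the conflict.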
s
ok
I have run the GCS job spec to ingest data into the offline table from the GCS segment (backup) files, but I'm getting a few errors:
Failed to generate Pinot segment for file
java.lang.IllegalStateException: Invalid segment start/end time:
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'gs://pinot-dev/pinot-data'
outputDirURI: 'gs://pinot-restore-test/table-test'
overwriteOutput: true
pinotFSSpecs:
  - scheme: gs
    className: org.apache.pinot.plugin.filesystem.GcsPinotFS
    configs:
      projectId: 'test'
      gcpKey: '/var/gcs/keys/gcs-key.json'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'test'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
Does this job spec look good?
What should "dataFormat" be if the input is in GCS? Is 'csv' fine?
And I'm using this cmd to run the job spec:
JAVA_OPTS="" bash pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /var/gcs/job-spec/sample.json
I have followed these steps from the docs: -> Define Schema -> Define Table Config -> Upload Schema and Table Configs -> Upload Data [getting errors] CC: @Sadim Nadeem
n
This looks like a mismatch between the time column format in your data and what you've defined in the schema.
Can you share a sample row from your CSV, and your schema and table config?
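For reference, "Invalid segment start/end time" usually means the time column values don't parse under the format declared in the schema. A minimal sketch of a matching dateTimeFieldSpec, assuming the CSV carries epoch milliseconds (the column name eventTime is a placeholder):
    {
      "dateTimeFieldSpecs": [
        {
          "name": "eventTime",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
The format string has to agree with what's actually in the data; e.g. epoch seconds would be "1:SECONDS:EPOCH", and a string date like 2021-01-01 would need "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd".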