Hello, I am trying to create a local Pinot cluster...
# general
m
Hello, I am trying to create a local Pinot cluster with HDFS (running locally) as deep storage. I am getting multiple class-not-found/method-not-found errors. Pinot version is 0.9.3 and Hadoop version is 3.3.1. Is there a compatibility matrix I can refer to?
m
Can you try Hadoop version 2.x?
m
OK sure, thanks. Trying with Pinot 0.9.3 and Hadoop 2.7.1.
r
this could be classloader related cc @User
what JDK version are you using @User?
m
AdoptOpenJDK-11.0.11+9
I was able to set up the cluster with Hadoop 2.7.1. Segments were also created in HDFS using the SegmentCreationAndMetadataPush standalone job, but the servers failed to download segments from HDFS deep storage. I can't find the cause; the exception printed doesn't have much information. Servers were started with the following configuration -
pinot.service.role=SERVER
pinot.cluster.name=PinotCluster
pinot.zk.server=localhost:2181
pinot.set.instance.id.to.hostname=true
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/Users/zaikhan/servers/hadoop-2.7.1/etc/hadoop
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.instance.dataDir=/tmp/pinot/data/server/index
pinot.server.instance.segmentTarDir=/tmp/pinot/data/server/segmentTar
pinot.server.grpc.enable=true

# server-1 ports
#pinot.server.grpc.port=8090
#pinot.server.netty.port=8098
#pinot.server.adminapi.port=8097

pinot.server.grpc.port=8091
pinot.server.netty.port=8001
pinot.server.adminapi.port=8011
2022/01/07 14:36:37.636 ERROR [BaseTableDataManager] [HelixTaskExecutor-message_handle_thread] Attempts exceeded when downloading segment: transcript_OFFLINE_1571900400000_1571900400000_0 for table: transcript_OFFLINE from: hdfs://localhost:9000/pinot/examples/batch/studentStats/segments/2022/01/02/transcript_OFFLINE_1571900400000_1571900400000_0.tar.gz to: /tmp/pinot/data/server/index/transcript_OFFLINE/tmp-transcript_OFFLINE_1571900400000_1571900400000_0-83ceab0d-695c-4616-8630-d7f0bb6293ff/transcript_OFFLINE_1571900400000_1571900400000_0.tar.gz
2022/01/07 14:36:37.637 ERROR [StateModel] [HelixTaskExecutor-message_handle_thread] Default rollback method invoked on error. Error Code: ERROR
2022/01/07 14:36:37.637 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONLINE for resource: transcript_OFFLINE, partition: transcript_OFFLINE_1571900400000_1571900400000_0
org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 3 attempts
	at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:61) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
	at org.apache.pinot.common.utils.fetcher.BaseSegmentFetcher.fetchSegmentToLocal(BaseSegmentFetcher.java:72) ~[pinot-all-0.9.3-jar-with-dependencies.jar:0.9.3-e23f213cf0d16b1e9e086174d734a4db868542cb]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocalInternal(SegmentFetcherFactory.java:147)
m
Seems like a config issue? There is a `localhost:9000` in the HDFS path.
m
I had to add localhost:9000 to `inputDirURI` and `outputDirURI` to get the standalone ingestion job to run, because without localhost:9000 segment tar creation was failing with the error below -
Failed to generate Pinot segment for file - hdfs://localhost:9000/pinot/examples/batch/studentStats/rawdata/2022/01/02/studentStats_data_2022-01-02.csv
java.lang.IllegalStateException: Unable to extract out the relative path for input file 'hdfs://localhost:9000/pinot/examples/batch/studentStats/rawdata/2022/01/02/studentStats_data_2022-01-02.csv', based on base input path: hdfs:///pinot/examples/batch/studentStats/rawdata/
Just now, I removed localhost:9000 from `outputDirURI` only, and the servers were able to download segments. Here is the difference -
inputDirURI: 'hdfs://localhost:9000/pinot/examples/batch/studentStats/rawdata/'
outputDirURI: 'hdfs:///pinot/examples/batch/studentStats/segments/'
Thanks for the quick responses. The issue is resolved.
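For anyone hitting the same "Unable to extract out the relative path" error: a minimal sketch of one plausible cause, assuming the ingestion job derives the relative path with something like `java.net.URI.relativize` (not verified against the Pinot source here). `relativize` returns the file URI unchanged when the base URI and the file URI do not share the same authority, which is the case for `hdfs:///...` versus `hdfs://localhost:9000/...`. The class and method names below are hypothetical and exist only for illustration.

```java
import java.net.URI;

// Hypothetical helper, not part of Pinot: shows how java.net.URI.relativize
// behaves when the base URI has no authority but the file URI does.
final class RelativePathCheck {

    // Returns the path of `file` relative to `base`, or null when
    // java.net.URI cannot express `file` relative to `base`.
    static String relativePath(String base, String file) {
        URI baseUri = URI.create(base);
        URI fileUri = URI.create(file);
        URI relative = baseUri.relativize(fileUri);
        // relativize() hands back the given URI unchanged when scheme or
        // authority differ, or when the base path is not a prefix.
        if (relative.equals(fileUri) || relative.getPath().isEmpty()) {
            return null;
        }
        return relative.getPath();
    }

    public static void main(String[] args) {
        String file = "hdfs://localhost:9000/pinot/examples/batch/studentStats/rawdata/"
            + "2022/01/02/studentStats_data_2022-01-02.csv";
        // Base without authority: no relative path can be extracted.
        System.out.println(relativePath(
            "hdfs:///pinot/examples/batch/studentStats/rawdata/", file));
        // Base with the same authority as the file: a relative path is found.
        System.out.println(relativePath(
            "hdfs://localhost:9000/pinot/examples/batch/studentStats/rawdata/", file));
    }
}
```

This only models the ingestion-side error. It says nothing about why the servers could not download segments when `outputDirURI` contained localhost:9000.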
m
Thanks for confirming. @User we might want to document the version compatibility and any other learnings from here.