Ken Krugler
12/15/2020, 1:32 AM
inputDirURI: 'hdfs://<clustername>/user/hadoop/pinot-input/'
includeFileNamePattern: 'glob:**/us_*.gz'
outputDirURI: 'hdfs://<clustername>/user/hadoop/pinot-segments/'
When I run the job, segments are generated, but then each segment fails with something like:
Failed to generate Pinot segment for file - hdfs:/user/hadoop/pinot-input/us_2020-03_03.gz
java.lang.IllegalStateException: Unable to extract out the relative path based on base input path: hdfs://<clustername>/user/hadoop/pinot-input/
So it looks like the input file URI is getting the authority (<clustername>) stripped out, which is why the baseInputDir.relativize(inputFile) call fails to produce a relative path in SegmentGenerationUtils.getRelativeOutputPath. Or is there something else I need to be doing here to get this to work properly? I’m able to read the files, so the inputDirURI is set up properly (along with HDFS jars).
Xiang Fu
Ken Krugler
12/15/2020, 1:38 AM
The inputDirURI is 'hdfs://<clustername>/user/hadoop/pinot-input/' (in the job yml file), but the input file URI is hdfs:/user/hadoop/pinot-input/us_2020-03_03.gz
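A minimal sketch of why that mismatch matters, using only java.net.URI (this is not Pinot code). URI.relativize returns the given URI unchanged when the scheme or authority differ, so a base with an authority and a file URI without one never yields a relative path. The class RelativizeDemo, the helper relativePathOrNull, and the host name "mycluster" (standing in for <clustername>, since angle brackets are not legal in a URI) are hypothetical.

```java
import java.net.URI;

// Hypothetical demo (not part of Pinot): how java.net.URI.relativize behaves
// when the input file URI has lost the authority of the base input directory.
public class RelativizeDemo {

    // Returns the path of inputFile relative to baseInputDir, or null when
    // relativize() could not strip the base (it then returns inputFile unchanged).
    static String relativePathOrNull(URI baseInputDir, URI inputFile) {
        URI relative = baseInputDir.relativize(inputFile);
        return relative.equals(inputFile) ? null : relative.getPath();
    }

    public static void main(String[] args) {
        // "mycluster" is a placeholder for <clustername>.
        URI baseInputDir = URI.create("hdfs://mycluster/user/hadoop/pinot-input/");

        // Authority stripped, as in the failing job: the schemes match, the authorities do not.
        URI withoutAuthority = URI.create("hdfs:/user/hadoop/pinot-input/us_2020-03_03.gz");
        // Authority preserved: same scheme, same authority, base path is a prefix.
        URI withAuthority = URI.create("hdfs://mycluster/user/hadoop/pinot-input/us_2020-03_03.gz");

        System.out.println(relativePathOrNull(baseInputDir, withoutAuthority)); // null
        System.out.println(relativePathOrNull(baseInputDir, withAuthority));    // us_2020-03_03.gz
    }
}
```

The null case corresponds to the IllegalStateException in the log: there is no relative path to append to outputDirURI.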
Xiang Fu
Ken Krugler
12/15/2020, 4:02 AM
getFileURI() at line 247 in SegmentGenerationJobRunner. This method also needs to get the authority from the base input directory URI; otherwise it gets just the path (the /user/hadoop/pinot-input/somefile.gz bit), sees that the resulting URI doesn’t have a protocol, and uses the provided protocol to construct the URI, but by then the authority has been lost.
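A minimal sketch of the two behaviours, assuming the path-only string is turned into a URI with the standard java.net.URI constructors. FileUriSketch, schemeOnly and schemeAndAuthority are hypothetical names; this is not Pinot's actual getFileURI() implementation, and "mycluster" again stands in for <clustername>.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch (not Pinot's code): turning a path-only file string into
// a full URI by borrowing components from the base input directory URI.
public class FileUriSketch {

    // Behaviour described in the thread: only the scheme is copied, so the
    // result keeps "hdfs" but drops the authority ("mycluster").
    static URI schemeOnly(String path, URI baseInputDir) throws URISyntaxException {
        URI fileUri = new URI(path);
        if (fileUri.getScheme() != null) {
            return fileUri; // already a full URI, use as-is
        }
        return new URI(baseInputDir.getScheme(), fileUri.getSchemeSpecificPart(), fileUri.getFragment());
    }

    // Suggested behaviour: copy both scheme and authority from the base input dir.
    static URI schemeAndAuthority(String path, URI baseInputDir) throws URISyntaxException {
        URI fileUri = new URI(path);
        if (fileUri.getScheme() != null) {
            return fileUri; // already a full URI, use as-is
        }
        return new URI(baseInputDir.getScheme(), baseInputDir.getAuthority(),
                fileUri.getPath(), fileUri.getQuery(), fileUri.getFragment());
    }

    public static void main(String[] args) throws URISyntaxException {
        URI baseInputDir = new URI("hdfs://mycluster/user/hadoop/pinot-input/");
        String path = "/user/hadoop/pinot-input/us_2020-03_03.gz";

        System.out.println(schemeOnly(path, baseInputDir));         // hdfs:/user/hadoop/pinot-input/us_2020-03_03.gz
        System.out.println(schemeAndAuthority(path, baseInputDir)); // hdfs://mycluster/user/hadoop/pinot-input/us_2020-03_03.gz
    }
}
```

With the authority kept, baseInputDir.relativize(...) on the second result yields us_2020-03_03.gz, which should give getRelativeOutputPath the relative path it expects.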