Aaron Wishnick — 04/14/2021, 8:11 PM
Apr 14, 2021 3:16:33 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: time spent so far 0% reading (1854 ms) and 99% processing (311813 ms)
Is there a setting to use more cores to process segments in parallel or anything like that?

Ken Krugler — 04/15/2021, 3:42 PM
There's segmentCreationJobParallelism
in the job yaml file that should be set to the number of cores you’ve got. Though depending on your table definition (e.g. is createInvertedIndexDuringSegmentGeneration
set true) you might run out of memory if your parallelism is too high.
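As a sketch of where that setting goes: in a Pinot batch ingestion job spec, segmentCreationJobParallelism is a top-level field. Everything below besides the two settings discussed in this thread (paths, framework name, job type) is illustrative placeholder content, not taken from the thread:

```yaml
# Sketch of a Pinot batch ingestion job spec (values are placeholders).
executionFrameworkSpec:
  name: 'standalone'
jobType: SegmentCreationAndTarPush
inputDirURI: '/path/to/parquet/input'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: '/path/to/segment/output'
# Per Ken's suggestion: set this to the number of cores you've got.
# Too high a value can run out of memory, e.g. when
# createInvertedIndexDuringSegmentGeneration is true in the table config.
segmentCreationJobParallelism: 8
```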