kish
06/03/2020, 10:54 PM
Xiang Fu
➜ parquet-tools meta /tmp/data.parquet | grep "row group" | head -n 3
row group 1: RC:93898 TS:131168051 OFFSET:4
row group 2: RC:93921 TS:131157859 OFFSET:131168055
row group 3: RC:93899 TS:131162316 OFFSET:262325914
➜ parquet-tools meta /tmp/data.parquet
file: file:/tmp/data.parquet
creator: parquet-mr version 1.8.0 (build 0fda28af84b9746396014ad6a415b90592a98b3b)
extra: parquet.avro.schema = {"type":"record","name":"record","fields":[{"name":"met_float","type":"float"},{"name":"met_double","type":"double"},{"name":"dim_mv_long","type":{"type":"array","items":"long"}},{"name":"dim_sv_double","type":"double"},{"name":"met_long","type":"long"},{"name":"dim_sv_string","type":"string"},{"name":"dim_mv_int","type":{"type":"array","items":"int"}},{"name":"dim_mv_string","type":{"type":"array","items":"string"}},{"name":"dim_sv_int","type":"int"},{"name":"dim_mv_double","type":{"type":"array","items":"double"}},{"name":"dim_mv_float","type":{"type":"array","items":"float"}},{"name":"dim_sv_float","type":"float"},{"name":"met_int","type":"int"},{"name":"dim_sv_long","type":"long"}]}
file schema: record
--------------------------------------------------------------------------------
met_float: REQUIRED FLOAT R:0 D:0
met_double: REQUIRED DOUBLE R:0 D:0
dim_mv_long: REQUIRED F:1
.array: REPEATED INT64 R:1 D:1
dim_sv_double: REQUIRED DOUBLE R:0 D:0
met_long: REQUIRED INT64 R:0 D:0
dim_sv_string: REQUIRED BINARY O:UTF8 R:0 D:0
dim_mv_int: REQUIRED F:1
.array: REPEATED INT32 R:1 D:1
dim_mv_string: REQUIRED F:1
.array: REPEATED BINARY O:UTF8 R:1 D:1
dim_sv_int: REQUIRED INT32 R:0 D:0
dim_mv_double: REQUIRED F:1
.array: REPEATED DOUBLE R:1 D:1
dim_mv_float: REQUIRED F:1
.array: REPEATED FLOAT R:1 D:1
dim_sv_float: REQUIRED FLOAT R:0 D:0
met_int: REQUIRED INT32 R:0 D:0
dim_sv_long: REQUIRED INT64 R:0 D:0
row group 1: RC:93898 TS:131168051 OFFSET:4
--------------------------------------------------------------------------------
met_float: FLOAT UNCOMPRESSED DO:0 FPO:4 SZ:204570/204570/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1.2111664E-4, max: 0.9999642, num_nulls: 0]
met_double: DOUBLE UNCOMPRESSED DO:0 FPO:204574 SZ:244586/244586/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1.7748871805833843E-5, max: 0.9999900631857935, num_nulls: 0]
dim_mv_long:
.array: INT64 UNCOMPRESSED DO:0 FPO:449160 SZ:19375790/19375790/1.00 VC:2386112 ENC:RLE,PLAIN ST:[min: -9223315962872288953, max: 9223235375325574819, num_nulls: 0]
dim_sv_double: DOUBLE UNCOMPRESSED DO:0 FPO:19824950 SZ:244586/244586/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: 8.108521868788188E-5, max: 0.9999897173166792, num_nulls: 0]
met_long: INT64 UNCOMPRESSED DO:0 FPO:20069536 SZ:244586/244586/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: -9221656197292706803, max: 9223232869150310363, num_nulls: 0]
dim_sv_string: BINARY UNCOMPRESSED DO:0 FPO:20314122 SZ:460339/460339/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[no stats for this column]
dim_mv_int:
.array: INT32 UNCOMPRESSED DO:0 FPO:20774461 SZ:9740584/9740584/1.00 VC:2363940 ENC:RLE,PLAIN ST:[min: -2147462202, max: 2147465086, num_nulls: 0]
dim_mv_string:
.array: BINARY UNCOMPRESSED DO:0 FPO:30515045 SZ:70269656/70269656/1.00 VC:2373041 ENC:RLE,PLAIN ST:[no stats for this column]
dim_sv_int: INT32 UNCOMPRESSED DO:0 FPO:100784701 SZ:204578/204578/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: -2146586718, max: 2146995692, num_nulls: 0]
dim_mv_double:
.array: DOUBLE UNCOMPRESSED DO:0 FPO:100989279 SZ:19618358/19618358/1.00 VC:2416477 ENC:RLE,PLAIN ST:[min: 3.2332416607383507E-6, max: 0.9999963662410412, num_nulls: 0]
dim_mv_float:
.array: FLOAT UNCOMPRESSED DO:0 FPO:120607637 SZ:9906684/9906684/1.00 VC:2404955 ENC:RLE,PLAIN ST:[min: 4.529953E-6, max: 0.9999939, num_nulls: 0]
dim_sv_float: FLOAT UNCOMPRESSED DO:0 FPO:130514321 SZ:204570/204570/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1.579523E-5, max: 0.99991333, num_nulls: 0]
met_int: INT32 UNCOMPRESSED DO:0 FPO:130718891 SZ:204578/204578/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: -2146612524, max: 2147058892, num_nulls: 0]
dim_sv_long: INT64 UNCOMPRESSED DO:0 FPO:130923469 SZ:244586/244586/1.00 VC:93898 ENC:BIT_PACKED,PLAIN_DICTIONARY ST:[min: -9221351713267308050, max: 9223334243910011623, num_nulls: 0]
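A side note on reading the "row group" summary lines in the output above: RC is the row count, TS is the total byte size of the row group, and OFFSET is where the group starts in the file. Since every column here is UNCOMPRESSED (size ratio 1.00), each group should start where the previous one ends, i.e. OFFSET(n+1) = OFFSET(n) + TS(n). Below is a minimal sketch that checks this on the three lines pasted earlier. The class and method names (RowGroupOffsets, parse, isContiguous) are hypothetical helpers made up for illustration and are not part of parquet-tools or Pinot; with compressed files the check would likely not hold, because TS reports the uncompressed size.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of parquet-tools or Pinot.
class RowGroupOffsets {
    // Matches summary lines such as "row group 2: RC:93921 TS:131157859 OFFSET:131168055".
    private static final Pattern ROW_GROUP =
        Pattern.compile("row group (\\d+): RC:(\\d+) TS:(\\d+) OFFSET:(\\d+)");

    /** Returns {rowCount, totalSize, offset}, or null if the line is not a row group summary. */
    static long[] parse(String line) {
        Matcher m = ROW_GROUP.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4))
        };
    }

    /** True when every row group starts at the previous group's offset plus its total size. */
    static boolean isContiguous(List<String> lines) {
        long[] prev = null;
        for (String line : lines) {
            long[] cur = parse(line);
            if (cur == null) {
                continue; // skip anything that is not a row group summary line
            }
            if (prev != null && cur[2] != prev[2] + prev[1]) {
                return false;
            }
            prev = cur;
        }
        return true;
    }

    public static void main(String[] args) {
        // The three summary lines from the grep output in this thread.
        List<String> lines = Arrays.asList(
            "row group 1: RC:93898 TS:131168051 OFFSET:4",
            "row group 2: RC:93921 TS:131157859 OFFSET:131168055",
            "row group 3: RC:93899 TS:131162316 OFFSET:262325914");
        System.out.println(isContiguous(lines)); // prints "true"
    }
}
```

The first group starts at offset 4 because a Parquet file begins with the 4-byte magic string PAR1.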
parquet-tools to see the file metadata?
kish
06/04/2020, 9:53 PM
Xiang Fu
➜ parquet-tools meta /tmp/data_foo_bar_baz_quiz_131168051.parquet
file: file:/tmp/data_foo_bar_baz_quiz_131168051.parquet
creator: parquet-mr version 1.8.0 (build 0fda28af84b9746396014ad6a415b90592a98b3b)
extra: parquet.avro.schema = {"type":"record","name":"record","fields":[{"name":"met_float","type":"float"},{"name":"met_double","type":"double"},{"name":"dim_mv_long","type":{"type":"array","items":"long"}},{"name":"dim_sv_double","type":"double"},{"name":"met_long","type":"long"},{"name":"dim_sv_string","type":"string"},{"name":"dim_mv_int","type":{"type":"array","items":"int"}},{"name":"dim_mv_string","type":{"type":"array","items":"string"}},{"name":"dim_sv_int","type":"int"},{"name":"dim_mv_double","type":{"type":"array","items":"double"}},{"name":"dim_mv_float","type":{"type":"array","items":"float"}},{"name":"dim_sv_float","type":"float"},{"name":"met_int","type":"int"},{"name":"dim_sv_long","type":"long"}]}
file schema: record
--------------------------------------------------------------------------------
met_float: REQUIRED FLOAT R:0 D:0
met_double: REQUIRED DOUBLE R:0 D:0
dim_mv_long: REQUIRED F:1
.array: REPEATED INT64 R:1 D:1
dim_sv_double: REQUIRED DOUBLE R:0 D:0
met_long: REQUIRED INT64 R:0 D:0
dim_sv_string: REQUIRED BINARY O:UTF8 R:0 D:0
dim_mv_int: REQUIRED F:1
.array: REPEATED INT32 R:1 D:1
dim_mv_string: REQUIRED F:1
.array: REPEATED BINARY O:UTF8 R:1 D:1
dim_sv_int: REQUIRED INT32 R:0 D:0
dim_mv_double: REQUIRED F:1
.array: REPEATED DOUBLE R:1 D:1
dim_mv_float: REQUIRED F:1
.array: REPEATED FLOAT R:1 D:1
dim_sv_float: REQUIRED FLOAT R:0 D:0
met_int: REQUIRED INT32 R:0 D:0
dim_sv_long: REQUIRED INT64 R:0 D:0
row group 1: RC:93898 TS:131168051 OFFSET:4
--------------------------------------------------------------------------------
ParquetRecordReaderTest.java to create a test parquet file
@Override
protected RecordReader createRecordReader()
    throws Exception {
  ParquetRecordReader recordReader = new ParquetRecordReader();
  recordReader.init(_dataFile, _sourceFields, null);
  return recordReader;
}
just like this
kish
06/04/2020, 10:07 PM
Xiang Fu
kish
06/05/2020, 1:23 AM