# general
s
Hi everyone! I’ve been doing a POC on Pinot and am currently facing an issue while ingesting ORC file data into Pinot. I filed a GitHub issue as well: https://github.com/apache/pinot/issues/8460 Can anyone help?
k
Hi, can you add your schema and table config to the issue as well? Please remove any secret values.
s
Updated @User
k
It looks to me like the column names in your ORC file and the column names in your schema file do not match. They must be the same.
s
How can I get the exact column names from the ORC file?
k
Running `java -jar orc-tools-X.Y.Z-uber.jar meta your-file.orc` should print the schema.
s
I guess the columns are named `_col0`, `_col2`, and so on.
k
`java -jar orc-tools-1.5.5-uber.jar meta 000000_0` should work in your case. Can you paste the metadata you got from the command here?
s
```
➜  batchjob-spec java -jar orc-tools-1.5.5-uber.jar meta 000000_0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See <http://logging.apache.org/log4j/1.2/faq.html#noconfig> for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/satyam.raj/dataplatform/pinot-dist/batchjob-spec/orc-tools-1.5.5-uber.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Processing data file 000000_0 [length: 8321467]
Structure for 000000_0
File Version: 0.12 with HIVE_8732
Rows: 723010
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:string,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:string,_col7:string,_col8:string,_col9:string,_col10:string,_col11:string,_col12:string,_col13:int,_col14:int,_col15:int,_col16:string,_col17:string,_col18:date,_col19:date,_col20:date,_col21:string,_col22:string>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 723010 hasNull: false
    Column 1: count: 723010 hasNull: false min: 1000 max: 99999750 sum: 6114370
    Column 2: count: 723010 hasNull: false min: customer max: customer sum: 5784080
    Column 3: count: 723010 hasNull: false min: Birmingham max: wollongong sum: 2285843
```
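The exact column names are in the `Type:` line of that output. As a minimal sketch, assuming the top-level type is printed as `struct<name:type,...>` the way `orc-tools meta` shows it above, the hypothetical helper `parse_orc_struct` below (not part of orc-tools or Pinot) splits that line into `(column, type)` pairs so they can be copied into a Pinot schema.

```python
def parse_orc_struct(type_str: str) -> list[tuple[str, str]]:
    """Split a top-level ORC struct type string into (column_name, type) pairs.

    Assumes input like 'struct<name:type,name:type,...>' (the 'Type:' line of
    `orc-tools meta`). Nested types such as map<string,int> or decimal(10,2)
    are kept intact as a single type string.
    """
    type_str = type_str.strip()
    prefix = "struct<"
    if not (type_str.startswith(prefix) and type_str.endswith(">")):
        raise ValueError(f"not a struct type: {type_str!r}")
    body = type_str[len(prefix):-1]

    fields = []
    depth = 0
    start = 0
    # Split only on commas at nesting depth 0 so nested types stay whole.
    for i, ch in enumerate(body):
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            fields.append(body[start:i])
            start = i + 1
    if body:
        fields.append(body[start:])

    # The column name is everything before the first colon; the rest is its type.
    return [(name, ftype) for name, _, ftype in (f.partition(":") for f in fields)]


type_line = "struct<_col0:string,_col1:string,_col13:int,_col18:date>"
for name, ftype in parse_orc_struct(type_line):
    print(name, ftype)
```

Tracking bracket depth rather than splitting on every comma keeps types like `decimal(10,2)` from being cut in half.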
k
Yep, then you will have to use the same column names in the schema. If you want new column names, you can use `transformConfigs` in the table config file.
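As a sketch of what that could look like: the table config's `ingestionConfig.transformConfigs` list pairs a target `columnName` with a `transformFunction`, and for a plain rename the function is assumed here to be just the quoted source column name, following the column-rename pattern in Pinot's ingestion-transformation docs (verify against your Pinot version). The target names `customer_id`, `customer_type`, and `city` and the helper `rename_transform_configs` are hypothetical, guessed from the stripe statistics above; the script only generates the JSON fragment.

```python
import json

# Hypothetical mapping from ORC positional names (as printed by orc-tools meta)
# to the column names declared in the Pinot schema.
RENAMES = {
    "_col0": "customer_id",
    "_col1": "customer_type",
    "_col2": "city",
}


def rename_transform_configs(renames: dict[str, str]) -> list[dict[str, str]]:
    """Build transformConfigs entries that copy each source column into a new name.

    The source name is wrapped in double quotes so the transform expression
    treats it as a column identifier (assumed rename pattern, see lead-in).
    """
    return [
        {"columnName": target, "transformFunction": f'"{source}"'}
        for source, target in renames.items()
    ]


# Only the ingestionConfig part of the table config is shown here.
table_config_fragment = {
    "ingestionConfig": {
        "transformConfigs": rename_transform_configs(RENAMES),
    }
}

print(json.dumps(table_config_fragment, indent=2))
```

With this approach the Pinot schema declares the new names (`customer_id`, and so on), while the ORC files keep their `_colN` names.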
s
Alright, thanks! One more question: what data type should I use for the `date` fields in ORC?
k
long
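Per the ORC specification, a `date` value is stored as the number of days since 1970-01-01, which is why a `long` column can hold it. The sketch below assumes the Pinot ORC reader passes that day count through unchanged and shows how to interpret the stored value; the helper names are hypothetical. If the field is used as a Pinot time column, a format such as `1:DAYS:EPOCH` would presumably describe it, but check the `dateTimeFieldSpecs` docs for your Pinot version.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def orc_days_to_date(days: int) -> date:
    """Interpret an ORC DATE value (days since 1970-01-01) as a calendar date."""
    return EPOCH + timedelta(days=days)


def date_to_orc_days(d: date) -> int:
    """Inverse: the long value expected for a given calendar date."""
    return (d - EPOCH).days


def orc_days_to_epoch_millis(days: int) -> int:
    """Convert days since epoch to milliseconds since epoch (midnight UTC)."""
    return days * MILLIS_PER_DAY


days = date_to_orc_days(date(2022, 4, 5))
print(days, orc_days_to_date(days), orc_days_to_epoch_millis(days))
```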
s
It worked 🎉