# troubleshooting
Hi, I am running into an index-out-of-bounds issue, reported here: https://stackoverflow.com/questions/65342630/flink-sql-read-hive-table-throw-java-lang-arrayindexoutofboundsexception-1024/65351402#65351402 Has this been fixed? If yes, how can I configure the fix in the ORC reader? This is my reader code:
tableEnv.createTemporaryTable("myTable", TableDescriptor.forConnector("filesystem")
        .schema(schema)
        .option("path", "path-to-file.orc")
        .format("orc")
        .build());
Exception I get:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
	at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextVector(TreeReaderFactory.java:255)
	at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:762)
	at org.apache.orc.impl.ConvertTreeReaderFactory$DecimalFromDoubleTreeReader.nextVector(ConvertTreeReaderFactory.java:1297)
My ORC files are written with SNAPPY compression, so maybe that's the reason for this error? Is there any way to provide the compression format while reading ORC files? I am not able to find much information on Google. Thanks!
I was able to get past the error by explicitly specifying a newer version of the Apache ORC library. Seems like there was a bug which is fixed in 1.6.6, and even the latest Flink (1.17.0) still uses the old ORC library (1.5.6).
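For reference, the override that worked for me looked roughly like this (assuming a Maven build; adjust for Gradle/sbt as needed — the coordinates are the standard Apache ORC artifact, the placement in your pom is up to you):

```xml
<!-- Hypothetical Maven override: declare a newer orc-core directly so it
     takes precedence over the 1.5.6 pulled in transitively by flink-orc. -->
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <version>1.6.6</version>
</dependency>
```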
However, I am now facing a different issue:
Table myTable = tableEnv.from("myTable")
        .where($("raw_id").isEqual("3ugg02hhun851"))
        .where($("data").isNotNull())
        .limit(1);
A simple isEqual filter seems to work and prints the data; however, when I add a NOT predicate using isNotNull, it throws a NullPointerException.
Caused by: java.lang.NullPointerException
	at org.apache.flink.orc.OrcFilters$Not.add(OrcFilters.java:678)
Does anyone have an idea what the problem might be here?
Seems like this happens only with nested Row fields. The schema is defined something like this:
Schema schema = Schema.newBuilder()
        .column("id", DataTypes.BIGINT())
        .column("raw_id", DataTypes.STRING())
        .column("data", DataTypes.ROW(
                DataTypes.FIELD("gender", DataTypes.INT())
        ))
        .build();
So I debugged further into the code and was able to find the root cause in Flink's OrcFilters class:
PredicateLeaf.Type colType =
        toOrcType(
                ((FieldReferenceExpression) callExp.getChildren().get(0))
                        .getOutputDataType());
if (colType == null) {
    // unsupported type
    LOG.debug(
            "Unsupported predicate [{}] cannot be pushed into OrcFileSystemFormatFactory.",
            callExp);
    return null;
}
Seems like the colType is being returned as null? Does Flink ORC not support nested fields? I could only see simple types handled in the toOrcType method.
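For intuition, here is a small standalone sketch (my own approximation, not Flink's actual code) of why a type mapping that only handles primitive types would end up handing Not.add a null: a ROW column type falls through to the unsupported case, and the caller never checks for it.

```java
// Standalone sketch (NOT Flink's actual code): a converter that, like
// OrcFilters.toOrcType, only maps primitive SQL types and returns null
// for anything it does not recognize, such as a nested ROW type.
public class ToOrcTypeSketch {

    // Hypothetical mapping: supported primitives get an ORC leaf type name;
    // nested/unsupported types yield null, which the predicate builder
    // later dereferences without a check.
    static String toOrcType(String sqlType) {
        switch (sqlType) {
            case "BIGINT": return "LONG";
            case "STRING": return "STRING";
            case "DOUBLE": return "FLOAT";
            default:       return null; // e.g. ROW<...> is not handled
        }
    }

    public static void main(String[] args) {
        // A predicate on a top-level primitive column maps cleanly...
        System.out.println(toOrcType("STRING"));
        // ...but a predicate on a nested ROW field maps to null, which is
        // what later trips the NullPointerException inside Not.add.
        System.out.println(toOrcType("ROW<gender INT>"));
    }
}
```

As a possible workaround (an assumption on my part, not something I have verified against this exact bug), setting `table.optimizer.source.predicate-pushdown-enabled` to `false` via the TableEnvironment configuration should stop Flink from pushing the filter into OrcFilters at all, so the isNotNull predicate would be evaluated after reading instead.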