# troubleshooting
Hi, I am running into an index-out-of-bounds issue, reported here: https://stackoverflow.com/questions/65342630/flink-sql-read-hive-table-throw-java-lang-arrayindexoutofboundsexception-1024/65351402#65351402 Has this been fixed? If yes, how can I configure the fix in the ORC reader? This is my reader code:
tableEnv.createTemporaryTable("myTable", TableDescriptor.forConnector("filesystem")
        .schema(schema)
        .option("path", "path-to-file.orc")
        .format("orc")
        .build());
Exception I get:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
	at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextVector(TreeReaderFactory.java:255)
	at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:762)
	at org.apache.orc.impl.ConvertTreeReaderFactory$DecimalFromDoubleTreeReader.nextVector(ConvertTreeReaderFactory.java:1297)
My ORC files are written with SNAPPY compression, so maybe that's the reason for this error? Is there any way to provide the compression format while reading ORC files? I am not able to find much information on Google. Thanks!
I was able to get past the error by explicitly specifying a newer version of the Apache ORC library. Seems like there was a bug which is fixed in 1.6.6, and even the latest Flink (1.17.0) still uses the old ORC library (1.5.6).
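For reference, the override that worked for me looked roughly like this (assuming a Maven build; adjust for Gradle/sbt as needed — the coordinates are the standard Apache ORC artifact, the placement in your pom is up to you):

```xml
<!-- Hypothetical Maven override: declare a newer orc-core directly so it
     takes precedence over the 1.5.6 pulled in transitively by flink-orc. -->
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <version>1.6.6</version>
</dependency>
```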
However, I am now facing a different issue:
Table myTable = tableEnv.from("myTable")
        .where($("raw_id").isEqual("3ugg02hhun851"))
        .where($("data").isNotNull())
        .limit(1);
A simple isEqual filter seems to work and prints the data; however, when I add a NOT predicate using isNotNull, it throws a NullPointerException.
Caused by: java.lang.NullPointerException
	at org.apache.flink.orc.OrcFilters$Not.add(OrcFilters.java:678)
Does anyone have an idea what the problem might be here?
Seems like this happens only with nested Row fields. The schema is defined something like this:
Schema schema = Schema.newBuilder()
        .column("id", DataTypes.BIGINT())
        .column("raw_id", DataTypes.STRING())
        .column("data", DataTypes.ROW(
                DataTypes.FIELD("gender", DataTypes.INT())
        ))
        .build();
So I debugged further into the code and was able to find the root cause in Flink's OrcFilters class:
PredicateLeaf.Type colType =
        toOrcType(
                ((FieldReferenceExpression) callExp.getChildren().get(0))
                        .getOutputDataType());
if (colType == null) {
    // unsupported type
    LOG.debug(
            "Unsupported predicate [{}] cannot be pushed into OrcFileSystemFormatFactory.",
            callExp);
    return null;
}
Seems like the colType is being returned as null? Does Flink ORC not support nested fields? I could only see simple types handled in the toOrcType method.
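For intuition, here is a small standalone sketch (my own approximation, not Flink's actual code) of why a type mapping that only handles primitive types would end up handing Not.add a null: a ROW column type falls through to the unsupported case, and the caller never checks for it.

```java
// Standalone sketch (NOT Flink's actual code): a converter that, like
// OrcFilters.toOrcType, only maps primitive SQL types and returns null
// for anything it does not recognize, such as a nested ROW type.
public class ToOrcTypeSketch {

    // Hypothetical mapping: supported primitives get an ORC leaf type name;
    // nested/unsupported types yield null, which the predicate builder
    // later dereferences without a check.
    static String toOrcType(String sqlType) {
        switch (sqlType) {
            case "BIGINT": return "LONG";
            case "STRING": return "STRING";
            case "DOUBLE": return "FLOAT";
            default:       return null; // e.g. ROW<...> is not handled
        }
    }

    public static void main(String[] args) {
        // A predicate on a top-level primitive column maps cleanly...
        System.out.println(toOrcType("STRING"));
        // ...but a predicate on a nested ROW field maps to null, which is
        // what later trips the NullPointerException inside Not.add.
        System.out.println(toOrcType("ROW<gender INT>"));
    }
}
```

As a possible workaround (an assumption on my part, not something I have verified against this exact bug), setting `table.optimizer.source.predicate-pushdown-enabled` to `false` via the TableEnvironment configuration should stop Flink from pushing the filter into OrcFilters at all, so the isNotNull predicate would be evaluated after reading instead.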