Apache Flink

:wave: Extremely new user here, just trying to load some data from Parquet files and finding the process really hard compared to Spark. Every option I see within Flink requires me to specify some sort of schema (<https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/parquet/#flink-rowdata|example>), but I would much prefer to just rely on Parquet’s built-in schema. Any advice?

Have you considered using the SQL implementation for Parquet? E.g. <https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/formats/parquet/>

I think that would be easier.
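For reference, with the SQL route the schema is declared once in a `CREATE TABLE` statement that uses the filesystem connector with `'format' = 'parquet'`. Below is a minimal sketch that only builds such a DDL string; the helper name `parquet_table_ddl`, the table name and the path are hypothetical, and the columns still have to be listed by hand:

```python
from typing import List, Tuple


def parquet_table_ddl(table_name: str, path: str, columns: List[Tuple[str, str]]) -> str:
    """Build a Flink SQL CREATE TABLE statement over Parquet files (hypothetical helper)."""
    # Each column is (name, Flink SQL type); backticks guard against reserved words.
    column_lines = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE `{table_name}` (\n"
        f"{column_lines}\n"
        ") WITH (\n"
        "  'connector' = 'filesystem',\n"
        f"  'path' = '{path}',\n"
        "  'format' = 'parquet'\n"
        ")"
    )


if __name__ == "__main__":
    ddl = parquet_table_ddl(
        "events",
        "/tmp/events",  # hypothetical directory of .parquet files
        [("f7", "DOUBLE"), ("f4", "INT"), ("f99", "STRING")],
    )
    print(ddl)
```

In PyFlink the resulting string would be passed to `TableEnvironment.execute_sql(ddl)`, after which the table can be queried with a `SELECT` or loaded via `from_path("events")`.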

<@U03GADV9USX> Thanks for the response. It seems like I have to specify a schema in that case too, no?

Yes, there’s no automatic import of Parquet’s schema to match it with the SQL type system.
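One possible workaround is to read the schema from the Parquet file footer yourself and translate it into Flink SQL types. A minimal sketch, assuming you already have `(column name, Arrow type name)` pairs, e.g. from `pyarrow.parquet.read_schema` outside of Flink. The helper `to_flink_columns` and its mapping table are hypothetical, not an official Flink correspondence, and cover only a few primitive types; nested, decimal and timestamp columns would need extra handling:

```python
from typing import Iterable, List, Tuple

# Hypothetical mapping from Arrow type names (as printed by str(field.type))
# to Flink SQL types. Only a handful of primitive types are covered.
ARROW_TO_FLINK_SQL = {
    "bool": "BOOLEAN",
    "int8": "TINYINT",
    "int16": "SMALLINT",
    "int32": "INT",
    "int64": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
    "large_string": "STRING",
    "binary": "BYTES",
    "date32[day]": "DATE",
}


def to_flink_columns(fields: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Translate (column name, Arrow type name) pairs into (column name, Flink SQL type)."""
    columns = []
    for name, arrow_type in fields:
        try:
            columns.append((name, ARROW_TO_FLINK_SQL[arrow_type]))
        except KeyError:
            # Fail loudly rather than guess a type for unsupported columns.
            raise ValueError(f"no mapping for column {name!r} of type {arrow_type!r}") from None
    return columns
```

With pyarrow installed, the input could come from something like `[(f.name, str(f.type)) for f in pyarrow.parquet.read_schema(path)]`, and the resulting pairs could then be written into the column list of a `CREATE TABLE ... WITH ('format' = 'parquet')` statement.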

Such a type system doesn’t exist in the DataStream API, so even more work is required there.

Maybe I am missing something. This code from the <https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/formats/parquet/|datastream API> reference looks like a type system too:
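```
from pyflink.table import DataTypes

row_type = DataTypes.ROW([
    DataTypes.FIELD('f7', DataTypes.DOUBLE()),
    DataTypes.FIELD('f4', DataTypes.INT()),
    # STRING() is VARCHAR with the maximum length; DataTypes.VARCHAR() expects a length argument
    DataTypes.FIELD('f99', DataTypes.STRING()),
])
```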
What is the difference you are referring to?