# troubleshooting
a
I want to compress the records using Snappy in the file system sink, but this error comes up:
java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()'
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:136)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:101)
d
do you have this path set?
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/hadoop/lib/native
if you are using docker/kubernetes be sure the hadoop libraries are included in your image
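for a standalone cluster you can also pass the native library path through Flink's JVM options instead of the shell environment — a sketch, assuming your Hadoop native libs live under /usr/local/hadoop/lib/native (adjust to your install):

```yaml
# flink-conf.yaml — sketch; point java.library.path at your Hadoop native libs
env.java.opts: -Djava.library.path=/usr/local/hadoop/lib/native
```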
a
Am using standalone flink cluster
So what libs should be in pom ?
Or in the lib
d
I think you need libhadoop.so and libsnappy.so
also possibly libhadoopfs.so, libhadooppipes.so, libhadoopcrypto.so, libnative.so
these are normally found in a Hadoop installation; they are the shared libraries.
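on the pom side, a hedged sketch of what you'd likely add for Snappy-compressed file sink output (artifact choice and versions are assumptions — check them against your Flink and Hadoop releases; the native .so files above are still needed at runtime):

```xml
<!-- Sketch only: verify versions against your cluster -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-compress</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```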
a
Ok I will try it , thanks
d
ok, after downloading you can check if everything is set up correctly with
ldconfig -p | grep snappy
this checks whether it's registered in the dynamic linker's cache
a
Ok I will try. If you don't mind, could you recommend whether FastCSV or Apache Commons CSV is better for high throughput (100 MB/s) from Kafka? Also, should I map the CSV data to POJOs or keep it as strings, since all I want is to reduce the number of columns?
d
well, FastCSV is indeed a bit faster from a high-performance standpoint, but it's not as feature-rich, so there is a tradeoff. You have more control over processing with Apache Commons CSV. So I'd pick FastCSV if your data is rather simple.
Concerning POJOs: given your objective of merely reducing the number of columns, both options could work. Mapping to POJOs provides more structure and could be beneficial if you plan to extend functionality later. However, if simplicity and raw speed are critical, directly manipulating strings will likely be more efficient.
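the string route can be as simple as splitting and re-joining the fields you want — a minimal sketch, assuming plain CSV with no quoted fields containing commas (class and method names are illustrative, not from any library):

```java
// Sketch: drop columns from a CSV line without mapping to POJOs.
// Assumes simple CSV — no quoted fields containing embedded commas.
public class CsvColumnReducer {

    // Keep only the columns at the given indices, in the given order.
    public static String reduceColumns(String line, int[] keep) {
        String[] fields = line.split(",", -1); // -1 preserves trailing empty fields
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < keep.length; i++) {
            if (i > 0) out.append(',');
            out.append(fields[keep[i]]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // e.g. keep only the first and third columns of a record
        System.out.println(reduceColumns("id,name,ts,payload", new int[]{0, 2})); // id,ts
    }
}
```

if your data ever contains quoted fields, switch to a real parser (FastCSV or Commons CSV) rather than split.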