# troubleshooting
a
I want to compress the records using Snappy in the file system sink, but this error comes up:
java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()'
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:136)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:101)
d
do you have this path set?
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/hadoop/lib/native
if you are using docker/kubernetes be sure the hadoop libraries are included in your image
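for a standalone cluster you can also pass the native library path through Flink's JVM options instead of the shell environment — a sketch, assuming your Hadoop native libs live under /usr/local/hadoop/lib/native (adjust to your install):

```yaml
# flink-conf.yaml — sketch; point java.library.path at your Hadoop native libs
env.java.opts: -Djava.library.path=/usr/local/hadoop/lib/native
```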
a
Am using standalone flink cluster
So what libs should be in pom ?
Or in the lib
d
I think you need libhadoop.so and libsnappy.so
also possibly libhadoopfs.so, libhadooppipes.so, libhadoopcrypto.so, libnative.so
these are normally found in a Hadoop installation; they are the shared libraries.
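on the pom side, a hedged sketch of what you'd likely add for Snappy-compressed file sink output (artifact choice and versions are assumptions — check them against your Flink and Hadoop releases; the native .so files above are still needed at runtime):

```xml
<!-- Sketch only: verify versions against your cluster -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-compress</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```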
a
Ok I will try it , thanks
d
ok, after downloading you can check if everything is set up correctly with
ldconfig -p | grep snappy
this checks whether it's registered in the dynamic linker's cache
a
Ok I will try. If you don't mind, could you recommend whether FastCSV or Apache Commons CSV is better for high throughput (100 MB/s) from Kafka? Also, should I map the CSV data to POJOs or keep it as strings, since all I want is to reduce the number of columns?
d
well, FastCSV is indeed a bit faster from a high-performance standpoint, but it's not as feature-rich, so there is a tradeoff. You have more control over processing with Apache Commons CSV. So I'd pick FastCSV if your data is rather simple.
Concerning POJOs: given your objective of merely reducing the number of columns, both options could work. Mapping to POJOs provides more structure and could be beneficial if you plan to extend functionality later. However, if simplicity and raw speed are critical, directly manipulating strings will likely be more efficient.
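the string route can be as simple as splitting and re-joining the fields you want — a minimal sketch, assuming plain CSV with no quoted fields containing commas (class and method names are illustrative, not from any library):

```java
// Sketch: drop columns from a CSV line without mapping to POJOs.
// Assumes simple CSV — no quoted fields containing embedded commas.
public class CsvColumnReducer {

    // Keep only the columns at the given indices, in the given order.
    public static String reduceColumns(String line, int[] keep) {
        String[] fields = line.split(",", -1); // -1 preserves trailing empty fields
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < keep.length; i++) {
            if (i > 0) out.append(',');
            out.append(fields[keep[i]]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // e.g. keep only the first and third columns of a record
        System.out.println(reduceColumns("id,name,ts,payload", new int[]{0, 2})); // id,ts
    }
}
```

if your data ever contains quoted fields, switch to a real parser (FastCSV or Commons CSV) rather than split.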