# random
v
Hi Team, I am trying to use both the DataStream and Table APIs: I convert a DataStream to a table and then back to a DataStream. This is a real-time stream, so data will be continuously inserted into the table. Will this cause a memory leak problem? https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/data_stream_api/
j
There shouldn't be a memory leak if you are handling the conversion properly. Have you encountered any specific problems?
v
Thanks for the response @Jane Chan. I haven't encountered any issues yet, but we are only testing at a local level for now; once we go to prod it might cause problems. I don't know how Flink stores tables. I have about 5M records per day in the pipeline, and the table will continuously store this data, so won't it cause a memory issue eventually?
Because it will build up when we re-deploy without a checkpoint.
Can we apply a window to the table?
j
> i dont know how flink is storing tables?
Flink is primarily a compute engine and does not store data itself. The actual storage for tables is typically managed by connectors to external systems integrated with Flink, such as Apache Kafka, Apache Cassandra, or other databases. For stateful computation, Flink uses an internal state backend to store intermediate results. The state backend can be configured to store state in various ways; see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/
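If state size on the JVM heap is the worry, one common option is to switch the state backend to RocksDB so state spills to local disk. A minimal `flink-conf.yaml` sketch (the checkpoint path is a placeholder, and the `state.backend.type` key assumes a recent Flink version; older releases use `state.backend` instead):

```yaml
# Keep state in embedded RocksDB on local disk rather than entirely on the heap.
state.backend.type: rocksdb
# Durable location for checkpoints (placeholder path).
state.checkpoints.dir: file:///tmp/flink-checkpoints
# Only upload changed files at each checkpoint instead of the full state.
state.backend.incremental: true
```

With checkpoints enabled, state also survives a re-deploy that restores from the latest checkpoint or savepoint.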
v
Yeah, I get that Flink stores data through the state backend, but in the end this state backend will consume a lot of memory, I think, because we previously implemented a process function with state values and it caused a memory issue.
For how long will it store the state values for a table? I am processing 1 million records of 10KB each within 6 hours. Is there any way to define/change the state retention time for tables so that it behaves like a sliding window?
j
For window-based stateful operations, Flink uses watermarks to keep the state relatively small. You can take a look at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/overview/#state-management
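As a rough sketch of both points raised above: idle-state retention can be bounded with the `table.exec.state.ttl` option, and a sliding window can be expressed with the HOP windowing TVF in Flink SQL. The table name `events`, its time attribute `event_time`, and the 6-hour figure below are assumptions based on the numbers in this thread, not part of any real schema:

```sql
-- Expire operator state that has been idle for 6 hours (illustrative value).
SET 'table.exec.state.ttl' = '6 h';

-- Sliding (HOP) window: 6-hour windows advancing every 1 hour.
-- `events` and its `event_time` time attribute are hypothetical.
SELECT window_start, window_end, COUNT(*) AS cnt
FROM TABLE(
  HOP(TABLE events, DESCRIPTOR(event_time), INTERVAL '1' HOUR, INTERVAL '6' HOUR))
GROUP BY window_start, window_end;
```

With windowed aggregation, the watermark lets Flink drop a window's state once the window closes, so state stays bounded even on an unbounded stream.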