This message was deleted.
# ask-anything
s
This message was deleted.
👍 1
e
interesting question. the benefit of reading/writing is that you "cache" results so you don't need to re-compute them, but here your bottleneck is reading/writing. parquet is a great format for fast read/write operations but I'm unsure how fast it is for text data. feather is another alternative https://parquet.apache.org/ https://arrow.apache.org/docs/python/feather.html please let us know if this helps!
forgot to mention: parquet is a columnar format so one way to improve read operations is to only load the columns that you're gonna use e.g. if you have 100 columns but some script only needs the first 10, you can load them. both parquet and feather work great with Python and R
j
this is helpful, thanks! I’ll take a look
e
sure, please share your experience! would love to learn how parquet/feather do with text, I've only used them with numeric data
i
You can also use in memory only, but then the tradeoff is that you’d have to rerun it once the process dies. Agree on parquet, not sure how performant it’ll be on plain text. It also depends on how much write calls you’re performing.