interesting question. the benefit of reading/writing is that you "cache" results so you don't need to re-compute them, but here your bottleneck is reading/writing. parquet is a great format for fast read/write operations but I'm unsure how fast it is for text data. feather is another alternative
https://parquet.apache.org/
https://arrow.apache.org/docs/python/feather.html
please let us know if this helps!