Hi
I need to extract text from a large number of financial documents (mostly trade agreements, but in varied formats). Which OCR API service is the best in your opinion? I have tried the ones from AWS, GCP and Azure. My initial observations:
01 In terms of pure text accuracy, I think GCP is the best of the three.
02 AWS offers an additional "analyze" service, which was also able to extract some of the document's structure, e.g. key-value pairs (forms) and tables. This isn't highly accurate, but it is better than nothing (GCP and Azure currently just offer words & lines as structure).
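For context, here is a minimal sketch of how the key-value output can be read. It assumes the "analyze" service is Amazon Textract's AnalyzeDocument called with the FORMS feature, and that the response follows its block layout (KEY_VALUE_SET blocks linked to their values and words through VALUE and CHILD relationships). The helper names and the sample response are made up for illustration and are not output from a real document.

```python
def text_of(block, blocks_by_id):
    """Join the WORD children of a block into one string (hypothetical helper)."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            # Skip non-word children such as selection elements (checkboxes).
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Map each form key to its value text (hypothetical helper)."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key_text = text_of(block, blocks_by_id)
        value_text = ""
        for rel in block.get("Relationships", []):
            # A KEY block points at its VALUE block(s) via a VALUE relationship.
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        pairs[key_text] = value_text
    return pairs


# Made-up response fragment, trimmed to the fields the sketch reads.
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Trade"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Date:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "2020-01-15"},
    ]
}

print(extract_key_values(sample_response))
```

Each block also carries a confidence score in the real response, which could be used to filter out the less reliable pairs.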
Ideally, I want the best text accuracy and structure extraction (table and form values). Any ideas / opinions? I am also open to implementing a model myself (especially for extracting the document structure) if there is a relevant paper / resource you know of and can share. Thanks for reading!