# random
Hi, I need to extract text from a large number of financial documents (mostly trade agreements, but in varied formats). Which OCR API service is the best in your opinion? I have tried the ones from AWS, GCP and Azure. My initial observations:

1. In terms of pure text accuracy, I think GCP is the best of the three.
2. AWS offers an additional "analyze" service, which was also able to extract some of the document's structure, e.g. key-value pairs (forms) and tables. This isn't highly accurate, but it is better than nothing (GCP and Azure currently offer only words and lines for structure).

Ideally, I want the best text accuracy and structure (table and form values). Any ideas or opinions? I am also open to implementing a model myself (especially for extracting the document structure) if there is a relevant paper or resource you know of and can share. Thanks for reading!
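For context on the key-value structure mentioned above, here is a minimal sketch of how form fields could be read out of an AWS Textract AnalyzeDocument-style response. It assumes the response layout as I understand it from the Textract docs: a flat `Blocks` list in which `KEY_VALUE_SET` blocks point to their value via a `VALUE` relationship and to their words via `CHILD` relationships. The helper names and `SAMPLE_RESPONSE` are made up for illustration; the sample is hand-written, not real service output.

```python
# Sketch only: parses a Textract-style response dict into {key: value} pairs.
# In practice the dict would come from something like
#   boto3.client("textract").analyze_document(
#       Document={"Bytes": image_bytes}, FeatureTypes=["TABLES", "FORMS"])
# Here a hand-written sample is used so the code runs without AWS access.


def block_text(block, blocks_by_id):
    """Join the text of all WORD blocks that are CHILD-related to `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Return a dict mapping each form key's text to its value's text."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key = block_text(block, blocks_by_id)
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(
                        block_text(blocks_by_id[value_id], blocks_by_id)
                    )
        pairs[key] = " ".join(part for part in value_parts if part)
    return pairs


# Hypothetical, hand-written sample mimicking the assumed response shape.
SAMPLE_RESPONSE = {
    "Blocks": [
        {
            "Id": "k1",
            "BlockType": "KEY_VALUE_SET",
            "EntityTypes": ["KEY"],
            "Relationships": [
                {"Type": "VALUE", "Ids": ["v1"]},
                {"Type": "CHILD", "Ids": ["w1", "w2"]},
            ],
        },
        {
            "Id": "v1",
            "BlockType": "KEY_VALUE_SET",
            "EntityTypes": ["VALUE"],
            "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}],
        },
        {"Id": "w1", "BlockType": "WORD", "Text": "Trade"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Date:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "2019-03-01"},
    ]
}

if __name__ == "__main__":
    print(extract_key_values(SAMPLE_RESPONSE))
```

Keeping the parsing separate from the API call means the same function can be run on saved responses, which might help when comparing extracted fields against a hand-labelled set of documents.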