Yes, we should do a quick retro. This is my fault.
I think there are two reasons:
1. We did see the acceptance test failure, but I thought it was due to how the acceptance test was set up. Since the test could not be updated locally, I decided that it was not worth the time to fix it. I did not realize that it was a genuine production issue.
2. The ultimate solution was to deprecate the file-based migration system. I was hoping that this would be done soon so that we wouldn’t need to deal with any more issues related to the file-based system. That was kind of wishful thinking.