Yes, your Airbyte assumption is correct here: it would move the data point-to-point.
From my viewpoint, best practice for your problem would be to fix your source data into something more usable. I’m not naive enough to suggest that’s always possible, so I’ll treat it as a known limitation.
Airbyte will move the table point-to-point; you *can* do a transform mid-flight, but I’d advise against it.
This suggestion assumes a modern ELT workflow: Extract > Load > Transform.
At its base, dbt does what you describe, but there are also programmatic possibilities:
• It allows for flexible column definitions, or at least a macro that lets you group varying definitions of the same logical column in one place, so you can treat differently named columns across multiple tables in the same fashion. See e.g. the `dbt_utils.star()` macro (column auto-discovery).
• Provided you can reconcile the multiple tables into standard streams, you can compile the disparate files into a unified dataset.
• Validation checks can also be written against loosely referenced definitions.
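As a minimal sketch of the unification point, assuming the disparate tables have landed as sources named `raw.orders_a` and `raw.orders_b` (hypothetical names), `dbt_utils.union_relations` can stitch them into one model even when their columns only partially overlap:

```sql
-- models/unified_orders.sql (hypothetical model name)
-- union_relations aligns columns by name across the listed relations,
-- filling columns missing from a given table with NULLs.
{{ dbt_utils.union_relations(
    relations=[
        source('raw', 'orders_a'),
        source('raw', 'orders_b')
    ]
) }}
```

Downstream of that, `dbt_utils.star(from=ref('unified_orders'))` can auto-discover the column list instead of you hard-coding it.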
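For the validation point, dbt lets you declare tests against loosely defined models in a `schema.yml`; a sketch, assuming a model named `unified_orders` with columns `order_id` and `amount` (all hypothetical):

```yaml
# models/schema.yml (hypothetical)
version: 2
models:
  - name: unified_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: amount
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
```

These run with `dbt test` and fail the build when the loosely referenced data drifts out of spec.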
To approach this from another angle, my alternative suggestion would be to check out Google’s Document AI, specifically its table-parsing capability. It may be possible to train a model to mine the values from these tables as needed (keying on the values themselves, rather than on what someone has named them).