# good-reads-and-discussions
g
Databricks proving lakehouse pattern by breaking warehouse performance record: https://databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record.html
k
They haven't shared any details about the Snowflake cluster.
The cluster was also warmed up earlier.
r
If the result is so great, why not open-source the benchmark and make it reproducible? If you follow Gartner, the best data integration tool is from Informatica 😛
g
@Kamil Breguła good points. I certainly wouldn't argue this proves lakehouse/Databricks as the ultimate pattern, but I'm excited for the future of it and thought this was an interesting milestone 😄 @Rick Radewagen the benchmark is just one of many set by a separate organisation, TPC. Gartner are actually an associate member.
r
@George Claireaux (Airbyte), I know TPC, but I also know that a lot of tuning and tweaking can make a huge difference in benchmarking. When dealing with 100TB of data, tuning is usually needed for optimal cost-performance. Non-open benchmarks conducted by the vendor itself are suspicious to me. I would not trust Snowflake either if they published something like that. A good example of how it should be done: https://github.com/fivetran/benchmark
👍 2
b
Still waiting for that delta lake connector…
k