Google Cloud announced the general availability of BigQuery history-based optimizations, which can improve query performance by up to 100x. The feature learns from past executions of a query and identifies additional optimizations that can be applied to future executions.
One interesting aspect of BigQuery history-based optimizations is that they can improve several types of queries, including those that involve highly selective joins. For example, if BigQuery observes that a join produces far fewer rows than it takes as input, it may choose to run that join earlier in the execution plan. Every later stage then receives fewer rows, which can significantly reduce the amount of data that needs to be processed and improve overall performance.
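To see why join order matters, here is a minimal toy sketch in Python. It is not how BigQuery implements the optimization; the tables (`orders`, `customers`, `flagged_products`) and the `hash_join` helper are hypothetical names made up for illustration. It runs the same two joins in both orders and compares the size of the intermediate result.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical toy tables (assumptions for illustration only).
orders = [
    {"order_id": i, "customer_id": i % 100, "product_id": i % 50}
    for i in range(10_000)
]
customers = [{"customer_id": c, "name": f"c{c}"} for c in range(100)]
flagged_products = [{"product_id": 7, "flag": "recalled"}]  # highly selective

# Plan A: the non-selective join runs first, so every order survives it.
step_a = hash_join(orders, customers, "customer_id")
result_a = hash_join(step_a, flagged_products, "product_id")

# Plan B: the selective join runs first and shrinks the input early.
step_b = hash_join(orders, flagged_products, "product_id")
result_b = hash_join(step_b, customers, "customer_id")

print("intermediate rows, plan A:", len(step_a))
print("intermediate rows, plan B:", len(step_b))
print("final rows:", len(result_a), len(result_b))
```

Both plans return the same final rows, but the second join in plan B processes only the handful of orders that survived the selective join. A history-based optimizer can learn this kind of selectivity from an earlier run rather than having to estimate it up front.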
Furthermore, BigQuery history-based optimizations can reduce the amount of data that BigQuery scans by inserting selective semi-join operations throughout the query. In some cases, BigQuery can identify a highly selective join (as it does for join pushdown) in a query with several parallel execution paths that are eventually joined together. BigQuery can then insert new semi-join operations, based on that selective join, that reduce the amount of data scanned and processed by each of those parallel execution paths.
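The idea can be sketched with a small Python example. This is an illustrative toy, not BigQuery's implementation; the tables (`products`, `sales`, `returns_`) and helpers (`semi_join`, `total_by_product`) are hypothetical. Two parallel paths each aggregate a large table and are later joined on `product_id`; a semi-join filter built from the selective side is applied to both paths before they do any work.

```python
def semi_join(rows, key, allowed_keys):
    """Keep only rows whose key appears in allowed_keys (adds no columns)."""
    return [row for row in rows if row[key] in allowed_keys]


def total_by_product(rows, value):
    """Sum a numeric column per product_id."""
    totals = {}
    for row in rows:
        pid = row["product_id"]
        totals[pid] = totals.get(pid, 0) + row[value]
    return totals


# Hypothetical toy tables (assumptions for illustration only).
products = [
    {"product_id": p, "category": "rare" if p < 3 else "common"}
    for p in range(1000)
]
sales = [{"product_id": i % 1000, "amount": 1} for i in range(50_000)]
returns_ = [{"product_id": i % 1000, "qty": 1} for i in range(20_000)]

# The selective side of the final join: only three products qualify.
selective_keys = {p["product_id"] for p in products if p["category"] == "rare"}

# Without reduction: each path aggregates every row, then the join filters.
sales_full = total_by_product(sales, "amount")
returns_full = total_by_product(returns_, "qty")
final_full = {p: (sales_full[p], returns_full[p]) for p in selective_keys}

# With reduction: a semi-join filter shrinks each path before it aggregates.
sales_reduced = semi_join(sales, "product_id", selective_keys)
returns_reduced = semi_join(returns_, "product_id", selective_keys)
sales_small = total_by_product(sales_reduced, "amount")
returns_small = total_by_product(returns_reduced, "qty")
final_reduced = {p: (sales_small[p], returns_small[p]) for p in selective_keys}

print("sales rows processed:", len(sales), "->", len(sales_reduced))
print("returns rows processed:", len(returns_), "->", len(returns_reduced))
print("same answer:", final_full == final_reduced)
```

The result is identical in both cases; only the number of rows each path has to process changes. In this sketch the filtering happens after the rows are already in memory, whereas the source describes the inserted semi-joins as also reducing the data that is scanned in the first place.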
Overall, history-based optimizations are a valuable addition to BigQuery. By learning from past query executions, the feature can significantly improve query performance and reduce costs. And because it works automatically, users can benefit from these improvements without making any changes to their queries.