Query Optimization

Thousands of possible query plans for a given query that differ by orders of magnitude in their performance

  1. Alternative plans that derive from equivalences in RA
  2. Alternative plans that choose different implementations of RA operations

How is the best plan found?

  1. Review basic always good transformations
  2. Cost-based join order selection in next unit
    df
    → Finding an optimal plan is computationally not feasible

General Approach

Pasted image 20231213210107.png

All Physical plans equivalent to the query

In practice

  1. Consider only plans of a certain form (restrictions on the search space)
  2. Focus on eliminating really bad queries

Plans having the lowest cost

Not possible to run the plan to find out

Estimate the cost based on statistical metadata collected by the DBMS on database instances