CSE-4/562 Spring 2021 - Relational Algebra Equivalences

February 11, 2021

If X and Y are __equivalent__ and Y is __better__,

then replace all Xs with Ys

**Today's focus**: Provable Equivalence for RA Expressions

$$Q_1 \stackrel{?}{\equiv} Q_2$$

- Only Instances Matter
- Obviously $Q_1 \neq Q_2$. What we care about is whether $Q_1(R) = Q_2(R)$...
- Data Independent
- ... for
*all*valid input data $R$. - However, it's fair to talk about equivalence when we know the data has some properties. (more on this later)
- Data-Model Dependent
- It's important to be clear whether we're talking about sets, bags, or lists.
- Attribute Order Doesn't matter
`A, B, C`is the same as`C, B, A`

We say that $Q_1 \equiv Q_2$ if and only if

we can guarantee that the *bag* of tuples produced by $Q_1(R, S, T, \ldots)$

is the same as the *bag* of tuples produced by $Q_2(R, S, T, \ldots)$

for any combination of valid inputs $R, S, T, \ldots$.

... that satisfy any necessary properties.

Selection | |
---|---|

$\sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R))$ | (Decomposability) |

Projection | |

$\pi_{A}(R) \equiv \pi_{A}(\pi_{A \cup B}(R))$ | (Idempotence) |

Cross Product | |

$R \times (S \times T) \equiv (R \times S) \times T$ | (Associativity) |

$R \times S \equiv S \times R$ | (Commutativity) |

Union | |

$R \cup (S \cup T) \equiv (R \cup S) \cup T$ | (Associativity) |

$R \cup S \equiv S \cup R$ | (Commutativity) |

Show that $$R \times (S \times T) \equiv T \times (S \times R)$$

Show that $$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$$

Show that $$R \bowtie_{c} S \equiv S \bowtie_{c} R$$

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv \sigma_{R.A > 3}(R \bowtie_{B} S)$$

Selection + Projection | |
---|---|

$\pi_{A}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{A}(R))$ | (Commutativity) |

... but only if $A$ and $c$ are __compatible__

$A$ must include all columns referenced by $c$ ($cols(c)$)

Show that $$\pi_A(\sigma_c(R)) \equiv \pi_A(\sigma_c(\pi_{(A \cup cols(c))}(R)))$$

Selection + Cross Product | |
---|---|

$\sigma_c(R \times S) \equiv (\sigma_{c}(R)) \times S$ | (Commutativity) |

... but only if $c$ references only columns of $R$

$cols(c) \subseteq cols(R)$

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv (\sigma_{R.A > 3}(R)) \bowtie_{B} S$$

When is this rewrite a good idea?

Projection + Cross Product | |
---|---|

$\pi_A(R \times S) \equiv (\pi_{A_R}(R)) \times (\pi_{A_S}(S))$ | (Commutativity) |

... where $A_R$ and $A_S$ are the columns of $A$ from $R$ and $S$ respectively.

$A_R = A \cap cols(R)$ $A_S = A \cap cols(S)$

Show that $$\pi_{A}(R \bowtie_c S) \equiv (\pi_{A_R}(R)) \bowtie_c (\pi_{A_S}(S))$$

When does this condition hold?

Intersection | |
---|---|

$R \cap (S \cap T) \equiv (R \cap S) \cap T$ | (Associativity) |

$R \cap S \equiv S \cap R$ | (Commutativity) |

Selection + | |

$\sigma_c(R \cup S) \equiv (\sigma_c(R)) \cup (\sigma_c(R))$ | (Commutativity) |

$\sigma_c(R \cap S) \equiv (\sigma_c(R)) \cap (\sigma_c(R))$ | (Commutativity) |

Projection + Union | |

$\pi_A(R \cup S) \equiv (\pi_A(R)) \cup (\pi_A(R))$ | (Commutativity) |

Cross Product + Union | |

$R \times (S \cup T) \equiv (R \times S) \cup (R \times T)$ | (Distributivity) |

```
SELECT R.A, T.E
FROM R, S, T
WHERE R.B = S.B
AND S.C < 5
AND S.D = T.D
```

➔
**Input:** Dumb translation of SQL to RA

⬇︎

Apply rewrites

⬇︎

**Output:** Better, but equivalent query

Which rewrite rules should we apply?

- Selection Pushdown
**Always**commute Selections as close to the leaves as possible.- Join Construction
- Joins are
**always**better than cross-products. (if there's a good join algorithm) - (Optional) Projection Pushdown
- Commuting Projections down to the leaves removes redundant columns, and
**may**be beneficial for some systems. - Join Algorithm Selection
- Joins can be implemented differently, depending on the join predicate.
- Join/Union Ordering
- The order in which joins are evaluated
**may**affect query runtimes. - Access Paths
- $(\sigma_c(R))$ and $(Q(\ldots) \bowtie_c R)$ are special cases that we can make fast!

Some rewrites are situational... we need more information to decide when to apply them.

- Apply blind heuristics (e.g., push down selections)
- Enumerate all possible
*execution plans*by varying (or for a reasonable subset)- Join/Union Evaluation Order (commutativity, associativity, distributivity)
- Algorithms for Joins, Aggregates, Sort, Distinct, and others
- Data Access Paths

- Estimate the cost of each execution plan
- Pick the execution plan with the lowest cost (or best risk/reward)

(note: $c$ is always compatible in this direction)

```
plan.transform {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}
```

`match`/`case` lets you find patterns.

`transform` lets you apply rewrite rules.

(Slight oversimplification since Spark uses extended relational algebra)

What happens if I apply this rewrite to:

```
Filter(condition, Project(columns1, Project(columns2, child)))
```

↕
$$\sigma_c(\pi_{A_1}(\pi_{A_2}(R)))$$
⇓ $$\pi_{A_1}(\sigma_c(\pi_{A_2}(R)))$$

```
var last = null
while( ! plan.equals(last) ){
last = plan
plan = plan.transform { ... }
}
```

Repeat until we reach a "fixed-point"

```
plan.transformDown {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}
```

`transformUp`: Require bottom-up tree traversal.

`transformDown`: Require top-down tree traversal.

```
plan.transformDown {
case Filter(condition, Union(children, /* other goop */)) =>
Union(
children.map { child =>
Filter(condition, child)
},
/* other goop */
)
}
```

Checkpoint 1 overview.