Schedule - PGConf.dev 2026

Extending Extended Statistics to Joins

Date: 2026-05-20
Time: 16:00–16:25
Room: Labatt (1700)
Level: Intermediate

Accurate join selectivity estimation is one of the longest-standing challenges in query planning. PostgreSQL's extended statistics, defined via CREATE STATISTICS, improve planner estimates for correlated filters and multi-column predicates, but the planner currently applies them only to single-table scans, not to join selectivity estimation. As a result, even when the database has information about correlated columns, the planner still assumes independence across tables during join planning, which often results in suboptimal join orders and unnecessarily expensive plans.
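To make the independence problem concrete, here is a minimal Python sketch, not the planner's actual code. It assumes two hypothetical toy tables whose second column is fully determined by the first, and a simplified per-clause equijoin selectivity of 1 / max(ndistinct) with no MCV lists or NULLs. It compares the row estimate obtained by multiplying per-clause selectivities with the true size of the two-column join.

```python
from itertools import product

# Hypothetical toy tables of (a, b) rows; b is fully determined by a.
t1 = [(i % 10, i % 10) for i in range(100)]
t2 = [(i % 10, i % 10) for i in range(100)]


def ndistinct(rows, col):
    """Number of distinct values in one column."""
    return len({row[col] for row in rows})


def clause_selectivity(left, right, col):
    """Simplified equijoin selectivity for one clause: 1 / max(ndistinct).

    This is an illustrative assumption; it ignores MCV lists and NULLs.
    """
    return 1.0 / max(ndistinct(left, col), ndistinct(right, col))


def estimate_independent(left, right):
    """Row estimate for a join on both columns, treating the clauses as independent."""
    sel = clause_selectivity(left, right, 0) * clause_selectivity(left, right, 1)
    return len(left) * len(right) * sel


def actual_join_rows(left, right):
    """True row count of: left JOIN right ON l.a = r.a AND l.b = r.b."""
    return sum(1 for l, r in product(left, right) if l[0] == r[0] and l[1] == r[1])


estimated = estimate_independent(t1, t2)
actual = actual_join_rows(t1, t2)
print(f"estimated={estimated:.0f} actual={actual}")  # estimated=100 actual=1000
```

In this toy case the second join clause adds no filtering at all, yet the independence assumption counts it as a further 10x reduction. Underestimates of this kind can compound across several joins.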

In this talk, I will present a proof-of-concept patch for extending PostgreSQL's statistics framework to joins, focusing on collecting and using join-level MCV (Most Common Value) statistics for common patterns, including but not limited to foreign key joins. The work explores catalog extensions, changes to ANALYZE, and planner-side integration that allow join predicates to benefit from real data distributions of joins rather than independence assumptions.

I will focus on evaluating and operationalizing join statistics. Specifically, I will present results on how improved selectivity estimates affect join order decisions, plan quality, and execution-time performance, using join-order-sensitive benchmarks. I will also present measurements of the performance cost of collecting join statistics during ANALYZE, and discuss maintenance aspects, including how join statistics interact with VACUUM, when they should be refreshed, and what configuration settings might be needed.

Speaker

Alexandra Wang