Choosing the Right Join Method: Nested Loop, Sort Merge, or Hash Join

SQL Tuning Series  ·  Episode 30

Choosing the Right Join Method
Nested Loop, Sort Merge, or Hash Join

Learn how nested loop join, sort merge join, and hash join work, and how indexes, row counts, and memory influence join method selection.

Joins are central to relational databases.

This article is part of a beginner-friendly SQL tuning series. The goal is to connect each concept to practical query performance, not just define terminology.

They allow SQL to combine data from multiple tables, but they can also become major performance bottlenecks.

The three physical join methods every SQL tuner should understand are nested loop join, sort merge join, and hash join.

01 Nested Loop Join

A nested loop join reads rows from one table and repeatedly looks up matching rows in another table.

The first table is often called the outer or driving table. The second table is the inner or driven table.

Nested loop join works well when:

  • The outer result set is small.
  • The inner table has a useful index.
  • The query can return rows quickly.

It becomes inefficient when the outer result set is large and the inner lookup repeats many times.

02 Sort Merge Join

Sort merge join reads both inputs, sorts them by the join key, and then merges the sorted results.

It can be useful when:

  • Join ranges are large.
  • Indexes are not available.
  • The join condition is not only equality.

The downside is sorting cost. If both inputs are large, sorting can consume significant memory and temporary space.

03 Hash Join

Hash join builds a hash table from one input, then probes it with rows from the other input.

It is commonly used for large equality joins.

Hash join works well when:

  • The join condition is equality.
  • Index access would cause too much random I/O.
  • Sorting would be expensive.
  • The smaller build input fits in memory.

If the hash area is too small, the database may use temporary disk space, which can slow the join.

04 How the Optimizer Chooses

The optimizer considers row counts, indexes, join predicates, memory, statistics, and estimated cost.

For example:

  • Small outer table plus indexed inner table often favors nested loop.
  • Large unsorted inputs may favor hash join.
  • Large inputs that can be sorted or already ordered may favor sort merge.

05 The Tuner’s Role

A tuner should understand why a join method was chosen.

Ask:

  • Which table is driving the join?
  • How many rows are produced before the join?
  • Does the inner table have a useful index?
  • Is the join result small or large?
  • Is memory sufficient for hash or sort work?
  • Are statistics accurate?

06 Practical Example

Nested loop join can be effective when the driving set is small:

example.sql
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c
  ON c.customer_id = o.customer_id
WHERE o.order_id = :order_id;

Hash join may be better for large equality joins:

example.sql
SELECT o.order_id, p.payment_amount
FROM orders o
JOIN payments p
  ON p.order_id = o.order_id
WHERE o.order_date >= DATE '2026-01-01';

The best join method depends on row counts, indexes, memory, and statistics.

07 Key Takeaway

There is no single best join method.

Nested loop, sort merge, and hash join each fit different data volumes and access conditions. SQL tuning means helping the optimizer choose the join strategy that matches the real workload.

카테고리 SQL

Cluster Tables and Physical Data Grouping

SQL Tuning Series  ·  Episode 29

Cluster Tables and Physical Data Grouping

Cluster tables physically group related rows to improve access for certain query patterns. Learn the basic idea and tuning trade-offs.

A cluster table is a physical storage design that groups related data together.

This article is part of a beginner-friendly SQL tuning series. The goal is to connect each concept to practical query performance, not just define terminology.

The idea is to place rows that are often accessed together in the same or nearby blocks.

01 Why Physical Grouping Matters

SQL is logical, but databases still read physical blocks.

If related rows are scattered across many blocks, the DBMS may perform more I/O to retrieve them.

If related rows are physically grouped, the database may read fewer blocks for certain queries.

02 A Simple Example

Imagine employee records are frequently queried by company ID.

If all employees for the same company are stored close together, a query for one company may read a compact set of blocks.

That can be more efficient than jumping across the table.

03 Cluster Key

Cluster tables use a cluster key to decide how rows are grouped.

Rows with the same or related cluster key values can be stored together.

This is a physical design choice, not just a logical table relationship.

04 Benefits

Clustered storage can help when:

  • Queries repeatedly access rows by the same grouping key.
  • Related rows are often read together.
  • Reducing random I/O is important.

05 Trade-Offs

Cluster tables are not a universal solution.

They can add complexity to data loading, maintenance, and storage design.

If access patterns do not match the cluster key, the benefit may be limited.

06 Practical Example

Clustered storage can help if queries repeatedly access related rows:

example.sql
SELECT *
FROM employees
WHERE company_id = :company_id;

If rows for the same company_id are physically grouped, fewer blocks may be needed.

07 Key Takeaway

Cluster tables improve performance by physically grouping related rows.

They are useful only when the grouping matches real query patterns and reduces the amount of block access required.

카테고리 SQL

Index-Organized Tables: When the Index Becomes the Table

SQL Tuning Series  ·  Episode 28

Index-Organized Tables
When the Index Becomes the Table

Learn what an index-organized table is, how it differs from a heap table, and when it may improve SQL access performance.

An index-organized table, often called an IOT, stores table data inside an index structure.

This article is part of a beginner-friendly SQL tuning series. The goal is to connect each concept to practical query performance, not just define terminology.

Instead of storing rows in a separate heap table and using an index to point to them, the table itself is organized by the primary key in a B-tree structure.

01 Heap Table vs IOT

In a typical heap-organized table, rows are stored without a primary key order.

An index stores key values and row locations. To get table data through the index, the database first searches the index, then uses the row location to access the table.

That creates two logical steps:

1. Find the row location in the index. 2. Visit the table block.

In an IOT, the row data is stored in the index leaf blocks.

The index leaf block is effectively the data block.

02 Why IOT Can Help

An IOT can reduce random table access because the database may find the row data directly in the index structure.

This can be useful for:

  • Tables frequently accessed by primary key
  • Small lookup tables
  • Tables used repeatedly in nested loop joins
  • Tables with narrow rows
  • Cases where primary key order is important

03 Trade-Offs

IOTs are not always better.

Because the table is organized by primary key, changes can affect the index structure. They may be less flexible for some update patterns.

Secondary indexes on IOTs can also be more complex because rows may move within the index structure.

04 When to Consider IOT

Consider an IOT when primary key access dominates and avoiding extra table lookup is valuable.

Be cautious when the table has many columns, frequent updates, or many secondary access paths.

05 Practical Example

An IOT-style design is most useful when queries often access rows by primary key:

example.sql
SELECT *
FROM code_lookup
WHERE code_id = :code_id;

If the row data is stored in the index structure, the database may avoid an extra table lookup.

06 Key Takeaway

An index-organized table stores data in primary key index order.

It can reduce lookup cost in the right workload, but it should be chosen based on access patterns, not because it sounds faster.

카테고리 SQL

How to Design Table Partitions

SQL Tuning Series  ·  Episode 27

How to Design Table Partitions

Use a practical process for table partition design, including candidate tables, partition keys, partition count, and index strategy.

Partitioning should be designed carefully.

This article is part of a beginner-friendly SQL tuning series. The goal is to connect each concept to practical query performance, not just define terminology.

A poor partition design can add complexity without improving performance.

Here is a practical process for thinking about table partitions.

01 1. Choose Candidate Tables

Partitioning is usually considered for tables that are:

  • Very large
  • Frequently accessed
  • Expected to grow significantly
  • Expensive to maintain
  • Limited by hardware or I/O capacity
  • Often processed by date, region, or another natural grouping

Small tables usually do not need partitioning.

02 2. Study Access Patterns

Ask how applications query the table.

Do queries filter by date? Customer region? Organization? Product group?

The partition key should match common filters. Otherwise, partition pruning may not happen.

03 3. Select a Partition Key

A good partition key should support:

  • Business meaning
  • Useful pruning
  • Balanced distribution
  • Manageable partition size
  • Future growth

For example, order_date is often a good candidate for time-based transactional tables.

04 4. Choose the Partition Type

Select range, hash, list, or composite partitioning based on the data and query pattern.

Range partitioning fits date ranges. Hash partitioning fits even distribution. List partitioning fits known categories.

05 5. Decide Partition Count

Too few partitions may not reduce enough work.

Too many partitions may add management and optimizer overhead.

Partition count should reflect data size, query patterns, and maintenance operations.

06 6. Design Indexes Together

Partitioning and indexing should be planned together.

Local and global index choices can affect query performance and maintenance complexity.

07 Practical Example

If most large-table queries include a date range:

example.sql
WHERE order_date >= :start_date
  AND order_date < :end_date

then order_date may be a strong partition key candidate.

If queries rarely filter by date, date partitioning may help maintenance but may not improve query performance much.

08 Key Takeaway

Partition design is not just splitting a table.

It is matching physical storage to real query and maintenance patterns so the DBMS can read less data and manage large tables more effectively.

카테고리 SQL