162
Views

Modern databases often contain data that is not flat but organized in the form of trees or parent–child relationships. Examples include organizational charts, product categories, folder structures, and bill of materials. Working with this type of data requires a special type of SQL technique called a hierarchical query.

This article explains hierarchical queries in simple terms. It covers Oracle’s CONNECT BY model. The article explores Recursive CTEs used in PostgreSQL/MySQL/SQL Server. It includes ready-to-run examples with table creation and sample data.

Whether you’re an ETL developer, SQL analyst, or someone preparing for interviews, this guide covers everything you need.

What Is a Hierarchical Query?

A hierarchical query is an SQL technique used to navigate data that is linked to itself in a parent–child manner.

Common examples include:

  • Employee → Manager reporting chain
  • Category → Subcategory
  • Folder → Subfolder
  • Comment → Reply threads
  • Parent → Child relationships in CRM or ERP systems

In a typical hierarchy, the top-most element is called the root, and elements underneath it are called children, which may themselves have children, forming a multi-level tree.

Traditional SQL is designed for flat data, so hierarchical queries help us traverse these tree-like structures efficiently.

Why Are Hierarchical Queries Important?

Hierarchical queries allow you to answer questions like:

  • “Who reports to whom in an organization?”
  • “Show all subcategories under Electronics.”
  • “List all components used to build a product.”
  • “How many levels exist under this node?”

They are essential in:

✔ ETL pipelines
✔ Data warehouses
✔ BI reporting
✔ ERP and CRM systems
✔ Organization charts
✔ E-commerce category navigation

Hierarchical Queries in Oracle (CONNECT BY)

Oracle provides built-in keywords to work with hierarchical data:

  • START WITH → identifies the root node
  • CONNECT BY → defines parent–child relationship
  • PRIOR → used to link parent and child
  • LEVEL → shows depth of hierarchy

Oracle Example

SELECT 
    emp_id,
    emp_name,
    manager_id,
    LEVEL AS hierarchy_level
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR emp_id = manager_id
ORDER BY LEVEL, emp_id;

Oracle handles recursion internally and is extremely efficient for hierarchical trees.

Hierarchical Queries in PostgreSQL, MySQL, SQL Server (Recursive CTE)

Databases like PostgreSQL, SQL Server, MySQL 8+, and Redshift use Recursive Common Table Expressions (CTEs) instead of CONNECT BY.

A recursive CTE has two parts:

✔ 1. Anchor Member

Selects the root element.

✔ 2. Recursive Member

Selects children and links them to the previous result.The query keeps calling itself until there are no more children.

What Is a Recursive CTE?

A recursive CTE is a SQL expression that references itself to repeatedly process hierarchical data.Think of it as a loop inside SQL that keeps running until all levels of the hierarchy have been processed.

Structure of Recursive CTE:

WITH RECURSIVE cte_name AS (
    -- Anchor (root)
    SELECT ...

    UNION ALL

    -- Recursive member
    SELECT ...
    FROM table
    JOIN cte_name
)
SELECT * FROM cte_name;

The recursion stops automatically when the query finds no more rows.

Practical Example: Employee Reporting Hierarchy

Let’s build a complete example using PostgreSQL.


⭐ 1. Create Table (PostgreSQL)

CREATE TABLE employees (
    emp_id       INT PRIMARY KEY,
    emp_name     VARCHAR(100) NOT NULL,
    manager_id   INT NULL,
    CONSTRAINT fk_manager FOREIGN KEY (manager_id)
        REFERENCES employees(emp_id)
        ON DELETE SET NULL
);

⭐ 2. Insert Sample Data

INSERT INTO employees (emp_id, emp_name, manager_id) VALUES
(1, 'John',   NULL),   -- CEO
(2, 'Amit',     1),    -- Reports to John
(3, 'Priya',    1),    -- Reports to John
(4, 'Ravi',     2),    -- Reports to Amit
(5, 'Neha',     2),    -- Reports to Amit
(6, 'Sam',      3);    -- Reports to Priya

Hierarchy visually:

John (CEO)
 ├── Amit
 │     ├── Ravi
 │     └── Neha
 └── Priya
       └── Sam

Recursive CTE: Fetch the Entire Reporting Hierarchy

WITH RECURSIVE emp_tree AS (
    -- Step 1: Base/Anchor (CEO)
    SELECT 
        emp_id,
        emp_name,
        manager_id,
        1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Step 2: Recursive step (children)
    SELECT 
        e.emp_id,
        e.emp_name,
        e.manager_id,
        t.level + 1
    FROM employees e
    JOIN emp_tree t ON e.manager_id = t.emp_id
)
SELECT * 
FROM emp_tree
ORDER BY level, emp_id;

✔ Output

level | emp_id | emp_name | manager_id
----------------------------------------
1     | 1      | John     | NULL
2     | 2      | Amit     | 1
2     | 3      | Priya    | 1
3     | 4      | Ravi     | 2
3     | 5      | Neha     | 2
3     | 6      | Sam      | 3

This reveals the full chain of command from the CEO down to the last-level employees.


Simple Query: Who Reports to Whom?

SELECT 
    e.emp_name AS employee,
    m.emp_name AS manager
FROM employees e
LEFT JOIN employees m
       ON e.manager_id = m.emp_id
ORDER BY manager, employee;

✔ Output

employee | manager
-----------------------
John     | NULL
Amit     | John
Priya    | John
Ravi     | Amit
Neha     | Amit
Sam      | Priya

This query is helpful for building reporting dashboards or employee directories.

When Should You Use Hierarchical Queries?

Use hierarchical queries when:

✔ Your data has multiple levels
✔ Relationships depend on parent–child logic
✔ You need to compute org charts or drill-down structures
✔ Data naturally forms a tree

Typical business scenarios include:

  • HR reporting systems
  • E-commerce category trees
  • File directory explorers
  • Manufacturing BOM structures
  • Company reporting hierarchy

Recursive CTE vs. Oracle CONNECT BY

FeatureRecursive CTECONNECT BY
DatabasesPostgreSQL, MySQL 8+, SQL ServerOracle
DepthUnlimitedUnlimited
SyntaxModern, flexibleNative, optimized
UsageUniversal SQL standardOracle-specific

Both are reliable and powerful — choose based on your database.

Hierarchical queries are one of the most powerful SQL techniques for working with tree-structured data. Whether you’re using Oracle’s CONNECT BY or PostgreSQL’s Recursive CTEs, the goal remains the same: efficiently navigate parent–child relationships.

Understanding hierarchical queries gives you the ability to:

  • Build dynamic organization charts
  • Process nested categories
  • Design powerful ETL pipelines
  • Work with multi-level reporting structures

As data becomes more complex, mastering recursive SQL will make you a stronger data engineer or ETL developer.

Also Read:American Express SQL Interview Questions

Article Categories:
ETL · SQL

All Comments

Leave a Reply

Your email address will not be published. Required fields are marked *