110
Views

Recently, I had the opportunity to appear for a Walk-in drive conducted by NAB (National Australia Bank) at Gurugram for the Data Engineer position. As someone deeply passionate about SQL and large-scale data transformation pipelines, I was eager to test my skills in a real-world setting. One part of the selection process included a Codility-based assessment, which was designed to evaluate technical thinking and coding skills specifically in SQL logic construction and optimization.


Problem Statment

Note:Though I don’t recall the exact wording of the problem, the structure and logic were very similar to the following scenario:

A passenger wants to travel from New York to Tokyo in the shortest possible time.
They may:

  • Take a direct flight, or
  • Make one stopover, i.e., at most one connecting flight is allowed.

The conditions for a valid journey are:

  1. The journey must start at any airport in New York and end at any airport in Tokyo.
  2. If using two flights, the following must be true:
    • The arrival airport of the first flight must match the departure airport of the second flight.
    • The end time of the first flight must be less than or equal to the start time of the second flight.

You’re given two tables:

  • airports: Contains city_name and port_code.
  • flights: Contains details like start_time, end_time, start_port, end_port.
CREATE TABLE airports (
    port_code VARCHAR(10) PRIMARY KEY,
    city_name VARCHAR(100)
);

CREATE TABLE flights (
    flight_id varchar (10),
    start_port VARCHAR(10),
    end_port VARCHAR(10),
    start_time datetime,
    end_time datetime
);


INSERT INTO airports (port_code, city_name) VALUES
('JFK', 'New York'),
('LGA', 'New York'),
('EWR', 'New York'),
('LAX', 'Los Angeles'),
('ORD', 'Chicago'),
('SFO', 'San Francisco'),
('HND', 'Tokyo'),
('NRT', 'Tokyo'),
('KIX', 'Osaka');


INSERT INTO flights VALUES
(1, 'JFK', 'HND', '2025-06-15 06:00', '2025-06-15 18:00'),
(2, 'JFK', 'LAX', '2025-06-15 07:00', '2025-06-15 10:00'),
(3, 'LAX', 'NRT', '2025-06-15 10:00', '2025-06-15 22:00'),
(4, 'JFK', 'LAX', '2025-06-15 08:00', '2025-06-15 11:00'),
(5, 'LAX', 'KIX', '2025-06-15 11:30', '2025-06-15 22:00'),
(6, 'LGA', 'ORD', '2025-06-15 09:00', '2025-06-15 12:00'),
(7, 'ORD', 'HND', '2025-06-15 11:30', '2025-06-15 23:30'),
(8, 'EWR', 'SFO', '2025-06-15 09:00', '2025-06-15 12:00'),
(9, 'LAX', 'HND', '2025-06-15 13:00', '2025-06-15 23:00'),
(10, 'KIX', 'NRT', '2025-06-15 08:00', '2025-06-15 10:00');

Solution Approach

  1. Identify relevant airports for New York and Tokyo using the airports table.
  2. Filter flights that start in New York and end in Tokyo (direct).
  3. Join flights to find valid connections (f1 → f2), ensuring:
    • f1’s arrival port matches f2’s departure port.
    • f1’s end_time is f2’s start_time.
  4. Calculate total travel time in minutes using Redshift’s DATEDIFF(MINUTE, ...) function.
  5. Return the minimum travel time from either direct or connected routes.

SQL Solution

WITH all_flight AS (
    SELECT 
        fl.*,
        s.city_name AS start_city,
        e.city_name AS end_city
    FROM flights fl
    INNER JOIN airports s 
        ON fl.start_port = s.port_code
    INNER JOIN airports e 
        ON fl.end_port = e.port_code
),

direct_flight AS (
    SELECT 
        flight_id,
        start_city,
        NULL AS middle_city,
        end_city,
        start_time,
        end_time,
        (EXTRACT(EPOCH FROM (end_time::timestamp - start_time::timestamp)) / 60)::INT AS duration_of_flight
    FROM all_flight
    WHERE start_city = 'New York' 
      AND end_city = 'Tokyo'
),

connecting_flight AS (
    SELECT 
        a.flight_id || ';' || b.flight_id AS connecting_flight,
        a.start_city AS trip_start_city,
        a.end_city AS trip_middle_city,
        b.end_city AS trip_end_city,
        a.start_time AS first_flight_start,
        a.end_time AS first_flight_end,
        b.start_time AS second_flight_start,
        b.end_time AS second_flight_end,
        (EXTRACT(EPOCH FROM (b.end_time::timestamp - a.start_time::timestamp)) / 60)::INT AS duration_of_flight
    FROM (
        SELECT * 
        FROM all_flight 
        WHERE start_city = 'New York'
    ) a
    INNER JOIN (
        SELECT * 
        FROM all_flight 
        WHERE end_city = 'Tokyo'
    ) b 
        ON a.end_city = b.start_port
    WHERE b.start_time >= a.end_time
),

shortest_flight AS (
    SELECT 
        flight_id AS connecting_flight,
        start_city AS trip_start_city,
        middle_city AS trip_middle_city,
        end_city AS trip_end_city,
        duration_of_flight
    FROM direct_flight

    UNION ALL

    SELECT 
        connecting_flight,
        trip_start_city,
        trip_middle_city,
        trip_end_city,
        duration_of_flight
    FROM connecting_flight
)

SELECT *
FROM shortest_flight
ORDER BY duration_of_flight

  

Final Thoughts

Even though I did not cracked the final round, this walk-in drive by NAB was an enriching experience. It reaffirmed my passion for data engineering logic and SQL performance tuning. I’m continuing to prepare for more such challenges and hope this article helps fellow engineers sharpen their SQL muscles.

Article Categories:
ETL · SQL

All Comments

Leave a Reply

Your email address will not be published. Required fields are marked *