Recently, I had the opportunity to appear for a Walk-in drive conducted by NAB (National Australia Bank) at Gurugram for the Data Engineer position. As someone deeply passionate about SQL and large-scale data transformation pipelines, I was eager to test my skills in a real-world setting. One part of the selection process included a Codility-based assessment, which was designed to evaluate technical thinking and coding skills specifically in SQL logic construction and optimization.
Problem Statment
Note:Though I don’t recall the exact wording of the problem, the structure and logic were very similar to the following scenario:
A passenger wants to travel from New York to Tokyo in the shortest possible time.
They may:
- Take a direct flight, or
- Make one stopover, i.e., at most one connecting flight is allowed.
The conditions for a valid journey are:
- The journey must start at any airport in New York and end at any airport in Tokyo.
- If using two flights, the following must be true:
- The arrival airport of the first flight must match the departure airport of the second flight.
- The end time of the first flight must be less than or equal to the start time of the second flight.
You’re given two tables:
airports
: Containscity_name
andport_code
.flights
: Contains details likestart_time
,end_time
,start_port
,end_port
.
CREATE TABLE airports (
port_code VARCHAR(10) PRIMARY KEY,
city_name VARCHAR(100)
);
CREATE TABLE flights (
flight_id varchar (10),
start_port VARCHAR(10),
end_port VARCHAR(10),
start_time datetime,
end_time datetime
);
INSERT INTO airports (port_code, city_name) VALUES
('JFK', 'New York'),
('LGA', 'New York'),
('EWR', 'New York'),
('LAX', 'Los Angeles'),
('ORD', 'Chicago'),
('SFO', 'San Francisco'),
('HND', 'Tokyo'),
('NRT', 'Tokyo'),
('KIX', 'Osaka');
INSERT INTO flights VALUES
(1, 'JFK', 'HND', '2025-06-15 06:00', '2025-06-15 18:00'),
(2, 'JFK', 'LAX', '2025-06-15 07:00', '2025-06-15 10:00'),
(3, 'LAX', 'NRT', '2025-06-15 10:00', '2025-06-15 22:00'),
(4, 'JFK', 'LAX', '2025-06-15 08:00', '2025-06-15 11:00'),
(5, 'LAX', 'KIX', '2025-06-15 11:30', '2025-06-15 22:00'),
(6, 'LGA', 'ORD', '2025-06-15 09:00', '2025-06-15 12:00'),
(7, 'ORD', 'HND', '2025-06-15 11:30', '2025-06-15 23:30'),
(8, 'EWR', 'SFO', '2025-06-15 09:00', '2025-06-15 12:00'),
(9, 'LAX', 'HND', '2025-06-15 13:00', '2025-06-15 23:00'),
(10, 'KIX', 'NRT', '2025-06-15 08:00', '2025-06-15 10:00');
Solution Approach
- Identify relevant airports for New York and Tokyo using the
airports
table. - Filter flights that start in New York and end in Tokyo (direct).
- Join flights to find valid connections (f1 → f2), ensuring:
- f1’s arrival port matches f2’s departure port.
- f1’s
end_time
is ≤ f2’sstart_time
.
- Calculate total travel time in minutes using Redshift’s
DATEDIFF(MINUTE, ...)
function. - Return the minimum travel time from either direct or connected routes.
SQL Solution
WITH all_flight AS (
SELECT
fl.*,
s.city_name AS start_city,
e.city_name AS end_city
FROM flights fl
INNER JOIN airports s
ON fl.start_port = s.port_code
INNER JOIN airports e
ON fl.end_port = e.port_code
),
direct_flight AS (
SELECT
flight_id,
start_city,
NULL AS middle_city,
end_city,
start_time,
end_time,
(EXTRACT(EPOCH FROM (end_time::timestamp - start_time::timestamp)) / 60)::INT AS duration_of_flight
FROM all_flight
WHERE start_city = 'New York'
AND end_city = 'Tokyo'
),
connecting_flight AS (
SELECT
a.flight_id || ';' || b.flight_id AS connecting_flight,
a.start_city AS trip_start_city,
a.end_city AS trip_middle_city,
b.end_city AS trip_end_city,
a.start_time AS first_flight_start,
a.end_time AS first_flight_end,
b.start_time AS second_flight_start,
b.end_time AS second_flight_end,
(EXTRACT(EPOCH FROM (b.end_time::timestamp - a.start_time::timestamp)) / 60)::INT AS duration_of_flight
FROM (
SELECT *
FROM all_flight
WHERE start_city = 'New York'
) a
INNER JOIN (
SELECT *
FROM all_flight
WHERE end_city = 'Tokyo'
) b
ON a.end_city = b.start_port
WHERE b.start_time >= a.end_time
),
shortest_flight AS (
SELECT
flight_id AS connecting_flight,
start_city AS trip_start_city,
middle_city AS trip_middle_city,
end_city AS trip_end_city,
duration_of_flight
FROM direct_flight
UNION ALL
SELECT
connecting_flight,
trip_start_city,
trip_middle_city,
trip_end_city,
duration_of_flight
FROM connecting_flight
)
SELECT *
FROM shortest_flight
ORDER BY duration_of_flight

Final Thoughts
Even though I did not cracked the final round, this walk-in drive by NAB was an enriching experience. It reaffirmed my passion for data engineering logic and SQL performance tuning. I’m continuing to prepare for more such challenges and hope this article helps fellow engineers sharpen their SQL muscles.
Pingback: Interview Question: Explain Snowflake Architecture - Tank Seekers
ok