American Express SQL Question
40
Views

American Express interview questions test more than syntax. You’ll be judged on SQL fluency, data quality thinking, and your ability to turn numbers into business insight. For candidates with 0–3 years of experience, compensation can reach 17–19 LPA, so strong preparation pays off. This guide covers practical SQL patterns, analytics scenarios, and clear explanations you can adapt during interviews.

1) How to find duplicate rows

In order to find the duplicate records and rows in a table we can use the below sql query.

SELECT col1, col2, COUNT(*) AS dup_count
FROM your_table
GROUP BY col1, col2
HAVING COUNT(*) > 1;

Explanation:

  • GROUP BY col1, col2 puts rows with the same (col1, col2) values into the same bucket.
  • COUNT(*) counts how many rows fall into each bucket.
  • HAVING COUNT(*) > 1 filters to only those buckets that appear more than once (i.e., duplicates).

de-dup variant (keep 1 row, mark the rest)

WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY created_at) AS rn
  FROM your_table
)
SELECT * FROM ranked WHERE rn > 1; -- these are the extras

2) Write a SQL query to find the employee who earns more than manager (self-join)

In order to find the employee who earns more than a manager, we can use self-join and SQL is as below.

SELECT e.name AS employee, e.salary, m.name AS manager, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.emp_id
WHERE e.salary > m.salary;

Explanation

  • employees is joined to itself: e = employee row; m = manager row.
  • the link is e.manager_id = m.emp_id.
  • the WHERE compares salaries and keeps cases where the employee’s is higher.

3) Find Second-highest salary (distinct)

We can use the aggregation on salary by doing the max(salary) to find the second highest salary, we can use the below query.

SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Explanation:

  • inner subquery gets the overall highest salary max.
  • outer query finds the maximum salary strictly less than max → the “second distinct” highest.

Points to Ponder

  • if the table has only one distinct salary, the result is NULL.
  • if you need “2nd place incl. ties”, use DENSE_RANK():
SELECT salary
FROM (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS r
  FROM employees
) s
WHERE r = 2;

4) What GROUP BY + HAVING is doing?

SELECT department_id, COUNT(*) AS emp_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;

Explanation:

  • groups by department and counts employees.
  • HAVING filters the aggregated result (not individual rows) to departments with more than 5 employees.

5) Window running total (cumulative sales)

SELECT order_date, product_id, sales_amount,
       SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY order_date) AS cumulative_sales
FROM sales;

Explanation:

  • PARTITION BY product_id: restart the running sum for each product.
  • ORDER BY order_date: accumulate chronologically.
  • returns one row per original row (unlike GROUP BY which collapses rows).

Points to remember

  • duplicate timestamps can cause non-deterministic order; add a tie-breaker (ORDER BY order_date, order_id).
  • some DBs use a default frame; if needed, be explicit:
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

6) Window ranking (ROW_NUMBER vs RANK)

SELECT name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
       RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank_num
FROM employees;

Explanation:

  • ROW_NUMBER → will give unique records with the sequence as 1,2,3,4.. always
  • RANK → will give ties and share the same rank (1,1,3,4… skips numbers).However, whenever there is a tie, it will skip the next record.

use cases

  • ROW_NUMBER for picking a single “top” row per group.
  • RANK when ties matter.

7) Find Top-3 products from Sales (basic + ties)

SELECT product_id, product_name, total_sales
FROM sales_data
ORDER BY total_sales DESC
LIMIT 3;

what it’s doing

  • sorts by total_sales and takes first 3.

ties version

SELECT product_id, product_name, total_sales
FROM (
  SELECT *, RANK() OVER (ORDER BY total_sales DESC) AS r
  FROM sales_data
) s
WHERE r <= 3;

8) UNION vs UNION ALL

SELECT city FROM customers
UNION
SELECT city FROM vendors;

what it’s doing

  • UNION concatenates results and removes duplicates (sort/dedup step).
SELECT city FROM customers
UNION ALL
SELECT city FROM vendors;
  • UNION ALL keeps duplicates and is faster.

rule of thumb

  • if duplicates are meaningful to the analysis (e.g., total counts), use UNION ALL.
  • if you need a distinct list, use UNION.

9) CASE (conditional buckets)

SELECT name, salary,
CASE
  WHEN salary >= 100000 THEN 'High'
  WHEN salary >=  50000 THEN 'Medium'
  ELSE 'Low'
END AS salary_band
FROM employees;

what it’s doing

  • evaluates conditions top-down and returns the first matching label.

tips

  • you can put CASE in SELECT, WHERE, and ORDER BY.
  • always add an ELSE for clarity.

10) CTE (Common Table Expression)

WITH HighEarners AS (
  SELECT emp_id, name, salary
  FROM employees
  WHERE salary > 100000
)
SELECT * FROM HighEarners;

why use it

  • breaks complex logic into named steps (readability).
  • can be reused within the same query.
  • supports recursion (hierarchies, date series).

performance note

  • in most modern engines, non-recursive CTEs are inlined (similar to subqueries), but check plans.

11) Customers with > ₹5,000 transactions more than once

SELECT customer_id, COUNT(*) AS high_value_txns
FROM transactions
WHERE transaction_amount > 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;

what it’s doing

  • filters to “high value” rows first.
  • groups by customer and keeps those with count > 1.

pitfall

  • currency normalization (₹ vs $) and outliers; you can also design thresholds by percentile.

12) DELETE vs TRUNCATE (when to use)

DELETE

  • removes chosen rows (WHERE), logs each change, supports rollback.
  • use for targeted cleanup.

TRUNCATE

  • drops all rows quickly, resets identity in many DBs, minimal logging.
  • use for full table refresh when safe.

13) Find No purchases in last 6 months (anti-join pattern)

SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN transactions t
  ON c.customer_id = t.customer_id
 AND t.transaction_date >= CURRENT_DATE - INTERVAL '6 months'
WHERE t.customer_id IS NULL;

what it’s doing

  • LEFT JOIN keeps all customers.
  • the WHERE ... IS NULL means: “no matching transactions in the last 6 months”.

dialect notes

  • MySQL: CURRENT_DATE - INTERVAL 6 MONTH
  • Postgres: CURRENT_DATE - INTERVAL '6 months'
  • SQL Server: DATEADD(month, -6, GETDATE())

index tip

  • index (customer_id, transaction_date) for pruning.

14) How to Handle NULLs in SQL data.

  • detect: col IS NULL / col IS NOT NULL
  • replace: COALESCE(col, 'Not Provided') (MySQL: IFNULL)
  • aggregates ignore NULLs by default.

example

SELECT name, COALESCE(phone, 'Not Provided') AS phone_display
FROM customers;

15) How you can Convert rows to column in SQL

Pivot rows → columns (conditional aggregation)

SELECT region,
       SUM(CASE WHEN month = 'Jan' THEN sales_amount ELSE 0 END) AS Jan,
       SUM(CASE WHEN month = 'Feb' THEN sales_amount ELSE 0 END) AS Feb,
       SUM(CASE WHEN month = 'Mar' THEN sales_amount ELSE 0 END) AS Mar
FROM sales
GROUP BY region;

what it’s doing

  • turns month values into separate columns via conditional sums.

dialect

  • SQL Server/Oracle also have PIVOT syntax; Postgres/MySQL typically use this conditional approach.

16) What is Indexing in SQL, write the syntax

CREATE INDEX idx_transactions_customer_id
ON transactions(customer_id);

what it’s doing

  • builds a sorted/searchable structure (often B-tree) on customer_id, so lookups and joins don’t scan the whole table.

best practices

  • index columns used in WHERE, JOIN, ORDER BY, GROUP BY.
  • avoid indexing very low-cardinality columns (e.g., boolean).
  • composite indexes should match your common predicate order.
  • more indexes → slower writes. strike a balance.

17) Find the Max transaction per customer

SELECT customer_id, MAX(amount) AS max_transaction
FROM transactions
GROUP BY customer_id;

Explanation:
  • groups transactions per customer and picks the largest amount.

if you also need the transaction row

SELECT customer_id, amount, transaction_id
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, transaction_id) AS rn
  FROM transactions
) t
WHERE rn = 1;

18) Write a Query to find the Empoyee reporting to a Manager

Self-join (general pattern)

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;

19) Most profitable regions (revenue example)

SELECT region, SUM(amount) AS total_revenue
FROM transactions
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 3;

profit variant

  • if you have costs, compute SUM(amount - cost) or SUM(amount) - SUM(cost).

21) OLTP vs OLAP

  • OLTP: normalized, small fast transactions (banking/e-commerce apps).
  • OLAP: denormalized star/snowflake, heavy analytical queries (BI dashboards).
  • pipelines move data from OLTP → OLAP periodically.

22) LEFT JOIN vs INNER JOIN (when LEFT is better)

SELECT c.customer_id, c.name, t.transaction_id, t.amount
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id;

why LEFT JOIN

  • you want to see all customers even if they’ve never transacted.
  • INNER JOIN would drop the “no-transaction” customers.

23) Query optimization

  • Projection: select only needed columns (reduces I/O).
  • Predicates: sargable conditions so indexes can be used (avoid FUNCTION(col) in WHERE).
  • Indexes: on join/filter/sort columns; composite in correct order.
  • Joins: reduce early (filter before big joins), avoid unnecessary nesting.
  • Plans: use EXPLAIN/EXPLAIN ANALYZE to spot scans/hotspots.
  • Partitions: date partition large facts for pruning.

24) UNION/ALL + DISTINCT salary 2nd place (bonus tie-aware)

SELECT salary
FROM (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS r
  FROM employees
) s
WHERE r = 2;

Also Read: Interview Question: Explain Snowflake Architecture
Article Categories:
ETL · SQL

Leave a Reply

Your email address will not be published. Required fields are marked *