American Express interview questions test more than syntax. You’ll be judged on SQL fluency, data quality thinking, and your ability to turn numbers into business insight. For candidates with 0–3 years of experience, compensation can reach 17–19 LPA, so strong preparation pays off. This guide covers practical SQL patterns, analytics scenarios, and clear explanations you can adapt during interviews.
1) How to find duplicate rows
In order to find the duplicate records and rows in a table we can use the below sql query.
SELECT col1, col2, COUNT(*) AS dup_count
FROM your_table
GROUP BY col1, col2
HAVING COUNT(*) > 1;
Explanation:
GROUP BY col1, col2
puts rows with the same(col1, col2)
values into the same bucket.COUNT(*)
counts how many rows fall into each bucket.HAVING COUNT(*) > 1
filters to only those buckets that appear more than once (i.e., duplicates).
de-dup variant (keep 1 row, mark the rest)
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY created_at) AS rn
FROM your_table
)
SELECT * FROM ranked WHERE rn > 1; -- these are the extras
2) Write a SQL query to find the employee who earns more than manager (self-join)
In order to find the employee who earns more than a manager, we can use self-join and SQL is as below.
SELECT e.name AS employee, e.salary, m.name AS manager, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.emp_id
WHERE e.salary > m.salary;
Explanation
employees
is joined to itself:e
= employee row;m
= manager row.- the link is
e.manager_id = m.emp_id
. - the
WHERE
compares salaries and keeps cases where the employee’s is higher.
3) Find Second-highest salary (distinct)
We can use the aggregation on salary by doing the max(salary) to find the second highest salary, we can use the below query.
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
Explanation:
- inner subquery gets the overall highest salary
max
. - outer query finds the maximum salary strictly less than
max
→ the “second distinct” highest.
Points to Ponder
- if the table has only one distinct salary, the result is
NULL
. - if you need “2nd place incl. ties”, use
DENSE_RANK()
:
SELECT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS r
FROM employees
) s
WHERE r = 2;
4) What GROUP BY + HAVING is doing?
SELECT department_id, COUNT(*) AS emp_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;
Explanation:
- groups by department and counts employees.
HAVING
filters the aggregated result (not individual rows) to departments with more than 5 employees.
5) Window running total (cumulative sales)
SELECT order_date, product_id, sales_amount,
SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY order_date) AS cumulative_sales
FROM sales;
Explanation:
PARTITION BY product_id
: restart the running sum for each product.ORDER BY order_date
: accumulate chronologically.- returns one row per original row (unlike
GROUP BY
which collapses rows).
Points to remember
- duplicate timestamps can cause non-deterministic order; add a tie-breaker (
ORDER BY order_date, order_id
). - some DBs use a default frame; if needed, be explicit:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
6) Window ranking (ROW_NUMBER vs RANK)
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_num
FROM employees;
Explanation:
ROW_NUMBER
→ will give unique records with the sequence as 1,2,3,4.. alwaysRANK
→ will give ties and share the same rank (1,1,3,4… skips numbers).However, whenever there is a tie, it will skip the next record.
use cases
ROW_NUMBER
for picking a single “top” row per group.RANK
when ties matter.
7) Find Top-3 products from Sales (basic + ties)
SELECT product_id, product_name, total_sales
FROM sales_data
ORDER BY total_sales DESC
LIMIT 3;
what it’s doing
- sorts by
total_sales
and takes first 3.
ties version
SELECT product_id, product_name, total_sales
FROM (
SELECT *, RANK() OVER (ORDER BY total_sales DESC) AS r
FROM sales_data
) s
WHERE r <= 3;
8) UNION vs UNION ALL
SELECT city FROM customers
UNION
SELECT city FROM vendors;
what it’s doing
UNION
concatenates results and removes duplicates (sort/dedup step).
SELECT city FROM customers
UNION ALL
SELECT city FROM vendors;
UNION ALL
keeps duplicates and is faster.
rule of thumb
- if duplicates are meaningful to the analysis (e.g., total counts), use
UNION ALL
. - if you need a distinct list, use
UNION
.
9) CASE (conditional buckets)
SELECT name, salary,
CASE
WHEN salary >= 100000 THEN 'High'
WHEN salary >= 50000 THEN 'Medium'
ELSE 'Low'
END AS salary_band
FROM employees;
what it’s doing
- evaluates conditions top-down and returns the first matching label.
tips
- you can put
CASE
inSELECT
,WHERE
, andORDER BY
. - always add an
ELSE
for clarity.
10) CTE (Common Table Expression)
WITH HighEarners AS (
SELECT emp_id, name, salary
FROM employees
WHERE salary > 100000
)
SELECT * FROM HighEarners;
why use it
- breaks complex logic into named steps (readability).
- can be reused within the same query.
- supports recursion (hierarchies, date series).
performance note
- in most modern engines, non-recursive CTEs are inlined (similar to subqueries), but check plans.
11) Customers with > ₹5,000 transactions more than once
SELECT customer_id, COUNT(*) AS high_value_txns
FROM transactions
WHERE transaction_amount > 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;
what it’s doing
- filters to “high value” rows first.
- groups by customer and keeps those with count > 1.
pitfall
- currency normalization (₹ vs $) and outliers; you can also design thresholds by percentile.
12) DELETE vs TRUNCATE (when to use)
DELETE
- removes chosen rows (
WHERE
), logs each change, supports rollback. - use for targeted cleanup.
TRUNCATE
- drops all rows quickly, resets identity in many DBs, minimal logging.
- use for full table refresh when safe.
13) Find No purchases in last 6 months (anti-join pattern)
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN transactions t
ON c.customer_id = t.customer_id
AND t.transaction_date >= CURRENT_DATE - INTERVAL '6 months'
WHERE t.customer_id IS NULL;
what it’s doing
LEFT JOIN
keeps all customers.- the
WHERE ... IS NULL
means: “no matching transactions in the last 6 months”.
dialect notes
- MySQL:
CURRENT_DATE - INTERVAL 6 MONTH
- Postgres:
CURRENT_DATE - INTERVAL '6 months'
- SQL Server:
DATEADD(month, -6, GETDATE())
index tip
- index
(customer_id, transaction_date)
for pruning.
14) How to Handle NULLs in SQL data.
- detect:
col IS NULL
/col IS NOT NULL
- replace:
COALESCE(col, 'Not Provided')
(MySQL:IFNULL
) - aggregates ignore NULLs by default.
example
SELECT name, COALESCE(phone, 'Not Provided') AS phone_display
FROM customers;
15) How you can Convert rows to column in SQL
Pivot rows → columns (conditional aggregation)
SELECT region,
SUM(CASE WHEN month = 'Jan' THEN sales_amount ELSE 0 END) AS Jan,
SUM(CASE WHEN month = 'Feb' THEN sales_amount ELSE 0 END) AS Feb,
SUM(CASE WHEN month = 'Mar' THEN sales_amount ELSE 0 END) AS Mar
FROM sales
GROUP BY region;
what it’s doing
- turns month values into separate columns via conditional sums.
dialect
- SQL Server/Oracle also have
PIVOT
syntax; Postgres/MySQL typically use this conditional approach.
16) What is Indexing in SQL, write the syntax
CREATE INDEX idx_transactions_customer_id
ON transactions(customer_id);
what it’s doing
- builds a sorted/searchable structure (often B-tree) on
customer_id
, so lookups and joins don’t scan the whole table.
best practices
- index columns used in
WHERE
,JOIN
,ORDER BY
,GROUP BY
. - avoid indexing very low-cardinality columns (e.g., boolean).
- composite indexes should match your common predicate order.
- more indexes → slower writes. strike a balance.
17) Find the Max transaction per customer
SELECT customer_id, MAX(amount) AS max_transaction
FROM transactions
GROUP BY customer_id;
Explanation:
- groups transactions per customer and picks the largest amount.
if you also need the transaction row
SELECT customer_id, amount, transaction_id
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, transaction_id) AS rn
FROM transactions
) t
WHERE rn = 1;
18) Write a Query to find the Empoyee reporting to a Manager
Self-join (general pattern)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
19) Most profitable regions (revenue example)
SELECT region, SUM(amount) AS total_revenue
FROM transactions
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 3;
profit variant
- if you have costs, compute
SUM(amount - cost)
orSUM(amount) - SUM(cost)
.
21) OLTP vs OLAP
- OLTP: normalized, small fast transactions (banking/e-commerce apps).
- OLAP: denormalized star/snowflake, heavy analytical queries (BI dashboards).
- pipelines move data from OLTP → OLAP periodically.
22) LEFT JOIN vs INNER JOIN (when LEFT is better)
SELECT c.customer_id, c.name, t.transaction_id, t.amount
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id;
why LEFT JOIN
- you want to see all customers even if they’ve never transacted.
- INNER JOIN would drop the “no-transaction” customers.
23) Query optimization
- Projection: select only needed columns (reduces I/O).
- Predicates: sargable conditions so indexes can be used (avoid
FUNCTION(col)
inWHERE
). - Indexes: on join/filter/sort columns; composite in correct order.
- Joins: reduce early (filter before big joins), avoid unnecessary nesting.
- Plans: use
EXPLAIN
/EXPLAIN ANALYZE
to spot scans/hotspots. - Partitions: date partition large facts for pruning.
24) UNION/ALL + DISTINCT salary 2nd place (bonus tie-aware)
SELECT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS r
FROM employees
) s
WHERE r = 2;
Also Read: Interview Question: Explain Snowflake Architecture