ETL Developer Interview Questions
13
Views

My Experience at Gemini Solutions, Gurugram

Recently, I appeared for an ETL Developer interview at Gemini Solutions in Gurugram. The interview was a blend of technical concepts, SQL problem-solving, and practical data engineering knowledge. Sharing my experience here might help others preparing for similar roles, as the questions I faced tested both theory and real-world application.

Photo Credit: Gemini Solutions

1. Tell me about yourself

I focused on giving a short professional introduction, covering my experience with ETL tools like Informatica, my knowledge of cloud data warehouses such as Snowflake and Redshift, and the kind of projects I have worked on—like migrating ETL pipelines from on-premises systems to cloud platforms. I also highlighted my problem-solving abilities, collaboration with cross-functional teams, and the business domains I had exposure to, such as insurance and retail.

2. What is the difference between OLAP and OLTP? How are you implementing them?

The next question was about distinguishing OLAP (Online Analytical Processing) from OLTP (Online Transaction Processing). I explained that OLTP systems are designed for transactional workloads like order entries, payments, and reservations. These systems are optimized for quick inserts and updates. On the other hand, OLAP systems are designed for analysis, where large volumes of data are aggregated and queried for insights. In ETL processes, OLTP systems often serve as the source, while OLAP systems become the target for building data warehouses and reporting solutions. I also gave examples from projects where we extracted data from transactional systems and transformed it into star schema models for OLAP reporting.

3. What is a fact and a dimension? How are you segregating the data?

I explained that fact tables store measurable and quantitative data such as revenue, sales, or transaction amounts, while dimension tables store descriptive attributes such as customer, product, or date. To make it clear, I gave the example of an online shopping event where the purchase amount goes into the fact table, and the customer and product details go into dimension tables. This question emphasized the importance of structuring data in a way that supports analytical queries efficiently.

4. How many different types of dimensions are there?

The interviewer then moved deeper into dimensional modeling by asking about the different types of dimensions. I explained the concepts of conformed dimensions (shared across multiple fact tables), degenerate dimensions (keys stored in fact tables, such as invoice numbers), junk dimensions (a combination of miscellaneous attributes like flags and indicators), and slowly changing dimensions (SCDs) that manage historical data changes. I also highlighted the difference between SCD Type 1, Type 2, and Type 3, as these are often used in real projects.

5. Write a SQL query to find a manager with 5 employees reporting to them

One of the more interesting SQL challenges was based on the Employee table. The task was to identify managers who had exactly five employees reporting to them. Looking at the sample dataset, John (id 101) is the manager for Dan, James, Amy, Anne, and Ron—five employees in total.

To solve this, I explained that we can use a self-join on the Employee table, grouping by manager and counting their direct reports. The query looked like this:

SELECT m.name AS Manager_Name, COUNT(e.id) AS Employee_Count
FROM Employee e
JOIN Employee m ON e.managerId = m.id
GROUP BY m.name
HAVING COUNT(e.id) = 5;

This query returns “John” as the result, since he has exactly five employees under him. I also added that if the requirement were to find managers with at least five employees, we could simply change the condition to HAVING COUNT(e.id) >= 5.

6. SQL Puzzle: Find consecutive numbers in Logs

Another SQL puzzle involved a table named Logs with columns id (auto-incremented primary key) and num. The task was to find all numbers that appear at least three times consecutively. For example, if the table contained three consecutive rows of 1, then the result should return 1.

To solve this, I used a self-join approach, where the table is joined with itself to check three consecutive records:

SELECT DISTINCT l1.num
FROM Logs l1, Logs l2, Logs l3
WHERE l1.num = l2.num
  AND l2.num = l3.num
  AND l1.id = l2.id - 1
  AND l2.id = l3.id - 1;

7. How would you partition 120 million records in Snowflake?

With such a huge dataset, partitioning is critical for performance. I explained that Snowflake uses micro-partitions automatically, but performance can be further improved by defining clustering keys. The choice of clustering columns depends on query patterns. For example, if most queries filter by date or region, those columns would be ideal. I also mentioned that Snowflake supports automatic clustering, which keeps partitions optimized in the background.

8. How would you retrieve a previous version of a file in AWS S3?

This was a practical cloud-related question. If versioning is enabled in an S3 bucket, you can navigate to the bucket, enable “List Versions,” and restore the required version of the file. However, if versioning is not enabled, it is not possible to roll back once the file is deleted or overwritten. This question was aimed at checking my familiarity with AWS storage management.

9. What will be the result of FLOOR(12.75) and FLOOR(-12.75)?

The result of FLOOR(12.75) is 12, since the floor function returns the largest integer less than or equal to the number. For FLOOR(-12.75), the result is -13, because -13 is the largest integer less than -12.75. This was a small but important reminder that mathematical functions in SQL behave differently with negative numbers, something interviewers like to test.

Conclusion

These were the major questions I was able to recall from my interview at Gemini Solutions. They may not be word-for-word the same as asked, but they capture the essence of the topics covered. I will keep updating this list as I remember more or if I come across similar questions in future interviews.

For anyone preparing for an ETL developer role, my advice would be to strengthen your SQL fundamentals, review dimensional modeling concepts, and get comfortable with cloud tools like Snowflake and AWS. Above all, be prepared to explain your answers with examples from real projects because that’s what really makes an impression.

Also Read: Interview Question: Explain Snowflake Architecture

Article Categories:
ETL · SQL

Leave a Reply

Your email address will not be published. Required fields are marked *