Detailed Informatica PowerCenter interview questions with answers:
🎯 Pro Tip Before You Begin: If you read and understand all 25 topics covered in this guide before walking into your Informatica interview, you will be well prepared. These are not just theory questions; they are concepts interviewers ask about again and again. Go through each one carefully and understand the why behind it, not just the what, and you will walk out of that interview with full confidence. Your offer letter is just 25 concepts away. 🚀
Whether you are just starting out in the world of data engineering or preparing for an Informatica interview, this guide walks you through the 25 most important concepts in plain, simple language. No jargon overload — just clear explanations with real-world context. By the end, you will have a solid understanding of how Informatica PowerCenter works and why it matters.
What Is Informatica PowerCenter and What Are Its Main Components?
Informatica PowerCenter is one of the most widely used ETL (Extract, Transform, Load) tools in the world. Think of it as a “data movement engine” — it pulls data from different sources like databases, flat files, or cloud apps, cleans and reshapes it, and loads it into a destination like a data warehouse.
Imagine a water purification plant: raw water (raw data) comes in from multiple rivers (sources), gets filtered and processed (transformed), and clean water (refined data) is distributed to homes (targets). That is exactly what Informatica does — but for data.
| Component | What It Does |
|---|---|
| Repository | Stores all metadata — mappings, workflows, sessions. It’s the central brain of PowerCenter. |
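| Repository Server | Manages access to the repository and handles multiple client connections. Called the Repository Service in PowerCenter 8 and later. |
| PowerCenter Server | The actual runtime engine that executes ETL jobs (sessions and workflows). Called the Integration Service in PowerCenter 8 and later. |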
| Designer | Where developers build mappings — the logic for moving and transforming data. |
| Workflow Manager | Used to schedule and manage the execution of ETL processes. |
| Workflow Monitor | Displays real-time status and logs of running or completed workflows. |
| Repository Manager | Handles administration tasks like creating folders, managing users, and backup. |
Connected vs Unconnected Lookup Transformations
Lookup transformation is used to look up data in a table, much like a VLOOKUP in Excel. Informatica provides two flavours of this.
Connected Lookup:
- Directly wired into the data flow pipeline
- Can return multiple output columns
- Participates in the main flow of data
- Called once per input row automatically
- Supports dynamic caching
- Use when you need multiple return values
Unconnected Lookup:
- Called explicitly via :LKP expression
- Returns only one column per call
- Reusable across multiple transformations
- Not part of the main pipeline directly
- Only supports static cache
- Use when you need a single lookup value
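To make the difference concrete, here is a minimal Python sketch of the two calling styles. It is not Informatica code: the lookup cache is modeled as a plain dictionary, and the names `build_cache`, `connected_lookup`, `unconnected_lookup`, and the sample data are all hypothetical.

```python
# Illustrative only: a lookup cache modeled as a dict keyed by customer ID.
CUSTOMERS = [
    {"cust_id": 1, "name": "Asha", "city": "Pune"},
    {"cust_id": 2, "name": "Ben", "city": "Leeds"},
]


def build_cache(rows, key):
    """Build a static lookup cache once, before any input row is processed."""
    return {row[key]: row for row in rows}


CACHE = build_cache(CUSTOMERS, "cust_id")


def connected_lookup(order):
    """Connected style: runs for every row and can return several columns."""
    match = CACHE.get(order["cust_id"], {})
    return {**order, "name": match.get("name"), "city": match.get("city")}


def unconnected_lookup(cust_id):
    """Unconnected style: called on demand and returns a single value."""
    match = CACHE.get(cust_id)
    return match["name"] if match else None


orders = [
    {"order_id": 10, "cust_id": 1, "name": None},
    {"order_id": 11, "cust_id": 2, "name": "Ben"},
]

# Connected: every order row passes through the lookup.
enriched = [connected_lookup(o) for o in orders]

# Unconnected: call the lookup only when the name is missing, similar in
# spirit to IIF(ISNULL(name), :LKP.lkp_name(cust_id), name).
names = [
    o["name"] if o["name"] is not None else unconnected_lookup(o["cust_id"])
    for o in orders
]
```

The conditional call is the main practical reason to pick an unconnected lookup: rows that already have the value never pay for the lookup.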
What Are the Advantages of Using Informatica Over Other ETL Tools?
Informatica PowerCenter stands out from competitors for several reasons that matter in real enterprise environments.
| Advantage | Why It Matters |
|---|---|
| High Performance | Parallel execution, partitioning, and caching make it blazing fast for large volumes. |
| Visual Interface | Drag-and-drop Designer means less coding — great for business analysts too. |
| Wide Connectivity | 500+ connectors for databases, cloud apps, flat files, and APIs out of the box. |
| Scalability | Handles small to terabyte-scale data loads with the same architecture. |
| Data Quality | Built-in tools to profile, cleanse, and standardize data before loading. |
| Metadata Management | Every mapping, rule, and transformation is stored and auditable. |
| Error Handling | Robust reject file and session log mechanisms to trace and fix issues. |
| Cloud Ready | Informatica Intelligent Cloud Services (IICS) extends it to AWS, Azure, and GCP. |
Main Use Cases for Informatica in an Organization
Organizations invest in Informatica to solve data challenges at scale. Here are the primary use cases:
| Use Case | Description |
|---|---|
| Data Warehousing | Loading and maintaining large data warehouses from multiple operational sources (ERP, CRM, etc.). |
| Data Migration | Moving data from legacy systems to modern platforms without losing integrity. |
| Master Data Management | Creating a single, trusted view of customer, product, or employee data. |
| Data Integration | Combining data from different applications (SAP + Salesforce + Oracle) into one system. |
| Real-time Data Sync | Keeping source and target databases in sync with CDC (Change Data Capture). |
| Regulatory Compliance | Cleansing and auditing data for GDPR, SOX, HIPAA compliance requirements. |
| Analytics & Reporting | Feeding clean, structured data into BI tools like Tableau, Power BI, or SAP BW. |
What Is an Enterprise Data Warehouse and How Does Informatica Support It?
An Enterprise Data Warehouse (EDW) is a central repository that integrates data from all departments and systems in an organization. Unlike an operational database (which handles day-to-day transactions), an EDW is built for analysis — answering questions like “What were our sales by region over the last 5 years?”
Informatica plays a critical role in building and maintaining an EDW:
- Extracts data from dozens of source systems (ERP, CRM, HR, Finance)
- Applies business rules to clean, transform, and standardize the data
- Loads the data into dimensional models (fact and dimension tables)
- Handles incremental loads to keep the warehouse fresh (nightly or real-time)
- Manages Slowly Changing Dimensions to preserve historical data accurately
- Provides data lineage — you can trace every value back to its origin
Explain the Concept of Workflow in Informatica
A Workflow is a set of instructions that tells the PowerCenter Server what to execute and in what order. It is created in the Workflow Manager and can contain Sessions, Commands, Email tasks, Decision tasks, and more.
Here is the hierarchy to understand:
| Object | Description |
|---|---|
| Mapping | Defines the data transformation logic (built in Designer) |
| Session | A runtime instance of a mapping with source/target connections and properties |
| Workflow | Orchestrates one or more sessions with scheduling and dependencies |
| Worklet | A reusable mini-workflow that can be embedded inside other workflows |
A workflow can also contain non-session tasks such as Command, Email, and Decision tasks.
Target Load Order in Informatica and How Is It Set?
When a mapping has multiple target tables, Informatica needs to know in what order to load them — especially when there are foreign key constraints (e.g., you must insert into a parent table before a child table).
How to set it:
- Open the Mapping in the Designer
- Go to Mappings → Target Load Plan from the menu
- A dialog box opens listing the Source Qualifier transformations in the mapping, along with the targets each one feeds
- Select a Source Qualifier and use the up and down buttons to set the desired load sequence
- Save the mapping and validate
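Why the order matters can be shown with a small Python sketch that uses an in-memory SQLite database as a stand-in for the target. The table names `dept` and `emp` and the helper `load_in_order` are hypothetical; the point is only that loading a child table before its parent fails the foreign key check.

```python
import sqlite3


def load_in_order(load_order):
    """Load two targets in the given order; return True if every insert succeeds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, "
        "dept_id INTEGER REFERENCES dept(dept_id))"
    )
    loaders = {
        "dept": lambda: conn.execute("INSERT INTO dept VALUES (1, 'Sales')"),
        "emp": lambda: conn.execute("INSERT INTO emp VALUES (100, 1)"),
    }
    try:
        for target in load_order:
            loaders[target]()
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The child row referenced a parent row that was not loaded yet.
        conn.rollback()
        return False
    finally:
        conn.close()


parent_first = load_in_order(["dept", "emp"])  # parent before child
child_first = load_in_order(["emp", "dept"])   # child before parent
```

Only the parent-first order succeeds, which is the situation the load order setting is meant to guarantee.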
Types of Transformations Available in Informatica
Transformations are the building blocks of every mapping in Informatica. They are categorized as Active or Passive (covered in the Active vs Passive Transformations section), and by function:
| Transformation | Purpose |
|---|---|
| Source Qualifier | Defines how data is read from a relational source; can include SQL overrides |
| Expression | Row-level calculations and derivations without changing row count |
| Filter | Removes rows that don’t meet a specified condition |
| Aggregator | Performs group-by aggregations like SUM, COUNT, AVG, MAX, MIN |
| Lookup | Looks up data from a reference table or flat file |
| Joiner | Joins two heterogeneous data sources (like SQL JOIN) |
| Sorter | Sorts data ascending or descending on specified columns |
| Router | Routes rows to multiple groups based on conditions (like CASE WHEN) |
| Union | Merges data from multiple pipelines (like SQL UNION ALL) |
| Rank | Selects the top or bottom N rows based on a ranked column |
| Sequence Generator | Generates sequential numbers — used to create surrogate keys |
| Update Strategy | Marks rows as Insert, Update, Delete, or Reject |
| Normalizer | Converts denormalized columns into multiple rows |
| Transaction Control | Controls commit and rollback behavior during a session |
| XML Source/Target | Handles reading and writing XML data |
Aggregator vs Expression Transformation
Both are used for calculations, but they work at different levels:
Aggregator Transformation:
- Works on GROUPS of rows
- Reduces row count (Active)
- Functions: SUM, COUNT, AVG, MAX, MIN
- Usually has one or more group-by ports (without them, it returns a single row for the whole input)
- Uses memory cache for grouping
- Example: Total sales per region
Expression Transformation:
- Works on ONE row at a time
- Row count stays the same (Passive)
- Functions: IIF, DECODE, string ops, dates
- No GROUP BY needed
- Very fast and lightweight
- Example: Concatenate first + last name
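The two examples above (total sales per region, and concatenating first and last name) can be sketched in a few lines of Python. This is an analogy rather than Informatica code, and the function names `expression` and `aggregator` plus the sample rows are made up for illustration.

```python
from collections import defaultdict

sales = [
    {"region": "East", "first": "Ana", "last": "Diaz", "amount": 120.0},
    {"region": "East", "first": "Raj", "last": "Mehta", "amount": 80.0},
    {"region": "West", "first": "Li", "last": "Wong", "amount": 50.0},
]


def expression(row):
    """Row-level derivation: one row in, one row out."""
    return {**row, "full_name": f"{row['first']} {row['last']}"}


def aggregator(rows, group_by, measure):
    """Group-level calculation: many rows in, one row per group out."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return [{group_by: key, "total": total} for key, total in totals.items()]


with_names = [expression(r) for r in sales]          # still 3 rows
per_region = aggregator(sales, "region", "amount")   # 2 rows, one per region
```

Note that `aggregator` has to keep a running total per group while it reads, which is the Python equivalent of the Aggregator cache.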
Purpose of Filter Transformation in Informatica
The Filter transformation acts like a WHERE clause in SQL — it allows only rows that meet a certain condition to pass through to the next stage, and drops all others.
Key properties of Filter transformation:
| Property | Detail |
|---|---|
| Type | Active (changes row count) |
| Condition | Boolean expression that evaluates to TRUE or FALSE |
| Dropped rows | Rows that evaluate to FALSE are simply discarded — not sent to reject file |
| Best practice | Place Filter early in the pipeline to reduce volume flowing through expensive transformations |
What Is OLAP and What Are Its Different Types?
OLAP stands for Online Analytical Processing. It is a technology that enables fast, multidimensional analysis of large data sets — the kind of analysis used in business intelligence to answer questions like “How did Product A sell in the Northeast region in Q3 compared to Q3 last year?”
OLAP differs from OLTP (Online Transaction Processing) which handles day-to-day operations like recording a sale. OLAP is about analysis; OLTP is about operations.
| OLAP Type | Description | Best For |
|---|---|---|
| MOLAP | Multidimensional OLAP: data stored in pre-aggregated multidimensional cubes | Speed; fixed dimensions |
| ROLAP | Relational OLAP — queries are run against relational tables at runtime | Large datasets, flexible schemas |
| HOLAP | Hybrid OLAP — summary data in cubes, detail data in relational tables | Balance of speed and flexibility |
| DOLAP | Desktop OLAP — cube downloaded to client machine for offline analysis | Offline analytics for field teams |
| WOLAP | Web OLAP — OLAP accessed via web browser interface | Browser-based dashboards |
How Does Informatica Handle Data Partitioning?
Data partitioning is one of Informatica’s most powerful performance features. It allows a single session to process data in parallel across multiple CPU threads or nodes — significantly speeding up large data loads.
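Think of it like splitting a large pizza delivery order among 4 delivery drivers instead of one. All four work simultaneously, and in the ideal case the job gets done about 4x faster (real gains depend on how evenly the work is split and on available CPU and I/O).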
| Partition Type | How It Works |
|---|---|
| Round Robin | Rows are distributed evenly across partitions in a circular pattern |
| Hash Auto Keys | Informatica automatically picks key columns and hashes rows across partitions |
| Hash User Keys | You specify which columns to hash — good for joining or aggregating |
| Key Range | Rows are split based on value ranges (e.g., IDs 1–1000 to partition 1, 1001–2000 to partition 2) |
| Pass Through | Rows stay in their current partition and are not redistributed; useful when no repartitioning is needed at a stage |
| Database | Source database handles the partition split itself (Oracle or DB2 native partitioning) |
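The row-distribution logic behind three of these partition types can be sketched in Python. This is a simplified model, not how the Informatica engine is implemented: the functions `round_robin`, `hash_keys`, and `key_range` are hypothetical, and `zlib.crc32` is used only because it gives a hash that is stable across runs.

```python
import zlib


def round_robin(rows, n):
    """Deal rows out to n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_keys(rows, n, key):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % n
        parts[bucket].append(row)
    return parts


def key_range(rows, key, upper_bounds):
    """Route each row to the first range whose exclusive upper bound exceeds the key."""
    parts = [[] for _ in upper_bounds]
    for row in rows:
        for i, bound in enumerate(upper_bounds):
            if row[key] < bound:
                parts[i].append(row)
                break
    return parts


rows = [
    {"id": 1, "region": "East"},
    {"id": 2, "region": "West"},
    {"id": 1500, "region": "East"},
    {"id": 1999, "region": "West"},
]

rr = round_robin(rows, 2)                     # even spread, keys ignored
hashed = hash_keys(rows, 2, "region")         # same region, same partition
ranged = key_range(rows, "id", [1001, 2001])  # IDs 1-1000, then 1001-2000
```

Round robin balances the load but scatters related rows, which is why hash or key-range partitioning is the usual choice ahead of a Joiner or Aggregator.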
What Is a Surrogate Key and When Is It Used?
A surrogate key is an artificially generated, system-assigned unique identifier for each row in a dimension table. It has no business meaning — it is purely a technical identifier created by the ETL process.
Unlike a natural key (like an employee ID or product code from the source system), a surrogate key is stable, controlled, and not exposed to business users.
Natural Key:
- Comes from source system (EmpID: E001)
- Can change (employee re-hired)
- May not be unique across sources
- Has business meaning
- Risk of duplicates in EDW
Surrogate Key:
- Generated by ETL (1, 2, 3…)
- Never changes once assigned
- Always unique across entire warehouse
- No business meaning; purely technical
- Enables SCD Type 2 history tracking
In Informatica, surrogate keys are generated using the Sequence Generator transformation, which produces an auto-incrementing integer sequence.
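As a rough illustration, the sketch below models a sequence generator in Python and uses it to assign surrogate keys. The class `SequenceGenerator`, the helper `surrogate_for`, and the source system names are hypothetical; the real transformation is configured in the Designer rather than coded.

```python
import itertools


class SequenceGenerator:
    """Hands out the next integer each time it is asked."""

    def __init__(self, start=1, increment=1):
        self._counter = itertools.count(start, increment)

    def nextval(self):
        return next(self._counter)


seq = SequenceGenerator()
key_map = {}  # (source system, natural key) -> surrogate key


def surrogate_for(source, natural_key):
    """Return the existing surrogate key, or assign a new one on first sight."""
    business_key = (source, natural_key)
    if business_key not in key_map:
        key_map[business_key] = seq.nextval()
    return key_map[business_key]


hr_key = surrogate_for("HR", "E001")
payroll_key = surrogate_for("PAYROLL", "E001")  # same natural key, different source
hr_again = surrogate_for("HR", "E001")          # already known, key is reused
```

The same natural key `E001` arriving from two systems gets two distinct surrogate keys, which is how the warehouse avoids the duplicate risk listed above.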
Mapping Parameter vs Mapping Variable
Both allow you to make mappings flexible and reusable — but they behave very differently.
| Feature | Mapping Parameter | Mapping Variable |
|---|---|---|
| Value changes? | No — set once per run | Yes — changes during run |
| Defined in | Declared in the mapping; value supplied by a parameter file | Declared in the mapping; initial value can come from a parameter file |
| Syntax | $$ParameterName | $$VariableName |
| Persists across runs? | No | Yes — stored in repository |
| Use case | Filter date, source file path, schema name | Track last processed ID or date |
| Modified by | Only parameter file | Variable functions in Expression |
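The behavioral difference is easiest to see across two runs. The following Python sketch is an analogy only: a dictionary stands in for the parameter file, a small JSON file stands in for the repository that stores the variable, and the names `$$BatchLimit`, `$$LastProcessedId`, and `run_session` are invented for the example.

```python
import json
import os
import tempfile

SOURCE_ROWS = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]


def run_session(params, state_path):
    """One run: the parameter is fixed, the variable advances and is saved."""
    # Parameter: read once, never changed during the run.
    batch_limit = params["$$BatchLimit"]

    # Variable: start from the value saved by the previous run, if any.
    last_id = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            last_id = json.load(fh)["$$LastProcessedId"]

    batch = [r for r in SOURCE_ROWS if r["id"] > last_id][:batch_limit]
    for row in batch:
        last_id = max(last_id, row["id"])  # similar in spirit to SETMAXVARIABLE

    # Persist the final value so the next run picks up where this one stopped.
    with open(state_path, "w") as fh:
        json.dump({"$$LastProcessedId": last_id}, fh)
    return [r["id"] for r in batch]


state_file = os.path.join(tempfile.mkdtemp(), "variable_state.json")
first_run = run_session({"$$BatchLimit": 3}, state_file)
second_run = run_session({"$$BatchLimit": 3}, state_file)
```

The second run receives the same parameter value but starts from the saved variable, so it processes only the row the first run did not reach.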
Workflow Manager and How Many Repositories Can Be Connected?
The Workflow Manager is the scheduling and orchestration layer of Informatica PowerCenter. It is used to create, modify, schedule, and manage workflows that execute ETL processes.
Workflow Manager has three core tools:
| Tool | Purpose |
|---|---|
| Task Designer | Create and configure individual tasks (Session, Command, Email, etc.) |
| Worklet Designer | Build reusable worklets (mini-workflows) |
| Workflow Designer | Assemble tasks into a complete workflow with links and conditions |
Creating Indices After Completing the Load Process
A common ETL performance practice is to drop indexes before loading, load the data, and then recreate the indexes. Why? Because maintaining indexes during a bulk INSERT dramatically slows down the load.
Here is how this is done in Informatica:
- In the Session properties, go to the Mapping tab → Target settings
- Enable “Truncate target table option” if this is a full reload
- Under Pre-SQL, add SQL commands to drop existing indexes
- Run the session to load data without indexes (much faster)
- Under Post-SQL, add SQL commands to recreate the indexes
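The drop, load, recreate sequence can be tried out with Python's built-in SQLite module. This is a minimal sketch of the pattern, not Informatica session configuration; the table `sales`, the index `idx_sales_region`, and the helper `index_names` are hypothetical, and the exact DDL will differ on your target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")


def index_names(connection):
    """List the indexes that currently exist on the sales table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'sales'"
    )
    return [name for (name,) in rows]


# Pre-SQL step: drop the index so the load does not have to maintain it.
conn.execute("DROP INDEX IF EXISTS idx_sales_region")
indexes_during_load = index_names(conn)

# Load step: bulk insert into the unindexed table.
data = [(i, "East" if i % 2 else "West", float(i)) for i in range(1, 1001)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", data)

# Post-SQL step: rebuild the index once, over the fully loaded table.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
conn.commit()

indexes_after_load = index_names(conn)
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

The two SQL statements marked as Pre-SQL and Post-SQL are the kind of commands you would paste into those session properties.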
What Is Complex Mapping in Informatica?
“Complex mapping” is a descriptive term used in practice, not an official Informatica feature. It typically refers to a mapping with the following characteristics: multiple sources, multiple targets, intricate transformation logic, branching pipelines, and dependencies between transformations.
How Does Informatica Handle Error Handling and Data Quality?
Informatica has a multi-layered approach to catching and managing bad data:
| Mechanism | What It Does |
|---|---|
| Reject Files | Rows that fail to load (constraint violations, type mismatches) go to a flat file for review |
| Session Log | Detailed log of every step, with row counts, errors, and warnings |
| Error Threshold | Set max errors before session stops — prevents loading corrupt data silently |
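| Bad File | Another name for the reject file: by default it is written with a .bad extension to the session's bad file directory |
| Row Error Logging | Errors logged to relational tables (PMERR_MSG, PMERR_DATA, PMERR_SESS, PMERR_TRANS) for SQL-based querying, or to a flat file |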
| Update Strategy | Rows marked as DD_REJECT are explicitly rejected by business logic |
| Data Quality (IDQ) | Informatica Data Quality add-on provides profiling, standardization, deduplication |
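How reject collection and an error threshold work together can be sketched in Python. This is a simplified model with hypothetical names (`load_rows`, `ErrorThresholdExceeded`) and two made-up validation rules; it follows the convention that a threshold of 0 means "never stop".

```python
class ErrorThresholdExceeded(Exception):
    """Raised when the number of rejected rows reaches the configured limit."""


def load_rows(rows, stop_on_errors):
    """Load valid rows, collect rejects, and stop once the threshold is hit."""
    loaded, rejects = [], []
    for row in rows:
        try:
            amount = float(row["amount"])  # type check
            if row["id"] is None:          # NOT NULL check
                raise ValueError("id is null")
            loaded.append({"id": row["id"], "amount": amount})
        except (ValueError, TypeError) as err:
            # Keep the bad row and the reason, like a line in a reject file.
            rejects.append({"row": row, "reason": str(err)})
            if stop_on_errors and len(rejects) >= stop_on_errors:
                raise ErrorThresholdExceeded(f"{len(rejects)} rows rejected")
    return loaded, rejects


source = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "abc"},   # type mismatch
    {"id": None, "amount": "7"},  # null key
    {"id": 4, "amount": "3"},
]

loaded, rejects = load_rows(source, stop_on_errors=0)  # 0 = never stop

try:
    load_rows(source, stop_on_errors=2)
    stopped = False
except ErrorThresholdExceeded:
    stopped = True
```

With no threshold, the good rows load and the bad ones are set aside for review; with a threshold of 2, the run stops at the second bad row instead of continuing silently.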
Session Partitioning in Informatica
Session Partitioning is the configuration that tells Informatica how many parallel threads (partitions) to use when executing a session, and how to distribute data across them.
It is different from data partitioning (the logic of splitting data). Session partitioning is the execution-level configuration.
| Aspect | Detail |
|---|---|
| Where configured | Session properties → Mapping tab → Partitions pane |
| Partition count | Set at a partition point; the same count then applies to every stage of that pipeline, while the partition type can differ per partition point |
| Requires license | PowerCenter Partitioning Option (separate license) for >1 partition |
| Dynamic partitioning | Informatica can auto-scale partitions based on available CPU at runtime |
| Pipeline partitioning | The pipeline's data is split across partitions, and each partition is processed in parallel by its own set of threads |
Active vs Passive Transformations
This is one of the most fundamental concepts in Informatica and comes up in nearly every interview.
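Active Transformations:
- CAN change the number of rows
- Can add, remove, or duplicate rows
- Examples: Filter, Aggregator, Sorter, Rank, Router, Update Strategy, Normalizer, Joiner
- An active branch and a passive branch of the same pipeline cannot be connected to the same downstream transformation or input group, because their row counts may no longer match
Passive Transformations:
- Do NOT change the number of rows
- One row in = one row out always
- Examples: Expression, Lookup (connected, returning a single match), Sequence Generator, Stored Procedure
- Can be added anywhere in a pipeline without affecting row counts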
Slowly Changing Dimensions (SCD) in Informatica
Slowly Changing Dimensions (SCD) handle the challenge of tracking history in dimension tables. For example, a customer’s address changes — do you overwrite the old one, or keep both?
| SCD Type | Strategy | History Kept? |
|---|---|---|
| Type 0 | Never update — original values retained forever | Only original |
| Type 1 | Overwrite old values with new values | No — old data lost |
| Type 2 | Add a new row for the new value; old row marked inactive with end date | Full history |
| Type 3 | Add a new column for the previous value | Only last change |
| Type 4 | Mini-dimension — separate history table | Yes, separately |
| Type 6 | Hybrid of Type 1 + 2 + 3 | Yes, comprehensively |
How Informatica handles SCD Type 2:
- Use Lookup to check if the record already exists in the target
- Match on the natural key, then compare the tracked attribute columns (for example, address); if any value differs, it is a change
- Use Update Strategy with DD_INSERT to add a new row (new surrogate key)
- Use Update Strategy with DD_UPDATE to mark the old row as expired (set end date)
- Sequence Generator creates a new surrogate key for the new row
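The steps above can be sketched as a small Python function. This is one possible way to model SCD Type 2, not the mapping itself: the column names (`cust_key`, `cust_id`, `address`, `is_current`), the open-ended `HIGH_DATE`, and the function `apply_scd2` are all assumptions made for the example.

```python
from datetime import date
from itertools import count

HIGH_DATE = date(9999, 12, 31)  # assumed "open-ended" end date
next_key = count(1)             # stands in for the Sequence Generator


def apply_scd2(dimension, incoming, today):
    """Expire the current row if a tracked attribute changed, then add a new version."""
    # Lookup step: find the current row for this natural key, if any.
    current = next(
        (r for r in dimension
         if r["cust_id"] == incoming["cust_id"] and r["is_current"]),
        None,
    )
    if current is not None and current["address"] == incoming["address"]:
        return "unchanged"
    if current is not None:
        # DD_UPDATE equivalent: close out the old version.
        current["end_date"] = today
        current["is_current"] = False
    # DD_INSERT equivalent: add the new version with a fresh surrogate key.
    dimension.append({
        "cust_key": next(next_key),
        "cust_id": incoming["cust_id"],
        "address": incoming["address"],
        "start_date": today,
        "end_date": HIGH_DATE,
        "is_current": True,
    })
    return "inserted" if current is None else "versioned"


dim_customer = []
r1 = apply_scd2(dim_customer, {"cust_id": "C1", "address": "12 Oak St"}, date(2024, 1, 1))
r2 = apply_scd2(dim_customer, {"cust_id": "C1", "address": "12 Oak St"}, date(2024, 2, 1))
r3 = apply_scd2(dim_customer, {"cust_id": "C1", "address": "9 Elm Ave"}, date(2024, 3, 1))
```

After the three calls, the dimension holds two rows for customer C1: the expired Oak St version and the current Elm Ave version, each with its own surrogate key.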
Transaction Control Transformation
Transaction Control transformation gives you fine-grained control over when to COMMIT or ROLLBACK data during a session. By default, Informatica commits at a fixed commit interval (10,000 rows unless you change it), but sometimes you need row-level or group-level commits.
| Function | Meaning |
|---|---|
| TC_CONTINUE_TRANSACTION | Keep accumulating rows — do not commit yet |
| TC_COMMIT_BEFORE | Commit all previous rows, then include current row in the next transaction |
| TC_COMMIT_AFTER | Include current row in commit, then start a new transaction |
| TC_ROLLBACK_BEFORE | Rollback previous rows, then start fresh with current row |
| TC_ROLLBACK_AFTER | Include current row in rollback, then start a new transaction |
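A typical use is committing once per business group, for example per order. The Python sketch below models only TC_CONTINUE_TRANSACTION and TC_COMMIT_BEFORE, using plain strings for the two constants; the functions `control` and `run` and the sample order lines are hypothetical.

```python
TC_CONTINUE_TRANSACTION = "continue"
TC_COMMIT_BEFORE = "commit_before"


def control(prev_row, row):
    """Commit the open transaction whenever the order ID changes."""
    if prev_row is not None and prev_row["order_id"] != row["order_id"]:
        return TC_COMMIT_BEFORE
    return TC_CONTINUE_TRANSACTION


def run(rows):
    """Group rows into transactions according to the control decision per row."""
    committed, open_txn, prev = [], [], None
    for row in rows:
        if control(prev, row) == TC_COMMIT_BEFORE:
            committed.append(open_txn)  # commit everything before this row
            open_txn = []
        open_txn.append(row)            # current row joins the open transaction
        prev = row
    if open_txn:
        committed.append(open_txn)      # end of data commits the remainder
    return committed


lines = [
    {"order_id": 1, "item": "pen"},
    {"order_id": 1, "item": "ink"},
    {"order_id": 2, "item": "pad"},
    {"order_id": 3, "item": "cap"},
    {"order_id": 3, "item": "bag"},
]
transactions = run(lines)
```

Each order ends up in its own transaction, so a failure partway through never leaves half an order committed. The input has to arrive grouped by order ID for this to work.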
Informatica vs DataStage: Key Differences
| Feature | Informatica PowerCenter | IBM DataStage |
|---|---|---|
| Vendor | Informatica Corporation | IBM (part of InfoSphere Information Server) |
| Architecture | Repository-based, metadata-driven | Project-based, server-parallel engine |
| Ease of Use | More intuitive GUI; shorter learning curve | Steeper learning curve; more complex setup |
| Performance | Excellent with partitioning and caching | Known for very high throughput on IBM stack |
| Connectivity | 500+ connectors; strong cloud support | Strong on IBM ecosystem (DB2, Mainframe) |
| Pricing | Moderate to high | High; tied to IBM infrastructure |
| Cloud | Informatica Intelligent Cloud Services (IICS), a mature cloud offering | DataStage on Cloud Pak for Data |
| Market Share | Leader (Gartner Magic Quadrant) | Strong in financial and telecom sectors |
Improving the Performance of Aggregator Transformation
The Aggregator transformation is one of the most memory-intensive transformations. Poorly tuned aggregators are a common bottleneck. Here are proven techniques to speed it up:
- Sort data before Aggregator: Pre-sort on GROUP BY columns using Sorter and enable “Sorted Input” in Aggregator. This lets the Aggregator finish each group as soon as its rows have passed, instead of caching every group in memory
- Filter early: Use Filter transformation before the Aggregator to reduce row volume going in
- Increase cache size: Tune the Aggregator cache (in session properties) to fit groups in memory and avoid disk spillage
- Use incremental aggregation: For incremental loads, enable incremental aggregation to process only new/changed data rather than the full dataset
- Minimize output ports: Only pass through columns you actually need in output — extra columns waste memory
- Partition by hash: Use hash auto-keys partitioning to ensure all rows with the same GROUP BY key go to the same partition
- Source qualifier SQL override: Push aggregation to the database using GROUP BY in SQL override when possible — it’s faster than doing it in Informatica
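The benefit of sorted input can be illustrated in Python. This is an analogy, not the engine's actual implementation: `aggregate_sorted` streams over pre-sorted rows and holds one group at a time, while `aggregate_unsorted` has to keep every group's total in memory until the input ends. Both function names and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key, measure):
    """Stream over rows already sorted by key, holding only one group at a time."""
    for group_key, group in groupby(rows, key=itemgetter(key)):
        yield group_key, sum(r[measure] for r in group)


def aggregate_unsorted(rows, key, measure):
    """Without sorted input, every group's running total must stay in memory."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[measure]
    return totals


rows = [
    {"region": "West", "amount": 5},
    {"region": "East", "amount": 7},
    {"region": "West", "amount": 3},
]

sorted_rows = sorted(rows, key=itemgetter("region"))  # the Sorter step
streamed = list(aggregate_sorted(sorted_rows, "region", "amount"))
cached = aggregate_unsorted(rows, "region", "amount")
```

Both approaches give the same totals. The trade-off is that the sort itself costs time, so sorted input pays off mainly when the data is already sorted or the number of groups is large.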
Creating and Using Worklets in Informatica
A Worklet is a reusable workflow object — essentially a mini-workflow that can be embedded inside larger workflows. It helps avoid duplicate logic and makes large workflows cleaner and more maintainable.
Think of it like a function in programming: write the logic once, call it many times.
Creating a Worklet:
- Open the Worklet Designer in Workflow Manager
- Go to Worklets → Create and give it a name
- Add tasks (Sessions, Commands, Email, etc.) inside the worklet
- Connect tasks with links and set conditions as needed
- Save and validate the worklet
Using a Worklet in a Workflow:
- Open the parent Workflow in Workflow Designer
- From the Task menu, insert the Worklet as a task object
- Connect it in the workflow sequence like any other task
- The worklet can also be made reusable or non-reusable (non-reusable is local to one workflow)
You’ve Covered All 25 Topics! 🎉
From the basics of PowerCenter architecture to advanced topics like SCD, session partitioning, and performance tuning — you now have a comprehensive foundation in Informatica ETL. Keep practicing, build sample mappings, and revisit these concepts regularly.




