DWBI: November 2011

Friday, 18 November 2011

Basic Shell Scripting

Command Line Arguments:

Argument	Functionality
$#	Represents the parameter count. Useful for controlling loop constructs that need to process each parameter
$@	Expands to all the parameters separated by spaces. Useful for passing all the parameters to some other function or program
$*	Contains the string of all arguments (positional parameters) on the command line
The variable $@ is the same as $* except when enclosed in double quotes. Then, each argument contained in $@ is double quoted.
$-	Expands to the flags (options) the shell was invoked with. Useful for controlling program flow based on the flags set # echo $-
$$	Expands to the process id of the shell innovated to run the script/ current shell. Useful for creating unique temporary filenames relative to this instantiation of the script
$?	contains the exit status of the last command or shell program
$!	Contains the process ID number of the last command sent to the background. $ kill -9 $!
$0	Contains the name of the command (process) currently being executed
The variables $#,$* and $@ also change with the shiftcommand. Once a shift is performed, the first parameter is discarded. The $# variable is decremented, and the $* and $@ variables are updated.

Variables:

Assign a variable to a value	For string with space use "" $ var1="Hello World"
Read only variables	$ readonly variable_name
Assign variable from read line	$ read fname lname # Read 2 values separated by Space #"AA BB" "CC DD"/AA BB
Assign NULL to a variable	$ name=/ $ name=''/ $ name=""/ $ unset varname

Parameter substitution formats:

Substitution	Definition	Purpose
${parameter:-value}	If parameter exists and isn't null, return its value; otherwise return value	Returning a default value if the variable is undefined. $ echo ${var1:-$HOME}
${parameter:=value}	If parameter exists and isn't null, return its value; otherwise set it to value and then return its value	Setting a variable to a default value if it is undefined. $ echo ${var1:=$HOME}
${parameter:+value}	If parameter exists and isn't null, return value; otherwise return null.	Testing for the existence of a variable.
${parameter:?value}	If parameter exists and isn't null, return its value; otherwise print parameter: followed by message and abort the current command or script. Omitting message produces the default message parameter null or not set.	Catching errors that result from variables being undefined.

Command Substitution:

$(command)
`command`

Arithmetic Expansion:

$((expression))
echo $((1 + 3 + 4))

Relational Operators(works for numerical values 10, 20 or “10”, “20”):

-eq EQUALS TO
-ne NOT EQUALS TO
-gt GREATER THAN
-lt LESS THAN
-ge GRATER THAN OR EQUALS TO
-le LESS THAN OR EQUALS TO

Boolean Operators:

! NOT
-o OR
-a AND

String Operators:

= EQUALS TO # [ $a = $b ]
!= NOT EQUALS TO # [ $a != $b ]
-z RETURNS TRUE FOR ZERO LENGTH STRING # [ -z $a ]
-n RETURNS TRUE FOR NON-ZERO LENGTH STRING # [ -n $a ]
str RETURNS TRUE FOR NON-EMPTY STRING # [ $a ]

File Test Operators:

Operator	Description
-a file	True if file EXISTS
-b file	TRUE if file is a block special file
-c file	TRUE if file is a character special file
-d file	TRUE if file is a directory
-e file	TRUE if file exists. Is TRUE even if file is a directory but exists
-f file	TRUE if file is an ordinary file as opposed to a directory or special file
-g file	TRUE if file has its group ID (SGID) bit set
-k file	TRUE if file has its sticky bit set
-p file	TRUE if file is a named pipe
-t file	TRUE if file descriptor is open
-u file	TRUE if file has its user id (SUID) bit set
-r file	TRUE if file is readable/ read permission
-w file	TRUE if file is writable/ write permission
-x file	TRUE if file is execute/ execute permission
-s file	TRUE if file has size greater than 0
-O file	TRUE if you own file
-G	TRUE file your group id is same as file's
file1 -nt file2	TRUE if file1 is newer than file2
file1 -ot file2	TRUE if file1 is older than file2

Pattern Matching:

* Zero or More Characters, Except a Leading dot
? Any Single Character, Except a Leading dot
[] Defines a Class of Characters ( - for Range, ! to Exclude)

If Statement:

if [ "$1" = "1" ]

then

echo "The first choice is nice"

elif [ "$1" = "2" ]

then

echo "The second choice is just as nice"

else

echo "I see you were wise enough not to choose"

echo "You win"

Do...While:

count=$1 # Initialise count to first parameter

while [ $count -gt 0 ] # while count is greater than 10 do

echo $count seconds till supper time!

count=$(expr $count -1) # decrement count by 1

sleep 1 # sleep for a second using the Unix sleep command

done

For:

fruitlist="Apple Pear Tomato Peach Grape"

for fruit in $fruitlist

if [ "$fruit" = "Tomato" ] || [ "$fruit" = "Peach" ]

then

echo "I like ${fruit}es"

else

Unix Shell Scripting Tutorial

echo "I like ${fruit}s"

done

Case:

case $1

1) echo 'First Choice';;

2) echo 'Second Choice';;

*) echo 'Other Choice';;

esac

Functions:

name() {

commands

}

· For any local function variable we write: local value=4

#!/bin/sh

inc() { # The increment is defined first so we can use it

echo $(($1 + $2)) # We echo the result of the first parameter plus the second parameter

}

if [ "$1" "" ] || [ "$2" = "" ] || [ "$3" = "" ]

then

Unix Shell Scripting Tutorial

echo USAGE:

echo " counter startvalue incrementvalue endvalue"

else

count=$1 # Rename are variables with clearer names

value=$2

end=$3

while [ $count -lt $end ] # Loop while count is less than end

echo $count

count=$(inc $count $value) # Call increment with count and value as parameters

count=`inc $count $value`

done # so that count is incremented by value

· Import functions from a script called common.sh in DIRECTORY

#!/bin/sh

. ./common.sh

if [ "$1" = "" ]; then

echo USAGE:

echo "sh test.sh type"

exit

if `validtype $1`; then

echo Valid type

else

echo Invalid type

Here is common.sh:

#!/bin/sh

validtype() {

if [ "$1" = "TYPEA" ] || [ "$1" = "TYPEB" ] || [ "$1" = "TYPEC" ] || [ "$1" = "TYPED" ] || [ "$1" = "TYPEE" ];

then

exit 0

else

exit 1

}

Sample Shell Script 2: Find the nzreclaim candidate for any database

#!/bin/sh
# ./nzDeleteRecPercentage.sh dbname
############################################################################
# Name: nzDeleteRecPercentage.sh
# Desc: Get the list of tables of a database for which at least one deleted
# record exists in the form of percentage. We can use this percentage to
# choose candidate for nzreclaim/GROOM table operation
#
# Change Log
# Ver   Date            Author          Comment
# ---   -----------     -------------   ----------------------------------------
# 1.0   MM/DD/YYYY      SD              Creation
#
#############################################################################
#LOGDIR=/home/nz_training_user14/sankar
#LOGFILE="${LOGDIR}/nzDeleteRecPercentage.log"
#ERRFILE="${LOGDIR}/nzDeleteRecPercentage.err"

DATABASENAME=${1}
#echo "Database: ${DATABASENAME}"

TABLES=`nzsql -u UserName -pw Password -db ${DATABASENAME} -t -c "SELECT TABLENAME FROM _V_TABLE;"`
#echo ${TABLES}

for TABLENAME in ${TABLES}
do
    DELETECOUNT=`nzsql -u UserName -pw Password -db ${DATABASENAME} -t -c "set show_deleted_records=true; SELECT COUNT(1) FROM ${TABLENAME} WHERE deletexid<>0;"`

    if [ ${DELETECOUNT} -gt 0 ]
    then
        TOTALCOUNT=`nzsql -u UserName -pw Password -db ${DATABASENAME} -t -c "SELECT COUNT(1) FROM ${TABLENAME};"`
        PERCENTAGE=`echo "scale=2;($DELETECOUNT*100/($TOTALCOUNT+$DELETECOUNT))" | bc`
        echo "${TABLENAME} has ${PERCENTAGE}% deleted records "
    fi
done

Thursday, 17 November 2011

Sample Shell Script 1: Sum all the argument passed to the script

#!/bin/sh
# ./argsum.sh abcd 45.1 89.2 12.3 efgh
############################################################################
# Name: argsum.sh
# Desc: Sum all the argument passed to the script. It accepts all type of
# arguments but make a SUM of only number type arguments
# Change Log
# Ver   Date            Author          Comment
# ---   -----------     -------------   ----------------------------------------
# 1.0   MM/DD/YYYY      SD              Creation
#
#############################################################################

LOGDIR=/home/nzuser/sankar
LOGFILE="${LOGDIR}/argsum.log"
ERRFILE="${LOGDIR}/argsum.err"
SUM=0

# Function to write log
log_msg()
{
MESSAGE=${@}
echo ${MESSAGE}>>${LOGFILE}
}

# Check if the log file already exists in the directory, remove the file
if test -f ${LOGFILE}
then
    rm -f ${LOGFILE}
fi

log_msg "Log directory: ${LOGDIR}"
log_msg "Log file: ${LOGFILE}"
log_msg "Error file: ${ERRFILE}"

if [ ${#} -gt 0 ]    # ${#} --> Number of Arguments
then
    log_msg "Number of arg: ${#}"
fi

for I in ${@}       # ${@} --> All arguement as space separated
do
    SUM=`echo ${SUM}+${I}|bc`
    # Input values are accepted as character, converted to number
done

log_msg "Total of ${#} arg value is : ${SUM}"
echo "Get the run log from ${LOGFILE}"

Friday, 11 November 2011

Informatica Interview Question: Part 1

You will find lots of Informatica Interview question scattered in different sites. I collected most of them and put a place for you only. If you find anything wrong or you have a better answer please write me back.

Which transformation should we use to normalize the COBOL and relational sources?

Normalizer Transformation.When we drag the COBOL source in to the mapping Designer workspace,the normalizer transformation automatically appears,creating input and output ports for every column in the source.

Difference between static cache and dynamic cache?

In case of Dynamic catche when you are inserting a new row it looks at the lookup catche to see if the row existing or not,If not it inserts in the target and catche as well in case of Static catche when you are inserting a new row it checks the catche and writes to the target but not catche

If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts or updates rows in the cache during the session. When you cache the target table as the lookup, you can look up values in the target and insert them if they do not exist, or update them if they do.

What are the join types in joiner transformation?

Normal,Master Outer,Detail Outer and Full Outer

In which condtions we can not use joiner transformation(Limitaions of joiner transformation)?

no restrictions

you perform the following task before configuring the joiner transformation configure the transformation to use sorted data and define the join condition to recieve sorted data in the same order as the sort origin

In the conditions; Either input pipeline contains an Update Strategy transformation, You connect a Sequence Generator transformation directly before the Joiner transformation

1.Both input pipelines originate from the same Source Qualifier transformation.

2.Both input pipelines originate from the same Normalizer transformation.

3.Both input pipelines originate from the same Joiner transformation.

4.Either input pipeline contains an Update Strategy transformation.

5.We connect a Sequence Generator transformation directly before the Joiner transformation.

What is the look up transformation?

Lookup is a passive transformation and used to look up data in a flat file or a relational table

What are the diffrence between joiner transformation and source qualifier transformation?

1. Source Qualifier Operates only with relational sources within the same schema. Joiner can have either heterogenous sources or relation sources in different schema

2. Source qualifier requires at least one matching column to perform a join. Joiner joins based on matching port.

Why use the lookup transformation?

Used to look up data in a relational table or view.

in Inf7.1, we can get from flat file also

look up is used to perform one of the following task: -to get related value -to perform calculation -to update slowley changing dimension table

generally we use lookup transformation for 1) get a related value from key column value 2) check whether the record already existing in the table 3) slowly changing dimension tables

A Lookup transformation is used for checking the matched values from the source or target tables,used for updating the slowly changing dimensions and also performs some calculations.

How can you improve session performance in aggregator transformation?

By using Incremental Aggregation

create the sorter transformation before the aggregator

sorted input

Ya we can use a Sorted Input option to improve the performance. Basically aggregate transformation reduces the performance because it uses caches.

Can you use the maping parameters or variables created in one maping into any other reusable transformation?

Yes. Because reusable transformation is not contained with any mapplet or mapping.

What is meant by lookup caches?

Session will read all unique rows from the reference table/ file to fill the local buffer first; then for each row received from up-stream transformation, it tries to match them against the local buffer

Informatica server builts a cache in memory when it process the first row of a cached lookup transformation.

- When server runs a lookup transformation, the server builds a cache in memory, when it process the first row of data in the transformation. - Server builds the cache and queries it for the each row that enters the transformation. - The server creates index and data cache files in the lookup cache directory and used the server code page to create the files. - index cache contains conductional values and data cache contains output values

The informatica server builds a cache in memory when it processes the first row of a data in a cached look up transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The informatica server stores condition values in the index cache and output values in the data cache.

What is source qualifier transformation?

SQ is an active tramsformation. It performs one of the following task:

to join data from the same source database

to filtr the rows when Power centre reads source data to perform an outer join to select only distinct values from the source

In source qualifier transformation a user can defined join conditons, filter the data and eliminating the duplicates. The default source qualifier can over written by the above options, this is known as SQL Override.

The source qualifier represents the records that the informatica server reads when it runs a session.

When we add a relational or a flat file source definition to a mapping, we need to connect it to a source qualifier transformation. The source qualifier transformation represents the records that the informatica server reads when it runs a session.

How the informatica server increases the session performance through partitioning the source?

Partitioning the session improves the session performance by creating multiple connections to sources and targets and loads data in parallel pipe lines

What are the settiings that you use to cofigure the joiner transformation?

1) Master Source

2) Detail Source

3) Type Of Join

4) Condition of Join

Take less no. of rows table as master table, more no of table as detail table and join condition. joiner will put all row from master table into chache and check condition with detail table rows.

What are the rank caches?

the informatica server stores group information in an index cache and row data in data cache

when the server runs a session with a Rank transformation, it compares an input row with rows with rows in data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row.

During the session , the informatica server compares an inout row with rows in the data cache. If the input row out-ranks a stored row, the informatica server replaces the stored row with the input row. The informatica server stores group information in an index cache and row data in a data cache.

What is Code Page Compatibility?

When two code pages are compatible, the characters encoded in the two code pages are virtually identical.

Compatibility between code pages is used for accurate data movement when the Informatica Sever runs in the Unicode data movement mode. If the code pages are identical, then there will not be any data loss. One code page can be a subset or superset of another. For accurate data movement, the target code page must be a superset of the source code page.

How can you create or import flat file definition in to the warehouse designer?

By giving server connection path

Create the file in Warehouse Designer or Import the file from the location it exists or modify the source if the structure is one and the same

first create in source designer then drag into warehouse designer you can't create a flat file target defenition directly

There is no way to import target definition as file in Informatica designer. So while creating the target definition for a file in the warehouse designer it is created considering it as a table, and then in the session properties of that mapping it is specified as file.

U can not create or import flat file definition in to warehouse designer directly.Instead U must analyze the file in source analyzer, then drag it into the warehouse designer. When U drag the flat file source definition into warehouse designer workspace, the warehouse designer creates a relational target definition not a file definition. If u want to load to a file, configure the session to write to a flat file.When the informatica server runs the session, it creates and loads the flatfile.

What is aggregate cache in aggregator transforamtion?

IT IS A CACHE

Aggregate value will stored in data cache, grouped column value will stored in index cache

Power centre server stores data in the aggregate cache until it completes aggregate calculations.

Aggregator Index Cache stores group by values from Group-By ports and Data Cache stores aggregate data based on Group-By ports (variable ports, output ports, non group by ports). When the PowerCenter Server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation. If you use incremental aggregation, the PowerCenter Server saves the cache files in the cache file directory.

it is a temporary memory used by aggregator in order to improve the performance

aggregator transformation contains two caches namely data cache and index cache data cache consists aggregator value or the detail record index cache consists grouped column value or unique values of the records

When the PowerCenter Server runs a session with an Aggregator transformation, it stores data in aggregator until it completes the aggregation calculation.

The aggregator stores data in the aggregate cache until it completes aggregate calculations. When u run a session that uses an aggregator transformation, the informatica server creates index and data caches in memory to process the transformation.If the informatica server requires more space,it stores overflow values in cache files.

How can you recognise whether or not the newly added rows in the source are gets insert in the target?

In the type-2 mapping we have three options to recognise the newly added rows. i) Version Number ii) Flag Value iii) Effective Date Range

1. we can add a count aggregator column to the target and generate it before running the mapping there might couple of different ways to do this or we can run a sql query after running the mapping each time to make sure new data is inserted

2. From session SrcSuccessRows can be compared with TgtSuccessRows

3. check the seesion log or check the target table.

What are the types of lookup?

Mainly two types of look up...there 1.static lookup 2.dynamic lookup In static lookup. there two types are used one is connected and unconnected. In connected lookup means while using the pipeline symbol. In unconnected lookup means while using the expression condition.

What are the types of metadata that stores in repository?

Data base connections, global objects, sources, targets, mapping, mapplets, sessions, shortcuts, transformations

The repository stores metadata that describes how to transform and load source and target data.

Data about data

Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.

Following are the types of metadata that stores in the repository Database connections Global objects Mappings Mapplets Multidimensional metadata Reusable transformations Sessions and batches Short cuts Source definitions Target definitions Transformations.

What happens if Informatica server doesn't find the session parameter in the parameter file?

Workflow will fail.

Can you access a repository created in previous version of informatica?

We have to migrate the repository from the older version to newer version. Then you can use that repository.

Without using ETL tool can u prepare a Data Warehouse and maintain?

Yes we can do that using PL/ SQL or Stored procedures when all the data are in the same databases. If you have source as flat files you can?t do it through PL/ SQL or stored procedures.

How do you identify the changed records in operational data?

In my project source system itself sending us the new records and changed records from the last 24 hrs.

Why couldn't u go for Snowflake schema?

Snowflake is less performance while compared to star schema, because it will contain multi joins while retrieving the data.

Snowflake is preferred in two cases,

If you want to load the data into more hierarchical levels of information example yearly, quarterly, monthly, daily, hourly, minutes of information. Prefer snowflake.

Whenever u found input data contain more low cardinality elements. You have to prefer snowflake schema. Low cardinality example: sex , marital Status, etc., Low cardinality means no of distinct records is very less while compared to total number of the records.

Name some measures in your fact table?

Sales amount.

How many dimension tables did you had in your project and name some dimensions (columns)?

Product Dimension : Product Key, Product id, Product Type, Product name, Batch Number.

Distributor Dimension: Distributor key, Distributor Id, Distributor Location,

Customer Dimension : Customer Key, Customer Id, CName, Age, status, Address, Contact

Account Dimension : Account Key, Acct id, acct type, Location, Balance,

How many Fact and Dimension tables are there in your project?

In my module (Sales) we have 4 Dimensions and 1 fact table.

How many Data marts are there in your project?

There are 4 Data marts, Sales, Marketing, Finance and HR. In my module we are handling only sales data mart.

What is the daily data volume (in GB/records)? What is the size of the data extracted in the extraction process?

Approximately average 40k records per file per day. Daily we will get 8 files from 8 source systems.

What is the size of the database in your project?

Based on the client?s database, it might be in GB?s.

What is meant by clustering?

It will join two (or more) tables in single buffer, will retrieve the data easily.

Whether are not the session can be considered to have a heterogeneous target is determined?

It will consider (there is no primary key and foreign key relationship)

Under what circumstance can a target definition are edited from the mapping designer. Within the mapping where that target definition is being used?

We can't edit the target definition in mapping designer. we can edit the target in warehouse designer only. But in our projects, we haven't edited any of the targets. if any change required to the target definition we will inform to the DBA to make the change to the target definition and then we will import again. We don't have any permission to the edit the source and target tables.

Can a source qualifier be used to perform a outer join when joining 2 database?

No, we can't join two different databases join in SQL Override.

If u r source is flat file with delimited operator.when next time u want change that delimited operator where u can make?

In the session properties go to mappings and click on the target instance click set file properties we have to change the delimited option.

If index cache file capacity is 2MB and datacache is 1 MB. If you enter the data of capacity for index is 3 MB and data is 2 MB. What will happen?

Nothing will happen based the buffer size exists in the server we can change the cache sizes. Max size of cache is 2 GB.

Difference between next value and current value ports in sequence generator?

Assume that they r both connected to the input of another transformer. It will gives values like nextvalue 1, currval 0.

How does dynamic cache handle the duplicates rows?

Dynamic Cache will gives the flags to the records while inserting to the cache it will gives flags to the records, like new record assigned to insert flag as "0", updated record is assigned to updated flag as "1", No change record assigned to rejected flag as "2".

How will u find whether your mapping is correct or not without connecting session?

Through debugging option.

If you are using aggregator transformation in your mapping at that time your source contain dimension or fact?

According to requirements, we can use aggregator transformation. There is no limitation for the aggregator. We should use source as dimension or fact.

My input is oracle and my target is flat file shall I load it? How?

Yes, Create flat file based on the structure match with oracle table in warehouse designer than develop the mapping according requirement and map to that target flat file. Target file is created in TgtFiles directory in the server system.

For a session, can I use 3 mappings?

No, for one session there should be only one mapping. We have to create separate session for each mapping.

Type of loading procedures?

Load procedures are two types 1) Normal load 2) bulk loads if you are talking about informatica level. If you are talking about project load procedures based on the project requirement. Daily loads or weekly loads.

Are you involved in high level r low level design? What is meant by that high level design n low level design?

Low Level design:

Requirements should be in the excel format which describes field to field validations and business logic needs to present. Mostly onsite team will do this Low Level design.

High Level Design:

Describes the informatica flow chart from source qualifier to target simply we can say flow chart of the informatica mapping. Developer will do this design document.

What r the dimension load methods?

Daily loads or weekly loads based on the project requirement.

Where we are using lkp b/n source to stage or stage to target?

Depend on the requirement. There is no rule we have to use in this stage only.

How will you do SQL tuning?

We can do SQL tuning using Oracle Optimizer, TOAD software

Did u use any other tools for scheduling purpose other than workflow manager or pmcmd?

Using third party tools like "Control M".

What is SQL mass updating?

Update one table with respect to another.

Update (select hs1.col1 as hs1_col1

, hs1.col2 as hs1_col2

, hs1.col3 as hs1_col3

, hs2.col1 as hs2_col1

, hs2.col2 as hs2_col2

, hs2.col3 as hs2_col3

From hs1, hs2

Where hs1.sno = hs2.sno)

set hs1_col1 = hs2_col1

, hs1_col2 = hs2_col2

, hs1_col3 = hs2_col3;

What is unbounded exception in source qualifier?

"TE_7020 Unbound field in Source Qualifier" when running session

A) Problem Description:

When running a session the session fails with the following error:

TE_7020 Unbound field <field_name> in Source Qualifier <SQ_name>"

Solution:

This error will occur when there is an inconsistency between the Source Qualifier and the source table.

Either there is a field in the Source Qualifier that is not in the physical table or there is a column

of the source object that has no link to the corresponding port in the Source Qualifier.

To resolve this, re-import the source definition into the Source Analyzer in Designer.

Bring the new Source definition into the mapping.This will also re-create the Source Qualifier.

Connect the new Source Qualifier to the rest of the mapping as before.

Using unconnected lookup how we you remove nulls n duplicates?

We can't handle nulls and duplicates in the unconnected lookup. We can handle in dynamic connected lookup.

I have 20 lookup, 10 joiners, 1 normalizer how will you improve the session performance?

We have to calculate lookup & joiner caches size.

What is version controlling?

It is the method to differentiate the old build and the new build after changes made to the existing code. For the old code v001 and next time u have to increase the version number as v002 like that. In my last company we haven't use any version controlling. We just delete the old build and replace with the new code.

We don't maintain version controlling in informatica. We are maintaining the code in VSS (Virtual visual Source) that is the software with maintain the code with versioning. Whenever client made change request came once the production starts we have to create another build.

How is the Sequence Generator transformation different from other transformations?

The Sequence Generator is unique among all transformations because we cannot add, edit, or delete its default ports (NEXTVAL and CURRVAL).

Unlike other transformations we cannot override the Sequence Generator transformation properties at the session level. This protecxts the integrity of the sequence values generated.

What are the advantages of Sequence generator? Is it necessary, if so why?

We can make a Sequence Generator reusable, and use it in multiple mappings. We might reuse a Sequence Generator when we perform multiple loads to a single target.

For example, if we have a large input file that we separate into three sessions running in parallel, we can use a Sequence Generator to generate primary key values. If we use different Sequence Generators, the Informatica Server might accidentally generate duplicate key values. Instead, we can use the same reusable Sequence Generator for all three sessions to provide a unique value for each target row.

What are the uses of a Sequence Generator transformation?

We can perform the following tasks with a Sequence Generator transformation:

Create keys
Replace missing values
Cycle through a sequential range of numbers

What is Sequence Generator Transformation?

The Sequence Generator transformation generates numeric values. We can use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.

The Sequence Generation transformation is a connected transformation. It contains two output ports that we can connect to one or more transformations.

What is the difference between connected lookup and unconnected lookup?

Differences between Connected and Unconnected Lookups:

Connected Lookup <-> Unconnected Lookup

Receives input values directly from the pipeline. <-> Receives input values from the result of a :LKP expression in another transformation.

We can use a dynamic or static cache <-> We can use a static cache

Supports user-defined default values <-> Does not support user-defined default values

What are connected and unconnected Lookup transformations?

We can configure a connected Lookup transformation to receive input directly from the mapping pipeline, or we can configure an unconnected Lookup transformation to receive input from the result of an expression in another transformation.

An unconnected Lookup transformation exists separate from the pipeline in the mapping. We write an expression using the :LKP reference qualifier to call the lookup within another transformation.

A common use for unconnected Lookup transformations is to update slowly changing dimension tables.

What is a Lookup transformation and what are its uses?

We use a Lookup transformation in our mapping to look up data in a relational table, view or synonym.

We can use the Lookup transformation for the following purposes:

Get a related value. For example, if our source table includes employee ID, but we want to include the employee name in our target table to make our summary data easier to read.

Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables. We can use a Lookup transformation to determine whether records already exist in the target.

What is a lookup table?

The lookup table can be a single table, or we can join multiple tables in the same database using a lookup query override. The Informatica Server queries the lookup table or an in-memory cache of the table for all incoming rows into the Lookup transformation.

If your mapping includes heterogeneous joins, we can use any of the mapping sources or mapping targets as the lookup table.

Where do you define update strategy?

We can set the Update strategy at two different levels:

Within a session. When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

What is Update Strategy?

When we design our data warehouse, we need to decide what type of information to store in targets. As part of our target table design, we need to determine whether to maintain all the historic data or just the most recent changes.

The model we choose constitutes our update strategy, how to handle changes to existing records.

Update strategy flags a record for update, insert, delete, or reject. We use this transformation when we want to exert fine control over updates to a target, based on some condition we apply. For example, we might use the Update Strategy transformation to flag all customer records for update when the mailing address has changed, or flag all employee records for reject for people no longer working for the company.

What are the different types of Transformations?

a) Aggregator transformation: The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only. (Mascot)

b) Expression transformation: You can use the Expression transformations to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.

c) Filter transformation: The Filter transformation provides the means for filtering rows in a mapping. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.

d) Joiner transformation: While a Source Qualifier transformation can join data originating from a common source database, the Joiner transformation joins two related heterogeneous sources residing in different locations or file systems.

e) Lookup transformation: Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. Import a lookup definition from any relational database to which both the Informatica Client and Server can connect. You can use multiple Lookup transformations in a mapping.

The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. Use the result of the lookup to pass to other transformations and the target.

What is a transformation?

A transformation is a repository object that generates, modifies, or passes data. You configure logic in a transformation that the Informatica Server uses to transform data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Each transformation has rules for configuring and connecting in a mapping. For more information about working with a specific transformation, refer to the chapter in this book that discusses that particular transformation.

You can create transformations to use once in a mapping, or you can create reusable transformations to use in multiple mappings.

What are the tools provided by Designer?

The Designer provides the following tools:

Source Analyzer. Use to import or create source definitions for flat file, XML, Cobol, ERP, and relational sources.
Warehouse Designer. Use to import or create target definitions.
Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.

What are the different types of Commit intervals?

The different commit intervals are:

Target-based commit. The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Source-based commit. The Informatica Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.

What is Event-Based Scheduling?

When you use event-based scheduling, the Informatica Server starts a session when it locates the specified indicator file. To use event-based scheduling, you need a shell command, script, or batch file to create an indicator file when all sources are available. The file must be created or sent to a directory local to the Informatica Server. The file can be of any format recognized by the Informatica Server operating system. The Informatica Server deletes the indicator file once the session starts.

Use the following syntax to ping the Informatica Server on a UNIX system:

pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}] [hostname:]portno

Use the following syntax to start a session or batch on a UNIX system:

pmcmd start {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno [folder_name:]{session_name | batch_name} [:pf=param_file] session_flag wait_flag

Use the following syntax to stop a session or batch on a UNIX system:

pmcmd stop {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno[folder_name:]{session_name | batch_name} session_flag

Use the following syntax to stop the Informatica Server on a UNIX system:

pmcmd stopserver {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno

I have the Administer Repository Privilege, but I cannot access a repository using the Repository Manager.

To perform administration tasks in the Repository Manager with the Administer Repository privilege, you must also have the default privilege Browse Repository. You can assign Browse Repository directly to a user login, or you can inherit Browse Repository from a group.

My privileges indicate I should be able to edit objects in the repository, but I cannot edit any metadata.

You may be working in a folder with restrictive permissions. Check the folder permissions to see if you belong to a group whose privileges are restricted by the folder owner.

How does read permission affect the use of the command line program, pmcmd?

To use pmcmd, you do not need to view a folder before starting a session or batch within the folder. Therefore, you do not need read permission to start sessions or batches with pmcmd. You must, however, know the exact name of the session or batch and the folder in which it exists.

With pmcmd, you can start any session or batch in the repository if you have the Session Operator privilege or execute permission on the folder.

I do not want a user group to create or edit sessions and batches, but I need them to access the Server Manager to stop the Informatica Server.

To permit a user to access the Server Manager to stop the Informatica Server, you must grant them both the Create Sessions and Batches, and Administer Server privileges. To restrict the user from creating or editing sessions and batches, you must restrict the user's write permissions on a folder level.

Alternatively, the user can use pmcmd to stop the Informatica Server with the Administer Server privilege alone.

I created a new group and removed the Browse Repository privilege from the group. Why does every user in the group still have that privilege?

Privileges granted to individual users take precedence over any group restrictions. Browse Repository is a default privilege granted to all new users and groups. Therefore, to remove the privilege from users in a group, you must remove the privilege from the group, and every user in the group.

After creating users and user groups, and granting different sets of privileges, I find that none of the repository users can perform certain tasks, even the Administrator.

Repository privileges are limited by the database privileges granted to the database user who created the repository. If the database user (one of the default users created in the Administrators group) does not have full database privileges in the repository database, you need to edit the database user to allow all privileges in the database.

What are the different types of locks?

There are five kinds of locks on repository objects:

Read lock. Created when you open a repository object in a folder for which you do not have write permission. Also created when you open an object with an existing write lock.
Write lock. Created when you create or edit a repository object in a folder for which you have write permission.
Execute lock. Created when you start a session or batch, or when the Informatica Server starts a scheduled session or batch.
Fetch lock. Created when the repository reads information about repository objects from the database.
Save lock. Created when you save information to the repository.

What is Local Repository?

Each local repository in the domain can connect to the global repository and use objects in its shared folders. A folder in a local repository can be copied to other local repositories while keeping all local and global shortcuts intact.

What is a Global repository?

The centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository. The global repository can contain common objects to be shared throughout the domain through global shortcuts. Once created, you cannot change a global repository to a local repository. You can promote an existing local repository to a global repository.

When should you create the dynamic data store? Do you need a DDS at all?

To decide whether you should create a dynamic data store (DDS), consider the following issues:

How much data do you need to store in the DDS? The one principal advantage of data marts is the selectivity of information included in it. Instead of a copy of everything potentially relevant from the OLTP database and flat files, data marts contain only the information needed to answer specific questions for a specific audience (for example, sales performance data used by the sales division). A dynamic data store is a hybrid of the galactic warehouse and the individual data mart, since it includes all the data needed for all the data marts it supplies. If the dynamic data store contains nearly as much information as the OLTP source, you might not need the intermediate step of the dynamic data store. However, if the dynamic data store includes substantially less than all the data in the source databases and flat files, you should consider creating a DDS staging area.
What kind of standards do you need to enforce in your data marts? Creating a DDS is an important technique in enforcing standards. If data marts depend on the DDS for information, you can provide that data in the range and format you want everyone to use. For example, if you want all data marts to include the same information on customers, you can put all the data needed for this standard customer profile in the DDS. Any data mart that reads customer data from the DDS should include all the information in this profile.
How often do you update the contents of the DDS? If you plan to frequently update data in data marts, you need to update the contents of the DDS at least as often as you update the individual data marts that the DDS feeds. You may find it easier to read data directly from source databases and flat file systems if it becomes burdensome to update the DDS fast enough to keep up with the needs of individual data marts. Or, if particular data marts need updates significantly faster than others, you can bypass the DDS for these fast update data marts.
Is the data in the DDS simply a copy of data from source systems, or do you plan to reformat this information before storing it in the DDS? One advantage of the dynamic data store is that, if you plan on reformatting information in the same fashion for several data marts, you only need to format it once for the dynamic data store. Part of this question is whether you keep the data normalized when you copy it to the DDS.
How often do you need to join data from different systems? On occasion, you may need to join records queried from different databases or read from different flat file systems. The more frequently you need to perform this type of heterogeneous join, the more advantageous it would be to perform all such joins within the DDS, then make the results available to all data marts that use the DDS as a source.

What is Dynamic Data Store?

The need to share data is just as pressing as the need to share metadata. Often, several data marts in the same organization need the same information. For example, several data marts may need to read the same product data from operational sources, perform the same profitability calculations, and format this information to make it easy to review.

If each data mart reads, transforms, and writes this product data separately, the throughput for the entire organization is lower than it could be. A more efficient approach would be to read, transform, and write the data to one central data store shared by all data marts. Transformation is a processing-intensive task, so performing the profitability calculations once saves time.

Therefore, this kind of dynamic data store (DDS) improves throughput at the level of the entire organization, including all data marts. To improve performance further, you might want to capture incremental changes to sources. For example, rather than reading all the product data each time you update the DDS, you can improve performance by capturing only the inserts, deletes, and updates that have occurred in the PRODUCTS table since the last time you updated the DDS.

The DDS has one additional advantage beyond performance: when you move data into the DDS, you can format it in a standard fashion. For example, you can prune sensitive employee data that should not be stored in any data mart. Or you can display date and time values in a standard format. You can perform these and other data cleansing tasks when you move data into the DDS instead of performing them repeatedly in separate data marts.

What are Target definitions?

Detailed descriptions for database objects, flat files, Cobol files, or XML files to receive transformed data. During a session, the Informatica Server writes the resulting data to session targets. Use the Warehouse Designer tool in the Designer to import or create target definitions.

What are Source definitions?

Detailed descriptions of database objects (tables, views, synonyms), flat files, XML files, or Cobol files that provide source data. For example, a source definition might be the complete structure of the EMPLOYEES table, including the table name, column names and datatypes, and any constraints applied to these columns, such as NOT NULL or PRIMARY KEY. Use the Source Analyzer tool in the Designer to import and create source definitions.

What are Shortcuts?

We can create shortcuts to objects in shared folders. Shortcuts provide the easiest way to reuse objects. We use a shortcut as if it were the actual object, and when we make a change to the original object, all shortcuts inherit the change.

Shortcuts to folders in the same repository are known as local shortcuts. Shortcuts to the global repository are called global shortcuts.

We use the Designer to create shortcuts.

What are Sessions and Batches?

Sessions and batches store information about how and when the Informatica Server moves data through mappings. You create a session for each mapping you want to run. You can group several sessions together in a batch. Use the Server Manager to create sessions and batches.

What are Reusable transformations?

You can design a transformation to be reused in multiple mappings within a folder, a repository, or a domain. Rather than recreate the same transformation each time, you can make the transformation reusable, then add instances of the transformation to individual mappings. Use the Transformation Developer tool in the Designer to create reusable transformations.

What are Transformations?

A transformation generates, modifies, or passes data through ports that you connect in a mapping or mapplet. When you build a mapping, you add transformations and configure them to handle data according to your business purpose. Use the Transformation Developer tool in the Designer to create transformations.

What are mapplets?

You can design a mapplet to contain sets of transformation logic to be reused in multiple mappings within a folder, a repository, or a domain. Rather than recreate the same set of transformations each time, you can create a mapplet containing the transformations, then add instances of the mapplet to individual mappings. Use the Mapplet Designer tool in the Designer to create mapplets.

What are mappings?

A mapping specifies how to move and transform data from sources to targets. Mappings include source and target definitions and transformations. Transformations describe how the Informatica Server transforms data. Mappings can also include shortcuts, reusable transformations, and mapplets. Use the Mapping Designer tool in the Designer to create mappings.

What are folders?

Folders let you organize your work in the repository, providing a way to separate different types of metadata or different projects into easily identifiable areas.

What is a metadata?

Designing a data mart involves writing and storing a complex set of instructions. You need to know where to get data (sources), how to change it, and where to write the information (targets). PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for example, the description of a source table in an operational database) can contain comments about it.

In summary, Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.

What are different kinds of repository objects? And what it will contain?

Repository objects displayed in the Navigator can include sources, targets, transformations, mappings, mapplets, shortcuts, sessions, batches, and session logs.

What is a repository?

The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version.

We create and maintain the repository with the Repository Manager Client tool. With the Repository Manager, we can also create folders to organize metadata and groups to organize users.

What are the new features and enhancements in PowerCenter 5.1?

The major features and enhancements to PowerCenter 5.1 are:

a) Performance Enhancements

High precision decimal arithmetic. The Informatica Server optimizes data throughput to increase performance of sessions using the Enable Decimal Arithmetic option.
To_Decimal and Aggregate functions. The Informatica Server uses improved algorithms to increase performance of To_Decimal and all aggregate functions such as percentile, median, and average.
Cache management. The Informatica Server uses better cache management to increase performance of Aggregator, Joiner, Lookup, and Rank transformations.
Partition sessions with sorted aggregation. You can partition sessions with Aggregator transformation that use sorted input. This improves memory usage and increases performance of sessions that have sorted data.

b) Relaxed Data Code Page Validation

When enabled, the Informatica Client and Informatica Server lift code page selection and validation restrictions. You can select any supported code page for source, target, lookup, and stored procedure data.

c) Designer Features and Enhancements

Debug mapplets. You can debug a mapplet within a mapping in the Mapping Designer. You can set breakpoints in transformations in the mapplet.
Support for slash character (/) in table and field names. You can use the Designer to import source and target definitions with table and field names containing the slash character (/). This allows you to import SAP BW source definitions by connecting directly to the underlying database tables.

d) Server Manager Features and Enhancements

Continuous sessions. You can schedule a session to run continuously. A continuous session starts automatically when the Load Manager starts. When the session stops, it restarts immediately without rescheduling. Use continuous sessions when reading real time sources, such as IBM MQSeries.
Partition sessions with sorted aggregators. You can partition sessions with sorted aggregators in a mapping.
Register multiple servers against a local repository. You can register multiple PowerCenter Servers against a local repository.

What is the difference between PowerCenter and PowerMart?

With PowerCenter, you receive all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data.

A PowerCenter license lets you create a single repository that you can configure as a global repository, the core component of a data warehouse.

PowerMart includes all features except distributed metadata, multiple registered servers, and data partitioning. Also, the various options available with PowerCenter (such as PowerCenter Integration Server for BW, PowerConnect for IBM DB2, PowerConnect for IBM MQSeries, PowerConnect for SAP R/3, PowerConnect for Siebel, and PowerConnect for PeopleSoft) are not available with PowerMart.

Why do we need SQL overrides in Lookup transformations?

In order to lookup more than one value from one table, we go for SQL overrides in Lookups.

What is session?

A session is a set of instructions to move data from sources to targets.

What is workflow?

A workflow is a set of instructions that tells the Informatica server how to execute the tasks.

What is worklet?

Worklet is an object that represents a set of tasks.

Which ETL tool is more preferable Informatica or Data Stage and why?

Preference of an ETL tool depends on affordability and functionality. It is mostly a tradeoff between the price and feature. While Informatica has been a market leader since the past many years, DataStage is beginning to pick up momentum.

What is a mapplet?

Mapplet is the set of reusable transformation.

What is the use of auxiliary mapping?

Auxiliary mapping reflects change in one table whenever there is a change in the other table.

What is authenticator?

It validates user name and password to access the PowerCenter repository.

How do we create primary key only on odd numbers?

To create primary key, we use sequence generator and set the 'Increment by' property of sequence generator to 2

What is the difference between source qualifier transformation and application source qualifier transformation?

Source qualifier transformation extracts data from RDBMS or from a single flat file system. Application source qualifier transformation extracts data from application sources like ERP.

What are the types of loading in Informatica?

There are two types of loading, normal loading and bulk loading. In normal loading, it loads record by record and writes log for that. It takes comparatively a longer time to load data to the target in normal loading. But in bulk loading, it loads number of records at a time to target database. It takes less time to load data to target.

What is the use of control break statements?

They execute a set of codes within the loop and endloop.

What is the difference between active transformation and passive transformation?

An active transformation can change the number of rows that pass through it, but a passive transformation can not change the number of rows that pass through it.

What are the various types of transformation?

Various types of transformation are: Aggregator Transformation, Expression Transformation, Filter Transformation, Joiner Transformation, Lookup Transformation, Normalizer Transformation, Rank Transformation, Router Transformation, Sequence Generator Transformation, Stored Procedure Transformation, Sorter Transformation, Update Strategy Transformation, XML Source Qualifier Transformation, Advanced External Procedure Transformation, External Transformation.

What is the use of tracing levels in transformation?

Tracing levels store information about mapping and transformations.

What is the difference between a Database and a Datawarehouse?

Database is a place where data is taken as base to data access to retrieve and load data, whereas, a data warehouse is a place where application data is managed for analysis and reporting services. Database stores data in the form of tables and columns. On the contrary, in a data warehouse, data is subject oriented and stored in the form of dimensions and packages which are used for analysis purpose. In short, we must understand that a database is used for running an enterprise but a data warehouse helps in how to run an enterprise.

What are the different types of OLAP TECHNOLOGY?

Online Analytical process is of three types, they are MOLAP, HOLAP and ROLAP. MOLAP Mulidimensional online analytical process. It is used for fast retrival of data and also for slicing and dicing operations. It plays a vital role in easing complex calculations. ROLAP Relational online analytical process. It has the ability to handle large amount of data. HOLAP Hybrid online analytical process. It is a combination of both HOLAP and MOLAP.

What is Data Modeling? What are the different types of Data Modeling?

Data modeling is a process of creating data models. In other words, it is structuring and organizing data in a uniform manner where constraints are placed within the structure. The Data structure formed are maintained in a database management system. The Different types of Data Modeling are:

1. Dimension Modelling

2. E-R Modelling

What is the need of building a data warehouse?

The need of building a data warehouse is that, it acts as a storage fill for a large amount of data. It also provides end user access to a wide varity of data, helps in analyzing data more effectively and also in generating reports. It acts as a huge repository for integrated information.

What is drill-down and drill-up?

Both drill-down and drill-up are used to explore different levels of dimensionally modeled data. Drill-down allows the users view lower level (i.e. more detailed level) of data and drill-up allows the users to view higher level (i.e. more summarized level) of data.

What is cube?

Cube is a multidimensional representation of data. It is used for analysis purpose. A cube gives multiple views of data.

What is a Data warehouse?

A Data warehouse is a denormalized database, which stores historical data in summary level format. It is specifically meant for heavy duty querying and analysis.

How to delete duplicate row for FLAT FILE sources?

Since source is a flat file DISTINCT option in source qualifier is disabled.

Use Sorter
Use Aggregator. In Aggregator you get the option to choose the first or last of the duplicate value to consider.
Using Dynamic Lookup cache of the target table and implementing Insert else Update logic.

Suppose we have some serial numbers in a flat file source. We want to load the serial numbers in two target files one containing the EVEN serial numbers and the other file having the ODD ones.

Use Router transformation with the condition:

Group Name Condition

EVEN MOD(Serial_No,2)=0

ODD MOD(Serial_No,2)=1

Suppose in our Source Table we have data as given below (Transpose Data):

We want to load our Target Table as:

Name	Month	Transportation	House Rent	Food
Sam	Jan	200	1500	500
John	Jan	300	1200	300
Tom	Jan	300	1350	350
Sam	Feb	300	1550	450
John	Feb	350	1200	290
Tom	Feb	350	1400	350

We want to load our Target Table as:

Name	Month	Expense Type	Expense
Sam	Jan	Transport	200
Sam	Jan	House rent	1500
Sam	Jan	Food	500
John	Jan	Transport	300
John	Jan	House rent	1200
John	Jan	Food	300
Tom	Jan	Transport	300
Tom	Jan	House rent	1350
Tom	Jan	Food	350

Describe your approach.

Here to convert the Rows to Columns we have to use the Normalizer Transformation followed by an Expression Transformation to Decode the column taken into consideration.

A Normalizer is an Active transformation that returns multiple rows from a source row, it returns duplicate data for single-occurring source columns. The Normalizer transformation parses multiple-occurring columns from COBOL sources, relational tables, or other sources. Normalizer can be used to transpose the data in columns to rows.

Normalizer effectively does the opposite of what Aggregator does!

In the Ports tab of the Normalizer the ports will be created automatically as configured in the Normalizer tab.

Interestingly we will observe two new columns namely,

GK_EXPENSEHEAD

GCID_EXPENSEHEAD

GK field generates sequence number starting from the value as defined in Sequence field while GCID holds the value of the occurence field i.e. the column no of the input Expense head.

Here 1 is for FOOD, 2 is for HOUSERENT and 3 is for TRANSPORTATION.

Now the GCID will give which expense corresponds to which field while converting columns to rows.

Below is the screen-shot of the expression to handle this GCID efficiently:

Name the transformations which converts one to many rows i.e increases the i/p:o/p row count. Also what is the name of its reverse transformation.

Normalizer as well as Router Transformations are the Active transformation which can increase the number of input rows to output rows.

Aggregator Transformation is the active transformation that performs the reverse action.

Suppose we have a source table and we want to load three target tables based on source rows such that first row moves to first target table, secord row in second target table, third row in third target table, fourth row again in first target table so on and so forth. Describe your approach.

We can clearly understand that we need a Router transformation to route or filter source data to the three target tables. Now the question is what will be the filter conditions. First of all we need an Expression Transformation where we have all the source table columns and along with that we have another i/o port say seq_num, which is gets sequence numbers for each source row from the port NextVal of a Sequence Generator start value 0 and increment by 1. Now the filter condition for the three router groups will be:

MOD(SEQ_NUM,3)=1 connected to 1st target table, MOD(SEQ_NUM,3)=2 connected to 2nd target table, MOD(SEQ_NUM,3)=0 connected to 3rd target table.

Suppose we have ten source flat files of same structure. How can we load all the files in target database in a single batch run using a single mapping.

After we create a mapping to load data in target database from flat files, next we move on to the session property of the Source Qualifier. To load a set of source files we need to create a file say final.txt containing the source falt file names, ten files in our case and set the Source filetype option as Indirect. Next point this flat file final.txt fully qualified through Source file directory and Source filename.

How can we implement Aggregation operation without using an Aggregator Transformation in Informatica.

We will use the very basic concept of the Expression Transformation that at a time we can access the previous row data as well as the currently processed data in an expression transformation. What we need is simple Sorter, Expression and Filter transformation to achieve aggregation at Informatica level.

Suppose in our Source Table we have data as given below:

Student Name	Subject Name	Marks
Sam	Maths	100
Tom	Maths	80
Sam	Physical Science	80
John	Maths	75
Sam	Life Science	70
John	Life Science	100
John	Physical Science	85
Tom	Life Science	100
Tom	Physical Science	85

We want to load our Target Table as:

Student Name	Maths	Life Science	Physical Science
Sam	100	70	80
John	75	100	85
Tom	80	100	85

Describe your approach.

Here our scenario is to convert many rows to one rows, and the transformation which will help us to achieve this is Aggregator.

We will sort the source data based on STUDENT_NAME ascending followed by SUBJECT ascending.

Now based on STUDENT_NAME in GROUP BY clause the following output subject columns are populated as

MATHS: MAX(MARKS, SUBJECT='Maths')

LIFE_SC: MAX(MARKS, SUBJECT='Life Science')

PHY_SC: MAX(MARKS, SUBJECT='Physical Science')

What is a Source Qualifier? What are the tasks we can perform using a SQ and why it is an ACTIVE transformation?

A Source Qualifier is an Active and Connected Informatica transformation that reads the rows from a relational database or flat file source.

We can configure the SQ to join [Both INNER as well as OUTER JOIN] data originating from the same source database.
We can use a source filter to reduce the number of rows the Integration Service queries.
We can specify a number for sorted ports and the Integration Service adds an ORDER BY clause to the default SQL query.
We can choose Select Distinct option for relational databases and the Integration Service adds a SELECT DISTINCT clause to the default SQL query.
Also we can write Custom/Used Defined SQL query which will override the default query in the SQ by changing the default settings of the transformation properties.
Aslo we have the option to write Pre as well as Post SQL statements to be executed before and after the SQ query in the source database.

Since the transformation provides us with the property Select Distinct , when the Integration Service adds a SELECT DISTINCT clause to the default SQL query, which in turn affects the number of rows returned by the Database to the Integration Service and hence it is an Active transformation.

What happens to a mapping if we alter the datatypes between Source and its corresponding Source Qualifier?

The Source Qualifier transformation displays the transformation datatypes. The transformation datatypes determine how the source database binds data when the Integration Service reads it.

Now if we alter the datatypes in the Source Qualifier transformation or the datatypes in the source definition and Source Qualifier transformation do not match, the Designer marks the mapping as invalid when we save it.

Suppose we have used the Select Distinct and the Number Of Sorted Ports property in the SQ and then we add Custom SQL Query. Explain what will happen.

Whenever we add Custom SQL or SQL override query it overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the user defined SQL Query will be fired in the database and all the other options will be ignored.

Describe the situations where we will use the Source Filter, Select Distinct and Number Of Sorted Ports properties of Source Qualifier transformation.

Source Filter option is used basically to reduce the number of rows the Integration Service queries so as to improve performance.
Select Distinct option is used when we want the Integration Service to select unique values from a source, filtering out unnecessary data earlier in the data flow, which might improve performance.
Number Of Sorted Ports option is used when we want the source data to be in a sorted fashion so as to use the same in some following transformations like Aggregator or Joiner, those when configured for sorted input will improve the performance.

What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the OUTPUT PORTS order in SQ transformation do not match?

Mismatch or Changing the order of the list of selected columns to that of the connected transformation output ports may result is session failure.

What happens if in the Source Filter property of SQ transformation we include keyword WHERE say, WHERE CUSTOMERS.CUSTOMER_ID > 1000.

We use source filter to reduce the number of source records. If we include the string WHERE in the source filter, the Integration Service fails the session.

Describe the scenarios where we go for Joiner transformation instead of Source Qualifier transformation.

While joining Source Data of heterogeneous sources as well as to join flat files we will use the Joiner transformation.

Use the Joiner transformation when we need to join the following types of sources:

Join data from different Relational Databases.

Join data from different Flat Files.

Join relational sources and flat files.

What is the maximum number we can use in Number Of Sorted Ports for Sybase source system.

Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do not sort more than 16 columns.

Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to Target tables TGT1 and TGT2 respectively. How do you ensure TGT2 is loaded after TGT1?

If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the order in which the Integration Service loads data into the targets.

In the Mapping Designer, We need to configure the Target Load Plan based on the Source Qualifier transformations in a mapping to specify the required loading order.

Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2 is loaded after TGT1?

In the Workflow Manager, we can Configure Constraint based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.

Hence if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships, we will go for Constraint based load ordering.

DWBI

Friday, 18 November 2011

Basic Shell Scripting

Sample Shell Script 2: Find the nzreclaim candidate for any database

Thursday, 17 November 2011

Sample Shell Script 1: Sum all the argument passed to the script

Friday, 11 November 2011

Informatica Interview Question: Part 1

About Me

Blog Archive