Informatica SQ returns different result - db2

I am trying to pull data from DB2 via Informatica. I have a SQ query that pulls a few fields based on joins across 4 different tables.
When I run the query directly in the database, it returns the expected result; however, when I run it in Informatica and step through the debugger, I see something else.
Please note that the data in all the columns matches perfectly, except for one single column.
The weird thing is, this is a calculated field based on a CASE statement:
CASE WHEN Column1='3' THEN 'N' ELSE 'Y' END.
Since this is a calculated field one character long, I have connected it from the source to the SQ using a port from one of the sources that has a length of 1 character.
The query returns 'Y' when executed in the database, yet when I copy-paste the same query into the SQ in Informatica and run it, I get the value 'E', which should never be possible as I expect only 'N' or 'Y'. I have verified the column order and that the field is in the right place. This is very strange; is something going wrong because of the CASE statement?

Save yourself the hassle: put an Expression transformation after the Source Qualifier, calculate the port value there, and then forget about it.
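For instance, if you let the SQ return the raw Column1 instead of the calculated flag, the Expression transformation could derive it with something along these lines (a sketch; o_FLAG is just an example output port name):
o_FLAG: IIF(Column1 = '3', 'N', 'Y')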

I think I found the issue. We use Informatica PowerExchange to connect to an AS400 system (DB2), and it seems that when a flag field is set in AS400 and passed to Informatica via PowerExchange, it gets converted to binary; to solve this, there needs to be an entry in the PowerExchange configuration file.
Unfortunately, I myself was not aware that it could be related to PowerExchange rather than PowerCenter itself!
Thanks for your assistance! Below is the KB article about it.
https://kb.informatica.com/solution/4/Pages/17498.aspx

Related

Oracle DB link - where clause evaluation

I have a DB2 data source and an Oracle 12c target.
The Oracle side has a DB link to the DB2 defined, which is working in general.
Now I have a huge table in the DB2 which has a timestamp column (let's call it ROW_CHANGED) for row changes. I want to retrieve rows which have changed after a particular time.
Running
SELECT * FROM lib.tbl WHERE ROW_CHANGED >'2016-08-01 10:00:00'
on the DB2 returns exactly 1 row after ca. 90 seconds, which is fine.
Now I try the same query from Oracle via the db link:
SELECT * FROM lib.tbl@dblink_name WHERE ROW_CHANGED > TO_TIMESTAMP('2016-08-01 10:00:00')
This runs for hours and ends up in a timeout.
I read some Oracle docs and found distributed query optimization tips, but most of them refer to joining a local table to a remote one, which is not my case.
In my desperation, I have tried the DRIVING_SITE hint, without effect.
Now I wonder when the WHERE part of the query is evaluated. Since I have to use Oracle syntax and not DB2 syntax for the query, is it possible that Oracle first copies the full table and applies the WHERE clause afterwards? I did some research but did not find anything that would help me in this direction.
The ROW_CHANGED column is hidden in the DB2 table, if that matters.
Thanks for any hint in advance.
Update
Thanks @all for the help. I'll share what did the trick for me.
First of all, I used TO_TIMESTAMP since the DB2 column is also a timestamp (not a date) and I expected to circumvent implicit conversions that way.
Without the explicit conversion I ran into ORA-28534 (Heterogeneous Services preprocessing error), and I have no hope of touching the DB config within a reasonable time.
The explain plan, by the way, did not reveal much. It showed a FULL hint and no conversion on the predicates. Oddly, it showed the ROW_CHANGED column as DATE; I wonder why.
I tried Justin's suggestion to use a bind variable, but got ORA-28534 again. The next thing I did was to wrap it into a PL/SQL block (it will run in a stored procedure later anyway).
declare
  v_tmstmp TIMESTAMP := '01.08.16 10:00:00';
begin
  INSERT INTO ORAUSER.TMP_TBL (SRC_PK, ROW_CHANGED)
  SELECT SRC_PK, ROW_CHANGED
  FROM lib.tbl@dblink_name
  WHERE ROW_CHANGED > v_tmstmp;
end;
This executed in the same time as in DB2 itself. The date format is DD.MM.YY here since it is unfortunately the session default.
When changing the variable assignment to
v_tmstmp TIMESTAMP := TO_TIMESTAMP('01.08.16 10:00:00','DD.MM.YY HH24:MI:SS');
I got the same problem as before.
Meanwhile the DB2 operators have created an index on the ROW_CHANGED column, which I had requested earlier that day. This seems to have solved the problem in general; even my original query now finishes in no time.
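For reference, the fix on the DB2 side amounts to a plain index on the filter column, something like this (a sketch; the index name is made up, the actual one is whatever the operators chose):
CREATE INDEX lib.tbl_row_changed_ix ON lib.tbl (ROW_CHANGED);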
If you are actually using an Oracle-specific conversion function like to_timestamp, that forces the predicate to be evaluated on the Oracle side. Oracle isn't going to know how to convert a built-in function like to_timestamp into an exactly equivalent function call in DB2.
If you used a bind variable, that would be more likely to get evaluated on the DB2 side. But that may be complicated by the data type mapping between different databases; there may not be a perfect mapping between one engine's date and another engine's timestamp data type. If this was a numeric column, a bind variable would be almost certain to get pushed. In this case, it probably involves playing around a bit to figure out exactly what data type to use for your variable that works for your framework, Oracle, and DB2.
If using a bind variable doesn't work, you can force the predicate to be evaluated on the remote server using the dbms_hs_passthrough package. That lets you send a query verbatim to the remote server which allows you to do things like use functions defined in your DB2 database. That's a bit of overkill in this situation, hopefully, but it's nice to have the hammer as your backup if the simpler solution doesn't work quickly enough.
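A minimal sketch of that passthrough fallback, reusing the db link and column names from the question (the quoted statement is sent to DB2 verbatim, so DB2 syntax and literal formats apply inside it):

DECLARE
  v_cursor  INTEGER;
  v_rows    INTEGER;
  v_pk      VARCHAR2(100);
BEGIN
  v_cursor := DBMS_HS_PASSTHROUGH.OPEN_CURSOR@dblink_name;
  -- parsed and executed by DB2, so the predicate is evaluated remotely
  DBMS_HS_PASSTHROUGH.PARSE@dblink_name(
    v_cursor,
    'SELECT SRC_PK FROM lib.tbl WHERE ROW_CHANGED > ''2016-08-01 10:00:00''');
  LOOP
    v_rows := DBMS_HS_PASSTHROUGH.FETCH_ROW@dblink_name(v_cursor);
    EXIT WHEN v_rows = 0;
    DBMS_HS_PASSTHROUGH.GET_VALUE@dblink_name(v_cursor, 1, v_pk);
    DBMS_OUTPUT.PUT_LINE(v_pk);
  END LOOP;
  DBMS_HS_PASSTHROUGH.CLOSE_CURSOR@dblink_name(v_cursor);
END;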

How to assign csv field value to SQL query written inside table input step in Pentaho Spoon

I am pretty new to Pentaho, so my question might sound very novice.
I have written a transformation in which I am using a CSV file input step and a Table input step.
Steps I followed:
Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
I have used this parameter in the PostgreSQL query in the Table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using the CSV file input step. How do I assign the birthdate value present in my CSV file to the parameter which I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without the use of a parameter?
TL;DR: I recommend using a "Database join" step, as in my third suggestion below.
See the last image for reference.
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when launching the transformation. If you need to read the value from a CSV, you can do it with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a Select values step to keep only the columns to be used as parameters.
In the Table input step, use a placeholder (?) to mark where the data goes, and set the step to run once for each row it receives from the source step; a sketch of the resulting query is shown below.
Just keep in mind that the order of the columns arriving at the Table input step (that is, the columns coming out of the Select values step) is the order in which they are bound to the placeholders (?). This is not an issue for your query, which uses only one placeholder, but keep it in mind as you ramp up with Pentaho.
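Applied to the query in your question, the Table input SQL with a placeholder would look roughly like this (assuming the stream field feeding it is already an integer; otherwise cast it as shown in the other answer):

select * from person where EXTRACT(YEAR FROM birthdate) > ?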
Second idea, using a Database Lookup
With this approach you can't customize the query sent to the database, but you may get better performance because you can set the "Enable cache" flag. If you don't need to use a function in your WHERE clause, this is really the recommended option.
Third idea, using a Database Join
That is my recommended approach if you need a function in your WHERE clause. It looks a lot like the Table input approach, but you can skip the Select values step and choose exactly which columns to use, repeat the same column several times, and enable the "outer join" flag, which returns rows even when the query produces no result.
Pro tip: if the transformation runs too slowly, try using multiple copies of the step (documentation here) and, obviously, make sure the table has the appropriate indexes in place.
Yes, there's a way of assigning it directly without the use of a parameter. Do as follows.
Use a "Block this step until steps finish" step to hold the Table input step until the CSV file input step completes.
Following is how you configure each step.
Note:
The Postgres query should be: select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check "Execute for each row" and "Replace variables in script" in the Table input step.
Select only the birthdate column in the CSV file input step.

Talend tMap Set Default Value for Rejected Inner Joins and connect them with the main data flow

I've got the following problem.
I have several tMaps, each with a lookup, and at the end all the data is written to a DB. The following mockup shall illustrate it:
There can be values in the main data stream which are not found in the lookup tables. For these values there is a reject path which catches them from the specific tMap.
Requirements:
In case of a rejected inner join, the looked-up value shall be set to a default value (for example 0, which could be done in the schema of the tMap), and afterwards these "corrected" records should be added back to the "normal" main data flow and go through the next lookup.
The tUnite component cannot handle these cases because it cannot exist in a data flow loop.
Does anybody have an idea how to solve this problem?
Cheers.
The answer was so easy that I didn't see it at first. I just have to change the join model from inner join to left join, so all the formerly rejected values come through with a null value. Afterwards I can check the columns in the tMap and set them to a default value if they are null:
row1.id == null ? 0 : row1.id
Cheers.
If I understand correctly what you are trying to accomplish, you will have to use staging files or staging tables in the database. Once you get the rejected rows, write them to a file or table. The accepted rows also go to a staging table (different from the rejected one). Then you can union both tables or files by reading them back. The key point is having a staging structure. I attach a picture of how it would look; in the picture the staging structure is a MySQL table.
Let me know if it helps!

boolean field in redshift copy

I am producing a comma-separated file in S3 that needs to be copied to a staging table in a Redshift database using the COPY command.
It has one boolean field. With every sensible way I can think of to represent the boolean value in the file, Redshift COPY complains, usually with "Unknown boolean format".
I'm going to give up and change the staging table field to a smallint so that I can proceed with the copy and translate the value when loading from staging into the final Redshift table, but I'm curious if anyone knows the correct incantation.
A zero or one works just fine for us.
Check your loads carefully; it may well be another issue that's 'pushing' invalid data into your boolean column.
For instance, we had all kinds of crazy characters embedded in our data that would cause errors like that. I eventually settled on using the US (unit separator) character as the record separator.
Check to make sure you're excluding the headers during the COPY command.
I ran into the same problem, and adding the ignoreheader 1 option (which skips 1 header line during import) solved the issue.
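For reference, a minimal sketch of such a COPY; the bucket path and IAM role are placeholders, and with this a plain 0 or 1 in the boolean column loads fine:

COPY staging_table
FROM 's3://my-bucket/path/data.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
CSV
IGNOREHEADER 1;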

How to use Pentaho Data Integration to copy columns between tables

I thought this would be an easy task, but since I am new to PDI, I could not find out so far which steps to choose to accomplish the following:
I am using Pentaho Data Integration (formerly Kettle), Community Edition, to map/copy values from one table ('tasksA') of one database A to another table ('tasksB') in another database B. tasksA has a column 'description' and I want to copy these values to the column 'taskName' in 'tasksB'.
Furthermore, I have to copy each value of 'description' several times, since in 'tasksB' there are multiple rows for each value of 'taskName'.
Maybe this would be possible with direct SQL, but I wanted to try whether I can define this more readably with PDI, especially because in the next step I will have to extend it to other tables.
So I have to specify which value of 'description' is mapped onto which value of 'taskName', and that it should be replaced in every tuple containing this value in the column 'taskName' (well, sounds like a WHERE clause...).
My first experiment with the 'Table input' and 'Table output' steps did not work: I simply drew a hop between them and modified the 'Database fields' tab of the 'Table output' step, which generated 'drop column' statements in the resulting SQL, and that is not what I want. I don't want to modify the schema, just copy the values.
It would be great if someone could point me to the right steps/transformations. I worked through the first examples from the Pentaho Wiki and have the 'Pentaho Kettle Solutions' book by Casters et al., but could not find out how to solve this. Many thanks in advance for any help.
If I got this right, you should use a Table input step connected to an "Insert/Update" step.
In the Insert/Update step you need to specify the keys from tasksA that should be looked up in tasksB. Then define which field of tasksB should be updated: description (as the stream field) -> taskName (as the table field).
Keep in mind that if the key is not found, a row will be inserted into tasksB. If that is not what you want, you'll need to build something like: Table input -> Database lookup -> Filter rows -> Insert/Update.
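The Table input on database A can then be as simple as this (a sketch; task_id is a hypothetical join key that both tasksA and tasksB are assumed to share, since the question doesn't name one):

SELECT task_id, description
FROM tasksA

In the Insert/Update step, task_id would then be the lookup key and description -> taskName the update mapping.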
@RFVoltolini has a good answer. Alternatively you could go
Table Input -> Update
and connect the error output to something else, like a Text file output step.