DataStage multiple parametric (conditioned) query execution - DB2

I would like to create a job that, based on some values in Table A, executes a SELECT query on Table B, where the WHERE condition must be parametric.
For example: Table A has 10 columns and 100 rows. Nine of the columns are nullable, so I have to build a query that checks whether each value is null; if it is null, it must NOT be used as a search criterion in the SELECT statement.
I thought about using a sparse lookup where I'd pass a string built by concatenating the non-null search parameters, but the job fails because the columns need to be mapped.
I even created a file with the queries as strings, then looped over the file and passed each string as a variable to the DB2 Connector stage. It works... but with more than 10,000 rows that means 10,000 queries, which is not that fast.
Thanks for your help.
PS: I'm new to this stuff :D

What you can do is use the Before SQL option on your source/target stage. Your job will have at least two stages: a source DB2 Connector stage and a Copy, Sequential File, or Peek stage as the target (or a Row Generator stage and a target DB2 Connector).
In your input DB2 Connector you can pass your SQL script as a parameter into Before SQL, provided it is generated in advance and supplied as the value of the connector's Before SQL property. Your actual SQL statement can then be a "dummy" query such as "select current date from sysibm.sysdummy1" to complete the execution.
Hope it makes sense.
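For illustration, a rough sketch of what the generated script might look like (the staging table and column names here are placeholders, not from the original post). The Before SQL job parameter would carry the statement built only from the non-null values of Table A, e.g.:

insert into STAGING_RESULTS
select * from TABLE_B
where COL1 = 'value_from_A_col1'
  and COL4 = 'value_from_A_col4';

while the connector's main statement stays the dummy query:

select current date from sysibm.sysdummy1;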

PostgreSQL: allow filtering by non-existent fields

I'm using PostgreSQL with a Go driver. Sometimes I need to query columns that may not exist, just to check whether something exists in the DB. Before querying, I can't tell whether a given column exists. Example:
where size=10 or length=10
By default I get the error column "length" does not exist; however, the size column could exist, and I could get some results.
Is it possible to handle such cases and return whatever is possible?
EDIT:
Yes, I could get all the existing columns first. But the initial queries can be rather complex and are not created directly by me; I can only modify them.
That means the query can be as simple as the previous example, or much more complex, like this:
WHERE size=10 OR (length=10 AND n='example') OR (c BETWEEN 1 and 5 AND p='Mars')
If the missing columns are length and c, does that mean I have to parse the SQL, split it by OR (or other operators), check every part of the query, remove any part with missing columns, and in the end generate a new SQL query?
Any easier way?
I would check the information schema first:
select column_name from information_schema.columns where table_name = 'table_name';
and then build the query based on the result.
Why don't you get a list of columns that are in the table first? Like this
select column_name
from information_schema.columns
where table_name = 'table_name' and (column_name = 'size' or column_name = 'length');
The result will be the columns that exist.
There is no way to do what you want, except for constructing an SQL string from the list of available columns, which can be obtained by querying information_schema.columns.
SQL statements are parsed before they are executed, and there is no conditional compilation or short-circuiting, so you get an error if a non-existent column is referenced.
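To make that concrete, here is a minimal sketch in PL/pgSQL of the "construct the SQL string from the available columns" approach; the table name items and the two predicates are placeholders for whatever the real query contains:

DO $$
DECLARE
    cond text := '';
    r    record;
BEGIN
    -- keep only the predicates whose column actually exists in the table
    FOR r IN
        SELECT p.pred
        FROM (VALUES ('size',   'size = 10'),
                     ('length', 'length = 10')) AS p(col, pred)
        WHERE EXISTS (SELECT 1
                      FROM information_schema.columns c
                      WHERE c.table_name = 'items'
                        AND c.column_name = p.col)
    LOOP
        cond := cond || CASE WHEN cond = '' THEN '' ELSE ' OR ' END || r.pred;
    END LOOP;

    IF cond <> '' THEN
        -- run the reduced query; the result is discarded in this sketch
        EXECUTE 'SELECT * FROM items WHERE ' || cond;
    END IF;
END $$;

The same idea works in application code (e.g. the Go driver mentioned in the question): query information_schema.columns once, keep only the predicates whose columns exist, and assemble the final statement from those parts.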

Snowflake: Unsupported subquery type cannot be evaluated

I am using Snowflake as a data warehouse. I have a CSV file in AWS S3, and I am writing a MERGE SQL statement to merge the data received in the CSV into a table in Snowflake. I have a column in the time dimension table with data type NUMBER(38,0). This table holds all date/time values; one example row has
time_id= 232 and time=12:00
In the CSV I get a column labeled time with a value such as 12:00.
In the merge SQL I fetch this value and try to get the time_id for it:
update table_name set start_time_dim_id = (select time_id from time_dim t where t.time_name = csv_data.start_time_dim_id)
On this statement I get the error "SQL compilation error: Unsupported subquery type cannot be evaluated".
I am struggling to solve it. While doing so I googled it and found one reference:
https://github.com/snowflakedb/snowflake-connector-python/issues/251
So I want to ask whether anyone has encountered this issue. If yes, I would appreciate any pointers.
It seems like a conversion issue. I suggest you check the data in the CSV file; maybe there is a wrong or missing value. Please check your data and make sure the subquery returns numeric values:
create table simpleone ( id number );
insert into simpleone values ( True );
The last statement fails with:
SQL compilation error: Expression type does not match column data type, expecting NUMBER(38,0) but got BOOLEAN for column ID
If you provide sample data and SQL that reproduces the error, maybe we can provide a solution.
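As a side note, Snowflake's TRY_TO_NUMBER returns NULL instead of raising an error when a value cannot be cast, so it can help isolate the rows that would break a NUMBER(38,0) assignment. A small sketch (csv_stage and raw_value are placeholder names for however the CSV is exposed in Snowflake):

select raw_value
from csv_stage
where raw_value is not null
  and try_to_number(raw_value) is null;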
Unfortunately, correlated and nested subqueries in Snowflake are a bit limited at this stage.
I would try running something like this:
update table_name
set start_time_dim_id = t.time_id
from time_dim t, csv_data
where t.time_name = csv_data.start_time_dim_id
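If the CSV data lands in a staging table first, the lookup can also be written as a join inside the MERGE itself rather than as a correlated subquery. A rough sketch, assuming a staging table csv_data and a key column id that identifies the target row (the key column is an assumption; the other names come from the question):

merge into table_name tgt
using (
    select c.id, t.time_id
    from csv_data c
    join time_dim t
      on t.time_name = c.start_time_dim_id
) src
on tgt.id = src.id
when matched then update
    set start_time_dim_id = src.time_id;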

SAS SQL Pass Through

I would like to know what gets executed first in the SAS SQL pass-through in this code:
Connect To OLEDB As MYDB ( %DBConnect( Catalog = MYDB ) ) ;
Create table MYDB_extract as
select put(Parent,$ABC.) as PARENT,
put(PFX,z2.) as PFX,*
From Connection To MYDB
( SELECT
Appointment,Parents,Children,Cats,Dogs
FROM MYDB.dbo.FlatRecord
WHERE Appointment between '20150801' and '20150831'
And Children > 2);
Disconnect from MYDB;
Since MS SQL Server doesn't support the PUT function, will this query cause ALL of the records to be processed locally, or only the resulting records from the DBMS?
The explicit pass-through query will still be processed by the DBMS and will return to SAS whatever it returns (however many records that is). Then SAS will perform the PUT operations on the returned rows.
So if there are 10000 rows in the table and 500 rows meet the criteria in the WHERE clause, 500 records will go to SAS and then have the PUT functions applied; SQL Server will handle the 10000 -> 500.
If you had written this as implicit pass-through, then it's possible (if not probable) that SAS would have done all of the work.
First the code in the inline view will be executed on the server:
SELECT Appointment,Parents,Children,Cats,Dogs
FROM MYDB.dbo.FlatRecord
WHERE Appointment between '20150801' and '20150831' And Children > 2
Rows that meet that WHERE clause will be returned by the DBMS to SAS over the OLEDB connection.
Then SAS will (try to) select from that result set, applying any other code, including the PUT functions.
This isn't really any different from how an inline view works in any other DBMS, except that here you have two different database engines, one running the inner query and SAS running the outer query.

How to write a DB2 stored procedure to insert/update/delete with random values?

1. I want to write a DB2 procedure to do common insert/update/delete operations on a table; the problem is how to generate SQL statements with random values. For example, for a column of integer type the stored procedure could generate numbers between 1 and 10000, and for a column of varchar type it could generate a string of randomly chosen characters with a fixed length, say 10.
2. Does the DB2 SQL syntax support sth to put the data from a file into a LOB column for a randomly chosen row? Say I have a table t1(c0 integer, c1 clob); how could I do sth like "insert into t1 values(100, some_path_to_a_text_file)"?
3. Using DB2 "import" to load data: if the file contains 10000 rows, it seems DB2 will by default commit the entire 10000-row insertion in one single transaction. Is there any configuration/option I could use to divide the "import" process into, say, 10 transactions, each with 1000 rows?
Thank you very much!
1) To do a random operation, get a random value and process it according to a set of rules. I have a similar case in a utility I am currently developing:
https://github.com/angoca/log4db2/blob/master/src/examples/sql-pl/bank/DemoBankRandom.sql
It performs an insert, a select, an update or a delete based on a random value (a short sketch of the random values themselves follows below).
2) No idea. What is sth?
3) For more frequent commits, use COMMITCOUNT (see the example below). For more info please check the Information Center: http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0008304.html
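As a sketch for point 1, both kinds of random values can be produced directly in SQL with RAND(); t1(c0 integer, c1 clob) is the table from the question, and the string example is shortened to three characters for readability (repeat the SUBSTR term for a length of 10):

insert into t1 (c0)
values (int(rand() * 10000) + 1);   -- random integer between 1 and 10000

select substr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', int(rand() * 26) + 1, 1)
    || substr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', int(rand() * 26) + 1, 1)
    || substr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', int(rand() * 26) + 1, 1)
from sysibm.sysdummy1;               -- random 3-character string

And for point 3, COMMITCOUNT goes directly into the IMPORT command, for example (data.del is a placeholder file name):

db2 "import from data.del of del commitcount 1000 insert into t1"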

How to retrieve auto-increment value after inserting 1 record in a single query (SQL Server)

I have two fields in my table:
One is a primary key auto-increment value and the second is a text value.
Let's say: xyzId & xyz
So I can easily insert like this:
insert into abcTable(xyz) Values('34')
After performing the above query it will insert this information:
xyzId=1 & xyz=34
and I can retrieve it like this:
select xyzId from abcTable
But for this I have to write two operations. Can't I retrieve it in a single query/subquery?
Thanks
If you are on SQL Server 2005 or later you can use the OUTPUT clause to return the auto-created id.
Try this:
insert into abcTable(xyz)
output inserted.xyzId
values('34')
I think you can't do an insert and a select in a single query.
You can use a stored procedure to execute the two instructions as an atomic operation, or you can build a query in code with the two instructions using ';' (semicolon) as a separator between them.
Anyway, to select identity values in SQL Server you should look at @@IDENTITY, SCOPE_IDENTITY() and IDENT_CURRENT(). It's faster and cleaner than a SELECT on the table.
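For example, the insert and the identity lookup can be sent as a single batch (abcTable is the table from the question):

insert into abcTable(xyz) values('34');
select scope_identity() as xyzId;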