Autoincrement in query - db2

I need to create a query that increments the value of each row by 8% over the previous row.
Table (let's name it money) contains one row (and two columns), and it looks like this:
AMOUNT ID
100.00 AAA
I just need to populate data from this table in the following way (one select from this table, e.g. 6 iterations):
100.00 AAA
108.00 AAA
116.64 AAA
125.97 AAA
136.04 AAA
146.93 AAA

You can do that with a common table expression.
E.g. if your source looks like this:
db2 "create table money(amount decimal(31,2), id varchar(10))"
db2 "insert into money values (100,'AAA')"
You can create the input data with the following query (I will include a counter column for clarity):
db2 "with
cte(c1,c2,counter)
as
(select
amount, id, 1
from
money
union all
select
c1*1.08, c2, counter+1
from
cte
where counter < 10)
select * from cte"
C1 C2 COUNTER
--------------------------------- ---------- -----------
100.00 AAA 1
108.00 AAA 2
116.64 AAA 3
125.97 AAA 4
136.04 AAA 5
146.92 AAA 6
158.67 AAA 7
171.36 AAA 8
185.06 AAA 9
199.86 AAA 10
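Note that the recursive step multiplies the already-rounded DECIMAL(31,2) value at each iteration, which is why row 6 shows 146.92 here but 146.93 in the question. As a sketch of an alternative (assuming your DB2 version has the POWER function), you can compute every row directly from the original amount and use the recursion only to generate the counter:
db2 "with n(counter) as
  (select 1 from sysibm.sysdummy1
   union all
   select counter + 1 from n where counter < 10)
-- each value comes straight from the base amount, so no cumulative rounding
select dec(m.amount * power(1.08, n.counter - 1), 31, 2) as c1,
       m.id as c2, n.counter
from money m cross join n"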
To populate the existing table without repeating the existing row, you can use an insert like this:
$ db2 "insert into money
with
cte(c1,c2,counter)
as
(select
amount*1.08, id, 1
from
money
union all
select
c1*1.08, c2, counter+1
from
cte
where counter < 10) select c1,c2 from cte"
$ db2 "select * from money"
AMOUNT ID
--------------------------------- ----------
100.00 AAA
108.00 AAA
116.64 AAA
125.97 AAA
136.04 AAA
146.93 AAA
158.68 AAA
171.38 AAA
185.09 AAA
199.90 AAA
215.89 AAA
11 record(s) selected.

PostgreSQL: Count Number of Occurrences in Columns

BACKGROUND
I have three large tables (employee_info, driver_info, school_info) that I have joined together on common attributes using a series of LEFT OUTER JOIN operations. After each join, the resulting number of records increased slightly, indicating that there are duplicate IDs in the data. To try to find all of the duplicates in the IDs, I dumped the ID columns into a temp table like so:
Original Dump of ID Columns
first_name  last_name  employee_id  driver_id  school_id
----------  ---------  -----------  ---------  ---------
Mickey      Mouse      1234         abcd       wxyz
Donald      Duck       2423         heca       qwer
Mary        Poppins    1111         acbe       aaaa
Wiley       Cayote     1234         strf       aaaa
Daffy       Duck       1256         acbe       pqrs
Bugs        Bunny      9999         strf       yxwv
Pink        Panther    2222         zzzz       zzaa
Michael     Archangel  0000         rstu       aaaa
In this overly simplified example, you will see that IDs 1234 (employee_id), strf (driver_id), and aaaa (school_id) are each duplicated at least once. I would like to add a count column for each of the ID columns, and populate them with the count for each ID used, like so:
ID Columns with Counts
first_name  last_name  employee_id  employee_id_count  driver_id  driver_id_count  school_id  school_id_count
----------  ---------  -----------  -----------------  ---------  ---------------  ---------  ---------------
Mickey      Mouse      1234         2                  abcd       1                wxyz       1
Donald      Duck       2423         1                  heca       1                qwer       1
Mary        Poppins    1111         1                  acbe       1                aaaa       3
Wiley       Cayote     1234         2                  strf       2                aaaa       3
Daffy       Duck       1256         1                  acbe       1                pqrs       1
Bugs        Bunny      9999         1                  strf       2                yxwv       1
Pink        Panther    2222         1                  zzzz       1                zzaa       1
Michael     Archangel  0000         1                  rstu       1                aaaa       3
You can see that IDs 1234 and strf each have 2 in the count, and aaaa has 3. After generating this table, my goal is to pull out all records where any of the counts are greater than 1, like so:
All Records with One or More Duplicate IDs
first_name  last_name  employee_id  employee_id_count  driver_id  driver_id_count  school_id  school_id_count
----------  ---------  -----------  -----------------  ---------  ---------------  ---------  ---------------
Mickey      Mouse      1234         2                  abcd       1                wxyz       1
Mary        Poppins    1111         1                  acbe       1                aaaa       3
Wiley       Cayote     1234         2                  strf       2                aaaa       3
Bugs        Bunny      9999         1                  strf       2                yxwv       1
Michael     Archangel  0000         1                  rstu       1                aaaa       3
Real World Perspective
In my real-world work, the JOIN'd table contains 100 columns, 15 different ID fields, and over 30,000 records, and the final table came out 28 records larger than the original. This may seem like a small amount, but each of the 28 represents a broken link that we must fix.
Is there a simple way to get the counts populated like in the second table above? I have been wrestling with this for hours already, and have not been able to make this work. I tried some aggregate functions, but they cannot be used in table UPDATE operations.
The COUNT function, when used as an analytic function, can do what you want here, e.g.
WITH cte AS (
    SELECT *,
           COUNT(employee_id) OVER (PARTITION BY employee_id) AS employee_id_count,
           COUNT(driver_id)   OVER (PARTITION BY driver_id)   AS driver_id_count,
           COUNT(school_id)   OVER (PARTITION BY school_id)   AS school_id_count
    FROM yourTable
)
SELECT *
FROM cte
WHERE employee_id_count > 1
   OR driver_id_count > 1
   OR school_id_count > 1;
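If you also want to persist the counts into the temp table (aggregate functions cannot appear directly in an UPDATE's SET clause, which is likely what you ran into), here is a sketch using UPDATE ... FROM with a derived table, assuming the table is named yourTable and the *_count columns already exist; repeat analogously for the other ID columns:
UPDATE yourTable t
SET employee_id_count = c.cnt
FROM (
    -- pre-aggregate the counts, then join back on the ID
    SELECT employee_id, COUNT(*) AS cnt
    FROM yourTable
    GROUP BY employee_id
) c
WHERE c.employee_id = t.employee_id;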

how to concatenate or append current value with existing value in Datastage

I need to achieve the below requirement, i.e.
Input -- at the very first time:
Order value
1111 aaa
222 bbb
333 ccc
In the target (insert) I will have:
Order value
1111 aaa
222 bbb
333 ccc
Input -- at the second time:
Order value
1111 Aaa1
222 Bbb2
333 ccc
Output must be:
Order value
1111 aaa Aaa1
222 bbb Bbb2
And so on. I need to keep appending changed values for the corresponding key column, like this:
111 aaa aaa1 aaa2 aaa3
Please help.
You can follow these steps:
Use a CDC stage. By default it assigns the following change codes: 0 for copy/duplicate record, 1 for insert, 2 for delete, 3 for update.
Now take the link carrying the copy records and connect it to a transformer, where you declare a stage variable that is incremented by 1.
Next, in the field derivation, concatenate the existing value, the incoming value, and the stage-variable counter (in the Transformer the ':' operator concatenates strings).
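Not DataStage syntax, but for illustration, the same append-on-change logic sketched in SQL, with hypothetical orders_target and orders_input tables (the column names order_no and value are assumed):
UPDATE orders_target t
SET value = t.value || ' ' || i.value
FROM orders_input i                       -- hypothetical staging table with the new load
WHERE i.order_no = t.order_no
  AND position(i.value in t.value) = 0;   -- naive guard: only append a value not already present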

Case statement using different columns in Talend

I have a case statement as below
Case when col1 like '%other%' then 'No' else col5 end as col5
As in SQL, I need to implement the case statement with different columns and a wildcard check for the word 'other', all in Talend. How can this be done?
Your question is not clear without any screenshots or explanation.
I assume that you have some input component, like tOracleInput, with a row coming out of it and multiple columns in the schema. I would suggest using the tMap component to manipulate the contents of the schema, especially the Expression Builder.
P.S. I personally prefer tJavaFlex for column validations/manipulations; this way the code is more readable, but it is a more advanced technique.
If the columns are not dynamically created, you can add a case in tMap for example.
So, for a new boolean column you could create the expression:
row1.mycolumn1.toLowerCase().contains("other") ? Boolean.TRUE : Boolean.FALSE
The boolean field would hold the check value then.
EDIT
Since a new boolean column is not wanted here, your specific requirement would look like this:
row1.product_code.toLowerCase().contains("other") ? "No data" : row1.product_code
Please find the input as follows.
col1 col2 col3 col4 col5
---- ---- ---- ---- ----
aaaother aaa aaa aaa aaa
otherbbb bbb bbb bbb bbb
ccc ccc ccc ccc ccc
In the tMap, put the following statement in the Expression Builder against col5:
input.col1.contains("other") ? "No" : input.col5
Then the output will be as follows:
col1 col2 col3 col4 col5
---- ---- ---- ---- ----
aaaother aaa aaa aaa No
otherbbb bbb bbb bbb No
ccc ccc ccc ccc ccc
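For reference, the tMap expression mirrors the SQL case statement from the question; run against the sample input it would produce the same result (the table name input_table is assumed):
SELECT col1, col2, col3, col4,
       -- wildcard check equivalent to the tMap contains("other") expression
       CASE WHEN col1 LIKE '%other%' THEN 'No' ELSE col5 END AS col5
FROM input_table;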

Delete rows where data has less sum

I have the following table:
data_id sum_value
xxx 30
xxx 40
ccc 50
aaa 60
ccc 70
aaa 80
ddd 100
eee 200
How would I delete, for each data_id, every row whose sum_value is less than that data_id's largest sum_value, while keeping rows whose data_id is unique as they are?
Expected Output
data_id sum_value
xxx 40
ccc 70
aaa 80
ddd 100
eee 200
Thank you.
delete from foo
using (select min(sum_value) as min_sum, data_id
       from foo
       group by data_id
       having count(data_id) > 1
      ) t
where foo.sum_value = t.min_sum
  and foo.data_id = t.data_id;
Try this:
delete from the_table
where sum_value < (select max(sum_value)
                   from the_table t2
                   where t2.data_id = the_table.data_id);
select * from the_table;
Assuming you only want to keep the records with the largest sum (for every id), and delete the rest:
-- the data
CREATE TABLE ztable
( data_id CHAR(3) NOT NULL
, sum_value INTEGER NOT NULL
);
INSERT INTO ztable(data_id, sum_value) VALUES
('xxx',30)
,('xxx',40)
,('ccc',50)
,('aaa',60)
,('ccc',70)
,('aaa',80)
;
-- Delete the non-largest per id:
DELETE FROM ztable del -- del cannot be the record with the largest sum
WHERE EXISTS ( -- if a record exists
SELECT * FROM ztable x
WHERE x.data_id = del.data_id -- with the same ID
AND x.sum_value > del.sum_value -- ... but a larger sum
);
Result:
CREATE TABLE
INSERT 0 6
DELETE 3
data_id | sum_value
---------+-----------
xxx | 40
ccc | 70
aaa | 80
(3 rows)
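For the same setup, a sketch of an alternative delete using a window function (assuming PostgreSQL 8.4 or later), which likewise keeps only the largest sum_value per data_id:
DELETE FROM ztable
WHERE (data_id, sum_value) IN (
    SELECT data_id, sum_value
    FROM (
        SELECT data_id, sum_value,
               MAX(sum_value) OVER (PARTITION BY data_id) AS max_sum
        FROM ztable
    ) s
    WHERE sum_value < max_sum  -- everything below the per-id maximum
);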
You can do this:
with src as (
SELECT DISTINCT on (data_id)
data_id as di, sum_value as sv
FROM Table1
ORDER BY data_id, sum_value DESC
)
DELETE FROM table1
WHERE ( data_id, sum_value) NOT IN (SELECT di, sv FROM src);
to only leave the row with the highest sum_value, or this:
with src as (
SELECT DISTINCT on (data_id)
data_id as di, sum_value as sv
FROM Table1
ORDER BY data_id, sum_value ASC
)
DELETE FROM table1
WHERE ( data_id, sum_value) IN (SELECT di, sv FROM src);
to only remove the row with the smallest sum_value.
For the supplied data both will do the same. You didn't mention what you expect when more than two rows exist with the same data_id.
see the fiddle: http://sqlfiddle.com/#!15/92a04/1
Hmm, sqlFiddle behaves strangely when you modify data in the right pane. When running them both, it shows that 0 rows were affected by the first one, but then the second one shows only 3 rows left. However, if you run them separately, it shows that no rows were deleted and all rows still exist. I guess it returns the table in the state defined in the left pane after each run... It has to run it in a transaction. When I added a COMMIT; statement it threw an error:
Explicit commits are not allowed within the query panel.
Try these queries on your local DB - this is how it looks on my local Postgres 9.4:
testdb=# select * from Table1;
data_id | sum_value
---------+-----------
xxx | 30
xxx | 40
ccc | 50
aaa | 60
ccc | 70
aaa | 80
(6 rows)
testdb=# with src as (
testdb(# SELECT DISTINCT on (data_id)
testdb(# data_id as di, sum_value as sv
testdb(# FROM Table1
testdb(# ORDER BY data_id, sum_value ASC
testdb(# )
testdb-# DELETE FROM table1
testdb-# WHERE ( data_id, sum_value) IN (SELECT di, sv FROM src);
DELETE 3
testdb=# select * from Table1;
data_id | sum_value
---------+-----------
xxx | 40
ccc | 70
aaa | 80
(3 rows)
After re-initializing the table:
testdb=# select * from Table1;
data_id | sum_value
---------+-----------
xxx | 30
xxx | 40
ccc | 50
aaa | 60
ccc | 70
aaa | 80
ddd | 100
eee | 200
(8 rows)
testdb=# with src as (
testdb(# SELECT DISTINCT on (data_id)
testdb(# data_id as di, sum_value as sv
testdb(# FROM Table1
testdb(# ORDER BY data_id, sum_value DESC
testdb(# )
testdb-# DELETE FROM table1
testdb-# WHERE ( data_id, sum_value) NOT IN (SELECT di, sv FROM src);
DELETE 3
testdb=# SELECT * from table1;
data_id | sum_value
---------+-----------
xxx | 40
ccc | 70
aaa | 80
ddd | 100
eee | 200
(5 rows)
works for me...
Try this:
delete from foo
using foo as vtable
where foo.data_id = vtable.data_id
  and foo.sum_value < vtable.sum_value;

TSQL advanced ranking, grouping to find date spans

I need to do some advanced grouping in TSQL with data that looks like this:
PK YEARMO DATA
1 201201 AAA
1 201202 AAA
1 201203 AAA
1 201204 AAA
1 201205 (null)
1 201206 BBB
1 201207 AAA
2 201301 CCC
2 201302 CCC
2 201303 CCC
2 201304 DDD
2 201305 DDD
Then, every time DATA changes within a primary key, I need to pull the date range for that value, so that it looks something like this:
PK START_DT STOP_DT DATA
1 201201 201204 AAA
1 201205 201205 (null)
1 201206 201206 BBB
1 201207 201207 AAA
2 201301 201303 CCC
2 201304 201305 DDD
I've been playing around with ranking functions but haven't had much success. Any pointers in the right direction would be supremely awesome and appreciated.
You can use the row_number() function to partition your data into ranges:
SELECT
    PK,
    START_DT = MIN(YEARMO),
    STOP_DT  = MAX(YEARMO),
    DATA
FROM (
    SELECT
        PK, DATA, YEARMO,
        -- consecutive rows with the same DATA (per PK) end up with the same difference
        ROW_NUMBER() OVER (PARTITION BY PK ORDER BY YEARMO) -
        ROW_NUMBER() OVER (PARTITION BY PK, DATA ORDER BY YEARMO) AS grp
    FROM your_table
) A
GROUP BY PK, DATA, grp
ORDER BY PK, MIN(YEARMO)
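A minimal setup to try the query, using the sample rows from the question (assuming SQL Server 2008 or later for the multi-row VALUES list):
CREATE TABLE your_table (PK int, YEARMO int, DATA varchar(10));
INSERT INTO your_table (PK, YEARMO, DATA) VALUES
(1, 201201, 'AAA'), (1, 201202, 'AAA'), (1, 201203, 'AAA'), (1, 201204, 'AAA'),
(1, 201205, NULL),  (1, 201206, 'BBB'), (1, 201207, 'AAA'),
(2, 201301, 'CCC'), (2, 201302, 'CCC'), (2, 201303, 'CCC'),
(2, 201304, 'DDD'), (2, 201305, 'DDD');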