DB2 - update increment based on timestamp

After a complex operation (some database merging), I have a table that needs to be updated based on timestamp.
JobsTable
Id Time_stamp Resource RunNumber
121 1 A 1
122 2 A 1
123 3 B 1
124 4 B 1
125 5 A 2
The goal is to update the RunNumber column incrementally for each resource, in timestamp order. So in the end the expected result is:
Id Time_stamp Resource RunNumber
121 1 A 1
122 2 A 2 //changed
123 3 B 1
124 4 B 2 //changed
125 5 A 3 //changed
I tried doing this in multiple ways. Since a DB2 UPDATE does not support JOIN or WITH clauses, I tried something like:
update JOBSTABLE JT
SET RunNumber =
(SELECT RunNumber
FROM (Select ID, ROW_NUMBER() OVER (ORDER BY TIME_STAMP ) RunNumber from JobsTable, ORDER BY TIME_STAMP) AS AAA
WHERE AAA.ID = JT.ID)
WHERE ID = ?
Error:
Assignment of a NULL value to a NOT NULL column "TBSPACEID=6, TABLEID=16, COLNO=2" is not allowed.. SQLCODE=-407, SQLSTATE=23502, DRIVER=3.64.82 SQL Code: -407, SQL State: 23502
Is this even possible? (I am aiming to do this operation in a single query rather than using cursors, etc.)
Thank you

Firstly, your subselect has a syntax error (the stray comma before ORDER BY), which tells me it's not the exact statement you are trying to run. The error message is clear: in your actual statement the subselect sometimes returns NULL, because a scalar subselect that finds no matching row evaluates to NULL.
Secondly, you should probably number rows within a partition by resource, so that each resource gets its own sequence.
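Thirdly, ROW_NUMBER() has to see all rows of a resource before you pick out the row being updated. If the id = JT.ID filter sits in the same subselect as the window function, only that one row is numbered and the result is always 1. Based on the statement you published:
update JOBSTABLE JT
SET RunNumber =
(SELECT AAA.RN
from (select ID, ROW_NUMBER() OVER (partition by resource ORDER BY TIME_STAMP) AS RN
from JobsTable) AS AAA
where AAA.ID = JT.ID)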
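A minimal sketch of the "number first, then pick the row" pattern, using Python's built-in sqlite3 module as a stand-in for DB2 (assumption: SQLite 3.25 or newer for window functions; DB2 itself is not exercised here). It also shows that when the Id filter is applied in the same SELECT as ROW_NUMBER(), only one row is numbered and the value is 1.

```python
import sqlite3

# SQLite stands in for DB2 here; the table and rows mirror the question's JobsTable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE JobsTable (Id INTEGER PRIMARY KEY, Time_stamp INTEGER NOT NULL,"
    " Resource TEXT NOT NULL, RunNumber INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO JobsTable VALUES (?, ?, ?, ?)",
    [(121, 1, "A", 1), (122, 2, "A", 1), (123, 3, "B", 1), (124, 4, "B", 1), (125, 5, "A", 2)],
)

# Filtering first: ROW_NUMBER() only ever sees the single matching row.
single_row = conn.execute(
    "SELECT ROW_NUMBER() OVER (PARTITION BY Resource ORDER BY Time_stamp)"
    " FROM JobsTable WHERE Id = 125"
).fetchone()[0]

# Numbering first: every row of each Resource is numbered, then the updated row is picked.
conn.execute(
    """
    UPDATE JobsTable
    SET RunNumber = (
        SELECT n.rn
        FROM (SELECT Id,
                     ROW_NUMBER() OVER (PARTITION BY Resource ORDER BY Time_stamp) AS rn
              FROM JobsTable) AS n
        WHERE n.Id = JobsTable.Id
    )
    """
)
result = conn.execute("SELECT Id, RunNumber FROM JobsTable ORDER BY Id").fetchall()
print(single_row)  # 1
print(result)      # [(121, 1), (122, 2), (123, 1), (124, 2), (125, 3)]
```

Because the inner derived table covers every row, each Id finds a match and the NOT NULL column never receives NULL.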

Related

PostgreSQL Query not returning the proper results

This is my table structure:
learning_paths: id, name, version, created_at, updated_at
learning_path_levels: id, name, learning_path_id, order, created_at, updated_at
learning_path_level_nodes: id, name, description, documentation_links, evaluation_methodology, learning_path_level_id, created_at, updated_at
learning_path_node_users: id, learning_path_level_node_id, user_id, evaluated_by, evaluated_at, is_successful, created_at, updated_at
I'm writing a query to retrieve the learning path name, the number of levels each learning path has, the pending and completed nodes per level for the user, and the total number of nodes per level.
I have the following query:
select learning_paths."name",
sum(case when learning_path_node_users.is_successful and learning_path_node_users.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when learning_path_node_users.is_successful = false or learning_path_node_users.user_id is null then 1 else 0 end) as pending_nodes,
count(learning_path_levels.id) as total_levels,
count(*) as total_nodes
from learning_path_level_nodes
inner join learning_path_levels on learning_path_levels.id = learning_path_level_nodes.learning_path_level_id
inner join learning_paths on learning_paths.id = learning_path_levels.learning_path_id
left join learning_path_node_users on learning_path_node_users.learning_path_level_node_id = learning_path_level_nodes.id
group by learning_paths."name"
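which returns:
name            | completed_nodes | pending_nodes | total_levels | total_nodes
----------------+-----------------+---------------+--------------+------------
Devops          | 5               | 3             | 8            | 8
QA              | 0               | 1             | 1            | 1
Project manager | 3               | 3             | 6            | 6
AI              | 0               | 5             | 5            | 5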
Everything is correct except for the level count. For example, for Devops it should be 2, but it returns 8, and for Project manager it should be 2, but it returns 6.
A pattern I see is that it returns the number of nodes as the number of levels.
How can I fix this?
I'd really appreciate any help or suggestions, as I've been struggling with this.
Thanks in advance
EDIT: As per your suggestion, I'm attaching a fiddle with the tables and data.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f29676ff7051686a28de96928db1e3a6
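Although I don't get exactly the results you show, I think you want to add DISTINCT to your count for the total levels. The joins produce one row per node (or per node and user), so count(lpl.id) counts each level once for every node it contains: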
select
lp.name,
sum(case when u.is_successful and u.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when u.is_successful = false or u.user_id is null then 1 else 0 end) as pending_nodes,
count(distinct lpl.id) as total_levels, -- added "distinct"
array_agg (lpl.id) as level_detail, -- debugging aid
count(*) as total_nodes
from
learning_path_level_nodes n
join learning_path_levels lpl on lpl.id = n.learning_path_level_id
join learning_paths lp on lp.id = lpl.learning_path_id
left join learning_path_node_users u on u.learning_path_level_node_id = n.id
group by
lp.name
To help expose the rationale, I added the field level_detail, which lists the level of every joined row and so shows why the results are what they are. You can remove it once the results are what you want.
If it's not what you expect, perhaps you can explain, or show by example, what I might be missing.
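A small sketch of why DISTINCT matters, run with Python's sqlite3 module on a trimmed-down, hypothetical version of the schema (only the id and foreign-key columns; the data is invented, not taken from the fiddle). Devops has 2 levels holding 3 + 2 nodes, so the plain count returns the node count while the distinct count returns the level count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE learning_paths (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE learning_path_levels (id INTEGER PRIMARY KEY, learning_path_id INTEGER);
    CREATE TABLE learning_path_level_nodes (id INTEGER PRIMARY KEY, learning_path_level_id INTEGER);

    -- Hypothetical data: Devops has 2 levels holding 3 + 2 nodes; QA has 1 level with 1 node.
    INSERT INTO learning_paths VALUES (1, 'Devops'), (2, 'QA');
    INSERT INTO learning_path_levels VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO learning_path_level_nodes VALUES
        (100, 10), (101, 10), (102, 10), (103, 11), (104, 11), (200, 20);
    """
)
rows = conn.execute(
    """
    SELECT lp.name,
           COUNT(lpl.id)          AS levels_per_node_row,  -- one hit per joined node row
           COUNT(DISTINCT lpl.id) AS total_levels,         -- each level counted once
           COUNT(*)               AS total_nodes
    FROM learning_path_level_nodes n
    JOIN learning_path_levels lpl ON lpl.id = n.learning_path_level_id
    JOIN learning_paths lp ON lp.id = lpl.learning_path_id
    GROUP BY lp.name
    ORDER BY lp.name
    """
).fetchall()
print(rows)  # [('Devops', 5, 2, 5), ('QA', 1, 1, 1)]
```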

How to get the SQL output below

I am trying to select from a table and return rows based on the values of a column. Below are the data and the desired output. For each employee, the non-NULL EmpRecord values should be returned; if an employee has only NULL, a single NULL row should be returned.
Data Table
EmployeeNo EmpRecord
1 A
1 NULL
2 a
3 NULL
4 NULL
4 A
4 aa
Output
EmployeeNo EmpRecord
1 A
2 a
3 NULL
4 A
4 aa
Any advice on how to go about it would be great.
Regards,
Sid
The first half of the UNION query below simply strips off records whose EmpRecord is NULL. This almost gets the job done, except that employees who have only one or more NULL records would be removed from the result set as well. So the second part of the UNION adds these employees back as a single record with their employee number and a NULL placeholder for the record.
SELECT t1.EmployeeNo,
t1.EmpRecord
FROM yourTable t1
WHERE t1.EmpRecord IS NOT NULL
UNION ALL
SELECT t2.EmployeeNo,
NULL AS EmpRecord
FROM yourTable t2
GROUP BY t2.EmployeeNo
HAVING SUM(CASE WHEN t2.EmpRecord IS NULL THEN 1 ELSE 0 END) = COUNT(*)
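A quick check of the UNION ALL approach against the sample data from the question, using Python's sqlite3 module (SQLite rather than the asker's unspecified database; yourTable is the answer's placeholder name). An ORDER BY is added only so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (EmployeeNo INTEGER, EmpRecord TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, "A"), (1, None), (2, "a"), (3, None), (4, None), (4, "A"), (4, "aa")],
)
rows = conn.execute(
    """
    SELECT t1.EmployeeNo, t1.EmpRecord
    FROM yourTable t1
    WHERE t1.EmpRecord IS NOT NULL          -- keep every non-NULL record
    UNION ALL
    SELECT t2.EmployeeNo, NULL AS EmpRecord
    FROM yourTable t2
    GROUP BY t2.EmployeeNo
    HAVING SUM(CASE WHEN t2.EmpRecord IS NULL THEN 1 ELSE 0 END) = COUNT(*)  -- only NULLs
    ORDER BY 1, 2
    """
).fetchall()
print(rows)  # [(1, 'A'), (2, 'a'), (3, None), (4, 'A'), (4, 'aa')]
```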

PostgreSQL: set a column with the ordinal of the row sorted via another field

I have a table segnature describing an item with a varchar field deno and a numeric field ord. A foreign key fk_collection tells which collection the row is part of.
I want to update field ord so that it contains the ordinal of each row within its collection, sorted by field deno.
For example, if I have something like:
deno | ord | fk_collection
abc  |     | 10
aab  |     | 10
bcd  |     | 10
zxc  |     | 20
vbn  |     | 20
Then I want a result like:
deno | ord | fk_collection
abc  | 1   | 10
aab  | 0   | 10
bcd  | 2   | 10
zxc  | 1   | 20
vbn  | 0   | 20
I tried something like:
update segnature s1 set ord = (select count(*)
from segnature s2
where s1.fk_collection=s2.fk_collection and s2.deno<s1.deno
)
but the query is really slow: updating about 30,000 items spread over 150 collections takes about 10 minutes.
Any suggestion to speed up the process?
Thank you!
You can use a window function to generate the "ordinal" number:
with numbered as (
select deno, fk_collection,
row_number() over (partition by fk_collection order by deno) as rn,
ctid as id
from segnature
)
update segnature
set ord = n.rn
from numbered n
where n.id = segnature.ctid;
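This uses the internal column ctid to uniquely identify each row. The ctid comparison is quite slow, so if you have a real primary (or unique) key in that table, use that column instead. Note that row_number() starts at 1; to get the 0-based ordinals shown in the question, use row_number() over (partition by fk_collection order by deno) - 1.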
Alternatively without the common table expression:
update segnature
set ord = n.rn
from (
select deno, fk_collection,
row_number() over (partition by fk_collection order by deno) as rn,
ctid as id
from segnature
) as n
where n.id = segnature.ctid;
SQLFiddle example: http://sqlfiddle.com/#!15/e997f/1
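A self-contained sketch of the same window-function update, run with Python's sqlite3 module instead of PostgreSQL (assumptions: SQLite 3.25+ for window functions; SQLite's rowid stands in for PostgreSQL's ctid as the row identifier; a correlated subselect replaces UPDATE ... FROM so older SQLite versions also work). It subtracts 1 so the ordinals match the 0-based example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segnature (deno TEXT, ord INTEGER, fk_collection INTEGER)")
conn.executemany(
    "INSERT INTO segnature (deno, fk_collection) VALUES (?, ?)",
    [("abc", 10), ("aab", 10), ("bcd", 10), ("zxc", 20), ("vbn", 20)],
)
# rowid plays the role of PostgreSQL's ctid: it identifies each row to update.
conn.execute(
    """
    UPDATE segnature
    SET ord = (
        SELECT n.rn
        FROM (SELECT rowid AS id,
                     row_number() OVER (PARTITION BY fk_collection ORDER BY deno) - 1 AS rn
              FROM segnature) AS n
        WHERE n.id = segnature.rowid
    )
    """
)
rows = conn.execute("SELECT deno, ord, fk_collection FROM segnature ORDER BY rowid").fetchall()
print(rows)  # [('abc', 1, 10), ('aab', 0, 10), ('bcd', 2, 10), ('zxc', 1, 20), ('vbn', 0, 20)]
```

The window function numbers every collection in a single sort, instead of the original approach of counting smaller deno values once per row.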

Restricting duplicate results in grouped result set without using distinct

I am attempting to create a query that returns a list of specific entity records without returning any duplicated entries from the entityID field. The query cannot use DISTINCT because the list is being passed to a reporting engine that doesn't understand result sets containing more than the entityID, and DISTINCT requires all the ORDER BY fields to be returned.
The result set cannot contain duplicate entityIDs because the reporting engine also cannot process a report for the same entity twice in the same run. I have found out the hard way that temporary tables aren't supported either.
The entries need to be sorted in the query because the report engine only allows sorting on the entity_header level, and I need to sort based on the report.status. Thankfully the report engine honors the order in which you return the results.
The tables are as follows:
entity_header
=================================================
entityID(pk) Location active name
1 LOCATION1 0 name1
2 LOCATION1 0 name2
3 LOCATION2 0 name3
4 LOCATION3 0 name4
5 LOCATION2 1 name5
6 LOCATION2 0 name6
report
========================================================
startdate entityID(fk) status reportID(pk)
03-10-2013 1 running 1
03-12-2013 2 running 2
03-10-2013 1 stopped 3
03-10-2013 3 stopped 4
03-12-2013 4 running 5
03-10-2013 5 stopped 6
03-12-2013 6 running 7
Here is the query I've got so far, and it is almost what I need:
SELECT eh.entityID
FROM entity_header eh
INNER JOIN report r on r.entityID = eh.entityID
WHERE r.startdate between getdate()-7.5 and getdate()
AND eh.active = 0
AND eh.location in ('LOCATION1','LOCATION2')
AND r.status is not null
AND eh.name is not null
GROUP BY eh.entityID, r.status, eh.name
ORDER BY r.status, eh.name;
I would appreciate any advice this community can offer. I will do my best to provide any additional information required.
Here is a working sample; it runs on MS SQL Server only.
I use rank(), partitioned by entityID and ordered by r.status, eh.name, to number the rows of each entityID; the result is aliased as list.
The row with list = 1 is the first row of each entityID in the desired sort order.
Filtering with where a.list = 1 therefore keeps one row per entityID.
ORDER BY a.ut, a.en then sorts the results; ut and en are the aliases of r.status and eh.name.
SELECT a.entityID FROM (
SELECT distinct TOP (100) PERCENT eh.entityID,
rank() over(PARTITION BY eh.entityID ORDER BY r.status, eh.name) as list,
r.status ut, eh.name en
FROM report AS r INNER JOIN entity_header as eh ON r.entityID = eh.entityID
WHERE (r.startdate BETWEEN GETDATE() - 7.5 AND GETDATE()) AND (eh.active = 0)
AND (eh.location IN ('LOCATION1', 'LOCATION2'))
ORDER BY r.status, eh.name
) AS a
where a.list = 1
ORDER BY a.ut, a.en
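A sketch of the same rank-and-filter idea on the question's sample rows, run with Python's sqlite3 module rather than SQL Server. Assumptions: the GETDATE() date-window predicate is omitted (the sample dates are fixed in 2013, so it would filter out everything), TOP (100) PERCENT is dropped because SQLite has no TOP, and dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity_header (entityID INTEGER PRIMARY KEY, location TEXT, active INTEGER, name TEXT);
    CREATE TABLE report (startdate TEXT, entityID INTEGER, status TEXT, reportID INTEGER PRIMARY KEY);
    INSERT INTO entity_header VALUES
        (1, 'LOCATION1', 0, 'name1'), (2, 'LOCATION1', 0, 'name2'), (3, 'LOCATION2', 0, 'name3'),
        (4, 'LOCATION3', 0, 'name4'), (5, 'LOCATION2', 1, 'name5'), (6, 'LOCATION2', 0, 'name6');
    INSERT INTO report VALUES
        ('2013-03-10', 1, 'running', 1), ('2013-03-12', 2, 'running', 2),
        ('2013-03-10', 1, 'stopped', 3), ('2013-03-10', 3, 'stopped', 4),
        ('2013-03-12', 4, 'running', 5), ('2013-03-10', 5, 'stopped', 6),
        ('2013-03-12', 6, 'running', 7);
    """
)
ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.entityID
        FROM (SELECT DISTINCT eh.entityID,
                     rank() OVER (PARTITION BY eh.entityID ORDER BY r.status, eh.name) AS list,
                     r.status AS ut, eh.name AS en
              FROM report AS r
              JOIN entity_header AS eh ON r.entityID = eh.entityID
              WHERE eh.active = 0 AND eh.location IN ('LOCATION1', 'LOCATION2')) AS a
        WHERE a.list = 1            -- first row per entityID in (status, name) order
        ORDER BY a.ut, a.en
        """
    )
]
print(ids)  # [1, 2, 6, 3]
```

Entity 1 has both a running and a stopped report; only its running row gets list = 1, so it appears once, and only the entityID column is returned.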

How to rank in postgres query

I'm trying to rank a subset of data within a table, but I think I am doing something wrong. I cannot find much information about the rank() feature for postgres; maybe I'm looking in the wrong place. Either way:
I'd like to know the rank of an id that falls within a cluster of a table based on a date. My query is as follows:
select cluster_id,feed_id,pub_date,rank
from (select feed_id,pub_date,cluster_id,rank()
over (order by pub_date asc) from url_info)
as bar where cluster_id = 9876 and feed_id = 1234;
I'm modeling this after the following Stack Overflow post: postgres rank
The reason I think I am doing something wrong is that only 39 rows in url_info are in cluster_id 9876, yet this query ran for 10 minutes and never came back. (I actually re-ran it for quite a while and it returned no results, yet there is a row in cluster 9876 for id 1234.) I'm expecting this to tell me something like "id 1234 was 5th for the criteria given". It will return a relative rank according to my query constraints, correct?
This is postgres 8.4 btw.
By placing the rank() function in the subselect without a PARTITION BY in the over clause or any predicate in that subselect, your query asks for a rank over the entire url_info table ordered by pub_date. This is likely why it ran so long: to rank over all of url_info, Pg must sort the entire table by pub_date, which takes a while if the table is very large.
It appears you want a rank for just the set of records selected by the where clause, in which case all you need to do is eliminate the subselect; the rank function is then computed over only the records matching that predicate.
select
cluster_id
,feed_id
,pub_date
,rank() over (order by pub_date asc) as rank
from url_info
where cluster_id = 9876 and feed_id = 1234;
If what you really wanted was the rank within the cluster, regardless of the feed_id, you can rank in a subselect which filters to that cluster:
select ranked.*
from (
select
cluster_id
,feed_id
,pub_date
,rank() over (order by pub_date asc) as rank
from url_info
where cluster_id = 9876
) as ranked
where feed_id = 1234;
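A small comparison of the three query shapes (rank the whole table then filter, filter then rank, rank within the cluster then filter), run with Python's sqlite3 module on hypothetical rows rather than PostgreSQL 8.4. The alias rnk is used instead of rank purely as an illustrative choice.

```python
import sqlite3

# Hypothetical rows: two cluster-9876 rows for feed 1234, plus an older row in another cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_info (cluster_id INTEGER, feed_id INTEGER, pub_date TEXT)")
conn.executemany(
    "INSERT INTO url_info VALUES (?, ?, ?)",
    [
        (5555, 1234, "2009-12-31"),
        (9876, 1111, "2010-01-01"),
        (9876, 1234, "2010-01-02"),
        (9876, 2222, "2010-01-03"),
        (9876, 1234, "2010-01-04"),
    ],
)


def ranks(sql):
    """Return the rnk column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]


# Question's shape: rank the whole table, then filter.
whole_table = ranks(
    "SELECT rnk FROM (SELECT cluster_id, feed_id, pub_date,"
    " rank() OVER (ORDER BY pub_date) AS rnk FROM url_info) AS bar"
    " WHERE cluster_id = 9876 AND feed_id = 1234 ORDER BY pub_date"
)
# First answer query: filter, then rank only the matching rows.
filtered = ranks(
    "SELECT rank() OVER (ORDER BY pub_date) AS rnk FROM url_info"
    " WHERE cluster_id = 9876 AND feed_id = 1234 ORDER BY pub_date"
)
# Second answer query: rank within the cluster, then keep feed 1234.
within_cluster = ranks(
    "SELECT rnk FROM (SELECT feed_id, pub_date,"
    " rank() OVER (ORDER BY pub_date) AS rnk FROM url_info WHERE cluster_id = 9876) AS ranked"
    " WHERE feed_id = 1234 ORDER BY pub_date"
)
print(whole_table, filtered, within_cluster)  # [3, 5] [1, 2] [2, 4]
```

The same two feed-1234 rows get different ranks depending on which rows the window function is allowed to see.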
Here is another example, using DENSE_RANK() in PostgreSQL.
A sample query to find the top 3 students.
Reference taken from this blog:
Create a table with sample data:
CREATE TABLE tbl_Students
(
StudID INT
,StudName CHARACTER VARYING
,TotalMark INT
);
INSERT INTO tbl_Students
VALUES
(1,'Anvesh',88),(2,'Neevan',78)
,(3,'Roy',90),(4,'Mahi',88)
,(5,'Maria',81),(6,'Jenny',90);
Calculate the rank of students using DENSE_RANK():
;WITH cteStud AS
(
SELECT
StudName
,Totalmark
,DENSE_RANK() OVER (ORDER BY TotalMark DESC) AS StudRank
FROM tbl_Students
)
SELECT
StudName
,Totalmark
,StudRank
FROM cteStud
WHERE StudRank <= 3;
The Result:
studname | totalmark | studrank
----------+-----------+----------
Roy | 90 | 1
Jenny | 90 | 1
Anvesh | 88 | 2
Mahi | 88 | 2
Maria | 81 | 3
(5 rows)
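To show why DENSE_RANK() is the right choice for "top 3 marks", here is a sketch that computes both DENSE_RANK() and RANK() over the same student rows, using Python's sqlite3 module (SQLite 3.25+ assumed) rather than PostgreSQL. With ties at 90 and 88, RANK() skips numbers, so filtering on rank <= 3 drops Maria.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Students (StudID INTEGER, StudName TEXT, TotalMark INTEGER)")
conn.executemany(
    "INSERT INTO tbl_Students VALUES (?, ?, ?)",
    [(1, "Anvesh", 88), (2, "Neevan", 78), (3, "Roy", 90),
     (4, "Mahi", 88), (5, "Maria", 81), (6, "Jenny", 90)],
)
query = """
    WITH cteStud AS (
        SELECT StudName, TotalMark,
               DENSE_RANK() OVER (ORDER BY TotalMark DESC) AS dense_rnk,  -- 1, 1, 2, 2, 3, 4
               RANK()       OVER (ORDER BY TotalMark DESC) AS rnk         -- 1, 1, 3, 3, 5, 6
        FROM tbl_Students
    )
    SELECT StudName, TotalMark, dense_rnk, rnk
    FROM cteStud
    ORDER BY TotalMark DESC, StudName
"""
rows = conn.execute(query).fetchall()
top3_dense = [name for name, _, dense_rnk, _ in rows if dense_rnk <= 3]
top3_rank = [name for name, _, _, rnk in rows if rnk <= 3]
print(top3_dense)  # ['Jenny', 'Roy', 'Anvesh', 'Mahi', 'Maria']
print(top3_rank)   # ['Jenny', 'Roy', 'Anvesh', 'Mahi']
```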