SQL: Data Cleaning - data-cleaning

I am facing a problem that I do not know how to categorize, so pardon the generic title. I have a dataset like:
Table1: Column1, Column2, Column3.
According to my business logic, each 'Column1, Column2' pair can have only one unique value in Column3. So the table below is problematic because of the second entry:
Table1
Column1  Column2  Column3
A1       B1       R
A1       B1       O    << ERROR! Only one Column3 value is accepted for the A1-B1 pair
A2       B2       R
A2       B3       J
A3       B3       K
A4       B5       K
From the table above, I would like to find the problematic entries:
A1       B1       R
A1       B1       O
Thanks in advance for your help!

Using your example column names, you can run the following query to see just the Column1/Column2 pairs that have more than one distinct value in Column3.
SELECT Column1, Column2, COUNT(DISTINCT Column3) AS Column3Count
FROM Table1
GROUP BY Column1, Column2
HAVING COUNT(DISTINCT Column3) > 1
You can omit the HAVING line to see the complete list of Column1/Column2 pairs.
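The query above lists the offending pairs, while the question asks for the offending rows themselves. One way to get them is to join the grouped result back to the table. The following is a minimal sketch, assuming SQLite accessed through Python's sqlite3 module and the sample data from the question; the subquery alias bad is an illustrative name.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("A1", "B1", "R"),
        ("A1", "B1", "O"),
        ("A2", "B2", "R"),
        ("A2", "B3", "J"),
        ("A3", "B3", "K"),
        ("A4", "B5", "K"),
    ],
)

# Find the pairs with more than one distinct Column3 value,
# then join back to return every row belonging to those pairs.
query = """
SELECT t.Column1, t.Column2, t.Column3
FROM Table1 AS t
JOIN (
    SELECT Column1, Column2
    FROM Table1
    GROUP BY Column1, Column2
    HAVING COUNT(DISTINCT Column3) > 1
) AS bad
  ON bad.Column1 = t.Column1 AND bad.Column2 = t.Column2
ORDER BY t.Column1, t.Column2, t.Column3
"""
problem_rows = conn.execute(query).fetchall()
for row in problem_rows:
    print(row)
```

With the sample data this returns only the two A1/B1 rows.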

Related

DB2: SQL to return all rows in a group having a particular value of a column in two latest records of this group

I have a DB2 table in which column A holds either the value PQR or XYZ.
I need the output to contain the groups (by column B) whose two latest records, ordered by the date in column C, both have A = PQR.
Sample Table
A B C
--- ----- ----------
PQR Mark 08/08/2019
PQR Mark 08/01/2019
XYZ Mark 07/01/2019
PQR Joe 10/11/2019
XYZ Joe 10/01/2019
PQR Craig 06/06/2019
PQR Craig 06/20/2019
In this sample table, my output would be the Mark and Craig records.

Since Db2 11.1
You can use the NTH_VALUE OLAP function.
Refer to the OLAP specification in the Db2 documentation.
SELECT A, B, C
FROM
(
SELECT
A, B, C
, NTH_VALUE (A, 1) OVER (PARTITION BY B ORDER BY C DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) C1
, NTH_VALUE (A, 2) OVER (PARTITION BY B ORDER BY C DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) C2
FROM TAB
)
WHERE C1 = 'PQR' AND C2 = 'PQR'
Older versions
SELECT T.*
FROM TAB T
JOIN
(
SELECT B
FROM
(
SELECT
A, B
, ROWNUMBER() OVER (PARTITION BY B ORDER BY C DESC) RN
FROM TAB
)
WHERE RN IN (1, 2)
GROUP BY B
HAVING MIN(A) = MAX(A) AND COUNT(1) = 2 AND MIN(A) = 'PQR'
) G ON G.B = T.B;
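To check the row-numbering approach against the sample data, here is a minimal sketch, assuming SQLite 3.25 or later (which supports window functions) through Python's sqlite3 module rather than Db2. The dates are rewritten in ISO format so that text ordering matches date ordering, and ROW_NUMBER() replaces the Db2 spelling ROWNUMBER(); both are adaptations for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TAB (A TEXT, B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO TAB VALUES (?, ?, ?)",
    [
        ("PQR", "Mark", "2019-08-08"),
        ("PQR", "Mark", "2019-08-01"),
        ("XYZ", "Mark", "2019-07-01"),
        ("PQR", "Joe", "2019-10-11"),
        ("XYZ", "Joe", "2019-10-01"),
        ("PQR", "Craig", "2019-06-06"),
        ("PQR", "Craig", "2019-06-20"),
    ],
)

# Number the rows of each group from newest to oldest, keep the two newest,
# and accept a group only if both of those rows have A = 'PQR'.
query = """
SELECT T.A, T.B, T.C
FROM TAB AS T
JOIN (
    SELECT B
    FROM (
        SELECT A, B,
               ROW_NUMBER() OVER (PARTITION BY B ORDER BY C DESC) AS RN
        FROM TAB
    ) AS ranked
    WHERE RN <= 2
    GROUP BY B
    HAVING COUNT(*) = 2 AND MIN(A) = 'PQR' AND MAX(A) = 'PQR'
) AS G ON G.B = T.B
ORDER BY T.B, T.C DESC
"""
matching_rows = conn.execute(query).fetchall()
for row in matching_rows:
    print(row)
```

All rows of the Mark and Craig groups come back, including Mark's older XYZ row, while Joe is excluded because his second-latest record is XYZ.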

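A simpler query is shown below, but note that it only returns the two most recent PQR rows across the whole table and does not check each group of B separately: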
SELECT A,B,C
FROM tab
WHERE A = 'PQR'
ORDER BY C DESC FETCH FIRST 2 ROWS only

Selecting records in SQL Server

I have a table with two columns and this sample data:
Column1 Column2
------------------------
A B
A C
A D
R B
R D
S E
If I pass the input value Column2='B', it should display these records:
Column1 Column2
-------------------------
A B
A C
A D
R B
R D
because Column2 'B' appears with both 'A' and 'R' in Column1. So it fetches
all the records that contain 'A' or 'R' in Column1.
If I pass the input Column2='C', it should display these records:
Column1 Column2
-------------------------
A B
A C
A D
because Column2 'C' appears only with 'A' in Column1. So it fetches all the records that contain 'A' in Column1.
The original table contains more than 100k records, and any value may be given as the input for Column2.
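One way to express the rule described in the question is a subquery that collects the Column1 values paired with the input, followed by an IN filter. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of SQL Server; the table name Pairs and the helper rows_sharing_column1 are hypothetical names, since the question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Pairs" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE Pairs (Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO Pairs VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("A", "D"), ("R", "B"), ("R", "D"), ("S", "E")],
)


def rows_sharing_column1(connection, column2_value):
    """Return every row whose Column1 is paired with the given Column2 value."""
    query = """
    SELECT Column1, Column2
    FROM Pairs
    WHERE Column1 IN (SELECT Column1 FROM Pairs WHERE Column2 = ?)
    ORDER BY Column1, Column2
    """
    return connection.execute(query, (column2_value,)).fetchall()


rows_for_b = rows_sharing_column1(conn, "B")
rows_for_c = rows_sharing_column1(conn, "C")
print(rows_for_b)
print(rows_for_c)
```

On a table with more than 100k rows, indexes on Column2 and Column1 would likely help both the subquery and the outer filter, though that depends on the data and the engine.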

How can I query the latest data from 3 tables

I have 3 tables, for example A, B and C.
The relations between them are: A one-to-many B, and B one-to-many C.
C is the table in which photos are saved.
I want to get data from A, but only the last photo for each A.
The columns may look like this:
table a:
id  a_msg
a1  msg in a
a2  msg in a
a3  msg in a
table b:
id  b_msg           a_id
b1  some data in b  a1
b2  some data in b  a1
b3  some data in b  a2
b4  some data in b  a3
table c:
id  url          createdate               c_msg           b_id
c1  /file/1.jpg  2014-12-01 06:55:54.600  some data in c  b1
c2  /file/2.jpg  2014-12-01 06:55:54.601  some data in c  b1
c3  /file/3.jpg  2014-12-01 06:55:54.602  some data in c  b1
c4  /file/4.jpg  2014-12-01 06:55:54.603  some data in c  b2
c5  /file/5.jpg  2014-12-01 06:55:54.604  some data in c  b2
c6  /file/6.jpg  2014-12-01 06:55:54.605  some data in c  b3
The result I want to get:
c_id  url          createdate               c_msg           b_msg           b_id  a_msg     a_id
c6    /file/6.jpg  2014-12-01 06:55:54.605  some data in c  some data in b  b3    msg in a  a1
c5    /file/5.jpg  2014-12-01 06:55:54.604  some data in c  some data in b  b2    msg in a  a1
Sorry, I don't know how to use a tool to describe the tables; I hope you can easily understand what I mean.
If my description is not clear enough, I will edit the question. Thank you if anyone can help me.

Consider the following as an example :
create table table_a (id int,a_msg text);
create table table_b (id int,b_msg text,a_id int);
create table table_c (id int,url text,createdate timestamp with time zone,c_msg text ,b_id int);
and the data
insert into table_a values (1,'msg in table_a')
,(2,'2nd msg in table_a')
,(3,'3rd msg in table_a');
insert into table_b values (20,'msg in table_b',1)
,(21,'2nd msg in table_b',2)
,(22,'3rd msg in table_b',3);
insert into table_c values (30,'url','2014-12-01 06:55:54.600','msg in table_c',20)
,(31,'url 1','2014-12-01 06:55:54.604','2nd msg in table_c',21)
,(32,'url 2','2014-12-01 06:55:54.605','3rd msg in table_c',22);
To get the result, use INNER JOIN across the three tables, and to get the last two rows use ORDER BY createdate DESC LIMIT 2:
select c.id,c.url
,c.createdate
,c.c_msg,b.b_msg
,b.id b_id,a.a_msg
,a.id a_id
from
table_c c inner join table_b b on c.b_id=b.id /* to get data from table_b */
inner join table_a a on b.a_id=a.id /* to get data from table_a */
order by createdate desc limit 2 /* DESC will sort from the highest date time values and LIMIT 2 will return two rows */
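The query above returns the two most recent photos overall. If the goal is instead the latest photo for each row of table A, as the question's first sentence suggests, a window function is one option. The following is a minimal sketch, assuming SQLite 3.25 or later through Python's sqlite3 module and the question's sample data; the text-typed ids and the alias rn are illustrative choices, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id TEXT, a_msg TEXT);
CREATE TABLE table_b (id TEXT, b_msg TEXT, a_id TEXT);
CREATE TABLE table_c (id TEXT, url TEXT, createdate TEXT, c_msg TEXT, b_id TEXT);
INSERT INTO table_a VALUES
  ('a1', 'msg in a'), ('a2', 'msg in a'), ('a3', 'msg in a');
INSERT INTO table_b VALUES
  ('b1', 'some data in b', 'a1'), ('b2', 'some data in b', 'a1'),
  ('b3', 'some data in b', 'a2'), ('b4', 'some data in b', 'a3');
INSERT INTO table_c VALUES
  ('c1', '/file/1.jpg', '2014-12-01 06:55:54.600', 'some data in c', 'b1'),
  ('c2', '/file/2.jpg', '2014-12-01 06:55:54.601', 'some data in c', 'b1'),
  ('c3', '/file/3.jpg', '2014-12-01 06:55:54.602', 'some data in c', 'b1'),
  ('c4', '/file/4.jpg', '2014-12-01 06:55:54.603', 'some data in c', 'b2'),
  ('c5', '/file/5.jpg', '2014-12-01 06:55:54.604', 'some data in c', 'b2'),
  ('c6', '/file/6.jpg', '2014-12-01 06:55:54.605', 'some data in c', 'b3');
""")

# Rank the photos of each A row from newest to oldest and keep rank 1.
query = """
SELECT c_id, url, createdate, a_id
FROM (
    SELECT c.id AS c_id, c.url AS url, c.createdate AS createdate,
           a.id AS a_id,
           ROW_NUMBER() OVER (PARTITION BY a.id
                              ORDER BY c.createdate DESC) AS rn
    FROM table_c AS c
    JOIN table_b AS b ON c.b_id = b.id
    JOIN table_a AS a ON b.a_id = a.id
) AS ranked
WHERE rn = 1
ORDER BY a_id
"""
latest_per_a = conn.execute(query).fetchall()
for row in latest_per_a:
    print(row)
```

Because the joins are inner joins, a3 does not appear in the result: its only B row (b4) has no photos.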

Common records for 2 fields in a table?

I have a table with two fields, A and B. Suppose A has the values a1 and a2.
The corresponding records for a1 in B are 1,2,3,x,y,z.
The corresponding records for a2 in B are 1,2,3,4,d,e,f.
I need a query written in DB2 that fetches the records in B that are common to every value of A (a1 and a2).
So here the output would be:
A B
a1 1
a1 2
a1 3
a2 1
a2 2
a2 3
Can someone please help on this?

Try something like the following (Table stands in for your actual table name):
SELECT A, B
FROM Table t1
WHERE (SELECT COUNT(*) FROM Table t2 WHERE t2.B = t1.B)
= (SELECT COUNT(DISTINCT t3.A) FROM Table t3)
ORDER BY A, B

This might not be 100% accurate as I can't test it out in DB2 so you might have to tweak the query a little bit to make it work.
with t(num) as (select count(distinct A) from table)
select t1.A, t1.B
from table t1, table t2, t
where t1.B = t2.B
group by t1.A, t1.B, num
having count(*) = num
Basically, the idea is to join the table to itself on column B and keep only the rows whose number of matches equals the number of distinct values in column A, which indicates that the B value is common to all the A values. This assumes each (A, B) pair appears only once in the table.
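The same counting idea can also be written with COUNT(DISTINCT A) per B value, which does not depend on (A, B) pairs being unique. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than DB2; MyTable is a hypothetical table name used in place of the placeholder in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "MyTable" is a hypothetical name standing in for the real table.
conn.execute("CREATE TABLE MyTable (A TEXT, B TEXT)")
sample = [("a1", b) for b in ["1", "2", "3", "x", "y", "z"]]
sample += [("a2", b) for b in ["1", "2", "3", "4", "d", "e", "f"]]
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", sample)

# A value of B is "common" when it occurs with every distinct value of A.
query = """
SELECT A, B
FROM MyTable
WHERE B IN (
    SELECT B
    FROM MyTable
    GROUP BY B
    HAVING COUNT(DISTINCT A) = (SELECT COUNT(DISTINCT A) FROM MyTable)
)
ORDER BY A, B
"""
common_rows = conn.execute(query).fetchall()
for row in common_rows:
    print(row)
```

With the sample data only the B values 1, 2 and 3 survive, once for a1 and once for a2.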

T-SQL Query to convert rows to columns based on multiple tables

I have two master tables, CompanyMaster and ActivityMaster, for a child table, CompanyActivities.
ActivityMaster
ACTIVITYID ACTIVITYNAME
A1 testActivity
A2 someActivity
A3 otheractivity
A4 someotheractivity
A5 anyotheractivity
CompanyMaster
COMPANYID COMPANYNAME
C1 testcompany
C2 ACompany
C3 MyCompany
C4 SomeCompany
C5 ZCompany
C6 Company123
C7 ComapnyABC
CompanyActivities - The COMPANYID in CompanyActivities has a primary key/foreign key relationship with COMPANYID in CompanyMaster (the primary key table), and ACTIVITYID has a primary key/foreign key relationship with ACTIVITYID in ActivityMaster (the primary key table).
COMPANYID ACTIVITYID
C1 A1
C1 A3
C3 A1
C3 A2
C4 A5
C5 A1
C6 A3
C7 A3
I want to write a query to get the following output, where all the rows in the ACTIVITYID column of the ActivityMaster table are converted to columns.
Output
Companies A1 A2 A3 A4 A5
C1 Y N Y N N
C2 N N N N N
C3 Y Y N N N
C4 N N N N Y
C5 Y N N N N
C6 N N Y N N
C7 N N Y N N
The output table displays all the companies as rows in the first column, and all the activities as columns after the first column. If CompanyActivities has a row that contains both the ACTIVITYID and the COMPANYID, the cell is set to Y in the output; otherwise it is set to N.
For example, COMPANYID C1 has the activity ACTIVITYID A1 in the CompanyActivities table, so the cell below A1 and to the right of C1 is set to Y, whereas C1 and A2 do not have a row, so the third column in the first row is set to N.
I am using C#.NET and 4 for loops to produce this output now, which is taking a heavy toll on the performance of the application, so I would like to do this with a query. I have searched for pivot queries, but all the examples I found know the column names beforehand, which I don't; I only get the column names by querying ActivityMaster.

create table #CompanyMaster (COMPANYID int, COMPANYNAME varchar(30))
create table #ActivityMaster (ACTIVITYID int, ACTIVITYNAME varchar(30))
create table #CompanyActivities (COMPANYID int, ACTIVITYID int)
insert into #CompanyMaster
SELECT 1, 'Company A'
union all
SELECT 2, 'Company B'
insert into #ActivityMaster
SELECT 101, 'Activity X'
union all
SELECT 102, 'Activity Y'
union all
SELECT 103, 'Activity Z'
insert into #CompanyActivities
select 1, 102
union all
select 2, 101
-- build activities column names
--case [Activity X] when 0 then ''N'' else ''Y'' end as [Activity X],
--case [Activity Y] when 0 then ''N'' else ''Y'' end as [Activity Y],
--case [Activity Z] when 0 then ''N'' else ''Y'' end as [Activity Z]
declare @activities nvarchar(max)
set @activities
= (
select 'case [' + ACTIVITYNAME + '] when 0 then ''N'' else ''Y'' end as [' + ACTIVITYNAME + '],' + char(10)
from #ActivityMaster
for xml path('')
)
-- strip the trailing comma and line feed
set @activities = substring(@activities, 0, len(@activities)-1)
declare @activities_for nvarchar(max)
-- build activities column names in for
--[Activity X], [Activity Y], [Activity Z]
set @activities_for
= (
select '[' + ACTIVITYNAME + '],' + char(10)
from #ActivityMaster
for xml path('')
)
set @activities_for = substring(@activities_for, 0, len(@activities_for)-1)
declare @sql nvarchar(MAX) = N'
select COMPANYNAME,
<activities>
From
(select c.COMPANYNAME, a.ACTIVITYNAME,
(case
when ca.ACTIVITYID is not null and ca.COMPANYID is not null then 1
else 0
end) as STATUS
from #CompanyMaster c
cross join #ActivityMaster a
left join #CompanyActivities ca on ca.COMPANYID = c.COMPANYID and a.ACTIVITYID = ca.ACTIVITYID) p
pivot
(
sum(STATUS) for ACTIVITYNAME IN (<activities_for>)
) as pvt
'
set @sql = replace(@sql, '<activities>', @activities)
set @sql = replace(@sql, '<activities_for>', @activities_for)
print @sql
exec sp_executesql @sql
drop table #CompanyMaster
drop table #ActivityMaster
drop table #CompanyActivities
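The script above builds the column list at run time because the activity names are not known in advance. The same idea can be sketched without PIVOT by generating one conditional aggregate per activity. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module and the sample data from the question; quote_ident is a hypothetical helper, and conditional aggregation is swapped in here for the T-SQL PIVOT operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ActivityMaster (ACTIVITYID TEXT PRIMARY KEY, ACTIVITYNAME TEXT);
CREATE TABLE CompanyMaster (COMPANYID TEXT PRIMARY KEY, COMPANYNAME TEXT);
CREATE TABLE CompanyActivities (COMPANYID TEXT, ACTIVITYID TEXT);
INSERT INTO ActivityMaster VALUES
  ('A1', 'testActivity'), ('A2', 'someActivity'), ('A3', 'otheractivity'),
  ('A4', 'someotheractivity'), ('A5', 'anyotheractivity');
INSERT INTO CompanyMaster VALUES
  ('C1', 'testcompany'), ('C2', 'ACompany'), ('C3', 'MyCompany'),
  ('C4', 'SomeCompany'), ('C5', 'ZCompany'), ('C6', 'Company123'),
  ('C7', 'ComapnyABC');
INSERT INTO CompanyActivities VALUES
  ('C1', 'A1'), ('C1', 'A3'), ('C3', 'A1'), ('C3', 'A2'),
  ('C4', 'A5'), ('C5', 'A1'), ('C6', 'A3'), ('C7', 'A3');
""")


def quote_ident(name):
    """Hypothetical helper: quote a value for use as a column alias."""
    return '"' + name.replace('"', '""') + '"'


# Read the activity ids at run time; they become the output columns.
activity_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ACTIVITYID FROM ActivityMaster ORDER BY ACTIVITYID"
    )
]

# One aggregate per activity: 'Y' if any matching row exists, else 'N'.
# The ids are bound as parameters; only the quoted aliases are interpolated.
columns = ",\n".join(
    f"MAX(CASE WHEN ca.ACTIVITYID = ? THEN 'Y' ELSE 'N' END) AS {quote_ident(a)}"
    for a in activity_ids
)
query = f"""
SELECT c.COMPANYID,
{columns}
FROM CompanyMaster AS c
LEFT JOIN CompanyActivities AS ca ON ca.COMPANYID = c.COMPANYID
GROUP BY c.COMPANYID
ORDER BY c.COMPANYID
"""
pivot_rows = conn.execute(query, activity_ids).fetchall()
for row in pivot_rows:
    print(row)
```

The LEFT JOIN keeps companies with no activities, such as C2, which end up with N in every column.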
