Querying Missing rows in TSQL - tsql

We have a table that is populated from information on multiple computers every day. The problem is sometimes it doesn't pull information from certain computers.
So for a rough example, the table columns would read computer_name, information_pulled, qty_pulled, date_pulled.
So let's say it pulled every day in a week, except that one computer was missed on the 15th. A query will return:
computer_name  information_pulled  qty_pulled  date_pulled
computer1      infopulled          2           2014-06-14
computer2      infopulled          3           2014-06-14
computer3      infopulled          2           2014-06-14
computer1      infopulled          2           2014-06-15
computer3      infopulled          1           2014-06-15
computer1      infopulled          3           2014-06-16
computer2      infopulled          2           2014-06-16
computer3      infopulled          4           2014-06-16
As you can see, nothing was pulled for computer2 on the 15th. I am looking to write a query that returns the missing rows for a specific date.
For example, after running it, it would return
computer2 null null 2014-06-15
or anything close to this. We're trying to catch it each morning when this table isn't fully populated so that we can be proactive, and I am not sure I can even query for missing data without searching for null values.

You need to have a master list of all your computers somewhere, so that you know when a computer is not accounted for in your table. Say that you have a table called Computer that holds this.
Declare a variable to store the date you want to check:
declare @date date
set @date = '2014-06-15'
Then you can query for missing rows like this:
select c.Computer_name, null, null, @date
from Computer c
where not exists(select 1
from myTable t
where t.Computer_name = c.Computer_name
and t.date_pulled = @date)
If you are certain that every computer_name already exists in your table at least once, you could skip creating a separate Computer table, and modify the query like this:
select c.Computer_name, null, null, @date
from (select distinct Computer_name from myTable) c
where not exists(select 1
from myTable t
where t.Computer_name = c.Computer_name
and t.date_pulled = @date)
This query isn't as robust because it will not show computers that do not already have a row in your table (e.g. a new computer, or a problematic computer that has never had its information pulled).
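To try the master-list idea without a SQL Server instance, here is a minimal sketch that runs the same NOT EXISTS query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for illustration (it is not T-SQL), the column types are assumptions, and missing_computers is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
# Table and column names follow the answer above; the column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Computer (Computer_name TEXT PRIMARY KEY);
    CREATE TABLE myTable (
        Computer_name      TEXT,
        information_pulled TEXT,
        qty_pulled         INTEGER,
        date_pulled        TEXT
    );
    INSERT INTO Computer VALUES ('computer1'), ('computer2'), ('computer3');
    INSERT INTO myTable VALUES
        ('computer1', 'infopulled', 2, '2014-06-14'),
        ('computer2', 'infopulled', 3, '2014-06-14'),
        ('computer3', 'infopulled', 2, '2014-06-14'),
        ('computer1', 'infopulled', 2, '2014-06-15'),
        ('computer3', 'infopulled', 1, '2014-06-15'),
        ('computer1', 'infopulled', 3, '2014-06-16'),
        ('computer2', 'infopulled', 2, '2014-06-16'),
        ('computer3', 'infopulled', 4, '2014-06-16');
""")


def missing_computers(conn, check_date):
    """Return (name, NULL, NULL, date) for each computer with no row on check_date."""
    sql = """
        SELECT c.Computer_name, NULL, NULL, ?
        FROM Computer c
        WHERE NOT EXISTS (SELECT 1
                          FROM myTable t
                          WHERE t.Computer_name = c.Computer_name
                            AND t.date_pulled = ?)
        ORDER BY c.Computer_name
    """
    return conn.execute(sql, (check_date, check_date)).fetchall()


missing = missing_computers(conn, "2014-06-15")
print(missing)
```

With the sample data, only computer2 should come back for the 15th, and a fully populated day should return no rows.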

I think a cross join will solve your problem.
For the query below to work, every computer must have uploaded successfully at least once, and every day must have at least one upload from some computer.
This way you'll get every missing computer/date pair.
select
Compare.*
from Table_1 T1
right join (
select *
from
(select Computer_name from Table_1 group by Computer_name) CPUS,
(select date_pulled from Table_1 group by date_pulled) DAYs
) Compare
on T1.Computer_name=Compare.Computer_name
and T1.date_pulled=Compare.date_pulled
where T1.Computer_name is null
Hope this helps.
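A rough sketch of the cross-join approach, again using SQLite through Python's sqlite3 purely as a stand-in. It is written with LEFT JOIN rather than RIGHT JOIN because older SQLite versions lack RIGHT JOIN; the logic is the same anti-join. The column types are assumptions.

```python
import sqlite3

# Sample data from the question, stored in a table named Table_1 as in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (
        Computer_name      TEXT,
        information_pulled TEXT,
        qty_pulled         INTEGER,
        date_pulled        TEXT
    );
    INSERT INTO Table_1 VALUES
        ('computer1', 'infopulled', 2, '2014-06-14'),
        ('computer2', 'infopulled', 3, '2014-06-14'),
        ('computer3', 'infopulled', 2, '2014-06-14'),
        ('computer1', 'infopulled', 2, '2014-06-15'),
        ('computer3', 'infopulled', 1, '2014-06-15'),
        ('computer1', 'infopulled', 3, '2014-06-16'),
        ('computer2', 'infopulled', 2, '2014-06-16'),
        ('computer3', 'infopulled', 4, '2014-06-16');
""")

# Build every (computer, date) combination seen in the table, then keep only
# the combinations that have no matching row (the anti-join).
sql = """
    SELECT cpus.Computer_name, days.date_pulled
    FROM (SELECT DISTINCT Computer_name FROM Table_1) cpus
    CROSS JOIN (SELECT DISTINCT date_pulled FROM Table_1) days
    LEFT JOIN Table_1 t1
           ON t1.Computer_name = cpus.Computer_name
          AND t1.date_pulled   = days.date_pulled
    WHERE t1.Computer_name IS NULL
    ORDER BY cpus.Computer_name, days.date_pulled
"""
gaps = conn.execute(sql).fetchall()
print(gaps)
```

Note that a day on which no computer uploaded at all never appears in the list of days, so this sketch cannot report it.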

If you join the table to itself by date and computer_name like the following, you should get a list of missing dates
SELECT t1.computer_name, null as information_pulled, null as qty_pulled,
DATEADD(day,1,t1.date_pulled) as missing_date
FROM computer_info t1
LEFT JOIN computer_info t2 ON t2.date_pulled = DATEADD(day,1,t1.date_pulled)
AND t2.computer_name = t1.computer_name
WHERE t1.date_pulled >= '2014-06-14'
AND t2.date_pulled IS NULL
This will also get the next date that hasn't been pulled yet, but that should be clear and you could add an additional condition to filter it out.
AND DATEADD(day,1,t1.date_pulled) < '2014-06-17'
Of course, this only works if you know each of the computer names already exists in the table for previous days. If not, @Jerrad's suggestion to create a separate computer table would help.
EDIT: if the gap is larger than a single day, you may want to see when data was next pulled:
SELECT t1.computer_name, null as info, null as qty_pulled,
DATEADD(day,1,t1.date_pulled) as missing_date,
t3.date_pulled AS next_pulled_date
FROM computer_info t1
LEFT JOIN computer_info t2 ON t2.date_pulled = DATEADD(day,1,t1.date_pulled)
AND t2.computer_name = t1.computer_name
LEFT JOIN computer_info t3 ON t3.date_pulled > t1.date_pulled
AND t3.computer_name = t1.computer_name
LEFT JOIN computer_info t4 ON t4.date_pulled > t1.date_pulled
AND t4.date_pulled < t3.date_pulled
AND t4.computer_name = t1.computer_name
WHERE t1.date_pulled >= '2014-06-14'
AND t2.date_pulled IS NULL
AND t4.date_pulled IS NULL
AND DATEADD(day,1,t1.date_pulled) < '2014-06-17'
The 't3' join matches all pulled dates after the first missing one, and the 't4' join together with t4.date_pulled IS NULL excludes all but the lowest of those dates.
You could do this with subqueries as well, but excluding joins have served me well in the past.
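Here is a small sketch of the single-day self-join, translated to SQLite so it can be run from Python. SQLite's date(x, '+1 day') stands in for T-SQL's DATEADD(day, 1, x); that substitution and the column types are assumptions made only for this illustration.

```python
import sqlite3

# Sample data from the question in a table named computer_info, as in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE computer_info (
        computer_name      TEXT,
        information_pulled TEXT,
        qty_pulled         INTEGER,
        date_pulled        TEXT
    );
    INSERT INTO computer_info VALUES
        ('computer1', 'infopulled', 2, '2014-06-14'),
        ('computer2', 'infopulled', 3, '2014-06-14'),
        ('computer3', 'infopulled', 2, '2014-06-14'),
        ('computer1', 'infopulled', 2, '2014-06-15'),
        ('computer3', 'infopulled', 1, '2014-06-15'),
        ('computer1', 'infopulled', 3, '2014-06-16'),
        ('computer2', 'infopulled', 2, '2014-06-16'),
        ('computer3', 'infopulled', 4, '2014-06-16');
""")

# For each row, look for the same computer on the following day.
# Rows with no such match mark the start of a gap.
sql = """
    SELECT t1.computer_name,
           date(t1.date_pulled, '+1 day') AS missing_date
    FROM computer_info t1
    LEFT JOIN computer_info t2
           ON t2.date_pulled   = date(t1.date_pulled, '+1 day')
          AND t2.computer_name = t1.computer_name
    WHERE t1.date_pulled >= '2014-06-14'
      AND t2.date_pulled IS NULL
      AND date(t1.date_pulled, '+1 day') < '2014-06-17'  -- skip the day not pulled yet
    ORDER BY t1.computer_name, missing_date
"""
gaps = conn.execute(sql).fetchall()
print(gaps)
```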

Related

Oracle Sql - Discarding outer select if inner select returns null, and avoiding multiple rows

Pre-Info: In our company a person is marked * if he is actively working. And there are people who changed their departments.
For a report I use 2 tables named COMPANY_PERSON_ALL and trifm_izinler4, joining person_id field as below.
I want to discard (don't list) the row, if the first inner select returns null.
And I want to prevent the second inner select returning multiple Departments.
select izn.person_id, izn.adi_soyadi, izn.company_id,
(select a.employee_status from COMPANY_PERSON_ALL a where a.employee_status = '*' and a.person_id = izn.person_id) as Status,
(select a.org_code from COMPANY_PERSON_ALL a where a.person_id = izn.person_id) as Department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
where trunc(rapor_tarihi) = trunc(SYSDATE)
Can you help me overcome these two problems with the inner select statements?
Assuming you only want to see the department from the active person record, you can just join the two tables instead of using subquery expressions, and filter on that status:
select izn.person_id, izn.adi_soyadi, izn.company_id,
a.employee_status as status, a.org_code as department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
join company_person_all a on a.person_id = izn.person_id
where rapor_tarihi >= trunc(SYSDATE)
-- and rapor_tarihi < trunc(SYSDATE) + 1 -- probably not needed
and a.employee_status = '*'
I've also changed the date comparison; if you compare using trunc(rapor_tarihi) then a normal index on that column can't be used, so it's generally better to compare the original value against a range. Since you're comparing against today's date you probably only need to look for values at or after midnight today, but if that column can hold future dates then you can put an upper bound of midnight tomorrow on the range, which I've included but commented out.
If a person can be active in more than one department at a time then this will show all of those, but your wording suggests people are only active in one at a time. If you want to see a department for all active users, but not necessarily the one that has the active flag (or if there can be more than one active), then it's a bit more complicated, and you need to explain how you would want to choose which to show.

Proc is running slow with NOT EXISTS

I'm trying to create a stored procedure, but I'm running into an issue where it runs for over 5 minutes against close to 50k records.
The process seems pretty straightforward; I'm just not sure why it is taking so long.
Essentially I have two tables:
Table_1
ApptDate    ApptName  ApptDoc   ApptReason  ApptType
-----------------------------------------------------------------------
03/15/2021  Physical  Dr Smith  Yearly      Day
03/15/2021  Check In  Dr Doe    Check In    Day
03/15/2021  Appt oth  Dr Dee    Check In    Monthly
Table_2 - this table has exactly the same structure as Table_1; what I am trying to achieve is simply to archive the data from Table_1.
DECLARE @Date_1 as DATETIME
SET @Date_1 = GetDate() - 1
INSERT INTO Table_2 (ApptDate, ApptName, ApptDoc, ApptReason)
SELECT ApptDate, ApptName, ApptDoc, ApptReason
FROM Table_1
WHERE ApptType = 'Day' AND ApptDate = @Date_1
AND NOT EXISTS (SELECT 1 FROM Table_2
WHERE ApptType = 'Day' AND ApptDate = @Date_1)
So this stored procedure seems pretty straightforward, but the NOT EXISTS is making it really slow.
The reason for the NOT EXISTS is that this stored procedure is part of a bigger process that runs multiple times a day (morning, afternoon, night). I'm trying to make sure that I only have one copy of the '03/15/2021' data. I'm basically running an archive process on the previous day's data (@Date_1).
Any thoughts on how this can be sped up?
For this query:
INSERT INTO Table_2 (ApptDate, ApptName, ApptDoc, ApptReason)
SELECT ApptDate, ApptName, ApptDoc, ApptReason
from Table_1 t1
Where ApptType = 'Day' and
ApptDate = @Date_1 and
NOT EXISTS (Select 1
from Table_2 t2
where t2.ApptType = t1.ApptType and
t2.ApptDate = t1.ApptDate
);
You want indexes on Table_1(ApptType) and, more importantly, on Table_2(ApptType, ApptDate) or Table_2(ApptDate, ApptType).
Note: I changed the correlation clause to just refer to the values in the outer query. This seems more general than your version, but should have the same performance (in this case).
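To see the archive pattern run end to end, here is a hedged sketch in SQLite via Python's sqlite3 (a stand-in, not SQL Server). It assumes ApptType is also copied into Table_2 so that the NOT EXISTS check has something to match on; the index name ix_table2_type_date, the ISO date strings, and the archive_day helper are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (ApptDate TEXT, ApptName TEXT, ApptDoc TEXT,
                          ApptReason TEXT, ApptType TEXT);
    CREATE TABLE Table_2 (ApptDate TEXT, ApptName TEXT, ApptDoc TEXT,
                          ApptReason TEXT, ApptType TEXT);
    -- Composite index (hypothetical name) so the NOT EXISTS probe is a lookup.
    CREATE INDEX ix_table2_type_date ON Table_2 (ApptType, ApptDate);
    INSERT INTO Table_1 VALUES
        ('2021-03-15', 'Physical', 'Dr Smith', 'Yearly',   'Day'),
        ('2021-03-15', 'Check In', 'Dr Doe',   'Check In', 'Day'),
        ('2021-03-15', 'Appt oth', 'Dr Dee',   'Check In', 'Monthly');
""")


def archive_day(conn, appt_date):
    """Copy the 'Day' rows for appt_date unless that date was already archived."""
    conn.execute("""
        INSERT INTO Table_2 (ApptDate, ApptName, ApptDoc, ApptReason, ApptType)
        SELECT t1.ApptDate, t1.ApptName, t1.ApptDoc, t1.ApptReason, t1.ApptType
        FROM Table_1 t1
        WHERE t1.ApptType = 'Day'
          AND t1.ApptDate = ?
          AND NOT EXISTS (SELECT 1
                          FROM Table_2 t2
                          WHERE t2.ApptType = t1.ApptType
                            AND t2.ApptDate = t1.ApptDate)
    """, (appt_date,))
    conn.commit()
    # Report how many rows the archive table holds after this run.
    return conn.execute("SELECT COUNT(*) FROM Table_2").fetchone()[0]


first_run = archive_day(conn, "2021-03-15")
second_run = archive_day(conn, "2021-03-15")  # rerun later the same day
print(first_run, second_run)
```

The second call should leave the row count unchanged, which is the "only one copy per day" behaviour the question asks for.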

fetch data from and to date to get all matching results

Hello everyone. I have to get data between a from date and a to date. I tried using a BETWEEN clause, which fails to retrieve the data I need. Here is what I need.
I have table called hall_info which has following structure
hall_info
id | hall_name |address |contact_no
1 | abc | India |XXXX-XXXX-XX
2 | xyz | India |XXXX-XXXX-XX
Now I have one more table which is events, that contains data about when and which hall is booked on what date, the structure is as follows.
id |hall_info_id |event_date(booked_date)| event_name
1 | 2 | 2015-10-25 | Marriage
2 | 1 | 2015-10-28 | Marriage
3 | 2 | 2015-10-26 | Marriage
So what I need now is to show the hall_names that are not booked on the selected dates. Suppose the user chooses from 2015-10-23 to 2015-10-30; I want to list all halls that are free on at least one of those dates. In the above case both halls (hall_info_id 1 and 2) are booked within the given range, but I still want to show them because they are free on the 23rd, 24th, 27th, and 29th.
In the second case, suppose the user chooses 2015-10-25 to 2015-10-26; then hall_info_id 2 is booked on both dates, so I want to show only hall_info_id 1.
I tried using an inner query and a BETWEEN clause but I am not getting the required result. To keep it simple I have shown only selected fields; I have more tables to join, so I can't paste my query. Please help with this. Thanks in advance to all who try.
Some changes in Yasen Zhelev's code:
SELECT * FROM hall_info
WHERE id not IN (
SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(DISTINCT event_date) > DATE_PART('day', '2015-10-30'::timestamp - '2015-10-23'::timestamp))
I have not tried it but how about checking if the number of bookings per hall is less than the actual days in the selected period.
SELECT * FROM hall_info WHERE id NOT IN
(SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(id) >= DATEDIFF(day, '2015-10-23', '2015-10-30') + 1
);
That will only work if you have one booking per day per hall.
To get the "available dates" for the hall returned, your query needs a row source of all possible dates. For example, if you had a calendar table populated with possible date values, e.g.
CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) Engine=InnoDB
;
INSERT INTO cal (dt) VALUES ('2015-10-23')
,('2015-10-24'),('2015-10-25'),('2015-10-26'),('2015-10-27')
,('2015-10-28'),('2015-10-29'),('2015-10-30'),('2015-10-31')
;
Then you could use a query that performs a cross join between the calendar table and hall_info to get every hall on every date, and an anti-join pattern to eliminate rows that are already booked.
The anti-join pattern is an outer join with a restriction in the WHERE clause to eliminate matching rows.
For example:
SELECT cal.dt, h.id, h.hall_name, h.address
FROM cal cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
AND cal.dt >= '2015-10-23'
AND cal.dt <= '2015-10-30'
The cross join between cal and hall_info gets all halls for all dates (restricted in the WHERE clause to a specified range of dates.)
The outer join to events finds matching rows in the events table (matching on hall_info_id and event_date). The trick is the predicate (condition) in the WHERE clause, e.id IS NULL. That throws out any rows that had a match, leaving only rows that don't have a match.
This type of problem is similar to other "sparse data" problems. e.g. How do you return a zero total for sales by a given store on a given date, when there are no rows with that store and date...
In your case, the query needs a source of rows with available date values. That doesn't necessarily have to be a table named calendar. (Other databases give us the ability to dynamically generate a row source; someday, MySQL may have similar features.)
If you want the row source to be dynamic in MySQL, then one approach would be to create a temporary table, and populate it with the dates, run the query referencing the temporary table, and then dropping the temporary table.
Another approach is to use an inline view to return the rows...
SELECT cal.dt, h.id, h.hall_name, h.address
FROM (
SELECT '2015-10-23'+INTERVAL 0 DAY AS dt
UNION ALL SELECT '2015-10-24'
UNION ALL SELECT '2015-10-25'
UNION ALL SELECT '2015-10-26'
UNION ALL SELECT '2015-10-27'
UNION ALL SELECT '2015-10-28'
UNION ALL SELECT '2015-10-29'
UNION ALL SELECT '2015-10-30'
) cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
FOLLOWUP: When this question was originally posted, it was tagged with mysql. The SQL in the examples above is for MySQL.
In terms of writing a query to return the specified results, the general issue is still the same in PostgreSQL. The general problem is "sparse data".
The SQL query needs a row source for the "missing" date values, but the specification doesn't provide any source for those date values.
The answer above discusses several possible row sources in MySQL: 1) a table, 2) a temporary table, 3) an inline view.
The answer also mentions that some databases (not MySQL) provide other mechanisms that can be used as a row source.
For example, PostgreSQL provides a nifty generate_series function (Reference: http://www.postgresql.org/docs/9.1/static/functions-srf.html).
It should be possible to use the generate_series function as a row source, to supply a set of rows containing the date values needed by the query to produce the specified result.
This answer demonstrates the approach to solving the "sparse data" problem.
If the specification is to return just the list of halls, and not the dates they are available, the queries above can be easily modified to remove the date expression from the SELECT list, and add a GROUP BY clause to collapse the rows into a distinct list of halls.
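As a runnable illustration of the "generated row source plus anti-join" idea, here is a sketch in SQLite through Python's sqlite3. A recursive CTE plays the role of the calendar (SQLite has no built-in generate_series in its core, so this is a swapped-in technique, not the PostgreSQL function), and free_slots is a hypothetical helper name.

```python
import sqlite3

# hall_info and events as described in the question (contact_no omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hall_info (id INTEGER PRIMARY KEY, hall_name TEXT, address TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, hall_info_id INTEGER,
                         event_date TEXT, event_name TEXT);
    INSERT INTO hall_info VALUES (1, 'abc', 'India'), (2, 'xyz', 'India');
    INSERT INTO events VALUES
        (1, 2, '2015-10-25', 'Marriage'),
        (2, 1, '2015-10-28', 'Marriage'),
        (3, 2, '2015-10-26', 'Marriage');
""")


def free_slots(conn, start, end):
    """Return (date, hall id, hall name) for every unbooked hall/date in [start, end]."""
    sql = """
        WITH RECURSIVE cal(dt) AS (
            SELECT ?                                  -- first day of the range
            UNION ALL
            SELECT date(dt, '+1 day') FROM cal WHERE dt < ?
        )
        SELECT cal.dt, h.id, h.hall_name
        FROM cal
        CROSS JOIN hall_info h
        LEFT JOIN events e
               ON e.hall_info_id = h.id
              AND e.event_date   = cal.dt
        WHERE e.id IS NULL                            -- keep only unbooked pairs
        ORDER BY cal.dt, h.id
    """
    return conn.execute(sql, (start, end)).fetchall()


week = free_slots(conn, "2015-10-23", "2015-10-30")
weekend = free_slots(conn, "2015-10-25", "2015-10-26")
# Collapse to a distinct list of halls with at least one free date.
week_halls = sorted({name for _, _, name in week})
weekend_halls = sorted({name for _, _, name in weekend})
print(week_halls, weekend_halls)
```

For the two ranges in the question, this should list both halls for 23 to 30 October and only hall abc for 25 to 26 October.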

select distinct from 2 columns but only 1 is duplicate

select a.subscriber_msisdn, war.created_datetime from
(
select distinct subscriber_msisdn from wiz_application_response
where application_item_id in
(select id from wiz_application_item where application_id=155)
and created_datetime between '2012-10-07 00:00' and '2012-11-15 00:00:54'
) a
left outer join wiz_application_response war on (war.subscriber_msisdn=a.subscriber_msisdn)
The subselect returns 11 rows, but the join returns 18 (with duplicates). The objective of this query is only to add the date column to the 11 rows of the subselect.
Based on your description, it stands to reason that there are multiple created_datetime values for some of the subscriber_msisdn values which is what prompted you to use the distinct in the subquery to begin with. By joining the sub query to the original table you are defeating this. A cleaner way to write the query would be:
SELECT
war.subscriber_msisdn
, war.created_datetime
FROM
wiz_application_response war
INNER JOIN wiz_application_item wai
ON war.application_item_id = wai.id
AND wai.application_id = 155
WHERE
war.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
This should return only the rows from the war table that satisfy the criteria based on the wai table. It should not be an outer join unless you want to return all the rows from the war table that satisfy the created_datetime condition regardless of the application_item_id condition.
This is my best guess based on the limited information I have about your tables and what I’m assuming you’re trying to accomplish. If this doesn’t get you what you are after, I will continue to offer other ideas based on additional information you could provide. Hope this works.
This can most probably be simplified to:
SELECT DISTINCT ON (1)
r.subscriber_msisdn, r.created_datetime
FROM wiz_application_item i
JOIN wiz_application_response r ON r.application_item_id = i.id
WHERE i.application_id = 155
AND r.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
ORDER BY 1, 2 DESC -- to pick the latest created_datetime
Details depend on missing information.

a dual variable not in statement?

I have the need to look at two tables that share two variables and get a list of the data from one table that does not have matching data in the other table. Example:
Table A
xName
Date
Place
xAmount
Table B
yName
Date
Place
yAmount
I need to be able to write a query that will check Table A and find entries that have no corresponding entry in Table B. If it were a one-variable issue I could use a NOT IN statement, but I can't think of a way to do that with two variables. A left join does not appear to work either, and filtering by a specific date or place name is not practical, since we are talking about thousands of dates and hundreds of place names.
Thanks in advance to anyone who can help out.
SELECT TableA.Date,
TableA.Place,
TableA.xName,
TableA.xAmount,
TableB.yName,
TableB.yAmount
FROM TableA
LEFT OUTER JOIN TableB
ON TableA.Date = TableB.Date
AND TableA.Place = TableB.Place
WHERE TableB.yName IS NULL
OR TableB.yAmount IS NULL
SELECT * FROM A WHERE NOT EXISTS
(SELECT 1 FROM B
WHERE A.xName = B.yName AND A.Date = B.Date AND A.Place = B.Place AND A.xAmount = B.yAmount)
In Oracle:
select xName , xAmount from tableA
MINUS
select yName , yAmount from tableB
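For comparison, here is a small sketch that matches on just the two shared columns (Date and Place), which is how I read the question. It runs on SQLite through Python's sqlite3 as a stand-in, and the sample rows are invented purely for illustration. It shows the NOT EXISTS form and the LEFT JOIN ... IS NULL form side by side.

```python
import sqlite3

# Invented sample rows; only Date and Place are shared between the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (xName TEXT, Date TEXT, Place TEXT, xAmount INTEGER);
    CREATE TABLE TableB (yName TEXT, Date TEXT, Place TEXT, yAmount INTEGER);
    INSERT INTO TableA VALUES
        ('alice', '2020-01-01', 'Paris', 10),
        ('bob',   '2020-01-01', 'Rome',  20),
        ('carol', '2020-01-02', 'Paris', 30);
    INSERT INTO TableB VALUES
        ('x', '2020-01-01', 'Paris', 5),
        ('y', '2020-01-02', 'Rome',  7);
""")

# Correlated NOT EXISTS on both columns at once.
not_exists_rows = conn.execute("""
    SELECT a.xName, a.Date, a.Place
    FROM TableA a
    WHERE NOT EXISTS (SELECT 1
                      FROM TableB b
                      WHERE b.Date = a.Date
                        AND b.Place = a.Place)
    ORDER BY a.xName
""").fetchall()

# Equivalent anti-join: outer join on both columns, keep the unmatched rows.
left_join_rows = conn.execute("""
    SELECT a.xName, a.Date, a.Place
    FROM TableA a
    LEFT JOIN TableB b
           ON b.Date = a.Date
          AND b.Place = a.Place
    WHERE b.Date IS NULL
    ORDER BY a.xName
""").fetchall()

print(not_exists_rows)
print(left_join_rows)
```

Both forms should return the same rows here; the IS NULL test is done on a join column so that it cannot be confused with a genuinely NULL yName or yAmount in TableB.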