Create View for Latest Value Each Day (No Skipped Days)

I have a table that records the firmware version for each device every day. If a device goes down, the script that populates the firmware table can't reach it, so there is no record for offline days. I need a view that will return the latest known firmware version for each device on every day, regardless of whether the device was down. This works great in PostgreSQL:
SELECT
    d.ip,
    d.date,
    CASE
        WHEN f.firmware_version IS NOT NULL THEN f.firmware_version
        ELSE ( --Use the last available firmware_version for the device:
            SELECT l.firmware_version
            FROM firmware l
            WHERE l.ip = d.ip AND l.date < d.date AND l.firmware_version IS NOT NULL
            ORDER BY l.date DESC
            LIMIT 1)
    END AS firmware_version
FROM
    devices d --Table with a record for every device every day
    LEFT JOIN firmware f ON d.date = f.date AND d.ip = f.ip
However, we are transitioning to Denodo, and I cannot get this query to work there. It seems to fail on the subquery inside the CASE expression. Does anyone know how I can express logic like this to create a view in Denodo?

I figured it out! It's a bit long and complicated, but it works just the way I hoped. Here is the solution if it helps anyone else:
--Get all firmware values recorded on or before each listed date
--Note: the latest firmware for each date is picked out in a later step
WITH firmware_prep (
    ip,
    date_main,
    date_firmware,
    firmware
) AS (
    SELECT
        d.ip,
        d.date,
        f.date,
        f.firmware
    FROM
        device d
        LEFT JOIN firmware f
            ON (d.ip = f.ip AND f.date <= d.date AND f.firmware IS NOT NULL)
)
SELECT
    s.ip,
    s.date_main AS date,
    f.firmware
FROM
    ( --Here's where you find which firmware date is the latest available date for each listed date:
        SELECT
            ip,
            date_main,
            MAX(date_firmware) AS select_date
        FROM
            firmware_prep
        GROUP BY
            ip,
            date_main
    ) s
    LEFT JOIN firmware f ON s.select_date = f.date AND s.ip = f.ip
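Since the goal is a view, the same logic can be wrapped in a view definition. Below is a minimal sketch, assuming your Denodo version accepts CREATE OR REPLACE VIEW ... AS SELECT in the VQL shell; the CTE is folded into a derived table and the view name is just a placeholder:

-- Sketch: the gap-filling query above wrapped as a view (view name is a placeholder)
CREATE OR REPLACE VIEW latest_firmware_per_day AS
SELECT
    s.ip,
    s.date_main AS date,
    f.firmware
FROM
    ( -- latest available firmware date at or before each listed date
        SELECT
            d.ip,
            d.date AS date_main,
            MAX(f.date) AS select_date
        FROM
            device d
            LEFT JOIN firmware f
                ON (d.ip = f.ip AND f.date <= d.date AND f.firmware IS NOT NULL)
        GROUP BY
            d.ip,
            d.date
    ) s
    LEFT JOIN firmware f ON s.select_date = f.date AND s.ip = f.ip;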

Related

Proc is running slow with NOT EXISTS

I'm working on creating a stored procedure, but I'm running into an issue where it runs for over 5 minutes on close to 50k records.
The process seems pretty straightforward, I'm just not sure why it is taking so long.
Essentially I have two tables:
Table_1
ApptDate     ApptName    ApptDoc    ApptReason   ApptType
----------------------------------------------------------
03/15/2021   Physical    Dr Smith   Yearly       Day
03/15/2021   Check In    Dr Doe     Check In     Day
03/15/2021   Appt oth    Dr Dee     Check In     Monthly
Table_2 - this table has the same exact structure as Table_1; what I am trying to achieve is simply to archive the data from Table_1.
DECLARE @Date_1 AS DATETIME
SET @Date_1 = GETDATE() - 1

INSERT INTO Table_2 (ApptDate, ApptName, ApptDoc, ApptReason)
SELECT ApptDate, ApptName, ApptDoc, ApptReason
FROM Table_1
WHERE ApptType = 'Day' AND ApptDate = @Date_1
  AND NOT EXISTS (SELECT 1 FROM Table_2
                  WHERE ApptType = 'Day' AND ApptDate = @Date_1)
So this stored procedure seems pretty straightforward; however, the NOT EXISTS is what makes it really slow.
The reason for the NOT EXISTS is that this stored procedure is part of a bigger process that runs multiple times a day (morning, afternoon, night). I'm trying to make sure that I only keep one copy of the '03/15/2021' data. I'm basically running an archive process on the previous day's data (@Date_1).
Any thoughts on how this can be "sped up"?
For this query:
INSERT INTO Table_2 (ApptDate, ApptName, ApptDoc, ApptReason)
SELECT ApptDate, ApptName, ApptDoc, ApptReason
FROM Table_1 t1
WHERE ApptType = 'Day' AND
      ApptDate = @Date_1 AND
      NOT EXISTS (SELECT 1
                  FROM Table_2 t2
                  WHERE t2.ApptType = t1.ApptType AND
                        t2.ApptDate = t1.ApptDate
                 );
You want indexes on: Table_1(ApptType) and, more importantly, Table_2(ApptType, ApptDate) or Table_2(ApptDate, ApptType).
Note: I changed the correlation clause to just refer to the values in the outer query. This seems more general than your version, but should have the same performance (in this case).
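A rough sketch of those index definitions (the index names are placeholders; adjust to your own naming convention):

-- Supporting indexes for the filter and the NOT EXISTS probe
CREATE INDEX IX_Table_1_ApptType ON Table_1 (ApptType);
CREATE INDEX IX_Table_2_ApptType_ApptDate ON Table_2 (ApptType, ApptDate);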

Postgresql randomly starting to complain about grouping clause

For almost a year, I have been using this SQL to report on the rank of game profiles, based on the number of days that a game has been above the rank of 10:
SELECT P.id, P.name, P.rank, COUNT(P.id)
FROM application_classes_models_gameprofile P
LEFT JOIN application_classes_models_gamedeveloper D ON D.id = P."developerId"
LEFT JOIN application_classes_models_gameprofileposition PP ON
    PP."gameProfileId" = P.id AND
    PP.position <= 10 AND
    PP.position > 0
WHERE
    P.inactive = false AND
    D."excludeFromRanking" = false AND
    P.rank <= 10 AND
    P.rank > 0
GROUP BY P.id
ORDER BY COUNT(P.id) DESC
Grouping is always a bit of a pain in PostgreSQL, but the above SQL has been working fine for almost a year, returning the expected results.
Yesterday, I had an issue with the game profile table which forced me to have to restore a backup, for that table. I did so using pg_restore -v --clean -t application_classes_models_gameprofile < backup.bak.
This morning, when we ran our reports, postgresql came back with the error:
column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
Just to clarify, this SQL has been running for almost a year, and the above error has never appeared for this specific query; however, it seems that after we cleaned and restored the game profile table, we're getting the above error...
I know that I can solve the problem by fixing the SQL query to remove the name/rank, but I worry that there is a deeper issue here... so does anyone know why the above might happen?
PostgreSQL version is 9.6, running on Debian 9.
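For reference, PostgreSQL only lets you leave name and rank out of the GROUP BY when it can prove they are functionally dependent on a grouped column, which in practice means P.id must be the table's primary key; if the restore recreated the table without that constraint, the shortcut stops being legal. The explicit grouping the error message asks for would look like this (a sketch of the workaround mentioned above, not a fix for any missing constraint):

SELECT P.id, P.name, P.rank, COUNT(P.id)
FROM application_classes_models_gameprofile P
LEFT JOIN application_classes_models_gamedeveloper D ON D.id = P."developerId"
LEFT JOIN application_classes_models_gameprofileposition PP ON
    PP."gameProfileId" = P.id AND
    PP.position <= 10 AND
    PP.position > 0
WHERE
    P.inactive = false AND
    D."excludeFromRanking" = false AND
    P.rank <= 10 AND
    P.rank > 0
GROUP BY P.id, P.name, P.rank   -- list every non-aggregated column explicitly
ORDER BY COUNT(P.id) DESC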

Find closest positive value based on multiple criteria

First of all, I am still learning SQL / PostgreSQL, so I am eagerly looking for explanations and the thought process / strategy instead of just the raw answer. And I apologize in advance for potential future misunderstandings or "stupid" questions.
Also, if you know a great site which proposes exercises or challenges in order to master SQL / PostgreSQL, I'll take anything :)
I am looking for a way to return the closest value, based on other specific results in the same table.
In the same table, I am tracking different events:
ESESS = End session event. Gives me a new timestamp (ts) every time Georges (id) finishes a session (let's say Georges is using a computer, so end session = shut the computer down)
USD = Money inventory update event. Each time Georges spends/earns money, these columns record the new balance (v), as well as his id and the timestamp (ts) when the balance was updated.
What I am trying to get is the balance at the end of each session.
My plan was to return esess.id and usd.v only where (ts.esess - ts.usd) is the smallest positive value.
So some sort of lookup from the ts.usd where (ts.esess - ts.usd) matches that condition... but I'm struggling with that part.
Here is the strategy in the following link:
QUERY PLAN
Here is the query:
SELECT
    sessId, moneyV
FROM
    (
        SELECT
            ts AS sessTs,
            mid AS sessId
        FROM table1
        WHERE n = 'esess'
    ) AS sess
    INNER JOIN
    (
        SELECT
            ts AS moneyTs,
            mid AS moneyId,
            v AS moneyV
        FROM table1
        WHERE n = 'usd'
    ) AS balance
    ON sessId = moneyId
WHERE
    sessTs - moneyTs =
    (
        SELECT
            sessTs - moneyTs AS timeDiff
        FROM table1
        WHERE sessTs - moneyTs > 0
        ORDER BY timeDiff ASC
        LIMIT 1
    )
;
So how should I proceed?
Also, I dug around to find answers and found this post in particular, but I did not understand everything and did not manage to make it work properly...
Thanks in advance!
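For what it's worth, one common PostgreSQL pattern for this kind of "closest earlier row" lookup is a LATERAL join, which sidesteps the uncorrelated subquery problem above. This is only a sketch, assuming the table and column names from the question (table1, n, mid, ts, v):

-- For each end-of-session row, pick the most recent balance row at or before it
SELECT sess.mid AS sessId,
       money.v  AS moneyV
FROM table1 sess
LEFT JOIN LATERAL (
    SELECT m.v
    FROM table1 m
    WHERE m.n = 'usd'
      AND m.mid = sess.mid
      AND m.ts <= sess.ts        -- only balances recorded before (or at) session end
    ORDER BY m.ts DESC           -- closest one first
    LIMIT 1
) money ON true
WHERE sess.n = 'esess';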

Querying Missing rows in TSQL

We have a table that is populated from information on multiple computers every day. The problem is sometimes it doesn't pull information from certain computers.
So for a rough example, the table columns would read computer_name, information_pulled, qty_pulled, date_pulled.
So let's say it pulled every day in a week except the 15th. A query would return:
Computer_name   Information_pulled   qty_pulled   date_pulled
computer1       infopulled           2            2014-06-14
computer2       infopulled           3            2014-06-14
computer3       infopulled           2            2014-06-14
computer1       infopulled           2            2014-06-15
computer3       infopulled           1            2014-06-15
computer1       infopulled           3            2014-06-16
computer2       infopulled           2            2014-06-16
computer3       infopulled           4            2014-06-16
As you can see, nothing pulled in for computer 2 on the 15th. I am looking to write a query that pulls up missing rows for a specific date.
For example, after running it, it would return something like:
computer 2 null null 20140615
or anything close to that. We're trying to catch it each morning when this table isn't fully populated so we can be proactive, and I am not positive I can even query for missing data without searching for null values.
You need to have a master list of all your computers somewhere, so that you know when a computer is not accounted for in your table. Say that you have a table called Computer that holds this.
Declare a variable to store the date you want to check:
declare @date date
set @date = '6/15/2014'
Then you can query for missing rows like this:
select c.Computer_name, null, null, @date
from Computer c
where not exists(select 1
                 from myTable t
                 where t.Computer_name = c.Computer_name
                   and t.date_pulled = @date)
SQL Fiddle
If you are certain that every computer_name already exists in your table at least once, you could skip creating a separate Computer table, and modify the query like this:
select c.Computer_name, null, null, @date
from (select distinct Computer_name from myTable) c
where not exists(select 1
                 from myTable t
                 where t.Computer_name = c.Computer_name
                   and t.date_pulled = @date)
This query isn't as robust because it will not show computers that do not already have a row in your table (e.g. a new computer, or a problematic computer that has never had its information pulled).
I think a cross join will answer your problem.
In the query below, every computer must have uploaded successfully at least once, and at least one computer must have uploaded on each day.
This way you'll get every missing computer/date couple.
select
    Compare.*
from Table_1 T1
right join (
    select *
    from
        (select Computer_name from Table_1 group by Computer_name) CPUS
        cross join
        (select date_pulled from Table_1 group by date_pulled) DAYs
) Compare
    on T1.Computer_name = Compare.Computer_name
   and T1.date_pulled = Compare.date_pulled
where T1.Computer_name is null
Hope this helps.
If you join the table to itself by date and computer_name like the following, you should get a list of missing dates
SELECT t1.computer_name, null as information_pulled, null as qty_pulled,
DATEADD(day,1,t1.date_pulled) as missing_date
FROM computer_info t1
LEFT JOIN computer_info t2 ON t2.date_pulled = DATEADD(day,1,t1.date_pulled)
AND t2.computer_name = t1.computer_name
WHERE t1.date_pulled >= '2014-06-14'
AND t2.date_pulled IS NULL
This will also get the next date that hasn't been pulled yet, but that should be clear and you could add an additional condition to filter it out.
AND DATEADD(day,1,t1.date_pulled) < '2014-06-17'
Of course, this only works if you know each of the computer names already exists in the table for previous days. If not, @Jerrad's suggestion to create a separate Computer table would help.
EDIT: if the gap can be larger than a single day, you may also want to see the next date that was pulled:
SELECT t1.computer_name, null as info, null as qty_pulled,
DATEADD(day,1,t1.date_pulled) as missing_date,
t3.date_pulled AS next_pulled_date
FROM computer_info t1
LEFT JOIN computer_info t2 ON t2.date_pulled = DATEADD(day,1,t1.date_pulled)
AND t2.computer_name = t1.computer_name
LEFT JOIN computer_info t3 ON t3.date_pulled > t1.date_pulled
AND t3.computer_name = t1.computer_name
LEFT JOIN computer_info t4 ON t4.date_pulled > t1.date_pulled
AND t4.date_pulled < t3.date_pulled
AND t4.computer_name = t1.computer_name
WHERE t1.date_pulled >= '2014-06-14'
AND t2.date_pulled IS NULL
AND t4.date_pulled IS NULL
AND DATEADD(day,1,t1.date_pulled) < '2014-06-17'
The 't3' join brings in all dates after the first missing one, and the 't4' join together with t4.date_pulled IS NULL excludes all but the lowest of those dates.
You could do this with subqueries as well, but exclusion joins have served me well in the past.
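For comparison, a rough subquery version of the same single-day gap check might look like this (an untested sketch, reusing the table name and date bounds from above):

-- Rows whose next calendar day has no matching pull for the same computer
SELECT t1.computer_name,
       NULL AS information_pulled,
       NULL AS qty_pulled,
       DATEADD(day, 1, t1.date_pulled) AS missing_date
FROM computer_info t1
WHERE t1.date_pulled >= '2014-06-14'
  AND DATEADD(day, 1, t1.date_pulled) < '2014-06-17'
  AND NOT EXISTS (SELECT 1
                  FROM computer_info t2
                  WHERE t2.computer_name = t1.computer_name
                    AND t2.date_pulled = DATEADD(day, 1, t1.date_pulled));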

SQL query problem when upgrading from SQL Server 2000 to SQL Server 2008 R2

I am currently upgrading a database server from SQL Server 2000 to SQL Server 2008 R2. One of my queries used to take under a second to run and now takes in excess of 3 minutes (running on a faster machine).
I think I have located where it is going wrong but not why it is going wrong. Could somebody explain what the problem is and how I might resolve it?
The abridged code is as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
...
FROM
    Registrar reg
    JOIN EnabledType et ON et.enabledTypeCode = reg.enabled
    LEFT JOIN [Transaction] txn ON txn.registrarId = reg.registrarId
WHERE
    txn.transactionid IS NULL OR
    txn.transactionid IN
    (
        SELECT MAX(transactionid)
        FROM [Transaction]
        GROUP BY registrarid
    )
I believe the issue is the "txn.transactionid IS NULL OR" line. If I remove this condition, the query runs as fast as it used to (under a second) and returns all the records minus the 3 rows that that condition would have included. If I instead remove the second part of the OR, it returns the 3 rows I would expect in less than a second.
Could anybody point me in the right direction as to why this is happening and when this change occurred?
Many thanks in advance
Jonathan
I have accepted Alex's solution and included the new version of the code. It seems we have found one of the 0.1% of queries that the new query optimiser runs slower on.
WITH txn AS (
SELECT registrarId, balance , ROW_NUMBER() OVER (PARTITION BY registrarid ORDER BY transactionid DESC) AS RowNum
FROM [Transaction]
)
SELECT
reg.registrarId,
reg.ianaId,
reg.registrarName,
reg.clientId,
reg.enabled,
ISNULL(txn.balance, 0.00) AS [balance],
reg.alertBalance,
reg.disableBalance,
et.enabledTypeName
FROM
Registrar reg
JOIN EnabledType et
ON et.enabledTypeCode = reg.enabled
LEFT JOIN txn
ON txn.registrarId = reg.registrarId
WHERE
ISNULL(txn.RowNum,1)=1
ORDER BY
registrarName ASC
Try restructuring the query using a CTE and ROW_NUMBER...
WITH txn AS (
SELECT registrarId, transactionid, ...
, ROW_NUMBER() OVER (PARTITION BY registrarid ORDER BY transactionid DESC) AS RowNum
FROM [Transaction]
)
SELECT
...
FROM
Registrar reg
JOIN EnabledType et ON et.enabledTypeCode = reg.enabled
LEFT JOIN txn ON txn.registrarId = reg.registrarId
AND txn.RowNum=1