Finding the percentage (%) range around an average value in SQL (PostgreSQL)

I want to return the rows whose values lie within 20% of the average of the Duration column in my database.
I want to build on the code below, but instead of returning rows where Duration is less than the average Duration, I want all rows whose Duration lies within 20% of the AVG(Duration) value.
Select * From table
Where Duration < (Select AVG(Duration) from table)

Here is one way...
Select * From table
Where Duration between (Select AVG(Duration)*0.8 from table)
and (Select AVG(Duration)*1.2 from table)
Or perhaps this, to avoid repeated scans of the table:
with cte as ( Select AVG(Duration) as AvgDuration from table )
Select * From table
Where Duration between (Select AvgDuration*0.8 from cte)
and (Select AvgDuration*1.2 from cte)
or
Select table.* From table
cross join ( Select AVG(Duration) as AvgDuration from table ) cj
Where Duration between cj.AvgDuration*0.8 and cj.AvgDuration*1.2
or using a window function:
Select d.*
from (
    SELECT table.*
         , AVG(Duration) OVER() as AvgDuration
    From table
) d
Where d.Duration between d.AvgDuration*0.8 and d.AvgDuration*1.2
The last one might be the most efficient method.
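As a quick sanity check, here is a self-contained sketch with made-up data (the durations table and its values are hypothetical, not from the question):
-- hypothetical sample: AVG(duration) = 100, so the 20% band is 80..120
CREATE TABLE durations (id int, duration numeric);
INSERT INTO durations VALUES (1, 80), (2, 100), (3, 120), (4, 121), (5, 79);

SELECT d.*
FROM (
    SELECT durations.*, AVG(duration) OVER () AS avg_duration
    FROM durations
) d
WHERE d.duration BETWEEN d.avg_duration * 0.8 AND d.avg_duration * 1.2;
-- returns ids 1, 2 and 3; 79 and 121 fall just outside the band (BETWEEN is inclusive)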

How to Return Records Equal to a Specific Percentage of an Aggregate in Transact-SQL?

My requirement is to provide a random sample of claims that comprises 2.5% of the total amount paid and also 2.5% of the total number of claims for a given population. The goal is to deliver records in a report that meet both criteria. My staging table is defined as follows:
[RecordId] UniqueIdentifier NOT NULL PRIMARY KEY DEFAULT NEWID()
,ClaimNO varchar(50)
,Company_ID varchar(10)
,HPCode varchar(10)
,FinancialResponsibility varchar(30)
,ProviderType varchar(50)
,DateOfService date
,DatePaid date
,ClaimType varchar(50)
,TotalBilled numeric(11,2)
,TotalPaid numeric(11,2)
,ProcessorType varchar(100)
I've already built the logic to return 2.5% of the total number of claims, but I need guidance on how best to ensure both criteria are met.
Here's what I've tried thus far:
with cteTotals as (
Select Count(*) as TotalClaims, sum(TotalPaid) as TotalPaid, sum(TotalPaid) * .025 as PaidSampleAmount
from [Z_Monthly_Quality_Review]
),
ctePopulation as (
Select *
from [Z_Monthly_Quality_Review]
),
cteSampleRows as (
select TOP 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType,
Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid
from [Z_Monthly_Quality_Review]
order by NEWID()
),
cteSamplePaid as (
Select Top 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType,
Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid
from [Z_Monthly_Quality_Review] mqr
inner join ctePopulation cte on mqr.ClaimNo = cte.ClaimNO
order by NEWID()
)
Since both criteria must be satisfied, how should I structure the two CTEs to ensure this? In cteSamplePaid, how do I ensure that the sum of TotalPaid equals 2.5% of the total for the population? Would this be accomplished with a HAVING clause? The end result will be displayed to business users via SQL Server Reporting Services. Ideally, I would provide them with one sample that meets both criteria. If that's not possible, how do I randomly sample claims for each criterion separately?
I don't think there is a guaranteed way to make the sample add up to exactly 2.5% of the total. There are no guaranteed results, and performance would be very poor, because you would essentially have to brute-force every possible combination of rows. A way to get very close to your goal is to return rows that add up to the target within an acceptable margin of error.
Since no sample data was provided, I just used AdventureWorks2017 (downloaded from here)
USE AdventureWorks2017
GO
DROP TABLE IF EXISTS #SalesData
SELECT SalesOrderID AS ID,TotalDue
INTO #SalesData
FROM Sales.SalesOrderHeader
Declare @DesiredPercentage Numeric(10,3) = .025 /*Desired sum percentage of total rows*/
,@AcceptableMargin Numeric(10,3) = .01 /*Random row total can be plus or minus this percentage of the desired sum*/
DECLARE @DesiredSum Numeric(16,2) = @DesiredPercentage *(SELECT SUM(TotalDue) FROM #SalesData)
/*For loop*/
DECLARE @RowNum INT
,@LoopCounter INT = 1
WHILE (1=1)
BEGIN
DROP TABLE IF EXISTS #RandomData
SELECT RowNum = ROW_NUMBER() OVER (ORDER BY B.RandID),A.*,RunningTotal = SUM(TotalDue) OVER (ORDER BY B.RandID)
INTO #RandomData
FROM #SalesData AS A
CROSS APPLY (SELECT RandID = NEWID()) AS B
WHERE TotalDue < @DesiredSum /*If a single row is bigger than the desired sum, filter it out*/
ORDER BY B.RandID
SELECT Top(1) @RowNum = RowNum
FROM #RandomData AS A
CROSS APPLY (SELECT DeltaFromDesiredSum = ABS(RunningTotal-@DesiredSum)) AS B
WHERE RunningTotal BETWEEN @DesiredSum *(1-@AcceptableMargin) AND @DesiredSum *(1+@AcceptableMargin)
ORDER BY DeltaFromDesiredSum
IF (@RowNum IS NOT NULL)
BREAK;
IF (@LoopCounter >=100) /*Prevents infinite loops*/
THROW 59194,'Result unable to be generated in 100 tries. Recommend expanding acceptable margin',1;
SET @LoopCounter +=1;
END
SELECT *
FROM #RandomData
WHERE RowNum <= @RowNum
SELECT RandomRowTotal = SUM(TotalDue)
,DesiredSum = @DesiredSum
,PercentageFromDesiredSum = Concat(Cast(Round(100*(1-SUM(TotalDue)/@DesiredSum),2) as Float),'%')
FROM #RandomData
WHERE RowNum <= @RowNum
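The loop above only targets the paid-amount criterion. To also enforce the 2.5%-of-claim-count criterion, one option (my sketch, not part of the original answer; it reuses the temp tables and variables above) is to require the cutoff row number to land near 2.5% of the row count as well:
DECLARE @DesiredRowCount INT = (SELECT CEILING(COUNT(*) * 0.025) FROM #SalesData)

SELECT Top(1) @RowNum = RowNum
FROM #RandomData AS A
CROSS APPLY (SELECT DeltaFromDesiredSum = ABS(RunningTotal-@DesiredSum)) AS B
WHERE RunningTotal BETWEEN @DesiredSum*(1-@AcceptableMargin) AND @DesiredSum*(1+@AcceptableMargin)
AND RowNum BETWEEN @DesiredRowCount*(1-@AcceptableMargin) AND @DesiredRowCount*(1+@AcceptableMargin) /*row-count criterion*/
ORDER BY DeltaFromDesiredSum
Expect more retries (or a wider margin) with this version, since both conditions must hold on the same random shuffle.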

How to transfer values to another column with two queries

I have a query that calculates a percentage for every product. In this query I created a virtual column named 'yüzde' ('percent' in Turkish). Now I want to transfer the yüzde values to a column in another table with an UPDATE query where the product IDs match.
I think I need to write a stored procedure. How can I do that?
SELECT [ProductVariantId] ,
count([ProductVariantId]) as bedensayısı,
count([ProductVariantId]) * 100.0 / (SELECT Top 1 Count(*) as Total
FROM [Live_ADL].[dbo].[_INV_ProductCombinationAttributes]
Where Size LIKE '%[^0-9]%' and [StockQuantity]>0
Group by [ProductVariantId]
order by Total Desc) as yüzde
FROM [Live_ADL].[dbo].[_INV_ProductCombinationAttributes]
Where Size LIKE '%[^0-9]%' and [StockQuantity]>0
group by [ProductVariantId]
order by yüzde desc
You don't really need a stored procedure; you can do it inline, using a CTE for instance, something along these lines:
; with tabyuzde as
(
SELECT [ProductVariantId] ,
count([ProductVariantId]) as bedensayısı,
count([ProductVariantId]) * 100.0 / (SELECT Top 1 Count(*) as Total
FROM [Live_ADL].[dbo].[_INV_ProductCombinationAttributes]
Where Size LIKE '%[^0-9]%' and [StockQuantity]>0
Group by [ProductVariantId]
order by Total Desc) as yüzde
FROM [Live_ADL].[dbo].[_INV_ProductCombinationAttributes]
Where Size LIKE '%[^0-9]%' and [StockQuantity]>0
group by [ProductVariantId]
)
update x
set othertablevalue=yüzde
from
othertable x
join tabyuzde t on x.ProductVariantId=t.ProductVariantId
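And if you do end up wanting the stored procedure the question mentions, it is just the same statement wrapped up (a sketch: dbo.UpdateYuzde is a name I made up, and othertable/othertablevalue are the placeholders from above):
CREATE PROCEDURE dbo.UpdateYuzde
AS
BEGIN
    SET NOCOUNT ON;
    with tabyuzde as
    (
        SELECT [ProductVariantId],
               count([ProductVariantId]) as bedensayısı,
               count([ProductVariantId]) * 100.0 / (SELECT Top 1 Count(*) as Total
                   FROM [Live_ADL].[dbo].[_INV_ProductCombinationAttributes]
                   Where Size LIKE '%[^0-9]%' and [StockQuantity]>0
                   Group by [ProductVariantId]
                   order by Total Desc) as yüzde
        FROM [Live_ADL].[dbo].[_INV_ProductCombinationAttributes]
        Where Size LIKE '%[^0-9]%' and [StockQuantity]>0
        group by [ProductVariantId]
    )
    update x
    set othertablevalue = yüzde
    from othertable x
    join tabyuzde t on x.ProductVariantId = t.ProductVariantId;
END
Then run it whenever the values need refreshing: EXEC dbo.UpdateYuzde;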

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table that looks like this (the SQL to get this data was just SELECT *):
row_id | id   | total
-------+------+-------
6395   | 1509 | 112
22986  | 1509 | 112
1393   | 3284 | 40.37
24360  | 3284 | 40.37
The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total) it includes the duplicate entries even though the order ID is the same, which makes my numbers larger than if I SELECT DISTINCT id, total, export to Excel, and sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I exported every distinct order-ID row to Excel?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
See live demo.
It also handles any level of duplication (triplicates, etc.), as long as every duplicate of an order repeats the same total.
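For example (orders_demo is a made-up table mirroring the scenario): since every duplicate repeats the same total, dividing the inflated sum by the row count recovers the true value.
create table orders_demo (id int, total numeric);
insert into orders_demo values
(1509, 112), (1509, 112), (3284, 40.37);

select id, sum(total) / count(id) as total
from orders_demo
group by id;
-- 1509: 224 / 2 = 112; 3284: 40.37 / 1 = 40.37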
You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id   | actual_total | row_ids    |
|------|--------------|------------|
| 1509 | 112          | 6395,22986 |
| 3284 | 40.37        | 1393,24360 |
Explanation
We do not know why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) introduced with the WITH clause, we get the distinct id and total pairs.
Below the CTE, we use this distinct data to do the totaling: we join the original table to the aggregation over the distinct values on id, then collect the row_ids into an array so the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
And then calculate the distinct sum like:
select dist_sum(distinct id, total)
from orders
SQLFiddle
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
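One caveat (my note, not from the original answer): SUM(DISTINCT total) collapses equal values within a group, so it is only safe when duplicates are exact copies. Using the test table from the earlier answer:
-- returns 112 for id 1509 and 40.37 for id 3284, as desired here;
-- but two genuinely different rows of one order sharing the same total
-- would also be collapsed into a single value
SELECT id, SUM(DISTINCT total) FROM test GROUP BY id;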
If we can trust that the total for one order is the same on every duplicated row, we could eliminate the duplicates in a subquery by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
sql fiddle
In difficult cases, you can collapse duplicates by aggregating the rows into a jsonb object keyed on row_id (duplicate keys overwrite each other) and then summing the values:
select
id,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id
I would suggest just using a subquery:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above will give you the total for each id.
Use the query below if you want the grand total with all duplicates removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
select distinct id, total
from orders
) t
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;

Limit by percent instead of number of rows without subqueries

I would like to select the top 1% of rows; however, I cannot use subqueries to do it. I.e., this won't work:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT (COUNT(*) * 0.01)::integer FROM mytbl)
How would I accomplish the same output without using a subquery with limit?
You can utilize PERCENT_RANK:
WITH cte(ID, var, pc) AS
(
SELECT ID, var, PERCENT_RANK() OVER (ORDER BY random()) AS pc
FROM mytbl
WHERE var = 'value'
)
SELECT *
FROM cte
WHERE pc <= 0.01
ORDER BY id;
SqlFiddleDemo
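A related trick (my variant, not from that answer) is NTILE, which deals the rows into 100 nearly equal buckets, so keeping bucket 1 yields roughly the top 1%:
WITH cte AS
(
    SELECT *, NTILE(100) OVER (ORDER BY random()) AS bucket
    FROM mytbl
    WHERE var = 'value'
)
SELECT *
FROM cte
WHERE bucket = 1
ORDER BY id;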
I solved it with Python using the psycopg2 package:
cur.execute("SELECT ROUND(COUNT(id)*0.01,0)
FROM mytbl")
nrows = str([int(d[0]) for d in cur.fetchall()][0])
cur.execute("SELECT *
FROM mytbl
WHERE var='value'
ORDER BY id, random() LIMIT (%s)",nrows)
Perhaps there is a more elegant solution using just SQL, or a more efficient one, but this does exactly what I'm looking for.
If I got it right, you need:
- a random 1% sample of all rows, and
- if some id is within the sample, all rows with the same id must be in it too.
The following SQL should do the trick:
with ids as (
    select id,
           total,
           sum(cnt) over (order by max(rnd)) running_total
    from (
        select id,
               count(*) over (partition by id) cnt,
               count(*) over () total,
               row_number() over (order by random()) rnd
        from mytbl
    ) q
    group by id,
             cnt,
             total
)
select mytbl.*
from mytbl,
     ids
where mytbl.id = ids.id
  and ids.running_total <= ids.total * 0.01
order by mytbl.id;
I don’t have your data, of course, but I have no trouble using a sub query in the LIMIT clause.
However, the sub query contains only the count(*) part and I then multiply the result by 0.01:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT count(*) FROM mytbl)*0.01;

T-SQL: if a value exists use it, otherwise use the value before

I have the following table
Account#   Period   Balance
12345      200901   $11554
12345      200902   $4353
12345      201004   $34
12345      201005   $44
12345      201006   $1454
45677      200901   $14454
45677      200902   $1478
45677      201004   $116776
45677      201005   $996
56789      201006   $1567
56789      200901   $7894
56789      200902   $123
56789      201003   $543345
56789      201005   $114
56789      201006   $54
I want to select the account#s that have a period of 201005.
This is fairly easy using the code below. The problem is that if a user enters 201003, which doesn't exist for every account, I want the query to select the previous value instead. Note that one account# does have a 201003 period, and I still want to select that row too.
I tried CASE, IF/ELSE, and IN, but I was unsuccessful.
PS: I cannot create temp tables due to a system limitation of 5000 rows.
Thank you.
DECLARE @INPUTPERIOD INT
SET @INPUTPERIOD = 201005
SELECT ACCOUNT#, PERIOD, BALANCE
FROM TABLE1
WHERE PERIOD = @INPUTPERIOD
SELECT t.ACCOUNT#, t.PERIOD, t.BALANCE
FROM (SELECT ACCOUNT#, MAX(PERIOD) AS MaxPeriod
FROM TABLE1
WHERE PERIOD <= @INPUTPERIOD
GROUP BY ACCOUNT#) q
INNER JOIN TABLE1 t
ON q.ACCOUNT# = t.ACCOUNT#
AND q.MaxPeriod = t.PERIOD
select top 1 account#, period, balance
from table1
where period <= @inputperiod
order by period desc
; WITH Base AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE Period <= 201003
)
SELECT * FROM Base WHERE RN = 1
Using a CTE and ROW_NUMBER(): we take all the rows with Period <= the selected period and keep the top one (the row with the auto-generated ROW_NUMBER() = 1).
; WITH Base AS
(
SELECT *, 1 AS RN FROM #MyTable WHERE Period = 201003
)
, Alternative AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE NOT EXISTS(SELECT 1 FROM Base) AND Period < 201003
)
, Final AS
(
SELECT * FROM Base
UNION ALL
SELECT * FROM Alternative WHERE RN = 1
)
SELECT * FROM Final
This one is a lot more complex but does nearly the same thing. It is more "imperative-like": it first tries to find a row with the exact Period and, if none exists, does the same thing as before. At the end it unites the two result sets (one of the two is always empty). I would always use the first one, unless profiling showed that SQL Server wasn't able to work out what I'm trying to do; then I would try the second one.
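Note that both CTE versions above return a single row overall. If the fallback has to happen per account, as the sample data suggests, partition the numbering by account; a sketch (mine, not from the answers above) using the question's table and variable names:
; WITH Base AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ACCOUNT# ORDER BY PERIOD DESC) RN
    FROM TABLE1
    WHERE PERIOD <= @INPUTPERIOD
)
SELECT ACCOUNT#, PERIOD, BALANCE FROM Base WHERE RN = 1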