Find closest number to one number in PostgreSQL

I have a table like this:
people
id  name   zip
1   bill   84058
2   susan  90001
3   john   64354
Say I have an input number of 65432.
I want to write a query something like this:
SELECT * FROM people WHERE zip CLOSEST TO 65432 LIMIT 1
And get john as the row returned.
I can't find what the "closest to" command is in PostgreSQL.

You could use the ABS function:
SELECT *
FROM people
ORDER BY ABS(65432 - zip) ASC
LIMIT 1;
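To try the ordering trick without a PostgreSQL server, here is a minimal sketch that runs the same ORDER BY ABS(...) query against an in-memory SQLite database from Python. The helper name closest_by_zip and the extra tie-break on id are assumptions added for illustration; they are not part of the original answer.

```python
import sqlite3

# In-memory stand-in for the "people" table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, zip INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "bill", 84058), (2, "susan", 90001), (3, "john", 64354)],
)


def closest_by_zip(target):
    """Return the row whose zip is numerically closest to target."""
    # Sort by absolute distance; id is a hypothetical tie-breaker so that
    # two equally distant rows always come back in the same order.
    return conn.execute(
        "SELECT id, name, zip FROM people ORDER BY ABS(? - zip), id LIMIT 1",
        (target,),
    ).fetchone()


closest = closest_by_zip(65432)
print(closest)
```

Because the sort key is computed per row, this reads the whole table; on a large table a range scan on an indexed zip column on either side of the target may be worth considering instead.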

Related

How to use a (repeating) aggregate function value with other columns from the table I use the aggregate function on

Problem: I have to count the number of certificates each user has, then return the user's name, their number of certificates, and the difference between the maximum number of certificates across all users and that user's own count. I succeeded in the first part (getting the number of certificates), which I'll denote as $query$ (because I have a feeling my problem has something to do with aliasing).
So $query$ looks like this:
User |N_Certificates
Geoff 4
Ann 2
Lisa 0
And my end result should look like this:
User |N_Certificates |Difference
Geoff 4 0
Ann 2 2
Lisa 0 4
I tried this query:
SELECT Sub.name, Sub.N_Certificates,
MAX(Sub.N_Certificates) - Sub.N_Certificates AS Difference FROM ($_query_$) AS Sub
but it returned an error (because I was trying to use an aggregate function in combination with a column I was not grouping by) or a wrong result (notably, difference=0 for all rows).
I tried a contraption with INNER JOIN on another version of sub (same $query$ code with another alias) but it also didn't work (same reason). I could of course hard-code the max, but I don't think that's a good solution. My about screen tells me I'm using version 1.18 of pgAdmin.
You can't do it this way; SQL doesn't allow an aggregate to be mixed with ungrouped columns in the same SELECT list.
The easiest way is to use a subquery:
SELECT Sub.name, Sub.N_Certificates,
(SELECT MAX(N_Certificates) FROM ($_query_$) AS all_counts)
-
Sub.N_Certificates AS Difference
FROM ($_query_$) AS Sub
You can also use a common table expression:
WITH some_alias AS (
SELECT * FROM ($_query_$) AS q
)
SELECT name, N_Certificates,
(SELECT MAX(N_Certificates) FROM some_alias)
-
N_Certificates AS Difference
FROM some_alias
And you can use a window function: http://www.postgresql.org/docs/9.1/static/tutorial-window.html
SELECT Sub.name, Sub.N_Certificates,
MAX(Sub.N_Certificates) OVER ()
-
Sub.N_Certificates AS Difference
FROM ($_query_$) AS Sub
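As a quick check of the common-table-expression variant, here is a minimal sketch run against an in-memory SQLite database from Python. The table name cert_counts is an assumption that stands in for the result of $query$; the three rows are the ones given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# cert_counts is a hypothetical stand-in for the output of $query$.
conn.execute("CREATE TABLE cert_counts (name TEXT, n_certificates INTEGER)")
conn.executemany(
    "INSERT INTO cert_counts VALUES (?, ?)",
    [("Geoff", 4), ("Ann", 2), ("Lisa", 0)],
)

diff_rows = conn.execute(
    """
    WITH counts AS (
        SELECT name, n_certificates FROM cert_counts
    )
    SELECT name,
           n_certificates,
           -- The scalar subquery is evaluated over all rows of the CTE,
           -- so every output row sees the same overall maximum.
           (SELECT MAX(n_certificates) FROM counts) - n_certificates AS difference
    FROM counts
    ORDER BY n_certificates DESC
    """
).fetchall()

for row in diff_rows:
    print(row)
```

The CTE keeps the inner query written once, which is the main practical advantage over repeating $query$ in both the outer query and the scalar subquery.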

Is it possible to return only one instance of each ID in a view?

I am trying to work out how I would ensure I only get one instance of each user and their ID when I try to do an inner join on my source table.
The source data is a series of user names and IDs
userid username
1 alice
2 bob
3 charley
4 dave
5 robin
6 jon
7 lou
8 scott
I have had to write the algorithm in Python, to make sure each user's data is matched with only one other user's (so we can make sure each user's data is used once in each round).
We store the pairings, and how many rounds have been completed successfully after the tests, but I'd like to try and shortcut the process
I can get all the results, but I want to find a better way to remove each matched pair from the results so they can't be matched again.
select u.user_id, u.user, ur.user_id, ur.user
from userResults u inner join userResults ur
on u.user_id < ur.user_id
and (u.user_id, ur.user_id) not in
(select uid, uid2 from rounds)
where u.match <= ur.match and ((u.user_id) not in %s
and ur.user_id not in %s) limit 1;
I've tried making materialised views with a unique constraint, but it doesn't seem to affect it - I get each possible pairing once, rather than each user paired only once
I'm trying to work out how I only get 4 results, in the right order.
Every time I look at the underlying code, I can't help but think there's a better way to write it natively in SQL rather than having to iterate over results in python.
edit
Assuming each user has been matched 0 or more times, you might have a situation where user_ids 1-4 have rounds set to 1 and matches set to 1, and the remaining 4 have rounds set to 1 and no matches.
I have a view which will return a default value of 0 and 0 for rounds and matches if they haven't yet played, and you can't assume all rounds entered have met with a match.
If the first 4 have all matched, and have generated rounds, user 1 and user 2 have already met and matched in a round, so they won't be matched again, so user 1 will match user 3 (or 4) and user 2 will match user 4 (or 3)
The issue I'm having is that when I remove limit, and iterate through manually - the first three matches I always get are: 2,4 then 1,3, then 2,3 (rather than 5,7 or 6,8)
Adding the sample data and current rounds
table rounds
uid uid2
1 2
3 4
userresults view
user_id user rounds score
1 alice 1 0
2 bob 1 1
3 charley 1 1
4 dave 1 0
5 robin 0 0
6 jon 0 0
7 lou 0 0
8 scott 0 0
I'm currently getting results like:
2,4
2,3
1,3
1,4
4,6
...
These are all valid results, but I would like to limit them to a single instance of each ID in each column, so just the first match of each valid pairing.
I've created a new view to try and simplify things a bit, populated it with dummy data, and tried to generate matches.
All these matches are valid, and I'm trying to add some form of filter or restriction to bring it back to sensible numbers.
777;"Bongo Steveson";779;"Drongo Mongoson"
777;"Bongo Steveson";782;"Cheesey McHamburger"
777;"Bongo Steveson";780;"Buns Mattburger"
779;"Drongo Mongoson";782;"Cheesey McHamburger"
779;"Drongo Mongoson";781;"Hamburgler Bunburger"
775;"Bob Jobsworth";777;"Bongo Steveson"
778;"Mongo Bongoson";779;"Drongo Mongoson"
775;"Bob Jobsworth";778;"Mongo Bongoson"
778;"Mongo Bongoson";781;"Hamburgler Bunburger"
775;"Bob Jobsworth";782;"Cheesey McHamburger"
775;"Bob Jobsworth";781;"Hamburgler Bunburger"
775;"Bob Jobsworth";780;"Buns Mattburger"
776;"Steve Bobson";777;"Bongo Steveson"
776;"Steve Bobson";779;"Drongo Mongoson"
776;"Steve Bobson";782;"Cheesey McHamburger"
776;"Steve Bobson";778;"Mongo Bongoson"
776;"Steve Bobson";781;"Hamburgler Bunburger"
780;"Buns Mattburger";782;"Cheesey McHamburger"
780;"Buns Mattburger";781;"Hamburgler Bunburger"
I still can't work out a sensible way to restrict these values, and it's driving me nuts.
I've implemented a solution in code but I'd really like to see if I can get this working in native Postgres.
At this point I'm monkeying around with a new test database schema, and this is my view - adding unique to the index generates an error, and I can't add a check constraint to a materialised view (grrrr)
You can try joining to a subquery to ensure distinct records from the user table.
select * from any_table t1
inner join(
select distinct userid,username from source_table
) t2 on t1.userid=t2.userid;
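For comparison, the in-code filtering the question describes (keep a pair only if neither user has already been used in the round) can be sketched in a few lines of Python. This is an illustration of the idea, not the asker's actual code: the function name pick_disjoint_pairs is hypothetical, and the candidate pairs after (4, 6) are made up, since the question elides the rest of its output.

```python
def pick_disjoint_pairs(candidate_pairs):
    """Greedily keep pairs so that each user id appears at most once."""
    used = set()
    chosen = []
    for a, b in candidate_pairs:
        # Skip the pair if either user is already placed in this round.
        if a in used or b in used:
            continue
        chosen.append((a, b))
        used.update((a, b))
    return chosen


# First five pairs are from the question; the rest are invented for the demo.
candidates = [(2, 4), (2, 3), (1, 3), (1, 4), (4, 6),
              (5, 6), (5, 7), (6, 8), (7, 8)]
round_pairs = pick_disjoint_pairs(candidates)
print(round_pairs)
```

A greedy pass like this depends on the order of the candidates and is not guaranteed to pair the maximum possible number of users; it only guarantees that no user appears twice.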

SQL Sum and Group By for a running Tally?

I'm completely rewriting my question to simplify it. Sorry if you read the prior version. (The previous version of this question included a very complex query example that created a distraction from what I really need.) I'm using SQL Express.
I have a table of lessons.
LessonID StudentID StudentName LengthInMinutes
1 1 Chuck 120
2 2 George 60
3 2 George 30
4 1 Chuck 60
5 1 Chuck 10
These would be ordered by date. (Of course the actual table is thousands of records with dates and other lesson-related data but this is a simplification.)
I need to query this table such that I get all rows (or a subset of rows by a date range or by student), but I need my query to add a new column we might call PriorLessonMinutes. That is, the sum of all minutes of all lessons for the same student in lessons of PRIOR dates only.
So the query would return:
LessonID StudentID StudentName LengthInMinutes PriorLessonMinutes
1 1 Chuck 120 0
2 2 George 60 0
3 2 George 30 60 (The sum Length from row 2 only)
4 1 Chuck 60 120 (The sum Length from row 1 only)
5 1 Chuck 10 180 (The sum of Length from rows 1 and 4)
In essence, I need a running tally of the sum of prior lesson minutes for each student. Ideally the tally shouldn't include the current row, but if it does, no big deal as I can do subtraction in the code that receives the query.
Further, (and this is important) if I retrieve only a subset of records, (for example by a date range) PriorLessonMinutes must be a sum that considers rows that are NOT returned.
My first idea was to use SUM() and to GROUP BY Student, but that isn't right because unless I'm mistaken it would include a sum of minutes for all rows for each student, including rows that come after the row which aren't relevant to the sum I need.
OPTIONS I'M REJECTING: I could scan through all rows in my code that receives it, (although this would force me to retrieve all rows unnecessarily) but that's obviously inefficient. I could also put a real data field in there and populate it, but this too presents problems when other records are deleted or altered.
I have no idea how to write such a query together. Any guidance?
This is a great opportunity to use windowed aggregates. The catch is that you need SQL Server 2012 Express or later. If you can get it, then this is the query you are looking for:
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
Note that it returns NULLs instead of 0s (zeroes) for a student's first lesson, because the window frame is empty there. If you insist on zeroes, use the COALESCE function to turn NULLs into zeroes.
I suggest using a nested query to limit the number of rows returned:
select * from
(
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
) as NestedLessons
where LessonId > 3 -- this is an example of a filter
This way the filter is applied after the aggregation is complete.
Now, if you want to apply a filter that doesn't affect the aggregation (like only querying data for a certain student), you should apply the filter to the inner query, as pruning the rows that don't affect the computation early (like data for other students) will improve the performance.
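The nested-query pattern can be tried on the question's sample rows without SQL Server. The sketch below uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the COALESCE wrapper is the zero-instead-of-NULL variant mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Lessons (LessonID INTEGER PRIMARY KEY, StudentID INTEGER, "
    "StudentName TEXT, LengthInMinutes INTEGER)"
)
conn.executemany(
    "INSERT INTO Lessons VALUES (?, ?, ?, ?)",
    [(1, 1, "Chuck", 120), (2, 2, "George", 60), (3, 2, "George", 30),
     (4, 1, "Chuck", 60), (5, 1, "Chuck", 10)],
)

tally_rows = conn.execute(
    """
    SELECT LessonID, StudentName, LengthInMinutes, PriorLessonMinutes
    FROM (
        SELECT *,
               -- Sum every earlier lesson of the same student; the frame
               -- stops one row before the current one, so it is excluded.
               COALESCE(SUM(LengthInMinutes) OVER (
                   PARTITION BY StudentID
                   ORDER BY LessonID
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ), 0) AS PriorLessonMinutes
        FROM Lessons
    ) AS NestedLessons
    WHERE LessonID > 3   -- filter applied after the running sum is computed
    ORDER BY LessonID
    """
).fetchall()

for row in tally_rows:
    print(row)
```

Only lessons 4 and 5 are returned, yet their PriorLessonMinutes still count lesson 1, which is the behaviour the question asks for when retrieving a subset of rows.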
I feel the following code will serve your purpose. Check it:
select Students.StudentID ,Students.First, Students.Last,sum(Lessons.LengthInMinutes)
as TotalPriorMinutes from lessons,students
where Lessons.StartDateTime < getdate()
and Lessons.StudentID = Students.StudentID
and StartDateTime >= '20090130 00:00:00' and StartDateTime < '20790101 00:00:00'
group by Students.StudentID ,Students.First, Students.Last

SQL Server 2008: Pivot column with no aggregate function workaround

Yes, I know, this question has been asked MANY times, but after reading all the posts I found that there wasn't an answer that fits my need. So, here's my question. I would like to take a column of values and pivot them into rows of 6 columns.
I want to take this:
G
081278
12
00123535
John Doe
123456
And turn it into this:
Letter  Date    Code  Ammount   Name      Account
G       081278  12    00123535  John Doe  123456
I have 110000 values in this one column in one table called TempTable. I need all the values displayed because each row is an entity to itself. For instance, there is one unique entry for all of the Letter, Date, Code, Ammount, Name, and Account columns. I understand that the aggregate function is required, but is there a workaround that will allow me to get this desired result?
Just use a MAX aggregate
If one row = one column (per group of 6 rows) then MAX of a single value = that row value.
However, the data you've posted is insufficient. I don't see anything to:
associate the 6 rows per group
distinguish whether a row is "Letter" or "Name"
There is no implicit row order or number to rely upon to generate the groups
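If the table did carry something to order by, the MAX approach could look like the sketch below. Everything about the ordering is an assumption: the column rn (a gap-free row number starting at 1) and the column name val do not exist in the question, and the second group of six values is invented so that the grouping is visible. The query is run through Python's sqlite3 module purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# rn is a HYPOTHETICAL gap-free row number; the question's table has none.
conn.execute("CREATE TABLE TempTable (rn INTEGER PRIMARY KEY, val TEXT)")
values = [
    "G", "081278", "12", "00123535", "John Doe", "123456",   # from the question
    "H", "091279", "34", "00456789", "Jane Roe", "654321",   # invented sample
]
conn.executemany("INSERT INTO TempTable VALUES (?, ?)", list(enumerate(values, start=1)))

pivoted = conn.execute(
    """
    SELECT MAX(CASE WHEN (rn - 1) % 6 = 0 THEN val END) AS Letter,
           MAX(CASE WHEN (rn - 1) % 6 = 1 THEN val END) AS Date,
           MAX(CASE WHEN (rn - 1) % 6 = 2 THEN val END) AS Code,
           MAX(CASE WHEN (rn - 1) % 6 = 3 THEN val END) AS Ammount,
           MAX(CASE WHEN (rn - 1) % 6 = 4 THEN val END) AS Name,
           MAX(CASE WHEN (rn - 1) % 6 = 5 THEN val END) AS Account
    FROM TempTable
    GROUP BY (rn - 1) / 6      -- integer division: six consecutive rows per group
    ORDER BY (rn - 1) / 6
    """
).fetchall()

for row in pivoted:
    print(row)
```

Each CASE expression is non-NULL for exactly one row per group, so MAX simply returns that single value, which is why the aggregate costs nothing here.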
Unfortunately, the max columns in a SQL 2008 select statement is 4,096 as per MSDN Max Capacity.
Instead of using a pivot, you might consider dynamic SQL to get what you want to do.
Declare @SQLColumns nvarchar(max), @SQL nvarchar(max)
-- Build a comma-separated list of quoted values, one per row of TableName
select @SQLColumns = (select '''' + ColName + ''',' from TableName for XML Path(''))
-- Drop the trailing comma
set @SQLColumns = left(@SQLColumns, len(@SQLColumns) - 1)
set @SQL = 'Select ' + @SQLColumns
exec sp_executesql @SQL

Incredibly slow Materialized View creation when using string aggregation, any performance suggestions?

I've got a load of materialized views, some of them take just a few seconds to create and refresh, whereas others can take me up to 40 minutes to compile, if SQLDeveloper doesn't crash before that.
I need to aggregate some strings in my query, and I have the following function
create or replace
function stragg
( input varchar2 )
return varchar2
deterministic
parallel_enable
aggregate using stragg_type
;
Then, in my MV I use a select statement such as
SELECT
hse.refno,
STRAGG (DISTINCT per.person_name) as PERSONS
FROM
HOUSES hse,
PERSONS per
This is great, because it gives me the following :
refno persons
1 Dave, John, Mary
2 Jack, Jill
Instead of :
refno persons
1 Dave
1 John
1 Mary
2 Jack
2 Jill
It seems that when I use this STRAGG function, the time it takes to create/refresh an MV increases dramatically. Is there an alternative method to achieve a comma-separated list of values? I use this throughout my MVs, so it is quite a required feature for me.
Thanks
There are a number of techniques for string aggregation at the link below. They might provide better performance for you.
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
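One built-in alternative on Oracle 11g Release 2 and later is LISTAGG. As a rough illustration of the query shape only, the sketch below uses SQLite's group_concat as a stand-in, run from Python; it says nothing about Oracle performance. The join column house_refno is an assumption, since the question's query does not show how HOUSES and PERSONS are related.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (refno INTEGER PRIMARY KEY)")
# house_refno is a HYPOTHETICAL foreign key; the question omits the join.
conn.execute("CREATE TABLE persons (person_name TEXT, house_refno INTEGER)")
conn.executemany("INSERT INTO houses VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("Dave", 1), ("John", 1), ("Mary", 1), ("Dave", 1), ("Jack", 2), ("Jill", 2)],
)

rows = conn.execute(
    """
    SELECT hse.refno,
           group_concat(DISTINCT per.person_name) AS persons  -- built-in aggregate
    FROM houses hse
    JOIN persons per ON per.house_refno = hse.refno
    GROUP BY hse.refno
    ORDER BY hse.refno
    """
).fetchall()

# group_concat does not promise an order, so sort the names for display.
persons_by_refno = {refno: sorted(names.split(",")) for refno, names in rows}
print(persons_by_refno)
```

The duplicate "Dave" row is collapsed by DISTINCT, mirroring what STRAGG (DISTINCT ...) does in the question.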