Do two array_aggs in a query share the same window? - postgresql

Consider this example:
SELECT comment_date
, array_agg(user_id) users
, array_agg(comment) comments
FROM user_comments
GROUP BY comment_date
Is it safe to assume that the indexes of users and comments refer to the same record (e.g., users[3] created comments[3])?
Is it possible that the order of the two arrays may refer to different records, possibly due to performance enhancements?
I don't know enough about the internals of Postgres to trust array_agg ordering.

Unless you specify it explicitly, you cannot assume anything about the order in which an aggregate function processes its input rows. To guarantee that the two array_agg calls produce corresponding values, add the same ORDER BY clause to both of them, e.g.:
SELECT comment_date
, array_agg(user_id ORDER BY user_id) users
, array_agg(comment ORDER BY user_id) comments
FROM user_comments
GROUP BY comment_date
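If one user can post several comments on the same date, user_id alone does not fully order the rows within a group, and the relative order of those tied rows is not guaranteed to match between the two aggregates. A minimal sketch that adds comment as a tie-breaker, using only the columns from the question:

```sql
-- Same sort keys in both aggregates, with comment as a tie-breaker,
-- so users[i] and comments[i] come from the same row.
SELECT comment_date
     , array_agg(user_id ORDER BY user_id, comment) AS users
     , array_agg(comment ORDER BY user_id, comment) AS comments
FROM user_comments
GROUP BY comment_date;
```

If the table has a unique key, sorting both aggregates by that key would serve the same purpose.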

Related

PostgreSQL how to GROUP BY single field from returned table

I have a complicated query; simplified, it looks like this:
SELECT
t.*,
SUM(a.hours) AS spent_hours
FROM (
SELECT
person.id,
person.name,
person.age,
SUM(contacts.id) AS contact_count
FROM
person
JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id
This query works fine in MySQL, but Postgres needs to know that the GROUP BY field is unique. Even though it actually is, in this case I have to GROUP BY every field returned from t.
I can do that, but I don't believe it will perform well on big data.
I can't JOIN with activities directly in the first query, because a person can have several contacts, which would make the query count the activity hours once for every joined contact.
Is there a Postgres way to make this query work? Maybe forcing Postgres to treat t.id as unique, or some other solution that achieves the same thing the Postgres way?
This query will not work as written on either database system: the inner query uses an aggregate function without a GROUP BY (and it is not a window function). MySQL is a special case: it accepts such a query when the ONLY_FULL_GROUP_BY SQL mode is disabled. So MySQL allows this usage because of a server setting, but you cannot do that in PostgreSQL.
I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it... it always seemed imprecise to me, conceptually.
So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.
If your intent is to see all rows (perform the aggregate function but not consolidate/group rows), then you want a window function, invoked with PARTITION BY. Here is a really dumbed-down version of your query:
SELECT
t.*,
SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id
This means you want all records in table t, not one record per t.id. But each row will also contain the sum of the hours over all rows with that value of id.
For example, the sum column would look like this:
Name   Hours  Sum Hours
-----  -----  ---------
Smith     20        120
Jones     30         30
Smith    100        120
Whereas a GROUP BY would have shown Smith once and could not have displayed the individual hours values.
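To try the window function on the Smith/Jones figures, here is a small self-contained sketch for PostgreSQL. The inline VALUES lists are stand-ins for the real t and activities tables, and the ids 1 and 2 are made up for illustration:

```sql
-- Inline sample data; ids and rows are illustrative only.
WITH t (id, name) AS (
    VALUES (1, 'Smith'), (2, 'Jones')
),
activities (person_id, hours) AS (
    VALUES (1, 20), (2, 30), (1, 100)
)
SELECT t.name
     , a.hours
     , SUM(a.hours) OVER (PARTITION BY t.id) AS sum_hours
FROM t
JOIN activities AS a ON a.person_id = t.id;
```

Every activity row is kept, and each one carries the total for its person (120 for both Smith rows, 30 for Jones). The row order is unspecified without an ORDER BY.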
If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.
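If one row per person is the goal, one way to avoid both the double counting and the wide GROUP BY is to aggregate contacts and activities separately and only then join them to person. This is a sketch under assumptions, not necessarily what the original query intended: it assumes contact_count is meant to be a row count (the question uses SUM(contacts.id)), and it uses LEFT JOINs so that people without contacts or activities still appear.

```sql
-- Each subquery returns at most one row per person_id,
-- so the joins cannot multiply rows or inflate the sums.
SELECT p.id
     , p.name
     , p.age
     , COALESCE(c.contact_count, 0) AS contact_count
     , COALESCE(a.spent_hours, 0)   AS spent_hours
FROM person AS p
LEFT JOIN (
    SELECT person_id, COUNT(*) AS contact_count
    FROM contacts
    GROUP BY person_id
) AS c ON c.person_id = p.id
LEFT JOIN (
    SELECT person_id, SUM(hours) AS spent_hours
    FROM activities
    GROUP BY person_id
) AS a ON a.person_id = p.id;
```

Switching both LEFT JOINs to plain JOINs would restrict the result to people who have at least one contact and one activity, which is closer to the inner joins in the question.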

Redshift: Max items within "IN clause"?

I have a query like:
SELECT count(id), pro.country_code
FROM profiles AS pro
WHERE id IN (SELECT profile_id FROM reports)
GROUP BY pro.country_code;
My questions:
How many items can you use in a Redshift IN CLAUSE? Storing the actual ids instead of the sub-sql statement has got to be faster for performing that outer query each time, right?
As far as I know there is no limit, but if the subquery returns a lot of data you can use EXISTS instead:
SELECT count(id),
pro.country_code
FROM profiles AS pro
WHERE exists (SELECT profile_id FROM reports where pro.id=reports.profile_id)
GROUP BY pro.country_code;
It should be considerably faster.
You can also use INTERSECT instead of IN.
As "user" already stated, your best performance will likely be with a WHERE EXISTS clause and subquery. Since you mentioned performance as an important consideration, I should also point out that the more important performance factor would likely be your table distribution. For this to perform well, you'll want to double-check that both tables use the join column as the distribution key (profiles.id and reports.profile_id) and that both tables declare that column with the same data type.
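As an illustration of the distribution advice, this is roughly what matching distribution keys look like in Redshift DDL. The column types are assumptions (the question does not show the table definitions), and any other columns are omitted:

```sql
-- Both tables are distributed on the join column, declared with
-- the same data type. BIGINT and CHAR(2) are assumed here.
CREATE TABLE profiles (
    id           BIGINT NOT NULL,
    country_code CHAR(2)
)
DISTSTYLE KEY
DISTKEY (id);

CREATE TABLE reports (
    profile_id BIGINT NOT NULL
)
DISTSTYLE KEY
DISTKEY (profile_id);
```

With this layout, rows that join on profiles.id = reports.profile_id are stored on the same slice, so the EXISTS check does not need to move data between nodes.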

How to combine two SQL queries where queries are joined by union

Can anyone please help me in writing a single query joining these two queries.
I am using IBM DB2.
(SELECT
TABLE1.COLS,TABLE2.COLS,TABLE3.COLS
FROM
TABLE1,TABLE2,TABLE3,TABLE_PROB
WHERE
TABLE_PROB.COL=TABLE1.COL AND OTHER_CLAUSE )
UNION
(SELECT
TABLE1.COLS,TABLE2.COLS,TABLE3.COLS
FROM
TABLE1,TABLE2,TABLE3,TABLE_PROB1
WHERE TABLE_PROB1.COL=TABLE1.COL AND OTHER_CLAUSE )
The two queries before and after the UNION are the same, except that "TABLE_PROB" is changed to "TABLE_PROB1". No columns are selected from either of these tables; they are only used for filtering in the WHERE clause.
Can anyone tell me how to combine both into a single query?
The query is meant for the following scenario.
There are a few employee details tables, which contain details of all employees.
"TABLE_PROB" contains the list of contract employees and "TABLE_PROB1" contains the list of permanent employees. I need to get the details of both the contract and the permanent employees based on a few criteria.
Since the query has a big WHERE clause and SELECT clause, running it twice with a UNION increases the cost of the query. So I need to merge it into a single query.
Thanks in advance for the help.
You cannot avoid the UNION because you still have to access both TABLE_PROB and TABLE_PROB1. Depending on your DB2 version, platform, and the system configuration this might perform a bit better:
SELECT
TABLE1.COLS,TABLE2.COLS,TABLE3.COLS
FROM
TABLE1,TABLE2,TABLE3
WHERE
OTHER_CLAUSE
AND
EXISTS (
SELECT 1
FROM TABLE_PROB
WHERE COL=TABLE1.COL
UNION
SELECT 1
FROM TABLE_PROB1
WHERE COL=TABLE1.COL
)
Because EXISTS only checks whether the subquery returns any row at all, UNION ALL instead of UNION gives the same result here and might also prove beneficial, since it skips the duplicate elimination.
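Another way to write the same filter is two EXISTS predicates combined with OR, which removes the UNION keyword altogether while still reading both lookup tables. The sketch below uses hypothetical names: EMPLOYEE stands in for TABLE1, CONTRACT_EMP for TABLE_PROB, PERMANENT_EMP for TABLE_PROB1, and the DEPT filter for OTHER_CLAUSE. Whether it performs better than the UNION form depends on the optimizer, so it is worth comparing the access plans.

```sql
-- All table and column names here are hypothetical stand-ins.
SELECT e.EMP_ID, e.NAME
FROM EMPLOYEE e
WHERE e.DEPT = 'SALES'
  AND (   EXISTS (SELECT 1 FROM CONTRACT_EMP c
                  WHERE c.EMP_ID = e.EMP_ID)
       OR EXISTS (SELECT 1 FROM PERMANENT_EMP p
                  WHERE p.EMP_ID = e.EMP_ID));
```

Unlike the original top-level UNION, neither EXISTS form removes duplicate rows from the outer result, so add DISTINCT if the outer joins can produce duplicates that should be collapsed.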

TSQL Keyword Previous or Last or something similar

This question is geared for those who have more SQL experience than me.
I am writing a query (it will eventually be a stored procedure, but that should be irrelevant) where I want to count the rows whose value is the same as the most recent entry's, going back entry by entry until I hit one that has a different value. (Poorly explained, so I will show an example.)
In my table I have a column 'Product_Id'. When this query is run, I want it to take the most recent product_id and compare it to the previously entered product_id; if it's the same, I want to add one, and I want it to keep checking the previously entered product_id until it runs into a different product_id.
I'm hoping it sounds more complicated than it is, and the query would look something like:
Select count(Product_ID)
FROM dbo.myTable
Where Product_Id = previous(Product_Id)
Now, I know that previous isn't a keyword in TSQL, and neither is last, but I'm hoping someone knows a keyword that does what I am asking.
Edit for Sam
USE DbName;
GO
WITH OrderedCount as
(
select ROW_NUMBER() OVER (Order by dbo.Line_Production.Run_Date DESC) as RowNumber,
Line_Production.Product_ID
From dbo.Line_Production
)
Select RowNumber, COUNT(OrderedCount.Product_ID) as PalletCount
From OrderedCount
WHERE OrderedCount.RowNumber + 1 = RowNumber
and Product_ID = Product_ID
Group by RowNumber
The OrderedCount portion works and returns the data how I want it. I'm now having trouble comparing the Product_IDs for different RowNumbers; my WHERE clause is wrong.
There's no keyword. That would be a nice magic solution, but it doesn't exist, at least in part because there is no guaranteed ordering (okay, you could have the keyword only if there is an ORDER BY...). I can write you a query, but that'll take time, so for now I'll give you a few steps and I'll come back and see if you still need help in a bit.
1. Figure out an ORDER BY, otherwise no order is guaranteed. If there is a time entered field, that's a good choice, or an index, that works too.
2. Learn to use Row_Number.
3. Compare the table (with Row_Number) to itself where instance1.row - 1 = instance2.row.
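The steps above can be sketched as follows, reusing the Line_Production table and the OrderedCount CTE from the question. It assumes Run_Date values are unique (otherwise ROW_NUMBER may break ties differently each time the CTE is read) and that Product_ID is never NULL. On SQL Server 2012 and later, the LAG window function is another way to look at the previous row.

```sql
WITH OrderedCount AS (
    SELECT ROW_NUMBER() OVER (ORDER BY Run_Date DESC) AS RowNumber,
           Product_ID
    FROM dbo.Line_Production
)
-- Row 1 is the most recent entry. The first row whose next-older
-- neighbour has a different Product_ID marks the end of the run,
-- and its row number equals the length of the run.
SELECT COALESCE(MIN(cur.RowNumber),
                (SELECT COUNT(*) FROM OrderedCount)) AS PalletCount
FROM OrderedCount AS cur
JOIN OrderedCount AS older
  ON older.RowNumber = cur.RowNumber + 1
WHERE cur.Product_ID <> older.Product_ID;
```

When every row has the same Product_ID, no pair differs, MIN returns NULL, and the COALESCE falls back to the total row count.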
If product_id is an identity column, couldn't you just do product_id - 1? In other words, if it's sequential, it's the same as using ROW_NUMBER mentioned in the previous comment.

Create a query to select two columns; (Company, No. of Films) from the database

I have created a database as part of university assignment and I have hit a snag with the question in the title.
More likely I am being asked to find out how many films each company has made. Which suggests to me a group by query. But I have no idea where to begin. It is only a two mark question but the syntax is not clicking in my head.
My schema is:
CREATE TABLE Movie
(movieID CHAR(3) ,
title CHAR(36),
year NUMBER,
company CHAR(50),
totalNoms NUMBER,
awardsWon NUMBER,
DVDPrice NUMBER(5,2),
discountPrice NUMBER(5,2))
There are other tables but at first glance I don't think they are relevant to this question.
I am using sqlplus10
The answer you need comes from three basic SQL concepts, I'll step through them with you. If you need more assistance to create an answer from these hints, let me know and I can try to keep guiding you.
Group By
As you mentioned, SQL offers a GROUP BY clause that can help you.
A SQL query using GROUP BY would look like the following.
SELECT list, fields, aggregate(value)
FROM tablename
--WHERE goes here, if you need to restrict your result set
GROUP BY list, fields
A GROUP BY query can only return fields listed in the GROUP BY clause, or aggregate functions acting on each group.
Aggregate Functions
Your homework question also needs an aggregate function called COUNT, which counts the rows returned. A simple query like the following returns the count of all records in the table.
SELECT Count(*)
FROM tablename
The two can be combined, allowing you to get the Count of each group in the following way.
SELECT list, fields, count(*)
FROM tablename
GROUP BY list, fields
Column Aliases
Another answer also tried to introduce you to SQL column aliases, but they did not use SQLPLUS syntax.
SELECT Count(*) as count
...
SQLPLUS column alias syntax is shown below.
SELECT Count(*) "count"
...
I'm not going to provide you the SQL, but instead a way to think about it.
What you want to do is select where the company matches and count the total rows returned. That count is the number of films made by the specified company.
Hope that points you in the right direction.
Select company, count(*) AS count
from Movie
group by company
select * group by company won't work in Oracle.
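For the two-column output named in the title, a variation on the query above can use quoted column aliases so the headings read as requested. The heading text is taken from the question title, and the ORDER BY is an optional addition:

```sql
-- Quoted aliases keep the mixed case, the space and the period.
SELECT company  AS "Company",
       COUNT(*) AS "No. of Films"
FROM Movie
GROUP BY company
ORDER BY company;
```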