Subquery count() for postgres query - postgresql

I have two tables with web traffic that I am joining and visualizing on a map.
I am trying to write a counter that adds, to each result row, the number of times that row's IP address shows up in the logs. I think this will take the form of a subquery that returns the count of log rows matching the row the main query is selecting.
When I run the following query, I get the error: more than one row returned by a subquery used as an expression.
select
squarespace_ip_addresses.ip,
squarespace_ip_addresses.latitude,
squarespace_ip_addresses.longitude,
st_SetSrid(ST_MAKEPOINT(squarespace_ip_addresses.longitude, squarespace_ip_addresses.latitude), 4326) as geom,
(select count(hostname) from squarespace_logs group by hostname) as counter,
squarespace_logs.referrer
from
squarespace_ip_addresses
left outer join
squarespace_logs
on
squarespace_ip_addresses.ip = squarespace_logs.hostname
It has been suggested to me that a subquery in the select list that filters on the main query's output would be run once for every row.
Does anyone have any insights?

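Your subquery returns one row per hostname, which is why it cannot be used as a single-value expression. Aggregate the data in a derived table (a subquery in the FROM clause) instead: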
select
a.ip,
a.latitude,
a.longitude,
st_SetSrid(ST_MAKEPOINT(a.longitude, a.latitude), 4326) as geom,
c.count,
l.referrer
from squarespace_ip_addresses a
left join squarespace_logs l on a.ip = l.hostname
left join (
select hostname, count(*)
from squarespace_logs
group by hostname
) c on a.ip = c.hostname
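As a minimal sketch of the two ways to get one count per IP, the following runs both a correlated scalar subquery and the derived-table join against tiny made-up data. It uses Python's sqlite3 module as a stand-in for PostgreSQL, the rows are hypothetical, the alias counter is an assumption, and the referrer join is left out so that each IP yields exactly one row.

```python
import sqlite3

# Hypothetical miniature versions of the two tables; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE squarespace_ip_addresses (ip TEXT, latitude REAL, longitude REAL);
    CREATE TABLE squarespace_logs (hostname TEXT, referrer TEXT);
    INSERT INTO squarespace_ip_addresses VALUES
        ('10.0.0.1', 40.7, -74.0),
        ('10.0.0.2', 51.5, -0.1),
        ('10.0.0.3', 35.7, 139.7);
    INSERT INTO squarespace_logs VALUES
        ('10.0.0.1', 'a.example'),
        ('10.0.0.1', 'b.example'),
        ('10.0.0.2', 'c.example');
""")

# Correlated scalar subquery: the WHERE clause ties the count to the outer row,
# so the subquery yields exactly one value per IP address.
correlated = conn.execute("""
    SELECT a.ip,
           (SELECT count(*) FROM squarespace_logs l WHERE l.hostname = a.ip) AS counter
    FROM squarespace_ip_addresses a
    ORDER BY a.ip
""").fetchall()

# Derived table: aggregate once, then join. coalesce() turns "no log rows" into 0.
derived = conn.execute("""
    SELECT a.ip, coalesce(c.counter, 0) AS counter
    FROM squarespace_ip_addresses a
    LEFT JOIN (
        SELECT hostname, count(*) AS counter
        FROM squarespace_logs
        GROUP BY hostname
    ) c ON a.ip = c.hostname
    ORDER BY a.ip
""").fetchall()

print(correlated)  # [('10.0.0.1', 2), ('10.0.0.2', 1), ('10.0.0.3', 0)]
print(derived)     # same rows as above
```

Both forms give the same counts here. The derived table aggregates the logs once rather than once per outer row, which is usually the safer default on large tables.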

Related

selecting a distinct column with alias table not working in postgres

SELECT pl_id,
distinct ON (store.store_ID),
in_user_id
FROM plan1.plan_copy_levl copy1
INNER JOIN plan1._PLAN_STORE store
ON copy1.PLAN_ID = store .PLAN_ID;
While running this query on a Postgres server I get the error below. How do I use the DISTINCT clause? In the code above, plan1 is the schema name.
ERROR: syntax error at or near "distinct"
LINE 2: distinct ON (store.store_ID),
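The DISTINCT ON clause must come at the very start of the select list, directly after SELECT, and the parenthesized expression is not followed by a comma. You should also add an ORDER BY whose leading expressions match the ones in the DISTINCT ON clause; without it, which row is kept for each store_ID is unpredictable.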
Try this:
SELECT DISTINCT ON (store.store_ID) store.store_ID, pl_id,
in_user_id
FROM plan1.plan_copy_levl copy1
INNER JOIN plan1._PLAN_STORE store
ON copy1.PLAN_ID = store.PLAN_ID
ORDER BY store.store_ID, pl_id;

Getting group by attribute in nested query

I am trying to find the most frequent value in a postgresql table. The problem is that I also want to "group by" in that table and only get the most frequent from the values that have the same name.
So I have the following query:
select name,
(SELECT value FROM table where name=name GROUP BY value ORDER BY COUNT(*) DESC limit 1)
as mfq from table group by name;
So, I am using where name=name, trying to get the outside group by attribute "name", but it doesn't seem to work. Any ideas on how to do it?
Edit: for example in the following table:
name value
a 3
a 3
a 3
b 2
b 2
I want to get:
name value
a 3
b 2
but the above statement gives:
name value
a 3
b 3
instead, since where doesn't work correctly.
There is a dedicated function in PostgreSQL for this case: the mode() ordered-set aggregate:
select name, mode() within group (order by value) mode_value
from table
group by name;
which returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results) -- which is the same behavior as with your order by count(*) desc limit 1.
It is available from PostgreSQL 9.4+.
http://rextester.com/GHGJH15037
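If you want your query to work, you need table aliases. In your version, both sides of name=name resolve to the column of the table inside the subquery, so the condition is true for every row with a non-null name and the outer row is never consulted. Table aliases and qualified column names are always a good idea: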
select t.name,
(select t2.value
from table t2
where t2.name = t.name
group by t2.value
order by COUNT(*) desc
limit 1
) as mfq
from table t
group by t.name;
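To see the difference the aliases make, here is a small sketch that runs both versions on the sample data from the question. It uses Python's sqlite3 module rather than PostgreSQL, and samples is a hypothetical table name standing in for the placeholder table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "samples" is a hypothetical name; the question uses the placeholder "table".
conn.executescript("""
    CREATE TABLE samples (name TEXT, value INTEGER);
    INSERT INTO samples VALUES ('a', 3), ('a', 3), ('a', 3), ('b', 2), ('b', 2);
""")

# Unqualified: both "name" references bind to the inner table, so the subquery
# ignores the outer row and returns the overall most frequent value every time.
unqualified = conn.execute("""
    SELECT name,
           (SELECT value FROM samples WHERE name = name
            GROUP BY value ORDER BY count(*) DESC LIMIT 1) AS mfq
    FROM samples
    GROUP BY name
    ORDER BY name
""").fetchall()

# Qualified: t2 is the inner table, t is the outer one, so the subquery is
# correlated and evaluated per outer name.
qualified = conn.execute("""
    SELECT t.name,
           (SELECT t2.value FROM samples t2 WHERE t2.name = t.name
            GROUP BY t2.value ORDER BY count(*) DESC LIMIT 1) AS mfq
    FROM samples t
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()

print(unqualified)  # [('a', 3), ('b', 3)]
print(qualified)    # [('a', 3), ('b', 2)]
```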

How does COUNT(*) behave in an inner join

Take this query:
SELECT c.CustomerID, c.AccountNumber, COUNT(*) AS CountOfOrders,
SUM(s.TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderheader AS s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;
I expected COUNT(*) to count the rows in Sales.Customer but to my surprise it counts the number of rows in the joined table.
Any idea why this is? Also, is there a way to be explicit in specifying which table COUNT() should operate on?
It comes down to query processing order.
The FROM clause is processed before the SELECT clause. That is, by the time SELECT comes into play, there is only one (virtual) table it is selecting from, namely the individual tables after they have been joined (JOIN), filtered (WHERE), etc.
If you just want to count the rows of one table, you can try a couple of things:
COUNT(DISTINCT table1.id)
Or turn the table you want to count into a sub-query with count() inside of it
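A rough illustration of both suggestions, using Python's sqlite3 module and made-up data. The table and column names (customer, sales_order, order_count and so on) are simplified, hypothetical stand-ins for the Sales.Customer and Sales.SalesOrderheader tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-ins for the tables in the question.
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, account_number TEXT);
    CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY,
                              customer_id INTEGER, total_due REAL);
    INSERT INTO customer VALUES (1, 'AW1'), (2, 'AW2'), (3, 'AW3');
    INSERT INTO sales_order VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 25.0);
""")

# count(*) counts rows of the joined result; count(DISTINCT ...) counts the
# distinct customers that survive the join.
joined_rows, distinct_customers = conn.execute("""
    SELECT count(*), count(DISTINCT c.customer_id)
    FROM customer c
    INNER JOIN sales_order s ON c.customer_id = s.customer_id
""").fetchone()

# Alternative: aggregate the orders in a subquery first, then join one row
# per customer.
per_customer = conn.execute("""
    SELECT c.customer_id, o.order_count, o.sum_total_due
    FROM customer c
    INNER JOIN (
        SELECT customer_id, count(*) AS order_count, sum(total_due) AS sum_total_due
        FROM sales_order
        GROUP BY customer_id
    ) o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(joined_rows, distinct_customers)  # 3 2
print(per_customer)                     # [(1, 2, 150.0), (2, 1, 25.0)]
```

Customer 3 has no orders, so the inner join drops it in both queries.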

Does SQL Server optimize repeated aggregate calculation in this example query?

If I execute the following query in SQL Server 2008 R2, will the count(*) aggregate be computed only once for the outer SELECT, or will it be repeated for every record the outer SELECT returns?
I was guessing that SQL Server would be intelligent to see that the same calculation is being repeated and so it would do this calculation only once for optimization purpose. The value of TotalCount in query below is going to be the same for all rows in outer query.
SELECT
p.ProductId, p.ProductName,
(select count(*) from Products p1) as TotalCount
FROM Products p
No, you're expecting too much from SQL Server. Also: the query processor really cannot be sure that this value won't be changing over time - so it cannot really "optimize" this for you.
For every single row, this subquery will be executed once.
So if your SELECT statement will return 10 million rows, this count will be determined 10 million times.
If you don't want that, you can always run the select count(*).. once before the query and store the value into a SQL variable, and select that variable in your query:
DECLARE @TableCount INT
SELECT @TableCount = COUNT(*) FROM Products
SELECT
p.ProductId, p.ProductName, @TableCount AS TotalCount
FROM
Products p

Postgres LEFT JOIN is creating more rows than in left table

I am running Postgres 9.1.3 32-bit on Windows 7 x64. (Have to use 32 bit because there is no Windows PostGIS release compatible with 64 bit Postgres.) (EDIT: As of PostGIS 2.0, it is compatible with Postgres 64 bit on windows.)
I have a query that left joins a table (consistent.master) with a temporary table, then inserts the resulting data into a third table (consistent.masternew).
Since this is a left join, the resulting table should have the same number of rows as the left table in the query. However, if I run this:
SELECT count(*)
FROM consistent.master
I get 2085343. But if I run this:
SELECT count(*)
FROM consistent.masternew
I get 2085703.
How can masternew have more rows than master? Shouldn't masternew have the same number of rows as master, the left table in the query?
Below is the query. The master and masternew tables should be identically-structured.
--temporary table created here
--I am trying to locate where multiple tickets were written on
--a single traffic stop
WITH stops AS (
SELECT citation_id,
rank() OVER (ORDER BY offense_timestamp,
defendant_dl,
offense_street_number,
offense_street_name) AS stop
FROM consistent.master
WHERE citing_jurisdiction=1
)
--Here's the insert statement. Below you'll see it's
--pulling data from a select query
INSERT INTO consistent.masternew (arrest_id,
citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom)
--Here's the select query that the insert statement is using.
SELECT stops.stop,
master.citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id
In case it matters, I have run a VACUUM FULL ANALYZE and reindexed both tables. (Not sure of exact commands; did it through pgAdmin III.)
A left join does not necessarily have the same number of rows as the number of rows in the left table. Basically, it is like a normal join, except rows of the left table that would not appear in the normal join are also added. So, if you have more than one row in the right table that matches one row in the left table, you can have more rows in your results than the number of rows of the left table.
To do what you want, use a GROUP BY with a count to detect the multiples. The column has to be qualified, since both tables have a citation_id:
select master.citation_id
from stops join consistent.master on stops.citation_id = master.citation_id
group by master.citation_id
having count(*) > 1
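The row multiplication and the duplicate check can be reproduced on a toy example. This is only a sketch using Python's sqlite3 module with hypothetical data. Here master and stops are small made-up tables, not the real consistent.master table or the stops CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: citation 2 matches two rows in stops, citation 3 matches none.
conn.executescript("""
    CREATE TABLE master (citation_id INTEGER, officer_id INTEGER);
    CREATE TABLE stops (citation_id INTEGER, stop INTEGER);
    INSERT INTO master VALUES (1, 7), (2, 7), (3, 8);
    INSERT INTO stops VALUES (1, 100), (2, 101), (2, 102);
""")

left_rows = conn.execute("SELECT count(*) FROM master").fetchone()[0]

# Every matching right-hand row produces its own output row, and unmatched
# left rows are kept once with NULLs, so the result can exceed the left table.
joined_rows = conn.execute("""
    SELECT count(*)
    FROM master m
    LEFT JOIN stops s ON s.citation_id = m.citation_id
""").fetchone()[0]

# GROUP BY / HAVING pinpoints which keys are responsible for the extra rows.
duplicates = conn.execute("""
    SELECT m.citation_id, count(*) AS matches
    FROM stops s
    JOIN master m ON s.citation_id = m.citation_id
    GROUP BY m.citation_id
    HAVING count(*) > 1
""").fetchall()

print(left_rows, joined_rows)  # 3 4
print(duplicates)              # [(2, 2)]
```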
Sometimes you know there are multiples, but don't care. You just want to take the first or top entry.
If so, you can use SELECT DISTINCT ON:
FROM consistent.master LEFT JOIN (SELECT DISTINCT ON (citation_id) * FROM stops) s
ON s.citation_id = master.citation_id
Here citation_id is the column for which you want to keep one (arbitrary) row per distinct value.
To make the choice deterministic, add an ORDER BY that starts with citation_id and then sorts on some other orderable column (created_at below is a placeholder):
SELECT DISTINCT ON (citation_id) * FROM stops ORDER BY citation_id, created_at