Sort SELECT result by pairs of columns - postgresql

In the following PostgreSQL 8.4.13 table
(where author users give grades to id users):
# \d pref_rep
Table "public.pref_rep"
 Column    | Type                        | Modifiers
-----------+-----------------------------+------------------------------------------------------------
 id        | character varying(32)       | not null
 author    | character varying(32)       | not null
 good      | boolean                     |
 fair      | boolean                     |
 nice      | boolean                     |
 about     | character varying(256)      |
 stamp     | timestamp without time zone | default now()
 author_ip | inet                        |
 rep_id    | integer                     | not null default nextval('pref_rep_rep_id_seq'::regclass)
Indexes:
    "pref_rep_pkey" PRIMARY KEY, btree (id, author)
Check constraints:
    "pref_rep_check" CHECK (id::text <> author::text)
Foreign-key constraints:
    "pref_rep_author_fkey" FOREIGN KEY (author) REFERENCES pref_users(id) ON DELETE CASCADE
    "pref_rep_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
how can I find the faked entries, which have the same id and the same author_ip?
That is, some users register several accounts and then submit bad notes (the good, fair, nice columns above) for other users, but I can still identify them by their author_ip addresses.
I'm trying to find them by fetching:
# select id, author_ip from pref_rep group by id, author_ip;
 id                      | author_ip
-------------------------+-----------------
 OK490496816466          | 94.230.231.106
 OK360565502458          | 78.106.102.16
 DE25213                 | 178.216.72.185
 OK331482634936          | 95.158.209.5
 VK25785834              | 77.109.20.182
 OK206383671767          | 80.179.90.103
 OK505822972559          | 46.158.46.126
 OK237791033602          | 178.76.216.77
 VK90402803              | 109.68.173.37
 MR16281819401420759860  | 109.252.139.198
 MR5586967138985630915   | 2.93.14.248
 OK341086615664          | 93.77.75.142
 OK446200841566          | 95.59.127.194
But I need to sort the above result. How can I sort it by the number of (id, author_ip) pairs, descending, please?

select id, pr.author_ip
from pref_rep pr
inner join (
    select author_ip
    from pref_rep
    group by author_ip
    having count(*) > 1
) s using (author_ip)
order by 2, 1;
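If the goal is literally to sort the grouped pairs by how often each one occurs, the count can be computed and ordered in one step. A minimal sketch (pair_count is just an illustrative alias):

-- count how many rows share each (id, author_ip) pair; given the
-- (id, author) primary key, this is the number of distinct authors
-- grading the same id from the same IP
select id, author_ip, count(*) as pair_count
from pref_rep
group by id, author_ip
order by pair_count desc, id;

The most suspicious pairs then come first.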

Related

How can I ensure that a join table references two tables with composite FKs, one of the two columns being common to both tables?

I have 3 tables: employee, event, and, since they are N-N, a third table employee_event.
The trick is that they can only be related N-N within the same group:
employee
+----+-------+
| id | group |
+----+-------+
| 1  | A     |
| 2  | B     |
+----+-------+
event
+----+-------+
| id | group |
+----+-------+
| 43 | A     |
| 44 | B     |
+----+-------+
employee_event
+-------------+----------+
| employee_id | event_id |
+-------------+----------+
| 1           | 43       |
| 2           | 44       |
+-------------+----------+
So the combination employee_id=1, event_id=44 should not be possible, because an employee from group A cannot attend an event from group B. How can I secure my DB against this?
My first idea is to add a column employee_event.group so that I can make my two composite FKs, employee_id + group and event_id + group, referencing the tables employee and event respectively. But is there a way to avoid adding a column in the join table for the only purpose of the FKs?
Thx!
You may create a function and use it as a check constraint on table employee_event.
create or replace function groups_match (employee_id integer, event_id integer)
returns boolean language sql as
$$
select
    -- "group" is a reserved word, so it has to be quoted
    (select "group" from employee where id = employee_id) =
    (select "group" from event where id = event_id);
$$;
and then add a check constraint on table employee_event.
ALTER TABLE employee_event
    ADD CONSTRAINT groups_match_check
    CHECK (groups_match(employee_id, event_id));
Still bear in mind that rows in employee_event that used to be valid may become invalid, yet remain in place, if the group values in employee or event later change: the check is only evaluated when rows of employee_event are inserted or updated.
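For completeness, the composite-FK idea from the question avoids that staleness problem, at the price of the extra column. A sketch, assuming the extra "group" column on employee_event is acceptable and that (id, "group") can be declared UNIQUE in both parent tables:

-- the composite FKs need UNIQUE targets in the parent tables
ALTER TABLE employee ADD CONSTRAINT employee_id_group_key UNIQUE (id, "group");
ALTER TABLE event    ADD CONSTRAINT event_id_group_key    UNIQUE (id, "group");

ALTER TABLE employee_event ADD COLUMN "group" varchar NOT NULL;

-- both FKs share the single "group" column,
-- so a mismatched employee/event pair cannot be stored
ALTER TABLE employee_event
    ADD CONSTRAINT employee_event_employee_fkey
        FOREIGN KEY (employee_id, "group") REFERENCES employee (id, "group"),
    ADD CONSTRAINT employee_event_event_fkey
        FOREIGN KEY (event_id, "group") REFERENCES event (id, "group");

Unlike the CHECK function, these FKs stay enforced when the parent rows change.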

Fetch records with a distinct value of one column, replacing another column's value when there are multiple records

I have 2 tables that I need to join on rid; where a user has multiple usr_loc rows with different code values, the code in the result should be replaced. It is better explained with the example set below.
CREATE TABLE usr (rid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(12) NOT NULL,
    email VARCHAR(20) NOT NULL);
CREATE TABLE usr_loc
    (rid INT NOT NULL,
    code CHAR NOT NULL,
    loc_id INT NOT NULL,
    -- a table allows only one PRIMARY KEY, so declare it once, as a composite
    PRIMARY KEY (rid, code, loc_id));
INSERT INTO usr VALUES
(1,'John','john#product'),
(2,'Linda','linda#product'),
(3,'Greg','greg#product'),
(4,'Kate','kate#product'),
(5,'Johny','johny#product'),
(6,'Mary','mary#test');
INSERT INTO usr_loc VALUES
(1,'A',4532),
(1,'I',4538),
(1,'I',4545),
(2,'I',3123),
(3,'A',4512),
(3,'A',4527),
(4,'I',4567),
(4,'A',4565),
(5,'I',4512),
(6,'I',4567),
(6,'I',4569);
Required Result Set
+-----+-------+------+-----------------+
| rid | name  | Code | email           |
+-----+-------+------+-----------------+
| 1   | John  | B    | 'john#product'  |
| 2   | Linda | I    | 'linda#product' |
| 3   | Greg  | A    | 'greg#product'  |
| 4   | Kate  | B    | 'kate#product'  |
| 5   | Johny | I    | 'johny#product' |
| 6   | Mary  | I    | 'mary#test'     |
+-----+-------+------+-----------------+
I have tried some queries to join and some to count, but I am lost on one that exactly satisfies the whole scenario.
The query I came up with is
SELECT DISTINCT a.rid, a.name, a.email, 'B' AS code
FROM usr a
JOIN usr_loc b ON a.rid = b.rid
WHERE a.rid IN (SELECT rid FROM usr_loc GROUP BY rid HAVING COUNT(*) > 1);
You need to group by the users and count how many distinct codes each one has in usr_loc. If more than a single one, replace the code by B. (Counting rows instead of distinct codes would wrongly turn rid 3 and rid 6, which have duplicate rows with the same code, into B.) See below:
select
    rid,
    name,
    case when cnt > 1 then 'B' else min_code end as code,
    email
from (
    select u.rid, u.name, u.email, min(l.code) as min_code,
           count(distinct l.code) as cnt
    from usr u
    join usr_loc l on l.rid = u.rid
    group by u.rid, u.name, u.email
) x;
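The derived table is not strictly required; an equivalent sketch wraps the aggregates in the CASE directly (this works the same way in MySQL):

select u.rid,
       u.name,
       -- more than one distinct code for the user means 'B',
       -- otherwise keep the user's single code
       case when count(distinct l.code) > 1 then 'B' else min(l.code) end as code,
       u.email
from usr u
join usr_loc l on l.rid = u.rid
group by u.rid, u.name, u.email;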
Seems to me that you are using MySQL, rather than IBM DB2. Is that so?

Query planner using a primary key index instead of a more targeted column index when adding order by primary key

Please excuse the simplification of the actual query; it is just to make it readable. We are currently seeing slowdowns in our queries when adding an order by on the primary key.
select id, field1, field2
from table1
where field1 = 'value'
limit 1000;
Since there is an index on field1, this query uses that index, which makes the query fast; I can verify via the explain command that the planner uses it.
Adding an order by, however, suddenly switches the plan to the primary key index, which makes the query a lot slower.
select id, field1, field2
from table1 where field1 = 'value'
order by id asc
limit 1000;
Is there a way to force the query planner to use the field1 index?
EDIT:
Actual table detail:
\d fax_message
Table "public.fax_message"
Column | Type | Modifiers
--------------------------+-----------------------------+-----------
id | bigint | not null
broadcast_ref | character varying(255) |
busy_retries | integer |
cli | character varying(255) |
dncr | boolean | not null
document_set_id | bigint | not null
fax_broadcast_attempt_id | bigint |
fps | boolean | not null
header_format | character varying(255) |
last_updated | timestamp without time zone | not null
max_fax_pages | integer |
message_ref | character varying(255) |
must_be_sent_before_date | timestamp without time zone |
request_id | bigint |
resolution | character varying(255) |
retries | integer |
send_from | character varying(255) |
send_ref | character varying(255) |
send_to | character varying(255) | not null
smartblock | boolean | not null
status | character varying(255) | not null
time_zone | character varying(255) |
total_pages | integer |
user_id | uuid | not null
delay_status_check_until | timestamp without time zone |
version | bigint | default 0
cost | numeric(40,10) | default 0
Indexes:
"fax_message_pkey" PRIMARY KEY, btree (id)
"fax_message_broadcast_ref_idx" btree (broadcast_ref)
"fax_message_delay_status_check_until" btree (delay_status_check_until)
"fax_message_document_set_idx" btree (document_set_id)
"fax_message_fax_broadcast_attempt_idx" btree (fax_broadcast_attempt_id)
"fax_message_fax_document_set_idx" btree (document_set_id)
"fax_message_message_ref_idx" btree (message_ref)
"fax_message_request_idx" btree (request_id)
"fax_message_send_ref_idx" btree (send_ref)
"fax_message_status_fax_broadcast_attempt_idx" btree (status, fax_broadcast_attempt_id)
"fax_message_user" btree (user_id)
Foreign-key constraints:
"fk2881c4e5106ed2de" FOREIGN KEY (request_id) REFERENCES core_api_send_fax_request(id)
"fk2881c4e5246f3088" FOREIGN KEY (document_set_id) REFERENCES fax_document_set(id)
"fk2881c4e555aad98b" FOREIGN KEY (user_id) REFERENCES users(id)
"fk2881c4e59920b254" FOREIGN KEY (fax_broadcast_attempt_id) REFERENCES fax_broadcast_attempt(id)
Referenced by:
TABLE "fax_message_status_modifier" CONSTRAINT "fk2dfbe52acb955ec1" FOREIGN KEY (fax_message_id) REFERENCES fax_message(id)
TABLE "fax_message_attempt" CONSTRAINT "fk82058973cb955ec1" FOREIGN KEY (fax_message_id) REFERENCES fax_message(id)
Actual index used:
\d fax_message_status_fax_broadcast_attempt_idx
Index "public.fax_message_status_fax_broadcast_attempt_idx"
Column | Type | Definition
--------------------------+------------------------+--------------------------
status | character varying(255) | status
fax_broadcast_attempt_id | bigint | fax_broadcast_attempt_id
btree, for table "public.fax_message"
Real queries:
With order by:
explain select this_.id as id65_0_, this_.version as version65_0_, this_.broadcast_ref as broadcast3_65_0_, this_.busy_retries as busy4_65_0_, this_.cli as cli65_0_, this_.cost as cost65_0_, this_.delay_status_check_until as delay7_5_0_, this_.dncr as dncr65_0_, this_.document_set_id as document9_65_0_, this_.fax_broadcast_attempt_id as fax10_65_0_, this_.fps as fps65_0_, this_.header_format as header12_65_0_, this_.last_updated as last13_65_0_, this_.max_fax_pages as max14_65_0_, this_.message_ref as message15_65_0_, this_.must_be_sent_before_date as must16_65_0_, this_.request_id as request17_65_0_, this_.resolution as resolution65_0_, this_.retries as retries65_0_, this_.send_from as send20_65_0_, this_.send_ref as send21_65_0_, this_.send_to as send22_65_0_, this_.smartblock as smartblock65_0_, this_.status as status65_0_, this_.time_zone as time25_65_0_, this_.total_pages as total26_65_0_, this_.user_id as user27_65_0_ from fax_message this_ where this_.status='TO_CHARGE_GROUP' order by id asc limit 1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..53956.06 rows=1000 width=2234)
-> Index Scan using fax_message_pkey on fax_message this_ (cost=0.43..2601902.61 rows=48223 width=2234)
Filter: ((status)::text = 'TO_CHARGE_GROUP'::text)
(3 rows)
This one without the order by:
explain select this_.id as id65_0_, this_.version as version65_0_, this_.broadcast_ref as broadcast3_65_0_, this_.busy_retries as busy4_65_0_, this_.cli as cli65_0_, this_.cost as cost65_0_, this_.delay_status_check_until as delay7_5_0_, this_.dncr as dncr65_0_, this_.document_set_id as document9_65_0_, this_.fax_broadcast_attempt_id as fax10_65_0_, this_.fps as fps65_0_, this_.header_format as header12_65_0_, this_.last_updated as last13_65_0_, this_.max_fax_pages as max14_65_0_, this_.message_ref as message15_65_0_, this_.must_be_sent_before_date as must16_65_0_, this_.request_id as request17_65_0_, this_.resolution as resolution65_0_, this_.retries as retries65_0_, this_.send_from as send20_65_0_, this_.send_ref as send21_65_0_, this_.send_to as send22_65_0_, this_.smartblock as smartblock65_0_, this_.status as status65_0_, this_.time_zone as time25_65_0_, this_.total_pages as total26_65_0_, this_.user_id as user27_65_0_ from fax_message this_ where this_.status='TO_CHARGE_GROUP' limit 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..1744.13 rows=1000 width=2234)
-> Index Scan using fax_message_status_fax_broadcast_attempt_idx on fax_message this_ (cost=0.56..84080.59 rows=48223 width=2234)
Index Cond: ((status)::text = 'TO_CHARGE_GROUP'::text)
(3 rows)
The estimated cost of the plan that uses fax_message_pkey (0.43..53956.06 at the Limit node) is far greater than that of the plan that uses fax_message_status_fax_broadcast_attempt_idx (0.56..1744.13).
I was hoping that the query would still use the fax_message_status_fax_broadcast_attempt_idx index even with the order by there.
According to How do I force Postgres to use a particular index? (and the links from the answers there), there does not seem to be a way to force the use of a particular index.
CTEs are an optimization fence (before PostgreSQL 12, a CTE is always materialized and planned on its own). You're not giving us enough information to tell you why your query is getting planned wrong, but this should work if you don't care to actually fix the problem.
WITH t AS (
    select id, field1, field2
    from table1
    where field1 = 'value'
    limit 1000
)
SELECT *
FROM t
order by id asc;
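Applied to the actual fax_message query, the same fence would look like the sketch below. Note that here the filter goes inside the CTE but the LIMIT and ORDER BY stay outside, so this still returns the first 1000 matches by id; a LIMIT inside the CTE, as in the simplified version above, would sort an arbitrary 1000 matching rows instead.

WITH t AS (
    -- materialized separately, so it is planned without the ORDER BY
    -- and can use fax_message_status_fax_broadcast_attempt_idx
    SELECT *
    FROM fax_message
    WHERE status = 'TO_CHARGE_GROUP'
)
SELECT *
FROM t
ORDER BY id ASC
LIMIT 1000;

Another commonly cited workaround is ordering by an expression, e.g. ORDER BY id + 0, so that the primary-key index no longer provides the sort order.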

Postgres - updates with join gives wrong results

I'm having a hard time understanding what I'm doing wrong.
The result of this query shows the same value for every row instead of each row being updated with its own result.
My DATA
I'm trying to update a table of stats over a set of businesses
business_stats ( id SERIAL,
pk integer not null,
b_total integer,
PRIMARY KEY(pk)
);
the details of each business are stored here
business_details (id SERIAL,
category CHARACTER VARYING,
feature_a CHARACTER VARYING,
feature_b CHARACTER VARYING,
feature_c CHARACTER VARYING
);
and here is a table that associates the pk with the category
datasets (id SERIAL,
pk integer not null,
category CHARACTER VARYING,
PRIMARY KEY(pk)
);
WHAT I DID (wrong)
UPDATE business_stats
SET b_total = agg.total
FROM business_stats b,
( SELECT d.pk, count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
GROUP BY d.pk
) agg
WHERE b.pk = agg.pk;
The result of this query is
| id | pk | b_total |
+----+----+---------+
|  1 | 14 |  273611 |
|  2 | 15 |  273611 |
|  3 | 16 |  273611 |
|  4 | 17 |  273611 |
but if I run just the SELECT, the results for each pk are completely different
| pk | agg.total |
+----+-----------+
| 14 |    273611 |
| 15 |    407802 |
| 16 |    179996 |
| 17 |    815580 |
THE QUESTION
why is this happening?
why is the WHERE clause not working?
Before writing this question I used these posts as references: a, b, c
Do the following (I always recommend against joins in UPDATEs):
UPDATE business_stats bs
SET b_total =
    ( SELECT count(bd.id)
      FROM business_details AS bd
      INNER JOIN datasets AS d
          ON bd.category = d.category
      WHERE d.pk = bs.pk
    )
/* optional: without this, stats rows with no matching details are set to 0 */
WHERE EXISTS (SELECT *
              FROM business_details AS bd
              INNER JOIN datasets AS d
                  ON bd.category = d.category
              WHERE d.pk = bs.pk);
The issue is your FROM clause. The repeated reference to business_stats means you aren't restricting the join like you expect to. You're joining agg against the second unrelated mention of business_stats rather than the row you want to update.
Something like this is what you are after (warning not tested):
UPDATE business_stats AS b
SET b_total = agg.total
FROM
(...) agg
WHERE b.pk = agg.pk;
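Spelled out with the aggregate subquery from the question, that becomes:

UPDATE business_stats AS b
SET b_total = agg.total
FROM (
    SELECT d.pk, count(bd.id) AS total
    FROM business_details AS bd
    INNER JOIN datasets AS d
        ON bd.category = d.category
    GROUP BY d.pk
) agg
WHERE b.pk = agg.pk;

The only change from the original statement is dropping the second business_stats reference from the FROM list, so that b now names the row being updated.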

How to find duplicates by name filtered by geo coordinate

I need to find duplicate entries (accommodations) by name which will be done like this:
CREATE TABLE tbl_folded AS
SELECT name
, array_agg(id) AS ids
FROM accommodations
GROUP BY 1;
Which is fine to get all the ids of accommodations with the same name; unfortunately they need further filtering, as I just need to get accommodations of the same name within a location.
Every accommodation has an address (addresses table has foreign key, accommodation_id and lonlat column for the geo coordinate).
In order to find the closest locations I would go for something like this
ORDER BY ST_Distance(addresses.lonlat, addresses.lonlat)
So how can I extend the query above to apply this location filtering?
Help is very much appreciated.
Table "public.accommodations"
 Column   | Type                   | Modifiers
----------+------------------------+--------------------------------------------------------------
 id       | integer                | not null default nextval('accommodations_id_seq'::regclass)
 name     | character varying(255) |
 category | character varying(255) |
Table "public.addresses"
Column | Type | Modifiers
------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('addresses_id_seq'::regclass)
formatted | character varying(255) |
city | character varying(255) |
state | character varying(255) |
country_code | character varying(255) |
postal | character varying(255) |
lonlat | geography(Point,4326) |
accommodation_id | integer |
You can first get the duplicate accommodation_ids from the addresses table by grouping on the lonlat column. Note that accommodation_id cannot be selected directly while grouping by lonlat, so fetch the duplicated coordinates and filter on them, like
select accommodation_id
from addresses
where lonlat in (
    select lonlat
    from addresses
    group by lonlat
    having count(*) > 1
)
Then join this result with the accommodations table to get the name column, like below
CREATE TABLE tbl_folded AS
select ac.id,
       ac.name
from accommodations ac
join (
    select accommodation_id
    from addresses
    where lonlat in (
        select lonlat
        from addresses
        group by lonlat
        having count(*) > 1
    )
) tab on ac.id = tab.accommodation_id
So this is how I solved it. I just group coordinates that round to the same value (two decimal places, i.e. roughly a one-kilometre grid):
SELECT
lower(name) AS base_name,
array_agg(accommodations.id) AS ids
FROM accommodations
INNER JOIN addresses ON accommodations.id = addresses.accommodation_id
GROUP BY 1, round(ST_X(lonlat::geometry)::numeric,2), round(ST_Y(lonlat::geometry)::numeric,2)
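If a true radius is wanted rather than a rounding grid, PostGIS can compare the addresses of same-named accommodations pairwise. A sketch, assuming a 1000 m radius (ST_DWithin on geography takes metres):

SELECT lower(c1.name) AS base_name,
       c1.id AS id_a,
       c2.id AS id_b
FROM accommodations c1
JOIN addresses a1 ON a1.accommodation_id = c1.id
JOIN accommodations c2 ON lower(c2.name) = lower(c1.name)
                      AND c2.id > c1.id   -- report each pair only once
JOIN addresses a2 ON a2.accommodation_id = c2.id
WHERE ST_DWithin(a1.lonlat, a2.lonlat, 1000);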