Postgresql how to? - postgresql

Given the following tables:
\d users
Table "public.users"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------------------
id | integer | not null
email | character varying | not null default ''::character varying
encrypted_password | character varying | not null default ''::character varying
created_at | timestamp without time zone |
address_id | integer |
\d address
Table "public.address"
Column | Type | Modifiers
----------+-------------------+-----------
id | integer | not null
street | character varying |
city | character varying |
province | character varying |
zip | character varying |
country | character varying |
Write some SQL to select all users who live in XXXX?
Now scope that query to users created older than 3 months ordered by newest first?
What index would improve the performance of this query?

First query:
select u.id, u.email
from users u
inner join address a
on u.address_id = a.id
where a.city = 'XXXX'
Second query:
select u.id, u.email
from users u
inner join address a
on u.address_id = a.id
where a.city = 'XXXX' AND
u.created_at < NOW() - interval '3 months'
order by u.created_at desc

assuming you want to select users who live in a particular city, lets say "Addis Ababa" in a particular country "Ethiopia"
SELECT * FROM "users" AS u JOIN "address" AS a ON a.id = u.address_id WHERE a.city = 'Addis Ababa' AND a.country = 'Ethiopia' AND u.created_at < CURRENT_DATE - INTERVAL '3 months' ORDER BY u.created_at DESC;

Related

PostgreSQL How to merge two tables row to row without condition

I have two tables
The first table contains three text fields(username, email, num) the second have only one column with random birth_date DATE.
I need to merge tables without condition
For example
first table:
+----------+--------------+-----------+
| username | email | num |
+----------+--------------+-----------+
| 'user1' | 'user1#mail' | '+794949' |
| 'user2' | 'user2#mail' | '+799999' |
+----------+--------------+-----------+
second table:
+--------------+
| birth_date |
+--------------+
| '2001-01-01' |
| '2002-02-02' |
+--------------+
And I need result like
+----------+------------+-------------+--------------+
| username | email | num | birth_date |
+----------+------------+-------------+--------------+
| 'user1' | 'us1#mail' | '+7979797' | '2001-01-01' |
| 'user2' | 'us2#mail' | '+79898998' | '2002-02-02' |
+----------+------------+-------------+--------------+
I need to get in result table with 100 rows too
Tried different JOIN but there is no condition here
Sure there is a join condition, about the simplest there is: Join on true or cross join. Either is the basic merge tables without condition. However this does not result in what you want as it generates a result set of 10k rows. But you an then use limit:
select *
from table1
join table2 on true
order by random()
limit 100;
select *
from table1
cross join table2
order by random()
limit 100;
There is other option, witch I think may be closer to what you want. Assign a value to each row of each table. Then join on this assigned value:
select <column list>
from (select *, row_number() over() rn from table1) t1
join (select *, row_number() over() rn from table2) t2
on (t1.rn = t2.rn);
To eliminate the assigned value you must specifically list each column desired in the result. But that is the way it should be done anyway.
See demo here. (demo user just 3 rows instead of 100)

Fetch records with distinct value of one column while replacing another col's value when multiple records

I have 2 tables that I need to join based on distinct rid while replacing the column value with having different values in multiple rows. Better explained with an example set below.
CREATE TABLE usr (rid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(12) NOT NULL,
email VARCHAR(20) NOT NULL);
CREATE TABLE usr_loc
(rid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
code CHAR NOT NULL PRIMARY KEY,
loc_id INT NOT NULL PRIMARY KEY);
INSERT INTO usr VALUES
(1,'John','john#product'),
(2,'Linda','linda#product'),
(3,'Greg','greg#product'),
(4,'Kate','kate#product'),
(5,'Johny','johny#product'),
(6,'Mary','mary#test');
INSERT INTO usr_loc VALUES
(1,'A',4532),
(1,'I',4538),
(1,'I',4545),
(2,'I',3123),
(3,'A',4512),
(3,'A',4527),
(4,'I',4567),
(4,'A',4565),
(5,'I',4512),
(6,'I',4567);
(6,'I',4569);
Required Result Set
+-----+-------+------+-----------------+
| rid | name | Code | email |
+-----+-------+------+-----------------+
| 1 | John | B | 'john#product' |
| 2 | Linda | I | 'linda#product' |
| 3 | Greg | A | 'greg#product' |
| 4 | Kate | B | 'kate#product' |
| 5 | Johny | I | 'johny#product' |
| 6 | Mary | I | 'mary#test' |
+-----+-------+------+-----------------+
I have tried some queries to join and some to count but lost with the one which exactly satisfies the whole scenario.
The query I came up with is
SELECT distinct(a.rid)as rid, a.name, a.email, 'B' as code
FROM usr
JOIN usr_loc b ON a.rid=b.rid
WHERE a.rid IN (SELECT rid FROM usr_loc GROUP BY rid HAVING COUNT(*) > 1);`
You need to group by the users and count how many occurrences you have in usr_loc. If more than a single one, then replace the code by B. See below:
select
rid,
name,
case when cnt > 1 then 'B' else min_code end as code,
email
from (
select u.rid, u.name, u.email, min(l.code) as min_code, count(*) as cnt
from usr u
join usr_loc l on l.rid = u.rid
group by u.rid, u.name, u.email
) x;
Seems to me that you are using MySQL, rather than IBM DB2. Is that so?

Postgres - updates with join gives wrong results

I'm having some hard time understanding what I'm doing wrong.
The result of this query shows the same results for each row instead of being updated by the right result.
My DATA
I'm trying to update a table of stats over a set of business
business_stats ( id SERIAL,
pk integer not null,
b_total integer,
PRIMARY KEY(pk)
);
the details of each business are stored here
business_details (id SERIAL,
category CHARACTER VARYING,
feature_a CHARACTER VARYING,
feature_b CHARACTER VARYING,
feature_c CHARACTER VARYING
);
and here a table that associate the pk with the category
datasets (id SERIAL,
pk integer not null,
category CHARACTER VARYING;
PRIMARY KEY(pk)
);
WHAT I DID (wrong)
UPDATE business_stats
SET b_total = agg.total
FROM business_stats b,
( SELECT d.pk, count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
GROUP BY d.pk
) agg
WHERE b.pk = agg.pk;
The result of this query is
| id | pk | b_total |
+----+----+-----------+
| 1 | 14 | 273611 |
| 2 | 15 | 273611 |
| 3 | 16 | 273611 |
| 4 | 17 | 273611 |
but if I run just the SELECT the results of each pk are completely different
| pk | agg.total |
+----+-------------+
| 14 | 273611 |
| 15 | 407802 |
| 16 | 179996 |
| 17 | 815580 |
THE QUESTION
why is this happening?
why is the WHERE clause not working?
Before writing this question I've used as reference these posts: a, b, c
Do the following (I always recommend against joins in Updates)
UPDATE business_stats bs
SET b_total =
( SELECT count(c.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk
)
/*optional*/
where exists (SELECT *
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk)
The issue is your FROM clause. The repeated reference to business_stats means you aren't restricting the join like you expect to. You're joining agg against the second unrelated mention of business_stats rather than the row you want to update.
Something like this is what you are after (warning not tested):
UPDATE business_stats AS b
SET b_total = agg.total
FROM
(...) agg
WHERE b.pk = agg.pk;

Postgres: How to JOIN

I am working on Postgres 9.3. I have two tables, the first for payment items:
Table "public.prescription"
Column | Type | Modifiers
-------------------+-------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('frontend_prescription_id_seq'::regclass)
presentation_code | character varying(15) | not null
presentation_name | character varying(1000) | not null
actual_cost | double precision | not null
pct_id | character varying(3) | not null
And the second for organisations:
Table "public.pct"
Column | Type | Modifiers
-------------------+-------------------------+-----------
code | character varying(3) | not null
name | character varying(200) |
I have a query to get all the payments for a particular code:
SELECT sum(actual_cost) as total_cost, pct_id as row_id
FROM prescription
WHERE presentation_code='1234' GROUP BY pct_id
Here is the query plan for that query.
Now, I'd like to annotate each row with the name property of the associated organisation. This is what I'm trying:
SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription, pct
WHERE prescription.presentation_code='0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;
Here's the ANALYSE for that query. It's incredibly slow: what am I doing wrong?
I think there must be a way to annotate each row with the pct.name AFTER the first query has run, which would be faster.
With JOIN (LEFT JOIN in this case, because we want the line even if there is no pct):
SELECT
sum(prescription.actual_cost) as total_cost,
prescription.pct_id,
pct.name as row_id
FROM prescription
LEFT JOIN pct ON pct.code = prescription.pct_id
WHERE
prescription.presentation_code='0212000AAAAAAAA'
GROUP BY
prescription.pct_id,
pct.name;
I don't know if it's work well, I didn't try this query.
You are taking data from 2 tables, but you do not join the tables in any way. Effectively, you make a full join, resulting in the Cartesian product of both tables. If you look at your ANALYZE statistics, you see that your nested loop processes 62 million rows. that takes time.
Add in a join condition to make this all fast:
SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription
JOIN pct On pct.code = prescription.pct_id
WHERE prescription.presentation_code = '0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;

How to find duplicates by name filtered by geo coordinate

I need to find duplicate entries (accommodations) by name which will be done like this:
CREATE TABLE tbl_folded AS
SELECT name
, array_agg(id) AS ids
FROM accommodations
GROUP BY 1;
Which is fine to get all the ids of accommodations with same name, unfortunately they need further filtering. I just need to get accommodation of same name within a location.
Every accommodation has an address (addresses table has foreign key, accommodation_id and lonlat column for the geo coordinate).
In order to find the closest locations I would go for s.th. like this
ORDER BY ST_Distance(addresses.lonlat, addresses.lonlat)
So how can I extend the query above to apply this location filtering?
Help is very much appreciated.
Column | Type | Modifiers
-------------+------------------------+-------------------------------------------------------------
id | integer | not null default nextval('accommodations_id_seq'::regclass)
name | character varying(255) |
category | character varying(255) |
Table "public.addresses"
Column | Type | Modifiers
------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('addresses_id_seq'::regclass)
formatted | character varying(255) |
city | character varying(255) |
state | character varying(255) |
country_code | character varying(255) |
postal | character varying(255) |
lonlat | geography(Point,4326) |
accommodation_id | integer |
You can first get the duplicate accommodation_id from addresses table grouping by lonlat column like
select accommodation_id
from addresses
group by lonlat
having count(*) > 1
Then join this result with accommodation table to get the names column like below
CREATE TABLE tbl_folded AS
select ac.id,
ac.names
from accommodation ac
join (
select accommodation_id
from addresses
group by lonlat
having count(*) > 1
) tab on ac.id = tab.accommodation_id
So this is how I solved it. I just filter coordinates within a radius
SELECT
lower(name) AS base_name,
array_agg(accommodations.id) AS ids
FROM accommodations
INNER JOIN addresses ON accommodations.id = addresses.accommodation_id
GROUP BY 1, round(ST_X(lonlat::geometry)::numeric,2), round(ST_Y(lonlat::geometry)::numeric,2)