Merge two columns based on a table value - postgresql

I'm trying to merge french strings (language ID 1) into one column. So far, I'm able to get french strings in table1.title and table2.translated_topic, but am not sure how to concatenate them.
Ver: Postgres 9.6.0
Source table schemas:
Table 1: knowledgebase_topics
id | title | language_id |
------------------------------------
64 | The Topic | 91 |
65 | The Topic 2 | 91 |
62 | Le fav sujet | 1 |
63 | Le fav sujet 2 | 1 |
61 | le bonjour | 1 |
Table 2: knowledgebase_topics_translations
id | translated_topic| knowledgebase_topic_id | language_id |
-------------------------------------------------------------
| Le sujet | 64 | 1 |
| Le sujet 2 | 65 | 1 |
| Fav The Topic | 62 | 91 |
| Fav The Topic 2 | 63 | 91 |
Given the following Query:
SELECT title, translated_topic, "kbt".language_id, "kbtt".language_id
FROM knowledgebase_topics as "kbt"
LEFT JOIN knowledgebase_topics_translations as "kbtt" on ("kbtt".knowledgebase_topic_id = "kbt".id)
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbt".title) ## to_tsquery('le')
OR to_tsvector("kbtt".translated_topic) ## to_tsquery('le')
AND "kbt".language_id = 1
OR "kbtt".language_id = 1;
I get the following results:
title | translated_topic | language_id | language_id
----------------+------------------+-------------+-------------
The Topic | Le sujet | 91 | 1
The Topic 2 | Le sujet 2 | 91 | 1
Le fav sujet | Fav The Topic | 1 | 91
Le fav sujet 2 | Fav The Topic 2 | 1 | 91
le bonjour | | 1 |
Desired results: table1.title and table2.translated_topics have been merged based on language_id == 1. Both tables have a language ID column.
title | language_id
----------------+--------------
Le sujet | 1
Le sujet 2 | 1
Le fav sujet | 1
Le fav sujet 2 | 1
le bonjour | 1
How can I do this?
Note: I do not simply want to check lang IDs = 1, such as
and "kbt".language_id = 1 AND (instead of OR) "kbtt".language_id = 1;
Because this results in 2 missing records from table 2 of language ID 1:
title | translated_topic | language_id | language_id
----------------+------------------+-------------+-------------
Le fav sujet | Fav The Topic | 1 | 91
Le fav sujet 2 | Fav The Topic 2 | 1 | 91
le bonjour | | 1 |
So, I've got it working... but is this performant?
SELECT title, "kbt".language_id
FROM knowledgebase_topics as "kbt"
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbt".title) ## to_tsquery('le')
AND "kbt".language_id = 1
UNION ALL
SELECT translated_topic, "kbtt".language_id
FROM knowledgebase_topics_translations as "kbtt"
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbtt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbtt".translated_topic) ## to_tsquery('le')
AND "kbtt".language_id = 1;
Gives output:
title | language_id
----------------+-------------
le bonjour | 1
Le fav sujet | 1
Le fav sujet 2 | 1
Le sujet | 1
Le sujet 2 | 1
(5 rows)

Setting up an environment to answer the question
First, observe how we best describe the problem with concise DDL. Preferably in the future, you'll learn how to write questions like this..
CREATE TEMPORARY TABLE knowledgebase_topics AS
SELECT * FROM ( VALUES
(64,'The Topic',91),
(65,'The Topic 2',91),
(62,'Le fav sujet',1),
(63,'Le fav sujet 2',1),
(61,'le bonjour',1)
) AS t(knowledgebase_topic_id, title, language_id);
CREATE TEMPORARY TABLE knowledgebase_topics_translations AS
SELECT * FROM ( VALUES
('Le sujet' ,64,1 ),
('Le sujet 2' ,65,1 ),
('Fav The Topic' ,62,91 ),
('Fav The Topic 2',63,91 )
) AS t(translated_topic, knowledgebase_topic_id, language_id);
Then you need only tell us what you want and we can get a working environment up easily and answer your question. No English required! Easier on both of us.
The solution
Here we use a UNION ALL we wrap that in a SELECT so we can sort by id, and easily change in one place the language that you're looking for.
SELECT title, language_id
FROM (
SELECT knowledgebase_topic_id, title, language_id
FROM knowledgebase_topics
UNION ALL
SELECT knowledgebase_topic_id, translated_topic, language_id
FROM knowledgebase_topics_translations
) AS t(id, title, language_id)
WHERE language_id = 1
ORDER BY id;
Output
title │ language_id
────────────────┼─────────────
le bonjour │ 1
Le fav sujet │ 1
Le fav sujet 2 │ 1
Le sujet │ 1
Le sujet 2 │ 1
(5 rows)

Related

Postgres join when only one row is equal

I have two tables and I am wanting to do an inner join between table_1 and table_2 but only when there is one row in table_2 that meets the join criteria.
For example:
table_1
id | name | age |
-----------------+------------------+--------------+
1 | john jones | 10 |
2 | pete smith | 15 |
3 | mary lewis | 12 |
4 | amy roberts | 13 |
table_2
id | name | age | hair | height |
-----------------+------------------+--------------+--------------+--------------+
1 | john jones | 10 | brown | 100 |
2 | john jones | 10 | blonde | 132 |
3 | mary lewis | 12 | brown | 146 |
4 | pete smith | 15 | black | 171 |
So I want to do a join when name is equal, but only when there is one corresponding matching name in table_2
So my results would look like this:
id | name | age | hair |
-----------------+------------------+--------------+--------------+
2 | pete smith | 15 | black |
3 | mary lewis | 12 | brown |
As you can see, John Jones isn't in the results as there are two corresponding rows in table_2.
My initial code looks like this:
select tb.id,tb.name,tb.age,sc.hair
from table_1 tb
inner join table_2 sc
on tb.name = sc.name and tb.age = sc.age
Can I apply a clause within the join so that it only joins on rows which are unique matches?
Group by all columns and apply having count(*) = 1
select tb.id,tb.name,tb.age,sc.hair
from table_1 tb
join table_2 sc
on tb.name = sc.name and tb.age = sc.age
group by tb.id,tb.name,tb.age,sc.hair
having count(*) = 1
The interesting thing to note is that you don’t need the aggregate expression (in the case count(*) )in the select clause.

How to keep related groups together?

DB fiddle DB fiddle (updated)
I pay attention to 1-2; 3-4 cases. Notice, that all rows from config_id group still together with all rows of xgroup
Detailed task description:
I have some order details for some resource. For each ordered resource there should be allocated resource. Resource may belongs to each other and belongs to some group. Consider virtual machine VM has: RAM, CPU, HDD.
Currently relation between orders and allocated resources is broken. I try to write query to analyze what is ordered without allocated resource and what is allocated without order.
CREATE TABLE order_detail (
id SERIAL,
order_id INTEGER,
resource_type_id INTEGER,
allocated_resource_id INTEGER,
amount INTEGER
)
CREATE TABLE allocated_resource (
id SERIAL,
group_id INTEGER,
resource_type_id INTEGER,
resource_uuid UUID,
)
This is easily done:
select * from order_detail od
where od.allocated_resource_id IS NULL
SELECT * FROM allocated_resource ar
WHERE NOT EXISTS ( SELECT 1 FROM order_detail od where od.allocated_resource_id = ar.id)
But I need to found to which Order assign that allocated resource. Or which resource allocate for this Order. Or which Order bind to allocated resource.
Data example:
id | order_id | allocated_resource_id | resource_type_id
--------------------------------------------------------
41 | 1 | 1 | 70
42 | 1 | | 71
43 | 1 | | 73
44 | 2 | | 70
45 | 2 | 5 | 71
id | group_id | resource_type_id
--------------------------------
1 | 1 | 70
2 | 1 | 71
3 | 1 | 72
4 | 2 | 70
5 | 2 | 71
6 | 2 | 73
Here I want to get:
id | order_id | allocated_resource_id | resource_type_id | ar.id | ar.group_id | ar.resource_type_id
----------------------------------------------------------------------------------------------------
41 | 1 | 1 | 70 | 1 | 1 | 70
42 | 1 | | 71 |
43 | 1 | | 73 |
| | | | 1 | 1 | 71
| | | | 1 | 1 | 72
44 | 2 | | 70 |
45 | 2 | 5 | 71 | 5 | 1 | 71
| | | | 4 | 1 | 70
| | | | 6 | 1 | 73
Unfortunately order by ar.id or order by od.id will move unbound order details/allocated resource to bottom. I want to keep order details/allocated resources together as in example above.
To resolve task I select all available related groups first:
od_ar_group AS (
SELECT
od.order_id, ar.group_id,
cast( od.order_id AS TEXT ) || '-' || cast( ar.group_id AS TEXT ) AS odar
FROM order_detail od
FULL JOIN allocated_resource ar ON ar.id = od.allocated_resource_id
WHERE od.order_id IS NOT NULL AND ar.group_id IS NOT NULL
GROUP BY od.order_id, ar.group_id
)
Then I attach group info to both tables:
SELECT od.*, odar.odar
FROM order_detail od
LEFT JOIN od_ar_group odar ON odar.order_id = od.order_id
SELECT ar.*, odar.odar
FROM allocated_resource ar
LEFT JOIN od_ar_group odar ON odar.parent_id = ar.parent_id
Finally I can join those groups and sort inside them. So unbound order detail/allocated resource is inside group and not at the bottom of table:
SELECT
CASE WHEN od.odar IS NOT NULL THEN od.odar ELSE ar.odar END AS odar,
od.*, ar.*
FROM od_grouped
FULL JOIN ar_grouped ar ON ar.odar = od.odar
AND ar.id = od.allocated_resource_id
ORDER BY odar, CASE WHEN od.id IS NULL THEN 1 WHEN ar.id IS NULL THEN 2 ELSE 0 END, od.id NULLS LAST
Is there a more easy way to keep related rows inside their group?

How to do sum of different values without duplicate

How to do a sum of different values but same ID without duplicate different values on a column?
My Input in SQL Command.
SELECT
students.id AS student_id,
students.name,
COUNT(*) AS enrolled,
c2.price AS course_price,
(COUNT(*) * price) AS paid
FROM students
LEFT JOIN enrolls e on students.id = e.student_id
LEFT JOIN courses c2 on e.course_id = c2.id
WHERE student_id NOTNULL
GROUP BY students.id, students.name, c2.price
ORDER BY student_id ASC;
My result.
student_id | name | enrolled | paid
------------+---------------------+----------+------
1001 | Gulbadan Bálint | 1 | 90
1002 | Hanna Adair | 5 | 450
1003 | Taddeo Bhattacharya | 1 | 90
1004 | Persis Havlíček | 1 | 75
1004 | Persis Havlíček | 5 | 450
1005 | Tory Bateson | 1 | 90
1007 | Dávid Fèvre | 1 | 90
1008 | Masuyo Stoddard | 1 | 90
1009 | Iiris Levitt | 1 | 75
1009 | Iiris Levitt | 2 | 180
1013 | Artair Kovač | 1 | 30
1013 | Artair Kovač | 1 | 90
1015 | Matilda Guinness | 2 | 180
1017 | Margarita Ek | 1 | 90
1018 | Misti Zima | 3 | 270
1019 | Conall Ventura | 1 | 90
1020 | Vivian Monday | 2 | 180
My expected result.
student_id | name | enrolled | paid
------------+---------------------+----------+------
1001 | Gulbadan Bálint | 1 | 90
1002 | Hanna Adair | 5 | 450
1003 | Taddeo Bhattacharya | 1 | 90
1004 | Persis Havlíček | 6 | 525
1005 | Tory Bateson | 1 | 90
1007 | Dávid Fèvre | 1 | 90
1008 | Masuyo Stoddard | 1 | 90
1009 | Iiris Levitt | 3 | 255
1013 | Artair Kovač | 2 | 120
1015 | Matilda Guinness | 2 | 180
1017 | Margarita Ek | 1 | 90
1018 | Misti Zima | 3 | 270
1019 | Conall Ventura | 1 | 90
1020 | Vivian Monday | 2 | 180
I think that the cause come from a GROUP BY command but it will throw an error if I do not write a GROUP BY price.
Perhaps you can use SUM() function.
Please see link below, maybe it's same case with you:
how to group by and return sum row in Postgres
You have excluded course_price column both in your current and expected result. It seems you had wrongly included that in group by.
SELECT
students.id AS student_id,
students.name,
COUNT(*) AS enrolled,
--c2.price AS course_price, --exclude this in o/p?
(COUNT(*) * price) AS paid
FROM students
LEFT JOIN enrolls e on students.id = e.student_id
LEFT JOIN courses c2 on e.course_id = c2.id
WHERE student_id NOTNULL
GROUP BY students.id, students.name --,c2.price --and remove it from here
ORDER BY student_id ASC;

Merge multiple tables with a common column name

I am trying to merge multiple tables that have a common column name which need not have the same values across the tables. For ex,
-tmp1-
id dat
1 234
2 432
3 412
-tmp2-
id nom
1 jim
2
3 ryan
4 jack
-tmp3-
id pin
1 gi23
2 x4ed
3 yit42
8 hiu11
If above are the input, the output needs to be,
id dat nom pin
1 234 jim gi23
2 432 x4ed
3 412 ryan yit42
4 jack
8 hiu11
Thanks in advance.
postgresql 8.2.15 on greenplum from R(pass-through queries)
use FULL JOIN ... USING (id) syntax.
please see example: http://sqlfiddle.com/#!12/3aff2/1
this is how diffrent join types work (provided that tab1.row3 meets joining condition with tab2.row1, and tab1.row3 meets tab2.row2):
| tab1 | | tab2 | | JOIN | | LEFT JOIN | | RIGHT JOIN | | FULL JOIN |
-------- -------- ------------------------- ------------------------- ------------------------- -------------------------
| row1 | | tab1.row1 | | tab1.row1 |
| row2 | | tab1.row2 | | tab1.row2 |
| row3 | | row1 | | tab1.row3 | tab2.row1 | | tab1.row3 | tab2.row1 | | tab1.row3 | tab2.row1 | | tab1.row3 | tab2.row1 |
| row4 | | row2 | | tab1.row4 | tab2.row2 | | tab1.row4 | tab2.row2 | | tab1.row4 | tab2.row2 | | tab1.row4 | tab2.row2 |
| row3 | | tab2.row3 | | tab2.row3 |
| row4 | | tab2.row4 | | tab2.row4 |

I'm a bit new to PostgreSQL and need how to construct complex query

I need to list all the cities you can get to after stopping off at exactly one other city, starting off from any city of my choice. And list with it the distance to the final city and the intermediate city.
The tables in the database consist of cities, with the attributes:
| city_id | name |
1 Edinburgh
2 Newcastle
3 Manchester
citypairs:
| citypair_id | city_id |
1 1
1 2
2 1
2 3
3 2
3 3
and distances:
| citypair_id | distance |
1 1234
2 1324
3 1324
and trains:
| train_id | departure_city_id | destination_city_id |
1 1 2
2 2 3
3 1 3
4 3 2
I haven't put any of the data in but basically if a city.name is chosen at random by me I need to find out which cities I can get to from this city if I go via another city (i.e. in two journeys) and then the distance to the final and intermediate city.
How would you, or how should I, go about forming a query to return the desired table?
Edited to include data and a missing table! As an example you can go from Edinburgh(1) to Manchester(3) via Newcastle(2) and you can go from Edinburgh to Newcastle via Manchester, however you can not go from Manchester to Edinburgh via Newcastle (since a train departs from 3, arrives at 2, but no train from 2 arrives in 1) and this route should not be returned from the query. Apologies for any confusion beforehand.
I've got a CTE that builds a tree of all the destinations.
WITH RECURSIVE trip AS (
SELECT c.city_id AS start_city,
ARRAY[c.city_id] AS route,
cast(c.name AS varchar(100)) AS route_text,
c.city_id AS leg_start_city,
c.city_id AS leg_end_city,
0 AS trip_count,
0 AS leg_length,
0 AS total_length
FROM cities c
UNION ALL
SELECT
trip.start_city,
trip.route || t.destination_city_id,
cast(trip.route_text || ',' || c.name AS varchar(100)),
t.departure_city_id,
t.destination_city_id,
trip.trip_count + 1,
d.distance,
trip.total_length + d.distance
FROM trains t
INNER JOIN trip
ON t.departure_city_id = trip.leg_end_city
INNER JOIN citypairs cps
ON t.departure_city_id = cps.city_id
INNER JOIN citypairs cpe
ON t.destination_city_id = cpe.city_id AND
cpe.citypair_id = cps.citypair_id
INNER JOIN distances d
ON cps.citypair_id = d.citypair_id
INNER JOIN cities c
ON t.destination_city_id = c.city_id
WHERE NOT (array[t.destination_city_id] <# trip.route))
SELECT *
FROM trip
WHERE trip_count = 2
AND start_city = (SELECT city_id FROM cities WHERE name = 'Edinburgh');
The CTE starts from each city (in the non-recursive part at the start), then determines all the destination cities it can go to. It keeps a track of all the cities its been to in an array (the route column), so it won't loop back to itself again. As it progresses, it keeps track of the overall trip distance, and the number of trains taken (in trip_count).
As it goes through the tree, it keeps a running total of the distance.
This gives results of
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
If you change remove the final WHERE clause it'll show all the possible trips in the data, likewise you can change the trip_count to find all single train destinations etc.
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1 | Edinburgh | 1 | 1 | 0 | 0 | 0 |
| 2 | 2 | Newcastle | 2 | 2 | 0 | 0 | 0 |
| 3 | 3 | Manchester | 3 | 3 | 0 | 0 | 0 |
| 1 | 1,2 | Edinburgh,Newcastle | 1 | 2 | 1 | 1234 | 1234 |
| 1 | 1,3 | Edinburgh,Manchester | 1 | 3 | 1 | 1324 | 1324 |
| 2 | 2,3 | Newcastle,Manchester | 2 | 3 | 1 | 1324 | 1324 |
| 3 | 3,2 | Manchester,Newcastle | 3 | 2 | 1 | 1324 | 1324 |
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
The cast( ... as varchar(100)) is a bit hacky, and I'm not sure why it was needed, but I haven't had a chance to get around that yet.
The SQL is here for testing: http://sqlfiddle.com/#!1/93964/24
The first part is easy:
SELECT c2.name
FROM cities AS c
JOIN trains t ON c.city_id=t.departure_city_id
JOIN trains t2 ON t.destination_city_id=t2.departure_city_id
JOIN cities AS c2 ON t2.destination_city_id=c2.city_id
WHERE c2.city_id!=c.city_id
AND c.name='Edinburgh';
http://sqlfiddle.com/#!12/a656f/14
In PG 9.1+ you could even do it with a recursive CTE for any number of cities in between. The distances are a little more complicated and you probably would be better off transforming city_pairs into actual pairs.