I have no control over the data or the database structure. The data is EAV-style: a consultant can speak one or many languages, can travel to one or many countries in Europe, and has many skills.
FYI, there are 10 different main categories in my data.
Some consultants speak 10 languages while others speak only one.
The data looks a bit like this:
____________________________________________
| ConsultantID | Category | Value |
--------------------------------------------
| 1 | Language | English |
| 1 | Language | French (fluent) |
| 1 | Language | Spanish (working)|
| 1 | Country | Ireland |
| 1 | Country | Italy |
| 1 | Country | Germany |
| 1 | Country | Belgium |
| 456 | Language | French (working) |
| 456 | Country | Belgium |
| 847 | Language | English |
| 847 | Country | Belgium |
--------------------------------------------
I want to list all consultants who are willing to travel to Belgium and who speak French (working or fluent). Based on my example, that would be #1 and #456.
I wrote the query below, which lists all values matching a category for a consultant (note that it is not dynamic: the number of values is capped at 5, so it is already a poor design).
SELECT
ID, category,
MAX(CASE seq WHEN 1 THEN value ELSE '' END ) +
MAX(CASE seq WHEN 2 THEN ',' + value ELSE '' END ) +
MAX(CASE seq WHEN 3 THEN ',' + value ELSE '' END ) +
MAX(CASE seq WHEN 4 THEN ',' + value ELSE '' END ) +
MAX(CASE seq WHEN 5 THEN ',' + value ELSE '' END )
FROM
(SELECT
p1.ID, p1.category, p1.value,
(SELECT COUNT(*)
FROM tblWebPracticeInfo p2
WHERE p2.category = p1.category
AND p2.ID = P1.ID
AND p2.value <= p1.value)
FROM
tblWebPracticeInfo p1) D (ID, category, value, seq )
GROUP BY
ID, category
ORDER BY
ID;
I would then need to query this result...
But even without a WHERE clause it already takes 2 seconds to execute.
I have something else that is more basic (but similarly inefficient):
select *
from tblWebMemberInfo m
where
m.ID in (select p.id from tblWebPracticeInfo p
where p.category = 'Language' and p.value like 'French%')
and m.ID in (select p.id from tblWebPracticeInfo p
where p.category = 'Country' and p.value = 'Belgium')
order by m.ID
That's basically where I am. As you can see, nothing genius and nothing that really works.
Can you point me in the right direction?
I'm using SQL Server 2005 (v9.00.1).
Many thanks in advance for your time and help.
If you just need to list the consultants then you can use exists():
select p.Id ...
from Person p /* Assuming you have a regular table for people,
if not, use distinct or group by */
where exists (
select 1
from tblWebPracticeInfo l
where l.Id = p.Id
and l.Category = 'Language'
and l.Value like 'French%' /* matches 'French (fluent)' and 'French (working)' */
)
and exists (
select 1
from tblWebPracticeInfo c
where c.Id = p.Id
and c.Category = 'Country'
and c.Value = 'Belgium'
)
You could also use aggregation and having like so:
select p.ID
from tblWebPracticeInfo p
where (p.category = 'Language' and p.value like 'French%')
or (p.category = 'Country' and p.value = 'Belgium')
group by p.ID
having count(distinct p.category) = 2 /* both conditions must match; distinct keeps two French rows from counting twice */
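To check the aggregation approach against the sample rows from the question, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for SQL Server (that substitution is an assumption made only to keep the example runnable), and it counts distinct categories so that a consultant with two French rows is not miscounted.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblWebPracticeInfo (ID INTEGER, category TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO tblWebPracticeInfo VALUES (?, ?, ?)",
    [
        (1, "Language", "English"),
        (1, "Language", "French (fluent)"),
        (1, "Language", "Spanish (working)"),
        (1, "Country", "Ireland"),
        (1, "Country", "Italy"),
        (1, "Country", "Germany"),
        (1, "Country", "Belgium"),
        (456, "Language", "French (working)"),
        (456, "Country", "Belgium"),
        (847, "Language", "English"),
        (847, "Country", "Belgium"),
    ],
)

# Keep only rows that satisfy either condition, then require that both
# categories are present for the same consultant.
query = """
    SELECT p.ID
    FROM tblWebPracticeInfo p
    WHERE (p.category = 'Language' AND p.value LIKE 'French%')
       OR (p.category = 'Country'  AND p.value = 'Belgium')
    GROUP BY p.ID
    HAVING COUNT(DISTINCT p.category) = 2
    ORDER BY p.ID
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)  # [1, 456]
```

Consultant 847 drops out because only the Country condition matches.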
I want to insert default rows into a result set when the LEFT JOIN finds no match (the joined columns are NULL).
For example, if Jane has no roles, I want to return some default ones in the results.
A query like this will return the following:
SELECT * FROM employees LEFT OUTER JOIN roles ON roles.employee_id = employees.id
Employee ID | Employee Name | Role ID | Role Name
1 | John | 1 | Admin
1 | John | 2 | Standard
2 | Jane | NULL | NULL
I want to return:
Employee ID | Employee Name | Role ID | Role Name
1 | John | 1 | Admin
1 | John | 2 | Standard
2 | Jane | NULL | Admin
2 | Jane | NULL | Standard
Is there a good way to do this in PostgreSQL?
I think you're looking for
SELECT e.*, r.id, r.name
FROM employees e
JOIN roles r ON r.employee_id = e.id
UNION ALL
SELECT e.*, NULL, default_name
FROM employees e
CROSS JOIN (VALUES ('Admin'), ('Standard')) AS defaults(default_name)
WHERE NOT EXISTS (
SELECT *
FROM roles r
WHERE r.employee_id = e.id
)
I don't think there's a (good) way around the UNION because a LEFT JOIN introduces only a single row per unmatched row. You might be able to lift out the join against the employees table though:
SELECT e.*, r.*
FROM employees e,
LATERAL (
SELECT r.id, r.name
FROM roles r
WHERE r.employee_id = e.id
UNION ALL
SELECT NULL, default_name
FROM (VALUES ('Admin'), ('Standard')) AS roles(default_name)
WHERE NOT EXISTS (
SELECT *
FROM roles r
WHERE r.employee_id = e.id
)
) AS r
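For anyone who wants to try the UNION ALL idea without a PostgreSQL instance, the following is a minimal sketch in Python with an in-memory SQLite database (an assumption for the demo; the table layout with roles.id, roles.employee_id and roles.name is inferred from the question). The defaults are built with a small inline SELECT instead of VALUES, and the alias d is just an illustrative name.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE roles (id INTEGER PRIMARY KEY, employee_id INTEGER, name TEXT);
    INSERT INTO employees VALUES (1, 'John'), (2, 'Jane');
    INSERT INTO roles VALUES (1, 1, 'Admin'), (2, 1, 'Standard');
""")

# First branch: employees with real roles.
# Second branch: every default role for employees that have no role at all.
query = """
    SELECT e.id, e.name, r.id AS role_id, r.name AS role_name
    FROM employees e
    JOIN roles r ON r.employee_id = e.id
    UNION ALL
    SELECT e.id, e.name, NULL, d.default_name
    FROM employees e
    CROSS JOIN (SELECT 'Admin' AS default_name
                UNION ALL
                SELECT 'Standard') d
    WHERE NOT EXISTS (SELECT 1 FROM roles r WHERE r.employee_id = e.id)
    ORDER BY 1, 4
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

John keeps his two real roles, and Jane gets one row per default role with a NULL role id.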
I am trying to write a Postgres query that uses 3 tables: people, attribute, and a people_attribute join table.
people table:
id, name
attribute table:
id, name, attr_group
people_attribute join:
people_id, attribute_id
desired output:
name | fav_colors | fav_music | fav_foods
-----------------------------------------------------------------
michael | red,blue,green | pop,hip-hop,jazz | pizza,burgers,tacos
bob | orange,green | null | tacos,steak,fish
...etc
The tags can vary from none to ~12 for each attr_group
Here is the query I am working with:
select
p.id,
p.name,
(case when a.attr_group like 'fav_colors' then string_agg(a.name, ',') else null end) as fav_colors,
(case when a.attr_group like 'fav_music' then string_agg(a.name, ',') else null end) as fav_music,
(case when a.attr_group like 'fav_foods' then string_agg(a.name, ',') else null end) as fav_foods
from people as p
join people_attribute as pa on pa.people_id = p.id
join "attribute" as a on a.id = pa.attribute_id
group by 1,2,a.attr_group
order by 1 asc;
which returns:
name | fav_colors | fav_music | fav_foods
-----------------------------------------------------------------
michael | red,blue,green | null | null
michael | null | pop,hip-hop,jazz | null
michael | null | null | pizza,burgers,tacos
bob | null | null | null
bob | orange,green | null | null
bob | null | null | tacos,steak,fish
I feel like I'm getting close, but am unsure how to flatten this out to achieve the desired output as shown above. Any help would be greatly appreciated!
You want to use filter for this (available in PostgreSQL 9.4 and later):
select p.id,
p.name,
string_agg(a.name, ',') filter (where a.attr_group = 'fav_colors') as fav_colors,
string_agg(a.name, ',') filter (where a.attr_group = 'fav_music') as fav_music,
string_agg(a.name, ',') filter (where a.attr_group = 'fav_foods') as fav_foods
from people as p
join people_attribute as pa
on pa.people_id = p.id
join "attribute" as a
on a.id = pa.attribute_id
group by p.id, p.name
order by 1 asc;
Using filter passes only the values that match the filter's where condition into the aggregation.
Yours was showing three rows per people record because you added attribute.attr_group to your group by. You had no choice, since you were using attribute.attr_group in your case conditionals outside the aggregate.
Using filter makes attribute.attr_group part of the aggregation, so you do not have to include it in your group by list.
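The same idea, moving the condition inside the aggregate so that attr_group no longer has to appear in group by, can be tried with this small Python sketch. It is a stand-in, not the PostgreSQL query itself: it uses an in-memory SQLite database, group_concat in place of string_agg, and a CASE expression inside the aggregate in place of filter (all assumptions made for portability). The sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT, attr_group TEXT);
    CREATE TABLE people_attribute (people_id INTEGER, attribute_id INTEGER);
    INSERT INTO people VALUES (1, 'michael'), (2, 'bob');
    INSERT INTO attribute VALUES
        (1, 'red', 'fav_colors'), (2, 'blue', 'fav_colors'),
        (3, 'pop', 'fav_music'), (4, 'tacos', 'fav_foods');
    INSERT INTO people_attribute VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 4);
""")

# CASE without ELSE yields NULL for non-matching rows, and the aggregate
# skips NULLs, so each column only collects names from its own group.
query = """
    SELECT p.name,
           group_concat(CASE WHEN a.attr_group = 'fav_colors' THEN a.name END) AS fav_colors,
           group_concat(CASE WHEN a.attr_group = 'fav_music'  THEN a.name END) AS fav_music,
           group_concat(CASE WHEN a.attr_group = 'fav_foods'  THEN a.name END) AS fav_foods
    FROM people p
    JOIN people_attribute pa ON pa.people_id = p.id
    JOIN attribute a ON a.id = pa.attribute_id
    GROUP BY p.id, p.name
    ORDER BY p.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each person comes back as a single row, and a group with no tags (bob's colors and music here) is NULL rather than an empty string.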
In my PostgreSQL 9.3 database I have a table stock_rotation:
+----+-----------------+---------------------+------------+---------------------+
| id | quantity_change | stock_rotation_type | article_id | date |
+----+-----------------+---------------------+------------+---------------------+
| 1 | 10 | PURCHASE | 1 | 2010-01-01 15:35:01 |
| 2 | -4 | SALE | 1 | 2010-05-06 08:46:02 |
| 3 | 5 | INVENTORY | 1 | 2010-12-20 08:20:35 |
| 4 | 2 | PURCHASE | 1 | 2011-02-05 16:45:50 |
| 5 | -1 | SALE | 1 | 2011-03-01 16:42:53 |
+----+-----------------+---------------------+------------+---------------------+
Types:
SALE has negative quantity_change
PURCHASE has positive quantity_change
INVENTORY resets the actual number in stock to the given value
In this implementation, to get the current value that an article has in stock, you need to sum up all quantity changes since the latest INVENTORY for the specific article (including the inventory value). I do not know why it is implemented this way and unfortunately it would be quite hard to change this now.
My question now is how to do this for more than a single article at once.
My latest attempt was this:
WITH latest_inventory_of_article as (
SELECT MAX(date)
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
)
SELECT a.id, sum(quantity_change)
FROM stock_rotation sr
INNER JOIN article a ON a.id = sr.article_id
WHERE sr.date >= (COALESCE(
(SELECT date FROM latest_inventory_of_article),
'1970-01-01'
))
GROUP BY a.id
But the date for the latest stock_rotation of type INVENTORY can be different for every article.
I was trying to avoid looping over multiple article ids to find this date.
In this case I would use a different inner query to get the latest inventory date per article. You are effectively reading stock_rotation twice, but it should work (if the table is too big, you may need to try something else):
SELECT sr.article_id, sum(quantity_change)
FROM stock_rotation sr
LEFT JOIN (
SELECT article_id, MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
GROUP BY article_id) AS latest_inventory
ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
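To see what this returns for the rows in the question, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for PostgreSQL (an assumption for the demo; dates are stored as ISO text so that string comparison matches chronological order). Article 2 is a hypothetical article with no INVENTORY row, added to exercise the COALESCE fallback.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stock_rotation (
        id INTEGER PRIMARY KEY,
        quantity_change INTEGER,
        stock_rotation_type TEXT,
        article_id INTEGER,
        date TEXT
    )
""")
conn.executemany(
    "INSERT INTO stock_rotation VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "PURCHASE", 1, "2010-01-01 15:35:01"),
        (2, -4, "SALE", 1, "2010-05-06 08:46:02"),
        (3, 5, "INVENTORY", 1, "2010-12-20 08:20:35"),
        (4, 2, "PURCHASE", 1, "2011-02-05 16:45:50"),
        (5, -1, "SALE", 1, "2011-03-01 16:42:53"),
        # Hypothetical article 2: never had an INVENTORY row.
        (6, 3, "PURCHASE", 2, "2011-01-10 09:00:00"),
        (7, -1, "SALE", 2, "2011-01-11 09:00:00"),
    ],
)

# Per-article latest INVENTORY date, joined back to sum everything
# from that date onwards (or from the beginning if there is none).
query = """
    SELECT sr.article_id, SUM(sr.quantity_change) AS in_stock
    FROM stock_rotation sr
    LEFT JOIN (
        SELECT article_id, MAX(date) AS date
        FROM stock_rotation
        WHERE stock_rotation_type = 'INVENTORY'
        GROUP BY article_id
    ) AS latest_inventory ON latest_inventory.article_id = sr.article_id
    WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
    GROUP BY sr.article_id
    ORDER BY sr.article_id
"""
stock = dict(conn.execute(query).fetchall())
print(stock)  # {1: 6, 2: 2}
```

Article 1 comes out as 5 + 2 - 1 = 6 (the inventory value plus the later changes), and article 2 simply sums all of its rows.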
You can use DISTINCT ON together with ORDER BY to get the latest INVENTORY row for each article_id in the WITH clause.
Then you can join that with the original table to get all later rows and add the values (note that the inner join leaves out articles that have no INVENTORY row at all):
WITH latest_inventory as (
SELECT DISTINCT ON (article_id) id, article_id, date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
ORDER BY article_id, date DESC
)
SELECT article_id, sum(sr.quantity_change)
FROM stock_rotation sr
JOIN latest_inventory li USING (article_id)
WHERE sr.date >= li.date
GROUP BY article_id;
Here is my take on it: First, build the list of products at their last inventory state, using a window function. Then, join it back to the entire list, filtering on operations later than the inventory date for the item.
with initial_inventory as
(
select article_id, date, quantity_change from
(select article_id, date, quantity_change, rank() over (partition by article_id order by date desc)
from stock_rotation
where stock_rotation_type = 'INVENTORY'
) a
where rank = 1
)
select ii.article_id, ii.quantity_change + coalesce(sum(sr.quantity_change), 0)
from initial_inventory ii
left join stock_rotation sr on ii.article_id = sr.article_id and sr.date > ii.date
group by ii.article_id, ii.quantity_change
Four categories in category table.
id | name
--------------
1 | 'wine'
2 | 'chocolate'
3 | 'autos'
4 | 'real estate'
Two of the many (thousands of) forecasters in forecaster table.
id | name
--------------
1 | 'sothebys'
2 | 'cramer'
Relevant forecasts by the forecasters for the categories in the forecast table.
| id | forecaster_id | category_id | forecast |
|----+---------------+-------------+--------------------------------------------------------------|
| 1 | 1 | 1 | 'bad weather, prices rise short-term' |
| 2 | 1 | 2 | 'cocoa bean surplus, prices drop' |
| 3 | 1 | 3 | 'we dont deal with autos - no idea' |
| 4 | 2 | 2 | 'sell, sell, sell' |
| 5 | 2 | 3 | 'demand for cocoa will skyrocket - prices up - buy, buy buy' |
I want a prioritized mapping of (forecaster, category, forecast) such that if a forecast exists for some primary forecaster (e.g. 'cramer'), it is used, because I trust him more. Otherwise, if a forecast exists for some secondary forecaster (e.g. 'sothebys'), that one is used. If no forecast exists for a category, return a row with that category and null for the forecast.
I have something that almost works, and once I get the logic down I hope to turn it into a parameterized query.
select
case when F1.category is not null
then (F1.forecaster, F1.category, F1.forecast)
when F2.category is not null
then (F2.forecaster, F2.category, F2.forecast)
else (null, C.category, null)
end
from
(
select
FR.name as forecaster,
C.id as cid,
C.category as category,
F.forecast
from
forecast F
inner join forecaster FR on (F.forecaster_id = FR.id)
inner join category C on (C.id = F.category_id)
where FR.name = 'cramer'
) F1
right join (
select
FR.name as forecaster,
C.id as cid,
C.category as category,
F.forecast
from
forecast F
inner join forecaster FR on (F.forecaster_id = FR.id)
inner join category C on (C.id = F.category_id)
where FR.name = 'sothebys'
) F2 on (F1.cid = F2.cid)
full outer join category C on (C.id = F2.cid);
This gives:
'(sothebys,wine,"bad weather, prices rise short-term")'
'(cramer,chocolate,"sell, sell, sell")'
'(cramer,autos,"demand for cocoa will skyrocket - prices up - buy, buy buy")'
'(,"real estate",)'
While that is the desired data, it is a record in one column instead of three. The case was the only way I could find to achieve the ordering of cramer first and sothebys next, and there is lots of duplication. Is there a better way, and how can the tuple-like results be pulled back apart into columns?
Any suggestions, especially related to removal of duplication or general simplification, are appreciated.
This sounds like a case for DISTINCT ON (untested):
SELECT DISTINCT ON (c.id)
fr.name AS forecaster,
c.name AS category,
f.forecast
FROM forecast f
JOIN forecaster fr ON f.forecaster_id = fr.id
RIGHT JOIN category c ON f.category_id = c.id
ORDER BY
c.id,
CASE WHEN fr.name = 'cramer' THEN 0
WHEN fr.name = 'sothebys' THEN 1
ELSE 2
END;
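For each category, the first row in the ordering will be picked. Since the CASE expression sorts 'cramer' (0) ahead of 'sothebys' (1), Cramer's forecast is given preference whenever both exist.
Adapt the ORDER BY clause if you need a more complicated ranking. Note that the ELSE 2 branch also lets a forecast from any other forecaster through when neither of the two has one; add a filter on fr.name to the forecaster join if such categories should come back with NULL instead.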
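Since the query above is untested, here is a small runnable sketch of the same "pick the best-ranked row per category" idea. It is a swapped-in variant, not DISTINCT ON itself: it runs in Python against an in-memory SQLite database (an assumption for the demo), so the ranking is done with a correlated subquery using ORDER BY ... LIMIT 1. It also restricts the candidates to the two named forecasters, which is my reading of the question.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE forecaster (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE forecast (id INTEGER PRIMARY KEY, forecaster_id INTEGER,
                           category_id INTEGER, forecast TEXT);
    INSERT INTO category VALUES
        (1, 'wine'), (2, 'chocolate'), (3, 'autos'), (4, 'real estate');
    INSERT INTO forecaster VALUES (1, 'sothebys'), (2, 'cramer');
    INSERT INTO forecast VALUES
        (1, 1, 1, 'bad weather, prices rise short-term'),
        (2, 1, 2, 'cocoa bean surplus, prices drop'),
        (3, 1, 3, 'we dont deal with autos - no idea'),
        (4, 2, 2, 'sell, sell, sell'),
        (5, 2, 3, 'demand for cocoa will skyrocket - prices up - buy, buy buy');
""")

# For each category, the subquery returns the id of the preferred forecast
# (cramer before sothebys), or NULL when neither has one.
query = """
    SELECT fr.name AS forecaster, c.name AS category, f.forecast
    FROM category c
    LEFT JOIN forecast f ON f.id = (
        SELECT f2.id
        FROM forecast f2
        JOIN forecaster fr2 ON fr2.id = f2.forecaster_id
        WHERE f2.category_id = c.id
          AND fr2.name IN ('cramer', 'sothebys')
        ORDER BY CASE fr2.name WHEN 'cramer' THEN 0 ELSE 1 END
        LIMIT 1
    )
    LEFT JOIN forecaster fr ON fr.id = f.forecaster_id
    ORDER BY c.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The result has three separate columns, and 'real estate' comes back with NULL for both forecaster and forecast.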
I have 3 tables in a PostgreSQL database:
localities (loc, 12561 rows)
plants (pl, 17052 rows)
specimens or samples (esp, 9211 rows)
pl and esp each have a field loc that specifies where the tagged plant lives, or where the sample (usually a branch with leaves and flowers) came from.
I need a report of the places that have plants or samples, with the number of plants and samples in each place. The best I have managed so far is a union of two subqueries, which runs very fast (33 ms to fetch 69 rows):
(select l.id,l.nome,count(pl.id) pls,null esps
from loc l
left join pl on pl.loc = l.id
where l.id in
(select distinct pl.loc
from pl
where pl.loc > 0)
group by l.id,l.nome
union
select l.id,l.nome,null pls,count(e.id) esps
from loc l
left join esp e on e.loc = l.id
where l.id in
(select distinct e.loc
from esp e
where e.loc > 0)
group by l.id,l.nome)
order by id
The problem is that when the same place has both plants and samples, it becomes two distinct lines, like:
11950 | San Martin | | 5 |
11950 | San Martin | 61 | |
Of course what I want is:
11950 | San Martin | 61 | 5 |
Before that, I have tried doing all in one query:
select l.id,l.nome,count(pl.id),count(e.id) esps
from loc l
left join pl on pl.loc = l.id
left join esp e on e.loc = l.id
where l.id in
(select distinct pl.loc
from pl
where pl.loc > 0)
or l.id in
(select distinct e.loc
from esp e
where e.loc > 0)
group by l.id,l.nome
but it returns a strange repetition (it multiplies the two counts and shows the product in both columns):
11950 | San Martin | 305 | 305 |
I have tried without subqueries, but it was taking about 13 seconds, which is too long.
I created a test layout with:
create table localities (id integer, loc_name text);
create table plants (plant_id integer, loc_id integer);
create table samples (sample_id integer, loc_id integer);
insert into localities select x, ('Loc ' || x::text) from generate_series(1, 12561) x ;
insert into plants select x, (random()*12561)::integer from generate_series(1, 17052) x;
insert into samples select x, (random()*12561)::integer from generate_series(1, 9211) x;
The trick is to create an intermediate table from plants and samples with the same structure. Where a column doesn't make sense (a plant has no sample_id), you add null:
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples
This table has a unified structure, and you can then aggregate on it (I'm using WITH to make it a bit more readable):
with localities_used as (
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples)
select
localities_used.loc_id,
count(localities_used.plant_id) plant_count,
count(localities_used.sample_id) sample_count
from
localities_used
group by
localities_used.loc_id;
If you need additional data from localities, you can join them on the aggregated table:
with localities_used as (
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples),
aggregated as (
select
localities_used.loc_id,
count(localities_used.plant_id) plant_count,
count(localities_used.sample_id) sample_count
from
localities_used
group by
localities_used.loc_id)
select * from aggregated left outer join localities on aggregated.loc_id = localities.id;
This takes 75 ms on my laptop altogether.
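To make the difference visible on a tiny data set, here is a minimal sketch in Python with an in-memory SQLite database (an assumption for the demo; the three localities and their rows are invented). It runs the double LEFT JOIN from the question next to the UNION ALL approach: with 2 plants and 3 samples in the same locality, the double join reports 6 for both counts.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE localities (id INTEGER, loc_name TEXT);
    CREATE TABLE plants (plant_id INTEGER, loc_id INTEGER);
    CREATE TABLE samples (sample_id INTEGER, loc_id INTEGER);
    INSERT INTO localities VALUES (1, 'Loc 1'), (2, 'Loc 2'), (3, 'Loc 3');
    INSERT INTO plants VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO samples VALUES (1, 1), (2, 1), (3, 1);
""")

# Joining both child tables at once pairs every plant with every sample,
# so both counts become plants * samples.
double_join = """
    SELECT l.id, COUNT(p.plant_id), COUNT(s.sample_id)
    FROM localities l
    LEFT JOIN plants p ON p.loc_id = l.id
    LEFT JOIN samples s ON s.loc_id = l.id
    WHERE l.id = 1
    GROUP BY l.id
"""

# Stacking the two tables first keeps one row per plant or sample,
# and COUNT(column) ignores the NULL padding.
union_first = """
    WITH localities_used AS (
        SELECT loc_id, plant_id, NULL AS sample_id FROM plants
        UNION ALL
        SELECT loc_id, NULL AS plant_id, sample_id FROM samples
    )
    SELECT loc_id, COUNT(plant_id), COUNT(sample_id)
    FROM localities_used
    GROUP BY loc_id
    ORDER BY loc_id
"""
inflated = conn.execute(double_join).fetchall()
correct = conn.execute(union_first).fetchall()
print(inflated)  # [(1, 6, 6)]
print(correct)   # [(1, 2, 3), (2, 1, 0)]
```

Locality 3 has neither plants nor samples, so it does not appear in the second result at all.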
This should be as easy as
select * from (
select
location.*,
(select count(id) from plant where plant.location = location.id) as plants,
(select count(id) from sample where sample.location = location.id) as samples
from location
) subquery
where subquery.plants > 0 or subquery.samples > 0;
id | name | plants | samples
----+------------+--------+---------
1 | San Martin | 2 | 1
2 | Rome | 1 | 2
3 | Dallas | 3 | 1
(3 rows)
This is the database I quickly set up to experiment with:
create table location(id serial primary key, name text);
create table plant(id serial primary key, name text, location integer references location(id));
create table sample(id serial primary key, name text, location integer references location(id));
insert into location (name) values ('San Martin'), ('Rome'), ('Dallas'), ('Ghost Town');
insert into plant (name, location) values ('San Martin Dandelion', 1),('San Martin Camomile', 1), ('Rome Raspberry', 2), ('Dallas Locoweed', 3), ('Dallas Lemongrass', 3), ('Dallas Setaria', 3);
insert into sample (name, location) values ('San Martin Bramble', 1), ('Rome Iris', 2), ('Rome Eucalypt', 2), ('Dallas Dogbane', 3);
tests=# select * from location;
id | name
----+------------
1 | San Martin
2 | Rome
3 | Dallas
4 | Ghost Town
(4 rows)
tests=# select * from plant;
id | name | location
----+----------------------+----------
1 | San Martin Dandelion | 1
2 | San Martin Camomile | 1
3 | Rome Raspberry | 2
4 | Dallas Locoweed | 3
5 | Dallas Lemongrass | 3
6 | Dallas Setaria | 3
(6 rows)
tests=# select * from sample;
id | name | location
----+--------------------+----------
1 | San Martin Bramble | 1
2 | Rome Iris | 2
3 | Rome Eucalypt | 2
4 | Dallas Dogbane | 3
(4 rows)
I didn't test this, but I think it could be something like this:
SELECT
l.id,
l.nome,
COUNT(DISTINCT pl.id) as plants_count,
COUNT(DISTINCT e.id) as esp_count
FROM loc l
LEFT JOIN pl ON pl.loc = l.id
LEFT JOIN esp e ON e.loc = l.id
GROUP BY l.id,l.nome
The point is to count the distinct non-null ids of each type, so that the row multiplication caused by joining both tables does not inflate the counts.