T-SQL: Conditional join or convoluted WHERE clause? - tsql

I have a table called MapObjects which is used to store information about objects placed on a map. I have another table called OrgLocations which is used to store all the locations where an organisation is located. Locations are defined with a latitude and longitude. Finally, I have another table called ObjectLocations which maps a map object to an organisation's location in the OrgLocations table. It is used to indicate the subset of an organisation's locations at which an object is shown on a map.
As an example, suppose an organisation (OrgID = 10) has 4 locations (stored in the OrgLocations table): Dallas, Atlanta, Miami, New York.
The organisation has 1 map object associated with Atlanta and Miami (MapObjects.ID = 5).
My dataset must return the records from OrgLocations that correspond with Atlanta and Miami (but not include Dallas or New York). However, I can also have a map object that is not assigned to any location (no record in ObjectLocations). These map objects still belong to an organisation but are not associated with any specific location. In this case I want to return all the locations assigned to the organisation.
I am not sure if this is done through a conditional join or something in the WHERE clause. Here is what the tables would look like with some data:
OrgLocations
ID OrgID Latitude Longitude Name
0 10 32.780 -96.798 Dallas
1 10 33.7497 -84.394 Atlanta
2 10 25.7863 -80.2270 Miami
3 10 40.712 -74.005 New York
4 11 42.348 -83.071 Detroit
ObjectLocations
OrgLocationID MapObjectID
1 5
2 5
MapObjects
ID OrgID
5 10
6 11
In this example, when MapObjects.ID is 5, 2 locations for this object exist in ObjectLocations: Atlanta and Miami. When MapObjects.ID is 6, there is no record in ObjectLocations, so all the locations in OrgLocations that belong to the organisation (OrgID = 11) are returned.
Thanks for any help!

I guess you will have the cleanest queries if you check for the existence of MapObjectID in ObjectLocations to decide what query to use.
Something like this:
declare @MapObjectID int
set @MapObjectID = 5

-- The object has explicit locations: return only those.
if exists(select *
          from ObjectLocations
          where MapObjectID = @MapObjectID)
begin
    select *
    from OrgLocations
    where ID in (select OrgLocationID
                 from ObjectLocations
                 where MapObjectID = @MapObjectID)
end
else
-- No explicit locations: return every location of the owning organisation.
begin
    select *
    from OrgLocations
    where OrgID in (select OrgID
                    from MapObjects
                    where ID = @MapObjectID)
end
As a single query:
-- First branch: locations explicitly mapped to the object.
select OL.*
from OrgLocations as OL
    inner join ObjectLocations as OLoc
        on OL.ID = OLoc.OrgLocationID
where OLoc.MapObjectID = @MapObjectID
union all
-- Second branch: all locations of the organisation, only when no mapping exists.
select OL.*
from OrgLocations as OL
    inner join MapObjects as MO
        on OL.OrgID = MO.OrgID
where MO.ID = @MapObjectID and
      not exists (select *
                  from ObjectLocations
                  where MapObjectID = @MapObjectID)
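To check the single-query approach against the sample data from the question, here is a minimal sketch that runs the same UNION ALL shape in an in-memory SQLite database through Python. SQLite stands in for SQL Server here purely for illustration, the `:id` parameter replaces the T-SQL variable, and the helper name `locations_for_map_object` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table OrgLocations (ID integer primary key, OrgID int,
                               Latitude real, Longitude real, Name text);
    create table ObjectLocations (OrgLocationID int, MapObjectID int);
    create table MapObjects (ID integer primary key, OrgID int);

    insert into OrgLocations values
        (0, 10, 32.780, -96.798, 'Dallas'),
        (1, 10, 33.7497, -84.394, 'Atlanta'),
        (2, 10, 25.7863, -80.2270, 'Miami'),
        (3, 10, 40.712, -74.005, 'New York'),
        (4, 11, 42.348, -83.071, 'Detroit');
    insert into ObjectLocations values (1, 5), (2, 5);
    insert into MapObjects values (5, 10), (6, 11);
""")

def locations_for_map_object(conn, map_object_id):
    """Mapped locations if any exist, otherwise every location of the owning org."""
    sql = """
        select OL.Name
        from OrgLocations as OL
            inner join ObjectLocations as OLoc on OL.ID = OLoc.OrgLocationID
        where OLoc.MapObjectID = :id
        union all
        select OL.Name
        from OrgLocations as OL
            inner join MapObjects as MO on OL.OrgID = MO.OrgID
        where MO.ID = :id
          and not exists (select 1 from ObjectLocations where MapObjectID = :id)
    """
    # Sort in Python so the result order does not depend on the engine.
    return sorted(row[0] for row in conn.execute(sql, {"id": map_object_id}))

print(locations_for_map_object(conn, 5))  # ['Atlanta', 'Miami']
print(locations_for_map_object(conn, 6))  # ['Detroit']
```

Only one of the two branches can return rows for a given map object, because the `not exists` test in the second branch is true exactly when the first branch is empty.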

Related

Finding all edges joining nodes within a set of nodes in postgres

I have data stored in two tables called objects and object_relationships.
It's a simple self referential many to many.
Objects table
id | description                   | type
-------------------------------------------
1  | Subject: an email about birds | email
2  | Subject: birds                | email
3  | john                          | person
4  | mark                          | person
5  | lex                           | person
6  | Subject: ants                 | email
Object_relationships table
object_id | child_id | type
-----------------------------
1         | 3        | to
3         | 1        | from
6         | 4        | to
5         | 4        | family
2         | 5        | from
5         | 3        | friends
Using an initial query like
select * from objects where description like '%birds%' or description like '%lex%' or description like '%john%'
returns the ids [1, 2, 3, 5].
I then want every edge between these "nodes"
specifically:
1 - to - 3
3 - from - 1
2 - from - 5
5 - friends - 3
The code I have for getting the edges uses joins, but it is wrong because it pulls in new nodes, and I can't figure out how to exclude nodes outside the initial query.
I think my approach is wrong because the query does not even consider the parents of the objects. The json_build_object calls are only there to quickly plot the output in any Cytoscape-compatible viewer.
select
    json_build_object(
        'source', base.object_id,
        'target', base.child_id,
        'type', base.child_type
    ) as edge1,
    json_build_object(
        'source', base.child_id,
        'target', base.child2_id,
        'type', base.child2_type
    ) as edge2
from (
    with parent as (
        select distinct
            unnest(array[base.object_id, base.child_id, base.child2_id]) as id
        from (
            select
                o.id as object_id,
                o.type,
                or1.child_object_id as child_id,
                or1."type" as child_type,
                or2.child_object_id as child2_id,
                or2."type" as child2_type
            from objects o
            join object_relationships or1 on or1.object_id = o.id
            join objects o1 on o1.id = or1.child_object_id
            join object_relationships or2 on or2.object_id = o1.id
            join objects o2 on o2.id = or2.child_object_id
            where
                o.description like '%birds%' or o.description like '%lex%' or o.description like '%john%'
            limit 1
        ) base
        limit 100
    )
    select
        o.id as object_id,
        or1.child_object_id as child_id,
        or1."type" as child_type,
        or2.child_object_id as child2_id,
        or2."type" as child2_type
    from parent p
    join objects o on o.id = p.id
    join object_relationships or1 on or1.object_id = o.id
    join objects o1 on o1.id = or1.child_object_id
    join object_relationships or2 on or2.object_id = o1.id
    join objects o2 on o2.id = or2.child_object_id
    limit 100
) base;
Collect the matching ids from the objects table into an array, then keep only the rows of object_relationships whose object_id and child_id are both elements of that array.
select object_id, type, child_id
from (
    select array_agg(id) as ids
    from objects
    where description like any('{%birds%, %lex%, %john%}')
) s
join object_relationships
    on object_id = any(ids) and child_id = any(ids)
order by object_id, child_id;
Test it in db<>fiddle.
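The same "both endpoints must be in the node set" filter can be tried locally with the tables from the question. This is a minimal sketch that uses an in-memory SQLite database through Python instead of PostgreSQL, so the array functions are swapped for a CTE with two `in` subqueries; the CTE name `nodes` is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table objects (id integer primary key, description text, type text);
    create table object_relationships (object_id int, child_id int, type text);

    insert into objects values
        (1, 'Subject: an email about birds', 'email'),
        (2, 'Subject: birds', 'email'),
        (3, 'john', 'person'),
        (4, 'mark', 'person'),
        (5, 'lex', 'person'),
        (6, 'Subject: ants', 'email');
    insert into object_relationships values
        (1, 3, 'to'), (3, 1, 'from'), (6, 4, 'to'),
        (5, 4, 'family'), (2, 5, 'from'), (5, 3, 'friends');
""")

# Keep an edge only when BOTH of its endpoints belong to the initial node set.
edges = conn.execute("""
    with nodes as (
        select id
        from objects
        where description like '%birds%'
           or description like '%lex%'
           or description like '%john%'
    )
    select r.object_id, r.type, r.child_id
    from object_relationships r
    where r.object_id in (select id from nodes)
      and r.child_id in (select id from nodes)
    order by r.object_id, r.child_id
""").fetchall()

for source, edge_type, target in edges:
    print(source, "-", edge_type, "-", target)
```

The edges 6 - to - 4 and 5 - family - 4 are dropped because node 4 (mark) is not part of the initial result.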

PostgreSQL: Merging sets of rows which text fields are contained in other sets of rows

Given the following table, I need to merge the rows of different ids only if they are of the same type (person or dog), and only when the value of every field of one id is contained in the values of the other id.
id | being  | feature | values
---------------------------------
1  | person | name    | John;Paul
1  | person | surname | Smith
2  | dog    | name    | Ringo
3  | dog    | name    | Snowy
4  | person | name    | John
4  | person | surname |
5  | person | name    | John;Ringo
5  | person | surname | Smith
In this example, the merge results should be as follows:
1 and 4 (Since 4's name is present in 1's name and 4's surname is empty)
1 and 5 cannot be merged (the name field shows different values)
4 and 5 can be merged
2 and 3 (dogs) cannot be merged. They have only the field "name" and they do not share values.
2 and 3 cannot be merged with 1, 4, 5 since they have different values in "being".
id | being  | feature | values
---------------------------------
1  | person | name    | John;Paul
1  | person | surname | Smith
2  | dog    | name    | Ringo
3  | dog    | name    | Snowy
5  | person | name    | John;Ringo
5  | person | surname | Smith
I have tried this:
UPDATE table a
SET values = (
    SELECT array_to_string(array_agg(distinct values), ';') AS values
    FROM table b
    WHERE a.being = b.being
      AND a.feature = b.feature
      AND a.id <> b.id
      AND a.values LIKE '%'||a.values||'%'
)
WHERE (
    select count(*)
    FROM (SELECT DISTINCT c.being, c.id from table c where a.being = c.being) as temp
) > 1;
This doesn't work well because it will merge, for example, 1 and 5. Besides, it duplicates values when merging that field.
One option is to aggregate the names and surnames per "id" and "being". Once you have a single string per "id", a self join can find the cases where one full name is completely contained in another (with the same "being" for both ids); the contained, shorter full name is then the candidate for deletion:
-- "values" is a reserved word in PostgreSQL, so the column name is quoted.
WITH cte AS (
    SELECT id,
           being,
           STRING_AGG("values", ';') AS fullname
    FROM tab
    GROUP BY id,
             being
)
DELETE FROM tab
WHERE id IN (SELECT t2.id
             FROM cte t1
             INNER JOIN cte t2
                     ON t1.being = t2.being
                    AND t1.id > t2.id
                    AND t1.fullname LIKE CONCAT('%', t2.fullname, '%'));
Check the demo here.
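The containment rule from the question can also be checked feature by feature rather than on one concatenated string. The following is a minimal Python sketch of that variant, not the SQL approach above: it splits each "values" cell on ';' and treats an id as redundant when, for every feature, its set of values is a subset of another id's set with the same "being". The helper names `build_profiles` and `absorbed_by` are hypothetical.

```python
rows = [
    (1, "person", "name", "John;Paul"),
    (1, "person", "surname", "Smith"),
    (2, "dog", "name", "Ringo"),
    (3, "dog", "name", "Snowy"),
    (4, "person", "name", "John"),
    (4, "person", "surname", ""),
    (5, "person", "name", "John;Ringo"),
    (5, "person", "surname", "Smith"),
]

def build_profiles(rows):
    """Group rows by id into {'being': str, 'features': {feature: set of values}}."""
    profiles = {}
    for id_, being, feature, values in rows:
        profile = profiles.setdefault(id_, {"being": being, "features": {}})
        tokens = {v for v in values.split(";") if v}  # an empty cell gives an empty set
        profile["features"].setdefault(feature, set()).update(tokens)
    return profiles

def absorbed_by(small, big):
    """True when every feature of `small` is contained in the same feature of `big`."""
    if small["being"] != big["being"]:
        return False
    return all(
        tokens <= big["features"].get(feature, set())
        for feature, tokens in small["features"].items()
    )

profiles = build_profiles(rows)
absorbable = sorted(
    (a, b)
    for a in profiles
    for b in profiles
    if a != b and absorbed_by(profiles[a], profiles[b])
)
redundant_ids = sorted({a for a, _ in absorbable})

print(absorbable)     # [(4, 1), (4, 5)]
print(redundant_ids)  # [4]
```

Two ids with identical values would absorb each other under this rule, so a real cleanup would need a tie-breaker (for example, keep the lower id) before deleting anything.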

How to use 'Distinct' for just one column?

I have a query checking the visits from some "locations" table I have. If the user signed up with a referral of "emp" or "oth", their first visit shouldn't count but the second visit and forward should count.
I'm trying to get a count of those "first visits" per location. Whenever they do a visit, I get a record on which location it was.
The problem is that my query counts correctly, but some users have visits at different locations. So instead of counting just one visit for one location (the first one), it adds one for every location the user has visited.
This is my query
SELECT
    COUNT(DISTINCT CASE WHEN customer.ref IN ('emp', 'oth') THEN customer.id END) as visit_count,
    locations.name as location
FROM locations
LEFT JOIN visits ON locations.location_name = visits.location_visit_name
LEFT JOIN customer ON customer.id = visits.customer_id
WHERE locations.active = true
GROUP BY locations.location_name, locations.id;
The results I'm getting are
visit_count | locations
-------------------------
7 | Loc 1
3 | Loc 2
1 | Loc 3
How it should be:
visit_count | locations
-------------------------
6 | Loc 1
2 | Loc 2
1 | Loc 3
Because 2 of these people have visits at both locations, it's counting one for each location. I think the DISTINCT is also being applied per location, when it should only apply to the count of customer.id.
Is there a way I can add something to my query to just grab the location for the first visit, without caring they have done other visits on other locations?
If I followed you correctly, you want to count only the first visit of each customer, spread by location.
One solution would be to use a correlated subquery in the on clause of the relevant join to filter on first customer visits. Assuming that column visits(visit_date) stores the date of each visit, you could do:
select
    count(c.id) visit_count,
    l.name as location
from locations l
left join visits v
    on l.location_name = v.location_visit_name
    -- keep only each customer's earliest visit
    and v.visit_date = (
        select min(v1.visit_date)
        from visits v1
        where v1.customer_id = v.customer_id
    )
left join customer c
    on c.id = v.customer_id
    and c.ref in ('emp', 'oth')
where l.active = true
group by l.location_name, l.id;
Side notes:
properly filtering on the first visit per customer avoids the need for distinct in the count() aggregate function
table aliases make the query more concise and easier to understand; I recommend using them in all queries
the filter on customer(ref) is better placed in the on clause of the join than inside a conditional count expression
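To see the first-visit filter at work, here is a minimal sketch that runs the same join shape against an in-memory SQLite database through Python. The sample rows, the `loc1`-style location keys, and the `visit_date` column are assumptions made for this example (the question does not show the real schema), and `active = 1` replaces `active = true` because of SQLite's integer booleans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table locations (id integer primary key, name text,
                            location_name text, active integer);
    create table customer (id integer primary key, ref text);
    create table visits (customer_id integer, location_visit_name text,
                         visit_date text);

    insert into locations values
        (1, 'Loc 1', 'loc1', 1), (2, 'Loc 2', 'loc2', 1), (3, 'Loc 3', 'loc3', 1);
    insert into customer values (1, 'emp'), (2, 'oth'), (3, 'web');
    -- Customers 1 and 2 each visit both locations; only the earliest visit should count.
    insert into visits values
        (1, 'loc1', '2024-01-01'), (1, 'loc2', '2024-01-05'),
        (2, 'loc2', '2024-01-02'), (2, 'loc1', '2024-01-03'),
        (3, 'loc1', '2024-01-01');
""")

first_visit_counts = conn.execute("""
    select count(c.id) as visit_count, l.name as location
    from locations l
    left join visits v
        on l.location_name = v.location_visit_name
        and v.visit_date = (select min(v1.visit_date)
                            from visits v1
                            where v1.customer_id = v.customer_id)
    left join customer c
        on c.id = v.customer_id
        and c.ref in ('emp', 'oth')
    where l.active = 1
    group by l.location_name, l.id
    order by l.id
""").fetchall()

print(first_visit_counts)  # [(1, 'Loc 1'), (1, 'Loc 2'), (0, 'Loc 3')]
```

Customer 3 has a first visit at Loc 1 but a ref of 'web', so the second left join yields NULL for that row and `count(c.id)` skips it. Loc 3 has no visits at all and still appears with a count of 0.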
Try moving the CASE WHEN condition into the WHERE clause:
SELECT COUNT(distinct customer.id) as visit_count
     , locations.name as location
FROM locations
LEFT JOIN visits ON locations.location_name = visits.location_visit_name
LEFT JOIN customer ON customer.id = visits.customer_id
WHERE locations.active = true
  AND customer.ref IN ('emp', 'oth')
GROUP BY locations.location_name;

How to select vehicle counts group by region using postgres

I am new to postgres.
My postgres table is named Vehicle and has the following columns:
1. ID
2. name
3. wheel (2, 3, 4, 6, 8), i.e. two-wheelers, four-wheelers and so on
4. region ('hyderabad', 'mumbai', 'delhi', ...)
5. polluted ('yes', 'no')
My question is how to select the count of polluted 4-wheeler vehicles, grouped by region.
Expected Output
hyderabad -> 4
mumbai -> 3
delhi -> 8,...
Ideally you should have a regions table somewhere which contains all regions. Assuming this, you could write the following query:
SELECT
    r.region,
    COALESCE(v.cnt, 0) AS count
FROM regions r
LEFT JOIN
(
    SELECT region, COUNT(*) cnt
    FROM Vehicle
    WHERE wheel = 4 AND polluted = 'yes'
    GROUP BY region
) v
    ON r.region = v.region;
If you only have a Vehicle table, which is bad database design, then we can try the following query:
SELECT
    region,
    SUM(CASE WHEN wheel = 4 AND polluted = 'yes' THEN 1 ELSE 0 END) AS count
FROM Vehicle
GROUP BY region;
This is inefficient, but at least it would let you report every region even if it has no matching records.
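The conditional-aggregation variant can be tried on a few made-up rows. This is a minimal sketch using an in-memory SQLite database through Python rather than PostgreSQL; the vehicle names and row values are invented for illustration, and the alias `cnt` is used in place of `count`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Vehicle (ID integer primary key, name text, wheel integer,
                          region text, polluted text);
    insert into Vehicle values
        (1, 'car a', 4, 'hyderabad', 'yes'),
        (2, 'car b', 4, 'hyderabad', 'yes'),
        (3, 'bike c', 2, 'hyderabad', 'yes'),
        (4, 'car d', 4, 'mumbai', 'no'),
        (5, 'car e', 4, 'delhi', 'yes');
""")

# Every region is kept by the GROUP BY; the CASE expression decides what gets counted.
polluted_four_wheelers = conn.execute("""
    select region,
           sum(case when wheel = 4 and polluted = 'yes' then 1 else 0 end) as cnt
    from Vehicle
    group by region
    order by region
""").fetchall()

print(polluted_four_wheelers)  # [('delhi', 1), ('hyderabad', 2), ('mumbai', 0)]
```

Mumbai still shows up with 0 because its only vehicle fails the condition inside the CASE expression instead of being filtered out by a WHERE clause.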

OrientDB Traverse Sum and Group By Top-Most Record

We have Orders that include "caused_order" edges from Order to Order because friends can refer other friends to make purchases. We know from the links we generate for the friends that Order ID 42 caused Order ID 47, so we create a "caused_order" edge between the two Order vertices.
We're looking to identify the people that are generating the most referral business. Right now we just loop through in C# and figure it out because our datasets are relatively small. But I'd like to figure out if there's a way to use the Traverse SQL to accomplish this instead.
The problem I'm running into is getting an accurate count/sum for each Original Order ID.
Consider the following scenario:
Order 42 caused four other Orders, including Order 47. Order 47 caused 2 additional Orders. And Order 51, unrelated to 42 or 47, caused 3 Orders.
I can run the following SQL to get the best referrers for this specific {ProductId}:
select in_caused_order[0].id as OrderID, count(*) as ReferCount, sum(amount) as ReferSum
from ( traverse out('caused_order') from Order )
where out_includes.id = '{ProductId}' and $depth >= 1
group by in_caused_order[0].id
EDIT: the schema is a bit more complex than this, I was just including the out_includes WHERE clause to show that there's a bit of filtering of the Orders. But it's a bit like:
Product(V) <-- includes(E) <-- Order(V) --> caused_order(E) --> Order(V)
(the Order vertex has "amount" as a property, which stores the money spent and is being SUM'd in the SELECT, along with a few fields like date which aren't important)
But that will result in something like:
OrderID | ReferCount | ReferSum
42 | 4 | 525
47 | 2 | 130
51 | 3 | 250
Except that's not quite right, is it? Because Order 42 also technically caused 47's two orders. So we'd want to see something like:
OrderID | ReferCount | ReferSum | ExtendedCount | ExtendedSum
42 | 4 | 525 | 2 | 130
47 | 2 | 130 | 0 | 0
51 | 3 | 250 | 0 | 0
I recognize that the two "Extended" count/sum columns might be tricky. We might have to run the query twice, once with $depth = 1, and again with $depth > 1, and then assemble the results of those two queries in C#, which is fine.
But I can't even figure out how to get the overall total calculated correctly. The first step would even be to see something like:
OrderID | ReferCount | ReferSum
42 | 6 | 635 <-- includes its 4 orders + 47's 2 orders
47 | 2 | 130
51 | 3 | 250
And since this can be n-levels deep, it's not like I can somehow just do in_caused_order.in_caused_order.in_caused_order in the SQL, I don't know how many deep that will go. Order 83 could be caused by Order 47, and Order 105 could be caused by Order 83, and so on.
Any help would be much appreciated. Or maybe the answer is, Traverse can't handle this, and we'll have to figure something else out entirely.
I tried your use case. The following is my test data:
create class caused_order extends e
create class Order extends v
create property Order.id integer
create property Order.amount integer
begin
create vertex Order set id=1 ,amount=1
create vertex Order set id=2 ,amount=5
create vertex Order set id=3 ,amount=11
create vertex Order set id=4 ,amount=23
create vertex Order set id=5 ,amount=31
create vertex Order set id=6 ,amount=49
create vertex Order set id=7 ,amount=4
create vertex Order set id=8 ,amount=74
create vertex Order set id=9 ,amount=87
create edge caused_order from (select from Order where id=1) to (select from Order where id=2)
create edge caused_order from (select from Order where id=1) to (select from Order where id=3)
create edge caused_order from (select from Order where id=2) to (select from Order where id=4)
create edge caused_order from (select from Order where id=2) to (select from Order where id=5)
create edge caused_order from (select from Order where id=6) to (select from Order where id=7)
create edge caused_order from (select from Order where id=6) to (select from Order where id=8)
commit retry 20
Then I wrote these 2 queries to show each order with its ReferSum and ReferCount.
First one including head order in the count:
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (traverse out('caused_order') from $parent.$current) group by Amount)
Second one, excluding the head:
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (select from (traverse out('caused_order') from $parent.$current) where $depth>=1) group by Amount)
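As a cross-check for what the "excluding the head" query is meant to compute, here is a minimal plain-Python sketch (not OrientDB) that walks the caused_order edges of the test data above to any depth and totals the descendants of each order. The dictionaries mirror the create vertex and create edge statements; the function name `referral_totals` is hypothetical.

```python
# Order id -> amount, copied from the create vertex statements.
amounts = {1: 1, 2: 5, 3: 11, 4: 23, 5: 31, 6: 49, 7: 4, 8: 74, 9: 87}

# Order id -> orders it directly caused, copied from the create edge statements.
caused = {1: [2, 3], 2: [4, 5], 6: [7, 8]}

def referral_totals(order_id):
    """Return (count, amount sum) over all orders reachable from order_id, excluding itself."""
    count, total = 0, 0
    stack = list(caused.get(order_id, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles or shared descendants
            continue
        seen.add(current)
        count += 1
        total += amounts[current]
        stack.extend(caused.get(current, []))  # continue n levels deep
    return count, total

for order_id in sorted(amounts):
    print(order_id, referral_totals(order_id))
```

For this data, order 1 gets (4, 70) because it is credited with orders 2 and 3 as well as order 2's own referrals 4 and 5, while order 2 on its own gets (2, 54).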
EDIT
I've added this to my data:
create class includes extends E
create class Product extends V
create property Product.id Integer
create vertex Product set id = 101
create vertex Product set id = 102
create vertex Product set id = 103
create vertex Product set id = 104
create edge includes from (select from Order where id=1) to (select from Product where id=101)
create edge includes from (select from Order where id=2) to (select from Product where id=102)
create edge includes from (select from Order where id=3) to (select from Product where id=103)
create edge includes from (select from Order where id=4) to (select from Product where id=104)
create edge includes from (select from Order where id=5) to (select from Product where id=101)
create edge includes from (select from Order where id=6) to (select from Product where id=102)
create edge includes from (select from Order where id=7) to (select from Product where id=103)
create edge includes from (select from Order where id=8) to (select from Product where id=104)
create edge includes from (select from Order where id=9) to (select from Product where id=101)
create edge includes from (select from Order where id=1) to (select from Product where id=102)
create edge includes from (select from Order where id=1) to (select from Product where id=103)
create edge includes from (select from Order where id=2) to (select from Product where id=104)
These are the modified queries (I added while out('includes').id contains {prodID_number} to the traverse and where out('includes').id contains {prodID_number} to the outer query):
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (traverse out('caused_order') from $parent.$current while out('includes').id contains 102) group by Amount)
where out('includes').id contains 102
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (traverse out('caused_order') from $parent.$current while out('includes').id contains 102) where $depth >= 1 group by Amount)
where out('includes').id contains 102