How can I find distinct accountid's with no CEO contact? - tsql

I have a contact table which has unique contactid as primary key, but may or may not have multiple records for the same account. My task is to return a list of accountid's and account_name's that do not have any contact with a CEO designation.
I agree this should be simple, and I freely admit to being dumb. What I did was create a temp table with all unique accountid's, then flag the ones that did have the CEO job title, then do a select distinct accountid, account_name where the flag is null, group by, etc. That worked quickly and correctly, but is pretty lame. I frequently write lame scripts which work great but are shamefully elementary, mainly because that's how I think.
There must be a nice, elegant way to do this, so maybe I can learn something. Can someone help out? Thanks heaps in advance for your help! (p.s. Using SQL Server 2014)
Sample data below, in which companies 2,3,5 do not have a CEO:
create table contact (
contactid int,
accountid int,
account_name varchar(10),
designation varchar(5));
insert into contact
values
(1, 100, 'COMPANY1', 'MGR'),
(2, 100, 'COMPANY1', 'MGR'),
(3, 100, 'COMPANY1', 'VP'),
(4, 100, 'COMPANY1', 'CEO'),
(5, 200, 'COMPANY2', 'COO'),
(6, 200, 'COMPANY2', 'CIO'),
(7, 200, 'COMPANY2', 'VP'),
(8, 200, 'COMPANY2', 'VP'),
(9, 300, 'COMPANY3', 'MGR'),
(10, 400, 'COMPANY4', 'MGR'),
(11, 400, 'COMPANY4', 'MGR'),
(12, 400, 'COMPANY4', 'CEO'),
(13, 500, 'COMPANY5', 'VP'),
(14, 500, 'COMPANY5', 'VP'),
(15, 500, 'COMPANY5', 'VP'),
(16, 500, 'COMPANY5', 'VP');

For something like this, I usually just go with a self-join filtered to NULL (an anti-join), like this:
SELECT DISTINCT
C.accountid
FROM contact C
LEFT JOIN contact CEO
ON CEO.accountid = C.accountid
AND CEO.designation = 'CEO'
WHERE
CEO.contactid IS NULL
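The anti-join above can be sanity-checked end to end. A minimal sketch follows, using SQLite's in-memory database as a stand-in for SQL Server and a trimmed copy of the sample data (one row per account/designation is enough to show the behavior):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (contactid INT, accountid INT,
                      account_name TEXT, designation TEXT);
INSERT INTO contact VALUES
 (1,100,'COMPANY1','MGR'),(4,100,'COMPANY1','CEO'),
 (5,200,'COMPANY2','COO'),(9,300,'COMPANY3','MGR'),
 (12,400,'COMPANY4','CEO'),(13,500,'COMPANY5','VP');
""")

# Left-join every contact to any CEO row of the same account; accounts
# where the join found no match (ceo.contactid IS NULL) have no CEO.
rows = conn.execute("""
SELECT DISTINCT c.accountid, c.account_name
FROM contact c
LEFT JOIN contact ceo
  ON ceo.accountid = c.accountid
 AND ceo.designation = 'CEO'
WHERE ceo.contactid IS NULL
ORDER BY c.accountid
""").fetchall()
print(rows)  # [(200, 'COMPANY2'), (300, 'COMPANY3'), (500, 'COMPANY5')]
```

Selecting account_name alongside accountid, as done here, matches what the question actually asked for.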

Something like this?
WITH CEO_IDs AS
(
SELECT DISTINCT accountID
FROM contact
WHERE designation='CEO'
)
SELECT DISTINCT accountID
FROM contact
WHERE accountid NOT IN(SELECT x.accountID FROM CEO_IDs AS x)
The CTE finds all accountIDs that do have a CEO, and uses this as a negative filter to get all accountIDs that do not have a CEO. (Note that NOT IN misbehaves if the subquery can return NULLs; here accountid is never NULL, so it is safe.)
You'd get the same with a sub-select:
SELECT DISTINCT accountID
FROM contact
WHERE accountid NOT IN
(SELECT x.accountID
FROM contact AS x
WHERE x.designation='CEO')
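A third common formulation is NOT EXISTS, which avoids the NULL pitfalls of NOT IN entirely. A sketch, again using SQLite in place of SQL Server with a trimmed copy of the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (contactid INT, accountid INT,
                      account_name TEXT, designation TEXT);
INSERT INTO contact VALUES
 (1,100,'COMPANY1','MGR'),(4,100,'COMPANY1','CEO'),
 (5,200,'COMPANY2','COO'),(9,300,'COMPANY3','MGR'),
 (12,400,'COMPANY4','CEO'),(13,500,'COMPANY5','VP');
""")

# NOT EXISTS keeps an account's rows only if no correlated CEO row exists.
rows = conn.execute("""
SELECT DISTINCT c.accountid, c.account_name
FROM contact c
WHERE NOT EXISTS (SELECT 1 FROM contact x
                  WHERE x.accountid = c.accountid
                    AND x.designation = 'CEO')
ORDER BY c.accountid
""").fetchall()
print(rows)  # [(200, 'COMPANY2'), (300, 'COMPANY3'), (500, 'COMPANY5')]
```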

Related

Window functions partition and order without subquery

Given a simple table like so in postgres:
CREATE TABLE products (
product_id serial PRIMARY KEY,
group_id INT NOT NULL,
price DECIMAL (11, 2)
);
INSERT INTO products (product_id, group_id,price)
VALUES
(1, 1, 200),
(2, 1, 400),
(3, 1, 500),
(4, 1, 900),
(5, 2, 1200),
(6, 2, 700),
(7, 2, 700),
(8, 2, 800),
(9, 3, 700),
(10, 3, 150),
(11, 3, 200);
How do I use window functions to query the group_id and the avg_price, ordered by avg_price? Currently I can only get the result via a subquery:
select * from (
select
distinct group_id,
avg(price) over (partition by group_id) avg_price
from products)
a order by avg_price desc;
But I believe there are more elegant solutions to this.
Window functions can be used in the ORDER BY clause, in addition to the SELECT clause, so the following query is valid:
SELECT
group_id,
AVG(price) OVER (PARTITION BY group_id) avg_price
FROM products
ORDER BY
AVG(price) OVER (PARTITION BY group_id);
However, given that you seem to want to use DISTINCT, I suspect that what you really want here is a GROUP BY query:
SELECT
group_id,
AVG(price) AS avg_price
FROM products
GROUP BY
group_id
ORDER BY
AVG(price);
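Both formulations really do produce identical rows; a quick check using SQLite (window functions require SQLite 3.25+, bundled with all recent Python versions) on the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE products (product_id INT, group_id INT, price REAL);
INSERT INTO products VALUES
 (1,1,200),(2,1,400),(3,1,500),(4,1,900),
 (5,2,1200),(6,2,700),(7,2,700),(8,2,800),
 (9,3,700),(10,3,150),(11,3,200);
""")

# GROUP BY: one row per group, ordered by the aggregate directly.
grouped = conn.execute("""
SELECT group_id, AVG(price) AS avg_price
FROM products
GROUP BY group_id
ORDER BY avg_price DESC
""").fetchall()

# DISTINCT + window function: same rows, computed per row then deduplicated.
windowed = conn.execute("""
SELECT DISTINCT group_id,
       AVG(price) OVER (PARTITION BY group_id) AS avg_price
FROM products
ORDER BY avg_price DESC
""").fetchall()

print(grouped)  # [(2, 850.0), (1, 500.0), (3, 350.0)]
```

The GROUP BY form is usually preferable here: it collapses the rows once instead of computing a per-row window and then deduplicating.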

T-SQL Grouping with LESS THAN {date} that breaks off on each occurrence of date

I am struggling with creating a grouping using LESS THAN that breaks off on each date for the parent row. I have created a contrived example to explain the data and what I would like out as a result:
CREATE TABLE [dbo].[CustomerOrderPoints](
[CustomerID] [int] NOT NULL,
[OrderPoints] [int] NOT NULL,
[OrderPointsExpiry] [date] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerOrderPointsUsed](
[CustomerID] [int] NOT NULL,
[OrderPointsUsed] [int] NOT NULL,
[OrderPointsUseDate] [date] NOT NULL
) ON [PRIMARY]
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry]) VALUES (10, 200, CAST(N'2018-03-18' AS Date))
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry]) VALUES (10, 100, CAST(N'2018-04-18' AS Date))
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry]) VALUES (20, 120, CAST(N'2018-05-10' AS Date))
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry]) VALUES (30, 75, CAST(N'2018-02-10' AS Date))
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry]) VALUES (30, 60, CAST(N'2018-04-24' AS Date))
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry]) VALUES (30, 90, CAST(N'2018-06-25' AS Date))
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry]) VALUES (40, 100, CAST(N'2018-06-13' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (10, 15, CAST(N'2018-02-10' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (10, 30, CAST(N'2018-02-17' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (10, 25, CAST(N'2018-03-16' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (10, 45, CAST(N'2018-04-10' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (20, 10, CAST(N'2018-02-08' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (20, 70, CAST(N'2018-04-29' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (20, 25, CAST(N'2018-05-29' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (30, 60, CAST(N'2018-02-05' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (30, 30, CAST(N'2018-03-13' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate]) VALUES (40, 120, CAST(N'2018-06-10' AS Date))
Customers gain points, which have an expiry. We have a CustomerOrderPoints table which shows OrderPoints for customers together with the Expiry date for the points. A Customer may have many rows in this table.
We then also have the CustomerOrderPointsUsed table which shows the points that have been used and when they were used by a Customer.
I am trying to get a grouping of Customer data which will show OrderPoints used as a group against each customer but, separated on the ExpiryDate. The picture below shows an example of the Grouped Results that I would like to obtain.
We have bad but working code that was developed using a recursive row-by-row (RBAR) method, but it is very slow. I have tried a number of different set-based grouping approaches, but cannot get the final LESS THAN grouping that takes the previous expiry dates into account.
This DB is on SQL Server 2008 R2. Ideally I am looking for a solution that will work with SQL Server 2008 R2, but I welcome options for later versions, as we may need to move this particular DB to solve this problem.
I have tried using a combination of RANK, DENSE_RANK and ROW_NUMBER (for later versions) and LAG, but have not been able to get anything working that can be built upon.
Is there a way to use SET based T-SQL to achieve this?
A caveat first: this ignores the question I raised in the comments above, and just allocates each usage row to the expiry date on or after the use date. You would need to rethink this if one use must be split among multiple expiry dates.
First, allocate an expiry date to each PointsUsed row. This is done by joining to all OrderPoints rows with an expiry date on or after the UseDate, then taking the minimum date.
Then the second query reports all OrderPoints rows, joining to the first query by the allocated expiry date, which has all the data needed.
WITH allocatedPoints as
(
Select U.CustomerID, U.OrderPointsUsed, MIN(P.OrderPointsExpiry) as OrderPointsExpiry
from CustomerOrderPointsUsed U
inner join CustomerOrderPoints P on P.CustomerID = U.CustomerID and P.OrderPointsExpiry >= U.OrderPointsUseDate
GROUP BY U.CustomerID, U.OrderPointsUseDate, U.OrderPointsUsed
)
Select P.CustomerID, P.OrderPoints, P.OrderPointsExpiry,
ISNULL(SUM(AP.OrderPointsUsed), 0) as used,
P.OrderPoints - ISNULL(SUM(AP.OrderPointsUsed), 0) as remaining
from CustomerOrderPoints P
left outer join allocatedPoints AP on AP.CustomerID = P.CustomerID and AP.OrderPointsExpiry = P.OrderPointsExpiry
GROUP BY P.CustomerID, P.OrderPoints, P.OrderPointsExpiry
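The allocation logic can be sanity-checked on a small port of the schema. The sketch below uses SQLite in place of SQL Server (IFNULL instead of ISNULL, ISO date strings so MIN compares correctly) and only Customer 10's rows from the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerOrderPoints
 (CustomerID INT, OrderPoints INT, OrderPointsExpiry TEXT);
CREATE TABLE CustomerOrderPointsUsed
 (CustomerID INT, OrderPointsUsed INT, OrderPointsUseDate TEXT);
INSERT INTO CustomerOrderPoints VALUES
 (10,200,'2018-03-18'),(10,100,'2018-04-18');
INSERT INTO CustomerOrderPointsUsed VALUES
 (10,15,'2018-02-10'),(10,30,'2018-02-17'),
 (10,25,'2018-03-16'),(10,45,'2018-04-10');
""")

# Each use is allocated to the earliest expiry on/after its use date,
# then uses are summed per (customer, expiry) against the points rows.
rows = conn.execute("""
WITH allocatedPoints AS (
  SELECT U.CustomerID, U.OrderPointsUsed,
         MIN(P.OrderPointsExpiry) AS OrderPointsExpiry
  FROM CustomerOrderPointsUsed U
  JOIN CustomerOrderPoints P
    ON P.CustomerID = U.CustomerID
   AND P.OrderPointsExpiry >= U.OrderPointsUseDate
  GROUP BY U.CustomerID, U.OrderPointsUseDate, U.OrderPointsUsed
)
SELECT P.CustomerID, P.OrderPointsExpiry, P.OrderPoints,
       IFNULL(SUM(AP.OrderPointsUsed), 0)                 AS used,
       P.OrderPoints - IFNULL(SUM(AP.OrderPointsUsed), 0) AS remaining
FROM CustomerOrderPoints P
LEFT JOIN allocatedPoints AP
  ON AP.CustomerID = P.CustomerID
 AND AP.OrderPointsExpiry = P.OrderPointsExpiry
GROUP BY P.CustomerID, P.OrderPoints, P.OrderPointsExpiry
ORDER BY P.OrderPointsExpiry
""").fetchall()
print(rows)
# [(10, '2018-03-18', 200, 70, 130), (10, '2018-04-18', 100, 45, 55)]
```

The three uses before 2018-03-18 (15 + 30 + 25 = 70) land on the first expiry bucket, and the 45-point use lands on the second, matching the "less than each expiry date" grouping the question describes.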

Performance Issue with finding recent date of each group and joining to all records

I have following tables:
CREATE TABLE person (
id INTEGER NOT NULL,
name TEXT,
CONSTRAINT person_pkey PRIMARY KEY(id)
);
INSERT INTO person ("id", "name")
VALUES
(1, E'Person1'),
(2, E'Person2'),
(3, E'Person3'),
(4, E'Person4'),
(5, E'Person5'),
(6, E'Person6');
CREATE TABLE person_book (
id INTEGER NOT NULL,
person_id INTEGER,
book_id INTEGER,
receive_date DATE,
expire_date DATE,
CONSTRAINT person_book_pkey PRIMARY KEY(id)
);
/* Data for the 'person_book' table (Records 1 - 9) */
INSERT INTO person_book ("id", "person_id", "book_id", "receive_date", "expire_date")
VALUES
(1, 1, 1, E'2016-01-18', NULL),
(2, 1, 2, E'2016-02-18', E'2016-10-18'),
(3, 1, 4, E'2016-03-18', E'2016-12-18'),
(4, 2, 3, E'2017-02-18', NULL),
(5, 3, 5, E'2015-02-18', E'2016-02-23'),
(6, 4, 34, E'2016-12-18', E'2018-02-18'),
(7, 5, 56, E'2016-12-28', NULL),
(8, 5, 34, E'2018-01-19', E'2018-10-09'),
(9, 5, 57, E'2018-06-09', E'2018-10-09');
CREATE TABLE book (
id INTEGER NOT NULL,
type TEXT,
CONSTRAINT book_pkey PRIMARY KEY(id)
) ;
/* Data for the 'book' table (Records 1 - 8) */
INSERT INTO book ("id", "type")
VALUES
( 1, E'Btype1'),
( 2, E'Btype2'),
( 3, E'Btype3'),
( 4, E'Btype4'),
( 5, E'Btype5'),
(34, E'Btype34'),
(56, E'Btype56'),
(67, E'Btype67');
My query should list the names of all persons; for persons whose most recently received book is one of the given types (book_id IN (2, 4, 34, 56, 67)), it should display the book type and expire date. If a person hasn't received such a book type, it should display blanks for book type and expire date.
My query looks like this:
SELECT p.name,
pb.expire_date,
b.type
FROM
(SELECT p.id AS person_id, MAX(pb.receive_date) recent_date
FROM
Person p
JOIN person_book pb ON pb.person_id = p.id
WHERE pb.book_id IN (2, 4, 34, 56, 67)
GROUP BY p.id
)tmp
JOIN person_book pb ON pb.person_id = tmp.person_id
AND tmp.recent_date = pb.receive_date AND pb.book_id IN
(2, 4, 34, 56, 67)
JOIN book b ON b.id = pb.book_id
RIGHT JOIN Person p ON p.id = pb.person_id
The (correct) result:
name | expire_date | type
---------+-------------+---------
Person1 | 2016-12-18 | Btype4
Person2 | |
Person3 | |
Person4 | 2018-02-18 | Btype34
Person5 | 2018-10-09 | Btype34
Person6 | |
The query works fine but since I'm right joining a small table with a huge one, it's slow. Is there any efficient way of rewriting this query?
My local PostgreSQL version is 9.3.18, but the query should work on version 8.4 as well, since that's our production version.
Problems with your setup
My local PostgreSQL version is 9.3.18, but the query should work on version 8.4 as well, since that's our production version.
That makes two major problems before even looking at the query:
Postgres 8.4 is just too old, especially for production. It reached EOL in July 2014: no more security upgrades, hopelessly outdated. Urgently consider upgrading to a current version.
It's a loaded footgun to use very different versions for development and production; it causes confusion and errors that go undetected. We have seen more than one desperate request here on SO stemming from this folly.
Better query
This equivalent should be substantially simpler and faster (works in pg 8.4, too):
SELECT p.name, pb.expire_date, b.type
FROM (
SELECT DISTINCT ON (person_id)
person_id, book_id, expire_date
FROM person_book
WHERE book_id IN (2, 4, 34, 56, 67)
ORDER BY person_id, receive_date DESC NULLS LAST
) pb
JOIN book b ON b.id = pb.book_id
RIGHT JOIN person p ON p.id = pb.person_id;
To optimize read performance, this partial multicolumn index with matching sort order would be perfect:
CREATE INDEX ON person_book (person_id, receive_date DESC NULLS LAST)
WHERE book_id IN (2, 4, 34, 56, 67);
In modern Postgres versions (9.2 or later) you might append book_id, expire_date to the index columns to get index-only scans. See:
How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?
About DISTINCT ON:
Select first row in each GROUP BY group?
About DESC NULLS LAST:
PostgreSQL sort by datetime asc, null first?
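For engines that lack DISTINCT ON, the same "latest qualifying row per person" step can be expressed with ROW_NUMBER(). A sketch in SQLite (window functions need 3.25+; RIGHT JOIN is rewritten as LEFT JOIN from person, and only three persons from the sample data are loaded):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INT, name TEXT);
CREATE TABLE person_book (id INT, person_id INT, book_id INT,
                          receive_date TEXT, expire_date TEXT);
CREATE TABLE book (id INT, type TEXT);
INSERT INTO person VALUES (1,'Person1'),(2,'Person2'),(5,'Person5');
INSERT INTO person_book VALUES
 (2,1,2,'2016-02-18','2016-10-18'),
 (3,1,4,'2016-03-18','2016-12-18'),
 (7,5,56,'2016-12-28',NULL),
 (8,5,34,'2018-01-19','2018-10-09');
INSERT INTO book VALUES (2,'Btype2'),(4,'Btype4'),(34,'Btype34'),(56,'Btype56');
""")

# ROW_NUMBER() keeps the most recently received qualifying book per
# person, mirroring what DISTINCT ON does in Postgres.
rows = conn.execute("""
SELECT p.name, pb.expire_date, b.type
FROM person p
LEFT JOIN (
  SELECT person_id, book_id, expire_date,
         ROW_NUMBER() OVER (PARTITION BY person_id
                            ORDER BY receive_date DESC) AS rn
  FROM person_book
  WHERE book_id IN (2, 4, 34, 56, 67)
) pb ON pb.person_id = p.id AND pb.rn = 1
LEFT JOIN book b ON b.id = pb.book_id
ORDER BY p.id
""").fetchall()
print(rows)
```

Persons with no qualifying book (Person2 here) still appear, with NULL type and expire_date, just as in the RIGHT JOIN version.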

Create a GROUP BY query to show the latest row

So my tables are:
user_msgs: http://sqlfiddle.com/#!9/7d6a9
token_msgs: http://sqlfiddle.com/#!9/3ac0f
There are only these 4 users, as listed. When a user sends a message to another user, the query checks whether a conversation between those 2 users has already started, by checking the token_msgs table's from_id and to_id; if no token exists, it creates a token and uses that in the user_msgs table. So the token is a unique field in these 2 tables.
Now, I want to list the users with whom user1 has started the conversation. So if from_id or to_id include 1 those conversation should be listed.
There are multiple rows for conversations in the user_msgs table for same users.
I think I need to use group_concat, but I'm not sure. I am trying to build a query that does this and shows the latest message of each conversation at the top, hence ORDER BY time DESC:
SELECT * FROM (SELECT * FROM user_msgs ORDER BY time DESC) as temp_messages GROUP BY token
Please help in building the query.
Thanks.
CREATE TABLE `token_msgs` (
`id` int(11) NOT NULL,
`from_id` int(100) NOT NULL,
`to_id` int(100) NOT NULL,
`token` varchar(50) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Dumping data for table `token_msgs`
--
INSERT INTO `token_msgs` (`id`, `from_id`, `to_id`, `token`) VALUES
(1, 1, 2, '1omcda84om2'),
(2, 1, 3, '1omd0666om3'),
(3, 4, 1, '4om6713bom1'),
(4, 3, 4, '3om0e1abom4');
---
CREATE TABLE `user_msgs` (
`id` int(11) NOT NULL,
`token` varchar(50) NOT NULL,
`from_id` int(50) NOT NULL,
`to_id` int(50) NOT NULL,
`message` text NOT NULL,
`time` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Dumping data for table `user_msgs`
--
INSERT INTO `user_msgs` (`id`, `token`, `from_id`, `to_id`, `message`, `time`) VALUES
(1, '1omcda84om2', 1, 2, '1 => 2\r\nCan I have your picture so I can show Santa what I want for Christmas?', '2016-08-14 22:50:34'),
(2, '1omcda84om2', 2, 1, 'Makeup tip: You\'re not in the circus.\r\n2=>1', '2016-08-14 22:51:26'),
(3, '1omd0666om3', 1, 3, 'Behind every fat woman there is a beautiful woman. No seriously, your in the way. 1=>3', '2016-08-14 22:52:08'),
(4, '1omd0666om3', 3, 1, 'Me: Siri, why am I alone? Siri: *opens front facing camera*', '2016-08-14 22:53:24'),
(5, '1omcda84om2', 1, 2, 'I know milk does a body good, but damn girl, how much have you been drinking? 1 => 2', '2016-08-14 22:54:36'),
(6, '4om6713bom1', 4, 1, 'Hi, Im interested in your profile. Please send your contact number and I will call you.', '2016-08-15 00:18:11'),
(7, '3om0e1abom4', 3, 4, 'Girl you\'re like a car accident, cause I just can\'t look away. 3=>4', '2016-08-15 00:42:57'),
(8, '3om0e1abom4', 3, 4, 'Hola!! \r\n3=>4', '2016-08-15 00:43:34'),
(9, '1omd0666om3', 3, 1, 'Sometext from 3=>1', '2016-08-15 13:53:54'),
(10, '3om0e1abom4', 3, 4, 'More from 3->4', '2016-08-15 13:54:46');
Let's try this (on fiddle):
SELECT *
FROM (SELECT * FROM user_msgs
WHERE from_id = 1 OR to_id = 1
ORDER BY id DESC
) main
GROUP BY from_id + to_id
ORDER BY id DESC
One thing to mention: GROUP BY from_id + to_id works because the sum is the same for both directions of a conversation (from 1 to 3 is the same as from 3 to 1), so no extra table is needed, which makes it easier to maintain. (Beware, though, that different user pairs can collide on the same sum, e.g. 1+4 = 2+3.)
UPDATE:
Because GROUPing like this sometimes behaves unpredictably in MySQL, I've created a new approach to this problem:
SELECT
a.*
FROM user_msgs a
LEFT JOIN user_msgs b
ON ((b.`from_id` = a.`from_id` AND b.`to_id` = a.`to_id`)
OR (b.`from_id` = a.`to_id` AND b.`to_id` = a.`from_id`))
AND a.`id` < b.`id`
WHERE (a.from_id = 1 OR a.to_id = 1)
AND b.`id` IS NULL
ORDER BY a.id DESC
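The anti-join pattern above ("keep a message only if no later message exists in the same unordered pair") is portable across engines. A minimal sketch in SQLite with hypothetical messages, loosely mirroring the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_msgs (id INT, from_id INT, to_id INT, message TEXT);
INSERT INTO user_msgs VALUES
 (1,1,2,'hi'),(2,2,1,'hello'),(5,1,2,'latest 1<->2'),
 (3,1,3,'hey'),(9,3,1,'latest 1<->3'),
 (7,3,4,'not user 1');
""")

# Keep row a only if no row b with a higher id exists in the same
# conversation, matching either direction of the (from_id, to_id) pair.
rows = conn.execute("""
SELECT a.id, a.from_id, a.to_id, a.message
FROM user_msgs a
LEFT JOIN user_msgs b
  ON ((b.from_id = a.from_id AND b.to_id = a.to_id)
   OR (b.from_id = a.to_id  AND b.to_id = a.from_id))
 AND a.id < b.id
WHERE (a.from_id = 1 OR a.to_id = 1)
  AND b.id IS NULL
ORDER BY a.id DESC
""").fetchall()
print(rows)  # [(9, 3, 1, 'latest 1<->3'), (5, 1, 2, 'latest 1<->2')]
```

Unlike the GROUP BY version, this never relies on non-deterministic row selection within a group, so it returns the same rows on any SQL engine.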

Postgres how can I merge 2 separate select queries into 1

I am using postgres 9.4 and I would like to merge 2 separate queries into one statement. I been looking at this How to merge these queries into 1 using subquery post but still can't figure out how to work it. These 2 queries do work independently. Here they are
# 1: select * from votes v where v.user_id=32 and v.stream_id=130;
#2: select city,state,post,created_on,votes,id as Voted from streams
where latitudes >=28.0363 AND 28.9059>= latitudes order by votes desc limit 5 ;
I would like query #2 to be limited to 5 rows; however, I don't want query #1 to count against that limit, so up to 6 rows could be returned in total. This works like a suggestion engine: query #1 fetches the main thread and query #2 gives up to 5 suggestions, which are obviously located in a different table.
Having no model and data I simulated this problem with dummies of both in this SQL Fiddle.
CREATE TABLE votes
(
id smallint
, user_id smallint
);
CREATE TABLE streams
(
id smallint
, foo boolean
);
INSERT INTO votes
VALUES (1, 42), (2, 32), (3, 17), (4, 37), (5, 73), (6, 69), (7, 21), (8, 18), (9, 11), (10, 15), (11, 28);
INSERT INTO streams
VALUES (1, true), (2, true), (3, true), (4, true), (5, true), (6, true), (7, false), (8, false), (9, false), (10, false), (11, false);
SELECT
id
FROM
(SELECT id, 1 AS sort FROM votes WHERE user_id = 32) AS query_1
FULL JOIN (SELECT id FROM streams WHERE NOT foo LIMIT 5) AS query_2 USING (id)
ORDER BY
sort
LIMIT 6;
Also I have to point out that this isn't entirely my own work, but an adaptation of an answer I came across the other day. Maybe that is an approach here too.
So, what's going on? Column id stands for any column your tables and sub-queries have in common. I made up votes.user_id to have something to select on in the one sub-query, and streams.foo in the other.
As you demanded 6 rows at the most, I used the LIMIT clause twice: first in the sub-query, in case there is a huge number of rows in your table you don't want to select, and again in the outer query to finally restrict the number of rows. Fiddle about a little with the two limits, toggle WHERE foo and WHERE NOT foo, and you'll see why.
In the first sub-query I added a sort column, as is done in that answer. That's because I guess you want the result of the first sub-query always on top, too.
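The same "main row first, then up to 5 suggestions" idea can also be expressed with UNION ALL plus a sort column, which is handy on engines without FULL JOIN. A rough SQLite sketch (SQLite only gained FULL JOIN in 3.39) using dummy tables that loosely mimic the fiddle's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (id INT, user_id INT);
CREATE TABLE streams (id INT, foo INT);
INSERT INTO votes VALUES (1,42),(2,32),(3,17);
INSERT INTO streams VALUES (7,0),(8,0),(9,0),(10,0),(11,0),(12,0);
""")

# Branch 1 (the main row) gets sort = 1; branch 2 (the suggestions,
# capped at 5 inside its own sub-query) gets sort = 2. The outer
# LIMIT 6 then bounds the total.
rows = conn.execute("""
SELECT id FROM (
  SELECT id, 1 AS sort FROM votes WHERE user_id = 32
  UNION ALL
  SELECT id, 2 AS sort FROM (
    SELECT id FROM streams WHERE NOT foo ORDER BY id LIMIT 5
  )
)
ORDER BY sort, id
LIMIT 6
""").fetchall()
print(rows)  # [(2,), (7,), (8,), (9,), (10,), (11,)]
```

As with the FULL JOIN version, the per-branch limit guarantees the main row is never crowded out of the final 6.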