GROUP BY clause sees all VARCHAR fields as different - tsql

I have witnessed some strange behaviour while trying to GROUP BY a VARCHAR field.
Consider the following example, where I try to spot customers that have changed their name at least once in the past.
CREATE TABLE #CustomersHistory
(
Id INT IDENTITY(1,1),
CustomerId INT,
Name VARCHAR(200)
)
INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'BBB')
INSERT INTO #CustomersHistory VALUES (44, '444')
SELECT ch.CustomerId, count(ch.Name) AS cnt
FROM #CustomersHistory ch
GROUP BY ch.CustomerId HAVING count(ch.Name) != 1
Which oddly yields the following (as if 'AAA' from the first INSERT were different from the second one):
CustomerId  cnt   // (what I was expecting)
12          3     // 2
44          1     // 1
Is this behaviour specific to T-SQL?
Why does it behave in this rather counter-intuitive way?
How is it customary to overcome this limitation?
Note: This question is very similar to GROUP BY problem with varchar, where I didn't find an answer to the Why.
Side Note: Is it good practice to use HAVING count(ch.Name) != 1 instead of HAVING count(ch.Name) > 1 ?

COUNT(ch.Name) counts every non-NULL value in the group, regardless of whether the values are distinct. I think you want COUNT(DISTINCT ch.Name), which only counts unique names.
SELECT ch.CustomerId, count(DISTINCT ch.Name) AS cnt
FROM #CustomersHistory ch
GROUP BY ch.CustomerId HAVING count(DISTINCT ch.Name) > 1
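With the sample data above, only customer 12 has more than one distinct name, so the query should now return:
CustomerId  cnt
12          2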
For more information, take a look at the COUNT() article in Books Online.

Related

Set repeating IDs till first record repeats (bulk load csv file)

I have a file that I imported via bulk insert and I want to assign group IDs/sequences.
I would like to keep assigning the same ID until the first character of the first record ("A" in this example) appears again, then start a new group.
The challenge is how to set the IDs like this:
ID  data
1   A000abcefd
1   E00asoaskdaok
1   C000dasdasok
2   A100abcasds
2   E100aandas
2   C100adsokdas
Here is one way to do it, but given the limited info you provided I will make the following assumptions:
- The data in your table has some order to it. This obviously will not work if that is not the case. I used an ID; you use what you have.
- The first row in the table has the character you are looking for.
CREATE TABLE #tmp(ID int, [data] varchar(20))
INSERT INTO #tmp
VALUES
(1, 'A000abcefd'),
(2, 'E00asoaskdaok'),
(3, 'C000dasdasok'),
(4, 'A100abcasds'),
(5, 'E100aandas'),
(6, 'C100adsokdas')
DECLARE @CHAR varchar(1)

-- Grab the first character of the first row (by ID order); every reappearance of it starts a new group
SELECT @CHAR = (SELECT TOP 1 SUBSTRING([data],1,1) FROM #tmp ORDER BY ID)

-- Running count of rows starting with that character = the group/sequence number
SELECT SUM(CASE WHEN SUBSTRING([data],1,1) = @CHAR THEN 1 ELSE 0 END)
       OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SeqNum
      ,[data]
FROM #tmp
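With the six sample rows above, the windowed SUM increments on each row whose first character is 'A', so SeqNum should come out as 1, 1, 1, 2, 2, 2, matching the desired output in the question.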

How to use a declare statement to update a table

I have this Declare Statement
declare #ReferralLevelData table([Type of Contact] varchar(10));
insert into #ReferralLevelData values ('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel');
select (row_number() over (order by [Type of Contact]) % 3) +1 as [Referral ID]
,[Type of Contact]
from #ReferralLevelData
order by [Referral ID]
,[Type of Contact];
It does not insert into the table, so I feel this is not working as expected, i.e. it doesn't modify the table.
If it did work, I was hoping to modify the statement to make it an UPDATE.
At the moment the query just prints this result:
1 f2f
1 nf2f
1 Travel
2 f2f
2 nf2f
2 Travel
3 f2f
3 nf2f
3 Travel
EDIT:
I want to update the table to enter recurring data in groups of three.
I have a table of data; it is duplicated twice in the same table to make three sets.
Its "ReferenceID" is the primary key. I want to group the three rows that share the same ReferenceID and inject the three values "f2f", "nf2f" and "Travel" into the column called "Type", in any order, but ensure that each ReferenceID gets each of those values only once.
Do you mean the following?
declare #ReferralLevelData table(
[Referral ID] int,
[Type of Contact] varchar(10)
);
insert into #ReferralLevelData([Referral ID],[Type of Contact])
select
(row_number() over (order by [Type of Contact]) % 3) +1 as [Referral ID]
,[Type of Contact]
from
(
values ('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel')
) v([Type of Contact]);
If it suits you, you can also use the following query to generate the data:
select r.[Referral ID],ct.[Type of Contact]
from
(
values ('f2f'),('nf2f'),('Travel')
) ct([Type of Contact])
cross join
(
values (1),(2),(3)
) r([Referral ID]);
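If the goal from the EDIT is to update an existing table in place, here is a minimal sketch under some assumptions: a real table named dbo.Referrals with an int ReferenceID column and a varchar [Type] column, where each ReferenceID appears three times (those names are guesses, not from the question). It numbers the rows within each ReferenceID group and updates through the CTE:
;with numbered as
(
    -- the assumed table/column names are placeholders for your real ones
    select [Type],
           row_number() over (partition by ReferenceID order by (select null)) as rn
    from dbo.Referrals
)
update numbered
set [Type] = case rn when 1 then 'f2f' when 2 then 'nf2f' else 'Travel' end;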

How can I execute a least cost routing query in postgresql, without temporary tables?

How can I execute a telecoms least cost routing query in PostgreSQL?
The purpose is to generate a result set ordered by the lowest price per carrier. The table structure is below.
SQL Fiddle
CREATE TABLE tariffs (
trf_tariff_id integer,
trf_carrier_id integer,
trf_prefix character varying,
trf_destination character varying,
trf_price numeric(15,6),
trf_connect_charge numeric(15,6),
trf_billing_interval integer,
trf_minimum_interval integer
);
For instance, to check the cost of a call when passed through a particular carrier (carrier_id), the query is:
SELECT trf_price, trf_prefix as lmp FROM tariffs WHERE SUBSTRING(dialled_number,1, LENGTH(trf_prefix)) = trf_prefix and trf_carrier_id = carrier_id ORDER BY trf_prefix DESC limit 1
For the cost of the call for each carrier, i.e. the least cost query, the query is:
-- select * from tariffs
select distinct banana2.longest_prefix, banana2.trf_carrier_id_2, apple2.trf_carrier_id,
       apple2.lenprefix, apple2.trf_price, apple2.trf_destination
from
  (select banana.longest_prefix, banana.trf_carrier_id_2
   from (select max(length(trf_prefix)) as longest_prefix, trf_carrier_id as trf_carrier_id_2
         from (select *, length(trf_prefix) as lenprefix
               from tariffs
               where substring('35567234567', 1, length(trf_prefix)) = trf_prefix) as apple
         group by apple.trf_carrier_id) as banana) as banana2,
  (select *, length(trf_prefix) as lenprefix
   from tariffs
   where substring('35567234567', 1, length(trf_prefix)) = trf_prefix) as apple2
-- group by apple2.trf_carrier_id
where banana2.trf_carrier_id_2 = apple2.trf_carrier_id
  and banana2.longest_prefix = apple2.lenprefix
order by trf_price
The query works on the basis that, for each carrier, the longest matching prefix for a dialled number is unique, so a join involving the longest prefix and the carrier gives the set for all the carriers.
I have one problem with my query:
I don't want to do the apple(X) query twice
(select *, length(trf_prefix) as lenprefix from tariffs where substring('35567234567', 1, length(trf_prefix) )= trf_prefix) as apple
There must be a more elegant way, probably declaring it once and using it twice.
What I want to do is run the single-carrier query for each carrier:
SELECT trf_price, trf_prefix as lmp FROM tariffs WHERE SUBSTRING(dialled_number,1, LENGTH(trf_prefix)) = trf_prefix and trf_carrier_id = carrier_id ORDER BY trf_prefix DESC limit 1
and combine them into one set which will be sorted by price.
In fact I want to generalize the method for any such query where the output for the various values of a particular column (or set of columns) is combined into one set for further querying. I am told that CTEs are the way to accomplish that kind of query, but I find the docs rather confusing. It is much easier with your own use cases.
PS. I am aware that the prefix length can be precomputed and stored.
Common Table Expressions:
with apple as (
select *, length(trf_prefix) as lenprefix
from tariffs
where substring('35567234567', 1, length(trf_prefix)) = trf_prefix
)
select distinct banana2.longest_prefix, banana2.trf_carrier_id_2,
apple.trf_carrier_id, apple.lenprefix, apple.trf_price,
apple.trf_destination
from (select banana.longest_prefix, banana.trf_carrier_id_2
from (select max(length(trf_prefix)) as longest_prefix,
trf_carrier_id as trf_carrier_id_2
from apple
group by apple.trf_carrier_id) as banana) as banana2,
apple
where banana2.trf_carrier_id_2 = apple.trf_carrier_id
and banana2.longest_prefix = apple.lenprefix
order by trf_price
You can just pull out the repeated table definition into a CTE. Even if I'm only using one of those sub-selects in a FROM a single time, I still use CTEs; I find the style you're using basically unreadable.
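If you prefer to avoid the extra derived tables entirely, here is a sketch of the same idea using PostgreSQL's DISTINCT ON, which keeps the longest matching prefix per carrier and then sorts the per-carrier winners by price (the dialled number is hard-coded as in the question):
with apple as (
  select *, length(trf_prefix) as lenprefix
  from tariffs
  where substring('35567234567', 1, length(trf_prefix)) = trf_prefix
)
select per_carrier.*
from (
  -- one row per carrier: the longest matching prefix wins
  select distinct on (trf_carrier_id)
         trf_carrier_id, trf_prefix, lenprefix, trf_price, trf_destination
  from apple
  order by trf_carrier_id, lenprefix desc
) per_carrier
order by trf_price;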

How do I replace a SSN with a 9 digit random number in SQL Server 2008R2?

To satisfy security requirements, I need to find a way to replace SSNs with unique, random 9 digit numbers before providing said database to a developer. The SSN is in a column in a table of the database. There may be tens of thousands of rows in that table. The number does not need hyphens. I am a beginner with SQL and programming in general.
I have been unable to find a solution for my specific needs. Nothing seems quite right. But if you know of a thread that I have missed, please let me know.
Thanks for any help!
Here is one way.
I'm assuming that you already have a backup of the real data as this update is not reversible.
Below I've assumed your table name is Person with your ssn column named SSN.
UPDATE Person SET
SSN = CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as int)
If they do not have to be random, you could just replace them with ascending numeric values. Failing that, you'd have to generate a random number. As you may have discovered, the RAND function will only generate a single value per query statement (select, update, etc.); the work-around is the NEWID() function, which generates a GUID for each row produced by a query (run SELECT NEWID() FROM MyTable to see how this works). Wrap this in CHECKSUM() to generate an integer; take it modulo 1,000,000,000 to get a value within the SSN range (0 to 999,999,999); and, assuming you're storing it as a char(9), prefix it with leading zeros.
The next trick is ensuring it's unique for all values in your table. This gets tricky, and I'd do it by setting up a temp table with the values, populating it, then copying them over. Let's see now…
DECLARE @DummySSN as table
(
    PrimaryKey int not null
   ,NewSSN char(9) not null
)
-- Load initial values
INSERT @DummySSN
select
    UserId
   ,right('000000000' + cast(abs(checksum(newid()))%1000000000 as varchar(9)), 9)
from Users
-- Check for dups
select NewSSN from @DummySSN group by NewSSN having count(*) > 1
-- Loop until values are unique
WHILE exists (SELECT 1 from @DummySSN group by NewSSN having count(*) > 1)
    UPDATE @DummySSN
    set NewSSN = right('000000000' + cast(abs(checksum(newid()))%1000000000 as varchar(9)), 9)
    where NewSSN in (select NewSSN from @DummySSN group by NewSSN having count(*) > 1)
-- Check for dups
select NewSSN from @DummySSN group by NewSSN having count(*) > 1
This works for a small table I have, and it should work for a large one. I don't see this turning into an infinite loop, but even so you might want to add a check to exit the loop after, say, 10 iterations.
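The final "copying them over" step isn't shown; under the same assumed names (Users, UserId and an SSN column on Users), and run in the same batch as the script above, it might look like this:
-- copy the generated values back to the real table
UPDATE u
SET SSN = d.NewSSN
FROM Users u
JOIN @DummySSN d ON d.PrimaryKey = u.UserId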
I've run a couple of million tests on this and it seems to generate random (URN) 9 digit numbers (no leading zeros).
I cannot think of a more efficient way to do this.
SELECT CAST(FLOOR(RAND(CHECKSUM(NEWID())) * 900000000 ) + 100000000 AS BIGINT)
The test used:
;WITH Fn(N) AS
(
SELECT CAST(FLOOR(RAND(CHECKSUM(NEWID())) * 900000000 ) + 100000000 AS BIGINT)
UNION ALL
SELECT CAST(FLOOR(RAND(CHECKSUM(NEWID())) * 900000000 ) + 100000000 AS BIGINT)
FROM Fn
)
,Tester AS
(
SELECT TOP 5000000 *
FROM Fn
)
SELECT LEN(MIN(N))
,LEN(MAX(N))
,MIN(N)
,MAX(N)
FROM Tester
OPTION (MAXRECURSION 0)
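Applied as an update against the Person table assumed in the first answer (a sketch only; like the other per-row generators, this does not by itself guarantee uniqueness):
UPDATE Person
SET SSN = CAST(FLOOR(RAND(CHECKSUM(NEWID())) * 900000000) + 100000000 AS BIGINT)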
Not so fast, but easiest... I added some dots...
DECLARE @tr NVARCHAR(40)
SET @tr = CAST(ROUND((888*RAND()+111),0) AS CHAR(3)) + '.' +
          CAST(ROUND((8888*RAND()+1111),0) AS CHAR(4)) + '.' +
          CAST(ROUND((8888*RAND()+1111),0) AS CHAR(4)) + '.' +
          CAST(ROUND((88*RAND()+11),0) AS CHAR(2))
PRINT @tr
If the requirement is to obfuscate a database, then this will return the same value for each distinct SSN in any table, preserving referential integrity in the output without having to do a lookup and translate.
SELECT CAST(RAND(SSN)*999999999 AS INT)
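Applied to the Person/SSN names assumed in the first answer, and assuming SSN is stored as an integer so it can seed RAND() (both of those are assumptions), it might be used like this:
-- RAND(seed) with a column seed is evaluated per row, giving a deterministic mapping per SSN
UPDATE Person
SET SSN = CAST(RAND(SSN) * 999999999 AS INT)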

Concatenated columns should not match in 2 tables

I'll just put this in layman's terms since I'm a complete newbie:
I have 2 tables A and B, both having 2 columns of interest namely: employee_number and salary.
What I am looking to do is to extract the rows from A whose combination of employee_number and salary is NOT present in B, but where each of employee_number and salary individually is present in both.
I am looking to do it with the 2 following conditions (please forgive the wrong function names; this is just to present the problem 'eloquently'):
1.) A.unique(employee_number) exists in B.unique(employee_number) AND A.unique(salary) exists in B.unique(salary)
2.) A.concat(employee_number,salary) <> B.concat(employee_number,salary)
Note: A and B are in different databases, so I'm looking to use dblink to do this.
This is what I tried doing:
SELECT distinct * FROM dblink('dbname=test1 port=5432
host=test01 user=user password=password','SELECT employee_number,salary, employee_number||salary AS ENS FROM empsal.A')
AS A(employee_number int8, salary integer, ENS numeric)
LEFT JOIN empsalfull.B B on B.employee_number = A.employee_number AND B.salary = A.salary
WHERE A.ENS not in (select distinct employee_number || salary from empsalfull.B)
but it turned out to be wrong: I cross-checked it using spreadsheets and I don't get the same result.
Any help would be greatly appreciated. Thanks.
For easier understanding I left out the dblink.
Because the first query selects lines in B that equal the employee_number in A as well as the salary in A, their concatenated values will be equal as well (if you expect this to not be true, please provide some test data).
SELECT * from firsttable A
LEFT JOIN secondtable B on
(A.employee_number = B.employee_number AND A.salary != B.salary) OR
(A.salary = B.salary AND A.employee_number != B.employee_number)
If you have trouble with rows containing NULLs, you might also try something like this:
AND (A.salary != B.salary OR (A.salary IS NULL AND B.salary IS NOT NULL) OR (A.salary IS NOT NULL AND B.salary IS NULL))
I think you're looking for something along these lines.
(Sample data)
create table A (
employee_number integer primary key,
salary integer not null
);
create table B (
employee_number integer primary key,
salary integer not null
);
insert into A values
(1, 20000),
(2, 30000),
(3, 20000); -- This row isn't in B
insert into B values
(1, 20000), -- Combination in A
(2, 20000), -- Individual values in A
(3, 50000); -- Only emp number in A
select A.employee_number, A.salary
from A
where (A.employee_number, A.salary) NOT IN (select employee_number, salary from B)
and A.employee_number IN (select employee_number from B)
and A.salary IN (select salary from B)
output: 3, 20000
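For what it's worth, the same result can be written with EXISTS / NOT EXISTS, which also behaves predictably if salary ever contains NULLs (NOT IN does not); this is just a sketch against the same sample tables:
select A.employee_number, A.salary
from A
where not exists (select 1 from B
                  where B.employee_number = A.employee_number
                    and B.salary = A.salary)
  and exists (select 1 from B where B.employee_number = A.employee_number)
  and exists (select 1 from B where B.salary = A.salary);
With the sample data it should also return 3, 20000.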