In SQL Server 2008, I have the table below.
Thanks
create table test1 ([Number] int, Item varchar(10))
insert into test1 values (20 , 'Item 1'),(30 , 'Item 2'),(60 , 'Item 3'),(23 , 'Item 4'),(10 , 'Item 5'),(76 , 'Item 6'),(44 , 'Item 7'),(99 , 'Item 8'),(10 , 'Item 9'),(22 , 'Item 10'),(77 , 'Item 11'),(10 , 'Item 12')
A table has no inherent order, so you need a column that defines the order you want; without one, imagine each record as a sheet of paper dumped in a basket (in no particular order).
select a.*--, b.pvt
from test1 a
inner join (select MIN(1*substring(item,6,10)) pvt from test1 where number=10) b
on 1*substring(a.item,6,10) >= b.pvt
order by 1*substring(a.item,6,10)
I have made the following assumptions:
The order is by the Item number, where
Item number is always the 6th character onwards in the column "Item"
If the assumptions are wrong, then you can still use a similar technique, which is to find the pivotal record and join to it using >=
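As a rough illustration of the pivot technique, the sketch below runs the same idea against the sample rows using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, not SQL Server 2008), so substr and CAST replace the 1*substring(...) trick, and the names PIVOT_QUERY and rows_from_pivot are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test1 (number int, item varchar(10))")
conn.executemany(
    "insert into test1 values (?, ?)",
    [(20, "Item 1"), (30, "Item 2"), (60, "Item 3"), (23, "Item 4"),
     (10, "Item 5"), (76, "Item 6"), (44, "Item 7"), (99, "Item 8"),
     (10, "Item 9"), (22, "Item 10"), (77, "Item 11"), (10, "Item 12")],
)

# Find the pivotal record (lowest item number whose number is 10),
# then keep every row whose item number is >= that pivot.
PIVOT_QUERY = """
select a.number, a.item
from test1 as a
join (select min(cast(substr(item, 6) as integer)) as pvt
      from test1
      where number = 10) as b
  on cast(substr(a.item, 6) as integer) >= b.pvt
order by cast(substr(a.item, 6) as integer)
"""

def rows_from_pivot(connection):
    """Return (number, item) rows from the pivot row onwards."""
    return connection.execute(PIVOT_QUERY).fetchall()

pivot_rows = rows_from_pivot(conn)
for number, item in pivot_rows:
    print(number, item)
```

With the sample data the pivot is Item 5, so the result starts there and runs through Item 12.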
I have a table of contacts. Each contact has an associating website. Each website can have multiple contacts.
I ran a query to get one contact with Select distinct on (website). This works fine.
But I want to do something with the rest of the data, the rows not selected by Select distinct on (website). Is there an inverse command where I can find all records from websites that have NOT been processed?
Use except. Here is an illustration. order by is for clarity.
create temporary table the_table (i integer, tx text);
insert into the_table values
(1, 'one'),
(1, 'one more one'),
(1, 'yet another one'),
(2, 'two'),
(2, 'one more two'),
(2, 'yet another two'),
(3, 'three'),
(3, 'three alternative');
select * from the_table
EXCEPT
select distinct on (i) * from the_table
order by i;
i | tx
1 | one more one
1 | yet another one
2 | yet another two
2 | one more two
3 | three alternative
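Two caveats apply to the EXCEPT approach: without an ORDER BY inside it, DISTINCT ON keeps an unspecified row per group, and EXCEPT also removes duplicate rows from its result. The sketch below shows a deterministic variant of the same "everything except one row per group" idea. It uses Python's sqlite3 module as a stand-in (an assumption; SQLite has no DISTINCT ON), so "the first inserted row per group" is emulated with SQLite's rowid, and the name LEFTOVER_QUERY is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table the_table (i integer, tx text)")
conn.executemany(
    "insert into the_table values (?, ?)",
    [(1, "one"), (1, "one more one"), (1, "yet another one"),
     (2, "two"), (2, "one more two"), (2, "yet another two"),
     (3, "three"), (3, "three alternative")],
)

# Keep every row except the first-inserted row of each group of i.
# min(rowid) per group plays the role of the row DISTINCT ON would pick.
LEFTOVER_QUERY = """
select i, tx
from the_table
where rowid not in (select min(rowid) from the_table group by i)
order by i, rowid
"""

leftover_rows = conn.execute(LEFTOVER_QUERY).fetchall()
for row in leftover_rows:
    print(row)
```

In PostgreSQL the same effect could be had by giving the DISTINCT ON subquery an explicit ORDER BY, so that the row it keeps is well defined.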
This is my first post here, so please let me know if I've not given everything needed.
I have been struggling to rewrite a process that has recently been causing me and my server significant performance issues.
The overall task is to identify where a customer has had to contact us back within +2 hours to +28 days of their previous contact. Currently this is being completed via the use of a cursor for all the contacts we received yesterday. This equates to approximately 50k contacts per day.
I am aware that this can be done through a cursor or a recursive CTE, but I feel like both options are bad. I am looking for another method to do the same job.
Below is a sample extract and the outcome I am expecting to see.
INSERT INTO SourceData ([CUSTOMER_KEY], [CONTACT_REFERENCE], [CONTACT_DATETIME], [EXPECTED_RESULT])
VALUES ('1', '100', '01/04/2020 09:00', 'Original Contact'),
('2', '101', '01/04/2020 10:00', 'Original Contact'),
('3', '102', '01/04/2020 11:00', 'Original Contact'),
('1', '103', '01/04/2020 12:00', 'Repeat of Contact Reference 100'),
('1', '104', '01/04/2020 13:00', 'Not Repeat - within 2 hours of previous contact'),
('1', '50' , '01/04/2020 14:00', 'Repeat of Contact Reference 103'),
('2', '105', '01/04/2020 14:00', 'Repeat of Contact Reference 101'),
('1', '106', '01/04/2020 15:00', 'Repeat of Contact Reference 104'),
('1', '200', '27/04/2020 12:00', 'Repeat of Contact Reference 106');
The process I currently follow is below. I am happy to update my post to provide code, but I don't think this will be too useful given that I am looking for other solutions.
1. Identify the current latest repeat of every customer. This is here to reduce the load on the full data table: if there was a repeat contact within the time frame already, then I can just assign it straight to that. This data is loaded into a new temp table: TempTable_Repeats_By_Customer.
2. Add all the contacts from yesterday to a temp table: TempTable_Yesterdays_Contacts.
3. Open the cursor to start processing each contact (from step 2) in order of Contact_DateTime (ascending). At the same time I use TempTable_Repeats_By_Customer to identify whether the customer has already had a repeat, and whether this was within the eligible time frame.
4. If an existing repeat exists, retrieve the details from my existing reporting table and load a new row in.
5. If no existing repeat exists, check the full data table for other contacts received during the eligible period.
6. If there are more contacts from the same customer on a single day, I then go back and update TempTable_Repeats_By_Customer with the new details.
7. Either go to the next item in the cursor, or close and deallocate it.
Any help you all can give is much appreciated.
Perhaps I am overlooking something, but I think you should be able to do this using the LAG() function.
IF OBJECT_ID('tempdb.dbo.#SourceData', 'U') IS NOT NULL
DROP TABLE #SourceData;
CREATE TABLE #SourceData
(
[CUSTOMER_KEY] VARCHAR(10)
, [CONTACT_REFERENCE] VARCHAR(10)
, [CONTACT_DATETIME] DATETIME
, [EXPECTED_RESULT] VARCHAR(50)
);
INSERT INTO #SourceData
(
[CUSTOMER_KEY]
, [CONTACT_REFERENCE]
, [CONTACT_DATETIME]
, [EXPECTED_RESULT]
)
VALUES
('1', '100', '04/01/2020 09:00', 'Original Contact')
, ('2', '101', '04/01/2020 10:00', 'Original Contact')
, ('3', '102', '04/01/2020 11:00', 'Original Contact')
, ('1', '103', '04/01/2020 12:00', 'Repeat of Contact Reference 100')
, ('1', '104', '04/01/2020 13:00', 'Not Repeat - within 2 hours of previous contact')
, ('2', '105', '04/01/2020 14:00', 'Repeat of Contact Reference 101')
, ('1', '106', '04/01/2020 15:00', 'Repeat of Contact Reference 103')
, ('1', '200', '04/27/2020 12:00', 'Repeat of Contact Reference 106');
SELECT x.CUSTOMER_KEY
, x.CONTACT_REFERENCE
, x.CONTACT_DATETIME
, x.EXPECTED_RESULT
, x.[Minutes Difference]
FROM (
SELECT
CUSTOMER_KEY
, CONTACT_REFERENCE
, CONTACT_DATETIME
, EXPECTED_RESULT
, DATEDIFF(
MINUTE
, LAG(CONTACT_DATETIME) OVER
(PARTITION BY CUSTOMER_KEY ORDER BY CONTACT_DATETIME)
, CONTACT_DATETIME
) AS [Minutes Difference]
FROM #SourceData
) x
WHERE x.[Minutes Difference] >= 120 -- 2 hours, the lower bound stated in the question
AND x.[Minutes Difference] <= 40320 -- this is the number of minutes in 28 days
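For readers who want to try the LAG() idea without a SQL Server instance, here is a minimal sketch using Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or later). It also pulls the previous contact reference with a second LAG(), since the question wants to know which contact was repeated. The table and column names are lower-cased copies of the ones above, and the 120 to 40320 minute window follows the "+2 hours to +28 days" rule from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table source_data ("
    "customer_key integer, contact_reference integer, contact_datetime text)"
)
conn.executemany(
    "insert into source_data values (?, ?, ?)",
    [(1, 100, "2020-04-01 09:00"), (2, 101, "2020-04-01 10:00"),
     (3, 102, "2020-04-01 11:00"), (1, 103, "2020-04-01 12:00"),
     (1, 104, "2020-04-01 13:00"), (2, 105, "2020-04-01 14:00"),
     (1, 106, "2020-04-01 15:00"), (1, 200, "2020-04-27 12:00")],
)

# For each contact, look at the same customer's previous contact and
# keep the row when the gap is between 2 hours and 28 days (in minutes).
REPEAT_QUERY = """
select customer_key, contact_reference, prev_reference, minutes_since_prev
from (
    select customer_key,
           contact_reference,
           contact_datetime,
           lag(contact_reference) over w as prev_reference,
           (strftime('%s', contact_datetime)
            - strftime('%s', lag(contact_datetime) over w)) / 60
               as minutes_since_prev
    from source_data
    window w as (partition by customer_key order by contact_datetime)
)
where minutes_since_prev between 120 and 40320
order by customer_key, contact_datetime
"""

repeat_rows = conn.execute(REPEAT_QUERY).fetchall()
for row in repeat_rows:
    print(row)
```

Note that this compares each contact only with the immediately preceding one for the same customer, so contact 106 is reported as a repeat of 104 rather than 103. Whether that matches the business rule is something to check against the expected results.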
The following code uses a recursive CTE to process the contacts in date/time order for each customer. Like Isaac's answer it calculates a delta time in minutes which may or may not be adequate resolution for your purposes.
NB: DateDiff "returns the count (as a signed integer value) of the specified datepart boundaries crossed". If you specify a datepart of day you'll get the number of midnights crossed, not the number of 24-hour periods. For example, Monday @ 23:00 to Wednesday @ 01:00 is 26 hours or two midnights, while Tuesday @ 01:00 to Wednesday @ 03:00 is still 26 hours, but only one midnight.
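The boundary-counting behaviour is easy to check by hand. The sketch below reproduces the two examples in plain Python; the helper names hours_between and midnights_crossed are made up for this illustration, and the concrete dates are simply a Monday, Tuesday and Wednesday picked from April 2020.

```python
from datetime import datetime

def hours_between(start, end):
    """Elapsed time in hours (what people usually mean by 'difference')."""
    return (end - start).total_seconds() / 3600

def midnights_crossed(start, end):
    """Number of day boundaries crossed, which is what a day-level
    boundary count reports."""
    return (end.date() - start.date()).days

monday_2300 = datetime(2020, 4, 6, 23, 0)     # a Monday
tuesday_0100 = datetime(2020, 4, 7, 1, 0)     # the following Tuesday
wednesday_0100 = datetime(2020, 4, 8, 1, 0)   # the following Wednesday
wednesday_0300 = datetime(2020, 4, 8, 3, 0)

for start, end in [(monday_2300, wednesday_0100),
                   (tuesday_0100, wednesday_0300)]:
    print(hours_between(start, end), midnights_crossed(start, end))
```

Both intervals are 26 hours long, yet the first crosses two midnights and the second only one, which is why the code in this answer works in minutes rather than days.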
declare @SourceData as Table ( Customer_Key Int, Contact_Reference Int, Contact_DateTime DateTime, Expected_Result VarChar(50) );
INSERT INTO @SourceData ([CUSTOMER_KEY], [CONTACT_REFERENCE], [CONTACT_DATETIME], [EXPECTED_RESULT])
VALUES ('1', '100', '2020-04-01 09:00', 'Original Contact'),
('2', '101', '2020-04-01 10:00', 'Original Contact'),
('3', '102', '2020-04-01 11:00', 'Original Contact'),
('1', '103', '2020-04-01 12:00', 'Repeat of Contact Reference 100'),
('1', '104', '2020-04-01 13:00', 'Not Repeat - within 2 hours of previous contact'),
('2', '105', '2020-04-01 14:00', 'Repeat of Contact Reference 101'),
('1', '106', '2020-04-01 15:00', 'Repeat of Contact Reference 103'),
('1', '200', '2020-04-27 12:00', 'Repeat of Contact Reference 106');
with
ContactsByCustomer as (
-- Add a row number to simplify processing the contacts for each customer in Contact_DateTime order.
select Customer_Key, Contact_Reference, Contact_DateTime, Expected_Result,
Row_Number() over ( partition by Customer_Key order by Contact_DateTime ) as RN
from @SourceData ),
ProcessedContacts as (
-- Process the contacts in date/time order for each customer.
-- Start with the first contact for each customer ...
select Customer_Key, Contact_Reference, Contact_DateTime, Expected_Result, RN,
Cast( 'Original Contact' as VarChar(100) ) as Computed_Result,
0 as Delta_Minutes
from ContactsByCustomer
where RN = 1
union all
-- ... and add each subsequent contact in date/time order.
select CBC.Customer_Key, CBC.Contact_Reference, CBC.Contact_DateTime, CBC.Expected_Result, CBC.RN,
Cast(
case
when PH.Delta_Minutes < 120 then
'No Repeat - within 2 hours of previous contact'
when 120 <= PH.Delta_Minutes and PH.Delta_Minutes <= 40320 then
'Repeat of Contact Reference ' + Cast( PC.Contact_Reference as VarChar(10) )
else
'Original'
end
as VarChar(100) ),
PH.Delta_Minutes
from ProcessedContacts as PC inner join
ContactsByCustomer as CBC on CBC.Customer_Key = PC.Customer_Key and CBC.RN = PC.RN + 1 cross apply
-- Using cross apply makes it easy to use the calculated value as needed.
( select DateDiff( minute, PC.Contact_DateTime, CBC.Contact_DateTime ) as Delta_Minutes ) as PH
)
-- You can uncomment the select to see the intermediate results.
-- select * from ContactsByCustomer;
select *
from ProcessedContacts
order by Customer_Key, Contact_DateTime;
I didn't find much info on inner joins with substring.
I am not very well versed in SQL and I am trying to do a string match here but am getting a problem with the LIKE operator in the INNER JOIN clause.
I have data in Table 1 and Table 2. Table 1, for example, has JUY and Table 2 has Tyy_ss_JUY. Both tables have over 10000 entries. I want to match the two and return a row whenever the Table 2 string contains the Table 1 string.
Assume that I have two tables as follows:
Table1
LocationID | Model | CAMERA
1 | Zone A | ABCD
2 | Zone B | ALI
3 | Zone A | JUY
4 | Zone A | LOS
5 | Zone C | OMG
Table2
Vehicle | NAME
Honda | Txx_ss_ABCD
Myvi | Tyy_ss_ABCD
Vios | Tyy_ss_JUY
Proton | Tyy_ss_LOS
SUV | Tyb_ss_OMG
SUV | UUS_ss_OMG
SUV | Lyx_ss_JUY
SELECT Vehicle,NAME
FROM Table2
INNER JOIN (SELECT CAMERA FROM Table1 WHERE Model LIKE '%Zone A%')sub on
NAME LIKE '%'+sub.CAMERA+'%'
Expected Result
Vehicle | NAME
Honda | Txx_ss_ABCD
Myvi | Tyy_ss_ABCD
Vios | Tyy_ss_JUY
Proton | Tyy_ss_LOS
SUV | Lyx_ss_JUY
I get an error message in DB2 when I execute this
Invalid character found in a character string argument of the function "DECFLOAT".. SQLCODE=-420, SQLSTATE=22018, DRIVER=3.69.24 SQL Code: -420, SQL State: 22018
Thank you
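DB2 doesn't support the '+' symbol for string concatenation. It treats '+' as arithmetic addition and tries to convert the strings to numbers, which is why the error mentions DECFLOAT.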
Use one of the following ways to get the desired result instead:
with
Table1(LocationID, Model, CAMERA) as (values
(1, 'Zone A', 'ABCD')
, (2, 'Zone B', 'ALI')
, (3, 'Zone A', 'JUY')
, (4, 'Zone A', 'LOS')
, (5, 'Zone C', 'OMG')
)
, Table2 (Vehicle, NAME) as (values
('Honda', 'Txx_ss_ABCD')
, ('Myvi', 'Tyy_ss_ABCD')
, ('Vios', 'Tyy_ss_JUY')
, ('Proton', 'Tyy_ss_LOS')
, ('SUV', 'Tyb_ss_OMG')
, ('SUV', 'UUS_ss_OMG')
, ('SUV', 'Lyx_ss_JUY')
)
SELECT Vehicle,NAME
FROM Table2
INNER JOIN (SELECT CAMERA FROM Table1 WHERE Model LIKE '%Zone A%')sub on
NAME LIKE
'%'||sub.CAMERA||'%'
--concat(concat('%', sub.CAMERA), '%')
;
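The same join can be tried out locally with Python's sqlite3 module, which also uses || for concatenation. This is only a sketch under that assumption (SQLite is not DB2), with lower-cased table names and a made-up query name. SQLite happens to show the '+' pitfall in a milder form: instead of raising an error, it treats non-numeric strings as 0 in arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (location_id integer, model text, camera text)")
conn.execute("create table table2 (vehicle text, name text)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [(1, "Zone A", "ABCD"), (2, "Zone B", "ALI"), (3, "Zone A", "JUY"),
     (4, "Zone A", "LOS"), (5, "Zone C", "OMG")],
)
conn.executemany(
    "insert into table2 values (?, ?)",
    [("Honda", "Txx_ss_ABCD"), ("Myvi", "Tyy_ss_ABCD"), ("Vios", "Tyy_ss_JUY"),
     ("Proton", "Tyy_ss_LOS"), ("SUV", "Tyb_ss_OMG"), ("SUV", "UUS_ss_OMG"),
     ("SUV", "Lyx_ss_JUY")],
)

# Build the LIKE pattern with ||, the standard SQL concatenation operator.
SUBSTRING_JOIN_QUERY = """
select t2.vehicle, t2.name
from table2 as t2
join (select camera from table1 where model like '%Zone A%') as sub
  on t2.name like '%' || sub.camera || '%'
order by t2.rowid
"""

matched_rows = conn.execute(SUBSTRING_JOIN_QUERY).fetchall()
for row in matched_rows:
    print(row)

# '+' is arithmetic: SQLite coerces the non-numeric strings to 0 and adds them.
plus_result = conn.execute("select '%' + 'ABCD' + '%'").fetchone()[0]
print(plus_result)
```

A leading-wildcard LIKE generally cannot use an ordinary index, so on tables with more than 10000 rows each this join may be slow whichever concatenation syntax is used.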
I have to build a procedure that returns a table at the end, which contains a list of fields where specific substances were applied. I need to return one row for each field and the applied substance.
This works great for all fields where something was actually applied, but I also need to display the same number of rows for those fields where nothing was applied.
At the moment I get a table like this:
Field 1 | Substance 1 | 12345 kg
Field 1 | Substance 2 | 23423 kg
Field 2 | Substance 1 | 23236 kg
Field 2 | Substance 2 | 12312 kg
Field 3 | NULL | NULL
I know that I could swap the NULL value with at least one Substance by making a Case-Condition, but I need two rows (one for Substance 1 and one for Substance 2) containing the names of each substance.
Is there any way to achieve this?
Or maybe you have something like this:
CREATE TABLE Fields (
FieldID INT PRIMARY KEY,
FieldName VARCHAR(50) NOT NULL UNIQUE
)
INSERT INTO dbo.Fields (FieldID, FieldName) VALUES
(1, 'Field 1'),
(2, 'Field 2'),
(3, 'Field 3')
CREATE TABLE dbo.Substances (
SubstanceID INT PRIMARY KEY,
Substance VARCHAR(50) NOT NULL UNIQUE
)
INSERT INTO dbo.Substances (SubstanceID, Substance) VALUES
(1, 'Substance 1'),
(2, 'Substance 2')
CREATE TABLE AppliedSubstances (
FieldID INT NOT NULL REFERENCES dbo.Fields,
SubstanceID INT NOT NULL REFERENCES dbo.Substances,
Quantity INT NOT NULL
)
INSERT INTO dbo.AppliedSubstances (FieldID, SubstanceID, Quantity) VALUES
(1, 1, 12345),
(1, 2, 23423),
(2, 1, 23236),
(2, 2, 12312)
Then you can use the following query:
SELECT f.FieldName, s.Substance, a.Quantity
FROM dbo.AppliedSubstances a
INNER JOIN dbo.Fields f ON f.FieldID = a.FieldID
INNER JOIN dbo.Substances s ON s.SubstanceID = a.SubstanceID
UNION ALL
SELECT f.FieldName, s.Substance, NULL AS Quantity
FROM dbo.Fields f
CROSS JOIN dbo.Substances s
WHERE NOT EXISTS (
SELECT * FROM dbo.AppliedSubstances a
WHERE a.FieldID=f.FieldID AND a.SubstanceID=s.SubstanceID
)
Or a shorter, stranger version (with a different meaning if you have some substances that were applied only to some fields):
SELECT f.FieldName, s.Substance, a.Quantity
FROM dbo.AppliedSubstances a
RIGHT JOIN dbo.Fields f ON f.FieldID = a.FieldID
INNER JOIN dbo.Substances s ON s.SubstanceID = ISNULL(a.SubstanceID,s.SubstanceID)
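Another way to express the first query in this answer, offered as an alternative and not something the answer itself proposes, is to build every field/substance combination with a CROSS JOIN and then LEFT JOIN the applied quantities onto it. The sketch below tries this with Python's sqlite3 module as a stand-in for SQL Server (an assumption), using snake_case copies of the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table fields (field_id integer primary key, field_name text)")
conn.execute("create table substances (substance_id integer primary key, substance text)")
conn.execute(
    "create table applied_substances ("
    "field_id integer, substance_id integer, quantity integer)"
)
conn.executemany("insert into fields values (?, ?)",
                 [(1, "Field 1"), (2, "Field 2"), (3, "Field 3")])
conn.executemany("insert into substances values (?, ?)",
                 [(1, "Substance 1"), (2, "Substance 2")])
conn.executemany("insert into applied_substances values (?, ?, ?)",
                 [(1, 1, 12345), (1, 2, 23423), (2, 1, 23236), (2, 2, 12312)])

# Every field paired with every substance; quantity stays NULL
# where nothing was applied.
ALL_COMBINATIONS_QUERY = """
select f.field_name, s.substance, a.quantity
from fields as f
cross join substances as s
left join applied_substances as a
  on a.field_id = f.field_id and a.substance_id = s.substance_id
order by f.field_id, s.substance_id
"""

combination_rows = conn.execute(ALL_COMBINATIONS_QUERY).fetchall()
for row in combination_rows:
    print(row)
```

Field 3 then appears twice, once per substance, with a NULL quantity (None in Python) in both rows.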
I'm not sure if I understand your question correctly, but try this:
CREATE TABLE SourceData (
FieldName VARCHAR(50),
Substance VARCHAR(50),
Quantity INT
)
INSERT INTO dbo.SourceData (FieldName, Substance, Quantity) VALUES
('Field 1', 'Substance 1', 12345),
('Field 1', 'Substance 2', 23423),
('Field 2', 'Substance 1', 23236),
('Field 2', 'Substance 2', 12312),
('Field 3', NULL, NULL)
SELECT FieldName, Substance, Quantity
FROM dbo.SourceData WHERE Substance IS NOT NULL
UNION ALL
SELECT s1.FieldName, x.Substance, NULL AS Quantity
FROM dbo.SourceData s1 CROSS JOIN (
SELECT DISTINCT s2.Substance
FROM dbo.SourceData s2
WHERE s2.Substance IS NOT NULL
) x
WHERE s1.Substance IS NULL
Imagine I had this table:
declare @tmpResults table ( intItemId int, strTitle nvarchar(100), intWeight float )
insert into @tmpResults values (1, 'Item One', 7)
insert into @tmpResults values (2, 'Item One v1', 6)
insert into @tmpResults values (3, 'Item Two', 6)
insert into @tmpResults values (4, 'Item Two v1', 7)
And a function, which we'll call fn_Lev that takes two strings, compares them to one another and returns the number of differences between them as an integer (i.e. the Levenshtein distance).
What's the most efficient way to query that table, check the fn_Lev value of each strTitle against all the other strTitles in the table, and delete rows that are similar to one another by a Levenshtein distance of 3, preferring to keep higher intWeights?
So after the delete, @tmpResults should contain
1 Item One 7
4 Item Two v1 7
I can think of ways to do this, but nothing that isn't horribly slow (i.e. iterative). I'm sure there's a faster way?
Cheers,
Matt
SELECT strvalue= CASE
WHEN t1.intweight >= t2.intweight THEN t1.strtitle
ELSE t2.strtitle
END,
dist = Fn_lev(t1.strtitle, t2.strtitle)
FROM @tmpResults AS t1
INNER JOIN @tmpResults AS t2
ON t1.intitemid < t2.intitemid
WHERE Fn_lev(t1.strtitle, t2.strtitle) = 3
This performs a self join that matches each pair of rows only once. The condition t1.intitemid < t2.intitemid excludes matching a row with itself and excludes the reverse of a previous match, i.e. if A<->B is matched then B<->A is not considered again.
The CASE expression selects the title with the higher weight.
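To see how a self join like this could drive the actual delete, here is a small sketch using Python's sqlite3 module, with a plain Python Levenshtein function registered as fn_lev. Everything here is an assumption for illustration: SQLite stands in for SQL Server, "similar" is taken to mean a distance of at most 3, and ties on weight are broken by keeping the lower intItemId.

```python
import sqlite3

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

conn = sqlite3.connect(":memory:")
conn.create_function("fn_lev", 2, levenshtein)
conn.execute("create table tmp_results (intItemId integer, strTitle text, intWeight real)")
conn.executemany(
    "insert into tmp_results values (?, ?, ?)",
    [(1, "Item One", 7), (2, "Item One v1", 6),
     (3, "Item Two", 6), (4, "Item Two v1", 7)],
)

# A row loses if some other similar row has a higher weight
# (or the same weight and a lower id). All losers are collected
# by the subquery first, then deleted in one statement.
conn.execute("""
delete from tmp_results
where intItemId in (
    select loser.intItemId
    from tmp_results as loser
    join tmp_results as winner
      on winner.intItemId <> loser.intItemId
     and fn_lev(loser.strTitle, winner.strTitle) <= 3
     and (winner.intWeight > loser.intWeight
          or (winner.intWeight = loser.intWeight
              and winner.intItemId < loser.intItemId))
)
""")

remaining_ids = [row[0] for row in
                 conn.execute("select intItemId from tmp_results order by intItemId")]
print(remaining_ids)
```

The join still evaluates fn_lev for every pair of rows, so the cost grows quadratically with the table size. The set-based form only avoids the row-by-row loop, not the pairwise comparisons.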
If I've understood you correctly, you can use a cross join
SELECT t1.intItemId AS Id1, t2.intItemId AS Id2, fn_Lev(t1.strTitle, t2.strTitle) AS Lev
FROM @tmpResults AS t1
CROSS JOIN @tmpResults AS t2
The cross join will give you the results of every combination of rows between the left and right side of the join (hence it doesn't need any ON clause, as it is matching everything to everything else). You can then use the result of the SELECT to choose which to delete.