Comparing 2 tables for new or updated rows using composite keys - tsql

I'm writing T-SQL for SQL Server 2008. I've got two tables with roughly 2 million rows each. The source table gets updated daily, and changes are pushed to the destination table based on a last_edit date: if this date is newer in source than in destination, the destination row is updated; if a row exists in source but not in destination, it is inserted into destination. This is strictly a one-way process, from source to destination. Rows in both tables are uniquely identified by a composite key of 4 columns: serialid, itemid, systemcode, and role.
My tables are modeled similarly to the script below. There are many data columns, but I've limited them to 3 in this example. I'm looking for 2 outputs: one set of rows to update and one set of rows to add.
CREATE TABLE [dbo].[TABLE_DEST](
[SERIALID] [nvarchar](20) NOT NULL,
[ITEMID] [nvarchar](20) NOT NULL,
[SYSTEMCODE] [nvarchar](20) NOT NULL,
[ROLE] [nvarchar](10) NOT NULL,
[LAST_EDIT] [datetime] NOT NULL,
[DATA_COLUMN_1] [nvarchar](10) NOT NULL,
[DATA_COLUMN_2] [nvarchar](10) NOT NULL,
[DATA_COLUMN_3] [nvarchar](10) NOT NULL
)
CREATE TABLE [dbo].[TABLE_SOURCE](
[SERIALID] [nvarchar](20) NOT NULL,
[ITEMID] [nvarchar](20) NOT NULL,
[SYSTEMCODE] [nvarchar](20) NOT NULL,
[ROLE] [nvarchar](10) NOT NULL,
[LAST_EDIT] [datetime] NOT NULL,
[DATA_COLUMN_1] [nvarchar](10) NOT NULL,
[DATA_COLUMN_2] [nvarchar](10) NOT NULL,
[DATA_COLUMN_3] [nvarchar](10) NOT NULL
)
Here's what I've got for the update dataset.
select s.*
from table_dest d (nolock) inner join table_source s (nolock)
on s.SYSTEMCODE = d.SYSTEMCODE
and s.ROLE = d.ROLE
and s.SERIALID = d.SERIALID
and s.ITEMID = d.ITEMID
and s.LAST_EDIT > d.LAST_EDIT
I don't know how best to accomplish finding the rows to add. But the solution has to be pretty efficient for the database.

Unmatched rows can be found with left/right join and checking target table keys for null:
select s.*, case when d.key1 is null then 'insert' else 'update' end [action]
from [table_dest] d right join [table_source] s on (d.key1 = s.key1 /* etc.. */)
If you need these rows only to perform the respective operations, there is a dedicated statement for that:
merge [table_dest] d
using [table_source] s on (d.key1 = s.key1 /* etc.. */)
when matched then
update set d.a = s.a
when not matched by target then
insert (key1, .., a) values (s.key1, ..., s.a);
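
To make the two approaches above concrete for the tables in the question, here is a minimal sketch. It assumes the TABLE_SOURCE and TABLE_DEST definitions and the 4-column key exactly as posted; the extra `s.LAST_EDIT > d.LAST_EDIT` condition on the matched branch is an addition taken from the question's requirement, not from the answer.

```sql
-- Sketch for SQL Server 2008 or later, assuming the tables from the question.

-- Rows to add: source keys with no counterpart in the destination.
SELECT s.*
FROM dbo.TABLE_SOURCE AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.TABLE_DEST AS d
    WHERE d.SERIALID   = s.SERIALID
      AND d.ITEMID     = s.ITEMID
      AND d.SYSTEMCODE = s.SYSTEMCODE
      AND d.[ROLE]     = s.[ROLE]
);

-- Alternatively, apply updates and inserts in a single statement.
MERGE dbo.TABLE_DEST AS d
USING dbo.TABLE_SOURCE AS s
   ON d.SERIALID   = s.SERIALID
  AND d.ITEMID     = s.ITEMID
  AND d.SYSTEMCODE = s.SYSTEMCODE
  AND d.[ROLE]     = s.[ROLE]
-- Only touch destination rows that are older than the source row.
WHEN MATCHED AND s.LAST_EDIT > d.LAST_EDIT THEN
    UPDATE SET d.LAST_EDIT     = s.LAST_EDIT,
               d.DATA_COLUMN_1 = s.DATA_COLUMN_1,
               d.DATA_COLUMN_2 = s.DATA_COLUMN_2,
               d.DATA_COLUMN_3 = s.DATA_COLUMN_3
-- Source rows with no matching key are new.
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SERIALID, ITEMID, SYSTEMCODE, [ROLE], LAST_EDIT,
            DATA_COLUMN_1, DATA_COLUMN_2, DATA_COLUMN_3)
    VALUES (s.SERIALID, s.ITEMID, s.SYSTEMCODE, s.[ROLE], s.LAST_EDIT,
            s.DATA_COLUMN_1, s.DATA_COLUMN_2, s.DATA_COLUMN_3);
```

Both forms are likely to perform best when each table has an index (or primary key) on the four key columns; whether such an index exists is not stated in the question.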


Postgresql inserts values falsely

I want to add a denormalized table for some data of a gtfs-feed. For that I created a new table:
CREATE TABLE denormalized_trips (
stops_coords json NOT NULL,
stops_object json NOT NULL,
agency_key text NOT NULL,
trip_id text NOT NULL,
route_id text NOT NULL,
service_id text NOT NULL,
shape_id text,
route_color text,
route_long_name text,
route_desc text,
direction_id text
);
CREATE INDEX denormalized_trips_index ON denormalized_trips (agency_key, trip_id);
CREATE UNIQUE INDEX denormalized_trips_index ON denormalized_trips (agency_key, route_id);
Now I want to transfer data from one table to the other via an insert statement. The statement is rather complex.
INSERT INTO denormalized_trips
SELECT
trps.stops_coords,
trps.stops_object,
trps.trip_id,
trps.service_id,
trps.route_id,
trps.direction_id,
trps.agency_key,
trps.shape_id,
trps.route_color,
trps.route_long_name,
trps.route_desc
FROM (
SELECT
array_to_json(ARRAY_AGG(array[stop_lat, stop_lon])) AS stops_coords,
array_to_json(ARRAY_AGG(array[
stops.stop_id,
CAST ( stop_times.stop_sequence AS TEXT ),
stops.stop_name,
stop_times.departure_time,
CAST ( stop_times.departure_time_seconds AS TEXT ),
stop_times.arrival_time,
CAST ( stop_times.arrival_time_seconds AS TEXT )
])) AS stops_object,
trips.trip_id,
trips.service_id,
trips.direction_id,
trips.agency_key,
trips.shape_id,
routes.route_id,
routes.route_color,
routes.route_long_name,
routes.route_desc
FROM gtfs_stop_times AS stop_times
INNER JOIN gtfs_trips AS trips
ON trips.trip_id = stop_times.trip_id AND trips.agency_key = stop_times.agency_key
INNER JOIN gtfs_routes AS routes ON trips.agency_key = routes.agency_key AND routes.route_id = trips.route_id
INNER JOIN gtfs_stops AS stops
ON stops.stop_id = stop_times.stop_id
AND stops.agency_key = stop_times.agency_key
AND NOT EXISTS (
SELECT 0
FROM denormalized_max_stop_sequence AS max
WHERE max.agency_key = stop_times.agency_key
AND max.trip_id = stop_times.trip_id
AND max.trip_max = stop_times.stop_sequence
)
GROUP BY
trips.trip_id,
trips.service_id,
trips.direction_id,
trips.agency_key,
trips.shape_id,
routes.route_id,
routes.route_color,
routes.route_long_name,
routes.route_desc
) as trps
If I run just the inner select statement, I get the right results. They look something like this: (screenshot does not show all columns because it's too wide)
But if I execute the insert statement and display the content of the table, I get something like this:
As you may notice, the contents are not inserted into the right columns of the table. The agency_key now has the values of the trip_id, and the direction_id is now the service_id (and there are more columns that are messed up).
So my question is what am I doing wrong that my insert statement inserts the contents into the wrong columns of the newly created table?
Thanks for your help.
Postgres, by default, will insert your values in the order the columns are declared in the table; it has nothing to do with what your columns are named in the query.
https://www.postgresql.org/docs/9.5/static/sql-insert.html
If no list of column names is given at all, the default is all the columns of the table in their declared order; or the first N column names, if there are only N columns supplied by the VALUES clause or query.
You can alter your insert to declare the order of the columns you're inserting, or you can change the order of your select to match the order of columns in the table.
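
A small sketch of the difference, using two hypothetical tables (`demo_src` and `demo_dst` are made up for illustration and are not part of the question's schema):

```sql
-- Hypothetical tables whose columns are declared in opposite orders.
CREATE TABLE demo_src (trip_id text, agency_key text);
CREATE TABLE demo_dst (agency_key text, trip_id text);

INSERT INTO demo_src (trip_id, agency_key) VALUES ('trip-1', 'agency-A');

-- No column list: values are assigned by position, so the first selected
-- column (trip_id) lands in the first declared column (agency_key).
INSERT INTO demo_dst
SELECT trip_id, agency_key FROM demo_src;

-- Explicit column list: each selected value goes to the named column.
INSERT INTO demo_dst (trip_id, agency_key)
SELECT trip_id, agency_key FROM demo_src;

-- The first row has the two values swapped; the second row is correct.
SELECT agency_key, trip_id FROM demo_dst;
```

For the statement in the question, the same idea would mean listing the eleven target columns after `INSERT INTO denormalized_trips` in the order in which the outer SELECT produces them.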

I'm trying to insert tuples into a table A (from table B) if the primary key of the table B tuple doesn't exist in table A

Here is what I have so far:
INSERT INTO Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
SELECT CURRENT_DATE, NULL, NewRentPayments.Rent, NewRentPayments.LeaseTenantSSN, FALSE from NewRentPayments
WHERE NOT EXISTS (SELECT * FROM Tenants, NewRentPayments WHERE NewRentPayments.HouseID = Tenants.HouseID AND
NewRentPayments.ApartmentNumber = Tenants.ApartmentNumber)
So, HouseID and ApartmentNumber together make up the primary key. If there is a tuple in table B (NewRentPayments) that doesn't exist in table A (Tenants) based on the primary key, then it needs to be inserted into Tenants.
The problem is, when I run my query, it doesn't insert anything (I know for a fact there should be 1 tuple inserted). I'm at a loss, because it looks like it should work.
Thanks.
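Your subquery is not correlated: it joins Tenants to its own copy of NewRentPayments and never references the row from the outer query. As soon as any payment matches any tenant, the subquery returns rows for every outer row, so NOT EXISTS is always false and nothing is inserted.
Given your description of the problem, you don't need this join at all.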
Try this:
insert into Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
select current_date, null, p.Rent, p.LeaseTenantSSN, FALSE
from NewRentPayments p
where not exists (
select *
from Tenants t
where p.HouseID = t.HouseID
and p.ApartmentNumber = t.ApartmentNumber
)

T-SQL - compare column values and column names

I am trying to get the values from a small table, that are not present as columns in an existing table.
Here is some code I tried in SQL Server 2012:
BEGIN TRAN;
GO
CREATE TABLE [dbo].[TestValues](
[IdValue] [int] IDENTITY(1,1) NOT NULL,
[Code] [nvarchar](100) NULL,
) ON [PRIMARY];
GO
CREATE TABLE [dbo].[TestColumns](
[DateHour] [datetime2](7) NULL,
[test1] [nvarchar](100) NULL,
[test2] [nvarchar](100) NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[TestValues] ([Code])
VALUES
(N'test1')
, (N'test2')
, (N'test3')
;
GO
SELECT
v.[Code]
, c.[name]
FROM
[dbo].[TestValues] AS v
LEFT OUTER JOIN [sys].[columns] AS c ON v.[Code] = c.[name]
WHERE
(c.[object_id] = OBJECT_ID(N'[dbo].[TestColumns]'))
AND (c.[column_id] > 1)
;
WITH
cteColumns AS (
SELECT
c.[name]
FROM
[sys].[columns] AS c
WHERE
(c.[object_id] = OBJECT_ID(N'[dbo].[TestColumns]'))
AND (c.[column_id] > 1)
)
SELECT
v.[Code]
, c.[name]
FROM
[dbo].[TestValues] AS v
LEFT OUTER JOIN cteColumns AS c ON v.[Code] = c.[name]
;
GO
ROLLBACK TRAN;
GO
In my opinion the two selects should have the same output. Can someone offer an explanation please?
TestValues is a table receiving data. TestColumns is a table that was created, when the project was started, by persisting a PIVOT query. Recently the process inserting into TestValues received some new data. I tried to get the new values using the first query and I was surprised when the result didn't show anything new.
Edit 1: Thank you dean for the answer, it sounds like a good explanation. Do you have any official page describing the behaviours of unpreserved tables? I did a quick google search and all I got was links towards Oracle. (added as an edit because I do not have enough reputation points to comment)

View all data of two related tables, even if something is not registered from table A to table B

I have two SQL Server tables like this:
[Management].[Person](
[PersonsID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](100) NOT NULL,
[Semat] [nvarchar](50) NOT NULL,
[Vahed] [nvarchar](50) NOT NULL,
[Floor] [int] NOT NULL,
[ShowInList] [bit] NOT NULL,
[LastState] [nchar](10) NOT NULL)
and
[Management].[PersonEnters](
[PersonEnters] [int] IDENTITY(1,1) NOT NULL,
[PersonID] [int] NOT NULL,
[Vaziat] [nchar](10) NOT NULL,
[Time] [nchar](10) NOT NULL,
[PDate] [nchar](10) NOT NULL)
PersonID in the second table is a foreign key to PersonsID in the first.
I register every person's entry into the system in the PersonEnters table.
I want to show every person's entry status on a certain date (PDate field): if a person entered the system, show their information, and if they did not, show null instead.
I tried this query:
select * from [Management].[Person] left outer join [Management].[PersonEnters]
on [Management].[Person].[PersonsID] = [Management].[PersonEnters].[PersonID]
where [Management].[PersonEnters].PDate = '1392/11/14'
but it only shows the entry data registered on 1392/11/14 and shows nothing for the other persons.
I want to show this data plus null or a constant string like "NOT REGISTERED" for the persons who did not register an entry in the PersonEnters table on '1392/11/14'. Please help me.
Logically, the WHERE clause will be applied after the join. If some Person entries do not have matches in PersonEnters, they will have NULLs in PDate as a result of the join, but the WHERE clause will filter them out because the comparison NULL = '1392/11/14' will not yield true.
If I understand your question correctly, you essentially want an outer join to a subset of PersonEnters (the one where PDate = '1392/11/14'), not to the entire table. One way to express that could be like this:
SELECT *
FROM Management.Person AS p
LEFT JOIN (
SELECT *
FROM Management.PersonEnters
WHERE PDate = '1392/11/14'
) AS pe
ON p.PersonsID = pe.PersonID
;
As you can see, this query very explicitly tells the server that a particular subset should be derived from PersonEnters before the join takes place – because you want to indicate matches with that particular subset, not with the whole table.
However, the same intent could be rewritten in a more concise way (without a derived table):
SELECT *
FROM Management.Person AS p
LEFT JOIN Management.PersonEnters AS pe
ON p.PersonsID = pe.PersonID AND pe.PDate = '1392/11/14'
;
The effect of the above query would be the same and you would get all Person entries, with matching results from PersonEnters only if they have PDate = '1392/11/14'.
select *
from [Management].[Person]
left outer join [Management].[PersonEnters]
on [Management].[Person].[PersonsID] = [Management].[PersonEnters].[PersonID]
and [Management].[PersonEnters].PDate = '1392/11/14'
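
Building on the join condition above, the constant string the question asks for could be produced with COALESCE. This is only a sketch against the two tables as posted; the `nvarchar(20)` cast is an assumption, added so the 14-character label is not constrained by the `nchar(10)` type of Vaziat.

```sql
-- Every person appears once per matching entry on the given date;
-- persons without an entry on that date get the placeholder text.
SELECT p.PersonsID,
       p.FirstName,
       p.LastName,
       COALESCE(CAST(pe.Vaziat AS nvarchar(20)), N'NOT REGISTERED') AS Vaziat,
       pe.[Time]
FROM Management.Person AS p
LEFT JOIN Management.PersonEnters AS pe
       ON p.PersonsID = pe.PersonID
      AND pe.PDate = '1392/11/14';
```

Keeping the date filter in the ON clause rather than in WHERE is what preserves the persons with no entry.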

an empty row with null-like values in not-null field

I'm using postgresql 9.0 beta 4.
After inserting a lot of data into a partitioned table, I found something weird. When I query the table, I can see an empty row with null-like values in NOT NULL fields.
The weird query result looks like this:
The 689th row is empty. The first 3 fields (stid, d, ticker) make up the primary key, so they should not be null. The query I used is this:
select * from st_daily2 where stid=267408 order by d
I can even do the group by on this data.
select stid, date_trunc('month', d) ym, count(*) from st_daily2
where stid=267408 group by stid, date_trunc('month', d)
The 'group by' result still has the empty row.
The 1st row is empty.
But if I query where 'stid' or 'd' is null, it returns nothing.
Is this a bug in PostgreSQL 9.0 beta 4? Or some data corruption?
EDIT :
I added my table definition.
CREATE TABLE st_daily
(
stid integer NOT NULL,
d date NOT NULL,
ticker character varying(15) NOT NULL,
mp integer NOT NULL,
settlep double precision NOT NULL,
prft integer NOT NULL,
atr20 double precision NOT NULL,
upd timestamp with time zone,
ntrds double precision
)
WITH (
OIDS=FALSE
);
CREATE TABLE st_daily2
(
CONSTRAINT st_daily2_pk PRIMARY KEY (stid, d, ticker),
CONSTRAINT st_daily2_strgs_fk FOREIGN KEY (stid)
REFERENCES strgs (stid) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT st_daily2_ck CHECK (stid >= 200000 AND stid < 300000)
)
INHERITS (st_daily)
WITH (
OIDS=FALSE
);
The data in this table is simulation results. Multiple multithreaded simulation engines written in C# insert data into the database using Npgsql.
psql also shows the empty row.
You should report this at http://www.postgresql.org/support/submitbug
Some questions:
Could you show us the table definitions and constraints for the partitions?
How did you load your data?
Do you get the same result when using another tool, like psql?
The answer to your problem may very well lie in your first sentence:
I'm using postgresql 9.0 beta 4.
Why would you do that? Upgrade to a stable release. Preferably the latest point-release of the current version.
This is 9.1.4 as of today.
I got to the same point: "what in the heck is that blank value?"
No, it's not a NULL, it's a -infinity.
To filter for such a row use:
WHERE
case when mytestcolumn = '-infinity'::timestamp or
mytestcolumn = 'infinity'::timestamp
then NULL else mytestcolumn end IS NULL
instead of:
WHERE mytestcolumn IS NULL
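
If the blank value is indeed an infinite date or timestamp, as this answer suggests, PostgreSQL's built-in `isfinite` function offers a shorter test. The sketch below first shows the function on literal values, then applies it to the `d` column of `st_daily2` from the question; whether that column really holds `-infinity` is an assumption that the query itself would confirm or rule out.

```sql
-- isfinite() is true for ordinary dates and false for the two infinities.
SELECT v AS d, isfinite(v) AS is_finite
FROM (VALUES (DATE '2010-01-15'),
             (DATE '-infinity'),
             (DATE 'infinity')) AS t(v);

-- Find rows of the question's table whose date is not a finite value.
SELECT *
FROM st_daily2
WHERE stid = 267408
  AND NOT isfinite(d);
```

Some client tools render infinite dates as an empty cell, which would explain why the row looks null even though `IS NULL` does not match it.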