Finding Orphaned Records in SQL Server 2000 - tsql

I am saddled with an ERP database which lacks any foreign keys and therefore lacks referential integrity.
I am writing a script to check most of the major tables in our database for orphaned records.
For example, in this case I'm working with our Sales tables.
SLCDPM - Customer Master Table
SOMAST - Sales Order Master
SOITEM - Sales Order Items
SORELS - Sales Order Releases
Basically, for these (and a whole bunch of other tables) I need to check whether there are records in SORELS that don't appear in any table above it. Then take SOITEM and check the tables above it, and so on.
I started writing scripts, but the number of lines gets kind of ridiculous. Here is where I started with just these 4 tables.
SELECT 'Sales Order Master', * FROM somast WHERE fcustno NOT IN (SELECT fcustno FROM slcdpm WHERE ftype <> 'P')
SELECT 'Sales Order Item', * FROM soitem WHERE fsono NOT IN (SELECT fsono FROM somast)
SELECT 'Sales Order Release', * FROM sorels WHERE (fsono+finumber) NOT IN (SELECT (fsono+finumber) FROM soitem)
The reason I stopped was that I just realized that SORELS (the bottom table) only checks the table before it, not all of the tables before it.
Anyone know of a script I can use to make this more automated or a better way to do it?

And people sell database junk like this; it always amazes me what I see in commercial products.
This is a genuine case for dynamic SQL and a cursor, I think. This is exactly the kind of one-time administrative task these techniques exist for (they were never really meant for production code so much as for administrative work).
I'd create a table listing each table and the table I think it should have a foreign key to. (You may even be able to populate this from the system tables if they at least used a good naming convention.)
Then I would use a cursor to go through that mapping table and build the SQL dynamically to look for orphaned records against each FK table.
It's still a lot of code, but at least you don't have to write it all by hand.
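Here is a minimal sketch of that approach for SQL Server 2000. The mapping table dbo.FKMap and everything in it are illustrative assumptions to adapt to the real schema; extra filters (such as the ftype <> 'P' condition on slcdpm) would need an additional column or hand-tuning. Note too that NOT IN returns no rows at all if the parent key contains NULLs, in which case a NOT EXISTS form is safer.
-- Hypothetical mapping of each child table/key to the parent it should reference
CREATE TABLE dbo.FKMap (
    ChildTable  sysname,
    ChildKey    varchar(200),  -- may be a concatenation, e.g. 'fsono+finumber'
    ParentTable sysname,
    ParentKey   varchar(200)
)

INSERT INTO dbo.FKMap VALUES ('somast', 'fcustno', 'slcdpm', 'fcustno')
INSERT INTO dbo.FKMap VALUES ('soitem', 'fsono', 'somast', 'fsono')
INSERT INTO dbo.FKMap VALUES ('sorels', 'fsono+finumber', 'soitem', 'fsono+finumber')

DECLARE @child sysname, @ckey varchar(200), @parent sysname, @pkey varchar(200)
DECLARE @sql nvarchar(4000)

DECLARE fk_cursor CURSOR FOR
    SELECT ChildTable, ChildKey, ParentTable, ParentKey FROM dbo.FKMap

OPEN fk_cursor
FETCH NEXT FROM fk_cursor INTO @child, @ckey, @parent, @pkey
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build and run one orphan check per mapping row
    SET @sql = N'SELECT ''' + @child + N''' AS orphaned_in, * FROM ' + @child +
               N' WHERE (' + @ckey + N') NOT IN (SELECT ' + @pkey + N' FROM ' + @parent + N')'
    EXEC sp_executesql @sql
    FETCH NEXT FROM fk_cursor INTO @child, @ckey, @parent, @pkey
END

CLOSE fk_cursor
DEALLOCATE fk_cursor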

Related

Create unique integer id column for result rows of union query

I have a view as below, in which I UNION several tables, and I'm thinking it might be a good idea to have a unique row number for each row in the result set. The immediate reason is that I have an admin tool which doesn't know I'm using a view rather than an ordinary table, and which expects a unique id to be present; but I'm now speculating it might be worth doing more generally (i.e. it may make sense to do this in certain theoretical terms - discussion on this would be welcome). I'm wondering how to do this in PostgreSQL.
CREATE VIEW subscriptions AS (
SELECT subscriber_id, course, end_at
FROM subscriptions_individual_stripe
UNION ALL SELECT subscriber_id, course, end_at
FROM subscriptions_individual_bank_transfer
ORDER BY end_at DESC);
Discussion
The reason these are separate tables is of course that they are actually different entities, and yet I also need to be able to contemplate them in a combined way, hence the VIEW. This is my way of avoiding so-called 'polymorphic relationships' in certain popular web frameworks.
I have a tool that expects an id and while my first thought was that views don't need a unique key, on the other hand, maybe they do...?
The reason being that two records could exist in one of the UNIONed tables which were only unique by virtue of the primary key. If one does not include the primary key, a plain UNION would remove one of those as a duplicate and a record would be lost (UNION ALL, as used above, keeps both). Should we also take that into account, i.e. select the primary key (here an integer id) for each of the UNIONed tables, but "convert it" to some other unique id, so the view has its own unique integer primary key? Of course this won't be usable for referencing anything in the original UNIONed tables, but I'm OK with that (the view is a terminal point of my analysis; I don't intend to do anything further with it, and of course it is not writable).
Update
I'm accepting S-Man's answer below because it is a solution to the question I asked; however, as pointed out, the row_number() must not be treated as if it were a real identifier, because it is not.
So as an important aside, I'm left wondering what row_number() is really intended for, then. Perhaps it's (mainly? occasionally?) useful when you want to output some query you plan to export somewhere else (i.e. almost spreadsheet-ish usage), and you abandon any sense of it being integrated with the rest of your database?
Table inheritance may be better as Abelisto has pointed out in the comments.
You can add a row number to the UNION using the row_number() window function:
CREATE VIEW v_myview AS
SELECT
    row_number() OVER (ORDER BY ...) AS id,
    *
FROM (
    SELECT ...
    UNION
    SELECT ...
) AS foo;
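Applied to the subscriptions view from the question, the sketch might look like this (keeping the question's UNION ALL so that duplicate rows across the two tables survive):
CREATE VIEW subscriptions AS
SELECT
    row_number() OVER (ORDER BY end_at DESC) AS id,
    *
FROM (
    SELECT subscriber_id, course, end_at
    FROM subscriptions_individual_stripe
    UNION ALL
    SELECT subscriber_id, course, end_at
    FROM subscriptions_individual_bank_transfer
) AS combined;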
The main problem with this: you should never treat this id as a real identifier, because the underlying data can change. One table might generate a few more records today than it did yesterday, and the generated row numbers would then no longer match the same records as before.
Edit: Removed the md5 solution I added before because of some problems with uniqueness on same data.

PostgreSQL 9.5 ON CONFLICT DO UPDATE command cannot affect row a second time

I have a table from which I want to UPSERT into another. When I try to launch the query, I get the "cannot affect row a second time" error. So I tried to see if I have some duplicates in my first table on the field with the UNIQUE constraint, and I have none. I must be missing something, but since I cannot figure out what (and my query is a bit complex because it includes some JOINs), here is the query; the field with the UNIQUE constraint is "identifiant_immeuble":
with upd(a,b,c,d,e,f,g,h,i,j,k) as (
    select id_parcelle, batimentimmeuble, etatimmeuble, nb_loc_hab_ma, nb_loc_hab_ap, nb_loc_pro, dossier.id_dossier, adresse.id_adresse, zapms.geom, 1, batimentimmeuble2
    from public.zapms
    left join geo_pays_gex.dossier on dossier.designation_siea = zapms.id_dossier
    left join geo_pays_gex.adresse on adresse.id_voie = (select id_voie from geo_pays_gex.voie where (voie.designation = zapms.nom_voie or voie.nom_quartier = zapms.nom_quartier) and voie.rivoli = lpad(zapms.rivoli, 4, '0'))
        and adresse.num_voie = zapms.num_voie
        and adresse.insee = zapms.insee_commune::integer
)
insert into geo_pays_gex.bal2(identifiant_immeuble, batimentimmeuble, id_etat_addr, nb_loc_hab_ma, nb_loc_hab_ap, nb_loc_pro, id_dossier, id_adresse, geom, raccordement, batimentimmeuble2)
select a, b, c, d, e, f, g, h, i, j, k from upd
on conflict (identifiant_immeuble) do update
set batimentimmeuble = excluded.batimentimmeuble, id_etat_addr = excluded.id_etat_addr, nb_loc_hab_ma = excluded.nb_loc_hab_ma, nb_loc_hab_ap = excluded.nb_loc_hab_ap, nb_loc_pro = excluded.nb_loc_pro,
    id_dossier = excluded.id_dossier, id_adresse = excluded.id_adresse, geom = excluded.geom, raccordement = 1, batimentimmeuble2 = excluded.batimentimmeuble2;
As you can see, I use several intermediary tables in this query: one storing the street names (voie), one related to it storing the addresses (adresse, basically numbers related through a foreign key to the street names table), and another storing some other data related to the project names (dossier).
I don't know what other information I could give to help find an answer; I guess it is better that I not share the actual content of my tables, since it may touch on privacy regulations.
Thanks for your attention.
EDIT: I found a workaround by deleting from the bal2 table the entries already present in the zapms table, as such:
delete from geo_pays_gex.bal2 where bal2.identifiant_immeuble in (select id_parcelle from zapms);
It is not entirely satisfying though, since I would have preferred to keep track of the data creator and the creation date, as well as the fact that the data has been modified (I have some fields to store this information), and here I simply erase all this history... And I have another table with the primary key of the bal2 table as a foreign key. I am still in DB creation so I can afford to truncate that table, but in production it wouldn't be possible since I would lose data.
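For reference, this error generally means that the INSERT's source query yields the same conflict-key value more than once (a single ON CONFLICT statement refuses to update the same target row twice), not that the target table already contains duplicates. A LEFT JOIN that matches several adresse or dossier rows per zapms row multiplies rows in exactly this way. A sketch of a check, reusing the joins from the question:
-- Any id_parcelle listed here reaches the ON CONFLICT clause more than once
select id_parcelle, count(*)
from public.zapms
left join geo_pays_gex.dossier on dossier.designation_siea = zapms.id_dossier
left join geo_pays_gex.adresse on adresse.id_voie = (select id_voie from geo_pays_gex.voie where (voie.designation = zapms.nom_voie or voie.nom_quartier = zapms.nom_quartier) and voie.rivoli = lpad(zapms.rivoli, 4, '0'))
    and adresse.num_voie = zapms.num_voie
    and adresse.insee = zapms.insee_commune::integer
group by id_parcelle
having count(*) > 1;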

Postgres Sequence out of sync

I'm running a multi-master setup with bucardo and postgres.
I'm finding that some of my table sequences are getting out of sync with each other. Particularly the auto-incremented id.
example:
db1 - table1
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
The id of the new row is 1
db2 - table1
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
The id of the new row is 1
The id of the new row on db2 should be 2, because bucardo has replicated the data from db1, but db2's auto increment is based on:
nextval('oauth_sessions_id_seq'::regclass)
And if we check the "oauth_sessions_id_seq" we see the last value as 0.
phew... Make sense?
Anyway, can I do any of the following?
Replicate the session tables with bucardo, so each DB's session is shared?
Manipulate the default auto-increment function above to take into account the max existing items in the table?
If you have any better ideas, please feel free to throw them in; if you have questions, just ask. Thanks for any help.
You are going to have to change your id generation method, because there is no Bucardo solution according to this comment in the FAQ.
Can Bucardo replicate DDL?
No, Bucardo relies on triggers, and Postgres does not yet provide DDL
triggers or triggers on its system tables.
Since Bucardo uses triggers, it cannot "see" the sequence changes, only the data in tables, which it replicates. Sequences are interesting objects that do not support triggers, but you can manually update them. I suppose you could add something like the code below before the INSERT, but there still might be issues.
SELECT setval('oauth_sessions_id_seq', (SELECT MAX(did) FROM distributors));
See this question for more information.
I am not fully up on all the issues involved, but you could perform the maximum calculation manually and do the insert operation in a retry loop. I doubt it will work if you are actually doing inserts on both DBs and letting Bucardo replicate, but if you can guarantee that only one DB updates at a time, then you could try something like an UPSERT retry loop. See this post for more info. The "guts" of the loop might look like this:
INSERT INTO distributors (did, dname)
VALUES ((SELECT max(did)+1 FROM distributors), 'XYZ Widgets');
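Spelled out, such a retry loop might look like the anonymous PL/pgSQL block below; this assumes did carries a UNIQUE or PRIMARY KEY constraint, so that a concurrent insert of the same id raises unique_violation:
DO $$
BEGIN
    LOOP
        BEGIN
            INSERT INTO distributors (did, dname)
            VALUES ((SELECT max(did) + 1 FROM distributors), 'XYZ Widgets');
            EXIT;  -- insert succeeded, leave the loop
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- another session took that id; recompute max(did)+1 and retry
        END;
    END LOOP;
END $$;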
Irrespective of the DB (PostgreSQL, Oracle, etc.), a sequence is created for each table that has an auto-generated primary key associated with it.
Sequences typically go out of sync whenever a huge data import happens or someone manually modifies the table's sequence.
Solution: the only way to set the sequence back is to take the max value of the table's PK and set the sequence's next value to it.
The query below will list all the sequences created in your DB schema:
SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
Find the current maximum of the primary key (note: the column name must not be quoted as a string literal):
SELECT MAX(the_primary_key) FROM the_table;
Then set the sequence just past it:
SELECT setval('the_primary_key_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);

Postgres table partitioning with star schema

I have a schema with one table with the majority of data, customer, and three other tables with foreign key references to customer.entry_id which is a BIGSERIAL field. The three other tables are called location, devices and urls where we store various data related to a specific entry in the customer table.
I want to partition the customer table into monthly child tables, and have that part worked out; customer will stay as-is, each month will have a table customer_YYYY_MM that inherits from the master table with the right CHECK constraint and indexes will be created on each individual child table. Data will be moved to the correct child tables while the master table stays empty.
My question is about the other three tables, as I want to partition them as well. However, they have no date information (at all), only the reference to the primary key from the master table. How can I setup the constraints on these tables? Is it even meaningful or possible without date information?
My application logic knows where to insert all the data (it's fairly trivial), but I expect to be able to do simple SELECT queries without specifying which child tables to get it from. So this should work as you would expect from non-partitioned tables:
SELECT l.*
FROM customer c
JOIN location l USING (entry_id)
WHERE c.date_field > '2015-01-01'
I would partition them by the reference key. The foreign key is used in join conditions and is not usually subject to change so it fulfills the following important points:
Partition by the information that is used mostly in the WHERE clauses of the queries or other parts where partitioning can be used to filter out tables that don't need to be scanned. As one guide puts it:
The objective when defining partitions should be to allow as many queries as possible to fetch data from as few partitions as possible - ideally one.
Partition by information that is not going to be changed so that rows don't constantly need to be thrown from one subtable to another
This all depends on the size of the tables too, of course. If they stay small, there is no need to partition.
Read more about partitioning here.
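A minimal sketch of one such child table, in the same inheritance-plus-CHECK style the question already uses for customer; the entry_id boundaries are illustrative assumptions that would have to match the ids actually generated each month:
-- Hypothetical January 2015 partition of location, keyed by entry_id range
CREATE TABLE location_2015_01 (
    CHECK (entry_id >= 1000000 AND entry_id < 2000000)
) INHERITS (location);

CREATE INDEX ON location_2015_01 (entry_id);
With constraint_exclusion enabled, queries that constrain entry_id can then skip child tables whose CHECK ranges cannot match.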
Use views:
create view customer as
select * from customer_jan_15 union all
select * from customer_feb_15 union all
select * from customer_mar_15;
create view location as
select * from location_jan_15 union all
select * from location_feb_15 union all
select * from location_mar_15;

What is the benefit of Enforce Join option in Crystal Reports?

What is the reason that the default Enforce Join option in SAP Crystal Reports' Link dialog is "Not Enforced"?
Is it a performance issue? I noticed that if you don't select a field from the joined table, Crystal generates a SELECT query with only the fields of the selected table, without any joins.
Here's some information about the Enforce Join options:
Not Enforced: When you select this option, the link you've created is used only if it's explicitly required by the Select statement. This is the default option. Your users can create reports based on the selected tables without restriction (that is, without enforcement based on other tables).
Enforced From: When you select this option, if the "to" table for the link is used, the link is enforced. For example, if you create a link from Table A to Table B using Enforce From and select only a field from Table B, the Select statement will still include the join to Table A because it is enforced. Conversely, selecting only from Table A with the same join condition will not cause the join to Table B to be enforced.
Enforced To: When you select this option, if the "from" table for the link is used, the link is enforced. For example, if you create a link from Table A to Table B using Enforce To and select only a field from Table A, the join to Table B will be enforced, and the Select statement that is generated will include both tables.
Enforced Both: When you select this option, if either the "from" table or the "to" table for this link is used, the link is enforced.
The "enforced" part is used to FORCE the inclusion of tables that contain fields that are NOT used in the report/select conditions.
Well crap, that's what you said.
My understanding:
If you have two tables (tbl_A, tbl_B) with a linkable field, and you don't USE any field from the second table, it can be dropped from the select, and the "regular" effects of the join may disappear.
Select
    'Your account is in default!' as Message,
    tbl_A.full_name, tbl_A.street_address, tbl_A.city, tbl_A.blah_blah
From
    all_customers tbl_A,
    delinquent_accounts tbl_B
Where
    tbl_A.account_no = tbl_B.account_no
Without the enforced join, that might wind up as:
Select
    'Your account is in default!' as Message,
    tbl_A.full_name, tbl_A.street_address, tbl_A.city, tbl_A.blah_blah
From
    all_customers tbl_A
In other words, you might wind up sending dunning letters to your whole customer base instead of just the delinquent accounts. (Which is why we test reports before implementing them, I guess.)