On INSERT to a table INSERT data in connected tables - postgresql

I have two tables that have a column named id_user in common. These two tables are created in my Drupal webpage at some point (that I don't know because I didn't created the Netbeans project).
I checked on the internet and found that probably by adding REFERENCES 1sttable (id_user) to the second table, it should copy the value of the 1sttable (that is always created when a new user arrives) to the id_user value of the 2ndtable (that I don't know at which point is created). Is it correct?
If it's not correct I would like to know a way in pgAdmin that could make me synchronize those tables, or at least create both of them in the same moment.
The problem I have is that the new user has a new row on 1sttable automatically as soon as he registers, while to get a new row on 2ndtable it needs some kind of "activation" like inserting all of the data. What I'm looking for is a way that as soon as there is a new row in the 1sttable, it automatically creates the new row on the other table too. I don't know how to make it more clear (English is not my native language).
The solution you gave me seems clear for the question, but the problem is a little bigger: the two tables presents different kinds of variables, and it should be that they are, one in mySQL, with the user data (drupal default for users), then i have 2 in postgresql, both with the same primary key (id_user):
the first has 118 columns, most of them real integer;
the second has 50 columns, with mixed types.
the web application i'm using needs both this column with all the values NOT EMPTY (otherwise i get a NullPointerException) to work, so what i'm searching for is (i think):
when the user register -inserting his email- in drupal, automatically it creates the two fulfilled columns, to make the web automatically works as soon as the email is stored in mysql. Is it possible? Is it well explained?
My environment is:
windows server 2008 enterprise edition
glassfish 2.1
netbeans 6.7.1
drupal 6.17
postgresql 8.4
mysql 5.1.48

pgAdmin is just the GUI. You mean PostgreSQL, the RDBMS.
A foreign key constraint, like you have only enforces that no value can be used, that isn't present in the referenced column. You can use ON UPDATE CASCADE or ON DELETE CASCADE to propagate changes from the referenced column, but you cannot create new rows with it like you describe. You got the wrong tool.
What you describe could be achieved with a trigger. Another, more complex way would be a RULE. Go with a trigger here.
In PostgreSQL you need a trigger function, mostly using plpgsql, and a trigger on a table that makes use of it.
Something like:
CREATE OR REPLACE FUNCTION trg_insert_row_in_tbl2()
RETURNS trigger AS
$func$
BEGIN
INSERT INTO tbl2 (my_id, col1)
VALUES (NEW.my_id, NEW.col1) -- more columns?
RETURN NEW; -- doesn't matter much for AFTER trigger
END
$func$ LANGUAGE plpgsql;
And a trigger AFTER INSERT on tbl1:
CREATE TRIGGER insaft
AFTER INSERT ON tbl1
FOR EACH ROW EXECUTE PROCEDURE trg_insert_row_in_tbl2();

You might want to read about using Drupal hooks to add extra code to be run when a user is registered. Once you know how to use hooks, you can write code (in a module) to insert a corresponding record in the 2nd table. A good candidate hook to use here would be hook_user for Drupal 6 or hook_user_insert for Drupal 7.
The REFERENCES you read about is part of an SQL command to define a foreign key constraint from the second table to the first. This is not strictly necessary to solve your problem, but it can help in keeping your database consistent. I suggest you read up on database structures and constraints if you want to learn more on this topic.

Related

Auto generate script for CREATE TABLE including all indices, constraints, etc (not via SSMS)

I have a data anonymization process that takes a production copy of a database and turns it into an anonymized copy by UPDATE-ing some columns.
Some of the tables contain several million rows so instead of UPDATE-ing the columns, which is very log intensive, I went down the way of
SELECT
Id,
CAST('Redacted' AS NVARCHAR(255)) [ColumnRequiringAnonymization]
INTO MyTable_New
FROM MyTable
EXEC sp_rename MyTable, MyTable_old
EXEC sp_rename MyTable_new, MyTable
DROP TABLE MyTable_old
The problem with this approach is that the "new" table no longer has any of the keys, indices and other dependent objects. I have figured out the keys and indices using SPs to generate the DROP and CREATE scripts. The SPs are based on manually written SQL as can be seen e.g. in this answer.
The next problem is that we have a schemabound view on top of this table, which has indices and a full-text index on its own. The number of SPs to generate scripts is growing and I am sure there will be mistakes.
Is there a way to completely script a table/view by using SQL commands only? ie. just like SSMS does when you click "Script table as - CREATE to" but within a stored procedure?
Right-click on the database, select Tasks; there is Generate Scripts there. Just follow prompts or Google for additional information.

Way to migrate a create table with sequence from postgres to DB2

I need to migrate a DDL from Postgres to DB2, but I need that it works the same as in Postgres. There is a table that generates values from a sequence, but the values can also be explicitly given.
Postgres
create sequence hist_id_seq;
create table benchmarksql.history (
hist_id integer not null default nextval('hist_id_seq') primary key,
h_c_id integer,
h_c_d_id integer,
h_c_w_id integer,
h_d_id integer,
h_w_id integer,
h_date timestamp,
h_amount decimal(6,2),
h_data varchar(24)
);
(Look at the sequence call in the hist_id column to define the value of the primary key)
The business logic inserts into the table by explicitly providing an ID, and in other cases, it leaves the database to choose the number.
If I change this in DB2 to a GENERATED ALWAYS it will throw errors because there are some provided values. On the other side, if I create the table with GENERATED BY DEFAULT, DB2 will throw an error when trying to insert with the same value (SQL0803N), because the "internal sequence" does not take into account the already inserted values, and it does not retry with a next value.
And, I do not want to restart the sequence each time a provided ID was inserted.
This is the problem in BenchmarkSQL when trying to port it to DB2: https://sourceforge.net/projects/benchmarksql/ (File sqlTableCreates)
How can I implement the same database logic in DB2 as it does in Postgres (and apparently in Oracle)?
You're operating under a misconception: that sources external to the db get to dictate its internal keys. Ideally/conceptually, autogenerated ids will never need to be seen outside of the db, as conceptually there should be unique natural keys for export or reporting. Still, there are times when applications will need to manage some ids, often when setting up related entities (eg, JPA seems to want to work this way).
However, if you add an id value that you generated from a different source, the db won't be able to manage it. How could it? It's not efficient - for one thing, attempting to do so would do one of the following
Be unsafe in the face of multiple clients (attempt to add duplicate keys)
Serialize access to the table (for a potentially slow query, too)
(This usually shows up when people attempt something like: SELECT MAX(id) + 1, which would require locking the entire table for thread safety, likely including statements that don't even touch that column. If you try to find any "first-unused" id - trying to fill gaps - this gets more complicated and problematic)
Neither is ideal, so it's best to not have the problem in the first place. This is usually done by having id columns be autogenerated, but (as pointed out earlier) there are situations where we may need to know what the id will be before we insert the row into the table. Fortunately, there's a standard SQL object for this, SEQUENCE. This provides a db-managed, thread-safe, fast way to get ids. It appears that in PostgreSQL you can use sequences in the DEFAULT clause for a column, but DB2 doesn't allow it. If you don't want to specify an id every time (it should be autogenerated some of the time), you'll need another way; this is the perfect time to use a BEFORE INSERT trigger;
CREATE TRIGGER Add_Generated_Id NO CASCADE BEFORE INSERT ON benchmarksql.history
NEW AS Incoming_Entity
FOR EACH ROW
WHEN Incoming_Entity.id IS NULL
SET id = NEXTVAL FOR hist_id_seq
(something like this - not tested. You didn't specify where in the project this would belong)
So, if you then add a row with something like:
INSERT INTO benchmarksql.history (hist_id, h_data) VALUES(null, 'a')
or
INSERT INTO benchmarksql.history (h_data) VALUES('a')
an id will be generated and attached automatically. Note that ALL ids added to the table must come from the given sequence (as #mustaccio pointed out, this appears to be true even in PostgreSQL), or any UNIQUE CONSTRAINT on the column will start throwing duplicate-key errors. So any time your application needs an id before inserting a row in the table, you'll need some form of
SELECT NEXT VALUE FOR hist_id_seq
FROM sysibm.sysdummy1
... and that's it, pretty much. This is completely thread and concurrency safe, will not maintain/require long-term locks, nor require serialized access to the table.

Insert data from staging table into multiple, related tables?

I'm working on an application that imports data from Access to SQL Server 2008. Currently, I'm using a stored procedure to import the data individually by record. I can't go with a bulk insert or anything like that because the data is inserted into two related tables...I have a bunch of fields that go into the Account table (first name, last name, etc.) and three fields that will each have a record in an Insurance table, linked back to the Account table by the auto-incrementing AccountID that's selected with SCOPE_IDENTITY in the stored procedure.
Performance isn't very good due to the number of round trips to the database from the application. For this and some other reasons I'm planning to instead use a staging table and import the data from there. Reading up on my options for approaching this, a cursor that executes the same insert stored procedure on the data in the staging table would make sense. However it appears that cursors are evil incarnate and should be avoided.
Is there any way to insert data into one table, retrieve the auto-generated IDs, then insert data for the same records into another table using the corresponding ID, in a set-based operation? Or is a cursor my only option here?
Look at the OUTPUT clause. You should be able to add it to your INSERT statement to do what you want.
BTW, if you need to output columns into the second table that weren't inserted into the first one, then use MERGE instead of INSERT (as suggested in the comment to the original question) as its OUTPUT clause supports referencing other columns from the source table(s). Otherwise, keeping it with an INSERT is more straightforward, and it does give you access to the inserted identity column.
I'm having experiment to worked out in inserting multiple record into related table using databinding. So, try this!
Hopefully this is very helpful. Follow this link How to insert record into related tables. for more information.

Creating Stored Procedures that can work with different tables

I need to use the same Stored Procedures against many tables all with the same structure in my DB. This is data loaded from customers,with one table/customer and the data needs calculations/checks run before it's loaded to our DataWarehouse.
So far these are the options and issues I've found and I'm looking for a better pattern/approach.
Create a view that points to the
table I want to process, the SPs
then talk to that view. This works
well (especially once I'd worked out
how to create views 'automagically'
based on their columns). But the
view can only be used with one table
at a time, forcing the system to
deal with one customer at a time.
Use dynamic sql within each SP -
makes the SPs much harder to
read/debug and for those reasons has
been ruled out
Create a partitioned view across
all the tables and then use a
paramatised table function to return
just the data we're interested in -
ah but then I can't update the data
as the function returns a table that
can be only used for select
Use dynamic sql inside a function
(can't be done) to create a view
(which also can't be done) .... give
up
Within the SP create a temp table
with over the target table using
dynamic sql, but then the temp table
only exists in the session that runs
the dynamic sql not the 'parent'
session that's running the SP ...
give up
Create a global temp table using
dynamic SQL to avoid the scope issue
of 5, then run the SP against the
global temp table. Still run into
the single customer issue.
Create the view as in 1 within a
transaction and then run all the SPs
and then commit - works fine for one
user, but any others are now blocked
trying to create a new view of the
same name
Use a temporary view ... can't in
T/Sql
Move all the code into .Net - but
we have environment issues where
tsql is much easier to host/run
I know I'm not the only person who has this problem, have any of you good people solved it, please help.
Maybe your approach is wrong, I will go deep in details in a while but it seems that your problem can be solved using SSIS
-- Updated answer:
First, the big picture:
The most affordable way to process the tables dynamically is using a script instead of a stored procedure. If you want to make table access randomly chosen, you certainly will not use any of the performance advantages of stored procedures, i.e. execution plans. A SQL Script can be easily upgraded to point one table at runtime using placeholders and replacing it before executing.
The script can be loaded from the filesystem, a variable, a text column in a table, etc. The loading process consists in read the script content to a string variable. This step occurs once.
The next step is the preparation stage. This step will be executed for each table to be processed. The main business of this step is to replace the table placeholders with the current table being processed. Also is possible to set parameter values like any parameter you can need to pass into the sp that you already wrote.
The last step is the execution of the script. As is already loaded into a variable and the placeholders were set to the current table name, you can safely call a ExecuteSQLTask with the sql variable as the input. This process of course happens for each table you want to process.
Ok. Now let's see this in action.
This is a sample database model:
CREATE TABLE [dbo].[t_n](
[id] [int] IDENTITY(1,1) NOT NULL,
[name] [varchar](50) NOT NULL,
[start] [datetime] NULL,
CONSTRAINT [PK_t_n] PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
where t_n represents any table (t_1, t_2, t_3, etc).
This is your current stored procedure:
CREATE PROCEDURE SpProcessT_n
AS
BEGIN
SET NOCOUNT ON;
SELECT * FROM [t1];
END
GO
Now, transform this stored procedure to a Sql script, placing a placeholder instead of the table name
SET NOCOUNT ON;
SELECT * FROM [$table_name];
I choose to save this in a .sql file in the filesystem to keep the POC as simple as possible.
Next, create a SSIS Package like this:
These are the settings I choose to set up the loop:
And this is the way you can assign the table name to a variable called appropriately _table_name_
This is the setup of the script task, here you find that the variable _table_name_ has read only access, while a new variable called SqlExec has read/write access:
And this is it's Main function:
public void Main()
{
String Table_Name = Dts.Variables["table_name"].Value.ToString();
String SqlScript;
Regex reg = new Regex(#"\$table_name", RegexOptions.Compiled);
using (var f = File.OpenText(#"c:\sqlscript.sql")) {
SqlScript = f.ReadToEnd();
f.Close();
}
SqlScript = reg.Replace(SqlScript, Table_Name);
Dts.Variables["SqlExec"].Value = SqlScript;
Dts.TaskResult = (int)ScriptResults.Success;
}
You can notice that the Dts Variable SqlExec contains the sql script that will be executed. Now you can set the following options in your ExecuteSqlTask:
Successfully tested in MSSQL 2008, if you put a insert inside the script file you will notice new rows in each table.
Hope this helps!
If your application can afford to have one cut-off day late, then you can have a nightly scheduled job to run an SSIS package that will consolidate all 150+ tables into one single huge table. Since the freshness of the results of the queries against that huge table will then be 1 'date' late, this solution will not include any rows that recently been loaded.
You can actually time the running of this package. If it is still amazingly fast, say within 30 minutes, then you can bet to run it in every few hours, like during: the start of work day, lunch break, and end of day. This way you can have a nearly fresh data to query with.
Write a partitioned view including table names?
SELECT 'TableName', t.* FROM TableName t
UNION ALL
SELECT 'TableName2', t.* FROM TableName2 t
Then write a single instead of trigger which uses dynamic SQL for writing (less testing involved with that use of dynamic SQL because you'd just write the simple CRUD operations once for all tables I'd think)
I would not do this with SQL. What you are describing sounds like a traditional ETL situation.
Since all of the customer tables are the same, I would create a table in the data warehouse with all the columns from the client table, a surrogate key column, and a type identifier. You have an option to create a "staging" table here that will only have data in it during the ETL process, or just working on a single "live" table. I would create the staging table.
Then within SSIS package (don't worry you can still schedule from SQL Server agent, it hasn't totally left the DB server), start the ETL process...
E(xtract): copy the data from your source into the staging table in the data warehouse. You most likely want to use a sub-package within a foreach loop and changing the name of the table that you want to process from an external store (most people would say put this in the warehouse, but its up to you).
T(ransform): run the calculations/checks you were talking about, but do it on the whole set...
L(oad): Copy it to your real within the data warehouse.
There are a couple things I would NOT do.
1. Modify the data in the source table.
2. Try to do this in t-sql. Its just not what tsql is good at.
If you need more detail on this approach, I would probably ask the question with some Business Intelligence tags. I'll be traveling for the next week or so, but I will try to look at the comments to clear anything up if you need me to.
I am fairly certain that the standard way to solve this is using dynamic SQL in each sp (your option 2), which has already been ruled out.
Your goal is to make generic, multi-table SQL. I don't see how you intend to accomplish that without sacrificing some efficiency and readability.

Schema Patches Practices

We've got a schema in Postgres, and we want to institute a good method to apply schema patches.
Currently, we have a series of DDL files that create the schema, tables, sequences, functions, etc. We also have a population script for test environments. Those files are all used to recreate our database environments, for development, testing, etc.
We also have a number of 'patch' files that correspond to versions of our system. ie. patches/1.0.0.sql, patches/1.0.1.sql, etc. These files are used to update our staging and production databases.
This process works for us so far, but there has been some debate in-house how to best patch the schema.
I'm curious what others out there have, as a process, to patch staging and production schema's and how to manage versions of the database.
Thanks!
At work, for SQL Server, we write schema change scripts that first roll back the change to be made (idempotently, so the rollback section runs fine even if the schema change isn't applied yet), and then a section to apply the change. In TSQL it's easy to peek in the system catalogue or other tables to see if tables/columns/indices/rows already exist and do nothing if not.
In PostgreSQL, you're a bit more constrained with what commands you can simply send to the server-- but on the other hand, DDL is transactional, so a half-applied schema change shouldn't happen. I've adapted the scheme I'm used to at work to use on my own little projects quite well (overkill? but even here I have a dev/test db and a "real" db), for example:
\echo Rolling back schema change #35
BEGIN;
DELETE FROM schema_version WHERE schema_id = 35;
DROP TABLE IF EXISTS location_coordinates;
DROP FUNCTION IF EXISTS location_coordinates_populate();
END;
\echo Applying schema change #35
BEGIN;
INSERT INTO schema_version(schema_id, description) VALUES(35, 'Add location_coordinates table');
CREATE TABLE location_coordinates(
location_id INT PRIMARY KEY REFERENCES location(location_id),
latitude FLOAT NOT NULL,
longitude FLOAT NOT NULL,
earth_coordinates earth NOT NULL,
box_10miles cube NOT NULL
);
GRANT SELECT, INSERT, UPDATE, DELETE ON location_coordinates TO ui;
CREATE FUNCTION location_coordinates_populate() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
new.earth_coordinates := ll_to_earth(new.latitude, new.longitude);
new.box_10miles := earth_box(new.earth_coordinates, 10 * 1609.344);
RETURN new;
END
$$;
CREATE TRIGGER location_coordinates_populate BEFORE INSERT OR UPDATE ON location_coordinates
FOR EACH ROW EXECUTE PROCEDURE location_coordinates_populate();
INSERT INTO location_coordinates(location_id, latitude, longitude)
SELECT location_id, latitude, longitude FROM location WHERE latitude IS NOT NULL AND longitude IS NOT NULL;
CREATE INDEX location_coordinates_10miles ON location_coordinates USING gist (box_10miles);
END;
\echo Done
This script can be run on the database just with "psql -f schema-changes/35.sql". By just cutting out up to the "applying..." message, I can get the commands to roll it back. And as you can see, the change maintains a metadata table "schema_version" so I can see which changes are applied. The entire change is done as a transaction, data migration and all. Here I've used the "IF EXISTS" capability of the DROP commands to make the rollback section happy even when the change is unapplied. Istr one thing we did at work for Oracle was write schema changes as PL/SQL-- you could perhaps have some functions in plpgsql to help with making changes?
Note that in the change above, where I'm migrating the "latitude" and "longitude" columns (which were nullable) out of "location" to a separate "location_coordinates" relation (and adding in the earthdistance stuff), I didn't drop the old columns. One thing we have to be careful of is to make schema changes backwards-compatible if possible. So I can apply this schema change before updating the app to use the new tables. I'd have a second change to drop the old columns to apply after updating the app. At work, these would be done in two different release cycles, so during release X we still have the optional of rolling back the app to release X-1 without having to roll back all the schema changes first; as well as being able to deploy schema changes in a separate window before the apps. (Technically I should have written a trigger so updates to the old table are synced to the new table, but I haven't because that's too much like work :))
We also have things like an application that spiders all our databases to see what's in the schema_version table, and tracks changes, so people can even see what changes were made without having to connect, and get an idea of the history of each change (we track "rolled back in dev", "applied in dev" etc). At work our schema_version table also includes authorship information etc. A magic way of applying the version information from version control would be cool- one problem we have is that if a SC gets applied in QA, for instance, then changed in Perforce, maybe no-one notices. So a way to track that schema change 35 revision #4 was applied would be good.
One thing to note- schema changes for us are numbered independently of application versions. Obviously they are related--- that's another thing the spidering app allows people to enter--- but we try to have lots of small changes rather than a giant "here's everything for release X" patch. Schema changes are also used for things like adding new indices, so might not be app-driven at all. In general, schema changes are "owned" by developers, not DBAs- although in the "create index" example above, the DBA is basically acting in a developer role and owning the schema change. Yes, we insist on a high level of SQL-ability from developers- although other groups in the company work a bit differently and give more work to the DB team.