Import and overwrite duplicate rows - PostgreSQL

I'm importing some rows to my postgres database like so:
psql -U postgres import_test < 1432798324_data
where import_test is my database and 1432798324_data is just a plain-text file formatted like:
COPY cars FROM stdin;
<row data>
<row data>
...
\.
COPY drivers FROM stdin;
<row data>
<row data>
...
\.
(I got the format for this plain text file from the answer here).
This method works fine when I'm importing into a blank database. However, if the database isn't blank and the import hits any duplicate rows, I get an error:
ERROR: duplicate key value violates unique constraint "car_pkey"
Is there any way I could modify my import command to force an overwrite if duplicates are found? In other words, if I'm importing a row and there's already a row with that id, I want my new row to overwrite it.

You can import into a temporary table first. Then you can delete the rows that are about to be replaced before copying over the new data:
create temporary table import_drivers as select * from drivers limit 0;
copy import_drivers from stdin;
begin transaction;
delete from drivers
where id in
(
select id
from import_drivers
);
insert into drivers
select *
from import_drivers;
commit transaction;
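On PostgreSQL 9.5 or later you could also merge the staging table with an upsert instead of deleting first. A minimal sketch, assuming drivers has an id primary key and a name column (the column list is invented):
insert into drivers (id, name)
select id, name
from import_drivers
on conflict (id) do update
set name = excluded.name;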

One way to deal with this when you are constantly doing a bulk import (let's say daily) is to use table partitioning.
You would just add a time field to your cars and drivers tables, set to the time of the import. The primary key of both tables then becomes a two-part key: the existing primary key plus the time field.
Once the import is done, you just drop the older partitions (with a daily scheme you would drop the previous day), or alternatively filter on max(time_field) in your queries.
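A minimal sketch of that scheme for the cars table (column and constraint names are assumptions based on the error message; drivers would be handled the same way):
-- add an import-time field and make it part of the key
alter table cars add column imported_at timestamptz not null default now();
alter table cars drop constraint car_pkey;
alter table cars add primary key (id, imported_at);
-- after an import, drop the rows from older imports ...
delete from cars where imported_at < (select max(imported_at) from cars);
-- ... or keep them and always read the latest batch
select * from cars where imported_at = (select max(imported_at) from cars);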

Related

Delete duplicates from a huge table in Postgresql

I have an unusual problem: I need to delete duplicate records from a table in PostgreSQL. Since the table has duplicates, it has no primary key or unique index. It contains about 20 million records, including the duplicates. The query below takes too long:
DELETE FROM temp a USING temp b WHERE a.recordid = b.recordid AND a.ctid < b.ctid;
So what would be a better approach for handling such a huge table with no index on it?
Any help is appreciated.
If you have enough free space, you can copy the table without duplicates, then drop the old table and rename the new one (the swap is sketched after the query), like this:
INSERT INTO new_table
SELECT DISTINCT ON (column) *
FROM old_table
ORDER BY column ASC;
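The swap step itself (a sketch; it assumes new_table was created beforehand with the same structure):
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;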
Use COPY TO to dump the table, then Unix sort -u to de-duplicate the dump.
Drop or truncate the table in Postgres, and use COPY FROM to read the de-duplicated data back in.
Then add a primary key column.
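A rough sketch of that procedure as a psql session, reusing the temp table from the question (the file names are made up):
\copy temp TO 'temp_dump.txt'
\! sort -u temp_dump.txt > temp_dedup.txt
TRUNCATE temp;
\copy temp FROM 'temp_dedup.txt'
ALTER TABLE temp ADD COLUMN id bigserial PRIMARY KEY;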

PostgreSQL id column not defined

I am new to PostgreSQL and I am working with this database.
I got a file which I imported, and I am trying to get rows with a certain ID. But no ID column is defined in the table.
So how do I access this ID? I want to use an SQL command like this:
SELECT * from table_name WHERE ID = 1;
If any order of rows is ok for you, just add a row number according to the current arbitrary sort order:
CREATE SEQUENCE tbl_tbl_id_seq;
ALTER TABLE tbl ADD COLUMN tbl_id integer DEFAULT nextval('tbl_tbl_id_seq');
The new default value is filled in automatically in the process. You might want to run VACUUM FULL ANALYZE tbl to remove bloat and update statistics for the query planner afterwards. And possibly make the column your new PRIMARY KEY ...
To make it a fully fledged serial column:
ALTER SEQUENCE tbl_tbl_id_seq OWNED BY tbl.tbl_id;
See:
Creating a PostgreSQL sequence to a field (which is not the ID of the record)
What you see are just row numbers that pgAdmin displays; they are not actually stored in the database.
If you want an artificial numeric primary key for the table, you'll have to create it explicitly.
For example:
CREATE TABLE mydata (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
obec text NOT NULL,
datum timestamp with time zone NOT NULL,
...
);
Then to copy the data from a CSV file, you would run
COPY mydata (obec, datum, ...) FROM '/path/to/csvfile' (FORMAT 'csv');
Then the id column is automatically filled.

Update/Insert from CSV to Table

For example, I need to export mytbl as CSV.
CREATE TABLE public.mytbl
(
id integer,
product character varying(20),
patent character varying(50)
)
WITH (
OIDS = FALSE
)
;
I use the following query to export mytbl to CSV:
copy(select * from mytbl) to 'D:\mytbl.csv' with csv header
and COPY mytbl FROM 'D:\mytbl.csv' CSV HEADER inserts the rows from the CSV.
But I need to delete the existing data in mytbl before importing it from mytbl.csv, and when I delete I get this error:
ERROR: update or delete on table "mytbl" violates foreign key constraint "mytblX_forinkey_productid" on table "mytblX"
How can I overcome this?
I'm on PostgreSQL 9.2.
It appears that your mytblX has a foreign key to mytbl. Before you can delete the rows in mytbl you should run ALTER TABLE mytblX DROP CONSTRAINT "mytblX_forinkey_productid". Then you can copy the data back in and re-create the constraint with ALTER TABLE mytblX ADD CONSTRAINT ....
Note that a foreign key needs a unique index on the referenced column(s), so make sure that index (usually the primary key) is back in place on the reloaded table before you recreate the constraint. Also note that the new data may not meet the requirements set by mytblX; i.e. if mytblX references a productid which is not in the data you copy into the database, then you will have problems that need to be solved first (usually manually and tediously).
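A hedged sketch of that whole sequence (the exact foreign key definition is an assumption based on the constraint name in the error message, and it presumes mytbl.id carries a unique index or primary key):
ALTER TABLE mytblX DROP CONSTRAINT "mytblX_forinkey_productid";
DELETE FROM mytbl;                                  -- or TRUNCATE mytbl;
COPY mytbl FROM 'D:\mytbl.csv' CSV HEADER;
ALTER TABLE mytblX ADD CONSTRAINT "mytblX_forinkey_productid"
    FOREIGN KEY (productid) REFERENCES mytbl (id);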
You can set the constraint as deferrable, and then defer it. This will let you delete the contents of the original table and reload it from the file within a single transaction. But if the file doesn't contain all the rows it needs to satisfy the constraint, then you will get an error on COMMIT.
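For example (a sketch; on 9.2 the constraint has to be dropped and re-created as DEFERRABLE once, since ALTER TABLE ... ALTER CONSTRAINT only arrived in 9.4, and the referenced columns are assumptions):
-- one-time setup: make the foreign key deferrable
ALTER TABLE mytblX DROP CONSTRAINT "mytblX_forinkey_productid";
ALTER TABLE mytblX ADD CONSTRAINT "mytblX_forinkey_productid"
    FOREIGN KEY (productid) REFERENCES mytbl (id) DEFERRABLE INITIALLY IMMEDIATE;
-- then, for each reload:
BEGIN;
SET CONSTRAINTS "mytblX_forinkey_productid" DEFERRED;
DELETE FROM mytbl;
COPY mytbl FROM 'D:\mytbl.csv' CSV HEADER;
COMMIT;  -- the foreign key is checked here and fails if the CSV lacks referenced rows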

How to copy a table between two models in MySQL Workbench?

I am doing some database work and I need to copy one table from one model to another, but I have tried many ways with no effect.
Is there any way to do this?
If you just want to copy a single table through MySQL Workbench:
In MySQL Workbench:
Connect to a MySQL Server
Expand a Database
Right Click on a table
Select Copy To Clipboard
Select Create Statement
A create statement for the table will be copied to your clipboard similar to the below:
CREATE TABLE `cache` (
`cid` varchar(255) NOT NULL DEFAULT '',
`data` longblob,
`expire` int(11) NOT NULL DEFAULT '0',
`created` int(11) NOT NULL DEFAULT '0',
`headers` text,
`serialized` smallint(6) NOT NULL DEFAULT '0',
PRIMARY KEY (`cid`),
KEY `expire` (`expire`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Create the table in the new database
Open a new SQL tab for executing queries (File->New Query Tab)
Alter the create table code to include the database to create the table on.
CREATE TABLE `databaseName`.`cache` (
`cid` varchar(255) NOT NULL DEFAULT '',
`data` longblob,
`expire` int(11) NOT NULL DEFAULT '0',
`created` int(11) NOT NULL DEFAULT '0',
`headers` text,
`serialized` smallint(6) NOT NULL DEFAULT '0',
PRIMARY KEY (`cid`),
KEY `expire` (`expire`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Then click the Execute button (it looks like a lightning bolt).
That will copy the table schema from one database to another using MySQL Workbench. Just refresh the tables in the database and you should see your newly added table.
Select tab with source database
In menu: Server->Data Export
Select Schema and the Table as Schema Object
Select option Export to Self-Contained File and check Create Dump in a Single Transaction (self-contained only)
Copy full file path to clipboard
Start Export
Select tab with target database
In menu: Server->Data Import. Make sure your target database name is at the top left corner of the Data Import view
Select Import from self contained file and paste full file path from clipboard
Select Default Target Schema
Select Dump Content (Dump Structure and Data etc…)
Start Import
Your best option is probably to create a stripped down version of the model that contains the objects you want to carry over. Then open the target model and run File -> Include Model.... Select the stripped down source model and there you go.
You can just use a select statement. Here I am creating a duplicate of the "original_table" table from the "original_schema" schema/database in the "new_schema" schema:
CREATE TABLE new_schema.duplicate_table AS
Select * from original_schema.original_table;
You can put in any select statement you need, add a condition, and select specific columns:
CREATE TABLE new_schema.duplicate_table AS
SELECT column1, column2
FROM original_schema.original_table
WHERE column2 < 11000000;
I think it is worth mentioning that:
a copied table may reference fields in tables of the original schema that do not exist in the schema it is being copied to. It might be a good idea to inspect the table for these discrepancies before adding it to the other schema.
It's probably also a good idea to check engine compatibility (e.g. InnoDB vs MyISAM) and character set.
Step 1: Right-click on the table > Copy to Clipboard > Create Statement.
Step 2: Paste the clipboard into the query field of Workbench.
Step 3: Remove the backticks (``) from the table name and prefix it with the name of the model (schema) followed by a dot.
E.g. `cusine_menus` -> schema_name.cusine_menus
Then execute.
If you already have your table created and just want to copy the data, I'd recommend using the "Export Data Wizard" and "Import Data Wizard". They basically walk you through choosing what to export and then importing the data, and they are easy to use.
MySQL has an article on the wizards here: Table Data Export and Import Wizard
To copy data using the wizards, do the following:
Find the table in the list that you want to copy data from.
Right click and choose "Table Data Export Wizard."
Choose the columns you wish to copy.
Choose a location to save a *.csv or *.json file with the copied data.
Find the table to insert the copied data to.
Right click and choose "Table data import wizard".
Choose the file you just exported.
Map the columns from the table you copied from to the table you insert to.
Press "Finish". The data is inserted as you chose.
Here is how to copy a table in MySQL.
First, this query will copy the data and structure, but the indexes are not included:
CREATE TABLE new_table SELECT * FROM old_table;
Second, this query will copy the table structure and indexes, but not data:
CREATE TABLE new_table LIKE old_table;
So, to copy everything, including database objects such as indexes, primary key constraint, foreign key constraints, triggers, etc., run these queries:
CREATE TABLE new_table LIKE old_table;
INSERT new_table SELECT * FROM old_table;
If you want to copy a table from one database to another database:
CREATE TABLE destination_db.new_table LIKE source_db.old_table;
INSERT destination_db.new_table
SELECT
*
FROM
source_db.old_table;
create table <target_db>.m_property_nature like <source_db>.m_property_nature;
INSERT INTO <target_db>.m_property_nature SELECT * from <source_db>.m_property_nature;
You can get the create table query from the table info and use the same query on a different database instance.
Run show create table TABLENAME; and copy the query from its output.
Run the generated query on the other DB instance you are connected to.

PostgreSQL bulk insert with ActiveRecord

I have a lot of records that originally came from MySQL. I massaged the data so that it can be inserted into PostgreSQL using ActiveRecord. I can easily do this with row-by-row insertions, i.e. one row at a time, but that is very slow. I want to do a bulk insert, but it fails if any of the rows contains invalid data. Is there any way I can achieve a bulk insert where only the invalid rows fail instead of the whole batch?
COPY
When using SQL COPY for bulk insert (or its equivalent \copy in the psql client), failure is not an option. COPY cannot skip illegal lines. You have to match your input format to the table you import to.
If data itself (not decorators) is violating your table definition, there are ways to make this a lot more tolerant though. For instance: create a temporary staging table with all columns of type text. COPY to it, then fix offending rows with SQL commands before converting to the actual data type and inserting into the actual target table.
Consider this related answer:
How to bulk insert only new rows in PostreSQL
Or this more advanced case:
"ERROR: extra data after last expected column" when using PostgreSQL COPY
If NULL values are offending, remove the NOT NULL constraint from your target table temporarily. Fix the rows after COPY, then reinstate the constraint. Or take the route with the staging table, if you cannot afford to soften your rules temporarily.
Sample code:
ALTER TABLE tbl ALTER COLUMN col DROP NOT NULL;
COPY ...
-- repair, like ..
-- UPDATE tbl SET col = 0 WHERE col IS NULL;
ALTER TABLE tbl ALTER COLUMN col SET NOT NULL;
Or you just fix the source file. COPY tells you the number of the offending line. Use an editor of your preference to fix it, then retry. I like to use vim for that.
INSERT
For an INSERT (as mentioned in the comments), the check for NULL values is trivial:
To skip a row with a NULL value:
INSERT INTO tbl (col1, ...)
SELECT col1, ...
FROM   ...
WHERE  col1 IS NOT NULL;
To insert something else instead of a NULL value (an empty string in this example):
INSERT INTO tbl (col1, ...)
SELECT COALESCE(col1, ''), ...
FROM   ...;
A common work-around for this is to import the data into a TEMPORARY or UNLOGGED table with no constraints and, where data in the input is sufficiently bogus, text typed columns.
You can then do INSERT INTO ... SELECT queries against the data to populate the real table with a big query that cleans up the data during import. You can use a lot of CASE statements for this. The idea is to transform the data in one pass.
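A minimal sketch of that approach, with an invented products target table and made-up cleanup rules:
-- staging table with text columns so COPY never chokes on bad values
CREATE UNLOGGED TABLE staging (id text, price text, created_at text);
COPY staging FROM '/path/to/data.csv' (FORMAT csv);
-- one pass that cleans the data while moving it into the real table
INSERT INTO products (id, price, created_at)
SELECT id::int,
       CASE WHEN price ~ '^[0-9]+(\.[0-9]+)?$' THEN price::numeric ELSE 0 END,
       CASE WHEN created_at = '0000-00-00 00:00:00' THEN NULL
            ELSE created_at::timestamptz END
FROM   staging
WHERE  id ~ '^[0-9]+$';   -- skip rows whose id is not a valid integer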
You might be able to do many of the fixes in Ruby as you read the data in, then push the data to PostgreSQL using COPY ... FROM STDIN. This is possible with Ruby's Pg gem, see eg https://bitbucket.org/ged/ruby-pg/src/tip/sample/copyfrom.rb .
For more complicated cases, look at Pentaho Kettle or Talend Studio ETL tools.