I have a PostgreSQL database on an Amazon EC2 instance, and I am trying to seed it with 100M rows of data (I have 10 CSV files of 10M lines each). I used the secure copy (scp) command to move the CSV files onto the EC2 instance. When I try to copy the CSV files into the database, it takes far too long. Is there a way to speed up the process?
Here is my schema.sql file:
DROP DATABASE IF EXISTS reviews_db;
CREATE DATABASE reviews_db;
\c reviews_db;
CREATE TABLE reviews (
id INT PRIMARY KEY NOT NULL,
houseId INT NOT NULL,
name VARCHAR(30) NOT NULL,
picture VARCHAR(55) NOT NULL,
reviewText TEXT NOT NULL,
reviewDate VARCHAR (15) NOT NULL,
accuracyRating INT NOT NULL,
locationRating INT NOT NULL,
communicationRating INT NOT NULL,
checkinRating INT NOT NULL,
cleanlinessRating INT NOT NULL,
valueRating INT NOT NULL,
overallRating DECIMAL NOT NULL
);
CREATE INDEX ON reviews (houseId);
Then, on my EC2 instance, I run this command to seed the database:
pv ./reviews1.csv | psql -U postgres -d reviews_db -c "COPY reviews FROM STDIN with (FORMAT csv);"
Note: my reviews1.csv file is 3.1GB, and it is loading at roughly 200 kB/s.
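For context, the same command would be run for each of the ten files, e.g. (a sketch; the names reviews1.csv through reviews10.csv are assumed from the first one):
for i in $(seq 1 10); do
  # stream each file into the reviews table, showing progress with pv
  pv ./reviews${i}.csv | psql -U postgres -d reviews_db -c "COPY reviews FROM STDIN WITH (FORMAT csv);"
done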
I am running a script that creates a database and some tables with foreign keys and inserts some data, but somehow creating the tables is not working, although it doesn't throw any error: I go to pgAdmin, look for the created tables, and there are none...
When I copy the text of my script and execute it in the Query Tool, it works fine and creates the tables.
Can you please explain to me what I am doing wrong?
Script:
DROP DATABASE IF EXISTS test01 WITH (FORCE); --drops even if in use
CREATE DATABASE test01
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'German_Germany.1252'
LC_CTYPE = 'German_Germany.1252'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False
;
CREATE TABLE customers
(
customer_id INT GENERATED ALWAYS AS IDENTITY,
customer_name VARCHAR(255) NOT NULL,
PRIMARY KEY(customer_id)
);
CREATE TABLE contacts
(
contact_id INT GENERATED ALWAYS AS IDENTITY,
customer_id INT,
contact_name VARCHAR(255) NOT NULL,
phone VARCHAR(15),
email VARCHAR(100),
PRIMARY KEY(contact_id),
CONSTRAINT fk_customer
FOREIGN KEY(customer_id)
REFERENCES customers(customer_id)
ON DELETE CASCADE
);
INSERT INTO customers(customer_name)
VALUES('BlueBird Inc'),
('Dolphin LLC');
INSERT INTO contacts(customer_id, contact_name, phone, email)
VALUES(1,'John Doe','(408)-111-1234','john.doe@bluebird.dev'),
(1,'Jane Doe','(408)-111-1235','jane.doe@bluebird.dev'),
(2,'David Wright','(408)-222-1234','david.wright@dolphin.dev');
I am calling the script from a Windows console like this:
"C:\Program Files\PostgreSQL\15\bin\psql.exe" -U postgres -f "C:\Users\my user name\Desktop\db_create.sql" postgres
My script is edited in Notepad++ and saved with Encoding set to UTF-8 without BOM, as per a suggestion found here.
I see you are using the -U postgres command line parameter, and also passing the database name as the last parameter (postgres).
So all your SQL commands were executed while you were connected to the postgres database. The CREATE DATABASE command did create the test01 database, but CREATE TABLE and INSERT INTO were executed not against test01 but against postgres, so all your tables ended up in the postgres database, not in test01.
You need to split your SQL script into 2 scripts (files): the first for CREATE DATABASE, the second for the rest.
You need to execute first script as before, like
psql.exe -U postgres -f "db_create_1.sql" postgres
And for the second one you need to connect to the database that was created in the first step, like
psql.exe -U postgres -f "db_create_2.sql" test01
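For illustration, a minimal sketch of the split, reusing the statements from the original script (the file names are simply the ones used in the two commands above):
-- db_create_1.sql (run against the postgres database): only the DROP/CREATE DATABASE part
DROP DATABASE IF EXISTS test01 WITH (FORCE);
CREATE DATABASE test01 WITH OWNER = postgres ENCODING = 'UTF8';
-- db_create_2.sql (run against test01): the two CREATE TABLE statements
-- and the two INSERT statements, exactly as they appear in the original script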
I just set up a new foreign table and it works as intended if I just select the "ID" (integer) field.
When I add the "Description"(text) field and try to select the table, it fails with this error message:
'utf-8' codec can't decode byte 0xfc in position 10: invalid start byte
After checking the remote table, I found that "Description" contains special characters like: "ö, ü, ä"
What can I do to fix this?
Table definitions (only the first 2 columns):
Remote table:
CREATE TABLE test (
[Id] [char](8) NOT NULL,
[Description] [nvarchar](50) NOT NULL
)
Foreign table:
Create Foreign Table "Test" (
"Id" Char(8),
"Description" VarChar(50)
) Server "Remote" Options (
schema_name 'dbo', table_name 'test'
);
Additional information:
Foreign data wrapper: tds_fdw
Local server: Postgres 12, encoding: UTF8
Remote server: Sql Server, encoding: Latin1_General_CI_AS
As Laurenz Albe suggested in the comments, I created a freetds.conf in my PostgreSQL folder with the following content:
[global]
tds version = auto
client charset = UTF-8
Don't forget to set the path to the configuration file in the environment variable FREETDS.
PowerShell:
[System.Environment]::SetEnvironmentVariable('FREETDS','C:\Program Files\PostgreSQL\12',[System.EnvironmentVariableTarget]::Machine)
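Once the environment variable is in effect (a restart of the PostgreSQL service may be needed; that part is an assumption on my side), a quick check against the foreign table defined above:
-- should now return the ö, ü, ä characters without the codec error
SELECT "Id", "Description" FROM "Test" LIMIT 10;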
I want to import a CSV file into a database table, but it is not working.
I am running the bash shell in a Linux environment.
CREATE TABLE test.targetDB (
no int4 NOT NULL GENERATED ALWAYS AS IDENTITY,
year varchar(15) NOT NULL,
name bpchar(12) NOT NULL,
city varchar(15) NOT NULL,
ts_load timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (no)
)
test.csv file
"2019","112 ","1123",2019-07-26-05.33.43.000000
Linux Cmd
psql -d $database -c " COPY test.targetDB from 'test.csv' delimiter ',' csv "
Error
ERROR: invalid input syntax for integer: "2019"
CONTEXT: COPY targetDB, line 1, column no: "2019"
How can I resolve this issue?
You need to tell COPY that the no column is not part of the CSV file by specifying the columns that should be populated:
COPY test.targetDB(year, name, city, ts_load) from 'test.csv' delimiter ',' csv
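Plugged back into the psql call from the question, that becomes (a sketch, keeping the same $database variable and file path):
psql -d $database -c "COPY test.targetDB(year, name, city, ts_load) from 'test.csv' delimiter ',' csv"
With the column list in place, the no column is filled from its identity default instead of from the file.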
I would recommend using DataGrip, a PostgreSQL client tool. You can use an evaluation version if you don't wish to purchase it. It's pretty simple to import a file from the UI rather than using a command line.
I am following the examples in CREATE TABLE:
CREATE TABLE distributors (
did integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
name varchar(40) NOT NULL CHECK (name <> '')
);
However, it gives me ERROR: syntax error at or near "GENERATED". Why is that and how should I fix it?
\! psql -V returns psql (PostgreSQL) 10.5 (Ubuntu 10.5-1.pgdg14.04+1)
SELECT version(); returns PostgreSQL 9.4.19 on x86_64-pc-linux-gnu (Ubuntu 9.4.19-1.pgdg14.04+1), compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
Edits:
Thanks to @muistooshort, I checked the 9.4 docs. So I execute:
CREATE TABLE distributors (
did integer PRIMARY KEY DEFAULT nextval('serial'),
name varchar(40) NOT NULL CHECK (name <> '')
);
Nevertheless, it now gives me ERROR: relation "serial" does not exist...
The SQL-standard IDENTITY column syntax was added in PostgreSQL 10, but your server (which does all the real work) is 9.4. Before 10 you have to use the serial or bigserial types:
CREATE TABLE distributors (
did serial not null primary key,
name varchar(40) NOT NULL CHECK (name <> '')
);
The serial type will create a sequence to supply values, attach the sequence to the table, and hook up a default value for did to get values from the sequence.
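Roughly speaking, serial is shorthand for the following (a sketch of the expansion; the sequence name is the one PostgreSQL would generate by default):
CREATE SEQUENCE distributors_did_seq;
CREATE TABLE distributors (
    did integer NOT NULL DEFAULT nextval('distributors_did_seq') PRIMARY KEY,
    name varchar(40) NOT NULL CHECK (name <> '')
);
ALTER SEQUENCE distributors_did_seq OWNED BY distributors.did;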
I'm trying to build a table with csvsql.
When I use the command:
csvsql --db mysql://user:password@localhost:3306/database_name --table table_name file.csv
I get the error:
(in table 'blabla', column 'xyz'): VARCHAR requires a length on dialect mysql
I've then tried to build a database schema and force it with the --db-schema flag.
The db-schema format is:
CREATE TABLE table_name (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`x` varchar(29) DEFAULT NULL,
`y` int(10) NOT NULL DEFAULT '0',
`z` BOOL NOT NULL,
PRIMARY KEY (`id`),
KEY `indexed` (`indexed`)
);
but I still get the same error.
The complete command with db-schema is:
csvsql --db mysql://user:password@localhost:3306/database_name --table table_name --db-schema db_schema_filename csvfile.csv
I've read the manual for csvkit, but I don't get what I'm doing wrong.
This command should print the conversion result, right?
Can someone please help?
Thank you.
Well, I found the solution on GitHub:
https://github.com/wireservice/csvkit/issues/758#issue-201924611
After updating csvkit from GitHub, there are no more errors and the tables are created normally.
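For reference, one way to pull the fixed version straight from GitHub is with pip (assuming pip points at the same Python environment csvsql is installed in):
pip install --upgrade git+https://github.com/wireservice/csvkit.git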