Easiest way to combine 6 .csv files into 1 table - postgresql

I need to merge 6 .csv files into one table (and then into 1 .csv). The table has only one column (email). I am very new at this...
Currently I am doing it like this:
CREATE TABLE tablename (
email char(200)
);
and then, one by one, I do this, and for some reason instead of a 40 MB file I get a 500 MB file.
COPY tablename(email) from 'E:\WORK\FXJohn1.csv' DELIMITER ',' CSV HEADER
and I do it 5 more times
COPY tablename(email) from 'E:\WORK\FXJohn2.csv' DELIMITER ',' CSV HEADER
COPY tablename(email) from 'E:\WORK\FXJohn3.csv' DELIMITER ',' CSV HEADER
COPY tablename(email) from 'E:\WORK\FXJohn4.csv' DELIMITER ',' CSV HEADER
COPY tablename(email) from 'E:\WORK\FXJohn5.csv' DELIMITER ',' CSV HEADER
COPY tablename(email) from 'E:\WORK\FXJohn6.csv' DELIMITER ',' CSV HEADER

Issuing the command several times is the proper way to load several files.
The size increase is caused by the use of char(200): it pads every value to 200 characters. You should use varchar(200), which stores shorter strings without padding them, or even text if you don't need to enforce a size limit. See the documentation on character types.
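A minimal sketch of the corrected workflow, using text and then writing the merged rows back out to one file (the output path E:\WORK\combined.csv is just an example name):
CREATE TABLE tablename (
email text  -- stores each value as-is, no padding
);
-- load each file the same way, FXJohn1.csv through FXJohn6.csv
COPY tablename(email) FROM 'E:\WORK\FXJohn1.csv' DELIMITER ',' CSV HEADER;
-- then export the combined table to a single CSV (example output path)
COPY (SELECT email FROM tablename) TO 'E:\WORK\combined.csv' CSV HEADER;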

Related

copy columns of a csv file into postgresql table

I have a CSV file whose lines have 12, 11, 10, or 5 columns.
After creating a PostgreSQL table with 12 columns, I want to copy this CSV into the table.
I use this request:
COPY absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
FROM 'C:\temp\absence\absence.csv'
DELIMITER '\'
CSV
My CSV file contains 80,000 lines.
Example:
20\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\21/02/2020\1471\03\54\Matin
21\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\\8130\7H48\Formation avec repas\
30\05\ 191\MARKEY CLAUDIE\GA0\51430\39H00\\167H42\
22\9993\Temps de déplacement\98\37
When I execute the request, I get a message indicating that there is missing data for the lines with fewer than 12 fields.
Is there a trick?
COPY is extremely fast and efficient, but less flexible because of that. Specifically, it can't cope with files that have a different number of "columns" on each line.
You can either use a different import tool or, if you want to stick to built-in tools, copy the file into a staging table that only has a single column, then use Postgres string functions to split the lines into the columns:
create unlogged table absence_import
(
line text
);
\COPY absence_import(line) FROM 'C:\temp\absence\absence.csv' DELIMITER E'\b' CSV
E'\b' is the "backspace" character, which can't really appear in a text file, so no column splitting takes place.
Once you have imported the file, you can split each line using string_to_array() and then insert the result into the real table:
insert into absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
select line[1], line[2], line[3], .....
from (
select string_to_array(line, '\') as line
from absence_import
) t;
If there are non-text columns, you might need to cast the values to the target data type explicitly, e.g. line[3]::int.
You can add additional expressions to deal with missing columns, e.g. something like coalesce(line[10], 'default value').
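Putting both together, the full statement might look like this; treating num_agent as an integer and defaulting periode_absence are assumptions about the target schema:
insert into absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
select line[1],
       line[2],
       line[3]::int,                       -- assumed integer column
       line[4], line[5], line[6], line[7],
       line[8], line[9], line[10], line[11],
       coalesce(line[12], 'default value') -- short lines yield NULL array elements
from (
select string_to_array(line, '\') as line
from absence_import
) t;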

Why does my postgresql csv export have more rows than the table?

I am trying to copy a table in a PostgreSQL database (version 10.12) via psql. One of the columns contains strings representing XML data. When I query the database for a row count with this query, I get a count of about 50,000:
select count(column) from table;
But when I try to export the data to a csv file the output has more than 1,000,000 rows! I don't understand how a csv export could have a different number of rows than the table!
This is the copy command:
\copy (select column from table) to 'directory/output.csv' with csv;
It doesn't seem to matter if I change the delimiter or the quote character either. I've tried using | as a delimiter and ` as a quote, and the number of rows in the CSV was the same. Why is the row count different in the CSV export?
The row count is not different: the CSV output simply has linefeeds (LF, ASCII code 10) embedded in fields, which is expected in XML.
If you want one line per row with COPY, don't use CSV; use the text format, that is, just omit with csv. Then newlines are encoded as \n instead of being output verbatim.
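For example, keeping the placeholder names from the question:
\copy (select column from table) to 'directory/output.txt'
Each embedded linefeed is then written as the two characters \n, so the file has exactly one line per table row.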

SQL Server OPENQUERY equivalent for PostgreSQL

Is there a query equivalent to SQL Server's OPENQUERY or OPENROWSET that can be used in PostgreSQL to query from Excel or CSV files?
You can use PostgreSQL's COPY.
As per the documentation:
COPY moves data between PostgreSQL tables and standard file-system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already). COPY TO can also copy the results of a SELECT query.
COPY works like this:
Importing a table from CSV
Assuming you already have a table in place with the right columns, the command is as follows:
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV;
Exporting a CSV from a table:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' CSV;
It's important to mention that if your data is in Unicode or needs strict encoding, you should always set client_encoding before running any of the above commands.
To set the client_encoding parameter in PostgreSQL:
set client_encoding to 'UTF8';
or
set client_encoding to 'latin1';
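For instance, a sketch of importing a Latin-1 encoded file into a UTF-8 database (file and table names as in the examples above):
set client_encoding to 'latin1';
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV;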
Another thing to guard against is nulls: when exporting, if some fields are null, PostgreSQL will write '\N' to represent a null field. This is fine, but may cause issues if you are trying to import that data into, say, SQL Server.
A quick fix is to modify the export command by specifying what you would prefer as the null placeholder in the exported CSV:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' NULL as E'';
Another common requirement is importing or exporting with a header row.
Import a CSV into a table, with the column headers present in the first row of the CSV file:
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV HEADER;
Export a table to CSV with headers present in the first row:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' CSV HEADER;

Data correction exporting CSV file to Postgres

I am importing a CSV file into Postgres and would like to know how to import the correct data type while using the COPY command. For instance, I have a column column_1 integer; and want to insert the value 6 into it from my CSV file.
I run the command copy "Table" from 'path/to/csv' DELIMITERS ',' CSV; and every time I try this I get the error ERROR: invalid input syntax for integer: "column_1". I figured out that it's because everything from the CSV file is being imported as a string or text. If I change the column type to text then it works, but that defeats the purpose, as I need the value as a number for various calculations. Is there a way to preserve the data type when transferring? Is there something I need to change in the CSV file? Or is there another data type to assign to column_1? Hope this makes sense. Thanks in advance!
I did this and it worked flawlessly:
I put the plain number in stack.csv
(stack.csv contains only the single value 6)
# create table stack(i int);
# \copy stack from 'stack.csv' with (format csv);
I read in your comment that you have 25 columns in your CSV file. You need to have at least 25 columns in your table, and all of them need to be mapped from the CSV. If your table has more than 25 columns, you need to map only the columns that come from the CSV.
That's why it works with a text column: all the data is put into a single cell per row.
If your table has more columns than there are fields in your CSV file, the format is like this:
\copy stack(column1, column2, ..., column25) from 'stack.csv' with (format csv);
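As an aside, the literal error value in the question, "column_1", looks like a column name, which suggests the first line of the CSV (a header row) is being read as data; if so, telling COPY to skip it may be all that's needed:
\copy stack from 'stack.csv' with (format csv, header);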

Creating a Postgres external table from a CSV file (comma separated) which has an email column with multiple email addresses separated by commas

I am trying to create a Postgres external table from a CSV file which has one of its columns as email address, and this column has multiple email addresses separated by commas. Since the delimiter for the file is a comma, when creating an external table it is not able to differentiate between the ',' within a column and the ',' between columns. The list of emails in the email column is enclosed in double quotes as well.
Is there a way to load it without changing the delimiter for the entire file?
It's standard practice with CSV to enclose such fields in double quotes.
COPY... CSV will import this directly.
Example:
CREATE TABLE email(id int, emails text);
COPY email FROM STDIN CSV;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1,"abc#example.com,def#example.com"
>> \.
Result:
select * from email;
id | emails
----+---------------------------------
1 | abc@example.com,def@example.com
(1 row)
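The same quoting works when loading from a file, and the stored list can be split later with string_to_array() (as in the staging-table answer above) if individual addresses are needed. A sketch assuming a file named emails.csv:
-- emails.csv contains: 1,"abc@example.com,def@example.com"
\copy email from 'emails.csv' with (format csv)
-- one row per individual address
select id, unnest(string_to_array(emails, ',')) as address from email;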