I have a list of synonyms in a CSV file in the format: word,meaning1,meaning2,meaning3....
Different words have different numbers of synonyms, which means rows are likely to have a variable number of columns. I am trying to import the CSV file into an SQLite database like so:
sqlite3 synonyms
sqlite> create table list(word text, meaning0 text, meaning1 text, meaning2 text, meaning3 text, meaning4 text, meaning5 text, meaning6 text, meaning7 text, meaning8 text, meaning9 text);
sqlite> .mode list
sqlite> .separator ,
sqlite> .import ./csv/synonyms.csv list
To be on the safe side, I assumed a maximum of 10 synonyms per word. For words with fewer than 10 synonyms, the remaining columns should be null. The error I get on executing the import command is:
Error: ./csv/synonyms.csv line 1: expected 11 columns of data but found 3
My question(s):
1. In case the number of columns is less than 10, how can I tell SQLite to substitute the missing values with null?
2. Is there some way of specifying that I want 10 columns after word, instead of typing them all out manually?
You can do the following:
Import all data into a single column;
Update the table, splitting the column contents into the other columns.
Sample:
-- Create a table with only one column;
CREATE TABLE table_name(first);
-- Choose a separator which doesn't exist within file
.separator ~
-- Import data
.import file.csv table_name
-- Add another column to split data
ALTER TABLE table_name ADD COLUMN second;
-- Split data between first and second column
UPDATE table_name SET first=SUBSTR(first, 1, INSTR(first, ",")-1), second=SUBSTR(first, INSTR(first, ",")+1) WHERE INSTR(first, ",")>0;
-- Repeat to next column
ALTER TABLE table_name ADD COLUMN third;
-- Split data between second and third column
UPDATE table_name SET second=SUBSTR(second, 1, INSTR(second, ",")-1), third=SUBSTR(second, INSTR(second, ",")+1) WHERE INSTR(second, ",")>0;
-- And so on...
ALTER TABLE table_name ADD COLUMN fourth;
UPDATE table_name SET third=SUBSTR(third, 1, INSTR(third, ",")-1), fourth=SUBSTR(third, INSTR(third, ",")+1) WHERE INSTR(third, ",")>0;
-- Many times as needed...
While not an optimal method, SQLite's performance should make it fast enough.
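One caveat worth adding: .import stores missing or empty fields as empty strings, not NULL. If you want real NULLs, as asked in the question, a cleanup pass afterwards works; a minimal sketch, assuming the list table from the question:
-- Normalize empty strings left over from the import to proper NULLs:
UPDATE list SET meaning0 = NULLIF(meaning0, '');
UPDATE list SET meaning1 = NULLIF(meaning1, '');
-- ...and so on through meaning9.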
Related: the maximum size of limited character types (e.g. varchar(n)) in Postgres is 10485760; see the description of the max length of postgresql's varchar.
Please download the file for testing and extract it into /tmp/2019q4; we only use pre.txt to import data.
sample data
Enter your psql and create a database:
postgres=# create database edgar;
postgres=# \c edgar;
Create the table according to the webpage:
fields in pre table definitions
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel varchar(512),
negating boolean
);
CREATE TABLE
Try to import data:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
We analyse the error info:
ERROR: value too long for type character varying(512)
CONTEXT: COPY pre, line 1005798, column plabel: "LIABILITIES AND STOCKHOLDERS EQUITY 0
0001493152-19-017173 2 11 BS 0 H LiabilitiesAndStockholdersEqu..."
Time: 1481.566 ms (00:01.482)
1. The size I set for the field is just 512, much less than 10485760.
2. The content in line 1005798 is not the same as in the error info:
0001654954-19-012748 6 20 EQ 0 H ReclassificationAdjustmentRelatingToAvailableforsaleSecuritiesNetOfTaxEffect 0001654954-19-012748 Reclassification adjustment relating to available-for-sale securities, net of tax effect" 0
Now I drop the previous table, change the plabel field to text, and re-create it:
edgar=# drop table pre;
DROP TABLE
Time: 22.763 ms
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel text,
negating boolean
);
CREATE TABLE
Time: 81.895 ms
Import the same data with same copy command:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
COPY 275079
Time: 2964.898 ms (00:02.965)
edgar=#
No error info in the psql console, so let me check the raw data '/tmp/2019q4/pre.txt', which contains 1043000 lines.
wc -l /tmp/2019q4/pre.txt
1043000 /tmp/2019q4/pre.txt
There are 1043000 lines, so how many rows were imported?
edgar=# select count(*) from pre;
count
--------
275079
(1 row)
Why was so little data imported, with no error info?
The sample data you provided is obviously not the data you are really loading. It does still show the same error, but of course the line numbers and markers are different.
That file occasionally has double quote marks where there should be single quote marks (apostrophes). Because you are using CSV mode, these stray double quotes will start multi-line strings, which span all the way until the next stray double quote mark. That is why you have fewer rows of data than lines of input, because some of the data values are giant multiline strings.
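You can confirm this after the import: any row that swallowed its neighbours will contain embedded newlines. A quick check along these lines (a sketch; it assumes the runaway strings landed in plabel, as the error above suggests):
select adsh, length(plabel)
from pre
where plabel like E'%\n%'
order by length(plabel) desc
limit 5;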
Since your data clearly isn't CSV, you probably shouldn't be using \copy in CSV format. It loads fine in text format as long as you specify "header", although that option didn't become available in text format until v15. For versions before that, you could manually remove the header line, or use PROGRAM to skip it, like FROM PROGRAM 'tail -n +2 /tmp/pre.txt'. Alternatively, you could keep using CSV format, but choose a different quote character, one that never shows up in your data, such as with (delimiter E'\t', format csv, header, quote E'\b').
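Putting that last suggestion together with the original command gives something like this (same column list as in the question):
\copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with (delimiter E'\t', format csv, header, quote E'\b')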
I have a CSV file whose lines have 12, 11, 10, or 5 columns.
After creating a PostgreSQL table with 12 columns, I want to copy this CSV into the table.
I use this request:
COPY absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
FROM 'C:\temp\absence\absence.csv'
DELIMITER '\'
CSV
My CSV file contains 80000 lines.
Example:
20\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\21/02/2020\1471\03\54\Matin
21\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\\8130\7H48\Formation avec repas\
30\05\ 191\MARKEY CLAUDIE\GA0\51430\39H00\\167H42\
22\9993\Temps de déplacement\98\37
When I execute the request, I get a message indicating that there is missing data for the lines with fewer than 12 fields.
Is there a trick?
copy is extremely fast and efficient, but less flexible because of that. Specifically, it can't cope with files that have a different number of "columns" on each line.
You can either use a different import tool, or if you want to stick to built-in tools, copy the file into staging table that only has a single column, then use Postgres string functions to split the lines into the columns:
create unlogged table absence_import
(
line text
);
\COPY absence_import(line) FROM 'C:\temp\absence\absence.csv' DELIMITER E'\b' CSV
E'\b' is the "backspace" character, which can't really appear in a text file, so no column splitting takes place.
Once you have imported the file, you can split the line using string_to_array() and then insert that into the real table:
insert into absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
select line[1], line[2], line[3], .....
from (
select string_to_array(line, '\') as line
from absence_import
) t;
If there are non-text columns, you might need to cast the values to the target data type explicitly, e.g. line[3]::int.
You can add additional expressions to deal with missing columns, e.g. something like: coalesce(line[10], 'default value')
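Combining both, the final insert might look something like this (a sketch over a hypothetical subset of the columns; adjust casts and defaults to your actual column types):
insert into absence(champ1, champ2, num_agent, heure_absence)
select line[1],
       line[2],
       line[3]::int,              -- explicit cast for a non-text column
       coalesce(line[10], '00')   -- default for the short 5-column lines
from (
  select string_to_array(line, '\') as line
  from absence_import
) t;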
I have a table with more than 30,000 entries and have to add a new column (zip_prefixes) containing the first digit of the zip code (zctea).
I created the column successfully:
alter table zeta add column zip_prefixes text;
Then I tried to put the values in the column with:
update zeta
set zip_prefixes = (
select substr(cast(zctea as text), 1, 1)
from zeta
)
Of course I got:
error more than one row returned by a subquery used as an expression
How can I get the first digit of the value from zctea into column zip_prefixes of the same row?
No need for sub-select:
update zeta
set zip_prefixes = substr(zctea, 1, 1);
update zeta
set zip_prefixes = substr(zctea, 1, 1);
There is no need for a select query or for casting.
Consider not adding a functionally dependent column. It's typically cleaner and cheaper overall to retrieve the first character on the fly. If you need a "table", I suggest adding a VIEW.
Why the need to cast(zctea as text)? A zip code should be text to begin with.
Name it zip_prefix, not zip_prefixes.
Use the simpler and cheaper left():
CREATE VIEW zeta_plus AS
SELECT *, left(zctea::text, 1) AS zip_prefix FROM zeta; -- or without cast?
If you need the additional column in the table and the first character is guaranteed to be an ASCII character, consider the data type "char" (with double quotes). 1 byte instead of 2 (on disk) or 5 (in RAM). Details:
What is the overhead for varchar(n)?
Any downsides of using data type "text" for storing strings?
And run both commands in one transaction if you need to minimize lock time and / or avoid a visible column with missing values in the meantime. Faster, too.
BEGIN;
ALTER TABLE zeta ADD COLUMN zip_prefix "char";
UPDATE zeta SET zip_prefix = left(zctea::text, 1);
COMMIT;
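Querying the view defined above is then transparent; a hypothetical usage example:
-- The prefix is computed on the fly; no stored column needed:
SELECT zctea, zip_prefix
FROM zeta_plus
WHERE zip_prefix = '9';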
I had a question surrounding the COPY command in PostgreSQL. I have a CSV file that I only want to copy some of the columns values into my PostgreSQL table.
Is it possible to do this? I am familiar with using the COPY command to copy all of the data from a CSV into a table using the header to map to the column names but how is this possible when I only want some of the columns?
Either pre-process the CSV file, or (what I probably would do) import into a temporary copy of the target table and INSERT only selected columns in a second step:
CREATE TEMP TABLE tmp AS SELECT * FROM target_table LIMIT 0;
ALTER TABLE tmp ADD COLUMN extra_column1 text
, ADD COLUMN extra_column2 text; -- add excess columns
COPY tmp FROM '/path/tp/file.csv';
INSERT INTO target_table (col1, col2, col3)
SELECT col1, col2, col3 FROM tmp -- only relevant columns
WHERE ... -- optional, to also filter rows
A temporary table is dropped automatically at the end of the session. If the processing takes longer, use a regular table.
Or, if the file is accessible to the server and contains no quoted fields with embedded delimiters, you can cut out the relevant columns before COPY even sees them:
COPY target_table FROM PROGRAM 'cut -f1,2,3 -d, /path/tp/file.csv';
I have many tables in an Sqlite3 db and now I want to export them to PostgreSQL, but I keep getting errors.
I've used different techniques to dump from sqlite:
.mode csv
.header on
.out ddd.sql
select * from my_table
and
.mode insert
.out ddd.sql
select * from my_table
And when I try to import it through phppgadmin I get errors like this:
ERROR: column "1" of relation "my_table" does not exist
LINE 1: INSERT INTO "public"."my_table" ("1", "1", "Vitas", "a#i.ua", "..
How to avoid this error?
Thanks in advance!
Rant
You get this "column ... does not exist" error with INSERT INTO "public"."my_table" ("1", ... - because the quotes around "1" make it an identifier, not a literal.
Even if you fix this, the query will still give an error, because of the missing VALUES keyword, as Jan noticed in the other answer.
The correct form would be:
INSERT INTO "public"."my_table" VALUES ('1', ...
If this SQL was autogenerated by sqlite, that's bad for sqlite.
This great chapter about SQL syntax is only about 20 pages in print. My advice to whoever generated this INSERT is: read it :-) It will pay off.
Real solution
Now, to the point... To transfer table from sqlite to postgres, you should use COPY because it's way faster than INSERT.
Use CSV format as it's understood on both sides.
In sqlite3:
create table tbl1(one varchar(20), two smallint);
insert into tbl1 values('hello',10);
insert into tbl1 values('with,comma', 20);
insert into tbl1 values('with "quotes"', 30);
insert into tbl1 values('with
enter', 40);
.mode csv
.header on
.out tbl1.csv
select * from tbl1;
In PostgreSQL (psql client):
create table tbl1(one varchar(20), two smallint);
\copy tbl1 from 'tbl1.csv' with csv header delimiter ','
select * from tbl1;
See http://wiki.postgresql.org/wiki/COPY.
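To spot-check that the tricky values (embedded comma, quotes, newline) survived the round trip, a query along these lines works on the Postgres side:
select * from tbl1
where one like '%,%' or one like '%"%' or one like E'%\n%';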
It seems the "VALUES" keyword is missing:
INSERT INTO "public"."my_table" VALUES (...)
But! You have to insert values with appropriate quotes: single quotes for text, and no quotes for numbers.
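Based on the values visible in the error message, the corrected statement would start out something like this (assuming the first two columns are numeric; the remaining columns are unknown, so this is only a sketch):
INSERT INTO "public"."my_table" VALUES (1, 1, 'Vitas', 'a#i.ua');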