How to transfer data from a CSV with multiple/single delimiters to a Postgres DB - postgresql

Hi, I have a dataset in CSV. I want to import it into a Postgres DB in a peculiar format.
The data in the CSV is in the following format:
1::comedy*drama*horror
2::suspense*thriller
Now I want to import this into a Postgres table with two columns, id and genre, where id is a foreign key, as:
id genre
1 comedy
1 drama
1 horror
2 suspense
2 thriller
Appreciate your help thanks!
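A common pattern (a minimal sketch; the staging and target table names and the file path are assumptions, not from the question) is to COPY the raw lines into a one-column staging table and then split them in SQL with split_part, string_to_array, and unnest:
-- 1) load the raw lines; the default text format puts each whole line
--    into the single column (the data contains no tabs or backslashes)
create table genre_staging (line text);
\copy genre_staging from '/path/to/input.csv'
-- 2) split on '::' for the id and on '*' for the genres, one row per genre
create table movie_genre (id integer, genre text);
insert into movie_genre (id, genre)
select split_part(line, '::', 1)::int,
       unnest(string_to_array(split_part(line, '::', 2), '*'))
from   genre_staging;
If id is meant to reference an existing table, the foreign key constraint can be added to movie_genre before or after the insert.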

Related

How to COPY CSV file into table resolving foreign key values into ids

I'm an expert at MSSQL but a beginner with PostgreSQL, so all my assumptions are likely wrong.
My boss is asking me to get a 300 MB CSV file into PostgreSQL 12 (a quarter million rows and 100+ columns). The file has usernames in 20 foreign key columns that would need to be looked up and converted to id int values before being inserted into a table. The COPY command doesn't seem to handle joining a CSV to other tables before inserting. Am I going in the wrong direction? I want to test locally, but ultimately I am only allowed to give the CSV to a DBA for importing into a Docker instance on a server. If only I could use pgAdmin and directly insert the rows!
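COPY itself can't join, but a two-step import can: COPY into a staging table that mirrors the file, then resolve the usernames with joins in an INSERT ... SELECT. A minimal sketch (all table, column, and file names here are assumptions; only two of the 20 username columns are shown):
create table import_staging (
    some_value    text,
    owner_name    text,   -- username to be resolved to an id
    reviewer_name text    -- another username column
    -- ... remaining columns from the CSV
);
\copy import_staging from '/path/to/file.csv' (format csv, header)
insert into target_table (some_value, owner_id, reviewer_id)
select s.some_value, o.id, r.id
from   import_staging s
left   join users o on o.username = s.owner_name
left   join users r on r.username = s.reviewer_name;
Left joins keep rows whose usernames find no match (the id becomes NULL); switch to inner joins if such rows should be dropped instead. Since everything runs through plain psql, the same script plus the CSV can be handed to the DBA.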

Import CSV in PGAdmin 3 with autoincrement ID

I want to import a CSV in pgAdmin III with the ID field auto-incrementing. I am not able to do it. The CSV contains an ID column with null values.
Need help.
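One workaround outside the pgAdmin III import dialog (a sketch; table, column, and file names are assumptions) is to declare the id as serial and load through a staging table, so the empty ID column in the file is simply ignored and the sequence generates the values:
create table items (
    id    serial primary key,
    name  text,
    price numeric
);
-- the staging table mirrors the file, including the empty ID column
create table items_staging (id text, name text, price numeric);
\copy items_staging from '/path/to/items.csv' (format csv, header)
insert into items (name, price)
select name, price from items_staging;   -- id comes from the sequence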

How do I map tables with n columns to a database?

We are currently using PostgreSQL and now have to save some tables in a database. The tables are never updated once created, but may be filtered.
The tables are dynamic in nature, as there may be n columns,
so a table would be:
|------|--------|--------|
| NAME | DATA 1 | DATA 2 |
|------|--------|--------|
another table would be:
|------|--------|--------|--------|--------|--------|
| NAME | DATA 1 | DATA 2 | DATA 3 | DATA 4 | DATA 5 |
|------|--------|--------|--------|--------|--------|
The data is not normalized because normalization hurts when dealing with n rows, as all rows are read at once.
These are the solutions I have come up with:
Save the table as JSON in a json column or as hstore pairs.
Save the table as CSV data in a text field.
What are the alternative methods to store the above data? Can NoSQL databases handle this data?
I see nothing in your question that would keep you from using plain tables with the corresponding number of data columns. That's the most efficient form of storage by far: smallest storage size, fastest queries.
Tables that are "never updated once created, but may be filtered" are hardly "dynamic". Unless you are withholding essential details, that's all there is.
And unless there can be more than several hundred columns. See:
What is the maximum number of columns in a PostgreSQL select query
(But you later commented a maximum of 12, which is no problem at all.)
From what you've described, it sounds like a job for jsonb. Assuming name is unique within a certain sub-table, I can imagine something like this:
create table test (
    tableId integer,
    name    text,
    data    jsonb,
    constraint pk primary key (tableId, name)
);
insert into test values (1, 'movie1', '{"rating": 10, "name": "test"}');
insert into test values (1, 'movie2', '{"rating": 9, "name": "test2"}');
insert into test values (2, 'book1', '{"rank": 100, "name": "test", "price": 10}');
insert into test values (2, 'book2', '{"rank": 10, "name": "test", "price": 12}');
Basically the idea is to use tableId to identify each sub-table and to store the rows of all sub-tables in this one DB table.
This opens up some possibilities:
create a separate table to store metadata about each sub-table. For example, the schema of the sub-tables could be stored there for application-layer validation.
partial index on large/hot sub-tables: create index test_1_movie_name on test ((data->>'name')) where tableid = 1
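A filter that repeats the index's WHERE condition can then use it, e.g.:
select *
from   test
where  tableid = 1
and    data->>'name' = 'test';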
Dynamic columns suggest that a schemaless store is what you should look for; MongoDB is a common choice. Are you storing the data as JSON? If so, Mongo makes manipulating, extracting, and reporting on the data easier.
If you are not familiar with NoSQL: from MSSQL 2016 onwards, JSON storage in a column is supported as varchar(MAX), and SQL Server provides functions to deal with JSON data. Even though the value is plain text by default, SQL Server supports indexing on computed columns, which handles lookups of elements inside the JSON. Any number of non-clustered indexes on computed columns is allowed, which eases indexing of JSON data.
SQL Server 2019 has more support for JSON.
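For illustration, a T-SQL sketch of that computed-column pattern (table and column names are assumptions):
-- SQL Server 2016+: expose a JSON element as a computed column and index it
create table docs (
    id       int identity primary key,
    data     nvarchar(max),                                        -- JSON text
    doc_name as cast(json_value(data, '$.name') as nvarchar(200))  -- computed column
);
create nonclustered index ix_docs_name on docs (doc_name);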

Postgresql CSV import not working [duplicate]

I use basketball data tables to get some understanding of Postgres 9.2 & phpPgAdmin. Therefore I would like to import CSV tables into that database. However, I get:
ERROR: missing data for column "year"
CONTEXT: COPY coaches, line 1: ""coachid";"year";"yr_order";"firstname";"lastname";"season_win";"season_loss";"playoff_win";"playoff..."
with command:
\copy coaches FROM '/Users/Desktop/Database/NBAPostGres/DataOriginal/coaches_data.csv' DELIMITER ',' CSV;
The current table has no missing values. So my questions are:
What did I do wrong, and what if I were using a table with missing values?
How do I import such a table, or handle such a structure generally (also with respect to missing values)?
Data structure:
coachid    year  yr_order  firstname  lastname  season_win
HAMBLFR01  204   2         Frank      Hamblen   10
RUSSEJO01  1946  1         John       Russell   22
I used:
varchar, integer, integer, character, character, integer
You can have columns missing for the whole table. Tell COPY (or the psql wrapper \copy) to fill only selected columns by appending a column list to the table name (as a psql meta-command, \copy has to stay on a single line):
\copy coaches (coachid, yr_order, firstname) FROM '/Users/.../coaches_data.csv' (FORMAT csv, HEADER, DELIMITER ',')
Missing values are filled in with column defaults. The manual:
If there are any columns in the table that are not in the column list,
COPY FROM will insert the default values for those columns.
But you cannot have values missing for just some rows. That's not possible. The text representation of NULL can be used (overruling respective column defaults).
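So for rows where only some values are absent, the file has to carry the NULL string itself; a sketch (the file path and the NA marker are assumptions):
-- cells containing the literal, unquoted string NA are read as NULL
\copy coaches from '/path/to/coaches_data.csv' (format csv, header, null 'NA')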
It's all in the manual, really:
SQL-COPY
psql \copy
ERROR: missing data for column "year" CONTEXT: COPY coaches, line 1:
""coachid";"year";"yr_order";"firstname";"lastname";"season_win";"season_loss";"playoff_win";"playoff..."
This type of error can also be the result of a table mismatch: the table you are importing the text file into has either more or fewer columns than the text file.
