PostgreSQL, import a CSV file

I'm trying to copy a CSV file into an empty table. I already tried listing the columns to match the CSV, which failed with the exact same error.
COPY books
FROM '/path/to/file/books.csv' CSV HEADER;
error:
ERROR: extra data after last expected column
CONTEXT: COPY books, line 2: "1,Harry Potter and the Half-Blood Prince (Harry Potter #6),J.K. Rowling/Mary GrandPré,4.57,0439785..."
SQL state: 22P04
Also, I would like publication_date to be of type date so it can be queried. How can that be applied during copying?
a piece of the CSV file:
bookID | title                          | authors | average_rating | isbn  | isbn13 | num_pages | ratings_count | text_reviews_count | publication_date
-------+--------------------------------+---------+----------------+-------+--------+-----------+---------------+--------------------+-----------------
1      | harry potter (harry Potter #6) | author  | 4              | "num" | "num"  | 600       | 3243          | 534                | 01/01/2000
SELECT * FROM books
output:
bookID | title             | authors | average_rating | isbn | isbn13 | language_code     | num_pages | ratings_count | text_reviews_count | publication_date | publisher
-------+-------------------+---------+----------------+------+--------+-------------------+-----------+---------------+--------------------+------------------+-------------------
text   | character varying | text    | integer        | text | text   | character varying | integer   | bigint        | bigint             | date             | character varying

First of all, the number of columns in the CSV file and in the table don't match, but you can tell COPY exactly which table columns to fill:
COPY books (bookID, title, authors, average_rating, isbn, isbn13, num_pages, ratings_count, text_reviews_count, publication_date)
FROM '/path/to/file/books.csv' CSV HEADER DELIMITER ',';
You can also specify your delimiter explicitly, as above.
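As for getting publication_date in as a real date: COPY casts each CSV field to the target column's type, so it works as long as the text matches the server's DateStyle setting. A minimal sketch, assuming the CSV dates are month-first (MM/DD/YYYY):

SET datestyle = 'ISO, MDY';
COPY books (bookID, title, authors, average_rating, isbn, isbn13, num_pages, ratings_count, text_reviews_count, publication_date)
FROM '/path/to/file/books.csv' CSV HEADER;

If some rows hold unparseable dates, a common fallback is to COPY into a staging table that declares publication_date as text, then INSERT ... SELECT into books with an explicit cast.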

Related

Dump integer-only sql table to binary file?

I have a PostgreSQL table with a bunch of columns that are just different sizes of integer.
Table "public.place2022_final"
Column | Type | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
toff | integer | | |
palette_index | bigint | | |
censorship | boolean | | |
row0 | integer | | |
col0 | integer | | |
row1 | integer | | |
col1 | integer | | |
uint_id | bigint | | |
seqno | bigint | | |
I can export it to a CSV, but for my purposes I really want the data to be small. Is there a way I can create a minimal dump to a binary file, with a format something like
<8 bytes for # of rows in table><4 bytes for row 1 toff><8 bytes for row 1 palette_index>...<do that for all fields, then repeat for all rows>.
I also know for a fact that all these bigints can be squashed down to 32-bit ints... so doing that "conversion" here would be nice too.
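One way is to pack the rows yourself client-side. A sketch in Python with psycopg2 and struct (the DSN and output filename are placeholders, and it assumes none of the columns are NULL; the bigint columns are narrowed to 32-bit ints, per the question's own note that they fit):

import struct
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()

cur.execute("SELECT count(*) FROM place2022_final")
nrows = cur.fetchone()[0]

cur.execute("""
    SELECT toff, palette_index, censorship,
           row0, col0, row1, col1, uint_id, seqno
    FROM place2022_final
""")

with open("place2022_final.bin", "wb") as f:
    f.write(struct.pack("<q", nrows))  # 8 bytes: number of rows
    for toff, palette, censor, r0, c0, r1, c1, uid, seq in cur:
        # '?' packs the boolean into 1 byte; the bigints are packed
        # as 32-bit little-endian ints, assuming they all fit
        f.write(struct.pack("<ii?iiiiii",
                            toff, palette, censor, r0, c0, r1, c1, uid, seq))

PostgreSQL's own COPY place2022_final TO '/tmp/out.bin' WITH (FORMAT binary) is the closest server-side option, but that format carries a file header plus a per-row field count and a per-field length word, so a hand-rolled packer like the above ends up considerably smaller.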

Weird ghost records in PostgreSQL - what are they?

I have a very weird issue in our PostgreSQL DB. I have a table called "statement" which has some strange records in it.
Using the command-line console psql, I run select * from customer.statement where type in ('QUOTE'); and get 12 rows back. 7 rows look normal; 5 are missing all data except a single column, which is nullable but seems to hold real values entered by the user. psql tells me that 7 rows were returned, even though I can see 12. Most of the other columns are not nullable. The weird records look like this:
select * from customer.statement where type = 'QUOTE';
id | issuer_id | recipient_id | recipient_name | recipient_reference | source_statement_id | catalogue_id | reference | issue_date | due_date | description | total | currency | type | tax_level | rounding_mode | status | recall_requested | time_created | time_updated | time_paid
------------------+------------------+------------------+----------------+---------------------+---------------------+--------------+-----------+------------+------------+------------------------------------------------------------------+-----------+----------+-------+-----------+---------------+-----------+------------------+----------------------------+----------------------------+-----------
... 7 valid records removed ...
| | | | | | | | | | Build bulkheads and sheet with plasterboard. +| | | | | | | | | |
| | | | | | | | | | Patch all patches. +| | | | | | | | | |
| | | | | | | | | | Set and sand all joints ready for painting. +| | | | | | | | | |
| | | | | | | | | | Use wall angle on bulkhead in main bedroom. +| | | | | | | | | |
| | | | | | | | | | Build nib and sheet and set in entrance | | | | | | | | | |
(7 rows)
If I run the same query using pgAdmin, I don't see those weird records.
Anyone know what these are?
The plus sign before the separator (+|) indicates a newline character in the displayed string value in psql. So no additional rows, just the same row continued with line breaks. The final line of output in your quote confirms as much: (7 rows).
In pgAdmin you don't see the extra lines as long as you don't increase the height of the field (or copy/paste the content somewhere), but the multiple lines are there all the same.
Try in psql and in pgAdmin:
test=# SELECT E'This\nis\na\ntest.' AS multi_line, 'foo' AS single_line;
multi_line | single_line
--------------+-------------
This +| foo
is +|
a +|
test. |
(1 row)
The manual about psql:
linestyle
Sets the border line drawing style to one of ascii, old-ascii, or unicode. [...] The default setting is ascii. [...]
ascii style uses plain ASCII characters. Newlines in data are shown using a + symbol in the right-hand margin. [...]
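You can switch styles per session with the standard \pset meta-command:
test=# \pset linestyle unicode
Line style is unicode.
With the unicode style, newlines in data are marked with a carriage-return symbol in the right-hand margin rather than a +, but either way it is still a single row.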

Error in Insert query : syntax error at or near ","

My insert query is:
insert into app_library_reports
(app_id,adp_id,reportname,description,searchstr,command,templatename,usereporttemplate,reporttype,sentbothfiles,useprevioustime,usescheduler,cronstr,option,displaysettings,isanalyticsreport,report_columns,chart_config)
values
(25,18,"Report_Barracuda_SpamDomain_summary","Report On Domains Sending Spam Emails","tl_tag:Barracuda_spam AND action:2","BarracudaSpam/Report_Barracuda_SpamDomain_summary.py",,,,,,,,,,,,);
Schema for the table 'app_library_reports' is:
Table "public.app_library_reports"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+---------+------------------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('app_library_reports_id_seq'::regclass) | plain | |
app_id | integer | | plain | |
adp_id | integer | | plain | |
reportname | text | | extended | |
description | text | | extended | |
searchstr | text | | extended | |
command | text | | extended | |
templatename | text | | extended | |
usereporttemplate | boolean | | plain | |
reporttype | text | | extended | |
sentbothfiles | text | | extended | |
useprevioustime | text | | extended | |
usescheduler | text | | extended | |
cronstr | text | | extended | |
option | text | | extended | |
displaysettings | text | | extended | |
isanalyticsreport | boolean | | plain | |
report_columns | json | | extended | |
chart_config | json | | extended | |
Indexes:
"app_library_reports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"app_library_reports_adp_id_fkey" FOREIGN KEY (adp_id) REFERENCES app_library_adapter(id)
"app_library_reports_app_id_fkey" FOREIGN KEY (app_id) REFERENCES app_library_definition(id)
When I execute the insert query, it gives the error: ERROR: syntax error at or near ","
Please help me find this error. Thank you.
I'm fairly certain your immediate error is coming from the run of empty values (i.e. the ,,,,,,, at the end of the INSERT). If you don't want to specify a value for a particular column, you can pass NULL instead. But in your case, since you only specify values for the first 6 columns, another way is to name just those 6 columns when you insert:
INSERT INTO app_library_reports
(app_id, adp_id, reportname, description, searchstr, command)
VALUES
(25, 18, 'Report_Barracuda_SpamDomain_summary',
'Report On Domains Sending Spam Emails', 'tl_tag:Barracuda_spam AND action:2',
'BarracudaSpam/Report_Barracuda_SpamDomain_summary.py')
This insert would only work if the columns not specified accept NULL. If some of the other columns are not nullable, then you would have to pass in values for them.
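If you do want to keep the full column list, spell the missing values out as NULL. Note, too, that PostgreSQL string literals take single quotes; the double quotes in your original INSERT would be parsed as identifiers. A sketch of the full-column form (the NULLs are placeholders for whatever values you actually want; "option" is quoted only as a precaution, since OPTION is a keyword):

INSERT INTO app_library_reports
  (app_id, adp_id, reportname, description, searchstr, command,
   templatename, usereporttemplate, reporttype, sentbothfiles,
   useprevioustime, usescheduler, cronstr, "option", displaysettings,
   isanalyticsreport, report_columns, chart_config)
VALUES
  (25, 18, 'Report_Barracuda_SpamDomain_summary',
   'Report On Domains Sending Spam Emails',
   'tl_tag:Barracuda_spam AND action:2',
   'BarracudaSpam/Report_Barracuda_SpamDomain_summary.py',
   NULL, NULL, NULL, NULL, NULL, NULL,
   NULL, NULL, NULL, NULL, NULL, NULL);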

psycopg2 is mixing column names with values depending on the table selected

I query a whole postgres table using
c.execute("select * from train_temp")
trans=np.array(c.fetchall())
and amid the expected data I got one row with the column names.
trans[-1,]
Out[63]:
array(['ACTION', 'RESOURCE', 'MGR_ID', 'ROLE_ROLLUP_1', 'ROLE_ROLLUP_2',
'ROLE_DEPTNAME', 'ROLE_TITLE', 'ROLE_FAMILY_DESC', 'ROLE_FAMILY',
'ROLE_CODE', None, None, None, None, None, None, None, None, None], dtype=object)
More puzzling is the fact that the number of rows returned matches the number of rows in the table:
trans.shape
Out[67]: (32770, 19)
select count(1) from train_temp ;
count
-------
32770
(1 row)
Here's the schema of the table
Table "public.train_temp"
Column | Type | Modifiers | Storage | Description
---------------------+------------------+-----------+----------+-------------
action | text | | extended |
resource | text | | extended |
mgr_id | text | | extended |
role_rollup_1 | text | | extended |
role_rollup_2 | text | | extended |
role_deptname | text | | extended |
role_title | text | | extended |
role_family_desc | text | | extended |
role_family | text | | extended |
role_code | text | | extended |
av_role_code | double precision | | plain |
av_role_family | double precision | | plain |
av_role_family_desc | double precision | | plain |
av_role_title | double precision | | plain |
av_role_deptname | double precision | | plain |
av_role_rollup_2 | double precision | | plain |
av_role_rollup_1 | double precision | | plain |
av_mgr_id | double precision | | plain |
av_resource | double precision | | plain |
Has OIDs: no
What's going on here? Note it does not happen with all tables; for this last one, for instance, the process works fine:
Table "public.play"
Column | Type | Modifiers | Storage | Description
-----------+------------------+-----------+----------+-------------
row.names | text | | extended |
action | double precision | | plain |
color | text | | extended |
type | text | | extended |
Has OIDs: no
This last table comes back entirely as strings, while the problematic one respects the data types.
play[1,]
Out[73]:
array(['2', '0.0', 'blue', 'car'],
dtype='|S5')
trans[1,]
Out[74]:
array(['1', '0', '36', '117961', '118413', '119968', '118321', '117906',
'290919', '118322', 0.920412992041299, 0.942349726775956,
0.933439675174014, 0.920412992041299, 0.976, 0.964478764478764,
0.949222217031812, 0.909090909090909, 0.923076923076923], dtype=object)
Thanks for any insight.
Actually, I just wrote the headers myself when importing the *.csv into Postgres.
I should have used the header option in psql, such as:
\copy test from 'test.csv' with (delimiter ',' , format csv, header TRUE);
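The header row that was already imported can also be removed after the fact; a sketch, assuming no genuine row stores the literal header string 'ACTION' in the action column:

DELETE FROM train_temp WHERE action = 'ACTION';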

Using named fields to determine ranges with vsum in Emacs org-table-mode, impossible?

I have been trying to simplify a semi-complex table by adding named fields, without a problem until I got to the vsum operator. I had the formula set to $M=vsum($3..#-4), which works; however, I am continuously having to add and remove items from those fields, which changes the column numbering. This results in me having to change the field specifications of the vsum range after every update or change. I therefore tried naming the top and bottom fields, with the thought of supplying the named variables to vsum, giving me a table similar to the following:
| / | <> | <> |
|---+--------+---------|
| | Title1 | Title 2 |
|---+--------+---------|
| _ | | START |
| | name | 1000 |
| | name | 3456 |
| | name | 123 |
| ^ | | END |
|---+--------+---------|
| _ | | MT |
| # | Total | #ERROR |
| # | | |
|---+--------+---------|
#+TBLFM: $MT=vsum($START..$END)
This is the debug formula output from the above table:
Substitution history of formula
Orig: vsum($START..$END)
$xyz-> vsum((1000)..(123))
#r$c-> vsum((1000)..(123))
$1-> vsum((1000)..(123))
-----------^
Error: Expected `)'
I have tried enclosing the named field variables in parentheses, and several other ways, but have thus far not been able to get this to work. I am hoping I am just missing something and being blind, but perhaps this is not possible to do?
I have also tried the sum-up function, with no success. Thank you in advance for your assistance.
The following solution works by using #II and #III to refer to all entries between the second and third hline.
| / | <> | <> |
|---+--------+---------|
| | Title1 | Title 2 |
|---+--------+---------|
| | name | 1000 |
| | name | 3456 |
| | name | 123 |
|---+--------+---------|
| _ | | MT |
| # | Total | 4579 |
| # | | |
|---+--------+---------|
#+TBLFM: $MT=vsum(#II..#III)
Documentation: http://orgmode.org/manual/References.html#References