Is there a way to upload 212-column CSV files in PostgreSQL?

I have a CSV file with 122 columns that I am trying to load into Postgres. I am trying this:
create table appl_train ();
\copy appl_train FROM '/path/ to /file' DELIMITER ',' CSV HEADER;
I get this error
ERROR: extra data after last expected column
CONTEXT: COPY application_train, line 2: "0,100001,Cash loans,F,N,Y,0,135000.0,568800.0,20560.5,450000.0,Unaccompanied,Working,Higher educatio..."

The error message means that your table has fewer columns than your CSV file.
If the DDL of your table is exactly what you posted, you created a table with no columns. You have to list (at least) every column name and data type when creating the table, as shown in the documentation:
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
{ column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
| table_constraint
| LIKE parent_table [ like_option ... ] }
[, ... ]
] )
[ INHERITS ( parent_table [, ... ] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace ]
In your code you should have something like this:
create table appl_train (
first_column_name integer,
second_column_name integer,
third_column_name character varying (20),
-- more columns here
);
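As a minimal sketch of what goes wrong (the file name and columns are made up for illustration): a table that declares fewer columns than the CSV contains reproduces the error, and declaring all of them fixes it.
-- suppose three_cols.csv contains a header line "a,b,c" and a data line "1,2,3"
CREATE TABLE two_cols (a integer, b integer);
\copy two_cols FROM 'three_cols.csv' DELIMITER ',' CSV HEADER
-- ERROR:  extra data after last expected column
CREATE TABLE three_cols (a integer, b integer, c integer);
\copy three_cols FROM 'three_cols.csv' DELIMITER ',' CSV HEADER
-- COPY 1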

Related

Trying to create a query to export data as CSV

I have a PostgreSQL table I wish to export as CSV on demand using a query, without superuser privileges.
I tried:
COPY myapp_currencyprice to STDOUT WITH (DELIMITER ',', FORMAT CSV, HEADER) \g /tmp/prices.csv
But I get a syntax error at "\g"
So I tried:
\copy myapp_currencyprice to '/tmp/prices.csv' with (DELIMITER ',', FORMAT CSV, HEADER)
But I also get a syntax error from "\copy".
You can do the following in psql.
SELECT 1 as one, 2 as two \g /tmp/1.csv
Then, in psql:
\! cat /tmp/1.csv
Or you can use COPY directly:
copy (SELECT 1 as one, 2 as two) to '/tmp/1.csv' with (format csv , delimiter '|');
But you can't combine STDOUT and a filename, because per the manual (https://www.postgresql.org/docs/current/sql-copy.html):
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
The vertical line | means you must choose one alternative (source: https://www.postgresql.org/docs/14/notation.html).
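Note also that \copy is a psql meta-command, so the entire command must fit on a single line; splitting it across lines is a common cause of syntax errors. For a non-superuser, a client-side export of a query looks like this (a sketch using the table name from the question):
\copy (SELECT * FROM myapp_currencyprice) to '/tmp/prices.csv' with (FORMAT csv, HEADER)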

SBT throwing JdbcSQLException syntax exception for a PostgreSQL query

I am running a SQL query using Slick to write to a PostgreSQL DB. Why am I getting a "Syntax error in SQL statement" error? Please assume all configurations are correct.
I have imported slick.jdbc.PostgresProfile.api._ in the client and slick.jdbc.H2Profile.api._ in the query builder. I have also separated the PostgreSQL and MySQL statements into different builders.
import bbc.rms.client.programmes.util.MySqlStringEscaper
import org.joda.time.DateTime
import slick.jdbc.H2Profile.api._

abstract class PopularBlurProgrammesQueryBuilder extends QueryBuilder with MySqlStringEscaper {
  def incrementBlurScoreQuery(pid: String, date: DateTime): DBIO[Int] = {
    sqlu"""
      INSERT INTO radio.core_entity_popularity (pid, score, date)
      VALUES ($pid, 1, ${flooredSQLDateTimeString(date)})
      ON CONFLICT ON CONSTRAINT core_entity_popularity_pkey
      DO UPDATE
      SET score = core_entity_popularity.score + 1
    """
  }
}
import slick.jdbc.PostgresProfile.api._

class SlickPopularBlurProgrammesClient[T](database: Database)(implicit executionContext: ExecutionContext)
  extends PopularBlurProgrammesQueryBuilder with PopularBlurProgrammesClient[T] {

  override def writeBlurIncrementedScore(pid: String, date: DateTime): Future[Int] = {
    database.run(incrementBlurScoreQuery(pid, date))
  }
}
Expected result is that the exception is not thrown and the integration tests pass. Integration test:
val currentDate = dateTimeFormat.parseDateTime("2018-12-19 16:00:00")
client.writeBlurIncrementedScore("pid", currentDate)
whenReady(client.writeBlurIncrementedScore("pid", currentDate)) { updatedRows =>
  updatedRows must be equalTo 1
}
stack trace:
org.h2.jdbc.JdbcSQLException: Syntax error in SQL statement "
INSERT INTO radio.core_entity_popularity (pid, score, date)
VALUES(?, 1, ?
) ON[*] CONFLICT ON CONSTRAINT core_entity_popularity_pkey
DO UPDATE
SET score = core_entity_popularity.score + 1
"; SQL statement:
INSERT INTO radio.core_entity_popularity (pid, score, date)
VALUES(?, 1, ?
) ON CONFLICT ON CONSTRAINT core_entity_popularity_pkey
DO UPDATE
SET score = core_entity_popularity.score + 1
[42000-193]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.message.DbException.getSyntaxError(DbException.java:191)
at org.h2.command.Parser.getSyntaxError(Parser.java:530)
at org.h2.command.Parser.prepareCommand(Parser.java:257)
at org.h2.engine.Session.prepareLocal(Session.java:561)
at org.h2.engine.Session.prepareCommand(Session.java:502)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1203)
at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:73)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:287)
at slick.jdbc.JdbcBackend$SessionDef$class.prepareStatement(JdbcBackend.scala:336)
at slick.jdbc.JdbcBackend$BaseSession.prepareStatement(JdbcBackend.scala:448)
at slick.jdbc.StatementInvoker.results(StatementInvoker.scala:32)
at slick.jdbc.StatementInvoker.iteratorTo(StatementInvoker.scala:21)
at slick.jdbc.Invoker$class.first(Invoker.scala:30)
at slick.jdbc.StatementInvoker.first(StatementInvoker.scala:15)
at slick.jdbc.StreamingInvokerAction$HeadAction.run(StreamingInvokerAction.scala:52)
at slick.jdbc.StreamingInvokerAction$HeadAction.run(StreamingInvokerAction.scala:51)
at slick.basic.BasicBackend$DatabaseDef$$anon$2.liftedTree1$1(BasicBackend.scala:275)
at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:275)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The problem you are facing is that you are submitting a PostgreSQL-specific query to an H2 database.
The INSERT syntax for PostgreSQL allows the ON CONFLICT clause:
[ WITH [ RECURSIVE ] with_query [, ...] ]
INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ]
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
[ ON CONFLICT [ conflict_target ] conflict_action ]
[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
where conflict_target can be one of:
( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]
ON CONSTRAINT constraint_name
and conflict_action is one of:
DO NOTHING
DO UPDATE SET { column_name = { expression | DEFAULT } |
( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) |
( column_name [, ...] ) = ( sub-SELECT )
} [, ...]
[ WHERE condition ]
from PostgreSQL docs
While the H2 INSERT syntax is
INSERT INTO tableName
{ [ ( columnName [,...] ) ]
{ VALUES
{ ( { DEFAULT | expression } [,...] ) } [,...] | [ DIRECT ] [ SORTED ] select } } |
{ SET { columnName = { DEFAULT | expression } } [,...] }
from H2 DB docs
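As an aside (hedged, since support varies by H2 version): H2 has a PostgreSQL compatibility mode that can be enabled in the JDBC URL, but releases around 1.4.193 (the version in the stack trace) still do not accept ON CONFLICT even in that mode:
jdbc:h2:mem:test;MODE=PostgreSQL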
The problem was that PostgreSQL is very strict about expressions and did not like the way the date was handled. It could not tell that the date value was a timestamp, so I had to explicitly call the PostgreSQL function to_timestamp(text, text) and also use $ interpolation for the variables in the query.
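For reference, a sketch of the kind of statement that ended up working against PostgreSQL (the pid value and the timestamp format string are assumptions for illustration):
INSERT INTO radio.core_entity_popularity (pid, score, date)
VALUES ('some_pid', 1, to_timestamp('2018-12-19 16:00:00', 'YYYY-MM-DD HH24:MI:SS'))
ON CONFLICT ON CONSTRAINT core_entity_popularity_pkey
DO UPDATE SET score = core_entity_popularity.score + 1;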

Import csv file beginning on specific line number

I want to import a csv file into a table beginning on line 9 of the csv file. How do I specify this condition in postgresql?
The first 8 lines contain irrelevant text describing the data below (the question included a screenshot of the file opened in Excel, not reproduced here).
And this is the table in my db I am trying to insert the data into:
CREATE TABLE trader.weather
(
station text NOT NULL,
"timestamp" timestamp with time zone NOT NULL,
temp numeric(6,2),
wind numeric(6,2)
)
It can't be done in PostgreSQL itself; you should do it with an external tool or process before the data reaches Postgres.
According to the manual, the only per-field handling COPY offers for a CSV is mostly QUOTE- or NULL-related:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ]
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] 'delimiter' ]
[ NULL [ AS ] 'null string' ]
[ CSV [ HEADER ]
[ QUOTE [ AS ] 'quote' ]
[ ESCAPE [ AS ] 'escape' ]
[ FORCE NOT NULL column_name [, ...] ] ] ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | STDOUT }
[ [ WITH ]
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] 'delimiter' ]
[ NULL [ AS ] 'null string' ]
[ CSV [ HEADER ]
[ QUOTE [ AS ] 'quote' ]
[ ESCAPE [ AS ] 'escape' ]
[ FORCE QUOTE { column_name [, ...] | * } ] ] ]
There are many ways to alter a CSV automatically before feeding it to PostgreSQL; you should look into those options.
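For example, if the server is Unix-like and you are on PostgreSQL 9.3 or newer with the privileges COPY FROM PROGRAM requires (superuser, or membership in pg_execute_server_program on PostgreSQL 11+), the trimming can be pushed into the COPY call itself (a sketch; the path is hypothetical and the CSV columns are assumed to match the table from the question):
COPY trader.weather FROM PROGRAM 'tail -n +9 /path/to/weather.csv' (FORMAT csv);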
It can be done with Postgres, just not with COPY directly.
Use a temporary staging table like this:
CREATE TEMP TABLE target_tmp AS
TABLE target_tbl LIMIT 0; -- create temp table with same columns as target table
COPY target_tmp FROM '/absolute/path/to/file' (FORMAT csv);
INSERT INTO target_tbl
TABLE target_tmp
OFFSET 8; -- start with line 9
DROP TABLE target_tmp; -- optional, else it's dropped at end of session automatically
The skipped rows must be valid, too.
Obviously, this is more expensive, which should not matter much with small to medium tables, but it does matter with big tables. In that case you really should trim the surplus rows from the input file before importing.
Make sure your temp_buffers setting is big enough to hold the temp table to minimize the performance penalty.
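For example (a sketch; 256MB is an arbitrary value, and the setting must be changed before the temp table is first used in the session):
SET temp_buffers = '256MB';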
Related (with instructions for \copy without superuser privileges):
How to update selected rows with values from a CSV file in Postgres?

How to retrieve the PostgreSQL sequence cache value from the catalog tables?

I have used the query below to get the information about sequence objects from the PostgreSQL catalog:
select s.sequence_name, s.start_value, s.minimum_value, s.maximum_value, s.increment, s.cycle_option
from information_schema.sequences s
where s.sequence_schema='schema1'
One more attribute I am not able to get is the "Cache" value.
I am using PostgreSQL 9.2.
Here is the DDL syntax for a sequence with cache:
ALTER SEQUENCE [ IF EXISTS ] name [ INCREMENT [ BY ] increment ]
[ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
[ START [ WITH ] start ]
[ RESTART [ [ WITH ] restart ] ]
[ CACHE cache ] [ [ NO ] CYCLE ]
[ OWNED BY { table_name.column_name | NONE } ]
Are there any Postgres functions to get this sequence cache value?
Thanks,
Ravi
With PostgreSQL 10 or newer, the cache size can be obtained from the system view pg_sequences or the system table pg_sequence:
SELECT cache_size FROM pg_catalog.pg_sequences
WHERE schemaname='public' and sequencename='s';
or alternatively
SELECT seqcache FROM pg_catalog.pg_sequence
WHERE seqrelid = 'public.s'::regclass;
Omit the schema qualification (public, or more generally the schema name) in the second query to resolve the sequence via search_path instead of a fixed schema.
With versions older than v10, you may query the sequence itself as if it was a table.
For example:
CREATE SEQUENCE s CACHE 10;
SELECT cache_value FROM s;
Result:
cache_value
-------------
10
Or
\x
SELECT * FROM s;
Result:
-[ RECORD 1 ]-+--------------------
sequence_name | s
last_value | 1
start_value | 1
increment_by | 1
max_value | 9223372036854775807
min_value | 1
cache_value | 10
log_cnt | 0
is_cycled | f
is_called | f
This no longer works in Postgres 10. You can use
select seqcache from pg_sequence where seqrelid = 's'::regclass;
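To list every sequence in a schema together with its cache size on Postgres 10 or newer, a sketch joining the catalog tables (the schema name is an assumption):
SELECT c.relname AS sequence_name, s.seqcache AS cache_size
FROM pg_catalog.pg_sequence s
JOIN pg_catalog.pg_class c ON c.oid = s.seqrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public';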

How to create session level table in PostgreSQL?

I am working on an application using Spring, Hibernate, and PostgreSQL 9.1. The requirement is that a user can upload bulk data from the browser.
The data uploaded by each user is very crude and requires a lot of validation before it can be put into the actual transaction table. I want a temporary table to be created whenever a user uploads; after the data is successfully dumped into this temp table, I will call a procedure to do the actual work of validating and moving the data from the temp table to the transaction table. If an error is encountered anywhere, I will write logs to another table so the user can see the status of their upload from the browser.
Does PostgreSQL have anything like a temporary, session-level table?
From the 9.1 manual:
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
{ column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
| table_constraint
| LIKE parent_table [ like_option ... ] }
[, ... ]
] )
[ INHERITS ( parent_table [, ... ] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace ]
The key word here is TEMPORARY, although it is not necessary for the table to be temporary. It could be a permanent table that you truncate before inserting. The whole operation (inserting and validating) would have to be wrapped in a transaction.
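A minimal sketch of that upload flow (table and column names are made up for illustration; use \copy from psql if you lack the server-side file access COPY needs):
BEGIN;

CREATE TEMP TABLE upload_stage (LIKE transaction_tbl INCLUDING DEFAULTS)
ON COMMIT DROP;                      -- exists only until this transaction ends

COPY upload_stage FROM '/path/to/upload.csv' (FORMAT csv, HEADER);

INSERT INTO transaction_tbl          -- move only the rows that pass validation
SELECT * FROM upload_stage
WHERE amount IS NOT NULL;            -- stand-in for the real validation rules

COMMIT;                              -- the temp table is dropped here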