I need to insert 1M rows of data into a table in Postgres, so I'm using Postgres' COPY FROM CSV command. Since COPY from a file requires a superuser account, I'm using \copy instead.
Here's my Scala code:
val execCommand = Seq("psql", s"postgresql://$user:$pwd@$host:5432/$db", "-c", s"""\"\\copy $fullTableName (${columnsList}) from '${file.getAbsolutePath}' with delimiter ',' csv HEADER;\" """)
val result = execCommand.!!
println(result)
The command looks like this and works when run from my terminal:
psql postgresql://user:password@host:5432/db -c "\copy tableName (column1, column2, column3) from 'file_to_load.csv' with delimiter ',' csv HEADER;"
But when my code runs, it throws this error:
syntax error at or near ""\copy tableName (column1, column2, column3) with delimiter ',' csv HEADER;""
If I replace the command with a SELECT query, it works fine. Can someone help me identify the error in the \copy command? The syntax looks correct to me; maybe I'm missing something. I'm also new to Scala's process builder, so I don't know whether I need to fix the command, and if I do, how I should change it. Thanks.
There is probably no need to run a psql command. You're usually better off using the corresponding JDBC API:
https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html
Posting how I fixed it using CopyManager. For documentation on this, see Matthias Berndt's comment.
import java.io.{File, FileInputStream}
import java.sql.{Connection, DriverManager}

import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection

object PgDbHandler {
  // db, user, password and logger come from the surrounding application config.
  def getConnection(db: ConnectionName, userName: String = user, pwd: String = password): Connection = {
    Class.forName("org.postgresql.Driver")
    DriverManager.getConnection(s"jdbc:postgresql://${db.sqlDns}/${db.databaseName}?user=$userName&password=$pwd&sslmode=require")
  }

  def copyFileToPg(file: File, fullTableName: String, columnsList: List[String]): Long = {
    logger.info(s"Writing $file to postgres")
    val conn = getConnection(db, user, pwd)
    try {
      val rowsInserted = new CopyManager(conn.asInstanceOf[BaseConnection])
        .copyIn(s"COPY $fullTableName (${columnsList.mkString(",")}) FROM STDIN (DELIMITER ',', FORMAT csv, HEADER true)",
          new FileInputStream(file.getAbsolutePath))
      logger.info(s"$rowsInserted row(s) inserted for file $file")
      rowsInserted
    }
    finally
      conn.close()
  }
}
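For reference, a call could look like this (the file path and table name here are hypothetical):

PgDbHandler.copyFileToPg(new File("/tmp/data.csv"), "public.my_table", List("column1", "column2", "column3"))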
Related
I'm trying to load data into PostgreSQL from CSV. During this process, I came across a certain issue.
Here is my code:
stmt1 = conPost.createStatement();
ResultSet rs = stmt1.executeQuery(sqlQuery);
ResultSetMetaData rsmd = rs.getMetaData();
int columnCount = rsmd.getColumnCount();
for (int i = 1; i <= columnCount; i++) {
    // use equals() for string comparison; == compares object references
    if (rsmd.getColumnTypeName(i).equals("varchar") || rsmd.getColumnTypeName(i).equals("text") || rsmd.getColumnTypeName(i).equals("char")) {
        header.add(rsmd.getColumnName(i));
    }
}
if (header.isEmpty()) {
    copyQuery = "COPY " + this.tableName + "(" + columns + ") FROM STDIN WITH DELIMITER '" + this.delimit + "' CSV HEADER";
} else {
    strListString = header.toString();
    strListString = strListString.replaceAll("\\[|\\]", "");
    copyQuery = "COPY " + this.tableName + "(" + columns + ") FROM STDIN WITH DELIMITER '" + this.delimit + "' CSV HEADER FORCE NULL " + strListString;
}
Previously, I came across the issue that NULL values in the CSV for varchar and text columns appear as "". So I used the FORCE NULL option on varchar and text columns so that "" is inserted as NULL. That part is now working fine.
Now the new issue: if the text data in my CSV is like
"Hi, "vignesh". How do you do"
then after loading into Postgres it comes out as
Hi, vignesh. How do you do
That is, if the text data in the CSV contains double quotes, they are not inserted as double quotes in Postgres. Am I missing something? How can we overcome this? Thanks in advance.
I would say the problem is in your source string.
"Hi, "vignesh". How do you do"
contains " in the middle of the string, which makes it parse incorrectly.
I would alter the string to be something like
"Hi, ""vignesh"". How do you do"
I tested the case with a file composed of
1,"Hi, ""vignesh"". How do you do"
And a table made of
create table testtable (id int, text varchar);
When executing
copy testtable from 'data.csv' delimiter ',' CSV;
I end up with
id | text
----+------------------------------
1 | Hi, "vignesh". How do you do
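If the CSV is produced by your own code, the escaping can be applied when each field is written. A minimal sketch in Scala (the helper name is mine), doubling embedded quotes and wrapping the field, per the CSV convention:

// Escape a field for CSV: double any embedded quotes, then wrap in quotes.
def quoteCsvField(field: String): String =
  "\"" + field.replace("\"", "\"\"") + "\""

quoteCsvField("""Hi, "vignesh". How do you do""")
// => "Hi, ""vignesh"". How do you do"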
I am facing challenges with the code below, which loops Hive SQL queries through spark.sql.
def missing_pks(query: String) = {
//println(f"spark.sql( $query )")
spark.sql(query)
}
var hql_query_list_df=spark.sql("select distinct hql_qry from table where msr_nm='orders' and rgn_src='europe'")
var hql= hql_query_list_df.select('hql_qry).as[String].collect()
var hql_f=hql_query_list_df.map( "\"" + _ + "\"" )
hql_f.foreach(missing_pks)
Here I am reading Hive SQL statements from a table and loading them as a list, then trying to execute them; unfortunately it's not working, and I'm not sure what's missing in my code. The interesting part is that if the list is created manually within the Spark shell, the code works perfectly. It would be great if someone could help me here.
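A minimal sketch of one plausible fix, assuming the problem is that spark.sql can only be called on the driver (foreach over a Dataset runs on the executors) and that the extra quote-wrapping turns each query into a string literal:

import spark.implicits._ // for the 'hql_qry symbol syntax and .as[String]

// Collect the query strings to the driver, then loop over the local array.
val queries: Array[String] = spark
  .sql("select distinct hql_qry from table where msr_nm='orders' and rgn_src='europe'")
  .select('hql_qry).as[String]
  .collect()

// Run each query on the driver, passing the strings through unquoted.
queries.foreach(q => spark.sql(q))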
Can you please help me with the query below?
PHP code:
$countdate='2017-01-03';
$countsql='SELECT rucid,"databaseType","countLoggedOn","prodCount","nprodCount","countType" FROM "ru_countLog" WHERE "countLoggedOn"=$countdate';
It gives a syntax error:
syntax error at or near "$" LINE 1: ...untType" FROM "ru_countLog"
WHERE "countLoggedOn"=$countdate
Your single-quoted PHP string never interpolates $countdate, so the literal text $countdate reaches PostgreSQL. Use a double-quoted PHP string, escaping the identifier quotes (which the camelCase names require), and single-quote the date literal:
$countsql = "SELECT rucid, \"databaseType\", \"countLoggedOn\",
                    \"prodCount\", \"nprodCount\", \"countType\"
             FROM \"ru_countLog\"
             WHERE \"countLoggedOn\" = '$countdate'";
Note that this query is vulnerable to SQL injection. Consider parameterizing $countdate. With http://php.net/manual/en/function.pg-query-params.php, this becomes
$countsql = 'SELECT rucid, "databaseType", "countLoggedOn",
                    "prodCount", "nprodCount", "countType"
             FROM "ru_countLog"
             WHERE "countLoggedOn" = $1';
$result = pg_query_params($dbconn, $countsql, array($countdate));
where $dbconn is your database connection.
Maybe you should try it like this (note the single quotes the date literal needs):
$countdate = '2017-01-03';
$countsql = 'SELECT rucid,"databaseType","countLoggedOn","prodCount","nprodCount","countType" FROM "ru_countLog" WHERE "countLoggedOn"=\''.$countdate.'\'';
Hope this helps.
Hi, I need to create a table in Phoenix from a Spark job. I have tried the two ways below, but neither of them works; it seems this is still not supported.
1) DataFrame.write still requires that the table exists previously:
df.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", schemaName.toUpperCase + "." + tableName.toUpperCase ).option("zkUrl", hbaseQuorum).save()
2) If we connect to Phoenix through JDBC and try to execute the CREATE statement, we get a parsing error (the same CREATE works in Phoenix):
var ddlCode="create table test (mykey integer not null primary key, mycolumn varchar) "
val driver = "org.apache.phoenix.jdbc.PhoenixDriver"
val jdbcConnProps = new Properties()
jdbcConnProps.setProperty("driver", driver);
val jdbcConnString = "jdbc:phoenix:hostname:2181/hbase-unsecure"
sqlContext.read.jdbc(jdbcConnString, ddlCode, jdbcConnProps)
error:
org.apache.phoenix.exception.PhoenixParserException: ERROR 601 (42P00): Syntax error. Encountered "create" at line 1, column 15.
Has anyone with similar challenges managed to do it differently?
I have finally worked out a solution for this. Basically, I think I was wrong in trying to use the SQLContext read method for this; that method is designed just to read data sources. The way to work around it is basically to open a standard JDBC connection against Phoenix:
import java.sql.{Connection, DriverManager}

val ddlCode = "create table test (mykey integer not null primary key, mycolumn varchar)"
val driver = "org.apache.phoenix.jdbc.PhoenixDriver"
val jdbcConnString = "jdbc:phoenix:hostname:2181/hbase-unsecure"
val user = "USER"
val pass = "PASS"

Class.forName(driver)
val connection: Connection = DriverManager.getConnection(jdbcConnString, user, pass)
try {
  // DDL goes through a plain Statement; executeUpdate runs the CREATE TABLE
  val statement = connection.createStatement()
  statement.executeUpdate(ddlCode)
} finally {
  connection.close()
}
We are using Scala Play, and I am trying to ensure that all SQL queries use Anorm's string interpolation. It works with some queries, but many are not actually replacing the variables before the query executes.
import anorm.SQL
import anorm.SqlStringInterpolation
object SecureFile
{
val table = "secure_file"
val pk = "secure_file_idx"
...
// This method works exactly as I would hope
def insert(secureFile: SecureFile): Option[Long] = {
DBExec { implicit connection =>
SQL"""
INSERT INTO secure_file (
subscriber_idx,
mime_type,
file_size_bytes,
portal_msg_idx
) VALUES (
${secureFile.subscriberIdx},
${secureFile.mimeType},
${secureFile.fileSizeBytes},
${secureFile.portalMsgIdx}
)
""" executeInsert()
}
}
def delete(secureFileIdx: Long): Int = {
DBExec { implicit connection =>
// Prints correct values
println(s"table: ${table} pk: ${pk} secureFileIdx: ${secureFileIdx} ")
// Does not work
SQL"""
DELETE FROM $table WHERE ${pk} = ${secureFileIdx}
""".executeUpdate()
// Works, but unsafe
val query = s"DELETE FROM ${table} WHERE ${pk} = ${secureFileIdx}"
SQL(query).executeUpdate()
}
}
....
}
Over in the PostgreSQL logs, it's clear that the delete statement has not acquired the correct values:
2015-01-09 17:23:03 MST ERROR: syntax error at or near "$1" at character 23
2015-01-09 17:23:03 MST STATEMENT: DELETE FROM $1 WHERE $2 = $3
2015-01-09 17:23:03 MST LOG: execute S_1: ROLLBACK
I've tried many variations of execute, executeUpdate, and executeQuery with similar results. For the moment, we are using basic string replacement, but of course this is bad because it's not using PreparedStatements.
For anyone else sitting on this page scratching their head and wondering what they might be missing...
SQL("select * from mytable where id = $id")
is NOT the same as
SQL"select * from mytable where id = $id"
The former does not do String interpolation whereas the latter does.
This is easily overlooked in the aforementioned docs as all the samples provided just happen to have a (non-related) closing parenthesis on them (like this sentence does)
Anorm string interpolation was introduced to pass parameters (e.g. SQL"Select * From Test Where id = $x"), with interpolation arguments (e.g. $x) set on the underlying PreparedStatement with proper type conversion (see use cases on https://www.playframework.com/documentation/2.3.x/ScalaAnorm ).
The next Anorm release will also have the #$foo syntax to mix interpolation for parameters with standard string interpolation. This will allow you to write DELETE FROM #$table WHERE #${pk} = ${secureFileIdx} and have it executed as DELETE FROM foo WHERE bar = ? (if the table literal is "foo" and pk is "bar"), with secureFileIdx passed as a parameter. See the related pull request.
Until the next revision is released, you can build Anorm from its master sources to benefit from this change.
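With that syntax, the delete method from the question could look like this (a sketch, assuming the DBExec wrapper from the question and the #$ interpolation described above):

def delete(secureFileIdx: Long): Int = {
  DBExec { implicit connection =>
    // #$ splices table and pk in as raw SQL text;
    // $secureFileIdx remains a bound PreparedStatement parameter
    SQL"DELETE FROM #$table WHERE #$pk = $secureFileIdx".executeUpdate()
  }
}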