I have a webpage made in Perl and Dojo using a PostgreSQL database. I have to search for available people in the database, and since I'm from Denmark the letters æ, ø and å have to work in the search. I thought this was standard when using UTF-8, and since I normally program in PHP over MySQL I didn't think it would be that hard.
I have probably tried every trick I know to convert this search_word to the right encoding so I can search the PostgreSQL database for names containing æ, ø and å, but it still fails.
My Perl code does the fetch, but it returns 0 rows, whereas the same command in the psql terminal returns 46 rows (I copy the STDERR statement from a "tail -f" on the log and paste it into another terminal connected to the database through psql). The Perl code is:
sub dbSearchPersons {
my $search_word = escapeSql($_[0]);
$search_word = Encode::decode_utf8($search_word);
$statement = "SELECT id,name,initials,email FROM person WHERE name ilike '\%".$search_word."\%' OR email ilike '\%".$search_word."\%' OR initials ilike '\%".$search_word."\%' ORDER BY name ASC";
$sth = $dbh->prepare($statement);
$num_rows = $sth->execute();
print STDERR "Statement: " . $statement;
if($num_rows > 0){
$persons = $dbh->selectall_hashref($statement,'id');
}
dbFinish($sth);
webdie($DBI::errstr) if($DBI::errstr);
}
As you can see, I write the SQL statement to STDERR, which outputs the following:
[Fri Apr 27 11:24:26 2012] [error] [client 10.254.0.1] Statement: SELECT id,name,initials,email FROM person WHERE name ilike '%Jørgen%' OR email ilike '%Jørgen%' OR initials ilike '%Jørgen%' ORDER BY name ASC, referer: https://xx.xxx.xxx.xx/cgi-bin/users.cgi
The SQL is correctly written (as I can see from the terminal output above), and if I copy the statement from the terminal and paste it directly into the psql terminal, I get 46 rows returned, as I should. But the Perl code still won't return any rows.
I don't get it. When the string is formatted to display "ø" and not "Ã¸" (which is what Perl turns the UTF-8 encoding into, from the "J%C3%B8rgen" that gets sent through dojo.xhr.post), should I not be able to use it in a SQL statement? Is it because the PostgreSQL database can have a certain encoding that I have to take into account somehow? Or could it be something completely different?
I hope someone can help me. I have been struggling with this problem for two days now, and since everything looks as it should but doesn't work, I am getting a little sad :/
Regards,
Thor Astrup Pedersen
You probably forgot to set pg_enable_utf8. With it enabled, the database interface will return Perl character data to you.
$ createdb -e -E UTF-8 -l en_US.UTF-8 -T template0 so10349280
CREATE DATABASE so10349280 ENCODING 'UTF-8' TEMPLATE template0 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
$ echo 'create table person (id int, name varchar, initials varchar, email varchar)'|psql so10349280
CREATE TABLE
$ echo "insert into person (id, name) values (1, 'Jørgensen')"|psql so10349280
INSERT 0 1
$ echo 'select * from person'|psql so10349280
id | name | initials | email
----+-----------+----------+-------
1 | Jørgensen | |
$ perl -Mutf8 -Mstrictures -MDBI -MDevel::Peek -E'
my $dbh = DBI->connect(
"DBI:Pg:dbname=so10349280", $ENV{LOGNAME}, "", { RaiseError => 1, AutoCommit => 1, pg_enable_utf8 => 1}
);
my $r = $dbh->selectall_hashref("select * from person where name = ?", "id", undef, "Jørgensen");
Dump $r->{1}{name};
'
SV = PV(0x836e20) at 0xa58dc8
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0xa5a000 "J\303\270rgensen"\0 [UTF8 "J\x{f8}rgensen"]
CUR = 10
LEN = 16
You don't say so clearly, but I think you eventually intend to send the character data out as JSON for use with Dojo. For that you need to encode it into UTF-8 octets; the various JSON libraries take care of that automatically for you, so there is no need to invoke Encode functions manually.
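The difference between octets and characters that this answer relies on can be sketched outside Perl as well. The following is a minimal Python illustration, not the asker's code: it takes the percent-encoded value "J%C3%B8rgen" from the question, shows what the same octets look like when decoded as UTF-8 versus Latin-1, and encodes back to UTF-8 octets only at the JSON output step. The payload key "name" is an assumption made for the example.

```python
import json
from urllib.parse import unquote_to_bytes

raw = "J%C3%B8rgen"                      # percent-encoded form from the question
octets = unquote_to_bytes(raw)           # 7 raw bytes: J, 0xC3, 0xB8, r, g, e, n
text = octets.decode("utf-8")            # 6 characters, with a real "ø"
mojibake = octets.decode("latin-1")      # same bytes misread as Latin-1: "Ã¸"

# Work with `text` (characters) inside the program and for bound query values;
# encode to UTF-8 octets only when writing the JSON response.
payload = json.dumps({"name": text}, ensure_ascii=False).encode("utf-8")
```

If a search term shows up as "Ã¸" anywhere along the way, the octets were most likely interpreted as Latin-1 (or encoded twice) before reaching the database.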
I have a requirement to dump the contents of a definable selection of tables as CSV's for an initial load of systems that are not able to connect with PostgreSQL for various reasons.
I have written a script to do this which runs through a list of tables using psql with the -c flag to run psql's \COPY command to dump the corresponding table to a file like this:
COPY table_name TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|');
It works fine. But I am sure you have already spotted the problem: as the process takes ~57 minutes for ~60 tables, the likelihood of consistency between them is quite close to absolute zero.
I had a think about it and suspected I could make a few lightweight changes to pg_dump to do what I want, i.e., create multiple csv's from pg_dump whilst having a hope of integrity between the tables - and being able to specify parallel dumps too.
I have added a few flags to allow me to apply a file postfix (the date), set the format options and pass in a path for the relevant output file.
However my modified pg_dump was failing when writing to a file, like:
COPY table_name (pkey_id, field1, field2 ... fieldn) TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
Note: Within pg_dump, the column list is expanded
So I cast around for further information and found these COPY Tips.
It looks like writing to a file is a no-no over the network; however I am on the same machine (for now). I felt writing to /tmp would be OK as it is writable by anyone.
So I tried cheating with:
seingramp#seluonkeydb01:~$ ./tp_dump -a -t table_name -D /tmp/ -k "FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|'" -K "_$DATE_POSTFIX"
tp_dump: warning: there are circular foreign-key constraints on this table:
tp_dump: table_name
tp_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
tp_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
--
-- PostgreSQL database dump
--
-- Dumped from database version 12.3
-- Dumped by pg_dump version 14devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
--
-- Data for Name: material_master; Type: TABLE DATA; Schema: mm; Owner: postgres
--
COPY table_name (pkey_id, field1, field2 ... fieldn) FROM stdin;
tp_dump: error: query failed:
tp_dump: error: query was: COPY table_name (pkey_id, field1, field2 ... fieldn) TO PROGRAM 'gzip > /tmp/table_name_20200814.csv.gz' WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
I have neutered the data as it is customer specific.
I didn't find pg_dump's error message very helpful, do you have any ideas as to what I am doing wrong?
The changes really are quite small (excuse the code!) starting ~line 1900, ignoring the flags added around getopt().
/*
* Use COPY (SELECT ...) TO when dumping a foreign table's data, and when
* a filter condition was specified. For other cases a simple COPY
* suffices.
*/
if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
{
/* Note: this syntax is only supported in 8.2 and up */
appendPQExpBufferStr(q, "COPY (SELECT ");
/* klugery to get rid of parens in column list */
if (strlen(column_list) > 2)
{
appendPQExpBufferStr(q, column_list + 1);
q->data[q->len - 1] = ' ';
}
else
appendPQExpBufferStr(q, "* ");
if ( copy_from_spec )
{
if ( copy_from_postfix )
{
appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
fmtQualifiedDumpable(tbinfo),
tdinfo->filtercond ? tdinfo->filtercond : "",
copy_from_dest ? copy_from_dest : "",
fmtQualifiedDumpable(tbinfo),
copy_from_postfix,
copy_from_spec);
}
else
{
appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
fmtQualifiedDumpable(tbinfo),
tdinfo->filtercond ? tdinfo->filtercond : "",
copy_from_dest ? copy_from_dest : "",
fmtQualifiedDumpable(tbinfo),
copy_from_spec);
}
}
else
{
appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
fmtQualifiedDumpable(tbinfo),
tdinfo->filtercond ? tdinfo->filtercond : "");
}
}
else
{
if ( copy_from_spec )
{
if ( copy_from_postfix )
{
appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
fmtQualifiedDumpable(tbinfo),
column_list,
copy_from_dest ? copy_from_dest : "",
fmtQualifiedDumpable(tbinfo),
copy_from_postfix,
copy_from_spec);
}
else
{
appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
fmtQualifiedDumpable(tbinfo),
column_list,
copy_from_dest ? copy_from_dest : "",
fmtQualifiedDumpable(tbinfo),
copy_from_spec);
}
}
else
{
appendPQExpBuffer(q, "COPY %s %s TO stdout;",
fmtQualifiedDumpable(tbinfo),
column_list);
}
I tried a couple of other cheats too, like specifying a directory owned by postgres. I know it's a quick hack but I hope you can help, and thanks for looking.
This is a use case for pg_restore -f.
So:
-- Create custom format dump file
pg_dump -d some_db -U some_user -Fc -f dump.out
-- Move that file to where you need it
-- Dump data only from named table to a file from the dump file.
pg_restore -a -t table_1 -f table_1_data.sql dump.out
The pg_dump will create a consistent snapshot of the tables, so you have the database in a 'frozen' state in dump.out. Then you can use pg_restore to 'thaw out' those parts you need on your schedule. By using -a you will get the COPY you want.
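The consistency argument above comes down to reading every table from one snapshot instead of one table at a time. As a rough illustration only, here is a Python sketch that writes several tables to pipe-delimited CSV inside a single transaction. It uses the standard-library sqlite3 module as a stand-in so that it runs anywhere; against PostgreSQL the same idea would need a REPEATABLE READ transaction through a PostgreSQL driver, which is not shown. The helper name dump_tables and the sample row are assumptions.

```python
import csv
import io
import sqlite3

def dump_tables(con, tables):
    """Return {table: csv_text}, reading every table inside one transaction.

    Hypothetical helper; table names are interpolated, so they must be trusted.
    """
    out = {}
    con.execute("BEGIN")                       # one transaction for all reads
    try:
        for table in tables:
            cur = con.execute(f'SELECT * FROM "{table}"')
            buf = io.StringIO()
            writer = csv.writer(buf, delimiter="|", quotechar='"',
                                lineterminator="\n")
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)
            out[table] = buf.getvalue()
    finally:
        con.execute("COMMIT")
    return out

# Stand-in database; isolation_level=None leaves transaction control to us.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE table_1 (pkey_id INTEGER, field1 TEXT)")
con.execute("INSERT INTO table_1 VALUES (1, 'a|b')")
dumps = dump_tables(con, ["table_1"])
```

The pg_dump plus pg_restore route in the answer gets the same guarantee without custom code, which is why it is usually the simpler choice.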
I'm a novice perl programmer trying to use DBI to write a buffer of text that contains an email with umlauts and other non-ASCII characters to a joomla database and having a problem.
DBD::mysql::st execute failed: Incorrect string value: '\xD6sterl...' for column `lsv5webstage`.`xuxgc_content`.`fulltext` at row 1 at /home/alerts/scripts_linstage/AdvisoryTest.pm line 373.
I'm not familiar enough with how encoding works to fully understand what the problem is. This is a fedora29 system with mariadb-10.3.12 and joomla-3.9.
Apparently the '\xD6' is an O with an umlaut in "Sebastian Österlund". I read something about utf8 not being able to handle 4-byte characters, but I don't fully understand.
I found the following reference online which talks about changing the encoding type from utf8 to utf8mb4, but the tables all appear to already be using that encoding:
> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR
Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
I'm not sure it's helpful, but this is the insert statement I'm using in my perl code:
my $sql = <<EOF;
INSERT INTO xuxgc_content (title, alias, introtext, `fulltext`, state, catid, created, created_by, created_by_alias, modified, modified_by, checked_out, checked_out_time, publish_up, publish_down, images, urls, attribs, version, ordering, metakey, metadesc, metadata, access, hits, language)
VALUES ($title, "$title_alias", $introText, $fullText, $state, $catid, $created, $created_by, $created_by_alias, $modified, $modified_by, $checked_out, $checked_out_time, $publish_up, $publish_down, $images, $urls, $attribs, $version, $ordering, $metakey, $metadesc, $metadata, $access, $hits, $language);
EOF
my $sth = $dbh->prepare($sql);
$sth->execute();
db_disconnect($dbh);
The $fullText variable is populated from a buffer that contains the body of the email. I'm running it through quote() before performing the INSERT.
$fullText = $dbh->quote($fullText);
I also tried using "SET NAMES utf8mb4;INSERT INTO Mytable ...;" and it just didn't like the format.
Here's the full function that's used to connect to the database:
sub db_connect () {
my %DB = (
'host' => 'myhost',
'db' => 'mydb',
'user' => 'myuser',
'pass' => 'mypass',
);
return DBI->connect("DBI:mysql:database=$DB{'db'};host=$DB{'host'}", $DB{'user'}, $DB{'pass'}, { mysql_enable_utf8mb4 => 1 });
}
I don't recall having this problem in the past, and this script has been in use for quite a while.
D6 is hex for Ö in CHARACTER SET latin1 (and several others).
You have declared that your client uses UTF-8 (utf8mb4) encoding, and a lone D6 byte is not valid UTF-8, so the server rejected it.
Please provide SELECT HEX(col), col ... to see if the D6 got into the database (hence an insert problem) or something else (possibly a fetch/display problem).
Also, you have not quoted your $fulltext string, so you are likely to get all sorts of syntax errors.
Please don't blindly put strings into INSERT statements, but escape them as you put them in.
There may be some useful Perl hint in this:
use utf8;
use open ':std', ':encoding(UTF-8)';
my $dbh = DBI->connect("dbi:mysql:".$dsn, $user, $password, {
PrintError => 0,
RaiseError => 1,
mysql_enable_utf8 => 1, # Switch to UTF-8 for communication and decode.
});
# or {mysql_enable_utf8mb4 => 1} if using utf8mb4
And look for techniques for binding/quoting/escaping.
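To make the D6 observation concrete, here is a small Python sketch (Python rather than Perl, purely for illustration) of the same bytes seen under both encodings. The byte string is taken from the error message in the question; everything else is plain standard-library behaviour.

```python
latin1_bytes = b"\xd6sterlund"               # bytes reported in the error message
name = latin1_bytes.decode("latin-1")        # interpreted as Latin-1: "Österlund"
utf8_bytes = name.encode("utf-8")            # what a utf8mb4 connection expects

# The same bytes are not valid UTF-8: 0xD6 announces a two-byte sequence,
# but the next byte ("s") is not a continuation byte.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

So the email body is most likely arriving as Latin-1 (or a similar single-byte encoding) and needs to be decoded from that encoding before it is handed to a connection that has been told to speak utf8mb4.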
1.sqlite3
import sqlite3
con=sqlite3.connect("g:\\mytest1.db")
cur=con.cursor()
cur.execute('create table test (上市 TEXT)')
con.commit()
cur.close()
con.close()
I successfully created the table test in mytest1.db, with the Chinese name "上市" as a field.
2.in mysql command console.
C:\Users\root>mysql -uroot -p
Welcome to the MySQL monitor. Commands end with ; or \g.
mysql> create database mytest2;
Query OK, 1 row affected (0.00 sec)
mysql> use mytest2;
Database changed
mysql> set names "gb2312";
Query OK, 0 rows affected (0.00 sec)
mysql> create table stock(上市 TEXT) ;
Query OK, 0 rows affected (0.07 sec)
Conclusion: Chinese characters can be used in the MySQL console.
3.pymysql
code31
import pymysql
con = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='******')
cur=con.cursor()
cur.execute("create database if not exists mytest31")
cur.execute("use mytest31")
cur.execute('set names "gb2312" ')
cur.execute('create table stock(上市 TEXT) ')
con.commit()
code32
import pymysql
con = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='******')
cur=con.cursor()
cur.execute("create database if not exists mytest32")
cur.execute("use mytest32")
cur.execute('set names "gb2312" ')
cur.execute('create table stock(上市 TEXT) ')
con.commit()
The same problem occurs
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 21-22: ordinal not in range(256)
4.mysql-python-connect
code 41
import mysql.connector
config={'host':'127.0.0.1',
'user':'root',
'password':'123456',
'port':3306 ,
'charset':'utf8'
}
con=mysql.connector.connect(**config)
cur=con.cursor()
cur.execute("create database if not exists mytest41")
cur.execute("use mytest41")
cur.execute('set names "gb2312" ')
str='create table stock(上市 TEXT)'
cur.execute(str)
code 42
import mysql.connector
config={'host':'127.0.0.1',
'user':'root',
'password':'******',
'port':3306 ,
'charset':'utf8'
}
con=mysql.connector.connect(**config)
cur=con.cursor()
cur.execute("create database if not exists mytest42")
cur.execute("use mytest42")
cur.execute('set names "gb2312" ')
str='create table stock(上市.encode("utf-8") TEXT)'
cur.execute(str)
The same errors occur as with pymysql.
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 22-23: ordinal not in range(256)
It is surely a bug in the Python MySQL modules that Chinese characters cannot be used as a field name.
1. Chinese characters can be used as a field name in the Python sqlite3 module.
2. Chinese characters can be used as a field name in the MySQL console only if you run 'set names "gb2312"'.
pymysql.connect() accepts a charset argument. I have tested charset="utf8" and charset="gb2312" and both works (Python 3, PyMySQL 0.6.2). You don't need to use a "SET NAMES" query in this case.
import pymysql
con = pymysql.connect(host='127.0.0.1', port=3306,
user='root', passwd='******',
charset="utf8")
cur = con.cursor()
cur.execute("create database if not exists mytest31")
cur.execute("use mytest31")
cur.execute("create table stock(上市 TEXT)")
con.commit()
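The traceback in the question suggests that, without a charset argument, the statement text is being encoded as Latin-1 before it is sent. That step can be reproduced without a database. This is a minimal sketch of the encoding step only, under that assumption; can_encode is a hypothetical helper.

```python
ddl = "create table stock(上市 TEXT)"

def can_encode(text, codec):
    """Report whether `text` can be represented in `codec` (hypothetical helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Latin-1 has no code points for the two Chinese characters; UTF-8 and GB2312 do.
results = {codec: can_encode(ddl, codec)
           for codec in ("latin-1", "utf-8", "gb2312")}
```

This is consistent with both charset="utf8" and charset="gb2312" working in the answer above: either one gives the driver an encoding that can represent the column name.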
You're encoding when you should decode. To convert a Chinese character to a unicode character, use:
"上市".decode("GB18030")
Which is an encoding generally used for Chinese chars. latin-1 will not work as most Chinese characters are not within its scope. The GB18030 encoding should work, but if not, there are a host of other encodings you can try, like gbk or big5_hkscs (generally for encodings done within HK/China).
Unicode errors are easy to spot, they show up as u'\ufffd' (which when encoded will be a diamond with a question mark in the middle).
I hope this was helpful!
Edit: I'm somewhat confused by your comment.
>>> print type("上市")
<type 'str'>
>>> print type("上市".decode("GB18030"))
<type 'unicode'>
str.decode() returns unicode.
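The session above is Python 2, where a quoted literal is a byte string. For readers on Python 3, where str is already Unicode and has no decode() method, a rough equivalent looks like the sketch below; the variable names are only illustrative.

```python
text = "上市"                                   # Python 3 str is already Unicode
gb_bytes = text.encode("gb18030")               # bytes as a GB18030 source would supply them
round_trip = gb_bytes.decode("gb18030")         # bytes.decode() returns str

# An undecodable byte becomes U+FFFD (the replacement character) when
# errors="replace" is requested, which is the marker mentioned above.
damaged = b"\xff".decode("utf-8", errors="replace")
```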
I am writing a Perl script that is using the DBI module and is connecting to a Sybase DB. I am calling a stored procedure (one that I don't have access to so I cannot post sample code) and when I get data back I get an error that reads "error_handler: Data-conversion resulted in overflow". I still get data back and after doing some intensive research it seems that some data types in the columns (such as BigInt, nvarchar, etc) are the culprits. Now the question is, how can I fix this? Can this be fixed on the client side or can it only be fixed on the server side?
my $dbh = DBI->connect("DBI:Sybase:server=$server", $username, $password, {PrintError => 0}) or die;
$dbh->do("use $database") or die;
my $sql = &getQuery;
my $sth = $dbh->prepare($sql) or die;
$sth->execute() or die;
while ($rowRef = $sth->fetchrow_arrayref) #Error seems to occur here
{
#Parse through each row
}
Part of the FreeTDS 0.82 log that explains the problem:
_ct_bind_data(): column 7 is type 38 and has length 8
_ct_get_server_type(0)
_ct_get_client_type(type 38, user 0, size 8)
cs_convert(0x18dfed40, 0x7fff73216050, 0x18e44250, 0x7fff73215fa0, 0x18e387c0, 0x18e45a64)
_ct_get_server_type(30)
_ct_get_server_type(0)
converting type 127 (8 bytes) to type = 47 (9 bytes)
cs_convert() calling tds_convert
cs_convert() tds_convert returned 10
cs_prretcode(0)
cs_convert() returning CS_FAIL
cs_convert-result = 1
The problem is on the FreeTDS side. I've had the same problem before and successfully fixed it by converting the returned fields to varchar in the select statement.
Given you don't have access to modify the original query, you can do some regex search and replace on the returned $sql variable in your code. In particular, if the original query has a part that looks like
SELECT field1, field2, field3 FROM ...
After you retrieve the query statement, you may run
my $new_sql;
if ($sql =~ /SELECT\s+(.*)\s+FROM/i) { # match selected field string
    my $field_str = $1;
    my @fields = split ",", $field_str; # parse individual fields
    s/\s//g for @fields; # get rid of spaces
    my $new_str = join ", ", (map { "convert(varchar, $_)" } @fields); # construct new query string
    my $quoted_field_str = quotemeta($field_str); # prepare regex replacement string
    $new_sql = $sql;
    $new_sql =~ s/$quoted_field_str/$new_str/i; # actual replacement
}
print $new_sql;
Of course, if your original statement is more complex, you should print it out and check how to modify it with a generic replacement bearing the same spirit. Alternatively, you can ask your DBA (or whoever has access to the stored procedure) to modify the actual query directly.
Hope this helps.
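The same rewrite can be sketched in Python for anyone who wants to try the idea outside the Perl script. The function name wrap_select_fields is hypothetical, and like the Perl version it splits the field list naively on commas, so it only suits simple statements of the form shown above.

```python
import re

def wrap_select_fields(sql):
    """Wrap each selected field in convert(varchar, ...) (hypothetical helper).

    Naive: splits the field list on commas, so expressions that contain
    commas themselves (function calls, CASE, ...) are not handled.
    """
    match = re.search(r"SELECT\s+(.*?)\s+FROM", sql, flags=re.I | re.S)
    if not match:
        return sql                      # nothing recognisable to rewrite
    fields = [f.strip() for f in match.group(1).split(",")]
    wrapped = ", ".join(f"convert(varchar, {f})" for f in fields)
    # Splice the new field list back in at the matched position.
    return sql[:match.start(1)] + wrapped + sql[match.end(1):]

new_sql = wrap_select_fields("SELECT field1, field2, field3 FROM some_table")
```

Splicing by match position avoids the second regex substitution in the Perl version, so the field string never has to be escaped.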
Could someone tell me if there is a function in Perl's DBI module that works the same as PHP's mysql_real_escape_string()?
You should use placeholders and bind values.
Don't. Escape. SQL.
Don't. Quote. SQL.
Use SQL placeholders/parameters (?). The structure of the SQL statement and the data values represented by the placeholders are sent to the database completely separately, so (barring a bug in the database engine or the DBD module) there is absolutely no way that the data values can be interpreted as SQL commands.
my $name = "Robert'); DROP TABLE Students; --";
my $sth = $dbh->prepare('SELECT id, age FROM Students WHERE name = ?');
$sth->execute($name); # Finds Little Bobby Tables without harming the db
As a side benefit, using placeholders is also more efficient if you re-use your SQL statement (it only needs to be prepared once) and no less efficient if you don't (if you don't call prepare explicitly, it still gets called implicitly before the query is executed).
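The same behaviour is easy to check in a throwaway database. Here is a minimal sketch using Python's standard sqlite3 module, which also uses ? placeholders; the table layout and the age value are assumptions made up for the demonstration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

name = "Robert'); DROP TABLE Students; --"

# The value travels separately from the SQL text, so it is stored and
# matched as plain data, quotes and all.
con.execute("INSERT INTO Students (name, age) VALUES (?, ?)", (name, 8))
rows = con.execute("SELECT id, age FROM Students WHERE name = ?", (name,)).fetchall()

# The table is still there afterwards.
table_count = con.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'Students'"
).fetchone()[0]
```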
Like quote?
I would also recommend reading the documentation for DBD::MySQL if you are worried about utf8.
From http://www.stonehenge.com/merlyn/UnixReview/col58.html :
use SQL::Abstract;
...
my $sqa = SQL::Abstract->new;
my ($owner, $account_type) = @_; # from inputs
my ($sql, @bind) = $sqa->select('account_data', # table
    [qw(account_id balance)], # fields
    {
        account_owner => $owner,
        account_type => $account_type
    }, # "where"
);
my $sth = $dbh->prepare_cached($sql); # reuse SQL if we can
$sth->execute(@bind); # execute it for this query
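The idea behind this kind of query builder is that it returns two things, the SQL text with placeholders and the list of values to bind, and never mixes them. A toy version of that idea in Python might look like the sketch below. build_select is a hypothetical helper, not SQL::Abstract's actual behaviour or output format, and the owner and account type values are made up.

```python
def build_select(table, fields, where):
    """Return (sql, bind): SQL with one ? per condition, plus the values.

    Hypothetical helper; table and column names are interpolated and must be
    trusted, only the values are passed as bind parameters.
    """
    keys = sorted(where)                        # stable column order
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    if keys:
        sql += " WHERE " + " AND ".join(f"{key} = ?" for key in keys)
    return sql, [where[key] for key in keys]

sql, bind = build_select(
    "account_data",
    ["account_id", "balance"],
    {"account_owner": "merlyn", "account_type": "savings"},
)
```

The resulting pair can be handed straight to a prepare/execute call, exactly as the Perl snippet does with $sql and @bind.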
Database Handle Method "quote"
my $dbh = DBI->connect( ... );
$sql = sprintf "SELECT foo FROM bar WHERE baz = %s",
$dbh->quote("Don't");
http://metacpan.org/pod/DBI#quote