One of the products I write software for is an accounting type application. It is written in C++, uses C++ Builder and VCL controls, connects to PostgreSQL database running on Linux.
The PostgreSQL database is currently at version 8.4.x. We use UTF8 encoding. Everything works pretty good.
We are running tests of our software against PostgreSQL v9.2.3 with exact same encoding and are finding a problem in which all our text editing inputs are replacing multiple lines with \r\n characters.
So for example, you enter 3 lines of text and hit enter key after each line then save it and read it back, I get one line with the line ending characters removed. When we fetch the data from the database, we wind up with one line like so: line1\r\nline2\r\nline3\r\n where "\r\n" is displayed instead of getting 0x0A, 0x0D in the stream.
Our application is not Unicode aware. Borland's AnsiString. (In the process of migrating this app. to C++ Builder XE). Does anyone know what might be causing this or offer some things to try to fix this in the current code base while the larger conversion is underway?
I've tried the Borland DBText and DBRichText controls and they both do the same thing.
The other point I should mention is we only tested against new PostgreSQL on the server and are still using a 8.x PostgreSQL client library (psql.lib). So the client and server version aren't exactly at the same level but I don't suspect this is an issue but any insight certainly welcome.
UPDATE:
Here are some command line results from the two versions of PostgreSQL.
Version 9.2.3
testdb=# select * from notes where oid=5146352;
docid | docno | username | created | followup | reminder | subject | comments
-------+----------+----------+-------------------------------+----------+----------+-----------+-----------------------------
3001 | 11579522 | eric | 2013-02-15 22:38:24.136517+00 | f | f | Test Note | line1\r\nline2\r\nline3\r\n
Version 8.4.8
testdb=# select * from notes where oid=16490575;
docid | docno | username | created | followup | reminder | subject | comments
-------+----------+----------+------------------------------+----------+----------+--------------+----------
3001 | 11579522 | eric | 2013-02-18 20:15:23.10943-05 | f | f | <> | line1\r
: line2\r
: line3\r
:
Not sure how to format this for SO, but in the 8.4.8 command line output, I have 3 new lines printed on the screen where as the 9.2.3 version concatenates the output.
The insert for both databases is the same client. So something changed in the way PostgreSQL handles new line characters and I'm wondering if there is a config setting to revert the old behavior or something I can do within my select statement to get the old behavior back.
8.4 has standard_conforming_strings set to off by default, and 9.2 has it on by default.
When it's off, in a literal string, '\n' means a newline as in the C language, whereas when it's on, it means a backslash character followed by the character n.
To go back to the 8.4 behavior, you may issue SET standard_conforming_strings=off inside your sessions
or
ALTER DATABASE yourdb SET standard_conforming_strings=off;
for it to persist and be the default for new connections to this database.
Long term it's recommended to adapt your code to deal with standard_conforming_strings to on since it's the way forward.
Your problem looks like something to do with postgres config variable standard_conforming_strings. Before Postgres 9.1, this was turned off by default. Thats why postgres did not treat backslashes literally but interpreted them. But According to SQL standard, backslashes should be treated literally. So, from postgres 9.1, this config variable has been turned on and you see your \r\n as literal instead of interpretations.
Although this i not the right approach, to make it work in your case, you need to edit your server's configuration file(postgresql.conf) and turn off this setting(standard_conforming_strings=on)
Related
Edit after asked to better specify my need:
TL;DR: How to show whitespace escaped characters (such as /r) in the Athena console when performing a query? So this: "abcdef/r" instead of this "abcdef ".
I have a dataset with a column that contains some strings of variable length, all of them with a trailing whitespace.
Now, since I had analyzed this data before, using python, I know that this whitespace is a \r; however, if in Athena I SELECT my_column, it obviously doesn't show the escaped whitespace.
Essentially, what I'm trying to achieve:
my_column | ..
----------+--------
abcdef\r | ..
ghijkl\r | ..
What I'm getting instead:
my_column | ..
----------+--------
abcdef | ..
ghijkl | ..
If you're asking why would I want that, it's just to avoid having to parse this data through python if I ever incur in this situation again, so that I can immediately know if there's any weird escaped characters in my strings.
Any help is much appreciated.
I am using the Org mode that comes with Emacs 24.3, and I am having an issue that when Org creates a table from the result of a code block it is replacing characters like '-' and '.' with 0 (integer zero). Then when I pass the table to another code block that's expecting a column of strings I get type errors etc.
I haven't been able to find anything useful, as it seems to be practically un-Googleable. Has anyone had the same problem? If I update to the latest version of org-mode, will that fix it?
EDIT:
I updated to Org 8.2 and this problem seems to have gone away. Now I have another (related) problem, where returning a table with a cell containing a string consisting of one double quote character ('"' in python) messes something up; Org added 2 extra columns to the table, one had something like
(quote (quote ) ())
in it. The reason my tables have things like this in them is that I'm working with part-of-speech tags from natural language data.
It's pretty obvious Org is doing some stuff to try to interpret the table contents, and not dealing well with meta characters. Technically I think these are bugs where Org should be dealing better with unexpected input.
EDIT 2:
Here is a minimal reproduction with Org 7.9.3f (system Python is 3.4):
#+TBLNAME: table
| DT | The |
| . | . |
| - | - |
#+BEGIN_SRC python :var table=table
return table
#+END_SRC
#+RESULTS:
| DT | The |
| 0 | 0 |
| 0 | 0 |
Incidentally, Org does not like the '"' character at all, in tables or in code blocks (I just get a "End of file during parsing" message when the above table has a cell with just '"' in it). It's probably just better to avoid it altogether, so I think my problem is solved. If nobody wants to add anything, I'll answer this myself in a day or so.
I have this text (taken from concatenated field row)
Astronomic Event 2013/1434H - Aceh ....
How do We search it by 2013 or 1434h keywords?
I have tried below code but it resulting no row.
to_tsvector result:
'2013/1434h':8,12 'aceh':1 'bin.....
Sample Case:
WITH sample_table as
(SELECT to_tsvector('Astronomic Event 2013/1434H - Aceh') sample_content)
SELECT *
FROM sample_table, to_tsquery('2013') q
WHERE sample_content ## q
How do We search it by 2013 or 1434h keywords?
It seems like you want to replace:
to_tsquery('1434h') q
with:
to_tsquery('1434h | 2013') q
http://www.postgresql.org/docs/current/static/functions-textsearch.html
Side note: the to_tsquery() syntax is extremely capricious. It doesn't allow for much if any fantasy, and many of the assumptions in Postgres are everything but end-user friendly.
More often than not, you'll be better off using plainto_tsquery(), which allows any amount of garbage to be thrown at it. Thus, consider pre-processing the string before issuing the query. For instance, you could split the string, and OR the original parts together:
where sc.text_index ## (plainto_tsquery('1434h') || plainto_tsquery('2013'))
Doing so will make your code a bit more complex, but it won't rely on your users needing to understand that (contrary to what they're accustomed to in Google) they should enter 'quick & brown & fox & jumps & lazy & dog' instead of plain 'The quick brown fox jumps over the lazy dog'.
Edit: I ended up actually trying your sample query, and it seems you're actually running into a parser issue:
# SELECT alias, description, token FROM ts_debug('Astronomic Event 2013/1434H - Aceh');
alias | description | token
-----------+-------------------+------------
asciiword | Word, all ASCII | Astronomic
blank | Space symbols |
asciiword | Word, all ASCII | Event
blank | Space symbols |
file | File or path name | 2013/1434H
blank | Space symbols |
blank | Space symbols | -
asciiword | Word, all ASCII | Aceh
(8 rows)
http://www.postgresql.org/docs/current/static/textsearch-parsers.html
It looks like you might need to write (or find) and configure an app-specific parser. Having never done this personally, the best I can do is to highlight that Postgres allows this and includes a sample:
http://www.postgresql.org/docs/current/static/test-parser.html
Alternatively, change your tsvector-related trigger so that it matches e.g. \d{4}/\d+[a-zA-Z] or whatever seems most appropriate, and adds spaces accordingly, before converting it to a tsvector. Something as simple as the following might do the trick if you never need to store file names:
SELECT alias, description, token
FROM ts_debug(replace('Astronomic Event 2013/1434H - Aceh', '/', ' / '));
I have been searching but so far I only found how to insert date into tables based on a csv files.
I have the following scenario:
Directory name = ticketID
Inside this directory I have a couple of files, like:
Description.txt
Summary.txt - Contains ticket header and has been imported succefully.
Progress_#.txt - this is everytime a ticket gets udpdated. I get a new file.
Solution.txt
Importing the Issue.txt was easy since this was actually a CSV.
Now my problem is with Description and Progress files.
I need to update the existing rows with the data from this files. Something on the line of
update table_ticket set table_ticket.description = Description.txt where ticket_number = directoryname
I'm using PostgreSQL and the COPY command is valid for new data and it would still fail due to the ',;/ special chars.
I wanted to do this using bash script, but it seem that it is it won't be possible:
for i in `find . -type d`
do
update table_ticket
set table_ticket.description = $i/Description.txt
where ticket_number = $i
done
Of course the above code would take into consideration connection to the database.
Anyone has a idea on how I could achieve this using shell script. Or would it be better to just make something in Java and read and update the record, although I would like to avoid this approach.
Thanks
Alex
Thanks for the answer, but I came across this:
psql -U dbuser -h dbhost db
\set content = `cat PATH/Description.txt`
update table_ticket set description = :'content' where ticketnr = TICKETNR;
Putting this into a simple script I created the following:
#!/bin/bash
for i in `find . -type d|grep ^./CS`
do
p=`echo $i|cut -b3-12 -`
echo $p
sed s/PATH/${p}/g cmd.sql > cmd.tmp.sql
ticketnr=`echo $p|cut -b5-10 -`
sed -i s/TICKETNR/${ticketnr}/g cmd.tmp.sql
cat cmd.tmp.sql
psql -U supportAdmin -h localhost supportdb -f cmd.tmp.sql
done
The downside is that it will create always a new connection, later I'll change to create a single file
But it does exactly what I was looking for, putting the contents inside a single column.
psql can't read the file in for you directly unless you intend to store it as a large object in which case you can use lo_import. See the psql command \lo_import.
Update: #AlexandreAlves points out that you can actually slurp file content in using
\set myvar = `cat somefile`
then reference it as a psql variable with :'myvar'. Handy.
While it's possible to read the file in using the shell and feed it to psql it's going to be awkward at best as the shell offers neither a native PostgreSQL database driver with parameterised query support nor any text escaping functions. You'd have to roll your own string escaping.
Even then, you need to know that the text encoding of the input file is valid for your client_encoding otherwise you'll insert garbage and/or get errors. It quickly lands up being easier to do it in a langage with proper integration with PostgreSQL like Python, Perl, Ruby or Java.
There is a way to do what you want in bash if you really must, though: use Pg's delimited dollar quoting with a randomized delimiter to help prevent SQL injection attacks. It's not perfect but it's pretty darn close. I'm writing an example now.
Given problematic file:
$ cat > difficult.txt <__END__
Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()
__END__
and sample table:
psql -c 'CREATE TABLE testfile(filecontent text not null);'
You can:
#!/bin/bash
filetoread=$1
sep=$(printf '%04x%04x\n' $RANDOM $RANDOM)
psql <<__END__
INSERT INTO testfile(filecontent) VALUES (
\$x${sep}\$$(cat ${filetoread})\$x${sep}\$
);
__END__
This could be a little hard to read and the random string generation is bash specific, though I'm sure there are probably portable approaches.
A random tag string consisting of alphanumeric characters (I used hex for convenience) is generated and stored in seq.
psql is then invoked with a here-document tag that isn't quoted. The lack of quoting is important, as <<'__END__' would tell bash not to interpret shell metacharacters within the string, wheras plain <<__END__ allows the shell to interpret them. We need the shell to interpret metacharacters as we need to substitute sep into the here document and also need to use $(...) (equivalent to backticks) to insert the file text. The x before each substitution of seq is there because here-document tags must be valid PostgreSQL identifiers so they must start with a letter not a number. There's an escaped dollar sign at the start and end of each tag because PostgreSQL dollar quotes are of the form $taghere$quoted text$taghere$.
So when the script is invoked as bash testscript.sh difficult.txt the here document lands up expanding into something like:
INSERT INTO testfile(filecontent) VALUES (
$x0a305c82$Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()$x0a305c82$
);
where the tags vary each time, making SQL injection exploits that rely on prematurely ending the quoting difficult.
I still advise you to use a real scripting language, but this shows that it is indeed possible.
The best thing to do is to create a temporary table, COPY those from the files in question, and then run your updates.
Your secondary option would be to create a function in a language like pl/perlu and do this in the stored procedure, but you will lose a lot of performance optimizations that you can do when you update from a temp table.
I need to get pages count from word documents. I've tested many libraries and scripts (apache poi, perl scripts, some application for linux and some more) and the only working solution was to install Microsoft Office with Wine and access OLE with perl. I've managed to do it but it seems I can't use it on server due to licensing problems...
The problem with apachepoi and other solutions providing access to word documents info is related to incompleteness of some docs. pageCount property in document summary is sometimes missing (it's often case with odt documents saved as doc and older docs).
Is there any way to actually count pages (not only get info from summary) without installing Microsoft Office on server?
I was going to say wvSummary, but I think this uses the metadata you're referring to. I'm not sure there is a way to get the page count without actually laying out the document. So you might have to resort to using APIs to drive a real Office-compatible application like OpenOffice or AbiWord.
If you trust the document summary, instead of using wvSummary, you can just open the file and do a Regex search for "nofpages(\d+)". Groups[1] will contain the number of pages.
Since Word always saves the summary when it saves, I think this is pretty safe if you know the document was last saved with Word, which in my experience is 99% of the time.
This is a version that also gets the page count from the document summary. I've added it late because MS Word has been through a number of updates since the question was asked.
The environment in which the following works is:
GNU bash, version 5.1.0(1)-release (x86_64-redhat-linux-gnu)
MS Word version 12 (2007) and version 16 (2016 - 2021)
It does not work for MS Word 9 (2000), and I assume earlier. I've also not tested the code on other shells.
DOCUMENT=<YourDocumentName>
PROPS=`unzip -c "$DOCUMENT" docProps/app.xml | tail -1`
NUMBER_OF_PAGES=`sed -e 's/.*\(<Pages>[0-9]*<\/Pages>\).*/\1/' <<< $PROPS | cut -d'>' -f2 | cut -d'<' -f1`
The PROPS variable is used so that you might get the number of lines or words without reading the entire file again.
NUMBER_OF_LINES=`sed -e 's/.*\(<Lines>[0-9]*<\/Lines>\).*/\1/' <<< $PROPS | cut -d'>' -f2 | cut -d'<' -f1`
NUMBER_OF_WORDS=`sed -e 's/.*\(<Words>[0-9]*<\/Words>\).*/\1/' <<< $PROPS | cut -d'>' -f2 | cut -d'<' -f1`
You can view the other properties with:
echo $PROPS