Problem with the deprecation of the postgresql XML2 module 'xml_is_well_formed' function - postgresql

We need to make extensive use of the 'xml_is_well_formed' function provided by the XML2 module.
Yet the documentation says that the xml2 module will be deprecated since "XML syntax checking and XPath queries"
is covered by the XML-related functionality based on the SQL/XML standard in the core server from PostgreSQL 8.3 onwards.
However, the core function XMLPARSE does not provide equivalent functionality since when it detects an invalid XML document,
it throws an error rather than returning a truth value (which is what we need and currently have with the 'xml_is_well_formed' function).
For example:
select xml_is_well_formed('<br></br2>');
xml_is_well_formed
--------------------
f
(1 row)
select XMLPARSE( DOCUMENT '<br></br2>' );
ERROR: invalid XML document
DETAIL: Entity: line 1: parser error : expected '>'
<br></br2>
^
Entity: line 1: parser error : Extra content at the end of the document
<br></br2>
^
Is there some way to use the new, core XML functionality to simply return a truth value
in the way that we need?.
Thanks,
-- Mike Berrow

After asking about this on the pgsql-hackers e-mail list, I am happy to report that the guys there agreed that it was still needed and they have now moved this function to the core.
See:
http://web.archiveorange.com/archive/v/alpsnGpFlZa76Oz8DjLs
and
http://postgresql.1045698.n5.nabble.com/review-xml-is-well-formed-td2258322.html

Related

How to use Postgresql ts_delete function

I am trying to use Postgresql Full Text Search. I read that the stop words (words ignored for indexing) are implemented via dictionary. But I would like to give the user a limited control over the stop words (insert new ones), so I grouped then in a table.
From the example below:
select strip(to_tsvector('simple', texto)) from longtxts where id = 23;
I can get the vector:
{'alta' 'aluno' 'cada' 'do' 'em' 'leia' 'livro' 'pedir' 'que' 'trecho' 'um' 'voz'}
And now I would like to remove the elements from the stopwords table:
select array(select palavra_proibida from stopwords);
That returns the array:
{a,as,ao,aos,com,default,e,eu,o,os,da,das,de,do,dos,em,lhe,na,nao,nas,no,nos,ou,por,para,pra,que,sem,se,um,uma}
Then, following documentation:
ts_delete(vector tsvector, lexemes text[]) tsvector remove any occurrence of lexemes in lexemes from vector ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat'])
I tried a lot. For example:
select ts_delete((select strip(to_tsvector('simple', texto)) from longtxts where id = 23), array[(select palavra_proibida from stopwords)]);
But I always receive the error:
ERROR: function ts_delete(tsvector, character varying[]) does not exist
LINE 1: select ts_delete((select strip(to_tsvector('simple', texto))...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Could anyone help me? Thanks in advance!
ts_delete was introduced in PostgreSQL 9.6. Based on the error message, you're using an earlier version. You may try select version(); to be sure.
When you land on the PostgreSQL online documentation with a web search, it may correspond to any version. The version is in the URL and there's a "This page in another version" set of links at the top of each page to help switching to the equivalent doc for a different version.

How to insert similar value into multiple locations of a psycopg2 query statement using dict? [duplicate]

I have a Python script that runs a pgSQL file through SQLAlchemy's connection.execute function. Here's the block of code in Python:
results = pg_conn.execute(sql_cmd, beg_date = datetime.date(2015,4,1), end_date = datetime.date(2015,4,30))
And here's one of the areas where the variable gets inputted in my SQL:
WHERE
( dv.date >= %(beg_date)s AND
dv.date <= %(end_date)s)
When I run this, I get a cryptic python error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) argument formats can't be mixed
…followed by a huge dump of the offending SQL query. I've run this exact code with the same variable convention before. Why isn't it working this time?
I encountered a similar issue as Nikhil. I have a query with LIKE clauses which worked until I modified it to include a bind variable, at which point I received the following error:
DatabaseError: Execution failed on sql '...': argument formats can't be mixed
The solution is not to give up on the LIKE clause. That would be pretty crazy if psycopg2 simply didn't permit LIKE clauses. Rather, we can escape the literal % with %%. For example, the following query:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%';
would need to be modified to:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%%';
More details in the pscopg2 docs: http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
As it turned out, I had used a SQL LIKE operator in the new SQL query, and the % operand was messing with Python's escaping capability. For instance:
dv.device LIKE 'iPhone%' or
dv.device LIKE '%Phone'
Another answer offered a way to un-escape and re-escape, which I felt would add unnecessary complexity to otherwise simple code. Instead, I used pgSQL's ability to handle regex to modify the SQL query itself. This changed the above portion of the query to:
dv.device ~ E'iPhone.*' or
dv.device ~ E'.*Phone$'
So for others: you may need to change your LIKE operators to regex '~' to get it to work. Just remember that it'll be WAY slower for large queries. (More info here.)
For me it's turn out I have % in sql comment
/* Any future change in the testing size will not require
a change here... even if we do a 100% test
*/
This works fine:
/* Any future change in the testing size will not require
a change here... even if we do a 100pct test
*/

Syntax error on DB2 XMLELEMENT

I get this error when trying out this command in the BIRT Classic Models sample database in Data Studio
select xmlelement(name "custno", customers.customernumber) from customers
Syntax error: Encountered "\"custno\"" at line 1, column 24.
I do not know how to correct it.
Thanks.
I'm not familiar with db2, but according to this your statement looks quite alrigth (although I'd place an alias to name this field...)
But this
Syntax error: Encountered "\"custno\"" at line 1, column 24.
seems to be a quite clear hint, that your error is connected to the NAME of the element.
I'm pretty sure, that this statement was created on string level.
Did you try to escape the "-characters with \"?
The SQL reaching the engine might look like
select xmlelement(name \"custno\", customers.customernumber) from customers
or
select xmlelement(name "\"custno"\", customers.customernumber) from customers
... which is wrong of course...
But to be honest: just guessing...

How to get the sql state from libpq?

I program with libpq.so. I want to get the error code which is called sql state in SQL Standard.How should I get this in my c code?
The obvious Google search for libpq get sqlstate finds the libpq-exec documentation. Searching that for SQLSTATE finds PG_DIAG_SQLSTATE in the PQresultErrorField section.
Thus, you can see that you can call PQresultErrorField(thePgResult, PG_DIAG_SQLSTATE) to get the SQLSTATE.
This is just addition I can not leave in form of comment due to reputation reasons.
Note that you can not portably get SQLSTATE for errors that can occur during PQconnectdb. In theory, you can read pg_conn (internal struct) field last_sqlstate which contains correct value.
For example, if you try to connect with invalid login/password, it will give you 28P01. For wrong database it will contain 3D000.
I wish they defined publically available getter for this field.
You can check this one as well:
libpq: How to get the error code after a failed PGconn connection

Special character handling when fetching data from MS SQL Server using Perl DBD

I have an MS SQL Server 2008 Database, from which I am fetching data using perl DBD::Sybase module. But there are some special characters in the DB, like the Copyright symbol, Trademark symbol etc., which are not getting imported properly. Perl seems to change all of these special characters to a Question mark character. Is there a way to fix this?
I have tried specifying charset=utf8 in the connection string. The doc mentions a syb_enable_utf8 (bool) setting, but whenever I try that, I get an error:
Can't locate object method "syb_enable_utf8" via package "DBI::db"
One solution I found was this:
use Encode qw(encode_utf8);
Then, wherever you are writing data to a file or anywhere else, use Encode::encode_utf8($data);
where $data is the column/value which you have fetched from MSSQL.
I don't use DBD::Sybase but a) I use a lot of other DBDs and b) I am currently collecting information about unicode support in DBDs. According to the pod you need at least OpenClient 15.x when using syb_enable_utf8. Are you using 15.x or later? Perhaps syb_enable_utf8 is not defined if your client is less than 15.x or perhaps you have too old a version of DBD::Sybase. Unfortunately I cannot see from the Changes file when syb_enable_utf8 was added.
However, when you say "can't locate method" I think that is a clue as syb_enable_utf8 is not a method, it is an attribute (it is under Sybase Specific Attributes) in the pod. So you need to add it to your connect call or set it via a connection handle like this:
my $h = DBI->connect("dbi:Sybase:something","user","password", {syb_enable_utf8 => 1});
or
$h->{syb_enable_utf8} = 1;
You should also read the bits in the pod describing what happens when syb_enable_utf8 is set as it appears from the documents it only applies to UNIVARCHAR, UNICHAR, and UNITEXT columns.
Lastly, you need to ensure you insert the data correctly in the first place. I'd guess if it is not inserted from Perl with syb_enable_utf8 and charset=utf8 and your data is not proper unicode characters in Perl before you insert you'll get garbage back.
The comment Raze2dust made had nothing to do with your issue but is worth heeding if you are going to write the data retrieved from your database elsewhere. Just remember to decode any data input to your script and encode any data output.