marklogic encoding xdmp:document-load

I have noticed that UTF-8 XML documents loaded (xdmp:document-get() + xdmp:document-insert()) into our development MarkLogic server (7.0-6.8) end up with ASCII encoding. Meanwhile, on the production server (7.0-5.1), there is no problem; UTF-8 is loaded as UTF-8. I traced the problem to xdmp:document-get().
To confirm, I wrote the following code snippet and ran it in the console on both servers: I got incorrect encoding on the development server and correct encoding on production.
let $options :=
  <options xmlns="xdmp:document-get">
    <repair>full</repair>
    <encoding>UTF-8</encoding>
    <format>xml</format>
  </options>
let $url := "http://******/ref_batches/electronic/20170801_e31_004 /201731780-004.xml"
return xdmp:document-get($url, $options)
My initial guess was that the different version numbers caused this, so I tested on a local server (7.0-6-12) and got correct UTF-8 encoding. Later we upgraded our development server to 7.0-6-12, re-tested, and still got the incorrect (ASCII) encoding.
Is there some MarkLogic configuration that is responsible for this transcoding?
Thanks

Related

processing an XSL-FO file with FOP in eXist-db exits with "permission denied" (only on Linux)

I'll start by stating the context this question is based on: I'm running eXist-4.7.1 in a Tomcat container, and am trying to specify fonts in a configuration file for a PDF transformation using FOP (eXist-4.7.1 ships with FOP version 2.3).
The good news: it seems that some progress has been made since earlier reports on font configuration on the eXist-open mailing list (https://markmail.org/message/so43jgratswpu4dz), and I'm now able to load fonts via the http:// protocol. Here is a self-contained XQuery example (which can be stored in and run from the db):
xquery version "3.1";
import module namespace xslfo="http://exist-db.org/xquery/xslfo";
let $fo :=
  <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
      <fo:simple-page-master master-name="my_page" margin="0.5in">
        <fo:region-body/>
      </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my_page">
      <fo:flow flow-name="xsl-region-body">
        <fo:block font-family="urbanist">Hello world!</fo:block>
      </fo:flow>
    </fo:page-sequence>
  </fo:root>
let $fop.config :=
  <fop version="1.0">
    <use-cache>false</use-cache>
    <renderers>
      <renderer mime="application/pdf">
        <fonts>
          <font kerning="yes" embed-url="https://github.com/coreywho/Urbanist/raw/master/fonts/static/Urbanist-Black.ttf">
            <font-triplet name="urbanist" style="normal" weight="normal"/>
          </font>
        </fonts>
      </renderer>
    </renderers>
  </fop>
let $pdf := xslfo:render($fo, "application/pdf", (), ($fop.config))
return response:stream-binary($pdf, "application/pdf", "output.pdf")
The even better news: this is working without problems on my Windows box, where it produces a PDF document with the right font.
The bad news: when running the exact same XQuery example in exactly the same Tomcat setup on my Linux production server, the xslfo:render() call exits with an error:
<exception>
  <path>/db/apps/test-fop-fonts/test-fop-fonts.xq</path>
  <message>exerr:ERROR .fop (Permission denied) [at line 40, column 13]</message>
</exception>
Unfortunately, this is about everything that's being logged. Clearly, something is going wrong on the Linux box, but I have no clue what it could be. Apart from this glitch, eXist is operating perfectly in my Linux Tomcat, so I'm quite confident file permissions should be OK.
Has anyone else encountered this "permission denied" error?
Best,
Ron

It turned out to be a lower-level OS problem: the error disappeared when I started Tomcat as the root user, after which eXist could happily create the PDF file.
After some more digging, it appeared that FOP caches files in the home directory of the user running it, and my non-privileged Tomcat user had no home directory. The problem can be fixed by creating a home directory for this user, or by providing the path to a folder that user can write to in the Tomcat startup script, e.g. -Duser.home=$CATALINA_TMPDIR
If anyone else should bump into this, I found the solution here: https://forum.xwiki.org/t/pdf-export-issue-with-file-permissions/4933/11. (phew!)
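To confirm this kind of diagnosis from inside the JVM, one option is to check whether the directory behind the user.home system property exists and is writable. The following is only a minimal sketch: the class name UserHomeCheck and the helper isUsableDir are made up for illustration, and it assumes (as described above) that FOP resolves its cache location against user.home.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of eXist-db or FOP.
class UserHomeCheck {

    // True only if the path is an existing directory the current user can write to.
    static boolean isUsableDir(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        // user.home is a standard JVM system property; -Duser.home=... overrides it.
        Path home = Paths.get(System.getProperty("user.home"));
        System.out.println("user.home = " + home);
        System.out.println("usable    = " + isUsableDir(home));
    }
}
```

For the result to be meaningful, it has to run as the same OS user and with the same -Duser.home setting (if any) as the Tomcat process.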

How do I properly receive UTF-8 characters in JBoss 7?

I’m using JBoss 7.1.3. Currently, when I submit a request to the server with a special character, for example
Café
it is received by the server as
CafÃ©
The only piece of advice I found online for correcting this was to add these system properties to $JBOSS_HOME/standalone/configuration/standalone.xml …
<system-properties>
  …
  <property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
  <property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>
</system-properties>
However, even after restarting my server my special characters are still being received incorrectly on the server side. What else can I do to properly interpret the characters?

What fixed the encoding for me on JBoss 7.1.0 beta and higher was adding the following line to the standalone.conf file directly under the bin directory:
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.connector.URI_ENCODING=UTF-8"
OR
Try adding the following to standalone.conf (directly under bin) or domain.conf, whichever suits your setup:
-Dfile.encoding=UTF-8
*Works for JBoss 7.1

The e-acute is U+00E9.
Encoding using UTF-8, you get 0xC3 0xA9.
If you convert that on the assumption that your terminal or whatever is using cp1252 or similar, you get Ã©. Solution: don't do that. Tell your jboss to use UTF-8.
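The byte-level story above can be reproduced in a few lines of Java. This is just an illustrative sketch: the class and method names are invented, and ISO-8859-1 stands in for "cp1252 or similar" (for the two bytes 0xC3 0xA9 both charsets give the same two characters).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class MojibakeDemo {

    // Encode as UTF-8, then decode the same bytes with the wrong single-byte charset.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Space-separated hex dump of the UTF-8 bytes of a string.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";                    // e-acute, U+00E9
        System.out.println(utf8Hex(eAcute));         // C3 A9
        String garbled = misdecode("Caf" + eAcute);  // "Caf" + U+00C3 + U+00A9
        System.out.println(garbled.length());        // 5 (the original has 4 characters)
    }
}
```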

Have you tried the following option?
Set this startup parameter:
-Dorg.wildfly.undertow.ALLOW_UNESCAPED_CHARACTERS_IN_URL=true
https://issues.jboss.org/browse/JBEAP-13710

Try setting org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING to false.
I had the same problem very recently; having that setting set to true was the core issue on JBoss EAP 6.4.0.
It seems that no matter what you set in URI_ENCODING, if USE_BODY_ENCODING_FOR_QUERY_STRING is set to true (and the body has no encoding), it will default to ISO-8859-1.
See the Tomcat docs: https://tomcat.apache.org/tomcat-8.5-doc/config/http.html (look for the useBodyEncodingForURI connector attribute).
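The effect of such a fallback on a query string can be imitated with the JDK's own URLDecoder. This sketch does not show how JBoss or Tomcat decode parameters internally; it only assumes the client sent "Café" percent-encoded as UTF-8 and compares the two charsets. The class name QueryDecodeDemo is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, for illustration only.
class QueryDecodeDemo {

    // Percent-decode a query value with the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "Caf%C3%A9"; // "Café" as a UTF-8 percent-encoded query value
        System.out.println(decode(raw, "UTF-8").length());      // 4: decoded correctly
        System.out.println(decode(raw, "ISO-8859-1").length()); // 5: é became two characters
    }
}
```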

I had the same issue in JBoss 6.4. The JVM was correctly set and displaying special characters was OK, but the POST requests were encoded in ISO-8859-1.
Someone had added a filter that set the encoding, without luck:
request.setCharacterEncoding("UTF-8");
In the end, the issue was that this filter was not the first to be executed. What I did was to put this "UTF8 filter" in the first position, and that solved the issue.
The trick is that the first filter executed sets the encoding. It seems that another filter had set the wrong encoding.
The class implements javax.servlet.Filter with the following code:
public void doFilter(ServletRequest inRequest, ServletResponse inResponse, FilterChain inFilterChain) throws IOException, ServletException {
    // Set the encoding before anything downstream reads the request.
    inRequest.setCharacterEncoding("UTF-8");
    inFilterChain.doFilter(inRequest, inResponse);
}

Can I use a URL with CFZIP action=read?

When I use ColdFusion 10 locally, I can read the content of a text file inside a zip file using:
<cfzip action="read" file="http://someurl.com/somezip.zip" entrypath="sometext.txt" variable="somevar" />
But on my Railo VPS, this produces an internal server error 500 on IIS 7.5.
Can anyone tell me where I am going wrong with Railo?

Not sure about Railo, but according to the <cfzip> doc, ACF does not support reading off http://, only ram://. You should use <cfhttp> to download the content first.

Intersystems Cache Unexpected error occurred: <WIDE CHAR>

I am trying to load in an old CACHE.DAT database into Intersystems Cache (2012.1.1 win32 evaluation). I've managed to create a namespace and database, and I'm able to query some of the database tables.
However, for other tables, I get the following error:
ERROR #5540: SQLCODE -400 Message: Unexpected error occurred: <WIDE CHAR>
The documentation tells me that this means that a multibyte character is read where a one byte character is expected. I suspect this might mean that the original database was in UTF-16, while my new installation is using UTF-8.
My question is: is there a way to convert the database, to configure Cache so that it can deal with multibyte characters, or to deal with this problem in another way?

Maybe the original database was created in a Unicode installation and the current installation is 8-bit.
Caché read a multibyte character where a 1-byte character was expected.

Can you post your cboot.log from the mgr directory?
For example, these are the first lines of my cboot.log:
Start of Cache initialization at 02:51:00PM on Apr 7, 2012
Cache for Windows (x86-64) 2012.2 (Build 549U) Sun Apr 1 2012 17:34:18 EDT
Locale setting is rusw
Source directory is c:\intersystems\ensemble12\mgr\utils\

Rails 3 - (incompatible character encodings: UTF-8 and ASCII-8BIT):

incompatible character encodings: UTF-8 and ASCII-8BIT
I'm finding lots of old information yet scant advice about this error message but wondered what the current status is as there seems to be less discussion of it around the net. It occurs for me when I try to render text from a locale file that includes accented characters, for example 'é'.
I'm using rails 3.0.3, ruby 1.9.2 (and have tried 1.8.7 with same result), mysql2 adapter, utf8 encoding.

I've gotten this error when there is an encoding mismatch between how my Ruby app is parsing strings and how the database stores them.
To fix this for myself when I'm dealing with UTF-8, I make sure I have this at the top of the .rb file in question:
# encoding: utf-8
Alternatively, you can globally set default UTF-8 encoding in your application config file with this line:
Encoding.default_internal, Encoding.default_external = ['utf-8'] * 2
And finally, I make sure that my database is using UTF-8 internally by setting the encoding option in database.yml:
development:
  adapter: postgresql
  encoding: UTF8
  database: pg_development
  username: abe
  pool: 5

I remember resolving this once by using "string".force_encoding("UTF-8")

For the time being, this can be caused by an issue in Mail 2.5.4, which 'pollutes' the encoding of the mail object.
@email = Email.find(1)
@email.body.encoding # This is a fresh instance from db, still okay
Mail.new(@email.body)
@email.body.encoding # value has been changed