ColdFusion/Lucee Encoding Issue When Using EncodeForHTML

I'm running into an issue when using EncodeForHTML on certain characters (emojis in this case).
The text in this case is:
⌛️a😊b👍c😟 💥🍉🍔 💩 🤦🏼‍♀️🤦🏼‍♀️🤦🏼‍♀️ 😘
Now if I just do a straight output
<cfoutput>#txt#</cfoutput>
It displays correctly, no issues, but if I use EncodeForHTML first
<cfoutput>#EncodeForHTML(txt)#</cfoutput>
I get this
⌛️a��b��c�� ������ �� ����‍♀️����‍♀️����‍♀️ ��
I tested it with EncodeForXML & esapiEncode as well to be sure; all are giving me the same result.
I've verified that the encoding settings in Lucee are UTF-8, and the meta charset tag is also set to UTF-8. I can't find any documentation for EncodeForHTML saying whether it makes any changes to the character encoding, whether it requires a specific character encoding, or whether it has any known issues with emojis or certain code points.
I appreciate any help or clarification anyone can provide.
Edit: Thank you everyone. Wish I could accept multiple answers.

I was required to sanitize emojis in order to ensure that third-party content was cross-compatible with external services. Some of the content contained emojis, which was causing export/import problems. I wrote a ColdFusion wrapper for the emoji-java library to identify, sanitize and convert emojis.
https://github.com/JamoCA/cf-emoji-java
For example, the parseToAliases() function "replaces all the emoji's unicodes found in a string by their aliases".
emojijava = new emojijava();
emojijava.parseToAliases('I like 🍕'); // I like :pizza:
To "encode" you could use either the parseToHtmlDecimal() or parseToHtmlHexadecimal() functions prior to using EncodeForHTML().
emojijava = new emojijava();
test = emojijava.parseToHtmlDecimal('I like 🍕'); // I like &#127829;
EncodeForHTML(test);

At the time of this writing, ColdFusion's latest version is 2018 update 9.
In turn, it uses ESAPI 2.1.1.
Recent release notes don't mention emoji:
https://github.com/ESAPI/esapi-java-legacy/tree/develop/documentation
But they do mention, in pull request 413:
"Fixing ESAPI's inability to handle non-BMP codepoints."
This dates from 2017:
https://github.com/ESAPI/esapi-java-legacy/pull/413
So based on all this information, I would recommend doing both of the following:
Try using ESAPI directly. This is how it was done before ESAPI was added to CF. This issue may or may not still exist in ESAPI.
Put in a ticket with Adobe to update this library.

Yes, ESAPI 2.2.0.0 addressed the issue of not correctly encoding non-BMP characters (see https://github.com/ESAPI/esapi-java-legacy/issues/300) as part of PR #413 that James mentioned above.
But I just uploaded ESAPI 2.2.1.0-RC1 (release candidate 1) to Maven Central early this morning and hope to have an official 2.2.1.0 release out by next weekend. So if you are going to put in a ticket with Adobe to fix this with an updated version of ESAPI, I'd wait another week and then tell them to update to 2.2.1.0.
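To illustrate what "non-BMP code points" means here, the following is a minimal Java sketch, not ESAPI's or ColdFusion's actual implementation. It assumes the failure mode is an encoder walking the string one UTF-16 `char` at a time, which splits an emoji's surrogate pair into two invalid halves. The class and method names (`CodePointEntities`, `toDecimalEntities`) are hypothetical. The sketch walks the string by code point instead and emits a decimal numeric character reference for anything that is not an ASCII letter, digit or space.

```java
class CodePointEntities {
    // Hypothetical helper: encodes everything except ASCII letters, digits
    // and spaces as a decimal numeric character reference (&#NNN;).
    static String toDecimalEntities(String input) {
        StringBuilder out = new StringBuilder();
        // Step by code point, not by char, so a surrogate pair stays together.
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (cp < 128 && (Character.isLetterOrDigit(cp) || cp == ' ')) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by 2 chars for non-BMP code points, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F355 (pizza) lies outside the BMP, so it takes two chars in a Java String.
        String pizza = new String(Character.toChars(0x1F355));
        System.out.println(pizza.length());                        // 2
        System.out.println(toDecimalEntities("I like " + pizza)); // I like &#127829;
    }
}
```

If the output of a helper like this is then passed through EncodeForHTML, the `&` of each reference would presumably be encoded again, so it would make sense to use one approach or the other rather than both.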

Why does Watson Personality Insights show different results using different API versions/demo?

My apologies if the question is duplicated. We are facing an issue with the analysis of a profile using the Watson Personality Insights API in Spanish. We have a demo that we implemented using PI API version 2. We tested the exact same text with the demo published on Developer Cloud (in Spanish) and found important differences in how the Big Five were calculated, even though the facet values were not that different. Is it possible that these differences are caused by the API version? The issue is that with our demo the Big Five values produced a rather negative summary profile, while the Developer Cloud summary is kinder.
We could send both result JSONs. For example, here is how the Big Five were rated:
Big Five            DeveloperCloud   Demo V2
Openness            0.773834349      0.847273232
Conscientiousness   0.916616088      0.914907481
Extraversion        0.796331544      0.612606551
Agreeableness       0.17445636       0.096118648
Emotional range     0.036287447      0.01623536
thanks in advance!!
So the API version would not make a difference, as that just governs the format of the API; the back-end models are the same for both v2 and v3 of the API.
So the gist of your question is that when you run the same text in your app and in the demo, you get different Big Five results, while the facet values are about the same.
This might be most easily solved by opening a support ticket so we can debug the issue together. If you'd rather not do that, can you provide a sample text? Typically it boils down to a difference in the way the text is parsed.
Another question: did you try making the request using curl? That would cut out any custom logic in your app and narrow down the problem.
Thanks Neil for your answer!
We tested the text using curl and noticed that the results didn't change with the service version used but with how the text was sent. If we called the service using curl with a plain text input (formatted in UTF-8 with line breaks), it returned the same results for version 2 and version 3, and these also matched the ones from our demo. If we called the service using curl with JSON input WITHOUT line breaks, it returned the same values as well. But if we called the service with JSON input WITH line breaks, the results changed and almost matched those shown by the IBM demo. My question is: which are the correct results? The ones returned when the text is sent as plain text input (with line breaks), or when the text is sent as JSON input (with line breaks)? Is there any technical guideline, besides the one shown on Developer Cloud, on how the text should be parsed to use this service?
Thanks again!

WebSphere MQ binary files

This might be a question that may not be answered due to the nature of the external tool I am using (lack of documentation).
Basically, I am using a tool that pushes and pulls messages from the queue; more precisely, it pushes and pulls files. It worked perfectly for text files, but when I tried pushing and then pulling a binary file, the pulled one was corrupted and its size had increased in comparison with the original file (1.33 ratio).
For example moving a zip file wouldn't work...
I suppose it has something to do with the tool's configuration. The only settings that can be changed regarding the problem are CCSID and encoding (UTF-8, Base16, etc.). I tried playing with both, unfortunately without success.
Tried using the following CCSIDs: 65535, 1208, 819
and encodings : UTF-8, Base16, Base64
In every case the binary file was corrupted after pulling it from the queue. I'm not entirely sure how the tool accomplishes that; it's written in Java. I'm also new to MQ, so I searched IBM's docs for the correct options, but I haven't found anything that makes more sense than 65535 and Base16, and it still doesn't work. Could anyone with more experience with MQ tell me whether playing with these options makes sense at all in this case and, if so, suggest which CCSID and encoding I could try to accomplish what I've described above?
More information is really needed, but my suspicion is that you are putting the message on the queue as a text message and playing around with encodings and CCSIDs to try to get it right. You really need to know how the 'Java' app achieves this: is it using JMS (e.g. JMSBytesMessage) or base Java (something like setMessageData)?
At a high level, there is a header on a message (the MD) which 'describes' the data: the MD format field. If you say the data is a string, then MQ can convert between code pages should the getter request it, etc. Put a tiny binary file into a message onto a queue, and browse the queue with amqsbcg or the GUI. What are the MD fields for format? What headers are on the payload? Anything like RFH2s?
Post the sample code to give us a clue, or at least the amqsbcg output.
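One observation that may help narrow this down: the 1.33 size ratio mentioned in the question is exactly what Base64 produces, since it turns every 3 bytes into 4 characters. The sketch below only demonstrates that arithmetic with the standard `java.util.Base64` class. The thread does not confirm that the tool Base64-encodes on put without decoding on get, so treat that as one hypothesis to check. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Ratio {
    // Length in bytes of the Base64-encoded form of the payload.
    static int encodedLength(byte[] payload) {
        return Base64.getEncoder().encode(payload).length;
    }

    // True if encoding and then decoding gives back the original bytes.
    static boolean roundTrips(byte[] payload) {
        byte[] encoded = Base64.getEncoder().encode(payload);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Arrays.equals(payload, decoded);
    }

    public static void main(String[] args) {
        // Stand-in for a binary file: 3000 bytes with arbitrary values.
        byte[] payload = new byte[3000];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        System.out.println(encodedLength(payload)); // 4000, i.e. a 4/3 ratio
        System.out.println(roundTrips(payload));    // true
    }
}
```

If the pulled file decodes back to the original with a Base64 decoder, the payload is intact and only the decoding step on the get side is missing.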

Why would LayoutObjectNames return an empty string in FileMaker 14?

I'm seeing some very strange behavior with FileMaker 14. I'm using LayoutObjectNames for some required functionality. On the development system it's working fine. It returns the list of named objects on the layout.
I close the file, zip it up and send it to the client, and that required functionality isn't working. He sends the file back and I open it and get a data viewer up. The function returns nothing. I go into layout mode and confirm that there are named objects on the layout.
The first time this happened, I tried recovering the file. In the recovered file it worked, so I assumed some corruption had happened on his end. I told him to trash the file I had given him and work with a new version I supplied. The problem came up again.
This morning he sent me the oldest version that the problem manifested in. I confirmed the problem, tried recovering it again, but this time it didn't fix the problem.
I'm at a loss. It works in the version I send him, doesn't on his system. We're both using FileMaker 14, although I'm using Advanced. My next step will be to work from a served file instead of a local one, but I have never seen this type of behavior in FileMaker. Has anyone seen anything similar? Any ideas on a fix? I'm almost ready to just scrap the file and build it again from scratch since we're not too far into the project.
Thanks, Chuck
There is a known issue with the Get (FileName) function when the file name contains dots (other than the one before the extension). I will amend my answer later with more details and a possible solution (I have to look it up).
Here's a quote from 2008:
This is a known issue. It affects not only the ValueListItems()
function, but any function that requires the file name. The solution
is to include the file extension explicitly in the file name. This
works even if you use Get (FileName) to return the file name
dynamically:
ValueListItems ( Get ( FileName ) & ".fp7" ; "MyValueList" )
Of course, this is not required if you take care not to use period
when naming your files.
http://fmforums.com/forums/topic/60368-fm-bug-with-valuelistitems-function/?do=findComment&comment=285448
Apparently the issue is still with us - I wonder if the solution is still the same (I cannot test this at the moment).

"News System" acting different on different systems

I have 2 servers which I thought were synchronized (dev and live), but the "News System" (extension key "news") behaves differently on them.
In the dev server this line
<f:format.date format="%A">{newsItem.datetime}</f:format.date>
outputs "Freitag", as expected (that's Friday in German).
But on the live server, it outputs %AM. What is even weirder is that l (alone, without %) outputs "Friday" in English.
I've checked all the configurations I know of and I can't find where the difference between the systems is.
Any idea?
TYPO3 is using DateTime::format (http://de2.php.net/manual/en/datetime.format.php) to format the date. This method uses the same syntax as date(), which does not use locales, so all output is English.
The only thing I cannot explain is why your dev environment accepts %A to render the date. Are there different PHP versions? Which TYPO3 version are you using? Take a look at /typo3/sysext/fluid/Classes/ViewHelpers/Format/DateViewHelper.php; you will find the answer there.
I just solved it! Turns out I had 4.7.7 on my live server, and that doesn't support strftime.
Funny, I never thought that such an important feature would be added in a 4.7.x update...

Replace éàçè... with equivalent "eace" In GWT

I tried
s=Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
But it seems that the GWT API doesn't provide such a function.
I also tried:
s=s.replace("é",e);
But it doesn't work either
The scenario is that I'm trying to generate a token from the clicked widget's text for history management.
You can take the ASCII folding filter from Lucene and add it to your project. You can just take the foldToASCII() method from ASCIIFoldingFilter (the method does not have any dependencies). There is also a patch in Jira that has a full class for that without any dependencies - see here. It should be compiled by GWT without any problems. The license should also be OK, since it is the Apache License, but don't quote me on it - you should ask a real lawyer.
@okrasz, the foldToASCII() worked, but I found a shorter one in "Transform a String to URL standard String in Java".
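For reference, here is a minimal sketch of the table-based idea behind folding to ASCII. It is not Lucene's foldToASCII() and not the code from the linked answer. It sticks to plain `String` and `StringBuilder` calls, on the assumption that those are available in GWT's emulated JRE. The class name `AccentFolder` and the small mapping table (mostly French letters) are hypothetical and would need extending for other languages.

```java
class AccentFolder {
    // Parallel tables: ACCENTED.charAt(i) is replaced by PLAIN.charAt(i).
    // Written as \\u escapes so the file compiles regardless of source encoding.
    private static final String ACCENTED =
        "\u00e0\u00e2\u00e4\u00e7\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f6\u00f9\u00fb\u00fc"   // lower case
      + "\u00c0\u00c2\u00c4\u00c7\u00c9\u00c8\u00ca\u00cb\u00ce\u00cf\u00d4\u00d6\u00d9\u00db\u00dc";  // upper case
    private static final String PLAIN =
        "aaaceeeeiioouuu"
      + "AAACEEEEIIOOUUU";

    // Replaces known accented letters and drops any other non-ASCII character,
    // mirroring the replaceAll("[^\\p{ASCII}]", "") step from the question.
    static String fold(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                int idx = ACCENTED.indexOf(c);
                if (idx >= 0) {
                    out.append(PLAIN.charAt(idx));
                }
                // Unmapped non-ASCII characters are silently dropped.
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00e9\u00e0\u00e7\u00e8")); // eace
    }
}
```

Dropping unmapped characters keeps history tokens ASCII-only, at the cost of losing characters that are missing from the table.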