How to store subscript and superscript values in Progress OpenEdge? - progress-4gl

Is there a way to store subscript and superscript values in the Progress database, for example, chemical symbols and formulas, such as C2H5OH, and is it possible to display them ?
I tried copying from Word and pasting into fill in string fields but it doesn't format correctly, it doesn't recognize subscripted values and it is displayed as C2H5OH.

After some testing I've come this far:
1) You need to start your session with startup parameter -cpinternal utf-8 ie
prowin32.exe -cpinternal utf8
Depending on your need you might also need to set -cpstream utf-8 and possibly -cpcoll basic (or something else that matches your needs).
When I did this I had some strange crashes - but that might be because I edited a file saved in another codepage?
2) You need to get the data into your system (perhaps you already have it?).
I used Word and information found here and further explained here. The subscript font setting are just font settings (not unicode) so don't let that fool you (copy-pasting from your question is exactly the same). Basically you need to write the hexadecimal value of the subscript 2 (2082) in Word and then press Alt + X.
Assuming you want to write the actual data in a Progress based GUI I haven't been successful so far. Perhaps you could look at changing registry values as described in the links and continue along that path. I don't want to do that for just basic testing...
3) You will need a font with decent support for these characters. Some fonts don't support them at all!
Segoe UI:
Default system font (possibly) MS Sans Serif:
Arial:
5) Database? I'm unsure if you will need to use CLOB-fields to store this in your database or not. Most likely you shouldn't.
Hope this is enough to at least get you started!

Related

Errors using ps2ascii on some files

What does FC_WEIGHT refer to? Please advise: Although a text file was produced it is large and consists largely of numbers which makes it hard to proofread. I need relatively good confidence the output matches the input. If there is a fix please point me to it and bring joy to my dull drab existence.
entered the command
ps2ascii /Users/dwstclair/Desktop/untitled3/stmt_20181130.pdf a.txt
The result was:
DEBUG: FC_WEIGHT didn't match
On the off chance a default font was missing on my system
I added DroidSansFallback.ttf (no joy)
Basically, I wouldn't use ps2ascii. Its long been deprecated and doesn't even ship in more recent versions of Ghostscript.
Instead consider using the txtwrite device. It works with a wider range of input (in particular it can use ToUnicode CMaps in PDF files, which ps2ascii cannot) and is capable of producing output in other than ASCII, which is quite useful. Even if you aren't working with non-Latin languages, the ability to preserve ligatures (eg fi, ffi, ffl etc) is convenient.
The actual answer to your question is 'don't worry about it'.
FC_WEIGHT refers to the weight of a font (light, bold, regular, ExtraBold etc). This message can only arise when you are using FontConfig, and Ghostscript is enumerating the available fonts from font config, trying to find a match for a missing font in the input. This means that a candidate font did not match the target font's weight.
Since you aren't going to use the font, it doesn't affect you.

What's a good method for writing fixed width field files?

I need to write a file that is probably being interpreted by something like RPG IV on an AS/400 (but I don't know that). The file will be created by reading data from our MySQL database and then writing it in the specified format. It could be quite large ( potentially measured in GB but haven't determined yet ). Right now I'm thinking Perl's built in format might actually be my best bet, because things like Xslate, and Template Toolkit are more designed for things that aren't fixed width (HTML). My only concern there is that format doesn't appear to have conditionals and it looks like I may need them (I found a format left justified if field A is set, right justified and padded if not)
Other possibilities that come to mind are pack and the sprintf family of functions.
I don't think pack supports right-justified text, so that wouldn't be an option.
That leaves (s)printf. You can build format specifiers programatically to support your conditional logic for justification.
Template Toolkit can do a serviceable job at creating fixed width formatted files. The trick is to use the templates to describe the file and record structure, but have a Perl function format the data for each field.
It may be easier to skip the templates and do all the formatting in Perl. Either way you need to consider how you need to format your fields. In my experience sprintf is better and handling more of the formatting cases required by fixed width formatted files. You will probably still need to implement a few helper functions the hand oddities (like EBCDIC/COBOL signed numbers encoded in ASCII, if your unlucky enough).
There are a thousand odd special cases in legacy fixed width formatted files, it's almost enough to make me like XML data files, typically it's the oddest special case in the end that determines what the best method for formatting the file is.

Bad MySQL import, now we have garbage showing in place of utf-8 chars

We restored from a backup in a different format to a new MySQL structure (which is setup correctly for UTF-8 support). We have weird characters showing in the browser, but we're not sure what they're called so we can find a master list of what they translate to.
I have noticed that they do, in fact, correlate to a specific character. For example:
â„¢ always translates to ™
— always translates to —
• always translates to ·
I referenced this post, which got me started, but this is far from a complete list. Either I'm not searching for the correct name, or the "master list" of these bad-to-good conversions as a reference doesn't exist.
Reference:
Detecting utf8 broken characters in MySQL
Also, when trying to search via MySQL query, if I search for â, I always get MySQL treating it as an "a". Is there any way to tweak my MySQL queries so that they are more literal searches? We don't use internationalization much so I can safely assume any fields containing the â character is considered to be a problematic entry, which would need to be remedied by our "fixit" script we're building.
Instead of designing a "fixit" script to go through and replace this data, I think it would be better to simply fix the issue directly. It seems like the data was originally stored in a different format than UTF-8 so that when you brought it into the table that was set up for UTF-8, it garbled the text. If you have the opportunity, go back to your original backup to determine the format the data was stored in. If you can't do that, you will probably need to do a bit of trial and error to figure out which format the data is in. However, once you know that, conversion is easy. Read the following article's section on Repairing:
http://www.istognosis.com/en/mysql/35-garbled-data-set-utf8-characters-to-mysql-
Basically you are going to set the column to BINARY and then set it to the original charset. That should make the text appear properly (a good check to know you are using the correct charset). Once that is done, set the column to UTF-8. This will convert the data properly and it will correct the problems you are currently experiencing.

JMeter CSV Data Set is corrupting Japanese strings stored as proper UTF-8, I get Question Marks instead

I read in search terms from a simple text file to send to a search engine.
It works fine in English, but gives me ???? for any Japanese text.
Text with mixed English and Japanese does show the English text, so I know it's reading it.
What I'm seeing:
Input text:
Snow Leopard をインストールする場合、新しい
Turns into:
Snow Leopard ???????????????
This is in my POST field of an HTTP.
If I set JMeter to encode the data, it just puts in the percent sequence for question marks.
About the Data:
The CSV file is very simple in
structure.
There's only one field / one column,
which I name TERM, and later use as
${TERM}
I don't really need full CSV because it's only one string per line.
There's no commas or quotes.
It's UTF-8 and when I run the Unix "file" command on the file, it says UTF-8 text.
I've also verified UTF-8 in command line and graphical mode on two machines.
Interesting note:
An interesting coincidence that I noticed: if there are 15 Japanese characters then I get 15 question marks, so at some point it's being seen as full characters and not just bytes.
JMeter CSV Dataset Config:
Filename: japanese-searches.csv
File encoding: UTF-8 (also tried without)
Variable names: TERM
Delimiter: ,
Allow Quoted Data: False (I also tried True, different, but still wrong)
Recycle at EOF: True
Stop at EOF: False
Staring mode: All threads
A few things I've tried:
- Tried Allow quoted Data. It changed to other strange characters.
- Added -Dfile.encoding=UTF-8
- Tried encoding the POST stage, but it just turned into a bunch of %nn for question marks
And I'm not sure how "debug" just after the each line of the CSV is read in. I think it's corrupted right away, but I'm not sure.
If it's only mangled when I reference it, then instead of ${TERM} perhaps there's some other "to bytes" function call. I'll start checking into that. I haven't done anything with the JMeter functions yet.
Edited Dec 24:
Tweaks:
Changed formatting and added bullet
points for more clarity.
Clarified that the file is UTF-8, and have verified that.
A new theory:
Is it possible that the Japanese characters are making it through, and the issue is that EVERY SINGLE place that shows them maps them to a "?" at DISPLAY TIME only. So even though I've checked in a bunch of places, they all have a display issue just in the UI?
Is there a way in JMeter to see the numeric value of a character or string? Actually, to tell JMeter to display the list of Unicode code points?
I'll look at my last log files... although I suppose even the server logs could mis-mapped the characters.
Also, perhaps when doing variable expansion inside of the text field that I POST, where I reference the ${TERM}, maybe at that point it also maps to question marks, but that the corruption happens at that later point. If that happened, AND it was mis-displayed in the UI, then it might lead to a false conclusion.
What I'd really like to do is pause JMeter after the first CSV record, just after that line is loaded, and look at it with a "data scope" or byte editor or something. Not sure if this is possible.
Found the issue, there was another place the UTF-8 had to be specified.
In the HTTP Request, to the right of the Method, you have to also set Content Encoding to UTF-8
Yes, in hindsight, this seems obvious, but there were a number of reasons I didn't think this was needed. Some of my incorrect assumptions might be helpful for others who are debugging, so here goes - I would have thought that:
1: Once text has made it into Java as Unicode, it stays as Unicode, and goes in and out by UTF-8. Obviously not in this case.
2: I sort of thought HTTP defaulted to UTF-8 unless you say otherwise, but maybe I'm just used to XML, but probably not a good practice to assume that, and maybe HTTP defaults to ISO-Latin1 or something, or even if there's a spec, maybe folks don't follow it.
3: And if I don't specific it, I'd think the "do no harm" approach would be to pass the characters on, and let the receiver on the other end deal with it. Wrong again!
(OK, so points 1, 2 and 3 overlap a bit)
4: Even though my HTTP Request POST, I did still try the Encode checkbox. I certainly thought that would have encoded it, but all I got was the repeating % hex for question marks, so seemed to me that the data was already corrupted at that point. Wrong again. I suspect WITHIN the HTTP phase, there's TWO character transitions, first from Unicode to whatever encoding it thinks you have, and THEN a second encoding into the %signs, and my data was mis-encoded at the first step.
5: And I would have thought JMeter would say something or warn, but from my reading, apparently it's not helpful in that respect. You can do logging or whatever.
And the "?" is Java's way of reporting a problem BY default, this started in the Java 1.4x timeframe. In my Java code I prefer to set encoding errors to report as an exception, but again, not the default, and not what JMeter does.
So I learned my lesson.
The HINT that the Unicode was at least starting out OK was that the number of question marks equaled the number of Japanese characters, instead of having 2 or 3 times as many question marks. If the length of "???" matches your Japanese (or Chinese) string, then Java DID see actual Unicode characters at some point along the journey. Whereas if you see 3 times as many ?'s as input text, then Java always saw them as bytes or ints or whatever, and NEVER as valid codepoints.
Came across this topic when searching for solution to use parameters from csv file that contained some columns written in Hebrew.
I used Excel 2007 to create a 1000 lines data for user registrations. The first and the last names had to be in Hebrew.
I exported the file to "Unicode text" file. It became tab delimited.
"Unicode Text" saves in UTF-16 LE (Little Endian), not in UTF-8. That is important.
I opened the result in Notepad++. I could see the Hebrew letters properly. The Notepad++ has the "Encoding" menu item, where you can check the encoding or change it. So I changed the Little Endian to UTF-8.
Then I replaced tabs with commas (just selected the tab and pasted it into the Find box.
The parameters were substituted ok, but after running the script I saw the following:
In the "View Results Tree" listener I opened the "Result" tab of the "Http Request".
The parameters were substituted, but the HTTP view tab (on the bottom) of the Request showed me some gibberish.
But when I looked at the Raw view, I saw that the request parameters actually contained strings like %D7%A9%D7%A8%D7%9E%D7%95%D7%98%D7%94 that when taken in pairs (%D7 %A9) corersponded properly to Hebrew letters.
To my mind, the JMeter has a bug and can not properly display the unicode chars. But it sends (POSTs) them out ok.
Hope I am right and hope it will help someone.
You can try to use "SHIFT-JIS" in Content encoding (it's nearby Method selection). Then you should uncheck "Encode?" for parameter that included Japanese.
Hope it works you.

Apostrophe issue in RTF

I have a function within a custom CRM web application (old VB.Net circa 2003) that takes a set of fields from a database and merges them with palceholders in a set of RTF based template documents. These generate merged letters and documentation. The code essentially loops through each line of the RTF template file and replaces any instances of the placeholder values with text from a database record. The issue I'm having is that users have pasted a certain type of apostrophe into the web app (and therefore into the database) that is not rendering correctly in the resulting RTF file. It is rendering like this - ’.
I need a way to spot this invalid apostrophe in the code and replace it with a valid one. Unfortunately when I paste the invalid apostrophe into the Visual Studio editor it gets converted into the correct one. So I need another way to express this invalid apostrophe's value. Unfortunately I do not know a great deal about unicode and other encodings so I'm calling out for help with this.
Any ideas?
If you really just want to figure out what the character is you might want to try and paste it into a text editor like ultraedit. It has a hex mode that you can flip to to see the actual underlying bytes.
In order to do the replace once you've figured out the character you'd do something like this in Vb,
text.Replace(ChrW(2001), "'")
Note that you might not be able to figure it out easily using the text editor because it might also get mangled by paste from the clipboard. You might want to either print some debug of the ascii values from code. You can use the AscW function to do that.
I can't help but think that it may actually simply be a case of specifying the correct encoding to use when you write out the stream though. Assuming you're using a StreamWriter you can specify it on the constructor. I'm guessing you actually want ASCII given your requirement.
oWriter = New System.IO.StreamWriter(path, False, System.Text.Encoding.ASCII)
It looks like you probably want to encode characters out of the 8 bit range (>255).
You can do that using \uNNNN according to the wikipedia article.