Deleting all special characters from a string in Progress 4GL

How can I delete all special characters from a string in Progress 4GL?

I guess this depends on your definition of special characters.
You can remove ANY character with REPLACE. Simply set the to-string part of replace to blank ("").
Syntax:
REPLACE ( source-string , from-string , to-string )
Example:
DEFINE VARIABLE cOldString AS CHARACTER NO-UNDO.
DEFINE VARIABLE cNewString AS CHARACTER NO-UNDO.
cOldString = "ABC123AACCC".
cNewString = REPLACE(cOldString, "A", "").
DISPLAY cNewString FORMAT "x(10)".
You can use REPLACE to remove a complete matching string. For example:
REPLACE("This is a text with HTML entity &amp;", "&amp;", "").
Handling "special characters" can be done in a number of ways. If you mean control characters such as line feed or bell, you can use REPLACE together with the CHR function.
Basic syntax (CHR also accepts optional code page arguments, but those are rarely needed):
CHR( expression )
expression: An expression that yields an integer value that you want to convert to a character value (the numeric character code).
So if you want to remove every Swedish letter Ö (code 214 in ISO 8859-1) from a text you could do:
REPLACE("ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ", "Ö", "").
or
REPLACE("ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ", CHR(214), "").
Putting this together you could build an array of unwanted characters and remove all those in the string. For example:
FUNCTION cleanString RETURNS CHARACTER (INPUT pcString AS CHARACTER):
    DEFINE VARIABLE iUnwanted AS INTEGER NO-UNDO EXTENT 3.
    DEFINE VARIABLE i AS INTEGER NO-UNDO.
    /* Remove all capital Swedish letters ÅÄÖ */
    iUnwanted[1] = 197.
    iUnwanted[2] = 196.
    iUnwanted[3] = 214.
    DO i = 1 TO EXTENT(iUnwanted):
        IF iUnwanted[i] <> 0 THEN DO:
            pcString = REPLACE(pcString, CHR(iUnwanted[i]), "").
        END.
    END.
    RETURN pcString.
END.
DEFINE VARIABLE cString AS CHARACTER NO-UNDO INIT "AANÅÅÖÖBBCVCÄÄ".
DISPLAY cleanString(cString) FORMAT "x(10)".
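The same idea can be written so that the unwanted characters are passed in as a string rather than hard-coded in an array. This is only a minimal sketch: the name stripChars and its parameters are hypothetical. As far as I know, REPLACE matches case-insensitively unless one of its arguments is a case-sensitive variable, so listing a letter would likely remove both its upper-case and lower-case forms.

```abl
/* Hypothetical helper: removes every character listed in pcUnwanted from pcString. */
FUNCTION stripChars RETURNS CHARACTER
    (INPUT pcString AS CHARACTER, INPUT pcUnwanted AS CHARACTER):

    DEFINE VARIABLE i AS INTEGER NO-UNDO.

    /* One REPLACE pass per unwanted character. */
    DO i = 1 TO LENGTH(pcUnwanted):
        pcString = REPLACE(pcString, SUBSTRING(pcUnwanted, i, 1), "").
    END.

    RETURN pcString.
END FUNCTION.

/* Strip the pipe character, tab, line feed and carriage return. */
MESSAGE stripChars("a|b" + CHR(9) + "c" + CHR(10),
                   "|" + CHR(9) + CHR(10) + CHR(13))
    VIEW-AS ALERT-BOX.   /* abc */
```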
Other functions that could be useful to look into:
SUBSTRING: Returns a part of a string. Can be used to modify it as well.
ASC: Like CHR but the other way around - returns the numeric code of a character.
INDEX: Returns the position of a character in a string.
R-INDEX: Like INDEX but searches right to left.
STRING: Converts a value of any data type into a character value.
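A quick way to get a feel for the functions listed above is to call them on a short literal. The variable name cText is just an illustration.

```abl
DEFINE VARIABLE cText AS CHARACTER NO-UNDO INITIAL "ABCABC".

MESSAGE
    SUBSTRING(cText, 2, 3) SKIP   /* BCA: three characters starting at position 2 */
    ASC("A")               SKIP   /* 65 */
    INDEX(cText, "C")      SKIP   /* 3: first match, searching from the left */
    R-INDEX(cText, "C")    SKIP   /* 6: first match, searching from the right */
    STRING(42) + "!"              /* 42! */
    VIEW-AS ALERT-BOX.
```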

This function replaces each accented character with the unaccented letter it compares equal to under the current collation.
function Dia2Plain returns character (input icTxt as character):
    define variable ocTxt as character no-undo.
    define variable i as integer no-undo.
    define variable iAsc as integer no-undo.
    define variable cDia as character no-undo.
    define variable cPlain as character no-undo.
    assign ocTxt = icTxt.
    repeat i = 1 to length(ocTxt):
        assign cDia = substring(ocTxt,i,1)
               cPlain = "".
        if asc(cDia) > 127
        then do:
            repeat iAsc = 65 to 90: /* A..Z */
                if compare(cDia, "eq" , chr(iAsc), "case-sensitive")
                then assign cPlain = chr(iAsc).
            end.
            repeat iAsc = 97 to 122: /* a..z */
                if compare(cDia, "eq" , chr(iAsc), "case-sensitive")
                then assign cPlain = chr(iAsc).
            end.
            if cPlain <> ""
            then assign substring(ocTxt,i,1) = cPlain.
        end.
    end.
    return ocTxt.
end.
/* testing */
def var c as char init "ÄëÉÖìÇ".
disp c Dia2Plain(c).
def var i as int.
def var d as char.
repeat i = 128 to 256:
assign c = chr(i) d = Dia2Plain(chr(i)).
if asc(c) <> asc(d) then disp i c d.
end.

This function will remove anything that is not a letter or number (adapt it as you wish).
/* remove any characters that are not numbers or letters */
FUNCTION alphanumeric RETURNS CHARACTER
    (lch_string AS CHARACTER):
    DEFINE VARIABLE lch_newstring AS CHARACTER NO-UNDO.
    DEFINE VARIABLE i AS INTEGER NO-UNDO.
    DO i = 1 TO LENGTH(lch_string):
        /* check to see if this is a number or letter; the digit range starts at "0" */
        IF (ASC(SUBSTRING(lch_string,i,1)) GE ASC("0")
            AND ASC(SUBSTRING(lch_string,i,1)) LE ASC("9"))
        OR (ASC(SUBSTRING(lch_string,i,1)) GE ASC("A")
            AND ASC(SUBSTRING(lch_string,i,1)) LE ASC("Z"))
        OR (ASC(SUBSTRING(lch_string,i,1)) GE ASC("a")
            AND ASC(SUBSTRING(lch_string,i,1)) LE ASC("z"))
        THEN
            /* only keep it if it is a number or letter */
            lch_newstring = lch_newstring + SUBSTRING(lch_string,i,1).
    END.
    RETURN lch_newstring.
END FUNCTION.

Or, if your ABL session has access to .NET classes, you can simply use a regex:
System.Text.RegularExpressions.Regex:Replace("Say,Hi!", "[^a-zA-Z0-9]","")
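To use the result, assign it to a variable. The following is a small sketch that assumes the session can load .NET classes; the variable name cClean is hypothetical.

```abl
DEFINE VARIABLE cClean AS CHARACTER NO-UNDO.

/* Static .NET method call: drop everything except ASCII letters and digits. */
cClean = System.Text.RegularExpressions.Regex:Replace("Say,Hi!", "[^a-zA-Z0-9]", "").

MESSAGE cClean VIEW-AS ALERT-BOX.   /* SayHi */
```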

Related

Is there an equivalent of Java's isSpaceChar function on the Swift Character class?

Is there an equivalent of Java's isSpaceChar function on the Character type in the Swift standard library?
In Java, this function takes a character (or its code point) and returns true or false depending on whether it is a space character.
For example, the space character " " has the ASCII value 32.
Unicode properties for Character and UnicodeScalar were introduced with Swift 5, see
SE-0211 Add Unicode Properties to Unicode.Scalar and
SE-0221 Character Properties
In particular, Character.isWhitespace (and likewise Unicode.Scalar.Properties.isWhitespace) is
A Boolean value indicating whether this character represents whitespace, including newlines.
Example for characters:
let char: Character = " "
if char.isWhitespace {
    // ...
}
Example for Unicode scalar values:
let value = 32
if let uc = UnicodeScalar(value), uc.properties.isWhitespace {
    // ...
}

How to use entry and lookup function in the same program to display the string corresponding to the numbers

When I enter 4 at runtime, the following program should return the string "four", and similarly the strings corresponding to 5, 6, 7 and 8.
This should be done using entry function.
DEFINE VARIABLE x AS CHARACTER NO-UNDO FORMAT "9" LABEL "Enter a digit between 4 and 8".
DEFINE VARIABLE show AS CHARACTER NO-UNDO FORMAT "x(5)" EXTENT 5 LABEL "Literal" INITIAL ["four","five","six","seven","eight"].
DEFINE VARIABLE i AS INTEGER.
REPEAT:
SET x AUTO-RETURN.
i = LOOKUP(x, "4,5,6,7,8",",") .
IF i = 0 THEN
DO:
MESSAGE "Digit must be 4, 5, 6, 7 or 8. Try again.".
UNDO, RETRY.
END.
MESSAGE ENTRY(i, show[i], ",") VIEW-AS ALERT-BOX INFO BUTTONS OK.
END.
a) Your LOOKUP call was wrong. The first argument is x (the expression to locate in the list), followed by the list as a comma-delimited string.
b) There is no need for an ELSE, because UNDO, RETRY stops the current iteration of the loop.
c) Since show is an array, just reference the array element.
DEFINE VARIABLE x AS CHARACTER NO-UNDO FORMAT "9" LABEL "Enter a digit between 4 and 8".
DEFINE VARIABLE show AS CHARACTER NO-UNDO FORMAT "x(5)" EXTENT 5 LABEL "Literal" INITIAL ["four","five","six","seven","eight"].
define variable i as integer .
REPEAT:
SET x AUTO-RETURN.
i = lookup(x, "4,5,6,7,8",",") .
IF i = 0
THEN
DO:
MESSAGE "Digit must be 4,5,6,7, or 8. Try again.".
UNDO, RETRY.
END.
MESSAGE show[i]
VIEW-AS ALERT-BOX INFO BUTTONS OK.
END.
I think you need to decide whether you want to use ENTRY or an array. Mixing the two makes no sense in this case!
DEFINE VARIABLE x AS CHARACTER NO-UNDO FORMAT "9" LABEL "Enter a digit between 4 and 8".
DEFINE VARIABLE show AS CHARACTER NO-UNDO FORMAT "x(5)" LABEL "Literal" INITIAL "four,five,six,seven,eight".
DEFINE VARIABLE i AS INTEGER NO-UNDO.
REPEAT:
SET x AUTO-RETURN.
i = LOOKUP(x, "4,5,6,7,8", ",") NO-ERROR.
IF i = 0 THEN DO:
MESSAGE "Digit must be 4,5,6,7, or 8. Try again.".
UNDO, RETRY.
END.
MESSAGE entry(i, show, ",") VIEW-AS ALERT-BOX INFO BUTTONS OK.
END.
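Stripped of the input loop, the pairing works like this: LOOKUP turns a value into its position in one delimited list, and ENTRY reads the item at that position from a parallel list. The variable names below are illustrative only.

```abl
DEFINE VARIABLE cDigits AS CHARACTER NO-UNDO INITIAL "4,5,6,7,8".
DEFINE VARIABLE cWords  AS CHARACTER NO-UNDO INITIAL "four,five,six,seven,eight".
DEFINE VARIABLE iPos    AS INTEGER   NO-UNDO.

iPos = LOOKUP("6", cDigits).   /* 3: "6" is the third entry; 0 would mean not found */

IF iPos > 0 THEN
    MESSAGE ENTRY(iPos, cWords) VIEW-AS ALERT-BOX.   /* six */
```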

How to determine whether a string represents an integer?

I need to determine if a string contains just an integer. The built-in function isinteger is not working.
To avoid loops I'd like to apply this task on cell arrays of strings.
For example:
Q = { 'qf5' ; '4' ; 'true' ; 'false' ; '4.00' ; '4E0' ; '4e0' ; '657' };
desired result:
integers = 0 1 0 0 0 0 0 1
For a single string I figured out an ugly workaround, but I can't imagine that this is the only possible way, and also it requires a loop to use it on cell arrays:
myString = '4';
integer = uint64( str2double( myString ) );
newString = int2str( integer );
isStringInteger = strcmp(newString,myString);
Which essential function am I missing?
You can do it with regexp; and to avoid the loop you use cellfun:
~cellfun('isempty', regexp(Q, '^-?\d+$'))
This considers an "integer" as a string of digits, possibly with one minus sign at the beginning.
Note that cellfun with the builtin function 'isempty' is very fast.
Well, the string is not an integer, therefore the question as such is not correct. What you want to check is whether the string is a representation of an integer. The isinteger function is also not what you want, because it does not check whether the actual content of a numeric variable is an integer, but whether the data type is an integer type.
As far as I can tell, there is no built-in way to check whether a string represents an integer. One approach to implement such a check would be to see whether all the characters in the string represent digits:
isintstr = all(myString >= '0') && all(myString <= '9')
This code takes advantage of the fact that the decimal digits are encoded in sequence in ASCII and Unicode.
To allow for leading and trailing white space, use
isintstr = all(strtrim(myString) >= '0') && all(strtrim(myString) <= '9')

Output sanitization within Progress ABL / 4GL

Is there an analogous procedure to PHP's http://php.net/manual/en/function.mysql-real-escape-string.php for Progress 4GL / ABL, or a best practice followed within the Progress community for writing sanitized text to external and untrusted entities (web sites, MySQL servers and APIs)?
The QUOTE or QUERY-PREPARE functions will not work as they sanitize text for dynamic queries for Progress and not for external entities.
The closest analogue to your cited example would be to write a function that does this:
DEFINE VARIABLE ch-escape-chars AS CHARACTER NO-UNDO.
DEFINE VARIABLE ch-string AS CHARACTER NO-UNDO.
DEFINE VARIABLE i-cnt AS INTEGER NO-UNDO.
DO i-cnt = 1 TO LENGTH(ch-escape-chars):
    ch-string = REPLACE(ch-string,
                        SUBSTRING(ch-escape-chars, i-cnt, 1),
                        "~~" + SUBSTRING(ch-escape-chars, i-cnt, 1)).
END.
where
ch-escape-chars are the characters you want escaped.
ch-string is the incoming string.
"~~" is the escaped escape character (a literal tilde).
It sounds like roll your own would be the only way. For my purposes I emulated the mysql_real_escape_string function
/* TODO progress auto changes all ASC(0) characters to a space, ASC(32) (hex 20), in a non db string. */
/* the backslash needs to go first */
/* there is no concept of static vars in progress (non class) so global variables */
DEFINE VARIABLE cEscape AS CHARACTER EXTENT INITIAL [
"~\",
/*"~000",*/
"~n",
"~r",
"'",
"~""
]
.
DEFINE VARIABLE cReplace AS CHARACTER EXTENT INITIAL [
"\\",
/*"\0",*/
"\n",
"\r",
"\'",
'\"'
]
.
FUNCTION mysql_real_escape_string RETURNS CHARACTER (INPUT pcString AS CHAR):
DEF VAR ii AS INTEGER NO-UNDO.
MESSAGE pcString '->'.
DO ii = 1 TO EXTENT(cEscape):
ASSIGN pcString = REPLACE (pcString, cEscape[ii], cReplace[ii]).
END.
MESSAGE pcString.
RETURN pcString.
END.
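The REPLACE-based approach used in both answers can also be wrapped in a reusable function. The following is a minimal sketch, not a vetted sanitizer. The name escapeChars and its parameters are hypothetical, and both the tilde as escape character and the decision to escape existing tildes first are assumptions made for illustration.

```abl
/* Hypothetical wrapper: prefixes each character of pcEscapeChars that
   occurs in pcString with a tilde. */
FUNCTION escapeChars RETURNS CHARACTER
    (INPUT pcString AS CHARACTER, INPUT pcEscapeChars AS CHARACTER):

    DEFINE VARIABLE i     AS INTEGER   NO-UNDO.
    DEFINE VARIABLE cChar AS CHARACTER NO-UNDO.

    /* Escape existing tildes first so later passes do not double them.
       In an ABL literal, "~~" is one tilde and "~~~~" is two. */
    pcString = REPLACE(pcString, "~~", "~~~~").

    DO i = 1 TO LENGTH(pcEscapeChars):
        cChar = SUBSTRING(pcEscapeChars, i, 1).
        /* The tilde itself was already handled above. */
        IF cChar <> "~~" THEN
            pcString = REPLACE(pcString, cChar, "~~" + cChar).
    END.

    RETURN pcString.
END FUNCTION.

/* Each apostrophe in the input gets a tilde prefix. */
MESSAGE escapeChars("O'Brien", "'") VIEW-AS ALERT-BOX.
```

Handling the escape character before anything else mirrors the "the backslash needs to go first" remark above: otherwise the escape characters added by later passes would themselves be escaped.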

How do I convert strings to title case in OpenEdge ABL / Progress 4GL?

How do I convert a string to title case in OpenEdge ABL (aka Progress 4GL)?
I know I can get upper case with CAPS(), and lower case with LC(), but I can't find the title case (sometimes called proper case) function.
Examples:
Input Output
------------ ------------
hello world! Hello World!
HELLO WORLD! Hello World!
function titleWord returns character ( input inString as character ):
return caps( substring( inString, 1, 1 )) + lc( substring( inString, 2 )).
end.
function titleCase returns character ( input inString as character ):
define variable i as integer no-undo.
define variable n as integer no-undo.
define variable outString as character no-undo.
n = num-entries( inString, " " ).
do i = 1 to n:
outString =
outString +
( if i > 1 and i <= n then " " else "" ) +
titleWord( entry( i, inString, " " ))
.
end.
return outString.
end.
display
titleCase( "the quick brown fox JUMPED over the lazy dog!" ) format "x(60)"
.
I think the separator logic above can be simplified.
Inside the loop, i <= n is always true, so only the i > 1 test does any work. An equivalent version appends the space after each word instead, for every word except the last.
It could be:
n = num-entries( inString, " " ).
do i = 1 to n:
    outString =
        outString +
        titleWord( entry( i, inString, " " )) +
        ( if i < n then " " else "" )
    .
end.
At least that's what I -think- it should be...
-Me
I was playing around with this a while back, and besides a solution similar to Tom's, I came up with two variations.
One of the problems I had was that not all words are separated by spaces, such as Run-Time and Read/Write, so I wrote this version to use any non-alphabetic character as a separator.
I also wanted to count diacritics and accented characters as alphabetic, so it became a little complicated. To solve the problem I create two versions of the title, one upper case and one lower case. Where the two strings are the same, the character is non-alphabetic; where they differ, it is alphabetic. Titles are usually very short, so this method is not as inefficient as it might seem at first.
FUNCTION TitleCase2 RETURNS CHARACTER
( pcText AS CHARACTER ) :
/*------------------------------------------------------------------------------
Purpose: Converts a string to Title Case.
Notes: This version takes all non-alphabetic characters as word separators
at the expense of a little speed. This affects things like
D'Arby vs D'arby or Week-End vs Week-end.
------------------------------------------------------------------------------*/
DEFINE VARIABLE cUText AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE cLText AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE i AS INTEGER NO-UNDO.
DEFINE VARIABLE lFound AS LOGICAL NO-UNDO INITIAL TRUE.
cUText = CAPS(pcText).
cLText = LC(pcText).
DO i = 1 TO LENGTH(pcText):
IF (SUBSTRING(cUText, i, 1)) <> (SUBSTRING(cLText, i, 1)) THEN
DO:
IF lFound THEN
DO:
SUBSTRING(cLText, i, 1) = (SUBSTRING(cUText, i, 1)).
lFound = FALSE.
END.
END.
ELSE lFound = TRUE.
END.
RETURN cLText.
END FUNCTION.
Another issue is that title case is supposed to be language specific, i.e. verbs and nouns are treated differently to prepositions and conjunctions. These are some possible rules for title case:
First and last word always get capitalized
Capitalize all nouns, verbs (including "is" and other forms of "to
be"), adverbs (including "than" and "when"), adjectives (including
"this" and "that"), and pronouns (including "its").
Capitalize prepositions that are part of a verb phrase.
Lowercase articles (a, an, the).
Lowercase coordinate conjunctions (and, but, for, nor, or).
Lowercase prepositions of four or fewer letters.
Lowercase "to" in an infinitive phrase.
Capitalize the second word in compound words if it is a noun or
proper adjective or the words have equal weight (Cross-Reference,
Pre-Microsoft Software, Read/Write Access, Run-Time). Lowercase the
second word if it is another part of speech or a participle
modifying the first word (How-to, Take-off).
I could of course not code all this without teaching the computer English, so I created this version as a simple if crude compromise; it works in most cases, but there are exceptions.
FUNCTION TitleCaseE RETURNS CHARACTER
( pcText AS CHARACTER ) :
/*------------------------------------------------------------------------------
Purpose: Converts an English string to Title Case.
Notes:
------------------------------------------------------------------------------*/
DEFINE VARIABLE i AS INTEGER NO-UNDO.
DEFINE VARIABLE cWord AS CHARACTER NO-UNDO.
DEFINE VARIABLE lFound AS LOGICAL NO-UNDO INITIAL TRUE.
DEFINE VARIABLE iLast AS INTEGER NO-UNDO.
DEFINE VARIABLE cSmallWords AS CHARACTER NO-UNDO
INITIAL "and,but,or,for,nor,the,a,an,to,amid,anti,as,at,but,by,down,from,in" +
",into,like,near,of,off,on,onto,over,per,than,to,up,upon,via,with".
pcText = REPLACE(REPLACE(LC(pcText),"-"," - "),"/"," / ").
iLast = NUM-ENTRIES(pcText, " ").
DO i = 1 TO iLast:
cWord = ENTRY(i, pcText, " ").
IF LENGTH(cWord) > 0 THEN
IF i = 1 OR i = iLast OR LOOKUP(cWord, cSmallWords) = 0 THEN
ENTRY(i, pcText, " ") = CAPS(SUBSTRING(cWord, 1, 1)) + LC(SUBSTRING(cWord, 2)).
END.
RETURN REPLACE(REPLACE(pcText," - ","-")," / ","/").
END FUNCTION.
I have to mention that Tom's solution is very much faster than both of mine. Depending on what you need, you may find that the speed is not that important, since you're unlikely to use this in large data crunching processes or with long strings, but I wouldn't ignore it. Make sure that your needs justify the performance loss.