ColdFusion: Copying from Word/Outlook causes question marks (encoding)

We have a web app built in ColdFusion. When we copy text from Word/Outlook, paste it into our comment field and submit, it is saved to the database with some kind of encoding that shows up as "?".
I have been trying to figure out what the mysterious characters are so I can strip them with a replace function. I have tried many replaces:
str = RemoveHTML(str);
//Replace tabs
str = Replace(str, Chr(9), " ", "All");
//Replace Newlines and carriage returns
str = Replace(str, Chr(10), " ", "All");
str = Replace(str, Chr(13), " ", "All");
str = Replace(str,"[^0-9A-Za-z ]","","all");
str = Replace(str, "[^\x20-\x7E]", "", "ALL");
str = Replace(str, Chr(32), " ", "All");
//Replace two or more blanks with one space
str = ReReplaceNoCase(str, "[[:blank:]]{2,}", " ", "All");
str = Trim(str);
str = Left(str, 5000);
return str;
Any help going in the right direction would be greatly appreciated.
Thanks so much!
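For comparison, here is a Python sketch of the cleanup the snippet above is aiming for. One thing worth knowing on the CFML side: Replace() matches its search string literally, so patterns such as "[^\x20-\x7E]" only act as regular expressions when passed to REReplace(). The function name `sanitize_comment` is hypothetical, the HTML-removal step is left out, and stripping characters this way is lossy: the curly quotes disappear instead of turning into "?".

```python
import re


def sanitize_comment(text: str, max_len: int = 5000) -> str:
    """Hypothetical cleanup mirroring the CFML snippet (minus RemoveHTML)."""
    # Turn tabs, carriage returns and newlines into spaces.
    text = re.sub(r"[\t\r\n]", " ", text)
    # Drop anything outside printable ASCII (space through tilde).
    text = re.sub(r"[^\x20-\x7E]", "", text)
    # Collapse runs of two or more spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # Trim, then keep at most max_len characters (like Trim + Left).
    return text.strip()[:max_len]


print(sanitize_comment("\u201cHello\u201d\tworld\r\n  again"))  # Hello world again
```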

The character encoding is not being preserved. The text from the Word document contains characters (like curly quotation marks) that the database column can't store. Check whether the table's column data type is VARCHAR or NVARCHAR:
NVARCHAR allows Unicode
VARCHAR doesn't
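To illustrate where the "?" comes from, here is a small Python analogy (not what the database driver literally does): encoding text into a single-byte character set that lacks a character substitutes "?" for it, while a Unicode encoding keeps everything. ISO-8859-1 stands in for a non-Unicode VARCHAR column and UTF-16 for NVARCHAR; both are assumptions for the demo. Whether a given character survives in a real VARCHAR column depends on its actual code page (Windows-1252, for example, does include curly quotes).

```python
# Text as it typically arrives from Word/Outlook: curly quotes, an en dash
# and a typographic apostrophe.
pasted = "\u201cQuoted\u201d \u2013 it\u2019s"

# Single-byte charset without these characters (stand-in for VARCHAR):
# errors="replace" substitutes "?" for anything it cannot encode.
narrow = pasted.encode("latin-1", errors="replace").decode("latin-1")

# Unicode round trip (stand-in for NVARCHAR): nothing is lost.
wide = pasted.encode("utf-16-le").decode("utf-16-le")

print(narrow)          # ?Quoted? ? it?s
print(wide == pasted)  # True
```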
Some rich-text editors automatically convert text pasted from MS Word. Alternatively, you can capture the paste event and call some JavaScript to replace the common special characters that come from Word documents:
https://developer.mozilla.org/en-US/docs/Web/API/Element/paste_event
Or try a rich-text editor for the <textarea> that makes that conversion for you:
https://quilljs.com/
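Whether it runs in a paste handler or on the server, replacing "the common special characters from Word documents" is essentially a character mapping. A Python sketch of such a mapping (the table is an assumption covering a few characters Word commonly inserts, not an exhaustive list, and `normalize_word_text` is a hypothetical name); in the browser the same table would be applied in JavaScript inside the paste handler.

```python
# Hypothetical mapping of common Word "smart" characters to ASCII stand-ins.
WORD_CHAR_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})


def normalize_word_text(text: str) -> str:
    """Replace Word's typographic characters with plain ASCII equivalents."""
    return text.translate(WORD_CHAR_MAP)


print(normalize_word_text("\u201cQuoted\u201d \u2013 it\u2019s\u2026"))  # "Quoted" - it's...
```

This trades fidelity for compatibility: if the column can store Unicode, keeping the original characters is usually preferable.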

Related

How to replace double quotes with a newline character in Spark Scala

I am new to Spark. I have a huge file with data like this:
18765967790#18765967790#T#20130629#00#31#2981546 " "18765967790#18765967790#T#20130629#19#18#3240165 " "18765967790#18765967790#T#20130629#18#18#1362836
13478756094#13478756094#T#20130629#31#26#2880701 " "13478756094#13478756094#T#20130629#19#18#1230206 " "13478756094#13478756094#T#20130629#00#00#1631440
40072066693#40072066693#T#20130629#79#18#1270246 " "40072066693#40072066693#T#20130629#79#18#3276502 " "40072066693#40072066693#T#20130629#19#07#3321860
I am trying to replace " " with a newline character so that my output looks like this:
18765967790#18765967790#T#20130629#00#31#2981546
18765967790#18765967790#T#20130629#19#18#3240165
18765967790#18765967790#T#20130629#18#18#1362836
13478756094#13478756094#T#20130629#31#26#2880701
13478756094#13478756094#T#20130629#19#18#1230206
13478756094#13478756094#T#20130629#00#00#1631440
40072066693#40072066693#T#20130629#79#18#1270246
40072066693#40072066693#T#20130629#79#18#3276502
40072066693#40072066693#T#20130629#19#07#3321860
I have tried:
val fact1 = sc.textFile("s3://abc.txt").map(x=>x.replaceAll("\"","\n"))
But this doesn't seem to work. Can someone tell me what I am missing?
Edit 1: My final output will be a DataFrame with a schema imposed after splitting on the delimiter "#".
I am getting the output below:
scala> fact1.take(5).foreach(println)
18765967790#18765967790#T#20130629#00#31#2981546
18765967790#18765967790#T#20130629#19#18#3240165
18765967790#18765967790#T#20130629#18#18#1362836
13478756094#13478756094#T#20130629#31#26#2880701
13478756094#13478756094#T#20130629#19#18#1230206
13478756094#13478756094#T#20130629#00#00#1631440
40072066693#40072066693#T#20130629#79#18#1270246
40072066693#40072066693#T#20130629#79#18#3276502
40072066693#40072066693#T#20130629#19#07#3321860
I am getting extra blank lines, which makes it harder to create the DataFrame. This might seem simple here, but the file is huge and the rows containing " " are long. In the question I have shown only two " " separators per row, but there can be 40-50 of them.
There is more than one quote between the records, which creates multiple line breaks. You need to either remove the additional quotes before the replace or remove the empty lines after it:
.map(x=>x.replaceAll("\"","\n").replaceAll("(?m)^[ \t]*\r?\n", ""))
Reference: Remove all empty lines
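The per-line logic of that answer can be checked outside Spark. Here is a Python sketch using two records from the question; Python's `re` accepts the same `(?m)` inline flag as the Java regex engine behind Scala's `replaceAll`. Note that each record keeps a trailing space, so a trim before splitting on "#" is still needed.

```python
import re

line = ('18765967790#18765967790#T#20130629#00#31#2981546 " "'
        '18765967790#18765967790#T#20130629#19#18#3240165')

# Step 1: every double quote becomes a newline, which leaves a line
# containing only the space that sat between the two quotes.
replaced = line.replace('"', "\n")

# Step 2: delete lines made up solely of spaces/tabs (multiline mode).
cleaned = re.sub(r"(?m)^[ \t]*\r?\n", "", replaced)

for record in cleaned.split("\n"):
    print(repr(record))
```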
You might be missing an implicit Encoder; you can pass one explicitly, as in the code below:
spark.read.text("src/main/resources/doubleQuoteFile.txt").map(row => {
// Only the last expression in the block is returned, so one replace is enough:
// swap each " " separator for a newline
row.getString(0).replace("\" \"", "\n")
})(org.apache.spark.sql.Encoders.STRING)
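Since the end goal is a DataFrame with one record per row, another option is to split on the `" "` separator instead of replacing it, so that each record becomes its own element (in Spark Scala this would roughly be a `flatMap` over `split("\" \"")` followed by `trim`). Here is a Python sketch of the per-line logic, including the "#" split that the schema would be applied to; the helper names are hypothetical.

```python
from typing import List

SEPARATOR = '" "'  # quote-space-quote sequence between records


def split_records(line: str) -> List[str]:
    """Split one input line into trimmed, non-empty records."""
    return [part.strip() for part in line.split(SEPARATOR) if part.strip()]


def to_fields(record: str) -> List[str]:
    """Split a record on "#" so a schema can be applied to the columns."""
    return record.split("#")


line = ('13478756094#13478756094#T#20130629#31#26#2880701 " "'
        '13478756094#13478756094#T#20130629#19#18#1230206 " "'
        '13478756094#13478756094#T#20130629#00#00#1631440')

for record in split_records(line):
    print(to_fields(record))
```

Splitting avoids the empty lines entirely, no matter how many separators a row contains.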

Determine if a string only contains invisible characters in Swift

I was parsing a messy XML file and found that many of the nodes contain only invisible characters, for instance:
"\n "
" "
"\t "
"\n "
"\n\n"
I have seen posts and answers about letters and numbers, but the XML in my project includes non-ASCII UTF-8 characters, and I am not sure how to list all visible UTF-8 characters in a filter.
How can I determine whether a string consists entirely of invisible characters like those above, so I can filter them out? Thanks!
Use CharacterSet for that.
let nonWhitespace = CharacterSet.whitespacesAndNewlines.inverted
let containsNonWhitespace = (string.rangeOfCharacter(from: nonWhitespace) != nil)
Trim whitespace and newlines from the string and see whether anything is left:
if someString.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
// someString only contains whitespaces and newlines
}
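The same "trim and check what's left" idea in Python, for comparison. This is a sketch: `is_blank` is a hypothetical helper, and Python's notion of whitespace is Unicode-aware but not guaranteed to match Foundation's `CharacterSet.whitespacesAndNewlines` character for character.

```python
def is_blank(text: str) -> bool:
    """True if text is empty or consists only of whitespace/newlines."""
    # str.strip() with no arguments removes Unicode whitespace,
    # including non-breaking (U+00A0) and ideographic (U+3000) spaces.
    return text.strip() == ""


nodes = ["\n ", " ", "\t ", "\n\n", "\u00a0\u3000", "Hello", "  data "]
visible = [node for node in nodes if not is_blank(node)]
print(visible)  # ['Hello', '  data ']
```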

VFP 5 COPY TO command

I need to copy my cursor's data into a text file encoded as UTF-8.
My current command is:
COPY TO (FILE NAME) DELIMITED WITH CHARACTER ";"
By default the text file is saved as ANSI. How can I make it save as UTF-8?
EDIT: I am using VFP 5.
I'm not sure, but try using STRCONV():
strconv(filetostr(FILE NAME),10)
1. Convert all Character and Memo fields to UTF-8:
update table1 set field1=STRCONV(field1, 9)
This converts all non-ANSI characters into UTF-8 encoding.
2. Export it using your COPY TO command.
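What these answers do with STRCONV() amounts to re-encoding the text from the ANSI code page to UTF-8. Here is a Python sketch of that conversion on raw bytes, assuming the ANSI code page is Windows-1252 (the actual one depends on the machine's regional settings). `ansi_to_utf8` is a hypothetical helper; for a real export you would read the COPY TO output file, convert it, and write it back.

```python
def ansi_to_utf8(data: bytes, codepage: str = "cp1252") -> bytes:
    """Re-encode bytes from a Windows ANSI code page to UTF-8."""
    return data.decode(codepage).encode("utf-8")


# "Café;naïve" as it would be written by COPY TO in Windows-1252.
ansi_bytes = b"Caf\xe9;na\xefve"
print(ansi_to_utf8(ansi_bytes))  # b'Caf\xc3\xa9;na\xc3\xafve'
```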
To expand on Oleg's suggestion, you can loop through all fields in the table like this:
USE C:\SomePath\YourTable.dbf
*/ Get list of all fields in the table's structure
lnF = AFIELDS( laF, "YourTable" )
lcUpdFlds = ""
*/ Prepare a field for allowing comma between multiple fields
*/ but first time in is the "SET" command instead.
lcNextFld = "set "
FOR lnI = 1 TO lnF
*/ Is it a character-based field
IF laF[ lnI, 2] = "C" OR laF[ lnI, 2] = "M"
lcFld = laF[ lnI, 1]
lcUpdFlds = lcUpdFlds + lcNextFld + lcFld + " = STRCONV( " + lcFld + ", 9) "
*/ Any subsequent character based fields will have a COMMA
*/ added between them.
lcNextFld = ", "
ENDIF
ENDFOR
update YourTable &lcUpdFlds
Modified to run ONE update command that hits ALL columns instead of running multiple updates, which matters especially on a LARGER table.
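The string-building part of that loop, sketched in Python to show the shape of the command that ends up being macro-substituted. The field list below is hypothetical; in VFP it comes from AFIELDS(), where column 2 holds the field type.

```python
# (name, type) pairs as AFIELDS() might report them; hypothetical example.
fields = [("CNAME", "C"), ("NAGE", "N"), ("MNOTES", "M"), ("DDATE", "D")]


def build_set_clause(fields):
    """Return 'set f1 = STRCONV( f1, 9) , f2 = ...' for C and M fields only."""
    parts = [f"{name} = STRCONV( {name}, 9) "
             for name, ftype in fields if ftype in ("C", "M")]
    # Like the VFP loop: "set " before the first field, ", " between the rest.
    return "set " + ", ".join(parts) if parts else ""


print("update YourTable " + build_set_clause(fields))
```

The sketch returns an empty string when there are no character fields; guarding against that case is worth doing in the VFP version too, since "update YourTable" on its own is not a valid command.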

How to create a comma-separated list in Progress OpenEdge

Newbie question here.
I need to create a list, but what is the best way to avoid starting it with a comma?
For example:
output to /usr2/appsrv/test/Test.txt.
def var dTextList as char.
for each emp no-lock:
dTextList = dTextList + ", " + emp.Name.
end.
put unformatted dTextList skip.
output close.
Then my end result is:
, jack, joe, brad
What is the best way to get rid of the leading comma?
Thank you
Here's one way:
ASSIGN
dTextList = dTextList + ", " WHEN dTextList > ""
dTextList = dTextList + emp.Name
.
This does it without any conditional logic:
def var csv as char no-undo.
for each emp no-lock:
csv = csv + emp.Name + ",".
end.
csv = right-trim( csv, "," ).
Or you can do this:
for each emp no-lock:
csv = substitute( "&1,&2", csv, emp.Name ).
end.
csv = trim( csv, "," ).
This also has the advantage of playing nicely with unknown values (the ? value).
TRIM() trims both ends, LEFT-TRIM() trims only leading characters, and RIGHT-TRIM() trims only trailing characters.
My vanilla list:
output to /usr2/appsrv/test/Test.txt.
def var dTextList as char no-undo.
for each emp no-lock:
dTextList = substitute( "&1, &2", dTextList, emp.Name ).
end.
put unformatted substring( dTextList, 3 ) skip.
output close.
SUBSTITUTE prevents unknown values from wiping out the list.
Keep the list-delimiter check outside of the loop.
Generally leave the delimiter prefixed unless the prefix really needs to go, as when outputting the list.
If you use delimited lists often, consider creating a list class that removes this noise from your code, so that you can simply add items to a list and export it without tinkering with these details every time.
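A minimal sketch of such a list class, written in Python for illustration (an OpenEdge version would be an ABL class; the name `DelimitedList` and its methods are hypothetical). Items are kept separately and the delimiter is applied only when the list is exported, so no leading or trailing delimiter can appear.

```python
class DelimitedList:
    """Collects items and joins them with a delimiter only on export."""

    def __init__(self, delimiter: str = ","):
        self.delimiter = delimiter
        self.items = []

    def add(self, item: str) -> "DelimitedList":
        self.items.append(item)
        return self  # allows chaining: lst.add("a").add("b")

    def export(self) -> str:
        return self.delimiter.join(self.items)


names = DelimitedList(", ")
for name in ["jack", "joe", "brad"]:
    names.add(name)
print(names.export())  # jack, joe, brad
```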
I usually do:
ASSIGN dTextList = dTextList + (if dTextList = '' then '' else ',') + emp.name.
I came up with (well, my colleague did) this:
dTextList = substitute ("&1&3&2", dTextList, emp.Name, min(dTextList,",")).
But it is cool to see the various ways to do this. Thank you for all the responses.
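The colleague's trick works because of string ordering: MIN() of an empty string and "," is the empty string, and once the list holds a name, "," sorts before letters and digits, so MIN() yields the comma. Here is a Python illustration assuming code-point (ASCII) ordering; ABL compares character values using the session's collation and case-insensitively by default, and a name starting with a character that sorts before "," (a space or "!", for example) would break the trick.

```python
names = ["jack", "joe", "brad"]
text_list = ""
for name in names:
    # min("", ",") == "" on the first pass; afterwards the list starts with a
    # letter, which sorts after "," in ASCII, so min() returns ",".
    delimiter = min(text_list, ",")
    # Same order as SUBSTITUTE("&1&3&2", dTextList, emp.Name, delimiter).
    text_list = text_list + delimiter + name
print(text_list)  # jack,joe,brad
```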
This results in no leading comma (delimiter) and no fiddling with TRIM/SUBSTRING/etc.:
def var cDelim as char.
def var dTextList as char.
cDelim = ''.
for each emp no-lock:
dTextList = dTextList + cDelim + emp.Name.
cDelim = ','.
end.

Can't insert newline in MS Word form field using PowerBuilder OLE

I have an application written in PowerBuilder 11.5 that automatically fills in form fields of a Word document (MS Word 2003).
The Word document is protected so only the form fields can be altered.
In the code below I use char(10) + char(13) to insert a newline; however, in the saved document all I see are two little squares where those characters should be.
I've also tried "~r~n", which also just prints two squares.
When I fill in the form manually I can insert newlines as much as I want.
Is there anything else I can try? Or does anybody know of a different way to fill in Word forms using PowerBuilder?
//1 Shipper
ls_value = ids_form_info.object.shipper_name[1]
if not isnull(ids_form_info.object.shipper_address2[1]) then
ls_value += char(10) + char(13) + ids_form_info.object.shipper_address2[1]
end if
if not isnull(ids_form_info.object.shipper_address4[1]) then
ls_value += char(10) + char(13) + ids_form_info.object.shipper_address4[1]
end if
if not isnull(ids_form_info.object.shipper_country[1]) then
ls_value += char(10) + char(13) + ids_form_info.object.shipper_country[1]
end if
if lnv_word.f_inserttextatbookmark( 'shipper', ls_value ) = -1 then return -1
The f_inserttextatbookmark function is as follows:
public function integer f_inserttextatbookmark (string as_bookmark, string as_text, string as_fontname, integer ai_fontsize);
if isnull(as_text) then return 0
iole_word = create OLEOBJECT
iole_word.connectToNewobject( "word.application" )
iole_word.Documents.open( <string to word doc> )
iole_word.ActiveDocument.FormFields.Item(as_bookmark).Result = as_text
return 1
end function
Part of your problem is that carriage return is char(13) and line feed is char(10), so to make a CRLF for Windows and DOS you usually need char(13) + char(10). If these are out of order, many programs will balk. However, "~r~n" should have produced that for you.
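Here is a Python sketch of the line-joining logic from the question with the separator in the conventional Windows order, CR (13) then LF (10). `join_address` is a hypothetical helper and `None` stands in for PowerBuilder's null. Whether Word's form field then displays a line break is a separate question that the answers here only speculate about.

```python
CRLF = chr(13) + chr(10)  # carriage return, then line feed ("~r~n" in PowerBuilder)


def join_address(*parts):
    """Join the non-null address parts with CRLF, skipping None values."""
    return CRLF.join(part for part in parts if part is not None)


value = join_address("ACME Shipping", "12 Dock Rd", None, "Netherlands")
print(repr(value))  # 'ACME Shipping\r\n12 Dock Rd\r\nNetherlands'
```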
I have had success with the following (I'm condensing it for brevity, so it might only be close to correct):
lole_Word.ConnectToNewObject ("Word.Application")
...
lole_Word.Selection.TypeText (ls_StringForWord)
Maybe you can try other Word OLE commands to see whether the problem is specific to this command. (Beyond the order of the line-break characters, I'm grasping at straws.)
Good luck,
Terry
Sounds like it may be a Unicode/ANSI character conversion issue.
For what it's worth, you could try this:
http://www.rgagnon.com/pbdetails/pb-0263.html
Hope it helps.
I'm not using form fields, but I am able to insert newlines into a Word document from PowerBuilder using TypeText and "~n". Maybe you just need "~n".