ElementTree: write <> into element text

I would like to put < and > characters in element content, but after writing the file I get &lt; and &gt; instead:
ET.SubElement(ShipTo, "CompanyOrName").text = '<![CDATA[{}]]>'.format(address['company'])
Result:
<CompanyOrName>&lt;![CDATA[Company]]&gt;</CompanyOrName>
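For reference, here is a minimal sketch of what happens, assuming the standard library xml.etree.ElementTree. The helper names build_ship_to and serialize are made up for illustration. ElementTree escapes &, < and > in element text when it serializes, and a parser turns them back into the original characters when the file is read, so the text survives a round trip even though no literal CDATA section appears in the output.

```python
import xml.etree.ElementTree as ET


def build_ship_to(company):
    # Hypothetical helper: builds the same structure as the question's snippet.
    ship_to = ET.Element("ShipTo")
    ET.SubElement(ship_to, "CompanyOrName").text = "<![CDATA[{}]]>".format(company)
    return ship_to


def serialize(element):
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(element, encoding="unicode")


xml_text = serialize(build_ship_to("Company"))
print(xml_text)

# Parsing the serialized text gives back the unescaped characters.
round_tripped = ET.fromstring(xml_text).find("CompanyOrName").text
print(round_tripped)
```

If a literal CDATA section is required in the file, the standard library module has no switch for that as far as I know, so a different approach would be needed.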

Related

Finding text AND fields with variable content in Word

I need to find and delete every occurrence of the following pattern in a Word 2010 document:
RPDIS→ text {INCLUDEPICTURE "c:\xxx\xxx.png" \*MERGEFORMAT} text ←RPDIS
Where:
RPDIS→ and ←RPDIS are start and end delimiters
Between the start and end delimiters there can be just text or text and fields with variable content
The * wildcard in the Word Find and Replace dialog box will find the pattern if it contains text only but it will ignore patterns where text is combined with fields. And ^19 will find the field but not the rest of the pattern until the end delimiter.
Can anyone help, please?
Here's a VBA solution. It wildcard searches for RPDIS→*←RPDIS. If the found text contains ^19 (assuming field codes visible; if objects are visible instead of field codes, then the appropriate test is text contains ^01), the found text is deleted. Note that this DOES NOT care about the type of embedded field --- it will delete ANY AND ALL embedded fields that occur between RPDIS→ and ←RPDIS, so use at your own risk. Also, the code has ChrW(8594) and ChrW(8592) to match right-arrow and left-arrow respectively. You may need to change that if your arrows are encoded differently.
Sub test()
    Dim wdDoc As Word.Document
    Dim r As Word.Range
    Dim s As String
    ' Const c As Integer = 19 ' Works when field codes are visible
    Const c As Integer = 1 ' Works when objects are visible
    Set wdDoc = ActiveDocument
    Set r = wdDoc.Content
    With r.Find
        .Text = "RPDIS" & ChrW(8594) & "*" & ChrW(8592) & "RPDIS"
        .MatchWildcards = True
        While .Execute
            s = r.Text
            If InStr(1, s, Chr(c), vbTextCompare) > 0 Then
                Debug.Print "Delete: " & s
                ' r.Delete ' This line commented out for testing; remove comments to actively delete
            Else
                Debug.Print "Keep: " & s
            End If
        Wend
    End With
End Sub
Hope that helps.
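The decision logic of the macro can also be sketched on a plain string. This is only an illustration in Python, not Word automation: it assumes the document text is already available as a string in which character 19 marks the start of a field (the character ^19 matches), and the function name delete_spans_with_fields is made up.

```python
import re

START = "RPDIS\u2192"   # "RPDIS" + right arrow, same code point as ChrW(8594)
END = "\u2190RPDIS"     # left arrow + "RPDIS", same code point as ChrW(8592)
FIELD_MARKER = "\x13"   # character 19, assumed to mark the start of a field

# Non-greedy match from a start delimiter to the nearest end delimiter.
PATTERN = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def delete_spans_with_fields(text, marker=FIELD_MARKER):
    """Remove delimited spans that contain the marker; keep text-only spans."""
    def replace(match):
        span = match.group(0)
        return "" if marker in span else span
    return PATTERN.sub(replace, text)


sample = ("A RPDIS\u2192 plain \u2190RPDIS B "
          "RPDIS\u2192 x \x13 INCLUDEPICTURE \x15 y \u2190RPDIS C")
cleaned = delete_spans_with_fields(sample)
```

As in the macro, any span containing the marker is removed regardless of the field type.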

How to count the visible number of lines of text in a text box on an MS Access form

OK, here's what I am trying to achieve. I have an MS Access 2016 DB with a form on it - one of the fields is a text field (max 255 chars), that users can enter "notes", by date.
The form is a continuous form, and there are a LOT of notes. As most notes are only a single sentence, not the full 255 chars, the text box is sized to show only two lines of text to save screen space (users can double click on the note to see the full text in the rare instances where it runs to 255 chars).
The problem with this approach is that it is not always clear if a note goes beyond the two lines.
So I am trying to find a way to tell how many lines of text the note uses in the text box, and then I'll highlight the text box if this is the case.
Note that what I am talking about here is text wrapping within a text box, not (necessarily) text with line breaks (although there may be line breaks also). The wrapping changes depending on the text (e.g. long words will "wrap early" to a new line), so a simple char count doesn't work, even with a monospace font.
I have searched a lot online and found nothing, except a ref to a possible solution here:
http://www.lebans.com/textwidth-height.htm
But the download is an old Access file type I can no longer open.
Does anyone have any ideas (except for a form redesign, which is hopefully my last option)?
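To count the number of lines in a string or text box (explicit line breaks only, not wrapped lines), you can use this expression. UBound returns the index of the last element, which is the number of line breaks, so add 1 to get the number of lines:
UBound(Split(str, vbCrLf)) + 1
So
UBound(Split([textBoxName], vbCrLf)) + 1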
OK, I have come up with a "solution" to this. It's neither neat nor fast, but it appears to work in my situation. I have posted the VBA code for anyone who might be interested.
This function is then used in a continuous form's text box conditional highlighting, so I can highlight those instances where the text has wrapped beyond "n" lines (in my case, two lines).
FYI it's only partially tested, with no error handling!
' Returns TRUE if the text in a textbox wraps/breaks beyond the number of visible lines in the text box (before scrolling)
' THIS ONLY WORKS FOR MONOSPACE FONTS IN A TEXTBOX WHERE WE KNOW THE WidthInMonospaceCharacters
' WidthInMonospaceCharacters = number of MONOSPACE characters to EXACTLY fill one line in your text box (needs to be counted manually)
' VisibleLinesInTextBox = number of lines your text box shows on screen (without scrolling)
Function UnseenLinesInTextBox(YourText As String, WidthInMonospaceCharacters As Long, VisibleLinesInTextBox As Long) As Boolean
Dim LineBreakTexts() As String
Dim CleanText As String
Dim LineCount As Long
Dim LineBreaks As Long
Dim i As Long
' Doesn't matter if we can't see invisible end spaces/line breaks, so lose them
' NB advise cleaning text whenever data is updated, then there is no need to run this line
CleanText = ClearEndSpacesAndLineBreaks(YourText)
' Check for any line breaks
LineBreakTexts = Split(CleanText, vbCrLf)
' Too many line breaks means we can't be all in the textbox, so report and GTFOOD
LineBreaks = UBound(LineBreakTexts)
If LineBreaks >= VisibleLinesInTextBox Then
UnseenLinesInTextBox = True
GoTo CleanExit
End If
' No line breaks, and text too short to wrap, so exit
If LineBreaks = 0 And Len(CleanText) <= WidthInMonospaceCharacters Then GoTo CleanExit
' Loop through the line break texts, and check word wrapping for each
For i = 0 To LineBreaks
LineCount = LineCount + CountWrappedLines(LineBreakTexts(i), WidthInMonospaceCharacters, VisibleLinesInTextBox)
If LineCount > VisibleLinesInTextBox Then
UnseenLinesInTextBox = True
GoTo CleanExit
End If
Next i
CleanExit:
Erase LineBreakTexts
End Function
' Add BugOutLineCount if we are using this simply to see if we are exceeding X number of lines in a textbox
' Put this number of lines here (eg if we have a two line text box, enter 2)
Function CountWrappedLines(YourText As String, WidthInMonospaceCharacters As Long, Optional BugOutLineCount As Long) As Long
Dim SpaceBreakTexts() As String
Dim LineCount As Long, RollingCount As Long, SpaceBreaks As Long, i As Long
Dim WidthAdjust As Long
Dim CheckBugOut As Boolean
Dim tmpLng1 As Long, tmpLng2 As Long
If BugOutLineCount > 0 Then CheckBugOut = True
' Check for space breaks
SpaceBreakTexts = Split(YourText, " ")
SpaceBreaks = UBound(SpaceBreakTexts)
If SpaceBreaks = 0 Then
' No spaces, so text will wrap simply based on the number of characters per line
CountWrappedLines = NoSpacesWrap(YourText, WidthInMonospaceCharacters)
GoTo CleanExit
End If
' Need to count the wrapped line breaks manually
' We must start with at least one line!
LineCount = 1
For i = 0 To SpaceBreaks
tmpLng1 = Len(SpaceBreakTexts(i))
If i = 0 Then
' Do not count spaces in the first word...
RollingCount = RollingCount + tmpLng1
Else
' ... but add spaces to the count for the next texts
RollingCount = 1 + RollingCount + tmpLng1
End If
' Need this adjustment as wrapping works slightly differently between mid and
' end of text
If i = SpaceBreaks Then
WidthAdjust = WidthInMonospaceCharacters
Else
WidthAdjust = WidthInMonospaceCharacters - 1
End If
' Check when we get a wrapped line
If RollingCount > WidthAdjust Then
' Check that the length of the word itself doesn't wrap over more than one line
If tmpLng1 > WidthInMonospaceCharacters Then
tmpLng2 = NoSpacesWrap(SpaceBreakTexts(i), WidthInMonospaceCharacters)
If i <> 0 Then
LineCount = LineCount + tmpLng2
Else
LineCount = tmpLng2
End If
' As we have wrapped, then we already have a word on the next line to count in the rolling count
RollingCount = tmpLng1 - ((tmpLng2 - 1) * WidthInMonospaceCharacters)
Else
' New line reached
LineCount = LineCount + 1
' As we have wrapped, then we already have a word on the next line to count in the rolling count
RollingCount = Len(SpaceBreakTexts(i))
End If
End If
If CheckBugOut Then If LineCount > BugOutLineCount Then Exit For
Next i
CountWrappedLines = LineCount
CleanExit:
Erase SpaceBreakTexts
End Function
' Work out how many lines text will wrap if it has NO spaces
Function NoSpacesWrap(YourText As String, WidthInMonospaceCharacters As Long) As Long
Dim WordLines As Double
Dim MyInt As Integer
WordLines = (Len(YourText) / WidthInMonospaceCharacters)
MyInt = Int(WordLines)
' Line(s) are exact width we are looking at
If WordLines - MyInt = 0 Then
NoSpacesWrap = MyInt
Else
NoSpacesWrap = MyInt + 1
End If
End Function
Function ClearEndSpacesAndLineBreaks(YourText As String) As String
Dim str As String
Dim CurrentLength As Long
str = YourText
' Need to loop this in case we have a string of line breaks and spaces invisibly at end of text
Do
CurrentLength = Len(str)
' Clear end spaces
str = RTrim(str)
' Clear end line break(s), which are TWO characters long
Do
If Right(str, 2) <> vbCrLf Then Exit Do
str = Left(str, Len(str) - 2)
Loop
If Len(str) = CurrentLength Then Exit Do
Loop
ClearEndSpacesAndLineBreaks = str
End Function
Do please provide any feedback and comments!
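For comparison, the same idea can be sketched in a few lines of Python using the standard textwrap module. This is only an approximation under the same assumptions as the VBA above (monospace font, known width in characters). textwrap's wrapping rules are not guaranteed to match the Access text box exactly, and the function names are made up for illustration.

```python
import textwrap


def count_wrapped_lines(text, width):
    """Count display lines for text in a monospace box `width` characters wide."""
    total = 0
    # Trailing spaces and line breaks are invisible, so drop them first.
    for paragraph in text.rstrip(" \r\n").split("\r\n"):
        # An empty paragraph (a blank line) still occupies one display line.
        total += max(1, len(textwrap.wrap(paragraph, width=width)))
    return total


def unseen_lines_in_text_box(text, width, visible_lines):
    """Return True if the text needs more lines than the box shows."""
    return count_wrapped_lines(text, width) > visible_lines


# Example: a two-line box that is 10 monospace characters wide.
print(unseen_lines_in_text_box("aaaa bbbb cccc dddd eeee", 10, 2))
```

Words longer than the width are broken across lines (textwrap's default), which corresponds to the NoSpacesWrap case above.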

Matlab: how to convert character array or string into a formatted output OR parse a string

Could someone please tell me how to convert character array into a formatted output using Matlab?
I am expecting data like this:
CHAR (1 x 29) : 0.050822999 3.141592979 ; (1)
OR
CELL (1 x 1) or string: '0.050822999 3.141592979 ; (1)'
I am looking for output like this:
d1 = 0.050822999; %double
d2 = 3.141592979; %double
index = 1; % integer
I tried transposing and then using str2num(Str'), but it returns a 0x0 double.
Any help would be appreciated.
Regards,
DK
You can use regexp to parse the string:
c = { '0.050822999 3.141592979 ; (1)' };
p = regexp( c{1}, '^(\d+\.\d+)\s(\d+\.\d+)\s*;\s*\((\d+)\)$', 'tokens', 'once' ); %//parse the input string
numbers = str2double(p); %// convert extracted strings to numerical values
Example result
ans =
0.050822999
3.141592979
1
Explaining the regexp pattern:
^ - pattern starts at the beginning of the input string
(\d+\.\d+) - parentheses ('()') enclosing this sub-pattern indicates it as a single token
\d+ matches one or more digits, then expecting \. a dot (notice the \, since . alone in regexp acts as a wildcard) and after the dot \d+ one or more digits are expected.
This token should correspond to the first number, e.g., 0.050822999
\s expecting a single space
(\d+\.\d+) - again, expecting another decimal fraction as the second token.
\s* - expecting white space (zero or more).
; - match the ; literally, but not as a token.
\s* - expecting white space (zero or more).
\( - expecting an open parenthesis, note the \ since parentheses in regexp are used to denote tokens.
(\d+) - expecting one or more digits as the third token, only integer numbers are expected here. no decimal point.
\) - expecting a closing parenthesis.
$ - pattern should reach the end of the input string.
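The same pattern carries over almost unchanged to other regex engines. Here is a minimal sketch in Python, assuming the input always has the form shown in the question; the function name parse_record is made up.

```python
import re

# Same pattern as above: two decimal numbers, a semicolon, an integer in parentheses.
PATTERN = re.compile(r"^(\d+\.\d+)\s(\d+\.\d+)\s*;\s*\((\d+)\)$")


def parse_record(line):
    """Return (d1, d2, index) parsed from a line like '0.05 3.14 ; (1)'."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError("unexpected format: {!r}".format(line))
    d1, d2, index = match.groups()
    return float(d1), float(d2), int(index)


d1, d2, index = parse_record("0.050822999 3.141592979 ; (1)")
```

The three groups come back as strings, so each one is converted explicitly, the first two to floating point and the third to an integer.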
You can use something like this (if I understood you correctly)
function str_dump(var)
info = whos;
disp([info.class ' ' mat2str(info.size) ' : ' var]);
end
This just shows information about the string. If you want to parse it and convert it to another MATLAB data type, you will have to explain the requirement more carefully.
%// Input
a = [0.050822999 3.141592979];
n = 1;
%// Output
str = [num2str(a,'%0.9f ') ' ; (' num2str(n) ')']
Result:
str =
0.050822999 3.141592979 ; (1)

How to get rid of the punctuation? and check the spelling error

eliminate punctuation
words split when meeting new line and space, then store in array
check the text file got error or not with the function of checkSpelling.m file
sum up the total number of error in that article
no suggestion is assumed to be no error, then return -1
sum of error>20, return 1
sum of error<=20, return -1
I would like to check the spelling of a paragraph, but I am having trouble getting rid of the punctuation. There may be other problems as well; it returns the error below:
My data2 file is :
checkSpelling.m
function suggestion = checkSpelling(word)
h = actxserver('word.application');
h.Document.Add;
correct = h.CheckSpelling(word);
if correct
suggestion = []; %return empty if spelled correctly
else
%If incorrect and there are suggestions, return them in a cell array
if h.GetSpellingSuggestions(word).count > 0
count = h.GetSpellingSuggestions(word).count;
for i = 1:count
suggestion{i} = h.GetSpellingSuggestions(word).Item(i).get('name');
end
else
%If incorrect but there are no suggestions, return this:
suggestion = 'no suggestion';
end
end
%Quit Word to release the server
h.Quit
f19.m
for i = 1:1
data2=fopen(strcat('DATA\PRE-PROCESS_DATA\F19\',int2str(i),'.txt'),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
word_punctuation=regexprep(CharData,'[`~!@#$%^&*()-_=+[{]}\|;:\''<,>.?/','')
word_newLine = regexp(word_punctuation, '\n', 'split')
word = regexp(word_newLine, ' ', 'split')
[sizeData b] = size(word)
suggestion = cellfun(@checkSpelling, word, 'UniformOutput', 0)
A19(i)=sum(~cellfun(@isempty,suggestion))
feature19(A19(i)>=20)=1
feature19(A19(i)<20)=-1
end
Replace your regexprep call with
word_punctuation=regexprep(CharData,'\W','\n');
Here \W matches every non-alphanumeric character (including spaces), and each match is replaced with a newline.
Then
word = regexp(word_punctuation, '\n', 'split');
As you can see, you don't need to split by space (see above). But you can remove the empty cells:
word(cellfun(@isempty,word)) = [];
Everything worked for me. However, I have to say that your checkSpelling function is very slow. At every call it has to create an ActiveX server object, add a new document, and delete the object after the check is done. Consider rewriting the function to accept a cell array of strings.
UPDATE
The only problem I see is that this also removes the quote ' character (I'm, don't, etc). You can temporarily substitute it with an underscore (yes, it's considered alphanumeric) or any sequence of unused characters. Or you can list all the non-alphanumeric characters to be removed in square brackets instead of using \W.
UPDATE 2
Another solution to the 1st UPDATE:
word_punctuation=regexprep(CharData,'[^A-Za-z0-9''_]','\n');
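The same character class works in other regex engines too. As a rough sketch in Python (the function name split_words is made up), splitting on runs of characters outside the class and dropping empty strings does both steps at once:

```python
import re


def split_words(text):
    """Split text into words, keeping letters, digits, apostrophes and underscores."""
    # Any run of characters outside the class acts as a separator.
    pieces = re.split(r"[^A-Za-z0-9'_]+", text)
    # Leading or trailing separators produce empty strings, so drop them.
    return [piece for piece in pieces if piece]


words = split_words("Hello, world!\nI'm here; don't go.")
```

Because apostrophes are kept, contractions such as I'm and don't stay in one piece, which is the point of the second update.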

delimiting by a char but not deleting it

I have a text file that looks like this:
(a (bee (cold down)))
if I load it using
c=textscan(fid,'%s');
I get this:
'(a'
'(bee'
'(cold'
'down)))'
What I would like to get is:
'('
'a'
'('
'bee'
'('
'cold'
'down'
')'
')'
')'
I know I can delimit with '(' and ')' by specifying 'Delimiter' in textscan, but then I will lose this character, which I want to keep.
Thank you in Advance.
The %s specifier indicates that you want strings, but what you want is individual chars. Use %c instead.
c=textscan(fid,'%c');
Update: if you want to keep your words intact, then you'll want to load your text using the %s specifier. After the text is loaded you can either solve this problem with regular expressions (not my forte) or write your own parser that parses each word individually and saves the parentheses and words to a new cell array.
AFAIK, there is no canned routine capable of preserving arbitrary delimiters.
You'd have to do it yourself:
string = '(a (bee (cold down)))';
bo = string == '(';
bc = string == ')';
sp = string == ' ';
output = cell(nnz(bo|bc|sp)+1,1);
j = 1;
for ii = 1:numel(string)
    if bo(ii)
        output{j} = '(';
        j = j + 1;
    elseif bc(ii)
        % if a word is still being built in this cell, move past it first
        if ~isempty(output{j})
            j = j + 1;
        end
        output{j} = ')';
        j = j + 1;
    elseif sp(ii)
        j = j + 1;
    else
        output{j} = [output{j} string(ii)];
    end
end
Which can probably be improved -- the growing character array will prevent the loop from being JIT'ed. The array bc | bo | sp holds all the information to vectorize this thing, I just don't see how at this hour...
Nevertheless, it should give you a place to start.
Matlab has a strtok function similar to C. Its format is:
token = strtok(str)
token = strtok(str, delimiter)
[token, remain] = strtok('str', ...)
there is also a string replace function strrep:
modifiedStr = strrep(origStr, oldSubstr, newSubstr)
What I would do is modify the original string with strrep to add in delimiters, then use strtok. Assuming c holds the line as a single character array (for example from c = fgetl(fid)):
c = strrep(c, '(', '( ');    % Add a space after each open paren
c = strrep(c, ')', ' ) ');   % Add a space before and after each close paren
token = cell(10, 1);         % preallocate for speed
i = 1;
[tok, remain] = strtok(c, ' ');
while ~isempty(tok)
    token{i} = tok;
    i = i + 1;
    [tok, remain] = strtok(remain, ' ');
end
This gives you a linear cell array containing each of the tokens you requested.
strtok reference: http://www.mathworks.com/help/techdoc/ref/strtok.html
strrep reference: http://www.mathworks.com/help/techdoc/ref/strrep.html
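The pad-then-split idea is not specific to MATLAB. A minimal sketch in Python, with a made-up function name tokenize: put spaces around every parenthesis, then split on whitespace.

```python
def tokenize(text):
    """Split text into words and parentheses, keeping the parentheses as tokens."""
    # Surround each parenthesis with spaces so it becomes its own token.
    padded = text.replace("(", " ( ").replace(")", " ) ")
    # split() with no arguments splits on any run of whitespace,
    # so the extra spaces do not produce empty tokens.
    return padded.split()


tokens = tokenize("(a (bee (cold down)))")
```

This assumes words never contain parentheses, which holds for the sample input.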