wxWidgets wrong substring - substring

I am trying to extract a substring out of some html code in wxWidgets but I can't get my method working properly.
content of to_parse:
[HTML CODE]
<html><head></head><body><font face="Segue UI" size=2 .....<font face="Segoe UI"size="2" color="#000FFF"><font face="#DFKai-SB" ... <b><u> the text </u></b></font></font></font></body></html>
[/HTML CODE] (sorry about the format)
wxString to_parse = SOStream.GetString();
size_t spos = to_parse.find_last_of("<font face=",wxString::npos);
size_t epos = to_parse.find_first_of("</font>",wxString::npos);
wxString retstring(to_parse.Mid(spos,epos));
wxMessageBox(retstring); // Output: always ---> tml>
As there are several font face tags in the HTML in the to_parse variable, I would like to find the position of the last "<font face=" and the position of the first "</font>" close tag.
For some reason, I always get the same, to me unexpected, output: tml>
Can anyone spot the reason why?

The find_{last,first}_of() methods don't do what you seem to think they do. They behave the same way as the std::basic_string<> methods of the same name: they find the first (or last) occurrence of any single character from the string you pass to them; see the documentation.
If you want to search for a substring, use find().
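To illustrate the difference, here is a minimal sketch using std::string rather than wxString, on the assumption (stated in the answer above) that both behave alike here. The helper name innermost_font is hypothetical. It uses rfind() and find() to locate whole substrings, and passes a length, not an end position, as the second argument of substr():

```cpp
#include <string>

// Returns the text from the last "<font face=" up to (not including) the
// first "</font>" that follows it, or an empty string if either is missing.
std::string innermost_font(const std::string& html) {
    const std::string open = "<font face=";
    const std::string close = "</font>";
    std::size_t spos = html.rfind(open);        // last occurrence of the whole substring
    if (spos == std::string::npos) return "";
    std::size_t epos = html.find(close, spos);  // first close tag after that position
    if (epos == std::string::npos) return "";
    return html.substr(spos, epos - spos);      // second argument is a length
}
```

By contrast, find_last_of("<font face=") returns the position of the last character that is any one of <, f, o, n, t, space, a, c, e or =. In a document ending in </html> that is the t of html, which would explain the output tml>.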

Thank you for the answer. Yes you were right, I must have somehow been under the impression that Substring() / substr() / Mid() takes two wxStrings as parameters, which isn't the case.
wxString to_parse = SOStream.GetString();
to_parse = to_parse.Mid(to_parse.find("<p ")); // discards everything before "<p "
to_parse = to_parse.Remove(to_parse.find("</p>")); // removes "</p>" and everything after it
wxMessageBox(to_parse); // so we are left with everything from "<p " up to, but not including, "</p>"

Related

Converting numbers into timestamps (inserting colons at specific places)

I'm using AutoHotkey for this as the code is the most understandable to me. So I have a document with numbers and text, for example like this
120344 text text text
234000 text text
and the desired output is
12:03:44 text text text
23:40:00 text text
I'm sure StrReplace can be used to insert the colons in, but I'm not sure how to specify the position of the colons or ask AHK to 'find' specific strings of 6 digit numbers. Before, I would have highlighted the text I want to apply StrReplace to and then press a hotkey, but I was wondering if there is a more efficient way to do this that doesn't need my interaction. Even just pointing to the relevant functions I would need to look into to do this would be helpful! Thanks so much, I'm still very new to programming.
hfontanez's answer was very helpful in figuring out that for this problem, I had to use a loop and substring function. I'm sure there are much less messy ways to write this code, but this is the final version of what worked for my purposes:
Loop, read, C:\[location of input file]
{
    If (A_LoopReadLine = "")
        Continue ; skip the blank lines in the file
    one := A_LoopReadLine
    x := SubStr(one, 1, 2) ; first two digits
    y := SubStr(one, 3, 2) ; middle two digits
    z := SubStr(one, 5) ; last two digits plus the rest of the line
    two := x . ":" . y . ":" . z
    FileAppend, %two%`r`n, C:\[location of output file]
}
return
Assuming that the "timestamp" component is always 6 characters long and always at the beginning of the string, this solution should work just fine.
String test = "012345 test test test";
test = test.substring(0, 2) + ":" + test.substring(2, 4) + ":" + test.substring(4, test.length());
This outputs 01:23:45 test test test
Why? Because you are temporarily creating a String object that is two characters long, and then you insert the colon before taking the next pair. Lastly, you append the rest of the String and assign it to whichever String variable you want. Remember, the substring method doesn't modify the String object you are calling the method on. This method returns a "new" String object. Therefore, the variable test is unmodified until the assignment operation kicks in at the end.
Alternatively, you can use a StringBuilder and append each component like this:
StringBuilder sbuff = new StringBuilder();
sbuff.append(test.substring(0,2));
sbuff.append(":");
sbuff.append(test.substring(2,4));
sbuff.append(":");
sbuff.append(test.substring(4,test.length()));
test = sbuff.toString();
You could also use a "fancy" loop to do this, but I think for something this simple, looping is just overkill. Oh, I almost forgot, this should work with both of your test strings because after the last colon insert, the code takes the substring from index position 4 all the way to the end of the string indiscriminately.
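The same slicing idea can be sketched in C++ with std::string. This is only an illustration under the assumption stated above (six digits at the very start of the line). The function name add_colons is hypothetical, and the digit check is an addition so that other lines pass through unchanged:

```cpp
#include <cctype>
#include <string>

// Inserts colons into a leading 6-digit run: "120344 text" -> "12:03:44 text".
// Lines that do not start with six digits are returned unchanged.
std::string add_colons(const std::string& line) {
    if (line.size() < 6) return line;
    for (std::size_t i = 0; i < 6; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(line[i]))) return line;
    }
    // substr(pos, count): two digits, two digits, then everything from index 4.
    return line.substr(0, 2) + ":" + line.substr(2, 2) + ":" + line.substr(4);
}
```

Applied line by line to the input file, this would turn 120344 text text text into 12:03:44 text text text and leave blank lines alone.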

Qt5.5 QByteArray indexOf mid wrong result

I have an XML file in a QByteArray. I am using the indexOf method to find a string in the array, but the position returned isn't correct. If I examine the data content using qDebug, I can see that the data has escape characters, which isn't a problem, but I don't think indexOf is counting the escape characters.
For example the result from:
qDebug() << arybytXML;
A snippet from the result of this is:
<?xml version="1.0" encoding="utf-8"?><!--\n Node: gui\n Attrbuttes: left, right, top and bottom defines the pixel white space to allow\n from the edge of the display\n\t\tlanguage, should be set to the appropriate country code, an XML file named using\n\t\tthe country code must exist, e.g. 44.xml\n//-->\n<gui id=\"root\" bottom=\"0\" left=\"0\" right=\"0\" top=\"24\" language=\"44\">
I use the code:
intOpenComment = arybytXML.indexOf("<!--");
The result is that intOpenComment is 39. If I then search for the end comment and try to extract the data I get the wrong result:
intClosingComment = arybytXML.indexOf("-->", intOpenComment);
QString strComment = arybytXML.mid(intOpenComment
,intClosingComment + strlen("-->"));
Result:
<!--\n Node: gui\n Attrbuttes: left, right, top and bottom defines the pixel white space to allow\n from the edge of the display\n\t\tlanguage, should be set to the appropriate country code, an XML file named using\n\t\tthe country code must exist, e.g. 44.xml\n//-->\n<gui id=\"root\" bottom=\"0\" left=\"0\" rig"
The result should stop after -->, why is there more data?
The problem is that the second parameter of mid() is the number of bytes to extract, not an end position, so intOpenComment has to be subtracted from it: intClosingComment + strlen("-->") - intOpenComment.
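A minimal sketch of the position-versus-length arithmetic, written with std::string::substr(pos, count) instead of QByteArray::mid(pos, len) so that it needs no Qt. Both take a start position and a length. The function name first_comment is hypothetical:

```cpp
#include <string>

// Extracts the first "<!-- ... -->" comment, including both delimiters.
// Returns an empty string when no complete comment is present.
std::string first_comment(const std::string& xml) {
    const std::string open = "<!--";
    const std::string close = "-->";
    std::size_t start = xml.find(open);
    if (start == std::string::npos) return "";
    std::size_t end = xml.find(close, start + open.size());
    if (end == std::string::npos) return "";
    // Length = (one past the closing delimiter) - start position.
    return xml.substr(start, end + close.size() - start);
}
```

Passing end + close.size() without subtracting start asks for too many bytes once start is greater than zero, which matches the extra data seen in the question.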

RegexKitLite How to convert a PHP regex Expression in objective c

I used this regex to search for img src in a string on one of my sites.
Now I want to use this expression to do the same thing in Objective-C. How can I do that using RegexKitLite?
This is my expression
/<img.+src=[\'"]([^\'"]+)[\'"].*>/i
@Tim Pietzcker
Your code works great but for example if I try to search img in this string
<p> <img src="http://www.nationalgeographic.it/images/2011/07/29/115624013-20034abf-4d91-40fe-98ab-782f06a9854d.jpg" width="140" align="left" hspace="10">Scoperta in America del Sud la sepoltura pre-incaica di un uomo circondato da coltelli cerimoniali che secondo gli archeologi eseguiva sacrifici umani</p>
I have this result in my array:
matchArray: (
"<img src=\"http://www.nationalgeographic.it/images/2011/07/29/115624013-20034abf-4d91-40fe-98ab-782f06a9854d.jpg\" width=\"140\" align=\"left\" hspace=\"10\">"
)
How can I modify your regex to get only the content of the src attribute? Thank you so much.
The / delimiters are throwing you off. Also, you should at least use lazy quantifiers. Try this:
NSString *regexString = @"(?i)<img.+?src=['\"]([^'\"]+)['\"].*?>";
This breaks when filenames contain quotes, by the way. Could that be a problem for you?
A regex that's a bit safer (and that handles quotes well) would be
NSString *regexString = @"(?i)<img[^<>]+?src=(['\"])((?:(?!\\1).)+)\\1[^<>]*>";
However, the matched filename will now be in capture group 2, not 1, so you need to modify any code that uses the filename after the match.
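To get only the src value rather than the whole tag, read the capture group instead of the full match. The sketch below shows that idea with C++ std::regex, not RegexKitLite, so the API calls are not the ones the question asks about. The pattern is a simplified variant of the first regex above with case-insensitivity passed as a flag, and first_img_src is a hypothetical name:

```cpp
#include <regex>
#include <string>

// Returns the value of the first img src attribute, or "" if none is found.
std::string first_img_src(const std::string& html) {
    static const std::regex img_re(
        R"re(<img[^<>]+?src=['"]([^'"]+)['"][^<>]*>)re",
        std::regex::icase);
    std::smatch m;
    if (!std::regex_search(html, m, img_re)) return "";
    return m[1].str();  // m[0] is the whole tag; m[1] is the first capture group
}
```

Like the first regex above, this variant does not handle filenames that contain quotes.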

How can I create a string from just the first line of my UITextView?

I am making a UITextView which is similar to notes.app, where the first line of the textView is used as the title. I need to create a new string which contains only the first line of text. So far I've come up with this:
NSRange startRange = NSMakeRange(0, 1);
NSRange titleRange = [noteTextView.text lineRangeForRange:startRange];
NSString *titleString = [noteTextView.text substringToIndex:titleRange.length];
NSLog(@"The title is: %@", titleString);
The only problem with this is that it relies on the user pressing Return. I've also tried using a loop to find the number of characters in the first line:
CGSize lineSize = [noteTextView.text sizeWithFont:noteTextView.font
constrainedToSize:noteTextView.frame.size
lineBreakMode:UILineBreakModeWordWrap];
int textLength =1;
while ((lineSize.width < noteTextView.frame.size.width) &&
([[noteTextView.text substringToIndex:textLength] length] < [noteTextView.text length]))
{
lineSize = [[noteTextView.text substringToIndex:textLength] sizeWithFont:noteTextView.font
constrainedToSize:noteTextView.frame.size
lineBreakMode:UILineBreakModeWordWrap];
textLength = textLength+1;
}
NSLog(@"Length is %i", textLength);
But I've got this wrong somewhere - it returns the total number of characters, instead of the number on the first line.
Does anyone know an easier/better way of doing this?
There is probably a much better way with CoreText, but I'll throw this out there just because it came to mind off the top of my head.
You could add characters one by one to an NSMutableString *title while
[title sizeWithFont:noteTextView.font].width < noteTextView.frame.size.width
then drop the last one, obviously doing the necessary bounds checking along the way and dropping the last added character if necessary.
But sizeWithFont is sloooooow. So if you're doing this often you might want to consider another definition of 'title' - say, at first word break after 20 chars.
But again, CoreText might yield more possibilities.
I do not understand the code you have above. Wouldn't it be simpler to just find the first line of text in the string, e.g. up to the CR or LF that terminates the first line?
And if there is no CR or LF, then you take the entire text as you have only one line then.
Of course, this will give you not what is visible in the first line in case the line is longer and gets wrapped, but I think that using lineRangeForRange doesn't do this, either, or does it?
And if your only concern is that "the user has to press enter" to make it work, then why not simply append a newline char to the text before testing for the first line's length?
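The "first line up to a CR or LF, else the whole text" idea can be sketched as follows. This is plain C++ with std::string rather than NSString, and first_line is a hypothetical name. Like the suggestion above, it ignores visual line wrapping:

```cpp
#include <string>

// Returns everything before the first CR or LF, or the whole text if there
// is no line break at all.
std::string first_line(const std::string& text) {
    // find_first_of matches any one of the listed characters.
    std::size_t eol = text.find_first_of("\r\n");
    // When no break is found, eol is npos and substr returns the whole string.
    return text.substr(0, eol);
}
```

Because the no-line-break case falls out of the same call, there is no need to append a newline first.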
See how many characters can fit in one line of your text view and use that number in a substringToIndex: method. Like this:
Type out the same character repeatedly and count how many fit in one line. Make sure to use a wide letter to ensure reliability. Use a capital g or m or q or w or whatever is widest in the font you're using.
Say 20 characters can fit in one line.
Then do
NSString *textViewString = noteTextView.text;
NSString *titleString = [textViewString substringToIndex:MIN((NSUInteger)20, textViewString.length)]; // guard against text shorter than 20 characters
Just use the titleString as the title.

How to convert Unicode characters to escape codes

So, I have a bunch of strings like this: {\b\cf12 よろてそ } . I'm thinking I could iterate over each character and replace any unicode (Edit: Anything where AscW(char) > 127 or < 0) with a unicode escape code (\u###). However, I'm not sure how to programmatically do so. Any suggestions?
Clarification:
I have a string like {\b\cf12 よろてそ } and I want a string like {\b\cf12 [STUFF]}, where [STUFF] will display as よろてそ when I view the rtf text.
You can simply use the AscW() function to get the correct value:-
sRTF = "\u" & CStr(AscW(char))
Note that, unlike other Unicode escapes, RTF uses the decimal signed short int (2 bytes) representation of a Unicode character, which makes the conversion in VB6 really quite easy.
Edit
As MarkJ points out in a comment you would only do this for characters outside of 0-127 but then you would also need to give some other characters inside the 0-127 range special handling as well.
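As a rough sketch of the per-character loop, here is the same conversion in C++ rather than VB6. The name rtf_escape_non_ascii is hypothetical. It assumes the input is already RTF markup held as UTF-16 code units, so ASCII (including \, { and }) is passed through untouched. The trailing ? is an addition not in the answer above: to my understanding, RTF readers treat the character after \uN as a fallback and skip it, so leaving it out could swallow the next real character. Treat that as an assumption to check against the RTF specification.

```cpp
#include <string>

// Replaces every UTF-16 code unit above 127 with an RTF \uN escape, where N
// is the code unit as a signed 16-bit decimal, followed by a '?' fallback.
std::string rtf_escape_non_ascii(const std::u16string& text) {
    std::string out;
    for (char16_t unit : text) {
        if (unit <= 127) {
            out += static_cast<char>(unit);      // plain ASCII passes through
        } else {
            int value = unit;
            if (value > 32767) value -= 65536;   // same sign convention as AscW
            out += "\\u" + std::to_string(value) + "?";
        }
    }
    return out;
}
```

Code units above 32767 come out negative, which mirrors what AscW returns in VB6.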
Another more roundabout way, would be to add the MSScript.OCX to the project and interface with VBScript's Escape function. For example
Sub main()
Dim s As String
s = ChrW$(&H3088) & ChrW$(&H308D) & ChrW$(&H3066) & ChrW$(&H305D)
Debug.Print MyEscape(s)
End Sub
Function MyEscape(s As String) As String
Dim scr As Object
Set scr = CreateObject("MSScriptControl.ScriptControl")
scr.Language = "VBScript"
scr.Reset
MyEscape = scr.eval("escape(" & dq(s) & ")")
End Function
Function dq(s)
dq = Chr$(34) & s & Chr$(34)
End Function
The Main routine passes in the original Japanese characters and the debug output says:
%u3088%u308D%u3066%u305D
HTH