NSString's enumerateSubstrings doesn't include symbols - iphone

When using NSString's enumerateSubstringsInRange:options:usingBlock: with the option NSStringEnumerationByWords, it doesn't include symbols such as /* or //, which should be treated like words because they are separated by spaces.
I also tried NSStringEnumerationByComposedCharacterSequences, but it simply goes through every single letter, which seems to be exactly what happens without this option too.
Is there no way to enumerate through every substring separated by a space? It sounds so simple, but enumerateSubstringsInRange:options:usingBlock: provides no way to do it.
EDIT
I was also using the option NSEnumerationReverse to go through the substrings backwards.

You could use NSScanner for something like this. It's sort of the long way around, but if the enumerate... messages aren't doing it for you, it might be worth looking at.
For example, you could do something like
NSString *output = nil;
NSCharacterSet *whitespaceCharSet = [NSCharacterSet whitespaceCharacterSet];
NSScanner *scanner = [[NSScanner alloc] initWithString:someString];
// should skip leading whitespace and read everything up to the next whitespace
[scanner scanUpToCharactersFromSet:whitespaceCharSet intoString:&output];
[scanner release];
Sort of a crude example, but the documentation for NSScanner is fairly simple.
Edit: Alternatively, you could do something like this:
NSString *someString = <...>; // get your string somehow
NSCharacterSet *charSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
NSArray *components = [someString componentsSeparatedByCharactersInSet:charSet];
[components
enumerateObjectsWithOptions:NSEnumerationReverse
usingBlock:^(id obj, NSUInteger index, BOOL *stop) {
// do stuff
}];
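One thing to watch with componentsSeparatedByCharactersInSet: is that consecutive separators produce empty strings in the result. A minimal sketch of filtering them out, assuming a hypothetical helper named TokensSeparatedByWhitespace (not part of Foundation):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on whitespace/newlines and drops the empty
// strings that runs of consecutive separators would otherwise leave behind.
static NSArray<NSString *> *TokensSeparatedByWhitespace(NSString *string) {
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray<NSString *> *parts = [string componentsSeparatedByCharactersInSet:separators];
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    return [parts filteredArrayUsingPredicate:nonEmpty];
}
```

The filtered array can then be walked backwards with enumerateObjectsWithOptions:NSEnumerationReverse, as above. Symbols such as /* and // come through as ordinary tokens because only whitespace is treated as a separator.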

Related

NSRegularExpression not getting exact text

I have a string like:
<book>MyBook</book><value>myValue</value>
Now I want to get the text "myValue" out of this string. I want to use NSRegularExpression to do this. I tried this:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<book>MyBook</book>\\s*<value>).*?(</value>)"
options:NSRegularExpressionCaseInsensitive
error:&error];
NSArray *textArray = [regex matchesInString:myData options:0 range:NSMakeRange(0, [myData length])];
NSTextCheckingResult *result = [regex firstMatchInString:myData
options:0
range:NSMakeRange(0, [myData length])];
The result is:
<book>MyBook</book><value>myValue</value>
So I get the whole string, but I only want "myValue". How can I do this? What am I missing here?
Thanks in advance!
That happens because you wrote a regex that matches the entire string. I'd reckon that writing a regex that will only match the myValue part of the string is way too complicated to be bothered with (due to the fact that you've got MyBook string that will probably match anything myValue does).
I'd recommend not using regex for this, as they are not intended for the use you've described here. If you don't want to use any XML deserialization, you could use an NSScanner or any of the NSString class methods, which will yield simpler code that is easier to maintain.
For example, using an NSScanner and a few other methods:
NSString *stringToBeScanned = @"<book>MyBook</book><value>myValue</value>";
NSString *myValue;
NSScanner *scanner = [NSScanner scannerWithString:stringToBeScanned];
[scanner scanUpToString:@"<value>" intoString:nil];
// After the above, we've got "<value>myValue</value>" left to scan
[scanner scanUpToString:@"</value>" intoString:&myValue];
// We ended up with a "<value>myValue" type of a string
// This will trim the remaining of the string we don't need
myValue = [myValue stringByReplacingOccurrencesOfString:@"<value>" withString:@""];
The above could probably be written better, and I might have made a mistake or two writing it off the top of my head, but the principle should work.
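If a regular expression is still preferred, one option is a capture group: the range at index 0 of a match always covers everything the pattern consumed, while rangeAtIndex:1 covers only the first parenthesized group. A minimal sketch, assuming a hypothetical helper named FirstValueInString and assuming the value itself contains no nested tags:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text inside the first <value>...</value>
// pair, or nil if there is none. Only capture group 1 is extracted.
static NSString *FirstValueInString(NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<value>(.*?)</value>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return nil; // the pattern failed to compile; see `error`
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:input options:0 range:NSMakeRange(0, input.length)];
    if (match == nil) {
        return nil; // no <value> element found
    }
    // Index 0 is the whole match; index 1 is the first capture group.
    return [input substringWithRange:[match rangeAtIndex:1]];
}
```

For anything beyond a fixed, flat snippet like this one, the scanner or a real XML parser is still the more robust choice.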

Parse NSString from right hand side?

>&nbsp;(2009&nbsp;RX7)</font></td>
>monospace" size="-1">214869&nbsp;(2007&nbsp;PAZ)</font></td>
>monospace" size="-1">&nbsp;4155&nbsp;Accord</font></td>
I wonder if someone could offer me a little help, I have a list of NSString items (See Above) that I want to parse some data from. My problem is that there are no tags that I can use within the strings nor do the items I want have fixed positions. The data I want to extract is:
2009 RX7
2007 PAZ
4155 Accord
My thinking is that it's going to be easier to parse from the right-hand end, remove the </font></td> and then use ";" to separate the data items:
(2009&nbsp RX7)
(2007&nbsp PAZ)
4155&nbsp Accord
which can then be cleaned up to match the example given. Any pointers on doing this or working through from the right would be very much appreciated.
Personally I think you are better off with a regex. So my solution would be:
Regex of: ([0-9]+)[^;]+;([A-Za-z0-9]+)
Which for all the example text provides 3 matches. ie for:
(2009&nbsp;RX7)</font></td>
0: 2009&nbsp;RX7)<
1: 2009
2: RX7
I haven't coded this up, but did test the Regex at www.regextester.com
Regexes are implemented via NSRegularExpression, which is available in iOS 4.0 and later.
Edit
Given that this appears to be a web scraping application, you never know when those pesky HTML code monkeys will change their output and break your carefully crafted matching methodology. As such I would change my regex to:
([0-9]+)([^;]+;)+([A-Za-z0-9]+)
Which adds an extra group, but allows for any number of elements between the number and the string.
Try this code:
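NSString *str = @">&nbsp;(2009&nbsp;RX7)</font></td>";
// Find the closing tag from the end (rangeOfString: is case-sensitive by default)
NSRange fontRange = [str rangeOfString:@"</font>" options:NSBackwardsSearch];
// The last two semicolons before the tag delimit the data
NSRange lastSemi = [str rangeOfString:@";" options:NSBackwardsSearch range:NSMakeRange(0, fontRange.location)];
NSRange priorSemi = [str rangeOfString:@";" options:NSBackwardsSearch range:NSMakeRange(0, lastSemi.location)];
// The length runs from the character after priorSemi up to the tag
NSUInteger start = priorSemi.location + 1;
NSString *yourString = [str substringWithRange:NSMakeRange(start, fontRange.location - start)];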
The key element here is the NSBackwardsSearch search option.
This should do the trick:
NSString *s = @">monospace\" size=\"-1\">&nbsp;4155&nbsp;Accord</font></td>";
NSArray *strArray = [s componentsSeparatedByString:@";"];
// you're interested in last two objects
NSArray *tmp = [strArray subarrayWithRange:NSMakeRange(strArray.count - 2, 2)];
In tmp you'll have something like:
"4155&nbsp",
"Accord</font></td>"
strip unneeded chars and you're all set.
Using NSRegularExpression:
NSRegularExpression *regex;
NSTextCheckingResult *match;
NSString *pattern = @"([0-9]+)&nbsp;([A-Za-z0-9]+)[)]?</font></td>";
NSString *string = @">&nbsp;(2009&nbsp;RX7)</font></td>";
regex = [NSRegularExpression
regularExpressionWithPattern:pattern
options:NSRegularExpressionCaseInsensitive
error:nil];
match = [regex firstMatchInString:string options:0 range:NSMakeRange(0, [string length])];
NSLog(@"'%@'", [string substringWithRange:[match rangeAtIndex:1]]);
NSLog(@"'%@'", [string substringWithRange:[match rangeAtIndex:2]]);
NSLog output:
'2009'
'RX7'

How to check if a string contains English letters (A-Z)?

How can I check whether a string contains the English Letters (A through Z) in Objective-C?
In PHP, there is the preg_match function for that.
One approach would be to use regular expressions — the NSRegularExpression class. The following demonstrates how you could detect any English letters, but the pattern could be modified to match only if the entire string consists of such letters. Something like ^[a-zA-Z]*$.
NSRegularExpression *regex = [[[NSRegularExpression alloc]
initWithPattern:@"[a-zA-Z]" options:0 error:NULL] autorelease];
// Assuming you have some NSString `myString`.
NSUInteger matches = [regex numberOfMatchesInString:myString options:0
range:NSMakeRange(0, [myString length])];
if (matches > 0) {
// `myString` contains at least one English letter.
}
Alternatively, you could construct an NSCharacterSet containing the characters you're interested in and use NSString's rangeOfCharacterFromSet: to find the first occurrence of any one. I should note that this method only finds the first such character in the string. Maybe not what you're after.
Finally, I feel like you could do something with encodings, but haven't given this much thought. Perhaps determine if the string could be represented using ASCII (using canBeConvertedToEncoding:) and then check for numbers/symbols?
Oh, and you could always iterate over the string and check each character! :)
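The character-by-character approach can be sketched roughly like this, assuming a hypothetical helper named StringContainsEnglishLetter and assuming "English letter" means exactly A-Z and a-z:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: walks the UTF-16 code units of the string and returns
// YES as soon as one falls in A-Z or a-z.
static BOOL StringContainsEnglishLetter(NSString *string) {
    for (NSUInteger i = 0; i < string.length; i++) {
        unichar c = [string characterAtIndex:i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            return YES;
        }
    }
    return NO;
}
```

Accented letters such as é are deliberately not counted here, since they fall outside both ranges.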
You can use a simple NSPredicate test.
NSString *str = @"APPLE";
NSString *regex = @"[A-Z]+";
// MATCHES tests the whole string, so this is YES only if str consists entirely of A-Z
NSPredicate *test = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
BOOL result = [test evaluateWithObject:str];
You could also use NSCharacterSet with the rangeOfCharacterFromSet: method: if the location of the returned NSRange is not NSNotFound, the string contains at least one character from the set. characterSetWithRange: would be a good place to start to create your character set.
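A rough sketch of the character-set approach, using hypothetical helper names (EnglishLetterSet, ContainsEnglishLetter, ContainsOnlyEnglishLetters) and assuming the set of interest is exactly A-Z plus a-z:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a character set holding A-Z and a-z.
static NSCharacterSet *EnglishLetterSet(void) {
    NSMutableCharacterSet *set =
        [NSMutableCharacterSet characterSetWithRange:NSMakeRange('A', 26)];
    [set addCharactersInRange:NSMakeRange('a', 26)];
    return set;
}

// YES if at least one character of the string is in the set.
static BOOL ContainsEnglishLetter(NSString *string) {
    return [string rangeOfCharacterFromSet:EnglishLetterSet()].location != NSNotFound;
}

// YES if the string is non-empty and no character falls outside the set.
static BOOL ContainsOnlyEnglishLetters(NSString *string) {
    NSCharacterSet *others = [EnglishLetterSet() invertedSet];
    return string.length > 0 &&
           [string rangeOfCharacterFromSet:others].location == NSNotFound;
}
```

Searching for the inverted set is what turns the "contains any" check into a "contains only" check.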
You can use the NSRegularExpression class (see Apple's documentation for the class).

remove non ASCII characters from NSString in objective-c

I have an app that syncs data from a remote DB that users populate. Seems people copy and paste crap from a ton of different OS's and programs which can cause different hidden non ASCII values to be imported into the system.
For example I end up with this:
Artist:â â Ioco
This ends up getting sent back into the system during sync, my JSON conversion makes the problem worse, and invalid characters in various places cause my app to crash.
How do I search for and clean out any of these invalid characters?
While I strongly believe that supporting unicode is the right way to go, here's an example of how you can limit a string to only contain certain characters (in this case ASCII):
NSString *test = @"Olé, señor!";
NSMutableString *asciiCharacters = [NSMutableString string];
for (NSInteger i = 32; i < 127; i++) {
// %c expects an int-sized argument, so narrow the NSInteger explicitly
[asciiCharacters appendFormat:@"%c", (char)i];
}
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];
test = [[test componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];
NSLog(@"%@", test); // Prints @"Ol, seor!"
A simpler version of Morten Fast's answer:
NSString *test = @"Olé, señor!";
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet
characterSetWithRange:NSMakeRange(32, 127 - 32)] invertedSet];
test = [[test componentsSeparatedByCharactersInSet:nonAsciiCharacterSet]
componentsJoinedByString:@""];
NSLog(@"%@", test); // Prints @"Ol, seor!"
Notably, this uses NSCharacterSet's +characterSetWithRange: method to simply specify the desired ASCII range rather than having to create a string, etc.
The results are identical, as comparing one to the other with isEqual: returns YES.

Strange behaviour of NSScanner on simple whitespace removal

I'm trying to replace all multiple whitespace in some text with a single space. This should be a very simple task, however for some reason it's returning a different result than expected. I've read the docs on the NSScanner and it seems like it's not working properly!
NSScanner *scanner = [[NSScanner alloc] initWithString:@"This is a test of NSScanner !"];
NSMutableString *result = [[NSMutableString alloc] init];
NSString *temp;
NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
while (![scanner isAtEnd]) {
// Scan up to and stop before any whitespace
[scanner scanUpToCharactersFromSet:whitespace intoString:&temp];
// Add all non-whitespace characters to the string
[result appendString:temp];
// Scan past all whitespace and replace with a single space
if ([scanner scanCharactersFromSet:whitespace intoString:NULL]) {
[result appendString:@" "];
}
}
But for some reason the result is @"ThisisatestofNSScanner!" instead of @"This is a test of NSScanner !".
If you read through the comments and what each line should achieve it seems simple enough!? scanUpToCharactersFromSet should stop the scanner just as it encounters whitespace. scanCharactersFromSet should then progress the scanner past the whitespace up to the non-whitespace characters. And then the loop continues to the end.
What am I missing or not understanding?
Ah, I figured it out! By default the NSScanner skips whitespace!
Turns out you just have to set charactersToBeSkipped to nil:
[scanner setCharactersToBeSkipped:nil];
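Putting that together, the whole loop might look roughly like this. CollapseWhitespace is a hypothetical helper name, and the sketch also guards against appending when scanUpToCharactersFromSet: scans nothing (for instance when the string starts with whitespace), which the original loop did not do:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every run of whitespace with a single space.
static NSString *CollapseWhitespace(NSString *input) {
    NSScanner *scanner = [NSScanner scannerWithString:input];
    // Without this, the scanner silently skips whitespace before each scan.
    scanner.charactersToBeSkipped = nil;

    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSMutableString *result = [NSMutableString string];

    while (!scanner.isAtEnd) {
        NSString *chunk = nil;
        // Copy the run of non-whitespace characters, if there is one.
        if ([scanner scanUpToCharactersFromSet:whitespace intoString:&chunk]) {
            [result appendString:chunk];
        }
        // Consume the run of whitespace and emit a single space for it.
        if ([scanner scanCharactersFromSet:whitespace intoString:NULL]) {
            [result appendString:@" "];
        }
    }
    return result;
}
```

Every pass consumes at least one character, because each character is either in the whitespace set or not, so the loop always reaches the end of the string.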