How to split a string into sentences cocoa - iphone

I have an NSString with a number of sentences, and I'd like to split it into an NSArray of sentences. Has anybody solved this problem before? I found enumerateSubstringsInRange:options:usingBlock: which is able to do it, but it looks like it isn't available on the iPhone (Snow Leopard only). I thought about splitting the string based on periods, but that doesn't seem very robust.
So far my best option seems to be to use RegexKitLite to regex it into an array of sentences. Solutions?

Use CFStringTokenizer. You'll want to create the tokenizer with the kCFStringTokenizerUnitSentence option.
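A minimal sketch of the tokenizer approach, in case it helps. The helper name SentencesInString is hypothetical, and using the current locale and trimming whitespace from each token are assumptions of this sketch, not something the answer specifies.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects one NSString per sentence token.
static NSArray *SentencesInString(NSString *text) {
    NSMutableArray *sentences = [NSMutableArray array];
    CFStringRef cfText = (__bridge CFStringRef)text;
    CFLocaleRef locale = CFLocaleCopyCurrent(); // assumption: current locale
    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault, cfText,
                                CFRangeMake(0, CFStringGetLength(cfText)),
                                kCFStringTokenizerUnitSentence, locale);

    // Advance token by token until the tokenizer reports no more tokens.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        NSString *sentence =
            [text substringWithRange:NSMakeRange(range.location, range.length)];
        // Drop any surrounding whitespace from the token.
        sentence = [sentence stringByTrimmingCharactersInSet:
                    [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        if ([sentence length] > 0) {
            [sentences addObject:sentence];
        }
    }

    CFRelease(tokenizer);
    CFRelease(locale);
    return sentences;
}
```

Calling SentencesInString(yourString) should give an array with one entry per sentence; where exactly the boundaries fall is decided by the tokenizer and the locale.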

I would use a scanner for it:
NSScanner *sherLock = [NSScanner scannerWithString:yourString]; // autoreleased
NSMutableArray *theArray = [NSMutableArray array]; // autoreleased
while (![sherLock isAtEnd]) {
    NSString *sentence = nil;
    // Scan up to ". " (a period plus a space), which your sentences will
    // probably have. You could also try scanning for a newline (\n), but
    // I am not sure your sentences are separated by one.
    if ([sherLock scanUpToString:@". " intoString:&sentence]) {
        [theArray addObject:sentence];
    }
    // Step over the period itself; otherwise the next scan stops at once
    // and the loop never ends.
    [sherLock scanString:@"." intoString:NULL];
}
This should do it. There could be some small mistakes in it, but this is how I would do it.
You should look up NSScanner in the docs, though; you might come across a method that is
better suited to this situation.

I haven't used them for a while, but I think you can do this with NSString, NSCharacterSet and NSScanner. You create a character set that holds the sentence-ending punctuation and then call -[NSScanner scanUpToCharactersFromSet:intoString:]. Each scan pulls one sentence out into a string, and you keep calling the method until the scanner runs out of string.
Of course, the text has to be well punctuated.
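Here is a rough sketch of that idea. The helper name SentencesByScanning and the choice of ".", "!" and "?" as terminators are assumptions of mine. Because scanUpToCharactersFromSet:intoString: stops in front of the punctuation, the sketch scans the punctuation separately and glues it back on.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text at runs of sentence-ending punctuation.
static NSArray *SentencesByScanning(NSString *text) {
    // Assumption: these three characters end a sentence.
    NSCharacterSet *terminators =
        [NSCharacterSet characterSetWithCharactersInString:@".!?"];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSMutableArray *sentences = [NSMutableArray array];

    while (![scanner isAtEnd]) {
        NSString *body = nil;
        NSString *punctuation = nil;
        // The scanner skips leading whitespace by default, then reads
        // everything up to the next terminator.
        [scanner scanUpToCharactersFromSet:terminators intoString:&body];
        // Consume the terminator run so the next pass starts after it.
        [scanner scanCharactersFromSet:terminators intoString:&punctuation];

        NSString *sentence = [NSString stringWithFormat:@"%@%@",
                              body ?: @"", punctuation ?: @""];
        if ([sentence length] > 0) {
            [sentences addObject:sentence];
        }
    }
    return sentences;
}
```

For the string "One. Two! Three?" this gives "One.", "Two!" and "Three?". Abbreviations such as "e.g." or decimal numbers will be split in the wrong place, which is the "well punctuated" caveat above.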

How about:
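NSArray *sentences = [string componentsSeparatedByString:@". "];
This will return an array ("One", "Two", "Three.") from the string "One. Two. Three." Note that the separator is removed from every sentence except the last, which keeps its period because no space follows it.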

NSArray *sentences = [astring componentsSeparatedByCharactersInSet:[NSCharacterSet punctuationCharacterSet] ];

Related

remove specific characters from NSString

I want to remove specific characters or substrings from an NSString.
For example:
NSString *str = @" hello I am #39;doing Parsing So $#39;I get many symbols in &my response";
I want to remove #39; and $#39; and & (mostly these three strings come in the response).
The output should be: hello I am doing Parsing So I get many symbols in my response
Side question: I can't write & #39; without the space here, because it gets converted into the ' symbol, so I use $ in place of & in my question.
You should use [str stringByReplacingOccurrencesOfString:@"#39" withString:@""].
Or do you need to replace strings of a specific format, like "#number"?
Try the code below. I think it does what you want; simply change the character set. Note that a character set matches each character individually, so this also strips any other 3, 9, #, ;, $ or & that appears in the text.
NSString *string = @"hello I am #39;doing Parsing So $#39;I get many symbols in &my response";
NSCharacterSet *trim = [NSCharacterSet characterSetWithCharactersInString:@"#39;$&"];
NSString *result = [[string componentsSeparatedByCharactersInSet:trim] componentsJoinedByString:@""];
NSLog(@"%@", result);
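If only the exact tokens should go and ordinary digits should survive, removing whole substrings may be safer. This is a small sketch under that assumption; the helper name StripTokens is made up, and the token list is taken from the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes each token as a whole substring.
static NSString *StripTokens(NSString *input) {
    // Longest token first, so "$#39;" is not left behind as a stray "$".
    NSArray *tokens = @[@"$#39;", @"#39;", @"&"];
    NSString *cleaned = input;
    for (NSString *token in tokens) {
        cleaned = [cleaned stringByReplacingOccurrencesOfString:token
                                                     withString:@""];
    }
    return cleaned;
}
```

With the string from the question this yields "hello I am doing Parsing So I get many symbols in my response", and a 3 or 9 elsewhere in the text is left alone.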

Parsing URL Using Regular Expression

I need to parse a URL in the following format:
http://www.example.com/?method=example.method&firstKey=firstValue&id=1893736&thirdKey=thirdValue
All I need is the value of 1893736 within &id=1893736.
I need to do the parsing in Objective-C for my iPhone project. I understand it must have something to do with regular expression. But I just have no clue how to do it.
Any suggestions would be appreciated. :)
You don't need a regex for this. You can try something like this:
NSString *url = @"http://www.example.com/?method=example.method&firstKey=firstValue&id=1893736&thirdKey=thirdValue";
NSString *identifier = nil;
for (NSString *arg in [[[url pathComponents] lastObject] componentsSeparatedByString:@"&"]) {
    if ([arg hasPrefix:@"id="]) {
        identifier = [arg stringByReplacingOccurrencesOfString:@"id=" withString:@""];
    }
}
NSLog(@"%@", identifier);
Don't use regular expressions. Use NSURL to reliably extract the query string, and then parse the query string into its key/value pairs.
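A minimal sketch of that approach, assuming the URL string is well formed. The helper name ValueForQueryKey is hypothetical, and the sketch does not percent-decode the value.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the raw value for a query key, or nil.
static NSString *ValueForQueryKey(NSString *urlString, NSString *key) {
    // -query returns everything after the "?" (nil if there is none).
    NSString *query = [[NSURL URLWithString:urlString] query];
    for (NSString *pair in [query componentsSeparatedByString:@"&"]) {
        NSRange equals = [pair rangeOfString:@"="];
        if (equals.location == NSNotFound) {
            continue; // a key without a value
        }
        NSString *name = [pair substringToIndex:equals.location];
        if ([name isEqualToString:key]) {
            // Everything after the first "=" is the value.
            return [pair substringFromIndex:NSMaxRange(equals)];
        }
    }
    return nil;
}
```

ValueForQueryKey(url, @"id") returns 1893736 for the URL in the question. Comparing the whole key, rather than checking a prefix, avoids matching a different parameter whose name merely starts with "id".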
Use this:
.*/\?(?:\w*=[^&]*&)*?(?:id=([^&]*))(?:&\w*=[^&]*)*
Then grab the first group, \1. You will obtain 1893736.
Simplifying
If the id can consist only of digits:
.*/\?(?:\w*=[^&]*&)*?(?:id=(\d*))(?:&\w*=[^&]*)*
If you don't mind capturing the groups you are not interested in (use \3 or the named group id in this case):
.*/\?(\w*=.*?&)*?(id=(?<id>\d*))(&\w*=.*)*
An even simpler version (use \3):
.*/\?(.*?=.*?&)*(id=(\d*))(&.*?=.*)*
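For what it's worth, here is how the digits-only pattern could be applied from Objective-C. This sketch assumes NSRegularExpression is available on your target (the question mentions RegexKitLite, which has a different API); the helper name IdFromURLString is made up.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text captured by group 1, or nil.
static NSString *IdFromURLString(NSString *urlString) {
    // The digits-only pattern from above, with backslashes doubled
    // for the Objective-C string literal.
    NSString *pattern =
        @".*/\\?(?:\\w*=[^&]*&)*?(?:id=(\\d*))(?:&\\w*=[^&]*)*";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return nil; // the pattern failed to compile
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:urlString
                          options:0
                            range:NSMakeRange(0, [urlString length])];
    if (match == nil) {
        return nil;
    }
    NSRange idRange = [match rangeAtIndex:1]; // group \1
    if (idRange.location == NSNotFound) {
        return nil;
    }
    return [urlString substringWithRange:idRange];
}
```

Passing the URL from the question returns 1893736; a URL without an id parameter returns nil.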
Instead of using a regex, you can split the string representation of your NSURL instance. In your case, you can split the string on the ampersand (&), loop over the array looking for the prefix (id=), and take the substring starting at index 3 (the first character after the =).

Objective-C: Comparing normal strings and strings found in NSMutableArrays

I am confused about strings (a beginner's problem, I'm afraid):
I have one NSMutableArray called Notebook. At index position 1, I have an object, which I think is a string. At least I put it into the array like this:
[NoteBook replaceObjectAtIndex:1 withObject:@"x-x-x-x"];
So far so good. If I put this into an UILabel, it will show x-x-x-x on my screen. The nightmare starts when I try to compare this string with other strings. Let's consider that I do not want to display the string x-x-x-x on my screen, but just to have a blank instead. So I thought I could achieve this by coding this:
NSString *tempDateString;
tempDateString = [NSString stringWithFormat:@"%@",[NoteBook objectAtIndex:1]];
if (tempDateString == @"x-x-x-x") {
    UISampleLabel.text = @"";
}
For some reason, this does not work, i.e. even if the string at position 1 of my array is 'x-x-x-x', it will still not set my UISampleLabel to nothing.
I suppose that I am getting confused with the @"" markers. When do I really need them? Why can't I simply code tempDateString = [NoteBook objectAtIndex:1]; without the formatting thing?
Any help and suggestions would be very much appreciated!
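You need to compare strings with isEqualToString:. The == operator compares the two pointers, so it only tells you whether both variables refer to the very same object, not whether the contents match.
if ([tempDateString isEqualToString:@"x-x-x-x"]) {
    UISampleLabel.text = @"";
}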
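To make the difference concrete, here is a small sketch. The helper name ShouldBlankLabel is hypothetical; it takes the array as a parameter so that it stands on its own.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the entry at index 1 is the placeholder.
static BOOL ShouldBlankLabel(NSArray *notebook) {
    // No stringWithFormat: needed; the array already holds a string.
    NSString *entry = [notebook objectAtIndex:1];

    // Compares contents, character by character.
    return [entry isEqualToString:@"x-x-x-x"];

    // By contrast, (entry == @"x-x-x-x") compares addresses. A string
    // built at run time is normally a different object from the literal,
    // so that test would usually fail even when the text is identical.
}
```

In the question's code the test then becomes: if (ShouldBlankLabel(NoteBook)) { UISampleLabel.text = @""; }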
In addition to the question that's been answered:
Why can't I simply code tempDateString = [NoteBook objectAtIndex:1]; without the formatting thing?
You can. Why do you think you can't?

How to cut out parts of NSString?

@"/News/some news text/"
@"/News/some other news text/"
@"/About/Some about text/"
@"/Abcdefg/Some abcdefg text/some more abcdefg text"
How do I cut out the first part of the strings, so that I end up with the following strings?
@"/News/"
@"/News/"
@"/About/"
@"/Abcdefg/"
Use componentsSeparatedByString: to break the string up:
NSArray *components = [string componentsSeparatedByString:@"/"];
if ([components count] >= 2) {
    // Text after the first slash is the second item in the array
    return [NSString stringWithFormat:@"/%@/", [components objectAtIndex:1]];
} else {
    return nil; // Up to you what happens in this situation
}
If these are pathnames, you may want to look into the path-related methods of NSString, such as pathComponents and pathByDeletingLastPathComponent.
While it's pretty unlikely that the path separator is ever going to change, it's nonetheless a good habit to not rely on such things and use dedicated path-manipulation methods in preference to assuming that the path separator will be a certain character.
EDIT from the year 2013: Or use URLs (more specifically, NS/CFURL objects), which Apple has made pretty clear are the proper way to refer to files from now on, and are necessary for some tasks in a sandbox.
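As an illustration of the path-method route, here is a sketch using pathComponents. The helper name FirstSection is made up, and returning nil for relative or too-short paths is my own choice.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns "/<first component>/" for an absolute path.
static NSString *FirstSection(NSString *path) {
    // For an absolute path the first component is the root, "/".
    NSArray *parts = [path pathComponents];
    if ([parts count] >= 2 &&
        [[parts objectAtIndex:0] isEqualToString:@"/"]) {
        return [NSString stringWithFormat:@"/%@/", [parts objectAtIndex:1]];
    }
    return nil; // relative path, or nothing after the root
}
```

FirstSection(@"/News/some news text/") gives "/News/", and the longer "/Abcdefg/..." string gives "/Abcdefg/".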

Organize objective-C string for filters

How can I organize this string better, following best coding practices? It's a string that defines filters:
NSString* string3 = [[[[[[tvA.text stringByReplacingOccurrencesOfString:@"\n" withString:@" "] stringByReplacingOccurrencesOfString:@"&" withString:@"and"] stringByReplacingOccurrencesOfString:@"garçon" withString:@"garcon"] stringByReplacingOccurrencesOfString:@"Garçon" withString:@"Garcon"] stringByReplacingOccurrencesOfString:@"+" withString:@"and"] stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding];
Is there a way to have it be:
NSString* string3 = [[[[[[tvA.text filter1] filter2] filter3] filter4] filter5] stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding];
You shouldn't be replacing & and + before percent-escaping. The problem is that stringByAddingPercentEscapesUsingEncoding: (IIRC) adds the minimum escapes to make it a "valid" URL string, whereas you want to escape anything that might have a special interpretation. For this, use CFURLCreateStringByAddingPercentEscapes():
return [(NSString*)CFURLCreateStringByAddingPercentEscapes(NULL, (CFStringRef)aString, NULL, (CFStringRef)@":/?#[]@!$&'()*+,;=", kCFStringEncodingUTF8) autorelease];
This encodes & and + correctly, instead of just changing them to "and". It also encodes newlines as %0a (so you might want to replace them with spaces; that's your call), and encodes ç as %C3%A7 (which is decoded correctly if you use UTF-8 on the server).
The first thing I'd do is capture the transformation into a method somewhere (where "somewhere" is either an instance method on an appropriate object or a class method on a utility class).
- (NSString *) transformString: (NSString *) aString
{
    NSString *transformedString;
    transformedString = [aString stringByReplacingOccurrencesOfString:@"\n" withString:@" "];
    transformedString = [transformedString stringByReplacingOccurrencesOfString:@"&" withString:@"and"];
    transformedString = [transformedString stringByReplacingOccurrencesOfString:@"garçon" withString:@"garcon"];
    transformedString = [transformedString stringByReplacingOccurrencesOfString:@"Garçon" withString:@"Garcon"];
    transformedString = [transformedString stringByReplacingOccurrencesOfString:@"+" withString:@"and"];
    transformedString = [transformedString stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding];
    return transformedString;
}
Then:
NSString *result = [myTransformer transformString: tvA.text];
A bit brutish, but it'll work. And by "brutish", I mean that it is going to be slow and will cause a bunch of interim strings to pile up in the autorelease pool. However, if this is something that you only do every now and then, don't worry about it -- while brutish, it is certainly quite straightforward.
If, however, this shows up in performance analysis as a bottleneck, you could first move to using NSMutableString as it has methods for doing replacements in place. That, at least, will reduce memory thrash and will likely be a bit faster in that there is less copying of strings going on.
If that is still too slow, then you will likely need to write yourself a fun little bit of parsing and processing code that walks through the original and copies it to a new string while also doing any necessary transforms along the way.
But, don't bother optimizing until you prove that it is a problem. And, of course, if it is a problem, you have just one method to optimize!
If performance is not crucial, put the strings and their replacements into an NSDictionary and iterate over the items. Put it all in a helper method and use an NSMutableString to work on it (which reduces at least some of the cost).
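A rough sketch of that helper, with a hypothetical name (ApplyReplacements). One assumption to keep in mind: an NSDictionary is enumerated in no particular order, so this only suits replacements that do not depend on each other, which holds for the filters in the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: applies every target -> replacement pair in place.
static NSString *ApplyReplacements(NSString *input, NSDictionary *replacements) {
    NSMutableString *result = [NSMutableString stringWithString:input];
    // Fast enumeration over a dictionary walks its keys.
    for (NSString *target in replacements) {
        [result replaceOccurrencesOfString:target
                                withString:[replacements objectForKey:target]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, [result length])];
    }
    return result;
}
```

For example, ApplyReplacements(@"a & b\n+ c", @{@"&": @"and", @"+": @"and", @"\n": @" "}) returns "a and b and c". Percent-escaping would still be applied to the result afterwards, as in the other answers.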