NSScanner vs. componentsSeparatedByString - iphone

I have a large text file (about 10 MB). It contains values like these:
;string1;stringValue1;
;string2;stringValue2;
;string3;stringValue3;
;string4;stringValue4;
I'm parsing all the 'stringX' values into one array and the 'stringValueX' values into another, using a pretty ugly solution:
words = [rawText componentsSeparatedByString:@";"];
NSEnumerator *word = [words objectEnumerator];
while (tmpWord = [word nextObject]) {
if ([tmpWord isEqualToString:@""] || [tmpWord isEqualToString:@"\r\n"] || [tmpWord isEqualToString:@"\n"]) {
// NSLog(@"%@*** NOTHING *** ", tmpWord);
} else { // here I add tmpWord to the arrays...
I've tried to do this using NSScanner by following this example: http://www.macresearch.org/cocoa-scientists-part-xxvi-parsing-csv-data
But I received memory warnings and then it all crashed.
Should I do this using NSScanner and, if so, can anyone give me an example of how to do that?
Thanks!

In most cases NSScanner is better suited than componentsSeparatedByString:, especially if you are trying to conserve memory.
Your file could be parsed by a loop like this:
while (![scanner isAtEnd]) {
NSString *firstPart = @"";
NSString *secondPart = @"";
[scanner scanString:@";" intoString:NULL];
[scanner scanUpToString:@";" intoString:&firstPart];
[scanner scanString:@";" intoString:NULL];
[scanner scanUpToString:@";" intoString:&secondPart];
[scanner scanString:@";" intoString:NULL];
// TODO: add firstPart and secondPart to your arrays
}
You probably need to add error-checking code to this in case you get an invalid file.
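As a rough illustration of that error checking, here is a minimal sketch. ParseRecords is a hypothetical helper name, not something from the original answer, and it assumes every record has the form ;key;value; with non-empty fields. It stops at the first malformed record, so a bad file cannot keep the loop spinning without advancing.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses ";key;value;" records into two parallel arrays.
// Returns NO at the first malformed record (including records with empty fields).
static BOOL ParseRecords(NSString *rawText, NSMutableArray *keys, NSMutableArray *values) {
    NSScanner *scanner = [NSScanner scannerWithString:rawText];
    // The default skipped characters (whitespace and newlines) step over line breaks.
    while (![scanner isAtEnd]) {
        NSString *key = nil;
        NSString *value = nil;
        BOOL ok = [scanner scanString:@";" intoString:NULL]
               && [scanner scanUpToString:@";" intoString:&key]
               && [scanner scanString:@";" intoString:NULL]
               && [scanner scanUpToString:@";" intoString:&value]
               && [scanner scanString:@";" intoString:NULL];
        if (!ok) {
            return NO; // give up instead of looping forever on bad input
        }
        [keys addObject:key];
        [values addObject:value];
    }
    return YES;
}
```

If empty values are legal in your file, the scanUpToString: calls would need to tolerate a NO result, since NSScanner reports failure when nothing was scanned.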

You should use fast enumeration. It is faster and more concise than iterating with objectEnumerator. Try this:
for (NSString *word in words) {
// do the thing you need
}

Related

NSScanner behavior

I am very new to iOS development. I am trying to parse a simple CSV file that has about 10 lines, with fields separated by commas. I am using the code below but am not able to understand why NSScanner, when parsing the fields (fields in the code below), does not go to the next string after the comma. I have to execute the line
[fields scanCharactersFromSet:fieldCharSet intoString:nil];
to make it go past the delimiter. However, I don't have to do the same thing for lines: NSScanner automatically sets the position to the next line past the newline. In both cases I am using the same method, scanUpToCharactersFromSet:intoString:. Is there something I am not understanding?
Here is the test file I am trying to parse:
Name,Location,Number,Units
A,AA,4,mm
B,BB,3.5,km
C,CC,10.2,mi
D,DD,2,mm
E,EE,6,in
F,FF,2.8,m
G,GG,3.7,km
H,HH,4.3,mm
I,II,4,km
Here is my code:
-(void)parseFile {
NSCharacterSet *lineCharSet = [NSCharacterSet newlineCharacterSet];
NSCharacterSet *fieldCharSet = [NSCharacterSet characterSetWithCharactersInString:self.separator];
// import the file
NSStringEncoding *encoding = nil;
NSError *error = nil;
NSString *data = [[NSString alloc] initWithContentsOfURL:self.absoluteURL usedEncoding:encoding error:&error];
NSString *line,*field;
NSScanner *lines = [NSScanner scannerWithString:data];
while (![lines isAtEnd]) {
[lines scanUpToCharactersFromSet:lineCharSet intoString:&line];//automatically sets to next line - why?
NSLog(@"%@\n", line);
NSScanner *fields = [NSScanner scannerWithString:line];
while (![fields isAtEnd]) {
[fields scanUpToCharactersFromSet:fieldCharSet intoString:&field];
[fields scanCharactersFromSet:fieldCharSet intoString:nil]; //have to do this otherwise will not go to next symbol
NSLog(@"%@\n", field);
}
}
}
That's just the way NSScanner works. When you use scanUpToCharactersFromSet:intoString:, it scans characters up to but not including the characters in the set. If you want it to move past characters in the set, you have two options:
Make it scan those characters. You are doing this now using scanCharactersFromSet:intoString:. Another way you could do it is [fields scanString:self.separator intoString:nil].
Tell the scanner that the separator character is to be skipped, using setCharactersToBeSkipped:. However, this will make it hard for you to detect empty fields.
The scanner's default set of characters-to-be-skipped includes the newline. That's why your outer scanner skips the newline.
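To make the empty-field point concrete, here is a minimal sketch of a per-line splitter that scans the separator explicitly, so empty fields survive. FieldsInLine is a hypothetical helper name, and the separator is assumed to be a non-empty string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits one line on `separator`, keeping empty fields.
static NSArray *FieldsInLine(NSString *line, NSString *separator) {
    NSScanner *scanner = [NSScanner scannerWithString:line];
    scanner.charactersToBeSkipped = nil; // keep whitespace inside fields
    NSMutableArray *fields = [NSMutableArray array];
    while (YES) {
        NSString *field = nil;
        // scanUpToString: returns NO when the separator is at the scan location,
        // which is exactly the empty-field case.
        if (![scanner scanUpToString:separator intoString:&field]) {
            field = @"";
        }
        [fields addObject:field];
        // Consume the separator explicitly; stop when there is none left.
        if (![scanner scanString:separator intoString:NULL]) {
            break;
        }
    }
    return fields;
}
```

Because the separator is consumed by hand rather than skipped, two adjacent separators produce an empty string instead of disappearing.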
You could do this entirely using componentsSeparatedByString:, instead of using NSScanner. Example:
-(void)parseFile {
NSError *error = nil;
NSString *data = [[NSString alloc] initWithContentsOfURL:self.absoluteURL usedEncoding:NULL error:&error];
for (NSString *line in [data componentsSeparatedByString:@"\n"]) {
if (line.length == 0)
continue;
NSLog(@"line: %@", line);
for (NSString *field in [line componentsSeparatedByString:self.separator]) {
NSLog(@" field: %@", field);
}
}
}

how to extract value of an element from JSON data using RegEx

I am trying to extract value of "points" element from JSON data using
NSString* encodedPoints = [apiResponse stringByMatching:@"points:\\\"([^\\\"]*)\\\"" capture:1L];
but there is more than one "points" element in the JSON data. Please help; I don't know much about regular expressions.
I am getting the JSON data from this link
You should use a JSON scanner.
Ensure that you have the JSON in an NSString, not an NSData.
Here is a method that uses an NSScanner instead of a regular expression:
NSMutableArray *pointList = [NSMutableArray array];
NSString *pointsString = nil;
NSScanner *scanner = [NSScanner scannerWithString:encodedPoints];
while (YES) {
// Skip ahead to the next key; the result is ignored because the key may already be at the scan location.
[scanner scanUpToString:@"points:\"" intoString:nil];
if (![scanner scanString:@"points:\"" intoString:nil])
break;
// Only store the value if something was actually scanned.
if ([scanner scanUpToString:@"\"" intoString:&pointsString])
[pointList addObject:pointsString];
}
// Show results by printing the lengths of the found points
for (NSString *point in pointList)
NSLog(@"point length: %lu", (unsigned long)point.length);
NSLog output:
point length: 22058
point length: 8889
You should use a JSON parser for this; it is more correct for dealing with JSON than a regex, which is prone to failure.
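A minimal sketch of the parser route, using Foundation's NSJSONSerialization. It assumes the response is strictly valid JSON (keys in double quotes) and that each "points" value is a string; CollectPoints and PointsFromJSONString are hypothetical helper names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: walks a parsed JSON tree and collects every string
// stored under the key "points", at any depth.
static void CollectPoints(id node, NSMutableArray *out) {
    if ([node isKindOfClass:[NSDictionary class]]) {
        NSDictionary *dict = node;
        for (NSString *key in dict) {
            id value = dict[key];
            if ([key isEqualToString:@"points"] && [value isKindOfClass:[NSString class]]) {
                [out addObject:value];
            } else {
                CollectPoints(value, out);
            }
        }
    } else if ([node isKindOfClass:[NSArray class]]) {
        for (id item in (NSArray *)node) {
            CollectPoints(item, out);
        }
    }
}

// Hypothetical helper: returns all "points" strings, or an empty array
// when the input is nil or not valid JSON.
static NSArray *PointsFromJSONString(NSString *json) {
    NSMutableArray *result = [NSMutableArray array];
    NSData *data = [json dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) {
        return result;
    }
    NSError *error = nil;
    id root = [NSJSONSerialization JSONObjectWithData:data options:0 error:&error];
    if (root != nil) {
        CollectPoints(root, result);
    }
    return result;
}
```

If the service returns JavaScript-style output with unquoted keys (as the pattern points:" in the question suggests it might), NSJSONSerialization will reject it and this sketch returns an empty array.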

How to truncate NSString?

I looked at the string formatting documents but couldn't figure out exactly how to do this.
Let's say I have a string like this
@"(01–05) Operations on the nervous system"
I want to create 2 strings from this like so:
@"01–05" and @"Operations on the nervous system"
How can I do this?
Here are the docs I looked at: http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/FormatStrings.html
Give this a shot. It might be off a bit; I haven't checked for typos. But you can mess around with it now that you get the idea.
NSString *sourceString = @"(01–05) Operations on the nervous system";
NSString *string1 = [sourceString substringToIndex:6];
string1 = [string1 stringByReplacingOccurrencesOfString:@"(" withString:@""];
// string1 = 01–05
NSString *string2 = [sourceString substringFromIndex:8]; // index 8 skips the ")" and the space after it
// string2 = Operations on the nervous system
If you just want the first substring contained by the characters "(" and ")" and anything after that I'd recommend doing something like this:
NSString *original = @"(01–05) Operations on the nervous system";
NSString *firstPart = [NSString string];
NSString *secondPart = [NSString string];
NSScanner *scanner = [NSScanner scannerWithString:original];
[scanner scanUpToString:@"(" intoString:NULL]; // find first "("
if (![scanner isAtEnd]) {
[scanner scanString:@"(" intoString:NULL]; // consume "("
[scanner scanUpToString:@")" intoString:&firstPart]; // store characters up to the next ")"
if (![scanner isAtEnd]) {
[scanner scanString:@")" intoString:NULL]; // consume ")"
// grab the rest of the string
secondPart = [[scanner string] substringFromIndex:[scanner scanLocation]];
}
}
Of course, the secondPart string will still have spaces and whatnot at the front of it. To get rid of those you can do something along the lines of:
secondPart = [secondPart stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
The advantage of using NSScanner is that you don't have to hard-code the start and end of the firstPart substring.
NSString *theFirstStringSubString = [sourceString substringFromIndex:1]; // sourceString holds the original string
NSString *theFirstStringSecondSubstring = [theFirstStringSubString substringToIndex:5];
Now theFirstStringSecondSubstring is 01–05.
The same approach works for the other string, at different indexes. Please note that these strings are autoreleased. If you want to keep them, retain them.

Find characters from the given string with numbers.

How do I use NSScanner to split a string that contains both digits and letters?
e.g. 001234852ACDSB
The result should be 001234852 and ACDSB
I am able to get the numbers from the string using NSScanner, and the letters by using stringByReplacingOccurrencesOfString, but I want to know whether it is possible to get the letters with NSScanner or any other built-in method.
I would also like to know the regex for the same.
If you can guarantee that the string always consists of numbers followed by letters, then you could do the following with NSScanner:
NSScanner *scanner = [NSScanner scannerWithString:@"001234852ACDSB"];
NSString *theNumbers = nil;
[scanner scanCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet]
intoString:&theNumbers];
NSString *theLetters = nil;
[scanner scanCharactersFromSet:[NSCharacterSet letterCharacterSet]
intoString:&theLetters];
A regular expression capturing the same things would look like this:
([0-9]+)([a-zA-Z]+)
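One way to apply that pattern from Objective-C is NSRegularExpression with capture groups. This is only a sketch: SplitDigitsAndLetters is a hypothetical helper name, and the ^ and $ anchors are an addition to the pattern above so that partial matches are rejected.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[digits, letters], or nil when the input
// is not a run of digits followed by a run of ASCII letters.
static NSArray *SplitDigitsAndLetters(NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^([0-9]+)([a-zA-Z]+)$"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input
                          options:0
                            range:NSMakeRange(0, input.length)];
    if (match == nil) {
        return nil;
    }
    // Range 0 is the whole match; ranges 1 and 2 are the two capture groups.
    return @[[input substringWithRange:[match rangeAtIndex:1]],
             [input substringWithRange:[match rangeAtIndex:2]]];
}
```

Keeping the digits as a string preserves the leading zeros, which would be lost if they were converted to an integer.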
Finally, after googling and reading through some information on the net, I found a solution. I'm posting the code here in case it helps others facing the same problem.
NSString *str = @"001234852ACDSB";
NSScanner *scanner = [NSScanner scannerWithString:str];
// set it to skip non-numeric characters
[scanner setCharactersToBeSkipped:[[NSCharacterSet decimalDigitCharacterSet] invertedSet]];
int i;
while ([scanner scanInt:&i])
{
NSLog(@"Found int: %d", i); // 1234852 (scanInt: drops the leading zeros)
}
// reset the scanner to skip numeric characters
[scanner setScanLocation:0];
[scanner setCharactersToBeSkipped:[NSCharacterSet decimalDigitCharacterSet]];
NSString *resultString;
while ([scanner scanUpToCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:&resultString])
{
NSLog(@"Found string: %@", resultString); // ACDSB
}
You don't have to use a scanner to do it.
NSString *mixedString = @"01223abcdsadf";
NSString *numbers = [[mixedString componentsSeparatedByCharactersInSet:[[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet]] componentsJoinedByString:@""];
NSString *characters = [[mixedString componentsSeparatedByCharactersInSet:[[NSCharacterSet characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyz"] invertedSet]] componentsJoinedByString:@""];
For other possible solutions, see the question "Remove all but numbers from NSString".

Objective-C: Find numbers in string

I have a string that contains words as well as a number. How can I extract that number from the string?
NSString *str = @"This is my string. #1234";
I would like to be able to strip out 1234 as an int. The string will have different numbers and words each time I search it.
Ideas?
Here's an NSScanner based solution:
// Input
NSString *originalString = @"This is my string. #1234";
// Intermediate
NSString *numberString;
NSScanner *scanner = [NSScanner scannerWithString:originalString];
NSCharacterSet *numbers = [NSCharacterSet characterSetWithCharactersInString:@"0123456789"];
// Throw away characters before the first number.
[scanner scanUpToCharactersFromSet:numbers intoString:NULL];
// Collect numbers.
[scanner scanCharactersFromSet:numbers intoString:&numberString];
// Result.
int number = [numberString intValue];
(Some of the many) assumptions made here:
Number digits are 0-9, no sign, no decimal point, no thousand separators, etc. You could add sign characters to the NSCharacterSet if needed.
There are no digits elsewhere in the string, or if there are they are after the number you want to extract.
The number won't overflow int.
Alternatively you could scan direct to the int:
[scanner scanUpToCharactersFromSet:numbers intoString:NULL];
int number;
[scanner scanInt:&number];
If the # marks the start of the number in the string, you could find it by means of:
[scanner scanUpToString:@"#" intoString:NULL];
[scanner setScanLocation:[scanner scanLocation] + 1];
// Now scan for int as before.
Self contained solution:
+ (NSString *)extractNumberFromText:(NSString *)text
{
NSCharacterSet *nonDigitCharacterSet = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];
return [[text componentsSeparatedByCharactersInSet:nonDigitCharacterSet] componentsJoinedByString:@""];
}
Handles the following cases:
@"1234" → @"1234"
@"001234" → @"001234"
@"leading text get removed 001234" → @"001234"
@"001234 trailing text gets removed" → @"001234"
@"a0b0c1d2e3f4" → @"001234"
Hope this helps!
You could use the NSRegularExpression class, available since iOS SDK 4.
Below is some simple code to extract integer numbers (using the "\d+" regex pattern):
- (NSArray*) getIntNumbersFromString: (NSString*) string {
NSMutableArray* numberArray = [NSMutableArray new];
NSString* regexPattern = @"\\d+";
NSRegularExpression* regex = [[NSRegularExpression alloc] initWithPattern:regexPattern options:0 error:nil];
NSArray* matches = [regex matchesInString:string options:0 range:NSMakeRange(0, string.length)];
for( NSTextCheckingResult* match in matches) {
NSString* strNumber = [string substringWithRange:match.range];
[numberArray addObject:[NSNumber numberWithInt:strNumber.intValue]];
}
return numberArray;
}
Try this answer from Stack Overflow for a nice piece of C code that will do the trick:
for (int i=0; i<[str length]; i++) {
if (isdigit([str characterAtIndex:i])) {
[strippedString appendFormat:@"%c",[str characterAtIndex:i]];
}
}
By far the best solution! I think a regexp would be better, but I kind of suck at it ;-) This filters ALL digits and concatenates them into a new string. If you want to split out multiple numbers, change it a bit. And remember that using this inside a big loop costs performance!
NSString *str = @"bla bla bla #123 bla bla 789";
NSMutableString *newStr = [[NSMutableString alloc] init];
NSUInteger j = [str length];
for (NSUInteger i = 0; i < j; i++) {
unichar c = [str characterAtIndex:i];
if (c >= '0' && c <= '9') { // ASCII digits only (codes 48 to 57)
[newStr appendFormat:@"%C", c];
}
}
NSLog(@"%@ as int:%i", newStr, [newStr intValue]);
Swift extension for getting a number from a string:
extension NSString {
func getNumFromString() -> String? {
var numberString: NSString?
let thisScanner = NSScanner(string: self as String)
let numbers = NSCharacterSet(charactersInString: "0123456789")
thisScanner.scanUpToCharactersFromSet(numbers, intoString: nil)
thisScanner.scanCharactersFromSet(numbers, intoString: &numberString)
return numberString as? String;
}
}
NSPredicate can also match strings against ICU regular expressions, using its MATCHES operator.