How to use a regular expression in an iPhone app to separate a string by , (comma)

I have to read a .csv file that has three columns. While parsing the file, I get each line as a string in this format: Christopher Bass,\"Cry the Beloved Country Final Essay\",cbass@cgs.k12.va.us. I want to store the values of the three columns in an array, so I used the componentsSeparatedByString:@"," method. It successfully returns an array with three components:
Christopher Bass
Cry the Beloved Country Final Essay
cbass@cgs.k12.va.us
But when there is already a comma in a column value, like this
Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us
it separates the string into four components, because there is a , (comma) after Cry:
Christopher Bass
Cry
the Beloved Country Final Essay
cbass@cgs.k12.va.us
So how can I handle this with a regular expression? I have the "RegexKitLite" classes, but which regular expression should I use? Please help!
Thanks-

A regular expression alone would probably run into the same problem. What you need is to sanitize your entries, either by escaping the commas inside values or by quoting every value, like this: "My string". Otherwise the ambiguity remains. Good luck.
For your example you would probably need something like:
\"Christopher Bass\",\"Cry\, the Beloved Country Final Essay\",\"cbass@cgs.k12.va.us\"
That way you could use a regexp or even the same method from the NSString class.
Not related at all, but on the importance of sanitizing strings: http://xkcd.com/327/ hehehe.

How about this:
componentsSeparatedByRegex:@",\\\"|\\\","
This should split your string wherever " and , appear together in either order, resulting in a three-member array. This of course assumes that the second element in the string is always enclosed in quotation marks, and that the characters " and , never appear consecutively within the three components.
If either of these assumptions is incorrect, other methods to identify string components may be used, but it should be made clear that no generic solution exists. If the three component strings can contain " and , anywhere, not even a limited solution is possible in such cases:
Doe, John,\"\"Why Unescaped Strings Suck\", And Other Development Horror Stories\",Doe, John <john.doe@dev.null>
Hopefully there is nothing like the above in your CSV data. If there is, the data is basically unusable, and you should look into a better CSV exporter.
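If the data is well-formed CSV, that is, commas appear as data only inside double-quoted fields and a literal quote inside such a field is doubled, a small scanner can do the split without any regex. A minimal sketch under that assumption; SplitCSVLine is a hypothetical helper name, not part of Foundation or RegexKitLite:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation or RegexKitLite).
// Splits one CSV line into fields, treating commas inside double quotes as data.
// Assumes well-formed input: a literal quote inside a quoted field is doubled ("").
static NSArray *SplitCSVLine(NSString *line) {
    NSMutableArray *fields = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    BOOL inQuotes = NO;
    NSUInteger length = [line length];

    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [line characterAtIndex:i];
        if (c == '"') {
            if (inQuotes && i + 1 < length && [line characterAtIndex:i + 1] == '"') {
                [current appendString:@"\""]; // doubled quote: keep one literal quote
                i++;                          // skip the second quote of the pair
            } else {
                inQuotes = !inQuotes;         // opening or closing quote, not part of the value
            }
        } else if (c == ',' && !inQuotes) {
            // An unquoted comma ends the current field.
            [fields addObject:[NSString stringWithString:current]];
            [current setString:@""];
        } else {
            [current appendString:[NSString stringWithCharacters:&c length:1]];
        }
    }
    // The last field has no trailing comma, so add it after the loop.
    [fields addObject:[NSString stringWithString:current]];
    return fields;
}
```

Called on the line from the question (with plain double quotes around the title), this should return three fields, the second one being Cry, the Beloved Country Final Essay with its comma intact. It does not handle quoted fields that span several lines.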

The regex you're searching for is: \\"(.*)\\"[ ^,]*|([^,]*),
in ObjC: (('\"' && string_1 && '\"' && 0-n spaces) || string_2 except comma) && comma
NSString *str = @"Christopher Bass,\"Cry, the Beloved Country ,Final Essay\",cbass@cgs.k12.va.us,som";
NSString *regEx = @"\\\"(.*)\\\"[ ^,]*|([^,]*),";
NSMutableArray *split = [[str componentsSeparatedByRegex:regEx] mutableCopy];
[split removeObject:@""]; // both capture groups are always returned, even when one is empty
NSLog(@"%@", split);
// OUTPUT:
2012-02-07 17:42:18.778 tmpapp[92170:c03] (
"Christopher Bass",
"Cry, the Beloved Country ,Final Essay",
"cbass@cgs.k12.va.us",
som
)
RegexKitLite adds both capture groups to the array, so you end up with empty objects in it. removeObject:@"" deletes those, but if you need to keep genuinely empty values (e.g. your source has val,,ue) you have to change the code to the following:
str = [str stringByReplacingOccurrencesOfRegex:regEx withString:@"$1$2∏"];
NSArray *split = [str componentsSeparatedByString:@"∏"];
$1 and $2 are those two strings mentioned above, ∏ is in this case a character which will most likely never appear in normal text (and is easy to remember: option-shift-p).

The last part looks like it will never contain a comma. Neither will the first one as far as I can see...
What about splitting the string like this:
NSArray *splitArr = [str componentsSeparatedByString:@","];
NSString *nameStr = [splitArr objectAtIndex:0];
NSString *emailStr = [splitArr lastObject];
// Rejoin the middle pieces with the commas that the split removed.
// Assumes the line has at least two components (name and email).
NSRange middle = NSMakeRange(1, [splitArr count] - 2);
NSString *contentStr = [[splitArr subarrayWithRange:middle] componentsJoinedByString:@","];
This will use the first and last string as is, and combine the rest into the content.
Kind of a hack, but a name and an email address will never contain a comma, right?

Is the title guaranteed to have the quotation marks? And is it the only component that can have them? Because then componentsSeparatedByString:@"\"" should get you this:
Christopher Bass,
Cry, the Beloved Country Final Essay
,cbass@cgs.k12.va.us
Then use componentsSeparatedByString:@"," or substringFromIndex: / substringToIndex: to get rid of the two commas in the first and last component.
Here's a solution using substring:
NSString* input = @"Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us";
NSArray* split = [input componentsSeparatedByString:@"\""];
NSString* part1 = [split objectAtIndex:0];
NSString* part2 = [split objectAtIndex:1];
NSString* part3 = [split objectAtIndex:2];
part1 = [part1 substringToIndex:[part1 length] - 1]; // drop the trailing comma
part3 = [part3 substringFromIndex:1]; // drop the leading comma
NSLog(@"%@", part1);
NSLog(@"%@", part2);
NSLog(@"%@", part3);

Related

Confusion with case used by CFURLCreateStringByAddingPercentEscapes encoding

I want to URL-encode a string. My input string is "ChBdgzQ3qUpNRBEHB+bOXQNjRTQ="
I get "ChBdgzQ3qUpNRBEHB%2BbOXQNjRTQ%3D" as output, which is correct except for the case of the hex digits in the escapes.
Ideally, it should have been "ChBdgzQ3qUpNRBEHB%2bbOXQNjRTQ%3d" instead of the output I get.
I.e. I should have got %2b and %3d instead of %2B and %3D.
Could this be done?
The code I used is as below:
NSString* inputStr = @"ChBdgzQ3qUpNRBEHB+bOXQNjRTQ=";
NSString* outputStr = (NSString *)CFURLCreateStringByAddingPercentEscapes(NULL,
(CFStringRef)inputStr,
NULL,
(CFStringRef)@"!*'\"();:@&=+$,/?%#[]% ",
CFStringConvertNSStringEncodingToEncoding(encoding));
Another, perhaps more elegant but slower, way would be to loop over your string and convert it one character at a time: get the length of the string, take one-character substrings from location 0 to length-1, and pass each substring to CFURLCreateStringByAddingPercentEscapes. If the returned string has a length > 1, the character was encoded, and you can safely convert that result to lower case.
In all cases you append the returned (and possibly modified) string to a mutable string, and when done you have exactly what you want for any possible string. Even though this looks like a real processor hog, in practice you would probably never notice the extra cycles.
Likewise, a second approach would be to convert your whole string first, then copy it character by character to a mutable string, and whenever you find a "%", turn the next two characters into lower case. Just a slightly different way to slice the problem.
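The second approach could look roughly like this. It is only a sketch: LowercasePercentEscapes is a hypothetical helper name, and it assumes its input is already percent-encoded, so that every "%" starts a %XX escape.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: lowercases the two hex digits that follow each '%'.
// Assumes `escaped` is already percent-encoded, so every '%' begins a %XX escape.
static NSString *LowercasePercentEscapes(NSString *escaped) {
    NSUInteger length = [escaped length];
    NSMutableString *result = [NSMutableString stringWithCapacity:length];
    NSUInteger i = 0;

    while (i < length) {
        unichar c = [escaped characterAtIndex:i];
        if (c == '%' && i + 2 < length) {
            // Copy the '%' and append the next two characters in lower case.
            NSString *hex = [escaped substringWithRange:NSMakeRange(i + 1, 2)];
            [result appendFormat:@"%%%@", [hex lowercaseString]];
            i += 3;
        } else {
            // Any other character is copied unchanged.
            [result appendString:[escaped substringWithRange:NSMakeRange(i, 1)]];
            i += 1;
        }
    }
    return result;
}
```

Passing the outputStr from the question through this helper should turn %2B and %3D into %2b and %3d while leaving the rest of the string alone.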
You can use a regular expression to perform the post operation:
NSMutableString *finalStr = [outputStr mutableCopy];
NSRegularExpression *re = [[NSRegularExpression alloc] initWithPattern:@"(?<=%)[0-9A-F]{2}" options:0 error:nil];
// Match ranges are taken from outputStr; they stay valid in finalStr because every replacement has the same length.
for (NSTextCheckingResult *match in [re matchesInString:outputStr options:0 range:NSMakeRange(0, outputStr.length)]) {
[finalStr replaceCharactersInRange:match.range withString:[[outputStr substringWithRange:match.range] lowercaseString]];
}
The code uses this regular expression:
(?<=%)[0-9A-F]{2}
It matches two hexadecimal characters, only if preceded by a percent sign. Each match is then iterated and replaced within a mutable string. We don't have to worry about offset changes because the replacement string is always the same length.

Convert NSString UPPERCASE into Capitalized

I have some strings like NAVJYOT COMPLEX, NEAR A ONE SCHOOL, SUBHASH CHOWK , MEMNAGAR, Ahmedabad, Gujarat, India.
I want to convert them so the first character is uppercase and remaining are lowercase, e.g: Navjyot Complex, Near A One School, Subhash Chowk, Memnagar, Ahmedabad, Gujarat, India. So please help me convert those strings.
Thanks in advance.
NSString has the following method:
capitalizedString
It returns:
"A string with the first character from each word in the receiver changed to its corresponding uppercase value, and all remaining characters set to their corresponding lowercase values."
http://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
Use this:
NSString *str1 = @"ron";
NSString *str = [str1 capitalizedString];

Find words with regEx and then add whitespaces inbetween with Objective-c

I was wondering how to add whitespace in between letters/numbers in a string with Objective-C.
I have the sample code kind of working at the moment. Basically I want to turn "West4thStreet" into "West 4th Street".
NSString *myText2 = @"West4thStreet";
NSString *regexString2 = @"([a-z.-][^a-z .-])";
for(NSString *match2 in [myText2 componentsMatchedByRegex:regexString2 capture:1L]) {
NSString *myString = [myText2 stringByReplacingOccurrencesOfString:match2 withString:@" "];
NSLog(@"Prints out: %@",myString); // Prints out: Wes thStreet // Prints out: West4t treet
}
So in this example, it's replacing what the regex found (the "t4" and "hS") with spaces. But I just want to add a space in between the letters to separate out the words.
Thanks!
If you wrap parts of your regex patterns in parentheses, you can refer to them as $1, $2, etc in your replacement string (patterns are numbered from left to right, by the order of their opening parenthesis).
NSString *origString = @"West4thStreet";
NSString *newString = [origString stringByReplacingOccurrencesOfRegex:@"(4th)" withString:@" $1 "];
Not sure I understand your broader use case, but that should at least get you going...
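To generalize the capture-group idea beyond the literal "4th", one option is to capture the character on each side of a word boundary and put a space between the two groups. The sketch below uses Foundation's own NSRegularExpressionSearch option rather than RegexKitLite, and it assumes a boundary is a lowercase letter followed by an uppercase letter or a digit; InsertWordSpaces is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: inserts a space wherever a lowercase letter is
// directly followed by an uppercase letter or a digit.
// $1 is the lowercase letter, $2 is the character that starts the next word.
static NSString *InsertWordSpaces(NSString *text) {
    return [text stringByReplacingOccurrencesOfString:@"([a-z])([A-Z0-9])"
                                           withString:@"$1 $2"
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, [text length])];
}
```

For "West4thStreet" the two matches are "t4" and "hS", so the result should be "West 4th Street". A digit followed by a lowercase letter ("4t") is deliberately not treated as a boundary, which keeps "4th" together.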

How do you split NSString into component parts?

In Xcode, if I have an NSString containing a number, i.e. @"12345", how do I split it into an array representing its component parts, i.e. "1", "2", "3", "4", "5"? There is a componentsSeparatedByString: method on NSString, but in this case there is no delimiter...
There is a ready member function of NSString for doing that:
NSString* foo = @"safgafsfhsdhdfs/gfdgdsgsdg/gdfsgsdgsd";
NSArray* stringComponents = [foo componentsSeparatedByString:@"/"];
It may seem like characterAtIndex: would do the trick, but that returns a unichar, which isn't an NSObject-derived data type and so can't be put into an array directly. You'd need to construct a new string with each unichar.
A simpler solution is to use substringWithRange: with 1-character ranges. Run your string through a simple for (int i=0;i<[myString length];i++) loop to add each 1-character range to an NSMutableArray.
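A sketch of that loop follows. Instead of fixed 1-character ranges it asks the string for the range of each composed character sequence, which is a swapped-in variation so that surrogate pairs and combining marks are not cut in half; for plain digits the two are equivalent. SplitIntoCharacters is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of one-character NSStrings.
// Uses composed character sequences so multi-unit characters stay intact.
static NSArray *SplitIntoCharacters(NSString *string) {
    NSMutableArray *characters = [NSMutableArray arrayWithCapacity:[string length]];
    NSUInteger i = 0;
    while (i < [string length]) {
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:i];
        [characters addObject:[string substringWithRange:range]];
        i = NSMaxRange(range); // continue right after this character
    }
    return characters;
}
```

For @"12345" this should give an array of five strings, "1" through "5".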
An NSString already is an array of its components, if by components you mean single characters. Use [string length] to get the length of the string and [string characterAtIndex:] to get the characters.
If you really need an array of string objects with only one character you will have to create that array yourself. Loop over the characters in the string with a for loop, create a new string with a single character using [NSString stringWithFormat:] and add that to your array. But this usually is not necessary.
In your case, since you have no delimiter, you have to get the separate chars with
- (void)getCharacters:(unichar *)buffer range:(NSRange)aRange
or with this one
- (unichar)characterAtIndex:(NSUInteger)index inside a loop.
That's the only way I see at the moment.
Don't know if this works for what you want to do but:
const char *foo = [myString UTF8String];
char third_character = foo[2];
Make sure to read the docs on UTF8String: foo[2] is the third byte, which is the third character only when the string is pure ASCII.

How to get a part of the string from the main string in Objective-C

I have mainString, from which I need to get the part of the string that follows a keyword.
NSString *mainString = @"Hi how are you GET=dsjghdsghghdsjkghdjkhsg";
Now I need to get the string after the keyword "GET=".
Waiting for a reply.
Have a look at the NSString documentation.
Assuming your string really is so totally straightforward, you could do something like this:
NSArray *components = [mainString componentsSeparatedByString: @"GET="];
NSString *stringYouWant = [components objectAtIndex: 1];
Obviously, this performs absolutely no error checking and makes a number of assumptions about the actual contents of mainString, but it should get you started.
Note, also, that the code is somewhat defensive in that it assumes that you are looking for GET= and not separating on =. Either way is a hack in terms of parsing, but... hey... hacks are sometimes the right answer.
You can use a regex via RegexKitLite:
NSString *mainString = @"Hi how are you GET=dsjghdsghghdsjkghdjkhsg";
NSString *matchedString = [mainString stringByMatching:@"GET=(.*)" capture:1L];
// matchedString == @"dsjghdsghghdsjkghdjkhsg";
The regex used, GET=(.*), basically says "Look for GET=, and then grab everything after that". The () specifies a capture group, which are useful for extracting just part of a match. Capture groups begin at 1, with capture group 0 being "the entire match". The part inside the capture group, .*, says "Match any character (the .) zero or more times (the *)".
If the string, in this case mainString, is not matched by the regex, then matchedString will be NULL.
You can get the location of the first occurrence of = and then just take a substring of mainString from just after that location to the end of the string.
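A sketch of that location-based approach, with the error check the other answers leave out. It searches for the whole keyword rather than just the = sign, and SubstringAfterKeyword is a hypothetical helper name, not a Foundation method.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns everything after the first occurrence of
// `keyword` in `source`, or nil if the keyword does not occur.
static NSString *SubstringAfterKeyword(NSString *source, NSString *keyword) {
    NSRange found = [source rangeOfString:keyword];
    if (found.location == NSNotFound) {
        return nil; // keyword missing: nothing to extract
    }
    // NSMaxRange is the index just past the end of the keyword.
    return [source substringFromIndex:NSMaxRange(found)];
}
```

With the mainString from the question and @"GET=" as the keyword, this should return dsjghdsghghdsjkghdjkhsg.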