Is it possible in Objective C to search an NSString for a number of different strings at the same time?
For example, I want to search for all occurrences of the strings "good", "great", "awesome", "incredible", "fantastic" and "brilliant" in a very long string.
My first thought is to use -[NSString rangeOfString:] and cycle through multiple times (once for each string), but it strikes me that with longer sets of strings, this may become inefficient and slow.
Is there an in-built way of searching for multiple strings like this, or should I create my own method?
EDIT: The results are in!
After finding some time to benchmark, I found that the regex method is indeed slower (more than 2x slower) than the looping rangeOfString: method. The numbers, for your delectation, are as follows:
With a list of 150,000 words (~1,103,500 characters) and 20 match-words, with 5412 matches present:
-[NSString rangeOfString:] search = 231.077 ms
Regular expression search = 530.113 ms
"...it strikes me that with longer sets of strings, this may become inefficient and slow."
So, have you benchmarked it? If not, then you don't have the right to judge it as "inefficient" and "slow". Premature optimization is evil. Just stick with those nice and simple for loops and the -[NSString rangeOfString:] method.
But to actually answer your question: yes, you can avoid the manual looping. If you use NSRegularExpression with a pattern like good|great|awesome, you can find all occurrences in one pass. Using regular expressions would probably be slower than a simple string search, though.
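As a rough illustration of the simple loop recommended above, here is a plain C sketch that counts matches one keyword at a time. strstr stands in for -[NSString rangeOfString:], and the function names are hypothetical. Like rangeOfString:, it matches substrings, so "good" also matches inside "goodness".

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of keyword in text
   (plain substring matching, like rangeOfString: with no options). */
size_t count_occurrences(const char *text, const char *keyword)
{
    size_t count = 0;
    size_t klen = strlen(keyword);

    if (klen == 0)
        return 0; /* an empty keyword would otherwise loop forever */
    for (const char *p = strstr(text, keyword); p != NULL; p = strstr(p + klen, keyword))
        count++;
    return count;
}

/* The "simple loop" approach: one full scan of text per keyword. */
size_t count_all_keywords(const char *text, const char *const keywords[], size_t nkeywords)
{
    size_t total = 0;

    for (size_t i = 0; i < nkeywords; i++)
        total += count_occurrences(text, keywords[i]);
    return total;
}
```

The work grows roughly with the text length times the number of keywords. That is the scaling the question worries about, and a benchmark is what tells you whether it matters.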
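Regular expressions are so widely used that the implementation should be efficient. Specifically, a regex search makes a single left-to-right pass over the input string instead of one pass per keyword, although at each position it may still have to try several alternatives.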
NSRegularExpression *regex =
    [NSRegularExpression regularExpressionWithPattern: @"(good|great|...)"
                                              options: NSRegularExpressionCaseInsensitive
                                                error: ...];
NSArray *matches = [regex matchesInString: string
                                  options: 0
                                    range: NSMakeRange(0, [string length])];
for (NSTextCheckingResult *match in matches)
    ...
Here is a test snippet:
NSString *string = @"not good nor great";
// as above
for (NSTextCheckingResult *match in matches)
    NSLog(@"Match: %@", match);
produces:
2013-08-22 10:21:11.644 foo[2454:707] Match: <NSSimpleRegularExpressionCheckingResult: 0x7fc954301650>{4, 4}{<NSRegularExpression: 0x7fc9543001c0> (good|great) 0x1}
2013-08-22 10:21:11.644 foo[2454:707] Match: <NSSimpleRegularExpressionCheckingResult: 0x7fc954301540>{13, 5}{<NSRegularExpression: 0x7fc9543001c0> (good|great) 0x1}
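The "single pass" idea can be pictured with the plain C sketch below. It walks the text once and, at each position, tries every keyword in order, much like the alternatives of (good|great|...). This only illustrates the idea. It is not how ICU, which backs NSRegularExpression, is implemented, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One left-to-right pass over text. At each position the keywords are tried
   in order (like the alternatives in good|great|...) and the first that
   matches wins; scanning then resumes just past the match. */
size_t count_in_one_pass(const char *text, const char *const keywords[], size_t nkeywords)
{
    size_t count = 0;
    size_t pos = 0;
    size_t len = strlen(text);

    while (pos < len) {
        size_t matched = 0;

        for (size_t i = 0; i < nkeywords; i++) {
            size_t klen = strlen(keywords[i]);
            if (klen > 0 && klen <= len - pos &&
                strncmp(text + pos, keywords[i], klen) == 0) {
                matched = klen;
                break;
            }
        }
        if (matched > 0) {
            count++;
            pos += matched; /* skip past the match, as a regex "find all" does */
        } else {
            pos++;
        }
    }
    return count;
}
```

Each position can still cost one comparison per keyword. So a single pass over the input is not automatically faster than looping, which is consistent with the benchmark in the question.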
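Yes. Conceptually, an NSString is a sequence of unichars (UTF-16 code units). You could copy them into a buffer with -getCharacters:range: (or try to get a direct pointer with CFStringGetCharactersPtr, which may return NULL) and then have multiple queues search parts of it. You'd have to make sure that you divide on whitespace characters, though, so that you don't miss a word that straddles two ranges.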
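Here is a sketch of the "divide on whitespace" step. It works on a plain C char buffer, which is a simplification, since an NSString's characters are UTF-16 unichars. It only computes the ranges; handing each range to a queue is left out, and the function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Divide text[0, len) into at most nchunks half-open ranges
   [starts[k], ends[k]) whose cut points fall on whitespace, so that no word
   is split between two ranges. Returns the number of ranges written. */
size_t split_on_whitespace(const char *text, size_t len, size_t nchunks,
                           size_t starts[], size_t ends[])
{
    size_t produced = 0;
    size_t start = 0;

    for (size_t k = 0; k < nchunks && start < len; k++) {
        /* Aim for an even share of what is left; the last chunk takes the rest. */
        size_t end = (k == nchunks - 1) ? len : start + (len - start) / (nchunks - k);
        if (end <= start)
            end = start + 1;
        /* Push the cut forward until it lands on whitespace or the end. */
        while (end < len && !isspace((unsigned char)text[end]))
            end++;
        starts[produced] = start;
        ends[produced] = end;
        produced++;
        start = end;
    }
    return produced;
}
```

For example, "good great awesome" split into two chunks gives [0, 10) and [10, 18). The first cut is pushed from the middle of "great" to the following space.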
I want to URL-encode a string. My input string is "ChBdgzQ3qUpNRBEHB+bOXQNjRTQ="
I get the output "ChBdgzQ3qUpNRBEHB%2BbOXQNjRTQ%3D", which is correct except for the letter case of the escapes.
Ideally, it should have been "ChBdgzQ3qUpNRBEHB%2bbOXQNjRTQ%3d" instead of the output I get.
That is, I should have got %2b and %3d instead of %2B and %3D.
Can this be done?
The code I used is as below :
NSString* inputStr = @"ChBdgzQ3qUpNRBEHB+bOXQNjRTQ=";
NSString* outputStr = (NSString *)CFURLCreateStringByAddingPercentEscapes(NULL,
    (CFStringRef)inputStr,
    NULL,
    (CFStringRef)@"!*'\"();:@&=+$,/?%#[]% ",
    CFStringConvertNSStringEncodingToEncoding(encoding));
Another, perhaps more elegant but slower, way would be to loop over your string and convert the characters one by one. Get the length of your string, then for each location from 0 to length-1 take a one-character substring and translate just that substring. If the returned string has a length > 1, then CFURLCreateStringByAddingPercentEscapes encoded the character, and you can safely convert it to lowercase.
In every case you append the returned (and possibly modified) string to a mutable string, and when done you have exactly what you want for any possible string. Even though this might look like a real processor hog, in practice you would probably never notice the extra cycles.
Likewise, a second approach would be to convert your whole string first, then copy it character by character to a mutable string, and whenever you find a "%", turn the next two characters into lowercase. It's just a slightly different way to slice the problem.
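The second approach above (convert first, then lowercase the two characters after each "%") might look like this in plain C. It works on a NUL-terminated char buffer, such as the UTF-8 bytes of the escaped string, and the function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Lowercase the two hex digits that follow each '%' in a percent-encoded
   string, in place. Only characters that really are hex digits are changed,
   so a stray '%' near the end of the string is handled safely. */
void lowercase_percent_escapes(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '%' && isxdigit((unsigned char)s[i + 1]) &&
            isxdigit((unsigned char)s[i + 2])) {
            s[i + 1] = (char)tolower((unsigned char)s[i + 1]);
            s[i + 2] = (char)tolower((unsigned char)s[i + 2]);
            i += 2; /* skip the two digits just handled */
        }
    }
}
```

It never changes the string's length, so it can safely work in place. Applied to "ChBdgzQ3qUpNRBEHB%2BbOXQNjRTQ%3D", it yields "ChBdgzQ3qUpNRBEHB%2bbOXQNjRTQ%3d" and leaves every other letter alone.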
You can use a regular expression to perform the post-processing step:
NSMutableString *finalStr = [outputStr mutableCopy];
NSRegularExpression *re = [[NSRegularExpression alloc] initWithPattern:@"(?<=%)[0-9A-F]{2}" options:0 error:nil];
// Search the unmodified outputStr; each replacement has the same length, so the match ranges stay valid in finalStr
for (NSTextCheckingResult *match in [re matchesInString:outputStr options:0 range:NSMakeRange(0, outputStr.length)]) {
    [finalStr replaceCharactersInRange:match.range withString:[[outputStr substringWithRange:match.range] lowercaseString]];
}
The code uses this regular expression:
(?<=%)[0-9A-F]{2}
It matches two hexadecimal characters, but only if they are preceded by a percent sign. The (?<=%) lookbehind checks for the % without including it in the match. Each match is then iterated and replaced within a mutable string. We don't have to worry about offset changes because the replacement string is always the same length.
I am new to iPhone development. I have a small question about regular expressions. At present I am using the following regular expression in my project:
NSRegularExpression *regularExpression =
    [NSRegularExpression regularExpressionWithPattern:@"href=\"(.*).zip\""
                                              options:NSRegularExpressionCaseInsensitive
                                                error:&error];
It searches the web page's source and gives results in the following pattern:
href="kjv/36_Zep.zip"
href="kjv/37_Hag.zip"
but one of the links in the source looks like this:
href="kjv/38_Zec.zip "
I want to ignore the whitespace after the .zip.
How can I do that? If anybody knows, please help me.
One way is to replace all whitespace with the empty string, or to use a strip/trim function on that string to remove all trailing spaces. Refer to String replacement in Objective-C.
If you don't want to do that, use the whitespace pattern \s in your regular expression to match one or more whitespace characters.
\s includes \n (newline), \r (return), \t (tab), \v (vertical tab), \f (form feed) and space. If you want to match only the space character, use a literal space " " instead.
You can match the examples you provided with the following regex:
@"href=\"(.+)\\.zip\\s*\""
I modified your regex by adding:
1) + (matches 1 or more of the preceding character) to capture the entire name before the .zip,
2) \ before the . so that it matches a literal dot instead of any character (inside an Objective-C string literal, the backslash itself must be written as \\),
3) \s* to match (skip, in your case) zero or more whitespace characters.
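For comparison, here is the same whitespace-tolerant idea written with the POSIX <regex.h> API in plain C. POSIX extended regular expressions have no \s, so [[:space:]]* stands in for \s*. [^\"]+ replaces .+ so that the name cannot run past a closing quote. Both substitutions are mine, and the function name is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the file name (without ".zip") from an href such as
   href="kjv/38_Zec.zip " into out. Returns 1 on a match, 0 otherwise. */
int extract_zip_name(const char *html, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2]; /* m[0] = whole match, m[1] = group 1 (the name) */
    int found = 0;

    /* Regex: href="([^"]+)\.zip[[:space:]]*"  (escaped for a C string literal) */
    if (regcomp(&re, "href=\"([^\"]+)\\.zip[[:space:]]*\"", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, html, 2, m, 0) == 0) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n < outsize) {
            memcpy(out, html + m[1].rm_so, n);
            out[n] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

With this pattern, both href="kjv/36_Zep.zip" and href="kjv/38_Zec.zip " yield just the name part.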
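Suppose you are given NSString *test = @"...href="/functions?q=KEYWORD\x26amp... " and you want to perform actions on this string with NSRegularExpression. You could also trim it first with a simple method call, like this:
NSString *trimmed = [test stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSTextCheckingResult *result = [testRegex firstMatchInString:trimmed options:0 range:NSMakeRange(0, [trimmed length])];
You don't need to change anything in your NSRegularExpression. Note that trimming removes whitespace only from the start and end of the whole string, so this helps only when the stray space is at the very end of the input.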
I commonly use groups to gather the item I want. However, you need to know how groups work.
Unfortunately, you cannot name them (NSTextCheckingResult only gained -rangeWithName: in iOS 11), but think of it this way.
Groups are numbered in the order their opening ( is encountered:
0 is the entire match.
1 is the first set of ().
2 is the second set of (), and so on.
If you have a pattern like this:
NSString *matchString = @"(href)=\"((.*)[.]zip)\"";
you would have 4 groups.
Group 0 is the entire match, group 1 is "href", group 2 is the entire filename and group 3 is the filename without the extension.
Hope that helps.
NSError *error = nil;
NSRegularExpression *regularExpression =
    [NSRegularExpression regularExpressionWithPattern:@"href=\"(.*[.]zip)[^\"]*\""
                                              options:NSRegularExpressionCaseInsensitive
                                                error:&error];
NSMutableArray *foundMatches = [NSMutableArray array];
[regularExpression enumerateMatchesInString:originalString
                                    options:0
                                      range:NSMakeRange(0, [originalString length])
                                 usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // Group 1 holds the file name, including ".zip"
    if (result.numberOfRanges == 2) {
        [foundMatches addObject:[originalString substringWithRange:[result rangeAtIndex:1]]];
    }
}];
The pattern I used here would misbehave if ".zip" appears in a filename that does not actually have the .zip extension.
E.g. href="my.zip.file" would make the code above add "my.zip" to foundMatches, even though the file is not a .zip file.
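To see the group numbering described above in action, here is a plain C sketch that uses POSIX regcomp/regexec with the same (href)=\"((.*)[.]zip)\" pattern. regexec fills one regmatch_t per group, indexed the same way: 0 for the whole match, then by opening parenthesis. The helper name is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy capture group `group` of (href)="((.*)[.]zip)" into out.
   0 = whole match, 1 = (href), 2 = ((.*)[.]zip), 3 = the inner (.*).
   Returns 1 on success, 0 if there is no match or the group is invalid. */
int zip_group(const char *html, int group, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[4];
    int ok = 0;

    if (group < 0 || group > 3)
        return 0;
    if (regcomp(&re, "(href)=\"((.*)[.]zip)\"", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, html, 4, m, 0) == 0 && m[group].rm_so != -1) {
        size_t n = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (n < outsize) {
            memcpy(out, html + m[group].rm_so, n);
            out[n] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}
```

For href="kjv/36_Zep.zip", groups 1, 2 and 3 come out as href, kjv/36_Zep.zip and kjv/36_Zep.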
Hey, just a couple of quick noob questions about writing my first iOS app. I've been searching through the questions here, but they all seem to address questions more advanced than mine, so I'm getting confused.
(1) All I want to do is turn a string into an array of integers representing the ASCII codes. In other words, I want to convert:
"This is some string. It has spaces, punctuation, AND capitals."
into an array with 62 integers.
(2) How do I get back from the NSArray to a string?
(3) Also, are these expensive operations in terms of memory or computation time? It seems like they might be if we have to create a new variable at every iteration or something.
I know how to declare all the variables, and I'm assuming I run a loop through the length of the string and at each iteration somehow get the character and convert it into a number with a call to some built-in method.
Thanks for any help you can offer or links to posts that might help!
1) If you want to store the ASCII values in an NSArray, it is going to be expensive. NSArray can only hold objects, so you're going to have to create an NSNumber for each ASCII value:
NSUInteger len = [string length];
NSMutableArray *arr = [NSMutableArray arrayWithCapacity:len];
for (NSUInteger i = 0; i < len; ++i) {
    [arr addObject:[NSNumber numberWithUnsignedShort:[string characterAtIndex:i]]];
}
2) To go back to an NSString, you'll need to use an NSMutableString and append each character to it.
Having said that, I'd suggest you avoid this method if you can.
A better approach would be to use @EmilioPelaez's answer. Going back from a memory buffer to an NSString is simple and inexpensive compared to iterating and concatenating strings:
NSString *stringFromMemory = [[NSString alloc] initWithCharacters:buffer length:len]; // buffer holds unichars
// for a plain char buffer of ASCII bytes instead: [[NSString alloc] initWithBytes:buffer length:len encoding:NSASCIIStringEncoding]
I ended up using the syntax I found here: How to convert ASCII value to a character in Objective-C? Thanks for the help.
NSString has a method to copy its characters into a buffer:
NSString *string = @"This is some string. It has spaces, punctuation, AND capitals.";
NSUInteger len = [string length];
unichar *buffer = malloc(sizeof(unichar) * len);
[string getCharacters:buffer range:NSMakeRange(0, len)];
// ... use buffer[0] .. buffer[len - 1], then:
free(buffer);
If you check the definition of unichar, it's an unsigned short.
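The cheaper alternative discussed above uses one flat buffer of numbers instead of an NSArray of NSNumber objects. It can be sketched in plain C as below, covering both directions (string to codes and back). The sketch assumes plain ASCII input held in a C string, and the two helper names are hypothetical. With an NSString, you would fill a unichar buffer with -getCharacters:range: instead.

```c
#include <stdlib.h>
#include <string.h>

/* String -> array of character codes. The caller frees the result. */
int *string_to_codes(const char *s, size_t *count)
{
    size_t len = strlen(s);
    /* Allocate at least one element so an empty string still gets a valid pointer. */
    int *codes = malloc((len > 0 ? len : 1) * sizeof *codes);

    if (codes == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        codes[i] = (unsigned char)s[i]; /* the ASCII code of each character */
    *count = len;
    return codes;
}

/* Array of character codes -> NUL-terminated string. The caller frees the result. */
char *codes_to_string(const int *codes, size_t count)
{
    char *s = malloc(count + 1);

    if (s == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        s[i] = (char)codes[i];
    s[count] = '\0';
    return s;
}
```

Each direction is one allocation plus one loop, with no per-character objects. That is why the buffer approach is cheaper than the NSArray of NSNumber objects.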
I have to read a .csv file which has three columns. While parsing the .csv file, I get the string in this format: Christopher Bass,\"Cry the Beloved Country Final Essay\",cbass@cgs.k12.va.us. I want to store the values of the three columns in an array, so I used the componentsSeparatedByString:@"," method. It successfully returns an array with three components:
Christopher Bass
Cry the Beloved Country Final Essay
cbass@cgs.k12.va.us
But when a column value already contains a comma, like this:
Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us
it splits the string into four components, because there is a comma after "Cry":
Christopher Bass
Cry
the Beloved Country Final Essay
cbass@cgs.k12.va.us
So how can I handle this using a regular expression? I have the RegexKitLite classes, but which regular expression should I use? Please help!
Thanks-
Any regular expression would probably run into the same problem. What you need is to sanitize your entries or strings, either by escaping your commas or by quoting strings this way: "My string". Otherwise you will have the same problem. Good luck.
For your example you would probably need to do something like:
\"Christopher Bass\",\"Cry\, the Beloved Country Final Essay\",\"cbass@cgs.k12.va.us\"
That way you could use a regexp or even the same method from the NSString class.
Not entirely related, but on the importance of sanitizing strings: http://xkcd.com/327/ hehehe.
How about this:
componentsSeparatedByRegex:@",\\\"|\\\","
This should split your string wherever " and , appear together in either order, resulting in a three-member array. This of course assumes that the second element in the string is always enclosed in double quotes, and that the characters " and , never appear consecutively within the three components.
If either of these assumptions is incorrect, other methods to identify string components may be used, but it should be made clear that no generic solution exists. If the three component strings can contain " and , anywhere, not even a limited solution is possible in such cases:
Doe, John,\"\"Why Unescaped Strings Suck\", And Other Development Horror Stories\",Doe, John <john.doe@dev.null>
Hopefully there is nothing like the above in your CSV data. If there is, the data is basically unusable, and you should look into a better CSV exporter.
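Instead of a regular expression, a common technique is a small scanner that tracks whether it is inside double quotes. The plain C sketch below treats commas inside quotes as data and strips the quotes. It also turns a doubled quote ("") into one literal quote, which is the escaping convention of RFC 4180. It assumes the backslashes in the question come from the Objective-C string literal, so the file itself contains plain quotes. It does not handle quoted fields that span several lines, and the function name is hypothetical.

```c
#include <stddef.h>

/* Split one CSV line into fields, in place. Commas inside double quotes do
   not end a field, the quotes themselves are removed, and "" inside a quoted
   field becomes one literal quote. fields[] receives pointers into line;
   returns the number of fields found (at most maxfields). */
size_t split_csv_line(char *line, char *fields[], size_t maxfields)
{
    size_t n = 0;
    char *src = line, *dst = line; /* dst never runs ahead of src */
    int in_quotes = 0;

    if (maxfields == 0)
        return 0;
    fields[n++] = dst;
    for (; *src != '\0'; src++) {
        if (*src == '"') {
            if (in_quotes && src[1] == '"') {
                *dst++ = '"';            /* "" inside quotes -> literal quote */
                src++;
            } else {
                in_quotes = !in_quotes;  /* opening or closing quote */
            }
        } else if (*src == ',' && !in_quotes) {
            *dst++ = '\0';               /* end the current field */
            if (n == maxfields)
                return n;
            fields[n++] = dst;
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
    return n;
}
```

On the line from the question, this gives three fields: Christopher Bass, Cry, the Beloved Country Final Essay, and the e-mail address.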
The regex you're searching for is: \\"(.*)\\"[ ^,]*|([^,]*),
In pseudocode: (('\"' && string_1 && '\"' && 0-n spaces) || string_2 except comma) && comma
NSString *str = @"Christopher Bass,\"Cry, the Beloved Country ,Final Essay\",cbass@cgs.k12.va.us,som";
NSString *regEx = @"\\\"(.*)\\\"[ ^,]*|([^,]*),";
NSMutableArray *split = [[str componentsSeparatedByRegex:regEx] mutableCopy];
[split removeObject:@""]; // RegexKitLite always returns both groups, even when one of them is empty
NSLog(@"%@", split);
// OUTPUT:
2012-02-07 17:42:18.778 tmpapp[92170:c03] (
"Christopher Bass",
"Cry, the Beloved Country ,Final Essay",
"cbass@cgs.k12.va.us",
som
)
RegexKitLite adds both capture groups to the array, so you will end up with empty strings in your array. removeObject:@"" deletes those, but if you need to keep genuinely empty values (e.g. your source has val,,ue), you have to modify the code as follows:
str = [str stringByReplacingOccurrencesOfRegex:regEx withString:@"$1$2∏"];
NSArray *split = [str componentsSeparatedByString:@"∏"];
$1 and $2 are those two strings mentioned above, ∏ is in this case a character which will most likely never appear in normal text (and is easy to remember: option-shift-p).
The last part looks like it will never contain a comma. Neither will the first one as far as I can see...
What about splitting the string like this:
NSArray *splitArr = [str componentsSeparatedByString:@","];
NSString *nameStr = [splitArr objectAtIndex:0];
NSString *emailStr = [splitArr lastObject];
// Re-join the middle pieces with the commas that the split removed
NSRange middle = NSMakeRange(1, [splitArr count] - 2);
NSString *contentStr = [[splitArr subarrayWithRange:middle] componentsJoinedByString:@","];
This will use the first and last string as is, and combine the rest into the content.
Kind of a hack, but a name and an email address will never contain a comma, right?
Is the title guaranteed to have the quotation marks? And is it the only component that can have them? If so, componentsSeparatedByString:@"\"" should get you this:
Christopher Bass,
Cry, the Beloved Country Final Essay
,cbass#cgs.k12.va.us
Then use componentsSeparatedByString:@"," or substringFromIndex:/substringToIndex: to get rid of the two commas in the first and last components.
Here's a solution using substring:
NSString* input = @"Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us";
NSArray* split = [input componentsSeparatedByString:@"\""];
NSString* part1 = [split objectAtIndex:0];
NSString* part2 = [split objectAtIndex:1];
NSString* part3 = [split objectAtIndex:2];
part1 = [part1 substringToIndex:[part1 length] - 1]; // drop the trailing comma
part3 = [part3 substringFromIndex:1];                // drop the leading comma
NSLog(@"%@", part1);
NSLog(@"%@", part2);
NSLog(@"%@", part3);
I have a mainString from which I need to get the part of the string after a keyword.
NSString *mainString = @"Hi how are you GET=dsjghdsghghdsjkghdjkhsg";
Now I need to get the string after the keyword "GET=".
Waiting for a reply.
Have a look at the NSString documentation.
Assuming your string really is so totally straightforward, you could do something like this:
NSArray *components = [mainString componentsSeparatedByString: @"GET="];
NSString *stringYouWant = [components objectAtIndex: 1];
Obviously, this performs absolutely no error checking and makes a number of assumptions about the actual contents of mainString, but it should get you started.
Note, also, that the code is somewhat defensive in that it assumes that you are looking for GET= and not separating on =. Either way is a hack in terms of parsing, but... hey... hacks are sometimes the right answer.
You can use a regex via RegexKitLite:
NSString *mainString = @"Hi how are you GET=dsjghdsghghdsjkghdjkhsg";
NSString *matchedString = [mainString stringByMatching:@"GET=(.*)" capture:1L];
// matchedString == @"dsjghdsghghdsjkghdjkhsg";
The regex used, GET=(.*), basically says "Look for GET=, and then grab everything after that". The () specifies a capture group, which are useful for extracting just part of a match. Capture groups begin at 1, with capture group 0 being "the entire match". The part inside the capture group, .*, says "Match any character (the .) zero or more times (the *)".
If the string, in this case mainString, is not matched by the regex, then matchedString will be NULL.
You can get the location of the first occurrence of = and then just take a substring of mainString from just after the = to the end of the string.
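The location-then-substring idea can be sketched in plain C with strstr. Searching for the whole "GET=" key rather than a lone "=" avoids picking up an earlier "=" in the string. In Objective-C, the corresponding NSString calls would be -rangeOfString: and -substringFromIndex:. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the text that follows the first occurrence of key in s,
   or NULL if key does not occur. The result points into s; nothing is copied. */
const char *after_keyword(const char *s, const char *key)
{
    const char *hit = strstr(s, key);

    return hit != NULL ? hit + strlen(key) : NULL;
}
```

For "Hi how are you GET=dsjghdsghghdsjkghdjkhsg", after_keyword(mainString, "GET=") points at "dsjghdsghghdsjkghdjkhsg". Checking for NULL covers the case where the keyword is missing, which the componentsSeparatedByString: version above does not check.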