I have an app that syncs data from a remote DB that users populate. It seems people copy and paste junk from a ton of different OSes and programs, which can cause various hidden non-ASCII characters to be imported into the system.
For example I end up with this:
Artist:â â Ioco
This ends up getting sent back into the system during sync, my JSON conversion compounds the problem, and invalid characters in various places cause my app to crash.
How do I search for and clean out any of these invalid characters?
While I strongly believe that supporting Unicode is the right way to go, here's an example of how you can limit a string to only contain certain characters (in this case ASCII):
NSString *test = @"Olé, señor!";
NSMutableString *asciiCharacters = [NSMutableString string];
for (NSInteger i = 32; i < 127; i++) {
    // Cast so the argument matches the %c specifier
    [asciiCharacters appendFormat:@"%c", (char)i];
}
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];
test = [[test componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];
NSLog(@"%@", test); // Prints "Ol, seor!"
A simpler version of Morten Fast's answer:
NSString *test = @"Olé, señor!";
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet
    characterSetWithRange:NSMakeRange(32, 127 - 32)] invertedSet];
test = [[test componentsSeparatedByCharactersInSet:nonAsciiCharacterSet]
    componentsJoinedByString:@""];
NSLog(@"%@", test); // Prints "Ol, seor!"
Notably, this uses NSCharacterSet's +characterSetWithRange: method to simply specify the desired ASCII range rather than having to create a string, etc.
The results are identical, as comparing one to the other with isEqual: returns YES.
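Both answers can be wrapped in small helpers. The sketch below is an illustration rather than either poster's exact code: `StripToPrintableASCII` keeps only the printable ASCII range 32 to 126 used above, while `StripControlCharacters` is an alternative, assumed here, that keeps Unicode letters and removes only control and format characters (what `NSCharacterSet`'s `controlCharacterSet` contains). Both function names are hypothetical.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Keeps only printable ASCII (code points 32 through 126) and drops everything else.
static NSString *StripToPrintableASCII(NSString *input) {
    NSCharacterSet *printableASCII = [NSCharacterSet characterSetWithRange:NSMakeRange(32, 127 - 32)];
    NSCharacterSet *unwanted = [printableASCII invertedSet];
    return [[input componentsSeparatedByCharactersInSet:unwanted] componentsJoinedByString:@""];
}

// Keeps other Unicode text (accents, Hebrew, Cyrillic, ...) and removes only
// control and format characters, which are often the invisible culprits.
static NSString *StripControlCharacters(NSString *input) {
    NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
    return [[input componentsSeparatedByCharactersInSet:controls] componentsJoinedByString:@""];
}
```

Note that tab (9) and newline (10) are below 32, so `StripToPrintableASCII` removes them too; widen the range if your synced data needs them.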
I'm using the following code to copy a string of text which contains both English and Hebrew characters into UIPasteboard.
UIPasteboard *appPasteBoard = [UIPasteboard generalPasteboard];
appPasteBoard.persistent = YES;
NSString *toCopy = [self.workingDvarTorah description];
[appPasteBoard setValue:toCopy forPasteboardType:(NSString *)kUTTypeUTF8PlainText];
I've implemented my own version of the description method to copy the relevant data; here it is:
- (NSString *)description {
    // Build a string from the tags
    NSMutableString *tags = [[[NSMutableString alloc] init] autorelease];
    BOOL isFirstTag = YES;
    for (Tag *aTag in self.tags) {
        // Add a comma where necessary, but make
        // sure that we're not adding a comma to
        // the beginning of the first tag.
        if (isFirstTag) {
            isFirstTag = NO;
            [tags appendFormat:@" "];
        } else {
            [tags appendFormat:@", "];
        }
        [tags appendFormat:@"%@", aTag.tagText];
    }
    return [NSString stringWithFormat:@"%@ \n\n %@\n\n%@: %@", self.dvarTorahTitle, self.dvarTorahContent, NSLocalizedString(@"Tags", @""), tags];
}
The text copies to the pasteboard, but when I paste it into Notes or Mail, certain characters, namely the dagesh (Unicode character U+05BC), appear as a box instead of the way they should. I've tried all of the text UTI types.
Am I doing something wrong? Is this a bug in iOS or the Notes app?
What can I do, short of stripping the offending characters, to correct the problem?
According to Wikipedia, there are two representations of Hebrew characters with dagesh: a precomposed ("combined") character, and an alternate representation made of two characters, the base letter followed by the combining dagesh point. The program which created my initial data file may have used the combined characters. When the data was exported again without the combined characters, the copy-paste worked.
So it looks like certain characters are unsupported by Notes and Mail on iOS.
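The answer describes the two representations but shows no code. As a hedged sketch (an assumption, not what the poster did), Foundation's `decomposedStringWithCanonicalMapping` rewrites precomposed Hebrew presentation forms as base letter plus combining mark, and a small helper can log the code units to see which form your data contains. `DecomposeHebrew` and `LogCodeUnits` are hypothetical names, and nothing here guarantees that Notes or Mail will render the result.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Canonical decomposition (NFD): a precomposed presentation form such as
// U+FB31 HEBREW LETTER BET WITH DAGESH becomes U+05D1 (bet) + U+05BC (dagesh).
static NSString *DecomposeHebrew(NSString *text) {
    return [text decomposedStringWithCanonicalMapping];
}

// Logs every UTF-16 code unit so you can see which representation the data uses.
static void LogCodeUnits(NSString *text) {
    for (NSUInteger i = 0; i < [text length]; i++) {
        NSLog(@"U+%04X", (unsigned int)[text characterAtIndex:i]);
    }
}
```

You could then put `DecomposeHebrew(toCopy)` on the pasteboard instead of `toCopy`, and use `LogCodeUnits` on the original data to confirm which form it contained.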
>&nbsp;(2009&nbsp;RX7)</font></td>
>monospace" size="-1">214869&nbsp;(2007&nbsp;PAZ)</font></td>
>monospace" size="-1">&nbsp;4155&nbsp;Accord</font></td>
I wonder if someone could offer me a little help. I have a list of NSString items (see above) that I want to parse some data from. My problem is that there are no tags I can use within the strings, and the items I want don't have fixed positions. The data I want to extract is:
2009 RX7
2007 PAZ
4155 Accord
My thinking is that it's going to be easier to parse from the right-hand end: remove the </font></td> and then use ";" to separate the data items:
(2009  RX7)
(2007  PAZ)
4155  Accord
which can then be cleaned up to match the example given. Any pointers on doing this, or on working through from the right, would be very much appreciated.
Personally I think you are better off with a regex. So my solution would be:
Regex of: ([0-9]+)[^;]+;([A-Za-z0-9]+)
Which, for all the example text, provides 3 matches, i.e. for:
(2009&nbsp;RX7)</font></td>
0: 2009&nbsp;RX7
1: 2009
2: RX7
I haven't coded this up, but did test the Regex at www.regextester.com
Regular expressions are available through NSRegularExpression in iOS 4.0 and later.
Edit
Given that this appears to be a web scraping application, you never know when those pesky HTML code monkeys will change their output and break your carefully crafted matching methodology. As such I would change my regex to:
([0-9]+)([^;]+;)+([A-Za-z0-9]+)
Which adds an extra group, but allows for any number of elements between the number and the string.
Try this code:
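NSString *str = @">&nbsp;(2009&nbsp;RX7)</font></td>";
// Search from the right: find "</font>", then the two semicolons before it.
// (Assumes the string contains "</font>" preceded by at least two ';'.)
NSRange fontRange = [str rangeOfString:@"</font>" options:NSBackwardsSearch | NSCaseInsensitiveSearch];
NSRange lastSemi = [str rangeOfString:@";" options:NSBackwardsSearch range:NSMakeRange(0, fontRange.location)];
NSRange priorSemi = [str rangeOfString:@";" options:NSBackwardsSearch range:NSMakeRange(0, lastSemi.location)];
NSUInteger start = priorSemi.location + 1;
NSString *yourString = [str substringWithRange:NSMakeRange(start, fontRange.location - start)]; // @"(2009&nbsp;RX7)"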
The key element here is the NSBackwardsSearch search option.
This should do the trick:
NSString *s = @">monospace\" size=\"-1\">&nbsp;4155&nbsp;Accord</font></td>";
NSArray *strArray = [s componentsSeparatedByString:@";"];
// you're interested in the last two objects
NSArray *tmp = [strArray subarrayWithRange:NSMakeRange(strArray.count - 2, 2)];
In tmp you'll have something like:
"4155&nbsp",
"Accord</font></td>"
Strip the unneeded characters and you're all set.
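The "strip the unneeded characters" step can be sketched as below. This assumes the rows use `&nbsp;` entities as separators (which is where the semicolons come from) and end in `</font></td>`; `LastTwoFields` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Splits a row on ';', keeps the last two pieces, and removes the leftover
// "&nbsp", parentheses and spaces. Returns nil if the row has no ';'.
static NSString *LastTwoFields(NSString *row) {
    NSString *suffix = @"</font></td>";
    if ([row hasSuffix:suffix]) {
        row = [row substringToIndex:[row length] - [suffix length]];
    }
    NSArray *parts = [row componentsSeparatedByString:@";"];
    if ([parts count] < 2) {
        return nil;
    }
    NSArray *lastTwo = [parts subarrayWithRange:NSMakeRange([parts count] - 2, 2)];
    NSCharacterSet *junk = [NSCharacterSet characterSetWithCharactersInString:@"() "];
    NSMutableArray *cleaned = [NSMutableArray array];
    for (NSString *part in lastTwo) {
        NSString *field = [part stringByReplacingOccurrencesOfString:@"&nbsp" withString:@""];
        [cleaned addObject:[field stringByTrimmingCharactersInSet:junk]];
    }
    return [cleaned componentsJoinedByString:@" "];
}
```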
Using NSRegularExpression:
NSRegularExpression *regex;
NSTextCheckingResult *match;
NSString *pattern = @"([0-9]+) ([A-Za-z0-9]+)[)]?</font></td>";
NSString *string = @"> (2009 RX7)</font></td>";
regex = [NSRegularExpression
    regularExpressionWithPattern:pattern
    options:NSRegularExpressionCaseInsensitive
    error:nil];
match = [regex firstMatchInString:string options:0 range:NSMakeRange(0, [string length])];
NSLog(@"'%@'", [string substringWithRange:[match rangeAtIndex:1]]);
NSLog(@"'%@'", [string substringWithRange:[match rangeAtIndex:2]]);
NSLog output:
'2009'
'RX7'
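To run the same idea over all three rows from the question, the pattern can be anchored at the end of the row and allowed to accept either a plain space or an `&nbsp;` entity as the separator. This is a variation on the answer above, not its exact pattern, and `ExtractNumberAndName` is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Returns e.g. @"2009 RX7" for a row ending in "...2009&nbsp;RX7)</font></td>",
// or nil when the row does not match.
static NSString *ExtractNumberAndName(NSString *row) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"([0-9]+)(?:&nbsp;|\\s)+([A-Za-z0-9]+)\\)?</font></td>\\s*$"
                             options:NSRegularExpressionCaseInsensitive
                               error:&error];
    if (regex == nil) {
        return nil;
    }
    NSTextCheckingResult *match = [regex firstMatchInString:row options:0 range:NSMakeRange(0, [row length])];
    if (match == nil) {
        return nil;
    }
    NSString *number = [row substringWithRange:[match rangeAtIndex:1]];
    NSString *name = [row substringWithRange:[match rangeAtIndex:2]];
    return [NSString stringWithFormat:@"%@ %@", number, name];
}
```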
I have a txt file with some URLs like this
http://url1.com
http://url1.com
http://url1.com
They are separated by line breaks. How could I add each one as a separate entry in an NSMutableArray? Thanks :)
Try something like this:
// txtFile is an NSString holding the contents of the text file
NSMutableArray *txtLines = [NSMutableArray array];
[txtFile enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
    if ([line length] > 0) {
        [txtLines addObject:line];
    }
}];
Update
@Evan is right: the above only works if blocks are available on your platform. A compiler directive around that code should take care of this limitation, e.g.:
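#if NS_BLOCKS_AVAILABLE
// iOS 4.0+ solution
[txtFile enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
    if ([line length] > 0) [txtLines addObject:line];
}];
#else
// iOS 2.0+ solution
[txtLines addObjectsFromArray:[txtFile componentsSeparatedByString:@"\n"]];
#endif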
NSString *myListString = /* load / download file */
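// componentsSeparatedByString: returns an immutable NSArray, so take a mutable copy
NSMutableArray *myList = [[myListString componentsSeparatedByString:@"\n"] mutableCopy];
You may have to split on <br/> instead if it's HTML.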
@octy's solution is only available in iOS 4.0 or later. This solution works in iOS 2.0 or later. You can check the iOS version at runtime and choose which one to use:
BOOL useEnumeratedLineParsing = NO;
NSString *reqSysVer = @"4.0";
NSString *currSysVer = [[UIDevice currentDevice] systemVersion];
if ([currSysVer compare:reqSysVer options:NSNumericSearch] != NSOrderedAscending)
    useEnumeratedLineParsing = YES;
Then check the value of useEnumeratedLineParsing.
NSString *textFilePath = [[NSBundle mainBundle] pathForResource:@"urls" ofType:@"txt"];
NSString *fileContentsUrls = [NSString stringWithContentsOfFile:textFilePath encoding:NSUTF8StringEncoding error:nil];
NSArray *myArray = [fileContentsUrls componentsSeparatedByString:@"\n"];
As long as it isn't mega-large, you could read the whole file into an NSString.
NSString *text = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
Then split the lines:
NSArray *lines = [text componentsSeparatedByString:@"\n"];
And make it mutable:
NSMutableArray *mutableLines = [lines mutableCopy];
Now, depending on where your text file comes from, you probably need to be more careful. It could be separated by \r\n instead of just \n, in which case your lines will contain extra \r characters. You could clean this up afterwards by trimming whitespace from each line (your file might also have blank lines, which the above will turn into empty strings).
On the other hand, if you're in control of the file, you won't have to worry about that. (But in that case, why not read a plist instead of parsing a plain text file?)
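Putting those caveats together, a sketch that tolerates both \n and \r\n line endings, trims stray whitespace, and skips blank lines could look like this. `URLLinesFromText` is a hypothetical name; it splits on `newlineCharacterSet` rather than on a single "\n".

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Splits text on any newline character (\n, \r, ...), trims each line and
// drops empty ones, so \r\n files and trailing blank lines are handled.
static NSMutableArray *URLLinesFromText(NSString *text) {
    NSArray *rawLines = [text componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSMutableArray *lines = [NSMutableArray array];
    for (NSString *rawLine in rawLines) {
        NSString *line = [rawLine stringByTrimmingCharactersInSet:whitespace];
        if ([line length] > 0) {
            [lines addObject:line];
        }
    }
    return lines;
}
```

Load the file first, e.g. with `stringWithContentsOfFile:encoding:error:` as above, then pass the resulting string in.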
When I get a string of the form \u043F\u043F (Unicode escapes), how do I convert it to a readable NSString? Here is my code (which fails when these are non-English characters):
- (void)connectionDidFinishLoading:(NSURLConnection *)connection{
NSString *theStr = [[NSString alloc] initWithBytes:[receivedData bytes]
length:[receivedData length] encoding: NSUTF8StringEncoding];
NSLog(@"%@", theStr);
}
When the string contains English characters everything is fine, but when it contains Unicode escapes I don't get a readable string; it stays in the escaped \uXXXX form.
What do you think?
EDIT:
I realized I didn't give enough info on what I'm trying to do. I am trying to use YouTube's way of getting auto-suggested keywords when you use the search box (nothing official, I just used a sniffer to find out). Here it is:
http://suggestqueries.google.com/complete/search?hl=en&client=youtube&hjson=t&ds=yt&jsonp=window.yt.www.suggest.handleResponse&q=*******&cp=******
q is your query and cp is the length of q.
So basically when q is something in English it works fine. But when q has non-English characters (Russian, for example) this is what I get (from NSLog):
window.yt.www.suggest.handleResponse(["\u043F\u0440",[["\u043F\u0440\u0438\u043A\u043E\u043B\u044B","","0"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D","","1"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 87","","2"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 88","","3"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 86","","4"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 85","","5"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 89","","6"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 84","","7"],["\u043F\u0440\u0438\u043A\u043E\u043B\u044B \u0432 \u043F\u0440\u044F\u043C\u043E\u043C \u044D\u0444\u0438\u0440\u0435","","8"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 90","","9"]],{}])
You can use NSString's UTF8String method, declared as:
@interface NSString
- (__strong const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
@end
I think this may help:
// The server sends literal "\u" escapes, so the backslashes must be escaped in source.
NSString *yourString = @"\\u043F\\u0440\\u0438\\u043A\\u043E\\u043B\\u044B";
NSArray *unicodeArray = [yourString componentsSeparatedByString:@"\\u"];
NSMutableString *finalString = [NSMutableString string];
for (NSString *unicodeString in unicodeArray) {
    if ([unicodeString length] > 0) {
        // scanHexInt: writes an unsigned int, so don't pass a unichar pointer
        unsigned int codeValue = 0;
        [[NSScanner scannerWithString:unicodeString] scanHexInt:&codeValue];
        unichar character = (unichar)codeValue;
        [finalString appendString:[NSString stringWithCharacters:&character length:1]];
    }
}
// finalString now holds the decoded text (assumes the input contains only \uXXXX escapes)
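Since the response is JSONP, another option (swapped in here, not taken from the answers above) is to strip the `window.yt.www.suggest.handleResponse(...)` wrapper and hand the rest to `NSJSONSerialization`, which decodes `\uXXXX` escapes itself. `NSJSONSerialization` requires iOS 5.0 or later, and `ParseJSONP` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Strips a JSONP wrapper such as "callback(...)" and parses the JSON inside.
// Returns nil if the wrapper is missing or the JSON is invalid.
static id ParseJSONP(NSString *jsonp) {
    NSRange open = [jsonp rangeOfString:@"("];
    NSRange close = [jsonp rangeOfString:@")" options:NSBackwardsSearch];
    if (open.location == NSNotFound || close.location == NSNotFound || close.location <= open.location) {
        return nil;
    }
    NSRange inner = NSMakeRange(open.location + 1, close.location - open.location - 1);
    NSData *data = [[jsonp substringWithRange:inner] dataUsingEncoding:NSUTF8StringEncoding];
    // NSJSONSerialization turns \u043F etc. into the actual characters.
    return [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];
}
```

In `connectionDidFinishLoading:` you would pass `theStr` to it and read the suggestions from the nested arrays.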
When using NSString's enumerateSubstringsInRange:options:usingBlock: with the option NSStringEnumerationByWords, it doesn't include symbols such as /* or //, which should be treated like words since they are separated by spaces.
I also tried using NSStringEnumerationByComposedCharacterSequences, but that simply goes through every single character.
Is there no way to enumerate through every substring separated by a space? It sounds so simple, but enumerateSubstringsInRange:options:usingBlock: doesn't seem to provide a way to do it.
EDIT
I was also using the option NSEnumerationReverse to go through the substrings backwards.
You could use NSScanner for something like this. It's sort of the long way around, but if the enumerate... messages aren't doing it for you, it might be worth looking at.
For example, you could do something like
NSString *output = nil;
NSCharacterSet *whitespaceCharSet = [NSCharacterSet whitespaceCharacterSet];
NSScanner *scanner = [[NSScanner alloc] initWithString:someString];
// should skip leading whitespace and read everything up to the next whitespace
[scanner scanUpToCharactersFromSet:whitespaceCharSet intoString:&output];
[scanner release];
Sort of a crude example, but the documentation for NSScanner is fairly simple.
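Extending that `NSScanner` example into a loop gives every whitespace-separated token, including `/*` and `//`; reversing the resulting array covers the backwards case. This is a sketch, and `WhitespaceSeparatedTokens` is a hypothetical name.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Returns every run of non-whitespace characters, in order.
static NSArray *WhitespaceSeparatedTokens(NSString *text) {
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    // Leading whitespace is skipped before each scan (made explicit here).
    [scanner setCharactersToBeSkipped:separators];
    NSMutableArray *tokens = [NSMutableArray array];
    while (![scanner isAtEnd]) {
        NSString *token = nil;
        if (![scanner scanUpToCharactersFromSet:separators intoString:&token]) {
            break; // defensive: nothing could be scanned
        }
        [tokens addObject:token];
    }
    return tokens;
}
```

For the reverse order, iterate over `[[tokens reverseObjectEnumerator] allObjects]`.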
Edit: Alternatively, you could do something like this:
NSString *someString = <...>; // get your string somehow
NSCharacterSet *charSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
NSArray *components = [someString componentsSeparatedByCharactersInSet:charSet];
[components
    enumerateObjectsWithOptions:NSEnumerationReverse
    usingBlock:^(id obj, NSUInteger index, BOOL *stop) {
        // consecutive whitespace produces empty strings; skip them
        if ([obj length] == 0) return;
        // do stuff
    }];