Replace unicode value in string


I have a string @"\EOP". I want to display it to the user, but when I display this string in a text field, it shows only OP. When I print it to the console while debugging, it shows ¿OP.
So \E is a Unicode value, and that's why there is an encoding issue. I can fix this issue with:
str = [str stringByReplacingOccurrencesOfString:@"\E" withString:@"\\E"];
With this, it displays the string \EOP correctly.
My issue is that there can be many more characters like \E, for example \u. How can I implement one fix for all of these characters?

\E in the string @"\EOP" is the character with the ASCII (and Unicode) code 27, the ESC control character
(\E and \e are GCC/Clang extensions; the portable spelling is \033).
I don't know of a built-in method to escape all control characters in a string.
The following code uses NSScanner to locate the control characters and replaces them
using a lookup table. Control characters are replaced by "character escape codes"
such as "\r" or "\n" where possible, and otherwise by "\x" followed by the hex code.
NSString *str = @"\EOP";
NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
// Escape codes indexed by character value; NULL means "use \x<hexcode>".
static char *replacements[] = {
    "\\0", NULL, NULL, NULL, NULL, NULL, NULL, "\\a",
    "\\b", "\\t", "\\n", "\\v", "\\f", "\\r", NULL, NULL,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, "\\e"};
NSScanner *scanner = [NSScanner scannerWithString:str];
[scanner setCharactersToBeSkipped:nil];
NSMutableString *result = [NSMutableString string];
while (![scanner isAtEnd]) {
    NSString *tmp;
    // Copy all non-control characters verbatim:
    if ([scanner scanUpToCharactersFromSet:controls intoString:&tmp]) {
        [result appendString:tmp];
    }
    if ([scanner isAtEnd])
        break;
    // Escape all control characters:
    if ([scanner scanCharactersFromSet:controls intoString:&tmp]) {
        for (NSUInteger i = 0; i < [tmp length]; i++) {
            unichar c = [tmp characterAtIndex:i];
            char *r;
            if (c < sizeof(replacements)/sizeof(replacements[0])
                && (r = replacements[c]) != NULL) {
                // Replace by well-known character escape code:
                [result appendString:@(r)];
            } else {
                // Replace by \x<hexcode>:
                [result appendFormat:@"\\x%02x", c];
            }
        }
    }
}
NSLog(@"%@", result);
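For comparison, the same table-driven idea can be sketched in plain C (which also compiles as Objective-C) for NUL-terminated byte strings. `escape_controls` is a hypothetical helper, not a Foundation API. It only handles the ASCII control characters 0x01-0x1F and 0x7F, so unlike [NSCharacterSet controlCharacterSet] it leaves UTF-8 sequences such as U+0085 untouched.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Escape ASCII control characters in a NUL-terminated string: well-known
   codes such as \t or \e where they exist, \xHH otherwise. Other bytes
   (including UTF-8 sequences) are copied unchanged. The caller must free()
   the returned string; returns NULL if allocation fails. */
char *escape_controls(const char *s) {
    /* Index = character code; NULL means "fall back to \xHH".
       Code 0 never occurs because it terminates the string. */
    static const char *const names[32] = {
        [7] = "\\a", [8] = "\\b", [9] = "\\t", [10] = "\\n",
        [11] = "\\v", [12] = "\\f", [13] = "\\r", [27] = "\\e",
    };
    size_t len = strlen(s);
    char *out = malloc(4 * len + 1);      /* worst case: every byte -> \xHH */
    if (out == NULL) return NULL;
    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 32 && names[c] != NULL) {
            p += sprintf(p, "%s", names[c]);
        } else if (c < 32 || c == 0x7F) {
            p += sprintf(p, "\\x%02x", c);
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    return out;
}
```

Called as escape_controls("\033OP") (using the portable octal escape for ESC), it returns the six characters \eOP; the returned buffer belongs to the caller.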

You can always replace \ with \\. These are called escape sequences.
Sample code:
NSString *str = @"\EOP";
NSString *myNewStr = [str stringByReplacingOccurrencesOfString:@"\\" withString:@"\\\\"];
NSLog(@"myNewStr :: %@", myNewStr);

If you want a backslash (\) to appear in a string, you need to escape it in the string literal, i.e.
NSString *foo = @"\\EOP";
The above gives you the Unicode sequence 5C 45 4F 50, which is what you want.
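To check that byte sequence, a small C helper (the name `hex_bytes` is hypothetical) can dump the bytes of a literal. It contrasts the escaped backslash with the octal escape \033, which is the same ESC character that \E produces in GCC and Clang.

```c
#include <stdio.h>
#include <string.h>

/* Write the bytes of s as space-separated uppercase hex into out.
   out must have room for at least 3 * strlen(s) + 1 bytes. */
void hex_bytes(const char *s, char *out) {
    size_t n = strlen(s);
    char *p = out;
    for (size_t i = 0; i < n; i++) {
        /* Separator before every byte except the first */
        p += sprintf(p, "%s%02X", i > 0 ? " " : "", (unsigned char)s[i]);
    }
    *p = '\0';
}
```

hex_bytes("\\EOP", buf) yields "5C 45 4F 50", while hex_bytes("\033OP", buf) yields "1B 4F 50": one ESC byte instead of a backslash followed by E.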

Related

iPhone iOS how to programmatically check for characters like ✭ ♦ (and others from this set)?

I know that both the ✭ star and the ♦ diamond are from the ASCII extended character set. But is there some NSCharacterSet available on iOS that I can use to check for characters like these programmatically?

You can display these symbols in HTML with character entity references, using either the ISO 8859-1 or the UTF-8 charset:
[ ★ ] solid star [number: &#9733;]
[ ♦ ] black diamond suit [name: &diams;] [number: &#9830;]

This can be done with the NSString method rangeOfCharacterFromSet:. It returns an NSRange; if its location is NSNotFound, none of the characters in the set occur in the string.
For example:
NSCharacterSet *charSet = [[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet];
NSString *string = @"Stack✭Overflow";
NSRange range = [string rangeOfCharacterFromSet:charSet];
if (range.location != NSNotFound) {
    // At least one character from charSet exists in the string.
} else {
    // None of the specified characters were found.
}
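If "characters like these" is taken to mean the Unicode Miscellaneous Symbols (U+2600-U+26FF) and Dingbats (U+2700-U+27BF) blocks, where ♦ (U+2666) and ✭ (U+272D) live, the check can be sketched in plain C by decoding UTF-8 and testing code point ranges. The block choice is an assumption, and `contains_symbol` and `utf8_decode` are hypothetical names. In Foundation, the same set could be built with +[NSCharacterSet characterSetWithRange:] and NSMakeRange(0x2600, 0x1C0), then passed to rangeOfCharacterFromSet: as above.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp and return the bytes consumed.
   Invalid or truncated sequences yield U+FFFD and consume one byte, so the
   caller never reads past the terminating NUL. Overlong forms are not rejected. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp) {
    size_t len;
    if (s[0] < 0x80)                { *cp = s[0];        return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else                            { *cp = 0xFFFD;      return 1; }
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Return 1 if the UTF-8 string contains a code point in U+2600..U+27BF
   (Miscellaneous Symbols plus Dingbats), otherwise 0. */
int contains_symbol(const char *utf8) {
    const unsigned char *p = (const unsigned char *)utf8;
    while (*p) {
        unsigned long cp;
        p += utf8_decode(p, &cp);
        if (cp >= 0x2600 && cp <= 0x27BF) return 1;
    }
    return 0;
}
```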

Trim string from END

I want to trim a string from the end. Is there any API function on NSString that ONLY removes spaces and newlines from the END of a string?
I wrote manual code that searches backwards from the end of the string and removes spaces and newlines, but it may slow the process down.
An API function is needed.
Thanks in advance

Try this one; maybe it helps you:
-(NSString *)removeEndSpaceFrom:(NSString *)strtoremove {
    // Spaces, tabs and newlines are all trimmed from the end
    NSCharacterSet *trimSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSUInteger end = [strtoremove length];
    // Walk backwards past trailing whitespace/newline characters
    while (end > 0 && [trimSet characterIsMember:[strtoremove characterAtIndex:end - 1]]) {
        end--;
    }
    // Keep everything before the trailing whitespace (leading whitespace is untouched)
    return [strtoremove substringToIndex:end];
}
and then:
NSString *string = @" this text has spaces before and after ";
NSString *trimmedString = [self removeEndSpaceFrom:string];
NSLog(@"%@", trimmedString);
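The same backward scan can be sketched in plain C for NUL-terminated ASCII strings; `trim_end` is a hypothetical name. It assumes isspace() in the C locale, which treats space, \t, \n, \v, \f and \r as whitespace. Because it only writes a new terminator, no copy is made.

```c
#include <ctype.h>
#include <string.h>

/* Remove trailing whitespace (per isspace) from s in place and return s.
   Leading whitespace is left alone; multibyte characters are not examined. */
char *trim_end(char *s) {
    size_t end = strlen(s);
    /* Cast to unsigned char: isspace() is undefined for negative values */
    while (end > 0 && isspace((unsigned char)s[end - 1])) {
        end--;
    }
    s[end] = '\0';
    return s;
}
```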

NSCharacterSet uses ints but I need unsigned short?

I am using MWFeedParser to add a feed to my app. The framework parses dates, and it has a few warnings, mainly due to older code.
There are 4 warnings left, which are all the same. Technically I can fix them so that the warnings go away, but then the app does not work properly.
The code concerned is:
// Character sets
NSCharacterSet *stopCharacters = [NSCharacterSet characterSetWithCharactersInString:[NSString stringWithFormat:@"< \t\n\r%C%C%C%C", 0x0085, 0x000C, 0x2028, 0x2029]];
The part that triggers the warning is:
\t\n\r%C%C%C%C", 0x0085, 0x000C, 0x2028, 0x2029]];
The warning is:
Format specifies type 'unsigned short' but the argument has type 'int'
So I changed it to:
\t\n\r%i%i%i%i", 0x0085, 0x000C, 0x2028, 0x2029]];
which indeed removed the warnings and gave me clean code :-) (no warnings or errors).
But when I ran the app, it did not parse the date and was not able to open the link.
I am not sure if this is a C thing, but right now it is definitely outside my field of knowledge. Can anyone help me fix this problem and still have it working in the app?
Thank you in advance :-)
EDIT
- (NSString *)stringByConvertingHTMLToPlainText {
    // Pool
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // Character sets
    NSCharacterSet *stopCharacters = [NSCharacterSet characterSetWithCharactersInString:@"< \t\n\r\x0085\x000C\u2028\u2029"];
    NSCharacterSet *newLineAndWhitespaceCharacters = [NSCharacterSet characterSetWithCharactersInString:@"< \t\n\r\205\014\u2028\u2029"];
    NSCharacterSet *tagNameCharacters = [NSCharacterSet characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    // Scan and find all tags
    NSMutableString *result = [[NSMutableString alloc] initWithCapacity:self.length];
    NSScanner *scanner = [[NSScanner alloc] initWithString:self];
    [scanner setCharactersToBeSkipped:nil];
    [scanner setCaseSensitive:YES];
    NSString *str = nil, *tagName = nil;
    BOOL dontReplaceTagWithSpace = NO;
    do {
        // Scan up to the start of a tag or whitespace
        if ([scanner scanUpToCharactersFromSet:stopCharacters intoString:&str]) {
            [result appendString:str];
            str = nil; // reset
        }
        // Check if we've stopped at a tag/comment or whitespace
        if ([scanner scanString:@"<" intoString:NULL]) {
            // Stopped at a comment or tag
            if ([scanner scanString:@"!--" intoString:NULL]) {
                // Comment
                [scanner scanUpToString:@"-->" intoString:NULL];
                [scanner scanString:@"-->" intoString:NULL];
            } else {
                // Tag - remove and replace with space unless it's
                // a closing inline tag then dont replace with a space
                if ([scanner scanString:@"/" intoString:NULL]) {
                    // Closing tag - replace with space unless it's inline
                    tagName = nil; dontReplaceTagWithSpace = NO;
                    if ([scanner scanCharactersFromSet:tagNameCharacters intoString:&tagName]) {
                        tagName = [tagName lowercaseString];
                        dontReplaceTagWithSpace = ([tagName isEqualToString:@"a"] ||
                                                   [tagName isEqualToString:@"b"] ||
                                                   [tagName isEqualToString:@"i"] ||
                                                   [tagName isEqualToString:@"q"] ||
                                                   [tagName isEqualToString:@"span"] ||
                                                   [tagName isEqualToString:@"em"] ||
                                                   [tagName isEqualToString:@"strong"] ||
                                                   [tagName isEqualToString:@"cite"] ||
                                                   [tagName isEqualToString:@"abbr"] ||
                                                   [tagName isEqualToString:@"acronym"] ||
                                                   [tagName isEqualToString:@"label"]);
                    }
                    // Replace tag with string unless it was an inline
                    if (!dontReplaceTagWithSpace && result.length > 0 && ![scanner isAtEnd]) [result appendString:@" "];
                }
                // Scan past tag
                [scanner scanUpToString:@">" intoString:NULL];
                [scanner scanString:@">" intoString:NULL];
            }
        } else {
            // Stopped at whitespace - replace all whitespace and newlines with a space
            if ([scanner scanCharactersFromSet:newLineAndWhitespaceCharacters intoString:NULL]) {
                if (result.length > 0 && ![scanner isAtEnd]) [result appendString:@" "]; // Dont append space to beginning or end of result
            }
        }
    } while (![scanner isAtEnd]);
    // Cleanup
    [scanner release];
    // Decode HTML entities and return
    NSString *retString = [[result stringByDecodingHTMLEntities] retain];
    [result release];
    // Drain
    [pool drain];
    // Return
    return [retString autorelease];
}

This is a total mess
The reason this is a total mess is because you are running into a compiler bug and an arbitrary limitation in the C spec.
Scroll to the bottom for the fix.
Compiler warning
Format specifies type 'unsigned short' but the argument has type 'int'
My conclusion is that this is a compiler bug in Clang. It is definitely safe to ignore this warning, because (unsigned short) arguments are always promoted to (int) before they are passed to vararg functions anyway. This is all stuff that is in the C standard (and it applies to Objective C, too).
printf("%hd", 1); // Clang generates warning. GCC does not.
// Clang is wrong, GCC is right.
printf("%hd", 1 << 16); // Clang generates warning. GCC does not.
// Clang is right, GCC is wrong.
The problem here is that neither compiler looks deep enough.
Remember, it is actually impossible to pass a short to printf(), because it must get promoted to int. GCC never gives a warning for constants, Clang ignores the fact that you are passing a constant and always gives a warning because the type is wrong. Both options are wrong.
I suspect nobody has noticed because -- why would you be passing a constant expression to printf() anyway?
In the short term, you can use the following hack:
#pragma GCC diagnostic ignored "-Wformat"
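The promotion rule this answer relies on can be seen in a minimal C sketch (C code compiles unchanged as Objective-C); `sum_shorts` is a hypothetical function. Whatever the format string says, a variadic callee such as printf() or stringWithFormat: receives each unsigned short argument as an int, which is why the callee must read it back as an int.

```c
#include <stdarg.h>

/* Add up `count` variadic arguments that the caller passes as unsigned short.
   Assumes int is wider than unsigned short (true on iOS, macOS and common
   desktop platforms), so each argument is promoted to int at the call site
   and must be read back with va_arg(ap, int); va_arg(ap, unsigned short)
   would be undefined behavior. */
int sum_shorts(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

For example, sum_shorts(2, (unsigned short)0x0085, (unsigned short)0x2028) returns 0x20AD.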
Universal character names
You can use \uXXXX notation. Except you can't, because the compiler won't let you use U+0085 this way. Why? See § 6.4.3 of C99:
A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.
This rules out \u0085.
There is a proposal to fix this part of the spec.
The fix
You really want a constant string, don't you? Use this:
[NSCharacterSet characterSetWithCharactersInString:
    @"\t\n\r\xc2\x85\x0c\u2028\u2029"]
This relies on the fact that the source encoding is UTF-8. Don't worry, that's not going to change any time soon.
The \xc2\x85 in the string is the UTF-8 encoding of U+0085. The appearance of 85 in both is a coincidence.
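The UTF-8 bytes in the fix can be checked with a minimal C encoder; `utf8_encode` is a hypothetical name, covering the standard 1 to 4 byte forms and rejecting surrogates. It confirms that U+0085 encodes as C2 85 and U+2028 as E2 80 A8. For every code point from U+0080 to U+00BF the second byte equals the code point itself, which is where the repeated 85 comes from.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates (U+D800..U+DFFF)
   and values above U+10FFFF, which have no UTF-8 encoding. */
size_t utf8_encode(unsigned long cp, unsigned char out[4]) {
    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```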

The problem is that 0x0085, etc. are int literals, so they don't match the %C format specifier, which expects a unichar (an unsigned short).
There's no direct way to write a short literal in C, and I'm not aware of any Objective-C extension for it. But you can use a brute-force approach:
NSCharacterSet *stopCharacters =
    [NSCharacterSet characterSetWithCharactersInString:
        [NSString stringWithFormat:@"< \t\n\r%C%C%C%C",
            (unichar)0x0085, (unichar)0x000C,
            (unichar)0x2028, (unichar)0x2029]];

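You don't need stringWithFormat:; you can embed Unicode characters directly in a string literal using the \u escape, for example \u2028. (This does not work for code points below U+00A0 such as U+0085, which C99 § 6.4.3 does not allow in a universal character name.)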

Apply regular expression repeatedly

I've got text expressions like this:
HIUPA:bla1bla1'HIUPD:bla2bla2'HIUPD:bla3bla3'HISYN:bla4bla4'
I want to extract the following text pieces:
HIUPD:bla2bla2'
And
HIUPD:bla3bla3'
My Objective-C code for this looks like this:
-(void) ermittleKonten:(NSString*) bankNachricht
{
    NSRegularExpression* regexp;
    NSTextCheckingResult* tcr;
    regexp = [NSRegularExpression regularExpressionWithPattern:@"HIUPD.*'" options:0 error:nil];
    int numAccounts = [regexp numberOfMatchesInString:bankNachricht options:0 range:NSMakeRange(0, [bankNachricht length])];
    for( int i = 0; i < numAccounts; ++i ) {
        tcr = [regexp firstMatchInString:bankNachricht options:0 range:NSMakeRange( 0, [bankNachricht length] )];
        NSString* HIUPD = [bankNachricht substringWithRange:tcr.range];
        NSLog(@"Found text is:\n%@", HIUPD);
    }
}
In the Objective-C code numAccounts is 1, but should be 2. And the string that is found is "HIUPD:bla2bla2'HIUPD:bla3bla3'HISYN:bla4bla4'"
I tested the regular expression pattern with an online tool ( http://www.regexplanet.com/simple/index.html ). In the online tool it works fine and delivers 2 results as I want it to be.
But I would like to have the same result in the ios code, i.e. "HIUPD:bla2bla2'" and "HIUPD:bla3bla3'". What is wrong with the pattern?

You're doing greedy matching with .*, so the regular expression consumes as much as it can in the .*. You should use .*? (lazy) or [^']*, so that the * can't match a '.
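A sketch of the [^']* variant using POSIX regular expressions from plain C; `find_all` is a hypothetical helper. POSIX extended syntax has no lazy .*? quantifier (NSRegularExpression does support it), so only the negated class is shown. Separately, the loop in the question calls firstMatchInString:options:range: with the same range on every pass, so even with a fixed pattern it would return the first match repeatedly; iterating over the array returned by matchesInString:options:range: avoids that. The sketch instead advances past each match by hand.

```c
#include <regex.h>
#include <string.h>

/* Copy every non-overlapping match of `pattern` (POSIX extended syntax) in
   `text` into out[0..max-1], truncating each to 63 bytes. Returns the number
   of matches, or -1 if the pattern fails to compile. */
int find_all(const char *pattern, const char *text, char out[][64], int max) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    int n = 0;
    const char *p = text;
    regmatch_t m;
    /* REG_NOTBOL after the first search so '^' cannot match mid-string */
    while (n < max && regexec(&re, p, 1, &m, p == text ? 0 : REG_NOTBOL) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        if (len > 63) len = 63;
        memcpy(out[n], p + m.rm_so, len);
        out[n][len] = '\0';
        n++;
        if (m.rm_eo == m.rm_so) {           /* empty match: step one byte */
            if (p[m.rm_eo] == '\0') break;
            p += m.rm_eo + 1;
        } else {
            p += m.rm_eo;                   /* continue after this match */
        }
    }
    regfree(&re);
    return n;
}
```

With the question's input, "HIUPD.*'" yields one long match, while "HIUPD[^']*'" yields "HIUPD:bla2bla2'" and "HIUPD:bla3bla3'".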

Objective-c iPhone percent encode a string?

I would like to get the percent-encoded string for these specific characters. How can I do that in Objective-C?
Reserved characters after percent-encoding
! * ' ( ) ; : @ & = + $ , / ? # [ ]
%21 %2A %27 %28 %29 %3B %3A %40 %26 %3D %2B %24 %2C %2F %3F %23 %5B %5D
Percent-encoding wiki
Please test with this string and see if it works:
myURL = @"someurl/somecontent"
I would like the string to look like:
myEncodedURL = @"someurl%2Fsomecontent"
I already tried stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding, but it does not work; the result is still the same as the original string. Please advise.

I've found that both stringByAddingPercentEscapesUsingEncoding: and CFURLCreateStringByAddingPercentEscapes() are inadequate. The NSString method misses quite a few characters, and the CF function only lets you say which (specific) characters you want to escape. The proper specification is to escape all characters except a small set.
To fix this, I created an NSString category method to properly encode a string. It percent-encodes everything EXCEPT [a-zA-Z0-9._~-] and encodes spaces as + (as in HTML form encoding). It also handles Unicode characters correctly by percent-encoding their UTF-8 bytes.
- (NSString *) URLEncodedString_ch {
    NSMutableString * output = [NSMutableString string];
    const unsigned char * source = (const unsigned char *)[self UTF8String];
    size_t sourceLen = strlen((const char *)source);
    for (size_t i = 0; i < sourceLen; ++i) {
        const unsigned char thisChar = source[i];
        if (thisChar == ' '){
            [output appendString:@"+"];
        } else if (thisChar == '.' || thisChar == '-' || thisChar == '_' || thisChar == '~' ||
                   (thisChar >= 'a' && thisChar <= 'z') ||
                   (thisChar >= 'A' && thisChar <= 'Z') ||
                   (thisChar >= '0' && thisChar <= '9')) {
            [output appendFormat:@"%c", thisChar];
        } else {
            [output appendFormat:@"%%%02X", thisChar];
        }
    }
    return output;
}

The iOS 7 SDK now has a better alternative to stringByAddingPercentEscapesUsingEncoding: that lets you specify that all characters should be escaped except certain allowed ones. It works well if you are building up the URL in parts:
NSString * unescapedQuery = [[NSString alloc] initWithFormat:@"?myparam=%d", numericParamValue];
NSString * escapedQuery = [unescapedQuery stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLQueryAllowedCharacterSet]];
NSString * urlString = [[NSString alloc] initWithFormat:@"http://ExampleOnly.com/path.ext%@", escapedQuery];
Although it is less common for the other parts of the URL to be variables, there are constants in the NSURLUtilities category for those as well:
[NSCharacterSet URLHostAllowedCharacterSet]
[NSCharacterSet URLUserAllowedCharacterSet]
[NSCharacterSet URLPasswordAllowedCharacterSet]
[NSCharacterSet URLPathAllowedCharacterSet]
[NSCharacterSet URLFragmentAllowedCharacterSet]
[NSCharacterSet URLQueryAllowedCharacterSet] includes all of the characters allowed in the query part of the URL (the part starting with the ? and before the # for a fragment, if any), including the ? and the & or = characters, which are used to delimit the parameter names and values. For query parameters whose values are not purely alphanumeric, any of those characters might appear in the values of the variables used to build the query string. In that case, each part of the query string needs to be escaped, which takes just a bit more work:
NSMutableCharacterSet * URLQueryPartAllowedCharacterSet; // possibly defined in class extension ...
// ... and built in init or on first use
URLQueryPartAllowedCharacterSet = [[NSCharacterSet URLQueryAllowedCharacterSet] mutableCopy];
[URLQueryPartAllowedCharacterSet removeCharactersInString:@"&+=?"]; // %26, %2B, %3D, %3F
// then escape variables in the URL, such as values in the query and any fragment:
NSString * escapedValue = [anUnescapedValue stringByAddingPercentEncodingWithAllowedCharacters:URLQueryPartAllowedCharacterSet];
NSString * escapedFrag = [anUnescapedFrag stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLFragmentAllowedCharacterSet]];
NSString * urlString = [[NSString alloc] initWithFormat:@"http://ExampleOnly.com/path.ext?myparam=%@#%@", escapedValue, escapedFrag];
NSURL * url = [[NSURL alloc] initWithString:urlString];
The unescapedValue could even be an entire URL, such as for a callback or redirect:
NSString * escapedCallbackParamValue = [anAlreadyEscapedCallbackURL stringByAddingPercentEncodingWithAllowedCharacters:URLQueryPartAllowedCharacterSet];
NSURL * callbackURL = [[NSURL alloc] initWithString:[[NSString alloc] initWithFormat:@"http://ExampleOnly.com/path.ext?callback=%@", escapedCallbackParamValue]];
Note: Don't use NSURL initWithScheme:(NSString *)scheme host:(NSString *)host path:(NSString *)path for a URL with a query string because it will add more percent escapes to the path.

NSString *encodedString = [myString stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding];
It won't replace your string in place; it returns a new string. That's implied by the fact that the method name starts with the word "string". It's a convenience method that creates a new NSString instance based on the current one.
Note: the new string is autoreleased, so don't call release on it when you're done with it.

NSString's stringByAddingPercentEscapesUsingEncoding: looks like what you're after.
EDIT: Here's an example using CFURLCreateStringByAddingPercentEscapes instead. originalString can be either an NSString or a CFStringRef.
CFStringRef newString = CFURLCreateStringByAddingPercentEscapes(kCFAllocatorDefault, originalString, NULL, CFSTR("!*'();:@&=+$,/?#[]"), kCFStringEncodingUTF8);
Please note that this is untested. You should have a look at the documentation page to make sure you understand the memory allocation semantics for CFStringRef, the idea of toll-free bridging, and so on.
Also, I don't know (off the top of my head) which of the characters specified in the legalURLCharactersToBeEscaped argument would have been escaped anyway (due to being illegal in URLs). You may want to check this, although it's perhaps better just to be on the safe side and directly specify the characters you want escaped.
I'm making this answer a community wiki so that people with more knowledge about CoreFoundation can make improvements.

Following the RFC3986 standard, here is what I'm using for encoding URL components:
// https://tools.ietf.org/html/rfc3986#section-2.2
let rfc3986Reserved = NSCharacterSet(charactersInString: "!*'();:@&=+$,/?#[]")
let encoded = "email+with+plus@example.com".stringByAddingPercentEncodingWithAllowedCharacters(rfc3986Reserved.invertedSet)
Output: email%2Bwith%2Bplus%40example.com
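A minimal C sketch of the same idea escapes exactly the 18 RFC 3986 reserved characters (gen-delims and sub-delims) and copies every other byte; `escape_reserved` is a hypothetical name. In this sketch a space or a non-ASCII byte passes through unchanged, so it only suits input that is otherwise already URL-safe.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Percent-encode only the RFC 3986 reserved characters in a NUL-terminated
   string; all other bytes are copied as-is. The caller must free() the
   result; returns NULL if allocation fails. */
char *escape_reserved(const char *s) {
    static const char reserved[] = ":/?#[]@!$&'()*+,;=";  /* gen-delims + sub-delims */
    size_t len = strlen(s);
    char *out = malloc(3 * len + 1);  /* worst case: every byte becomes %XX */
    if (out == NULL) return NULL;
    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];  /* never 0 inside the loop */
        if (strchr(reserved, c) != NULL) {
            p += sprintf(p, "%%%02X", c);
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    return out;
}
```

It reproduces the output above for "email+with+plus@example.com" and turns the question's "someurl/somecontent" into "someurl%2Fsomecontent".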

If you are using the ASIHTTPRequest library in your Objective-C program, which I cannot recommend highly enough, you can use the encodeURL: helper method on its ASIFormDataRequest object. Unfortunately, it is an instance method rather than a static one, so it may be worth adding its implementation to your project as an extension.
The code, copied straight from ASIFormDataRequest.m for the encodeURL implementation, is:
- (NSString*)encodeURL:(NSString *)string
{
    NSString *newString = NSMakeCollectable([(NSString *)CFURLCreateStringByAddingPercentEscapes(kCFAllocatorDefault, (CFStringRef)string, NULL, CFSTR(":/?#[]@!$ &'()*+,;=\"<>%{}|\\^~`"), CFStringConvertNSStringEncodingToEncoding([self stringEncoding])) autorelease]);
    if (newString) {
        return newString;
    }
    return @"";
}
As you can see, it is essentially a wrapper around CFURLCreateStringByAddingPercentEscapes that takes care of all the characters that should be properly escaped.

Before I noticed Rob's answer, which appears to work well and is preferred as it's cleaner, I went ahead and ported Dave's answer to Swift. I'll leave it here in case anyone is interested:
public extension String {
    // For performance, I've replaced the char constants with integers, as char constants don't work in Swift.
    var URLEncodedValue: String {
        let output = NSMutableString()
        guard let source = self.cStringUsingEncoding(NSUTF8StringEncoding) else {
            return self
        }
        let sourceLen = source.count
        var i = 0
        // source includes the trailing NUL, so stop one element early
        while i < sourceLen - 1 {
            // CChar is signed; reinterpret as UInt8 so bytes >= 0x80 format as two hex digits
            let thisChar = UInt8(bitPattern: source[i])
            if thisChar == 32 {
                output.appendString("+")
            } else if thisChar == 46 || thisChar == 45 || thisChar == 95 || thisChar == 126 ||
                (thisChar >= 97 && thisChar <= 122) ||
                (thisChar >= 65 && thisChar <= 90) ||
                (thisChar >= 48 && thisChar <= 57) {
                output.appendFormat("%c", thisChar)
            } else {
                output.appendFormat("%%%02X", thisChar)
            }
            i += 1
        }
        return output as String
    }
}

In Swift 4:
let str = "someurl/somecontent"
let percentEncodedString = str.addingPercentEncoding(withAllowedCharacters: .alphanumerics)
