Determine if a string only contains invisible characters in Swift - swift

I was parsing a messy XML. I found many of the nodes contain invisible characters only, for instance:
"\n "
" "
"\t "
"\n "
"\n\n"
I saw some posts and answers that filter on letters and digits, but the XML being parsed in my project includes UTF-8 characters, and I am not sure how I could list every visible UTF-8 character in such a filter.
How can I determine if a string is made up of completely invisible characters like above, so I can filter them out? Thanks!

Use CharacterSet for that.
let nonWhitespace = CharacterSet.whitespacesAndNewlines.inverted
let containsNonWhitespace = (string.rangeOfCharacter(from: nonWhitespace) != nil)

Trim the string of whitespaces and newlines and see what's left.
if someString.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
    // someString only contains whitespaces and newlines
}
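To tie the two answers back to the original goal of filtering nodes, here is a minimal sketch. The helper name isBlank and the sample node values are made up for illustration; only trimmingCharacters(in:) and CharacterSet.whitespacesAndNewlines come from the answers above.

```swift
import Foundation

// Hypothetical helper: true when the string is empty or consists only of
// whitespace and newline characters.
func isBlank(_ text: String) -> Bool {
    return text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
}

// Sample node values, invented for illustration.
let nodeValues = ["\n ", " ", "\t ", "título", "\n\n", " 🦅 "]

// Keep only the nodes that contain at least one visible character.
let visibleValues = nodeValues.filter { !isBlank($0) }
print(visibleValues.count) // 2
```

Because the check is "nothing left after trimming" rather than "contains a known visible character", it should not need a list of visible UTF-8 characters at all.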

Related

Why I cannot use \ or backslash in a String in Swift?

I have a string like the one below, and I want to replace each space with a backslash followed by a space.
let test: String = "Hello world".replacingOccurrences(of: " ", with: "\ ")
print(test)
But Xcode reports this error:
Invalid escape sequence in literal
The code above works for any other character or word, but not for a backslash. Why?
Backslash is used to escape characters. So to print a backslash itself, you need to escape it. Use \\.
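Applied to the code from the question, the escaped version would look roughly like this (a small sketch, reusing the question's own variable name):

```swift
import Foundation

// "\\ " is a two-character string: one literal backslash followed by a space.
let test = "Hello world".replacingOccurrences(of: " ", with: "\\ ")
print(test) // Hello\ world
```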
For Swift 5 or later you can avoid needing to escape backslashes using the enhanced string delimiters:
let backSlashSpace = #"\ "#
If you need String interpolation as well:
let value = 5
let backSlashSpaceWithValue = #"\\#(value) "#
print(backSlashSpaceWithValue) // \5
You can use as many pound signs as you wish. Just make sure to match the same number in your string interpolation:
let value = 5
let backSlashSpaceWithValue = ###"\\###(value) "###
print(backSlashSpaceWithValue) // \5
Note: For more information, see the already implemented Swift Evolution proposal SE-0200, Enhancing String Literals Delimiters to Support Raw Text.

Extracting range of unpadded string

I'd like to extract the Range<String.Index> of a sentence within its whitespace padding. For example,
let padded = " El águila (🦅). "
let sentenceRangeInPadded = ???
assert(padded[sentenceRangeInPadded] == "El águila (🦅).") // The test!
Here's some regex that I started with, but looks like variable length lookbehinds aren't supported.
let sentenceRangeInPadded = padded.range(of: #"(?<=^\s*).*?(?=\s*$)"#, options: .regularExpression)!
I'm not looking to extract the sentence (could just use trimmingCharacters(in:) for that), just the Range.
Thanks for reading!
You may use
#"(?s)\S(?:.*\S)?"#
Details
(?s) - a DOTALL modifier making . match any char, including line break chars
\S - the first non-whitespace char
(?:.*\S)? - an optional non-capturing group matching
.* - any 0+ chars as many as possible
\S - up to the last non-whitespace char.
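As a rough sketch of how that pattern plugs into the question's code, assuming range(of:options:) with .regularExpression is acceptable (the padding in the sample string, including the trailing newline, is chosen here for illustration):

```swift
import Foundation

// Sample input with leading and trailing whitespace, including a newline.
let padded = "  El águila (🦅).  \n"

// range(of:options:) returns a Range<String.Index>?, which is what the question asks for.
if let sentenceRangeInPadded = padded.range(of: #"(?s)\S(?:.*\S)?"#,
                                            options: .regularExpression) {
    print(padded[sentenceRangeInPadded]) // El águila (🦅).
}
```

If the string contains only whitespace, the pattern finds no match and the range is nil, so the optional binding matters.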

How to get hashtag from string that contains # at the beginning and end without space at the end?

This is my string
"I made this wonderful pic last #chRistmas... #instagram #nofilter #snow #fun"
and I would like to get the hashtag that starts with # and sits at the end of the string, with no space after it. My expected result is:
#fun
This is what I have so far for regex search:
#[a-z0-9]+
but it gives me all the hashtags, not the one that I want. Thank you for your help!
Use #[a-zA-Z0-9]*$ instead of your current regex.
It seems you need to match a hashtag at the end of the string, or the last hashtag in the string. So, there are several ways to solve the issue.
Matching the last hashtag in the string
let str = "I made this wonderful pic last #chRistmas... #instagram #nofilter #snow #fun"
let regex = "#[[:alnum:]]++(?!.*#[[:alnum:]])"
if let range = str.range(of: regex, options: .regularExpression) {
    let text: String = String(str[range])
    print(text)
}
Details
# - a hash symbol
[[:alnum:]]++ - 1 or more alphanumeric chars
(?!.*#[[:alnum:]]) - no # + 1+ alphanumeric chars after any 0+ chars other than line break chars immediately to the right of the current location.
Matching a hashtag at the end of the string
Same code but with the following regexps:
let regex = "#[[:alnum:]]+$"
or
let regex = "#[[:alnum:]]+\\z"
Note that \z matches only at the very end of the string: if there is a newline char between the hashtag and the end of the string, there won't be any match (with $, there will be a match).
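The difference between $ and \z can be seen with a string that ends in a newline. This is a small sketch; the shortened sample string is made up for illustration:

```swift
import Foundation

// The hashtag is followed by a single trailing newline.
let str = "#snow #fun\n"

// $ also matches just before a line terminator at the end of the input.
let withDollar = str.range(of: "#[[:alnum:]]+$", options: .regularExpression)
// \z matches only at the absolute end of the input.
let withZ = str.range(of: "#[[:alnum:]]+\\z", options: .regularExpression)

print(withDollar.map { String(str[$0]) } ?? "no match") // #fun
print(withZ.map { String(str[$0]) } ?? "no match")      // no match
```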
Note on the regex
If a hashtag should only start with a letter, it is a better idea to use
#[[:alpha:]][[:alnum:]]*
where [[:alpha:]] matches any letter and [[:alnum:]]* matches 0+ letters or/and digits.
Note that in ICU regex patterns, you may write [[:alnum:]] as [:alnum:].
You can use:
(^#[a-z0-9]+|#[a-z0-9]+$)

How to remove spaces from a string in Swift?

I have the need to remove leading and trailing spaces around a punctuation character.
For example: Hello , World ... I 'm a newbie iOS Developer.
And I'd like to have: Hello, World... I'm a newbie iOS Developer.
How can I do this? I tried to get the components of the string and enumerate it by sentences, but that is not what I need.
Rob's answer is great, but you can trim it down quite a lot by taking advantage of the \p{Po} regular expression class. Getting rid of the spaces around punctuation then becomes a single regular expression replace:
import Foundation
let input = "Hello , World ... I 'm a newbie iOS Developer."
let result = input.replacingOccurrences(of: "\\s*(\\p{Po}\\s?)\\s*",
                                        with: "$1",
                                        options: [.regularExpression])
print(result) // "Hello, World... I'm a newbie iOS Developer."
Rob's answer also tries to trim leading/trailing spaces, but your input doesn't have any of those. If you do care about that you can just call result.trimmingCharacters(in: .whitespacesAndNewlines) on the result.
Here's an explanation for the regular expression. Removing the double-escapes it looks like
\s*(\p{Po}\s?)\s*
This is comprised of the following components:
\s* - Match zero or more whitespace characters (and throw them away)
(…) - Capturing group. Anything inside this group is preserved by the replacement (the $1 in the replacement refers to this group).
\p{Po} - Match a single character in the "Other_Punctuation" unicode category. This includes things like ., ', and …, but excludes things like ( or -.
\s? - Match a single optional whitespace character. This preserves the space after periods (or ellipses).
\s* - Once again, match zero or more whitespace characters (and throw them away). This is what collapses any extra whitespace after the punctuation, so a comma followed by several spaces ends up followed by just one.
For Swift 3 or 4 you can use:
let trimmedString = string.trimmingCharacters(in: .whitespaces)
This is a really wonderful problem and a shame that it isn't easier to do in Swift today (someday it will be, but not today).
I kind of hate this code, but I'm getting on a plane for 20 hours, and don't have time to make it nicer. This may at least get you started using NSMutableString. It'd be nice to work in String, and Swift hates regular expressions, so this is kind of hideous, but at least it's a start.
import Foundation
let input = "Hello, World ... I 'm a newbie iOS Developer."
let adjustments = [
    (pattern: "\\s*(\\.\\.\\.|\\.|,)\\s*", replacement: "$1 "), // ellipsis or period or comma has trailing space
    (pattern: "\\s*'\\s*", replacement: "'"), // apostrophe has no extra space
    (pattern: "^\\s+|\\s+$", replacement: ""), // remove leading or trailing space
]
let mutableString = NSMutableString(string: input)
for (pattern, replacement) in adjustments {
    let re = try! NSRegularExpression(pattern: pattern)
    re.replaceMatches(in: mutableString,
                      options: [],
                      range: NSRange(location: 0, length: mutableString.length),
                      withTemplate: replacement)
}
mutableString // "Hello, World... I'm a newbie iOS Developer."
Regular expressions can be very confusing when you first encounter them. A few hints at reading these:
The specific language Foundation uses is described by ICU.
Backslash (\) means "the next character is special" for a regex. But inside a Swift string, backslash also means "the next character is special" for the string itself, so you have to double them all.
\s means "a whitespace character"
\s* means "zero or more whitespace characters"
\s+ means "one or more whitespace characters"
$1 means "the thing we matched in parentheses"
| means "or"
^ means "start of string"
$ means "end of string"
. means "any character" so to mean "an actual dot" you have to type "\\." in a Swift string.
Notice that I check for both "..." and "." in the same regular expression. You kind of have to do something like that, or else the "." will match three times inside the "...". Another approach would be to first replace "..." with "…" (the single ellipsis character, typed on a Mac by pressing Opt-;). Then "…" is a one-character punctuation. (You could also decide to re-expand all ellipses back to dot-dot-dot at the end of the process.)
Something like this is probably how I'd do it in real life, get it done and shipped, but it may be worth the pain/practice to try to build this as a character-by-character state machine, walking one character at a time, and keeping track of your current state.
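The ellipsis-first variant mentioned above (collapse "..." into "…", fix the spacing, then expand it back) could be sketched as follows. This is only one possible reading of that suggestion, and the character class [.,…] is an assumption about which punctuation should be followed by a single space:

```swift
import Foundation

let input = "Hello , World ... I 'm a newbie iOS Developer."

// 1. Collapse dot-dot-dot into the single ellipsis character.
var text = input.replacingOccurrences(of: "...", with: "…")
// 2. Period, comma, or ellipsis: no space before, exactly one space after.
text = text.replacingOccurrences(of: "\\s*([.,…])\\s*", with: "$1 ",
                                 options: .regularExpression)
// 3. Apostrophe: no surrounding space.
text = text.replacingOccurrences(of: "\\s*'\\s*", with: "'",
                                 options: .regularExpression)
// 4. Drop the trailing space that step 2 adds after the final period.
text = text.trimmingCharacters(in: .whitespaces)
// 5. Re-expand the ellipsis character back to dot-dot-dot.
text = text.replacingOccurrences(of: "…", with: "...")

print(text) // Hello, World... I'm a newbie iOS Developer.
```

With the ellipsis reduced to one character, the pattern in step 2 no longer needs a separate alternative for "...".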
You can try something like
string.replacingOccurrences(of: " ,", with: ",") for every punctuation...
Interesting problem; here's my stab at a non-Regex approach:
func correct(input: String) -> String {
    typealias Correction = (punctuation: String, replacement: String)
    let corrections: [Correction] = [
        (punctuation: "...", replacement: "... "),
        (punctuation: "'", replacement: "'"),
        (punctuation: ",", replacement: ", "),
    ]
    var transformed = input
    for correction in corrections {
        transformed = transformed
            .components(separatedBy: correction.punctuation)
            .map({ $0.trimmingCharacters(in: .whitespaces) })
            .joined(separator: correction.replacement)
    }
    return transformed
}
let testInput = "Hello , World ... I 'm a newbie iOS Developer."
let testOutput = correct(input: testInput)
// Hello, World... I'm a newbie iOS Developer.
If you were doing this manually by processing character arrays, you would only need to check the previous and next characters around each space. You can achieve the same result in a functional style with zip, filter and map:
let testInput = "Hello , World ... I 'm a newbie iOS Developer."
let punctuation = Set(".\',")
let previousNext = zip( [" "] + testInput, String(testInput.dropFirst()) + [" "] )
let filteredChars = zip(Array(previousNext),testInput)
.filter{ $1 != " "
|| !($0.0 != " " && punctuation.contains($0.1))
}
let filteredInput = String(filteredChars.map{$1})
print(testInput) // Hello , World ... I 'm a newbie iOS Developer.
print(filteredInput) // Hello, World... I'm a newbie iOS Developer.
Swift 4, 4.2 and 5
let str = " Akbar Code "
let trimmedString = str.trimmingCharacters(in: .whitespaces)

Clean string from html tags and special characters

I want to clean my text of HTML tags, HTML special characters, and characters like < > [ ] / \ * ,
I used $str = preg_replace("/&#?[a-zA-Z0-9]+;/i", "", $str);
It works well with HTML special characters, but some characters are not removed, like:
( /*/*]]>*/ )
how can I remove these characters?
If you are really using php as it looks like, you can just use:
$str = htmlspecialchars($str);
All HTML chars will be escaped (which could be better than just stripping them). If you really want just to filter these characters, what you need to do is escape those characters on the chars list:
$str = preg_replace("/[\&#\?\]\[\/\\\<\>\*\:\(\);]*/i","",$str);
Notice there is just one character class, "/[]*/i"; I removed a-zA-Z0-9 because you presumably want to keep those characters. You can also whitelist only the characters allowed in your string (this gives you trouble with accented characters like á é ü if you use them, because you have to list every accepted character):
$str = preg_replace("/[^a-zA-Z0-9áÁéÉíÍãÃüÜõÕñÑ\.\+\-\_\%\$\#\!\=;]*/","",$str);
Note also that escaping more characters than necessary rarely hurts, except inside ranges: a-z is a range, whereas a\-z matches only a, or -, or z.
I hope it helps. :)
A regular expression for HTML tags is:
/<[^>]*>/
so use something like this:
// The regular expression to remove HTML tags (each tag is matched on its own)
$htmltagsregex = '/<[^>]*>/';
// the replacement text
$nothing = '';
// the string I want to apply it to
$string = 'this is a string with <b>HTML tags</b> that I want to <strong>remove</strong>';
// DO IT
$result = preg_replace($htmltagsregex, $nothing, $string);
and it will return
this is a string with HTML tags that I want to remove
That's all