The question is pretty self-explanatory. I want to check whether a certain letter is uppercase or whether another letter is lowercase. Could you give me some examples of how to do that in Flutter/Dart?
You can compare the string with its .toUpperCase() form in a boolean expression:
bool isUppercased(String str){
return str == str.toUpperCase();
}
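One caveat with comparing a string to its uppercased form is that characters without case, such as digits, compare equal too. The sketch below is in Python rather than Dart and only illustrates the idea; the function name is_uppercase_letter is hypothetical and not part of any library.

```python
def is_uppercase_letter(ch: str) -> bool:
    """Return True only for a cased character that is already uppercase.

    The second condition rejects characters with no case at all
    (digits, punctuation), whose upper and lower forms are identical.
    """
    return ch == ch.upper() and ch != ch.lower()


print(is_uppercase_letter("A"))  # True
print(is_uppercase_letter("a"))  # False
print(is_uppercase_letter("1"))  # False, although "1" == "1".upper()
```

The same two-sided comparison can be written in Dart with toUpperCase() and toLowerCase().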
If you want to use regular expressions, here is how you could do:
bool isUpperCase(String letter) {
  // Expect exactly one character.
  assert(letter.length == 1);
  final regExp = RegExp('[A-Z]');
  return regExp.hasMatch(letter);
}
One solution that comes to mind is to check the character's ASCII code.
The ASCII codes of the lowercase letters a-z run from 97 to 122.
Similarly, the uppercase letters A-Z run from 65 to 90.
With this in mind, you can use string.codeUnitAt(index), which returns the UTF-16 code unit at that index (identical to the ASCII code for ASCII characters), and then check which range it falls in to tell whether the letter is uppercase or lowercase.
Have a look at this example:
main() {
String ch = 'Rose';
print(' ASCII value of ${ch[0]} is ${ch.codeUnitAt(0)}');
print(' ASCII value of ${ch[1]} is ${ch.codeUnitAt(1)}');
}
The output will be:
ASCII value of R is 82
ASCII value of o is 111
Now you can compare with the range using if statement and find out.
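The range comparison can be sketched as follows. This is a Python illustration of the idea, not Dart code; classify_ascii_letter is a hypothetical name, and ord() plays the role that codeUnitAt plays in Dart for ASCII characters.

```python
def classify_ascii_letter(ch: str) -> str:
    """Classify one character by its code, considering ASCII letters only."""
    code = ord(ch)
    if 65 <= code <= 90:    # 'A'..'Z'
        return "upper"
    if 97 <= code <= 122:   # 'a'..'z'
        return "lower"
    return "other"          # digits, punctuation, non-ASCII letters


print(classify_ascii_letter("R"))  # upper (code 82)
print(classify_ascii_letter("o"))  # lower (code 111)
print(classify_ascii_letter("é"))  # other (outside the ASCII ranges)
```

Note that this approach deliberately ignores letters outside ASCII, so accented or non-Latin letters are reported as "other".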
Say I have a string with n characters, but I want to trim it down to only 10 characters (given that the string always has more than 10 characters).
I don't know the contents of the string.
How to trim it in such a way?
I know how to trim it after a CERTAIN character
String s = "one.two";
//Removes everything after first '.'
String result = s.substring(0, s.indexOf('.'));
print(result);
But how to remove it after a CERTAIN NUMBER of characters?
All answers (using substring) get the first 10 UTF-16 code units, which is different than the first 10 characters because some characters consist of two code units. It is better to use the characters package:
import 'package:characters/characters.dart';
void main() {
final str = "Hello 😀 World";
print(str.substring(0, 9)); // BAD
print(str.characters.take(9)); // GOOD
}
prints
➜ dart main.dart
Hello 😀
Hello 😀 W
With substring you might even get half a character (which isn't valid):
print(str.substring(0, 7)); // BAD
print(str.characters.take(7)); // GOOD
prints:
Hello �
Hello 😀
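The gap between UTF-16 code units and whole characters can be reproduced outside Dart. The sketch below uses Python, where strings are indexed by code point, and simulates code-unit truncation by encoding to UTF-16. The helper names utf16_units and take_utf16_units are hypothetical. Code points are still not the same as the grapheme clusters that the characters package works with, so this only illustrates the surrogate-pair problem.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units, which is what Dart's String.length reports."""
    return len(s.encode("utf-16-le")) // 2


def take_utf16_units(s: str, n: int) -> str:
    """Keep the first n UTF-16 code units, the way a code-unit substring would."""
    raw = s.encode("utf-16-le")[: 2 * n]
    # A cut through the middle of a surrogate pair cannot be decoded cleanly.
    return raw.decode("utf-16-le", errors="replace")


text = "Hello 😀 World"
print(len(text))          # 13 code points
print(utf16_units(text))  # 14 code units, the emoji takes two
print(text[:7])           # Hello 😀
print(take_utf16_units(text, 7))  # the emoji is cut in half and lost
```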
The above examples will fail if the string's length is less than the trimmed length. The code below works with both short and long strings:
import 'dart:math';
void main() {
String s1 = 'abcdefghijklmnop';
String s2 = 'abcdef';
var trimmed = s1.substring(0, min(s1.length,10));
print(trimmed);
trimmed = s2.substring(0, min(s2.length,10));
print(trimmed);
}
NOTE:
Dart string routines operate on UTF-16 code units. For most Latin and Cyrillic languages that is not a problem, since all characters fit into a single code unit. Yet emojis and some Asian, African and Middle Eastern languages might need 2 code units to encode a single character. E.g. '😊'.length will return 2 although it is a single-character string. See the characters package.
I think this should work.
String result = s.substring(0, 10);
To trim a String to a certain number of characters, the code below works well:
// initialise your string here
String s = 'one.two.three.four.five';
// trim the string by getting the first 10 characters
String trimmedString = s.substring(0, 10);
// print the first ten characters of the string
print(trimmedString);
Output:
one.two.th
I hope this helps.
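You can do this in multiple ways in JavaScript, but note that only substring exists on Dart's String class; substr and slice are JavaScript methods and will not compile in Dart.
'string'.substr(start, ?length) (JavaScript only) USE :- 'one.two.three.four.five'.substr(0, 10)
'string'.substring(start, ?end) (Dart and JavaScript) USE :- 'one.two.three.four.five'.substring(0, 10)
'string'.slice(start, ?end) (JavaScript only) USE :- 'one.two.three.four.five'.slice(0, 10)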
To remove every trailing occurrence of a given pattern, use this method:
static String trimLastCharacter(String srcStr, String pattern) {
  // An empty pattern matches the end of every string, so stop here
  // to avoid endless recursion.
  if (srcStr.isEmpty || pattern.isEmpty) return srcStr;
  if (srcStr.endsWith(pattern)) {
    // Drop exactly one trailing occurrence of the pattern, then recurse.
    final v = srcStr.substring(0, srcStr.length - pattern.length);
    return trimLastCharacter(v, pattern);
  }
  return srcStr;
}
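For example, say you want to remove all the trailing zeros after the decimal point in
$123.98760000
Then call it like this (a raw string keeps Dart from treating $ as string interpolation):
trimLastCharacter(r"$123.98760000", "0")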
output:
$123.9876
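The same trailing-pattern removal can also be written as a loop instead of recursion. This is a Python sketch under the assumption that exactly len(pattern) characters should be removed per step; trim_trailing is a hypothetical name.

```python
def trim_trailing(src: str, pattern: str) -> str:
    """Remove every trailing occurrence of pattern from src."""
    if not pattern:
        # An empty pattern would match forever, so return the input unchanged.
        return src
    while src.endswith(pattern):
        src = src[: len(src) - len(pattern)]
    return src


print(trim_trailing("$123.98760000", "0"))  # $123.9876
print(trim_trailing("$123.9876000", "0"))   # $123.9876 (odd number of zeros)
print(trim_trailing("abc--", "--"))         # abc
```

Testing with both an even and an odd number of trailing zeros is worthwhile, because an off-by-one in the cut length can go unnoticed with only one of the two.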
The text file abc.txt is an arbitrary article that has been scraped from the web. For example, it is as follows:
His name is "Donald" and he likes burger. On December 11, he married.
I want to extract only the words and numbers from the above article, dropping all periods, quotes and other symbols, and lowercasing the words that are capitalized only because they start a sentence. For the above example:
{his, name, is, Donald, and, he, likes, burger, on, December, 11, he, married}
My code is as follows:
filename = 'abc.txt';
fileID = fopen(filename,'r');
C = textscan(fileID,'%s','delimiter',{',','.',':',';','"',''''});
fclose(fileID);
Cstr = C{:};
Cstr = Cstr(~cellfun('isempty',Cstr));
Is there any simple code to extract only alphabet words and numbers except all symbols?
Two steps are necessary as you want to convert certain words to lowercase.
regexprep converts words, which are either at the start of the string or follow a full stop and whitespace, to lower case.
In the regexprep function, we use the following pattern:
(?<=^|\. )([A-Z])
to indicate that:
(?<=^|\. ) We want to assert that before the word of interest either the start of string (^), or (|) a full stop (.) followed by whitespace are found. This type of construct is called a lookbehind.
([A-Z]) This part of the expression matches and captures (stores the match) an upper case letter (A-Z).
The ${lower($0)} component in the replacement is called a dynamic expression; it converts the matched text (here the captured letter, ([A-Z])) to lower case. This syntax is specific to the MATLAB language.
You can check the behaviour of the above expression here.
Once the lower case conversions have occurred, regexp finds all occurrences of one or more digits, lower case and upper case letters.
The pattern [a-zA-Z0-9]+ matches lower case letters, upper case letters and digits.
You can check the behavior of this regex here.
text = fileread('abc.txt')
data = {regexp(regexprep(text,'(?<=^|\. )([A-Z])','${lower($0)}'),'[a-zA-Z0-9]+','match')'}
>>data{1}
13×1 cell array
{'his' }
{'name' }
{'is' }
{'Donald' }
{'and' }
{'he' }
{'likes' }
{'burger' }
{'on' }
{'December'}
{'11' }
{'he' }
{'married' }
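For comparison, the same two-step idea can be sketched in Python. This is not the MATLAB solution, only an illustration: Python's re module requires fixed-width lookbehinds, so the start-of-string case is moved out of the lookbehind, and a callback takes the place of MATLAB's dynamic expression. The function name extract_words is hypothetical.

```python
import re


def extract_words(text: str) -> list:
    """Lowercase sentence-initial capitals, then collect words and numbers."""
    # Step 1: lowercase a capital at the start of the text or after ". ".
    lowered = re.sub(
        r"(?:^|(?<=\. ))[A-Z]",
        lambda m: m.group(0).lower(),
        text,
    )
    # Step 2: keep runs of letters and digits, dropping all other symbols.
    return re.findall(r"[a-zA-Z0-9]+", lowered)


sample = 'His name is "Donald" and he likes burger. On December 11, he married.'
print(extract_words(sample))
```

Like the MATLAB version, this keeps capitals in the middle of a sentence (Donald, December) and only handles sentences separated by a full stop and a single space.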
All I want to do is convert a single Character to uppercase without the overhead of converting to a String and then calling .uppercased(). Is there any built-in way to do this, or a way for me to call the toupper() function from C without any bridging? I really don't think I should have to go out of my way for something so simple.
To call the C toupper() you need to get the Unicode code point of the Character. But Character has no method for getting its code point (a Character may consist of multiple code points), so you have to convert the Character into a String to obtain any of its code points.
So you really have to convert to String to get anywhere. Unless you store the character as a UnicodeScalar instead of a Character. In this case you can do this:
assert(unicodeScalar.isASCII) // toupper argument must be "representable as an unsigned char"
let uppercase = UnicodeScalar(toupper(CInt(unicodeScalar.value)))
But this isn't really more readable than simply using String:
let uppercase = Character(String(character).uppercased())
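Just add this to your program:
extension Character {
    // Converts a character to uppercase.
    func convertToUpperCase() -> Character {
        if self.isUppercase {
            return self
        }
        // uppercased() returns a String that may hold several characters
        // (for example "ß" becomes "SS"); fall back to self in that case.
        let upper = self.uppercased()
        return upper.count == 1 ? Character(upper) : self
    }
}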
I am trying to create an LPeg pattern that would match any Unicode punctuation inside UTF-8 encoded input. I came up with the following marriage of Selene Unicode and LPeg:
local unicode = require("unicode")
local lpeg = require("lpeg")
local any = lpeg.P(1) -- matches any single byte
local punctuation = lpeg.Cmt(lpeg.Cs(any * any^-3), function(s,i,a)
  local match = unicode.utf8.match(a, "^%p")
  if match == nil then
    return false
  else
    return i+#match
  end
end)
This appears to work, but it has several problems. It will miss punctuation characters that are a combination of several Unicode code points (if such characters exist), because I am reading only 4 bytes ahead. It probably kills the performance of the parser. And it is undefined what the library match function will do when I feed it a string that contains a runt UTF-8 character (although it appears to work now).
I would like to know whether this is a correct approach or if there is a better way to achieve what I am trying to achieve.
The correct way to match UTF-8 characters is shown in an example in the LPeg homepage. The first byte of a UTF-8 character determines how many more bytes are a part of it:
local cont = lpeg.R("\128\191") -- continuation byte
local utf8 = lpeg.R("\0\127")
+ lpeg.R("\194\223") * cont
+ lpeg.R("\224\239") * cont * cont
+ lpeg.R("\240\244") * cont * cont * cont
Building on this utf8 pattern we can use lpeg.Cmt and the Selene Unicode match function kind of like you proposed:
local punctuation = lpeg.Cmt(lpeg.C(utf8), function (s, i, c)
if unicode.utf8.match(c, "%p") then
return i
end
end)
Note that we return i, this is in accordance with what Cmt expects:
The given function gets as arguments the entire subject, the current position (after the match of patt), plus any capture values produced by patt. The first value returned by function defines how the match happens. If the call returns a number, the match succeeds and the returned number becomes the new current position.
This means we should return the same number the function receives, that is the position immediately after the UTF-8 character.
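The lead-byte rule and the "return the position after the character" convention can be sketched outside LPeg as well. The Python below is an illustration only: it uses 0-based byte offsets (Lua positions are 1-based), and it treats any Unicode general category starting with "P" as punctuation, which is an assumption and may not coincide exactly with what Selene Unicode's %p class matches. Both function names are hypothetical.

```python
import unicodedata


def utf8_sequence_length(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#x}")


def punctuation_end(data: bytes, pos: int):
    """Return the offset just past a punctuation character at pos, else None."""
    length = utf8_sequence_length(data[pos])
    ch = data[pos : pos + length].decode("utf-8")
    if unicodedata.category(ch).startswith("P"):
        return pos + length
    return None


data = "«a…".encode("utf-8")
print(punctuation_end(data, 0))  # 2, « is two bytes
print(punctuation_end(data, 2))  # None, a is not punctuation
print(punctuation_end(data, 3))  # 6, … is three bytes
```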
Swift seems to be trying to deprecate the notion of a string being composed of an array of atomic characters, which makes sense for many uses, but there's an awful lot of programming that involves picking through data structures that are ASCII for all practical purposes, particularly with file I/O. The absence of a built-in language feature to specify a character literal seems like a gaping hole, i.e. there is no analog of the C/Java/etc. style:
String foo="a"
char bar='a'
This is rather inconvenient, because even if you convert your strings into arrays of characters, you can't do things like:
let ch:unichar = arrayOfCharacters[n]
if ch >= 'a' && ch <= 'z' {...whatever...}
One rather hacky workaround is to do something like this:
let LOWCASE_A = ("a" as NSString).characterAtIndex(0)
let LOWCASE_Z = ("z" as NSString).characterAtIndex(0)
if ch >= LOWCASE_A && ch <= LOWCASE_Z {...whatever...}
This works, but obviously it's pretty ugly. Does anyone have a better way?
Characters can be created from Strings as long as those Strings are only made up of a single character. And, since Character implements ExtendedGraphemeClusterLiteralConvertible, Swift will do this for you automatically on assignment. So, to create a Character in Swift, you can simply do something like:
let ch: Character = "a"
Then, you can use the contains method of an IntervalType (generated with the Range operators) to check if a character is within the range you're looking for:
if ("a"..."z").contains(ch) {
/* ... whatever ... */
}
Example:
let ch: Character = "m"
if ("a"..."z").contains(ch) {
println("yep")
} else {
println("nope")
}
Outputs:
yep
Update: As @MartinR pointed out, the ordering of Swift characters is based on Unicode Normalization Form D which is not in the same order as ASCII character codes. In your specific case, there are more characters between a and z than in straight ASCII (ä for example). See @MartinR's answer here for more info.
If you need to check if a character is in between two ASCII character codes, then you may need to do something like your original workaround. However, you'll also have to convert ch to a unichar and not a Character for it to work (see this question for more info on Character vs unichar):
let a_code = ("a" as NSString).characterAtIndex(0)
let z_code = ("z" as NSString).characterAtIndex(0)
let ch_code = (String(ch) as NSString).characterAtIndex(0)
if (a_code...z_code).contains(ch_code) {
println("yep")
} else {
println("nope")
}
Or, the even more verbose way without using NSString:
let startCharScalars = "a".unicodeScalars
let startCode = startCharScalars[startCharScalars.startIndex]
let endCharScalars = "z".unicodeScalars
let endCode = endCharScalars[endCharScalars.startIndex]
let chScalars = String(ch).unicodeScalars
let chCode = chScalars[chScalars.startIndex]
if (startCode...endCode).contains(chCode) {
println("yep")
} else {
println("nope")
}
Note: Both of those examples only work if the character only contains a single code point, but, as long as we're limited to ASCII, that shouldn't be a problem.
If you need C-style ASCII literals, you can just do this:
let chr = UInt8(ascii:"A") // == UInt8( 0x41 )
Or if you need 32-bit Unicode literals you can do this:
let unichr1 = UnicodeScalar("A").value // == UInt32( 0x41 )
let unichr2 = UnicodeScalar("é").value // == UInt32( 0xe9 )
let unichr3 = UnicodeScalar("😀").value // == UInt32( 0x1f600 )
Or 16-bit:
let unichr1 = UInt16(UnicodeScalar("A").value) // == UInt16( 0x41 )
let unichr2 = UInt16(UnicodeScalar("é").value) // == UInt16( 0xe9 )
All of these initializers will be evaluated at compile time, so it really is using an immediate literal at the assembly instruction level.
The feature you want was proposed to be in Swift 5.1, but that proposal was rejected for a few reasons:
Ambiguity
The proposal as written, in the current Swift ecosystem, would have allowed for expressions like 'x' + 'y' == "xy", which was not intended (the proper syntax would be "x" + "y" == "xy").
Amalgamation
The proposal was two in one.
First, it proposed a way to introduce single-quote literals into the language.
Second, it proposed that these would be convertible to numerical types to deal with ASCII values and Unicode codepoints.
These are both good proposals, and it was recommended that this be split into two and re-proposed. Those follow-up proposals have not yet been formalized.
Disagreement
It never reached consensus whether the default type of 'x' would be a Character or a Unicode.Scalar. The proposal went with Character, citing the Principle of Least Surprise, despite this lack of consensus.
You can read the full rejection rationale here.
The syntax might/would look like this:
let myChar = 'f' // Type is Character, value is solely the unicode U+0066 LATIN SMALL LETTER F
let myInt8: Int8 = 'f' // Type is Int8, value is 102 (0x66)
let myUInt8Array: [UInt8] = [ 'a', 'b', '1', '2' ] // Type is [UInt8], value is [ 97, 98, 49, 50 ] ([ 0x61, 0x62, 0x31, 0x32 ])
switch someUInt8 {
case 'a' ... 'f': return "Lowercase hex letter"
case 'A' ... 'F': return "Uppercase hex letter"
case '0' ... '9': return "Hex digit"
default: return "Non-hex character"
}
It also looks like you can use the following syntax:
Character("a")
This will create a Character from the specified single character string.
I have only tested this in Swift 4 and Xcode 10.1
Why do I exhume 7 year old posts? Fun I guess? Seriously though, I think I can add to the discussion.
It is not a gaping hole, or rather, it is a deliberate gaping hole that explicitly discourages conflating a string of text with a sequence of ASCII bytes.
You absolutely can pick apart a String. A String implements BidirectionalCollection and has many ways to manipulate the atoms. See: https://developer.apple.com/documentation/swift/string.
But you have to get used to the more generalized notion of a String. It can be picked apart from the user's perspective, which is a sequence of grapheme clusters, each (usually) with a visually separable appearance, or from the encoding perspective, which can be one of several (UTF-32, UTF-16, UTF-8).
At the risk of overanalyzing the wording of your question:
A data structure is conceptual, and independent of encoding in storage
A data structure encoded as an ASCII string is just one kind of ASCII string
By design the encoding of ASCII values 0-127 will have an identical encoding in UTF-8, so loading that stream with a UTF8 API is fine
A data structure encoded as a string where fields of the structure have UTF-8 Unicode string values is not an ASCII string, but a UTF-8 string itself
A string is either ASCII-encoded or not; "for practical purposes" isn't a meaningful qualifier. A UTF-8 database field where 99.99% of the text falls in the ASCII range (where encodings will match), but occasionally doesn't, will present some nasty bug opportunities.
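The points above can be checked with a few lines of code. This Python sketch shows that pure ASCII text has identical ASCII and UTF-8 encodings, while a single non-ASCII character breaks the one-byte-per-character assumption. The sample strings and the helper name byte_and_char_counts are made up for illustration.

```python
def byte_and_char_counts(s: str):
    """Return (number of characters, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


ascii_text = "key=value"
mostly_ascii = "caf\u00e9=value"  # contains é, which is outside ASCII

# For pure ASCII, the ASCII and UTF-8 encodings are byte-for-byte identical.
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))  # True
print(byte_and_char_counts(ascii_text))    # (9, 9)

# One non-ASCII character is enough to make byte offsets and
# character offsets disagree.
print(mostly_ascii.isascii())              # False
print(byte_and_char_counts(mostly_ascii))  # (10, 11)
```

Code that indexes such data by byte while assuming ASCII will work on the first string and silently misbehave on the second.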
Instead of a terse and low-level equivalence of fixed-width integers and English-only text, Swift has a richer API that forces more explicit naming of the involved categories and entities. If you want to deal with ASCII, there's a name (method) for that, and if you want to deal with human sub-categories, there's a name for that, too, and they're totally independent of one another. There is a strong move away from ASCII and the English-centric string handling model of C. This is factual, not evangelizing, and it can present an irksome learning curve.
(This is aimed at newcomers, acknowledging the OP probably has years of experience with this now.)
For what you're trying to do there, consider:
let foo = "abcDeé@¶œŎO!@#"
foo.forEach { c in
print((c.isASCII ? "\(c) is ascii with value \(c.asciiValue ?? 0); " : "\(c) is not ascii; ")
+ ((c.isLetter ? "\(c) is a letter" : "\(c) is not a letter")))
}
This prints:
a is ascii with value 97; a is a letter
b is ascii with value 98; b is a letter
c is ascii with value 99; c is a letter
D is ascii with value 68; D is a letter
e is ascii with value 101; e is a letter
é is not ascii; é is a letter
@ is ascii with value 64; @ is not a letter
¶ is not ascii; ¶ is not a letter
œ is not ascii; œ is a letter
Ŏ is not ascii; Ŏ is a letter
O is ascii with value 79; O is a letter
! is ascii with value 33; ! is not a letter
@ is ascii with value 64; @ is not a letter
# is ascii with value 35; # is not a letter