Trying to understand this code (creating a [char]range) - powershell

I have code that works, but I have no idea WHY it works.
This will generate a list containing each letter of the English alphabet:
[char[]]([char]'a'..[char]'z')
However, this will not:
[char]([char]'a'..[char]'z')
and this will actually generate a list of numbers from 97 - 122
([char]'a'..[char]'z')
Could any experts out there explain to me how this works (or doesn't)?

In your second example, you are trying to cast an array of characters to a single character [char]. That won't work. In the third example, the 'a' is considered a string by PowerShell. So casting it to [char] tells PowerShell it is a single char. The .. operator ranges over numbers. Fortunately, PowerShell can convert the character 'a' to its ASCII value 97 and 'z' to 122. So you effectively wind up with 97..122. Then in your first example, the [char[]] converts that array of ints back to an array of characters: a through z.
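The steps described above can be checked one at a time. The following is a minimal sketch, not the asker's code: it casts both endpoints to [int] explicitly (an assumption made here so the range stays numeric however a given PowerShell version treats [char] operands of ..), and the variable names are hypothetical.

```powershell
# Numeric code points for 'a' and 'z'; the explicit [int] casts are an
# assumption added here so that the range is always a range of numbers.
$codes = [int][char]'a'..[int][char]'z'

# Casting the whole array converts every element back to a character.
$letters = [char[]]$codes

# Join the characters into one string for easy inspection.
$alphabet = -join $letters

$codes[0]      # 97
$codes[-1]     # 122
$alphabet      # abcdefghijklmnopqrstuvwxyz
```

Casting to [char] instead of [char[]] fails for the same reason as the second example in the question: a 26-element array cannot become one character.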

In PowerShell, 'a' is a [string]. [char]'a' is, obviously, a [char]. These are very different things.
$string = 'a'
$char = [char]$string
$string can be cast to [char] because it is a string consisting of a single character. If there is more than one character in the string, e.g. 'ab', then you need an array of [char] values, which is type [char[]]. The extra set of square brackets designates an array.
$string | get-member
$char | get-member
reveals quite different methods for the two types. The [char] type has numeric conversion methods such as ToInt32(). If you cast it to [int], you get the numeric code for that character (its Unicode code point, which coincides with ASCII for basic Latin letters).
[int]$char
returns 97, the ASCII code for the letter 'a'.
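As a rough illustration of the difference between the [char] and [char[]] casts described above, here is a small sketch. The variable names are hypothetical, and the try/catch is only there to show that the single-character cast rejects a longer string.

```powershell
$single  = [char]'a'          # one-character string -> [char]
$several = [char[]]'ab'       # longer string -> array of [char]
$codes   = [int[]]$several    # each [char] converts to its numeric code

# A multi-character string cannot become a single [char]; the cast throws.
try {
    $null = [char]'ab'
    $castFailed = $false
} catch {
    $castFailed = $true
}

$several.Count    # 2
$codes            # 97, 98
$castFailed       # True
```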

Related

Comparing '1' and '_' in PowerShell gives unexpected result

When comparing '1' with '_', I expect '1' < '_' because their ASCII values are 49 and 95 respectively, but the answer is the other way around. For that matter, even ':' instead of '_' gives the same result.
[byte][char]'1' -gt [byte][char]'_'
>> False
which makes sense. However:
'1' -gt '_'
>> True
Would appreciate any pointers on what I may be missing here. In essence I'm looking for a reliable way to lexicographically compare strings in powershell. Thanks!
Let's break down your two examples.
[byte][char]'1' -gt [byte][char]'_'
In this example you're comparing a byte to a byte. It is important to note that byte and char are both numeric values. The only real difference is that a char is 16 bits (so it can represent Unicode characters) and a byte is only 8 bits. Casting a string to a char gets the numeric representation of the character in the string (provided the string only contains a single character).
This means that [byte][char]'1' results in the number 49 and [byte][char]'_' results in 95. The expression will evaluate to false since 49 is not greater than 95.
Now let's look at your second example
'1' -gt '_'
In this example, you're comparing a string to a string. When comparing two strings using -gt, -ge, -lt, or -le, it uses the alphabetical sort order to determine whether or not the expression should be true or false, not the numeric values of the characters in the string. If one string is sorted before another, the first string is considered less than the second and vice versa.
You can see this behavior if you pass some strings to the Sort-Object cmdlet.
'1', '2', '3', '_' | Sort-Object
# returns '_', '1', '2', '3'
This means that your second example will return true because in the sort order implemented by .NET, _ comes before 1.
The order of special characters can vary by language and/or culture, as there does not appear to be a single standard; however, special characters are commonly sorted before numbers and letters.
Your first example is using Byte.CompareTo(Byte) whereas your second example is using String.CompareTo(String).
'1' -gt '_' returns $true because 1 follows _.
Two different ways you could see it:
'1'.CompareTo('_') # => 1
([char] '1').CompareTo([char] '_') # => -46
'1', '_' | Sort-Object # => `_` goes first
'1', '_' -as [char[]] | Sort-Object # => `1` goes first
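If the goal is a comparison that follows character codes rather than collation order, one option is .NET's ordinal comparison. The sketch below is an assumption about what "reliable" means for the asker, not something stated in the answers above; it uses String.CompareOrdinal and StringComparer.Ordinal, and the sample data is made up.

```powershell
# Ordinal comparison looks only at UTF-16 code unit values.
$ordinal = [string]::CompareOrdinal('1', '_')    # negative: 49 is less than 95

# Sort an array in place by code unit value instead of by collation order.
[string[]]$items = '_', '1', 'a', 'B'
[Array]::Sort($items, [StringComparer]::Ordinal)

$items -join ' '    # 1 B _ a
```

Only the sign of the CompareOrdinal result should be relied on; the exact magnitude is an implementation detail.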

Count the scale of a given decimal

How can I count the scale of a given decimal in Powershell?
$a = 0.0001
$b = 0.000001
Casting $a to a string and returning $a.Length gives a result of 6...I need 4.
I thought there'd be a decimal or math function but I haven't found it and messing with a string seems inelegant.
There's probably a better mathematical way but I'd find the decimal places like this:
$a = 0.0001
$decimalPlaces = ("$a" -split '\.')[-1].TrimEnd('0').Length
Basically, split the string on the . character and get the length of the last string in the array. Wrapping $a in double-quotes implicitly calls .ToString() with an invariant culture (you could expand this as $a.ToString([CultureInfo]::InvariantCulture)), which makes this method of determining the number of decimal places culture-invariant.
.TrimEnd('0') is used because, if $a were sourced from a string rather than a proper number type, trailing zeroes could be included that should not count as decimal places. However, if you want the scale and not just the used decimal places, leave .TrimEnd('0') off, like so:
$decimalPlaces = ("$a" -split '\.')[-1].Length
mclayton helpfully linked to this answer to a related C# question in a comment, and the solution there can indeed be adapted to PowerShell, if working with or conversion to type [decimal] is acceptable:
# Define $a as a [decimal] literal (suffix 'd')
# This internally records the scale (number of decimal places) as specified.
$a = 0.0001d
# [decimal]::GetBits() allows extraction of the scale from
# the internal representation:
[decimal]::GetBits($a)[-1] -shr 16 -band 0xFF # -> 4, the number of decimal places
The System.Decimal.GetBits method returns an array of internal bit fields whose last element contains the scale in bits 16 - 23 (8 bits, even though the max. scale allowed is 28), which is what the above extracts.
Note: A PowerShell number literal that is a fractional number without the d suffix - e.g., 0.0001 becomes a [double] instance, i.e. a double-precision binary floating-point number.
PowerShell automatically converts [double] to [decimal] values on demand, but do note that there can be rounding errors due to the differing internal representations, and that [double] can store larger numbers than [decimal] can (although not accurately).
A [decimal] literal - one with suffix d (note that C# uses suffix m) - is parsed with a scale exactly as specified, so that applying the above to 0.000d and 0.010d yields 3 in both cases; that is, the trailing zeros are meaningful.
This does not apply if you (implicitly) convert from [double] instances such as 0.000 and 0.010, for which the above yields 0 and 2, respectively.
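The GetBits() extraction above can be wrapped in a small function. This is only a sketch: Get-DecimalScale is a hypothetical name, and the [decimal] parameter type means that any [double] passed in is converted first, with the caveats just described.

```powershell
# Hypothetical helper: returns the scale stored inside a [decimal] value.
function Get-DecimalScale {
    param([decimal]$Value)

    # The fourth element of GetBits() holds the scale in bits 16-23.
    $flags = [decimal]::GetBits($Value)[3]
    ($flags -shr 16) -band 0xFF
}

$a = 0.0001d
$c = 0.010d

Get-DecimalScale -Value $a    # 4
Get-DecimalScale -Value $c    # 3, the trailing zero is part of the scale
```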
A string-based solution:
To offer a more concise (also culture-invariant) alternative to Bender The Greatest's helpful answer:
$a = 0.0001
("$a" -replace '.+\.').Length # -> 4, the number of decimal places
Caveat: This solution relies on the default string representation of a [double] number, which need not match the original input format; for instance, .0100, when stringified later, becomes '0.01'; however, as discussed above, you can preserve trailing zeros if you start with a [decimal] literal: .0100d stringifies to '0.0100' (input number of decimals preserved).
"$a", uses an expandable string (PowerShell's string interpolation) to create a culture-invariant string representation of the number so as to ensure that the string representation uses . as the decimal mark.
In effect, PowerShell calls $a.ToString([cultureinfo]::InvariantCulture) behind the scenes [1].
By contrast, .ToString() (argument-less) applies the rules of the current culture, and in some cultures it is , - not . - that is used as the decimal mark.
Caveat: If you use just $a as the LHS of -replace, $a is implicitly stringified, in which case you - curiously - get culture-sensitive behavior, as with .ToString() - see this GitHub issue.
-replace '.+\.' effectively removes all characters up to and including the decimal point from the input string, and .Length counts the characters in the resulting string - the number of decimal places.
[1] Note that casts from strings in PowerShell too use the invariant culture (effectively, ::Parse($value, [cultureinfo]::InvariantCulture) is called) so that in order to parse a culture-local string representation you'll need to use the ::Parse() method explicitly; e.g., [double]::Parse('1,2'), not [double] '1,2'.
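One further caveat for the string-based approaches, offered as an observation rather than something the answers above state: .NET formats very small [double] values in scientific notation, so the $b from the question has no decimal point in its string form at all. A sketch, assuming a [decimal] literal is an acceptable substitute:

```powershell
$b = 0.000001                 # [double]
$doubleText = $b.ToString([cultureinfo]::InvariantCulture)

$bDecimal = 0.000001d         # [decimal] literal keeps plain notation
$decimalText = $bDecimal.ToString([cultureinfo]::InvariantCulture)
$places = ($decimalText -replace '.+\.').Length

$doubleText     # 1E-06
$decimalText    # 0.000001
$places         # 6
```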

How to add multiple values to case sensitive powershell hashtable?

I need a (key, value) hashtable of the alphabet to convert letters and numbers to codes in PowerShell. I did it like this:
$symbols = @{"A"="0x41"; "B"="0x42"; "C"="0x43"; "D"="0x44"; "E"="0x45"; "F"="0x46"; "G"="0x47"; "H"="0x48"; "I"="0x49"; ....}
But then I noticed that hashtables are case insensitive by default, and I need case sensitivity. I found that I can create a case sensitive hashtable with:
$symbols = New-Object System.Collections.Hashtable
and then add values:
$symbols.Add("a","0x41")
$symbols.Add("A","shift+0x41")
....
But that will take 52 lines of code. Is there any way to add multiple values to a CASE SENSITIVE hashtable in one line?
Because if I try to combine two hashtables or add values in one line, it becomes case INsensitive and throws an error about duplicate values.
I think this potentially does what you're after:
$symbols = New-Object System.Collections.Hashtable
((65..90) + (97..122)) | ForEach-Object {
    $symbols.Add([char]$_,"$(if ($_ -lt 97) {'shift+'})0x{0:x}" -f $( if ($_ -lt 97) { $_ } Else { $_ -32 }))
}
$symbols.GetEnumerator() | sort name
This assumes that you're converting the character to hex code.
Explanation:
((65..90) + (97..122)) creates a single array from two ranges of numbers, which are the decimal codes for A..Z and a..z.
[char] converts each decimal code to its corresponding letter, which becomes the key.
If the code is less than 97 (an uppercase letter), the text shift+ is added to the start of the value.
"0x{0:x}" -f <number> converts the number to its hex equivalent; for lowercase letters, 32 is subtracted first so that both cases map to the same code in the uppercase range (for example, 'a' and 'A' both yield 0x41).
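A variant of the same idea, sketched with string keys instead of [char] keys. That choice is an assumption made here so that lookups with ordinary string literals such as 'a' find the entry; the loop variable names are hypothetical.

```powershell
# New-Object creates a plain .NET Hashtable, whose keys are case-sensitive.
$symbols = New-Object System.Collections.Hashtable

foreach ($code in 65..90) {
    $upper = [string][char]$code          # 'A' .. 'Z'
    $lower = $upper.ToLowerInvariant()    # 'a' .. 'z'
    $hex   = '0x{0:X2}' -f $code          # e.g. 0x41

    $symbols.Add($lower, $hex)            # unshifted key
    $symbols.Add($upper, "shift+$hex")    # shifted key
}

$symbols['a']     # 0x41
$symbols['A']     # shift+0x41
$symbols.Count    # 52
```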

Why is `-lt` behaving differently for chars and strings?

I recently answered an SO question about using -lt or -gt with strings. My answer was based on something I had read earlier, which said that -lt compares one char from each string at a time until one ASCII value is not equal to the other. At that point the result (lower/equal/greater) decides. By that logic, "Less" -lt "less" should return True because L has a lower ASCII byte value than l, but it doesn't:
[System.Text.Encoding]::ASCII.GetBytes("Less".ToCharArray())
76
101
115
115
[System.Text.Encoding]::ASCII.GetBytes("less".ToCharArray())
108
101
115
115
"Less" -lt "less"
False
It seems that I may have been missing a crucial piece: the test is case-insensitive
#L has a lower ASCII-value than l. PS doesn't care. They're equal
"Less" -le "less"
True
#The last s has a lower ASCII-value than t. PS cares.
"Less" -lt "lest"
True
#T has a lower ASCII-value than t. PS doesn't care
"LesT" -lt "lest"
False
#Again PS doesn't care. They're equal
"LesT" -le "lest"
True
I then tried to test char vs single-character-string:
[int][char]"L"
76
[int][char]"l"
108
#Using string it's case-insensitive. L = l
"L" -lt "l"
False
"L" -le "l"
True
"L" -gt "l"
False
#Using chars it's case-sensitive! L < l
([char]"L") -lt ([char]"l")
True
([char]"L") -gt ([char]"l")
False
For comparison, I tried to use the case-sensitive less-than operator, but it says L > l which is the opposite of what -lt returned for chars.
"L" -clt "l"
False
"l" -clt "L"
True
How does the comparison work, because it clearly isn't by using ASCII-value and why does it behave differently for chars vs. strings?
A big thank-you to PetSerAl for all his invaluable input.
tl; dr:
-lt and -gt compare [char] instances numerically by Unicode codepoint.
Confusingly, so do -ilt, -clt, -igt, -cgt - even though they only make sense with string operands, but that's a quirk in the PowerShell language itself (see bottom).
-eq (and its alias -ieq), by contrast, compare [char] instances case-insensitively, which is typically, but not necessarily like a case-insensitive string comparison (-ceq again compares strictly numerically).
-eq/-ieq ultimately also compares numerically, but first converts the operands to their uppercase equivalents using the invariant culture; as a result, this comparison is not fully equivalent to PowerShell's string comparison, which additionally recognizes so-called compatible sequences (distinct characters or even sequences considered to have the same meaning; see Unicode equivalence) as equal.
In other words: PowerShell special-cases the behavior of only -eq / -ieq with [char] operands, and does so in a manner that is almost, but not quite the same as case-insensitive string comparison.
This distinction leads to counter-intuitive behavior such as [char] 'A' -eq [char] 'a' and [char] 'A' -lt [char] 'a' both returning $true.
To be safe:
always cast to [int] if you want numeric (Unicode codepoint) comparison.
always cast to [string] if you want string comparison.
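The two recommendations above can be sketched as follows. The variable names are hypothetical; the casts are the point.

```powershell
$upper = [char]'A'
$lower = [char]'a'

# Numeric comparison of the code points: 65 versus 97.
$numericLess = [int]$upper -lt [int]$lower

# String comparison, first case-insensitive, then case-sensitive.
$sameIgnoringCase = [string]$upper -eq  [string]$lower
$sameExactCase    = [string]$upper -ceq [string]$lower

$numericLess         # True
$sameIgnoringCase    # True
$sameExactCase       # False
```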
For background information, read on.
PowerShell's usually helpful operator overloading can be tricky at times.
Note that in a numeric context (whether implicit or explicit), PowerShell treats characters ([char] ([System.Char]) instances) numerically, by their Unicode codepoint (not ASCII).
[char] 'A' -eq 65 # $true, in the 'Basic Latin' Unicode range, which coincides with ASCII
[char] 'Ā' -eq 256 # $true; 0x100, in the 'Latin Extended-A' Unicode range
What makes [char] unusual is that its instances are compared to each other numerically as-is, by Unicode codepoint, EXCEPT with -eq/-ieq.
-ceq, -lt, and -gt compare directly by Unicode codepoints, and - counter-intuitively - so do -ilt, -clt, -igt and -cgt:
[char] 'A' -lt [char] 'a' # $true; Unicode codepoint 65 ('A') is less than 97 ('a')
-eq (and its alias -ieq) first transforms the characters to uppercase, then compares the resulting Unicode codepoints:
[char] 'A' -eq [char] 'a' # !! ALSO $true; equivalent of 65 -eq 65
It's worth reflecting on this Buddhist turn: this and that: in the world of PowerShell, character 'A' is both less than and equal to 'a', depending on how you compare.
Also, directly or indirectly - after transformation to uppercase - comparing Unicode codepoints is NOT the same as comparing them as strings, because PowerShell's string comparison additionally recognizes so-called compatible sequences, where characters (or even character sequences) are considered "the same" if they have the same meaning (see Unicode equivalence); e.g.:
# Distinct Unicode characters U+2126 (Ohm Sign) and U+03A9 (Greek Capital Letter Omega)
# ARE recognized as the "same thing" in a *string* comparison:
"Ω" -ceq "Ω" # $true, despite having distinct Unicode codepoints
# -eq/-ieq: with [char], by only applying transformation to uppercase, the results
# are still different codepoints, which - compared numerically - are NOT equal:
[char] 'Ω' -eq [char] 'Ω' # $false: uppercased codepoints differ
# -ceq always applies direct codepoint comparison.
[char] 'Ω' -ceq [char] 'Ω' # $false: codepoints differ
Note that use of prefixes i or c to explicitly specify case-matching behavior is NOT sufficient to force string comparison, even though conceptually operators such as -ceq, -ieq, -clt, -ilt, -cgt, -igt only make sense with strings.
Effectively, the i and c prefixes are simply ignored when applied to -lt and -gt while comparing [char] operands; as it turns out (unlike what I originally thought), this is a general PowerShell pitfall - see below for an explanation.
As an aside: -lt and -gt logic in string comparison is not numeric, but based on collation order (a human-centric way of ordering independent of codepoints / byte values), which in .NET terms is controlled by cultures (either by default by the one currently in effect, or by passing a culture parameter to methods).
As @PetSerAl demonstrates in a comment (and unlike what I originally claimed), PS string comparisons use the invariant culture, not the current culture, so their behavior is the same, irrespective of what culture is the current one.
Behind the scenes:
As @PetSerAl explains in the comments, PowerShell's parsing doesn't distinguish between the base form of an operator and its i-prefixed form; e.g., both -lt and -ilt are translated to the same value, Ilt.
Thus, PowerShell cannot implement differing behavior for -lt vs. -ilt, -gt vs. -igt, ..., because it treats them the same at the syntax level.
This leads to somewhat counter-intuitive behavior in that operator prefixes are effectively ignored when comparing data types where case-sensitivity has no meaning - as opposed to getting coerced to strings, as one might expect; e.g.:
"10" -cgt "2" # $false, because "2" comes after "1" in the collation order
10 -cgt 2 # !! $true; *numeric* comparison still happens; the `c` is ignored.
In the latter case I would have expected the use of -cgt to coerce the operands to strings, given that case-sensitive comparison is only a meaningful concept in string comparison, but that is NOT how it works.
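To actually get string semantics, the operands have to be strings. A short sketch, using only casts and the operators already discussed; it also shows that the left-hand operand's type decides how the right-hand operand is converted.

```powershell
$numeric = 10 -cgt 2                     # numbers: the c prefix is ignored
$textual = [string]10 -cgt [string]2     # strings: "10" sorts before "2"

# The left operand decides the conversion of the right operand.
$leftIsString = '10' -gt 2               # 2 is converted to "2"
$leftIsNumber = 10 -gt '2'               # "2" is converted to 2

$numeric         # True
$textual         # False
$leftIsString    # False
$leftIsNumber    # True
```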
If you want to dig deeper into how PowerShell operates, see @PetSerAl's comments below.
Not quite sure what to post here other than the comparisons are all correct when dealing with strings/characters. If you want an Ordinal comparison, do an Ordinal comparison and you get results based on that.
Best Practices for Using Strings in the .NET Framework
[string]::Compare('L','l')
returns 1
and
[string]::Compare("L","l", [stringcomparison]::Ordinal)
returns -32
Not sure what to add here to help clarify.
Also see: Upper vs Lower Case

Prevent coercion

Assuming:
Function Invoke-Foo {
    Param(
        [string[]]$Ids
    )
    Foreach ($Id in $Ids) {
        Write-Host $Id
    }
}
If I do this:
PS> Invoke-Foo -ids '0000','0001'
0000
0001
If I do this:
PS> Invoke-Foo -ids 0000,0001
0
1
In the second case, is there a way to prevent the coercion, other than make them explicit strings (first case)?
No, unfortunately not.
From the about_Parsing help file:
When processing a command, the Windows PowerShell parser operates
in expression mode or in argument mode:
- In expression mode, character string values must be contained in
quotation marks. Numbers not enclosed in quotation marks are treated
as numerical values (rather than as a series of characters).
- In argument mode, each value is treated as an expandable string
unless it begins with one of the following special characters: dollar
sign ($), at sign (@), single quotation mark ('), double quotation
mark ("), or an opening parenthesis (().
So, the parser evaluates 0001 before anything is passed to the function. We can test the effect of treating 0001 as an "Expandable String" with the ExpandString() method:
PS C:\> $ExecutionContext.InvokeCommand.ExpandString(0001)
1
At least, if you are sure that your ids are in the range [0, 9999], you can do the formatting like this:
Function Invoke-Foo {
    Param([int[]]$Ids)
    Foreach ($Id in $Ids) {
        Write-Host ([System.String]::Format("{0:D4}", $Id))
    }
}
More about padding numbers with leading zeros can be found here.
What's important to note here:
Padding will work for numbers. I changed the parameter type to int[] so that if you pass strings, they will be converted to numbers and the padding will work for them too.
This method (as it is now) limits you to the range of ids I mentioned before, and it will always give you output padded to four digits, even if you pass it '003'.
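If the fixed width of four is too restrictive, the width can be made a parameter. This is only a sketch under that assumption: Format-Id and its -Width parameter are hypothetical names, and the function returns the padded strings instead of writing them to the host.

```powershell
# Hypothetical variant: pad each id with leading zeros to a chosen width.
function Format-Id {
    param(
        [int[]]$Ids,
        [int]$Width = 4
    )
    foreach ($Id in $Ids) {
        # "D4" (or "D6", ...) is the standard decimal format with zero padding.
        $Id.ToString("D$Width")
    }
}

$padded = Format-Id -Ids 0, 1, 42, '003'
$padded -join ','    # 0000,0001,0042,0003

$wide = Format-Id -Ids 7 -Width 6
$wide                # 000007
```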