Comparing '1' and '_' in PowerShell gives unexpected result - powershell

When comparing '1' with '_', the answer I'm expecting is '1' < '_' because their ASCII values are 49 and 95 respectively. But the result is the other way around. For that matter, even ':' instead of '_' gives the same result.
[byte][char]'1' -gt [byte][char]'_'
>> False
which makes sense. However:
'1' -gt '_'
>> True
I would appreciate any pointers on what I may be missing here. In essence, I'm looking for a reliable way to lexicographically compare strings in PowerShell. Thanks!

Let's break down your two examples.
[byte][char]'1' -gt [byte][char]'_'
In this example you're comparing a byte to a byte. It is important to note that byte and char are both numeric values. The only real difference is that a char is 16 bits (so it can represent Unicode characters) and a byte is only 8 bits. Casting a string to a char gets the numeric representation of the character in the string (provided the string only contains a single character).
This means that [byte][char]'1' results in the number 49 and [byte][char]'_' results in 95. The expression will evaluate to false since 49 is not greater than 95.
Now let's look at your second example
'1' -gt '_'
In this example, you're comparing a string to a string. When comparing two strings using -gt, -ge, -lt, or -le, it uses the alphabetical sort order to determine whether or not the expression should be true or false, not the numeric values of the characters in the string. If one string is sorted before another, the first string is considered less than the second and vice versa.
You can see this behavior if you pass some strings to the Sort-Object cmdlet.
'1', '2', '3', '_' | Sort-Object
# returns '_', '1', '2', '3'
This means that your second example will return true because in the sort order implemented by .NET, _ comes before 1.
The order of special characters can vary by language and/or culture, as there does not appear to be a standard; however, it is fairly widely accepted that special characters sort before numbers and letters.

Your first example is using Byte.CompareTo(Byte) whereas your second example is using String.CompareTo(String).
'1' -gt '_' returns $true because 1 follows _.
Two different ways you could see it:
'1'.CompareTo('_') # => 1
([char] '1').CompareTo([char] '_') # => -46
'1', '_' | Sort-Object # => `_` goes first
'1', '_' -as [char[]] | Sort-Object # => `1` goes first
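As for the "reliable way to lexicographically compare strings" the question asks for: one option is an ordinal comparison, which compares raw character codes instead of applying the culture-aware sort order. The following is only a minimal sketch; the variable names are made up for illustration, and it assumes that code-point order is what you mean by "lexicographic".

```powershell
# Ordinal comparison looks at the raw UTF-16 code units: '1' is 49, '_' is 95.
$ordinal = [string]::CompareOrdinal('1', '_')
$isLess  = $ordinal -lt 0      # True: ordinally, '1' comes before '_'

# Sort-Object uses the culture-aware order, so sort in place with an ordinal comparer instead.
$items = [string[]]('_', '1', 'b', 'A')
[Array]::Sort($items, [System.StringComparer]::Ordinal)
$sorted = $items -join ' '     # 1 A _ b  (codes 49, 65, 95, 98)

$isLess
$sorted
```

Only the sign of the CompareOrdinal result should be relied upon, not its exact value.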


Powershell number format

I am creating a script that converts a CSV file into another format.
To do so, I need my numbers to have a fixed format to respect the column size: 00000000000000000,00 (20 characters, 2 digits after the comma).
I have tried to format the number with -f and with the method $value.ToString("#################.##"), without success.
Here is an example Input :
4000000
45817,43
400000
570425,02
15864155,69
1068635,69
128586256,9
8901900,04
29393,88
126858346,88
1190011,46
2358411,95
139594,82
13929,74
11516,85
55742,78
96722,57
21408,86
717,01
54930,49
391,13
2118,64
Any hints are welcome :)
Thank you !
tl;dr:
Use 0 instead of # in the format string:
PS> $value = 128586256.9; $value.ToString('00000000000000000000.00')
00000000000128586256.90
Note:
Alternatively, you could construct the format string as an expression:
$value.ToString('0' * 20 + '.00')
The resulting string reflects the current culture with respect to the decimal mark; e.g., with fr-FR (French) in effect, , rather than . would be used; you can pass a specific [cultureinfo] object as the second argument to control what culture is used for formatting; see the docs.
As in your question, I'm assuming that $value already contains a number, which implies that you've already converted the CSV column values - which are invariably strings - to numbers.
To convert a string culture-sensitively to a number, use [double]::Parse('1,2'), for instance (this method too has an overload that allows specifying what culture to use).
Caveat: By contrast, a PowerShell cast (e.g. [double] '1.2') is by design always culture-invariant and only recognizes . as the decimal mark, irrespective of the culture currently in effect.
zerocukor287 has provided the crucial pointer:
To unconditionally represent a digit in a formatted string and default to 0 in the absence of an available digit, use 0, the zero placeholder in a .NET custom numeric format string.
By contrast, #, the digit placeholder, represents only digits actually present in the input number.
To illustrate the difference:
PS> (9.1).ToString('.##')
9.1 # only 1 decimal place available, nothing is output for the missing 2nd
PS> (9.1).ToString('.00')
9.10 # only 1 decimal place available, 0 is output for the missing 2nd
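Putting the points about culture and the zero placeholder together, here is a rough sketch of how one of the question's comma-decimal values could be parsed and then formatted to the requested 20 characters (17 integer digits, a comma, 2 decimals). It assumes the fr-FR culture data is available on the machine and that fr-FR is an acceptable stand-in for "comma as decimal mark"; the variable names are illustrative only.

```powershell
# Assumption: fr-FR uses ',' as the decimal mark and is installed on this system.
$culture = [cultureinfo] 'fr-FR'

# Parse the CSV text culture-sensitively (a plain [double] cast would not accept the comma).
$value = [double]::Parse('128586256,9', $culture)

# 17 zero placeholders + '.00'; the '.' is rendered as the culture's decimal mark.
$format    = ('0' * 17) + '.00'
$formatted = $value.ToString($format, $culture)

$formatted          # 00000000128586256,90
$formatted.Length   # 20
```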
Since your input uses commas as decimal point, you can split on the comma and format the whole number and the decimal part separately.
Something like this:
$csv = @'
Item;Price
Item1;4000000
Item2;45817,43
Item3;400000
Item4;570425,02
Item5;15864155,69
Item6;1068635,69
Item7;128586256,9
Item8;8901900,04
Item9;29393,88
Item10;126858346,88
Item11;1190011,46
Item12;2358411,95
Item13;139594,82
Item14;13929,74
Item15;11516,85
Item16;55742,78
Item17;96722,57
Item18;21408,86
Item19;717,01
Item20;54930,49
Item21;391,13
Item22;2118,64
'@ | ConvertFrom-Csv -Delimiter ';'
foreach ($item in $csv) {
    # split into whole-number and decimal parts; $dec is $null when there is no comma
    $num, $dec = $item.Price -split ','
    # pad the decimal digits on the right so that '9' becomes '90' rather than '09'
    $item.Price = '{0:D20},{1}' -f [int64]$num, "$dec".PadRight(2, '0')
}
# show on screen
$csv
# output to (new) csv file
$csv | Export-Csv -Path 'D:\Test\formatted.csv' -Delimiter ';' -NoTypeInformation
Output on screen:
Item Price
---- -----
Item1 00000000000004000000,00
Item2 00000000000000045817,43
Item3 00000000000000400000,00
Item4 00000000000000570425,02
Item5 00000000000015864155,69
Item6 00000000000001068635,69
Item7 00000000000128586256,90
Item8 00000000000008901900,04
Item9 00000000000000029393,88
Item10 00000000000126858346,88
Item11 00000000000001190011,46
Item12 00000000000002358411,95
Item13 00000000000000139594,82
Item14 00000000000000013929,74
Item15 00000000000000011516,85
Item16 00000000000000055742,78
Item17 00000000000000096722,57
Item18 00000000000000021408,86
Item19 00000000000000000717,01
Item20 00000000000000054930,49
Item21 00000000000000000391,13
Item22 00000000000000002118,64
I do things like this all the time, usually for generating computernames. That custom numeric format string reference will come in handy. If you want a literal period, you have to backslash it.
1..5 | % tostring 00000000000000000000.00
00000000000000000001.00
00000000000000000002.00
00000000000000000003.00
00000000000000000004.00
00000000000000000005.00
Adding commas to long numbers:
psdrive c | % free | % tostring '0,0' # or '#,#'
18,272,501,760
"Per mille" character ‰ :
.00354 | % tostring '#0.##‰'
3.54‰

Pad IP addresses with leading 0's - powershell

I'm looking to pad IP addresses with 0's
example
1.2.3.4 -> 001.002.003.004
50.51.52.53 -> 050.051.052.053
Tried this:
[string]$paddedIP = $IPvariable
[string]$paddedIP.PadLeft(3, '0')
Also tried split as well, but I'm new to powershell...
You can use a combination of .Split() and -join.
('1.2.3.4'.Split('.') |
ForEach-Object {$_.PadLeft(3,'0')}) -join '.'
With this approach, you are working with strings the entire time. Split('.') creates an array element at every . character. .PadLeft(3,'0') ensures 3 characters with leading zeroes if necessary. -join '.' combines the array into a single string with each element separated by a ..
You can take a similar approach with the format operator -f.
"{0:d3}.{1:d3}.{2:d3}.{3:d3}" -f ('1.2.3.4'.Split('.') |
Foreach-Object { [int]$_ } )
The :dN format string enables N (number of digits) padding with leading zeroes.
This approach creates a string array like in the first solution. Then each element is pipelined and converted to an [int]. Lastly, the formatting is applied to each element.
To complement AdminOfThings' helpful answer, here is a more concise alternative using the -replace operator with a script block ({ ... }), which requires PowerShell Core (v6.1+):
PSCore> '1.2.3.50' -replace '\d+', { '{0:D3}' -f [int] $_.Value }
001.002.003.050
The script block is called for every match of regex \d+ (one or more digits), and $_ inside the script block refers to a System.Text.RegularExpressions.Match instance that represents the match at hand; its .Value property contains the matched text (string).
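For versions where -replace does not accept a script block, the same idea can be sketched with the .NET [regex]::Replace() method, which accepts a script block as the match evaluator. This is only an illustration of the technique; $padded and the $m parameter name are arbitrary choices.

```powershell
# The script block is invoked once per match; $m is the
# System.Text.RegularExpressions.Match instance for the match at hand.
$padded = [regex]::Replace('1.2.3.50', '\d+', { param($m) $m.Value.PadLeft(3, '0') })

$padded   # 001.002.003.050
```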

How to add multiple values to case sensitive powershell hashtable?

I need a (key, value) hashtable of the alphabet to convert letters and numbers to codes in PowerShell. I did it like this:
$symbols = @{"A"="0x41"; "B"="0x42"; "C"="0x43"; "D"="0x44"; "E"="0x45"; "F"="0x46"; "G"="0x47"; "H"="0x48"; "I"="0x49"; ....}
But then I noticed that hashtables are case-insensitive by default, and I need case sensitivity. I found that I can create a case-sensitive hashtable with:
$symbols = New-Object System.Collections.Hashtable
and then add values:
$symbols.Add("a","0x41")
$symbols.Add("A","shift+0x41")
....
But that will take 52 lines of code. Is there any way to add multiple values to a CASE-SENSITIVE hashtable in one line?
Because if I try to combine two hashtables or add values in one line, it becomes case-INsensitive and throws an error about duplicate values.
I think this potentially does what you're after:
$symbols = New-Object System.Collections.Hashtable
((65..90) + (97..122)) | ForEach-Object {
$symbols.Add([char]$_,"$(if ($_ -lt 97) {'shift+'})0x{0:x}" -f $( if ($_ -lt 97) { $_ } Else { $_ -32 }))
}
$symbols.GetEnumerator() | sort name
This assumes that you're converting the character to hex code.
Explanation:
((65..90) + (97..122)) creates an array of two ranges of numbers, which are the decimal codes for A..Z and a..z.
[char] converts each decimal code to its corresponding character.
If the code is less than 97 (that is, an uppercase letter), the text shift+ is added to the start of the value.
"0x{0:x}" -f <number> converts the number to its hex equivalent; for the lowercase characters, 32 is subtracted first so that both cases of a letter map to the same (uppercase-range) code.
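A slightly more spelled-out variant of the same idea is sketched below. It stores the keys as strings rather than [char] values, on the assumption that the table will later be queried with string literals such as 'a'; the variable names are illustrative only.

```powershell
# New-Object System.Collections.Hashtable yields a case-sensitive table, as noted in the question.
$symbols = New-Object System.Collections.Hashtable

foreach ($code in 65..90) {                 # 65..90 are the codes for 'A'..'Z'
    $upper = [string][char]$code            # e.g. 'A'
    $lower = [string][char]($code + 32)     # e.g. 'a' (lowercase codes are 32 higher)
    $hex   = '0x{0:x}' -f $code             # e.g. '0x41'
    $symbols.Add($lower, $hex)
    $symbols.Add($upper, "shift+$hex")
}

$symbols['a']     # 0x41
$symbols['A']     # shift+0x41
$symbols.Count    # 52
```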

Why is `-lt` behaving differently for chars and strings?

I recently answered a SO-question about using -lt or -gt with strings. My answer was based on something I've read earlier which said that -lt compares one char from each string at a time until an ASCII value is not equal to the other. At that point the result (lower/equal/greater) decides. By that logic, "Less" -lt "less" should return True because L has a lower ASCII byte value than l, but it doesn't:
[System.Text.Encoding]::ASCII.GetBytes("Less".ToCharArray())
76
101
115
115
[System.Text.Encoding]::ASCII.GetBytes("less".ToCharArray())
108
101
115
115
"Less" -lt "less"
False
It seems that I may have been missing a crucial piece: the test is case-insensitive
#L has a lower ASCII-value than l. PS doesn't care. They're equal
"Less" -le "less"
True
#The last s has a lower ASCII-value than t. PS cares.
"Less" -lt "lest"
True
#T has a lower ASCII-value than t. PS doesn't care
"LesT" -lt "lest"
False
#Again PS doesn't care. They're equal
"LesT" -le "lest"
True
I then tried to test char vs single-character-string:
[int][char]"L"
76
[int][char]"l"
108
#Using string it's case-insensitive. L = l
"L" -lt "l"
False
"L" -le "l"
True
"L" -gt "l"
False
#Using chars it's case-sensitive! L < l
([char]"L") -lt ([char]"l")
True
([char]"L") -gt ([char]"l")
False
For comparison, I tried to use the case-sensitive less-than operator, but it says L > l which is the opposite of what -lt returned for chars.
"L" -clt "l"
False
"l" -clt "L"
True
How does the comparison work, because it clearly isn't by using ASCII-value and why does it behave differently for chars vs. strings?
A big thank-you to PetSerAl for all his invaluable input.
tl; dr:
-lt and -gt compare [char] instances numerically by Unicode codepoint.
Confusingly, so do -ilt, -clt, -igt, -cgt - even though they only make sense with string operands, but that's a quirk in the PowerShell language itself (see bottom).
-eq (and its alias -ieq), by contrast, compare [char] instances case-insensitively, which is typically, but not necessarily like a case-insensitive string comparison (-ceq again compares strictly numerically).
-eq/-ieq ultimately also compares numerically, but first converts the operands to their uppercase equivalents using the invariant culture; as a result, this comparison is not fully equivalent to PowerShell's string comparison, which additionally recognizes so-called compatible sequences (distinct characters or even sequences considered to have the same meaning; see Unicode equivalence) as equal.
In other words: PowerShell special-cases the behavior of only -eq / -ieq with [char] operands, and does so in a manner that is almost, but not quite the same as case-insensitive string comparison.
This distinction leads to counter-intuitive behavior such as [char] 'A' -eq [char] 'a' and [char] 'A' -lt [char] 'a' both returning $true.
To be safe:
always cast to [int] if you want numeric (Unicode codepoint) comparison.
always cast to [string] if you want string comparison.
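The two "to be safe" casts can be illustrated with a short sketch; the variable names are arbitrary, and the expected results follow from the comparisons shown in the question.

```powershell
$upper = [char] 'A'
$lower = [char] 'a'

# Numeric (Unicode code point) comparison: 65 is less than 97.
$numericLess = [int] $upper -lt [int] $lower          # True

# String comparison: -lt is case-insensitive for strings, so 'A' and 'a' count as equal.
$stringLess  = [string] $upper -lt [string] $lower    # False
$stringEqual = [string] $upper -eq [string] $lower    # True

$numericLess, $stringLess, $stringEqual
```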
For background information, read on.
PowerShell's usually helpful operator overloading can be tricky at times.
Note that in a numeric context (whether implicit or explicit), PowerShell treats characters ([char] ([System.Char]) instances) numerically, by their Unicode codepoint (not ASCII).
[char] 'A' -eq 65 # $true, in the 'Basic Latin' Unicode range, which coincides with ASCII
[char] 'Ā' -eq 256 # $true; 0x100, in the 'Latin Extended-A' Unicode range
What makes [char] unusual is that its instances are compared to each other numerically as-is, by Unicode codepoint, EXCEPT with -eq/-ieq.
-ceq, -lt, and -gt compare directly by Unicode codepoints, and - counter-intuitively - so do -ilt, -clt, -igt and -cgt:
[char] 'A' -lt [char] 'a' # $true; Unicode codepoint 65 ('A') is less than 97 ('a')
-eq (and its alias -ieq) first transforms the characters to uppercase, then compares the resulting Unicode codepoints:
[char] 'A' -eq [char] 'a' # !! ALSO $true; equivalent of 65 -eq 65
It's worth reflecting on this Buddhist turn: this and that: in the world of PowerShell, character 'A' is both less than and equal to 'a', depending on how you compare.
Also, directly or indirectly - after transformation to uppercase - comparing Unicode codepoints is NOT the same as comparing them as strings, because PowerShell's string comparison additionally recognizes so-called compatible sequences, where characters (or even character sequences) are considered "the same" if they have the same meaning (see Unicode equivalence); e.g.:
# Distinct Unicode characters U+2126 (Ohm Sign) and U+03A9 (Greek Capital Letter Omega)
# ARE recognized as the "same thing" in a *string* comparison:
"Ω" -ceq "Ω" # $true, despite having distinct Unicode codepoints
# -eq/-ieq: with [char], by only applying transformation to uppercase, the results
# are still different codepoints, which - compared numerically - are NOT equal:
[char] 'Ω' -eq [char] 'Ω' # $false: uppercased codepoints differ
# -ceq always applies direct codepoint comparison.
[char] 'Ω' -ceq [char] 'Ω' # $false: codepoints differ
Note that use of prefixes i or c to explicitly specify case-matching behavior is NOT sufficient to force string comparison, even though conceptually operators such as -ceq, -ieq, -clt, -ilt, -cgt, -igt only make sense with strings.
Effectively, the i and c prefixes are simply ignored when applied to -lt and -gt while comparing [char] operands; as it turns out (unlike what I originally thought), this is a general PowerShell pitfall - see below for an explanation.
As an aside: -lt and -gt logic in string comparison is not numeric, but based on collation order (a human-centric way of ordering independent of codepoints / byte values), which in .NET terms is controlled by cultures (either by default by the one currently in effect, or by passing a culture parameter to methods).
As @PetSerAl demonstrates in a comment (and unlike what I originally claimed), PS string comparisons use the invariant culture, not the current culture, so their behavior is the same, irrespective of what culture is the current one.
Behind the scenes:
As @PetSerAl explains in the comments, PowerShell's parsing doesn't distinguish between the base form of an operator and its i-prefixed form; e.g., both -lt and -ilt are translated to the same value, Ilt.
Thus, PowerShell cannot implement differing behavior for -lt vs. -ilt, -gt vs. -igt, ..., because it treats them the same at the syntax level.
This leads to somewhat counter-intuitive behavior in that operator prefixes are effectively ignored when comparing data types where case-sensitivity has no meaning - as opposed to getting coerced to strings, as one might expect; e.g.:
"10" -cgt "2" # $false, because "2" comes after "1" in the collation order
10 -cgt 2 # !! $true; *numeric* comparison still happens; the `c` is ignored.
In the latter case I would have expected the use of -cgt to coerce the operands to strings, given that case-sensitive comparison is only a meaningful concept in string comparison, but that is NOT how it works.
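If string comparison is what you actually want for numeric operands, an explicit cast does what the c prefix does not. A small sketch follows (the variable names are illustrative); note that the type of the left-hand operand decides which kind of comparison is performed.

```powershell
$numeric   = 10 -gt 2                      # True: both operands are numbers
$asStrings = [string] 10 -gt [string] 2    # False: "10" sorts before "2" as a string

# The left-hand operand drives the conversion of the right-hand one:
$lhsString = '10' -gt 2                    # False: 2 is converted to "2", string comparison
$lhsNumber = 10 -gt '2'                    # True: "2" is converted to 2, numeric comparison

$numeric, $asStrings, $lhsString, $lhsNumber
```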
If you want to dig deeper into how PowerShell operates, see @PetSerAl's comments below.
Not quite sure what to post here other than the comparisons are all correct when dealing with strings/characters. If you want an Ordinal comparison, do an Ordinal comparison and you get results based on that.
Best Practices for Using Strings in the .NET Framework
[string]::Compare('L','l')
returns 1
and
[string]::Compare("L","l", [stringcomparison]::Ordinal)
returns -32
Not sure what to add here to help clarify.
Also see: Upper vs Lower Case

Trying to understand this code (creating a [char]range)

I have code that works, but I have no idea WHY it works.
This will generate a list containing each letter of the English alphabet:
[char[]]([char]'a'..[char]'z')
However, this will not:
[char]([char]'a'..[char]'z')
and this will actually generate a list of numbers from 97 - 122
([char]'a'..[char]'z')
Could any experts out there explain to me how this works (or doesn't)?
In your second example, you are trying to cast an array of characters to a single character [char]. That won't work. In the third example, the 'a' is considered a string by PowerShell. So casting it to [char] tells PowerShell it is a single char. The .. operator ranges over numbers. Fortunately, PowerShell can convert the character 'a' to its ASCII value 97 and 'z' to 122. So you effectively wind up with 97..122. Then in your first example, the [char[]] converts that array of ints back to an array of characters: a through z.
In PowerShell, 'a' is a [string] type. [char]'a' is, obviously, a [char] type. These are very different things.
$string = 'a'
$char = [char]$string
$string can be cast to [char] because it is a string consisting of a single character. If there is more than one character in the string, e.g. 'ab', then you need an array of [char], which is type [char[]]. The extra set of square brackets designates an array.
$string | get-member
$char | get-member
reveals very different methods for the two types. The [char] type has .toint() methods. If you cast it to [int], it yields the numeric ASCII code for that character.
[int]$char
returns 97, the ASCII code for the letter 'a'.
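The individual steps described in the answers can be sketched as follows. The explicit [int] casts are an addition of mine: as far as I know, newer PowerShell versions (6 and later) can produce characters directly from a range of [char] operands, so casting to [int] first keeps the intermediate result numeric regardless of version. The variable names are illustrative.

```powershell
# Step 1: the range operator works on integers, so start from the character codes.
$codes = [int][char]'a'..[int][char]'z'    # 97, 98, ... 122
$codes.Count                               # 26

# Step 2: [char[]] converts each number in the array back to a character.
$letters  = [char[]] $codes
$alphabet = -join $letters                 # abcdefghijklmnopqrstuvwxyz

$alphabet
```

Casting the whole array to a single [char] (the second example in the question) fails because 26 numbers cannot be converted to one character.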