I'm trying to get a handle on documentation/evidence around when PowerShell does conversions, and when a user needs to be explicit about what they want. A somewhat related question, here, has a broken link that possibly explained scenarios that the type would be adjusted. There are plenty of instances of similar problems though (namely, comparing a string to an int) - for example when you get a number via Read-Host, you're actually getting a string.
Specifically, though, I've found that certain mechanisms seem to handle a string representation of a number fine, but I want to understand whether that is truly because they're handling it as a number, or whether the output merely looks correct while being wrong under the hood. Here's a specific example; the question is about Measure-Object and how it calculates the Sum of a string property.
PS > $Example = Import-Csv .\Example.csv
PS > $Example
Procedure_Code : 123456789
CPT_Code : J3490
Modifier :
Service_Date : 02/01/2020
Revenue_Code : 259
Units : -100.00
Amount : 55.00
Procedure_Code : 123456789
CPT_Code : J3490
Modifier :
Service_Date : 02/02/2020
Revenue_Code : 259
Units : 100.00
Amount : 55.00
PS > [Math]::Sign($Example[0].Units)
-1
PS > $Example | Measure-Object Amount -Sum
Count : 2
Sum : 110
Property : Amount
PS > $Example | ForEach-Object { $Sum = $Sum + $_.Amount} ; Write-Host "$($Sum)"
55.0055.00 #Plus sign concatenates strings
PS > $Example | ForEach-Object { $Sum = $Sum + [int]$_.Amount} ; Write-Host "$($Sum)"
110 #Expected result if casting the string into an integer type before adding
So basically it seems like [Math]::Sign() works (even though it doesn't accept strings as far as I can tell), and Measure-Object is smarter about strings than a simple plus sign. But is Measure-Object casting my string into an [int]? If I use an example with decimals, the answer is more precise, so maybe it is a [single]?
What is Measure-Object doing to get the right answer, and when shouldn't I trust it?
Measure-Object is a special case, because it is that cmdlet's specific logic that performs automatic type conversion, as opposed to PowerShell's own, general automatic conversions that happen in the context of using operators such as +, passing arguments to commands, calling .NET methods such as [Math]::Sign(), and using values as conditionals.
As your own answer hints at, Measure-Object calls the same code that PowerShell's general automatic type conversions use; these are discussed below.
Automatic type conversions in PowerShell:
In general, PowerShell always attempts automatic type conversion - and a pitfall to those familiar with C# is that even an explicit cast (e.g. [double] '1.2') may not be honored if the target type is ultimately a different one.[1]
Supported conversions:
In addition to supporting .NET's type conversions, PowerShell implements a few built-in custom conversions and supports user-defined conversions by way of custom conversion classes and attribute classes; also, it implicitly tries to call single-parameter target-type constructors and - for string input - a target type's static ::Parse() method, if present; see this answer for details.
To-number conversion and culture-sensitivity when converting to and from strings:
Typically, PowerShell uses culture-invariant conversion, so that, for instance, only . is recognized as the decimal mark in "number strings" that represent floating-point numbers.
String-to-number conversion, including how specific numeric data types are chosen, is covered in this answer.
To-string conversion (including from numbers) is covered in this answer.
Automatic type conversions by context:
Operands (operator arguments):
Some operators, such as -replace and -match, operate on strings only, in which case the operand(s) are invariably converted to strings (which always succeeds):
42 -replace 2, '!' yields '4!': both 42 and 2 were implicitly converted to strings.
Others, such as - and /, operate on numbers only, in which case the operand(s) are invariably converted to numbers (which may fail):
'10' - '2' yields 8, an [int]
Yet others, such as + and *, can operate on either strings or numbers, and it is the type of the LHS that determines the operation type:
'10' + 2 yields '102', the concatenation of string '10' with the to-string conversion of number 2.
By contrast, 10 + '2' yields 12, the addition of the number 10 and the to-number conversion of string '2'.
See this answer for more information.
Command arguments:
If the target parameter has a specific type (other than [object] or [psobject]), PowerShell will attempt conversion.
Otherwise, for literal arguments, the type is inferred from the literal representation; e.g. 42 is passed as an [int], and '42' as a [string].
Values serving as conditionals, such as in if statements:
A conditional must by definition be a Boolean (type [bool]), and PowerShell automatically converts a value of any type to [bool], using the rules described in the bottom section of this answer.
Method arguments:
Automatic type conversion for .NET-method arguments can be tricky, because multiple overloads may have to be considered, and - if the arguments aren't of the expected type - PowerShell has to find the best overload based on supported type conversions - and it may not be obvious which one is chosen.
Executing [Math]::Sign (calling this as-is, without ()) reveals 8(!) different overloads, for various numeric types.
More insidiously, the introduction of additional .NET-method overloads in future .NET versions can break existing PowerShell code, if a new overload then happens to be the best match in a given invocation.
A high-profile example is the [string] type's Split() method - see the bottom section of this answer.
Therefore, for long-term code stability:
Avoid .NET methods in favor of PowerShell-native commands, if possible.
Otherwise, if type conversion is necessary, use casts (e.g. [Math]::Sign([int] '-42')) to guide method-overload resolution to avoid ambiguity.
[1] E.g., the explicit [double] is quietly converted to an [int] in the following statement: & { param([int] $i) $i } ([double] '1.2'). Also, casts to .NET interfaces generally have no effect in PowerShell - except to guide overload resolution in .NET method calls.
I'm not sure this is the be-all and end-all answer, but it works for what I was curious about. For a lot of 'quick script' scenarios where you might use something like Measure-Object - you'll probably get the correct answers you're looking for, albeit maybe slower than other methods.
Measure-Object specifically seems to use [double] for Sum, Average, and StandardDeviation and will indeed throw an error if one of the string values from a CSV Object can't be converted.
I'm still a little surprised that [Math]::Sign() works at all with strings, but I'm glad it does.
if (_measureAverage || _measureSum || _measureStandardDeviation)
{
double numValue = 0.0;
if (!LanguagePrimitives.TryConvertTo(objValue, out numValue))
{
_nonNumericError = true;
ErrorRecord errorRecord = new(
PSTraceSource.NewInvalidOperationException(MeasureObjectStrings.NonNumericInputObject, objValue),
"NonNumericInputObject",
ErrorCategory.InvalidType,
objValue);
WriteError(errorRecord);
return;
}
AnalyzeNumber(numValue, stat);
}
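The logic in the snippet above can be mirrored in a rough, language-neutral sketch (Python here, purely for illustration). The helper name `measure_sum` and its return shape are made up for this example and are not part of PowerShell. Each value is converted to a double-precision float, and a value that cannot be converted is reported and skipped rather than added. Python's `float()` does not accept exactly the same strings as PowerShell's converter, so treat this only as an analogy.

```python
def measure_sum(rows, prop):
    """Sum one property across rows, converting each value to a float first."""
    total = 0.0
    errors = []
    for row in rows:
        value = row[prop]
        try:
            number = float(value)  # stand-in for the to-[double] conversion
        except (TypeError, ValueError):
            errors.append(value)   # stand-in for the non-numeric error record
            continue               # the value is skipped, as in the snippet
        total += number
    return total, errors


rows = [{"Amount": "55.00"}, {"Amount": "55.00"}]
print(measure_sum(rows, "Amount"))  # (110.0, [])
```

Because the accumulation happens in double precision, the usual floating-point rounding caveats apply to the result.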
I am trying to get my head around windows errors, especially the relationship between Win32 errors and HRESULT errors.
So, as an example I know 3010 is
"The requested operation is successful. Changes will not be effective
until the system is rebooted"
And I can get that by casting to ComponentModel.Win32Exception thus: [ComponentModel.Win32Exception]3010.
Also, I know that 3010 is expressed in hex as 0x00000BC2 or 0x0BC2, and I can cast both of those as well. But it can also be expressed as 0x80070bc2, and this will cast properly. And it can even be expressed as 0xFFFFFFFF80070BC2. Here, as a 64 bit hex value it won't cast. But, 0xFFFFFFFF80070BC2 is -2147021886 in decimal, and that will cast. And it's a return value that can be expected, as documented here.
Similarly 0 decimal can be expressed in hex as 0x0000 and 0x00000000 and those cast fine and return
"The operation completed successfully"
But 0xFFFFFFFF00000000 and its decimal equivalent -4294967296 won't cast; both just return the decimal value. Yet I have gotten that decimal value returned from an installer, and the website referenced above also includes it.
So, at times when running installers from various vendors I have seen 0, -4294967296, 3010 & -2147021886 returned, and in three of the four situations I can cast to get a meaningful message for the user, and in one I can't.
So, to sum up, why are these BAD values bad, why are they inconsistent, and what is the best way to deal with values like -4294967296 or -2147945410, the latter of which may never show up, but the former of which I have seen?
[ComponentModel.Win32Exception]3010 # Good
[ComponentModel.Win32Exception]0x0BC2 # Good
[ComponentModel.Win32Exception]-2147021886 # Good
[ComponentModel.Win32Exception]0x00000BC2 # Good
[ComponentModel.Win32Exception]0x0000000000000BC2 # Good
[ComponentModel.Win32Exception]-2147945410 # BAD
[ComponentModel.Win32Exception]0x80070BC2 # Good
[ComponentModel.Win32Exception]0x0000000080070BC2 # Good
[ComponentModel.Win32Exception]0 # Good
[ComponentModel.Win32Exception]0x0000 # Good
[ComponentModel.Win32Exception]0x00000000 # Good
[ComponentModel.Win32Exception]0x0000000000000000 # Good
[ComponentModel.Win32Exception]0xFFFFFFFF00000000 # BAD
[ComponentModel.Win32Exception]-4294967296 # BAD
EDIT: So, I have dug around a bit, and I THINK this might work.
foreach ($errCode in @(0, 3010, -2147021886, -4294967296)) {
[int]$intCode = $errCode -band 0xFFFF
[ComponentModel.Win32Exception]$intCode
}
For the four values in question it does, but I'll need to test with a bunch of others too. I still just don't understand why Microsoft, Autodesk, and the rest will return values that can't be used easily. Why not just use nothing but the damn Win32 codes and be done with it? And why is 0x80070BC2 treated the same as 0x00000BC2 and both work, but their decimal equivalents are treated differently and only one works?
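For what it's worth, the masking idea lines up with the documented HRESULT layout: bit 31 is the severity flag, bits 16 to 26 are the facility, and the low 16 bits are the code, with facility 7 (FACILITY_WIN32) marking a wrapped Win32 error. Here is a minimal sketch, in Python only for illustration (the helper name `split_hresult` is made up), of how the four values decompose once they are reduced to 32 bits:

```python
FACILITY_WIN32 = 7  # facility value used for Win32 errors wrapped in an HRESULT


def split_hresult(value):
    """Split a (possibly sign-extended) value into severity, facility, code."""
    v = value & 0xFFFFFFFF          # keep only the low 32 bits
    severity = (v >> 31) & 0x1      # 1 = failure, 0 = success
    facility = (v >> 16) & 0x7FF    # 11-bit facility field
    code = v & 0xFFFF               # low 16 bits: the underlying code
    return severity, facility, code


for raw in (0, 3010, -2147021886, -4294967296):
    print(raw, split_hresult(raw))
# 0            -> (0, 0, 0)
# 3010         -> (0, 0, 3010)
# -2147021886  -> (1, 7, 3010)
# -4294967296  -> (0, 0, 0)
```

Under this reading, -2147021886 is just 3010 wrapped with FACILITY_WIN32, and -4294967296 has all-zero low 32 bits. Masking blindly with 0xFFFF discards the facility, so a code from a different facility would be misread as a Win32 error; checking that the facility is 7 (or 0 for a plain Win32 code) first seems safer.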
So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I thought of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (it only helps the parser separate the function name from its argument), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
@list_accumulator = ();
for $n (74, …, -34) {
    $. += $n;
    push @list_accumulator, chr($.)
}
print(join("", @list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
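The running-total idea is not specific to Perl. As a cross-check, here is a small sketch in Python (used only for illustration; the variable names are made up) that accumulates the same numbers and turns each running total into a character:

```python
from itertools import accumulate

# The relative offsets from the one-liner, copied as-is.
deltas = [74, 43, -2, 1, -84, 65, 13, 1, 5, -12, -3, 13, -82, 44, 21,
          18, 1, -70, 56, 7, -77, 72, -7, 2, 8, -6, 13, -70, -34]

codes = list(accumulate(deltas))           # running totals: 74, 117, 115, ...
message = "".join(chr(c) for c in codes)   # each total is an ASCII code

print(message, end="")  # Just another Last.fm hacker, (followed by a newline)
```

The final running total is 10, a newline, which is why the output ends with a line break after the comma.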
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)@.)@_*([]@!@/)(@)@-@),@(@@+@)'
^'][)@]`}`]()`@.@]@%[`}%[@`@!#@%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 43 + (-2) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* template tells pack to treat each value in the list as a character code and emit the corresponding character, for as many values as are supplied.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my @relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my @absolute = map($prev_value += $_, @relative);
print pack("c*", @absolute);
I have a few numbers in a file in a variety of formats: 8.3, 0.001, 9e-18. I'm looking for an easy way to read them in and store them without any loss of precision. This would be easy in AWK, but how's it done in Perl? I'm only open to using Perl. Thanks!
Also, I was wondering if there's an easy way to print them in an appropriate format. For example, 8.3 should be printed as "8.3" not "8.3e0"
If they're text strings, then reading them into Perl as strings and writing them back out as strings shouldn't result in any loss of precision. If you have to do arithmetic on them, then I suggest installing the CPAN module Math::BigFloat to ensure that you don't lose any precision to rounding.
As to your second question, Perl doesn't do any reformatting unless you ask it to:
$ perl -le 'print 8.3'
8.3
Am I missing something?
From http://perldoc.perl.org/perlnumber.html:
Perl can internally represent numbers in 3 different ways: as native
integers, as native floating point numbers, and as decimal strings.
Decimal strings may have an exponential notation part, as in
"12.34e-56" . Native here means "a format supported by the C compiler
which was used to build perl".
This means that printing the number out depends on how the number is stored internal to perl, which means, in turn, that you have to know how the number is represented on input.
By and large, Perl will just do the right thing, but you should know what compiler was used, how it represents numbers internally, and how to print those numbers. For example:
$ perldoc -f int
int EXPR
int Returns the integer portion of EXPR. If EXPR is omitted, uses $_. You should
not use this function for rounding: one because it truncates towards 0, and two
because machine representations of floating-point numbers can sometimes produce
counterintuitive results. For example, "int(-6.725/0.025)" produces -268 rather than
the correct -269; that's because it's really more like -268.99999999999994315658
instead. Usually, the "sprintf", "printf", or the "POSIX::floor" and
"POSIX::ceil" functions will serve you better than will int().
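The truncation surprise quoted above comes from binary double-precision arithmetic rather than from Perl itself, so it can be reproduced elsewhere. A short sketch in Python (used only for illustration) contrasts binary floats with decimal arithmetic, which is roughly the role Math::BigFloat plays in Perl:

```python
from decimal import Decimal

# Binary doubles: the quotient lands just above -269, so truncation gives -268.
binary_quotient = -6.725 / 0.025
print(int(binary_quotient))   # -268

# Decimal arithmetic on the original text keeps the values exact.
decimal_quotient = Decimal("-6.725") / Decimal("0.025")
print(int(decimal_quotient))  # -269

# Values read as strings survive arithmetic without binary rounding.
print(Decimal("8.3") + Decimal("0.001"))  # 8.301
```

The point is that keeping the numbers as text until an exact decimal type takes over avoids the rounding, at the cost of slower arithmetic.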
I think that if you want to read a number in explicitly as a string, your best bet would be to use unpack() with the 'A*' format.