Powershell binary grep - powershell

Is there a way to determine whether a specified file contains a specified byte array (at any position) in powershell?
Something like:
fgrep --binary-files=binary "$data" "$filepath"
Of course, I can write a naive implementation:
function posOfArrayWithinArray {
param ([byte[]] $arrayA, [byte[]]$arrayB)
if ($arrayB.Length -ge $arrayA.Length) {
foreach ($pos in 0..($arrayB.Length - $arrayA.Length)) {
if ([System.Linq.Enumerable]::SequenceEqual(
$arrayA,
[System.Linq.Enumerable]::Skip($arrayB, $pos).Take($arrayA.Length)
)) {return $pos}
}
}
-1
}
function posOfArrayWithinFile {
param ([byte[]] $array, [string]$filepath)
posOfArrayWithinArray $array (Get-Content $filepath -Raw -AsByteStream)
}
# They return the position or -1, but a simple $false/$true would also be enough for me.
— but it's extremely slow.

Sorry for the additional answer. It is not usual to do so, but the universal question intrigues me, and the approach and information in my initial "using -Like" answer are completely different. By the way, if you are waiting for a positive response to "I believe that it must exist in .NET" before accepting an answer, that is probably not going to happen: the same quest shows up in Stack Overflow searches in combination with C#, .NET or LINQ.
Anyway, given that nobody has been able to find the single assumed .NET command for this so far, it is quite understandable that several semi-.NET solutions are being proposed instead, but I believe that these will cause some undesired overhead for a universal function.
Assume that your ByteArray (the byte array being searched) and SearchArray (the byte array to search for) are completely random. There is only a 1/256 chance that a given byte in the ByteArray matches the first byte of the SearchArray. If it does not match, you don't have to look further, and if it does match, the chance that the second byte also matches is 1/256², etc. This means that the inner loop will only run about 1.004 times as often as the outer loop. In other words, the performance of everything outside the inner loop (but in the outer loop) is almost as important as what is in the inner loop!
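The "about 1.004" figure follows from summing the chances that the inner loop reaches each successive byte. A minimal sketch, assuming uniformly random and independent bytes (the variable names are illustrative only):

```powershell
# Chance that one random byte equals another random byte.
$p = 1 / 256

# Expected inner-loop iterations per outer-loop position:
# 1 + 1/256 + 1/256^2 + ... (the first ten terms are more than enough here).
$expectedIterations = 0.0
foreach ($k in 0..9) {
    $expectedIterations += [math]::Pow($p, $k)
}

# Closed form of the same geometric series: 1 / (1 - p) = 256 / 255.
$closedForm = 1 / (1 - $p)

$expectedIterations
$closedForm
```

Both values come out just under 1.004, which is why the work done once per outer-loop position dominates the run time for random data.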
Note that this also implies that the chance that a 500 KB random sequence exists in a 100 MB random sequence is virtually zero. (So, how random are your binary sequences actually? If they are far from random, I think you need to add some more details to your question.) A worst-case scenario for my assumption would be a ByteArray consisting of the same bytes (e.g. 0, 0, 0, ..., 0, 0, 0) and a SearchArray of the same bytes ending with a different byte (e.g. 0, 0, 0, ..., 0, 0, 1).
Based on this, it shows again (I have also shown this in some other answers) that native PowerShell commands aren't that bad and could possibly even outperform .NET/LINQ commands in some cases. In my testing, the Find-Bytes function below ranges from about 20% faster to twice as fast as the function in your question:
Find-Bytes
Returns the index of where the -Search byte sequence is found in the -Bytes byte sequence. If the search sequence is not found a $Null ([System.Management.Automation.Internal.AutomationNull]::Value) is returned.
Parameters
-Bytes
The byte array to be searched
-Search
The byte array to search for
-Start
Defines where to start searching in the Bytes sequence (default: 0)
-All
By default, only the first index found will be returned. Use the -All switch to return the remaining indexes of any other search sequences found.
Function Find-Bytes([byte[]]$Bytes, [byte[]]$Search, [int]$Start, [Switch]$All) {
For ($Index = $Start; $Index -le $Bytes.Length - $Search.Length ; $Index++) {
For ($i = 0; $i -lt $Search.Length -and $Bytes[$Index + $i] -eq $Search[$i]; $i++) {}
If ($i -ge $Search.Length) {
$Index
If (!$All) { Return }
}
}
}
Usage example:
$a = [byte[]]("the quick brown fox jumps over the lazy dog".ToCharArray())
$b = [byte[]]("the".ToCharArray())
Find-Bytes -all $a $b
0
31
Benchmark
Note that you should open a new PowerShell session to benchmark this properly, as LINQ uses a large cache that probably doesn't apply to your use case.
$a = [byte[]](&{ foreach ($i in (0..500Kb)) { Get-Random -Maximum 256 } })
$b = [byte[]](&{ foreach ($i in (0..500)) { Get-Random -Maximum 256 } })
Measure-Command {
$y = Find-Bytes $a $b
}
Measure-Command {
$x = posOfArrayWithinArray $b $a
}

The below code may prove to be faster, but you will have to test that out on your binary files:
function Get-BinaryText {
# converts the bytes of a file to a string that has a
# 1-to-1 mapping back to the file's original bytes.
# Useful for performing binary regular expressions.
Param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
[ValidateScript( { Test-Path $_ -PathType Leaf } )]
[Alias('FullName','FilePath')]
[string]$Path
)
$Stream = New-Object System.IO.FileStream -ArgumentList $Path, 'Open', 'Read'
# Note: Codepage 28591 returns a 1-to-1 char to byte mapping
$Encoding = [Text.Encoding]::GetEncoding(28591)
$StreamReader = New-Object System.IO.StreamReader -ArgumentList $Stream, $Encoding
$BinaryText = $StreamReader.ReadToEnd()
$Stream.Dispose()
$StreamReader.Dispose()
return $BinaryText
}
# enter the byte array to search for here
# for demo, I'll use 'SearchMe' in bytes
[byte[]]$searchArray = 83,101,97,114,99,104,77,101
# create a regex from the $searchArray bytes
# 'SearchMe' --> '\x53\x65\x61\x72\x63\x68\x4D\x65'
$searchString = ($searchArray | ForEach-Object { '\x{0:X2}' -f $_ }) -join ''
$regex = [regex]$searchString
# read the file as binary string
$binString = Get-BinaryText -Path 'D:\test.bin'
# use regex to return the 0-based starting position of the search string
# return -1 if not found
$found = $regex.Match($binString)
if ($found.Success) { $found.Index } else { -1}

Just formalizing my comments and agreeing with your comment:
I dislike the idea of converting byte sequences to character sequences
at all (I'd better have functionality to match byte (or other)
sequences as they are), among the
conversion-to-character-strings-implying solutions this seems to be
one of the quickest
Performance
String manipulations are usually expensive, but re-initializing a LINQ call is apparently pretty expensive as well. I guess you may presume that the native algorithms for PowerShell string representations and operators like -Like have by now been thoroughly optimized.
Memory
Aside from some fundamental performance disadvantages, there is a memory disadvantage as well in converting each byte to a decimal string representation. In the proposed solution, each byte will take an average of 2.57 characters (depending on the number of decimal digits of each byte: (1 * 10 / 256) + (2 * 90 / 256) + (3 * 156 / 256)). Besides, you need an extra character for separating the numeric representations. In total, this will increase the length of the sequence about 3.57 times.
You might consider saving bytes by e.g. converting it to hexadecimal and/or combine the separator, but that will likely result in an expensive conversion again.
Easy
Anyways, the easy way is probably still the most effective.
This comes down to the following simplified syntax:
" $Sequence " -Like "* $SubSequence *" # $True if $Sequence contains $SubSequence
(Where $Sequence and $SubSequence are binary arrays of type: [Byte[]])
Note 1: the spaces around the variables are important. They prevent a false positive in case a 1 (or 2) digit byte representation overlaps with a 2 (or 3) digit byte representation. E.g.: 123 59 74 contains 23 59 7 in the string representation but not in the actual bytes.
Note 2: This syntax will only tell you whether $Sequence contains $SubSequence ($True or $False). There is no clue where $SubSequence actually resides in $Sequence. If you need to know this, or e.g. want to replace $SubSequence with something else, refer to this answer: Methods to hex edit binary files via PowerShell.
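The false positive from Note 1 can be reproduced with the very bytes mentioned there. This is only a sketch and assumes $OFS still has its default value (a single space), so that a byte array expands to its decimal values separated by spaces:

```powershell
# The bytes from Note 1: 23 59 7 is NOT a sub-sequence of 123 59 74.
[byte[]]$Sequence    = 123, 59, 74
[byte[]]$SubSequence = 23, 59, 7

# Without surrounding spaces, "123 59 74" contains the text "23 59 7".
$withoutSpaces = "$Sequence" -like "*$SubSequence*"

# With surrounding spaces, every number has to match as a whole.
$withSpaces = " $Sequence " -like "* $SubSequence *"

$withoutSpaces   # True  (false positive)
$withSpaces      # False (correct)
```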

I've determined that the following can work as a workaround:
(Get-Content $filepath -Raw -Encoding 28591).IndexOf($fragment)
— i.e. any bytes can be successfully matched by PowerShell strings (in fact, .NET System.Strings) when we specify binary-safe encoding. Of course, we need to use the same encoding for both the file and fragment, and the encoding must be really binary-safe (e.g. 1250, 1000 and 28591 fit, but various species of Unicode (including the default BOM-less UTF-8) don't, because they convert any non-well-formed code-unit to the same replacement character (U+FFFD)). Thanks to Theo for clarification.
On older PowerShell, you can use:
[System.Text.Encoding]::GetEncoding(28591).
GetString([System.IO.File]::ReadAllBytes($filepath)).
IndexOf($fragment)
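Here is a small in-memory sketch of the same idea, without a file. The byte values are made up for the demonstration, and passing [System.StringComparison]::Ordinal is my own addition: it asks for a plain code-unit comparison, so the result should not depend on culture-specific string matching rules.

```powershell
# ISO-8859-1 (code page 28591) maps every byte 0..255 to exactly one character.
$latin1 = [System.Text.Encoding]::GetEncoding(28591)

# Hypothetical data: a buffer with non-ASCII bytes around the text 'SearchMe'.
[byte[]]$haystackBytes = 0, 255, 83, 101, 97, 114, 99, 104, 77, 101, 128
[byte[]]$needleBytes   = 83, 101, 97, 114, 99, 104, 77, 101

# Decode both sides with the SAME binary-safe encoding.
$haystack = $latin1.GetString($haystackBytes)
$fragment = $latin1.GetString($needleBytes)

# Ordinal comparison: compare code units only, no linguistic rules.
$position = $haystack.IndexOf($fragment, [System.StringComparison]::Ordinal)
$position   # 2
```

Because the mapping is one character per byte, the returned character index is also the byte offset.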
Sadly, I haven't found a way to match sequences universally (i.e. a common method to match sequences with any item type: integer, object, etc.). I believe that it must exist in .NET (especially since that particular implementation for sequences of characters exists). Hopefully, someone will suggest it.

Related

How to calculate sum of floating point values on Windows command prompt or Powershell

I have a list of floating point values that I want to add together using just CMD or PowerShell.
How to do it?
Example:
93947922,7
77441,0
77429114,8
53747239,4
445002,6
2066,7
201257230,1
...
As far as I know, batch files do not support floating point arithmetic. A workaround based on integer math is possible, but for most use cases it would be way too much work.
In PowerShell, one needs to use a period . as the decimal separator instead of a comma , because the comma is reserved as the array element separator. In practice, this means that
2066,7
is, for PowerShell, a list of two elements, namely 2066 and 7. So the comma needs to be replaced in order to make PowerShell understand that the value is a floating point number.
For easy processing, first add all the elements into an array. Then loop through the array and convert each value into a double (the default floating point type). Finally, sum the elements and display the results.
$arr = @()
$arr += "93947922,7"
$arr += "77441,0"
$arr += "77429114,8"
$arr += "53747239,4"
$arr += "445002,6"
$arr += "2066,7"
$arr += "201257230,1"
# Convert string values into double
# Using brute force search and replace and cast
# for($i = 0; $i -lt $arr.count; ++$i) { $arr[$i] = [double]$arr[$i].Replace(',', '.') }
# If you know a culture that uses commas (European ones often do)
# Double.Parse() can use its number format too.
for($i = 0; $i -lt $arr.count; ++$i) {
$arr[$i] = [double]::Parse($arr[$i], [CultureInfo]::GetCultureInfo("fi-FI").NumberFormat)
}
# Sum the elements
$sum = 0
$arr | % { $sum += $_ }
$sum
# Output
426906017,3
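A more compact variant of the same steps, as a sketch only: it assumes the values are already available as strings, swaps the comma for a period, parses with the invariant culture (so it does not depend on which cultures are installed), and lets Measure-Object do the summing. The variable names are illustrative.

```powershell
# The sample values from the question, as strings with a decimal comma.
$values = '93947922,7', '77441,0', '77429114,8', '53747239,4',
          '445002,6', '2066,7', '201257230,1'

$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# Replace the decimal comma, parse each value as a double, then sum.
$sum = ($values |
    ForEach-Object { [double]::Parse($_.Replace(',', '.'), $invariant) } |
    Measure-Object -Sum).Sum

$sum   # about 426906017.3, shown with the current culture's decimal separator
```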
Your question lacks a ton of details, so I made several assumptions.
I assumed that your "list of values" are placed in a text file, so they appear one per line (as you posted them).
I assumed that each number has precisely one decimal digit, as you posted them.
I assumed that the decimal separator is a comma, as you posted them. To change to a decimal point, just change the value in delims=, part.
This is the .BATch code:
@echo off
setlocal
set /A "int=0, frac=0"
for /F "tokens=1,2 delims=," %%a in (test.txt) do set /A int+=%%a, frac+=%%b
set /A "int+=frac/10, frac%%=10"
echo %int%,%frac%
test.txt:
93947922,7
77441,0
77429114,8
53747239,4
445002,6
2066,7
201257230,1
Output:
426906017,3

Explain Bizarre Function Call Processing/Results

Can someone please explain how/why this code:
function DoIt($one, $two)
{
"`$one is $one"
"`$two is $two"
return $one * $two
}
$sum = DoIt 5 4
$sum
works exactly as expected/intended - i.e., it produces the following output:
$one is 5
$two is 4
20
However, this (seemingly almost identical) code:
function DoIt($one, $two)
{
"`$one is $one"
"`$two is $two"
return $one * $two
}
$sum = DoIt 5 4
"`$sum is $sum"
bends my brain and breaks my understanding of reality by producing the following output:
$sum is $one is 5 $two is 4 20
The reason for the odd behavior is that you are polluting the output stream. PowerShell doesn't explicitly have a pure "return" value like a compiled program, and relies on streams to pass data between functions. See about_Return for more information, as well as an example that illustrates this very behavior.
Basically the $sum variable gets all the output of the function, and as @TheMadTechnician says, it outputs differently depending on how it is being used.
The "correct" way in PowerShell to return values is in general not to use return but to use Write-Output to explicitly define which stream you want to output to. The second thing is that you have to use Write-Host when writing messages to the Host console, otherwise it gets returned on the Output stream as well.
function DoIt($one, $two)
{
Write-Host "`$one is $one"
Write-Host "`$two is $two"
Write-Output ($one * $two)
}
$sum = DoIt 5 4
"`$sum is $sum"
$one is 5
$two is 4
$sum is 20
There's good information in the existing answers, but let me try to bring it all together:
function DoIt($one, $two)
{
"`$one is $one"
"`$two is $two"
return $one * $two
}
produces three outputs ("return values") to PowerShell's success [output] stream (see about_Redirection), which are the evaluation results of:
expandable string "`$one is $one"
expandable string "`$two is $two"
expression $one * $two
The expandable strings are implicitly output - due to producing a result that is neither captured, sent to another command, nor redirected .
Similarly, the return in return $one * $two is just syntactic sugar for:
$one * $two # implicit output
return # control flow: exit the scope
Note:
return is unnecessary here and never required.
While Write-Output could be used in lieu of implicit output:
it is only ever helpful if you want to output a collection as a single object, via its -NoEnumerate switch.
it is otherwise not only needlessly verbose, it slows things down.
If you want to print status information from a function without polluting the output stream, you have several choices:
Write-Host prints to the host's display (the console, if you're running in a console window); in PSv4-, such output could neither be captured nor suppressed; in PSv5+, Write-Host now writes to the information stream (number 6), which means that 6> can be used to redirect/suppress the output.
Write-Verbose is for opt-in verbose output, which you can activate by setting preference variable $VerbosePreference to 'Continue' or by using the -Verbose switch.
Write-Debug is for opt-in debugging output via $DebugPreference or -Debug, though note that in the latter case a prompt is displayed whenever a Write-Debug statement is encountered.
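As a minimal sketch of the Write-Verbose option, the function from the question could look like this (same DoIt name, with the product emitted as implicit output). The status messages stay off the success stream, so only the number is captured:

```powershell
function DoIt($one, $two)
{
    # Status messages go to the verbose stream, not the success stream.
    Write-Verbose "`$one is $one"
    Write-Verbose "`$two is $two"

    # Implicit output: the only value sent to the success stream.
    $one * $two
}

$sum = DoIt 5 4
"`$sum is $sum"   # $sum is 20
```

To see the messages, set $VerbosePreference = 'Continue' before the call.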
Because these outputs are captured in a variable - $sum = DoIt 5 4 - and there is more than 1 output, they are implicitly collected in an array (of type [object[]]).
Using implicit output to print this variable's value to the display - $sum - enumerates the array elements, which prints them each on their own line:
$one is 5
$two is 4
20
By contrast, using implicit output with expandable string "`$sum is $sum" results in a single output string in which the reference to variable $sum, which contains an array, is expanded as follows:
each element of the array is stringified
loosely speaking, this is like calling .ToString() on each element, but it's important to note that PowerShell opts for a culture-invariant string representation, if available - see this answer.
the results are joined with the value of (the rarely used automatic variable) $OFS as the separator to form a single string; $OFS defaults to a single space.
Thus, if you join array elements $one is 5, $two is 4, and 20 with a single space between them, you get:
$one is 5 $two is 4 20
To put it differently: the above is the equivalent of executing '$one is 5', '$two is 4', '20' -join ' '
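The role of $OFS can be made visible with a short experiment. This is only a sketch: it rebuilds the three-element array by hand and assumes that $OFS has not been changed in the session.

```powershell
# An array with the same three elements that the function emitted.
$sum = '$one is 5', '$two is 4', 20

# Default behaviour: elements are joined with a single space.
$default = "`$sum is $sum"

# Setting $OFS inside a child scope changes the separator for that scope only.
$custom = & { $OFS = ', '; "`$sum is $sum" }

$default   # $sum is $one is 5 $two is 4 20
$custom    # $sum is $one is 5, $two is 4, 20
```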
This is due to how you are outputting an array, and has nothing to do with the function. You can replicate the same behavior with the following:
$sum='$one is 5','$two is 4',20
"`$sum is $sum"
Putting the variable in double quotes performs string expansion, which performs the .ToString() method on each item in the array, and then joins them with a space to convert the array into a string.
After playing around more (using the info I learned from @HAL9256), I discovered the problem isn't actually with using return vs Write-Output but rather with just how command-less strings are handled in functions. For example, this too works perfectly:
function DoIt($one, $two)
{
Write-Host "`$one is $one"
Write-Host "`$two is $two"
return ($one * $two)
}
$sum = DoIt 5 4
Write-Host "`$sum is $sum"
that is, it produces the following expected output:
$one is 5
$two is 4
$sum is 20
(And, apparently, the mathematical computation without explicit parentheses works just fine within a return command - unlike within a Write-Output command.)
In other words, I guess a static string in a function is treated as if you're building an array to be returned (as @TheMadTechnician was referring to) as opposed to shorthand for a Write-Host command. Live and learn - "explicit is better than implicit". :-)
Granted, I still don't understand why the final output wasn't exactly the same (right or wrong) between the two code blocks in my original question ... but, hey, my brain is tired enough for one day. :-P
thanks again guys!!!

Extract the nth to nth characters of an string object

I have a filename and I wish to extract two portions of this and add into variables so I can compare if they are the same.
$name = "FILE_20161012_054146_Import_5785_1234.xml"
So I want...
$a = 5785
$b = 1234
if ($a -eq $b) {
# do stuff
}
I have tried to extract the 36th up to the 39th character
Select-Object {$_.Name[35,36,37,38]}
but I get
{5, 7, 8, 5}
Have considered splitting but looks messy.
There are several ways to do this. One of the most straightforward, as PetSerAl suggested is with .Substring():
$_.name.Substring(35,4)
Another way is with square braces, as you tried to do, but it gives you an array of [char] objects, not a string. You can use -join and you can use a range to make that easier:
$_.name[35..38] -join ''
For what you're doing, matching a pattern, you could also use a regular expression with capturing groups:
if ($_.name -match '_(\d{4})_(\d{4})\.xml$') {
if ($Matches[1] -eq $Matches[2]) {
# ...
}
}
This way can be very powerful, but you need to learn more about regex if you're not familiar. In this case it's looking for an underscore _ followed by 4 digits (0-9), followed by an underscore, and four more digits, followed by .xml at the end of the string. The digits are wrapped in parentheses so they are captured separately to be referenced later (in $Matches).
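Applied to the file name from the question, the pattern behaves as follows. This is a sketch with the name stored in a plain variable ($name, $first, $second and $same are illustrative) rather than coming from $_:

```powershell
$name = 'FILE_20161012_054146_Import_5785_1234.xml'

if ($name -match '_(\d{4})_(\d{4})\.xml$') {
    # $Matches[0] is the whole match; [1] and [2] are the capture groups.
    $first  = $Matches[1]
    $second = $Matches[2]
    $same   = $first -eq $second
}

$first    # 5785
$second   # 1234
$same     # False
```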
Yet another approach: returns 1234 substring four times.
$FileName = "FILE_20161012_054146_Import_5785_1234.xml"
# $FileName
$FileName.Substring(33,4) # Substring method (zero-based)
-join $FileName[33..36] # indexing from beginning (zero-based)
-join $FileName[-8..-5] # reverse indexing:
# e.g. $FileName[-1] returns the last character
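$FileArr = $FileName.Split([char[]]"_.") # Split on _ or . (depends only on filename "pattern template"); the [char[]] cast keeps newer PowerShell versions from treating "_." as one literal separator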
$FileArr[$FileArr.Count -2] # does not depend on lengths of tokens

How to convert string to int, if possible in powershell?

I am getting a string from VSO (using TFPT.exe) that can be either the item number or the item number plus a letter
"830" or "830a"
How can I break off the letter if it exists - and convert the number to int
$a = 830
#or
$a = 830
$b = "a"
I tried to test if "830" was a number - but i guess because it pulls it in as a string, i don't know how to ask: could this string be a int?
Assuming only the one set of numbers you can -match that pretty easily with regex. Where \d+ will match a group of consecutive digits.
PS C:\temp> "830a" -match "\d+"
True
PS C:\temp> $matches[0]
830
Knowing that you could incorporate something like this in your code.
$b = If($a -match "\d+"){[int]$matches[0]}
Obviously it would be more appropriate to use better variable names, but this is just a proof of concept. As written, this would cause an issue if the alpha characters were in the middle of the string. As long as the digits are grouped together it will work either way.
The other way you could do this would be to replace all of the character that are not digits.
$a = "830adasdf"
$a = $a -replace "\D" -as [int]
\D meaning any non digit character. -as [int] will perform the cast.
In either case [int] will cast the remaining digit string as an integer.
If you could guarantee that it is just the one character on the end that could be there, then you could use the string method .TrimEnd() as well. It removes all characters found on the end of a string as determined by a char array. Let's give it an array of all uppercase letters (character codes 65 to 90). In practice this was having an issue with case, so we take the string, convert it to uppercase and then remove any trailing letters.
"830z".ToUpper().TrimEnd([char[]](65..90)) -as [int]
It actually seems to convert the number array to char automatically, so this would do just the same
"830z".ToUpper().TrimEnd(65..90) -as [int]
This is the best I have been able to come up with. It seems to work, but it doesn't seem the most efficient way...
$t = $parent.Substring($parent.Length-1)
if($t -in @("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"))
{
[int]$parentSRP = $parent.Substring(0,$parent.Length-1)
$parentVer = $parent.Substring($parent.Length-1,1)
}
else{[int]$parentSRP = $parent}
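The ideas from both answers can be combined into one regular expression with two capture groups, one for the digits and one for an optional trailing letter. This is only a sketch; the function name Split-ItemNumber and its property names are made up for the example:

```powershell
function Split-ItemNumber([string]$Text) {
    # Group 1: one or more digits. Group 2: at most one trailing letter.
    if ($Text -match '^(\d+)([A-Za-z]?)$') {
        [pscustomobject]@{
            Number = [int]$Matches[1]
            Suffix = $Matches[2]      # empty string when there is no letter
        }
    }
    # Anything else (e.g. 'abc') produces no output, i.e. $null when captured.
}

$withLetter    = Split-ItemNumber '830a'
$withoutLetter = Split-ItemNumber '830'

$withLetter.Number      # 830
$withLetter.Suffix      # a
$withoutLetter.Number   # 830
```

Testing the result against $null answers the "could this string be an int?" part of the question.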

Trouble understanding obfuscated Perl method

I'm trying my best to decipher some Perl code and convert it into C# code so I can use it with a larger program. I've been able to get most of it converted, but am having trouble with the following method:
sub dynk {
my ($t, $s, $v, $r) = (unpack("b*", $_[0]), unpack("b*", pack("v",$_[1])));
$v^=$t=substr($t,$r=$_*$_[($_[1]>>$_-1&1)+2]).substr($t,0,$r)^$s for (1..16);
pack("b*", $v);
}
It is called like:
$sid = 0;
$rand = pack("H*", 'feedfacedeadbeef1111222233334444');
$skey = dynk($rand, $sid, 2, 3) ^ dynk(substr($dbuf, 0, 16), $sid, -1, -4);
I understand most of it except for this section:
$_*$_[($_[1]>>$_-1&1)+2]
I'm not sure how $_ is being used in that context? If someone could explain that, I think I can get the rest.
pack and unpack take a pattern, and some data, and transform this data according to the pattern. For example, pack "H*", "466F6F" treats the data as a hex string of arbitrary length, and decodes it to the bytes it represents. Here: Foo. The unpack function does the reverse, and extracts data from a binary representation to a certain format.
The "b*" pattern produces a bit string – unpack "b*", "42" is "0010110001001100".
The v represents one little-endian 16-bit integer.
The Perl is rather obfuscated. Here is a rewrite that simplifies some aspects.
sub dynk {
# Extract arguments: A salt, another parameter, and then two ints that determine rotation.
my ($initial, $sid, $rot_a, $rot_b) = @_;
# Unpack the initial value to a bitstring
my $temp = unpack("b*", $initial);
# Unpack the 16-bit number $sid to a bitstring
my $sid_bits = unpack("b*", pack("v", $sid));
my $v; # an accumulator
# Loop through the 16 bits of our $sid
for my $bit_number (1..16) {
# Pick the $bit_number-th bit from the $sid as an index for the data
my $bit_value = substr($sid_bits, $bit_number-1, 1);
# calculate rotation from one data argument
my $rotation = $bit_number * ( $bit_value ? $rot_b : $rot_a );
# Rotate the $temp bitstring by $rotation bits
$temp = substr($temp, $rotation) . substr($temp, 0, $rotation);
# XOR the $temp with $sid_bits
$temp = $temp ^ $sid_bits;
# ... and XOR with the $v accumulator
$v = $v ^ $temp;
}
# Pack the bitstring back to binary data, return.
return pack("b*", $v);
}
This seems to be some sort of encryption or hashing. It mainly jumbles the first argument according to the following ones. The larger $sid is, the more extra parameters are used: at least one, at most 16. Each bit is used in turn as an index, thus only two extra parameters are used. The length of the first argument stays constant in this operation, but the output is at least two bytes long.
If one of the extra arguments is zero, no rotation takes place during that loop iteration. Uninitialized arguments are considered to be zero.
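For the expression the question asks about, $_*$_[($_[1]>>$_-1&1)+2]: the bare $_ is the loop counter from for (1..16), while $_[...] indexes the argument array @_. The sketch below transcribes just that expression into PowerShell, to stay with the language used elsewhere on this page. $callArgs is a stand-in for @_, and the sid value 5 and the shortened 1..4 loop are chosen only for illustration.

```powershell
# Stand-in for Perl's @_ : (data, sid, rot_a, rot_b). Values are illustrative.
$callArgs = 'data', 5, 2, 3

$rotations = foreach ($n in 1..4) {          # Perl: for (1..16), $n plays the role of $_
    # ($_[1] >> ($_ - 1)) & 1  : bit number $n of sid, counted from the lowest bit
    $bit = ($callArgs[1] -shr ($n - 1)) -band 1

    # bit 0 selects $_[2] (rot_a), bit 1 selects $_[3] (rot_b)
    $index = $bit + 2

    # $_ * $_[index] : the rotation amount for this iteration
    $n * $callArgs[$index]
}

$rotations -join ', '   # 3, 4, 9, 8
```

With sid = 5 (binary 101), the bits seen for n = 1..4 are 1, 0, 1, 0, so the multipliers are 3, 2, 3, 2.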