Comparing string outputs between Azure Properties.ContentMD5 and Get-FileHash - PowerShell

How do I compare the output of Get-FileHash directly with the output of Properties.ContentMD5?
I'm putting together a PowerShell script that takes some local files from my system and copies them to an Azure Blob Storage Container.
The files change daily so I have added in a check to see if the file already exists in the container before uploading it.
I use Get-FileHash to read the local file:
$LocalFileHash = (Get-FileHash "D:\file.zip" -Algorithm MD5).Hash
Which results in $LocalFileHash holding this: 67BF2B6A3E6657054B4B86E137A12382
I use this code to get the checksum of the blob file already transferred to the container:
$BlobFile = "Path\To\file.zip"
$AZContext = New-AZStorageContext -StorageAccountName $StorageAccountName -SASToken "<token here>"
$RemoteBlobFile = Get-AzStorageBlob -Container $ContainerName -Context $AZContext -Blob $BlobFile -ErrorAction Ignore
if ($RemoteBlobFile) {
    $cloudblob = [Microsoft.Azure.Storage.Blob.CloudBlockBlob]$RemoteBlobFile.ICloudBlob
    $RemoteBlobHash = $cloudblob.Properties.ContentMD5
}
The value of $RemoteBlobHash is then set to Z78raj5mVwVLS4bhN6Ejgg==
No problem, I thought, I'll just decode the Base64 string and compare:
$output = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($RemoteBlobHash))
Which gives me g�+j>fWKK��7�#� so not directly comparable ☹
This question shows someone in a similar pickle but I don't think they were using Get-FileHash given the format of their local MD5 result.
Other things I've tried:
changing the encoding in the System.Text.Encoding line above from UTF8 to Unicode (UTF-16) and ASCII, which changes the output, but not to anything recognisable.
dabbling with GetBytes to see if that helped:
$output = [System.Text.Encoding]::UTF8.GetBytes([System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($RemoteBlobHash)))
Note: Using md5sum to compare the local file and a downloaded copy of file.zip results in the same MD5 string as Get-FileHash: 67BF2B6A3E6657054B4B86E137A12382
Thank you in advance!

ContentMD5 is a base64 representation of the binary hash value, not the resulting hex string :)
$md5sum = [convert]::FromBase64String('Z78raj5mVwVLS4bhN6Ejgg==')
$hdhash = [BitConverter]::ToString($md5sum).Replace('-','')
Here we convert base64 -> binary -> hexadecimal
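To wire that into the upload check from the question (a sketch reusing $LocalFileHash and $cloudblob from above; PowerShell's -eq on strings is case-insensitive, so the hex casing doesn't matter):
$RemoteBlobHash = $cloudblob.Properties.ContentMD5
$remoteHex = [BitConverter]::ToString([convert]::FromBase64String($RemoteBlobHash)).Replace('-','')
if ($LocalFileHash -eq $remoteHex) {
    Write-Host "File unchanged - skipping upload"
}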
If you need to do it the other way around (ie. for obtaining a local file hash, then using that to search for blobs in Azure), you'll first need to split the hexadecimal string into byte-sized chunks of two hex digits each, then convert the resulting byte array to base64:
$hdhash = '67BF2B6A3E6657054B4B86E137A12382'
$bytes = [byte[]]::new($hdhash.Length / 2)
for($i = 0; $i -lt $bytes.Length; $i++){
    $offset = $i * 2
    $bytes[$i] = [convert]::ToByte($hdhash.Substring($offset,2), 16)
}
$md5sum = [convert]::ToBase64String($bytes)
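Note that on PowerShell 7.1+ (.NET 5+), [convert]::FromHexString and [convert]::ToHexString handle the hex side in a single call each, so the loop above isn't needed there:
# PowerShell 7.1+ / .NET 5+ only
$md5sum = [convert]::ToBase64String([convert]::FromHexString('67BF2B6A3E6657054B4B86E137A12382'))
# -> Z78raj5mVwVLS4bhN6Ejgg==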
 

Related

Drastic Powershell vs GNU coreutils base64 output length difference

I'm trying to figure out why there's a huge difference in the output sizes when encoding a file in base64 in Powershell vs GNU coreutils. Depending on options (UTF8 vs Unicode), the Powershell output ranges from about 240MB to 318MB. Using coreutils base64 (in Cygwin, in this case), the output is about 80MB. The original filesize is about 58MB. So, 2 questions:
Why is there such a drastic difference?
How can I get Powershell to give the smaller output that the GNU tool gives?
Here are the specific commands I used:
Powershell smaller output:
$input = "C:\Users\my.user\myfile.pdf"
$filecontent = get-content $input
$converted = [System.Text.Encoding]::UTF8.GetBytes($filecontent)
$encodedtext = [System.Convert]::ToBase64String($converted)
$encodedtext | Out-File "C:\Users\my.user\myfile.pdf.via_ps.base64"
The larger Powershell output came from simply replacing "UTF8" with "Unicode". It will be obvious that I'm pretty new to Powershell; I'm sure someone only slightly better with it could combine that into a couple of simple lines.
Coreutils (via Cygwin) base64:
base64.exe -w0 myfile.pdf > myfile.pdf.via_cygwin.base64
Why is there such a drastic difference?
Because you're doing something wildly different in PowerShell
How can I get Powershell to give the smaller output that the GNU tool gives?
By doing what base64 does :)
Let's have a look at what base64 ... > ... actually does:
base64:
Opens file handle to input file
Reads raw byte stream from disk
Converts every 3-byte chunk to a 4-byte base64-encoded output string-fragment
>:
Writes raw byte stream to disk
Since the 4-byte output fragments only contain byte values that correspond to 64 printable ASCII characters, the command never actually does any "string manipulation" - the values on which it operates just happen to also be printable as ASCII strings, and the resulting file is therefore indistinguishable from a "text file". This 3-to-4 expansion also explains your coreutils number: 58MB × 4/3 ≈ 77MB, right around the 80MB you observed.
Your PowerShell script on the other hand does lots of string manipulation:
Get-Content $input:
Opens file handle to input file
Reads raw byte stream from disk
Decodes the byte stream according to some chosen encoding scheme (likely your OEM codepage)
[Encoding]::UTF8.GetBytes():
Re-encodes the resulting string using UTF8
[Convert]::ToBase64String()
Converts every 3-byte chunk to a 4-byte base64-encoded output string-fragment
Out-File:
Encodes input string as little-endian UTF16
Writes to disk
The three additional string encoding steps highlighted above inflate the byte stream at every stage - Out-File alone doubles the size, since UTF-16 stores two bytes per character - which is why your output is three to four times the size of the coreutils result.
How to base64-encode files then?
The trick here is to read the raw bytes from disk and pass those directly to [convert]::ToBase64String()
It is technically possible to just read the entire file into an array at once:
$bytes = Get-Content path\to\file.ext -Encoding Byte # Windows PowerShell only
# or
$bytes = [System.IO.File]::ReadAllBytes($(Convert-Path path\to\file.ext))
$b64String = [convert]::ToBase64String($bytes)
Set-Content path\to\output.base64 -Value $b64String -Encoding Ascii
... I'd strongly recommend against doing so for files larger than a few kilobytes.
Instead, for file transformation in general you'll want to use streams. In this particular case, you'll want to use a CryptoStream with a ToBase64Transform to re-encode a file stream as base64:
function New-Base64File {
    [CmdletBinding(DefaultParameterSetName = 'ByPath')]
    param(
        [Parameter(Mandatory = $true, ParameterSetName = 'ByPath', Position = 0)]
        [string]$Path,
        [Parameter(Mandatory = $true, ParameterSetName = 'ByPSPath')]
        [Alias('PSPath')]
        [string]$LiteralPath,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$Destination
    )
    # Create destination file if it doesn't exist
    if (-not(Test-Path -LiteralPath $Destination -PathType Leaf)) {
        $outFile = New-Item -Path $Destination -ItemType File
    }
    else {
        $outFile = Get-Item -LiteralPath $Destination
    }
    [void]$PSBoundParameters.Remove('Destination')
    try {
        # Open a writable file stream to the output file
        $outStream = $outFile.OpenWrite()
        # Wrap output file stream in a CryptoStream.
        #
        # Anything that we write to the crypto stream is automatically
        # base64-encoded and then written through to the output file stream
        $transform = [System.Security.Cryptography.ToBase64Transform]::new()
        $cryptoStream = [System.Security.Cryptography.CryptoStream]::new($outStream, $transform, 'Write')
        foreach ($file in Get-Item @PSBoundParameters) {
            try {
                # Open readable input file stream
                $inStream = $file.OpenRead()
                # Copy input bytes to crypto stream
                # - which in turn base64-encodes and writes to output file
                $inStream.CopyTo($cryptoStream)
            }
            finally {
                # Clean up the input file stream
                $inStream | ForEach-Object Dispose
            }
        }
    }
    finally {
        # Clean up the output streams
        $transform, $cryptoStream, $outStream | ForEach-Object Dispose
    }
}
Now you can do:
$inputPath = "C:\Users\my.user\myfile.pdf"
New-Base64File $inputPath -Destination "C:\Users\my.user\myfile.pdf.via_ps.base64"
And expect an output the same size as with base64.
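As a quick sanity check (hypothetical paths), the two outputs should now be within a byte or so of each other - coreutils may emit a trailing newline:
(Get-Item "C:\Users\my.user\myfile.pdf.via_ps.base64").Length
(Get-Item "C:\Users\my.user\myfile.pdf.via_cygwin.base64").Length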

Powershell: storing variables to a file [duplicate]

I would like to write out a hash table to a file with an array as one of the hash table items. My array item is written out, but it contains files=System.Object[]
Note - Once this works, I will want to reverse the process and read the hash table back in again.
clear-host
$resumeFile="c:\users\paul\resume.log"
$files = Get-ChildItem *.txt
$files.GetType()
write-host
$types="txt"
$in="c:\users\paul"
Remove-Item $resumeFile -ErrorAction SilentlyContinue
$resumeParms=@{}
$resumeParms['types']=$types
$resumeParms['in']=($in)
$resumeParms['files']=($files)
$resumeParms.GetEnumerator() | ForEach-Object {"{0}={1}" -f $_.Name,$_.Value} | Set-Content $resumeFile
write-host "Contents of $resumefile"
get-content $resumeFile
Results
IsPublic IsSerial Name     BaseType
-------- -------- ----     --------
True     True     Object[] System.Array
Contents of c:\users\paul\resume.log
files=System.Object[]
types=txt
in=c:\users\paul
The immediate fix is to create your own array representation, by enumerating the elements and separating them with commas, enclosing string values in '...':
# Sample input hashtable. [ordered] preserves the entry order.
$resumeParms = [ordered] @{ foo = 42; bar = 'baz'; arr = (Get-ChildItem *.txt) }
$resumeParms.GetEnumerator() |
    ForEach-Object {
        "{0}={1}" -f $_.Name, (
            $_.Value.ForEach({
                (("'{0}'" -f ($_ -replace "'", "''")), $_)[$_.GetType().IsPrimitive]
            }) -join ','
        )
    }
Note that this represents all non-primitive .NET types as strings, by their .ToString() representation, which may or may not be good enough.
The above outputs something like:
foo=42
bar='baz'
arr='C:\Users\jdoe\file1.txt','C:\Users\jdoe\file2.txt','C:\Users\jdoe\file3.txt'
See the bottom section for a variation that creates a *.psd1 file that can later be read back into a hashtable instance with Import-PowerShellDataFile.
Alternatives for saving settings / configuration data in text files:
If you don't mind taking on a dependency on a third-party module:
Consider using the PSIni module, which uses the Windows initialization file (*.ini) file format; see this answer for a usage example.
Adding support for initialization files to PowerShell itself (not present as of 7.0) is being proposed in GitHub issue #9035.
Consider using YAML as the file format; e.g., via the FXPSYaml module.
Adding support for YAML files to PowerShell itself (not present as of 7.0) is being proposed in GitHub issue #3607.
The Configuration module provides commands to write to and read from *.psd1 files, based on persisted PowerShell hashtable literals, as you would declare them in source code.
Alternatively, you could modify the output format in the code at the top to produce such files yourself, which allows you to read them back in via
Import-PowerShellDataFile, as shown in the bottom section.
As of PowerShell 7.0 there's no built-in support for writing such a representation; that is, there is no complementary Export-PowerShellDataFile cmdlet.
However, adding this ability is being proposed in GitHub issue #11300.
If creating a (mostly) plain-text file is not a must:
The solution that provides the most flexibility with respect to the data types it supports is the XML-based CLIXML format that Export-Clixml creates, as Lee Dailey suggests, whose output can later be read with Import-Clixml.
However, this format too has limitations with respect to type fidelity, as explained in this answer.
Saving a JSON representation of the data, as Lee also suggests, via ConvertTo-Json / ConvertFrom-Json, is another option, which makes for human-friendlier output than XML, but is still not as friendly as a plain-text representation; notably, all \ chars. in file paths must be escaped as \\ in JSON.
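For example, a minimal round-trip sketch with either format, assuming the $resumeParms hashtable from above:
# CLIXML round-trip - preserves the most type information
$resumeParms | Export-Clixml settings.xml
$readBack = Import-Clixml settings.xml
# JSON round-trip - -Depth guards against truncated nesting; -AsHashtable
# (PowerShell 6+) returns a hashtable rather than a [pscustomobject]
$resumeParms | ConvertTo-Json -Depth 5 | Set-Content settings.json
$readBack = Get-Content settings.json -Raw | ConvertFrom-Json -AsHashtable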
Writing a *.psd1 file that can be read with Import-PowerShellDataFile
Within the stated constraints regarding data types - in essence, anything that isn't a number or a string becomes a string - it is fairly easy to modify the code at the top to write a PowerShell hashtable-literal representation to a *.psd1 file so that it can be read back in as a [hashtable] instance via Import-PowerShellDataFile:
As noted, if you don't mind installing a module, consider the Configuration module, which has this functionality built in.
# Sample input hashtable.
$resumeParms = [ordered] @{ foo = 42; bar = 'baz'; arr = (Get-ChildItem *.txt) }
# Create a hashtable-literal representation and save it to file settings.psd1
@"
@{
$(
  ($resumeParms.GetEnumerator() |
    ForEach-Object {
      "  {0}={1}" -f $_.Name, (
        $_.Value.ForEach({
          (("'{0}'" -f ($_ -replace "'", "''")), $_)[$_.GetType().IsPrimitive]
        }) -join ','
      )
    }
  ) -join "`n"
)
}
"@ > settings.psd1
If you read settings.psd1 with Import-PowerShellDataFile settings.psd1 later, you'll get a [hashtable] instance whose entries you can access as usual and which produces the following display output:
Name Value
---- -----
bar  baz
arr  {C:\Users\jdoe\file1.txt, C:\Users\jdoe\file2.txt, C:\Users\jdoe\file3.txt}
foo  42
Note how the order of entries (keys) was not preserved, because hashtable entries are inherently unordered.
On writing the *.psd1 file you can preserve the key(-creation) order by declaring the input hashtable (System.Collections.Hashtable) as [ordered], as shown above (which creates a System.Collections.Specialized.OrderedDictionary instance), but the order is, unfortunately, lost on reading the *.psd1 file.
As of PowerShell 7.0, even if you place [ordered] before the opening @{ in the *.psd1 file, Import-PowerShellDataFile quietly ignores it and creates an unordered hashtable nonetheless.
This is a problem I deal with all the time and it drives me mad. I really think that there should be a function specifically for this action... so I wrote one.
function ConvertHashTo-CSV
{
    Param (
        [Parameter(Mandatory=$true)]
        $hashtable,
        [Parameter(Mandatory=$true)]
        $OutputFileLocation
    )
    $hashtableAverage = $NULL # This will only work for hashtables where each entry is consistent. This checks for consistency.
    foreach ($hashtabl in $hashtable)
    {
        $hashtableAverage = $hashtableAverage + $hashtabl.count # Counts the number of headings.
    }
    $Paritycheck = $hashtableAverage / $hashtable.count # Gets the average number of headings
    if ( ($parity = $Paritycheck -is [int]) -eq $False) # If the average is not an int, the hashtable is not consistent
    {
        write-host "Error. Hashtable is inconsistent" -ForegroundColor red
        Start-Sleep -Seconds 5
        return
    }
    $HashTableHeadings = $hashtable[0].GetEnumerator().name # Get the hashtable headings
    $HashTableCount = ($hashtable[0].GetEnumerator().name).count # Count the headings
    $HashTableString = $null # String to hold the CSV
    foreach ($HashTableHeading in $HashTableHeadings) # Creates the first row containing the column headings
    {
        $HashTableString += $HashTableHeading
        $HashTableString += ", "
    }
    $HashTableString = $HashTableString -replace ".{2}$" # Removes the trailing ", " added by the loop above
    $HashTableString += "`n"
    foreach ($hashtabl in $hashtable) # Adds the data
    {
        for($i=0;$i -lt $HashTableCount;$i++)
        {
            $HashTableString += $hashtabl[$HashTableHeadings[$i]] # Look each value up by its heading
            if ($i -lt ($HashTableCount - 1))
            {
                $HashTableString += ", "
            }
        }
        $HashTableString += "`n"
    }
    $HashTableString | Out-File -FilePath $OutputFileLocation # Writes the CSV to a file
}
To use this copy the function into your script, run it, and then
ConvertHashTo-CSV -hashtable $Hasharray -OutputFileLocation C:\temp\data.CSV
The code is annotated, but briefly: it steps through the arrays and hashtables, appends them to a string with the formatting required to make that string a CSV file, then writes the string to a file.
The main limitation of this is that the hashtables in the array all have to contain the same number of fields. To get around this, if a hashtable has a field that doesn't contain data, ensure it contains at least a space.
More on this can be found here : https://grumpy.tech/powershell-convert-hashtable-to-csv/
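For what it's worth, if your hashtables all share the same keys, a shorter route using only built-in cmdlets is to convert each one to a [pscustomobject] and let Export-Csv handle the formatting (a sketch, using the same hypothetical $Hasharray):
$Hasharray | ForEach-Object { [pscustomobject]$_ } |
    Export-Csv -Path C:\temp\data.csv -NoTypeInformation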

How to Modify "Media Created" Field in File Properties via Powershell

I'm trying to convert a few thousand home videos to a smaller format. However, encoding the video changed the created and modified timestamp to today's date. I wrote a powershell script that successfully (somehow) worked by writing the original file's modified timestamp to the new file.
However, I couldn't find a way in powershell to modify the "Media created" timestamp in the file's details properties. Is there a way to add a routine that would either copy all of the metadata from the original file, or at least set the "media created" field to the modified date?
When I searched for file attributes, it looks like the only options are archive, hidden, etc. Attached is the powershell script that I made (please don't laugh too hard, haha). Thank you
$filepath1 = 'E:\ConvertedMedia\Ingest\' # directory with incorrect modified & create date
$filepath2 = "F:\Backup Photos 2020 and DATA\Data\Photos\Photos 2021\2021 Part1\Panasonic 3-2-21\A016\PRIVATE\PANA_GRP\001RAQAM\" # directory with correct date and same file name (except extension)
$destinationCodec = "*.mp4" # Keep * in front of extension
$sourceCodec = ".mov"
Get-ChildItem $filepath1 -File $destinationCodec | Foreach-Object { # change *.mp4 to the extension of the newly encoded files with the wrong date
    $fileName = $_.Name # sets fileName variable (with extension)
    $fileName # Optional, used during testing - sends the file name to the console
    $fileNameB = $_.BaseName # sets fileNameB variable to the filename without extension
    $filename2 = "$filepath2" + "$fileNameB" + "$sourceCodec" # assembles filepath for source
    $correctTime = (Get-Item $filename2).lastwritetime # used for testing - just shows the correct time in the output, can comment out
    $correctTime # prints the correct time
    $_.lastwritetime = (Get-Item $filename2).lastwritetime # modifies lastwritetime of filepath1 to match filepath2
    $_.creationTime = (Get-Item $filename2).lastwritetime # modifies creation time to match lastwritetime (comment out if you need creation time to be the same)
}
Update:
I think I need to use Shell.Application, but I'm getting an error message "duplicate keys ' ' are not allowed in hash literals" and am not sure how to incorporate it into the original script.
I only need the "date modified" attribute to be the same as "lastwritetime." The other fields were added just for testing. I appreciate your help!
$tags = "people; snow; weather"
$cameraModel = "AG-CX10"
$cameraMaker = "Panasonic"
$mediaCreated = "2/‎16/‎1999 ‏‎5:01 PM"
$com = (New-Object -ComObject Shell.Application).NameSpace('C:\Users\philip\Videos') # Not sure how to specify file type
$com.Items() | ForEach-Object {
    New-Object -TypeName PSCustomObject -Property @{
        Name = $com.GetDetailsOf($_,0) # lists current extended properties
        Tags = $com.GetDetailsOf($_,18)
        CameraModel = $com.GetDetailsOf($_,30)
        CameraMaker = $com.GetDetailsOf($_,32)
        MediaCreated = $com.GetDetailsOf($_,208)
        $com.GetDetailsOf($_,18) = $tags # sets extended properties
        $com.GetDetailsOf($_,30) = $cameraModel
        $com.GetDetailsOf($_,32) = $cameraMaker
        $com.GetDetailsOf($_,32) = $mediaCreated
    }
}
[Screenshots: script example; File Properties window]
I think your best option is to drive an external tool/library from PowerShell rather than using the shell (not sure you can actually set values this way, tbh).
It's definitely possible to use FFmpeg to set the Media Created metadata of a file like this:
ffmpeg -i input.MOV -metadata creation_time=2000-01-01T00:00:00.0000000+00:00 -codec copy output.MOV
This would copy input.MOV file to new file output.MOV and set the Media Created metadata on the new output.MOV. This is very inefficient - but it does work.
You can script ffmpeg with something like the below. The script currently outputs the FFmpeg commands to the screen; the commented-out Start-Process line can be used to actually execute ffmpeg.
gci | where Extension -eq ".mov" | foreach {
    $InputFilename = $_.FullName;
    $OutputFilename = "$($InputFilename)-fixed.mov";
    Write-Host "Reading $($_.Name). Created: $($_.CreationTime). Modified: $($_.LastWriteTime)";
    $timestamp = Get-Date -Date $_.CreationTime -Format O
    Write-Host "ffmpeg -i $InputFilename -metadata creation_time=$timestamp -codec copy $OutputFilename"
    # Start-Process -Wait -FilePath C:\ffmpeg\bin\ffmpeg.exe -ArgumentList @("-i $InputFilename -metadata creation_time=$timestamp -codec copy $($OutputFilename)")
}
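One caveat: ffmpeg writes a brand-new file, so output.MOV will carry today's filesystem timestamps again. If those should match too, you could copy them across inside the loop, right after the Start-Process call completes (a sketch using the loop's own variables):
# Copy filesystem timestamps from the source onto the re-encoded file
$src = Get-Item $InputFilename
$dst = Get-Item $OutputFilename
$dst.CreationTime  = $src.CreationTime
$dst.LastWriteTime = $src.LastWriteTime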

How to read binary file with text header using StreamReader in PowerShell?

I have a binary log file that has text headers describing the file. The headers are of the form:
FileName: <filename>
Datetime: <dateTime>
NumberOfLines: <nnn>
DataForm: l,l,d,d,d,d
DataBlock:
After that there goes a binary portion of the file:
ウゥョ・ ` 0 ウゥョ゚?~・?ヨソフ>・・?Glfウチメ>-~JUッ羲ソ濂・x・-$>;ノPセ.・4ツヌ岐・セ:)胥篩・tシj~惞ケ劔劔劒 ウゥッ ` 0 ウゥッ?Gd$・フ>・)
and so on...
I can read and parse the headers into variables using this:
$streamReader = [System.IO.StreamReader]::New($FullPath)
while (($currentLine = $streamReader.ReadLine()) -ne 'DataBlock: ') {
    $variableName, $variableValue = $currentLine -split ':'
    New-Variable -Name $variableName -Value $variableValue.Trim() -Force
}
Now to the binary block.
This binary portion is basically a CSV-like data structure. DataForm describes how long each field is and what its data type is; NumberOfLines says how many lines there are.
So, I know how to read and parse the binary portion using:
[Byte[]] $FileByteArray = Get-Content $FullPath -Encoding Byte
and knowing the start position of the data block. For instance, the first field in the example above is 'l' which is 4 bytes for UInt32 data type. Assuming my data block starts at byte 800, I can read it like this:
$byteArray = $FileByteArray[800..803]
[BitConverter]::ToUInt32($byteArray, 0)
and so on.
Now, the question.
I'd like to use my existing StreamReader that I used to parse headers to keep advancing through the file (not by line now, but by chunks of bytes) and read it as bytes. How do I do that?
If I can only read characters using StreamReader - what other methods can I use?
I don't want to read the headers with StreamReader while manually tracking the data block's starting position (adding up the length of each line plus two bytes for the newline characters), and then read the whole file a second time through [Byte[]] $FileByteArray = Get-Content $FullPath -Encoding Byte
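Not from the original thread, but one way to avoid both problems is to skip StreamReader and drive a single FileStream yourself: read the header lines byte-by-byte (so no read-ahead buffering desynchronizes the position), then hand the same stream to a BinaryReader for the data block. A minimal sketch, assuming ASCII headers terminated by LF or CRLF:
# Read one ASCII line directly from the stream; the position always sits
# exactly after the last byte consumed
function Read-AsciiLine([System.IO.Stream]$s) {
    $sb = [System.Text.StringBuilder]::new()
    while (($b = $s.ReadByte()) -ne -1) {
        if ($b -eq 10) { break }                        # LF terminates the line
        if ($b -ne 13) { [void]$sb.Append([char]$b) }   # drop CR
    }
    $sb.ToString()
}
$stream = [System.IO.FileStream]::new($FullPath, 'Open', 'Read')
while ($stream.Position -lt $stream.Length -and
       ($line = Read-AsciiLine $stream) -ne 'DataBlock: ') {
    $name, $value = $line -split ':', 2
    New-Variable -Name $name -Value $value.Trim() -Force
}
# $stream.Position now points at the first byte of the binary block
$reader = [System.IO.BinaryReader]::new($stream)
$field1 = $reader.ReadUInt32()   # 'l' -> 4-byte unsigned integer
$field2 = $reader.ReadUInt32()   # 'l'
$field3 = $reader.ReadDouble()   # 'd' -> 8-byte double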

Get checksum without built in Get-FileHash cmdlet in powershell

I cannot use the built in cmdlet Get-FileHash to generate checksum value as the version of Powershell is lower than 4.
Is there an alternative way of getting or validating the integrity of the file?
OK, let's assume you have a file item (from Get-ChildItem for example):
$stream = new-object system.IO.FileStream($item.fullname, "Open", "Read", "ReadWrite")
You open the file with FileStream to get a stream object.
Then you can use one of the Crypto classes to compute its hash:
if ($stream)
{
    $sha = new-object -type System.Security.Cryptography.SHA256Managed
    $bytes = $sha.ComputeHash($stream)
    $stream.Dispose()
    $sha.Dispose()
    $checksum = [System.BitConverter]::ToString($bytes).Replace("-", [String]::Empty).ToLower();
}
Finally, the checksum is in $checksum, and it's a nice string you can use for your comparison:
5989b3cdcff6a594b2b2aef7f6288f7727019c037515c2b10627721e707cf613
You have all sorts of classes to compute hashes under System.Security.Cryptography; you can see what is available here: https://msdn.microsoft.com/en-us/library/system.security.cryptography(v=vs.110).aspx
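Wrapped up as a reusable function following the same approach (a sketch; substitute MD5CryptoServiceProvider for SHA256Managed if you need an MD5 checksum instead):
function Get-FileChecksum {
    param([Parameter(Mandatory=$true)][string]$Path)
    $stream = new-object System.IO.FileStream($Path, "Open", "Read", "ReadWrite")
    try {
        $sha = new-object -type System.Security.Cryptography.SHA256Managed
        $bytes = $sha.ComputeHash($stream)
        $sha.Dispose()
        [System.BitConverter]::ToString($bytes).Replace("-", [String]::Empty).ToLower()
    }
    finally {
        $stream.Dispose()
    }
}
# Usage (any file path):
Get-FileChecksum 'D:\file.zip'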