How to read a binary file with a text header using StreamReader in PowerShell? - powershell

I have a binary log file that has text headers describing the file. The headers are of the form:
FileName: <filename>
Datetime: <dateTime>
NumberOfLines: <nnn>
DataForm: l,l,d,d,d,d
DataBlock:
After that comes the binary portion of the file:
ウゥョ・ ` 0 ウゥョ゚?~・?ヨソフ>・・?Glfウチメ>-~JUッ羲ソ濂・x・-$>;ノPセ.・4ツヌ岐・セ:)胥篩・tシj~惞ケ劔劔劒 ウゥッ ` 0 ウゥッ?Gd$・フ>・)
and so on...
The headers I can read and parse into variables using this:
$streamReader = [System.IO.StreamReader]::New($FullPath)
while (($currentLine = $streamReader.ReadLine()) -ne 'DataBlock: ') {
    $variableName, $variableValue = $currentLine -split ':'
    New-Variable -Name $variableName -Value $variableValue.Trim() -Force
}
Now to the binary block.
This binary portion is basically a CSV-like data structure. DataForm describes how long the fields are and what the data type of each field is; NumberOfLines says how many lines there are.
So, I know how to read and parse the binary portion using:
[Byte[]] $FileByteArray = Get-Content $FullPath -Encoding Byte
and knowing the start position of the data block. For instance, the first field in the example above is 'l' which is 4 bytes for UInt32 data type. Assuming my data block starts at byte 800, I can read it like this:
$byteArray = $FileByteArray[800..803]
[BitConverter]::ToUInt32($byteArray, 0)
and so on.
Now, the question.
I'd like to use my existing StreamReader that I used to parse headers to keep advancing through the file (not by line now, but by chunks of bytes) and read it as bytes. How do I do that?
If StreamReader can only read characters - what other methods can I use?
I don't want to read the headers using StreamReader, calculating along the way the starting position of my data block from the length of each line plus two bytes for the newline characters, and then read the whole file again through [Byte[]] $FileByteArray = Get-Content $FullPath -Encoding Byte
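One way to approach this (a minimal sketch, not from the original thread): reusing $streamReader.BaseStream is unreliable, because StreamReader reads ahead into an internal buffer, so the underlying stream's position is usually already past the header you just parsed. A simpler route is to skip StreamReader entirely and read the header lines byte-by-byte through a BinaryReader, so the stream position is exact when the data block begins. Read-HeaderLine below is a hypothetical helper, and the headers are assumed to be ASCII and CRLF-terminated:
$stream = [System.IO.File]::OpenRead($FullPath)
$binaryReader = [System.IO.BinaryReader]::new($stream)
# Hypothetical helper: reads single bytes until LF, drops the CR, returns the line as text
function Read-HeaderLine([System.IO.BinaryReader] $Reader) {
    $sb = [System.Text.StringBuilder]::new()
    while (($b = $Reader.ReadByte()) -ne 10) {        # 10 = LF
        if ($b -ne 13) { [void]$sb.Append([char]$b) } # 13 = CR
    }
    $sb.ToString()
}
# Same header parsing as before, but without StreamReader's read-ahead buffering
while (($currentLine = Read-HeaderLine $binaryReader) -ne 'DataBlock: ') {
    $variableName, $variableValue = $currentLine -split ':'
    New-Variable -Name $variableName -Value $variableValue.Trim() -Force
}
# $stream.Position now sits on the first byte of the binary block,
# so fields can be read directly, e.g. the first 'l' (UInt32) field:
$firstField = [BitConverter]::ToUInt32($binaryReader.ReadBytes(4), 0)
$binaryReader.Dispose()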

Related

Add Content to a specific line in powershell

I have seen this post:
Add-Content - append to specific line
But I cannot add these lines because "Array index is out of range".
What my script is doing:
Find the line
Loop through the array that contains the data i want to add
$file[$line]+=$data
$line+=1
Write to file
Should I create a new file content and then add each line of the original file to it?
If so, do you know how to do that, and how to stop and add my data in between?
Here is the part of my code where I try to add:
$f=Get-Content $path
$ct=$begin+1 #$begin is the line where I want to place content under
foreach($add in $add_to_yaml)
{
    $f[$ct]+=$add
    $ct+=1
}
$f | Out-File -FilePath $file
Let's break down your script and try to analyze what's going on:
$f = Get-Content $path
Get-Content, by default, reads text files and spits out 1 string per individual line in the file. If the file found at $path has 10 lines, the resulting value stored in $f will be an array of 10 string values.
Worth noting is that array indices in PowerShell (and .NET in general) are zero-based - to get the 10th line from the file, we'd reference index 9 in the array ($f[9]).
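For example, with a hypothetical 10-line file:
$f = Get-Content .\tenlines.txt   # hypothetical path; the file has 10 lines
$f.Count                          # 10
$f[0]                             # first line
$f[9]                             # 10th (last) line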
That means that if you want to concatenate stuff to the end of (or "under") line 10, you need to specify index 9. For this reason, you'll want to change the following line:
$ct = $begin + 1 #$begin is the line where i want to place content under
to
$ct = $begin
Now that we have the correct starting offset, let's look at the loop:
foreach($add in $add_to_yaml)
{
    $f[$ct] += $add
    $ct += 1
}
Assuming $add_to_yaml contains multiple strings, the loop body will execute more than once. Let's take a look at the first statement:
$f[$ct] += $add
We know that $f[$ct] resolves to a string - and strings have the += operator overloaded to mean "string concatenation". That means that the string value stored in $f[$ct] will be modified (e.g. the string will become longer), but the array $f itself does not change its size - it still contains the same number of strings, just one of them is a little longer.
Which brings us to the crux of your issue, this line right here:
$ct += 1
By incrementing the index counter, you effectively "skip" to the next string for every value in $add_to_yaml - so if the number of elements you want to add exceeds the number of lines after $begin, you naturally reach a point "beyond the bounds" of the array before you're finished.
Instead of incrementing $ct, make sure you concatenate your new string values with a newline sequence:
$f[$ct] = $f[$ct],$add -join [Environment]::Newline
Putting it all back together, you end up with something like this (notice we can discard $ct completely, since its value is constant and equal to $begin anyway):
$f = Get-Content $path
foreach($add in $add_to_yaml)
{
    $f[$begin] = $f[$begin],$add -join [Environment]::Newline
}
But wait a minute - all the strings in $add_to_yaml are simply going to be joined by newlines - we can do that in a single -join operation and get rid of the loop too!
$f = Get-Content $path
$f[$begin] = @($f[$begin];$add_to_yaml) -join [Environment]::Newline
$f | Out-File -FilePath $file
Much simpler :)
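For illustration, a hedged usage sketch of that final snippet - the path, index, and YAML strings here are hypothetical, not taken from the question:
$path = '.\settings.yaml'          # hypothetical file with at least 4 lines
$file = '.\settings.yaml'          # write the result back to the same file
$begin = 3                         # zero-based index of the line to append under
$add_to_yaml = '  extraKey1: value1', '  extraKey2: value2'
$f = Get-Content $path
$f[$begin] = @($f[$begin]; $add_to_yaml) -join [Environment]::NewLine
$f | Out-File -FilePath $file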

Drastic Powershell vs GNU coreutils base64 output length difference

I'm trying to figure out why there's a huge difference in the output sizes when encoding a file in base64 in Powershell vs GNU coreutils. Depending on options (UTF8 vs Unicode), the Powershell output ranges from about 240MB to 318MB. Using coreutils base64 (in Cygwin, in this case), the output is about 80MB. The original filesize is about 58MB. So, 2 questions:
Why is there such a drastic difference?
How can I get Powershell to give the smaller output that the GNU tool gives?
Here are the specific commands I used:
Powershell smaller output:
$input = "C:\Users\my.user\myfile.pdf"
$filecontent = get-content $input
$converted = [System.Text.Encoding]::UTF8.GetBytes($filecontent)
$encodedtext = [System.Convert]::ToBase64String($converted)
$encodedtext | Out-File "C:\Users\my.user\myfile.pdf.via_ps.base64"
The larger Powershell output came from simply replacing "UTF8" with "Unicode". It will be obvious that I'm pretty new to Powershell; I'm sure someone only slightly better with it could combine that into a couple of simple lines.
Coreutils (via Cygwin) base64:
base64.exe -w0 myfile.pdf > myfile.pdf.via_cygwin.base64
Why is there such a drastic difference?
Because you're doing something wildly different in PowerShell
How can I get Powershell to give the smaller output that the GNU tool gives?
By doing what base64 does :)
Let's have a look at what base64 ... > ... actually does:
base64:
Opens file handle to input file
Reads raw byte stream from disk
Converts every 3-byte block of input to a 4-character base64-encoded output fragment
>:
Writes raw byte stream to disk
Since the 4-character output fragments only contain byte values that correspond to 64 printable ASCII characters, the command never actually does any "string manipulation" - the values on which it operates just happen to also be printable as ASCII strings, and the resulting file is therefore indistinguishable from a "text file".
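A quick sanity check on the numbers in the question (simple arithmetic, not part of the original answer): because every 3 input bytes become 4 output characters, the expected base64 size of a ~58 MB file is roughly 58 × 4/3 ≈ 77 MB, which lines up with the ~80 MB produced by coreutils:
# expected base64 output size ≈ input size * 4/3
58 * 4 / 3   # ≈ 77.3 (MB)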
Your PowerShell script on the other hand does lots of string manipulation:
Get-Content $input:
Opens file handle to input file
Reads raw byte stream from disk
Decodes the byte stream according to some chosen encoding scheme (likely your OEM codepage)
[Encoding]::UTF8.GetBytes():
Re-encodes the resulting string using UTF8
[Convert]::ToBase64String()
Converts every 3-byte block of input to a 4-character base64-encoded output fragment
Out-File:
Encodes input string as little-endian UTF16
Writes to disk
The three additional string encoding steps highlighted above will result in a much-inflated byte stream, which is why you're seeing the output size double or triple.
How to base64-encode files then?
The trick here is to read the raw bytes from disk and pass those directly to [convert]::ToBase64String()
It is technically possible to just read the entire file into an array at once:
$bytes = Get-Content path\to\file.ext -Encoding Byte # Windows PowerShell only
# or
$bytes = [System.IO.File]::ReadAllBytes($(Convert-Path path\to\file.ext))
$b64String = [convert]::ToBase64String($bytes)
Set-Content path\to\output.base64 -Value $b64String -Encoding Ascii
... I'd strongly recommend against doing so for files larger than a few kilobytes.
Instead, for file transformation in general you'll want to use streams. In this particular case, you'll want to use a CryptoStream with a ToBase64Transform to re-encode a file stream as base64:
function New-Base64File {
    [CmdletBinding(DefaultParameterSetName = 'ByPath')]
    param(
        [Parameter(Mandatory = $true, ParameterSetName = 'ByPath', Position = 0)]
        [string]$Path,
        [Parameter(Mandatory = $true, ParameterSetName = 'ByPSPath')]
        [Alias('PSPath')]
        [string]$LiteralPath,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$Destination
    )
    # Create destination file if it doesn't exist
    if (-not(Test-Path -LiteralPath $Destination -PathType Leaf)) {
        $outFile = New-Item -Path $Destination -ItemType File
    }
    else {
        $outFile = Get-Item -LiteralPath $Destination
    }
    [void]$PSBoundParameters.Remove('Destination')
    try {
        # Open a writable file stream to the output file
        $outStream = $outFile.OpenWrite()
        # Wrap output file stream in a CryptoStream.
        #
        # Anything that we write to the crypto stream is automatically
        # base64-encoded and then written through to the output file stream
        $transform = [System.Security.Cryptography.ToBase64Transform]::new()
        $cryptoStream = [System.Security.Cryptography.CryptoStream]::new($outStream, $transform, 'Write')
        foreach ($file in Get-Item @PSBoundParameters) {
            try {
                # Open readable input file stream
                $inStream = $file.OpenRead()
                # Copy input bytes to crypto stream
                # - which in turn base64-encodes and writes to output file
                $inStream.CopyTo($cryptoStream)
            }
            finally {
                # Clean up the input file stream
                $inStream | ForEach-Object Dispose
            }
        }
    }
    finally {
        # Clean up the output streams
        $transform, $cryptoStream, $outStream | ForEach-Object Dispose
    }
}
Now you can do:
$inputPath = "C:\Users\my.user\myfile.pdf"
New-Base64File $inputPath -Destination "C:\Users\my.user\myfile.pdf.via_ps.base64"
And expect an output of the same size as with base64.

Powershell: Replace headers while using Import-CSV

I found a related answer here that is really helpful, but not quite what I'm looking for. There are also a number of other questions I've looked at, but I can't figure out how to get this to work unfortunately and it seems rather simple.
Basically, I'm using Import-Csv and manipulating a lot of data; but the names of the headers can sometimes change. So instead of re-writing my code, I'd like to map the headers I'm given to the headers that are used in my code blocks. Outputting the final data as a CSV, I can leave it using the 'updated headers' or, if I can figure out how to swap headers easily, I could always swap them back to what they were.
So let's say I have a mapping file in Excel. I can do the mapping in rows or columns, whichever will be easier. For this first example, I have the mapping in rows. When I use Import-CSV, I want to use the Headers from Row #2 instead of the headers in Row #1. Here's the content of the mapping file:
So basically if I hard coded this all, I'd have something like:
$null, $headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '(^.+\r?\n)', 2
ConvertFrom-Csv ($headerRow.Trim() -replace 'Identification', 'ID' -replace 'Revenue Code', 'Revenue_Code' -replace 'Total Amount for Line', 'Amount' -replace 'Total Quantity for Line', 'Qty'), $dataRows
Except I don't want to hard code it, I am basically looking for a way to use Replace with a mapping file or hashtable if I can create one.
#Pseudo code for what I want
$hashtable = Get-Content mapping.xlsx
ConvertFrom-Csv ($headerRow.Trim() -replace $hashtable.Name, $hashtable.Value), $dataRows
I'm probably failing, and failing to find similar examples, because I'm trying to be flexible on the format of the mapping file. My original idea was to basically treat the 1st row as a string and replace that entire string with the second row. But the hashtable idea came from the thought of restructuring the mapping to look like this:
Here I would basically -replace each Source value with the corresponding Target value.
EDIT If you need to convert back, give this a shot - but keep in mind it'll only work if you have a one-to-one relationship of Source:Target values.
#Changing BACK to the original Headers...
$Unmap = @{}
(Import-Csv MappingTable.csv).ForEach({$Unmap[$_.Target] = $_.Source})
#Get string data from CSV Objects
$stringdata = $outputFixed | ConvertTo-CSV -NoTypeInformation
$headerRow = $stringdata[0]
$dataRows = $stringdata[1..($stringdata.Count-1)] -join "`r`n"
#Create new header data
$unmappedHeaderRow = ($headerRow -replace '"' -split ',').ForEach({'"' + $Unmap[$_] + '"'}) -join ','
$newdata = ConvertFrom-Csv $unmappedHeaderRow, $dataRows
Here's a complete example that builds on your original attempt:
It provides the column-name (header) mapping via (another) .csv file, with columns Source and Target, where each row maps a source name to a target name, as (also) shown in your question.
The mapping CSV file is transformed into a hashtable that maps source names to target names.
The data CSV file is then read as plain text, as in your question - efficiently, but in full - split into header row and data rows, and a new header row with the mapped names is constructed with the help of the hashtable.
The new header row plus the data rows are then sent to ConvertFrom-Csv for to-object conversion based on the mapped column (property) names.
# Create sample column-name mapping file.
@'
Source,Target
Identification,Id
Revenue Code,Revenue_Code
'@ > mapping.csv
# Create a hashtable from the mapping CSV file
# that maps each Source column value to its Target value.
$map = @{}
(Import-Csv mapping.csv).ForEach({ $map[$_.Source] = $_.Target })
# Create sample input CSV file.
@'
Revenue Code,Identification
r1,i1
r2,i2
'@ > data.csv
# Read the data file as plain text, split into a header line and
# a multi-line string comprising all data lines.
$headerRow, $dataRows = (Get-Content -Raw data.csv) -split '\r?\n', 2
# Create the new header based on the column-name mapping.
$mappedHeaderRow =
($headerRow -replace '"' -split ',').ForEach({ $map[$_] }) -join ','
# Parse the data rows with the new header.
$mappedHeaderRow, $dataRows | ConvertFrom-Csv
The above outputs the following, showing that the columns were effectively mapped (renamed):
Revenue_Code Id
------------ --
r1           i1
r2           i2
The easiest thing to do here is to process the CSV and then transform each row, from whatever format it was, into a new desired target format.
Pretend we have an input CSV like this.
RowID,MayBeNull,MightHaveAValue
1,,Value1
2,Value2,
3,,Value3
Then we import the csv like so:
#helper function for ugly logic
function HasValue($param){
    return -not [string]::IsNullOrEmpty($param)
}
$csv = Import-Csv C:\pathTo\this.csv
foreach($row in $csv){
    if (HasValue($row.MayBeNull)){
        $newColumn = $row.MayBeNull
    }
    else{
        $newColumn = $row.MightHaveAValue
    }
    #generate new output
    [psCustomObject]@{
        Id = $row.RowId;
        NewColumn = $newColumn
    }
}
Which gives the following output:
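Id NewColumn
-- ---------
1  Value1
2  Value2
3  Value3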
This is an easy pattern to follow for a data migration script; then you just need to scale it up to fix your problem.

Comparing string outputs between Azure Properties.ContentMD5 and Get-FileHash

How do I compare the output of Get-FileHash directly with the output of Properties.ContentMD5?
I'm putting together a PowerShell script that takes some local files from my system and copies them to an Azure Blob Storage Container.
The files change daily so I have added in a check to see if the file already exists in the container before uploading it.
I use Get-FileHash to read the local file:
$LocalFileHash = (Get-FileHash "D:\file.zip" -Algorithm MD5).Hash
Which results in $LocalFileHash holding this: 67BF2B6A3E6657054B4B86E137A12382
I use this code to get the checksum of the blob file already transferred to the container:
$BlobFile = "Path\To\file.zip"
$AZContext = New-AZStorageContext -StorageAccountName $StorageAccountName -SASToken "<token here>"
$RemoteBlobFile = Get-AzStorageBlob -Container $ContainerName -Context $AZContext -Blob $BlobFile -ErrorAction Ignore
if ($RemoteBlobFile) {
    $cloudblob = [Microsoft.Azure.Storage.Blob.CloudBlockBlob]$RemoteBlobFile.ICloudBlob
    $RemoteBlobHash = $cloudblob.Properties.ContentMD5
}
This value of $RemoteBlobHash is set to Z78raj5mVwVLS4bhN6Ejgg==
No problem, I thought, I'll just decode the Base64 string and compare:
$output = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($RemoteBlobHash))
Which gives me g�+j>fWKK��7�#� so not directly comparable ☹
This question shows someone in a similar pickle but I don't think they were using Get-FileHash given the format of their local MD5 result.
Other things I've tried:
changing the System.Text.Encoding line above from UTF8 to UTF16 and ASCII, which changes the output, but not to anything recognisable.
dabbling with GetBytes to see if that helped:
$output = [System.Text.Encoding]::UTF8.GetBytes([System.Text.Encoding]::UTF16.GetString([System.Convert]::FromBase64String($RemoteBlobHash)))
Note: Using md5sum to compare the local file and a downloaded copy of file.zip results in the same MD5 string as Get-FileHash: 67BF2B6A3E6657054B4B86E137A12382
Thank you in advance!
ContentMD5 is a base64 representation of the binary hash value, not the resulting hex string :)
$md5sum = [convert]::FromBase64String('Z78raj5mVwVLS4bhN6Ejgg==')
$hdhash = [BitConverter]::ToString($md5sum).Replace('-','')
Here we convert base64 -> binary -> hexadecimal
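With that in place, the hex string can be compared directly against the Get-FileHash value from the question (a small usage sketch):
$hdhash -eq $LocalFileHash   # True - PowerShell's -eq on strings is case-insensitive anyway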
If you need to do it the other way around (ie. for obtaining a local file hash, then using that to search for blobs in Azure), you'll first need to split the hexadecimal string into byte-size chunks, then convert the resulting byte array to base64:
$hdhash = '67BF2B6A3E6657054B4B86E137A12382'
$bytes = [byte[]]::new($hdhash.Length / 2)
for($i = 0; $i -lt $bytes.Length; $i++){
    $offset = $i * 2
    $bytes[$i] = [convert]::ToByte($hdhash.Substring($offset,2), 16)
}
$md5sum = [convert]::ToBase64String($bytes)
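Converting the sample hex string this way should give back Z78raj5mVwVLS4bhN6Ejgg==, which can then be compared against Properties.ContentMD5 as-is.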
 

Cleanup improperly formatted CSV file

I am downloading an xlsx file from a SharePoint site and then converting it into a csv file. However, since the xlsx file contained empty columns that were not deleted, they end up in the csv file as follows...
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
As you can see, the Import-Csv cmdlet will fail with that file because of the extra null titles. I want to know how to count the extra commas at the end. The number of columns is always changing, and the names of the columns are also always changing, so the count has to start from the last non-null title.
Right now, I'm doing the following...
$csvFileEdited = Get-Content $csvFile
$csvFileEdited[0] = $csvFileEdited[0].TrimEnd(',')
$csvFileEdited | Set-Content "$csvFile-temp"
Move-Item "$csvFile-temp" $csvFile -Force
Write-Host "Trim Complete."
This will make the file output like this...
columnOne,columnTwo,columnThree
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
The naming is now accepted by Import-Csv, but as you can see there are still extra null values that are not necessary, since they are null for every row.
If I did the following code...
$csvFileWithExtraCommas = Get-Content $csvFile
$csvFileWithoutExtraCommas = @()
ForEach ($line in $csvFileWithExtraCommas)
{
    $line = $line.TrimEnd(',')
    $csvFileWithoutExtraCommas += $line
}
$csvFileWithoutExtraCommas | Set-Content "$csvFile-temp"
Move-Item "$csvFile-temp" $csvFile -Force
Write-Host "Trim Complete."
Then it would also remove null values that should be kept, because they belong to non-null title names. Here is the output...
columnOne,columnTwo,columnThree
valueOne
,valueTwo
,,valueThree
Here is the desired output:
columnOne,columnTwo,columnThree
valueOne,,
,valueTwo,
,,valueThree
Can anyone help with this?
Update
I'm using the following code to count the extra null titles...
$csvFileWithCommas = Get-Content $csvFile
[int]$csvFileWithExtraCommasNumber = $csvFileWithCommas[0].Length
$csvFileTitlesWithoutExtraCommas = $csvFileWithCommas[0].TrimEnd(',')
[int]$csvFileWithoutExtraCommasNumber = $csvFileTitlesWithoutExtraCommas.Length
$numOfCommas = $csvFileWithExtraCommasNumber - $csvFileWithoutExtraCommasNumber
The resulting value of $numOfCommas is 4. Now the question is: how can I make $line.TrimEnd(',') only trim 4 commas?
Ok... If you really need to do this, you can count the trailing commas in the header and use regex to remove that many from the end of each line. There are other string-manipulation approaches, but the regex in this case is pretty clean.
Note that what Bluecakes' answer shows should suffice. Perhaps there are some hidden characters that did not get copied into the question, or perhaps an encoding issue with your real file.
$file = Get-Content "D:\temp\text.csv"
# Number of trailing commas. Compare the length before and after the trim
$numberofcommas = $file[0].Length - $file[0].TrimEnd(",").Length
# Use regex to remove as many commas from the end of each line and convert to csv object.
$file -replace ",{$numberofcommas}$" | ConvertFrom-Csv
The regex looks for X commas at the end of each line, where X is $numberofcommas. In our case it would look like ,{4}$
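For example, applied to one of the sample data lines (a quick demonstration, not part of the original answer):
',,valueThree,,,,' -replace ',{4}$'   # -> ',,valueThree'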
Source file used with above code was generated as such
#"
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
"# | set-content D:\temp\text.csv
Are you getting an error when trying to Import-Csv? The cmdlet is smart enough to ignore columns without a heading, with no additional code needed.
I copied your csv file to my H:\ drive:
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
and then ran $nullcsv = Import-Csv -Path H:\nullcsv.csv and this is what I got:
PS> $nullcsv
columnOne columnTwo columnThree
--------- --------- -----------
valueOne
          valueTwo
                    valueThree
The imported csv only contains 3 values as you would expect:
PS> $nullcsv.count
3
The cmdlet is also correctly accounting for null values in each of the columns:
PS> $nullcsv | Format-List
columnOne : valueOne
columnTwo :
columnThree :
columnOne :
columnTwo : valueTwo
columnThree :
columnOne :
columnTwo :
columnThree : valueThree