Reading part of a file into a Stream in PowerShell

I have some files which are 'offsetted' Zip files, in that they have 4 extra bytes at the beginning which must be ignored when extracting them.
I've been using ReadAllBytes/WriteAllBytes (with an offset of 4). That works, but obviously I have to read the file, write most of it back out, and read it again, which is slow.
I'd prefer to use System.IO.Compression.ZipArchive to read from a Stream loaded from the file (sans the first 4 bytes), but I cannot figure out the steps required to do that.
I tried 'Seek', but ZipArchive seems to ignore the stream position.
I also cannot seem to pass byte arrays into System.IO.Compression at all...
Ideas?

Finally!
After trying all manner of hoop-jumping, it seems the simplest answer was the right one:
$bytes = [System.IO.File]::ReadAllBytes("file.zip4")
$ms = New-Object System.IO.MemoryStream -ArgumentList $bytes,4,($bytes.Length-4)
$arch = New-Object System.IO.Compression.ZipArchive($ms)
I can then process $arch.Entries and extract things just fine, reading the file once and processing it in memory instead of reading it, writing 'most' of it back to disc, and reading that file back again!!
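For completeness, here is a minimal sketch of how the entries might then be pulled out of $arch. The 'extracted' destination folder is an assumption, and on Windows PowerShell 5.1 you may also need Add-Type -AssemblyName System.IO.Compression before creating the ZipArchive:
New-Item -ItemType Directory -Path 'extracted' -Force | Out-Null
foreach ($entry in $arch.Entries) {
    $dest = Join-Path 'extracted' $entry.Name   # flat extraction; ignores any folder structure in the zip
    $in  = $entry.Open()
    $out = [System.IO.File]::Create($dest)
    $in.CopyTo($out)
    $out.Dispose()
    $in.Dispose()
}
$arch.Dispose()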

Related

Is there any limit to the length of text content that a PowerShell variable can hold?

I am storing the content of a text file in a variable like this:
$fileContent=$(Get-Content file1.txt)
Right now file1.txt contains only 200 lines. But if one day the file contains 10 million lines, will this approach still work? Is there any limit to the length of content that a variable can hold in PowerShell?
Get-Content reads the entire file into memory.
With that being said, you'd want to change your approach. PowerShell, being built on top of the .NET Framework, has access to all of its capabilities. So you can use classes such as StreamReader, which reads the file from disk one line at a time, using a method like the one below.
$file = [System.IO.StreamReader]::new('.\Desktop\adobe_export.reg') # instantiate an instance of StreamReader
while (-not $file.EndOfStream) # if not end of file, continue
{
    # save this to a variable if needed
    $file.ReadLine() # read/display line
    # more code
}
$file.Close()
$file.Dispose()
First of all, you need to understand that a PowerShell variable is a wrapper around a .NET object, so whatever the underlying type can hold is the answer.
Regarding your actual case, you can check in the Microsoft docs whether the type returned by GetType() has a documented limit, but there is always a memory limit. If you read a lot of data into memory and then return some of it after filtering/transforming/completing/whatever, you are filling memory. Instead, you can avoid assigning anything to a variable and use the pipeline's one-at-a-time processing, so that only the items currently moving through the pipeline take up memory. Of course, you might need to do more than one complex thing with the same input, each needing its own pipeline; in that case you can either re-read the data or, if it can change between reads and you need a snapshot, copy it to a temporary location first.
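As a hedged illustration (the file names and the 'ERROR' filter are just placeholders), streaming through the pipeline keeps only the current item in flight, whereas assigning Get-Content to a variable materialises every line at once:
# Streams line by line; memory use stays roughly constant regardless of file size.
Get-Content .\file1.txt | Where-Object { $_ -match 'ERROR' } | Set-Content .\errors.txt
# Loads every line into memory first, then filters the in-memory array.
$fileContent = Get-Content .\file1.txt
$fileContent | Where-Object { $_ -match 'ERROR' } | Set-Content .\errors.txt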

How can I use tar and tee in PowerShell to do a read once, write many, raw file copy

I'm using a small laptop to copy video files on location to multiple memory sticks (~8GB).
The copy has to be done without supervision once it's started and has to be fast.
I've identified a serious speed bottleneck: when making several copies (e.g. 4 sticks from 2 cameras, i.e. 8 transfers * 8 GB), the multiple reads use a lot of bandwidth, especially since the cameras have a USB 2.0 interface (two ports) and limited capacity.
If I had Unix I could use tar -cf - | tee tar -xf /stick1 | tee tar -xf /stick2 etc.,
which means I'd only have to pull 1 copy (2*8 GB) from each camera, once, over the USB 2.0 interface.
The memory sticks are generally on a hub on the single USB 3.0 interface, which is driven on a different channel, so they write sufficiently fast.
For reasons, I'm stuck using the current Win10 PowerShell.
I'm currently writing the whole command to a string (concatenating the various sources and the various targets) and then using Invoke-Process to execute the copy while I'm entertaining and buying the rounds in the pub after the shoot (hence the need to be AFK).
I can tar cf - | tar xf a single file, but can't seem to get the tee functioning correctly.
I can also successfully use the microSD slot to do a single camera's card, which is not as physically nice but is fast for one camera's recording; however, I still have the bandwidth issue on the remaining camera(s). We may end up with 4-5 source cameras at the same time, which means read once, write many is still going to be an issue.
Edit: I've just advanced to playing with Get-Content -raw | tee \stick1\f1 | tee \stick2\f1 | Out-Null. Haven't done timings or file verification yet...
Edit2: It seems like Get-Content -raw works properly, but the behaviour of PowerShell pipelines violates two of the fundamental commandments of programming: a program shall do one thing and do it well, and thou shalt not mess with the data stream.
For some unknown reason, PowerShell's default (and only) pipeline behaviour always modifies the data stream it is supposed to transfer from one command to the next. There doesn't seem to be a -raw option, nor a $session or $global setting I can use to remedy the mutilation.
How do PowerShell people transfer raw binary from one stream out, into the next process?
Maybe not quite what you want (if you insist on using built-in PowerShell commands), but if you care about speed, use streams and asynchronous Read/Write. PowerShell is a great tool because it can use any .NET class seamlessly.
The script below can easily be extended to write to more than 2 destinations and can potentially handle arbitrary streams. You might want to add some error handling via try/catch there too. You may also try to play with buffered streams and various buffer sizes to optimize the code.
Some references:
FileStream.ReadAsync
FileStream.WriteAsync
CancellationToken
Task.GetAwaiter
-- 2021-12-09 update: Code is modified a little to reflect suggestions from comments.
# $InputPath, $Output1Path, $Output2Path are parameters
[Threading.CancellationTokenSource] $cancellationTokenSource = [Threading.CancellationTokenSource]::new()
[Threading.CancellationToken] $cancellationToken = $cancellationTokenSource.Token
[int] $bufferSize = 64*1024
$fileStreamIn = [IO.FileStream]::new($InputPath,[IO.FileMode]::Open,[IO.FileAccess]::Read,[IO.FileShare]::None,$bufferSize,[IO.FileOptions]::SequentialScan)
$fileStreamOut1 = [IO.FileStream]::new($Output1Path,[IO.FileMode]::CreateNew,[IO.FileAccess]::Write,[IO.FileShare]::None,$bufferSize)
$fileStreamOut2 = [IO.FileStream]::new($Output2Path,[IO.FileMode]::CreateNew,[IO.FileAccess]::Write,[IO.FileShare]::None,$bufferSize)
try {
    # Two buffers: while one is being written to both outputs, the next chunk is read into the other.
    [Byte[]] $bufferToWriteFrom = [byte[]]::new($bufferSize)
    [Byte[]] $bufferToReadTo = [byte[]]::new($bufferSize)
    $time = [System.Diagnostics.Stopwatch]::StartNew()
    $bytesRead = $fileStreamIn.Read($bufferToReadTo,0,$bufferSize)
    while ($bytesRead -gt 0) {
        # Swap the buffers, then overlap both writes with the next read.
        $bufferToWriteFrom,$bufferToReadTo = $bufferToReadTo,$bufferToWriteFrom
        $writeTask1 = $fileStreamOut1.WriteAsync($bufferToWriteFrom,0,$bytesRead,$cancellationToken)
        $writeTask2 = $fileStreamOut2.WriteAsync($bufferToWriteFrom,0,$bytesRead,$cancellationToken)
        $readTask = $fileStreamIn.ReadAsync($bufferToReadTo,0,$bufferSize,$cancellationToken)
        $writeTask1.Wait()
        $writeTask2.Wait()
        $bytesRead = $readTask.GetAwaiter().GetResult()
    }
    $time.Elapsed.TotalSeconds
}
catch {
    throw $_
}
finally {
    $fileStreamIn.Close()
    $fileStreamOut1.Close()
    $fileStreamOut2.Close()
}
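For reference, one way to wrap the snippet above as a standalone script might look like the sketch below; the script name and the param block are assumptions, matching the $InputPath/$Output1Path/$Output2Path placeholders mentioned in the comment at the top.
# Hypothetical wrapper: save the snippet above as Copy-ReadOnceWriteTwice.ps1 with this param block at the top.
param(
    [Parameter(Mandatory)] [string] $InputPath,
    [Parameter(Mandatory)] [string] $Output1Path,
    [Parameter(Mandatory)] [string] $Output2Path
)
# Example invocation (paths are illustrative):
# .\Copy-ReadOnceWriteTwice.ps1 -InputPath 'D:\DCIM\clip001.mp4' -Output1Path 'E:\clip001.mp4' -Output2Path 'F:\clip001.mp4'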

Multiple file upload with mojolicious fails on large number of files

I've hit a wall and my Google skills have failed me this time. I'm in the process of learning Mojolicious to create a useful front end for a series of Perl scripts that I frequently use. I've got a long way through it, but I'm stumped at (multiple) file uploads when the total number of files reaches 950.
Previously, I encountered the problem where, in multiple file uploads, files would begin to be uploaded but the upload stopped once the total size reached 16 MB. I fixed this by setting $ENV{MOJO_MAX_MESSAGE_SIZE} = 50000000000. However, this problem is different. To illustrate, this is the part of my script where I try to grab the uploaded files:
my $files = $self->req->every_upload('localfiles');
for my $file ( @{$files} ) {
    my $fileName = $file->filename =~ s/[^\w\d\.]+/_/gr;
    $file->move_to("temporary_uploads/$fileName");
    $self->app->log->debug("$fileName uploaded\n");
    push @fileNames, $fileName;
};
say "FILES: ".scalar(@fileNames);
I apologise that it may be ugly. If I attempt to upload 949 files, my array @fileNames is populated correctly, but if I try to upload 950 files, my array ends up empty, and it seems as though $files is empty also. If anyone has any ideas or pointers to guide me to the solution I would be extremely grateful!
If I attempt to upload 949 files, my array @fileNames is populated correctly, but if I try to upload 950 files, my array ends up empty, and it seems as though $files is empty also.
That means the process is running out of file descriptors: most likely each upload is spooled to a file whose handle stays open for the duration of the request, so 950 uploads plus the descriptors the server already holds is enough to exhaust the default limit. In particular, the default for the Linux kernel is 1024:
For example, the kernel default for maximum number of file descriptors (ulimit -n) was 1024/1024 (soft, hard), and has been raised to 1024/4096 in Linux 2.6.39.

How to convert multiple RTF files to TXT files

Looking here:
Is it possible to change an .rtf file to a .txt file using some sort of batch script on Windows?
I saw that it is possible to use PowerShell to do this. A full example was given, but the link no longer works.
Can anyone tell me how I can solve this? Thanks.
You can use .NET to do this in PowerShell very easily by using the System.Windows.Forms.RichTextBox control: load the RTF file into it, then pull the plain-text version out. This is by far the easiest and quickest way I have found to do this.
My function for doing exactly this is here: https://github.com/Asnivor/PowerShell-Misc-Functions/blob/master/translate-rtf-to-txt.ps1
To explain this a little more simply:
$rtfFile = [System.IO.FileInfo]"path/to/some/rtf/file"
$txtFile = "path/to/the/destination/txt/file"
# Load the Windows Forms assembly so the RichTextBox control is available
Add-Type -AssemblyName System.Windows.Forms
# Load the *.rtf file into a hidden .NET RichTextBox
$rtBox = New-Object System.Windows.Forms.RichTextBox
$rtfText = [System.IO.File]::ReadAllText($rtfFile)
$rtBox.Rtf = $rtfText
# Get the plain text version
$plainText = $rtBox.Text
# Write the plain text out to the destination file
[System.IO.File]::WriteAllText($txtFile, $plainText)
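Since the question is about converting many files, here is a minimal sketch that wraps the same idea in a loop; the source folder 'C:\rtf' is an assumption, and each .txt is written alongside its .rtf:
Add-Type -AssemblyName System.Windows.Forms
$rtBox = New-Object System.Windows.Forms.RichTextBox
Get-ChildItem -Path 'C:\rtf' -Filter *.rtf | ForEach-Object {
    # Reuse one hidden RichTextBox: load the RTF, then write out its plain-text rendering
    $rtBox.Rtf = [System.IO.File]::ReadAllText($_.FullName)
    $txtPath = [System.IO.Path]::ChangeExtension($_.FullName, '.txt')
    [System.IO.File]::WriteAllText($txtPath, $rtBox.Text)
}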

Using Perl module IO::Uncompress::AnyUncompress

I wish to use the Perl module IO::Uncompress::AnyUncompress, which is documented here: http://perldoc.perl.org/IO/Uncompress/AnyUncompress.html.
However, this documentation seems to gloss over the fact that a compressed archive (.zip, .7z) contains a tree of compressed files. I would like to extract only a single file from the archive rather than the full archive, for example:
my $archivename = 'archive.7z';
my $filetoextract = './bin/file.lib';
my $archive = new IO::Uncompress::AnyUncompress($archivename);
my $filecontent = $archive->extract($filetoextract);
However, the API does not seem to have such an extract() function, nor a function that would return the list of files contained in the archive.
Have I missed something?
IO::Uncompress::AnyUncompress only deals with a single compressed byte stream. You'll need a module like Archive::Any, Archive::Any::Lite, or Archive::Libarchive::XS.