Searching many large text files in Powershell


I frequently have to search server log files in a directory that may contain 50 or more files of 200+ MB each. I've written a function in Powershell to do this searching. It finds and extracts all the values for a given query parameter. It works great on an individual large file or a collection of small files but totally bites in the above circumstance, a directory of large files.
The function takes a parameter, which consists of the query parameter to be searched.
In pseudo-code:
Take parameter (e.g. someParam or someParam=([^& ]+))
Create a regex (if one is not supplied)
Collect a directory list of *.log, pipe to Select-String
For each pipeline object, add the matchers to a hash as keys
Increment a match counter
Call GC
At the end of the pipelining:
if (hash has keys)
enumerate the hash keys,
sort and append to string array
set-content the string array to a file
print summary to console
exit
else
print summary to console
exit
Here's a stripped-down version of the file processing.
$wtmatches = @{};
gci -Filter *.log | Select-String -Pattern $searcher |
%{ $wtmatches[$_.Matches[0].Groups[1].Value]++; $items++; [GC]::Collect(); }
I'm just using an old perl trick of de-duplicating found items by making them the keys of a hash. Perhaps, this is an error, but a typical output of the processing is going to be around 30,000 items at most. More normally, found items is in the low thousands range. From what I can see, the number of keys in the hash does not affect processing time, it is the size and number of the files that breaks it. I recently threw in the GC in desperation, it does have some positive effect but it is marginal.
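As a side note on the de-duplication trick described above, the same idea can be sketched with a .NET HashSet instead of hashtable keys. This is only an illustration: the sample lines and the WT.ets pattern are made up, and it does not by itself address the memory problem.

```powershell
# Hypothetical sample lines standing in for real log content.
$sample = @(
    'GET /a?WT.ets=111&x=1'
    'GET /b?WT.ets=222&x=2'
    'GET /c?WT.ets=111&x=3'
    'GET /d?other=1'
)

$regex  = [regex]'WT\.ets=([^ &]+)'
$unique = New-Object 'System.Collections.Generic.HashSet[string]'
$total  = 0

foreach ($line in $sample) {
    $m = $regex.Match($line)
    if ($m.Success) {
        # Add() returns $false for a value already in the set; the result is discarded.
        [void]$unique.Add($m.Groups[1].Value)
        $total++
    }
}

"Matched lines: $total, unique values: $($unique.Count)"
```

A HashSet stores only the keys, which is all that is needed when the per-value counts are not used.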
The issue is that with the large collection of large files, the processing sucks the RAM pool dry in about 60 seconds. It doesn't actually use a lot of CPU, interestingly, but there's a lot of volatile storage going on. Once the RAM usage has gone up over 90%, I can just punch out and go watch TV. It could take hours to complete the processing to produce a file with 15,000 or 20,000 unique values.
I would like advice and/or suggestions for increasing the efficiency, even if that means using a different paradigm to accomplish the processing. I went with what I know. I use this tool on almost a daily basis.
Oh, and I'm committed to using Powershell. ;-) This function is part of a complete module I've written for my job, so, suggestions of Python, perl or other useful languages are not useful in this case.
Thanks.
mp
Update:
Using latkin's ProcessFile function, I used the following wrapper for testing. His function is orders of magnitude faster than my original.
function Find-WtQuery {
<#
.Synopsis
Takes a parameter with a capture regex and a wildcard for files list.
.Description
This function is intended to be used on large collections of large files that have
the potential to take an unacceptably long time to process using other methods. It
requires that a regex capture group be passed in as the value to search for.
.Parameter Target
The parameter with capture group to find, e.g. WT.z_custom=([^ &]+).
.Parameter Files
The file wildcard to search, e.g. '*.log'
.Outputs
An object with an array of unique values and a count of total matched lines.
#>
param(
[Parameter(Mandatory = $true)] [string] $target,
[Parameter(Mandatory = $false)] [string] $files
)
begin{
$stime = Get-Date
}
process{
$results = gci -Filter $files | ProcessFile -Pattern $target -Group 1;
}
end{
$etime = Get-Date;
$ptime = $etime - $stime;
Write-Host ("Processing time for {0} files was {1}:{2}:{3}." -f (gci -Filter $files).Count, $ptime.Hours,$ptime.Minutes,$ptime.Seconds);
return $results;
}
}
The output:
clients:\test\logs\global
{powem} [4] --> Find-WtQuery -target "WT.ets=([^ &]+)" -files "*.log"
Processing time for 53 files was 0:1:35.
Thanks to all for comments and help.

Here's a function that will hopefully speed up and reduce the memory impact of the file processing part. It will return an object with 2 properties: The total count of lines matched, and a sorted array of unique strings from the match group specified. (From your description it sounds like you don't really care about the count per string, just the string values themselves)
function ProcessFile
{
param(
[Parameter(ValueFromPipeline = $true, Mandatory = $true)]
[System.IO.FileInfo] $File,
[Parameter(Mandatory = $true)]
[string] $Pattern,
[Parameter(Mandatory = $true)]
[int] $Group
)
begin
{
$regex = new-object Regex @($pattern, 'Compiled')
$set = new-object 'System.Collections.Generic.SortedDictionary[string, int]'
$totalCount = 0
}
process
{
try
{
$reader = new-object IO.StreamReader $_.FullName
while( ($line = $reader.ReadLine()) -ne $null)
{
$m = $regex.Match($line)
if($m.Success)
{
$set[$m.Groups[$group].Value] = 1
$totalCount++
}
}
}
finally
{
$reader.Close()
}
}
end
{
new-object psobject -prop @{TotalCount = $totalCount; Unique = ([string[]]$set.Keys)}
}
}
You can use it like this:
$results = dir *.log | ProcessFile -Pattern 'stuff (capturegroup)' -Group 1
"Total matches: $($results.TotalCount)"
$results.Unique | Out-File .\Results.txt

IMO @latkin's approach is the way to go if you want to do this within PowerShell and not use some dedicated tool. I made a few changes, though, so that the command plays better with pipeline input. I also modified the regex handling to find all matches on a particular line. Neither approach searches across multiple lines, although that scenario would be pretty easy to handle as long as the pattern only ever spanned a few lines. Here's my take on the command (put it in a file called Search-File.ps1):
[CmdletBinding(DefaultParameterSetName="Path")]
param(
[Parameter(Mandatory=$true, Position=0)]
[ValidateNotNullOrEmpty()]
[string]
$Pattern,
[Parameter(Mandatory=$true, Position=1, ParameterSetName="Path",
ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true,
HelpMessage="Path to ...")]
[ValidateNotNullOrEmpty()]
[string[]]
$Path,
[Alias("PSPath")]
[Parameter(Mandatory=$true, Position=1, ParameterSetName="LiteralPath",
ValueFromPipelineByPropertyName=$true,
HelpMessage="Path to ...")]
[ValidateNotNullOrEmpty()]
[string[]]
$LiteralPath,
[Parameter()]
[ValidateRange(0, [int]::MaxValue)]
[int]
$Group = 0
)
Begin
{
Set-StrictMode -Version latest
$count = 0
$matched = @{}
$regex = New-Object System.Text.RegularExpressions.Regex $Pattern,'Compiled'
}
Process
{
if ($psCmdlet.ParameterSetName -eq "Path")
{
# In the -Path (non-literal) case we may need to resolve a wildcarded path
$resolvedPaths = @($Path | Resolve-Path | Convert-Path)
}
else
{
# Must be -LiteralPath
$resolvedPaths = @($LiteralPath | Convert-Path)
}
foreach ($rpath in $resolvedPaths)
{
Write-Verbose "Processing $rpath"
$stream = new-object System.IO.FileStream $rpath,'Open','Read','Read',4096
$reader = new-object System.IO.StreamReader $stream
try
{
while (($line = $reader.ReadLine())-ne $null)
{
$matchColl = $regex.Matches($line)
foreach ($match in $matchColl)
{
$count++
$key = $match.Groups[$Group].Value
if ($matched.ContainsKey($key))
{
$matched[$key]++
}
else
{
$matched[$key] = 1;
}
}
}
}
finally
{
$reader.Close()
}
}
}
End
{
new-object psobject -Property @{TotalCount = $count; Matched = $matched}
}
I ran this against my IIS log dir (8.5 GB and ~1000 files) to find all the IP addresses in all the logs e.g.:
$r = ls . -r *.log | C:\Users\hillr\Search-File.ps1 '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
This took 27 minutes on my system and found 54356330 matches:
$r.Matched.GetEnumerator() | sort Value -Descending | select -f 20
Name Value
---- -----
xxx.140.113.47 22459654
xxx.29.24.217 13430575
xxx.29.24.216 13321196
xxx.140.113.98 4701131
xxx.40.30.254 53724
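The answer above mentions that patterns spanning a few lines would be easy to handle. One possible way to sketch that, assuming a pattern never spans more than two lines, is to match against each line joined with the following one and keep only matches that begin on the first line of the pair, so nothing is counted twice. The function name and sample data are hypothetical, and the sketch works on an in-memory array for brevity; with a StreamReader the same idea would need the previous line held in a variable.

```powershell
function Find-AcrossLinePairs {   # hypothetical helper, not part of Search-File.ps1
    param(
        [string[]] $Lines,
        [regex]    $Regex
    )
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        $first  = $Lines[$i]
        $window = $first
        if ($i + 1 -lt $Lines.Count) {
            # Join the current line with the next one so a match may cross the line break.
            $window = $first + "`n" + $Lines[$i + 1]
        }
        foreach ($m in $Regex.Matches($window)) {
            # Keep only matches that start on the first line (or on the line break itself);
            # matches that start on the second line are picked up in the next window.
            if ($m.Index -le $first.Length) { $m.Value }
        }
    }
}

$found = @(Find-AcrossLinePairs -Lines 'a start', 'end b', 'start end c' -Regex 'start\s+end')
"Found $($found.Count) matches"
```

The trade-off is that every line is scanned twice, so this is only worth doing when the pattern really can cross a line break.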

Related

How do I compress files using powershell that are over 2 GB?

I am working on a project to compress files that range anywhere from a couple of MB to several GB in size, and I am trying to use PowerShell to compress them into a .zip. The main problem I am having is that Compress-Archive has a 2 GB cap on individual file size, and I was wondering if there is another method to compress files.
Edit:
So for this project we are looking to implement a system to take .pst files from outlook and compress them to a .zip and upload them to a server. Once they are uploaded they will be pulled down from a new device and extracted into a .pst file again.

NOTE
Further updates to this function will be published to the official GitHub repo as well as to the PowerShell Gallery. The code in this answer will no longer be maintained.
Contributions are more than welcome, if you wish to contribute, fork the repo and submit a pull request with the changes.
To explain the limitation named on PowerShell Docs for Compress-Archive:
The Compress-Archive cmdlet uses the Microsoft .NET API System.IO.Compression.ZipArchive to compress files. The maximum file size is 2 GB because there's a limitation of the underlying API.
This happens because this cmdlet uses a Memory Stream to hold the bytes in memory and then write them to the file. Inspecting the InnerException produced by the cmdlet we can see:
System.IO.IOException: Stream was too long.
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at CallSite.Target(Closure , CallSite , Object , Object , Int32 , Object )
We would also see a similar issue if we attempt to read all bytes from a file larger than 2 GB:
Exception calling "ReadAllBytes" with "1" argument(s): "The file is too long.
This operation is currently limited to supporting files less than 2 gigabytes in size."
Coincidentally, we see the same limitation with System.Array:
.NET Framework only: By default, the maximum size of an Array is 2 gigabytes (GB).
There is also another limitation, pointed out in this question: Compress-Archive can't compress a file if another process has a handle on it.
How to reproduce?
# cd to a temporary folder and
# start a Job which will write to a file
$job = Start-Job {
0..1000 | ForEach-Object {
"Iteration ${_}:" + ('A' * 1kb)
Start-Sleep -Milliseconds 200
} | Set-Content .\temp\test.txt
}
Start-Sleep -Seconds 1
# attempt to compress
Compress-Archive .\temp\test.txt -DestinationPath test.zip
# Exception:
# The process cannot access the file '..\test.txt' because it is being used by another process.
$job | Stop-Job -PassThru | Remove-Job
Remove-Item .\temp -Recurse
To overcome this issue, and also to emulate explorer's behavior when compressing files used by another process, the function posted below will default to [FileShare] 'ReadWrite, Delete' when opening a FileStream.
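To illustrate the FileShare point, here is a minimal sketch (temporary file name and contents are made up) in which one handle keeps a file open for writing while a second handle reads it. The second open succeeds because it requests a FileShare value that tolerates the existing writer.

```powershell
$path = Join-Path ([IO.Path]::GetTempPath()) ("share-demo-{0}.txt" -f [guid]::NewGuid())
$writer = $null
$reader = $null
try {
    # The writer keeps the file open and allows other handles to read it.
    $writer = [IO.File]::Open($path, [IO.FileMode]::Create, [IO.FileAccess]::Write, [IO.FileShare]::Read)
    $bytes = [Text.Encoding]::ASCII.GetBytes('hello')
    $writer.Write($bytes, 0, $bytes.Length)
    $writer.Flush()

    # The reader must accept that someone else is writing, hence 'ReadWrite, Delete'.
    $reader = [IO.File]::Open($path, [IO.FileMode]::Open, [IO.FileAccess]::Read, [IO.FileShare]'ReadWrite, Delete')
    $buffer = New-Object byte[] 16
    $count  = $reader.Read($buffer, 0, $buffer.Length)
    $text   = [Text.Encoding]::ASCII.GetString($buffer, 0, $count)
    "Read while the writer is still open: $text"
}
finally {
    if ($reader) { $reader.Dispose() }
    if ($writer) { $writer.Dispose() }
    Remove-Item -LiteralPath $path -ErrorAction SilentlyContinue
}
```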
To get around the 2 GB limitation, there are two workarounds:
The easy workaround is to use the ZipFile.CreateFromDirectory Method. There are 3 limitations while using this static method:
The source must be a directory, a single file cannot be compressed.
All files (recursively) on the source folder will be compressed, we can't pick / filter files to compress.
It's not possible to Update the entries of an existing Zip Archive.
Worth noting, if you need to use the ZipFile Class in Windows PowerShell (.NET Framework) there must be a reference to System.IO.Compression.FileSystem. See inline comments.
# Only needed if using Windows PowerShell (.NET Framework):
Add-Type -AssemblyName System.IO.Compression.FileSystem
[IO.Compression.ZipFile]::CreateFromDirectory($sourceDirectory, $destinationArchive)
The code-it-yourself workaround is to use a function that manually creates the ZipArchive and the corresponding ZipEntries.
This function should be able to handle compression same as ZipFile.CreateFromDirectory Method but also allow filtering files and folders to compress while keeping the file / folder structure untouched.
Documentation as well as usage example can be found here.
using namespace System.IO
using namespace System.IO.Compression
using namespace System.Collections.Generic
Add-Type -AssemblyName System.IO.Compression
function Compress-ZipArchive {
[CmdletBinding(DefaultParameterSetName = 'Path')]
[Alias('zip', 'ziparchive')]
param(
[Parameter(ParameterSetName = 'PathWithUpdate', Mandatory, Position = 0, ValueFromPipeline)]
[Parameter(ParameterSetName = 'PathWithForce', Mandatory, Position = 0, ValueFromPipeline)]
[Parameter(ParameterSetName = 'Path', Mandatory, Position = 0, ValueFromPipeline)]
[string[]] $Path,
[Parameter(ParameterSetName = 'LiteralPathWithUpdate', Mandatory, ValueFromPipelineByPropertyName)]
[Parameter(ParameterSetName = 'LiteralPathWithForce', Mandatory, ValueFromPipelineByPropertyName)]
[Parameter(ParameterSetName = 'LiteralPath', Mandatory, ValueFromPipelineByPropertyName)]
[Alias('PSPath')]
[string[]] $LiteralPath,
[Parameter(Position = 1, Mandatory)]
[string] $DestinationPath,
[Parameter()]
[CompressionLevel] $CompressionLevel = [CompressionLevel]::Optimal,
[Parameter(ParameterSetName = 'PathWithUpdate', Mandatory)]
[Parameter(ParameterSetName = 'LiteralPathWithUpdate', Mandatory)]
[switch] $Update,
[Parameter(ParameterSetName = 'PathWithForce', Mandatory)]
[Parameter(ParameterSetName = 'LiteralPathWithForce', Mandatory)]
[switch] $Force,
[Parameter()]
[switch] $PassThru
)
begin {
$DestinationPath = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($DestinationPath)
if([Path]::GetExtension($DestinationPath) -ne '.zip') {
$DestinationPath = $DestinationPath + '.zip'
}
if($Force.IsPresent) {
$fsMode = [FileMode]::Create
}
elseif($Update.IsPresent) {
$fsMode = [FileMode]::OpenOrCreate
}
else {
$fsMode = [FileMode]::CreateNew
}
$ExpectingInput = $null
}
process {
$isLiteral = $false
$targetPath = $Path
if($PSBoundParameters.ContainsKey('LiteralPath')) {
$isLiteral = $true
$targetPath = $LiteralPath
}
if(-not $ExpectingInput) {
try {
$destfs = [File]::Open($DestinationPath, $fsMode)
$zip = [ZipArchive]::new($destfs, [ZipArchiveMode]::Update)
$ExpectingInput = $true
}
catch {
$zip, $destfs | ForEach-Object Dispose
$PSCmdlet.ThrowTerminatingError($_)
}
}
$queue = [Queue[FileSystemInfo]]::new()
foreach($item in $ExecutionContext.InvokeProvider.Item.Get($targetPath, $true, $isLiteral)) {
$queue.Enqueue($item)
$here = $item.Parent.FullName
if($item -is [FileInfo]) {
$here = $item.Directory.FullName
}
while($queue.Count) {
try {
$current = $queue.Dequeue()
if($current -is [DirectoryInfo]) {
$current = $current.EnumerateFileSystemInfos()
}
}
catch {
$PSCmdlet.WriteError($_)
continue
}
foreach($item in $current) {
try {
if($item.FullName -eq $DestinationPath) {
continue
}
$relative = $item.FullName.Substring($here.Length + 1)
$entry = $zip.GetEntry($relative)
if($item -is [DirectoryInfo]) {
$queue.Enqueue($item)
if(-not $entry) {
$entry = $zip.CreateEntry($relative + '\', $CompressionLevel)
}
continue
}
if(-not $entry) {
$entry = $zip.CreateEntry($relative, $CompressionLevel)
}
$sourcefs = $item.Open([FileMode]::Open, [FileAccess]::Read, [FileShare] 'ReadWrite, Delete')
$entryfs = $entry.Open()
$sourcefs.CopyTo($entryfs)
}
catch {
$PSCmdlet.WriteError($_)
}
finally {
$entryfs, $sourcefs | ForEach-Object Dispose
}
}
}
}
}
end {
$zip, $destfs | ForEach-Object Dispose
if($PassThru.IsPresent) {
$DestinationPath -as [FileInfo]
}
}
}
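For the single-file case, the same streaming idea can be shown in a much smaller sketch. The function name is hypothetical, and unlike the function above it opens the archive in Create mode and does no path resolution, so full paths should be passed (.NET resolves relative paths against the process working directory, not the PowerShell location).

```powershell
Add-Type -AssemblyName System.IO.Compression

function Compress-SingleFile {   # hypothetical helper, not part of the function above
    param(
        [Parameter(Mandatory)] [string] $SourceFile,
        [Parameter(Mandatory)] [string] $DestinationZip
    )
    $destfs = $zip = $sourcefs = $entryfs = $null
    try {
        # CreateNew fails if the archive already exists, so nothing is overwritten by accident.
        $destfs = [IO.File]::Open($DestinationZip, [IO.FileMode]::CreateNew)
        $zip    = [IO.Compression.ZipArchive]::new($destfs, [IO.Compression.ZipArchiveMode]::Create)
        $entry  = $zip.CreateEntry([IO.Path]::GetFileName($SourceFile))

        # Stream the source into the entry instead of loading it into memory.
        $sourcefs = [IO.File]::Open($SourceFile, [IO.FileMode]::Open, [IO.FileAccess]::Read, [IO.FileShare]'ReadWrite, Delete')
        $entryfs  = $entry.Open()
        $sourcefs.CopyTo($entryfs)
    }
    finally {
        # Dispose in reverse order of creation; the archive writes its directory on Dispose.
        foreach ($d in $entryfs, $sourcefs, $zip, $destfs) {
            if ($d) { $d.Dispose() }
        }
    }
}
```

Usage would look like `Compress-SingleFile -SourceFile C:\data\mail.pst -DestinationZip C:\data\mail.zip`, where both paths are placeholders.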

Using Powershell to output characters (not lines) after a match in a large file

I use powershell to parse huge files and easily take a look at a small part of the file where a certain string occurs.. like this:
Select-String P120300420059211107104259.txt -Pattern "<ID>9671510841" -Context 0,300
This gives me 300 lines of the file after the occurrence of that ID number.
But I've come across a file that has no carriage returns. Now I would like to do the same thing, but instead of lines being returned, I guess I need characters.
How would I do this?
I have never created scripts in powershell - just ran simple commands like the above.
I would like to see maybe 1000 characters after the matched string, within a huge file.
Thanks!

The problem with using Select-String or [Regex]::Matches() (or -match) to test for the presence of a substring in a single-line file is that you first need to read the whole file into memory at once.
The good news is that you don't need regular expressions to find a substring in a huge single-line text file - instead, you can read the file contents into memory in smaller chunks and then search through those - this way you don't need to store the entire file in memory at once.
Reading buffered text from a file is fairly straightforward:
Open a readable file stream
Create a StreamReader to read from the file stream
Start reading!
Then you just need to check whether:
The target substring is found in each chunk, or
The start of the target substring is partially found at the tail end of the current chunk
And then repeat until you find the substring, at which point you read the following 1000 characters.
Here's an example of how you could implement it as script function (I've tried to explain the code in more detail in inline comments):
function Find-SubstringWithPostContext {
[CmdletBinding(DefaultParameterSetName = 'wp')]
param(
[Alias('PSPath')]
[Parameter(Mandatory = $true, ParameterSetName = 'lp', ValueFromPipelineByPropertyName = $true, ValueFromPipeline = $true)]
[string[]]$LiteralPath,
[Parameter(Mandatory = $true, ParameterSetName = 'wp', Position = 0)]
[string[]]$Path,
[Parameter(Mandatory = $true)]
[ValidateLength(1, 5000)]
[string]$Substring,
[ValidateRange(2, 25000)]
[int]$PostContext = 1000,
[switch]$All,
[System.Text.Encoding]
$Encoding
)
begin {
# start by ensuring we'll be using a buffer that's at least 4 times larger than the
# target substring to avoid too many tail searches
$bufferSize = 2000
while ($Substring.Length -gt $bufferSize / 4) {
$bufferSize *= 2
}
$buffer = [char[]]::new($bufferSize)
}
process {
if ($PSCmdlet.ParameterSetName -eq 'wp') {
# resolve input paths if necessary
$LiteralPath = $Path | Convert-Path
}
:fileLoop
foreach ($lp in $LiteralPath) {
$file = Get-Item -LiteralPath $lp
# skip directories
if ($file -isnot [System.IO.FileInfo]) { continue }
try {
$fileStream = $file.OpenRead()
$scanner = [System.IO.StreamReader]::new($fileStream, $true)
do {
# remember the current offset in the file, we'll need this later
$baseOffset = $fileStream.Position
# read a chunk from the file, convert to string
$readCount = $scanner.ReadBlock($buffer, 0, $bufferSize)
$string = [string]::new($buffer, 0, $readCount)
$eof = $readCount -lt $bufferSize
# test if target substring is found in the chunk we just read
$indexOfTarget = $string.IndexOf($Substring)
if ($indexOfTarget -ge 0) {
Write-Verbose "Substring found in chunk at local index ${indexOfTarget}"
# we found a match, ensure we've read enough post-context ahead of the given index
$tail = ''
if ($string.Length - $indexOfTarget -lt $PostContext -and $readCount -eq $bufferSize) {
# just like above, we read another chunk from the file and convert it to a proper string
$tailBuffer = [char[]]::new($PostContext - ($string.Length - $indexOfTarget))
$tailCount = $scanner.ReadBlock($tailBuffer, 0, $tailBuffer.Length)
$tail = [string]::new($tailBuffer, 0, $tailCount)
}
# construct and output the full post-context
$substringWithPostContext = $string.Substring($indexOfTarget) + $tail
if($substringWithPostContext.Length -gt $PostContext){
$substringWithPostContext = $substringWithPostContext.Remove($PostContext)
}
Write-Verbose "Writing output object ..."
Write-Output $([PSCustomObject]@{
FilePath = $file.FullName
Offset = $baseOffset + $indexOfTarget
Value = $substringWithPostContext
})
if (-not $All) {
# no need to search this file any further unless `-All` was specified
continue fileLoop
}
else {
# rewind to position after this match before next iteration
$rewindOffset = $indexOfTarget - $readCount
$null = $scanner.BaseStream.Seek($rewindOffset, [System.IO.SeekOrigin]::Current)
}
}
else {
# target was not found, but we may have "clipped" it in half,
# so figure out if target string could start at the end of current string chunk
for ($i = [Math]::Max(0, $string.Length - $Substring.Length); $i -lt $string.Length; $i++) {
# if the first character of the target substring isn't found then
# we might as well skip it immediately
if ($string[$i] -ne $Substring[0]) { continue }
if ($Substring.StartsWith($string.Substring($i))) {
# rewind file stream to this position so it'll get re-tested on
# the next iteration, then break out of tail search
$rewindOffset = $i - $string.Length
$null = $scanner.BaseStream.Seek($rewindOffset, [System.IO.SeekOrigin]::Current)
break
}
}
}
} until ($eof)
}
finally {
# remember to clean up after searching each file
$scanner, $fileStream |Where-Object { $_ -is [System.IDisposable] } |ForEach-Object Dispose
}
}
}
}
Now you can extract exactly 1000 characters after a substring is found with minimal memory allocation:
Get-ChildItem P*.txt |Find-SubstringWithPostContext -Substring '<ID>9671510841'
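A simpler variant of the chunked search described above can be sketched without seeking backwards: keep the last few characters of each chunk and prepend them to the next one, so a substring clipped by a chunk boundary is still found. The function name and parameters are hypothetical, and the returned value is a character offset, not a byte offset.

```powershell
function Find-InChunks {   # hypothetical helper illustrating the carry-over idea
    param(
        [Parameter(Mandatory)] [System.IO.TextReader] $Reader,
        [Parameter(Mandatory)] [string] $Substring,
        [int] $ChunkSize = 4096
    )
    $buffer   = New-Object char[] $ChunkSize
    $carry    = ''
    $consumed = 0   # character offset at which the current search window starts

    while (($read = $Reader.ReadBlock($buffer, 0, $ChunkSize)) -gt 0) {
        # Search the carried-over tail of the previous chunk plus the new chunk.
        $window = $carry + [string]::new($buffer, 0, $read)
        $index  = $window.IndexOf($Substring, [StringComparison]::Ordinal)
        if ($index -ge 0) { return $consumed + $index }

        # Keep just enough characters to catch a substring split across chunks.
        $keep      = [Math]::Min($Substring.Length - 1, $window.Length)
        $consumed += $window.Length - $keep
        $carry     = $window.Substring($window.Length - $keep)
    }
    return -1
}

$reader = [System.IO.StringReader]::new('aaaaXYZbbb')
$offset = Find-InChunks -Reader $reader -Substring 'XYZ' -ChunkSize 5
"Found at character offset $offset"
```

For a real file, a `[System.IO.StreamReader]` can be passed as the reader; once the offset is known, the post-context can be read from the same reader.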

I haven't tested this enough to tell you if it works properly but it definitely was something fun to code. -Context here will give you the context based on characters before and after instead of lines. You can give it a try and let me know if it worked :)
Usage:
Get-ChildItem *.txt | Find-String -Pattern 'mypattern'
Get-ChildItem *.txt | Find-String -Pattern 'mypattern' -Context 20, 20
Get-ChildItem *.txt | Find-String -Pattern 'mypattern' -AllMatches
using namespace System.Text.RegularExpressions
using namespace System.IO
function Find-String {
param(
[parameter(ValueFromPipeline, Mandatory)]
[Alias('PSPath')]
[FileInfo]$Path,
[parameter(Mandatory, Position = 0)]
[string]$Pattern,
[RegexOptions]$Options = 'IgnoreCase',
[switch]$AllMatches,
[int[]]$Context
)
process
{
$re = [regex]::new($Pattern, $Options)
$content = [File]::ReadAllText($Path)
$match = if($AllMatches.IsPresent)
{
$re.Matches($content)
}
else
{
$re.Match($content)
}
if($match.Success -notcontains $true) { return }
foreach($m in $match)
{
$out = [ordered]@{
Path = $path.FullName
Value = $m.Value
Index = $m.Index
Length = $m.Length
}
if($PSBoundParameters.ContainsKey('Context'))
{
$before = $m.Index
$after = $m.Index + $m.Length
$contextBefore = $Context[0]
$contextAfter = $Context[1]
while($contextBefore-- -and $before)
{
$before--
}
while($contextAfter-- -and $after -lt $content.Length)
{
$after++
}
$out.Context = (-join $content[$before..$after]).Trim()
}
[pscustomobject]$out
}
}
}

How to loop through arrays in hash table - passing parameters based on values read from a CSV file

Curious about how to loop through a hash table where each value is an array. Example:
$test = @{
a = "a","1";
b = "b","2";
c = "c","3";
}
Then I would like to do something like:
foreach ($T in $test) {
write-output $T
}
Expected result would be something like:
name value
a a
b b
c c
a 1
b 2
c 3
That's not what currently happens and my use case is to basically pass a hash of parameters to a function in a loop. My approach might be all wrong, but figured I would ask and see if anyone's tried to do this?
Edit:
A bit more clarification. What I'm basically trying to do is pass a lot of array values into a function and loop through those in the hash table prior to passing to a nested function. Example:
First something like:
$parameters = import-csv .\NewComputers.csv
Then something like
$parameters | New-LabVM
Lab VM Code below:
function New-LabVM
{
[CmdletBinding()]
Param (
# Param1 help description
[Parameter(Mandatory=$true,
Position=0,
ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)]
[Alias("p1")]
[string[]]$ServerName,
# Param2 help description
[Parameter(Position = 1)]
[int[]]$RAM = 2GB,
# Param3 help description
[Parameter(Position=2)]
[int[]]$ServerHardDriveSize = 40gb,
# Parameter help description
[Parameter(Position=3)]
[int[]]$VMRootPath = "D:\VirtualMachines",
[Parameter(Position=4)]
[int[]]$NetworkSwitch = "VM Switch 1",
[Parameter(Position=4)]
[int[]]$ISO = "D:\ISO\Win2k12.ISO"
)
process
{
New-Item -Path $VMRootPath\$ServerName -ItemType Directory
$Arguments = @{
Name = $ServerName;
MemoryStartupBytes = $RAM;
NewVHDPath = "$VMRootPath\$ServerName\$ServerName.vhdx";
NewVHDSizeBytes = $ServerHardDriveSize
SwitchName = $NetworkSwitch;}
foreach ($Argument in $Arguments){
# Create Virtual Machines
New-VM @Arguments
# Configure Virtual Machines
Set-VMDvdDrive -VMName $ServerName -Path $ISO
Start-VM $ServerName
}
# Create Virtual Machines
New-VM @Arguments
}
}

What you're looking for is parameter splatting.
The most robust way to do that is via hashtables, so you must convert the custom-object instances output by Import-Csv to hashtables:
Import-Csv .\NewComputers.csv | ForEach-Object {
# Convert the custom object at hand to a hashtable.
$htParams = @{}
$_.psobject.properties | ForEach-Object { $htParams[$_.Name] = $_.Value }
# Pass the hashtable via splatting (@) to the target function.
New-LabVM @htParams
}
Note that since parameter binding via splatting is key-based (the hashtable keys are matched against the parameter names), it is fine to use a regular hashtable with its unpredictable key ordering (no need for an ordered hashtable ([ordered] @{ ... }) in this case).
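A small self-contained sketch of the key-based binding follows. Show-Vm is a made-up stand-in for New-LabVM, and the row object imitates what Import-Csv produces (all values are strings). The sketch declares RAM as [long] because a value such as 4 GB does not fit in a 32-bit [int].

```powershell
function Show-Vm {   # hypothetical stand-in for the real target function
    param(
        [string] $ServerName,
        [long]   $RAM = 2GB
    )
    '{0}:{1}' -f $ServerName, $RAM
}

# Imitates one row from Import-Csv; note the property order differs from the parameter order.
$row = [pscustomobject]@{ RAM = '4294967296'; ServerName = 'LAB01' }

# Convert the row to a hashtable, then splat it.
$params = @{}
foreach ($p in $row.psobject.Properties) { $params[$p.Name] = $p.Value }

$result = Show-Vm @params
$result
```

Each key is bound to the parameter of the same name, and the string value is converted to the declared parameter type during binding.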

Try this:
for($i=0;$i -lt $test.Count; $i++)
{$test.keys | %{write-host $test.$_[$i]}}
Weirdly, it outputs everything in the wrong order, because a regular hashtable does not guarantee the order in which $test.keys enumerates its keys.
EDIT: Here's your solution.
Using the [System.Collections.Specialized.OrderedDictionary] type, you guarantee that the output will come out the same order as you entered it.
$test = [ordered] @{
a = "a","1";
b = "b","2";
c = "c","3";
}
After running the same solution code as before, you get exactly the output you wanted.
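As a variation, the loop can be bounded by the length of the value arrays rather than by the number of keys, and can emit objects instead of calling Write-Host. This is a sketch that assumes every value is an array; shorter arrays would simply yield empty values.

```powershell
$test = [ordered]@{
    a = 'a', '1'
    b = 'b', '2'
    c = 'c', '3'
}

# Length of the longest value array decides how many passes are needed.
$width = ($test.Values | ForEach-Object { $_.Count } | Measure-Object -Maximum).Maximum

$rows = for ($i = 0; $i -lt $width; $i++) {
    foreach ($key in $test.Keys) {
        # One object per key and position: first every key's element 0, then element 1, ...
        [pscustomobject]@{ Name = $key; Value = $test[$key][$i] }
    }
}

$rows | Format-Table -AutoSize
```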

how to write streaming function in powershell

I tried to create a function that emulates Linux's head:
Function head( )
{
[CmdletBinding()]
param (
[parameter(mandatory=$false, ValueFromPipeline=$true)] [Object[]] $inputs,
[parameter(position=0, mandatory=$false)] [String] $liness = "10",
[parameter(position=1, ValueFromRemainingArguments=$true)] [String[]] $filess
)
$lines = 0
if (![int]::TryParse($liness, [ref]$lines)) {
$lines = 10
$filess = ,$liness + (@{$true=@();$false=$filess}[$null -eq $filess])
}
$read = 0
$input | select-object -First $lines
if ($filess) {
get-content -TotalCount $lines $filess
}
}
The problem is that this will actually read all the content (whether by reading $filess or from $input) and then print the first, where I'd want head to read the first lines and forget about the rest so it can work with large files.
How can this function be rewritten?

Well, as far as I know, you are overdoing it slightly...
"Beginning in Windows PowerShell 3.0, Select-Object includes an optimization feature that prevents commands from creating and processing objects that are not used. When you include a Select-Object command with the First or Index parameter in a command pipeline, Windows PowerShell stops the command that generates the objects as soon as the selected number of objects is generated, even when the command that generates the objects appears before the Select-Object command in the pipeline. To turn off this optimizing behavior, use the Wait parameter."
So all you need to do is:
Get-Content -Path somefile | Select-Object -First 10 #or pass a variable
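The early-stop behavior quoted above can be observed with a small sketch: a counter records how many objects the upstream command actually produced before Select-Object stopped the pipeline. The variable names are made up.

```powershell
$seen = 0

# The upstream stage could emit 100000 numbers, but Select-Object stops it early.
$first = 1..100000 |
    ForEach-Object { $seen++; $_ } |
    Select-Object -First 3

"Kept $($first.Count) items; upstream produced $seen"
```

On PowerShell 3.0 or later, the counter should stay far below 100000, which is what lets a head-like function built on Select-Object -First work on large inputs.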

Simulating `ls` in Powershell

I'm trying to get something that looks like UNIX ls output in PowerShell. This is getting there:
Get-ChildItem | Format-Wide -AutoSize -Property Name
but it's still outputting the items in row-major instead of column-major order:
PS C:\Users\Mark Reed> Get-ChildItem | Format-Wide -AutoSize -Property Name
Contacts Desktop Documents Downloads Favorites
Links Music Pictures Saved Games
Searches Videos
Desired output:
PS C:\Users\Mark Reed> My-List-Files
Contacts Downloads Music Searches
Desktop Favorites Pictures Videos
Documents Links Saved Games
The difference is in the sorting: 1 2 3 4 5/6 7 8 9 reading across the lines, vs 1/2/3 4/5/6 7/8/9 reading down the columns.
I already have a script that will take an array and print it out in column-major order using Write-Host, though I found a lot of PowerShellish idiomatic improvements to it by reading Keith's and Roman's takes. But my impression from reading around is that's the wrong way to go about this. Instead of calling Write-Host, a script should output objects, and let the formatters and outputters take care of getting the right stuff written to the user's console.
When a script uses Write-Host, its output is not capturable; if I assign the result to a variable, I get a null variable and the output is written to the screen anyway. It's like a command in the middle of a UNIX pipeline writing directly to /dev/tty instead of standard output or even standard error.
Admittedly, I may not be able to do much with the array of Microsoft.PowerShell.Commands.Internal.Format.* objects I get back from e.g. Format-Wide, but at least it contains the output, which doesn't show up on my screen in rogue fashion, and which I can recreate at any time by passing the array to another formatter or outputter.

This is a simple-ish function that formats column major. You can do this all in PowerShell Script:
function Format-WideColMajor {
[CmdletBinding()]
param(
[Parameter(ValueFromPipeline)]
[AllowNull()]
[AllowEmptyString()]
[PSObject]
$InputObject,
[Parameter()]
$Property
)
begin {
$list = new-object System.Collections.Generic.List[PSObject]
}
process {
$list.Add($InputObject)
}
end {
if ($Property) {
$output = $list | Foreach {"$($_.$Property)"}
}
else {
$output = $list | Foreach {"$_"}
}
$conWidth = $Host.UI.RawUI.BufferSize.Width - 1
$maxLen = ($output | Measure-Object -Property Length -Maximum).Maximum
$colWidth = $maxLen + 1
$numCols = [Math]::Floor($conWidth / $colWidth)
$numRows = [Math]::Ceiling($output.Count / $numCols)
for ($i=0; $i -lt $numRows; $i++) {
$line = ""
for ($j = 0; $j -lt $numCols; $j++) {
$item = $output[$i + ($j * $numRows)]
$line += "$item$(' ' * ($colWidth - $item.Length))"
}
$line
}
}
}
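The column-major index arithmetic can also be isolated in a small function that takes the width as a parameter instead of reading it from the host, which makes it easy to try out. The name is hypothetical, and the sketch guards against indexes past the end of the list and against a width narrower than one column.

```powershell
function ConvertTo-ColumnMajorLines {   # hypothetical helper
    param(
        [string[]] $Items,
        [int]      $Width = 80
    )
    if (-not $Items) { return }

    $colWidth = [int](($Items | Measure-Object -Property Length -Maximum).Maximum) + 1
    $numCols  = [Math]::Max(1, [int][Math]::Floor($Width / $colWidth))
    $numRows  = [int][Math]::Ceiling([double]$Items.Count / $numCols)

    for ($i = 0; $i -lt $numRows; $i++) {
        $cells = for ($j = 0; $j -lt $numCols; $j++) {
            # Row i, column j holds item i + j * numRows, so reading down a column is sequential.
            $index = $i + $j * $numRows
            if ($index -lt $Items.Count) { $Items[$index].PadRight($colWidth) }
        }
        (-join $cells).TrimEnd()
    }
}

$lines = @(ConvertTo-ColumnMajorLines -Items 'a','b','c','d','e','f','g','h' -Width 6)
$lines
```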
