While loop does not produce pipeline output - powershell

It appears that a While loop does not produce an output that can continue in the pipeline. I need to process a large (many GiB) file. In this trivial example, I want to extract the second field, sort on it, then get only the unique values. What am I not understanding about the While loop and pushing things through the pipeline?
In the *NIX world this would be a simple:
cut -d "," -f 2 rf.txt | sort | uniq
In PowerShell it's not quite as simple.
The source data.
PS C:\src\powershell> Get-Content .\rf.txt
these,1,there
lines,3,paragraphs
are,2,were
The script.
PS C:\src\powershell> Get-Content .\rf.ps1
$sr = New-Object System.IO.StreamReader("$(Get-Location)\rf.txt")
while ($line = $sr.ReadLine()) {
Write-Verbose $line
$v = $line.split(',')[1]
Write-Output $v
} | sort
$sr.Close()
The output.
PS C:\src\powershell> .\rf.ps1
At C:\src\powershell\rf.ps1:7 char:3
+ } | sort
+ ~
An empty pipe element is not allowed.
+ CategoryInfo : ParserError: (:) [], ParseException
+ FullyQualifiedErrorId : EmptyPipeElement

You're making it a bit more complicated than it needs to be. You have a CSV without headers. The following should work:
Import-Csv .\rf.txt -Header f1,f2,f3 | Select-Object -ExpandProperty f2 -Unique | Sort-Object

Nasir's workaround looks like the way to go here.
If you want to know what was going wrong in your code: while loops (and do/while/until loops) don't consistently return values to the pipeline the way other statements in PowerShell do (that part is true, and I'll keep the examples of it below, but scroll down for the real reason your code wasn't working).
ForEach-Object -- a cmdlet, not a built-in language feature/statement; does return objects to the pipeline.
1..3 | % { $_ }
foreach -- statement; does return.
foreach ($i in 1..3) { $i }
if/else -- statement; does return.
if ($true) { 1..3 }
for -- statement; does return.
for ( $i = 0 ; $i -le 3 ; $i++ ) { $i }
switch -- statement; does return.
switch (2)
{
1 { 'one' }
2 { 'two' }
3 { 'three' }
}
But for some reason, these other loops seem to act unpredictably.
Loops forever, repeatedly returning $i (always 0; nothing increments it):
$i = 0; while ($i -le 3) { $i }
Returns nothing, but $i does get incremented:
$i = 0; while ($i -le 3) { $i++ }
If you wrap the inner expression in parentheses, its value does get returned:
$i = 0; while ($i -le 3) { ($i++) }
But as it turns out (I'm learning a bit as I go here), while's strange return semantics have nothing to do with your error; you just can't pipe statements into functions/cmdlets, regardless of their return value.
foreach ($i in 1..3) { $i } | measure
will give you the same error.
You can "get around" this by making the entire statement a sub-expression with $():
$( foreach ($i in 1..3) { $i } ) | measure
That would work for you in this case. Or, in your while loop, instead of using Write-Output you could just add each item to an array and sort it afterwards:
$arr = @()
while ($line = $sr.ReadLine()) {
Write-Verbose $line
$v = $line.split(',')[1]
$arr += $v
}
$arr | sort
I know you're dealing with a large file here, so maybe you're thinking that by piping to sort line by line you'll avoid a large memory footprint. In many cases piping does work that way in PowerShell, but the thing about sorting is that you need the whole set to sort it, so the Sort-Object cmdlet will be "collecting" each item you pass to it anyway and only do the actual sorting at the end; I'm not sure you can avoid that at all. Admittedly, letting Sort-Object do the collecting instead of building the array yourself might be more efficient depending on how it's implemented, but I don't think you'll save much RAM.
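Applied to the original script, the sub-expression approach would look something like this (a sketch; note that $(...) collects all the loop's output in memory before piping it on, and Sort-Object -Unique also covers the uniq step):

```powershell
$sr = New-Object System.IO.StreamReader("$(Get-Location)\rf.txt")
$( while ($line = $sr.ReadLine()) {
       # emit only the second comma-separated field
       $line.Split(',')[1]
   } ) | Sort-Object -Unique
$sr.Close()
```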

other solution
Get-Content -Path C:\temp\rf.txt | select @{Name="Mycolumn";Expression={($_ -split "," )[1]}} | select Mycolumn -Unique | sort

Related

Outputs and an easy algorithm question (object-oriented)

$a = dir
foreach ($file in $a)
{
if (($file.index%2 ) -eq 0)
# Hopefully this works; it's supposed to
# (ideally) print every other file
{
Write-Host $file.name
}
}
The -eq 0 test... I'm not sure that prints every other file. I don't know exactly how the files are numbered, or how to reference a file by number. Do you treat every file as an object and number them? Then write a function based on the numbers assigned to the files?
I'm fairly new to this; I'm used to HTML and CSS.
If you have a more proficient answer, I'm open to the idea too.
Your script almost works.
Removed alias for dir, and sorted results as requested.
The -File switch for Get-ChildItem excludes folders. I guess that's what you want, but remove it otherwise.
Since there's no easy way to get the current position in foreach, I used a for loop instead, but it's the same idea. If you want to try it with foreach, you could set a boolean variable to $true and then negate it (!) each iteration.
$Path = 'C:\yourpath'
$Files = Get-ChildItem -Path $Path -File |
Sort-Object -Property 'Name' -Descending
for ($i = 0; $i -lt $Files.Count; $i++) {
if ($i % 2 -eq 0) {
Write-Host $Files[$i].Name
}
}
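The foreach-with-toggle variant mentioned above could be sketched like this (same output; $print is a hypothetical variable name):

```powershell
$print = $true
foreach ($file in Get-ChildItem -Path $Path -File | Sort-Object -Property 'Name' -Descending) {
    if ($print) {
        Write-Host $file.Name
    }
    $print = !$print   # flip the toggle each iteration
}
```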
If you're using this output further, it's highly recommended to write results to an object rather than the console window.
Why not simply use a for loop and increment the index counter with a value of 2?
for ($i = 0; $i -lt $a.Count; $i += 2) {
Write-Host $a[$i].Name
}

Why my Out-File does not write output into the file

$ready = Read-Host "How many you want?: "
$i = 0
do{
(-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -match $ready) Out-File C:/numbers.csv -Append
If I give the script a value of 10, it generates 10 random strings and displays them in the shell. It even creates a new file called numbers.csv. However, it does not add the generated output to the file. Why is that?
Your Out-File C:/numbers.csv -Append call is a completely separate statement from your do loop, and an Out-File call without any input simply creates an empty file.[1]
You need to chain (connect) commands with | in order to make them run in a pipeline.
However, with a statement such as a do { ... } until loop, this won't work as-is, but you can convert such a statement into a command usable as part of a pipeline by enclosing it in a script block ({ ... }) and invoking it with &, the call operator (to run in a child scope), or ., the dot-sourcing operator (to run directly in the caller's scope):
[int] $ready = Read-Host "How many you want?"
$i = 0
& {
do{
-join (1..12 | foreach {
(65..90 + 97..122 + '.' | % { [char] $_ }) +(0..9) + '.' | Get-Random
})
$i++
} until ($i -eq $ready)
} | Out-File C:/numbers.csv -Append
Note the [int] type constraint to convert the Read-Host output, which is always a string, to a number, and the use of the -eq operator rather than the text- and regex-based -match operator in the until condition; also, unnecessary grouping with (...) has been removed.
Note: An alternative to the use of a script block with either the & or . operator is to use $(...), the subexpression operator, as shown in MikeM's helpful answer. The difference between the two approaches is that the former streams its output to the pipeline - i.e., outputs objects one by one - whereas $(...) invariably collects all output in memory, up front.
For smallish input sets this won't make much of a difference, but the in-memory collection that $(...) performs can become problematic with large input sets, so the & { ... } / . { ... } approach is generally preferable.
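The difference is easy to observe: with & { ... } each object reaches the downstream cmdlet as soon as it is produced, while $( ... ) finishes the whole loop first (a sketch; the sleeps just make the timing visible):

```powershell
# Streams: "got 1" prints before the later iterations finish
& { foreach ($i in 1..3) { $i; Start-Sleep -Milliseconds 500 } } |
    ForEach-Object { "got $_" }

# Collects: nothing prints until the loop has finished entirely
$( foreach ($i in 1..3) { $i; Start-Sleep -Milliseconds 500 } ) |
    ForEach-Object { "got $_" }
```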
Arno van Boven's answer shows a simpler alternative to your do ... until loop based on a for loop.
Combining a foreach loop with .., the range operator, is even more concise and expressive (the cost of constructing the range array is usually negligible, and overall execution is still noticeably faster):
[int] $ready = Read-Host "How many you want?"
& {
foreach ($i in 1..$ready) {
-join (1..12 | foreach {
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
})
}
} | Out-File C:/numbers.csv -Append
The above also shows a simplification of the original command via a [char[]] cast that directly converts an array of code points to an array of characters.
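The cast in isolation (code points 65..67 are 'A'..'C'):

```powershell
[char[]] (65..67)          # -> A B C
-join [char[]] (72, 105)   # -> Hi
```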
In PowerShell [Core] 7+, you could further simplify by taking advantage of Get-Random's -Count parameter:
[int] $ready = Read-Host "How many you want?"
& {
foreach ($i in 1..$ready) {
-join (
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random -Count 12
)
}
} | Out-File C:/numbers.csv -Append
And, finally, you could have avoided a looping statement altogether and used the ForEach-Object cmdlet instead (whose built-in alias, perhaps confusingly, is also foreach, but there's also %), as you're already doing inside your loop (1..12 | foreach ...):
[int] $ready = Read-Host "How many you want?"
1..$ready | ForEach-Object {
-join (1..12 | ForEach-Object {
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
})
} | Out-File C:/numbers.csv -Append
[1] In Windows PowerShell, Out-File uses UTF-16LE ("Unicode") encoding by default, so even a conceptually empty file still contains 2 bytes, namely the UTF-16LE BOM. In PowerShell [Core] v6+, BOM-less UTF-8 is the default across all cmdlets, so there you'll truly get an empty (0 bytes) file.
Another way is to wrap the loop in a sub-expression and pipe it:
$ready = Read-Host "How many you want?: "
$i = 0
$(do{
(-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -match $ready)) | Out-File C:/numbers.csv -Append
I personally avoid Do loops when I can, because I find them hard to read. Combining the two previous answers, I'd write it like this, because I find it easier to tell what is going on. Using a for loop instead, every line becomes its own self-contained piece of logic.
[int]$amount = Read-Host "How many you want?: "
& {
for ($i = 0; $i -lt $amount; $i++) {
-join(1..12 | foreach {((65..90)+(97..122)+(".") | foreach {[char]$_})+(0..9)+(".") | Get-Random})
}
} | Out-File C:\numbers.csv -Append
(Please do not accept this as an answer, this is just showing another way of doing it)

PowerShell, more efficient way to find duplicate folders

I wrote a little function to scan each folder in $PSModulePath to see if duplicate folder names exist in the various paths in there (as I've found this problem happening quite often in my PowerShell environments!). I use simple logic and I was wondering if some PowerShell gurus maybe have more compact / faster / more efficient ways to achieve a sweep like this (I quite often find that those better at PowerShell seem to have 2-line solutions to something that takes me 15 lines! :-) )?
I'm just taking a path in $PSModulePath and creating an array of the subfolder names there, then looking at the subfolders of the other paths in $PSModulePath and comparing them one by one against the array that I made for the first path, and then repeating for the other paths.
function Find-ModuleDuplicates {
$hits = ""
$ModPaths = $env:PSModulePath -Split ";" -replace "\\+$", "" | sort
foreach ($i in $ModPaths) {
foreach ($j in $ModPaths) {
if ($j -notlike "*$i*") {
$arr_i = (gci $i -Dir).Name
$arr_j = (gci $j -Dir).Name
foreach ($x in $arr_j) {
if ($arr_i -contains $x) {
$hits += "Module '$x' in '$i' has a duplicate`n"
}
}
}
}
}
if ($hits -ne "") { echo "" ; echo $hits }
else { "`nNo duplicate Module folders were found`n" }
}
The following is a solution using Group-Object.
$env:PSModulePath.Split(";") | gci -Directory | group Name |
where Count -gt 1 | select Count,Name,@{ n = "ModulePath"; e = { $_.Group.Parent.FullName } }
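Expanded with full cmdlet and parameter names, the same pipeline reads:

```powershell
$env:PSModulePath -split ';' |
    Get-ChildItem -Directory |
    Group-Object -Property Name |
    Where-Object -Property Count -GT 1 |
    Select-Object Count, Name,
        @{ Name = 'ModulePath'; Expression = { $_.Group.Parent.FullName } }
```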

Increment variable in PowerShell from within if statement within a foreach loop. Seems broken

I have a super easy one here but I cannot figure it out and driving me insane.
I suspect it is some kind of scoping issue of the variable (global, etc.?)
Anyway, here is my code:
$i = 0
foreach ($line in Get-Content 'somefile.txt') {
if ($i = 1) {
echo "$i Line: $line"
}
$i++
# I even tried $i = $i + 1
}
The output always shows $i as 1. It doesn't seem to count.
How can I fix this?
$i = 1 assigns 1 to $i, so the result is always 1.
You need to use -eq: if ($i -eq 1) ...
See Get-Help about_comparison_operators for more information.
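A quick demonstration of the difference:

```powershell
$i = 5
if ($i = 1) { "assignment: `$i is now $i" }    # always runs; $i becomes 1
if ($i -eq 1) { "comparison: `$i equals 1" }   # runs only when the test is true
```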
Your question isn't clear, but it looks like you're trying to output each line of your text file with the line number prepended. If so, one way to do it is:
Get-Content 'somefile.txt' |
ForEach-Object {$i = 0}{
"$i Line: $_"
$i++
}
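The two script blocks above bind positionally to ForEach-Object's -Begin and -Process parameters; spelled out with named parameters, that's:

```powershell
Get-Content 'somefile.txt' |
    ForEach-Object -Begin { $i = 0 } -Process {
        "$i Line: $_"
        $i++
    }
```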

Fastest way to parse thousands of small files in PowerShell

I have over 16000 inventory log files ranging in size from 3-5 KB on a network share.
Sample file looks like this:
## System Info
SystemManufacturer:=:Dell Inc.
SystemModel:=:OptiPlex GX620
SystemType:=:X86-based PC
ChassisType:=:6 (Mini Tower)
## System Type
isLaptop=No
I need to put them into a DB, so I started parsing them and creating a custom object for each that I can later use to check duplicates, normalize etc...
An initial parse with a code snippet like the one below took about 7.5 minutes.
Foreach ($invlog in $invlogs) {
$content = gc $invlog.FullName -ReadCount 0
foreach ($line in $content) {
if ($line -match '^#|^\s*$') { continue }
$invitem,$value=$line -split ':=:'
[PSCustomObject]@{Name=$invitem;Value=$value}
}
}
I started optimizing it and after several trial and error ended up with this which takes 2mins and 4 secs:
Foreach ($invlog in $invlogs) {
foreach ($line in ([System.IO.File]::ReadLines("$($invlog.FullName)") -match '^\w') ) {
$invitem,$value=$line -split ':=:'
[PSCustomObject]@{Name=$invitem;Value=$value} #2.04mins
}
}
I also tried using a hash instead of PSCustomObject, but to my surprise it took much longer (5mins 26secs)
Foreach ($invlog in $invlogs) {
$hash=@{}
foreach ($line in ([System.IO.File]::ReadLines("$($invlog.FullName)") -match $propertyline) ) {
$invitem,$value=$line -split ':=:'
$hash[$invitem]=$value #5.26mins
}
}
What would be the fastest method to use here?
See if this is any faster:
Foreach ($invlog in $invlogs) {
@(gc $invlog.FullName -ReadCount 0) -notmatch '^#|^\s*$' |
foreach {
$invitem,$value=$_ -split ':=:'
[PSCustomObject]@{Name=$invitem;Value=$value}
}
}
The -match and -notmatch operators, when applied to an array, return all the elements that satisfy the match, so you can eliminate having to test every line individually for the lines to exclude.
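To see the array semantics in isolation:

```powershell
'data1', '# comment', '', 'data2' -notmatch '^#|^\s*$'
# -> data1, data2   (comment and blank lines are filtered out)
```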
Are you really wanting to create a PS Object for every line, or just one for every file?
If you want one object per file, see if this is any quicker:
The multi-line regex eliminates the line array, and a filter is used in place of the foreach to create the hash entries.
$regex = [regex]'(?ms)^(\w+):=:([^\r]+)'
filter make-hash { @{ $_.groups[1].value = $_.groups[2].value } }
Foreach ($invlog in $invlogs) {
$regex.matches([io.file]::ReadAllText($invlog.fullname)) | make-hash
}
The objective of switching to the multi-line regex and [io.file]::ReadAllText() is to simplify what PowerShell is doing with the file input internally. The result of [io.file]::ReadAllText() is a single string object, which is a much simpler type of object than the array of strings that [io.file]::ReadAllLines() produces, and requires less overhead to construct internally.
A filter is essentially just the Process block of a function: it runs once for every object that reaches it from the pipeline, so it emulates the action of ForEach-Object but actually runs slightly faster (I don't know the internals well enough to tell you exactly why).
Both of these changes require more coding and only yield a marginal increase in performance. In my testing, switching to the multi-line regex gained about .1 ms per file, and changing from ForEach-Object to the filter another .1 ms. You probably don't see these techniques used very often because of the low return compared to the additional coding work required, but they become significant when you start multiplying those fractions of a millisecond by 160K iterations.
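A minimal illustration of the filter/function equivalence described above (Double-It is a hypothetical name):

```powershell
filter Double-It { $_ * 2 }      # runs once per pipeline object

function Double-It2 {            # the equivalent explicit form
    process { $_ * 2 }
}

1..3 | Double-It                 # -> 2, 4, 6
```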
Try this:
Foreach ($invlog in $invlogs) {
$output = @{}
foreach ($line in ([IO.File]::ReadLines("$($invlog.FullName)") -ne '') ) {
if ($line.Contains(":=:")) {
$item, $value = $line.Split(":=:") -ne ''
$output[$item] = $value
}
}
New-Object PSObject -Property $output
}
As a general rule, Regex is sometimes cool but always slower.
Wouldn't you want an object per system, and not per key-value pair? :S
Like this. By replacing Get-Content with the .NET method you could probably save some time.
Get-ChildItem -Filter *.txt -Path <path to files> | ForEach-Object {
$ht = @{}
Get-Content $_ | Where-Object { $_ -match ':=:' } | ForEach-Object {
$ht[($_ -split ':=:')[0].Trim()] = ($_ -split ':=:')[1].Trim()
}
[pscustomobject]$ht
}
ChassisType SystemManufacturer SystemType SystemModel
----------- ------------------ ---------- -----------
6 (Mini Tower) Dell Inc. X86-based PC OptiPlex GX620