Why does my Out-File not write output into the file? - PowerShell

$ready = Read-Host "How many you want?: "
$i = 0
do{
(-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -match $ready) Out-File C:/numbers.csv -Append
If I give a value of 10 to the script, it will generate 10 random numbers and show them in the shell. It even generates a new file called numbers.csv. However, it does not add the generated output to the file. Why is that?

Your Out-File C:/numbers.csv -Append call is a completely separate statement from your do loop, and an Out-File call without any input simply creates an empty file.[1]
You need to chain (connect) commands with | in order to make them run in a pipeline.
However, with a statement such as a do { ... } until loop, this won't work as-is, but you can convert such a statement to a command that you can use as part of a pipeline by enclosing it in a script block ({ ... }) and invoking it with &, the call operator (to run in a child scope), or ., the dot-sourcing operator (to run directly in the caller's scope):
[int] $ready = Read-Host "How many you want?"
$i = 0
& {
    do {
        -join (1..12 | foreach {
            (65..90 + 97..122 + '.' | % { [char] $_ }) + (0..9) + '.' | Get-Random
        })
        $i++
    } until ($i -eq $ready)
} | Out-File C:/numbers.csv -Append
Note the [int] type constraint to convert the Read-Host output, which is always a string, to a number, and the use of the -eq operator rather than the text- and regex-based -match operator in the until condition; also, unnecessary grouping with (...) has been removed.
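To see why -match would be the wrong operator here: it is a regular-expression (substring) test against the stringified operands, so it can succeed at the wrong time. For instance:
10 -match 0   # $true - string "10" contains "0", so until ($i -match 0) would only stop at $i = 10
10 -eq 0      # $false - numeric comparison, as intended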
Note: An alternative to the use of a script block with either the & or . operator is to use $(...), the subexpression operator, as shown in MikeM's helpful answer. The difference between the two approaches is that the former streams its output to the pipeline - i.e., outputs objects one by one - whereas $(...) invariably collects all output in memory, up front.
For smallish input sets this won't make much of a difference, but the in-memory collection that $(...) performs can become problematic with large input sets, so the & { ... } / . { ... } approach is generally preferable.
Arno van Boven's answer shows a simpler alternative to your do ... until loop, based on a for loop.
Combining a foreach loop with .., the range operator, is even more concise and expressive (the cost of constructing the range array is usually negligible, and overall execution is still noticeably faster):
[int] $ready = Read-Host "How many you want?"
& {
    foreach ($i in 1..$ready) {
        -join (1..12 | foreach {
            ([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
        })
    }
} | Out-File C:/numbers.csv -Append
The above also shows a simplification of the original command via a [char[]] cast that directly converts an array of code points to an array of characters.
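For instance:
[char[]] (72, 105)        # -> 'H', 'i' (code points converted to characters)
-join [char[]] (72, 105)  # -> 'Hi'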
In PowerShell [Core] 7+, you could further simplify by taking advantage of Get-Random's -Count parameter:
[int] $ready = Read-Host "How many you want?"
& {
    foreach ($i in 1..$ready) {
        -join (
            ([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random -Count 12
        )
    }
} | Out-File C:/numbers.csv -Append
And, finally, you could have avoided a statement for looping altogether and used the ForEach-Object cmdlet instead (whose built-in alias, perhaps confusingly, is also foreach, but there's also %), as you're already doing inside your loop (1..12 | foreach ...):
[int] $ready = Read-Host "How many you want?"
1..$ready | ForEach-Object {
    -join (1..12 | ForEach-Object {
        ([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
    })
} | Out-File C:/numbers.csv -Append
[1] In Windows PowerShell, Out-File uses UTF-16LE ("Unicode") encoding by default, so even a conceptually empty file still contains 2 bytes, namely the UTF-16LE BOM. In PowerShell [Core] v6+, BOM-less UTF-8 is the default across all cmdlets, so there you'll truly get an empty (0 bytes) file.
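You can verify this yourself with a throwaway file (the path below is just an example):
Out-File C:\temp\empty.txt            # Out-File without any input
(Get-Item C:\temp\empty.txt).Length   # -> 2 in Windows PowerShell (just the BOM); 0 in PowerShell 7+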

Another way is to wrap the loop in a sub-expression and pipe it:
$ready = Read-Host "How many you want?: "
$i = 0
$(do {
    (-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
    $i++
} until ($i -match $ready)) | Out-File C:/numbers.csv -Append

I personally avoid Do loops when I can, because I find them hard to read. Combining the two previous answers, I'd write it like this, because I find it easier to tell what is going on. Using a for loop instead, every line becomes its own self-contained piece of logic.
[int]$amount = Read-Host "How many you want?: "
& {
    for ($i = 0; $i -lt $amount; $i++) {
        -join(1..12 | foreach {((65..90)+(97..122)+(".") | foreach {[char]$_})+(0..9)+(".") | Get-Random})
    }
} | Out-File C:\numbers.csv -Append
(Please do not accept this as an answer, this is just showing another way of doing it)

Join every other line in Powershell

I want to combine every other line from the input below. Here is the input.
ALPHA-FETOPROTEIN ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3
###-##-#### #######,#### In lab
ALPHA-FETOPROTEIN ROUTINE CH 0203 234 02/03/2023#11:05 LIVER
###-##-#### ########,######## In lab
ANION GAP STAT CH 0203 124 02/03/2023#11:06 DAY
###-##-#### ######,##### #### In lab
BASIC METABOLIC PANE ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3
###-##-#### #######,#### ###### In lab
This is the desired output
ALPHA-FETOPROTEIN ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3 ###-##-#### #######,#### In lab
ALPHA-FETOPROTEIN ROUTINE CH 0203 234 02/03/2023#11:05 LIVER ###-##-#### ########,######## In lab
ANION GAP STAT CH 0203 124 02/03/2023#11:06 DAY ###-##-#### ######,##### #### In lab
BASIC METABOLIC PANE ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3 ###-##-#### #######,#### ###### In lab
The code that I have tried is
for($i = 0; $i -lt $splitLines.Count; $i += 2){
$splitLines[$i,($i+1)] -join ' '
}
It came from Joining every two lines in Powershell output, but I can't seem to get it to work for me. I'm not well versed with PowerShell, but I'm at the mercy of what's available at work.
Edit: Here is the entire code that I am using as requested.
# SET VARIABLES
$inputfile = "C:\Users\Will\Desktop\testfile.txt"
$outputfile = "C:\Users\Will\Desktop\testfileformatted.txt"
$new_output = "C:\Users\Will\Desktop\new_formatted.txt"
# REMOVE EXTRA CHARACTERS
$remove_beginning_capture = "-------------------------------------------------------------------------------"
$remove_end_capture = "==============================================================================="
$remove_line = "------"
$remove_strings_with_spaces = " \d"
Get-Content $inputfile | Where-Object {$_ -notmatch $remove_beginning_capture} | Where-Object {$_ -notmatch $remove_end_capture} | Where-Object {$_ -notmatch $remove_line} | Where-Object {$_ -notmatch $remove_strings_with_spaces} | ? {$_.trim() -ne "" } | Set-Content $outputfile
# Measures line length for loop
$file_lines = gc $outputfile | Measure-Object
#Remove Whitespace
# $whitespace_removed = (Get-Content $outputfile -Raw) -replace '\s+', ' '| Set-Content -Path C:\Users\Will\Desktop\new_formatted.csv
# Combine every other line
$lines = Get-Content $outputfile -Raw
$newcontent = $lines.Replace("`n","")
Write-Host "Content: $newcontent"
$newcontent | Set-Content $new_output
for($i = 0; $i -lt $splitLines.Count; $i += 2){
$splitLines[$i,($i+1)] -join ' '
}
Just read two lines and then print one
$inputFilename = "c:\temp\test.txt"
$outputFilename = "c:\temp\test1.txt"
$reader = [System.IO.StreamReader]::new($inputFilename)
$writer = [System.IO.StreamWriter]::new($outputFilename)
while (($line = $reader.ReadLine()) -ne $null)
{
    $secondLine = ""
    if (!$reader.EndOfStream) { $secondLine = $reader.ReadLine() }
    $writer.WriteLine($line + $secondLine)
}
$reader.Close()
$writer.Flush()
$writer.Close()
PowerShell-idiomatic solutions:
Use Get-Content with -ReadCount 2 in order to read the lines from your file in pairs, which allows you to process each pair in a ForEach-Object call, where the constituent lines can be joined to form a single output line.
Get-Content -ReadCount 2 yourFile.txt |
ForEach-Object { $_[0] + ' ' + $_[1].TrimStart() }
The above directly outputs the resulting lines (as the for command in your question does), causing them to print to the display by default.
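To confirm that each object emitted by Get-Content -ReadCount 2 is a two-element array of lines (the last array may be shorter if the line count is odd):
Get-Content -ReadCount 2 yourFile.txt | ForEach-Object { $_.Count }   # -> 2, 2, ... for paired input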
Pipe to Set-Content to save the output to a file:
Get-Content -ReadCount 2 yourFile.txt |
ForEach-Object { $_[0] + ' ' + $_[1].TrimStart() } |
Set-Content yourOutputFile.txt
Performance notes:
Unfortunately (as of PowerShell 7.3.2), Get-Content is quite slow by default - see GitHub issue #7537, and the performance of ForEach-Object and Where-Object could be improved too - see GitHub issue #10982.
At the expense of collecting all inputs and outputs in memory first, you can noticeably improve the performance with the following variation, which avoids the ForEach-Object cmdlet in favor of the intrinsic .ForEach() method, and, instead of piping to Set-Content, passes all output lines via the -Value parameter:
Set-Content yourOutputFile.txt -Value (
    (Get-Content -ReadCount 2 yourFile.txt).ForEach({ $_[0] + ' ' + $_[1].TrimStart() })
)
Read on for even faster alternatives, but remember that optimizations are only worth undertaking if actually needed - if the first PowerShell-idiomatic solution above is fast enough in practice, it is worth using for its conceptual elegance and concision.
See this Gist for benchmarks that compare the relative performance of the solutions in this answer as well as that of the solution from jdweng's .NET API-based answer.
A better-performing alternative is to use a switch statement with the -File parameter to process files line by line:
$i = 1
switch -File yourFile.txt {
    default {
        if ($i++ % 2) { $firstLineInPair = $_ }
        else          { $firstLineInPair + ' ' + $_.TrimStart() }
    }
}
Helper index variable $i and the modulo operation (%) are simply used to identify which line is the start of a (new) pair, and which one is its second half.
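A quick illustration of the parity logic, with the counter running from 1:
foreach ($i in 1..6) { "$i % 2 = $($i % 2)" }   # -> 1 % 2 = 1, 2 % 2 = 0, ... - odd counts flag the first line of each pair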
The switch statement is itself streaming, but it cannot be used as-is as pipeline input. By enclosing it in & { ... }, it can, but that forfeits some of the performance benefits, making it only marginally faster than the optimized Get-Content -ReadCount 2 solution:
& {
    $i = 1
    switch -File yourFile.txt {
        default {
            if ($i++ % 2) { $firstLineInPair = $_ }
            else          { $firstLineInPair + ' ' + $_.TrimStart() }
        }
    }
} | Set-Content yourOutputFile.txt
For the best performance when writing to a file, use Set-Content $outFile -Value $(...), albeit at the expense of collecting all output lines in memory first:
Set-Content yourOutputFile.txt -Value $(
    $i = 1
    switch -File yourFile.txt {
        default {
            if ($i++ % 2) { $firstLineInPair = $_ }
            else          { $firstLineInPair + ' ' + $_.TrimStart() }
        }
    }
)
The fastest and most concise solution is to use a regex-based approach, which reads the entire file up front:
(Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'
Note:
The assumption is that all lines are paired, and that the last line has a trailing newline.
The -replace operation matches two consecutive lines, and joins them together with a space, ignoring leading spaces on the second line. For a detailed explanation of the regex and the ability to interact with it, see this regex101.com page.
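Applied to an inline two-pair sample string (`n is a newline inside a double-quoted PowerShell string):
"line1`n  line2`nline3`n  line4`n" -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'
# -> line1 line2
#    line3 line4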
To save the output to a file, you can pipe directly to Set-Content:
(Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2' |
Set-Content yourOutputFile.txt
In this case, because the pipeline input to Set-Content is provided by an expression that doesn't involve for-every-input-line calls to script blocks ({ ... }) (as the switch solution requires), there is virtually no slowdown resulting from use of the pipeline (whose use is generally preferable for conceptual elegance and concision).
As for what you tried:
The $splitLines-based solution in your question is predicated on having assigned all lines of the input file to this self-chosen variable as an array, which your code does not do.
While you could fill the variable $splitLines with an array of lines via $splitLines = Get-Content yourFile.txt (Get-Content reads text files line by line by default), the switch-based line-by-line solution is more efficient and streams its results; if you save them to a file, streaming keeps memory usage constant, which matters with large input sets (though rarely with text files).
A performance tip when reading all lines at once into an array with Get-Content: use -ReadCount 0, which greatly speeds up the operation:
$splitLines = Get-Content -ReadCount 0 yourFile.txt

A better way to slice an array (or a list) in PowerShell

How can I export mail addresses to CSV files in batches of 30 users each?
I have already tried this:
$users = Get-ADUser -Filter * -Properties Mail
$nbCsv = [int][Math]::Ceiling($users.Count/30)
For($i=0; $i -le $nbCsv; $i++){
    $arr=@()
    For($j=(0*$i); $j -le ($i + 30); $j++){
        $arr+=$users[$j]
    }
    $arr|Export-Csv -Path ($PSScriptRoot + "\ASSFAM" + ("{0:d2}" -f ([int]$i)) + ".csv") -Delimiter ";" -Encoding UTF8 -NoTypeInformation
}
It works, but I think there is a better way to achieve this task. Have you got some ideas?
Thank you.
If you want a subset of an array, you can just use .., the range operator. The first 30 elements of an array would be:
$users[0..29]
Typically, you don't have to worry about going past the end of the array, either (see mkelement0's comment below). If there are 100 items and you're calling $array[90..119], you'll get the last 10 items in the array and no error. You can use variables and expressions there, too:
$users[$i..($i + 29)]
That's the $ith value and the next 29 values after the $ith value (if they exist).
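For example:
(1..100)[90..119]                  # -> 91 .. 100 - the out-of-range indices are silently ignored
$i = 5; (1..100)[$i..($i + 29)]    # -> 6 .. 35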
Also, this pattern should be avoided in PowerShell:
$array = @()
loop-construct {
    $array += $value
}
Arrays are fixed-size in .NET, and therefore fixed-size in PowerShell. That means that adding an element to an array with += really means "create a brand-new array, copy every element over, put the one new item at the end, and then discard the old array." It generates tremendous memory pressure, and if you're working with more than a couple hundred items it will be significantly slower.
Instead, just do this:
$array = loop-construct {
    $value
}
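A quick, illustrative timing comparison (absolute numbers will vary by machine and PowerShell version):
(Measure-Command { $a = @(); foreach ($i in 1..50000) { $a += $i } }).TotalSeconds   # array rebuilt on every iteration
(Measure-Command { $a = foreach ($i in 1..50000) { $i } }).TotalSeconds              # output collected once, at the end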
Strings are similarly immutable and have the same problem with the += operator. If you need to build a string via concatenation, you should use the StringBuilder class.
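For example, a minimal StringBuilder-based concatenation (PSv5+ syntax; use New-Object System.Text.StringBuilder in older versions):
$sb = [System.Text.StringBuilder]::new()
foreach ($word in 'foo', 'bar', 'baz') {
    $null = $sb.Append($word)   # $null = ... discards the builder object that .Append() returns
}
$sb.ToString()   # -> 'foobarbaz'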
Ultimately, however, here is how I would write this:
$users = Get-ADUser -Filter * -Properties Mail
$exportFileTemplate = Join-Path -Path $PSScriptRoot -ChildPath 'ASSFAM{0:d2}.csv'
$batchSize = 30
$batchNum = 0
$row = 0
while ($row -lt $users.Count) {
    $users[$row..($row + $batchSize - 1)] | Export-Csv ($exportFileTemplate -f $batchNum) -Encoding UTF8 -NoTypeInformation
    $row += $batchSize
    $batchNum++
}
$row and $batchNum could be rolled into one variable, technically, but this is a bit more readable, IMO.
I'm sure you could also write this with Select-Object and Group-Object, but that's going to be fairly complicated compared to the above, and Group-Object isn't entirely known for its performance prior to PowerShell 6.
If you are using Powershell's strict mode, which may be required in certain configurations or scenarios, then you'll need to check that you don't enumerate past the end of the array. You could do that with:
while ($row -lt $users.Count) {
    $users[$row..([System.Math]::Min(($row + $batchSize - 1), ($users.Count - 1)))] | Export-Csv ($exportFileTemplate -f $batchNum) -Encoding UTF8 -NoTypeInformation
    $row += $batchSize
    $batchNum++
}
I believe that's correct, but I may have an off-by-one error in that logic. I have not fully tested it.
Bacon Bits' helpful answer shows how to simplify your code with the help of .., the range operator, but it would be nice to have a general-purpose chunking (partitioning, batching) mechanism; however, as of PowerShell 7.0, there is no built-in feature.
GitHub feature suggestion #8270 proposes adding a -ReadCount <int> parameter to Select-Object, analogous to the parameter of the same name already defined for Get-Content.
If you'd like to see this feature implemented, show your support for the linked issue there.
With that feature in place, you could do the following:
$i = 0
Get-ADUser -Filter * -Properties Mail |
Select-Object -ReadCount 30 | # WISHFUL THINKING: output 30-element arrays
ForEach-Object {
$_ | Export-Csv -Path ($PSScriptRoot + "\ASSFAM" + ("{0:d2}" -f ++$i) + ".csv") -Delimiter ";" -Encoding UTF8 -NoTypeInformation
}
In the interim, you could use custom function Select-Chunk (source code below): replace Select-Object -ReadCount 30 with Select-Chunk -ReadCount 30 in the snippet above.
Here's a simpler demonstration of how it works:
PS> 1..7 | Select-Chunk -ReadCount 3 | ForEach-Object { "$_" }
1 2 3
4 5 6
7
The above shows that the ForEach-Object script block receives the following
three arrays, via $_, in sequence:
(1, 2, 3), (4, 5, 6), and (, 7)
(When you stringify an array, by default you get a space-separated list of its elements; e.g., "$(1, 2, 3)" yields 1 2 3).
Select-Chunk source code:
The implementation uses a [System.Collections.Generic.Queue[object]] instance to collect the inputs in batches of fixed size.
function Select-Chunk {
    <#
    .SYNOPSIS
    Chunks pipeline input.

    .DESCRIPTION
    Chunks (partitions) pipeline input into arrays of a given size.
    By design, each such array is output as a *single* object to the pipeline,
    so that the next command in the pipeline can process it as a whole.
    That is, for the next command in the pipeline, $_ contains an *array* of
    (at most) as many elements as specified via -ReadCount.

    .PARAMETER InputObject
    The pipeline input objects bind to this parameter one by one.
    Do not use it directly.

    .PARAMETER ReadCount
    The desired size of the chunks, i.e., how many input objects to collect
    in an array before sending that array to the pipeline.
    0 effectively means: collect *all* inputs and output a single array overall.

    .EXAMPLE
    1..7 | Select-Chunk 3 | ForEach-Object { "$_" }

    1 2 3
    4 5 6
    7

    The above shows that the ForEach-Object script block receives the following
    three arrays: (1, 2, 3), (4, 5, 6), and (, 7)
    #>
    [CmdletBinding(PositionalBinding = $false)]
    [OutputType([object[]])]
    param (
        [Parameter(ValueFromPipeline)]
        $InputObject,

        [Parameter(Mandatory, Position = 0)]
        [ValidateRange(0, [int]::MaxValue)]
        [int] $ReadCount
    )

    begin {
        # The queue collects the pipeline objects until a chunk is complete.
        $q = [System.Collections.Generic.Queue[object]]::new($ReadCount)
    }

    process {
        $q.Enqueue($InputObject)
        if ($q.Count -eq $ReadCount) {
            # Chunk complete: output it as a single object (note the unary ","),
            # then reset the queue for the next chunk.
            , $q.ToArray()
            $q.Clear()
        }
    }

    end {
        # Output whatever remains as the final, possibly smaller chunk.
        if ($q.Count) {
            , $q.ToArray()
        }
    }
}

PowerShell - get the sum of a specific substring position

How can I get the sum of numbers taken from a substring and place that sum at a specific position (on a different line) using PowerShell, given the following conditions:
Get the sum of the numbers from positions 3 to 13 of the lines that start with the character D. Place the sum at positions 10 to 14 on the line that starts with S.
So for example, if i have this file:
F123trial text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S xxxxx
I want to get the sum of 38.95, 18.95 and 18.95 and then place the sum on position xxxxx under the line that starts with the S.
PowerShell's switch statement has powerful, but little-known features that allow you to iterate over the lines of a file (-file) and match lines by regular expressions (-regex).
Not only is switch -file convenient, it is also much faster than using cmdlets in a pipeline (see bottom section).
[double] $sum = 0
switch -regex -file file.txt {
    # Note: The string to the left of each script block below ({ ... }),
    # e.g., '^D', is the regex to match each line against.
    # Inside the script blocks, $_ refers to the input line at hand.

    # Extract the number, add it to the sum, output the line.
    '^D' { $sum += $_.Substring(2, 11); $_; continue }
    # Summary line: place the sum at character position 10, with 0-padding.
    # Note: `-replace ',', '.'` is only needed if your culture uses "," as the
    #       decimal mark.
    '^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }
    # All other lines: pass them through.
    default { $_ }
}
Note:
continue in the script blocks short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed.
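A minimal demonstration of the difference:
switch ('one', 'two') {
    'one'   { 'matched one'; continue }   # continue: move on to the next input item
    default { "default: $_" }
}
# -> matched one
#    default: two
# With break instead of continue, 'two' would never be processed and only 'matched one' would be output.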
Based on a later comment, I'm assuming you want an 18-character 0-left-padded number on the S line at character position 10.
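You can verify the format string in isolation:
'{0:000000000000000.00}' -f 76.85   # -> 000000000000076.85 (18 characters; cultures that use "," as the decimal mark produce ...76,85 - hence the -replace above)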
With your sample file, the above yields:
F123trial text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S 000000000000076.85
Optional reading: Comparing the performance of switch -file ... to Get-Content ... | ForEach-Object ...
Running the following test script:
& {
    # Create a sample file with 100K lines.
    1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
    (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds,
    (Measure-Command { Get-Content $tmpFile | % { $_ } }).TotalSeconds
    Remove-Item $tmpFile
}
yields the following timings on my machine, for instance (the absolute numbers aren't important, but their ratio should give you a sense):
0.0578924 # switch -file
6.0417638 # Get-Content | ForEach-Object
That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.
Digging deeper:
Frode F. points out that Get-Content is slow with large files - though its convenience makes it a popular choice - and mentions using the .NET Framework directly as an alternative:
Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, that is only an option with smallish files.
Using [System.IO.StreamReader]'s ReadLine() method in a loop.
However, use of the pipeline in itself, irrespective of the specific cmdlets used, introduces overhead. When performance matters - but only then - you should avoid it.
Here's an updated test that includes commands that use the .NET Framework methods, with and without the pipeline (use of the intrinsic .ForEach() method requires PSv4+):
& {
    # Create a sample file with 100K lines.
    1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
    (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds
    (Measure-Command { foreach ($line in [IO.File]::ReadLines((Convert-Path $tmpFile))) { $line } }).TotalSeconds
    (Measure-Command {
        $sr = [IO.StreamReader] (Convert-Path $tmpFile)
        while (-not $sr.EndOfStream) { $sr.ReadLine() }
        $sr.Close()
    }).TotalSeconds
    (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds
    (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds
    (Measure-Command { Get-Content $tmpFile | % { $_ } }).TotalSeconds
    Remove-Item $tmpFile
}
Sample results, from fastest to slowest:
0.0124441 # switch -file
0.0365348 # [System.IO.File]::ReadLines() in a foreach loop
0.0481214 # [System.IO.StreamReader] in a while loop
0.1614621 # [System.IO.File]::ReadAllLines() with the .ForEach() method
0.2745749 # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
0.5925222 # (pipeline) Get-Content with ForEach-Object
switch -file is the fastest by a factor of around 3, followed by the no-pipeline .NET solutions; using .ForEach() adds another factor of 3.
Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 2.
You could try:
-match to find the lines using a regex pattern
The .NET string method Substring() to extract the values from the "D" lines
Measure-Object -Sum to calculate the sum
-replace to insert the value (searches using a regex pattern)
Example:
$text = Get-Content -Path file.txt

$total = $text -match '^D' |
    # For each "D" line, extract the value and cast it to [double] (to be able to sum it).
    ForEach-Object { $_.Substring(2,11) -as [double] } |
    # Measure the sum.
    Measure-Object -Sum | Select-Object -ExpandProperty Sum

$text | ForEach-Object {
    if ($_ -match '^S') {
        # Line starts with S -> insert the sum.
        $_.SubString(0, (17 - $total.Length)) + $total + $_.SubString(17)
    } else {
        # Not an "S" line -> output the original content.
        $_
    }
} | Set-Content -Path file.txt

While loop does not produce pipeline output

It appears that a While loop does not produce an output that can continue in the pipeline. I need to process a large (many GiB) file. In this trivial example, I want to extract the second field, sort on it, then get only the unique values. What am I not understanding about the While loop and pushing things through the pipeline?
In the *NIX world this would be a simple:
cut -d "," -f 2 rf.txt | sort | uniq
In PowerShell, this is not quite as simple.
The source data.
PS C:\src\powershell> Get-Content .\rf.txt
these,1,there
lines,3,paragraphs
are,2,were
The script.
PS C:\src\powershell> Get-Content .\rf.ps1
$sr = New-Object System.IO.StreamReader("$(Get-Location)\rf.txt")
while ($line = $sr.ReadLine()) {
Write-Verbose $line
$v = $line.split(',')[1]
Write-Output $v
} | sort
$sr.Close()
The output.
PS C:\src\powershell> .\rf.ps1
At C:\src\powershell\rf.ps1:7 char:3
+ } | sort
+ ~
An empty pipe element is not allowed.
+ CategoryInfo : ParserError: (:) [], ParseException
+ FullyQualifiedErrorId : EmptyPipeElement
You're making it a bit more complicated than it needs to be. You have a CSV without headers. The following should work:
Import-Csv .\rf.txt -Header f1,f2,f3 | Select-Object -ExpandProperty f2 -Unique | Sort-Object
Nasir's workaround looks like the way to go here.
If you want to know what was going wrong in your code, the answer is that while loops (and do/while/until loops) don't consistently return values to the pipeline the way that other statements in PowerShell do (actually that is true, and I'll keep the examples of that, but scroll down for the real reason it wasn't working for you).
ForEach-Object -- a cmdlet, not a built-in language feature/statement; does return objects to the pipeline.
1..3 | % { $_ }
foreach -- statement; does return.
foreach ($i in 1..3) { $i }
if/else -- statement; does return.
if ($true) { 1..3 }
for -- statement; does return.
for ( $i = 0 ; $i -le 3 ; $i++ ) { $i }
switch -- statement; does return.
switch (2)
{
    1 { 'one' }
    2 { 'two' }
    3 { 'three' }
}
But for some reason, these other loops seem to act unpredictably.
Loops forever, returning $i (0; no incrementing going on):
$i = 0; while ($i -le 3) { $i }
Returns nothing, but $i does get incremented:
$i = 0; while ($i -le 3) { $i++ }
If you wrap the expression in parentheses, it does get returned:
$i = 0; while ($i -le 3) { ($i++) }
But as it turns out (I'm learning a bit as I go here), while's strange return semantics have nothing to do with your error; you just can't pipe statements into functions/cmdlets, regardless of their return value.
foreach ($i in 1..3) { $i } | measure
will give you the same error.
You can "get around" this by making the entire statement a sub-expression with $():
$( foreach ($i in 1..3) { $i } ) | measure
That would work for you in this case. Or in your while loop instead of using Write-Output, you could just add your item to an array and then sort it after:
$arr = @()
while ($line = $sr.ReadLine()) {
    Write-Verbose $line
    $v = $line.split(',')[1]
    $arr += $v
}
$arr | sort
I know you're dealing with a large file here, so maybe you're thinking that by piping to sort line by line you'll be avoiding a large memory footprint. In many cases piping does work that way in PowerShell, but the thing about sorting is that you need the whole set to sort it, so the Sort-Object cmdlet will be "collecting" each item you pass to it anyway and then doing the actual sorting at the end; I'm not sure you can avoid that at all. Admittedly, letting Sort-Object do that instead of building the array yourself might be more efficient depending on how it's implemented, but I don't think you'll be saving much on RAM.
Another solution:
Get-Content -Path C:\temp\rf.txt | select @{Name="Mycolumn";Expression={($_ -split "," )[1]}} | select Mycolumn -Unique | sort

PowerShell - Trouble with Out-File - Insert a count variable into the file name

I would like the file to save with the count variable between the file prefix and the file extension - Out-File (file prefix + count variable + file extension), sort of thing. I'm ridiculously new to PowerShell, so please be merciful :P
$max = get-content h:test1\test.txt | Measure-Object
$h = get-content h:test1\test.txt
For($i=0; $i -lt $max.count ; $i++){
select-string h:test2\*.txt -pattern $($h[$i]) | Format-List | Out-File (Join-Path h:\text $i)
}
Simply use
Out-File h:\text\$i.extension
PowerShell's parsing gives rise to a few very weird problems in edge cases, but overall it is designed to work as you would expect in most, if not all, common cases.
In your case I would probably solve the problem a little differently:
Get-Content h:\test1\test.txt |
    ForEach-Object { $i = 0 } {
        Select-String H:\test2\*.txt -pattern $_ |
            Format-List |
            Out-File h:\text\$i.extension
        $i++
    }
The first block for the ForEach-Object in this case is run once at the start of the loop and initialises the counter variable. I tend to avoid explicit looping constructs (such as for (x; y; z) or foreach (x in y)) in favour of pipeline solutions.
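Here is the begin-block mechanism in isolation:
1..3 | ForEach-Object { $i = 0 } { "item $_ has index $i"; $i++ }
# -> item 1 has index 0
#    item 2 has index 1
#    item 3 has index 2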
select-string h:test2\*.txt -pattern $($h[$i]) | Format-List | Out-File "h:\text$i.txt"
Like any good scripting language, PowerShell offers a multitude of ways to accomplish a goal; this is one that should work for you.