PowerShell - Trouble with Out-File - Insert a count variable into file name - powershell

I would like the file to save with the count variable between the file prefix and the file extension - Out-File (file prefix + count variable + file extension), sort of thing. I'm ridiculously new to PowerShell, so please be merciful :P
$max = get-content h:test1\test.txt | Measure-Object
$h = get-content h:test1\test.txt
For($i=0; $i -lt $max.count ; $i++){
select-string h:test2\*.txt -pattern $($h[$i]) | Format-List | Out-File (Join-Path h:\text $i)
}

Simply use
Out-File h:\text\$i.extension
PowerShell's parsing gives rise to a few very weird problems in edge cases, but overall it is designed to work as you would expect it to in most, if not all, common cases.
In your case I would probably solve the problem a little differently:
Get-Content h:\test1\test.txt |
ForEach-Object { $i = 0 } {
Select-String H:\test2\*.txt -pattern $_ |
Format-List |
Out-File h:\text\$i.extension
$i ++
}
The first block for the ForEach-Object in this case is run once at the start of the loop and initialises the counter variable. I tend to avoid explicit looping constructs (such as for (x; y; z) or foreach (x in y)) in favour of pipeline solutions.
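A minimal sketch (not part of the original answer) of how those two script blocks behave: the first runs once before any input arrives, the second runs once per input object.

```powershell
# First script block = begin (runs once), second = process (runs per object):
1..3 | ForEach-Object { $count = 0 } { "$count : $_"; $count++ }
# Outputs:
# 0 : 1
# 1 : 2
# 2 : 3
```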

select-string h:test2\*.txt -pattern $($h[$i]) | Format-List | Out-File "h:\text$i.txt"
Like any good scripting language there are a multitude of ways to accomplish a goal, but that is one that should work for you.

Related

Why my Out-File does not write output into the file

$ready = Read-Host "How many you want?: "
$i = 0
do{
(-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -match $ready) Out-File C:/numbers.csv -Append
If I give a value of 10 to the script, it will generate 10 random numbers and show them in the shell. It even generates a new file called numbers.csv. However, it does not add the generated output to the file. Why is that?
Your Out-File C:/numbers.csv -Append call is a completely separate statement from your do loop, and an Out-File call without any input simply creates an empty file.[1]
You need to chain (connect) commands with | in order to make them run in a pipeline.
However, with a statement such as a do { ... } until loop, this won't work as-is, but you can convert such a statement to a command that you can use as part of a pipeline by enclosing it in a script block ({ ... }) and invoking it with &, the call operator (to run in a child scope), or ., the dot-sourcing operator (to run directly in the caller's scope):
[int] $ready = Read-Host "How many you want?"
$i = 0
& {
do{
-join (1..12 | foreach {
(65..90 + 97..122 + '.' | % { [char] $_ }) +(0..9) + '.' | Get-Random
})
$i++
} until ($i -eq $ready)
} | Out-File C:/numbers.csv -Append
Note the [int] type constraint to convert the Read-Host output, which is always a string, to a number, and the use of the -eq operator rather than the text- and regex-based -match operator in the until condition; also, unnecessary grouping with (...) has been removed.
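A quick illustration (not from the original answer) of why -match misbehaves here: it coerces both operands to strings and performs a regex match, so a mere substring hit counts as success.

```powershell
# -match is a regex (substring) test on strings; -eq compares values:
12 -match 1      # True  ("12" contains "1")
12 -eq 1         # False (numeric comparison)
'21' -match '2'  # True  (substring match again)
```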
Note: An alternative to the use of a script block with either the & or . operator is to use $(...), the subexpression operator, as shown in MikeM's helpful answer. The difference between the two approaches is that the former streams its output to the pipeline - i.e., outputs objects one by one - whereas $(...) invariably collects all output in memory, up front.
For smallish input sets this won't make much of a difference, but the in-memory collection that $(...) performs can become problematic with large input sets, so the & { ... } / . { ... } approach is generally preferable.
Arno van Boven's answer shows a simpler alternative to your do ... until loop based on a for loop.
Combining a foreach loop with .., the range operator, is even more concise and expressive (and the cost of the array construction is usually negligible and overall still amounts to noticeably faster execution):
[int] $ready = Read-Host "How many you want?"
& {
foreach ($i in 1..$ready) {
-join (1..12 | foreach {
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
})
}
} | Out-File C:/numbers.csv -Append
The above also shows a simplification of the original command via a [char[]] cast that directly converts an array of code points to an array of characters.
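As a quick sketch of that simplification:

```powershell
# [char[]] converts each code point in the array to the corresponding character:
[char[]] (65..67)         # A, B, C
-join [char[]] (65..67)   # "ABC"
```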
In PowerShell [Core] 7+, you could further simplify by taking advantage of Get-Random's -Count parameter:
[int] $ready = Read-Host "How many you want?"
& {
foreach ($i in 1..$ready) {
-join (
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random -Count 12
)
}
} | Out-File C:/numbers.csv -Append
And, finally, you could have avoided a statement for looping altogether and used the ForEach-Object cmdlet instead (whose built-in alias, perhaps confusingly, is also foreach, but there's also %), as you're already doing inside your loop (1..12 | foreach ...):
[int] $ready = Read-Host "How many you want?"
1..$ready | ForEach-Object {
-join (1..12 | ForEach-Object {
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
})
} | Out-File C:/numbers.csv -Append
[1] In Windows PowerShell, Out-File uses UTF-16LE ("Unicode") encoding by default, so even a conceptually empty file still contains 2 bytes, namely the UTF-16LE BOM. In PowerShell [Core] v6+, BOM-less UTF-8 is the default across all cmdlets, so there you'll truly get an empty (0 bytes) file.
Another way is to wrap the loop in a sub-expression and pipe it:
$ready = Read-Host "How many you want?: "
$i = 0
$(do{
(-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -match $ready)) | Out-File C:/numbers.csv -Append
I personally avoid Do loops when I can, because I find them hard to read. Combining the two previous answers, I'd write it like this, because I find it easier to tell what is going on. Using a for loop instead, every line becomes its own self-contained piece of logic.
[int]$amount = Read-Host "How many you want?: "
& {
for ($i = 0; $i -lt $amount; $i++) {
-join(1..12 | foreach {((65..90)+(97..122)+(".") | foreach {[char]$_})+(0..9)+(".") | Get-Random})
}
} | Out-File C:\numbers.csv -Append
(Please do not accept this as an answer, this is just showing another way of doing it)

PowerShell/Unix Bash - Find missing numbers for a set of files

I'm currently trying to detect some missing files for batches.
This batch generates files with the following convention:
File_1_01.txt (first hour, first iteration)
File_1_02.txt
I want to create a script to find out which iteration is missing.
I found this piece of code in PowerShell:
function missingNumbers {
Get-ChildItem -path ./* -include *.txt
| Where{$_.Name -match '\d+'}
| ForEach{$Matches[0]}
| sort {[int]$_} |% {$i = 1}
{while ($i -lt $_){$i;$i++};$i++}}
However, when I use this script against
File_1_01.txt
File_1_02.txt
File_1_04.txt
It doesn't return entry 03
Investigations
Removing the hourly occurrence (the File_<1>_ part) makes the script behave as expected.
If I concatenate hour and occurrence, the script will display all numbers before 101.
I'm open to having something in Unix as well.
I had another approach in mind, removing the common text between all files but have no idea how to do it.
Instead of matching the 1 every time, you can match the 2 digits \d\d at the end. Putting the pipe symbol on the next line like that will only work in PowerShell 7.
Get-ChildItem -path ./* -include *.txt |
Where Name -match '\d\d' |
ForEach{ $Matches[0] } |
sort {[int]$_} |
% { $i = 1 } { while ($i -lt $_){ $i;$i++ }; $i++ }
3
You could put all the numbers together like this, but then they'd have to all be right after each other. Adjust the match pattern or get-childitem for your needs.
Get-ChildItem -path ./* -include *.txt |
Where Name -match '(\d).(\d\d)' |
ForEach{ $Matches[1] + $Matches[2] }
101
102
104
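As an alternative sketch, not taken from the answers above: once the numbers are extracted, Compare-Object can report the gaps directly. The $present values below are hypothetical stand-ins for the pipeline output above.

```powershell
$present  = 101, 102, 104   # hypothetical: numbers extracted from the file names
$expected = 101..104        # the full range that should exist
# Objects present only in the reference set (SideIndicator '<=') are the missing numbers:
Compare-Object -ReferenceObject $expected -DifferenceObject $present |
    Where-Object SideIndicator -eq '<=' |
    ForEach-Object InputObject
# 103
```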

How to print/append row index number on each line in a file in Powershell

sample.txt as below
row1col1||col2||col2||col3
row2col1||col2||col2||col3
row3col1||col2||col2||col3
expected
0||row1col1||col2||col2||col3
1||row2col1||col2||col2||col3
2||row3col1||col2||col2||col3
I coded like
get-content "C:\sample.txt" | foreach { "[|][|]" + $_ } | foreach { Index + $_ } | set-content "C:\sample1.txt"
calling the pipe and then the respective index, but it is not working.
Can you please help?
just like this:
$index = 0
Get-Content 'C:\sample.txt' | ForEach-Object { "{0}||{1}" -f $index++, $_ } | Set-Content 'C:\sample1.txt'
If you want to prepend leading zeroes to your index so that larger numbers will also align (in this example all indices will have 4 digits):
Get-Content 'C:\sample.txt' | ForEach-Object { "{0}||{1}" -f ($index++).ToString("0000") , $_ } | Set-Content 'C:\sample1.txt'
Another thought:
% { $i = 0 } { "$i||$_" ; $i++ }
What is Index in your example? Strings do not have an Index property, nor does Get-Content create one, as far as I know.
Get-Content already knows line numbers using ReadCount so keeping a running index is redundant. It does however start counting at one so a small adjustment would need to be made there to match your desire.
Get-Content C:\temp\text.txt | ForEach-Object {"{0}||{1}" -f ($_.ReadCount - 1),$_}
We use the format operator -f to make the output string easier to edit. Simply pipe the output of the ForEach-Object to its desired destination.
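A small sketch of ReadCount in action, using a throwaway file (the path is an assumption for illustration):

```powershell
# Get-Content attaches a 1-based ReadCount note property to each line it emits:
$f = Join-Path ([IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $f -Value 'alpha', 'beta'
Get-Content $f | ForEach-Object { '{0}||{1}' -f ($_.ReadCount - 1), $_ }
# 0||alpha
# 1||beta
```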

How to use Powershell Pipeline to Avoid Large Objects?

I'm using a custom function to essentially do a DIR command (recursive file listing) on an 8TB drive (thousands of files).
My first iteration was:
$results = $PATHS | % {Get-FolderItem -Path "$($_)" } | Select Name,DirectoryName,Length,LastWriteTime
$results | Export-Csv -Path $csvfile -Force -Encoding UTF8 -NoTypeInformation -Delimiter "|"
This resulted in a HUGE $results variable and slowed the system down to a crawl by spiking the powershell process to use 99%-100% of the CPU as the processing went on.
I decided to use the power of the pipeline to WRITE to the CSV file directly (presumably freeing up the memory) instead of saving to an intermediate variable, and came up with this:
$PATHS | % {Get-FolderItem -Path "$($_)" } | Select Name,DirectoryName,Length,LastWriteTime | ConvertTo-CSV -NoTypeInformation -Delimiter "|" | Out-File -FilePath $csvfile -Force -Encoding UTF8
This seemed to be working fine (the CSV file was growing..and CPU seemed to be stable) but then abruptly stopped when the CSV file size hit ~200MB, and the error to the console was "The pipeline has been stopped".
I'm not sure the CSV file size had anything to do with the error message, but I'm unable to process this large directory with either method! Any suggestions on how to allow this process to complete successfully?
Get-FolderItem runs robocopy to list the files and converts its output into a PSObject array. This is a slow operation which, strictly speaking, isn't required for the actual task. Pipelining also adds big overhead compared to the foreach statement; with thousands or hundreds of thousands of repetitions, that becomes noticeable.
We can speed up the process beyond anything pipelining and standard PowerShell cmdlets can offer, writing the info for 400,000 files on an SSD drive in about 10 seconds, by using:
- .NET Framework 4 or newer (included since Win8, installable on Win7/XP): IO.DirectoryInfo's EnumerateFileSystemInfos to enumerate the files in a non-blocking, pipeline-like fashion;
- PowerShell 3 or newer, as it's faster than PS2 overall;
- the foreach statement, which doesn't need to create a ScriptBlock context for each item and is thus much faster than the ForEach-Object cmdlet;
- IO.StreamWriter to write each file's info immediately, in a non-blocking, pipeline-like fashion;
- the \\?\ prefix trick to lift the 260-character path length restriction;
- manual queuing of the directories to process, to get past "access denied" errors that would otherwise stop a naive IO.DirectoryInfo enumeration;
- progress reporting.
function List-PathsInCsv([string[]]$PATHS, [string]$destination) {
$prefix = '\\?\' # this prefix lifts the 260-character path length restriction
$writer = [IO.StreamWriter]::new($destination, $false, [Text.Encoding]::UTF8, 1MB)
$writer.WriteLine('Name|Directory|Length|LastWriteTime')
$queue = [Collections.Generic.Queue[string]]($PATHS -replace '^', $prefix)
$numFiles = 0
while ($queue.Count) {
$dirInfo = [IO.DirectoryInfo]$queue.Dequeue()
try {
$dirEnumerator = $dirInfo.EnumerateFileSystemInfos()
} catch {
Write-Warning ("$_".replace($prefix, '') -replace '^.+?: "(.+?)"$', '$1')
continue
}
$dirName = $dirInfo.FullName.replace($prefix, '')
foreach ($entry in $dirEnumerator) {
if ($entry -is [IO.FileInfo]) {
$writer.WriteLine([string]::Join('|', @(
$entry.Name
$dirName
$entry.Length
$entry.LastWriteTime
)))
} else {
$queue.Enqueue($entry.FullName)
}
if (++$numFiles % 1000 -eq 0) {
Write-Progress -activity Digging -status "$numFiles files, $dirName"
}
}
}
$writer.Close()
Write-Progress -activity Digging -Completed
}
Usage:
List-PathsInCsv 'c:\windows', 'd:\foo\bar' 'r:\output.csv'
Don't use robocopy; use native PowerShell commands, like this:
$PATHS = 'c:\temp', 'c:\temp2'
$csvfile='c:\temp\listresult.csv'
$PATHS | % {Get-ChildItem $_ -file -recurse } | Select Name,DirectoryName,Length,LastWriteTime | export-csv $csvfile -Delimiter '|' -Encoding UTF8 -NoType
A short version, for non-purists:
$PATHS | % {gci $_ -file -rec } | Select Name,DirectoryName,Length,LastWriteTime | epcsv $csvfile -D '|' -E UTF8 -NoT

Remove Top Line of Text File with PowerShell

I am trying to just remove the first line of about 5000 text files before importing them.
I am still very new to PowerShell so not sure what to search for or how to approach this. My current concept using pseudo-code:
set-content file (get-content unless line contains amount)
However, I can't seem to figure out how to do something like contains.
While I really admire the answer from @hoge, both for a very concise technique and a wrapper function to generalize it, and I encourage upvotes for it, I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).
Assuming the file is not huge, you can force the pipeline to operate in discrete sections--thereby obviating the need for a temp file--with judicious use of parentheses:
(Get-Content $file | Select-Object -Skip 1) | Set-Content $file
... or in short form:
(gc $file | select -Skip 1) | sc $file
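A sketch of why the outer parentheses matter, using a throwaway file (path and contents are assumptions): they force Get-Content to run to completion, releasing the file, before Set-Content opens it for writing.

```powershell
$f = Join-Path ([IO.Path]::GetTempPath()) 'skip-demo.txt'
Set-Content -Path $f -Value 'header', 'data1', 'data2'
# Without the parentheses, Set-Content would try to open the file
# while Get-Content still holds it open for reading:
(Get-Content $f | Select-Object -Skip 1) | Set-Content $f
Get-Content $f   # data1, data2
```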
It is not the most efficient in the world, but this should work:
get-content $file |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $file -Force
Using variable notation, you can do it without a temporary file:
${C:\file.txt} = ${C:\file.txt} | select -skip 1
function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
if ( -not (Test-Path $path -PathType Leaf) ) {
throw "invalid filename"
}
ls $path |
% { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}
I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.
My solution was to use a more .NET approach: StreamReader + StreamWriter.
See this answer for a great answer discussing the perf: In Powershell, what's the most efficient way to split a large text file by record type?
Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):
PS> (measure-command{
$i = 0
$ins = New-Object System.IO.StreamReader "in/file/pa.th"
$outs = New-Object System.IO.StreamWriter "out/file/pa.th"
while( !$ins.EndOfStream ) {
$line = $ins.ReadLine();
if( $i -ne 0 ) {
$outs.WriteLine($line);
}
$i = $i+1;
}
$outs.Close();
$ins.Close();
}).TotalSeconds
It returned:
188.1224443
Inspired by AASoft's answer, I went out to improve it a bit more:
Avoid the loop variable $i and the comparison with 0 in every loop
Wrap the execution into a try..finally block to always close the files in use
Make the solution work for an arbitrary number of lines to remove from the beginning of the file
Use a variable $p to reference the current directory
These changes lead to the following code:
$p = (Get-Location).Path
(Measure-Command {
# Number of lines to skip
$skip = 1
$ins = New-Object System.IO.StreamReader ($p + "\test.log")
$outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
try {
# Skip the first N lines, but allow for fewer than N, as well
for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
$ins.ReadLine()
}
while( !$ins.EndOfStream ) {
$outs.WriteLine( $ins.ReadLine() )
}
}
finally {
$outs.Close()
$ins.Close()
}
}).TotalSeconds
The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes are more cosmetic.
$x = get-content $file
$x[1..$x.count] | set-content $file
Just that much. Long boring explanation follows. Get-content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.
For example, if we define an array variable like this,
$array = @("first item","second item","third item")
so $array returns
first item
second item
third item
then we can "index into" that array to retrieve only its 1st element
$array[0]
or only its 2nd
$array[1]
or a range of index values from the 2nd through the last.
$array[1..$array.count]
I just learned from a website:
Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }
Or you can use the aliases to make it short, like:
gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
Another approach to removing the first line from a file, using the multiple assignment technique. Refer Link
$firstLine, $restOfDocument = Get-Content -Path $filename
$modifiedContent = $restOfDocument
$modifiedContent | Out-String | Set-Content $filename
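The trick works because PowerShell's multiple assignment hands the first element to the first variable and the remainder, as an array, to the last one; a quick sketch:

```powershell
$first, $rest = 'line1', 'line2', 'line3'
$first        # line1
$rest.Count   # 2  ($rest holds 'line2' and 'line3')
```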
select -Skip didn't work for me, so my workaround is
$LinesCount = $(get-content $file).Count
get-content $file |
select -Last $($LinesCount-1) |
set-content "$file-temp"
move "$file-temp" $file -Force
Following on from Michael Soren's answer.
If you want to edit all .txt files in the current directory and remove the first line from each.
Get-ChildItem (Get-Location).Path -Filter *.txt |
Foreach-Object {
(Get-Content $_.FullName | Select-Object -Skip 1) | Set-Content $_.FullName
}
For smaller files you could use this:
& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null
... but it's not very efficient at processing my example file of 16 MB. It doesn't seem to terminate and release the lock on newfile.csv.