Using PowerShell to replace multiple strings in multiple files & folders

I have a list of strings in a CSV file. The format is:
OldValue,NewValue
223134,875621
321321,876330
....
and the file contains a few hundred rows (each OldValue is unique). I need to process changes over a number of text files in a number of folders and subfolders. My best guess at the number of folders, files, and lines of text is: 15 folders, around 150 text files in each folder, and approximately 65,000 lines of text per folder (between 400 and 500 lines per text file).
I will make 2 passes at the data, unless I can do it in one. The first pass generates a text file I will use as a checklist to review my changes. The second pass actually makes the changes in the files. Also, I only want to change the text files where the string occurs (not every file).
I'm using the following PowerShell script to go through the files and produce a list of the changes needed. The script runs, but is beyond slow. I haven't worked on the replace logic yet, but I assume it will be similar to what I've got.
# replace a string in a file with powershell
[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null

Function Search {
    # Parameters: $Path and $SearchString
    param (
        [Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
        [Parameter(Mandatory=$true)][string]$SearchString
    )
    try {
        # .NET FindInFiles method to look for files
        [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
            $Path,
            [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
            $SearchString
        )
    } catch { $_ }
}
if (Test-Path "C:\Work\ListofAllFilenamesToSearch.txt") { # if file exists
Remove-Item "C:\Work\ListofAllFilenamesToSearch.txt"
}
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames1 = Search $filefolder1 $ftype
$filenames1 | Out-File "C:\Work\ListofAllFilenamesToSearch.txt" -Width 2000
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
(Get-Content "C:\Work\NumberXrefList.CSV" |where {$_.readcount -gt 1}) | foreach{
$OldFieldValue, $NewFieldValue = $_.Split("|")
$filenamelist = (Get-Content "C:\Work\ListofAllFilenamesToSearch.txt" -ReadCount 5) #|
foreach ($j in $filenamelist) {
#$testvar = (Get-Content $j )
#$testvar = (Get-Content $j -ReadCount 100)
$testvar = (Get-Content $j -Delimiter "\n")
Foreach ($i in $testvar)
{
if ($i -imatch $OldFieldValue) {
$j + "|" + $OldFieldValue + "|" + $NewFieldValue | Out-File "C:\Work\FilesThatNeedToBeChanged.txt" -Width 2000 -Append
}
}
}
}
$FileFolder = (Get-Content "C:\Work\FilesThatNeedToBeChanged.txt" -ReadCount 5)
Get-ChildItem $FileFolder -Recurse |
    select -ExpandProperty fullname |
    foreach {
        if (Select-String -Path $_ -SimpleMatch $OldFieldValue -Debug -Quiet) {
            (Get-Content $_) |
                ForEach-Object { $_ -replace $OldFieldValue, $NewFieldValue } |
                Set-Content $_ -WhatIf
        }
    }
In the code above, I've tried several things with Get-Content - the default, -ReadCount, and -Delimiter - in an attempt to avoid an out-of-memory error.
The only thing I have control over is the length of the old and new replacement strings file. Is there a way to do this in PowerShell? Is there a better option or solution? I'm running Windows 7 with PowerShell 3.0.

Your main problem is that you're reading each file over and over again to change each of the terms. You need to invert the loops: iterate the files on the outside and the replacement terms on the inside. Also, pre-load the CSV. Something like:
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames = gci -Path $filefolder1 -Filter $ftype -Recurse
$replaceValues = Import-Csv -Path "C:\Work\NumberXrefList.CSV"
foreach ($file in $filenames) {
$contents = Get-Content -Path $file
foreach ($replaceValue in $replaceValues) {
$contents = $contents -replace $replaceValue.OldValue, $replaceValue.NewValue
}
Copy-Item $file "$file.old"
Set-Content -Path $file -Value $contents
}
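If the runtime is still too long, a possible refinement (a sketch, not tested against your data) is to build a single alternation regex from all the old values and resolve each replacement through a hashtable lookup, so every file is read and scanned exactly once instead of once per CSV row. This assumes the old values are plain literal strings, and PowerShell 3.0+ (for -Raw and for script blocks usable as a MatchEvaluator):

# Build a lookup table and one escaped alternation pattern from the CSV
$map = @{}
foreach ($replaceValue in $replaceValues) { $map[$replaceValue.OldValue] = $replaceValue.NewValue }
$pattern = ($map.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|'

foreach ($file in $filenames) {
    $contents = Get-Content -Path $file.FullName -Raw
    # Replace every match in a single pass; the script block looks up the new value
    $newContents = [regex]::Replace($contents, $pattern, { param($m) $map[$m.Value] })
    # Only back up and rewrite files that actually contained a match
    if ($newContents -ne $contents) {
        Copy-Item $file.FullName "$($file.FullName).old"
        Set-Content -Path $file.FullName -Value $newContents
    }
}

This also satisfies the requirement of only touching files where a string actually occurs.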

Related

How to parse through folders and files using PowerShell?

I am trying to construct a script that moves through specific folders and the log files in them, filters the error codes, and then passes them into a new file.
I'm not really sure how to do that with for loops, so I'll leave my code below.
If someone could tell me what I'm doing wrong, that would be greatly appreciated.
$file_name = Read-Host -Prompt 'Name of the new file: '
$path = 'C:\Users\user\Power\log_script\logs'

Add-Type -AssemblyName System.IO.Compression.FileSystem
function Unzip
{
    param([string]$zipfile, [string]$outpath)
    [System.IO.Compression.ZipFile]::ExtractToDirectory($zipfile, $outpath)
}

if ([System.IO.File]::Exists($path)) {
    Remove-Item $path
    Unzip 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'
} else {
    Unzip 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'
}

$folder = Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles'
$files = foreach ($logfolder in $folder) {
    $content = foreach ($line in $files) {
        if ($line -match '([ ][4-5][0-5][0-9][ ])') {
            echo $line
        }
    }
}
$content | Out-File $file_name -Force -Encoding ascii
Inside the LogFiles folder are three more folders, each containing log files.
Thanks
Expanding on a comment above about recursing the folder structure, and then actually retrieving the content of the files, you could try something like this:
$allFiles = Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' -Recurse
# iterate the files
$allFiles | ForEach-Object {
    # iterate the content of each file, line by line
    Get-Content $_.FullName | ForEach-Object {
        if ($_ -match '([ ][4-5][0-5][0-9][ ])') {
            echo $_
        }
    }
}
It looks like your inner loop iterates over a collection ($files) that doesn't exist yet. You assign $files to the output of a foreach (...) loop, then try to nest another loop over $files inside it. Of course, at that point $files isn't available to be looped over.
Regardless, the issue is that you are never reading the content of your log files. Even if you managed to loop through the output of Get-ChildItem, you need to look at each line to perform the match.
Obviously I cannot completely test this, but I see a few issues and have rewritten it as below:
$file_name = Read-Host -Prompt 'Name of the new file'
$path = 'C:\Users\user\Power\log_script\logs'
$Pattern = '([ ][4-5][0-5][0-9][ ])'

if ( [System.IO.File]::Exists( $path ) ) { Remove-Item $path }
Expand-Archive 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'

Select-String -Path 'C:\Users\user\Power\log_script\logs\LogFiles\*' -Pattern $Pattern |
    Select-Object -ExpandProperty Line |
    Out-File $file_name -Force -Encoding ascii
Note: Select-String cannot recurse on its own.
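If the log files sit in nested folders, one way around that limitation is to recurse with Get-ChildItem first and pipe the files into Select-String. A minimal sketch, reusing the same path, $Pattern, and $file_name as above (the -File switch assumes PowerShell 3.0+):

Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' -File -Recurse |
    Select-String -Pattern $Pattern |
    Select-Object -ExpandProperty Line |
    Out-File $file_name -Force -Encoding ascii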
I'm not sure you need to write your own Unzip function. PowerShell has the Expand-Archive cmdlet, which can at least match the functionality thus far:
Expand-Archive -Path <SourceZipPath> -DestinationPath <DestinationFolder>
Note: The -Force parameter allows it to overwrite the destination files if they are already present, which may be a substitute for testing whether the file exists and deleting it if it does.
If you are going to test for the file, that section of code can be simplified to:
if ( [System.IO.File]::Exists( $path ) ) { Remove-Item $path }
Unzip 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'
This is because you were going to run the UnZip command regardless...
Note: You could also use Test-Path for this.
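For instance:

if (Test-Path $path) { Remove-Item $path }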
Also, there are innumerable ways to get the matching lines; here are a couple of extra samples:
Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' |
    ForEach-Object {
        ( Get-Content $_.FullName ) -match $Pattern
        # Using -match this way echoes the lines that matched from each run of
        # Get-Content. If nothing matched, nothing is output on that iteration.
    } |
    Out-File $file_name -Force -Encoding ascii
This approach will read the entire file into an array before running the match on it. For large files that may pose a memory issue; however, it enables the clever use of -match.
OR:
Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' |
    Get-Content |
    ForEach-Object { If ( $_ -match $Pattern ) { $_ } } |
    Out-File $file_name -Force -Encoding ascii
Note: You don't need the alias echo or the cmdlet behind it, Write-Output; uncaptured values are emitted to the pipeline automatically.
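For instance, these two script blocks produce identical output:

ForEach-Object { If ( $_ -match $Pattern ) { $_ } }        # implicit output
ForEach-Object { If ( $_ -match $Pattern ) { echo $_ } }   # same result via the alias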
UPDATE: After fuzzing around a bit and trying different things, I finally got it to work.
I'll include the code below just for demonstration purposes.
Thanks everyone
$start = Get-Date
"`n$start`n"

$file_name = Read-Host -Prompt 'Name of the new file: '
Out-File $file_name -Force -Encoding ascii

Expand-Archive -Path 'C:\Users\User\Power\log_script\logs.zip' -Force

$i = 1
$folders = Get-ChildItem -Path 'C:\Users\User\Power\log_script\logs\logs\LogFiles' -Name -Recurse -Include *.log
foreach ($item in $folders) {
    $files = 'C:\Users\User\Power\log_script\logs\logs\LogFiles\' + $item
    foreach ($file in $files) {
        $content = Get-Content $file
        Write-Progress -Activity "Filtering..." -Status "File $i of $($folders.Count)" -PercentComplete (($i / $folders.Count) * 100)
        $i++
        $output = foreach ($line in $content) {
            if ($line -match '([ ][4-5][0-5][0-9][ ])') {
                Add-Content -Path $file_name -Value $line
            }
        }
    }
}

$end = Get-Date
$time = [int]($end - $start).TotalSeconds
Write-Output ("Runtime: " + $time + " Seconds")

Memory exception while filtering large CSV files

I'm getting a memory exception while running this code. Is there a way to filter one file at a time, write the output, and append after processing each file? The code below seems to load everything into memory.
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
ForEach-Object { Import-Csv $_.FullName } |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Maybe you can export and filter your files one by one and append the result to your output file, like this:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Remove-Item $outputFile -Force -ErrorAction SilentlyContinue
Get-ChildItem $inputFolder -Filter "*.csv" -file | %{import-csv $_.FullName | where machine_type -eq 'workstations' | export-csv $outputFile -Append -notype }
Note: The reason for not using Get-ChildItem ... | Import-Csv ... - i.e., for not directly piping Get-ChildItem to Import-Csv, and instead having to call Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call - is a bug in Windows PowerShell that has since been fixed in PowerShell Core; see the bottom section for a more concise workaround.
However, even the output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
You've since confirmed that avoiding the aux. ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
Update:
This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.
This GitHub feature request proposes using strongly typed output objects to mitigate the issue.
The following workaround, which uses the switch statement to process the files as text files, may help:
$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
    $i = 0
    switch -Wildcard -File $_.FullName {
        '*workstations*' {
            # NOTE: If no other columns can contain the word `workstations`, you can
            #       simplify and speed up the command by omitting the ConvertFrom-Csv call
            #       (you can make the wildcard matching more robust with something
            #       like '*,workstations,*').
            if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
            $_ # row whose 'machine_type' column value equals 'workstations'
        }
        default {
            if ($i++ -eq 0) {
                if ($header) { continue }       # header already written
                else { $header = $_; $_ }      # header row of 1st file
            }
        }
    }
} | Set-Content $outputFile
Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv: pass the files as an argument instead:
Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType
Note that in PowerShell Core you could more naturally write:
Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType
Solution 2:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8 # modify encoding if necessary
$Delimiter=','
#find header for your files => i take first row of first file with data
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where length -gt 0 | select -First 1 | Get-Content -TotalCount 1
#if not header founded then not file with sise >0 => we quit
if(! $Header) {return}
#create array for header
$HeaderArray=$Header -split $Delimiter -replace '"', ''
#open output file
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
#write header founded
$w.WriteLine($Header)
#loop on file csv
Get-ChildItem $inputFolder -File -Filter "*.csv" | %{
#open file for read
$r = New-Object System.IO.StreamReader($_.fullname, $encoding)
$skiprow = $true
while ($line = $r.ReadLine())
{
#exclude header
if ($skiprow)
{
$skiprow = $false
continue
}
#Get objet for current row with header founded
$Object=$line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter
#write in output file for your condition asked
if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }
}
$r.Close()
$r.Dispose()
}
$w.close()
$w.Dispose()
You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:
$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8
$files = Get-ChildItem -Path $filePath -Filter *.csv |
Where-Object { $_.machine_type -eq 'workstations' }
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
$skiprow = $false
foreach ($file in $files)
{
$r = New-Object System.IO.StreamReader($file.fullname, $encoding)
while (($line = $r.ReadLine()) -ne $null)
{
if (!$skiprow)
{
$w.WriteLine($line)
}
$skiprow = $false
}
$r.Close()
$r.Dispose()
$skiprow = $true
}
$w.close()
$w.Dispose()
get-content *.csv | add-content combined.csv
Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.
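A sketch that sidesteps the self-append by excluding the output file from the input set (assuming everything sits in the current directory):

Get-ChildItem *.csv -Exclude combined.csv | Get-Content | Add-Content combined.csv

Note that, like the original one-liner, this still copies every file's header row into the output.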

Splitting file into smaller files, working script, but need some tweaks

I have a script here that looks for a delimiter in a text file with several reports in it. The script saves each individual report as its own text document. The tweaks I'm trying to achieve are:
In the middle of the data on each page there is SPEC #: RX:<string>. I want that string to be saved as the filename.
It currently saves from the delimiter down to the next one. This ignores the first report and grabs every one after it. I want it to go from the delimiter UP to the next one, but I haven't figured out how to achieve that.
$InPC = "C:\Users\path"
Get-ChildItem -Path $InPC -Filter *.txt | ForEach-Object -Process {
$basename= $_.BaseName
$m = ( ( Get-Content $_.FullName | Where { $_ | Select-String "END OF
REPORT" -Quiet } | Measure-Object | ForEach-Object { $_.Count } ) -ge 2)
$a = 1
if ($m) {
Get-Content $_.FullName | % {
If ($_ -match "END OF REPORT") {
$OutputFile = "$InPC\$basename _$a.txt"
$a++
}
Add-Content $OutputFile $_
}
Remove-Item $_.FullName
}
}
This works; as stated, it outputs the file with END OF REPORT on top, and the first report in the file gets omitted because it does not have END OF REPORT above it.
Edited code:
$InPC = 'C:\Path' #
ForEach ($File in Get-ChildItem -Path $InPC -Filter *.txt) {
    $RepNum = 0
    ForEach ($Report in ([IO.File]::ReadAllText($File.FullName) -split 'END OF REPORT\r?\n?' -ne '')) {
        if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.') {
            $ReportFile = $Matches.ReportFile
        }
        $OutputFile = "{0}\{1}_{2}_{3}.txt" -f $InPC, $File.BaseName, $ReportFile, ++$RepNum
        $Report | Add-Content $OutputFile
    }
    # Remove-Item $File.FullName
}
I suggest using regular expressions to:
read in the file with the -Raw parameter,
split the file at the marker END OF REPORT into sections, and
use 'SPEC #: RX:(?<ReportFile>.*?)\.' with a named capture group to extract the string.
Edit: adapted to PowerShell v2:
## Q:\Test\2019\09\12\SO_57911471.ps1
$InPC = 'C:\Users\path' # 'Q:\Test\2019\09\12\' #
ForEach ($File in Get-ChildItem -Path $InPC -Filter *.txt) {
    $RepNum = 0
    ForEach ($Report in (((Get-Content $File.FullName) -join "`n") -split 'END OF REPORT\r?\n?' -ne '')) {
        if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.') {
            $ReportFile = $Matches.ReportFile
        }
        $OutputFile = "{0}\{1}_{2}_{3}.txt" -f $InPC, $File.BaseName, $ReportFile, ++$RepNum
        $Report | Add-Content $OutputFile
    }
    # Remove-Item $File.FullName
}
This contrived sample text:
## Q:\Test\2019\09\12\SO_57911471.txt
I have a script here that looks for a delimiter in a text file with several reports in it.
In the middle of the data of each page there is -
SPEC #: RX:string1.
I want that string to be saved as the filename.
END OF REPORT
I have a script here that looks for a delimiter in a text file with several reports in it.
In the middle of the data of each page there is -
SPEC #: RX:string2.
I want that string to be saved as the filename.
END OF REPORT
yields:
> Get-ChildItem *string* -name
SO_57911471_string1_1.txt
SO_57911471_string2_2.txt
The appended report number ($RepNum) is just a precaution in case the string cannot be extracted from a section.

PowerShell: copy files to different locations based on file content

I have a scenario where every day I receive 2 CSV files named CMCS_{Timestamp}, for example CMCS_02012016100101 and CMCS_02012016100102. These 2 files are different files with different structures, but both land in the same folder, where my ETL tool picks them up and processes them. So I wrote a script that uses the structure of each file to distinguish whether it is file A or file B.
I tell the script to look at the first line of the file: if the line starts with 'Name,Emp(Date).' then copy the file to folder A; else if the line starts with 'Name,Group.' then copy the file to folder B; else copy the file to folder C.
Here is the code that I wrote. PowerShell does not generate any errors, but it does not produce any results either. I wonder what's wrong in my script.
$fileDirectory = "D:\Data";
$output_path = "D:\Output\FileA";
$output_path2 = "D:\Output\FileB";
$output_path2 = "D:\Output\FileC";
foreach($file in Get-ChildItem $fileDirectory)
{
# the full file path.
$filePath = $fileDirectory + "\" + $file;
$getdata = Get-Content -path $filePath
$searchresults = $getdata | Select -Index 1 | Where-Object { $_ -like 'Name,Emp(Date).*' }
$searchresults2 = $getdata | Select -Index 1 | Where-Object { $_ -like 'Name,Group.*' }
if ($searchresults -ne $null) {
Copy-Item $filePath $output_path
}
if ($searchresults2 -ne $null) {
Copy-Item $filePath $output_path2
}
}
Your issue may be caused by Select -Index 1: since PowerShell uses 0-based indexing, this actually selects the second line of the file. If you change it to 0, it should correctly get the header row.
On a separate note, instead of doing $filePath = $fileDirectory + "\" + $file;, you can just use $file.FullName to get the file path.
EDIT:
I think this should do what you're after:
[string] $FileDirectory = "D:\Data";
[string] $OutputPath = "D:\Output\FileA";
[string] $OutputPath2 = "D:\Output\FileB";
[string] $OutputPath3 = "D:\Output\FileC";

foreach ($FilePath in Get-ChildItem $FileDirectory | Select-Object -ExpandProperty FullName)
{
    [string] $Header = Get-Content $FilePath -First 1

    if ($Header -match 'Name,Emp.*') {
        Copy-Item $FilePath $OutputPath
    }
    elseif ($Header -match 'Name,Group.*') {
        Copy-Item $FilePath $OutputPath2
    }
    else {
        Copy-Item $FilePath $OutputPath3
    }
}

Count characters for each line

I am new to Windows PowerShell. Please would you be so kind as to give me some code or information on how to write a program that will do the following for all *.txt files in a folder:
1. Count the characters in each line of the file.
2. If the length of a line exceeds 1024 characters, create a subfolder within that folder and move the file there (that's how I will know which files have over 1024 characters per line).
I've tried this through VB and VBA (which are more familiar to me), but I want to learn some new cool stuff!
Many thanks!
Edit: I found some part of a code that is a beginning:
$fileDirectory = "E:\files";
foreach($file in Get-ChildItem $fileDirectory)
{
# Processing code goes here
}
OR
$fileDirectory = "E:\files";
foreach($line in Get-ChildItem $fileDirectory)
{
if($line.length -gt 1023){# mkdir and mv to subfolder!}
}
If you are willing to learn, why not start here.
You can use the Get-Content command in PS to get some information about your files: http://blogs.technet.com/b/heyscriptingguy/archive/2013/07/06/powertip-counting-characters-with-powershell.aspx and Getting character count for each row in text doc
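As a taste of what those links cover, here is a minimal sketch that reports the longest line in a single file (the file name is hypothetical):

Get-Content 'E:\files\sample.txt' |
    Measure-Object -Property Length -Maximum |
    Select-Object -ExpandProperty Maximum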
With your second edit I did see some effort, so I would like to help you.
$path = "D:\temp"
$lengthToNotExceed = 1024
$longFiles = Get-ChildItem -path -File |
Where-Object {(Get-Content($_.Fullname) | Measure-Object -Maximum Length | Select-Object -ExpandProperty Maximum) -ge $lengthToNotExceed}
$longFiles | ForEach-Object{
$target = "$($_.Directory)\$lengthToNotExceed\"
If(!(Test-Path $target)){New-Item $target -ItemType Directory -Force | Out-Null}
Move-Item $_.FullName -Destination $target
}
You can make this a one-liner, but it would be unnecessarily complicated. Use Measure-Object on the array returned by Get-Content, the array being, more or less, a string array. In PowerShell, strings have a Length property, which we query.
That returns the maximum line length in the file. We use Where-Object to filter only those results with the length we desire.
Then, for each file matched, we attempt to move it to a subdirectory in the same location as the file. If no subfolder exists, we make it.
Caveats:
You need at least PowerShell 3.0 for the -File switch. In place of that, you can give the Where-Object another clause: !$_.PSIsContainer
This would perform poorly on files with a large number of lines.
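If memory becomes a concern, a streaming variant (a sketch, assuming .NET 4.0+ for [System.IO.File]::ReadLines) stops reading each file at the first over-long line:

$longFiles = Get-ChildItem -Path $path -File | Where-Object {
    $hasLongLine = $false
    # ReadLines streams the file lazily; break stops reading at the first match
    foreach ($line in [System.IO.File]::ReadLines($_.FullName)) {
        if ($line.Length -ge $lengthToNotExceed) { $hasLongLine = $true; break }
    }
    $hasLongLine
}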
Here's my comment from above, indented and line-broken, in .ps1 script form.
$long = @()
foreach ($file in gci *.txt) {
    $f = 0
    gc $file | %{
        if ($_.length -ge 1024) {
            if (-not($f)) {
                $f = 1
                $long += $file
            }
        }
    }
}
$long | %{
    $dest = @($_.DirectoryName, '\test') -join ''
    [void](ni -type dir $dest -force)
    mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
I was also mentioning labels and breaks there. Rather than $f=0 and if (-not($f)), you can break out of the inner loop with break like this:
$long = @()
foreach ($file in gci *.txt) {
    :inner foreach ($line in gc $file) {
        if ($line.length -ge 1024) {
            $long += $file
            break inner
        }
    }
}
$long | %{
    $dest = @($_.DirectoryName, '\test') -join ''
    [void](ni -type dir $dest -force)
    mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
Did you happen to notice the two different ways of calling foreach? There's the foreach statement with its own loop variable, and then there's command | % { } (ForEach-Object), where the current item is represented by $_.
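A tiny illustration of the two forms:

# The foreach statement names its own loop variable:
foreach ($n in 1..3) { $n * 2 }

# ForEach-Object (alias %) runs in a pipeline; $_ is the current item:
1..3 | % { $_ * 2 }

Both emit 2, 4 and 6.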