I have a Powershell script that scans log files and replaces text when a match is found. The list is currently 500 lines, and I plan to double/triple this. the log files can range from 400KB to 800MB in size.
Currently, when using the below, a 42MB file takes 29mins, and I'm looking for help if anyone can see any way to make this faster?
I tried changing ForEach-Object with ForEach-ObjectFast but it's causing the script to take sufficiently longer. also tried changing the first ForEach-Object to a forloop but still took ~29 mins.
$lookupTable= #{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
'bbb:ccc:456'='WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
$line=$_
$lookupTable.GetEnumerator() | ForEach-Object {
if ($line-match$_.Key)
{
$line=$line-replace$_.Key,$_.Value
}
}
$line
}|Set-Content -Path $outputfile
Since you say your input file could be 800MB in size, reading and updating the entire content in memory could potentially not fit.
The way to go then is to use a fast line-by-line method and the fastest I know of is switch
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder..
$inputfile = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt' # absolute full file path because we use .Net here
# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }
$lookupTable= #{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
}
# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')
# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true) # $true for Append
switch -Regex -File $inputfile {
$regexLookup {
# do the replacement using the value in the lookup table.
# because in one line there may be multiple matches to replace
# get a System.Text.RegularExpressions.Match object to loop through all matches
$line = $_
$match = [regex]::Match($line, $regexLookup)
while ($match.Success) {
# because we escaped the keys, to find the correct entry we now need to unescape
$line = $line -replace $match.Value, $lookupTable[[regex]::Unescape($match.Value)]
$match = $match.NextMatch()
}
$streamWriter.WriteLine($line)
}
default { $streamWriter.WriteLine($_) } # write unchanged
}
# dispose of the StreamWriter object
$streamWriter.Dispose()
I'm new to Powershell so please try to explain things a little bit too if you can. I'm trying to export the contents of a directory along with some other information in a CSV .
The CSV file contains information about the files however, I just need to match the FileName column (which contains the full path). If it's matched, I need to delete the entire row.
$folder1 = OldFiles
$folder2 = Log Files\January
$file1 = _updatehistory.txt
$file2 = websites.config
In the CSV file, if any of these is matched, the entire row must be deleted. The CSV file contains FileName in this manner:
**FileName**
C:\Installation\New Applications\Root
I've tried doing this:
Import-csv -Path "C:\CSV\Recursion.csv" | Where-Object { $_.FileName -ne $folder2} | Export-csv -Path "C:\CSV\RecursionUpdated.csv" -NoTypeInformation
But it's not working out. I would really appreciate help here.
It looks like you want to match only parts of the full path, so you should use -like or -match operators (or their negated variants) which can do non-exact matching:
$excludes = '*\OldFiles', '*\Log Files\January', '*\_updatehistory.txt', '*\websites.config'
Import-csv -Path "C:\CSV\Recursion.csv" |
Where-Object {
# $matchesExclude Will be $true if at least one exclude pattern matches
# against FileName. Otherwise it will be $null.
$matchesExclude = foreach( $exclude in $excludes ) {
# Output $true if pattern matches, which will be captured in $matchesExclude.
if( $_.FileName -like $exclude ) { $true; break }
}
# This outputs $true if the filename is not excluded, thus Where-Object
# passes the row along the pipeline.
-not $matchesExclude
} | Export-csv -Path "C:\CSV\RecursionUpdated.csv" -NoTypeInformation
This code makes heavily use of PowerShell's implicit output behaviour. E. g. the literal $true in the foreach loop body is implicit output which will be automatically captured in $matchesExclude. If it were not for the assignment $matchesExclude = foreach ..., the value would have been written to the console instead (if not captured somewhere else in the callstack).
I have written this script to find "Procedure division" and replace it with a null value. It seems to work, so that's a good this.
What I need to add is the actual file names that were changed. I have 12K files and only around 800 or so are supposedly needing the change. I need to know what ones were actually changed.
Is there a way to add the path and file name to be displayed in the below script?
$old = Read-Host 'PROCEDURE DIVISION.'
$new = Read-Host ''
Get-Children D:\temp *.cbl -recurse | ForEach {
(Get-Content $_ | ForEach {$_ -replace "$old", "$new"}) | Set-Content $_
}
Suppose I have two csv files. One is
id_number,location_code,category,animal,quantity
12212,3,4,cat,2
29889,7,6,dog,2
98900,
33221,1,8,squirrel,1
the second one is:
98900,2,1,gerbil,1
The second file may have a newline or something at the end (maybe or maybe not, I haven't checked), but only the one line of content. There may be three or four or more different varieties of the "second" file, but each one will have a first element (98900 in this example) that corresponds to an incomplete line in the first file similar to what is in this example.
Is there a way using powershell to automatically merge the line in the second (plus any additional similar) csv file into the matching line(s) of the first file, so that the resulting file is:
12212,3,4,cat,2
29889,7,6,dog,2
98900,2,1,gerbil,1
33221,1,8,squirrel,1
main.csv
id_number,location_code,category,animal,quantity
12212,3,4,cat,2
29889,7,6,dog,2
98900,
33221,1,8,squirrel,1
correction_001.csv
98900,2,1,gerbil,1
merge code used at the commandline, or in the .ps1 file of your choice
$myHeader = #('id_number','location_code','category','animal','quantity')
#Stage all the correction files: last correction in the most recent file wins
$ToFix = #{}
filter Plumbing_Import-Csv($Header){import-csv -LiteralPath $_ -Header $Header}
ls correction*.csv | sort -Property LastWriteTime | Plumbing_Import-Csv $myHeader | %{$ToFix[$_.id_number]=$_}
function myObjPipe($Header){
begin{
function TextTo-CsvField([String]$text){
#text fields which contain comma, double quotes, or new-line are a special case for CSV fields and need to be accounted for
if($text -match '"|,|\n'){return '"'+($text -replace '"','""')+'"'}
return $text
}
function myObjTo-CsvRecord($obj){
return ''+
$obj.id_number +','+
$obj.location_code +','+
$obj.category +','+
(TextTo-CsvField $obj.animal)+','+
$obj.quantity
}
$Header -join ','
}
process{
if($ToFix.Contains($_.id_number)){
$out = $ToFix[$_.id_number]
$ToFix.Remove($_.id_number)
}else{$out = $_}
myObjTo-CsvRecord $out
}
end{
#I assume you'd append any leftover fixes that weren't used
foreach($out in $ToFix.Values){
myObjTo-CsvRecord $out
}
}
}
import-csv main.csv | myObjPipe $myHeader | sc combined.csv -encoding ascii
You could also use ConvertTo-Csv, but my preference is to not have all the extra " cruft.
Edit 1: reduced code redundancy, accounted for \n, fixed appends, and used #OwlsSleeping suggestion about the -Header commandlet parameter
also works with these files:
correction_002.csv
98900,2,1,I Win,1
correction_new.csv
98901,2,1,godzilla,1
correction_too.csv
98902,2,1,gamera,1
98903,2,1,mothra,1
Edit 2: convert gc | ConvertTo-Csv over to Import-Csv to fix the front-end \n issues. Now also works with:
correction_003.csv
29889,7,6,"""bad""
monkey",2
This is a simple solution assuming there's always exactly one match, and you don't care about output order. Change the output path to csv1 to overwrite.
I added headers manually in both input files, but you can specify them in Import-Csv instead if you'd rather avoid changing your files.
[array]$MissingLine = Import-Csv -Path "C:\Users\me\Documents\csv2.csv"
[string]$MissingId = $MissingLine[0].id_number
[array]$BigCsv = Import-Csv -Path "C:\Users\me\Documents\csv1.csv" |
Where-Object {$_.id_number -ne $MissingId}
($BigCsv + $MissingLine) |
Export-Csv -Path "C:\Users\me\Documents\Combined.csv"
I'm looking for a way to export all lines from within a text file where part of the line matches a certain string. The string is actually the first 4 bytes of the file and I'd like to keep the command to only checking those bytes; not the entire row. I want to write the entire row. How would I go about this?
I am using Windows only and don't have the option to use many other tools that might do this.
Thanks in advance for any help.
Do you want to perform a simple "grep"? Then try this
select-string .\test.txt -pattern "\Athat" | foreach {$_.Line}
or this (very similar regex), also writes to an outfile
select-string .\test.txt -pattern "^that" | foreach {$_.Line} | out-file -filepath out.txt
This assumes that you want to search for a 4-byte string "that" at the beginning of the string , or beginning of the line, respectively.
Something like the following Powershell function should work for you:
function Get-Lines {
[cmdletbinding()]
param(
[string]$filename,
[string]$prefix
)
if( Test-Path -Path $filename -PathType Leaf -ErrorAction SilentlyContinue ) {
# filename exists, and is a file
$lines = Get-Content $filename
foreach ( $line in $lines ) {
if ( $line -like "$prefix*" ) {
$line
}
}
}
}
To use it, assuming you save it as get-lines.ps1, you would load the function into memory with:
. .\get-lines.ps1
and then to use it, you could search for all lines starting with "DATA" with something like:
get-lines -filename C:\Files\Datafile\testfile.dat -prefix "DATA"
If you need to save it to another file for viewing later, you could do something like:
get-lines -filename C:\Files\Datafile\testfile.dat -prefix "DATA" | out-file -FilePath results.txt
Or, if I were more awake, you could ignore the script above, use a simpler solution such as the following one-liner:
get-content -path C:\Files\Datafile\testfile.dat | select-string -Pattern "^DATA"
Which just uses the ^ regex character to make sure it's only looking for "DATA" at the beginning of each line.
To get all the lines from c:\somedir\somefile.txt that begin with 'abcd' :
(get-content c:\somedir\somefile.txt) -like 'abcd*'
provided c:\somedir\somefile.txt is not an unusually large (hundreds of MB) file. For that situation:
get-content c:\somedir\somefile.txt -readcount 1000 |
foreach {$_ -like 'abcd*'}