Scanning log file using ForEach-Object and replacing text is taking a very long time - powershell

I have a PowerShell script that scans log files and replaces text when a match is found. The lookup list is currently 500 lines, and I plan to double or triple that. The log files can range from 400KB to 800MB in size.
Currently, using the code below, a 42MB file takes 29 minutes to process; can anyone see a way to make this faster?
I tried swapping ForEach-Object for ForEach-ObjectFast, but it made the script take significantly longer. I also tried changing the first ForEach-Object to a for loop, but it still took ~29 minutes.
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
    'bbb:ccc:456' = 'WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
    $line = $_
    $lookupTable.GetEnumerator() | ForEach-Object {
        if ($line -match $_.Key) {
            $line = $line -replace $_.Key, $_.Value
        }
    }
    $line
} | Set-Content -Path $outputfile

Since you say your input file could be 800MB in size, reading and updating the entire content in memory may well not fit.
The way to go then is to use a fast line-by-line method, and the fastest I know of is the switch statement:
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder..
$inputfile = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt' # absolute full file path because we use .Net here
# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
}
# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')
# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true) # $true for Append
switch -Regex -File $inputfile {
    $regexLookup {
        # do the replacement using the value in the lookup table.
        # because one line may contain multiple matches to replace,
        # get a System.Text.RegularExpressions.Match object to loop through all matches
        $line = $_
        $match = [regex]::Match($line, $regexLookup)
        while ($match.Success) {
            # the matched text is the literal key, so it indexes the lookup table
            # directly; re-escape it when reusing it as a -replace pattern
            $line = $line -replace [regex]::Escape($match.Value), $lookupTable[$match.Value]
            $match = $match.NextMatch()
        }
        $streamWriter.WriteLine($line)
    }
    default { $streamWriter.WriteLine($_) } # write unchanged
}
# dispose of the StreamWriter object
$streamWriter.Dispose()
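As an aside, if you prefer a single-pass replacement over the while loop, a scriptblock can act as a System.Text.RegularExpressions.MatchEvaluator delegate in recent PowerShell versions. A minimal sketch, reusing $inputfile, $regexLookup, $lookupTable and $streamWriter from above:
switch -Regex -File $inputfile {
    $regexLookup {
        # replace every match in the line in one call; the matched text is the
        # literal key, so it indexes the lookup table directly
        $streamWriter.WriteLine([regex]::Replace($_, $regexLookup, { param($m) $lookupTable[$m.Value] }))
    }
    default { $streamWriter.WriteLine($_) } # write unchanged
}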

Related

Deleting the entire CSV row if text in a column matches a specific path or file name

I'm new to PowerShell, so please try to explain things a little bit if you can. I'm trying to export the contents of a directory, along with some other information, to a CSV.
The CSV file contains information about the files; however, I just need to match the FileName column (which contains the full path). If it's matched, I need to delete the entire row.
$folder1 = 'OldFiles'
$folder2 = 'Log Files\January'
$file1 = '_updatehistory.txt'
$file2 = 'websites.config'
In the CSV file, if any of these is matched, the entire row must be deleted. The CSV file contains FileName in this manner:
**FileName**
C:\Installation\New Applications\Root
I've tried doing this:
Import-csv -Path "C:\CSV\Recursion.csv" | Where-Object { $_.FileName -ne $folder2} | Export-csv -Path "C:\CSV\RecursionUpdated.csv" -NoTypeInformation
But it's not working out. I would really appreciate help here.
It looks like you want to match only parts of the full path, so you should use the -like or -match operators (or their negated variants), which can do non-exact matching:
$excludes = '*\OldFiles', '*\Log Files\January', '*\_updatehistory.txt', '*\websites.config'
Import-csv -Path "C:\CSV\Recursion.csv" |
    Where-Object {
        # $matchesExclude will be $true if at least one exclude pattern matches
        # against FileName. Otherwise it will be $null.
        $matchesExclude = foreach( $exclude in $excludes ) {
            # Output $true if pattern matches, which will be captured in $matchesExclude.
            if( $_.FileName -like $exclude ) { $true; break }
        }
        # This outputs $true if the filename is not excluded, thus Where-Object
        # passes the row along the pipeline.
        -not $matchesExclude
    } | Export-csv -Path "C:\CSV\RecursionUpdated.csv" -NoTypeInformation
This code makes heavy use of PowerShell's implicit output behaviour. E.g., the literal $true in the foreach loop body is implicit output, which is automatically captured in $matchesExclude. If it were not for the assignment $matchesExclude = foreach ..., the value would have been written to the console instead (if not captured somewhere else in the call stack).
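For example, here is a tiny illustration (hypothetical values) of how implicit output from a foreach statement is captured by an assignment:
$firstAboveThree = foreach ($n in 1..5) {
    if ($n -gt 3) { $n; break }  # $n is implicit output, captured by the assignment
}
$firstAboveThree  # -> 4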

Powershell to Break up CSV by Number of Row

So I am now tasked with getting constant reports that are more than 1 million lines long.
My last question did not explain everything, so I'm trying to ask a better question here.
I'm getting a dozen-plus daily reports that are coming in as CSV files. I don't know what the headers are or anything like that as I get them.
They are huge. I can't open them in Excel.
I want to basically break them up into the same report, just with each report maybe 100,000 lines long.
The code I wrote below does not work, as I keep getting an
Exception of type 'System.OutOfMemoryException' was thrown.
I am guessing I need a better way to do this.
I just need this file broken down to a more manageable size.
It does not matter how long it takes, as I can run it overnight.
I found this on the internet and tried to adapt it, but I can't get it to work.
$PSScriptRoot
write-host $PSScriptRoot
$loc = $PSScriptRoot
$location = $loc
# how many rows per CSV?
$rowsMax = 10000;
# Get all CSV under current folder
$allCSVs = Get-ChildItem "$location\Split.csv"
# Read and split all of them
$allCSVs | ForEach-Object {
    Write-Host $_.Name;
    $content = Import-Csv "$location\Split.csv"
    $insertLocation = ($_.Name.Length - 4);
    for($i=1; $i -le $content.length; $i+=$rowsMax){
        $newName = $_.Name.Insert($insertLocation, "splitted_"+$i)
        $content | select -first $i | select -last $rowsMax | convertto-csv -NoTypeInformation | % { $_ -replace '"', "" } | out-file $location\$newName -fo -en ascii
    }
}
The key is not to read large files into memory in full, which is what you're doing by capturing the output from Import-Csv in a variable ($content = Import-Csv "$location\Split.csv").
That said, while using a single pipeline would solve your memory problem, performance will likely be poor, because you're converting from and back to CSV, which incurs a lot of overhead.
Even reading and writing the files as text with Get-Content and Set-Content is slow, however.
Therefore, I suggest a .NET-based approach for processing the files as text, which should substantially speed up processing.
The following code demonstrates this technique:
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
    $csvFile = $_.FullName
    # Construct a file-path template for the sequentially numbered chunk
    # files; e.g., "...\file_split_001.csv"
    $csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
    # Set how many lines make up a chunk.
    $chunkLineCount = 10000
    # Read the file lazily and save every chunk of $chunkLineCount
    # lines to a new file.
    $i = 0; $chunkNdx = 0
    foreach ($line in [IO.File]::ReadLines($csvFile)) {
        if ($i -eq 0) { ++$i; $header = $line; continue } # Save header line.
        if ($i++ % $chunkLineCount -eq 1) { # Create new chunk file.
            # Close the previous file, if any.
            if (++$chunkNdx -gt 1) { $fileWriter.Dispose() }
            # Construct the file path for the next chunk, by
            # instantiating the template with the next sequence number.
            $csvFileChunk = $csvFileChunkTemplate -f $chunkNdx
            Write-Verbose "Creating chunk: $csvFileChunk"
            # Create the next chunk file and write the header.
            $fileWriter = [IO.File]::CreateText($csvFileChunk)
            $fileWriter.WriteLine($header)
        }
        # Write a data row to the current chunk file.
        $fileWriter.WriteLine($line)
    }
    $fileWriter.Dispose() # Close the last file.
}
Note that the above code creates BOM-less UTF-8 files; if your input contains ASCII-range characters only, these files will effectively be ASCII files.
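If you do need a BOM, or a different encoding altogether, you could swap the [IO.File]::CreateText() call for a StreamWriter constructed with an explicit encoding. A sketch (a hypothetical variant, not part of the original answer):
$enc = [System.Text.UTF8Encoding]::new($true)                             # $true = emit a BOM
$fileWriter = [System.IO.StreamWriter]::new($csvFileChunk, $false, $enc)  # $false = overwrite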
Here's the equivalent single-pipeline solution, which is likely to be substantially slower.
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
    $csvFile = $_.FullName
    # Construct a file-path template for the sequentially numbered chunk
    # files; e.g., ".../file_split_001.csv"
    $csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
    # Set how many lines make up a chunk.
    $chunkLineCount = 10000
    $i = 0; $chunkNdx = 0
    Get-Content -LiteralPath $csvFile | ForEach-Object {
        if ($i -eq 0) { ++$i; $header = $_; return } # Save header line.
        if ($i++ % $chunkLineCount -eq 1) { # Create a new chunk file.
            # Construct the file path for the next chunk.
            $csvFileChunk = $csvFileChunkTemplate -f ++$chunkNdx
            Write-Verbose "Creating chunk: $csvFileChunk"
            # Create the next chunk file and write the header.
            Set-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $header
        }
        # Write a data row to the current chunk file.
        Add-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $_
    }
}
Another option, from the Linux world, is the split command. To get it on Windows, just install Git Bash; you'll then be able to use many Linux tools from CMD/PowerShell.
Below is the syntax to achieve your goal:
split -l 100000 --numeric-suffixes --suffix-length 3 --additional-suffix=.csv sourceFile.csv outputfile
It's very fast. If you want, you can wrap split.exe in a cmdlet.
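A wrapper function could look like the sketch below (hypothetical; it assumes split is on PATH, and note that unlike the .NET solution above it does not repeat the header row in each chunk):
function Split-CsvFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Lines = 100000,
        [string] $Prefix = 'outputfile'
    )
    # delegate the heavy lifting to GNU split
    split -l $Lines --numeric-suffixes --suffix-length 3 --additional-suffix=.csv $Path $Prefix
}
Split-CsvFile -Path .\sourceFile.csv -Lines 100000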

insert blank line before matching pattern in multiple files using powershell

The requirement is to insert a blank line before the matching pattern line, in multiple files.
Consider a file with the below contents:
Apple
Tree
orange
[Fruit]
Red
Green
Expected output:
Apple
Tree
orange

[Fruit]
Red
Green
I tried the code below. Help me figure out the mistake in it.
$FileName = Get-ChildItem -Filter *.ini -Recurse
$Pattern = "\[Fruit]\"
[System.Collections.ArrayList]$file = Get-Content $FileName
$insert = @()
for ($i=0; $i -lt $file.count; $i++) {
    if ($file[$i] -match $pattern) {
        $insert += $i # Record the position of the line before this one
    }
}
# Now loop the recorded array positions and insert the new text
$insert | Sort-Object -Descending | ForEach-Object { $file.insert($_," ") }
Set-Content $FileName $file
The above code works fine for a single file, but with multiple files the contents of the files get repeated.
Re: how to make this work for multiple files...
$FileName = Get-ChildItem -Filter *.ini -Recurse
If there is only one .ini file then $FileName will be a single file.
The use of the wildcard and -Recurse switch suggests that you are expecting to find multiple files; thus this command will assign that collection of files to the $FileName variable (i.e. it will be an array).
Notice that when you call Get-Content you pass $FileName:
[System.Collections.ArrayList]$file = Get-Content $FileName
This won't work when $FileName is a collection/array of files.
What you need to do is put a loop in place that will perform your "insert a line break" logic foreach (hint hint) of the files in the array; see the sketch after the regex note below. NOW go and look at those PS tutorials again...
Regex character class
Try to take the time to learn regex properly
$Pattern = "\[Fruit\]"
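For reference, a sketch combining both fixes (untested; per-file processing plus the corrected character class):
$Pattern = "\[Fruit\]"
foreach ($iniFile in Get-ChildItem -Filter *.ini -Recurse) {
    [System.Collections.ArrayList]$file = Get-Content $iniFile.FullName
    $insert = @()
    for ($i = 0; $i -lt $file.Count; $i++) {
        if ($file[$i] -match $Pattern) { $insert += $i }
    }
    # insert from the bottom up so the recorded positions stay valid
    $insert | Sort-Object -Descending | ForEach-Object { $file.Insert($_, "") }
    Set-Content -Path $iniFile.FullName -Value $file
}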

PowerShell read text file line by line and find missing file in folders

I am a novice looking for some assistance. I have a text file containing two columns of data. One column is the Vendor and one is the Invoice.
I need to scan that text file, line by line, and see if there is a match on Vendor and Invoice in a path. In the path, $Location, the first wildcard is the Vendor number and the second wildcard is the Invoice.
I want the non-matches output to a text file.
$Location = "I:\\Vendors\*\Invoices\*"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
foreach ($line in Get-Content $txt) {
if (-not($line -match $location)){$line}
}
set-content $Output -value $Line
Sample Data from txt or csv file.
kvendnum wapinvoice
000953 90269211
000953 90238674
001072 11012016
002317 448668
002419 06123711
002419 06137343
002419 06134382
002419 759208
002419 753087
002419 753069
002419 762614
003138 N6009348
003138 N6009552
003138 N6009569
003138 N6009612
003182 770016
003182 768995
003182 06133429
In the above data the only matches are on the second line (000953 90238674) and the sixth line (002419 06137343).
Untested, but here's how I'd approach it:
$Location = "I:\\Vendors\\.+\\Invoices\\.+"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
select-string -path $txt -pattern $Location -notMatch |
set-content $Output
There's no need to pick through the file line-by-line; PowerShell can do this for you using select-string. The -notMatch parameter simply inverts the search and sends through any lines that don't match the pattern.
select-string sends out a stream of matchinfo objects that contain the lines that met the search conditions. These objects actually contain far more information than just the matching line, but fortunately PowerShell is smart enough to send the relevant item through to set-content.
Regular expressions can be tricky to get right, but are worth getting your head around if you're going to do tasks like this.
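For instance, a few of the extra properties on those matchinfo objects can be surfaced like this (illustrative):
select-string -path $txt -pattern $Location -notMatch |
    select-object -first 3 -property LineNumber, Line, Path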
EDIT
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
% {
# extract fields from the line
$lineItems = $_ -split " "
# construct path based on fields from the line
$testPath = $Location -f $lineItems[0], $lineItems[1]
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
I guess we will have to pick through the file line-by-line after all. If there is a more idiomatic way to do this, it eludes me.
Code above assumes a consistent format in the input file, and uses -split to break the line into an array.
EDIT - version 3
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
select-string "(\S+)\s+(\S+)" |
%{
# pull vendor and invoice numbers from matchinfo
$vendor = $_.matches[0].groups[1]
$invoice = $_.matches[0].groups[2]
# construct path
$testPath = $Location -f $vendor, $invoice
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_.line, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
It seems that -split " " behaves differently in a running script from how it behaves on the command line. Weird. Anyway, this version uses a regular expression to parse the input line. I tested it against the example data in the original post and it seemed to work.
The regex is broken down as follows
( Start the first matching group
\S+ Greedily match one or more non-white-space characters
) End the first matching group
\s+ Greedily match one or more white-space characters
( Start the second matching group
\S+ Greedily match one or more non-white-space characters
) End the second matching group
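A quick way to sanity-check the pattern against one of the sample lines:
'000953 90269211' -match '(\S+)\s+(\S+)'  # -> True
$Matches[1]  # -> 000953
$Matches[2]  # -> 90269211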

Powershell: Search data in *.txt files to export into *.csv

First of all, this is my first question here. I often come here to browse existing topics, but now I'm stuck on my own problem, and I haven't found a helpful resource so far. My biggest worry is that it might not be doable in PowerShell at all. At the moment I'm trying to build a small PowerShell tool to save me a lot of time. For those who don't know cw-sysinfo: it is a tool that collects information about any host system (e.g., hardware ID, product key and the like) and generates *.txt files.
My point is, if you have 20, 30 or 80 servers in a project, it takes a huge amount of time to browse all the files, look for just the lines you need, and put them together in a *.csv file.
What I have working is more like the basics of the tool: it browses all *.txt files in a specific path and checks for my keywords. And here is the problem: I can only search for the keywords themselves, not the values that follow them, as in:
Operating System: Windows XP
Product Type: Professional
Service Pack: Service Pack 3
...
I don't know how to tell PowerShell to search for the "Product Type:" line and pick up the following "Professional" instead. Later on, with keys or serial numbers, it will be the same problem, which is why I can't simply browse for "Standard" or "Professional".
I placed my keywords ($controls) in an extra file that I can attach to the project folders, so I don't need to edit the script each time. The code looks like this:
Function getStringMatch
{
    # Loop through the project directory
    Foreach ($file In $files)
    {
        # Check all keywords
        ForEach ($control In $controls)
        {
            $result = Get-Content $file.FullName | Select-String $control -quiet -casesensitive
            If ($result -eq $True)
            {
                $match = $file.FullName
                # Write the filename according to the entry
                "Found : $control in: $match" | Out-File $output -Append
            }
        }
    }
}

getStringMatch
I think this is the kind of thing you need. I've changed Select-String to not use the -quiet option; this returns a matches object, one of whose properties is the line. I then split the line on the ':' and trim any spaces. The results are placed into a new PSObject, which in turn is added to an array. The array is put back on the pipeline at the end.
I also moved the call to Get-Content to avoid reading each file more than once.
# Create an array for results
$results = @()
# Loop through the project directory
Foreach ($file In $files)
{
    # load the content once
    $content = Get-Content $file.FullName
    # Check all keywords
    ForEach ($control In $controls)
    {
        # find the line containing the control string
        $result = $content | Select-String $control -casesensitive
        If ($result)
        {
            # tidy up the results and add to the array
            $line = $result.Line -split ":"
            $results += New-Object PSObject -Property @{
                FileName = $file.FullName
                Control  = $line[0].Trim()
                Value    = $line[1].Trim()
            }
        }
    }
}
# return the results
$results
Adding the results to a csv is just a case of piping the results to Export-Csv
$results | Export-Csv -Path "results.csv" -NoTypeInformation
If I understand your question correctly, you want some way to parse each line of your report files and extract values for some "keys". Here are a few lines to give you an idea of how you could proceed. The example is for one file, but can be generalized very easily.
$config = Get-Content ".\config.txt"
# The stuff you are searching for
$keys = @(
    "Operating System",
    "Product Type",
    "Service Pack"
)
foreach ($line in $config)
{
    $keys | %{
        $regex = "\s*?$($_)\:\s*(?<value>.*?)\s*$"
        if ($line -match $regex)
        {
            $value = $matches.value
            Write-Host "Key: $_`t`tValue: $value"
        }
    }
}
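To generalize this to many report files and send the results straight to a CSV, a sketch (the folder path and output name are placeholders):
Get-ChildItem "C:\Reports\*.txt" | ForEach-Object {
    $file = $_
    Get-Content $file.FullName | ForEach-Object {
        $line = $_
        foreach ($key in $keys) {
            if ($line -match "\s*?$([regex]::Escape($key))\:\s*(?<value>.*?)\s*$") {
                # emit one record per key/value hit
                [pscustomobject]@{ File = $file.Name; Key = $key; Value = $Matches.value }
            }
        }
    }
} | Export-Csv -Path "results.csv" -NoTypeInformation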