PowerShell read text file line by line and find missing file in folders - powershell

I am a novice looking for some assistance. I have a text file containing two columns of data. One column is the Vendor and one is the Invoice.
I need to scan that text file, line by line, and see if there is a match on Vendor and Invoice in a path. In the path, $Location, the first wildcard is the Vendor number and the second wildcard is the Invoice.
I want the non-matches written to a text file.
$Location = "I:\\Vendors\*\Invoices\*"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
foreach ($line in Get-Content $txt) {
if (-not($line -match $location)){$line}
}
set-content $Output -value $Line
Sample Data from txt or csv file.
kvendnum wapinvoice
000953 90269211
000953 90238674
001072 11012016
002317 448668
002419 06123711
002419 06137343
002419 06134382
002419 759208
002419 753087
002419 753069
002419 762614
003138 N6009348
003138 N6009552
003138 N6009569
003138 N6009612
003182 770016
003182 768995
003182 06133429
In above data the only match is on the second line: 000953 90238674
and the 6th line: 002419 06137343

Untested, but here's how I'd approach it:
$Location = "I:\\Vendors\\.+\\Invoices\\.+"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
select-string -path $txt -pattern $Location -notMatch |
set-content $Output
There's no need to pick through the file line-by-line; PowerShell can do this for you using select-string. The -notMatch parameter simply inverts the search and sends through any lines that don't match the pattern.
select-string sends out a stream of matchinfo objects that contain the lines that met the search conditions. These objects contain far more information than just the matching line. When they are piped to set-content they are converted to strings, which for file input include the file name and line number, so select the Line property if you only want the text.
Regular expressions can be tricky to get right, but are worth getting your head around if you're going to do tasks like this.
EDIT
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
% {
# extract fields from the line
$lineItems = $_ -split " "
# construct path based on fields from the line
$testPath = $Location -f $lineItems[0], $lineItems[1]
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
I guess we will have to pick through the file line-by-line after all. If there is a more idiomatic way to do this, it eludes me.
Code above assumes a consistent format in the input file, and uses -split to break the line into an array.
EDIT - version 3
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
select-string "(\S+)\s+(\S+)" |
%{
# pull vendor and invoice numbers from matchinfo
$vendor = $_.matches[0].groups[1]
$invoice = $_.matches[0].groups[2]
# construct path
$testPath = $Location -f $vendor, $invoice
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_.line, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
It seemed that the -split " " behaved differently in a running script to how it behaves on the command line. Weird. Anyway, this version uses a regular expression to parse the input line. I tested it against the example data in the original post and it seemed to work.
The regex is broken down as follows
( Start the first matching group
\S+ Greedily match one or more non-white-space characters
) End the first matching group
\s+ Greedily match one or more white-space characters
( Start the second matching group
\S+ Greedily match one or more non-white-space characters
) End the second matching group
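For what it's worth, here is a minimal, self-contained sketch of the same idea using the -match operator and Test-Path instead of select-string and get-item. It is not the code from the answer above: the temporary folder tree and the sample lines are made up so that it runs anywhere, and the header row is skipped on the assumption that the first line of the file is always the column names.

```powershell
# Build a throwaway folder tree (hypothetical layout: <root>\<vendor>\Invoices\<invoice>.pdf)
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("vendors_" + [guid]::NewGuid().ToString('N'))
$invoiceDir = [System.IO.Path]::Combine($root, '000953', 'Invoices')
New-Item -ItemType Directory -Path $invoiceDir -Force | Out-Null
New-Item -ItemType File -Path (Join-Path $invoiceDir '90238674.pdf') | Out-Null

# Sample input lines, including the header row and uneven whitespace
$lines = @(
    'kvendnum wapinvoice'
    '000953 90269211'
    '000953 90238674'
    '002317   448668'
)

# Collect every line whose invoice PDF does not exist
$missing = foreach ($line in ($lines | Select-Object -Skip 1)) {
    if ($line -match '^\s*(\S+)\s+(\S+)\s*$') {
        # $Matches[1] is the vendor number, $Matches[2] is the invoice number
        $pdf = [System.IO.Path]::Combine($root, $Matches[1], 'Invoices', $Matches[2] + '.pdf')
        if (-not (Test-Path -LiteralPath $pdf -PathType Leaf)) { $line }
    }
}

$missing

# Clean up the throwaway tree
Remove-Item -LiteralPath $root -Recurse -Force
```

Test-Path returns a plain boolean, so there is no error to suppress when the file is absent. In the real script, $missing would be piped to Set-Content.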

Related

Scanning log file using ForEach-Object and replacing text is taking a very long time

I have a PowerShell script that scans log files and replaces text when a match is found. The lookup list is currently 500 lines, and I plan to double or triple this. The log files can range from 400KB to 800MB in size.
Currently, using the code below, a 42MB file takes 29 minutes. Can anyone see a way to make this faster?
I tried replacing ForEach-Object with ForEach-ObjectFast, but that made the script take significantly longer. I also tried changing the first ForEach-Object to a for loop, but it still took ~29 minutes.
$lookupTable = @{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
'bbb:ccc:456'='WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
$line=$_
$lookupTable.GetEnumerator() | ForEach-Object {
if ($line -match $_.Key)
{
$line = $line -replace $_.Key, $_.Value
}
}
$line
} | Set-Content -Path $outputfile
Since you say your input file could be 800MB in size, reading and updating the entire content in memory might not be feasible.
The way to go then is to use a fast line-by-line method, and the fastest I know of is switch:
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder..
$inputfile = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt' # absolute full file path because we use .Net here
# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }
$lookupTable = @{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
}
# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')
# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true) # $true for Append
switch -Regex -File $inputfile {
$regexLookup {
# do the replacement using the value in the lookup table.
# because in one line there may be multiple matches to replace
# get a System.Text.RegularExpressions.Match object to loop through all matches
$line = $_
$match = [regex]::Match($line, $regexLookup)
while ($match.Success) {
# the match value is the literal text found in the line, so use it directly as the
# lookup key and replace it with the literal String.Replace() method (no regex involved)
$line = $line.Replace($match.Value, $lookupTable[$match.Value])
$match = $match.NextMatch()
}
$streamWriter.WriteLine($line)
}
default { $streamWriter.WriteLine($_) } # write unchanged
}
# dispose of the StreamWriter object
$streamWriter.Dispose()
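As a possible alternative to the inner while loop, all matches in a line can be replaced in one pass by handing [regex]::Replace a script block cast to a MatchEvaluator. This is only a sketch under a few assumptions: the sample line is invented, the keys never overlap at the same position, and the IgnoreCase option is my choice, made to stay consistent with the case-insensitive lookup of a PowerShell hashtable.

```powershell
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
    'bbb:ccc:456' = 'WORDB:WORDBC:NUMBER456'
}

# One alternation built from the escaped keys, compiled once and reused for every line
$pattern = ($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|'
$regex = [regex]::new($pattern, 'IgnoreCase')

# The script block is called once per match and returns the replacement text
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    $lookupTable[$m.Value]
}

# Hypothetical sample line; in the real script this would be $_ inside the switch
$sample = 'start aaa:bbb:123 middle bbb:ccc:456 end'
$replaced = $regex.Replace($sample, $evaluator)
$replaced
```

Because the evaluator returns the table value as-is, a $ in a replacement value is not treated as a substitution token, unlike with the -replace operator.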

Powershell Files fetch

I am looking for some help creating a PowerShell script.
I have a folder with lots of files, and I need only those files whose content meets both of these conditions:
The file must contain a string matching any pattern listed in file1 (file1 contains strings such as -IND 23042528525, INDE 573626236 or DSE3523623; there can be more strings like these).
The file must also contain a date between 03152022 and 03312022, in the format mmddyyyy.
The files could be old, so the creation time is irrelevant.
The result should then be saved to a CSV containing the paths of the files that fulfil both conditions.
Currently I am using the command below, which only gives me the files fulfilling the first condition.
$table = Get-Content C:\Users\username\Downloads\ISIN.txt
Get-ChildItem `
-Path E:\data\PROD\server\InOut\Backup\*.txt `
-Recurse |
Select-String -Pattern ($table)|
Export-Csv C:\Users\username\Downloads\File_Name.csv -NoTypeInformation
To test if a file contains a certain keyword from a range of keywords, you can use regex for that. If you also want to find at least one valid date in format 'MMddyyyy' in that file, you need to do some extra work.
Try below:
# read the keywords from the file. Ensure special characters are escaped and join them with '|' (regex 'OR')
$keywords = (Get-Content -Path 'C:\Users\username\Downloads\ISIN.txt' | ForEach-Object {[regex]::Escape($_)}) -join '|'
# create a regex to capture the date pattern (8 consecutive digits)
$dateRegex = [regex]'\b(\d{8})\b' # \b means word boundary
# and a datetime variable to test if a found date is valid
$testDate = Get-Date
# set two variables to the start and end date of your range (dates only, times set to 00:00:00)
$rangeStart = (Get-Date).AddDays(1).Date # tomorrow
$rangeEnd = [DateTime]::new($rangeStart.Year, $rangeStart.Month, 1).AddMonths(1).AddDays(-1) # end of the month
# find all .txt files and loop through. Capture the output in variable $result
$result = Get-ChildItem -Path 'E:\data\PROD\server\InOut\Backup' -Filter '*.txt' -File -Recurse |
ForEach-Object {
$content = Get-Content -Path $_.FullName -Raw
# first check if any of the keywords can be found
if ($content -match $keywords) {
# now check if a valid date pattern 'MMddyyyy' can be found as well
$dateFound = $false
$match = $dateRegex.Match($content)
while ($match.Success -and !$dateFound) {
# we found a matching pattern. Test if this is a valid date and if so
# set the $dateFound flag to $true and exit the while loop
if ([datetime]::TryParseExact($match.Groups[1].Value,
'MMddyyyy',[CultureInfo]::InvariantCulture,
[System.Globalization.DateTimeStyles]::None,
[ref]$testDate)) {
# check if the found date is in the set range
# this tests INCLUDING the start and end dates
$dateFound = ($testDate -ge $rangeStart -and $testDate -le $rangeEnd)
}
$match = $match.NextMatch()
}
# finally, if we also successfully found a date pattern, output the file
if ($dateFound) { $_.FullName }
elseif ($content -match '\bUNKNOWN\b') {
# here you output again, because unknown was found instead of a valid date in range
$_.FullName
}
}
}
# result is now either empty or a list of file fullnames
$result | set-content -Path 'C:\Users\username\Downloads\MatchedFiles.txt'
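The code above computes its range from today's date. If a fixed range is wanted instead, such as the 03152022 to 03312022 range mentioned in the question, the date test could look roughly like the sketch below. The candidate strings are invented purely to show one date in range, one out of range and one that is not a valid date.

```powershell
# Fixed range taken from the question (inclusive on both ends)
$rangeStart = [datetime]::new(2022, 3, 15)
$rangeEnd   = [datetime]::new(2022, 3, 31)

# Variable that TryParseExact fills in through [ref]
$parsed = [datetime]::MinValue

# Hypothetical 8-digit strings as they might be captured by \b(\d{8})\b
$candidates = '03202022', '04012022', '13452022'

$inRange = foreach ($text in $candidates) {
    # TryParseExact returns $false for strings such as 13452022 (month 13)
    if ([datetime]::TryParseExact($text, 'MMddyyyy', [cultureinfo]::InvariantCulture,
            [System.Globalization.DateTimeStyles]::None, [ref]$parsed)) {
        if ($parsed -ge $rangeStart -and $parsed -le $rangeEnd) { $text }
    }
}

$inRange
```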

Using regex in a key/value lookup table in powershell?

I am creating the below script to search through and replace data in a set of files. The problem I'm running into is I need to ONLY match if it's the beginning of the line, and I'm not sure how/where would I use regex in the below example (e.g. ^A, ^B) when doing the comparison? I tried putting the caret in front of the name values in the table, but that didn't work...
$lookupTable = @{
'A'='1';
'B'='2'
#etc
}
Get-ChildItem 'c:\windows\system32\dns' -Filter *.dns |
Foreach-Object {
$file = $_
Write-Host "$file"
(Get-Content -Path $file -Raw) | ForEach-Object {
$line = $_
$lookupTable.GetEnumerator() | ForEach-Object {
$line = $line -replace $_.Name, $_.Value
}
$line
} | Set-Content -Path $file
}
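The -replace operator accepts regex: just use $line = $line -replace "^$($_.Name)", $_.Value. Because the file is read with -Raw, ^ matches only the start of the whole string unless the pattern begins with the multiline option (?m).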
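A small sketch of how the anchored lookup might behave on a multi-line string such as the one Get-Content -Raw returns. The sample text is made up, and the use of (?m), [regex]::Escape and the case-sensitive -creplace are my own choices rather than part of the question.

```powershell
$lookupTable = @{ 'A' = '1'; 'B' = '2' }

# Stand-in for the content of one file read with -Raw (lines separated by newlines)
$text = "A record`nB record`nxA stays"

foreach ($entry in $lookupTable.GetEnumerator()) {
    # (?m) makes ^ match at the start of every line, not only the start of the string
    $pattern = '(?m)^' + [regex]::Escape($entry.Key)
    $text = $text -creplace $pattern, $entry.Value
}

$text
```

The A in the third line is left alone because it is not at the start of a line.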
the way that regex works makes getting a proper "start of line" marker into the regex pattern along with the $VarName a tad iffy. so i broke it out into its own line and used the -f string format operator to build the regex pattern.
then i used the way that -replace works on an array of strings that one usually gets from Get-Content to work on the whole array at each pass.
note that the strings have lower case items where they otta be replaced, and uppercase items where the item should NOT be replaced. [grin]
$LookUpTable = @{
A = 'Wizbang Shadooby'
Z = '666 is the number of the beast'
}
$LineList = @(
'a sdfq A er Z xcv'
'qwertyuiop A'
'z xcvbnm'
'z A xcvbnm'
'qwertyuiop Z'
)
$LookUpTable.GetEnumerator() |
ForEach-Object {
$Target = '^{0}' -f $_.Name
$LineList = $LineList -replace $Target, $_.Value
}
$LineList
output ...
Wizbang Shadooby sdfq A er Z xcv
qwertyuiop A
666 is the number of the beast xcvbnm
666 is the number of the beast A xcvbnm
qwertyuiop Z
# Here is a complete, working script that beginners can read.
# This thread
# Using regex in a key/value lookup table in powershell?
# https://stackoverflow.com/questions/57277282/using-regex-in-a-key-value-lookup-table-in-powershell
# User-modifiable variables.
# substitutions
# We need to specify what we're looking for (keys).
# We need to specify our substitutions (values).
# Example: Looking for A and substituting 1 in its place.
# Add as many pairs as you like.
# Here I use an array of objects instead of a Hashtable so that I can specify upper- and lowercase matches.
# Use the regular expression caret (^) to match the beginning of a line.
$substitutions = @(
[PSCustomObject]@{ Key = '^A'; Value = '1' },
[PSCustomObject]@{ Key = '^B'; Value = '2' },
[PSCustomObject]@{ Key = '^Sit'; Value = '[Replaced Text]' }, # Example for my Latin placeholder text.
[PSCustomObject]@{ Key = 'nihil'; Value = '[replaced text 2]' }, # Lowercase example.
[PSCustomObject]@{ Key = 'Nihil'; Value = '[Replaced Text 3]' } # Omit comma for the last array item.
)
# Folder where we are looking for files.
$inputFolder = 'C:\Users\Michael\PowerShell\Using regex in a key value lookup table in powershell\input'
# Here I've created some sample files using Latin placeholder text from
# https://lipsum.com/
# Folder where we are saving the modified files.
# This can be the same as the input folder.
# I'm creating this so we can test without corrupting the original files.
$outputFolder = 'C:\Users\Michael\PowerShell\Using regex in a key value lookup table in powershell\output'
#$outputFolder = $inputFolder
# We are only interested in files ending with .dns
$filterString = '*.dns'
# Here is an example for text files.
#$filterString = '*.txt'
# For all files.
#$filterString = '*.*'
# More info.
# https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/get-childitem?view=powershell-6#parameters
# Search on the page for -Filter
# You won't need to update any variables after this line.
# ===================================================================
# Generate a list of files to look at.
$fileList = Get-ChildItem $inputFolder -Filter $filterString
# Simple example.
# get-content .\apple.dns | % { $_ -replace "sit", "michael" } | set-content "C:\output\apple.dns"
# input file substitutions output
# Set up loops.
# For each file.
#{
# For each key-value pair.
#}
# "For each key-value pair."
# Create a function.
# Pipe in a string.
# Specify a list of substitutions.
# Make the substitutions.
# Output a modified string.
filter find_and_replace ([object[]] $substitutions)
{
# The automatic variable $_ will be a line from the file.
# This comes from the pipeline.
# Copy the input string.
# This avoids modifying a pipeline object.
$myString = $_
# Look at each key-value pair passed to the function.
# In practice, these are the ones we defined at the top of the script.
foreach ($pair in $substitutions)
{
# Modify the strings.
# Update the string after each search.
# case-sensitive -creplace instead of -replace
$myString = $myString -creplace $pair.Key, $pair.Value
}
# Output the final, modified string.
$myString
}
# "For each file."
# main
# Do something with each file.
foreach ($file in $fileList)
{
# Where are we saving the output?
$outputFile = Join-Path -Path $outputFolder -ChildPath $file.Name
# Create a pipeline.
# Pipe strings to our function.
# Let the function modify the strings.
# Save the output to the output folder.
# This mirrors our simple example but with dynamic files and substitutions.
# find_and_replace receives strings from the pipeline and we pass $substitutions into it.
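# Use .FullName: in Windows PowerShell a FileInfo object may convert to just the file name.
Get-Content $file.FullName | find_and_replace $substitutions | Set-Content $outputFile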
# The problem with piping files into a pipeline is that
# by the time the pipeline gets to Set-Content,
# we only have modified strings
# and we have no information to create the path for an output file.
# ex [System.IO.FileInfo[]] | [String[]] | [String] | Set-Content ?
#
# Instead, we're in a loop that preserves context.
# And we have the opportunity to create and use the variable $outputFile
# ex foreach ($file in [System.IO.FileInfo[]])
# ex $outputFile = ... $file ...
# ex [String[]] | [String] | Set-Content $outputFile
# Quote
# (Get-Content -Path $file -Raw)
# By omitting -Raw, we get: one string for each line.
# This is instead of getting: one string for the whole file.
# This keeps us from having to use
# the .NET regular expression multiline option (and the subexpression \r?$)
# while matching.
#
# What it is.
# Multiline Mode
# https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options#Multiline
#
# How you would get started.
# Miscellaneous Constructs in Regular Expressions
# https://learn.microsoft.com/en-us/dotnet/standard/base-types/miscellaneous-constructs-in-regular-expressions
}

insert blank line before matching pattern in multiple files using powershell

Requirement is to insert a blank line in multiple files before the matching pattern line
Consider a file with below contents
Apple
Tree
orange
[Fruit]
Red
Green
Expected output:
Apple
Tree
orange

[Fruit]
Red
Green
I tried the code below. Please help me figure out the mistake in it.
$FileName = Get-ChildItem -Filter *.ini -Recurse
$Pattern = "\[Fruit]\"
[System.Collections.ArrayList]$file = Get-Content $FileName
$insert = @()
for ($i=0; $i -lt $file.count; $i++) {
if ($file[$i] -match $pattern) {
$insert += $i #Record the position of the line before this one
}
}
#Now loop the recorded array positions and insert the new text
$insert | Sort-Object -Descending | ForEach-Object { $file.insert($_," ") }
Set-Content $FileName $file
The above code works fine for a single file, but for multiple files the contents of the files are repeated.
Re: how to make this work for multiple files...
$FileName = Get-ChildItem -Filter *.ini -Recurse
If there is only one .ini file then $FileName will be a single file.
The use of the wildcard and -Recurse switch suggests that you are expecting to find multiple files; thus this command will assign that collection of files to the $FileName variable (i.e. it will be an array).
Notice that when you call Get-Content you pass $FileName:
[System.Collections.ArrayList]$file = Get-Content $FileName
This won't work when $FileName is a collection/array of files.
What you need to do is put a loop in place that will perform your "insert a line break" logic foreach (hint hint) of the files in the array. NOW go and look at those PS tutorials again...
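Regex character class: square brackets have a special meaning in regex, so try to take the time to learn regex properly. Escape both brackets and do not leave a trailing backslash in the pattern: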
$Pattern = "\[Fruit\]"
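For readers who want to see the shape of the loop hinted at above, here is one possible sketch. It is not the only way to do it: the temporary folder and the two sample .ini files are invented so that the example can run on its own, and it inserts an empty string rather than the single space used in the question.

```powershell
# Throwaway folder with two hypothetical .ini files
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("ini_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.ini') -Value 'Apple', 'orange', '[Fruit]', 'Red'
Set-Content -Path (Join-Path $dir 'b.ini') -Value '[Fruit]', 'Green'

$pattern = '^\[Fruit\]'

# Handle each file separately so the contents are never mixed together
foreach ($iniFile in Get-ChildItem -Path $dir -Filter *.ini -Recurse) {
    $updated = foreach ($line in Get-Content -LiteralPath $iniFile.FullName) {
        # Emit an empty line just before any line that matches the pattern
        if ($line -match $pattern) { '' }
        $line
    }
    Set-Content -LiteralPath $iniFile.FullName -Value $updated
}

# Read the results back, then clean up
$resultA = Get-Content -LiteralPath (Join-Path $dir 'a.ini')
$resultB = Get-Content -LiteralPath (Join-Path $dir 'b.ini')
$resultA
Remove-Item -LiteralPath $dir -Recurse -Force
```

Building the new content while reading avoids the second pass over recorded insert positions.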

Powershell: Search data in *.txt files to export into *.csv

First of all, this is my first question here. I often come here to browse existing topics, but now I'm stuck on my own problem, and I haven't found a helpful resource so far. My biggest concern is that it won't work in PowerShell... At the moment I'm trying to build a small PowerShell tool to save me a lot of time. For those who don't know cw-sysinfo, it is a tool that collects information about any host system (e.g. Hardware-ID, Product Key and stuff like that) and generates *.txt files.
My point is, if you have 20, 30 or 80 servers in a project, it takes a huge amount of time to browse all the files, look for just those lines you need and put them together in a *.csv file.
What I have working is more like the basics of the tool: it browses all *.txt files in a specific path and checks for my keywords. The problem is that I can only search for the words that precede the ones I really need, as seen below:
Operating System: Windows XP
Product Type: Professional
Service Pack: Service Pack 3
...
I don't know how I can tell Powershell to search for "Product Type:"-line and pick the following "Professional" instead. Later on with keys or serial numbers it will be the same problem, that is why I just can't browse for "Standard" or "Professional".
I placed my keywords($controls) in an extra file that I can attach the project folders and don't need to edit in Powershell each time. Code looks like this:
Function getStringMatch
{
# Loop through the project directory
Foreach ($file In $files)
{
# Check all keywords
ForEach ($control In $controls)
{
$result = Get-Content $file.FullName | Select-String $control -quiet -casesensitive
If ($result -eq $True)
{
$match = $file.FullName
# Write the filename according to the entry
"Found : $control in: $match" | Out-File $output -Append
}
}
}
}
getStringMatch
I think this is the kind of thing you need. I've changed Select-String to not use the -quiet option, so it returns a MatchInfo object; one of its properties is the matching line, which I then split on the ':' and trim of any spaces. These results are placed into a new PSObject, which in turn is added to an array. The array is put back on the pipeline at the end.
I also moved the call to get-content to avoid reading each file more than once.
# Create an array for results
$results = @()
# Loop through the project directory
Foreach ($file In $files)
{
# load the content once
$content = Get-Content $file.FullName
# Check all keywords
ForEach ($control In $controls)
{
# find the line containing the control string
$result = $content | Select-String $control -casesensitive
If ($result)
{
# tidy up the results and add to the array
$line = $result.Line -split ":"
$results += New-Object PSObject -Property @{
FileName = $file.FullName
Control = $line[0].Trim()
Value = $line[1].Trim()
}
}
}
}
# return the results
$results
Adding the results to a csv is just a case of piping the results to Export-Csv
$results | Export-Csv -Path "results.csv" -NoTypeInformation
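One thing to watch for when splitting on ':' is a value that contains a colon itself. A possible way around it, sketched here with made-up report lines (the "Install Time" entry is hypothetical), is to limit -split to two substrings so only the first colon is used as the separator.

```powershell
# Hypothetical lines in the style of the report shown in the question
$reportLines = @(
    'Operating System:       Windows XP'
    'Product Type:           Professional'
    'Install Time:           12:45:10'
)

$rows = foreach ($line in $reportLines) {
    # The second argument (2) limits the split to two parts: key and everything after it
    $key, $value = $line -split ':', 2
    [PSCustomObject]@{
        Control = $key.Trim()
        Value   = $value.Trim()
    }
}

$rows
```

The resulting objects can be piped to Export-Csv in the same way as $results above.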
If I understand your question correctly, you want some way to parse each line from your report files and extract values for some "keys". Here are a few lines to give you an idea of how you could proceed. The example is for one file, but can be generalized very easily.
$config = Get-Content ".\config.txt"
# The stuff you are searching for
$keys = @(
"Operating System",
"Product Type",
"Service Pack"
)
foreach ($line in $config)
{
$keys | %{
$regex = "\s*?$($_)\:\s*(?<value>.*?)\s*$"
if ($line -match $regex)
{
$value = $matches.value
Write-Host "Key: $_`t`tValue: $value"
}
}
}