PowerShell: fetch files matching string patterns and a date range

I'm looking for some help creating a PowerShell script.
I have a folder with lots of files, and I need only those files that meet both of the conditions below:
the file must contain at least one string pattern from file1 (the content of file1 is IND 23042528525 or INDE 573626236 or DSE3523623; there can be more strings like this)
the file must also contain a date between 03152022 and 03312022 in the format MMddyyyy.
The files could be old, so this has nothing to do with creation time.
I then want to save the result in a CSV containing the paths of the files which fulfill the above two conditions.
Currently I am using the command below, which only gives me the files fulfilling the first condition.
$table = Get-Content C:\Users\username\Downloads\ISIN.txt
Get-ChildItem `
    -Path E:\data\PROD\server\InOut\Backup\*.txt `
    -Recurse |
    Select-String -Pattern $table |
    Export-Csv C:\Users\username\Downloads\File_Name.csv -NoTypeInformation

To test if a file contains at least one keyword from a list of keywords, you can use a regex for that. If you also want to find at least one valid date in the format 'MMddyyyy' in that file, you need to do some extra work.
Try the following:
# read the keywords from the file. Ensure special characters are escaped and join them with '|' (regex 'OR')
$keywords = (Get-Content -Path 'C:\Users\username\Downloads\ISIN.txt' | ForEach-Object {[regex]::Escape($_)}) -join '|'
# create a regex to capture the date pattern (8 consecutive digits)
$dateRegex = [regex]'\b(\d{8})\b' # \b means word boundary
# and a datetime variable to test if a found date is valid
$testDate = Get-Date
# set two variables to the start and end date of your range (dates only, times set to 00:00:00)
$rangeStart = [datetime]::ParseExact('03152022', 'MMddyyyy', [CultureInfo]::InvariantCulture)
$rangeEnd   = [datetime]::ParseExact('03312022', 'MMddyyyy', [CultureInfo]::InvariantCulture)
# find all .txt files and loop through. Capture the output in variable $result
$result = Get-ChildItem -Path 'E:\data\PROD\server\InOut\Backup' -Filter '*.txt' -File -Recurse |
    ForEach-Object {
        $content = Get-Content -Path $_.FullName -Raw
        # first check if any of the keywords can be found
        if ($content -match $keywords) {
            # now check if a valid date pattern 'MMddyyyy' can be found as well
            $dateFound = $false
            $match = $dateRegex.Match($content)
            while ($match.Success -and !$dateFound) {
                # we found a matching pattern. Test if this is a valid date and if so
                # set the $dateFound flag to $true and exit the while loop
                if ([datetime]::TryParseExact($match.Groups[1].Value,
                                              'MMddyyyy', [CultureInfo]::InvariantCulture,
                                              [System.Globalization.DateTimeStyles]::None,
                                              [ref]$testDate)) {
                    # check if the found date is in the set range
                    # this tests INCLUDING the start and end dates
                    $dateFound = ($testDate -ge $rangeStart -and $testDate -le $rangeEnd)
                }
                $match = $match.NextMatch()
            }
            # finally, if we also found a date in range, output the file's full path
            if ($dateFound) { $_.FullName }
        }
    }
# $result is now either empty or a list of file full names
$result | Set-Content -Path 'C:\Users\username\Downloads\MatchedFiles.txt'
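The question asks for a CSV containing the paths of the matching files; if you want that instead of the plain text file above, you could wrap each path in an object before exporting. A minimal sketch (the MatchedFiles.csv name is just an example):
# wrap each full name in an object so Export-Csv produces a 'Path' column
$result | ForEach-Object { [pscustomobject]@{ Path = $_ } } |
    Export-Csv -Path 'C:\Users\username\Downloads\MatchedFiles.csv' -NoTypeInformation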

Related

Deleting the entire CSV row if text in a column matches a specific path or a file name

I'm new to PowerShell so please try to explain things a little bit too if you can. I'm trying to export the contents of a directory along with some other information to a CSV.
The CSV file contains information about the files; however, I just need to match the FileName column (which contains the full path). If it's matched, I need to delete the entire row.
$folder1 = 'OldFiles'
$folder2 = 'Log Files\January'
$file1 = '_updatehistory.txt'
$file2 = 'websites.config'
In the CSV file, if any of these is matched, the entire row must be deleted. The CSV file contains FileName in this manner:
FileName
C:\Installation\New Applications\Root
I've tried doing this:
Import-csv -Path "C:\CSV\Recursion.csv" | Where-Object { $_.FileName -ne $folder2} | Export-csv -Path "C:\CSV\RecursionUpdated.csv" -NoTypeInformation
But it's not working out. I would really appreciate help here.
It looks like you want to match only parts of the full path, so you should use -like or -match operators (or their negated variants) which can do non-exact matching:
$excludes = '*\OldFiles', '*\Log Files\January', '*\_updatehistory.txt', '*\websites.config'
Import-csv -Path "C:\CSV\Recursion.csv" |
    Where-Object {
        # $matchesExclude will be $true if at least one exclude pattern matches
        # against FileName. Otherwise it will be $null.
        $matchesExclude = foreach( $exclude in $excludes ) {
            # Output $true if pattern matches, which will be captured in $matchesExclude.
            if( $_.FileName -like $exclude ) { $true; break }
        }
        # This outputs $true if the filename is not excluded, thus Where-Object
        # passes the row along the pipeline.
        -not $matchesExclude
    } | Export-csv -Path "C:\CSV\RecursionUpdated.csv" -NoTypeInformation
This code makes heavy use of PowerShell's implicit output behaviour. E.g. the literal $true in the foreach loop body is implicit output which is automatically captured in $matchesExclude. If it were not for the assignment $matchesExclude = foreach ..., the value would have been written to the console instead (if not captured somewhere else in the call stack).
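As a small, self-contained illustration of that implicit output behaviour (not part of the answer above):
# the bare $n inside the loop is implicit output, captured by the assignment
$captured = foreach ($n in 1..10) { if ($n % 5 -eq 0) { $n } }
$captured   # 5 and 10
Without the assignment, the values 5 and 10 would simply be written to the console.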

Parse out date from filename and sort by date

I have a series of files named as such in a folder:
- myFile201801010703.file
I'm trying to parse out the yyyymmdd portion of each filename in the folder and sort them based on the date into an array.
So if I had the following files:
myFile201801200000.file (01/20/2018)
myFile201800100000.file (01/01/2018)
myFile201801100000.file (01/10/2018)
It would sort them into an array as such:
myFile201800100000.file (01/01/2018)
myFile201801100000.file (01/10/2018)
myFile201801200000.file (01/20/2018)
I have a process that works for files with timestamps included in the name, but I have been unable to tweak it to work with only a date:
# RegEx pattern to parse the timestamps
$Pattern = '(\d{4})(\d{2})(\d{2})*\' + ".fileExtension"
$FilesList = New-Object System.Collections.ArrayList
$Temp = New-Object System.Collections.ArrayList
try {
    Get-ChildItem $SourceFolder | ForEach {
        if ($_.Name -match $Pattern) {
            Write-Verbose "Add $($_.Name)" -Verbose
            $Date = $Matches[2],$Matches[3],$Matches[1] -join '/'
            $Time = $Matches[4..6] -join ':'
            [void]$Temp.Add(
                (New-Object PSObject -Property @{
                    Date = [datetime]"$($Date) $($Time)" # If I comment out $($Time) it doesn't work.
                    File = $_
                })
            )
        }
    }
} catch {
    Write-Host "`n*** $Error ***`n"
}
# Sort the files by the parsed timestamp and add to $FilesList
$FilesList.AddRange(@($Temp | Sort Date | Select -Expand File))
# Clear out the temp collection
$Temp.Clear()
The two lines in particular that I think might be the culprits are:
$Time = $Matches[4..6] -join ':' Since I'm not parsing any time
Date = [datetime]"$($Date) $($Time)" Again, no time is parsed. Can't change the type to date either it seems?
With this format:
myFileYYYYMMddHHmm.file
the individual parts of the date and time are already arranged from largest (the year) to smallest (the minute) - this makes the string sortable!
The only thing we need to do is grab the last 12 digits of the file name before the extension:
$SortedArray = Get-ChildItem *.file | Sort-Object { $_.BaseName -replace '^.*(\d{12})$', '$1' }
The regex pattern used:
^.*(\d{12})$
Can be broken down as follows:
^ # start of string
.* # any character, 0 or more times
( # capture group
\d{12} # any digit, 12 times
) # end of capture group
$ # end of string
The regex engine will expand $1 in the substitution string to "capture group #1", i.e. the 12 digits we picked up at the end.
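To see the replacement on its own, you can run it against one of the sample names from the question (illustrative only):
# keeps only the trailing 12 digits, which sort correctly as a plain string
'myFile201801200000' -replace '^.*(\d{12})$', '$1'   # -> 201801200000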

PowerShell read text file line by line and find missing file in folders

I am a novice looking for some assistance. I have a text file containing two columns of data. One column is the Vendor and one is the Invoice.
I need to scan that text file, line by line, and see if there is a match on Vendor and Invoice in a path. In the path, $Location, the first wildcard is the Vendor number and the second wildcard is the Invoice.
I want the non-matches output to a text file.
$Location = "I:\\Vendors\*\Invoices\*"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
foreach ($line in Get-Content $txt) {
if (-not($line -match $location)){$line}
}
set-content $Output -value $Line
Sample Data from txt or csv file.
kvendnum wapinvoice
000953 90269211
000953 90238674
001072 11012016
002317 448668
002419 06123711
002419 06137343
002419 06134382
002419 759208
002419 753087
002419 753069
002419 762614
003138 N6009348
003138 N6009552
003138 N6009569
003138 N6009612
003182 770016
003182 768995
003182 06133429
In the above data the only matches are on the second line (000953 90238674) and the sixth line (002419 06137343).
Untested, but here's how I'd approach it:
$Location = "I:\\Vendors\\.+\\Invoices\\.+"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
select-string -path $txt -pattern $Location -notMatch |
set-content $Output
There's no need to pick through the file line-by-line; PowerShell can do this for you using select-string. The -notMatch parameter simply inverts the search and sends through any lines that don't match the pattern.
select-string sends out a stream of matchinfo objects that contain the lines that met the search conditions. These objects actually contain far more information than just the matching line, but fortunately PowerShell is smart enough to know how to send the relevant item through to set-content.
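If you want to see some of that extra information for yourself, you can pick a few of the matchinfo properties explicitly; for example (just an illustration, not required for the solution):
# show which file and line each non-matching row came from
select-string -path $txt -pattern $Location -notMatch |
    select-object Path, LineNumber, Line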
Regular expressions can be tricky to get right, but are worth getting your head around if you're going to do tasks like this.
EDIT
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
% {
# extract fields from the line
$lineItems = $_ -split " "
# construct path based on fields from the line
$testPath = $Location -f $lineItems[0], $lineItems[1]
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
I guess we will have to pick through the file line-by-line after all. If there is a more idiomatic way to do this, it eludes me.
Code above assumes a consistent format in the input file, and uses -split to break the line into an array.
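For example, splitting one of the sample lines on a space gives a two-element array (a quick illustration, not part of the script):
$lineItems = '000953 90269211' -split " "
$lineItems[0]   # 000953   (vendor)
$lineItems[1]   # 90269211 (invoice)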
EDIT - version 3
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
select-string "(\S+)\s+(\S+)" |
%{
# pull vendor and invoice numbers from matchinfo
$vendor = $_.matches[0].groups[1]
$invoice = $_.matches[0].groups[2]
# construct path
$testPath = $Location -f $vendor, $invoice
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_.line, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
It seemed that -split " " behaved differently in a running script than it does on the command line. Weird. Anyway, this version uses a regular expression to parse the input line. I tested it against the example data in the original post and it seemed to work.
The regex is broken down as follows
( Start the first matching group
\S+ Greedily match one or more non-white-space characters
) End the first matching group
\s+ Greedily match one or more white-space characters
( Start the second matching group
\S+ Greedily match one or more non-white-space characters
) End the second matching group
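Fed one of the sample lines, the two capture groups come out like this (illustrative only):
'000953 90269211' | select-string "(\S+)\s+(\S+)" | % {
    $_.matches[0].groups[1].Value   # 000953
    $_.matches[0].groups[2].Value   # 90269211
}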

Read numbers from multiple files and sum

I have a logfile C:\temp\data.log
It contains the following data:
totalSize = 222,6GB
totalSize = 4,2GB
totalSize = 56,2GB
My goal is to extract the numbers from the file and sum them up, including the value after the comma. So far it works if I don't include the part after the comma in the regex and only use the number in front of the comma. The other problem is when the file only contains one row, like the example below: if it only contains one line, the script splits the number 222 up and writes the digit 2 to three separate files. If the logfile contains 2 lines or more, it works and sums up as it should, as long as I don't use the value with the comma.
totalSize = 222,6GB
Here is the regex I would add to the end of the existing $regex variable to include the value after the comma:
[,](\d{1,})
I haven't included the above regex, because then it does not sum up properly.
The whole script is below:
#Create path variable to store contents grabbed from $log_file
$extracted_strings = "C:\temp\amount.txt"
#Create path variable to read from original file
$log_file = "C:\temp\data.log"
#Read data from file $log_file
Get-Content -Path $log_file | Select-String "(totalSize = )" | out-file $extracted_strings
#Create path variable to write only numbers to file $output_numbers
$output_numbers = "C:\temp\amountresult.log"
#Create path variable to write to file jobblog1
$joblog1_file = "C:\temp\joblog1.txt"
#Create path variable to write to file jobblog2
$joblog2_file = "C:\temp\joblog2.txt"
#Create path variable to write to file jobblog3
$joblog3_file = "C:\temp\joblog3.txt"
#Create path variable to write to file jobblog4
$joblog4_file = "C:\temp\joblog4.txt"
#Create path variable to write to file jobblog5
$joblog5_file = "C:\temp\joblog5.txt"
#Create pattern variable to read with select string
$regex = "[= ](\d{1,})"
select-string -Path $extracted_strings -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_numbers
(Get-Content -Path $output_numbers)[0..0] -replace '\s' > $joblog1_file
(Get-Content -Path $output_numbers)[1..1] -replace '\s' > $joblog2_file
(Get-Content -Path $output_numbers)[2..2] -replace '\s' > $joblog3_file
(Get-Content -Path $output_numbers)[3..3] -replace '\s' > $joblog4_file
(Get-Content -Path $output_numbers)[4..4] -replace '\s' > $joblog5_file
$jobdata0 = (Get-Content -Path $joblog1_file)
$jobdata1 = (Get-Content -Path $joblog2_file)
$jobdata2 = (Get-Content -Path $joblog3_file)
$jobdata3 = (Get-Content -Path $joblog4_file)
$jobdata4 = (Get-Content -Path $joblog5_file)
$result = $jobdata0 + $jobdata1 + $jobdata2 + $jobdata3 + $jobdata4
$result
So my questions are:
How can I get this to work if the file C:\temp\data.log only contains one string, without dividing that single number across multiple files? It should also work if it contains multiple strings; as it is now, it works with multiple strings.
And how can I include the comma values in the calculation?
The result I get if I run this script should be 282. Maybe it's even possible to shorten the script?
Where $log_file has contents like the example above.
Get-Content $log_file | Where-Object { $_ -match "\d+(,\d+)?" } |
    ForEach-Object { [double]($matches[0] -replace ",", ".") } |
    Measure-Object -Sum |
    Select-Object -ExpandProperty Sum
Match all of the lines that have numerical values with optional commas. I am assuming they could be optional as I do not know how whole numbers appear. Replace the comma with a period and cast as a double. Using measure object we sum up all the values and expand the result.
Not the only way to do it but it is simple enough to understand what is going on.
You can always wrap the above up in a loop so that you can use it for multiple files, e.g. Get-ChildItem "C:\temp\" -Filter "job*" | ForEach-Object ... as sketched below.
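Spelled out, that wrapper might look something like this; it is only a sketch and assumes each job* file contains the same kind of totalSize = lines as data.log:
# sum the values per file; the file names and layout are assumptions based on the note above
Get-ChildItem "C:\temp\" -Filter "job*" | ForEach-Object {
    $sum = Get-Content $_.FullName | Where-Object { $_ -match "\d+(,\d+)?" } |
        ForEach-Object { [double]($matches[0] -replace ",", ".") } |
        Measure-Object -Sum |
        Select-Object -ExpandProperty Sum
    "{0}: {1}" -f $_.Name, $sum
}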
Matt's helpful answer shows a concise and effective solution.
As for what you tried:
As for why a line with a single token such as 222,6 can result in multiple outputs in this command:
select-string -Path $extracted_strings -Pattern $regex -AllMatches |
% { $_.Matches } | % { $_.Value } > $output_numbers
Your regex, [= ](\d{1,}), does not explain the symptom, but just \d{1,} would, because that would capture 222 and 6 separately, due to -AllMatches.
[= ](\d{1,}) probably doesn't do what you want, because [= ] matches a single character that can be either a = or a space; with your sample input, this would only ever match the space before the numbers.
To match characters in sequence, simply place them next to each other: = (\d{1,})
Also note that even though you're enclosing \d{1,} in (...) to create a capture group, your later code doesn't actually use what that capture group matched; use (...) only if you need it for precedence (in which case you can even opt out of subexpression capturing with (?:...)) or if you do have a need to access what the subexpression matched.
That said, you could actually utilize a capture group here (an alternative would be to use a look-behind assertion), which allows you to both match the leading =<space> for robustness and extract only the numeric token of interest (saving you the need to trim whitespace later).
If we simplify \d{1,} to \d+ and append ,\d+ to also match the number after the comma, we get:
= (\d+,\d+)
The [System.Text.RegularExpressions.Match] instances returned by Select-String then allow us to access what the capture group captured, via the .Groups property (the following simplified example also works with multiple input lines):
> 'totalSize = 222,6GB' | Select-String '= (\d+,\d+)' | % { $_.Matches.Groups[1].Value }
222,6
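The look-behind variant mentioned above would look something like this (same sample input; the (?<== ) assertion requires a preceding '= ' without capturing it):
> 'totalSize = 222,6GB' | Select-String '(?<== )\d+,\d+' | % { $_.Matches.Value }
222,6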
On a side note: your code contains a lot of repetition that could be eliminated with arrays and pipelines; for instance:
$joblog1_file = "C:\temp\joblog1.txt"
$joblog2_file = "C:\temp\joblog2.txt"
$joblog3_file = "C:\temp\joblog3.txt"
$joblog4_file = "C:\temp\joblog4.txt"
$joblog5_file = "C:\temp\joblog5.txt"
could be replaced with (create an array of filenames, using a pipeline):
$joblog_files = 1..5 | % { "C:\temp\joblog$_.txt" }
and
$jobdata0 = (Get-Content -Path $joblog1_file)
$jobdata1 = (Get-Content -Path $joblog2_file)
$jobdata2 = (Get-Content -Path $joblog3_file)
$jobdata3 = (Get-Content -Path $joblog4_file)
$jobdata4 = (Get-Content -Path $joblog5_file)
$result = $jobdata0 + $jobdata1 + $jobdata2 + $jobdata3 + $jobdata4
could then be replaced with (pass the array of filenames to Get-Content):
$result = Get-Content $joblog_files

Find and replace strings in files in a given date range by filename

A nice tough one for you all. I'm trying to find and replace a given string in a bunch of files. The files have a date stamp in the file name, i.e. YYYY_MM_DD_file.txt.
I wish to search within a date range for these files and then replace a string I define. I cannot use the date modified as the date range; I must rely on the stamp in the filename.
So far I set my date range in WPF text fields:
$Filename = $Filenamebox.text
$startdate = [datetime] $startdatetext.text
$enddate = [datetime] $enddatetext.Text
$NewFilenamereal = $Newfilename.Text
$array =
    do {
        $startdate.ToString('yyyy_MM_dd*')
        $startdate = $startdate.AddDays(1)
    }
    until ($startdate -gt [datetime] $enddate)
$files1 = $array | foreach-object {"C:\Users\michael.lawton\Desktop\KGB\Test folder\$_"}
write-host $files1
I then get child items using the $files1 array I have created as a search mask for the files in the date range and find all matches. Store this in a variable and replace the string $filename with the new string $Newfilenamereal.
$Matches1 = get-childitem $files1 | select-string $Filename | foreach-object {$_ -replace $Filename,$Newfilenamereal} | out-string
write-host $Matches1
However, I cannot work out how to write what has been found and replaced in the $Matches1 variable back to the original files. I have tried Set-Content, but it either erases everything in the date-stamped files or cannot understand the $files1 array as a file path.
So my question to you lovely people is how do I write what I have replaced in the environment to the actual files?
Just retrieve the file content using the Get-Content cmdlet and replace the string. Finally write it back using the Set-Content cmdlet:
Get-ChildItem $files1 | ForEach-Object {
    ($_ | Get-Content -Raw) -replace $Filename, $Newfilenamereal |
        Set-Content -Path $_.FullName -Encoding UTF8
}
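One caveat worth adding (not part of the answer above): -replace treats $Filename as a regular expression, so if the string you are replacing can contain regex metacharacters, escape it first, for example:
# escape the literal search string before using it with -replace
$escaped = [regex]::Escape($Filename)
Get-ChildItem $files1 | ForEach-Object {
    ($_ | Get-Content -Raw) -replace $escaped, $Newfilenamereal |
        Set-Content -Path $_.FullName -Encoding UTF8
}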