Finding and changing a string inside a text file using Powershell

I am trying to change the domains of email addresses inside a text file, for example from "john@me.com" to "john@gmail.com". The emails are stored in an array, and I am currently using a for loop with the -replace operator, but I cannot get it to work. Here is the code that I have so far:
$folders = @('Folder1','Folder2','Folder3','Folder4','Folder5')
$names = @('John','Mary','Luis','Gary', 'Gil')
$emails = @("John@domain.com", "Mary@domain.com", "Luis@domain.com", "Gary@domain.com", "Gil@domain.com")
$emails2 = @("John@company.com", "Mary@company.com", "Luis@company.com", "Gary@company.com", "Gil@company.com")
$content = "C:\Users\john\Desktop\contact.txt"
#create 10 new local users
foreach ($user in $users){ New-LocalUser -Name $user -Description "This is a test account." -NoPassword }
#Create 5 folders on desktop
$folders.foreach({New-Item -Path "C:\Users\John\Desktop\$_" -ItemType directory})
#create 5 folders in documents
$folders.foreach({New-Item -Path "C:\users\john\Documents\$_" -ItemType directory})
#create contact.txt
New-Item -Path "C:\Users\John\Desktop" -Name "contact.txt"
#add 5 names to file
ForEach($name in $names){$name | Add-Content -Path "C:\Users\John\Desktop\contact.txt"}
#add 5 emails to file
ForEach($email in $emails){$email | Add-Content -Path "C:\Users\John\Desktop\contact.txt"}
#change emails to @company.com
for($i = 0; $i -lt 5; $i++){$emails -replace "$emails[$i]", $emails2[$i]}

In your particular example, you want to replace one string with another in each of your array elements. You can do that without looping:
$emails = $emails -replace '@domain\.com$','@company.com'
Since -replace uses regex matching, the . metacharacter must be escaped to be matched literally. In your case it probably does not matter since . matches any character, but for completeness, you should escape it.
Using the .NET Regex class method Escape(), you can programmatically escape metacharacters.
$emails -replace [regex]::Escape('@domain.com'),'@company.com'
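To see why the escape matters, here is a small sketch. The second address ('Mary@domainXcom') is a made-up value that is not in the question; it only exists to show that an unescaped . matches any character:

```powershell
# 'Mary@domainXcom' is a hypothetical value, used only to show the difference.
$samples = 'John@domain.com', 'Mary@domainXcom'

# Unescaped: . matches any character, so both strings are rewritten.
$loose = $samples -replace '@domain.com$', '@company.com'

# Escaped: only a literal dot matches, so only the first string changes.
$strict = $samples -replace '@domain\.com$', '@company.com'

$loose    # John@company.com, Mary@company.com
$strict   # John@company.com, Mary@domainXcom
```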
With your code, in order to update $emails, you need to interpolate your array strings properly and update your variable on each loop iteration:
for($i = 0; $i -lt 5; $i++) {
    $emails = $emails -replace $emails[$i], $emails2[$i]
}
$emails # displays the updates
If $emails contains other regex metacharacters besides just the single ., it could be another reason why you are having matching issues. It would then just be easiest to escape the metacharacters:
for($i = 0; $i -lt 5; $i++) {
    $emails = $emails -replace [regex]::Escape($emails[$i]), $emails2[$i]
}
$emails # displays the updates
Explanation:
When a double-quoted string is parsed (as opposed to a verbatim, single-quoted string), the parser performs string expansion. When this happens to variable references that include operators, only the variables are expanded, and the rest of the quoted expression, including the operator characters, is treated as literal text. You can see this with a trivial example:
$str = 'my string 1','my string 2'
"$str[0]"
Output:
my string 1 my string 2[0]
To get around this behavior, you either need to not use quotes around the expression or use the sub-expression operator $():
$str[0]
"$($str[0])"
Note that a quoted array reference will convert the array into a string. Each element of that array will be separated based on the $OFS value (single space by default) of your environment.
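The effect of $OFS can be sketched as follows. The ', ' separator is an arbitrary choice for illustration, and it is set inside a child scope so that the caller's environment is left untouched:

```powershell
$str = 'my string 1', 'my string 2'

# Default behavior: elements are joined with a single space.
$default = "$str"

# Inside a child scope, a custom $OFS changes the separator
# without affecting the rest of the session.
$custom = & { $OFS = ', '; "$str" }

$default   # my string 1 my string 2
$custom    # my string 1, my string 2
```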

Related

collect a value per console and multiply powershell

I am trying to read a value from the console, in this case an IP address, and write it to a file once per line, with the last part auto-incremented up to 254.
I tried a for loop, but it does not produce the file contents I want. Here is my code:
$ipaddress=$args[0]
New-Item .\direcciones.txt -Force
$filetext= New-Item .\direcciones.txt -Force
for ($i=1, $i -le 254;)
{ Add-Content -Path $filetext -Value $ipaddress.$i }
I expect the text file to contain something like this:
192.168.1.1
192.168.1.2
192.168.1.3
...
192.168.1.254
The auto-incremented value I want is the last octet; the first three octets come from $args[0].

The key is to enclose $ipaddress.$i in "..."; additionally, you can streamline your code:
1..254 | ForEach-Object { "$ipaddress.$_" } | Set-Content $filetext
Or, more efficiently, but more obscurely:
1..254 -replace '^', "$ipaddress." | Set-Content $filetext
As for what you tried:
(This may just be a posting artifact) for ($i=1, $i -le 254;) creates an infinite loop, because you're missing ; ++$i for incrementing the iterator variable $i.
Even in argument-parsing mode, $ipaddress.$i - in the absence of enclosure in "..." - is interpreted as an expression, meaning that $i is interpreted as the name of a property to access on $ipaddress, which therefore results in $null, so that no data is written to the output file.
Only inside an expandable (double-quoted) string ("...") - i.e. "$ipaddress.$i" in this case - are $ipaddress and $i expanded individually.
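The difference between the two forms can be checked directly. This is a minimal sketch that assumes strict mode is off (the default) and uses a sample value in place of $args[0]:

```powershell
$ipaddress = '192.168.1'   # sample value standing in for $args[0]
$i = 7

# Unquoted: PowerShell looks for a property named '7' on the string
# and finds none, so the result is $null.
$asExpression = $ipaddress.$i

# Quoted: each variable is expanded on its own, and the dot is literal text.
$asString = "$ipaddress.$i"

$null -eq $asExpression   # True
$asString                 # 192.168.1.7
```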

Powershell Files fetch

I am looking for some help creating a PowerShell script.
I have a folder with lots of files, and I need only those files that meet both of the following conditions:
The file contains a string matching any of the patterns listed in file1 (the content of file1 is -IND 23042528525 or INDE 573626236 or DSE3523623; there can be more strings like these).
The file also contains a date between 03152022 and 03312022, in the format mmddyyyy.
The files could be old, so the creation time is not relevant.
The result should then be saved to a CSV containing the paths of the files that fulfill the two conditions above.
Currently I am using the command below, which only gives me the files that fulfill the first condition:
$table = Get-Content C:\Users\username\Downloads\ISIN.txt
Get-ChildItem `
-Path E:\data\PROD\server\InOut\Backup\*.txt `
-Recurse |
Select-String -Pattern ($table)|
Export-Csv C:\Users\username\Downloads\File_Name.csv -NoTypeInformation

To test if a file contains a certain keyword from a range of keywords, you can use regex for that. If you also want to find at least one valid date in format 'MMddyyyy' in that file, you need to do some extra work.
Try below:
# read the keywords from the file. Ensure special characters are escaped and join them with '|' (regex 'OR')
$keywords = (Get-Content -Path 'C:\Users\username\Downloads\ISIN.txt' | ForEach-Object {[regex]::Escape($_)}) -join '|'
# create a regex to capture the date pattern (8 consecutive digits)
$dateRegex = [regex]'\b(\d{8})\b' # \b means word boundary
# and a datetime variable to test if a found date is valid
$testDate = Get-Date
# set two variables to the start and end date of your range (dates only, times set to 00:00:00)
$rangeStart = (Get-Date).AddDays(1).Date # tomorrow
$rangeEnd = [DateTime]::new($rangeStart.Year, $rangeStart.Month, 1).AddMonths(1).AddDays(-1) # end of the month
# find all .txt files and loop through. Capture the output in variable $result
$result = Get-ChildItem -Path 'E:\data\PROD\server\InOut\Backup' -Filter '*.txt' -File -Recurse |
    ForEach-Object {
        $content = Get-Content -Path $_.FullName -Raw
        # first check if any of the keywords can be found
        if ($content -match $keywords) {
            # now check if a valid date pattern 'MMddyyyy' can be found as well
            $dateFound = $false
            $match = $dateRegex.Match($content)
            while ($match.Success -and !$dateFound) {
                # we found a matching pattern. Test if this is a valid date and if so
                # set the $dateFound flag to $true and exit the while loop
                if ([datetime]::TryParseExact($match.Groups[1].Value,
                                              'MMddyyyy',[CultureInfo]::InvariantCulture,
                                              [System.Globalization.DateTimeStyles]::None,
                                              [ref]$testDate)) {
                    # check if the found date is in the set range
                    # this tests INCLUDING the start and end dates
                    $dateFound = ($testDate -ge $rangeStart -and $testDate -le $rangeEnd)
                }
                $match = $match.NextMatch()
            }
            # finally, if we also successfully found a date pattern, output the file
            if ($dateFound) { $_.FullName }
            elseif ($content -match '\bUNKNOWN\b') {
                # here you output again, because unknown was found instead of a valid date in range
                $_.FullName
            }
        }
    }
# result is now either empty or a list of file fullnames
$result | set-content -Path 'C:\Users\username\Downloads\MatchedFiles.txt'
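The code above derives its range from the current date. For the fixed range in the question (03152022 to 03312022), the date check could be sketched as below. Test-DateInRange, $from, and $to are hypothetical names introduced only for this illustration; they are not part of the answer:

```powershell
# Fixed range taken from the question (format MMddyyyy).
$culture = [CultureInfo]::InvariantCulture
$from    = [datetime]::ParseExact('03152022', 'MMddyyyy', $culture)
$to      = [datetime]::ParseExact('03312022', 'MMddyyyy', $culture)

# Hypothetical helper: returns $true only if $Text is a valid MMddyyyy
# date that lies inside the range, including both end dates.
function Test-DateInRange {
    param(
        [string]$Text,
        [datetime]$RangeStart,
        [datetime]$RangeEnd
    )
    $parsed = [datetime]::MinValue
    $isDate = [datetime]::TryParseExact($Text,
                  'MMddyyyy', [CultureInfo]::InvariantCulture,
                  [System.Globalization.DateTimeStyles]::None,
                  [ref]$parsed)
    return ($isDate -and $parsed -ge $RangeStart -and $parsed -le $RangeEnd)
}

Test-DateInRange -Text '03202022' -RangeStart $from -RangeEnd $to   # True
Test-DateInRange -Text '04012022' -RangeStart $from -RangeEnd $to   # False
Test-DateInRange -Text '13402022' -RangeStart $from -RangeEnd $to   # False (not a valid date)
```

Eight-digit numbers that are not valid dates are rejected by TryParseExact rather than by the range comparison.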

Extract all Capitalized words from document using PowerShell

I am using PowerShell to extract all capitalized words from a document. Everything works until the last line of code, as far as I can tell. Is something wrong with my regex, or is my approach all wrong?
#Extract content of Microsoft Word Document to text
$word = New-Object -comobject Word.Application
$word.Visible = $True
$doc = $word.Documents.Open("D:\Deleteme\test.docx")
$sel = $word.Selection
$paras = $doc.Paragraphs
$path = "D:\deleteme\words.txt"
foreach ($para in $paras)
{
$para.Range.Text | Out-File -FilePath $path -Append
}
#Find all capitalized words :( Everything works except this. I want to extract all Capitalized words
$capwords = Get-Content $path | Select-string -pattern "/\b[A-Z]+\b/g"

PowerShell uses strings to store regexes and has no syntax for regex literals such as /.../ - nor for post-positional matching options such as g.
PowerShell is case-insensitive by default and requires opt-in for case-sensitivity (-CaseSensitive in the case of Select-String).
Without that, [A-Z] is effectively the same as [A-Za-z] and therefore matches both upper- and lowercase (English) letters.
The equivalent of the g option is Select-String's -AllMatches switch, which looks for all matches on each input line (by default, it only looks for the first match).
What Select-String outputs aren't strings, i.e. not the matching lines directly, but wrapper objects of type [Microsoft.PowerShell.Commands.MatchInfo] with metadata about each match.
Instances of that type have a .Matches property that contains an array of [System.Text.RegularExpressions.Match] instances, whose .Value property contains the text of each match (whereas the .Line property contains the matching line in full).
To put it all together:
$capwords = Get-Content -Raw $path |
    Select-String -CaseSensitive -AllMatches -Pattern '\b[A-Z]+\b' |
    ForEach-Object { $_.Matches.Value }
Note the use of -Raw with Get-Content, which greatly speeds up processing, because the entire file content is read as a single, multi-line string - essentially, Select-String then sees the entire content as a single "line". This optimization is possible, because you're not interested in line-by-line processing and only care about what the regex captured, across all lines.
As an aside:
$_.Matches.Value takes advantage of PowerShell's member-access enumeration, which you can similarly leverage to avoid having to loop over the paragraphs in $paras explicitly:
# Use member-access enumeration on collection $paras to get the .Range
# property values of all collection elements and access their .Text
# property value.
$paras.Range.Text | Out-File -FilePath $path
.NET API alternative:
The [regex]::Matches() .NET method allows for a more concise - and better-performing - alternative:
$capwords = [regex]::Matches((Get-Content -Raw $path), '\b[A-Z]+\b').Value
Note that, in contrast with PowerShell, the .NET regex APIs are case-sensitive by default, so no opt-in is required.
.Value again utilizes member-access enumeration in order to extract the matching text from all returned match-information objects.
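The case-sensitivity difference can be seen on a single in-memory string. This is only a sketch; the sample sentence is made up and does not come from the question's document:

```powershell
$text = 'NASA and the FBI met Bob in PARIS'   # made-up sample sentence

# Select-String without -CaseSensitive: [A-Z] also matches lowercase letters.
$insensitive = ($text | Select-String -AllMatches -Pattern '\b[A-Z]+\b').Matches.Value

# Select-String with -CaseSensitive: only all-uppercase words match.
$sensitive = ($text | Select-String -CaseSensitive -AllMatches -Pattern '\b[A-Z]+\b').Matches.Value

# The .NET API is case-sensitive by default.
$dotnet = [regex]::Matches($text, '\b[A-Z]+\b').Value

$insensitive.Count   # 8 (every word matches)
$sensitive           # NASA, FBI, PARIS
$dotnet              # NASA, FBI, PARIS
```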

I modified your script and was able to get all the upper-case words in my test doc.
$word = New-Object -comobject Word.Application
$word.Visible = $True
$doc = $word.Documents.Open("D:\WordTest\test.docx")
$sel = $word.Selection
$paras = $doc.Paragraphs
$path = "D:\WordTest\words.txt"
foreach ($para in $paras)
{
$para.Range.Text | Out-File -FilePath $path -Append
}
# Get all words in the content
$AllWords = (Get-Content $path)
# Split all words into an array
$WordArray = ($AllWords).split(' ')
# Create array for capitalized words to capture them during ForEach loop
$CapWords = @()
# ForEach loop for each word in the array
foreach($SingleWord in $WordArray){
# Perform a check to see if the word is fully capitalized
$Check = $SingleWord -cmatch '\b[A-Z]+\b'
# If check is true, remove special characters and put it into the $CapWords array
if($Check -eq $True){
$SingleWord = $SingleWord -replace '[\W]', ''
$CapWords += $SingleWord
}
}
I had it come out as an array of capitalized words, but you could always join it back if you wanted it to be a string:
$CapString = $CapWords -join " "

Store Filename as separate variables in PowerShell - based on specific character

I have a vendor file that is stored in a specific format and I would like to use PowerShell to convert the file name into three separate variables which I would then use to pass to a stored procedure that would be executed.
The file format would look like:
AA_BBBBB_YYYYMMDD.xlsx
For instance, the following is an example of a current Excel file:
TX_StampingFee_20210303.xlsx
The 1st characters will generally be a US state abbreviation but not always - so it could be 2 or 3 characters long. The 2nd section is the type of fee the Excel file contains while the last section is the date in a YYYYMMDD format.
What I would like to do is have PowerShell read the file name and separate the name into three separate variables:
$State
$Fee
$Date
What I have so far will list the Excel file in the directory and then I tried to split the name up - but I believe the file extension on the name is causing an error. How can I split this file name into separate variables based off the "_" ( underscore ) character?
Foreach ($file in Get-Childitem "C:\PowerShell\Test\*.xlsx") {
$file.name
}
$filenames = (Get-ChildItem -Path "C:\PowerShell\Test\*.xlsx" -Directory).Name
Write-Host $filenames
$chararray = $filenames.Split("_")
$chararray

You can target the basename which is the filename minus the extension. Then when you split into 3 parts you can directly put those into 3 variables.
Foreach ($file in Get-Childitem "C:\PowerShell\Test\*.xlsx") {
    $State,$Fee,$Date = $file.basename.split('_')
    Write-Host State: $State Fee: $Fee Date: $Date
}

Use BaseName rather than the Name property to get the filename without the extension.
Then, if you trust the pattern of the filenames, you can index into your array with a range to join your 1+ fee type substrings into a single string:
$filenames = (Get-ChildItem -Path "C:\PowerShell\Test\*.xlsx" -File).BaseName
if ($filenames) {
    [System.Collections.ArrayList]$fees = @()
    [System.Collections.ArrayList]$states = @()
    [System.Collections.ArrayList]$dates = @()
    foreach ($name in $filenames) {
        $chararray = $name.Split("_")
        $arrlen = $chararray.length
        if ($arrlen -ge 3) {
            $states += $chararray[0]
            # parentheses are required: .. binds more tightly than binary -
            $fees += $chararray[1..($arrlen-2)] -join '_'
            # the last element sits at index length - 1
            $dates += $chararray[$arrlen-1]
        }
    }
}
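The indexing can be tried on a single name without touching the file system. This is a sketch under stated assumptions: 'TX_Stamping_Fee_20210303' is a hypothetical base name whose fee type itself contains an underscore, and converting the last part to a [datetime] is an optional extra that neither answer does:

```powershell
$baseName = 'TX_Stamping_Fee_20210303'   # hypothetical base name
$parts = $baseName.Split('_')

$State = $parts[0]
# Everything between the first and last part belongs to the fee type.
$Fee = $parts[1..($parts.Length - 2)] -join '_'
# Index -1 is the last element; parse it as a yyyyMMdd date.
$Date = [datetime]::ParseExact($parts[-1], 'yyyyMMdd', [CultureInfo]::InvariantCulture)

$State                          # TX
$Fee                            # Stamping_Fee
$Date.ToString('yyyy-MM-dd')    # 2021-03-03
```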

PowerShell read text file line by line and find missing file in folders

I am a novice looking for some assistance. I have a text file containing two columns of data. One column is the Vendor and one is the Invoice.
I need to scan that text file, line by line, and see if there is a match on Vendor and Invoice in a path. In the path, $Location, the first wildcard is the Vendor number and the second wildcard is the Invoice
I want the non-matches output to a text file.
$Location = "I:\\Vendors\*\Invoices\*"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
foreach ($line in Get-Content $txt) {
if (-not($line -match $location)){$line}
}
set-content $Output -value $Line
Sample Data from txt or csv file.
kvendnum wapinvoice
000953 90269211
000953 90238674
001072 11012016
002317 448668
002419 06123711
002419 06137343
002419 06134382
002419 759208
002419 753087
002419 753069
002419 762614
003138 N6009348
003138 N6009552
003138 N6009569
003138 N6009612
003182 770016
003182 768995
003182 06133429
In above data the only match is on the second line: 000953 90238674
and the 6th line: 002419 06137343

Untested, but here's how I'd approach it:
$Location = "I:\\Vendors\\.+\\Invoices\\.+"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
select-string -path $txt -pattern $Location -notMatch |
set-content $Output
There's no need to pick through the file line-by-line; PowerShell can do this for you using select-string. The -notMatch parameter simply inverts the search and sends through any lines that don't match the pattern.
select-string sends out a stream of matchinfo objects that contain the lines that met the search conditions. These objects actually contain far more information than just the matching line, but fortunately PowerShell is smart enough to know how to send the relevant item through to set-content.
Regular expressions can be tricky to get right, but are worth getting your head around if you're going to do tasks like this.
EDIT
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
% {
    # extract fields from the line
    $lineItems = $_ -split " "
    # construct path based on fields from the line
    $testPath = $Location -f $lineItems[0], $lineItems[1]
    # for debugging purposes
    write-host ( "Line:'{0}' Path:'{1}'" -f $_, $testPath )
    # test for existence of the path; ignore errors
    if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
        # path does not exist, so write the line to pipeline
        write-output $_
    }
} |
Set-Content -Path $Output
I guess we will have to pick through the file line-by-line after all. If there is a more idiomatic way to do this, it eludes me.
Code above assumes a consistent format in the input file, and uses -split to break the line into an array.
EDIT - version 3
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
select-string "(\S+)\s+(\S+)" |
%{
    # pull vendor and invoice numbers from matchinfo
    $vendor = $_.matches[0].groups[1]
    $invoice = $_.matches[0].groups[2]
    # construct path
    $testPath = $Location -f $vendor, $invoice
    # for debugging purposes
    write-host ( "Line:'{0}' Path:'{1}'" -f $_.line, $testPath )
    # test for existence of the path; ignore errors
    if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
        # path does not exist, so write the line to pipeline
        write-output $_
    }
} |
Set-Content -Path $Output
It seemed that the -split " " behaved differently in a running script to how it behaves on the command line. Weird. Anyway, this version uses a regular expression to parse the input line. I tested it against the example data in the original post and it seemed to work.
The regex is broken down as follows:
(     Start the first matching group
\S+   Greedily match one or more non-white-space characters
)     End the first matching group
\s+   Greedily match one or more white-space characters
(     Start the second matching group
\S+   Greedily match one or more non-white-space characters
)     End the second matching group
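The answer does not say why -split " " misbehaved. One possible cause, offered only as a guess, is that the columns in the real file are separated by more than one space (or by a tab). The sketch below uses a hypothetical line with two spaces to show how the two approaches differ:

```powershell
$line = '000953  90269211'   # hypothetical spacing: two spaces between columns

# Splitting on a single space leaves an empty element between the fields.
$bySpace = $line -split ' '

# Splitting on runs of white space yields exactly the two fields.
$byRegex = $line -split '\s+'

# The regex from the answer captures the same two fields.
if ($line -match '(\S+)\s+(\S+)') {
    $vendor  = $Matches[1]
    $invoice = $Matches[2]
}

$bySpace.Count   # 3
$byRegex.Count   # 2
$vendor          # 000953
$invoice         # 90269211
```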