How to compare two sequential strings in a file - powershell

I have a big file consisting of "before" and "after" cases for every item as follows:
case1 (BEF) ACT
(AFT) BLK
case2 (BEF) ACT
(AFT) ACT
case3 (BEF) ACT
(AFT) CLC
...
I need to select all of the strings which have (BEF) ACT on the "first" string and (AFT) BLK on the "second" and place the result to a file.
The idea is to create a clause like
IF (stringX.LineNumber consists of "(BEF) ACT" AND stringX+1.LineNumber consists of (AFT) BLK)
{OutFile $stringX+$stringX+1}
Sorry for the syntax, I've just started working with PS :)
$logfile = 'c:\temp\file.txt'
$matchphrase = '\(BEF\) ACT'
$linenum=Get-Content $logfile | Select-String $matchphrase | ForEach-Object {$_.LineNumber+1}
$linenum
#I've worked out how to get a line number after the line with first required phrase
Create a new file with a result as follows:
string with "(BEF) ACT" following with a string with "(AFT) BLK"

Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' c:\temp\file.txt -Context 0,1 |
ForEach-Object {
$lineAfter = $_.Context.PostContext[0]
if ($lineAfter.Contains('(AFT) BLK')) {
$_.Line, $lineAfter # output
}
} # | Set-Content ...
-SimpleMatch performs string-literal substring matching, which means you can pass the search string as-is, without needing to escape it.
However, if you needed to further constrain the search, such as to ensure that it only occurs at the end of a line ($), you would indeed need a regular expression with the (implied) -Pattern parameter: '\(BEF\) ACT$'
Also note PowerShell is generally case-insensitive by default, which is why switch -CaseSensitive is used.
Note how Select-String can accept file paths directly - no need for a preceding Get-Content call.
-Context 0,1 captures 0 lines before and 1 line after each match, and includes them in the [Microsoft.PowerShell.Commands.MatchInfo] instances that Select-String outputs.
Inside the ForEach-Object script block, $_.Context.PostContext[0] retrieves the line after the match and .Contains() performs a literal substring search in it.
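Note that .Contains() is a method of the .NET System.String type, and such methods - unlike PowerShell - are case-sensitive by default. In PowerShell (Core) 7+, an optional System.StringComparison argument changes that; Windows PowerShell runs on .NET Framework, which lacks that overload, so use .IndexOf() with a StringComparison argument or the -like operator there instead.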
If the substring is found on the subsequent line, both the line at hand and the subsequent one are output.
The above looks for all matching pairs in the input file; if you only wanted to find the first pair, append | Select-Object -First 2 to the end of the pipeline (after the ForEach-Object call), because each pair is output as two lines.
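To try the Select-String approach without touching a real log, here is a minimal sketch that writes hypothetical sample data to a temporary file first. The sample lines and the variable names $tmp and $pairs are assumptions for illustration; the extra null check guards against a match on the very last line, which has no following line.

```powershell
# Hypothetical sample data written to a temporary file (not the asker's real log).
$tmp = New-TemporaryFile
@'
case1 (BEF) ACT
(AFT) BLK
case2 (BEF) ACT
(AFT) ACT
case3 (BEF) ACT
(AFT) BLK
'@ | Set-Content -LiteralPath $tmp.FullName

$pairs = Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' -LiteralPath $tmp.FullName -Context 0,1 |
    ForEach-Object {
        $lineAfter = $_.Context.PostContext[0]
        # $lineAfter is $null when the match sits on the last line of the file.
        if ($lineAfter -and $lineAfter.Contains('(AFT) BLK')) {
            $_.Line, $lineAfter
        }
    }

$pairs                            # all matching pairs, two lines each
$pairs | Select-Object -First 2   # only the first pair

Remove-Item -LiteralPath $tmp.FullName
```

With this sample, case1 and case3 qualify and case2 is skipped because its second line contains (AFT) ACT.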

Another way of doing this is to read the $logFile in as a single string and use a RegEx match to get the parts you want:
$logFile = 'c:\temp\file.txt'
$outFile = 'c:\temp\file2.txt'
# read the content of the logfile as a single string
$content = Get-Content -Path $logFile -Raw
$regex = [regex] '(case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK)'
$match = $regex.Match($content)
($output = while ($match.Success) {
$match.Value
$match = $match.NextMatch()
}) | Set-Content -Path $outFile -Force
When used the result is:
case1 (BEF) ACT
(AFT) BLK
case7 (BEF) ACT
(AFT) BLK
Regex details:
( Match the regular expression below and capture its match into backreference number 1
case Match the characters “case” literally
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\( Match the character “(” literally
BEF Match the characters “BEF” literally
\) Match the character “)” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
ACT Match the characters “ACT” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\( Match the character “(” literally
AFT Match the characters “AFT” literally
\) Match the character “)” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
BLK Match the characters “BLK” literally
)
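As a variation on the loop above, [regex]::Matches() returns all matches in one call, which avoids the explicit NextMatch() loop. This is only a sketch: the in-memory $content string is hypothetical sample data standing in for the output of Get-Content -Raw, and the capture group is dropped because only the whole match is used.

```powershell
# Hypothetical in-memory sample standing in for Get-Content -Raw output.
$content = "case1 (BEF) ACT`n(AFT) BLK`ncase2 (BEF) ACT`n(AFT) ACT`ncase7 (BEF) ACT`n(AFT) BLK`n"

# [regex]::Matches() returns every match at once; .NET regexes are case-sensitive by default.
$found = [regex]::Matches($content, 'case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK') |
    ForEach-Object { $_.Value }

$found.Count   # number of matching pairs
$found         # each element holds both lines of one pair
```

Each element of $found still contains the line break between the two lines, so piping it to Set-Content writes the pairs the same way as the loop-based version.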

My other answer completes your own Select-String-based solution attempt. Select-String is versatile, but slow, though it is appropriate for processing files too large to fit into memory as a whole, given that it processes files line by line.
However, PowerShell offers a much faster line-by-line processing alternative: switch -File - see the solution below.
Theo's helpful answer, which reads the entire file into memory first, will probably perform best overall, depending on file size, but it comes at the cost of increased complexity, due to relying heavily on direct use of .NET functionality.
$(
$firstLine = ''
switch -CaseSensitive -Regex -File t.txt {
'\(BEF\) ACT' { $firstLine = $_; continue }
'\(AFT\) BLK' {
# Pair found, output it.
# If you don't want to look for further pairs,
# append `; break` inside the block.
if ($firstLine) { $firstLine, $_ }
# Look for further pairs.
$firstLine = ''; continue
}
default { $firstLine = '' }
}
) # | Set-Content ...
Note: The enclosing $(...) is only needed if you want to send the output directly through the pipeline to a cmdlet such as Set-Content; it is not needed for capturing the output in a variable: $pair = switch ...
-Regex interprets the branch conditionals as regular expressions.
$_ inside a branch's action script block ({ ... }) refers to the line at hand.
The overall approach is:
$firstLine stores the 1st line of interest once found, and when the 2nd line's pattern is found and $firstLine is set (is nonempty), the pair is output.
The default handler resets $firstLine, to ensure that only two consecutive lines that contain the strings of interest are considered.
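The reset logic is easiest to see on a small input. The sketch below runs the same switch statement over a hypothetical in-memory array ($lines is an assumption; in real use you would pass -File with a path instead), including one pair that is interrupted by an unrelated line.

```powershell
# Hypothetical sample lines; in real use, replace ($lines) with -File <path>.
$lines = 'case1 (BEF) ACT', '(AFT) BLK',
         'case2 (BEF) ACT', 'noise', '(AFT) BLK',
         'case3 (BEF) ACT', '(AFT) BLK'

$firstLine = ''
$pairs = switch -CaseSensitive -Regex ($lines) {
    '\(BEF\) ACT' { $firstLine = $_; continue }
    '\(AFT\) BLK' {
        # Output the pair only if the immediately preceding line was a (BEF) ACT line.
        if ($firstLine) { $firstLine, $_ }
        $firstLine = ''
        continue
    }
    default { $firstLine = '' }   # any other line breaks a potential pair
}
$pairs
```

The case2 pair is not output, because the 'noise' line triggers the default branch and clears $firstLine before the (AFT) BLK line is reached.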

Related

Parse info from Text File - Powershell

Beginner here. I am working on an error log file and library; the step I am currently on is to pull specific information from a txt file.
The code I have currently is...
$StatusErr = "Type 1","Type 2"
for ($i=0; $i -lt $StatusErr.length; $i++) {
get-content C:\blah\Logs\StatusErrors.TXT |
select-string $StatusErr[$i] |
add-content C:\blah\Logs\StatusErrorsresult.txt
}
while it is working, I need it to display as
Type-1-Description
2-Description
Type-1-Description
2-Description
Type-1-Description
2-Description
etc.
it is currently displaying as
Type 1 = Type-1-Description
Type 1 = Type-1-Description
Type 1 = Type-1-Description
Type 2 = 2-Description
Type 2 = 2-Description
Type 2 = 2-Description
I am unsure how to change the arrangement and remove unneeded spaces and the = sign
You need to search for both patterns in a single Select-String call in order to get matching lines in order.
While the -Pattern parameter does accept an array of patterns, in this case a single regex will do.
You need to use a regex pattern in order to capture and output only part of the lines that match.
$StatusErrRegex = '(?<=Type [12]\s*=\s*)[^ ]+'
get-content C:\blah\Logs\StatusErrors.TXT |
select-string $StatusErrRegex |
foreach-object { $_.Matches.Value } |
set-content C:\blah\Logs\StatusErrorsresult.txt
Note that I've replaced add-content with set-content, as I'm assuming you don't want to append to a preexisting file. set-content writes all objects it receives via the pipeline to the output file.
Select-String outputs Microsoft.PowerShell.Commands.MatchInfo instances whose .Matches property provides access to the part of the line that was matched.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Additional notes:
Select-String, like PowerShell in general, is case-insensitive by default; add the -CaseSensitive switch, if needed.
(?<=...) is a (positive) lookbehind assertion, whose matching text doesn't become part of what the regex captures.
\s* matches zero or more whitespace characters; \s+ would match one or more.
[^ ]+ matches one or more (+) characters that are not (^) spaces ( ), and thereby captures the run of non-space characters to the right of the = sign.
To match any of multiple words at the start of the pattern, use a regex alternation (|), e.g. '(?<=(type|data) [12]\s*=\s*)[^ ]+'
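To see what the lookbehind leaves in .Matches.Value, the sketch below feeds a few hypothetical lines straight to Select-String instead of reading a file. The sample lines and the $values variable are assumptions modeled on the "Type N = description" format shown in the question.

```powershell
# Hypothetical log lines modeled on the question's "Type N = description" format.
$lines = 'Type 1 = Type-1-Description',
         'Type 2 = 2-Description',
         'Other = ignored',
         'Type 1 = Another-Description'

$values = $lines |
    Select-String '(?<=Type [12]\s*=\s*)[^ ]+' |
    ForEach-Object { $_.Matches.Value }

$values   # only the text to the right of "=", in input order
```

The line starting with Other produces no output because the lookbehind never succeeds on it, and the remaining values keep the order of the input lines.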

Insert comma before a value if it is blank

Below is the data I have
-Ignored:31,Modified,src,data,all,*file,MINOSFIC/UTMNUP10
-Ignored:33,Modified,src,&,tgt,data,all,*file,MINOSFIC/UVEGAP10
-Ignored:92,Synchro,is,running,*file,MINOSFIC/VM010P50
-Ignored:01,Object,hold,(synchro),*file,MINOSFIC/VM010U51
Here I am parsing the data and saving it as CSV.
It works for the 1st and 2nd lines, but for the 3rd and 4th lines the column values shift by one because there is no data before *file.
Please let me know how to handle this: how can I insert a comma before *file if there is no entry there (the first 2 lines have all in that position)?
$allIGfiles = Get-ChildItem -Path 'C:\LG2' -Recurse -Filter "*LG_VFN*"
foreach($file in $allIGfiles)
{
$filename = $file.FullName
$data = Get-Content $filename | Select -SkipLast 1
$Lines = @()
foreach ($line in $data){
if($line -match "Ignored")
{
$Lines+=$line
}
}
$NewLines = @($Lines | % { ($_ -replace "\s{2,}",",") -replace "(\d) ", '$1,'} )
$NewLines | Export-Csv 'c:\file.csv' -append -NoTypeInformation
Original data
-Ignored:31 Modified src data all *file MINOSFIC/UTMNUP10
-Ignored:33 Modified src & tgt data all *file MINOSFIC/UVEGAP10
-Ignored:92 Synchro is running *file MINOSFIC/VM010P50
-Ignored:01 Object hold (synchro) *file MINOSFIC/VM010U51
Update:
I am now getting the data like below but when i am trying to put it in csv it is only writing numbers to the file
-Ignored:31,Modified src data,all *file,MINOSFIC/UTMNUP10
-Ignored:33,Modified src & tgt data,all *file,MINOSFIC/UVEGAP10
-Ignored:92,Synchro is running,*file,MINOSFIC/VM010P50
-Ignored:01,Object hold (synchro),*file,MINOSFIC/VM010U51
-Ignored:01,Object hold (synchro),*file,MINOSFIC/VM010U52
-Ignored:01,Object hold (synchro),*file,MINOSFIC/VM010U53
-Ignored:01,Object hold (synchro),*file,MINOSFIC/VM010U54
Data with other code
Object OK (partial) 97% *file MINOSFIC/VM011P10
-Ignored:18 Object hold *file MINOSFIC/VM011P50
Object OK (partial) 78% *file MINOSFIC/VM800P30
*Error: 09 Diff. Creation date *file MINOSSVG/S100000702
*Error: 09 Diff. Creation date *file MINOSSVG/S100000805
-Ignored:18 Object hold *file MINOSSVG/S100001154
*Error: 09 Diff. Creation date *file MINOSSVG/S100001227
You could do this by using a regex that at first ignores the all value, but when constructing the comma separated new string, this will be inserted when found:
Read the file as string array
$data = Get-Content -Path $filename
I'm faking that by using a Here-String below:
$data = @"
-Ignored:31 Modified src data all *file MINOSFIC/UTMNUP10
-Ignored:33 Modified src & tgt data all *file MINOSFIC/UVEGAP10
-Ignored:92 Synchro is running *file MINOSFIC/VM010P50
-Ignored:01 Object hold (synchro) *file MINOSFIC/VM010U51
"@ -split '\r?\n'
$result = foreach ($line in $data) {
if ($line -match '^(-Ignored:\d+)\s+(.+)\s+(\*file)\s+(.*)') {
'{0},{1},{2},{3},{4}' -f $matches[1],
($matches[2] -replace 'all$').Trim(),
($matches[2] -split '\s{2,}')[-1],
$matches[3],
$matches[4]
}
}
# output to console screen
$result
# write to file
$result | Set-Content -Path 'X:\TheNewFile.txt'
Output:
-Ignored:31,Modified src data,all,*file,MINOSFIC/UTMNUP10
-Ignored:33,Modified src & tgt data,all,*file,MINOSFIC/UVEGAP10
-Ignored:92,Synchro is running,,*file,MINOSFIC/VM010P50
-Ignored:01,Object hold (synchro),,*file,MINOSFIC/VM010U51
To also do this with *Error.. lines as in your updated example, change the line
if ($line -match '^(-Ignored:\d+)\s+(.+)\s+(\*file)\s+(.*)') {
into
if ($line -match '^((?:-Ignored|\*Error):\s*\d+)\s+(.+)\s+(\*file)\s+(.*)') {
Regex details:
^ Assert position at the beginning of a line (at beginning of the string or after a line break character)
( Match the regular expression below and capture its match into backreference number 1
(?: Match the regular expression below
Match either the regular expression below (attempting the next alternative only if this one fails)
-Ignored Match the characters “-Ignored” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
\* Match the character “*” literally
Error Match the characters “Error” literally
)
: Match the character “:” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 2
. Match any single character that is not a line break character
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 3
\* Match the character “*” literally
file Match the characters “file” literally
)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 4
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
One approach is to first replace each run of two or more whitespace characters with a comma, and then replace each digit followed by a single space with the same digit (via a capture group) and a comma. Like so,
$data=@(
'-Ignored:31 Modified src data all *file MINOSFIC/UTMNUP10',
'-Ignored:33 Modified src & tgt data all *file MINOSFIC/UVEGAP10',
'-Ignored:92 Synchro is running *file MINOSFIC/VM010P50',
'-Ignored:01 Object hold (synchro) *file MINOSFIC/VM010U51')
$data | % { ($_ -replace "\s{2,}",",") -replace "(\d) ", '$1,'}
-Ignored:31,Modified src data,all *file,MINOSFIC/UTMNUP10
-Ignored:33,Modified src & tgt data,all *file,MINOSFIC/UVEGAP10
-Ignored:92,Synchro is running,*file,MINOSFIC/VM010P50
-Ignored:01,Object hold (synchro),*file,MINOSFIC/VM010U51
This puts all *file into the same column as a lone *file. Should that not be enough, use ConvertFrom-String or do another replacement to introduce the missing column. As for how, you would probably need to count the commas in each line and add the missing column where the count falls short.
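On the update in the question (only numbers ending up in the CSV): Export-Csv serializes the properties of the objects it receives, and the only property of a string is Length, so piping plain strings to it writes their lengths. A minimal sketch of the usual way around this is to build one object per line and export those. The regex here is a different one from the answers above (it treats all as an optional column), the sample lines assume columns separated by runs of spaces, and the property names Code, Message, Scope, Type and Object are hypothetical.

```powershell
# Hypothetical sample lines; the real files are assumed to separate columns with runs of spaces.
$data = '-Ignored:31  Modified src data  all  *file  MINOSFIC/UTMNUP10',
        '-Ignored:92  Synchro is running  *file  MINOSFIC/VM010P50'

$rows = foreach ($line in $data) {
    # (?:(all)\s+)? makes the "all" column optional; group 3 is absent when it is missing.
    if ($line -match '^(-Ignored:\d+)\s+(.+?)\s+(?:(all)\s+)?(\*file)\s+(.*)$') {
        [pscustomobject]@{
            Code    = $Matches[1]
            Message = $Matches[2]
            Scope   = [string]$Matches[3]   # empty string when "all" is absent
            Type    = $Matches[4]
            Object  = $Matches[5]
        }
    }
}

# ConvertTo-Csv shows what Export-Csv would write to the file.
$rows | ConvertTo-Csv -NoTypeInformation
```

Replacing ConvertTo-Csv with Export-Csv -Path ... -NoTypeInformation writes the same rows to a file, with an empty Scope field for lines that have no all value.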

etc/hosts file entries: check whether they match, and if they do not, send a trigger

I am new to PowerShell scripting. I am trying to check that the hosts file in C:\Windows\System32\drivers\etc contains every entry in a dynamic list of IP addresses and DNS names. Any mismatch should fail the audit.
entries are
10.23.24.45 foo.com
10.24.45.34 domain.com
Here is my code.
$Pattern = '^(?<IP>\d{1,3}(\.\d{1,3}){3})\s+(?<Host>.+)$'
$File = "$env:SystemDrive\Windows\System32\Drivers\etc\hosts"
$Entries = @()
(Get-Content -Path $File) | ForEach-Object {
If ($_ -match $Pattern) {
$Entries += "$($Matches.IP) $($Matches.Host)"
Write-Host " the values are $Entries"
$FailureMessage = "IP and host entries are existing"
}
else {
$FailureMessage = "IP and host entries are doesn't existing"
}
}
But this is not working for me. Can you help here
A hosts file can also contain comment lines, or comments after the IP and hostname parts, preceded by a # character. Your regex at the moment does not account for that.
I would create a lookup hashtable with all required entries and use that to find whether the hosts file contains any of these or not and as result create an array of PSObjects for nice formatting and easy filtering.
Something like:
$file = "$env:SystemDrive\Windows\System32\Drivers\etc\hosts"
$pattern = '^\s*(?<IP>[0-9a-f.:]+)\s+(?<HostName>[^\s#]+)(?<Comment>.*)$'
# create an array of Hashtables with required entries
$required = @{Ip = '10.23.24.45'; HostName = 'foo.com'},
@{Ip = '10.24.45.34'; HostName = 'domain.com'}
# read the current content of the hosts file, filter only lines that match the pattern
$result = Get-Content -Path $file | Where-Object { $_ -match $pattern } | ForEach-Object {
$ip = $matches.Ip
$hostname = $matches.HostName
# test if the entry is one of the required ones
$exists = [bool]($required | Where-Object { $_.Ip -eq $ip -and $_.HostName -eq $hostname })
# output an object
[PsCustomObject]@{
IP = $ip
HostName = $hostname
Exists = $exists
}
}
# show results on screen
$result | Format-Table -AutoSize
Next you can send an email using Send-MailMessage if any of the required entries is missing
# select the entries where property Exists is False
$missing = $result | Where-Object { -not $_.Exists }
if ($missing) {
# here is where you send your mail message
}
Regex details:
^ Assert position at the beginning of a line (at beginning of the string or after a line break character)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(?<IP> Match the regular expression below and capture its match into backreference with name “IP”
[0-9a-f.:] Match a single character present in the list below
A character in the range between “0” and “9”
A character in the range between “a” and “f”
One of the characters “.:”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<HostName> Match the regular expression below and capture its match into backreference with name “HostName”
[^\s#] Match a single character NOT present in the list below
A whitespace character (spaces, tabs, line breaks, etc.)
The character “#”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?<Comment> Match the regular expression below and capture its match into backreference with name “Comment”
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
$ Assert position at the end of a line (at the end of the string or before a line break character)
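For an audit, it can help to start from the required list and ask which of its entries are absent from the hosts file. The sketch below does that over hypothetical in-memory lines ($hostsLines, $present and $missing are assumed names; in real use the lines would come from Get-Content). It uses a shortened form of the pattern above and only considers the first host name on each line.

```powershell
# Hypothetical hosts-file content; in real use, read the lines with Get-Content.
$hostsLines = '# sample hosts file',
              '127.0.0.1   localhost',
              '10.23.24.45 foo.com   # app server'

$pattern  = '^\s*(?<IP>[0-9a-f.:]+)\s+(?<HostName>[^\s#]+)'
$required = @{ Ip = '10.23.24.45'; HostName = 'foo.com' },
            @{ Ip = '10.24.45.34'; HostName = 'domain.com' }

# Build an "ip hostname" key for every line that parses as an entry.
$present = foreach ($line in $hostsLines) {
    if ($line -match $pattern) { '{0} {1}' -f $Matches.IP, $Matches.HostName }
}

# A required entry is missing when its key is not among the parsed lines.
$missing = $required | Where-Object { ('{0} {1}' -f $_.Ip, $_.HostName) -notin $present }
$missing | ForEach-Object { 'Missing: {0} {1}' -f $_.Ip, $_.HostName }
```

If $missing is non-empty, the audit has failed and that is the point at which a notification could be sent.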

Find lines between a pattern, and append 1st line to lines

I have the following case I'm trying to script in Powershell. I have done this exercise using Sed on a bash terminal, but having trouble writing in Powershell. Any help would be greatly appreciated.
(sed -r -e '/^N/h;/^[N-]/d;G;s/(.*)\n(.*)/\2 \1/' <file>, with a file format without < and > chars. surrounding the first letter on each line)
The start pattern always start with a <N> (only 1 instance per block), lines between start with a <J>, and the end pattern is always --
--------------
<N>ABC123
<J>SomethingHere1
<J>SomethingHere2
<J>SomethingHere3
-------------- <-- end of section
I'm trying to take the first line in each section <N> and copy it AFTER each <J> in the same section. For example:
<J>SomethingHere1 <N>ABC123
<J>SomethingHere2 <N>ABC123
<J>SomethingHere3 <N>ABC123
The number of <J> lines per section can vary (0-N). In a case with no <J>, nothing needs to be done.
Powershell version:5.1.16299.611
The following, pipeline-based solution isn't fast, but conceptually straightforward:
Get-Content file.txt | ForEach-Object {
if ($_ -match '^-+$') { $newSect = $true }
elseif ($newSect) { $firstSectionLine = $_; $newSect = $False }
else { "{0}`t{1}" -f $_, $firstSectionLine }
}
It reads and processes lines one by one (with the line at hand reflected in automatic variable $_).
It uses a regex (^-+$) with the -match operator to identify section dividers; if found, flag $newSect is set to signal that the next line is the section's first data line.
If the first data line is hit, it is cached in variable $firstSectionLine, and the $newSect flag is reset.
All other lines are by definition the lines to which the first data line is to be appended, which is done via the -f string-formatting operator, using a tab char. (`t) as the separator.
Here's a faster PSv4+ solution that is more complex, however, and it reads the entire input file into memory up front:
((Get-Content -Raw file.txt) -split '(?m)^-+(?:\r?\n)?' -ne '').ForEach({
$firstLine, $otherLines = $_ -split '\r?\n' -ne ''
foreach ($otherLine in $otherLines) { "{0}`t{1}" -f $otherLine, $firstLine }
})
Get-Content -Raw reads in the input file in full, as a single string.
It uses the -split operator to split the input file into sections, and then processes each section.
Regex '(?m)^-+(?:\r?\n)?' matches a section divider line, optionally followed by a newline.
(?m) is the multiline option, which makes ^ and $ match the start and end of each line, respectively:
\r?\n matches a newline, either in CRLF (\r\n) or LF-only (\n) form.
(?:...) is a non-capturing group; making it non-capturing prevents what it matches from being included in the elements returned by -split.
-ne '' filters out resulting empty elements.
-split '\r?\n' splits each section into individual lines.
If performance is still a concern, you could speed up reading the file with [IO.File]::ReadAllText("$PWD/file.txt").
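The splitting steps can be checked on a small string before running them against a file. This sketch applies the same two -split calls to hypothetical in-memory text ($text and $result are assumed names), using the bracket-free line format mentioned in the question and including one section that has no J lines.

```powershell
# Hypothetical input held in a string; sections are separated by divider lines of dashes.
$text = "-----`nNABC123`nJSomethingHere1`nJSomethingHere2`n-----`nNDEF456`n-----`nNGHI789`nJSomethingHere3`n-----`n"

$result = foreach ($section in ($text -split '(?m)^-+(?:\r?\n)?' -ne '')) {
    # First element goes to $firstLine, all remaining elements to $otherLines.
    $firstLine, $otherLines = $section -split '\r?\n' -ne ''
    foreach ($otherLine in $otherLines) { "{0}`t{1}" -f $otherLine, $firstLine }
}
$result
```

The NDEF456 section contributes nothing, since $otherLines is empty for a section that consists of the N line only.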

How can I replace every comma with a space in a text file before a pattern using PowerShell

I have a text file with lines in this format:
FirstName,LastName,SSN,$x.xx,$x.xx,$x.xx
FirstName,MiddleInitial,LastName,SSN,$x.xx,$x.xx,$x.xx
The lines could be in either format. For example:
Joe,Smith,123-45-6789,$150.00,$150.00,$0.00
Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00
I want to basically turn everything before the SSN into a single field for the name thus:
Joe Smith,123-45-6789,$150.00,$150.00,$0.00
Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
How can I do this using PowerShell? I think I need to use ForEach-Object and at some point replace "," with " ", but I don't know how to specify the pattern. I also don't know how to use a ForEach-Object with a $_.Where so that I can specify the "SkipUntil" mode.
Thanks very much!
Mathias is correct; you want to use the -replace operator, which uses regular expressions. I think this will do what you want:
$string -replace ',(?=.*,\d{3}-\d{2}-\d{4})',' '
The regular expression uses a lookahead (?=) to look for any commas that are followed by any number of any character (. is any character, * is any number of them including 0) that are then followed by a comma immediately followed by an SSN (\d{3}-\d{2}-\d{4}). The concept of "zero-width assertions", such as this lookahead, simply means that it is used to determine the match, but is not actually returned as part of the match.
That's how we're able to match only the commas in the names themselves, and then replace them with a space.
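Applied to the two sample lines from the question, the replacement can be sketched as follows. The variable names $lines and $fixed are assumptions; note the single quotes, which keep PowerShell from expanding the $ signs in the amounts.

```powershell
# Sample lines from the question; single quotes prevent expansion of "$150" etc.
$lines = 'Joe,Smith,123-45-6789,$150.00,$150.00,$0.00',
         'Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00'

# -replace operates on each element of an array, so no explicit loop is needed.
$fixed = $lines -replace ',(?=.*,\d{3}-\d{2}-\d{4})', ' '
$fixed
```

Only the commas inside the name are replaced: the comma directly before the SSN fails the lookahead, because no further comma followed by an SSN comes after it.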
I know it's answered, and neatly so, but I tried to come up with an alternative to using a regex - count the number of commas in a line, then replace either the first one, or the first two, commas in the line.
But strings can't count how many times a character appears in them without using the regex engine(*), and replacements can't be done a specific number of times without using the regex engine(**), so it's not very neat:
$comma = [regex]","
Get-Content data.csv | ForEach {
$numOfCommasToReplace = $comma.Matches($_).Count - 4
$comma.Replace($_, ' ', $numOfCommasToReplace)
} | Out-File data2.csv
Avoiding the regex engine entirely, just for fun, gets me things like this:
Get-Content .\data.csv | ForEach {
$1,$2,$3,$4,$5,$6,$7 = $_ -split ','
if ($7) {"$1 $2 $3,$4,$5,$6,$7"} else {"$1 $2,$3,$4,$5,$6"}
} | Out-File data2.csv
(*) ($line -as [char[]] -eq ',').Count
(**) while ( #counting ) { # split/mangle/join }