Find lines between a pattern, and append 1st line to lines - powershell

I have the following case I'm trying to script in Powershell. I have done this exercise using Sed on a bash terminal, but having trouble writing in Powershell. Any help would be greatly appreciated.
(sed -r -e '/^N/h;/^[N-]/d;G;s/(.*)\n(.*)/\2 \1/' <file>, with a file format without < and > chars. surrounding the first letter on each line)
The start pattern always start with a <N> (only 1 instance per block), lines between start with a <J>, and the end pattern is always --
--------------
<N>ABC123
<J>SomethingHere1
<J>SomethingHere2
<J>SomethingHere3
-------------- <-- end of section
I'm trying to take the first line in each section <N> and copy it AFTER each <J> in the same section. For example:
<J>SomethingHere1 <N>ABC123
<J>SomethingHere2 <N>ABC123
<J>SomethingHere3 <N>ABC123
The number of <J> lines per section can vary (0-N). In a case with no <J>, nothing needs to be done.
Powershell version:5.1.16299.611

The following, pipeline-based solution isn't fast, but conceptually straightforward:
Get-Content file.txt | ForEach-Object {
if ($_ -match '^-+$') { $newSect = $true }
elseif ($newSect) { $firstSectionLine = $_; $newSect = $False }
else { "{0}`t{1}" -f $_, $firstSectionLine }
}
It reads and processes lines one by one (with the line at hand reflected in automatic variable $_.
It uses a regex (^-+) with the -match operator to identify section dividers; if found, flag $newSect is set to signal that the next line is the section's first data line.
If the first data line is hit, it is cached in variable $firstSectionLine, and the $newSect flag is reset.
All other lines are by definition the lines to which the first data line is to be appended, which is done via the -f string-formatting operator, using a tab char. (`t) as the separator.
Here's a faster PSv4+ solution that is more complex, however, and it reads the entire input file into memory up front:
((Get-Content -Raw file.txt) -split '(?m)^-+(?:\r?\n)?' -ne '').ForEach({
$firstLine, $otherLines = $_ -split '\r?\n' -ne ''
foreach ($otherLine in $otherLines) { "{0}`t{1}" -f $otherLine, $firstLine }
})
Get-Content -Raw reads in the input file in full, as a single string.
It uses the -split operator to split the input file into sections, and then processes each section.
Regex '(?m)^-+(?:\r?\n)?' matches a section divider line, optionally followed by a newline.
(?m) is the multiline option, which makes ^ and $ match the start and end of each line, respectively:
\r?\n matches a newline, either in CRLF (\r\n) or LF-only (\n) form.
(?:...) is a non-capturing group; making it non-capturing prevents what it matches from being included in the elements returned by -split.
-ne '' filters out resulting empty elements.
-split '\r?\n' splits each section into individual lines.
If performance is still a concern, you could speed up reading the file with [IO.File]::ReadAllText("$PWD/file.txt").

Related

Powershell- How to wrap a specific line in an array object in double quotes?

Lets say I have a text file whose content is this:
======= Section 1 ==========
This is line one
This is line two
This is line three
======= Section 2 ==========
This is line Four
This is line Five
This is line Six
I import it into Powershell using $SourceFile = Get-Content '.\source.txt' -Delimiter '==='
then clean it up with $Objects = $SourceFile -replace '^=.*', ''
I now have this $Objects array that has:
This is line one
This is line two
This is line three
and
This is line Four
This is line Five
This is line Six
What I really want to know is how can I wrap a specific line in double quotes or parenthesis etc, so that specific line in all the objects arrays are also handled the same way.
For example line 3 of both arrays should have double quotes wrapped around them:
This is line one
This is line two
"This is line three"
and
This is line Four
This is line Five
"This is line Six"
I have tried many things, closest I got was $test = $objects -replace 'This is line three', '"This is line three"' As you can see this is less than ideal for an object of many arrays.
I am still fairly new to Powershell, any help would be greatly appreciated
Assuming that you want to wrap the 3rd line in each block in double quotes:
# Read the input file and split it into blocks of lines by a
# regex matching the section lines (without including the section lines).
(Get-Content -Raw file.txt) -split '(?m)^===.*\r?\n' -ne '' |
ForEach-Object { # Process each block of lines (multi-line string)
# Split this block into individual lines, ignoring a trailing newline.
$linesInBlock = $_ -replace '\r?\n\z' -split '\r?\n'
# Modify the 3rd line.
$linesInBlock[2] = '"{0}"' -f $linesInBlock[2]
# Output the lines, including the modification.
Write-Verbose -Verbose "-- next block"
$linesInBlock
}
Output with your sample input:
VERBOSE: -- next block
This is line one
This is line two
"This is line three"
VERBOSE: -- next block
This is line Four
This is line Five
"This is line Six"

add quotation mark to a text file powershell

I need to add the quotation mark to a text file that contains 500 lines text.
The format is inconsistent. It has dashes, dots, numbers, and letters. For example
1527c705-839a-4832-9118-54d4Bd6a0c89
16575getfireshot.com.FireShotCaptureWebpageScreens
3EA2211E.GestetnerDriverUtility
I have tried to code this
$Flist = Get-Content "$home\$user\appfiles\out.txt"
$Flist | %{$_ -replace '^(.*?)', '"'}
I got the result which only added to the beginning of a line.
"Microsoft.WinJS.2.0
The expected result should be
"Microsoft.WinJS.2.0"
How to add quotation-mark to the end of each line as well?
There is no strict need to use a regex (regular expression) in your case (requires PSv4+):
(Get-Content $home\$user\appfiles\out.txt).ForEach({ '"{0}"' -f $_ })
Array method .ForEach() processes each input line via the script block ({ ... }) passed to it.
'"{0}"' -f $_ effectively encloses each input line ($_) in double quotes, via -f, the string-format operator.
If you did want to use a regex:
(Get-Content $home\$user\appfiles\out.txt) -replace '^|$', '"'
Regex ^|$ matches both the start (^) and the end ($) of the input string and replaces both with a " char., effectively enclosing the input string in double quotes.
As for what you tried:
^(.*?)
just matches the very start of the string (^), and nothing else, given that .*? - due to using the non-greedy duplication symbol ? - matches nothing else.
Therefore, replacing what matched with " only placed a " at the start of the input string, not also at the end.
You can use regex to match both:
The beginning of the line ^(.*?)
OR |
The End of the line $
I.e. ^(.*?)|$
$Flist = Get-Content "$home\$user\appfiles\out.txt"
$Flist | %{$_ -replace '^(.*?)|$', '"'}

How to compare two sequential strings in a file

I have a big file consists of "before" and "after" cases for every item as follows:
case1 (BEF) ACT
(AFT) BLK
case2 (BEF) ACT
(AFT) ACT
case3 (BEF) ACT
(AFT) CLC
...
I need to select all of the strings which have (BEF) ACT on the "first" string and (AFT) BLK on the "second" and place the result to a file.
The idea is to create a clause like
IF (stringX.LineNumber consists of "(BEF) ACT" AND stringX+1.LineNumber consists of (AFT) BLK)
{OutFile $stringX+$stringX+1}
Sorry for the syntax, I've just starting to work with PS :)
$logfile = 'c:\temp\file.txt'
$matchphrase = '\(BEF\) ACT'
$linenum=Get-Content $logfile | Select-String $matchphrase | ForEach-Object {$_.LineNumber+1}
$linenum
#I've worked out how to get a line number after the line with first required phrase
Create a new file with a result as follows:
string with "(BEF) ACT" following with a string with "(AFT) BLK"
Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' c:\temp\file.txt -Context 0,1 |
ForEach-Object {
$lineAfter = $_.Context.PostContext[0]
if ($lineAfter.Contains('(AFT) BLK')) {
$_.Line, $lineAfter # output
}
} # | Set-Content ...
-SimpleMatch performs string-literal substring matching, which means you can pass the search string as-is, without needing to escape it.
However, if you needed to further constrain the search, such as to ensure that it only occurs at the end of a line ($), you would indeed need a regular expression with the (implied) -Pattern parameter: '\(BEF\) ACT$'
Also note PowerShell is generally case-insensitive by default, which is why switch -CaseSensitive is used.
Note how Select-String can accept file paths directly - no need for a preceding Get-Content call.
-Context 0,1 captures 0 lines before and 1 line after each match, and includes them in the [Microsoft.PowerShell.Commands.MatchInfo] instances that Select-String outputs.
Inside the ForEach-Object script block, $_.Context.PostContext[0] retrieves the line after the match and .Contains() performs a literal substring search in it.
Note that .Contains() is a method of the .NET System.String type, and such methods - unlike PowerShell - are case-sensitive by default, but you can use an optional parameter to change that.
If the substring is found on the subsequent line, both the line at hand and the subsequent one are output.
The above looks for all matching pairs in the input file; if you only wanted to find the first pair, append | Select-Object -First 2 to the Select-String call.
Another way of doing this is to read the $logFile in as a single string and use a RegEx match to get the parts you want:
$logFile = 'c:\temp\file.txt'
$outFile = 'c:\temp\file2.txt'
# read the content of the logfile as a single string
$content = Get-Content -Path $logFile -Raw
$regex = [regex] '(case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK)'
$match = $regex.Match($content)
($output = while ($match.Success) {
$match.Value
$match = $match.NextMatch()
}) | Set-Content -Path $outFile -Force
When used the result is:
case1 (BEF) ACT
(AFT) BLK
case7 (BEF) ACT
(AFT) BLK
Regex details:
( Match the regular expression below and capture its match into backreference number 1
case Match the characters “case” literally
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\( Match the character “(” literally
BEF Match the characters “BEF” literally
\) Match the character “)” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
ACT Match the characters “ACT” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\( Match the character “(” literally
AFT Match the characters “AFT” literally
\) Match the character “)” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
BLK Match the characters “BLK” literally
)
My other answer completes your own Select-String-based solution attempt. Select-String is versatile, but slow, though it is appropriate for processing files too large to fit into memory as a whole, given that it processes files line by line.
However, PowerShell offers a much faster line-by-line processing alternative: switch -File - see the solution below.
Theo's helpful answer, which reads the entire file into memory first, will probably perform best overall, depending on file size, but it comes at the cost of increased complexity, due to relying heavily on direct use of .NET functionality.
$(
$firstLine = ''
switch -CaseSensitive -Regex -File t.txt {
'\(BEF\) ACT' { $firstLine = $_; continue }
'\(AFT\) BLK' {
# Pair found, output it.
# If you don't want to look for further pairs,
# append `; break` inside the block.
if ($firstLine) { $firstLine, $_ }
# Look for further pairs.
$firstLine = ''; continue
}
default { $firstLine = '' }
}
) # | Set-Content ...
Note: The enclosing $(...) is only needed if you want to send the output directly to the pipeline to a cmdlet such as Set-Content; it is not needed for capturing the output in a variable: $pair = switch ...
-Regex interprets the branch conditionals as regular expressions.
$_ inside a branch's action script block ({ ... } refers to the line at hand.
The overall approach is:
$firstLine stores the 1st line of interest once found, and when the 2nd line's pattern is found and $firstLine is set (is nonempty), the pair is output.
The default handler resets $firstLine, to ensure that only two consecutive lines that contain the strings of interest are considered.

Manipulating first character of each line in a file with .Replace()

Say I have a text file 123.txt
one,two,three
four,five,six
My goal is to capitalize the first character of each line by using Get-Culture. This is my attempt:
$str = gc C:\Users\Administrator\Desktop\123.txt
#Split each line into an array
$array = $str.split("`n")
for($i=0; $i -lt $array.Count; $i++) {
#Returns O and F:
$text = (Get-Culture).TextInfo.ToTitleCase($array[$i].Substring(0,1))
#Supposed to replace the first letter of each array with $text
$array[$i].Replace($array[$i].Substring(0,1), $text) >> .\Desktop\finish.txt
}
Result:
One,twO,three
Four,Five,six
I understand that .Replace() is replaces every occurrence of the current array, which is why I made sure that it's replacing ONLY the first character of the array with $array[$i].Substring(0,1), but this doesn't work.
Try the following:
Get-Content C:\Users\Administrator\Desktop\123.txt | ForEach-Object {
if ($_) {
$_.Substring(0, 1).ToUpper() + $_.Substring(1)
} else {
$_
}
} > .\Desktop\finish.txt
Get-Content reads the input file line by line and sends each line - stripped of its line terminator - through the pipeline.
ForEach-Object processes each line in the associated script block, in which $_ represents the line at hand:
if ($_) tests if the line is nonempty, i.e. if there's at least 1 character; if not, the else block simply passes the empty line through.
$_.Substring(0, 1).ToUpper() converts the line's 1st character to uppercase, implicitly using the current culture (with a single character, this is equivalent to applying Get-Culture).TextInfo.ToTitleCase()).
+ $_.Substring(1) appends the rest of the line.
Only > rater than >> is needed to write to the output file, because the entire pipeline's output is written at once.
The reason this is not working is because you are replacing the character...
$array[$i].Substring(0,1)
... but you are using the Replace method on the entire array element
$array[$i].Replace(...
Here the array element is a string, equal to a line of the input. So it will replace every occurrence of that character.
Get-Content (unless you use the -Raw parameter) by default returns the text as an array of strings. So you should be able to use this regex replace (I have used ToString().ToUpper() - nothing wrong with the Get-Culture method)
$str = gc C:\Users\Administrator\Desktop\123.txt
foreach($line in $str){
$line -replace '^\w', $line[0].ToString().ToUpper() >> .\Desktop\finish.txt
}
Regex explanation:
^ is an anchor. It specifies "the beginning of the string"
\w matches a word character - usually a-z, A-Z, 0-9
See mklement0's comments here for the more focused ^\p{Ll} and here for further explanation

How can I replace every comma with a space in a text file before a pattern using PowerShell

I have a text file with lines in this format:
FirstName,LastName,SSN,$x.xx,$x.xx,$x.xx
FirstName,MiddleInitial,LastName,SSN,$x.xx,$x.xx,$x.xx
The lines could be in either format. For example:
Joe,Smith,123-45-6789,$150.00,$150.00,$0.00
Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00
I want to basically turn everything before the SSN into a single field for the name thus:
Joe Smith,123-45-6789,$150.00,$150.00,$0.00
Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
How can I do this using PowerShell? I think I need to use ForEach-Object and at some point replace "," with " ", but I don't know how to specify the pattern. I also don't know how to use a ForEach-Object with a $_.Where so that I can specify the "SkipUntil" mode.
Thanks very much!
Mathias is correct; you want to use the -replace operator, which uses regular expressions. I think this will do what you want:
$string -replace ',(?=.*,\d{3}-\d{2}-\d{4})',' '
The regular expression uses a lookahead (?=) to look for any commas that are followed by any number of any character (. is any character, * is any number of them including 0) that are then followed by a comma immediately followed by a SSN (\d{3}-\d{2}-\d{4}). The concept of "zero-width assertions", such as this lookahead, simply means that it is used to determine the match, but it not actually returned as part of the match.
That's how we're able to match only the commas in the names themselves, and then replace them with a space.
I know it's answered, and neatly so, but I tried to come up with an alternative to using a regex - count the number of commas in a line, then replace either the first one, or the first two, commas in the line.
But strings can't count how many times a character appears in them without using the regex engine(*), and replacements can't be done a specific number of times without using the regex engine(**), so it's not very neat:
$comma = [regex]","
Get-Content data.csv | ForEach {
$numOfCommasToReplace = $comma.Matches($_).Count - 4
$comma.Replace($_, ' ', $numOfCommasToReplace)
} | Out-File data2.csv
Avoiding the regex engine entirely, just for fun, gets me things like this:
Get-Content .\data.csv | ForEach {
$1,$2,$3,$4,$5,$6,$7 = $_ -split ','
if ($7) {"$1 $2 $3,$4,$5,$6,$7"} else {"$1 $2,$3,$4,$5,$6"}
} | Out-File data2.csv
(*) ($line -as [char[]] -eq ',').Count
(**) while ( #counting ) { # split/mangle/join }