Remove blank lines after specific text (without using -notmatch) - powershell

We have a script that uses a function to go through a text file and replace certain words with either other words or with nothing. The spots that get replaced with nothing leave behind a blank line, which we need to remove in some cases (but not all). I've seen several places where people mention using things like -notmatch to copy over everything to a new file except what you want left behind, but there are a lot of blank lines we want left in place.
For example:
StrangerThings: A Netflix show
'blank line to be removed'
'blank line to be removed'
Cast: Some actors
Crew: hard-working people
'blank line left in place'
KeyGrip
'blank line to be removed'
Gaffer
'blank line left in place'
So that it comes out like this:
StrangerThings: A Netflix show
Cast: Some actors
Crew: hard-working people
KeyGrip
Gaffer
We've tried doing a -replace, but that doesn't get rid of the blank line. Additionally, we have to key off of the text to the left of the ":" in each line. The data to the right in most cases is dynamic, so we can't hard-code anything in for that.
function FormatData {
#FUNCTION FORMATS DATA BASED ON SECTIONS
#This is where we're replacing some words in the different sections
#Some of these we replace leave the blank lines behind
$data[$section[0]..$section[1]] -replace $oldword,$newword
$output | Set-Content $outputFile
}
$oldword = "oldword"
$newword = "newword"
FormatData
$oldword = "oldword1"
$newword = "" #leaves a blank line
FormatData
$oldword = "Some phrase: "
$newword = "" #leaves a blank line
FormatData
We just need a pointer in the right direction on how to delete/remove a blank line (or several lines) after specific text, please.

Since it looks like you are reading in an array and doing replacements, the array index will not go away. You can change the value to blank or white space, and it will still appear as a blank line when it is output to a file or console. Using the -replace operator with no replacement string replaces the regex match with an empty string.
One approach could be to read the data in raw with Get-Content -Raw, so that the text is read into memory as a single string, but you lose array indexing. At that point, you have full control over replacing newline characters if you choose to do so. A second approach would be to mark the blank lines you want to keep initially (<#####> in this example), do the replacements, remove the remaining blank lines, and then clean up the markings.
# Do this before any new word replacements happen. Pass this object into any functions.
$data = $data -replace "^\s*$","<#####>"
$data[$section[0]..$section[1]] -replace $oldword,$newword
($output | Where-Object {$_}) -replace "<#####>" | Set-Content $outputFile
Explanation:
An empty string or $null evaluates to false in a PowerShell Boolean context. Since the Where-Object script block performs a Boolean evaluation, you can simply check the pipeline object ($_): any line that is a non-empty string evaluates to true and is kept. Note that a line containing only white space is still a non-empty string and therefore evaluates to true; to drop such lines as well, test $_.Trim() or $_ -match '\S' instead.
Below is a trivial example of the behavior:
$arr = "one","two","three"
$arr
one
two
three
$arr -replace "two"
one
three
$arr[1] = "two"
$arr
one
two
three
$arr -replace "two" | Where-Object {$_}
one
three
You can set a particular array value to $null and have it appear to go away. When writing to a file, it will appear as if the line has been removed. However, the array will still have that $null value. So you have to be careful.
$arr[1] = $null
$arr
one
three
$arr.count
3
If you use another collection type that supports resizing, you have the Remove method available. At that point though, you are adding extra logic to handle index removals and can't be enumerating the collection while you are changing its size.
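The marker approach described earlier in this answer can be sketched end to end as follows. This is only a minimal sketch: the sample lines, the $marker variable name and the single oldword1 replacement are assumptions made for illustration, not the asker's actual script.

```powershell
# Hypothetical sample data: the blank line after 'Crew' must survive,
# while the lines emptied by the replacement must disappear.
$data = @(
    'StrangerThings: A Netflix show'
    'oldword1'
    'Cast: Some actors'
    'Crew: hard-working people'
    ''
    'KeyGrip'
    'oldword1'
    'Gaffer'
)
$marker = '<#####>'   # assumed not to occur anywhere in the real text

# 1. Mark the blank lines that exist before any replacement happens
$marked = $data -replace '^\s*$', $marker

# 2. Do the word replacement; lines that held only 'oldword1' become empty strings
$replaced = $marked -replace 'oldword1', ''

# 3. Drop the empty strings, then turn the markers back into blank lines
$result = ($replaced | Where-Object { $_ }) -replace [regex]::Escape($marker), ''
$result
```

Here $result holds six lines, with the original blank line still sitting between the Crew and KeyGrip lines.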

If all you are doing is parsing a text file:
function FormatData {
$Input -replace $oldword,$newword
}
# -Raw reads the file as one string, so a pattern can span line breaks
$FileContent = Get-Content "C:\TextFile.txt" -Raw
$OutputFile = "C:\TextOutput.txt"
$oldword = "oldword"
$newword = "newword"
$FileContent = $FileContent | FormatData
# (?m) lets ^ match at the start of every line; \r?\n matches the line break
$oldword = '(?m)^(Crew: hard-working people)(\r?\n).*oldword1.*\r?\n'
$newword = '$1$2$2' # Leaves a blank line after Crew: hard-working people
$FileContent = $FileContent | FormatData
$oldword = '(?m)^.*oldword1.*\r?\n'
$newword = '' # Does not leave a blank line
$FileContent = $FileContent | FormatData
$FileContent | Set-Content $OutputFile

Related

Add Content to a specific line in powershell

I have seen this post:
Add-Content - append to specific line
But I cannot add these lines because "Array index is out of range".
What my script is doing:
Find the line
Loop through the array that contains the data i want to add
$file[$line]+=$data
$line+=1
Write to file
Should I create a new file content and then add each line of the original file to it?
IF so, do you know how to do that and how to stop and add my data in between?
Here is the part of my code where I try to add:
$f=Get-Content $path
$ct=$begin+1 #$begin is the line where I want to place content under
foreach($add in $add_to_yaml)
{
$f[$ct]+=$add
$ct+=1
}
$f | Out-File -FilePath $file
Let's break down your script and try to analyze what's going on:
$f = Get-Content $path
Get-Content, by default, reads text files and spits out 1 string per individual line in the file. If the file found at $path has 10 lines, the resulting value stored in $f will be an array of 10 string values.
Worth noting is that array indices in PowerShell (and .NET in general) are zero-based - to get the 10th line from the file, we'd reference index 9 in the array ($f[9]).
That means that if you want to concatenate stuff to the end of (or "under") line 10, you need to specify index 9. For this reason, you'll want to change the following line:
$ct = $begin + 1 #$begin is the line where i want to place content under
to
$ct = $begin
Now that we have the correct starting offset, let's look at the loop:
foreach($add in $add_to_yaml)
{
$f[$ct] += $add
$ct += 1
}
Assuming $add_to_yaml contains multiple strings, the loop body will execute more than once. Let's take a look at the first statement:
$f[$ct] += $add
We know that $f[$ct] resolves to a string - and strings have the += operator overloaded to mean "string concatenation". That means that the string value stored in $f[$ct] will be modified (eg. the string will become longer), but the array $f itself does not change its size - it still contains the same number of strings, just one of them is a little longer.
Which brings us to the crux of your issue, this line right here:
$ct += 1
By incrementing the index counter, you effectively "skip" to the next string for every value in $add_to_yaml - so if the number of elements you want to add exceeds the number of lines after $begin, you naturally reach a point "beyond the bounds" of the array before you're finished.
Instead of incrementing $ct, make sure you concatenate your new string values with a newline sequence:
$f[$ct] = $f[$ct],$add -join [Environment]::Newline
Putting it all back together, you end up with something like this (notice we can discard $ct completely, since its value is constant and equal to $begin anyway):
$f = Get-Content $path
foreach($add in $add_to_yaml)
{
$f[$begin] = $f[$begin],$add -join [Environment]::Newline
}
But wait a minute - all the strings in $add_to_yaml are simply going to be joined by newlines - we can do that in a single -join operation and get rid of the loop too!
$f = Get-Content $path
$f[$begin] = @($f[$begin];$add_to_yaml) -join [Environment]::NewLine
$f | Out-File -FilePath $file
Much simpler :)
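If you would rather keep every added line as its own array element instead of packing them into one longer string, a resizable list is one alternative to the -join technique above. The following is a sketch using System.Collections.Generic.List[string] and its InsertRange method; the sample values for $f, $add_to_yaml and $begin are hypothetical stand-ins.

```powershell
# Hypothetical stand-ins for the file content and the lines to add
$f           = 'line 1', 'line 2', 'line 3'
$add_to_yaml = 'new A', 'new B'
$begin       = 1   # zero-based index of the line to place content under

# Copy the lines into a list that can grow, unlike a fixed-size array
$list = New-Object 'System.Collections.Generic.List[string]'
$list.AddRange([string[]]$f)

# Insert all new lines directly after index $begin
$list.InsertRange($begin + 1, [string[]]$add_to_yaml)
$list
```

The list now contains five elements (line 1, line 2, new A, new B, line 3) and can be piped to Out-File just like the array.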

question about powershell text manipulation

I apologise for asking such a basic question, as I am a beginner in scripting.
I was wondering why I am getting different results from two different sources with the same formatting. Below are my samples:
file1.txt
Id Name Members
122 RCP_VMWARE-DMZ-NONPROD DMZ_NPROD01_111
DMZ_NPROD01_113
123 RCP_VMWARE-DMZ-PROD DMZ_PROD01_110
DMZ_PROD01_112
124 RCP_VMWARE-DMZ-INT.r87351 DMZ_TEMPL_210.r
DMZ_DECOM_211.r
125 RCP_VMWARE-LAN-NONPROD NPROD02_20
NPROD03_21
NPROD04_22
NPROD06_24
file2.txt
Id Name Members
4 HPUX_PROD HPUX_PROD.3
HPUX_PROD.4
HPUX_PROD.5
I'm trying to display the Name column, and with this code I'm able to display file1.txt correctly.
PS C:\Share> gc file1.txt |Select-Object -skip 1 | foreach-object { $_.split(" ")[1]} | ? {$_.trim() -ne "" }
RCP_VMWARE-DMZ-NONPROD
RCP_VMWARE-DMZ-PROD
RCP_VMWARE-DMZ-INT.r87351
RCP_VMWARE-LAN-NONPROD
However, with file2 I'm getting a different output.
PS C:\Share> gc .\file2.txt |Select-Object -skip 1 | foreach-object { $_.split(" ")[1]} | ? {$_.trim() -ne "" }
4
Changing the code to $_.split(" ")[2] helps to display the output correctly.
However, I would like to have just one piece of code that can be applied to both situations. I'd appreciate it if you could help me sort this out. Thank you in advance.
This happens because the latter file has different format.
When examined carefully, one notices there are two spaces between 4 and HPUX_PROD strings:
Id Name Members
4 HPUX_PROD HPUX_PROD.3
^^^^
On the first file, there is a single space between number and string:
Id Name Members
122 RCP_VMWARE-DMZ-NONPROD DMZ_NPROD01_111
^^^
How to fix the issue depends on whether you need to match both file formats, or whether the second file simply has a typing error.
The existing answers are helpful, but let me try to break it down conceptually:
.Split(" ") splits the input string by each individual space character, whereas what you're looking for is to split by runs of (one or more) spaces, given that your column values can be separated by more than one space.
For instance, 'a  b'.split(' ') (note the two spaces) results in 3 array elements - 'a', '', 'b' - because the empty string between the two spaces is considered an element too.
The .NET [string] type's .Split() method is based on verbatim strings or character sets and therefore doesn't allow you to express the concept of "one or more spaces" as a split criterion, whereas PowerShell's regex-based -split operator does.
Conveniently, -split's unary form (see below) has this logic built in: it splits each input string by any nonempty run of whitespace, while also ignoring leading and trailing whitespace, which in your case obviates the need for a regex altogether.
This answer compares and contrasts the -split operator with string type's .Split() method, and makes the case for routinely using the former.
Therefore, a working solution (for both input files) is:
Get-Content .\file2.txt | Select-Object -Skip 1 |
Foreach-Object { if ($value = (-split $_)[1]) { $value } }
Note:
If the column of interest contains a value (at least one non-whitespace character), so must all preceding columns in order for the approach to work. Also, column values themselves must not have embedded whitespace (which is true for your sample input).
The if conditional both extracts the 2nd column value ((-split $_)[1]) and assigns it to a variable ($value = ), whose value then implicitly serves as a Boolean:
Any nonempty string is implicitly $true, in which case the extracted value is output in the associated block ({ $value }); conversely, an empty string results in no output.
For a general overview of PowerShell's implicit to-Boolean conversions, see the bottom section of this answer.
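To make the difference concrete, here is a small sketch that splits the same line both ways. The exact spacing of the sample lines is an assumption (the original column alignment is not preserved here), as are the variable names.

```powershell
# Hypothetical line with a leading space and runs of several spaces
$line = ' 4  HPUX_PROD   HPUX_PROD.3'

# .Split(' ') splits at every single space, producing empty elements
$bySpace = $line.Split(' ')

# Unary -split splits by runs of whitespace and ignores leading/trailing whitespace
$byRun = -split $line

"Split(' ') gave $($bySpace.Count) elements; index 1 is '$($bySpace[1])'"
"-split gave $($byRun.Count) elements; index 1 is '$($byRun[1])'"

# A continuation line has only one column, so index 1 is $null (implicitly $false)
$continuation = (-split '      HPUX_PROD.4')[1]
```

With this input, .Split(' ') yields seven elements with '4' at index 1, while unary -split yields three elements with 'HPUX_PROD' at index 1.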
Since this sort-of looks like csv output with spaces as delimiter (but not quite), I think you could use ConvertFrom-Csv on this:
# read the file as string array, trim each line and filter only the lines that
# when split on 1 or more whitespace characters has more than one field
# then replace the spaces by a comma and treat it as CSV
# return the 'Name' column only
(((Get-Content -Path 'D:\Test\file1.txt').Trim() |
Where-Object { @($_ -split '\s+').Count -gt 1 }) -replace '\s+', ',' |
ConvertFrom-Csv).Name
Shorter, but because you are only after the Name column, this works too:
((Get-Content -Path 'D:\Test\file2.txt').Trim() -replace '\s+', ',' | ConvertFrom-Csv).Name -ne ''
Output for file1
RCP_VMWARE-DMZ-NONPROD
RCP_VMWARE-DMZ-PROD
RCP_VMWARE-DMZ-INT.r87351
RCP_VMWARE-LAN-NONPROD
Output for file2
HPUX_PROD

Remove list of phrases if they are present in a text file using Powershell

I'm trying to use a list of phrases (over 100) which I want to be removed from a text file (products.txt) which has lines of text inside it (tab separated, one per line). The lines that do not match the list of phrases should then be written back to the same file.
#cd .\Desktop\
$productlist = @(
'example',
'juicebox',
'telephone',
'keyboard',
'manymore')
foreach ($product in $productlist) {
get-childitem products.txt | Select-String -Pattern $product -NotMatch | foreach {$_.line} | Out-File -FilePath .\products.txt
}
The above code does not remove the words listed in the $productlist; it simply outputs all lines in products.txt again.
The lines inside of products.txt file are these:
productcatalog
product1example
juicebox038
telephoneiphone
telephoneandroid
randomitem
logitech
coffeetable
razer
Thank you for your help.
Here's my solution. You need the parentheses otherwise the input file will be in use when trying to write to the file. Select-string accepts an array of patterns. I wish I could pipe 'path' to set-content but it doesn't work.
$productlist = 'example', 'juicebox', 'telephone', 'keyboard', 'manymore'
(Select-String $productlist products.txt -NotMatch) | % line |
set-content products.txt
here's one way to do what you want. it's somewhat more direct than what you used. [grin] it uses the way that PoSh can act on an entire collection when it is on the LEFT side of an operator.
what it does ...
fakes reading in a text file
when ready to do this in real life, replace the whole #region/#endregion block with a call to Get-Content.
builds the exclude list
converts that into a regex OR pattern
filters out the items that match the unwanted list
shows that resulting list
the code ...
#region >>> fake reading in a text file
# when ready to do this for real, replace the whole "#region/#endregion" block with a call to Get-Content
$ProductList = @'
productcatalog
product1example
juicebox038
telephoneiphone
telephoneandroid
randomitem
logitech
coffeetable
razer
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
$ExcludedProductList = @(
'example'
'juicebox'
'telephone'
'keyboard'
'manymore'
)
$EPL_Regex = $ExcludedProductList -join '|'
$RemainingProductList = $ProductList -notmatch $EPL_Regex
$RemainingProductList
output ...
productcatalog
randomitem
logitech
coffeetable
razer
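One caveat when joining phrases into a regex OR pattern: a phrase that contains regex metacharacters (such as . or +) would change the meaning of the pattern or make it invalid. A hedged sketch of escaping each phrase first with [regex]::Escape() follows; the phrases and lines here are made up for illustration and are not from the question.

```powershell
# Hypothetical phrases, two of which contain regex metacharacters
$phrases = 'c++', 'a.b', 'telephone'

# Escape every phrase before joining them into one alternation
$pattern = ($phrases | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Hypothetical input lines
$lines = 'c++ compiler', 'axb', 'a.b', 'razer'

# Keep only the lines that match none of the literal phrases
$kept = $lines -notmatch $pattern
$kept
```

Because the dot is escaped, 'axb' is kept and only the literal 'a.b' is removed; $kept ends up holding 'axb' and 'razer'.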

Powershell- match split and replace based on index

I have a file
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*LM*KY
EF*12*Code1*TX*1234*RJ
I need to replace the 5th element in the CD segment alone from LM to ET in each of the files in the folder. The element delimiter is *, as shown in the sample file content above. I am new to PowerShell and tried the code below, but unfortunately it is not giving the desired results. Can any of you please provide some help?
foreach($xfile in $inputfolder)
{
If ($_ match "^CD\*")
{
[System.IO.File]::ReadAllText($xfile).replace(($_.split("*")[5],"ET") | Set-Content $xfile
}
[System.IO.File]::WriteAllText($xfile),((Get-Content $xfile -join("~")))
}
here's a slightly different way to get there ... [grin] what it does ...
fakes reading in a test file
when ready to do this for real, remove the entire #region/#endregion block and use Get-Content.
sets the constants
iterates thru the imported text file lines
checks for a line that starts with the target pattern
if found ...
== escapes the old value with [regex]::Escape() to deal with the asterisks
== replaces the escaped old value with the new value
== outputs the new version of that line
if NOT found, outputs the line as-is
stores all the lines into the $OutStuff var
displays that on screen
the code ...
#region >>> fake reading in a plain text file
# in real life, use Get-Content
$InStuff = @'
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*LM*KY
EF*12*Code1*TX*1234*RJ
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a plain text file
$TargetLineStart = 'CD*'
$OldValue = '*LM*'
$NewValue = '*ET*'
$OutStuff = foreach ($IS_Item in $InStuff)
{
if ($IS_Item.StartsWith($TargetLineStart))
{
$IS_Item -replace [regex]::Escape($OldValue), $NewValue
}
else
{
$IS_Item
}
}
$OutStuff
output ...
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*ET*KY
EF*12*Code1*TX*1234*RJ
i will leave saving that to a new file [or overwriting the old one] to the user. [grin]
You could capture all that comes before the match in group 1, and match LM.
In the replacement use $1ET
^(CD\*(?:[^*\r\n]+\*){4})LM\b
Regex demo
If you don't want to match LM literally, you could also match any other char than * or a newline.
^(CD\*(?:[^*\r\n]+\*){4})[^*\r\n]+\b
Replace example
$allText = Get-Content -Raw file.txt
$allText -replace '(?m)^(CD\*(?:[^*\r\n]+\*){4})LM\b','$1ET'
Output
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*ET*KY
EF*12*Code1*TX*1234*RJ
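Since the question asks for a replacement based on position, another option is to split the CD line on the delimiter, overwrite the element by index, and join the pieces again. This is a sketch of that alternative rather than either answer's method; the variable names and the in-memory sample lines are assumptions.

```powershell
# Hypothetical in-memory copy of the sample file
$lines = @(
    'AB*00*Name1First*Name1Last*test'
    'CD*02*83*XY*Fax*LM*KY'
    'EF*12*Code1*TX*1234*RJ'
)

$result = foreach ($line in $lines) {
    if ($line.StartsWith('CD*')) {
        # Index 0 is the segment name, so the 5th element is at index 5
        $fields = $line.Split('*')
        if ($fields.Count -gt 5) { $fields[5] = 'ET' }
        $fields -join '*'
    }
    else {
        $line   # all other segments pass through unchanged
    }
}
$result
```

This replaces whatever value sits in the 5th position of a CD segment, regardless of whether it is currently LM.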

Change specific part of a string

I've got a .txt-File with some text in it:
Property;Value
PKG_GUID;"939de9ec-c9ac-4e03-8bef-7b7ab99bff74"
PKG_NAME;"WinBasics"
PKG_RELATED_TICKET;""
PKG_CUSTOMER_DNS_SERVERS;"12314.1231
PKG_CUSTOMER_SEARCH_DOMAINS;"ms.com"
PKG_JOIN_EXISTING_DOMAIN;"True"
PKG_DOMAINJOIN_DOMAIN;"ms.com"
PKG_DOMAINJOIN_USER;"mdoe"
PKG_DOMAINJOIN_PASSWD;"*******"
So now, is there a way to replace those *'s with e.g. numbers or sth. ?
If so, may you tell me how to do it?
Much like Rahul I would use RegEx as well. Considering the application I'd run Get-Content through a ForEach loop, and replace text as needed on a line-by-line basis.
Get-Content C:\Path\To\File.txt | ForEach{$_ -replace "(PKG_DOMAINJOIN_PASSWD;`")([^`"]+?)(`")", "`${1}12345678`$3"}
That would output:
Property;Value
PKG_GUID;"939de9ec-c9ac-4e03-8bef-7b7ab99bff74"
PKG_NAME;"WinBasics"
PKG_RELATED_TICKET;""
PKG_CUSTOMER_DNS_SERVERS;"12314.1231
PKG_CUSTOMER_SEARCH_DOMAINS;"ms.com"
PKG_JOIN_EXISTING_DOMAIN;"True"
PKG_DOMAINJOIN_DOMAIN;"ms.com"
PKG_DOMAINJOIN_USER;"mdoe"
PKG_DOMAINJOIN_PASSWD;"12345678"
On second thought, I don't know if I'd do that. I might import it as a CSV, update the property, and export the CSV again.
Import-Csv C:\Path\To\File.txt -Delimiter ";" | % { if ($_.Property -eq "PKG_DOMAINJOIN_PASSWD") { $_.Value = "12345678" }; $_ } | Export-Csv C:\Path\To\NewFile.txt -Delimiter ";" -NoTypeInformation
If you are using PowerShell V2.0 (hopefully), you can try something like the below. gc is shorthand for the Get-Content cmdlet.
(gc D:\SO_Test\test.txt) -replace '\*+','12345678'
With this the resultant data would be as below (notice the last line)
Property;Value
PKG_GUID;"939de9ec-c9ac-4e03-8bef-7b7ab99bff74"
<Rest of the lines here>
PKG_DOMAINJOIN_USER;"mdoe"
PKG_DOMAINJOIN_PASSWD;"12345678" <-- Notice here; *'s changed to numbers
Rahul's answer was good; I just wanted to mention that \*+ will match every run of one or more * characters, so it would also replace any other place where there is at least one star. If what you posted is all you would ever expect for your sample data, though, this would be fine.
You could alter the regex match to make it more specific if it was needed by changing it to something like
\*{3,}
which would match 3 or more stars, or very specific would be
(?<=")\*{3,}(?=")
which would replace 3 or more stars which are surrounded by double quotes.
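The difference between the broad and the narrow pattern can be seen in a short sketch. The PKG_NOTE line is a hypothetical extra line (not in the asker's file) added to show a stray star that should not be touched.

```powershell
# Two sample lines; the first one is made up to contain a single stray star
$lines = @(
    'PKG_NOTE;"a*b"'
    'PKG_DOMAINJOIN_PASSWD;"*******"'
)

# Broad: every run of one or more stars is replaced
$broad = $lines -replace '\*+', '12345678'

# Narrow: only 3 or more stars that sit directly between double quotes
$narrow = $lines -replace '(?<=")\*{3,}(?=")', '12345678'

$broad
$narrow
```

The broad pattern also rewrites the stray star in the first line, while the narrow pattern leaves that line alone and changes only the password value.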
Here's a function that uses regex lookahead and lookbehind zero-length assertions to replace named parameters in a string similar to your example:
function replace-x( $string, $name, $value ) {
$regex = "(?<=$([regex]::Escape($name));`").*(?=`")"
$string -replace $regex, $value
}
It's reusable for different settings in your file, e.g.:
$settings = get-content $filename
$settings = replace-x $settings PKG_DOMAINJOIN_USER foo
$settings = replace-x $settings PKG_DOMAINJOIN_PASSWD bar