Powershell replace between start and end - powershell

I need to replace everything between two points.
$import = Get-Content C:\bookmarks.html
$newbody = Get-Content C:\newbookmarks.html
$remove = '(?<=<DT><H3 ADD_DATE=""1544626193"" LAST_MODIFIED=""154649885"">Import-IE</H3>).*?(?=</DL>)'
$import | %{$_.replace($remove,"$newbody")}
My problem is to get all content between start:
<DT><H3 ADD_DATE=""1544626193"" LAST_MODIFIED=""154649885"">Import-IE</H3>
and the end:
</DL>
incl multiple lines
Example html:
<DT><H3 ADD_DATE="1544626193" LAST_MODIFIED="1546498855">Import-IE</H3>
<DL><p>
<DT>golem.de
<DT>heise online
</DL>
Regards

A couple of changes needed to make this work:
One big multiline string
Since you want to do a replace over multiple lines, we need to make sure all the lines are contained in the same string, so let's start with that. We can use the -Raw switch parameter with Get-Content:
$import = Get-Content C:\bookmarks.html -Raw
Exact pattern matching in regex
Next up we have the regex pattern itself. There are a few discrepancies between it and the sample content you've shown:
LAST_MODIFIED=""154649885"" # pattern has nested double-quotes and only one 5 at the end
LAST_MODIFIED="1546498855" # input uses just one pair of double-quotes and value has two 5's at the end
So let's fix that, and make sure the input string we're looking for is properly escaped while we're at it:
$remove = "(?<=$([regex]::Escape('<DT><H3 ADD_DATE="1544626193" LAST_MODIFIED="1546498855">Import-IE</H3>'))).*?(?=</DL>)"
String.Replace doesn't support regex
Then, we'll have to abandon the String.Replace() method that you're currently using - because it doesn't actually support regex - so we'll use the -replace operator instead:
$import -replace $remove,"$newbody"
Use -replace in SingleLine mode
The only thing we need now is to instruct the regex parser to treat the input in SingleLine mode, so that . also matches newlines and .*? can span multiple lines. This is easy: we just add the inline option (?s) at the start of the regex pattern:
$import -replace "(?s)$remove","$newbody"
And that's it :)
$import = Get-Content C:\bookmarks.html -Raw
# -Raw keeps the line breaks; without it "$newbody" would join the lines with spaces
$newbody = Get-Content C:\newbookmarks.html -Raw
$remove = "(?<=$([regex]::Escape('<DT><H3 ADD_DATE="1544626193" LAST_MODIFIED="1546498855">Import-IE</H3>'))).*?(?=</DL>)"
$import -replace "(?s)$remove","$newbody"
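To try the idea without touching any files, here is a minimal sketch that runs the same lookbehind/lookahead replacement on an in-memory string. The sample markup is taken from the question; the replacement body (example.org) and the variable names $header, $pattern and $result are only illustrative assumptions.

```powershell
# Hypothetical in-memory stand-ins for bookmarks.html and newbookmarks.html
$header  = '<DT><H3 ADD_DATE="1544626193" LAST_MODIFIED="1546498855">Import-IE</H3>'
$import  = ($header, '<DL><p>', '<DT>golem.de', '<DT>heise online', '</DL>') -join "`n"
$newbody = "`n<DL><p>`n<DT>example.org`n"

# Escape the literal start marker, then match lazily up to the first </DL>.
# (?s) lets . match newlines so the match can span several lines.
$pattern = '(?s)(?<=' + [regex]::Escape($header) + ').*?(?=</DL>)'

$result = $import -replace $pattern, $newbody
$result
```

Because the start and end markers sit inside lookarounds, they are not consumed by the match and stay in the output. One thing to keep in mind: -replace interprets $ sequences in the replacement string, so a replacement body containing literal $ characters would need them doubled.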

Related

Extract string between two special characters in powershell

I need to extract a list with strings that are between two special characters (= and ;).
Below is an example of the file with line types and the needed strings in bold.
File is a quite big one, type is xml.
<type="string">data source=**HOL4624**;integrated sec>
<type="string">data source=**HOL4625**;integrated sec>
I managed to find the lines matching “data source=”, but how to get the name after?
Used code is below.
Get-content regsrvr.txt | select-string -pattern "data source="
Thank you very much!
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4625;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
The XML is not valid, so it won't parse cleanly. Instead, you can split the text into lines and use a regex match:
$html = @"
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4625;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
"@
$html -split '\n' | % {$null = $_ -match 'data source=.*?;';$Matches[0]} |
% {($_ -split '=')[1] -replace ';'}
HOL4624
HOL4625
Since the connection string is for SQL Server, let's use .NET's SqlConnectionStringBuilder to do all the work for us. Like so:
# Test data, XML extraction is left as an exercise
$str = 'data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096'
$builder = new-object System.Data.SqlClient.SqlConnectionStringBuilder($str)
# Check some parameters
$builder.DataSource
HOL4624
$builder.IntegratedSecurity
True
You can build on your Select-String attempt with a more precise regex. Also, you don't need to use Get-Content first. Instead you can use the -Path parameter of Select-String.
The following code reads the given file and returns the value between the = and the ;:
(Select-String -Path "regsrvr.txt" -pattern "(?:data source=)(.*?)(?:;)").Matches | % {$_.groups[1].Value}
Pattern explanation (regex):
The -Pattern parameter takes a regex that is matched against each line. The regex can be described as follows:
(?: opens a non-capturing group
data source= matches the characters data source=
) closes the non-capturing group
(.*?) matches any number of characters and saves them in a group. The ? makes the quantifier lazy, so matching stops at the first occurrence of the following group (in this case the ;).
(?:;) is the final non-capturing group for the closing ;
Structuring the output
Select-String returns Microsoft.PowerShell.Commands.MatchInfo objects.
Each one holds the matched strings (the whole match and all captured groups). We can loop over this output and return the value of the captured group: | % {$_.groups[1].Value}
% is just an alias for ForEach-Object.
For more information, look at the Select-String documentation and try your luck with some regex.
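As a quick way to experiment with the capture group, here is a small sketch that pipes in-memory strings to Select-String instead of reading regsrvr.txt. The shortened sample lines and the <x> element name are made up for illustration, and the non-capturing groups are dropped because they are optional here.

```powershell
# Hypothetical sample lines standing in for the contents of regsrvr.txt
$lines = @(
    '<x type="string">data source=HOL4624;integrated security=True</x>'
    '<x type="string">data source=HOL4625;integrated security=True</x>'
)

# Group 1 holds whatever sits between "data source=" and the next ";"
$names = $lines |
    Select-String -Pattern 'data source=(.*?);' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$names
```

Each MatchInfo object carries the first match on its line in Matches[0]; adding -AllMatches would be needed only if a single line could contain several data sources.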

How can I replace every comma with a space in a text file before a pattern using PowerShell

I have a text file with lines in this format:
FirstName,LastName,SSN,$x.xx,$x.xx,$x.xx
FirstName,MiddleInitial,LastName,SSN,$x.xx,$x.xx,$x.xx
The lines could be in either format. For example:
Joe,Smith,123-45-6789,$150.00,$150.00,$0.00
Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00
I want to basically turn everything before the SSN into a single field for the name thus:
Joe Smith,123-45-6789,$150.00,$150.00,$0.00
Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
How can I do this using PowerShell? I think I need to use ForEach-Object and at some point replace "," with " ", but I don't know how to specify the pattern. I also don't know how to use a ForEach-Object with a $_.Where so that I can specify the "SkipUntil" mode.
Thanks very much!
Mathias is correct; you want to use the -replace operator, which uses regular expressions. I think this will do what you want:
$string -replace ',(?=.*,\d{3}-\d{2}-\d{4})',' '
The regular expression uses a lookahead (?=) to look for any commas that are followed by any number of any character (. is any character, * is any number of them including 0) that are then followed by a comma immediately followed by an SSN (\d{3}-\d{2}-\d{4}). The concept of "zero-width assertions", such as this lookahead, simply means that it is used to determine the match, but is not actually returned as part of the match.
That's how we're able to match only the commas in the names themselves, and then replace them with a space.
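A minimal sketch of that lookahead applied to the two sample lines from the question, held in an array rather than read from a file (the variable names $lines and $fixed are just placeholders):

```powershell
# Sample lines from the question; single quotes keep the $ signs literal
$lines = @(
    'Joe,Smith,123-45-6789,$150.00,$150.00,$0.00'
    'Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00'
)

# Replace only those commas that still have ",<SSN>" somewhere after them
$fixed = $lines -replace ',(?=.*,\d{3}-\d{2}-\d{4})', ' '
$fixed
```

The comma directly in front of the SSN is left alone because no further ",SSN" follows it, which is what keeps the name and the SSN as separate fields.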
I know it's answered, and neatly so, but I tried to come up with an alternative to using a regex - count the number of commas in a line, then replace either the first one, or the first two, commas in the line.
But strings can't count how many times a character appears in them without using the regex engine(*), and replacements can't be done a specific number of times without using the regex engine(**), so it's not very neat:
$comma = [regex]","
Get-Content data.csv | ForEach {
$numOfCommasToReplace = $comma.Matches($_).Count - 4
$comma.Replace($_, ' ', $numOfCommasToReplace)
} | Out-File data2.csv
Avoiding the regex engine entirely, just for fun, gets me things like this:
Get-Content .\data.csv | ForEach {
$1,$2,$3,$4,$5,$6,$7 = $_ -split ','
if ($7) {"$1 $2 $3,$4,$5,$6,$7"} else {"$1 $2,$3,$4,$5,$6"}
} | Out-File data2.csv
(*) ($line -as [char[]] -eq ',').Count
(**) while ( #counting ) { # split/mangle/join }

Powershell: Comparing a block of text to a file

It certainly seemed like a simple enough task but for whatever reason this doesn't work:
#Verifies that the firefox proxy setting have been applied
#locate Prefsjs file
$PrefsFiles = Get-Item -Path ($env:SystemDrive+"\Users\*\AppData\Roaming\Mozilla\Firefox\Profiles\*\prefs.js")
#read in Prefsjs
$Prefsjs = (Get-Content $PrefsFiles)
#Block to compare
$Update= @"
user_pref("network.proxy.http", "0.0.0.0");
user_pref("network.proxy.http_port", 80);
"@
($Prefsjs -contains $Update)
The last line should return a true because the text actually does exist in $Prefsjs... Any ideas?
It's not going to match because you're comparing a multi-line string to an array of single line strings.
You need to compare like objects, which means $Prefsjs also needs to be a single, multi-line string. The easiest way to do that is to add the -Raw switch to your Get-Content:
#read in Prefsjs
$Prefsjs = (Get-Content $PrefsFiles -Raw)
But now $Prefsjs is not an array any more, so you can't use -contains. It's now just a single string, so you can use the string's Contains() method to accomplish the same thing:
$Prefsjs.contains($Update)
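The difference between the two checks can be seen on a small in-memory example. This is only a sketch: the three pref lines stand in for a real prefs.js (the first one is invented), and the lines are joined with "`n" as an assumption about the line endings.

```powershell
# Hypothetical stand-in for the lines of prefs.js
$lines = @(
    'user_pref("network.proxy.type", 1);'
    'user_pref("network.proxy.http", "0.0.0.0");'
    'user_pref("network.proxy.http_port", 80);'
)

# The two-line block we are looking for
$update = ($lines[1], $lines[2]) -join "`n"

$asArray  = $lines                 # what Get-Content returns by default
$asString = $lines -join "`n"      # comparable to Get-Content -Raw

# -contains asks "is any array element equal to $update?" and no single line is
$arrayResult  = $asArray -contains $update
# Contains() does a substring search inside one big string
$stringResult = $asString.Contains($update)

"array:  $arrayResult"
"string: $stringResult"
```

Contains() is an exact, case-sensitive substring test, so with a real file the line endings in the block being searched for have to match those in the file (CRLF versus LF) for the check to succeed.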

Change specific part of a string

I've got a .txt-File with some text in it:
Property;Value
PKG_GUID;"939de9ec-c9ac-4e03-8bef-7b7ab99bff74"
PKG_NAME;"WinBasics"
PKG_RELATED_TICKET;""
PKG_CUSTOMER_DNS_SERVERS;"12314.1231
PKG_CUSTOMER_SEARCH_DOMAINS;"ms.com"
PKG_JOIN_EXISTING_DOMAIN;"True"
PKG_DOMAINJOIN_DOMAIN;"ms.com"
PKG_DOMAINJOIN_USER;"mdoe"
PKG_DOMAINJOIN_PASSWD;"*******"
So now, is there a way to replace those *'s with e.g. numbers or sth. ?
If so, may you tell me how to do it?
Much like Rahul I would use RegEx as well. Considering the application I'd run Get-Content through a ForEach loop, and replace text as needed on a line-by-line basis.
Get-Content C:\Path\To\File.txt | ForEach{$_ -replace "(PKG_DOMAINJOIN_PASSWD;`")([^`"]+?)(`")", "`${1}12345678`$3"}
That would output:
Property;Value
PKG_GUID;"939de9ec-c9ac-4e03-8bef-7b7ab99bff74"
PKG_NAME;"WinBasics"
PKG_RELATED_TICKET;""
PKG_CUSTOMER_DNS_SERVERS;"12314.1231
PKG_CUSTOMER_SEARCH_DOMAINS;"ms.com"
PKG_JOIN_EXISTING_DOMAIN;"True"
PKG_DOMAINJOIN_DOMAIN;"ms.com"
PKG_DOMAINJOIN_USER;"mdoe"
PKG_DOMAINJOIN_PASSWD;"12345678"
On second thought, I don't know if I'd do that. I might import it as a CSV, update the property, and export the CSV again.
Import-Csv C:\Path\To\File.txt -Delimiter ";" | %{if($_.Property -eq "PKG_DOMAINJOIN_PASSWD"){$_.Value = "12345678"}; $_} | Export-Csv C:\Path\To\NewFile.txt -Delimiter ";" -NoTypeInformation
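The CSV idea can be tried on in-memory text with ConvertFrom-Csv and ConvertTo-Csv, which work like Import-Csv and Export-Csv but without files. This is a sketch on a shortened, hypothetical three-line version of the file, not the full sample.

```powershell
# Shortened stand-in for the settings file (header plus two rows)
$text = @(
    'Property;Value'
    'PKG_DOMAINJOIN_USER;"mdoe"'
    'PKG_DOMAINJOIN_PASSWD;"*******"'
)

# Parse into objects with Property and Value members
$rows = $text | ConvertFrom-Csv -Delimiter ';'

# Update only the row whose Property matches
foreach ($row in $rows) {
    if ($row.Property -eq 'PKG_DOMAINJOIN_PASSWD') { $row.Value = '12345678' }
}

# Serialize back to semicolon-delimited text
$csv = $rows | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$csv
```

The advantage over a regex is that the match is on the property name rather than on the shape of the value. The serialized output may quote fields differently from the original file, which is worth checking if another tool reads it.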
If you are using PowerShell v2.0 (hopefully), you can try something like the following. gc is shorthand for the Get-Content cmdlet.
(gc D:\SO_Test\test.txt) -replace '\*+','12345678'
With this the resultant data would be as below (notice the last line)
Property;Value
PKG_GUID;"939de9ec-c9ac-4e03-8bef-7b7ab99bff74"
<Rest of the lines here>
PKG_DOMAINJOIN_USER;"mdoe"
PKG_DOMAINJOIN_PASSWD;"12345678" <-- Notice here; *'s changed to numbers
Rahul's answer was good, I just wanted to mention that \*+ will replace every run of one or more * characters, so it would also match anywhere else there is at least one star. If what you posted is all you would ever expect in your sample data, though, this would be fine.
You could alter the regex to make it more specific if needed by changing it to something like
\*{3,}
which would match 3 or more stars, or very specific would be
(?<=")\*{3,}(?=")
which would replace 3 or more stars which are surrounded by double quotes.
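To see how the broad and the narrow pattern differ, here is a short sketch on two in-memory lines. The PKG_NOTE line is invented purely to show a stray star that should survive.

```powershell
# The first line is hypothetical and contains a single star that is not a masked password
$lines = @(
    'PKG_NOTE;"a*b"'
    'PKG_DOMAINJOIN_PASSWD;"*******"'
)

# Any run of stars is replaced, including the stray one
$broad  = $lines -replace '\*+', '12345678'

# Only runs of 3 or more stars directly enclosed in double quotes are replaced
$narrow = $lines -replace '(?<=")\*{3,}(?=")', '12345678'

$broad
$narrow
```

With the narrow pattern the quotes are only asserted by the lookbehind and lookahead, so they are kept in the output without having to be repeated in the replacement string.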
Here's a function that uses regex lookahead and lookbehind zero-length assertions to replace named parameters in a string similar to your example:
function replace-x( $string, $name, $value ) {
$regex = "(?<=$([regex]::Escape($name));`").*(?=`")"
$string -replace $regex, $value
}
It's reusable for different settings in your file, e.g.:
$settings = get-content $filename
$settings = replace-x $settings PKG_DOMAINJOIN_USER foo
$settings = replace-x $settings PKG_DOMAINJOIN_PASSWD bar

Edit text between two lines using powershell

I want to change this text
PortNumber=10001
;UserName=xxxxxxxxx
;Password=xxxxxxxxx
CiPdfPath=xxxxx
into this
PortNumber=10001
UserName=xxxxxxxxx
Password=xxxxxxxxx
CiPdfPath=xxxxx
I cannot simply search for ;Username=xxxx and ;Password=xxxx because they exist multiple times in the file and need to be commented on some places.
I found the next command
$file = Get-Content "Test.ini" -raw
$file -replace "(?m)^PortNumber=10001[\n\r]+;UserName=xxxx[\r\n]+;Password=xxxx","PortNumber=10001 `r`nUserName=xxxxx`r`nPassword=xxxxx"
And it worked!
But maybe it can be simplified.
If you use the (?ms) (multiline-singleline) option and here-strings, you can do most of the work with copy/paste:
$string = @'
PortNumber=10001
;UserName=xxxxxxxxx
;Password=xxxxxxxxx
CiPdfPath=xxxxx
'@
$regex = @'
(?ms)PortNumber=10001
;UserName=xxxxxxxxx
;Password=xxxxxxxxx
CiPdfPath=xxxxx
'@
$replace = @'
PortNumber=10001
UserName=xxxxxxxxx
Password=xxxxxxxxx
CiPdfPath=xxxxx
'@
$string -replace $regex,$replace
PortNumber=10001
UserName=xxxxxxxxx
Password=xxxxxxxxx
CiPdfPath=xxxxx
Why not search for the full text you'd like to replace?
So find:
PortNumber=10001
;UserName=xxxxxxxxx
;Password=xxxxxxxxx
CiPdfPath=xxxxx
and replace with:
PortNumber=10001
UserName=xxxxxxxxx
Password=xxxxxxxxx
CiPdfPath=xxxxx
You can use a regular expression to stand in for the characters that don't matter:
http://www.regular-expressions.info/powershell.html
http://www.powershelladmin.com/wiki/Powershell_regular_expressions
You could use regex.
Or, even simpler, depending on your requirements:
If you know the line numbers of the lines you want to change, you can simply replace those specific lines.
Given that the file has the layout you've pasted (e.g. username on line 2 and password on line 3), read the file into an array of lines, replace lines 2 and 3, and write the content back to the file.
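# Read the file as an array of lines (index 0 is line 1)
$lines = (Get-Content .\Test.txt)
# TrimStart removes only the leading semicolon, not any later ones in the value
$lines[1] = $lines[1].TrimStart(';')
$lines[2] = $lines[2].TrimStart(';')
$lines | Set-Content .\Test.txt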
I might be misunderstanding the nature of the question, but are you not simply trying to remove the leading semicolons? Is it important to search for those strings exclusively?
$file = Get-Content "Test.ini" -raw
$file -replace "(?sm)^;"
$file -replace "(?smi)^;(?=(username|password))"
For the sample shown, both examples should produce the same output. The first will match all leading semicolons. The second will match leading semicolons only if they are followed, using a lookahead, by either username or password.
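Since the question says the commented UserName and Password lines occur several times and must stay commented elsewhere, another option is to anchor the match on the PortNumber=10001 line and use capture groups to drop only the two semicolons that follow it. The following is a sketch under that assumption; the second block (PortNumber=10002) and the values aaa to ddd are invented to show that other blocks are left untouched.

```powershell
# Hypothetical file content with two blocks; only the first should be uncommented
$file = (
    'PortNumber=10001',
    ';UserName=aaa',
    ';Password=bbb',
    'CiPdfPath=xxxxx',
    'PortNumber=10002',
    ';UserName=ccc',
    ';Password=ddd'
) -join "`n"

# Groups 1 to 3 capture everything except the two semicolons;
# \r?\n accepts both Windows and Unix line endings
$pattern = '(?m)^(PortNumber=10001\r?\n);(UserName=.*\r?\n);(Password=)'

# Single quotes stop PowerShell from expanding $1, $2, $3 as variables
$result = $file -replace $pattern, '$1$2$3'
$result
```

This avoids repeating the actual user name and password in the pattern and the replacement, at the cost of assuming that the UserName and Password lines directly follow the PortNumber line.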