Extract string between two special characters in powershell - powershell

I need to extract a list with strings that are between two special characters (= and ;).
Below is an example of the file with line types and the needed strings in bold.
File is a quite big one, type is xml.
<type="string">data source=**HOL4624**;integrated sec>
<type="string">data source=**HOL4625**;integrated sec>
I managed to find the lines matching “data source=”, but how to get the name after?
Used code is below.
Get-content regsrvr.txt | select-string -pattern "data source="
Thank you very much!
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4625;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>

The XML is not valid, so it's not a clean parse, anyway you can use string split with regex match:
$html = #"
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4625;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
"#
$html -split '\n' | % {$null = $_ -match 'data source=.*?;';$Matches[0]} |
% {($_ -split '=')[1] -replace ';'}
HOL4624
HOL4625

Since the connectionstring is for SQL Server, let's use .Net's SqlConnectionStringBuilder to do all the work for us. Like so,
# Test data, XML extraction is left as an exercise
$str = 'data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096'
$builder = new-object System.Data.SqlClient.SqlConnectionStringBuilder($str)
# Check some parameters
$builder.DataSource
HOL4624
$builder.IntegratedSecurity
True

You can expand your try at using Select-String with a better use of regex. Also, you don't need to use Get-Content first. Instead you can use the -Path parameter of Select-String.
The following Code will read the given file and return the value between the = and ;:
(Select-String -Path "regsrvr.txt" -pattern "(?:data source=)(.*?)(?:;)").Matches | % {$_.groups[1].Value}
Pattern Explanation (RegEx):
You can use -pattern to capture an String given a matching RegEx. The Regex can be describe as such:
(?: opens an non-capturing Group
data source= matches the charactes data source=
) closes the non-capturing Group
(.*?) matches any amount of characters and saves them in a Group. The ? is the lazy operator. This will stop the matching part at the first occurence of the following group (in this case the ;).
(?:;) is the final non-capturing Group for the closing ;
Structuring the Output
Select-String returns a Microsoft.PowerShell.Commands.MatchInfo-Object.
You can find the matched Strings (the whole String and all captured groups) in there. We can also loop through this Output and return the Value of the captured Groups: | % {$_.groups[1].Value}
% is just an Alias for For-Each.
For more Informations look at the Select-String-Documentation and try your luck with some RegEx.

Related

powershell - get a single word out of a string

I have a string:
<UserInputs><UserInput Question="Groupname" Answer="<Values Count="1"><Value DisplayName="AllHummanresources" Id="af05c5d3-2312-c897-8439-08979d4d0a49" /></Values>" Type="System.SupportingItem.PortalControl.InstancePicker" /><UserInput Question="Ausgabe" Answer="Namen" Type="richtext" /></UserInputs>
I want to trim the string to get as result "AllHummanresources". So I need the word between DisplayName=" and " .
How can I achieve this goal?
I did not find a fitting example in the net :(
greetings
You can use Select-String Cmdlet along with regex.
$result = $your_string | Select-String -Pattern "DisplayName="(.*?)""
If the match is successful you can access the group by
Write-Host $result.Matches.Groups[1].Value
You could use the -replace operator so that you omit everything apart from that string.
$string -replace '.+(?:DisplayName=")(.*?)&quot.+', '$1'
Granted this is only as good as the consistency of your input string.

Remove list of phrases if they are present in a text file using Powershell

I'm trying to use a list of phrases (over 100) which I want to be removed from a text file (products.txt) which has lines of text inside it (they are tab separated / new line each). So that the results which do not match the list of phrases will be re-written in the current file.
#cd .\Desktop\
$productlist = #(
'example',
'juicebox',
'telephone',
'keyboard',
'manymore')
foreach ($product in $productlist) {
get-childitem products.txt | Select-String -Pattern $product -NotMatch | foreach {$_.line} | Out-File -FilePath .\products.txt
}
The above code does not remove the words listed in the $productlist, it simply outputs all links in products.txt again.
The lines inside of products.txt file are these:
productcatalog
product1example
juicebox038
telephoneiphone
telephoneandroid
randomitem
logitech
coffeetable
razer
Thank you for your help.
Here's my solution. You need the parentheses otherwise the input file will be in use when trying to write to the file. Select-string accepts an array of patterns. I wish I could pipe 'path' to set-content but it doesn't work.
$productlist = 'example', 'juicebox', 'telephone', 'keyboard', 'manymore'
(Select-String $productlist products.txt -NotMatch) | % line |
set-content products.txt
here's one way to do what you want. it's somewhat more direct than what yo used. [grin] it uses the way that PoSh can act on an entire collection when it is on the LEFT side of an operator.
what it does ...
fakes reading in a text file
when ready to do this in real life, replace the whole #region/#endregion block with a call to Get-Content.
builds the exclude list
converts that into a regex OR pattern
filters out the items that match the unwanted list
shows that resulting list
the code ...
#region >>> fake reading in a text file
# when ready to do this for real, replace the whole "#region/#endregion" block with a call to Get-Content
$ProductList = #'
productcatalog
product1example
juicebox038
telephoneiphone
telephoneandroid
randomitem
logitech
coffeetable
razer
'# -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
$ExcludedProductList = #(
'example'
'juicebox'
'telephone'
'keyboard'
'manymore'
)
$EPL_Regex = $ExcludedProductList -join '|'
$RemainingProductList = $ProductList -notmatch $EPL_Regex
$RemainingProductList
output ...
productcatalog
randomitem
logitech
coffeetable
razer

Wanted to get middle text from PSObject in Powershell

I have a PSObject which contains the following Values
AZREUS/MYVM-0.mydomain.com
AZREUS/MYVM-1.mydomain.com
I need only the VM name stored in a new PS-Object, how can I do that.
The list should return like below.
MYVM-0
MYVM-1
a simple way can be to use -replace operator:
$list = #('AZREUS/MYVM-0.mydomain.com','AZREUS/MYVM-1.mydomain.com')
$list -replace 'AZREUS/'-replace '\.mydomain\.com'
YannCha's answer is an efficient answer if your strings always begin with AZREUS/ and end with .mydomain.com. You can use a single -replace to get the desired result.
$obj = 'AZREUS/MYVM-0.mydomain.com','AZREUS/MYVM-1.mydomain.com'
$obj -replace '^AZREUS/(.*)\.mydomain\.com$','$1'
$1 represents capture group 1, which was created by the first parentheses grouping (). It contains .* contents. See Regex for regex explanation.
Taking the same approach further dynamically, you can use pattern matching. This removes all beginning characters including the first /. Then removes the first . and all characters after it.
$obj -replace '^.*/(.*?)\..*$','$1'
See Regex for regex explanation.
Note that if your object items are not strings, they will need to support being converted to strings or you will have to do that yourself before applying -replace.
Or using -match. Do them one at a time.
'AZREUS/MYVM-0.mydomain.com' -match 'AZREUS/(.*).mydomain.com' > $null; $matches[1]
MYVM-0
'AZREUS/MYVM-1.mydomain.com' -match 'AZREUS/(.*).mydomain.com' > $null; $matches[1]
MYVM-1

Powershell replace between start and end

I need to replace everything between two points.
$import = Get-Content C:\bookmarks.html
$newbody = Get-Content C:\newbookmarks.html
$remove = '(?<=<DT><H3 ADD_DATE=""1544626193"" LAST_MODIFIED=""154649885"">Import-IE</H3>).*?(?=</DL>)'
$import | %{$_.replace($remove,"$newbody")}
My problem is to get all content between start:
<DT><H3 ADD_DATE=""1544626193"" LAST_MODIFIED=""154649885"">Import-IE</H3>
and the end:
</DL>
incl multiple lines
Example html:
<DT><H3 ADD_DATE="1544626193" LAST_MODIFIED="1546498855">Import-IE</H3>
<DL><p>
<DT>golem.de
<DT>heise online
</DL>
Regards
A couple of changes needed to make this work:
One big multiline string
Since you want to do a replace over multiple lines, we need to makes sure all the lines are contained in the same string, so let's start with that - we can use the -Raw parameter switch with Get-Content:
$import = Get-Content C:\bookmarks.html -Raw
Exact pattern matching in regex
Next up we have the regex pattern itself - there's a few discrepancies between that and the sample content you've shown:
LAST_MODIFIED=""154649885"" # pattern has nested double-quotes and only one 5 at the end
LAST_MODIFIED="1546498855" # input uses just one pair of double-quotes and value has two 5's at the end
So let's fix that, and make sure the input string we're looking for is properly escaped while we're at it:
$remove = "(?<=$([regex]::Escape('<DT><H3 ADD_DATE="1544626193" LAST_MODIFIED="1546498855">Import-IE</H3>'))).*?(?=</DL>)"
String.Replace doesn't support regex
Then, we'll have to abandon the String.Replace() method that you're currently using - because it doesn't actually support regex - so we'll use the -replace operator instead:
$import -replace $remove,"$newbody"
Use -replace in SingleLine mode
The only thing we need now, is to instruct the regex parser to treat the input in SingleLine mode - so that .*? will capture newlines as well. This is super easy though, we just add an options flag s at the start of the regex pattern:
$import -replace "(?s)$remove","$newbody"
And that's it :)
$import = Get-Content C:\bookmarks.html -Raw
$newbody = Get-Content C:\newbookmarks.html
$remove = "(?<=$([regex]::Escape('<DT><H3 ADD_DATE="1544626193" LAST_MODIFIED="1546498855">Import-IE</H3>'))).*?(?=</DL>)"
$import -replace "(?s)$remove","$newbody"

How can I replace every comma with a space in a text file before a pattern using PowerShell

I have a text file with lines in this format:
FirstName,LastName,SSN,$x.xx,$x.xx,$x.xx
FirstName,MiddleInitial,LastName,SSN,$x.xx,$x.xx,$x.xx
The lines could be in either format. For example:
Joe,Smith,123-45-6789,$150.00,$150.00,$0.00
Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00
I want to basically turn everything before the SSN into a single field for the name thus:
Joe Smith,123-45-6789,$150.00,$150.00,$0.00
Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
How can I do this using PowerShell? I think I need to use ForEach-Object and at some point replace "," with " ", but I don't know how to specify the pattern. I also don't know how to use a ForEach-Object with a $_.Where so that I can specify the "SkipUntil" mode.
Thanks very much!
Mathias is correct; you want to use the -replace operator, which uses regular expressions. I think this will do what you want:
$string -replace ',(?=.*,\d{3}-\d{2}-\d{4})',' '
The regular expression uses a lookahead (?=) to look for any commas that are followed by any number of any character (. is any character, * is any number of them including 0) that are then followed by a comma immediately followed by a SSN (\d{3}-\d{2}-\d{4}). The concept of "zero-width assertions", such as this lookahead, simply means that it is used to determine the match, but it not actually returned as part of the match.
That's how we're able to match only the commas in the names themselves, and then replace them with a space.
I know it's answered, and neatly so, but I tried to come up with an alternative to using a regex - count the number of commas in a line, then replace either the first one, or the first two, commas in the line.
But strings can't count how many times a character appears in them without using the regex engine(*), and replacements can't be done a specific number of times without using the regex engine(**), so it's not very neat:
$comma = [regex]","
Get-Content data.csv | ForEach {
$numOfCommasToReplace = $comma.Matches($_).Count - 4
$comma.Replace($_, ' ', $numOfCommasToReplace)
} | Out-File data2.csv
Avoiding the regex engine entirely, just for fun, gets me things like this:
Get-Content .\data.csv | ForEach {
$1,$2,$3,$4,$5,$6,$7 = $_ -split ','
if ($7) {"$1 $2 $3,$4,$5,$6,$7"} else {"$1 $2,$3,$4,$5,$6"}
} | Out-File data2.csv
(*) ($line -as [char[]] -eq ',').Count
(**) while ( #counting ) { # split/mangle/join }