Powershell Nested Replace - powershell

I am looping through multiple remote machines, looking for a certain string in a log file (other things are collected from each device too, but I've left that out for simplicity). When I find a match I return it and write it to a central log. This all works perfectly, but I want to tidy up the central log by removing some of the information from each string.
So I start with
**28-Jan-2021 01:31:49,TCPServer.run(),3,JMX TCP Server running on 8085
But want to save to Central Log
28-Jan-2021 01:31:49,JMX TCP 8085
And I can achieve this using the below, but surely there is a more succinct way to do this? (have played about with -Replace but no joy)
$JMXString8085 = $JMXString8085.Replace("TCPServer.run(),3,","")
$JMXString8085 = $JMXString8085.Replace("}","")
$JMXString8085 = $JMXString8085.Replace(" Server running on","")

[...] surely there is a more succinct way to do this? (have played about with -Replace but no joy)
There is, and -replace can indeed help us here. -replace is a regex operator: it performs text replacement using regular expressions, which are patterns we can use to describe strings whose exact contents we might not know in advance.
For a string like:
$string = '**28-Jan-2021 01:31:49,TCPServer.run(),3,JMX TCP Server running on 8085'
... we could describe the fields in between the commas, and use that to tell PowerShell to only preserve some of them for example:
PS ~> $string -replace '^\*\*([^,]+),[^,]+,[^,]+,([^,]+?) Server running on (\d+)', '$1,$2 $3'
28-Jan-2021 01:31:49,JMX TCP 8085
The pattern I used in this example (^\*\*([^,]+),[^,]+,[^,]+,([^,]+?) Server running on (\d+)) might seem a bit alien at first, so let's try and break it down:
^ # caret means "start of string"
\*\* # Then we look for two literal asterisks
( # This open parens means "start of a capture group"
[^,]+ # This means "1 or more characters that are NOT a comma", captures the timestamp
) # And this matching closing parens means "end of capture group"
, # Match a literal comma
[^,]+ # Same as above, this one will match "TCPServer.run()"
, # Same as above
[^,]+ # You probably get the point by now
, # ...
( # This open parens means "start ANOTHER capture group"
[^,]+? # The `?` at the end means "capture as few as possible", captures "JMX TCP"
) # And this matching closing parens still means "end of capture group"
Server... # Just match the literal string " Server running on "
( # Finally a THIRD capture group
\d+ # capturing "1 or more digits", in your case "8085"
) # and end of group
Since our pattern "captures" a number of substrings, we can now refer to these individual substrings in our substitution pattern $1,$2 $3, and PowerShell will replace each $N reference with the value of the corresponding capture group.
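If the numbered references get hard to follow, the same replacement can be sketched with named capture groups. The group names (stamp, name, port) and the $tidy variable are my own choices for illustration and are not part of the original answer; the pattern is otherwise the one broken down above, with a trailing $ anchor added.

```powershell
$string = '**28-Jan-2021 01:31:49,TCPServer.run(),3,JMX TCP Server running on 8085'

# Same structure as the pattern above, but each capture group gets a name.
$pattern = '^\*\*(?<stamp>[^,]+),[^,]+,[^,]+,(?<name>[^,]+?) Server running on (?<port>\d+)$'

# Keep the replacement in single quotes so PowerShell does not try to expand
# ${stamp} and friends as its own variables before the regex engine sees them.
$tidy = $string -replace $pattern, '${stamp},${name} ${port}'
$tidy   # 28-Jan-2021 01:31:49,JMX TCP 8085
```

A line that does not match the pattern passes through -replace unchanged, so this can be applied to lines that may or may not have this shape.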

here is yet another way to do the job. [grin]
what it does ...
assigns the string to a $Var
chains .Replace() to get rid of the asterisks and the "Server" phrase
splits on the , chars
takes the 1st & 4th items from that split
joins them into one string with , [comma then space] for a delimiter
assigns that to a new $Var
displays the results
the code ...
$InString = '**28-Jan-2021 01:31:49,TCPServer.run(),3,JMX TCP Server running on 8085'
$OutString = ($InString.Replace('**', '').Replace('Server running on ', '').Split(',')[0, 3]) -join ', '
$OutString
output ...
28-Jan-2021 01:31:49, JMX TCP 8085

Related

How to filter powershell ssh output

I use the script described in PowerShell, read/write to SSH.NET streams to get info from my firewall. However I need to isolate only 4 values.
edit "test-connection1"
set vdom "test1"
set ip 192.168.0.1 255.255.255.0
set allowaccess ping
set inbandwidth 10000
set outbandwidth 10000
edit "test-connection2"
set vdom "test1"
set ip 192.168.1.1 255.255.255.0
set allowaccess ping
set inbandwidth 10000
set outbandwidth 10000
--
edit "test-connection3"
set vdom "test2"
set ip 192.168.2.1 255.255.255.0
set allowaccess ping
set inbandwidth 10000
set outbandwidth 10000
I need to show only the bold values (the edit name, vdom, inbandwidth, and outbandwidth). A new row needs to be created on each "edit". The values can be separated by commas.
I need to get following result
test-connection1,test1,10000,10000
test-connection2,test1,10000,10000
test-connection3,test2,10000,10000
How can I manipulate the output created in this function?
function ReadStream($reader)
{
    $line = $reader.ReadLine();
    while ($line -ne $null)
    {
        $line
        $line = $reader.ReadLine()
    }
}
If each edit is predictable, you can do something like the following:
switch -Regex -File edit.txt {
    '^edit "(?<edit>[^"]+)"' { $edit = $matches.edit }
    '^set vdom "(?<vdom>[^"]+)"' { $vdom = $matches.vdom }
    '^set inbandwidth (?<inb>\d+)' { $inb = $matches.inb }
    '^set outbandwidth (?<outb>\d+)' { $edit,$vdom,$inb,$matches.outb -join "," }
}
$matches is an automatic variable that contains the result of the most recent successful match made with the -match operator (or, as here, with a switch -Regex clause). The variable is overwritten every time there is a successful match. Capture group values can be accessed by their names using the member access operator (.). This is why you see the syntax $matches.edit to retrieve the value of the edit capture group.
The switch statement can read a file line-by-line using the -File parameter and perform regex matching using the -Regex parameter.
If the format of the edit entries is predictable, we can assume that we will always have edit, vdom, inbandwidth, and outbandwidth lines in that order. That means the matches will occur in that order, so we can output all of the matches for an edit block once outbandwidth is matched.
The regular expressions (regexes) are the values within the single quotes on each line. Below is a breakdown of the two types of expressions used:
^edit "(?<edit>[^"]+)":
^ asserts position at start of a line
edit matches the characters edit literally (case insensitive)
Named Capture Group edit (?<edit>[^"]+)
Match a single character not present in the list [^"]+, which means not a double quote character.
Quantifier (+) — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
" matches the character " literally (case sensitive)
^set inbandwidth (?<inb>\d+):
^ asserts position at start of a line
set inbandwidth matches the characters set inbandwidth literally (case insensitive)
Named Capture Group inb (?<inb>\d+)
\d+ matches a digit (equal to [0-9])
Quantifier (+) — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
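The same switch can also run over lines that are already in memory, for instance lines collected from the SSH stream, rather than over a file. A rough sketch, assuming the lines arrive as an array of strings ($lines is a hypothetical variable filled with sample data here) and that every block lists its fields in the order shown in the question. The leading \s* is my own addition in case the real output is indented.

```powershell
# Sample data standing in for lines read from the stream (hypothetical variable).
$lines = @(
    'edit "test-connection1"'
    'set vdom "test1"'
    'set ip 192.168.0.1 255.255.255.0'
    'set allowaccess ping'
    'set inbandwidth 10000'
    'set outbandwidth 10000'
    'edit "test-connection2"'
    'set vdom "test1"'
    'set ip 192.168.1.1 255.255.255.0'
    'set allowaccess ping'
    'set inbandwidth 10000'
    'set outbandwidth 10000'
)

# switch enumerates the array and tests each line against every pattern.
# Whatever a clause writes to the pipeline is collected in $rows.
$rows = switch -Regex ($lines) {
    '^\s*edit "(?<edit>[^"]+)"'        { $edit = $Matches.edit }
    '^\s*set vdom "(?<vdom>[^"]+)"'    { $vdom = $Matches.vdom }
    '^\s*set inbandwidth (?<inb>\d+)'  { $inb  = $Matches.inb }
    '^\s*set outbandwidth (?<outb>\d+)' { $edit, $vdom, $inb, $Matches.outb -join ',' }
}

$rows
# test-connection1,test1,10000,10000
# test-connection2,test1,10000,10000
```

Lines that match none of the patterns, such as the "set ip" and "--" lines, are simply skipped.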

RegEx with powershell and then replace

So I have this variable:
Frontend=http://xxxx-xxx.xxx.se/nexus/service/local/repositories/xxxxx_Releases/content/xxx/1.1.1.2/xxxx-1.1.01.2.nupkg
Now I want a PowerShell script that extracts only the 1.1.1.2 (with a regex?)
Then this nr should be replaced in a property file (propfile.properties) looking like this
FE=1.1.1.1
So the 1.1.1.1 should be replaced with 1.1.1.2
Is this possible to get to work with powershell?
EDIT. The numbers looking for in the variable would be 1.X (0-3).X (0-X).X (0-X)
Extracting the numbers for a start:
$str = "Frontend=http://xxxx-xxx.xxx.se/nexus/service/local/repositories/xxxxx_Releases/content/xxx/1.1.1.2/xxxx-1.1.01.2.nupkg"
$str -match '^(.*?)((\d\.){3}\d)(.*)$'
$matches[2] # 1.1.1.2
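For the second half of the question, the extracted value can then be substituted into the FE= line. This is only a sketch under a few assumptions: $props stands in for the lines of propfile.properties, $version and $updated are names I made up, and the pattern from above only matches single-digit version components.

```powershell
$str = "Frontend=http://xxxx-xxx.xxx.se/nexus/service/local/repositories/xxxxx_Releases/content/xxx/1.1.1.2/xxxx-1.1.01.2.nupkg"

# Stand-in for the content of propfile.properties (hypothetical in-memory copy).
$props = @('FE=1.1.1.1')

if ($str -match '^(.*?)((\d\.){3}\d)(.*)$') {
    $version = $Matches[2]

    # -replace works on every element of the array; only the FE= line is rewritten.
    $updated = $props -replace '^FE=.*$', "FE=$version"
}

$updated   # FE=1.1.1.2

# Against the real file, the same idea would look roughly like this:
# (Get-Content propfile.properties) -replace '^FE=.*$', "FE=$version" |
#     Set-Content propfile.properties
```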

String variable position being overwritten in write-host

If I run the below code, $SRN can be written as output or added to another variable, but trying to include either another variable or regular text causes it to be overwritten from the beginning of the line. I'm assuming it's something to do with how I'm assigning $autocode and $SRN initially but can't tell what it's trying to do.
# Load the property set to allow us to get to the email body.
$item.load($psPropertySet) # Load the data.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\n" # Get the body text, remove blank lines, split on line breaks to create an array (otherwise it is a single string).
$autocode = $bod[4].split('-')[2] # Get line 4 (should be Title), split on dash, look for 3rd element, this should contain our automation code.
$SRN = $bod[1] -replace 'ID: ','' # Get line 2 (should be ID), find and replace the preceding text.
# Skip processing if autocode does not match our list of handled ones.
if ($autocode -cin $autocodes)
{
    write-host "$SRN $autocode"
    write-host "$autocode $SRN"
    write-host "$SRN test"
    $var = "$SRN $autocode"
    $var
}
The code results in this, you can see if $SRN isn't at the start of the line it is fine. Unsure where the extra spaces come from either:
KRNE8385
KRNE SR1788385
test8385
KRNE8385
I would expect to see this:
SR1788385 KRNE
KRNE SR1788385
SR1788385 test
SR1788385 KRNE
LotPings pointed me down the right path, both variables still had either "0D" or "\r" in them. My regex replace was only getting rid of them on blank lines, and I split the array on "\n" only. Changing line 3 in the original code to the below appears to have resolved the issue. First time seeing Format-Hex, but it appears to be excellent for troubleshooting such issues.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\r\n"
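To see the effect in isolation, here is a small sketch with a made-up body string (the "Title" line and the $body, $bad, and $good variables are assumptions, not the real email). Splitting on '\r?\n' handles both Windows and Unix line endings, which is a slightly more forgiving variant of the fix above.

```powershell
# Hypothetical two-line body with Windows (CR LF) line endings.
$body = "ID: SR1788385`r`nTitle: a-b-KRNE"

# Splitting on LF alone leaves a trailing CR on every line except the last.
$bad = $body -split "\n"
$bad[0].EndsWith("`r")      # True

# Format-Hex makes the stray 0D byte visible at the end of the line.
$bad[0] | Format-Hex

# Splitting on an optional CR followed by LF removes it.
$good = $body -split '\r?\n'
$SRN = $good[0] -replace 'ID: ', ''
$autocode = $good[1].Split('-')[2]
"$SRN $autocode"            # SR1788385 KRNE
```

When a string ending in a carriage return is written to the console, the cursor jumps back to the start of the line, so whatever follows overwrites the beginning of the text. That is the behaviour described in the question.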

Perl $1 giving uninitialized value error

I am trying to extract a part of a string and put it into a new variable. The string I am looking at is:
maker-scaffold_26653|ref0016423-snap-gene-0.1
(inside a $gene_name variable)
and the thing I want to match is:
scaffold_26653|ref0016423
I'm using the following piece of code:
my $gene_name;
my $scaffold_name;
if ($gene_name =~ m/scaffold_[0-9]+\|ref[0-9]+/) {
    $scaffold_name = $1;
    print "$scaffold_name\n";
}
I'm getting the following error when trying to execute:
Use of uninitialized value $scaffold_name in concatenation (.) or string
I know that the pattern is right, because if I use $' instead of $1 I get
-snap-gene-0.1
I'm at a bit of a loss: why will $1 not work here?
If you want to use a value from the match, you have to put () around the part of the regex you want to capture.
To expand on Jens' answer, () in a regex signifies an anonymous capture group. The content matched by each capture group is stored in $1, $2, and so on, numbered from left to right, so for example,
/(..):(..):(..)/
on an HH:MM:SS time string will store hours, minutes, and seconds in $1, $2, $3 respectively. Naturally this begins to become unwieldy and is not self-documenting, so you can assign the results to a list instead:
my ($hours, $mins, $secs) = $time =~ m/(..):(..):(..)/;
So your example could bypass the use of $ variables by doing direct assignment:
my ($scaffold_name) = $gene_name =~ m/(scaffold_[0-9]+[|]ref[0-9]+)/;
# $scaffold_name now contains 'scaffold_26653|ref0016423'
You can even get rid of the ugly =~ binding by using for as a topicalizer:
my $scaffold_name;
for ($gene_name) {
    ($scaffold_name) = m/(scaffold_\d+[|]ref\d+)/;
    print $scaffold_name;
}
If things start to get more complex, I prefer to use named capture groups (introduced in Perl v5.10.0):
$gene_name =~ m{
    (?<scaffold_name>    # ?<name> creates a named capture group
        scaffold_\d+?    # 'scaffold' and its trailing digits
        [|]              # Literal pipe symbol
        ref\d+           # 'ref' and its trailing digits
    )
}xms;                    # The x flag lets us write more readable regexes
print $+{scaffold_name}, "\n";
The results of named capture groups are stored in the magic hash %+. Access is done just like any other hash lookup, with the capture group names as the keys. %+ is locally scoped in the same way the numbered $N variables are, so it can be used as a drop-in replacement for them in most situations.
It's overkill for this particular example, but as regexes get larger and more complicated, this saves you the trouble of either scrolling all the way back up and counting anonymous capture groups from left to right to find which of those darn $N variables holds the capture you wanted, or scanning across a long list assignment to find where to add a new variable for a capture that got inserted in the middle.
My personal rule of thumb is to assign the results of anonymous captures to descriptively named, lexically scoped variables for three or fewer captures, then switch to named captures, comments, and indentation in regexes when more are necessary.

Maximum number of captured groups in perl regex

Given a regex in Perl, how do I find the maximum number of captured groups in that regex? I know that I can use $1, $2, etc. to reference the first, second, etc. captured groups. But how do I find the maximum number of such groups? By captured groups, I mean the strings matched by the parenthesized parts of a regex. For example, if the regex is (a+)(b+)c+, then the string "abc" matches that regex, and the first captured group will be $1, the second $2.
amon hinted at the answer to this question when he mentioned the %+ hash. But what you need is the @+ array:
@+
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against. The nth element of this array holds the offset of the nth submatch, so $+[1] is the offset past where $1 ends, $+[2] the offset past where $2 ends, and so on. You can use $#+ to determine how many subgroups were in the last successful match. See the examples given for the @- variable. [emphasis added]
$re = "(.)" x 500;
$str = "a" x 500;
$str =~ /$re/;
print "Num captures is $#+"; # outputs "Num captures is 500"
The number of captures is effectively unlimited. The numbered variables are not restricted to $1 through $9 either: $10, $11, and so on work as well, so you can use more capture groups.
If you have more than a few capture groups, you might want to use named captures, like
my $str = "foobar";
if ($str =~ /(?<name>fo+)/) {
say $+{name};
}
Output: foo. You can access the values of named captures via the %+ hash.
You can use code like the following to give you a count of capture groups:
$regex = qr/..../; # Some arbitrary regex with capture groups
my @capture = '' =~ /$regex|()/; # A successful match incorporating the regex
my $groups_in_my_regex = scalar(@capture) - 1;
The way it works is that it performs a match which must succeed and then checks how many capture groups were created. (An extra one is created due to the trailing |().)
Edit: Actually, it doesn't seem to be necessary to append an extra capture group. Just so long as the match is guaranteed to succeed then the array will contain an entry for every capture group.
So we can change the 2nd and 3rd lines to:
my @capture = '' =~ /$regex|/; # A successful match incorporating the regex
my $groups_in_my_regex = scalar(@capture);
See also:
Count the capture groups in a qr regex?