PowerShell -replace for Perl users: how to replace-match-$1 in PowerShell?

Take a look at this Perl code:
$x = "this is a test 123 ... this is only a test";
$x =~ s/test\s+(\d+)/test $1/;
print $x;
this is a test 123 ... this is only a test
Notice that I match a number with the regex (\d+); the match is captured into the temporary variable $1, which is then expanded in the replacement string.
Is there a way to do the above Perl replacement in PowerShell? If it's possible, I imagine it looks something like this:
$x = "this is a test 123 ... this is only a test"
$x = $x -replace "test\s+(\d+)", "test $Matches[1]"
write-host $x
this is a test 123 ... this is only a test
Of course it doesn't work. I'm curious how to do this, since I have a lot of Perl scripts to convert to PowerShell.

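It's not that different in PowerShell. Use single quotes so that PowerShell does not try to expand $1 itself before the regex engine sees it: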
$x = "this is a test 123 ... this is only a test"
$x = $x -replace 'test\s+(\d+)', 'test $1'
Write-Host $x
Output:
this is a test 123 ... this is only a test
Regex details:
test Match the characters “test” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 1
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)

There's another way in PowerShell 7: pass a script block { } as the replacement. Inside it, $_ is the current match object, and you need the subexpression operator $( ) to refer to object properties or array elements inside a string. I'm saving $_ to $a only so you can look at it afterwards. It's a little more convoluted, but sometimes the plain '$1' substitution (note the single quotes) isn't what you need, for example when you want to add to the captured value.
"this is a test 123 ... this is only a test" -replace "test\s+(\d+)",
{ $a = $_ ; "test $(1 + $_.groups[1].value)" } # 1 + 123
this is a test 124 ... this is only a test
$a
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 10
Length : 13
Value : test 123
ValueSpan :
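For readers coming from Perl, the script-block form plays roughly the role of the /e modifier, which evaluates the replacement as code. The following is a minimal sketch for comparison only; the variable names $plain and $bumped are made up for illustration.

```perl
use strict;
use warnings;

my $x = "this is a test 123 ... this is only a test";

# Plain substitution: $1 is interpolated into the replacement string.
(my $plain = $x) =~ s/test\s+(\d+)/test $1/;

# With /e the replacement is evaluated as Perl code, so $1 can be computed on.
(my $bumped = $x) =~ s/test\s+(\d+)/"test " . ($1 + 1)/e;

print "$plain\n";
print "$bumped\n";
```

The first form corresponds to 'test $1' in PowerShell, the second to the script block that adds 1 to the captured group.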

Related

Can PowerShell replace a case sensitive portion of text it found using -clike?

Let's say I have two addresses:
123 Newark Road Ne
987 Ne Netherland Avenue
I need to update the directional portion from Ne to NE. However, I don't want to update Newark to NEwark and the same for Netherland. I think I can find all the instances easy enough with this IF statement in a loop:
$testAddress = '123 Newark Road Ne'
if (($testAddress -clike '* Ne') -or ($testAddress -clike '* Ne *')){
#code to replace Ne
}
But how do I go about replacing it? I can't use a -creplace '* Ne', '* NE'. Finding the index of '* Ne' just gives me -1 so I don't think I can do anything with that. I'm sure there's an easy concept that I'm just not coming across.
Some manipulations, such as uppercasing, are by design not possible in the substitution operand of a .NET regular expression. For those you can use a MatchEvaluator, which in PowerShell is written as a script block.
With a MatchEvaluator you can manipulate the matched text however you like, so you are not restricted in what the replacement can do.
Beginning with PowerShell 6, you can use it directly with the -replace and -creplace operators.
PowerShell versions below 6 do not have this option, but the same thing is possible with the .NET method [regex]::Replace() and a MatchEvaluator.
PS 5.1
$textToReplace = 'Ne 123 Newark Road Ne', '123 Newark Road Ne', '987 Ne Netherland Avenue'
foreach ($text in $textToReplace) {
    # using a scriptblock as System.Text.RegularExpressions.MatchEvaluator
    # the param() part is mandatory. Everything that follows is the return for that particular match
    [regex]::Replace($text, '(?<!\w)Ne(?!\w)', { param($regexMatch) $regexMatch.Value.ToUpper() })
}
PS 6+
$textToReplace = 'Ne 123 Newark Road Ne', '123 Newark Road Ne', '987 Ne Netherland Avenue'
foreach ($text in $textToReplace) {
    $text -creplace '(?<!\w)Ne(?!\w)', { $_.Value.ToUpper() }
}
Regular Expression Pattern Explanation
The pattern (?<!\w)Ne(?!\w) matches all words Ne for which the preceding and following character is not a word character using a negative lookbehind (?<!) and negative lookahead (?!) group construct.
\w (word character) in .NET matches Unicode characters from several categories; see MSFT: Character classes in regular expressions -> Word character: \w.
These include, but are not limited to:
a-z and variants like è
A-Z and variants like À
0-9
_
cyrillic characters
chinese characters
...
In short, \w matches almost every character that counts as part of a word in the Unicode character set.
Resources
MSFT: Replacement with a script block in PS6+
@('123 Newark Road Ne'
'987 Ne Netherland Avenue')|foreach{
    switch -regex ($_)
    {
        'Ne$'{$_ -replace 'Ne$','NE'}
        ' Ne '{$_ -replace ' Ne ',' NE '}
        Default {$_}
    }
}
Or use word boundaries around Ne:
'123 Newark Road Ne','987 Ne Netherland Avenue' | ForEach-Object {
if ($_ -cmatch '(\bNe\b)') { $_ -creplace '(\bNe\b)', $Matches[1].ToUpper() }
else { $_ }
}
Output
123 Newark Road NE
987 NE Netherland Avenue

Why does `print "XYZ$_"` work but `print "$_XYZ"` doesn't?

For input abc, the code
perl -ne 'print "XYZ$_"'
prints XYZabc, but after switching the order of $_ and XYZ, i.e.
perl -ne 'print "$_XYZ"'
it prints nothing. Why?
XYZ can be part of a variable name, so $_XYZ is a variable name, rather than $_ followed by a literal XYZ.
You can split the string up:
perl -ne 'print $_ . "XYZ"'
Perl identifiers may contain letters, digits, and underscores, so you are asking Perl to print the value of the variable $_XYZ, which doesn't exist.
You can surround the name of the variable with braces { ... } to separate it from any surrounding characters, like so:
perl -ne 'print "${_}XYZ"'
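As a small script rather than a one-liner, the working variants can be put side by side. This is only an illustrative sketch: the literal 'abc' stands in for one line of input, and @out is a made-up name.

```perl
use strict;
use warnings;

my @out;
for ('abc') {                  # aliases $_ to 'abc', like one line of -n input
    push @out, "XYZ$_";        # $_ is followed by the closing quote, so no ambiguity
    push @out, "${_}XYZ";      # braces mark where the variable name ends
    push @out, $_ . "XYZ";     # concatenation avoids interpolation entirely
}
print "$_\n" for @out;
```

With strict enabled, writing "$_XYZ" in such a script is reported as an undeclared variable at compile time instead of silently printing nothing.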

PERL : Using Text::Wrap and specify the end of line

Yes, I'm re-writing cowsay :)
#!/usr/bin/perl
use Text::Wrap;
$Text::Wrap::columns = 40;
my $FORTUNE = "The very long sentence that will be outputted by another command and it can be very long so it is word-wrapped The very long sentence that will be outputted by another command and it can be very long so it is word-wrapped";
my $TOP = " _______________________________________
/                                       \\
";
my $BOTTOM = "\\_______________________________________/
";
print $TOP;
print wrap('| ', '| ', $FORTUNE) . "\n";
print $BOTTOM;
Produces this
 _______________________________________
/                                       \
| The very long sentence that will be
| outputted by another command and it
| can be very long so it is
| word-wrapped The very long sentence
| that will be outputted by another
| command and it can be very long so it
| is word-wrapped
\_______________________________________/
How can I get this ?
 _______________________________________
/                                       \
| The very long sentence that will be   |
| outputted by another command and it   |
| can be very long so it is             |
| word-wrapped The very long sentence   |
| that will be outputted by another     |
| command and it can be very long so it |
| is word-wrapped                       |
\_______________________________________/
I could not find a way in the documentation, but you can apply a small hack if you save the string. It is possible to assign a new line ending by using a package variable:
$Text::Wrap::separator = "|$/";
You also need to prevent the module from expanding tabs and messing with the character count:
$Text::Wrap::unexpand = 0;
The separator is simply a pipe | followed by the input record separator $/ (most often a newline). This adds a pipe to the end of each line, but no padding spaces, which have to be added manually:
my $text = wrap('| ', '| ', $FORTUNE) . "\n";
$text =~ s/(^.+)\K\|/' ' x ($Text::Wrap::columns - length($1)) . '|'/gem;
print $text;
This matches each line up to its trailing | and inserts the padding: a space repeated as many times as columns minus the length of the matched text. \K keeps the matched text out of the replacement, so only the | itself is replaced. The /m modifier makes ^ match at the start of every line inside the string, not just at the start of the string. .+ by itself does not match newlines, so each match stays within a single line. The /e modifier evaluates the replacement part as Perl code instead of treating it as a string, and /g applies it to every line.
Note that it is somewhat of a quick hack, so bugs are possible.
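A different way to get the right-hand border, shown here only as a sketch and not as the method above, is to wrap with the left border alone and then pad each line to a fixed width with sprintf. The names @lines and @boxed are made up for illustration, and the sample text is a shortened version of the one in the question.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

$Text::Wrap::columns  = 40;
$Text::Wrap::unexpand = 0;    # keep spaces as spaces so widths stay predictable

my $fortune = "The very long sentence that will be outputted by another command "
            . "and it can be very long so it is word-wrapped";

# Wrap with the left border only, then split the result into lines.
my @lines = split /\n/, wrap('| ', '| ', $fortune);

# Left-justify every line in a field of $columns characters and add the border.
my @boxed = map { sprintf("%-*s|", $Text::Wrap::columns, $_) } @lines;

print "$_\n" for @boxed;
```

Because every line goes through the same sprintf, the last line gets its border too, without relying on the separator.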
If you're willing to download a more powerful module, you can use Text::Format. It has a lot more options for customizing, but the most relevant one is rightFill which fills the rest of the columns in each line with spaces.
Unfortunately, you can't customize the left and right sides with non-space characters. You can use a workaround by doing regex substitutions, just as Text::NWrap does in its source code.
#!/usr/bin/env perl
use utf8;
use Text::Format;
binmode(STDOUT, ':encoding(UTF-8)'); # the top border uses a non-ASCII character
chop(my $FORTUNE = "The very long sentence that will be outputted by another command and it can be very long so it is word-wrapped " x 2);
my $TOP = "/" . '‾'x39 . "\\\n";
my $BOTTOM = "\\_______________________________________/\n";
my $formatter = Text::Format->new({ columns => 37, firstIndent => 0, rightFill => 1 });
my $text = $formatter->format($FORTUNE);
$text =~ s/^/| /mg;
$text =~ s/\n/ |\n/mg;
print $TOP;
print $text;
print $BOTTOM;

another line split (powershell or other scripting tools under windows)

I have a log file at hand that looks like this:
0226 111641 (1911) 0 some space separated message containing whatever letters and marks
I need to import it into a database so that I can filter it when troubleshooting. Currently I think PowerShell is the best choice for this, but I'm too green to know how to do it so that it performs well. I tried it like this:
$file = Get-Content "test.txt"
foreach ($line in $file)
{
#Write-Host $line
$a = $line
$month1 = $a[0..1]
$month2 = "$month1"
$month2 = $month2.ToString()
$month = $month2.Replace(" ", "")
$day1 = $a[2..3]
$day2 = "$day1"
$day2 = $day2.ToString()
$day = $day2.Replace(" ", "")
}
... and so on, and after that inserting it into the database. However, the log file is quite big (currently 15 MB after 3 weeks, expected to reach hundreds of megabytes within months), and the script already takes about 4-5 minutes to process it.
So what I need is a method to split the four space-separated columns from the beginning of the line, convert the first and second to a date and time, and add them, together with the message part of the line, to the database. Processing each block of text separately seems too time-consuming; Excel, for example, can process this file within seconds. Is there some position-aware CSV import command?
Thanks.
Found this:
Replace first two whitespace occurrences with a comma using sed
It would help if I were using Linux... :(
I'm not sure if the ConvertFrom-Csv or Import-Csv cmdlets can help you since your field delimiter can appear in the message field. Without knowing what these different fields are, I came up with this:
$file = Get-Content "test.txt"
foreach ($line in $file)
{
    # Split $line into at most 5 fields
    $fields = $line -split ' ', 5;
    # fields[0] is a two-digit month followed by a two-digit day
    $date = [DateTime]::ParseExact($fields[0], 'MMdd', $null);
    $field2 = $fields[1];
    $field3 = $fields[2];
    $field4 = $fields[3];
    $message = $fields[4];
    # Process variables here...
}
Using the sample text you provided for $line, the above variables look like this after execution:
PS> Get-Variable -Name @('date', 'field*', 'line', 'message')
Name                           Value
----                           -----
date                           2/26/2012 12:00:00 AM
field2                         111641
field3                         (1911)
field4                         0
fields                         {0226, 111641, (1911), 0...}
line                           0226 111641 (1911) 0 some space separated message
message                        some space separated message
More information will be needed on the format of your data in order to give you a more specific answer.
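Since the question allows other scripting tools, the same limit-based split can be sketched in Perl. Several things here are assumptions: the second field is read as HHMMSS, and the names $id and $flag are placeholders, because the meaning of the third and fourth fields is not given.

```perl
use strict;
use warnings;

my $line = '0226 111641 (1911) 0 some space separated message containing whatever letters and marks';

# Split into at most 5 fields; the 5th keeps the rest of the line intact.
my ($date, $time, $id, $flag, $message) = split ' ', $line, 5;

# Break the fixed-width date and time fields into their parts
# (assumes MMDD and HHMMSS layouts).
my ($month, $day)      = $date =~ /^(\d{2})(\d{2})$/;
my ($hour, $min, $sec) = $time =~ /^(\d{2})(\d{2})(\d{2})$/;

printf "%s-%s %s:%s:%s | %s\n", $month, $day, $hour, $min, $sec, $message;
```

In a real import this would run inside a while loop over the file handle, so the file is processed line by line rather than loaded into memory first.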

How do I echo string with bash command more in Perl?

This is what I tried:
my $s = "s" x 1000;
my $r = `echo $s |more`;
But it doesn't work, my program exits directly...
It does not work in your example, because you never print $r. The output is captured in the variable $r. By using system() instead, you can see the output printed to STDOUT, but then you cannot use the output as you (probably) expected.
Just do:
print $r;
Update: I changed say to print, since "echo" already gives you a newline.
To escape shell meta characters, as mentioned in the comments, you can use quotemeta.
You should also be aware that | more has no effect when capturing output from the shell into a variable. The process is simply: echo | more | $r, and you might as well skip more.
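Putting these points together, a minimal sketch could look like the following. It assumes a POSIX-style shell behind the backticks (backslash escaping does not work the same way under cmd.exe), and the test string with shell metacharacters is made up to show what quotemeta is for.

```perl
use strict;
use warnings;

# A hypothetical string containing characters the shell would otherwise interpret.
my $s = ("s" x 20) . "; & |";

# quotemeta backslash-escapes every non-word character.
my $safe = quotemeta($s);

# Backticks capture the command's STDOUT; nothing is shown until it is printed.
my $r = `echo $safe`;

print $r;
```

Without quotemeta, the ; & | in $s would be treated as shell syntax rather than as text to echo.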
Try the system() function:
my $s = "s" x 1000;
my $r = system("echo $s |more");
This will display all your 's' characters, and $r will hold the exit status of the command (0 in this case).