Can I use split on a string two times in a row using piping? - powershell

Let's say I have the following string dog.puppy/cat.kitten/bear.cub. I want to get bear only. I can do this by executing the following:
$str = "dog.puppy/cat.kitten/bear.cub"
$str = $str.split('/')[2]
$str = $str.split('.')[0]
Am I able to get bear using one line using piping? This doesn't work, but it would be something like this:
$str = "dog.puppy/cat.kitten/bear.cub"
$str = $str.split('/')[2] | $_.split('.')[0]

Am I able to get "bear" using one line using piping?
With strings already stored in memory, there is no good reason to use the pipeline - it will only slow things down.
Instead, use PowerShell's operators, which are faster than using a pipeline and generally more flexible than the similarly named .NET [string] type's methods, because they operate on regexes (regular expressions) and can also operate on entire arrays:
PS> ('dog.puppy/cat.kitten/bear.cub' -split '[/.]')[-2]
bear
That is, split the input string by either literal / or literal . (using a character set, [...]), and return the penultimate (second to last) ([-2]) token.
See this answer for why you should generally prefer the -split operator to the String.Split() method, for instance.
The same applies analogously to preferring the -replace operator to the String.Replace() method.
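As a small illustration of that difference (the sample strings are reused from above; the variable names exist only for this sketch), -replace treats its first operand as a regex and works on every element of an array, while .Replace() substitutes literal text in a single string:

```powershell
# -replace: regex-based; applied to each element of the array on the left.
$animals = ('dog.puppy', 'cat.kitten') -replace '\..*$', ''
$animals                      # dog, cat (one per line)

# .Replace(): literal text only, one string at a time.
$literal = 'dog.puppy'.Replace('.', '/')
$literal                      # dog/puppy
```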
It's easy to chain these operators:
PS> ('dog.puppy/cat.kitten/bearOneTwoThree.cub' -split '[/.]')[-2] -csplit '(?=\p{Lu})'
bear
One
Two
Three
That is, return the penultimate token and split it case-sensitively (using the -csplit variation of the -split operator) whenever an uppercase letter (\p{Lu}) starts, using a positive look-ahead assertion, (?=...).

You can do this:
$str = "dog.puppy/cat.kitten/bear.cub".split('/')[2].split('.')[0]
No piping needed.
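For completeness, the pipeline attempt in the question fails because $_ is only defined inside a script block. A minimal sketch of a working pipeline version, assuming you really want one ($animal is a name made up for this example):

```powershell
$str = "dog.puppy/cat.kitten/bear.cub"
# $_ only exists inside a script block, so wrap the second split in ForEach-Object.
$animal = $str.Split('/')[2] | ForEach-Object { $_.Split('.')[0] }
$animal   # bear
```

The chained method calls above give the same result without the pipeline overhead.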

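There is an easier solution: the .Split() method splits on each character of its argument, and a negative index counts from the end. (This relies on Windows PowerShell binding '/.' to the char[] overload; in PowerShell 7 a string argument is treated as one multi-character separator, so pass [char[]] '/.' there.) So: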
> "dog.puppy/cat.kitten/bear.cub".split( '/.')[-2]
bear
Your other question can be solved with the regex-based -csplit operator, using a non-consuming positive lookahead and a final -join ' ' to concatenate the array elements with a space.
$str = "dog.puppy/cat.kitten/bearOneTwoThree.cub"
$str = $str.split('/.')[-2] -csplit '(?=[A-Z])' -join ' '
$str
bear One Two Three

Related

Swap string order in one line or swap lines order in powershell

I need to swap place of 2 or more regex strings in one line or some lines in a txt file in powershell.
In Notepad++ I just find ^(String 1.*)\r\n(String 2.*)\r\n(String 3.*)$ and replace with \3\r\n\1\r\n\2:
String 1 aksdfh435##%$dsf
String 2 aksddfgdfg$dsf
String 3 aksddfl;gksf
Turns to:
String 3 aksddfl;gksf
String 1 aksdfh435##%$dsf
String 2 aksddfgdfg$dsf
So how can I do it in Powershell? And if possible can I use the command by calling powershell -command in cmd?
It's basically the same in PowerShell, e.g.:
$Content = @'
Unrelated data 1
Unrelated data 2
aksdfh435##%$dsf
aksddfgdfg$dsf
aksddfl;gksf
Unrelated data 3
'@
$LB = [System.Environment]::NewLine
$String1= [regex]::Escape('aksdfh435##%$dsf')
$String2= [regex]::Escape('aksddfgdfg$dsf')
$String3= [regex]::Escape('aksddfl;gksf')
$RegexMatch = "($String1.*)$LB($String2.*)$LB($String3.*)$LB"
$Content -replace $RegexMatch,"`$3$LB`$1$LB`$2$LB"
outputs:
Unrelated data 1
Unrelated data 2
aksddfl;gksf
aksdfh435##%$dsf
aksddfgdfg$dsf
Unrelated data 3
I used [System.Environment]::NewLine since it returns the default line break for whatever system you're on, and bound it to a variable to make the code easier to read. In the regex pattern, either \r\n (in a single-quoted string, where the regex engine interprets the escapes) or `r`n (in a double-quoted string, where PowerShell expands them) would have worked as well. The backtick is also what I use to escape $1, $2 and so on, which is the format for referring to the first, second and third capture groups of the regex.
I also use the [regex]::Escape('STRING') method to escape the strings to avoid special characters messing things up.
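To show what that escaping buys you, here is a small sketch using a made-up literal ('a.c' is not from the question) whose dot would otherwise act as a regex wildcard:

```powershell
$literal = 'a.c'
$escaped = [regex]::Escape($literal)   # a\.c

'abc' -match $literal    # True: the unescaped dot matches any character
'abc' -match $escaped    # False: the escaped dot only matches a literal '.'
'a.c' -match $escaped    # True
```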
To use file input instead, replace the $Content assignment with something like this:
$Content = Get-Content -Path 'C:\script\lab\Tests\testfile.txt' -Raw
and replace the last line with something like:
$Content -replace $RegexMatch,"`$3$LB`$1$LB`$2$LB" | Set-Content -Path 'C:\script\lab\Tests\testfile.txt'
In PowerShell it is not very different.
The replacement string needs to be inside double quotes (") here because of the newline characters, and because of that you need to backtick-escape the backreference variables $1, $2 and $3:
$str -replace '^(String 1.*)\r?\n(String 2.*)\r?\n(String 3.*)$', "`$3`r`n`$1`r`n`$2"
This is assuming your $str is a single multiline string as the question implies.

Need to use regular expressions inside of a function to match a string passed to it

What I'd like to do is create a function that is passed a string to match a regular expression inside of the function. Let's call the function "matching." It should use the -match command. I want it to meet the criteria:
The < character
Four alphabetic characters of upper or lowercase (a-z or A-Z)
The > character
The - character
Four digits, 0-9
So basically it would just look like "matching whateverstringisenteredhere" then it'd give me true or false. Probably incredibly simple for you guys, but to someone new at powershell it seems really difficult.
So far I have this:
function matching ($args0)
{
$r = '\b[A-Za-z]{4}[0-9]{4}<>-\b'
$r -match ($args0)
}
The problem seems to be it's not treating it as a regular expression inside the function. It's only taking it literally.
The regex goes on the right side of the -match operator and the string to be matched goes on the left. Try:
$args0 -match $r
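Note that the pattern in the question also lists the pieces in a different order than the stated criteria. A minimal sketch of the whole function with the operands swapped and the pattern rearranged to follow the list; the parameter name $text and the ^...$ anchors (which require the entire string to match) are assumptions, not something the question specifies:

```powershell
function matching ($text)
{
    # '<', four letters, '>', '-', four digits.
    # ^ and $ anchor the match to the whole string (an assumption).
    $r = '^<[A-Za-z]{4}>-[0-9]{4}$'
    # String to test on the left, regex on the right.
    $text -match $r
}

matching '<abcd>-1234'   # True
matching 'abcd1234<>-'   # False
```

Drop the anchors if the pattern is allowed to appear anywhere inside a longer string.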

Split by dot using Perl

I use the split function in two ways. The first way (a string argument to split):
my $string = "chr1.txt";
my @array1 = split(".", $string);
print $array1[0];
I get this error:
Use of uninitialized value in print
When I split the second way (a regular expression argument to split), I don't get any errors.
my @array1 = split(/\./, $string); print $array1[0];
My first way of splitting is not working only for dot.
What is the reason behind this?
"\." is just ., careful with escape sequences.
If you want a backslash and a dot in a double-quoted string, you need "\\.". Or use single quotes: '\.'
If you just want to parse files and get their suffixes, better use the fileparse() method from File::Basename.
Additional details to the information provided by Mat:
In split "\.", ... the first parameter to split is first interpreted as a double-quoted string before being passed to the regex engine. As Mat said, inside a double-quoted string, a \ is the escape character, meaning "take the next character literally", e.g. for things like putting double quotes inside a double-quoted string: "\""
So your split gets passed "." as the pattern. A single dot means "split on any character". As you know, the split pattern itself is not part of the results. So you have several empty strings as the result.
But why is the first element undefined instead of empty? The answer lies in the documentation for split: if you don't impose a limit on the number of elements returned by split (its third argument) then it will silently remove empty results from the end of the list. As all items are empty the list is empty, hence the first element doesn't exist and is undefined.
You can see the difference with this particular snippet:
my @p1 = split "\.", "thing";
my @p2 = split "\.", "thing", -1;
print scalar(@p1), ' ', scalar(@p2), "\n";
It outputs 0 6.
The "proper" way to deal with this, however, is what @soulSurfer2010 said in his post.

Removing quotes from string

So I thought this would just be a simple issue however I'm getting the incorrect results. Basically I am trying to remove the quotes around a string. For example I have the string "01:00" and I want 01:00, below is the code on how I thought I would be able to do this:
$expected_start_time = $conditions =~ m/(\"[^\"])/;
Every time this runs it returns 1, so I'm guessing that it is just returning true and not actually extracting the string from the quotes. This happen no matter what is in the quotes "02:00", "02:20", "08:00", etc.
All you forgot was parens for the LHS to put the match into list context so it returns the submatch group(s). The normal way to do this is:
($expected_start_time) = $conditions =~ /"([^"]*)"/;
It appears that you know that the first and last character are quotes. If that is the case, use
$expected_start_time = substr $conditions, 1, -1;
No need for a regexp.
The brute force way is:
$expected_start_time = $conditions;
$expected_start_time =~ s/"//g;
Note that the original regex:
m/(\"[^\"])/
would capture the opening quote and the following non-quote character. To capture the non-quote characters between double quotes, you'd need some variant on:
m/"([^"]*)"/;
This being Perl (and regexes), TMTOWTDI - There's More Than One Way To Do It.
In scalar context a regex returns true if the regex matches the string. You can access the match with $1. See perlre.

What's the difference between 'eq' and '=~' in Perl?

What is the difference between these two operators? Specifically, what difference in $a will lead to different behavior between the two?
$a =~ /^pattern$/
$a eq 'pattern'
eq is for testing string equality, == is the same thing but for numerical equality.
The =~ operator is for applying a regular expression to a scalar.
For the gory details of every Perl operator and what they're for, see the perldoc perlop manpage.
As others have noted, ($a =~ /^pattern$/) uses the regular expression engine to evaluate whether the strings are identical, whereas ($a eq 'pattern') is the plain string equality test.
If you really only want to know whether two strings are identical, the latter is preferred for reasons of:
Readability - It is more concise, containing fewer special characters.
Maintainability - With a regex pattern, you must escape any special characters that may appear in your string, or use extra markers such as \Q and \E. With a single-quoted string, the only character you need to escape is a single quote. (You also have to escape backslashes if they are followed by another backslash or the string delimiter.)
Performance - You don't incur the overhead of firing up the regex engine just to compare a string. If this happens several million times in your program, for example, the benefit is notable.
On the other hand, the regex form is far more flexible if you need to do something other than a plain string equality test. See perldoc perlre for more on regular expressions.
EDIT: Like most everyone else before ysth, I missed the obvious functional difference between them and went straight for more abstract differences. I've clarified the question but I'll leave the answer as a (hopefully) useful reference.
eq -- Tests for string equality.
=~ -- Binds a scalar expression to a pattern match.
See here for more in-depth descriptions of all of the operators.
"pattern\n" :)
$a = "pattern\n";
print "ok 1\n" if $a =~ /^pattern$/;
print "ok 2\n" if $a eq 'pattern';
Perhaps you meant /^pattern\z/.
=~ is the binding operator. It is used to bind a value to either a pattern match (m//), a substitution (s///), or a transliteration (tr// or y//).
eq is the string equality operator; it compares two values to determine whether or not they're equal when considered as strings. There is a peer == operator that does the same thing only considering the values as numbers. (In Perl, strings and numbers are mostly interchangeable with conversions happening automatically depending on how the values are used. Because of this, when you want to compare two values you must specify the type of comparison to perform.)
In general, $var =~ m/.../ determines whether or not the value of $var matches a pattern, not whether it equals a particular value. In this case the pattern is anchored at both ends and contains nothing but literal characters, so it's almost equivalent to a string comparison; the one difference is that $ also matches before a trailing newline, so "pattern\n" matches the regex but is not eq to 'pattern'. It's better to use eq here because it's clearer and faster.