Strange result from String.Split() - powershell

Why does the following result in an array with 7 elements, 5 of them blank? I'd expect only 2 elements.
Where are the 5 blank elements coming from?
$a = 'OU=RAH,OU=RAC'
$b = $a.Split('OU=')
$b.Count
$b
<#
Outputs:
7
RAH,
RAC
#>

In order to split by strings (rather than a set of characters) and/or regular expressions, use PowerShell's -split operator:
PS> ('OU=RAH,OU=RAC' -split ',?OU=') -ne '' # parentheses not strictly needed
RAH
RAC
-split by default interprets its RHS as a regular expression, and ,?OU= matches both OU= by itself and ,OU=, resulting in the desired splitting; the tokens are returned as an array.
For all features supported by -split, including literal string matching, limiting the number of tokens returned, and use of script blocks, see Get-Help about_split.
Since the input starts with a match, however, -split considers the first element of the split to be the empty string. By passing the resulting array of tokens to -ne '', we filter out these empty strings.
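A short sketch of the -split features mentioned above, applied to the question's input ($dn and the other variable names are illustrative):

```powershell
$dn = 'OU=RAH,OU=RAC'

# Regex split, then filter out the empty leading token with -ne ''.
$viaRegex = ($dn -split ',?OU=') -ne ''                   # 'RAH', 'RAC'

# Literal (non-regex) split: 0 means "no token limit",
# and 'SimpleMatch' turns off regex interpretation of the separator.
$viaLiteral = ($dn -split 'OU=', 0, 'SimpleMatch') -ne ''  # 'RAH,', 'RAC'

# Limit the result to at most 2 tokens; the remainder stays unsplit.
$limited = 'a,b,c' -split ',', 2                           # 'a', 'b,c'
```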
By contrast, in Windows PowerShell, the .NET Framework (FullCLR, up to 4.x) String.Split() method, which you tried, works very differently:
'OU=RAH,OU=RAC'.Split('OU=')
OU= is interpreted as an array of characters, each of which individually acts as a separator, irrespective of the order in which the characters are specified. Leading, adjacent, and trailing separators are by default considered to separate empty tokens, so you get an array of 7 tokens:
@( '', '', '', 'RAH,', '', '', 'RAC')
Note to PowerShell Core users (PowerShell versions 6 and above):
The .NET Core String.Split() method now does have a scalar [string] overload that looks for an entire string as the separator, which PowerShell Core selects by default; to get the character-array behavior described, you must cast to [char[]] explicitly:
'OU=RAH,OU=RAC'.Split([char[]] 'OU=')
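A minimal sketch contrasting the two overloads on the question's input, with explicit casts so that the choice of overload shouldn't depend on the PowerShell edition ($dn is just an illustrative variable name):

```powershell
$dn = 'OU=RAH,OU=RAC'

# Character-array overload: 'O', 'U' and '=' each act as a separator,
# yielding 7 tokens, 5 of them empty.
$byChars = $dn.Split([char[]] 'OU=')

# String-array overload: the whole string 'OU=' is the separator.
# This overload takes a StringSplitOptions value as its second argument.
$byString = $dn.Split([string[]] 'OU=', [System.StringSplitOptions]::None)

$byChars.Count    # 7
$byString.Count   # 3 ('', 'RAH,', 'RAC')
```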
If you construct the .Split() method call carefully, you can specify strings, but note that you still don't get regular-expression support:
PS> 'OU=RAH,OU=RAC'.Split([string[]] 'OU=', 'RemoveEmptyEntries')
RAH,
RAC
This splits by the literal string OU= and removes empty entries, but, as you can see, it doesn't account for the , that follows RAH.
You can take this further by specifying an array of strings to split by, which works in this simple case, but ultimately doesn't give you the same flexibility as the regular expressions that PowerShell's -split operator provides:
PS> 'OU=RAH,OU=RAC'.Split([string[]] ('OU=', ',OU='), 'RemoveEmptyEntries')
RAH
RAC
Note that specifying an (array of) strings requires the 2-argument form of the method call, meaning you must also specify a System.StringSplitOptions enumeration value. Use 'None' to not apply any options (as of this writing, the only true option that is supported is 'RemoveEmptyEntries', as used above).
(The type-safe way to specify the option is to use, e.g., [System.StringSplitOptions]::None; however, passing the option name as a string, e.g., 'None', is a convenient shortcut.)
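For illustration, a sketch showing that the type-safe enum value and the string shortcut select the same option (the variable names are illustrative):

```powershell
$dn = 'OU=RAH,OU=RAC'
$separators = [string[]] ('OU=', ',OU=')

# Type-safe: pass the enumeration value itself.
$typeSafe = $dn.Split($separators, [System.StringSplitOptions]::RemoveEmptyEntries)

# Shortcut: PowerShell converts the string 'RemoveEmptyEntries' to the enum value.
$shortcut = $dn.Split($separators, 'RemoveEmptyEntries')

# Both should yield 'RAH', 'RAC'.
```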

It splits the string on each character in the separator, so it's splitting it on 'O', 'U', and '='.
As @mklement0 has commented, my earlier answer would not work in all cases, so here is an alternate way to get the expected items.
$a.Split(',') |% { $_.Split('=') |? { $_ -ne 'OU' } }
This code first splits the string on the comma, then splits each item on = and discards the items that are OU, returning the expected values:
RAH
RAC
This will work even in case of:
$a = 'OU=FOO,OU=RAH,OU=RAC'
generating 3 items FOO, RAH & RAC
To get only 2 strings as expected, you could use the following line:
$a.Split('OU=', [System.StringSplitOptions]::RemoveEmptyEntries)
Which will give output as:
RAH,
RAC
And if you use (note the comma in the separator)
$a.Split(',OU=', [System.StringSplitOptions]::RemoveEmptyEntries)
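you will get (in Windows PowerShell, which treats ',OU=' as a set of separator characters; PowerShell Core selects the [string] overload instead and returns OU=RAH and RAC):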
RAH
RAC
This is probably what you want. :)
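The ',OU=' call relies on the separator being treated as a set of characters. A sketch that makes this explicit with a [char[]] cast, so the result should be the same in Windows PowerShell and PowerShell Core; the second sample string is made up to show a caveat of the character-set approach:

```powershell
$a = 'OU=RAH,OU=RAC'

# Each of ',', 'O', 'U' and '=' is a separator; empty tokens are dropped.
$parts = $a.Split([char[]] ',OU=', [System.StringSplitOptions]::RemoveEmptyEntries)
$parts    # RAH, RAC

# Caveat: every 'O' and 'U' is a separator, so values containing
# those letters are cut apart as well ('FOO' becomes just 'F').
$caveat = 'OU=FOO,OU=RAC'.Split([char[]] ',OU=', [System.StringSplitOptions]::RemoveEmptyEntries)
$caveat   # F, RAC
```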

Never mind. I just realised it looks for strings on either side of 'O', 'U', and '='.
There are therefore 5 empty elements (before the first 'O', between 'O' and 'U', between 'U' and '=', between the second 'O' and 'U', and between the second 'U' and '=').

In Windows PowerShell, String.Split() is character oriented: it treats O, U, and = as three separate separators.
Think of it as intended for input such as 1,2,3,4,5. If you had ,2,3,4, it would imply empty elements at the start and end; if you had 1,2,,,5, it would imply two empty elements in the middle.
You can see this with something like:
PS C:\> $a = 'OU=RAH,OU=RAC'
PS C:\> $a.Split('RAH')
OU=
,OU=
C
The empty elements are the gaps marked _ in R_A_H and R_A. A separator at the very start or end of a string likewise introduces an empty element at the start/end.
PowerShell's -split operator is string oriented.
PS D:\t> $a = 'OU=RAH,OU=RAC'
PS D:\t> $a -split 'OU='
RAH,
RAC
You might do better to split on the comma and then remove OU=, or vice versa, e.g.:
PS D:\t> $a = 'OU=RAH,OU=RAC'
PS D:\t> $a.Replace('OU=','').Split(',')
RAH
RAC

Related

powershell: concatenate an extension to each element of an array [duplicate]

I have an array and when I try to append a string to it the array converts to a single string.
I have the following data in an array:
$Str
451 CAR,-3 ,7 ,10 ,0 ,3 , 20 ,Over: 41
452 DEN «,40.5,0,7,0,14, 21 ,  Cover: 4
And I want to append the week of the game in this instance like this:
$Str = "Week"+$Week+$Str
I get a single string:
Week16101,NYG,42.5 ,3 ,10 ,3 ,3 , 19 ,Over 43 102,PHI,- 1,14,7,0,3, 24 ,  Cover 4 103,
Of course I'd like the append to occur on each row.
Instead of a for loop, you could also use the ForEach-Object cmdlet (if you prefer using the pipeline):
$str = "apple","lemon","toast"
$str = $str | ForEach-Object {"Week$_"}
Output:
Weekapple
Weeklemon
Weektoast
Another option, for PowerShell v4+:
$str = $str.ForEach({ "Week" + $Week + $_ })
Something like this will work for prepending/appending text to each line in an array.
Set array $str:
$str = "apple","lemon","toast"
$str
apple
lemon
toast
Prepend text now:
for ($i=0; $i -lt $Str.Count; $i++) {
$str[$i] = "yogurt" + $str[$i]
}
$str
yogurtapple
yogurtlemon
yogurttoast
This works for prepending/appending static text to each line. If you need to insert a changing variable this may require some modification. I would need to see more code in order to recommend something.
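For the question's case of a variable such as $Week, a sketch of the same loop, assuming $Week holds a single value that is the same for every line (the value 16 is made up):

```powershell
$Week = 16   # hypothetical week number standing in for the question's $Week
$str = "apple","lemon","toast"
for ($i = 0; $i -lt $str.Count; $i++) {
    # Double quotes expand $Week; the original element is appended after it.
    $str[$i] = "Week$Week" + $str[$i]
}
$str   # Week16apple, Week16lemon, Week16toast
```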
Another solution, which is fast and concise, albeit a bit obscure.
It uses the regex-based -replace operator with regex '^' which matches the position at the start of each input string and therefore effectively prepends the replacement string to each array element (analogously, you could use '$' to append):
# Sample array.
$array = 'one', 'two', 'three'
# Prepend 'Week ' to each element and create a new array.
$newArray = $array -replace '^', 'Week '
$newArray then contains 'Week one', 'Week two', 'Week three'.
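A sketch of both directions, assuming a $Week variable like the one in the question (its value here is made up). Because the replacement operand is a .NET substitution string, a literal $ inside it would have to be written as $$:

```powershell
$array = 'one', 'two', 'three'
$Week = 16   # hypothetical week number

# Prepend text that includes a variable: "..." expands $Week before -replace runs.
$prepended = $array -replace '^', "Week $Week "   # 'Week 16 one', ...

# Append instead: '$' matches the position at the end of each string.
$appended = $array -replace '$', ' (done)'        # 'one (done)', ...
```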
To show an equivalent foreach solution, which is syntactically simpler than a for solution (but, like the -replace solution above, invariably creates a new array):
[array] $newArray = foreach ($element in $array) { 'Week ' + $element }
Note: The [array] cast is needed to ensure that the result is always an array; without it, if the input array happens to contain just one element, PowerShell would assign the modified copy of that element as-is to $newArray; that is, no array would be created.
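A quick way to see why the [array] cast matters, using a hypothetical single-element input:

```powershell
$single = @('one')

# Without the cast, a single output object is assigned as-is (a [string]).
$withoutCast = foreach ($element in $single) { 'Week ' + $element }

# With the [array] type constraint, the result is always an array.
[array] $withCast = foreach ($element in $single) { 'Week ' + $element }

$withoutCast -is [array]   # False
$withCast -is [array]      # True
```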
As for what you tried:
"Week"+$Week+$Str
Because the LHS of the + operation is a single string, simple string concatenation takes place, which means that the array in $str is stringified, which by default concatenates the (stringified) elements with a space character.
A simplified example:
PS> 'foo: ' + ('bar', 'baz')
foo: bar baz
Solution options:
For per-element operations on an array, you need one of the following:
A loop statement, such as foreach or for.
Michael Timmerman's answer shows a for solution, which - while syntactically more cumbersome than a foreach solution - has the advantage of updating the array in place.
A pipeline that performs per-element processing via the ForEach-Object cmdlet, as shown in Martin Brandl's answer.
An expression that uses the .ForEach() array method, as shown in Patrick Meinecke's answer.
An expression that uses an operator that accepts arrays as its LHS operand and then operates on each element, such as the -replace solution shown above.
Tradeoffs:
Speed:
An operator-based solution is fastest, followed by for / foreach, .ForEach(), and, the slowest option, ForEach-Object.
Memory use:
Only the for option with indexed access to the array elements allows in-place updating of the input array; all other methods create a new array.[1]
[1] Strictly speaking, what .ForEach() returns isn't a .NET array, but a collection of type [System.Collections.ObjectModel.Collection[psobject]], but the difference usually doesn't matter in PowerShell.
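The speed ranking above can be checked with a rough Measure-Command sketch. The data size and labels are arbitrary assumptions, and absolute timings will vary by machine and PowerShell version, so no numbers are shown here:

```powershell
# Hypothetical test data: 20,000 strings.
$array = foreach ($n in 1..20000) { "item$n" }

# Time each per-element technique; results are discarded via $null.
$timings = [ordered] @{
    'operator (-replace)' = (Measure-Command { $null = $array -replace '^', 'Week ' }).TotalMilliseconds
    'foreach statement'   = (Measure-Command { $null = foreach ($e in $array) { 'Week ' + $e } }).TotalMilliseconds
    '.ForEach() method'   = (Measure-Command { $null = $array.ForEach({ 'Week ' + $_ }) }).TotalMilliseconds
    'ForEach-Object'      = (Measure-Command { $null = $array | ForEach-Object { 'Week ' + $_ } }).TotalMilliseconds
}
$timings   # milliseconds per technique
```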

Find first occurrence of any one of the array elements in a string - Powershell

The problem is to find the position of the very first occurrence of any of the elements of an array.
$terms = @("#", ";", "$", "|");
$StringToBeSearched = "ABC$DEFG#";
The expected output needs to be: 3, as '$' occurs before any of the other $terms in the $StringToBeSearched variable
Also, the idea is to do it in the least expensive way.
# Define the characters to search for as an array of [char] instances ([char[]])
# Note the absence of `@(...)`, which is never needed for array literals,
# and the absence of `;`, which is only needed to place *multiple* statements
# on the same line.
[char[]] $terms = '#', ';', '$', '|'
# The string to search through.
# Note the use of '...' rather than "...",
# to avoid unintended expansion of "$"-prefixed tokens as
# variable references.
$StringToBeSearched = 'ABC$DEFG#'
# Use the [string] type's .IndexOfAny() method to find the first
# occurrence of any of the characters in the `$terms` array.
$StringToBeSearched.IndexOfAny($terms) # -> 3
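.IndexOfAny() only works with single characters. If the terms were multi-character strings, a sketch using a regex alternation (a swapped-in technique, not part of the answer above, and likely slower than .IndexOfAny() for single characters) finds the leftmost occurrence of any of them:

```powershell
$terms = '#', ';', '$', '|'
$StringToBeSearched = 'ABC$DEFG#'

# Escape each term so regex metacharacters such as '$' and '|' match literally,
# then combine them into an alternation like '\#|;|\$|\|'.
$pattern = ($terms | ForEach-Object { [regex]::Escape($_) }) -join '|'

# [regex]::Match() returns the leftmost match; -1 mirrors .IndexOfAny()'s "not found".
$match = [regex]::Match($StringToBeSearched, $pattern)
$index = if ($match.Success) { $match.Index } else { -1 }
$index   # 3
```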

Powershell skip element in array, if it blank

I have a powershell script, where I receive names of elements as a variables from Jenkins:
$IISarray = @("$ENV:Cashier_NAME", "$ENV:Terminal_NAME", "$ENV:Content_Manager_NAME", "$ENV:Kiosk_BO_NAME")
foreach ($string in $IISarray){
"some code goes here"
}
Sometimes random elements can be blank. How can I add a check to see if the current element in array is blank, skip it and go to next element?
It's easiest to use -ne '' to create a filtered copy of the array that excludes empty entries, courtesy of the ability of many PowerShell operators to act as a filter with an array-valued LHS.
Note: I'm assuming you mean to filter out empty strings, not also blank (all-whitespace) ones, given that undefined environment variables expand to an empty string.
# Sample array with empty elements.
# Note: No need for @(...), unless there's just *one* element.
$IISarray = "foo", "", "bar", "baz", ""
# Note the `-ne ''`, which filters out empty elements.
foreach ($string in $IISarray -ne ''){
$string # echo
}
The above yields:
foo
bar
baz
soundstripe's answer offers a Where-Object solution, which potentially provides added flexibility via the ability to specify an arbitrary filter script block, but the use of a pipeline is a bit heavy-handed for this use case.
Fortunately, PSv4+ offers the .Where() collection method, which performs noticeably better.
Let me demonstrate it with a solution that also rules out blank (all-whitespace) elements:
# Note the all-whitespace element, which we want to ignore too.
PS> ("foo", " ", "bar", "baz", "").Where({ $_.Trim() })
foo
bar
baz
Similar to the Where-Object cmdlet, you pass a script block to the .Where() method, inside of which the automatic $_ variable represents the input element at hand.
The .Trim() method trims leading and trailing whitespace from a string and returns the result.
An all-whitespace string therefore results in the empty string.
In a Boolean context (as the .Where() method script block implicitly is), the empty string evaluates to $false, whereas any non-empty string is $true.
You can choose to be explicit, however ($_.Trim() -ne ''), or even use a .NET method (-not [string]::IsNullOrWhiteSpace($_)), which also handles $null elements.
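Applied to the question's scenario, a sketch that skips $null, empty, and all-whitespace entries in one go. The sample values are made up; the $null stands in for an unquoted reference to an undefined environment variable:

```powershell
# Sample array mixing real values with ' ', $null, and '' entries.
$IISarray = 'foo', ' ', 'bar', $null, 'baz', ''

# Keep only elements that are neither $null, empty, nor all-whitespace.
$kept = $IISarray.Where({ -not [string]::IsNullOrWhiteSpace($_) })

foreach ($string in $kept) {
    $string   # foo, bar, baz
}
```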
You can use Where-Object to filter out null or empty values. It is so commonly used that it has the built-in alias ?.
$IISarray = @("$ENV:Cashier_NAME", "$ENV:Terminal_NAME", "$ENV:Content_Manager_NAME", "$ENV:Kiosk_BO_NAME")
foreach ($string in ($IISarray | ? {$_})){
"some code goes here"
}
$_ is an automatic variable representing each incoming object in the pipeline. Both $null and the empty string '' are falsy in PowerShell, so only non-null strings with length > 0 are passed to your foreach loop.
# you can skip the `@` and parentheses as well as the quotation marks
$IISarray = $ENV:Cashier_NAME, $ENV:Terminal_NAME, $ENV:Content_Manager_NAME, $ENV:Kiosk_BO_NAME
foreach($String in $IISarray) {
# without quotes, an undefined environment variable yields $null, so rule that out first;
# then trim the string and check the length
if($null -ne $String -and $String.Trim().Length -gt 0) {
"some code goes here"
}
}

Extract the nth to nth characters of an string object

I have a filename and I wish to extract two portions of this and add into variables so I can compare if they are the same.
$name = 'FILE_20161012_054146_Import_5785_1234.xml'
So I want...
$a = 5785
$b = 1234
if ($a -eq $b) {
# do stuff
}
I have tried to extract the 36th up to the 39th character
Select-Object {$_.Name[35,36,37,38]}
but I get
{5, 7, 8, 5}
Have considered splitting but looks messy.
There are several ways to do this. One of the most straightforward, as PetSerAl suggested, is with .Substring():
$_.name.Substring(35,4)
Another way is with square brackets, as you tried, but that gives you an array of [char] objects, not a string. You can use -join to turn it back into a string, and a range to make the indexing easier:
$_.name[35..38] -join ''
For what you're doing, matching a pattern, you could also use a regular expression with capturing groups:
if ($_.name -match '_(\d{4})_(\d{4})\.xml$') {
if ($Matches[1] -eq $Matches[2]) {
# ...
}
}
This way can be very powerful, but you need to learn more about regex if you're not familiar. In this case it's looking for an underscore _ followed by 4 digits (0-9), followed by an underscore, and four more digits, followed by .xml at the end of the string. The digits are wrapped in parentheses so they are captured separately to be referenced later (in $Matches).
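Put together with the question's file name (quoted here, since it's a string), a sketch of the full comparison:

```powershell
$name = 'FILE_20161012_054146_Import_5785_1234.xml'

$result = if ($name -match '_(\d{4})_(\d{4})\.xml$') {
    # $Matches[1] holds the first captured group ('5785'), $Matches[2] the second ('1234').
    if ($Matches[1] -eq $Matches[2]) { 'same' } else { 'different' }
} else {
    'no match'
}
$result   # different
```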
Yet another approach: each of the following four expressions returns the substring 1234.
$FileName = "FILE_20161012_054146_Import_5785_1234.xml"
# $FileName
$FileName.Substring(33,4) # Substring method (zero-based)
-join $FileName[33..36] # indexing from beginning (zero-based)
-join $FileName[-8..-5] # reverse indexing:
# e.g. $FileName[-1] returns the last character
$FileArr = $FileName.Split([char[]] "_.")   # Split on '_' and '.' (depends only on filename "pattern template")
$FileArr[$FileArr.Count -2] # does not depend on lengths of tokens

Split a string with powershell to get the first and last element

If you do: git describe --long
you get: 0.3.1-15-g3b885c5
That's the meaning of the above string:
Tag-CommitDistance-CommitId (http://git-scm.com/docs/git-describe)
How would you split the string to get the first (Tag) and last (CommitId) element?
By using String.Split() with the count parameter, so that any dashes beyond the second stay in the last token:
$x = "0.3.1-15-g3b885c5"
$tag = $x.split("-",3)[0]
$commitid = $x.split("-",3)[-1]
Note: This answer focuses on improving on the approach of splitting into tokens by - from Richard's helpful answer, though note that that approach isn't fully robust, because git tag names may themselves contain - characters, so you cannot blindly assume that the first - instance ends the tag name.
To account for that, use Richard's robust solution instead.
Just to offer a more PowerShell-idiomatic variant:
# Stores '0.3.1' in $tag, and 'g3b885c5' in $commitId
$tag, $commitId = ('0.3.1-15-g3b885c5' -split '-')[0, -1]
PowerShell's -split operator is used to split the input string into an array of tokens by the separator -.
While the [string] type's .Split() method would be sufficient here, -split offers many advantages in general.
[0, -1] extracts the first (0) and last (-1) element from the array returned by -split and returns them as a 2-element array.
$tag, $commitId = is a destructuring multi-assignment that assigns the elements of the resulting 2-element array to a variable each.
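Given the caveat that tag names may contain -, a split-based sketch that instead assumes only that the last two fields (commit distance and commit ID, as produced by --long) never contain -; the sample tag v1.0-beta is made up:

```powershell
$description = 'v1.0-beta-15-g3b885c5'   # hypothetical tag 'v1.0-beta' containing a '-'
$parts = $description -split '-'

# The last two tokens are always the commit distance and the commit ID,
# so everything before them, rejoined with '-', is the tag.
$tag = $parts[0..($parts.Count - 3)] -join '-'
$commitId = $parts[-1]

$tag        # v1.0-beta
$commitId   # g3b885c5
```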
I can't recall if dashes are allowed in tags, so I'll assume they are, but will not appear in the last two fields.
Thus:
if ("0.3.1-15-g3b885c5" -match '(.*)-\d+-([^-]+)') {
$tag = $Matches[1];
$commitId = $Matches[2]
}