Split & Trim in a single step - powershell

In PS 5.0 I can split and trim a string in a single line, like this:
$string = 'One, Two, Three'
$array = ($string.Split(',')).Trim()
But that fails in PS 2.0. I can of course do a foreach to trim each item, or replace ', ' with ',' before doing the split, but I wonder if there is a more elegant approach that works in all versions of PowerShell?
Failing that, the replace seems like the best approach to address all versions with a single code base.

TheMadTechnician has provided the crucial pointer in a comment on the question:
Use the -split operator, which works the same in PSv2: It expects a regular expression (regex) as the separator, allowing for more sophisticated tokenizing than the [string] type's .Split() method, which operates on literals:
PS> 'One, Two, Three' -split ',\s*' | ForEach-Object { "[$_]" }
[One]
[Two]
[Three]
Regex ,\s* splits the input string by a comma followed by zero or more (*) whitespace characters (\s).
In fact, choosing -split over .Split() is advisable in general, even in later PowerShell versions.
However, to be fully equivalent to the .Trim()-based solution in the question, trimming of leading and trailing whitespace is needed too:
PS> ' One, Two,Three ' -split ',' -replace '^\s+|\s+$' | ForEach-Object { "[$_]" }
[One]
[Two]
[Three]
-replace '^\s+|\s+$' removes the leading and trailing whitespace from each token resulting from the split: | specifies an alternation, so that a match of the subexpression on either side of it counts as a match; ^\s+ matches leading whitespace, \s+$ matches trailing whitespace; \s+ represents a non-empty (one or more, +) run of whitespace characters; for more information about the -replace operator, see this answer.
(In PSv3+, you could simplify to (' One, Two,Three ' -split ',').Trim() or use the solution from the question.
To also weed out empty / all-whitespace elements, append -ne '')
As for why ('One, Two, Three'.Split(',')).Trim() doesn't work in PSv2: The .Split() method returns an array of tokens, and invoking the .Trim() method on that array - as opposed to its elements - isn't supported in PSv2.
In PSv3+, the .Trim() method call is implicitly "forwarded" to the elements of the resulting array, resulting in the desired trimming of the individual tokens - this feature is called member-access enumeration.
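Putting the pieces of this answer together, here is a minimal sketch that splits, trims, and drops empty tokens using only the operators discussed above. The sample string and variable names are made up for illustration.

```powershell
# Hypothetical input with stray whitespace and an empty field.
$string = ' One, Two,,Three '

# -split, -replace and -ne have equal precedence and are evaluated left to right:
# split on commas, trim each token, then filter out the empty ones.
$tokens = $string -split ',' -replace '^\s+|\s+$' -ne ''

$tokens | ForEach-Object { "[$_]" }
# [One]
# [Two]
# [Three]
```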

I don't have PS 2.0 but you might try something like
$string = 'One, Two, Three'
$array = ($string.Split(',') | % { $_.Trim() })
and see if that suits. This is probably less help for you, but future readers who have moved to later versions can use the #Requires statement. See help about_Requires to determine whether your platform supports this feature.

Related

question about powershell text manipulation

I apologise for asking a very basic question, as I am a beginner in scripting.
I was wondering why I am getting different results from two different sources with the same formatting. Below are my samples:
file1.txt
Id Name Members
122 RCP_VMWARE-DMZ-NONPROD DMZ_NPROD01_111
DMZ_NPROD01_113
123 RCP_VMWARE-DMZ-PROD DMZ_PROD01_110
DMZ_PROD01_112
124 RCP_VMWARE-DMZ-INT.r87351 DMZ_TEMPL_210.r
DMZ_DECOM_211.r
125 RCP_VMWARE-LAN-NONPROD NPROD02_20
NPROD03_21
NPROD04_22
NPROD06_24
file2.txt
Id Name Members
4 HPUX_PROD HPUX_PROD.3
HPUX_PROD.4
HPUX_PROD.5
I'm trying to display the Name column, and with this code I'm able to display file1.txt correctly.
PS C:\Share> gc file1.txt |Select-Object -skip 1 | foreach-object { $_.split(" ")[1]} | ? {$_.trim() -ne "" }
RCP_VMWARE-DMZ-NONPROD
RCP_VMWARE-DMZ-PROD
RCP_VMWARE-DMZ-INT.r87351
RCP_VMWARE-LAN-NONPROD
However, with file2.txt I'm getting a different output.
PS C:\Share> gc .\file2.txt |Select-Object -skip 1 | foreach-object { $_.split(" ")[1]} | ? {$_.trim() -ne "" }
4
Changing the code to $_.split(" ")[2] helps to display the output correctly.
However, I would like to have a single piece of code that can be applied to both situations. I'd appreciate it if you could help me sort this out. Thank you in advance.
This happens because the latter file has different format.
When examined carefully, one notices there are two spaces between 4 and HPUX_PROD strings:
Id Name Members
4 HPUX_PROD HPUX_PROD.3
^^^^
On the first file, there is a single space between number and string:
Id Name Members
122 RCP_VMWARE-DMZ-NONPROD DMZ_NPROD01_111
^^^
How to fix the issue depends on whether you need to handle both file formats, or whether one of them simply contains a typing error.
The existing answers are helpful, but let me try to break it down conceptually:
.Split(" ") splits the input string by each individual space character, whereas what you're looking for is to split by runs of (one or more) spaces, given that your column values can be separated by more than one space.
For instance, 'a  b'.split(' ') (note the two spaces) results in 3 array elements - 'a', '', 'b' - because the empty string between the two spaces is considered an element too.
The .NET [string] type's .Split() method is based on verbatim strings or character sets and therefore doesn't allow you to express the concept of "one or more spaces" as a split criterion, whereas PowerShell's regex-based -split operator does.
Conveniently, -split's unary form (see below) has this logic built in: it splits each input string by any nonempty run of whitespace, while also ignoring leading and trailing whitespace, which in your case obviates the need for a regex altogether.
This answer compares and contrasts the -split operator with the [string] type's .Split() method, and makes the case for routinely using the former.
Therefore, a working solution (for both input files) is:
Get-Content .\file2.txt | Select-Object -Skip 1 |
Foreach-Object { if ($value = (-split $_)[1]) { $value } }
Note:
If the column of interest contains a value (at least one non-whitespace character), so must all preceding columns in order for the approach to work. Also, column values themselves must not have embedded whitespace (which is true for your sample input).
The if conditional both extracts the 2nd column value ((-split $_)[1]) and assigns it to a variable ($value = ), whose value then implicitly serves as a Boolean:
Any nonempty string is implicitly $true, in which case the extracted value is output in the associated block ({ $value }); conversely, an empty string results in no output.
For a general overview of PowerShell's implicit to-Boolean conversions, see the bottom section of this answer.
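To make the difference between .Split(" ") and unary -split concrete, here is a small sketch. The sample line is hypothetical, written with two spaces between the first two columns, and the variable names are made up for illustration.

```powershell
# Hypothetical line with two spaces between the first two columns.
$line = '4  HPUX_PROD HPUX_PROD.3'

# .Split(' ') treats every single space as a separator, so the empty string
# between the two adjacent spaces becomes an element of its own.
$bySpace = $line.Split(' ')

# Unary -split splits by runs of whitespace and ignores leading/trailing whitespace.
$byRuns = -split $line

"Split(' ') : $($bySpace.Count) elements, index 1 = '$($bySpace[1])'"
"-split     : $($byRuns.Count) elements, index 1 = '$($byRuns[1])'"
# Split(' ') : 4 elements, index 1 = ''
# -split     : 3 elements, index 1 = 'HPUX_PROD'
```

With unary -split, index 1 holds the Name column no matter how many spaces separate the columns.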
Since this sort-of looks like csv output with spaces as delimiter (but not quite), I think you could use ConvertFrom-Csv on this:
# read the file as string array, trim each line and filter only the lines that
# when split on 1 or more whitespace characters has more than one field
# then replace the spaces by a comma and treat it as CSV
# return the 'Name' column only
(((Get-Content -Path 'D:\Test\file1.txt').Trim() |
Where-Object { @($_ -split '\s+').Count -gt 1 }) -replace '\s+', ',' |
ConvertFrom-Csv).Name
Shorter, but because you are only after the Name column, this works too:
((Get-Content -Path 'D:\Test\file2.txt').Trim() -replace '\s+', ',' | ConvertFrom-Csv).Name -ne ''
Output for file1
RCP_VMWARE-DMZ-NONPROD
RCP_VMWARE-DMZ-PROD
RCP_VMWARE-DMZ-INT.r87351
RCP_VMWARE-LAN-NONPROD
Output for file2
HPUX_PROD

How can I replace every comma with a space in a text file before a pattern using PowerShell

I have a text file with lines in this format:
FirstName,LastName,SSN,$x.xx,$x.xx,$x.xx
FirstName,MiddleInitial,LastName,SSN,$x.xx,$x.xx,$x.xx
The lines could be in either format. For example:
Joe,Smith,123-45-6789,$150.00,$150.00,$0.00
Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00
I want to basically turn everything before the SSN into a single field for the name thus:
Joe Smith,123-45-6789,$150.00,$150.00,$0.00
Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
How can I do this using PowerShell? I think I need to use ForEach-Object and at some point replace "," with " ", but I don't know how to specify the pattern. I also don't know how to use a ForEach-Object with a $_.Where so that I can specify the "SkipUntil" mode.
Thanks very much!
Mathias is correct; you want to use the -replace operator, which uses regular expressions. I think this will do what you want:
$string -replace ',(?=.*,\d{3}-\d{2}-\d{4})',' '
The regular expression uses a lookahead (?=...) to find any comma that is followed by any number of any characters (. is any character, * is any number of them, including 0) and then by a comma immediately followed by an SSN (\d{3}-\d{2}-\d{4}). A "zero-width assertion" such as this lookahead is used to determine the match, but is not actually returned as part of the match.
That's how we're able to match only the commas in the names themselves, and then replace them with a space.
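As a quick check, the same -replace can be applied to the two sample lines from the question. This is only a sketch: the variable names $lines and $result are made up here, and the strings are single-quoted so that the $ signs stay literal.

```powershell
# Sample lines from the question; single quotes prevent variable expansion of $150.00 etc.
$lines = 'Joe,Smith,123-45-6789,$150.00,$150.00,$0.00',
         'Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00'

# -replace operates on each element of the array.
# Only commas that still have ",<SSN>" somewhere to their right are replaced,
# so the comma directly in front of the SSN and all later commas are kept.
$result = $lines -replace ',(?=.*,\d{3}-\d{2}-\d{4})', ' '

$result
# Joe Smith,123-45-6789,$150.00,$150.00,$0.00
# Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
```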
I know it's answered, and neatly so, but I tried to come up with an alternative to using a regex - count the number of commas in a line, then replace either the first one, or the first two, commas in the line.
But strings can't count how many times a character appears in them without using the regex engine(*), and replacements can't be done a specific number of times without using the regex engine(**), so it's not very neat:
$comma = [regex]","
Get-Content data.csv | ForEach {
$numOfCommasToReplace = $comma.Matches($_).Count - 4
$comma.Replace($_, ' ', $numOfCommasToReplace)
} | Out-File data2.csv
Avoiding the regex engine entirely, just for fun, gets me things like this:
Get-Content .\data.csv | ForEach {
$1,$2,$3,$4,$5,$6,$7 = $_ -split ','
if ($7) {"$1 $2 $3,$4,$5,$6,$7"} else {"$1 $2,$3,$4,$5,$6"}
} | Out-File data2.csv
(*) ($line -as [char[]] -eq ',').Count
(**) while ( #counting ) { # split/mangle/join }

Replace character from link with powershell

I would like to replace the ? in
"EquipmentInfo["?"] = "<iframe src='http://bing.fr'></iframe>";"
by a variable.
I tried this:
(get-content C:\word.txt) -replace '?', '$obj' | Set-Content C:\word.txt
I would use a positive lookbehind to ensure you find the right question mark. Also, you have to use double quotes around the replacement, since you want to insert the value of a variable:
(get-content C:\word.txt -raw) -replace '(?<=EquipmentInfo\[")\?', "$obj" | Set-Content C:\word.txt
Regex used:
(?<=EquipmentInfo\[")\?
This answer explains the original problem.
jisaak's helpful answer provides a comprehensive solution.
The -replace operator takes a regular expression as the first operand on the RHS, in which ? is a so-called metacharacter with special meaning.
Thus, to use a literal ?, you must escape it, using \:
(get-content C:\word.txt) -replace '\?', $obj
Note: Do not use '...' around $obj, unless you want literal string $obj; generally, to reference variables inside strings you must use "...", but that's not necessary here.
A simple example with a literal:
'Right?' -replace '\?', '!' # -> 'Right!'
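Applied to the line from the question, the escaped pattern could be used as in the following sketch. The value assigned to $obj is hypothetical (the question does not say what it holds), and $line and $result are names made up for illustration.

```powershell
# Hypothetical replacement value; the question does not say what $obj contains.
$obj = 'Printer01'

# Sample line from the question (single quotes are doubled inside a single-quoted string).
$line = 'EquipmentInfo["?"] = "<iframe src=''http://bing.fr''></iframe>";'

# \? matches the literal question mark; $obj is passed unquoted as the replacement.
$result = $line -replace '\?', $obj

$result
# EquipmentInfo["Printer01"] = "<iframe src='http://bing.fr'></iframe>";
```

If the real file could contain other question marks (for instance in a URL query string), the lookbehind variant from the other answer is the safer choice.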

How do I match "|" in a regular expression in PowerShell?

I want to use a regular expression to filter out if a string contains one of "&" or "|" or "=". I tried:
$compareRegex = [String]::Join("|", @("&","|", "="));
"mydfa" -match $compareStr
PowerShell prints "True". This is not what I wanted, and it seems "|" itself has confused PowerShell for a matching. How do I fix it?
@Kayasax's answer would do in this case (thus +1); I just wanted to suggest a more general solution.
First of all: you are not using the pattern that you've just created. I suspect $compareStr is $null, thus it will match anything.
To the point: if you want to create pattern that will match characters/strings and you can't predict if any of them will be/contain special character or not, just use [regex]::Escape() for any item you want to match against:
$patternList = "&","|", "=" | ForEach-Object { [regex]::Escape($_) }
$compareRegex = $patternList -join '|'
"mydfa" -match $compareRegex
In such a case input can be dynamic, and you won't end up with pattern that matches anything.
The | has a special meaning in regular expressions. Alternations (lists of alternative matches) are separated by this character. For instance, the expression
a|b|c
would match either a or b or c.
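This also explains why a pattern built by joining the raw characters matches everything: the unescaped | in the middle leaves empty alternatives on either side of it, and an empty alternative matches any input. A minimal sketch (the variable names are made up for illustration):

```powershell
# Joining the raw characters yields '&|||=', which contains empty alternatives.
$unescaped = '&', '|', '=' -join '|'

# Escaping each item first yields '&|\||=', where \| is a literal pipe.
$escaped = ('&', '|', '=' | ForEach-Object { [regex]::Escape($_) }) -join '|'

$matchUnescaped = 'mydfa' -match $unescaped   # True: the empty alternative matches
$matchEscaped   = 'mydfa' -match $escaped     # False: none of & | = is present
$matchPipe      = 'a|b'   -match $escaped     # True: the literal | is found

$matchUnescaped, $matchEscaped, $matchPipe
```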
For matching a literal | you need to escape it with backslash (\|) or put it in a character class ([|]), so your expression should look like this:
"mydfa" -match "\||&|="
or like this:
"mydfa" -match "[|&=]"

How to Split DistinguishedName?

I have a list of folks and their DN from AD (I do not have direct access to that AD). Their DNs are in format:
$DNList = 'CN=Bob Dylan,OU=Users,OU=Dept,OU=Agency,OU=NorthState,DC=myworld,DC=com',
'CN=Ray Charles,OU=Contractors,OU=Dept,OU=Agency,OU=NorthState,DC=myworld,DC=com',
'CN=Martin Sheen,OU=Users,OU=Dept,OU=Agency,OU=WaySouth,DC=myworld,DC=com'
I'd like to make $DNList return the following:
OU=Users,OU=Dept,OU=Agency,OU=NorthState,DC=myworld,DC=com
OU=Contractors,OU=Dept,OU=Agency,OU=NorthState,DC=myworld,DC=com
OU=Users,OU=Dept,OU=Agency,OU=WaySouth,DC=myworld,DC=com
I decided to turn my comment into an answer:
$DNList | ForEach-Object {
$_ -replace '^.+?(?<!\\),',''
}
This will correctly handle escaped commas that are part of the first component.
We do a non-greedy match for one or more characters at the beginning of the string, then look for a comma that is not preceded by a backslash (so that the dot will match the backslash and comma combination and keep going).
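To see what the lookbehind buys you, compare it with a plain "up to the first comma" pattern on a DN whose CN contains an escaped comma. The DN below is a hypothetical example, not taken from the question, and the variable names are made up for illustration.

```powershell
# Hypothetical DN whose first component contains an escaped comma.
$dn = 'CN=Dylan\, Bob,OU=Users,OU=Dept,DC=myworld,DC=com'

# Naive pattern: stops at the first comma, including the escaped one inside the CN.
$naive = $dn -replace '^.*?,', ''

# Negative lookbehind: a comma preceded by a backslash does not end the match.
$aware = $dn -replace '^.+?(?<!\\),', ''

"naive: [$naive]"
"aware: [$aware]"
# naive: [ Bob,OU=Users,OU=Dept,DC=myworld,DC=com]
# aware: [OU=Users,OU=Dept,DC=myworld,DC=com]
```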
You can remove the first element with a replacement like this:
$DNList -replace '^.*?,(..=.*)$', '$1'
^.*?, is the shortest match from the beginning of the string to a comma.
(..=.*)$ matches the rest of the string (starting with two characters after the comma followed by a = character) and groups them, so that the match can be referenced in the replacement as $1.
You have 7 items per user, comma separated and you want rid of the first one.
So, split each item in the array using commas as the delimiter, return matches 1-6 (0 being the first item that you want to skip), then join with commas again e.g.
$DNList = $DNList|foreach{($_ -split ',')[1..6] -join ','}
If you then enter $DNList it returns
OU=Users,OU=Dept,OU=Agency,OU=NorthState,DC=myworld,DC=com
OU=Contractors,OU=Dept,OU=Agency,OU=NorthState,DC=myworld,DC=com
OU=Users,OU=Dept,OU=Agency,OU=WaySouth,DC=myworld,DC=com
Similar to Graham's answer, but with the hardcoded array indices removed, so it will just remove the CN portion without worrying about how long the DN is.
$DNList | ForEach-Object{($_ -split "," | Select-Object -Skip 1) -join ","}
Ansgar most likely has a good reason, but you can just use a regex to remove everything up to and including the first comma:
$DNList -replace "^.*?,"
Update based on briantist
To keep this a distinct answer that still works: this regex can still have issues, but I doubt these characters will appear in a username.
$DNList -replace "^.*?,(?=OU=)"
The regex uses a lookahead to be sure the , is followed by OU=.
Similarly you could do this
($DNList | ForEach-Object{($_ -split "(,OU=)" | Select-Object -Skip 1) -join ""}) -replace "^,"