PowerShell - print only text between quotes?

How can I have the output of the following text only show the text in the quotes (without the quotes)?
Sample text:
this is an "apple". it is red
this is an "orange". it is orange
this is an "blood orange". it is reddish
becomes:
apple
orange
blood orange
Ideally I'd like to do it in a one liner if possible. I think it's regular expression with -match but I'm not sure.

Here is one way:
$text='this is an "apple". it is red
this is an "orange". it is orange
this is an "blood orange". it is reddish'
$text.split("`n")|%{
$_.split('"')[1]
}
This is the winning solution
$text='this is an "apple". it is red
this is an "orange". it is orange
this is an "blood orange". it is reddish'
# $text is a single multi-line string, so split it into lines first
$text -split "`r?`n" | % { $_.Split('"')[1] }

Just another way using regex:
appcmd list apppool | % { [regex]::match( $_ , '(?<=")(.+)(?=")' ) } | select -expa value
or
appcmd list apppool | % { ([regex]::match( $_ , '(?<=")(.+)(?=")' )).value }

A concise solution based on .NET method [regex]::Matches(), using PSv3+ syntax:
$str = @'
this is an "apple". it is red
this is an "orange". it is orange
this is an "blood orange". it is reddish
'@
[regex]::Matches($str, '".*?"').Value -replace '"'
Regex ".*?" matches "..."-enclosed tokens and .Matches() returns all of them; .Value extracts them, and -replace '"' strips the " chars.
This means that the above works even with multiple "..." tokens per line (though note that extracting tokens with embedded escaped " chars, e.g. \", won't work).
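To illustrate the multiple-tokens-per-line case, here is a minimal sketch. The sample line is made up for illustration and is not part of the question's input.

```powershell
# Hypothetical input: a single line containing two quoted tokens.
$line = 'mix an "apple" with a "blood orange"'

# .Matches() returns every match; .Value collects the matched text (PSv3+),
# and -replace '"' strips the enclosing quotes from each value.
$tokens = [regex]::Matches($line, '".*?"').Value -replace '"'

$tokens   # apple, blood orange (one per line)
```

A -match based approach would stop at the first token on that line, which is why .Matches() is used here.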
Use of the -match operator - which only looks for a (one) match - is an option only if:
you split the input into lines
and each line contains at most 1 "..." token (which is true for the sample input in the question).
Here's a PSv4+ solution:
# Split string into lines, then use -match to find the first "..." token
($str -split "`r?`n").ForEach({ if ($_ -match '"(.*?)"') { $Matches[1] } })
Automatic variable $Matches contains the results of the previous -match operation (if the LHS was a scalar) and index [1] contains what the 1st (and only) capture group ((...)) matched.
It would be handy if -match had a variant named, say, -matchall, so that one could write:
# WISHFUL THINKING (as of PowerShell Core 6.2)
$str -matchall '".*?"' -replace '"'
See this feature suggestion on GitHub.


question about powershell text manipulation

I apologise for asking a very basic question, as I am a beginner in scripting.
I was wondering why I am getting different results from two different sources with the same formatting. Below are my samples:
file1.txt
Id Name Members
122 RCP_VMWARE-DMZ-NONPROD DMZ_NPROD01_111
DMZ_NPROD01_113
123 RCP_VMWARE-DMZ-PROD DMZ_PROD01_110
DMZ_PROD01_112
124 RCP_VMWARE-DMZ-INT.r87351 DMZ_TEMPL_210.r
DMZ_DECOM_211.r
125 RCP_VMWARE-LAN-NONPROD NPROD02_20
NPROD03_21
NPROD04_22
NPROD06_24
file2.txt
Id Name Members
4 HPUX_PROD HPUX_PROD.3
HPUX_PROD.4
HPUX_PROD.5
I'm trying to display the Name column, and with this code I'm able to display file1.txt correctly.
PS C:\Share> gc file1.txt |Select-Object -skip 1 | foreach-object { $_.split(" ")[1]} | ? {$_.trim() -ne "" }
RCP_VMWARE-DMZ-NONPROD
RCP_VMWARE-DMZ-PROD
RCP_VMWARE-DMZ-INT.r87351
RCP_VMWARE-LAN-NONPROD
However, with file2 I'm getting a different output.
PS C:\Share> gc .\file2.txt |Select-Object -skip 1 | foreach-object { $_.split(" ")[1]} | ? {$_.trim() -ne "" }
4
Changing the code to $_.split(" ")[2] displays the output correctly.
However, I would like to have a single piece of code that can be applied to both situations. I'd appreciate it if you could help me sort this out. Thank you in advance.
This happens because the latter file has different format.
When examined carefully, one notices there are two spaces between 4 and HPUX_PROD strings:
Id Name Members
4 HPUX_PROD HPUX_PROD.3
^^^^
On the first file, there is a single space between number and string:
Id Name Members
122 RCP_VMWARE-DMZ-NONPROD DMZ_NPROD01_111
^^^
How to fix the issue depends on whether you need to handle both file formats, or whether one of them simply contains a typing error.
The existing answers are helpful, but let me try to break it down conceptually:
.Split(" ") splits the input string by each individual space character, whereas what you're looking for is to split by runs of (one or more) spaces, given that your column values can be separated by more than one space.
For instance, 'a  b'.Split(' ') (note the two spaces in the input) results in 3 array elements - 'a', '', 'b' - because the empty string between the two spaces is considered an element too.
The .NET [string] type's .Split() method is based on verbatim strings or character sets and therefore doesn't allow you to express the concept of "one or more spaces" as a split criterion, whereas PowerShell's regex-based -split operator does.
Conveniently, -split's unary form (see below) has this logic built in: it splits each input string by any nonempty run of whitespace, while also ignoring leading and trailing whitespace, which in your case obviates the need for a regex altogether.
This answer compares and contrasts the -split operator with string type's .Split() method, and makes the case for routinely using the former.
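The difference can be seen side by side in a short sketch. The sample line below assumes two spaces between the first two columns, as described for file2.txt.

```powershell
# Assumed sample line: two spaces between "4" and "HPUX_PROD".
$line = '4  HPUX_PROD HPUX_PROD.3'

# .Split(' ') treats every single space as a separator, so the empty
# string between the two adjacent spaces becomes an element of its own.
$viaMethod = $line.Split(' ')

# Unary -split splits on runs of whitespace and ignores
# leading and trailing whitespace.
$viaOperator = -split $line

$viaMethod.Count     # 4
$viaOperator.Count   # 3
$viaOperator[1]      # HPUX_PROD
```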
Therefore, a working solution (for both input files) is:
Get-Content .\file2.txt | Select-Object -Skip 1 |
Foreach-Object { if ($value = (-split $_)[1]) { $value } }
Note:
If the column of interest contains a value (at least one non-whitespace character), so must all preceding columns in order for the approach to work. Also, column values themselves must not have embedded whitespace (which is true for your sample input).
The if conditional both extracts the 2nd column value ((-split $_)[1]) and assigns it to a variable ($value = ), whose value then implicitly serves as a Boolean:
Any nonempty string is implicitly $true, in which case the extracted value is output in the associated block ({ $value }); conversely, an empty string results in no output.
For a general overview of PowerShell's implicit to-Boolean conversions, see the bottom section of this answer.
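As a small illustration of the assignment-as-conditional technique, the sketch below runs it over two lines shaped like the sample data. The exact spacing of the lines is an assumption, since only the column layout is known from the question.

```powershell
# Assumed input: one full row and one continuation row that
# contains only a Members value (spacing is illustrative).
$lines = '122 RCP_VMWARE-DMZ-NONPROD DMZ_NPROD01_111',
         '    DMZ_NPROD01_113'

$names = @(
    $lines | ForEach-Object {
        # (-split $_)[1] is $null for the continuation row, because the
        # split yields a one-element array; $null converts to $false.
        if ($value = (-split $_)[1]) { $value }
    }
)

$names   # RCP_VMWARE-DMZ-NONPROD
```

The continuation row produces no output because it has no second token, which is exactly the filtering the if conditional is meant to provide.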
Since this sort-of looks like csv output with spaces as delimiter (but not quite), I think you could use ConvertFrom-Csv on this:
# read the file as string array, trim each line and filter only the lines that
# when split on 1 or more whitespace characters has more than one field
# then replace the spaces by a comma and treat it as CSV
# return the 'Name' column only
(((Get-Content -Path 'D:\Test\file1.txt').Trim() |
Where-Object { @($_ -split '\s+').Count -gt 1 }) -replace '\s+', ',' |
ConvertFrom-Csv).Name
Because you are only after the Name column, this shorter version works too:
((Get-Content -Path 'D:\Test\file2.txt').Trim() -replace '\s+', ',' | ConvertFrom-Csv).Name -ne ''
Output for file1
RCP_VMWARE-DMZ-NONPROD
RCP_VMWARE-DMZ-PROD
RCP_VMWARE-DMZ-INT.r87351
RCP_VMWARE-LAN-NONPROD
Output for file2
HPUX_PROD

Powershell replace last two occurrences of a '/' in file path with '.'

I have a file path, and I'm trying to replace the last two occurrences of the \ character with . and also completely remove the '{}' via PowerShell, to then store the result in a variable.
So, turn this:
xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx
Into this:
xxx-xxx-xx\xxxxxxx\x\xxxx-xxxxx-xxxx.xxxxx.xxxxx
I've tried to get this working with the replace cmdlet, but this seems to focus more on replacing all occurrences or the first/last occurrence, which isn't my issue. Any guidance would be appreciated!
Edit:
So, I have an Excel file and I'm creating a PowerShell script that uses a foreach loop over every row, which amounts to thousands of entries. For each of those entries, I want to create a secondary variable that will take the full path, and save that path minus the last two slashes. Here's the portion of the script that I'm working on:
Foreach($script in $roboSource)
{
$logFileName = "$($script.a).txt".Replace('(?<=^[^\]+-[^\]+)-','.')
}
$script.a will output thousands of entries in this format:
xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx
Which is expected.
I want $logFileName to output this:
xxx-xxx-xx\xxxxxxx\x\xxxx-xxxxx-xxxx.xxxxx.xxxxx
I'm just starting to understand regex, and I believe the capture group between the parentheses should be catching at least one of the '\', but testing attempts show no changes after adding the replace+regex.
Please let me know if I can provide more info.
Thanks!
You can do this in two fairly simple -replace operations:
Remove { and }
Replace the last two \:
$str = 'xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
$str -replace '[{}]' -replace '\\([^\\]*)\\([^\\]*)$','.$1.$2'
The second pattern matches:
\\ # 1 literal '\'
( # open first capture group
[^\\]* # 0 or more non-'\' characters
) # close first capture group
\\ # 1 literal '\'
( # open second capture group
[^\\]* # 0 or more non-'\' characters
) # close second capture group
$ # end of string
Which we replace with the first and second capture group values, but with . before, instead of \: .$1.$2
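Applied to the loop from the question, the two -replace steps could be wrapped in a small helper. Note that the [string] method .Replace() performs literal replacement and does not interpret regex patterns, so the -replace operator is used instead. This is only a sketch: the function name ConvertTo-LogFileName and the sample paths are hypothetical.

```powershell
# Hypothetical helper: strips the braces, turns the last two '\' into '.',
# and appends '.txt' as in the question's loop.
function ConvertTo-LogFileName {
    param([string] $Path)
    ($Path -replace '[{}]' -replace '\\([^\\]*)\\([^\\]*)$', '.$1.$2') + '.txt'
}

# Made-up sample paths in the same shape as the question's input.
$paths = 'aaa-bbb-cc\ddddddd\e\{ffff-ggggg-hhhh}\iiiii\jjjjj',
         'kkk-lll-mm\nnnnnnn\o\{pppp-qqqqq-rrrr}\sssss\ttttt'

$logFileNames = foreach ($p in $paths) { ConvertTo-LogFileName -Path $p }
$logFileNames
```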
If you're using PowerShell Core version 6.1 or newer, you can also take advantage of right-to-left -split:
($str -replace '[{}]' -split '\\',-3) -join '.'
-split '\\',-3 has the same effect as -split '\\',3, but splitting from the right rather than the left.
A 2-step approach is simplest in this case:
# Input string.
$str = 'xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
# Get everything before the "{"
$prefix = $str -replace '\{.+'
# Get everything starting with the "{", remove "{" and "}",
# and replace "\" with "."
$suffix = $str.Substring($prefix.Length) -replace '[{}]' -replace '\\', '.'
# Output the combined result (or assign to $logFileName)
$prefix + $suffix
If you wanted to do it with a single -replace operation (with nesting), things get more complicated:
Note: This solution requires PowerShell Core (v6.1+)
$str -replace '(.+)\{(.+)\}(.+)',
{ $_.Groups[1].Value + $_.Groups[2].Value + ($_.Groups[3].Value -replace '\\', '.') }
Also see the elegant PS-Core-only -split based solution with a negative index (to split only a fixed number of tokens off the end) in Mathias R. Jessen's helpful answer.
Try this:
$str='xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
# Remove the braces and split on '\' to get an array
$Array=$str -replace '[{}]' -split '\\'
# Join all elements except the last two with '\', then append the last two with '.'
"{0}.{1}.{2}" -f ($Array[0..($Array.Length -3)] -join '\'), $Array[-2], $Array[-1]

Split & Trim in a single step

In PS 5.0 I can split and trim a string in a single line, like this
$string = 'One, Two, Three'
$array = ($string.Split(',')).Trim()
But that fails in PS 2.0. I can of course do a foreach to trim each item, or replace ', ' with ',' before doing the split, but I wonder if there is a more elegant approach that works in all versions of PowerShell?
Failing that, the replace seems like the best approach to address all versions with a single code base.
TheMadTechnician has provided the crucial pointer in a comment on the question:
Use the -split operator, which works the same in PSv2: It expects a regular expression (regex) as the separator, allowing for more sophisticated tokenizing than the [string] type's .Split() method, which operates on literals:
PS> 'One, Two, Three' -split ',\s*' | ForEach-Object { "[$_]" }
[One]
[Two]
[Three]
Regex ,\s* splits the input string by a comma followed by zero or more (*) whitespace characters (\s).
In fact, choosing -split over .Split() is advisable in general, even in later PowerShell versions.
However, to be fully equivalent to the .Trim()-based solution in the question, trimming of leading and trailing whitespace is needed too:
PS> ' One, Two,Three ' -split ',' -replace '^\s+|\s+$' | ForEach-Object { "[$_]" }
[One]
[Two]
[Three]
-replace '^\s+|\s+$' removes the leading and trailing whitespace from each token resulting from the split: | specifies an alternation, so that a match of the subexpression on either side of it counts as a match; ^\s+ matches leading whitespace, \s+$ matches trailing whitespace; \s+ represents a non-empty (one or more, +) run of whitespace characters; for more information about the -replace operator, see this answer.
(In PSv3+, you could simplify to (' One, Two,Three ' -split ',').Trim() or use the solution from the question.
To also weed out empty / all-whitespace elements, append -ne '')
As for why ('One, Two, Three'.Split(',')).Trim() doesn't work in PSv2: The .Split() method returns an array of tokens, and invoking the .Trim() method on that array - as opposed to its elements - isn't supported in PSv2.
In PSv3+, the .Trim() method call is implicitly "forwarded" to the elements of the resulting array, resulting in the desired trimming of the individual tokens - this feature is called member-access enumeration.
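A short sketch of member-access enumeration in PSv3+ follows. The contrast with .Length is added here for illustration: because arrays have their own Length property, that access is not forwarded to the elements.

```powershell
$tokens = 'One, Two, Three'.Split(',')

# Arrays have no .Trim() method, so in PSv3+ the call is applied
# to each element and the results are collected in a new array.
$trimmed = $tokens.Trim()

# Arrays do have a Length property, so this returns the element count
# rather than the lengths of the individual strings.
$count = $tokens.Length

$trimmed -join '|'   # One|Two|Three
$count               # 3
```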
I don't have PS 2.0 but you might try something like
$string = 'One, Two, Three'
$array = ($string.Split(',') | % { $_.Trim() })
and see if that suits. This is probably less helpful for you, but future readers who have moved to later versions can use the #Requires statement. See help about_Requires to determine whether your platform supports this feature.

Find and replace a string containing both double quotes and brackets

Let's say I have a test file named testfile.txt containing the below line:
one (two) "three"
I want to use PowerShell to say that if the entire string exists, place a line directly underneath it with the value:
four (five) "six"
(Notice that it includes both spaces, brackets and double quotes. This is important, as the problem I am having is I think with escaping the brackets and double quotes).
So the result would be:
one (two) "three"
four (five) "six"
I thought the easiest way of doing it would be to say that if the first string is found, replace it with the first string itself again, and the new string forming a new line included in the same command. I had difficulty putting the strings in line so I tried using a herestring variable whereby an entire text block with formatting is read. It still does not parse the full string with quotes into the pipeline. I'm new to powershell so don't hold back if you see something stupid.
$herestring1 = @"
one (two) "three"
"@
$herestring2 = @"
one (two) "three"
four (five) "six"
"@
if((Get-Content testfile.txt) | select-string $herestring1) {
"Match found - replacing string"
(Get-Content testfile.txt) | ForEach-Object { $_ -replace $herestring1,$herestring2 } | Set-Content ./testfile.txt
"Replaced string successfully"
}
else {
"No match found"}
The above just gives "No match found" every time. This is because it does not find the first string in the file.
I have tried variations using backtick [ ` ] and doubling quotes to try to escape, but I thought the point in a here string was that it should parse the text block including all formatting so I should not have to.
If I change the file to contain only:
one two three
and then change the herestring accordingly to:
$herestring1 = @"
one two three
"@
$herestring2 = @"
one two three
four five six
"@
Then it works ok and I get the string replaced as I want.
As Martin points out, you can use -SimpleMatch with Select-String to avoid parsing it as a regular expression.
But -replace will still be using a regex.
You can escape the pattern for RegEx using [RegEx]::Escape():
$herestring1 = @"
one (two) "three"
"@
$herestring2 = @"
one (two) "three"
four (five) "six"
"@
$pattern1 = [RegEx]::Escape($herestring1)
if((Get-Content testfile.txt) | select-string $pattern1) {
"Match found - replacing string"
(Get-Content testfile.txt) | ForEach-Object { $_ -replace $pattern1,$herestring2 } | Set-Content ./testfile.txt
"Replaced string successfully"
}
else {
"No match found"}
Regular expressions interpret parentheses () (what you are calling brackets) as special. By default, spaces are not special, but they can be with certain regex options. Double quotes are no problem.
In regex, the escape character is backslash \, and this is independent of any escaping you do for the PowerShell parser using backtick `.
[RegEx]::Escape() will ensure anything special to regex is escaped so that a regex pattern will interpret it as literal, so your pattern will end up looking like this: one\ \(two\)\ "three"
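The effect of escaping can be checked in isolation with a minimal sketch that uses the strings from the question (variable names are chosen here for illustration):

```powershell
$literal = 'one (two) "three"'
$newLine = 'four (five) "six"'

# Escape regex metacharacters so the text is matched literally.
$pattern = [regex]::Escape($literal)
$pattern                    # one\ \(two\)\ "three"

$literal -match $pattern    # True
# Unescaped, "(two)" is a capture group that matches the text "two",
# so the pattern looks for: one two "three"
$literal -match $literal    # False

# Replace the matched line with itself plus the new line underneath.
$result = $literal -replace $pattern, "$literal`n$newLine"
$result
```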
Just use the Select-String cmdlet with the -SimpleMatch switch:
# ....
if((Get-Content testfile.txt) | select-string -SimpleMatch $herestring1) {
# ....
-SimpleMatch
Indicates that the cmdlet uses a simple match rather than a regular
expression match. In a simple match, Select-String searches the input
for the text in the Pattern parameter. It does not interpret the value
of the Pattern parameter as a regular expression statement.
Source.

How can I replace every comma with a space in a text file before a pattern using PowerShell

I have a text file with lines in this format:
FirstName,LastName,SSN,$x.xx,$x.xx,$x.xx
FirstName,MiddleInitial,LastName,SSN,$x.xx,$x.xx,$x.xx
The lines could be in either format. For example:
Joe,Smith,123-45-6789,$150.00,$150.00,$0.00
Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00
I want to basically turn everything before the SSN into a single field for the name thus:
Joe Smith,123-45-6789,$150.00,$150.00,$0.00
Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
How can I do this using PowerShell? I think I need to use ForEach-Object and at some point replace "," with " ", but I don't know how to specify the pattern. I also don't know how to use a ForEach-Object with a $_.Where so that I can specify the "SkipUntil" mode.
Thanks very much!
Mathias is correct; you want to use the -replace operator, which uses regular expressions. I think this will do what you want:
$string -replace ',(?=.*,\d{3}-\d{2}-\d{4})',' '
The regular expression uses a lookahead (?=) to look for any commas that are followed by any number of any character (. is any character, * is any number of them including 0) that are then followed by a comma immediately followed by an SSN (\d{3}-\d{2}-\d{4}). The concept of "zero-width assertions", such as this lookahead, simply means that it is used to determine the match, but is not actually returned as part of the match.
That's how we're able to match only the commas in the names themselves, and then replace them with a space.
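Run against the two sample lines from the question, the lookahead behaves as follows. This is a minimal sketch; the variable names are chosen here for illustration.

```powershell
# Single-quoted strings keep the $ characters literal.
$lines = 'Joe,Smith,123-45-6789,$150.00,$150.00,$0.00',
         'Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00'

# Only commas that still have ",<SSN>" somewhere to their right are replaced;
# the comma directly before the SSN and all later commas are left alone.
$converted = $lines -replace ',(?=.*,\d{3}-\d{2}-\d{4})', ' '

$converted
# Joe Smith,123-45-6789,$150.00,$150.00,$0.00
# Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
```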
I know it's answered, and neatly so, but I tried to come up with an alternative to using a regex - count the number of commas in a line, then replace either the first one, or the first two, commas in the line.
But strings can't count how many times a character appears in them without using the regex engine(*), and replacements can't be done a specific number of times without using the regex engine(**), so it's not very neat:
$comma = [regex]","
Get-Content data.csv | ForEach {
$numOfCommasToReplace = $comma.Matches($_).Count - 4
$comma.Replace($_, ' ', $numOfCommasToReplace)
} | Out-File data2.csv
Avoiding the regex engine entirely, just for fun, gets me things like this:
Get-Content .\data.csv | ForEach {
$1,$2,$3,$4,$5,$6,$7 = $_ -split ','
if ($7) {"$1 $2 $3,$4,$5,$6,$7"} else {"$1 $2,$3,$4,$5,$6"}
} | Out-File data2.csv
(*) ($line -as [char[]] -eq ',').Count
(**) while ( #counting ) { # split/mangle/join }