How to Select-String multiline? - powershell

I am trying to Select-String on a text that is on multiple lines.
Example:
"This is line1
<Test>Testing Line 2</Test>
<Test>Testing Line 3</Test>"
I want to be able to select all 3 lines in Select-String. However, it is only selecting the first line when I do Select-String -Pattern "Line1". How can I extract all 3 lines together?
Select-String -Pattern "Line1"

Select-String operates on its input objects individually, and if you either pass a file path directly to it (via -Path or -LiteralPath) or you pipe output from Get-Content, matching is performed on each line.
Therefore, pass your input as a single, multiline string, which, if it comes from a file, is most easily achieved with Get-Content -Raw (PSv3+):
Get-Content -Raw file.txt | Select-String -Pattern Line1
Note that this means that if the pattern matches, the file's entire content is output, accessible via the output object's .Line property.
By contrast, if you want to retain per-line matching but also capture a fixed number of surrounding lines, use the -Context parameter.
Get-Content file.txt | Select-String -Pattern Line1 -Context 0, 2
Ansgar Wiechers' helpful answer shows how to extract all the lines from the result.
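With the -Raw approach, the regex itself can also span line breaks, so the match can cover just the matching line plus the two lines after it instead of the whole file content held in .Line. The following is only a minimal sketch: the temp file name, the sample content (including the extra "unrelated trailer" line) and the pattern are assumptions made up for illustration.

```powershell
# Hypothetical sample file, created only for this illustration.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-demo.txt'
Set-Content -LiteralPath $file -Value 'This is line1',
    '<Test>Testing Line 2</Test>',
    '<Test>Testing Line 3</Test>',
    'unrelated trailer'

# -Raw passes the file as ONE string, so the pattern can cross line breaks.
# (?m) lets ^ match at the start of any line; [^\r\n]* stays within one line;
# (?:\r?\n[^\r\n]*){2} then consumes exactly the next two lines.
$hit = Get-Content -Raw -LiteralPath $file |
    Select-String -Pattern '(?m)^[^\r\n]*line1[^\r\n]*(?:\r?\n[^\r\n]*){2}'

$block = $hit.Matches[0].Value   # only the three-line span
$block                           # $hit.Line, by contrast, holds the entire file content

Remove-Item -LiteralPath $file
```

Unlike -Context, this requires the number of following lines to be baked into the pattern, so it is mostly useful when the block has a fixed or recognizable shape.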

Select-String lets you select a given number of lines before or after the matching line via the -Context parameter: -Context 2,0 selects the preceding 2 lines, -Context 0,2 selects the subsequent 2 lines, and -Context 2,2 selects the 2 lines before as well as the 2 lines after the match.
You won't get match and context lines in one big lump, though, so you need to combine matched line and context if you want them as a single string:
Select-String -Pattern 'Line1' -Context 0,2 | ForEach-Object {
$($_.Line; $_.Context.PostContext) | Out-String
}
As @mklement0 correctly pointed out in the comments, the above is comparatively slow, which isn't a problem if you're only processing a few matches, but becomes an issue if you need to process hundreds or thousands of matches. To improve performance you can merge the values into a single array and use the -join operator:
Select-String -Pattern 'Line1' -Context 0,2 | ForEach-Object {
(,$_.Line + $_.Context.PostContext) -join [Environment]::NewLine
}
Note that the two code snippets don't produce the exact same result, because Out-String appends a newline to each line including the last one, whereas -join only puts newlines between lines (not at the end of the last one). Each snippet can be modified to produce the same result as the other, though. Trim the strings from the first example to remove trailing newlines, or append another newline to the strings from the second one.
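The trailing-newline difference can be checked in isolation. This is a small sketch using a made-up three-element array in place of the matched line and its context lines:

```powershell
# Stand-in for a matched line plus two context lines (hypothetical values).
$lines = 'a', 'b', 'c'

$viaOutString = $lines | Out-String                      # newline after every line, including the last
$viaJoin      = $lines -join [Environment]::NewLine      # newlines only between lines

$viaOutString.EndsWith([Environment]::NewLine)   # True
$viaJoin.EndsWith([Environment]::NewLine)        # False

# Trimming the Out-String result makes the two strings equal.
$viaOutString.TrimEnd() -eq $viaJoin             # True
```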
If you want the output as individual lines just output the Line and PostContext properties without merging them into one string:
Select-String -Pattern 'Line1' -Context 0,2 | ForEach-Object {
$_.Line
$_.Context.PostContext
}

Related

Powershell, how to capture argument(s) of Select-String and include with matched output

Thanks to @mklement0 for the help with getting this far with the answer given in Powershell search directory for code files with text matching input a txt file.
The below Powershell works well for finding the occurrences of a long list of database field names in a source code folder.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile) |
Select-Object Path, LineNumber, line |
Export-csv $outputfile
However, many lines of source code have multiple matches, especially ADO.NET SQL statements with a lot of field names on one line. If the field name argument was included with the matching output the results will be more directly useful with less additional massaging such as lining up everything with the original field name list. For example if there is a source line "BatchId = NewId" it will match field name list item "BatchId". Is there an easy way to include in the output both "BatchId" and "BatchId = NewId"?
Played with the matches object but it doesn't seem to have the information. Also tried Pipeline variable like here but X is null.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile -PipelineVariable x) |
Select-Object $x, Path, LineNumber, line |
Export-csv $outputFile
Thanks.
The Microsoft.PowerShell.Commands.MatchInfo instances that Select-String outputs have a Pattern property that reflects the specific pattern among the (potential) array of patterns passed to -Pattern that matched on a given line.
The caveat is that if multiple patterns match, .Pattern only reports the pattern among those that matched that is listed first among them in the -Pattern argument.
Here's a simple example, using an array of strings to simulate lines from files as input:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -Pattern ('bar', 'foo') |
Select-Object Line, LineNumber, Pattern
The above yields:
Line                         LineNumber Pattern
----                         ---------- -------
A fool and                            1 foo
his barn                              2 bar
foo and bar on the same line          4 bar
Note how 'bar' is listed as the Pattern value for the last line, even though 'foo' appeared first in the input line, because 'bar' comes before 'foo' in the pattern array.
To reflect the actual pattern that appears first on the input line in a Pattern property, more work is needed:
Formulate your array of patterns as a single regex using alternation (|), wrapped as a whole in a capture group ((...)), e.g., '(bar|foo)'.
Note: The expression used below, '({0})' -f ('bar', 'foo' -join '|'), constructs this regex dynamically, from an array (the array literal 'bar', 'foo' here, but you can substitute any array variable or even (Get-Content $inputFile)); if you want to treat the input patterns as literals and they happen to contain regex metacharacters (such as .), you'll need to escape them with [regex]::Escape() first.
Use a calculated property to define a custom Pattern property that reports the capture group's value, which is the first among the values encountered on each input line:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches[0].Groups[1].Value } }
This yields (abbreviated to show only the last match):
Line                         LineNumber Pattern
----                         ---------- -------
...
foo and bar on the same line          4 foo
Now, 'foo' is properly reported as the matching pattern.
To report all patterns found on each line:
Switch -AllMatches is required to tell Select-String to find all matches on each line, represented in the .Matches collection of the MatchInfo output objects.
The .Matches collection must then be enumerated (via the .ForEach() collection method) to extract the capture-group value from each match.
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches.ForEach({ $_.Groups[1].Value }) } }
This yields (abbreviated to show only the last match):
Line                         LineNumber Pattern
----                         ---------- -------
...
foo and bar on the same line          4 {foo, bar}
Note how both 'foo' and 'bar' are now reported in Pattern, in the order encountered on the line.
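If the patterns are meant as literal names, as with a list of field names, the escaping step mentioned earlier can be folded into the regex construction. The sketch below assumes two made-up names (one containing the regex metacharacter .) and three made-up input lines; it is an illustration of the idea, not a tested drop-in for the original script.

```powershell
# Hypothetical literal names; "." would otherwise match any character.
$names = 'BatchId', 'dbo.Orders'

# Escape each name, then join with | and wrap in a capture group.
$pattern = '({0})' -f (($names | ForEach-Object { [regex]::Escape($_) }) -join '|')
$pattern   # (BatchId|dbo\.Orders)

$results = 'BatchId = NewId',
    'SELECT * FROM dbo.Orders',
    'dboXOrders is not a match' |
    Select-String -AllMatches -Pattern $pattern |
    Select-Object Line, @{ n='Pattern'; e={ $_.Matches.ForEach({ $_.Groups[1].Value }) } }

$results
```

Without the escaping, the third line would also match, because an unescaped . in dbo.Orders accepts the X.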
The solid information and examples from @mklement0 were enough to point me in the right direction for researching and understanding more about Powershell and the object pipeline and calculated properties.
I was finally able to achieve my goal of cross-referencing a list of table and field names against the C# code base. The input file is simply table and field names, pipe-delimited. (One of the glitches I had was not using the pipe in the split; it was a visual error that took a while to finally see, so check for that.) The output is the table name, field name, code file name, line number and actual line. It's not perfect but much better than manual effort for a few hundred fields! And now there are possibilities for further automation in the data mapping and conversion project. I thought about writing a C# utility, but that might have taken just as long to figure out and implement, and would have been much more cumbersome than a working Powershell script.
The key for me at this point is "working"! My first deeper dive into the abstruse world of Powershell. The key points of my solution are the use of the calculated property to get the table and field names in the output, realization that expressions can be used in certain places like to build a Pattern and that the pipeline is passing only certain specific objects after each command (maybe that is too restricted of a view but it's better than what I had before).
Hope this helps someone in future. I could not find any examples close enough to get over the hump and so asked my first ever stackoverflow questions.
$inputFile = "C:\input.txt"
$outputFile = "C:\output.csv"
$results = Get-Content $inputfile
foreach ($i in $results) {
Get-ChildItem -Path "C:\ProjectFolder" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
Select-String -Pattern $i.Split('|')[1] |
Select-Object @{ n='Pattern'; e={ $i.Split('|')[0], $i.Split('|')[1] -join '|'} }, Filename, LineNumber, line |
Export-Csv $outputFile -Append}

Remove start and end spaces in specific csv column

I am trying to remove start and end spaces in column data in CSV file. I've got a solution to remove all spaces in the csv, but it's creating non-readable text in description column.
Get-Content -Path test.csv | ForEach-Object { $_.Trim() -replace "\s+" } | Out-File -FilePath out.csv -Encoding ascii
e.g.
'192.168.1.2' ' test-1-TEST' 'Ping Down at least 1 min' '3/11/2017' 'Unix Server' 'Ping' 'critical'
'192.168.1.3' ' test-2-TEST' ' Ping Down at least 3 min' '3/11/2017' 'windows Server' 'Ping' 'critical'
I want to remove the spaces only from ' test-1-TEST' and not from 'Ping Down at least 1 min'. Is this possible?
"IP","ServerName","Status","Date","ServerType","Test","State"
"192.168.1.2"," test-1-TEST","Ping Down at least 1 min","3/11/2017","Unix Server","Ping","critical"
"192.168.1.3"," test-2-TEST"," Ping Down at least 3 min","3/11/2017","windows Server","Ping","critical"
For example file above:
Import-Csv C:\folder\file.csv | ForEach-Object {
$_.ServerName = $_.ServerName.trim()
$_
} | Export-Csv C:\folder\file2.csv -NoTypeInformation
Replace ServerName with the name of the Column you want to remove spaces from (aka trim).
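The same idea extends to several columns at once. The sketch below is an assumption-laden illustration: it uses an in-memory, three-column excerpt of the sample data (via ConvertFrom-Csv) instead of a file, and the $columnsToTrim list is a hypothetical choice.

```powershell
# Three-column excerpt of the sample data, kept in memory for the illustration.
$csvLines = '"IP","ServerName","Status"',
    '"192.168.1.2"," test-1-TEST","Ping Down at least 1 min"',
    '"192.168.1.3"," test-2-TEST"," Ping Down at least 3 min"'

# Hypothetical list of columns whose values should be trimmed.
$columnsToTrim = 'ServerName', 'Status'

$rows = $csvLines | ConvertFrom-Csv | ForEach-Object {
    foreach ($name in $columnsToTrim) {
        # Trim() removes only leading/trailing whitespace; inner spaces stay.
        $_.$name = $_.$name.Trim()
    }
    $_
}

$rows | ConvertTo-Csv -NoTypeInformation
```

With a real file, Import-Csv and Export-Csv would take the place of ConvertFrom-Csv and ConvertTo-Csv, as in the answer above.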
If your CSV does not have a header (which means it's not a true CSV) and/or you want to better preserve the original file structure and formatting, you could try to expand on your regex a little.
(Get-Content c:\temp\test.txt -Raw) -replace "(?<=')\s+(?=[^' ])" -replace "(?<=[^' ])\s+(?=')"
That should remove all leading and trailing spaces inside the quoted values, but not the delimiters themselves.
Read the file in as one string. This could be a bad idea depending on file size, and it is not required: the file can still be read line by line with the same transformation, achieving the same result.
Use two replacements that are similar. The first looks for spaces that come after a single quote and are not followed by another quote or space. The second looks for spaces before a quote that are not preceded by a quote or space.
Just wanted to give a regex example. You can look into this with more detail and explanation at regex101.com. There you will see an alternation pattern instead of two separate replacements.
(Get-Content c:\temp\test.txt -Raw) -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
The first example is a little easier on the eyes.
I was having issues consistently replicating this but if you are having issues with it replacing newlines as well then you can just do the replacement one line at a time and that should work as well.
(Get-Content c:\temp\test.txt) | Foreach-Object{
$_ -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
} | Set-Content c:\temp\test.txt
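To see what the alternation pattern does without touching a file, it can be applied to a single string. The line below is a shortened, made-up variant of the sample data (with an extra trailing space inside the last value), so treat this as a sketch rather than a test against the real file.

```powershell
# Made-up sample: a leading space inside the 2nd value, a trailing space inside the last.
$line = "'192.168.1.2' ' test-1-TEST' 'Ping Down at least 1 min' 'critical '"

# Left alternative: whitespace right after a quote and before a non-quote, non-space char.
# Right alternative: whitespace right after a non-quote, non-space char and before a quote.
$cleaned = $line -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"

$cleaned
# → '192.168.1.2' 'test-1-TEST' 'Ping Down at least 1 min' 'critical'
```

The spaces between the quoted values survive because each of them has a quote on both sides, which neither alternative accepts.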

search first "x" amount of characters in a line, and output entire line

I have a text file that could contain up to 1000 lines of data in the following format:
14410:3012669|EU14410|20/01/2017||||1|6|4|OUT FROM UNDER||22/02/2017 04:01:47|22/02/2017 21:19:52
14:3012670|EU016271751|20/01/2017||||2|6|4|BLOCK BET|\\acis-prod\Pictures\Entry\EU01627.jpg|22/02/2017 04:02:02|22/02/2017 21:19:52
301111:3012671|EU016275|20/01/2017||||2|6|4|VITAE MEDICAL CLINIC|\\tm-prod\Pictures\Entry\EU01.jpg|22/02/2017 04:02:11|22/02/2017 21:19:53
each line will start with the following format
"set of characters up to max of 8":"set of characters unlimited max"
I want to search the characters ONLY up until the first colon. Those characters could contain any amount up to a maximum of 8. (hopefully shown well in my examples above) I'm trying to search those first characters, up to the ":" of each line to see if it contains a string, and return the whole line. still new to powershell so I've only tried a simple select:
$path = "C:\Users\ME\Desktop\acsep22\acsnic-20170222_233324.done"
Get-ChildItem $path -recurse | Select-String -pattern ("14410","3011981","3011982") | out-file $logfile |format-list
which works - but I didn't take into account that the string could also appear twice in the same line ( though unrelated to the first 7 characters)
for example:
14410:3012669|EU14410|
contains 14410 twice, they're unrelated in terms of their significance and I only want to search and return based on the first number
could somebody help me achieve this or could some one point me toward the cmdlet that would help?
I've tried various searches online (and via the Microsoft online resource) but a lot of results are more to do with "return the first X amount of characters" rather than "search using only the first X amount and return line"
Kind Regards
You could use a simple Where-Object filter to check whether the string before the : is one of the strings you expect:
$strings = '14410','3011981','3011982'
Get-Content $path |Where-Object {$strings -contains ($_ -split ':')[0]}
This is probably the most PowerShell-idiomatic approach.
If you want to use Select-String, you'll have to construct a regex pattern that will match on strings that start with one of the strings and then a colon:
$strings = '14410','3011981','3011982'
$pattern = '^(?:{0}):' -f ($strings -join '|') # pattern is now '^(?:14410|3011981|3011982):'
Select-String -Path $path -Pattern $pattern
If you just want the bare string itself from the output, grab the Line property from the objects returned by Select-String:
Select-String -Path $path -Pattern $pattern |Select-Object -Expand Line
or
Select-String -Path $path -Pattern $pattern |ForEach-Object Line
The pattern above uses a non-capturing group (?:pattern-goes-here) to match any one of the strings, at the start ^ of a string, followed by :.
Both solutions will work with an arbitrary number of strings.
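Both approaches can be compared side by side on in-memory data. The three lines below are shortened, partly made-up variants of the sample records; the second one deliberately contains 14410 only after the colon.

```powershell
# Shortened sample records (hypothetical); only the first has 14410 BEFORE the colon.
$lines = '14410:3012669|EU14410|20/01/2017',
    '14:3012670|EU14410|20/01/2017',
    '301111:3012671|EU016275|20/01/2017'

$strings = '14410', '3011981'
$pattern = '^(?:{0}):' -f ($strings -join '|')   # ^(?:14410|3011981):

# Where-Object approach: compare the text before the first colon.
$byFilter = $lines | Where-Object { $strings -contains ($_ -split ':')[0] }

# Select-String approach: anchored regex, then unwrap the Line property.
$byRegex = $lines | Select-String -Pattern $pattern | ForEach-Object Line

$byFilter
$byRegex
```

In this sketch both variables end up holding only the first record.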

Pipe to Out-GridView and Out-File got different rows?

The following command returns only one row (the parameter -Context 10 is ignored.)
select-string -path file.txt -pattern "..." -Context 10 | Out-GridView
However, the following command creates a file with all the lines.
select-string -path file.txt -pattern "..." -Context 10 | Out-File file2
Why is there a difference?
This is because Out-Gridview consumes the entire MatchInfo object that Select-String outputs, and displays all of the properties of that object as columns. Out-File on the other hand basically performs the ToString() method on everything before it outputs it to a file, and for that kind of object when it converts to a string it outputs the line, and the context lines as well. If you want Out-GridView to do that you will have to pipe to Out-String and then to Out-Gridview.

Count specific string in text file using PowerShell

Is it possible to count specific strings in a file and save the value in a variable?
For me it would be the String "/export" (without quotes).
Here's one method:
$FileContent = Get-Content "YourFile.txt"
# Avoid naming the variable $Matches: that is an automatic variable set by the -match operator.
$result = Select-String -InputObject $FileContent -Pattern "/export" -AllMatches
$result.Matches.Count
Here's a way to do it.
$count = @(Get-Content file1.txt | Select-String -Pattern "/export").Count
As mentioned in comments, this will return the count of lines containing the pattern, so if any line has more than one instance of the pattern, the count won't be correct.
If you're searching in a large file (several gigabytes) that could have millions of matches, you might run into memory problems. You can do something like this (inspired by a suggestion from NealWalters):
Select-String -Path YourFile.txt -Pattern '/export' -SimpleMatch | Measure-Object -Line
This is not perfect because
it counts the number of lines that contain the match, not the total number of matches.
it prints some headings along with the count, rather than putting just the count into a variable.
You can probably solve these if you need to. But at least you won't run out of memory.
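One possible way to address both points while still streaming the file is sketched below. The temp file and its three lines are made up for the illustration. -SimpleMatch is left out here on the assumption that the per-line .Matches collection is needed, so a pattern containing regex metacharacters would have to be passed through [regex]::Escape() first.

```powershell
# Hypothetical sample file: 3 occurrences of /export spread over 2 of 3 lines.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'count-demo.txt'
Set-Content -LiteralPath $file -Value '/export a /export b', 'nothing here', 'one /export'

# Number of LINES that contain the pattern.
$lineCount = @(Select-String -LiteralPath $file -Pattern '/export').Count

# Number of OCCURRENCES: -AllMatches fills .Matches per line; the counts are summed as they stream by.
$total = (Select-String -LiteralPath $file -Pattern '/export' -AllMatches |
    ForEach-Object { $_.Matches.Count } |
    Measure-Object -Sum).Sum

$lineCount   # 2
$total       # 3

Remove-Item -LiteralPath $file
```

Reading .Sum from the Measure-Object result puts just the number into the variable, without the headings.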
grep -co vs grep -c
Both are useful and thanks for the "o" version. New one to me.