How to trim blank spaces from PowerShell output? - powershell

I am using a PowerShell script to find all occurrences of a regular expression and output them to a file. I have two objectives for this question.
Remove leading white space from a column's value
Specify a width for an extra field (LineNumbers)
This is what I have so far:
gci -recurse -include *.* | Select-String -pattern $regexPattern |`
Format-Table -GroupBy Name -Property Path, Line -AutoSize
This outputs the following:
Path Line
---- ----
C:\myRandomFile.txt This is line 1 and it is random text.
C:\myRandomFile.txt This is line 2 and it has leading white space.
C:\myNextRandomFile.txt This is line 3 and it has leading white space.
C:\myLastRandomFile.txt This is line 4.
This is because the files have leading white spaces (actually indents/tab spaces, but outputted as white spaces). I can't change the original files and remove the leading white space as they are our production files/SQL scripts.
I want to trim the leading white space for the Line column so that the output looks like this:
Path Line
---- ----
C:\myRandomFile.txt This is line 1 and it is random text.
C:\myRandomFile.txt This is line 2 and it has no leading white space.
C:\myNextRandomFile.txt This is line 3 and it has no leading white space.
C:\myLastRandomFile.txt This is line 4 and this is how it should look.
And, if I add the LineNumbers column by using
-property LineNumbers
then the LineNumbers column takes up about half the space in the row. Can I specify the width of the LineNumbers column? I've tried the -AutoSize flag, but this doesn't seem to work well. I've tried
LineNumber;width=50
LineNumber width=50
LineNumber -width 50
and all variations of this, but I get errors along the lines of "Format-Table: A parameter cannot be found that matches parameter name width=50"

I can't test it right now, but I think this should do the trick or at least get you going in the right direction:
gci -recurse -include *.* | Select-String -pattern $regexPattern |`
Format-Table Path, @{Name='Line'; Expression={$_.Line -replace '^\s+', ''}; Width=50}

You can use the TrimStart() method to remove leading spaces. There's also TrimEnd() to remove characters from the end, or Trim() to remove characters from both sides of the string.
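As a quick illustration of the three methods, here is a minimal sketch; the sample string is made up for this example and is not taken from the question's files:

```powershell
# Hypothetical sample: a tab and spaces before the text, spaces after it.
$line = "`t   This is line 2.   "

$line.TrimStart()   # removes leading whitespace only
$line.TrimEnd()     # removes trailing whitespace only
$line.Trim()        # removes whitespace on both sides
```

Called without arguments, all three methods strip any whitespace character, so tabs are removed as well as spaces.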

I would not use Format-Table for output to a file.
I'd rather use Export-Csv
gci -recurse -include *.* | Select-String -pattern $regexPattern |`
select-object linenumber, path, line | Export-Csv c:\mycsv.csv -Delimiter "`t"
If you still want to use Format-Table I recommend reading this article
http://www.computerperformance.co.uk/powershell/powershell_-f_format.htm
Quote:
"{0,28} {1, 20} {2,-8}" -f ` creates:
A column for the 1st item of 28 characters, right-aligned, and adds a space.
A column for the 2nd item of 20 characters, right-aligned, and adds a space.
A column for the 3rd item of 8 characters, left-aligned.
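To make the alignment rules in the quote concrete, here is a small sketch of the -f operator. The values and column widths are arbitrary and chosen only for illustration:

```powershell
# Hypothetical values; the widths (6 and 12) are arbitrary.
# A positive width right-aligns the item, a negative width left-aligns it.
$row = '{0,6} {1,-12}|' -f 42, 'abc'
$row
```

The trailing '|' is only there to make the padding after the left-aligned item visible.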

In case someone from this decade comes looking here, there is an alternative using Trim():
gci -recurse -include *.* | Select-String -pattern $regexPattern |`
Format-Table Path, @{Name='Line'; Expression={$_.Line.Trim()}; Width=50}
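The same calculated-property syntax can probably cover both objectives of the question at once: a fixed width for the line number and a trimmed Line column. The following is a sketch only; it pipes made-up strings into Select-String instead of real files, and the width of 10 is an arbitrary choice:

```powershell
# Made-up input lines standing in for file content.
$sample = 'first line', "`t    indented line"

$table = $sample |
    Select-String -Pattern 'line' |
    Format-Table @{ Name = 'LineNumber'; Expression = { $_.LineNumber }; Width = 10 },
                 @{ Name = 'Line'; Expression = { $_.Line.TrimStart() } } |
    Out-String

$table
```

Width is a key of the calculated-property hashtable, not a parameter of Format-Table, which is why attempts such as -width 50 produce a "parameter cannot be found" error.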

Related

Powershell, how to capture argument(s) of Select-String and include with matched output

Thanks to @mklement0 for the help with getting this far with the answer given in Powershell search directory for code files with text matching input a txt file.
The below Powershell works well for finding the occurrences of a long list of database field names in a source code folder.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile) |
Select-Object Path, LineNumber, line |
Export-csv $outputfile
However, many lines of source code have multiple matches, especially ADO.NET SQL statements with a lot of field names on one line. If the field name argument was included with the matching output the results will be more directly useful with less additional massaging such as lining up everything with the original field name list. For example if there is a source line "BatchId = NewId" it will match field name list item "BatchId". Is there an easy way to include in the output both "BatchId" and "BatchId = NewId"?
Played with the matches object but it doesn't seem to have the information. Also tried Pipeline variable like here but X is null.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile -PipelineVariable x) |
Select-Object $x, Path, LineNumber, line |
Export-csv $outputFile
Thanks.
The Microsoft.PowerShell.Commands.MatchInfo instances that Select-String outputs have a Pattern property that reflects the specific pattern among the (potential) array of patterns passed to -Pattern that matched on a given line.
The caveat is that if multiple patterns match, .Pattern only reports the pattern among those that matched that is listed first among them in the -Pattern argument.
Here's a simple example, using an array of strings to simulate lines from files as input:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -Pattern ('bar', 'foo') |
Select-Object Line, LineNumber, Pattern
The above yields:
Line LineNumber Pattern
---- ---------- -------
A fool and 1 foo
his barn 2 bar
foo and bar on the same line 4 bar
Note how 'bar' is listed as the Pattern value for the last line, even though 'foo' appeared first in the input line, because 'bar' comes before 'foo' in the pattern array.
To reflect the actual pattern that appears first on the input line in a Pattern property, more work is needed:
Formulate your array of patterns as a single regex using alternation (|), wrapped as a whole in a capture group ((...)) - e.g., '(bar|foo)'
Note: The expression used below, '({0})' -f ('bar', 'foo' -join '|'), constructs this regex dynamically, from an array (the array literal 'bar', 'foo' here, but you can substitute any array variable or even (Get-Content $inputFile)); if you want to treat the input patterns as literals and they happen to contain regex metacharacters (such as .), you'll need to escape them with [regex]::Escape() first.
Use a calculated property to define a custom Pattern property that reports the capture group's value, which is the first among the values encountered on each input line:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches[0].Groups[1].Value } }
This yields (abbreviated to show only the last match):
Line LineNumber Pattern
---- ---------- -------
...
foo and bar on the same line 4 foo
Now, 'foo' is properly reported as the matching pattern.
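The note above about [regex]::Escape() can be sketched as follows. The field names are hypothetical and were picked only because they contain the metacharacter '.':

```powershell
# Hypothetical field names containing the regex metacharacter '.'
$names = 'dbo.Batch', 'Order.Id'

# Escape each name so it is matched literally, then join the names with '|'
# inside a single capture group.
$regex = '({0})' -f (($names | ForEach-Object { [regex]::Escape($_) }) -join '|')

$hits = 'x = dboXBatch', 'y = dbo.Batch' |
    Select-String -Pattern $regex |
    Select-Object Line, @{ n = 'Pattern'; e = { $_.Matches[0].Groups[1].Value } }

$hits
```

Without the escaping step, 'dbo.Batch' would also match 'dboXBatch', because an unescaped '.' matches any character.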
To report all patterns found on each line:
Switch -AllMatches is required to tell Select-String to find all matches on each line, represented in the .Matches collection of the MatchInfo output objects.
The .Matches collection must then be enumerated (via the .ForEach() collection method) to extract the capture-group value from each match.
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches.ForEach({ $_.Groups[1].Value }) } }
This yields (abbreviated to show only the last match):
Line LineNumber Pattern
---- ---------- -------
...
foo and bar on the same line 4 {foo, bar}
Note how both 'foo' and 'bar' are now reported in Pattern, in the order encountered on the line.
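If the results are headed for a CSV file, an array-valued Pattern column is awkward to work with. One possible variation, not part of the original answer, is to emit one output object per match instead. A sketch using the same sample lines:

```powershell
$lines = 'A fool and', 'his barn', 'are soon parted.', 'foo and bar on the same line'

$rows = $lines |
    Select-String -AllMatches -Pattern '(bar|foo)' |
    ForEach-Object {
        $matchInfo = $_
        # Emit one row per match found on the line.
        foreach ($m in $matchInfo.Matches) {
            [pscustomobject]@{
                Pattern    = $m.Groups[1].Value
                LineNumber = $matchInfo.LineNumber
                Line       = $matchInfo.Line
            }
        }
    }

$rows
```

A line with two matches then appears twice, once per pattern, which makes it easier to line the results up against the original field-name list.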
The solid information and examples from @mklement0 were enough to point me in the right direction for researching and understanding more about Powershell and the object pipeline and calculated properties.
I was finally able to achieve my goal of cross-referencing a list of table and field names to the C# code base. The input file is simply table and field names, pipe delimited. (One of the glitches I had was not using the pipe in the split; it was a visual error that took a while to finally see, so check for that.) The output is the table name, field name, code file name, line number and actual line. It's not perfect but much better than manual effort for a few hundred fields! And now there are possibilities for further automation in the data mapping and conversion project. I thought about writing a C# utility program, but that might have taken just as long to figure out and implement and would have been much more cumbersome than a working Powershell script.
The key for me at this point is "working"! My first deeper dive into the abstruse world of Powershell. The key points of my solution are the use of the calculated property to get the table and field names in the output, realization that expressions can be used in certain places like to build a Pattern and that the pipeline is passing only certain specific objects after each command (maybe that is too restricted of a view but it's better than what I had before).
Hope this helps someone in future. I could not find any examples close enough to get over the hump and so asked my first ever stackoverflow questions.
$inputFile = "C:\input.txt"
$outputFile = "C:\output.csv"
$results = Get-Content $inputfile
foreach ($i in $results) {
Get-ChildItem -Path "C:\ProjectFolder" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
Select-String -Pattern $i.Split('|')[1] |
Select-Object @{ n='Pattern'; e={ $i.Split('|')[0], $i.Split('|')[1] -join '|'} }, Filename, LineNumber, line |
Export-Csv $outputFile -Append
}

How to Select-String multiline?

I am trying to Select-String on a text that is on multiple lines.
Example:
"This is line1
<Test>Testing Line 2</Test>
<Test>Testing Line 3</Test>"
I want to be able to select all 3 lines in Select-String. However, it is only selecting the first line when I do Select-String -Pattern "Line1". How can I extract all 3 lines together?
Select-String -Pattern "Line1"
Select-String operates on its input objects individually, and if you either pass a file path directly to it (via -Path or -LiteralPath) or you pipe output from Get-Content, matching is performed on each line.
Therefore, pass your input as a single, multiline string, which, if it comes from a file, is most easily achieved with Get-Content -Raw (PSv3+):
Get-Content -Raw file.txt | Select-String -Pattern Line1
Note that this means that if the pattern matches, the file's entire content is output, accessible via the output object's .Line property.
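If only part of the content is wanted rather than the whole file, the regex itself can span line breaks once the input is a single string. A sketch under assumptions: the string below stands in for the output of Get-Content -Raw, and the final "Unrelated line" is made up to show where the match stops:

```powershell
# Stand-in for (Get-Content -Raw file.txt); the last line is invented.
$text = "This is line1`n<Test>Testing Line 2</Test>`n<Test>Testing Line 3</Test>`nUnrelated line"

# Match the line containing 'line1' plus the two lines that follow it.
# '.' does not match a newline by default, so each '.*' stays on one line.
$m = $text | Select-String -Pattern '.*line1.*(?:\r?\n.*){2}'
$block = $m.Matches[0].Value

$block
```

The matched text is available through .Matches[0].Value, while .Line still holds the entire input string.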
By contrast, if you want to retain per-line matching but also capture a fixed number of surrounding lines, use the -Context parameter.
Get-Content file.txt | Select-String -Pattern Line1 -Context 0, 2
Ansgar Wiechers' helpful answer shows how to extract all the lines from the result.
Select-String lets you select a given number of lines before or after the matching line via the parameter -Context. -Context 2,0 selects the preceding 2 lines, -Context 0,2 selects the subsequent 2 lines, -Context 2,2 selects the 2 lines before as well as the 2 lines after the match.
You won't get match and context lines in one big lump, though, so you need to combine matched line and context if you want them as a single string:
Select-String -Pattern 'Line1' -Context 0,2 | ForEach-Object {
$($_.Line; $_.Context.PostContext) | Out-String
}
As @mklement0 correctly pointed out in the comments, the above is comparatively slow, which isn't a problem if you're only processing a few matches, but becomes an issue if you need to process hundreds or thousands of matches. To improve performance you can merge the values into a single array and use the -join operator:
Select-String -Pattern 'Line1' -Context 0,2 | ForEach-Object {
(,$_.Line + $_.Context.PostContext) -join [Environment]::NewLine
}
Note that the two code snippets don't produce the exact same result, because Out-String appends a newline to each line including the last one, whereas -join only puts newlines between lines (not at the end of the last one). Each snippet can be modified to produce the same result as the other, though. Trim the strings from the first example to remove trailing newlines, or append another newline to the strings from the second one.
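The difference between the two approaches can be seen in isolation with a minimal sketch; the three strings are placeholders for a matched line and its context lines:

```powershell
# Placeholder strings standing in for a match plus two context lines.
$parts = 'a', 'b', 'c'
$nl = [Environment]::NewLine

$viaOutString = $parts | Out-String   # newline after every line, including the last
$viaJoin      = $parts -join $nl      # newlines only between lines

$viaOutString.EndsWith($nl)             # trailing newline present
$viaJoin.EndsWith($nl)                  # no trailing newline
$viaOutString.TrimEnd() -eq $viaJoin    # trimming makes the two results equal
```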
If you want the output as individual lines just output the Line and PostContext properties without merging them into one string:
Select-String -Pattern 'Line1' -Context 0,2 | ForEach-Object {
$_.Line
$_.Context.PostContext
}

Remove Ellipse from Table Output

I am trying to output the results of the PowerShell cmdlet Compare-Object to a text file. The problem is that I cannot eliminate the ellipsis truncation.
The code below provides a table formatting definition variable which specifies a width of 1000 for the Path column. Yet the output file always truncates the Path column at 122 characters.
The Compare-Object cmdlet is comparing two ArrayLists which are just lists of file path strings from common folder paths between two servers.
What I am attempting to do is put the SideIndicator as the first column and the full path in the second. I do not want truncating of the file path.
$tableFormat = @{Expression={$_.SideIndicator};Label="Side Indicator";width=15}, @{Expression={$_.InputObject};Label="Path";width=1000}
$outputFilename = ($server1 + "_" + $server2 + "_FileCompare" + ".txt");
Compare-Object $Hive1FileArray $Hive2FileArray -IncludeEqual | Format-Table $tableFormat | Out-String | Out-File $outputFilename
I also tried removing Out-String from the pipeline, but it makes no difference.
What is going wrong here?
Thanks
Compare-Object $Hive1FileArray $Hive2FileArray -IncludeEqual |`
Format-Table $tableFormat -AutoSize |`
Out-String -Width 1000 |`
Out-File $outputFilename
Read
Get-Help 'Format-Table' -ShowWindow or its Online Version:
-AutoSize
Adjusts the column size and number of columns based on the width of
the data. By default, the column size and number are determined by the
view.
Get-Help 'Out-String' -ShowWindow or its Online Version:
-Width <Int32>
Specifies the number of characters in each line of output. Any
additional characters are truncated, not wrapped. If you omit this
parameter, the width is determined by the characteristics of the host
program. The default value for the Windows PowerShell console is 80
(characters).
Not much more to say not knowing Compare-Object cmdlet input objects…
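The effect of Out-String -Width can be checked without the original input. In the sketch below, a single made-up object with a 300+ character path stands in for the Compare-Object results:

```powershell
# Made-up object standing in for one Compare-Object result.
$obj = [pscustomobject]@{
    SideIndicator = '=='
    Path          = 'C:\' + ('x' * 300)
}

$narrow = $obj | Format-Table -AutoSize | Out-String -Width 80
$wide   = $obj | Format-Table -AutoSize | Out-String -Width 1000

$narrow.Contains($obj.Path)   # is the full path present at width 80?
$wide.Contains($obj.Path)     # is the full path present at width 1000?
```

Only the wider rendering has room for the whole path, which suggests that the truncation comes from the overall line width rather than from the column definition.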
I know this is a year old but another useful parameter of Format-Table is -wrap.
-Wrap [<SwitchParameter>]
Indicates that the cmdlet displays text that exceeds the column width on the next line. By default, text that exceeds the column width is truncated.
Required? false
Position? named
Default value False
Accept pipeline input? False
Accept wildcard characters? false

Whitespace and truncation with ellipsis on Select-Object

I'm trying to figure out why Select-Object
adds a lot of whitespace at the start of its output; and
truncates long properties with ellipsis.
Here's a repro of what I mean. Suppose you run these commands on C:\:
New-Item "MyTest" -Type Directory
cd MyTest
"Some very long lorem ipsum like text going into a certain file, bla bla bla and some more bla." | Out-File test.txt
Get-ChildItem | Select-String "text" | Select-Object LineNumber,Line
This will show output like this:
The ellipsis I can understand, that would be just the way the command ends up getting formatted when the result is written to the console host. However, the whitespace at the start still confuses me in this case.
Things get weirder for me though when I pipe the result to either clip or Out-File output.txt. I get similarly formatted output, with a lot of whitespace at the start and truncated Line properties.
Which command is causing this behavior, and how can I properly solve this? Most importantly: how can I get the full results into a file or onto my clipboard?
The default behavior of outputting the data is to use Format-Table without any modifiers, and the default behavior of Format-Table is to split the viewport into columns of equal width. This makes no assumption on the output width, and is faster in that the cmdlet doesn't need to process any string data from the pipeline prior to output.
To reduce the whitespace, you should use Format-Table -AutoSize as the output method. The -AutoSize switch first measures the widths of data, then outputs with regard to calculated width. If you need to not receive ellipsis and always display the full data set, add -Wrap switch to Format-Table, this way the value will be wrapped into more than a single line, but you can copy it via selecting a square area in Powershell window, just strip newlines off the clipped contents.
Get-ChildItem | Select-String "text" | Select-Object LineNumber,Line | Format-Table -AutoSize -Wrap
I'd say the best way to get the full output into a file would be to export the result as a CSV:
Get-ChildItem |
Select-String "text" |
Select-Object LineNumber,Line |
Export-Csv 'out.csv'
You could also build a string from the selected properties, which might be better for copying the data to the clipboard:
Get-ChildItem |
Select-String "text" |
ForEach-Object { '{0}:{1}' -f $_.LineNumber, $_.Line } |
Tee-Object 'out.txt' | clip
The behavior you observed is caused by the way PowerShell displays output. Basically, it looks at the first object and counts the properties. Objects with fewer than 5 properties are sent to Format-Table, otherwise to Format-List. The columns of tabular output are spread evenly across the available space. As @Vesper already mentioned you can enforce proportional column width by using the -AutoSize parameter, and wrapping of long lines by using the -Wrap parameter. Format-List wraps long strings by default.
See this blog post from Jeffrey Snover for more information.
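The property-count rule described above can be observed with two throwaway objects. This is a sketch only; the property names are arbitrary, and it applies to objects that have no predefined format view:

```powershell
# Two throwaway objects: one with four properties, one with five.
$four = [pscustomobject]@{ A = 1; B = 2; C = 3; D = 4 }
$five = [pscustomobject]@{ A = 1; B = 2; C = 3; D = 4; E = 5 }

$fourText = $four | Out-String   # rendered as a table (one header row)
$fiveText = $five | Out-String   # rendered as a list (one "Name : Value" line per property)

$fourText
$fiveText
```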

Batch or Powershell to find lines equal to value and remove ones that are not

I am attempting to automate the manual validation of a file that I get daily. Currently the file is supposed to have 42 characters in each line, mixed characters. But randomly the file arrives missing a space or with an invalid data length in a field. I am lost on how to check each line's length, and then remove the invalid lines from the master file and insert them into their own output file. I have made some headway with line length validation.
Get-Content dailyfile.txt | ForEach-Object { $_ | Measure-Object -Character } >> output.txt
But I can't wrap my head around how to use the output to find the specific line that doesn't equal 42. I may be asking for more than a mouthful, but I can't even see light at the end of the tunnel on this one.
So something like this then.
Get-Content dailyfile.txt | Where-Object{$_.Length -lt 42} | Set-Content output.txt
Get-Content returns an array of strings. We use Where-Object to pass through only the lines in the text file whose length is less than 42. If a line could also be longer than 42 characters, -ne would work as well.
Mostly because I could not resist I wanted to help you with the code you had in your OP. While it is inefficient and longer this is what you could have done to complete your original code.
$TheAnswertotheUltimateQuestionofLifeTheUniverseandEverything = 42
Get-Content C:\temp\data.log | Where-Object{($_ | Measure-Object -Character | Select-Object -ExpandProperty Characters) -lt $TheAnswertotheUltimateQuestionofLifeTheUniverseandEverything} | Set-Content output.txt
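The question also asks for the invalid lines to be removed from the master file and written to their own file. One way this might be done is with the .Where() collection method in 'Split' mode (PowerShell 4 and later), which returns the matching and non-matching items separately. The folder, file names and sample data below are hypothetical and exist only to keep the sketch self-contained:

```powershell
# Hypothetical locations; a temp folder keeps the sketch self-contained.
$dir = Join-Path ([IO.Path]::GetTempPath()) 'linecheck-demo'
New-Item $dir -ItemType Directory -Force | Out-Null
$master  = Join-Path $dir 'dailyfile.txt'
$rejects = Join-Path $dir 'invalid.txt'

# Sample data: two valid 42-character lines and one 41-character line.
('A' * 42), ('B' * 41), ('C' * 42) | Set-Content $master

# Read everything first so that the master file can be rewritten afterwards.
$lines = @(Get-Content $master)
$valid, $invalid = $lines.Where({ $_.Length -eq 42 }, 'Split')

$invalid | Set-Content $rejects   # lines that failed the length check
$valid   | Set-Content $master    # master now holds only the valid lines
```

Reading the whole file into a variable before writing avoids reading and writing the same file within one pipeline.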