I need help formatting output with PowerShell's Out-File cmdlet - powershell

I have a series of documents that are going through the following function designed to count word occurrences in each document. This function works fine outputting to the console, but now I want to generate a text file containting the information, but with the file name appended to each word in the list.
My current console output is:
"processing document1 with x unique words occuring as follows"
"word1 12"
"word2 8"
"word3 3"
"word4 4"
"word5 1"
I want a delimited file in this format:
document1;word1;12
document1;word2;8
document1;word3;3
document1;word4;4
document1;word1;1
document2;word1;16
document2;word2;11
document2;word3;9
document2;word4;9
document2;word1;13
While the function below gets me the lists of words and occurences, I'm having a hard time figuring out where or how to insert the filename variable so that it prints at the head of each line. MSDN has been less-than helpful, and most of the places I try to insert the variable result in errors (see below)
function Count-Words ($docs) {
$document = get-content $docs
$document = [string]::join(" ", $document)
$words = $document.split(" `t",[stringsplitoptions]::RemoveEmptyEntries)
$uniq = $words | sort -uniq
$words | % {$wordhash=#{}} {$wordhash[$_] += 1}
Write-Host $docs "contains" $wordhash.psbase.keys.count "unique words distributed as follows."
$frequency = $wordhash.psbase.keys | sort {$wordhash[$_]}
-1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File c:\out-file-test.txt -append
$grouped = $words | group | sort count
Do I need to create a string to pass to the out-file cmdlet? is this just something I've been putting in the wrong place on the last few tries? I'd like to understand WHY it's going in a particular place as well. Right now I'm just guessing, because I know I have no idea where to put the out-file to achieve my selected results.
I've tried formatting my command per powershell help, using -$docs and -FilePath, but each time I add anything to the out-file above that runs successfully, I get the following error:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument "c:\out-file-test.txt" does not bel
ong to the set "unicode,utf7,utf8,utf32,ascii,bigendianunicode,default,oem" specified by the ValidateSet attribute. Sup
ply an argument that is in the set and then try the command again.
At C:\c.ps1:39 char:71
+ -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File <<<< -$docs -width 1024 c:\users\x46332\co
unt-test.txt -append
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand

I rewrote most of your code. You should utilize objects to make it easier formatting the way you want. This one splits on "space" and groups words together. Try this:
Function Count-Words ($paths) {
$output = #()
foreach ($path in $paths) {
$file = Get-ChildItem $path
((Get-Content $file) -join " ").Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries) | Group-Object | Select-Object -Property #{n="FileName";e={$file.BaseName}}, Name, Count | % {
$output += "$($_.FileName);$($_.Name);$($_.Count)"
}
}
$output | Out-File test-out2.txt -Append
}
$filepaths = ".\test.txt", ".\test2.txt"
Count-Words -paths $filepaths
It outputs like you asked(document;word;count). If you want documentname to include extension, change $file.BaseName to $file.Name . Testoutput:
test;11;1
test;9;2
test;13;1
test2;word11;5
test2;word1;4
test2;12;1
test2;word2;2

Slightly different approach:
function Get-WordCounts ($doc)
{
$text_ = [IO.File]::ReadAllText($doc.fullname)
$WordHash = #{}
$text_ -split '\b' -match '\w+'|
foreach {$WordHash[$_]++}
$WordHash.GetEnumerator() |
foreach {
New-Object PSObject -Property #{
Word = $_.Key
Count = $_.Value
}
}
}
$docs = gci c:\testfiles\*.txt |
sort name
&{
foreach ($doc in dir $docs)
{
Get-WordCounts $doc |
sort Count -Descending |
foreach {
(&{$doc.Name;$_.Word;$_.Count}) -join ';'
}
}
} | out-file c:\somedir\wordcounts.txt

Try this:
$docs = #("document1", "document2", ...)
$docs | % {
$doc = $_
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select #{n="Document";e={$doc}}, Name, Count
} | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
If you want to make this into a function you could do it like this:
function Count-Words($docs) {
foreach ($doc in $docs) {
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select #{n="Document";e={$doc}}, Name, Count
}
}
$files = #("document1", "document2", ...)
Count-Words $files | Export-CSV output.csv -Delimiter ";" -NoTypeInfo

Related

Count number of comments over multiple files, including multi-line comments

I'm trying to write a script that counts all comments in multiple files, including both single line (//) and multi-line (/* */) comments and prints out the total. So, the following file would return 4
// Foo
var text = "hello world";
/*
Bar
*/
alert(text);
There's a requirement to include specific file types and exclude certain file types and folders, which I already have working in my code.
My current code is:
( gci -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse `
| ? { $_.FullName -inotmatch '\\obj' } `
| ? { $_.FullName -inotmatch '\\packages' } `
| ? { $_.FullName -inotmatch '\\release' } `
| ? { $_.FullName -inotmatch '\\debug' } `
| ? { $_.FullName -inotmatch '\\plugin-.*' } `
| select-string "^\s*//" `
).Count
How do I change this to get multi-line comments as well?
UPDATE: My final solution (slightly more robust than what I was asking for) is as follows:
$CodeFiles = Get-ChildItem -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse |
Where-Object { $_.FullName -notmatch '\\(obj|packages|release|debug|plugin-.*)\\' }
$TotalFiles = $CodeFiles.Count
$IndividualResults = #()
$CommentLines = ($CodeFiles | ForEach-Object{
#Get the comments via regex
$Comments = ([regex]::matches(
[IO.File]::ReadAllText($_.FullName),
'(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'
).Value -split '\r?\n') | Where-Object { $_.length -gt 0 }
#Get the total lines
$Total = ($_ | select-string .).Count
#Add to the results table
$IndividualResults += #{
File = $_.FullName | Resolve-Path -Relative;
Comments = $Comments.Count;
Code = ($Total - $Comments.Count)
Total = $Total
}
Write-Output $Comments
}).Count
$TotalLines = ($CodeFiles | select-string .).Count
$TotalResults = New-Object PSObject -Property #{
Files = $TotalFiles
Code = $TotalLines - $CommentLines
Comments = $CommentLines
Total = $TotalLines
}
Write-Output (Get-Location)
Write-Output $IndividualResults | % { new-object PSObject -Property $_} | Format-Table File,Code,Comments,Total
Write-Output $TotalResults | Format-Table Files,Code,Comments,Total
To be clear: Using string matching / regular expressions is not a fully robust way to detect comments in JavaScript / C# code, because there can be false positives (e.g., var s = "/* hi */";); for robust parsing you'd need a language parser.
If that is not a concern, and it is sufficient to detect comments (that start) on their own line, optionally preceded by whitespace, here's a concise solution (PSv3+):
(Get-ChildItem -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse |
Where-Object { $_.FullName -notmatch '\\(obj|packages|release|debug|plugin-.*)' } |
ForEach-Object {
[regex]::matches(
[IO.File]::ReadAllText($_.FullName),
'(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'
).Value -split '\r?\n'
}
).Count
With the sample input, the ForEach-Object command yields 4.
Remove the ^[ \t]* part to match comments starting anywhere on a line.
The solution reads each input file as a single string with [IO.File]::ReadAllText() and then uses the [regex]::Matches() method to extract all (potentially line-spanning) comments.
Note: You could use Get-Content -Raw instead to read the file as a single string, but that is much slower, especially when processing multiple files.
The regex uses in-line options s and m ((?sm)) to respectively make . match newlines too and to make anchors ^ and $ match line-individually.
^[ \t]* matches any mix of spaces and tabs, if any, at the start of a line.
//[^\n]*$ matches a string that starts with // through the end of the line.
/[*].*?[*]/ matches a block comment across multiple lines; note the lazy quantifier, *?, which ensures that very next instance of the closing */ delimiter is matched.
The matched comments (.Value) are then split into individual lines (-split '\r?\n'), which are output.
The resulting lines across all files are then counted (.Count)
As for what you tried:
The fundamental problem with your approach is that Select-String with file-info object input (such as provided by Get-ChildItem) invariably processes the input files line by line.
While this could be remedied by calling Select-String inside a ForEach-Object script block in which you pass each file's content as a single string to Select-String, direct use of the underlying regex .NET types, as shown above, is more efficient.
An IMO better approach is to count net code lines by removing single/multi line comments.
For a start a script that handles single files and returns for your above sample.cs the result 5
((Get-Content sample.cs -raw) -replace "(?sm)^\s*\/\/.*?$" `
-replace "(?sm)\/\*.*?\*\/.*`n" | Measure-Object -Line).Lines
EDIT: without removing empty lines, build the difference from total lines
## Q:\Test\2018\10\31\SO_53092258.ps1
$Data = Get-ChildItem *.cs | ForEach-Object {
$Content = Get-Content $_.FullName -Raw
$TotalLines = (Measure-Object -Input $Content -Line).Lines
$CodeLines = ($Content -replace "(?sm)^\s*\/\/.*?$" `
-replace "(?sm)\/\*.*?\*\/.*`n" | Measure-Object -Line).Lines
$Comments = $TotalLines - $CodeLines
[PSCustomObject]#{
File = $_.FullName
Lines = $TotalLines
Comments= $Comments
}
}
$Data
"="*40
"TotalLines={0} TotalCommentLines={1}" -f (
$Data | Measure-Object -Property Lines,Comments -Sum).Sum
Sample output:
> Q:\Test\2018\10\31\SO_53092258.ps1
File Lines Comments
---- ----- --------
Q:\Test\2018\10\31\example.cs 10 5
Q:\Test\2018\10\31\sample.cs 9 4
============================================
TotalLines=19 TotalCommentLines=9

PowerShell - transpose results from a hashtable

I need to check the warranty of many servers, but the output returned by the module I found in https://www.powershellgallery.com/packages/HPWarranty/2.6.2 seems to be a hashtable and the first column contains what I want to be my rows.
the script below will return this where every 5 rows the fields repeat - output1.csv:
TYPE System.Management.Automation.PSCustomObject
"Component","Codecount"
"SerialNumber","CZ36092P5H"
"ProductNumber","727021-B21"
"OverallEntitlementStartDate","2016-03-04"
"OverallEntitlementEndDate","2019-04-02"
"ActiveEntitlement","true"
"SerialNumber","CZ36092P5K"
"ProductNumber","727021-B21"
"OverallEntitlementStartDate","2016-03-04"
"OverallEntitlementEndDate","2019-04-02"
"ActiveEntitlement","true"
How can I transpose the output so that SerialNumber, ProductNumber, OverallEntitlementStartDate, OverallEntitlementEndDate and ActiveEntitlement are the columns?
# variables
$dest_path = "C:\Scripts\HPE\HPWarranty"
$export_date = Get-Date -Format o | ForEach-Object {$_ -replace ":", "-"}
$myScriptName = $MyInvocation.MyCommand.Name
$transcriptPath = $dest_path + "\" + $myScriptName + "_transcript_" + $export_date + ".txt"
$csvPath = $dest_path + "\" + "hpe_list1.csv"
#Start transcript of script activities and set transcript location
start-transcript -append -path $transcriptPath | Out-Null
# import serials & part numbers to be processed
$csv_info = Import-Csv $csvPath
foreach ($line in $csv_info) {
$hash = (Get-HPEntWarrantyEntitlement -ProductNumber $line.ProductNumber -SerialNumber $line.SerialNumber)
&{$hash.getenumerator() |
ForEach-Object {new-object psobject -Property #{Component = $_.name;Codecount=$_.value}}
} | Export-Csv "C:\Scripts\HPE\HPWarranty\output1.csv" -Append
}
# Stop Transcript
Stop-Transcript | Out-Null
hpe_list1.csv that the script processes contains the details for two servers:
ProductNumber,SerialNumber
727021-B21,CZ36092P5H
727021-B21,CZ36092P5K
Cast the output hashtable to a [pscustomobject]:
$WarrantyInfo = foreach ($line in $csv_info) {
[pscustomobject](Get-HPEntWarrantyEntitlement -ProductNumber $line.ProductNumber -SerialNumber $line.SerialNumber)
}
$WarrantyInfo | Export-Csv "C:\Scripts\HPE\HPWarranty\output1.csv"

Combining like objects in an array

I am attempting to analyze a group of text files (MSFTP logs) and do counts of IP addresses that have submitted bad credentials. I think I have it worked out except I don't think that the array is passing to/from the function correctly. As a result, I get duplicate entries if the same IP appears in multiple log files. What am I doing wrong?
Function LogBadAttempt($FTPLog,$BadPassesArray)
{
$BadPassEx="PASS - 530"
Foreach($Line in $FTPLog)
{
if ($Line -match $BadPassEx)
{
$IP=($Line.Split(' '))[1]
if($BadPassesArray.IP -contains $IP)
{
$CurrentIP=$BadPassesArray | Where-Object {$_.IP -like $IP}
[int]$CurrentCount=$CurrentIP.Count
$CurrentCount++
$CurrentIP.Count=$CurrentCount
}else{
$info=#{"IP"=$IP;"Count"='1'}
$BadPass=New-Object -TypeName PSObject -Property $info
$BadPassesArray += $BadPass
}
}
}
return $BadPassesArray
}
$BadPassesArray=#()
$FTPLogs = Get-Childitem \\ftpserver\MSFTPSVC1\test
$Result = ForEach ($LogFile in $FTPLogs)
{
$FTPLog=Get-Content ($LogFile.fullname)
LogBadAttempt $FTPLog
}
$Result | Export-csv C:\Temp\test.csv -NoTypeInformation
The result looks like...
Count IP
7 209.59.17.20
20 209.240.83.135
18441 209.59.17.20
13059 200.29.3.98
and would like it to combine the entries for 209.59.17.20
You're making this way too complicated. Process the files in a pipeline and use a hashtable to count the occurrences of each IP address:
$BadPasswords = #{}
Get-ChildItem '\\ftpserver\MSFTPSVC1\test' | Get-Content | ? {
$_ -like '*PASS - 530*'
} | % {
$ip = ($_ -split ' ')[1]
$BadPasswords[$ip]++
}
$BadPasswords.GetEnumerator() |
select #{n='IP';e={$_.Name}}, #{n='Count';e={$_.Value}} |
Export-Csv 'C:\Temp\test.csv' -NoType

How to convert text file containing double quotes to csv format using powershell

I have a text file(with header Actual_Output and saved it as actual.txt) containing data such as
Actual_Output
W
à
é
"
'
(
_
ç
²"
^
^
*
END
I want to convert it into csv file using powershell. I doing in this way
$DB = import-csv E:\actual.txt
$outarray = #()
foreach ($Data in $DB)
{
$First = $Data.Actual_Output
$outarray += New-Object PsObject -property #{
'Actual_Output' = $First
}
write-host "Actual_Output: " $First
write-Host ""
}
$outarray | export-csv 'E:\result.csv' -NoTypeInformation -Encoding utf8
I am getting the output like this as shown in screenshot
I want each data to be listed in seperate cell. Actually double quote " is creating problem here. Please help in resolving this. Sorry if i am unclear in describing the issue
Tested this, and it seems to work better:
Get-Content actual.txt | select -Skip 1 |
foreach {
New-Object PSObject -Property #{Actual_Output = $_}
} | export-csv result.csv -NoTypeInformation -Encoding UTF8
The file isn't well-formed as CSV initially, so Import-CSV isn't able to parse it correctly.

Remove New Line Character from CSV file's string column

I have a CSV File with a string column were that column spans to multiple lines. I want to aggregate those multiple lines into one line.
For example
1, "asdsdsdsds", "John"
2, "dfdhifdkinf
dfjdfgkdnjgknkdjgndkng
dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"
I want my output to be
1, "asdsdsdsds", "John"
2, "dfdhifdkinf dfjdfgkdnjgknkdjgndkng dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"
I want to achieve this output using PowerShell
Thanks.
Building on Ansgar's answer, here's how to do it when:
You don't know the column names
Your CSV file may contain CR or LF independently
(Import-Csv $csvInput) | % {
$line = $_
foreach ($prop in $line.PSObject.Properties) {
$line.($prop.Name) = ($prop.Value -replace '[\r\n]',' ')
}
$line
} | Export-Csv $csvOutput -NoTypeInformation
Try this:
$csv = 'C:\path\to\your.csv'
(Import-Csv $csv -Header 'ID','Value','Name') | % {
$_.Value = $_.Value -replace "`r`n",' '
$_
} | Export-Csv $csv -NoTypeInformation
If your CSV contains headers, remove -Header 'ID','Value','Name' from the import and replace Value with the actual column name.
If you don't want double quotes around the fields, you can remove them by replacing Export-Csv with something like this:
... | ConvertTo-Csv -NoTypeInformation | % { $_ -replace '"' } | Out-File $csv
To remove the header from the output you add another filter before Out-File to skip the first line:
... | select -Skip 1 | Out-File $csv
You can import the csv, do a specialized select, and write the result into a new CSV.
import-csv Before.csv -Header "ID","Change" | Select ID,#{Name="NoNewLines", Expression={$_.Change -replace "`n"," "}} | export-csv After.csv
The key part is in the select statement, which allows you to pass a specialized hash table (Name is the name of the property, Expression is a scriptblock that computes it).
You may need to fiddle with headers a bit to get the exact output you want.
The problems with Export-CSV are twofold:
Early versions (powershell1 & 2) do not allow you to append data to the CSV
If the data being piped to it contains newline characters, the data is useless in Excel
The solution to both of the above is to use Convertto-CSV instead. Here is a sample:
{bunch of stuff} | ConvertTo-CSV | %{$_ -replace "`n","<NL>"} | %{$_ -replace "`r","<CR>"} >>$AppendFile
Note that this allows you to do whatever editing on the data (in this case, replacing newline data), and using redirecrors to append.
FYI: I've created a CSV Cleaner: https://stackoverflow.com/a/32016543/361842
This can be used to replace any unwanted characters / should be straight-forward to adapt to your needs.
Code copied below; though I recommend referring to the above thread to see any feedback from others.
clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null
function Clean-CsvStream {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline=$true)]
[string]$CsvRow
,
[Parameter(Mandatory = $false)]
[char]$Delimiter = ','
,
[Parameter(Mandatory = $false)]
[regex]$InvalidCharRegex
,
[Parameter(Mandatory = $false)]
[string]$ReplacementString
)
begin {
[bool]$IsSimple = [string]::IsNullOrEmpty($InvalidCharRegex)
if(-not $IsSimple) {
[System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
[System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
[Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
$Parser.SetDelimiters($Delimiter)
$Parser.HasFieldsEnclosedInQuotes = $true
[long]$seekStart = 0
}
}
process {
if ($IsSimple) {
$CsvRow
} else { #if we're not replacing anything, keep it simple
$seekStart = $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Current)
$writeStream.WriteLine($CsvRow)
$writeStream.Flush()
$seekStart = $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Begin)
write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
}
}
end {
if(-not $IsSimple) {
try {$Parser.Close(); $Parser.Dispose()} catch{}
try {$writeStream.Close(); $writeStream.Dispose()} catch{}
try {$memStream.Close(); $memStream.Dispose()} catch{}
}
}
}
$csv = #(
(new-object -TypeName PSCustomObject -Property #{A="this is regular text";B="nothing to see here";C="all should be good"})
,(new-object -TypeName PSCustomObject -Property #{A="this is regular text2";B="what the`nLine break!";C="all should be good2"})
,(new-object -TypeName PSCustomObject -Property #{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"})
,(new-object -TypeName PSCustomObject -Property #{A="this is regular text4";B="I've got;a semi";C="all should be good4"})
,(new-object -TypeName PSCustomObject -Property #{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':'