Powershell get all errors not just first - powershell

I have a command like so:
(($remoteFiles | Where-Object { -not ($_ | Select-String -Quiet -NotMatch -Pattern '^[a-f0-9]{32}( )') }) -replace '^[a-f0-9]{32}( )', '$0= ' -join "`n") | ConvertFrom-StringData
sometimes it throws a
ConvertFrom-StringData : Data item 'a3512c98c9e159c021ebbb76b238707e' in line 'a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21 (2).mov'
is already defined.
BUT I believe there to be more and the error is only thrown on the FIRST occurrence, is there a way to get all of the errors so I can act upon them?

is there a way to get all of the errors
I'm afraid there is not, because what ConvertFrom-StringData reports on encountering a problem is a statement-terminating error, which means that it aborts its execution instantly, without considering further input.
You'd have to perform your own analysis of the input in order to detect multiple problems, such as duplicate keys; e.g.:
#'
a = 1
b = 2
a = 10
c = 3
b = 20
'# | ForEach-Object {
$_ -split '\r?\n' |
Group-Object { ($_ -split '=')[0].Trim() } |
Where-Object Count -gt 1 |
ForEach-Object {
Write-Error "Duplicate key: $($_.Name)"
}
}
Output:
Write-Error: Duplicate key: a
Write-Error: Duplicate key: b

Related

ConvertFrom-StringData Duplicates

how best to de-duplicate any data for this line? the only issue seems to be ConvertFrom-StringData; if I remove that one bit I get no errors ...
here's the code:
$remotefilehash = (($remoteFiles | Where-Object { -not ($_ | Select-String -Quiet -NotMatch -Pattern '^[a-f0-9]{32}( )') }) -replace '^[a-f0-9]{32}( )', '$0= ' -join "`n") | ConvertFrom-StringData
and the error
ConvertFrom-StringData : Data item 'a3512c98c9e159c021ebbb76b238707e' in line 'a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21.mov' is already
defined.
the $remotefiles var has data like:
a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21 (2).mov
a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21.mov
so I only need ONE of these files and since they both have the same checksum I don't care which path
I'm thinking maybe a try/catch on the "is already defined" ? maybe thats better b/c I can run a different command if it does happen
I would personally do it like this, with just a ForEach-Object to loop on each string, -match and $Matches for regex comparison and generating the objects, and a HashSet<T> that ensures the code doesn't output duplicated hashes:
$remoteFiles = #'
a3512c98c9e159c021ebbb76b238707e = path/to/thing/2022-10-08 21-46-21 (2).mov
a3512c98c9e159c021ebbb76b238707e = path/to/thing/2022-10-08 21-46-21.mov
a3512c98c9e159c021ebbb76b238707f = path/to/otherstuff
a3512c98c9e159c021ebbb76b238707f = path/to/otherstuff2
a3512c98c9e159c021ebbb76b238707f = path/to/otherstuff3
'# -split '\r?\n'
$hash = [Collections.Generic.HashSet[string]]::new([StringComparer]::OrdinalIgnoreCase)
$remoteFiles | ForEach-Object {
if($_ -match '(?<hash>^[a-f0-9]{32})[\s=]{5}(?<path>.+)' -and $hash.Add($Matches['hash'])) {
[pscustomobject]#{
Hash = $Matches['hash']
Path = $Matches['path']
}
}
}
Outputs:
Hash Path
---- ----
a3512c98c9e159c021ebbb76b238707e path/to/thing/2022-10-08 21-46-21 (2).mov
a3512c98c9e159c021ebbb76b238707f path/to/otherstuff

How to avoid double quote when using export-csv in Powershell [duplicate]

I am using ConvertTo-Csv to get comma separated output
get-process | convertto-csv -NoTypeInformation -Delimiter ","
It outputs like:
"__NounName","Name","Handles","VM","WS",".....
However I would like to get output without quotes, like
__NounName,Name,Handles,VM,WS....
Here is a way to remove the quotes
get-process | convertto-csv -NoTypeInformation -Delimiter "," | % {$_ -replace '"',''}
But it has a serious drawback if one of the item contains a " it will be removed !
Hmm, I have Powershell 7 preview 1 on my mac, and Export-Csv has a -UseQuotes option that you can set to AsNeeded. :)
I was working on a table today and thought about this very question as I was previewing the CSV file in notepad and decided to see what others had come up with. It seems many have over-complicated the solution.
Here's a real simple way to remove the quote marks from a CSV file generated by the Export-Csv cmdlet in PowerShell.
Create a TEST.csv file with the following data.
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"
"8","Steve","Idaho"
"9","Dolly","Tennessee"
Save As: TEST.csv
Store file contents in a $Test variable
$Test = Get-Content .\TEST.csv
Load $Test variable to see results of the get-content cmdlet
$Test
Load $Test variable again and replace all ( "," ) with a comma, then trim start and end by removing each quote mark
$Test.Replace('","',",").TrimStart('"').TrimEnd('"')
Save/Replace TEST.csv file
$Test.Replace('","',",").TrimStart('"').TrimEnd('"') | Out-File .\TEST.csv -Force -Confirm:$false
Test new file Output with Import-Csv and Get-Content:
Import-Csv .\TEST.csv
Get-Content .\TEST.csv
To Sum it all up, the work can be done with 2 lines of code
$Test = Get-Content .\TEST.csv
$Test.Replace('","',",").TrimStart('"').TrimEnd('"') | Out-File .\TEST.csv -Force -Confirm:$false
I ran into this issue, found this question, but was not satisfied with the answers because they all seem to suffer if the data you are using contains a delimiter, which should remain quoted. Getting rid of the unneeded double quotes is a good thing.
The solution below appears to solve this issue for a general case, and for all variants that would cause issues.
I found this answer elsewhere, Removing quotes from CSV created by PowerShell, and have used it to code up an example answer for the SO community.
Attribution: Credit for the regex, goes 100% to Russ Loski.
Code in a Function, Remove-DoubleQuotesFromCsv
function Remove-DoubleQuotesFromCsv
{
param (
[Parameter(Mandatory=$true)]
[string]
$InputFile,
[string]
$OutputFile
)
if (-not $OutputFile)
{
$OutputFile = $InputFile
}
$inputCsv = Import-Csv $InputFile
$quotedData = $inputCsv | ConvertTo-Csv -NoTypeInformation
$outputCsv = $quotedData | % {$_ -replace `
'\G(?<start>^|,)(("(?<output>[^,"]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
,'${start}${output}'}
$outputCsv | Out-File $OutputFile -Encoding utf8 -Force
}
Test Code
$csvData = #"
id,string,notes,number
1,hello world.,classic,123
2,"a comma, is in here","test data 1",345
3,",a comma, is in here","test data 2",346
4,"a comma, is in here,","test data 3",347
5,"a comma, is in here,","test data 4`r`nwith a newline",347
6,hello world2.,classic,123
"#
$data = $csvData | ConvertFrom-Csv
"`r`n---- data ---"
$data
$quotedData = $data | ConvertTo-Csv -NoTypeInformation
"`r`n---- quotedData ---"
$quotedData
# this regular expression comes from:
# http://www.sqlmovers.com/removing-quotes-from-csv-created-by-powershell/
$fixedData = $quotedData | % {$_ -replace `
'\G(?<start>^|,)(("(?<output>[^,"\n]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
,'${start}${output}'}
"`r`n---- fixedData ---"
$fixedData
$fixedData | Out-File e:\test.csv -Encoding ascii -Force
"`r`n---- e:\test.csv ---"
Get-Content e:\test.csv
Test Output
---- data ---
id string notes number
-- ------ ----- ------
1 hello world. classic 123
2 a comma, is in here test data 1 345
3 ,a comma, is in here test data 2 346
4 a comma, is in here, test data 3 347
5 a comma, is in here, test data 4... 347
6 hello world2. classic 123
---- quotedData ---
"id","string","notes","number"
"1","hello world.","classic","123"
"2","a comma, is in here","test data 1","345"
"3",",a comma, is in here","test data 2","346"
"4","a comma, is in here,","test data 3","347"
"5","a comma, is in here,","test data 4
with a newline","347"
"6","hello world2.","classic","123"
---- fixedData ---
id,string,notes,number
1,hello world.,classic,123
2,"a comma, is in here",test data 1,345
3,",a comma, is in here",test data 2,346
4,"a comma, is in here,",test data 3,347
5,"a comma, is in here,","test data 4
with a newline","347"
6,hello world2.,classic,123
---- e:\test.csv ---
id,string,notes,number
1,hello world.,classic,123
2,"a comma, is in here",test data 1,345
3,",a comma, is in here",test data 2,346
4,"a comma, is in here,",test data 3,347
5,"a comma, is in here,","test data 4
with a newline","347"
6,hello world2.,classic,123
This is pretty similar to the accepted answer but it helps to prevent unwanted removal of "real" quotes.
$delimiter = ','
Get-Process | ConvertTo-Csv -Delimiter $delimiter -NoTypeInformation | foreach { $_ -replace '^"','' -replace "`"$delimiter`"",$delimiter -replace '"$','' }
This will do the following:
Remove quotes that begin a line
Remove quotes that end a line
Replace quotes that wrap a delimiter with the delimiter alone.
Therefore, the only way this would go wrong is if one of the values actually contained not only quotes, but specifically a quote-delimiter-quote sequence, which hopefully should be pretty uncommon.
Once the file is generated, you can run
set-content FILENAME.csv ((get-content FILENAME.csv) -replace '"')
Depending on how pathological (or "full-featured") your CSV data is, one of the posted solutions will already work.
The solution posted by Kory Gill is almost perfect - the only issue remaining is that quotes are removed also for cells containing the line separator \r\n, which is causing issues in many tools.
The solution is adding a newline to the character class expression:
$fixedData = $quotedData | % {$_ -replace `
'\G(?<start>^|,)(("(?<output>[^,"\n]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
,'${start}${output}'}
I wrote this for my needs:
function ConvertTo-Delimited {
[CmdletBinding()]
param(
[Parameter(ValueFromPipeline=$true,Mandatory=$true)]
[psobject[]]$InputObject,
[string]$Delimiter='|',
[switch]$ExcludeHeader
)
Begin {
if ( $ExcludeHeader -eq $false ) {
#(
$InputObject[0].PsObject.Properties | `
Select-Object -ExpandProperty Name
) -Join $Delimiter
}
}
Process {
foreach ($item in $InputObject) {
#(
$item.PsObject.Properties | `
Select-Object Value | `
ForEach-Object {
if ( $null -ne $_.Value ) {$_.Value.ToString()}
else {''}
}
) -Join $Delimiter
}
}
End {}
}
Usage:
$Data = #(
[PSCustomObject]#{
A = $null
B = Get-Date
C = $null
}
[PSCustomObject]#{
A = 1
B = Get-Date
C = 'Lorem'
}
[PSCustomObject]#{
A = 2
B = Get-Date
C = 'Ipsum'
}
[PSCustomObject]#{
A = 3
B = $null
C = 'Lorem Ipsum'
}
)
# with headers
PS> ConvertTo-Delimited $Data
A|B|C
1|7/17/19 9:07:23 PM|Lorem
2|7/17/19 9:07:23 PM|Ipsum
||
# without headers
PS> ConvertTo-Delimited $Data -ExcludeHeader
1|7/17/19 9:08:19 PM|Lorem
2|7/17/19 9:08:19 PM|Ipsum
||
Here's another approach:
Get-Process | ConvertTo-Csv -NoTypeInformation -Delimiter "," |
foreach { $_ -replace '^"|"$|"(?=,)|(?<=,)"','' }
This replaces matches with the empty string, in each line. Breaking down the regex above:
| is like an OR, used to unite the following 4 sub-regexes
^" matches quotes in the beginning of the line
"$ matches quotes in the end of the line
"(?=,) matches quotes that are immediately followed by a comma
(?<=,)" matches quotes that are immediately preceded by a comma
I found that Kory's answer didn't work for the case where the original string included more than one blank field in a row. I.e. "ABC",,"0" was fine but "ABC",,,"0" wasn't handled properly. It stopped replacing quotes after the ",,,". I fixed it by adding "|(?<output>)" near the end of the first parameter, like this:
% {$_ -replace `
'\G(?<start>^|,)(("(?<output>[^,"]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$))|(?<output>))', `
'${start}${output}'}
I haven't spent much time looking for removing the quotes. But, here is a workaround.
get-process | Export-Csv -NoTypeInformation -Verbose -Path $env:temp\test.csv
$csv = Import-Csv -Path $env:temp\test.csv
This is a quick workaround and there may be a better way to do this.
A slightly modified variant of JPBlanc's answer:
I had an existing csv file which looked like this:
001,002,003
004,005,006
I wanted to export only the first and third column to a new csv file. And for sure I didn't want any quotes ;-)
It can be done like this:
Import-Csv -Path .\source.csv -Delimiter ',' -Header A,B,C | select A,C | ConvertTo-Csv -NoTypeInformation -Delimiter ',' | % {$_ -replace '"',''} | Out-File -Encoding utf8 .\target.csv
Couldn't find an answer to a similar question so I'm posting what I've found here...
For exporting as Pipe Delimited with No Quotes for string qualifiers, use the following:
$objtable | convertto-csv -Delimiter "|" -notypeinformation | select -Skip $headers | % { $_ -replace '"\|"', "|"} | % { $_ -replace '""', '"'} | % { $_ -replace "^`"",''} | % { $_ -replace "`"$",''} | out-file "$OutputPath$filename" -fo -en ascii
This was the only thing I could come up with that could handle quotes and commas within the text; especially things like a quote and comma next to each other at the beginning or ending of a text field.
This function takes a powershell csv object from the pipeline and outputs like convertto-csv but without adding quotes (unless needed).
function convertto-unquotedcsv {
param([Parameter(ValueFromPipeline=$true)]$csv, $delimiter=',', [switch]$noheader=$false)
begin {
$NeedQuotesRex = "($([regex]::escape($delimiter))|[\n\r\t])"
if ($noheader) { $names = #($true) } else { $names = #($false) }
}
process {
$psop = $_.psobject.properties
if (-not $names) {
$names = $psop.name | % {if ($_ -match $NeedQuotesRex) {'"' + $_ + '"'} else {$_}}
$names -join $delimiter # unquoted csv header
}
$values = $psop.value | % {if ($_ -match $NeedQuotesRex) {'"' + $_ + '"'} else {$_}}
$values -join $delimiter # unquoted csv line
}
end {
}
}
$names gets an array of noteproperty names and $values gets an array of notepropery values. It took that special step to output the header. The process block gets the csv object one piece at a time.
Here is a test run
$delimiter = ','; $csvData = #"
id,string,notes,"points per 1,000",number
4,"a delimiter$delimiter is in here,","test data 3",1,348
5,"a comma, is in here,","test data 4`r`nwith a newline",0.5,347
6,hello world2.,classic,"3,000",123
"#
$csvdata | convertfrom-csv | sort number | convertto-unquotedcsv -delimiter $delimiter
id,string,notes,"points per 1,000",number
6,hello world2.,classic,"3,000",123
5,"a comma, is in here,","test data 4
with a newline",0.5,347
4,"a delimiter, is in here,",test data 3,1,348

Combining like objects in an array

I am attempting to analyze a group of text files (MSFTP logs) and do counts of IP addresses that have submitted bad credentials. I think I have it worked out except I don't think that the array is passing to/from the function correctly. As a result, I get duplicate entries if the same IP appears in multiple log files. What am I doing wrong?
Function LogBadAttempt($FTPLog,$BadPassesArray)
{
$BadPassEx="PASS - 530"
Foreach($Line in $FTPLog)
{
if ($Line -match $BadPassEx)
{
$IP=($Line.Split(' '))[1]
if($BadPassesArray.IP -contains $IP)
{
$CurrentIP=$BadPassesArray | Where-Object {$_.IP -like $IP}
[int]$CurrentCount=$CurrentIP.Count
$CurrentCount++
$CurrentIP.Count=$CurrentCount
}else{
$info=#{"IP"=$IP;"Count"='1'}
$BadPass=New-Object -TypeName PSObject -Property $info
$BadPassesArray += $BadPass
}
}
}
return $BadPassesArray
}
$BadPassesArray=#()
$FTPLogs = Get-Childitem \\ftpserver\MSFTPSVC1\test
$Result = ForEach ($LogFile in $FTPLogs)
{
$FTPLog=Get-Content ($LogFile.fullname)
LogBadAttempt $FTPLog
}
$Result | Export-csv C:\Temp\test.csv -NoTypeInformation
The result looks like...
Count IP
7 209.59.17.20
20 209.240.83.135
18441 209.59.17.20
13059 200.29.3.98
and would like it to combine the entries for 209.59.17.20
You're making this way too complicated. Process the files in a pipeline and use a hashtable to count the occurrences of each IP address:
$BadPasswords = #{}
Get-ChildItem '\\ftpserver\MSFTPSVC1\test' | Get-Content | ? {
$_ -like '*PASS - 530*'
} | % {
$ip = ($_ -split ' ')[1]
$BadPasswords[$ip]++
}
$BadPasswords.GetEnumerator() |
select #{n='IP';e={$_.Name}}, #{n='Count';e={$_.Value}} |
Export-Csv 'C:\Temp\test.csv' -NoType

Select-String sometimes results in "System.Object[]"

I'm working on a script that combines parts of two text files. These files are not too large (about 2000 lines each).
I'm seeing strange output from select-string that I don't think should be there.
Here's samples of my two files:
CC.csv - 2026 lines
LS126L47L6/1L2#519,07448,1,B
LS126L47L6/1R1-1#503,07449,1,B
LS126L47L6/1L3#536,07450,1,B
LS126L47L6/2R1#515,07451,1,B
LS126L47L6/10#525,07452,1,B
LS126L47L6/1L4#538,07453,1,B
GI.txt - 1995 lines
07445,B,SH,1
07446,B,SH,1
07448,B,SH,1
07449,B,SH,1
07450,B,SH,1
07451,B,SH,1
07452,B,SH,1
07453,B,SH,1
07454,B,SH,1
And here's a sample of the output file:
output in myfile.csv
LS126L47L6/3R1#516,07446,1,B
LS126L47L6/1L2#519,07448,1,B
LS126L47L6/1R1-1#503,07449,1,B
System.Object[],B
LS126L47L6/2R1#515,07451,1,B
This is the script I'm using:
sc ./myfile.csv "col1,col2,col3,col4"
$mn = gc cc.csv | select -skip 1 | % {$_.tostring().split(",")[1]}
$mn | % {
$a = (gc cc.csv | sls $_ ).tostring() -replace ",[a-z]$", ""
if (gc GI.txt | sls $_ | select -first 1)
{$b = (gc GI.txt | sls $_ | select -first 1).tostring().split(",")[1]}
else {$b = "NULL"
write-host "$_ is not present in GI file"}
$c = $a + ',' + $b
ac ./myfile.csv -value $c
}
The $a variable is where I am sometimes seeing the returned string as System.Object[]
Any ideas why? Also, this script takes quite some time to finish. Any tips for a newb on how to speed it up?
Edit: I should add that I've taken one line from the cc.csv file, saved in a new text file, and run through the script in console up through assigning $a. I can't get it to return "system.object[]".
Edit 2: After follow the advice below and trying a couple of things I've noticed that if I run
$mn | %{(gc cc.csv | sls $_).tostring()}
I get System.Object[].
But if I run
$mn | %{(gc cc.csv | sls $_)} | %{$_.tostring()}
It comes out fine. Go figure.
The problem is caused by a change in multiplicity of matches. If there are multiple matching elements an Object[] array (of MatchInfo elements) is returned; a single matching element results in a single MatchInfo object (not in an array); and when there are no matches, null is returned.
Consider these results, when executed against the "cc.csv" test-data supplied:
# matches many
(gc cc.csv | Select-String "LS" ).GetType().Name # => Object[]
# matches one
(gc cc.csv | Select-String "538").GetType().Name # => MatchInfo
# matches none
(gc cc.csv | Select-String "FAIL") # => null
The result of calling ToString on Object[] is "System.Object[]" while the result is a more useful concatenation of the matched values when invoked directly upon a MatchInfo object.
The immediate problem can be fixed with selected | Select -First 1, which will result in a MatchInfo being returned for the first two cases. Select-String will still search the entire input - extra results are simply discarded.
However, it seems like the look-back into "cc.csv" (with the Select-String) could be eliminated entirely as that is where $_ originally comes from. Here is a minor [untested] adaptation, of what it may look like:
gc cc.csv | Select -Skip 1 | %{
$num = $_.Split(",")[1]
$a = $_ -Replace ",[a-z]$", ""
# This is still O(m*n) and could be improved with a hash/set probe.
$gc_match = Select-String $num -Path gi.csv -SimpleMatch | Select -First 1
if ($gc_match) {
# Use of "Select -First 1" avoids the initial problem; but
# it /may/ be more appropriate for an error to indicate data problems.
# (Likewise, an error in the original may need further investigation.)
$b = $gc_match.ToString().Split(",")[1]
} else {
$b = "NULL"
Write-Host "$_ is not present in GI file"
}
$c = $a + ',' + $b
ac ./myfile.csv -Value $c
}

I need help formatting output with PowerShell's Out-File cmdlet

I have a series of documents that are going through the following function designed to count word occurrences in each document. This function works fine outputting to the console, but now I want to generate a text file containting the information, but with the file name appended to each word in the list.
My current console output is:
"processing document1 with x unique words occuring as follows"
"word1 12"
"word2 8"
"word3 3"
"word4 4"
"word5 1"
I want a delimited file in this format:
document1;word1;12
document1;word2;8
document1;word3;3
document1;word4;4
document1;word1;1
document2;word1;16
document2;word2;11
document2;word3;9
document2;word4;9
document2;word1;13
While the function below gets me the lists of words and occurences, I'm having a hard time figuring out where or how to insert the filename variable so that it prints at the head of each line. MSDN has been less-than helpful, and most of the places I try to insert the variable result in errors (see below)
function Count-Words ($docs) {
$document = get-content $docs
$document = [string]::join(" ", $document)
$words = $document.split(" `t",[stringsplitoptions]::RemoveEmptyEntries)
$uniq = $words | sort -uniq
$words | % {$wordhash=#{}} {$wordhash[$_] += 1}
Write-Host $docs "contains" $wordhash.psbase.keys.count "unique words distributed as follows."
$frequency = $wordhash.psbase.keys | sort {$wordhash[$_]}
-1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File c:\out-file-test.txt -append
$grouped = $words | group | sort count
Do I need to create a string to pass to the out-file cmdlet? is this just something I've been putting in the wrong place on the last few tries? I'd like to understand WHY it's going in a particular place as well. Right now I'm just guessing, because I know I have no idea where to put the out-file to achieve my selected results.
I've tried formatting my command per powershell help, using -$docs and -FilePath, but each time I add anything to the out-file above that runs successfully, I get the following error:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument "c:\out-file-test.txt" does not bel
ong to the set "unicode,utf7,utf8,utf32,ascii,bigendianunicode,default,oem" specified by the ValidateSet attribute. Sup
ply an argument that is in the set and then try the command again.
At C:\c.ps1:39 char:71
+ -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File <<<< -$docs -width 1024 c:\users\x46332\co
unt-test.txt -append
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand
I rewrote most of your code. You should utilize objects to make it easier formatting the way you want. This one splits on "space" and groups words together. Try this:
Function Count-Words ($paths) {
$output = #()
foreach ($path in $paths) {
$file = Get-ChildItem $path
((Get-Content $file) -join " ").Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries) | Group-Object | Select-Object -Property #{n="FileName";e={$file.BaseName}}, Name, Count | % {
$output += "$($_.FileName);$($_.Name);$($_.Count)"
}
}
$output | Out-File test-out2.txt -Append
}
$filepaths = ".\test.txt", ".\test2.txt"
Count-Words -paths $filepaths
It outputs like you asked(document;word;count). If you want documentname to include extension, change $file.BaseName to $file.Name . Testoutput:
test;11;1
test;9;2
test;13;1
test2;word11;5
test2;word1;4
test2;12;1
test2;word2;2
Slightly different approach:
function Get-WordCounts ($doc)
{
$text_ = [IO.File]::ReadAllText($doc.fullname)
$WordHash = #{}
$text_ -split '\b' -match '\w+'|
foreach {$WordHash[$_]++}
$WordHash.GetEnumerator() |
foreach {
New-Object PSObject -Property #{
Word = $_.Key
Count = $_.Value
}
}
}
$docs = gci c:\testfiles\*.txt |
sort name
&{
foreach ($doc in dir $docs)
{
Get-WordCounts $doc |
sort Count -Descending |
foreach {
(&{$doc.Name;$_.Word;$_.Count}) -join ';'
}
}
} | out-file c:\somedir\wordcounts.txt
Try this:
$docs = #("document1", "document2", ...)
$docs | % {
$doc = $_
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select #{n="Document";e={$doc}}, Name, Count
} | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
If you want to make this into a function you could do it like this:
function Count-Words($docs) {
foreach ($doc in $docs) {
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select #{n="Document";e={$doc}}, Name, Count
}
}
$files = #("document1", "document2", ...)
Count-Words $files | Export-CSV output.csv -Delimiter ";" -NoTypeInfo