ConvertFrom-StringData Duplicates - powershell

How best to de-duplicate the data for this line? The only issue seems to be ConvertFrom-StringData; if I remove that one bit I get no errors.
Here's the code:
$remotefilehash = (($remoteFiles | Where-Object { -not ($_ | Select-String -Quiet -NotMatch -Pattern '^[a-f0-9]{32}( )') }) -replace '^[a-f0-9]{32}( )', '$0= ' -join "`n") | ConvertFrom-StringData
and the error
ConvertFrom-StringData : Data item 'a3512c98c9e159c021ebbb76b238707e' in line 'a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21.mov' is already
defined.
the $remotefiles var has data like:
a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21 (2).mov
a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21.mov
So I only need ONE of these files, and since they both have the same checksum I don't care which path is kept.
I'm thinking maybe a try/catch on the "is already defined" error? Maybe that's better because I can run a different command if it does happen.

I would personally do it like this, with just a ForEach-Object to loop on each string, -match and $Matches for regex comparison and generating the objects, and a HashSet<T> that ensures the code doesn't output duplicated hashes:
$remoteFiles = @'
a3512c98c9e159c021ebbb76b238707e = path/to/thing/2022-10-08 21-46-21 (2).mov
a3512c98c9e159c021ebbb76b238707e = path/to/thing/2022-10-08 21-46-21.mov
a3512c98c9e159c021ebbb76b238707f = path/to/otherstuff
a3512c98c9e159c021ebbb76b238707f = path/to/otherstuff2
a3512c98c9e159c021ebbb76b238707f = path/to/otherstuff3
'@ -split '\r?\n'
$hash = [Collections.Generic.HashSet[string]]::new([StringComparer]::OrdinalIgnoreCase)
$remoteFiles | ForEach-Object {
    # HashSet.Add() returns $false for a hash that was already seen, so duplicates are skipped.
    if ($_ -match '(?<hash>^[a-f0-9]{32})\s*=\s*(?<path>.+)' -and $hash.Add($Matches['hash'])) {
        [pscustomobject]@{
            Hash = $Matches['hash']
            Path = $Matches['path']
        }
    }
}
Outputs:
Hash                             Path
----                             ----
a3512c98c9e159c021ebbb76b238707e path/to/thing/2022-10-08 21-46-21 (2).mov
a3512c98c9e159c021ebbb76b238707f path/to/otherstuff
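If a hashtable is what the rest of the script expects (which is what ConvertFrom-StringData would have returned), the same first-one-wins idea can be sketched as below. This is only a sketch, assuming it is acceptable to keep whichever path appears first for each checksum; `$lines` and `$table` are illustrative names and the sample paths are made up.

```powershell
# Sample input in the "hash = path" shape from the question (paths are illustrative).
$lines = @(
    'a3512c98c9e159c021ebbb76b238707e = path/to/thing/2022-10-08 21-46-21 (2).mov'
    'a3512c98c9e159c021ebbb76b238707e = path/to/thing/2022-10-08 21-46-21.mov'
    'a3512c98c9e159c021ebbb76b238707f = path/to/otherstuff'
)

$table = @{}
foreach ($line in $lines) {
    # Lines that do not look like "<32 hex chars> = <path>" are ignored.
    if ($line -match '^(?<hash>[a-f0-9]{32})\s*=\s*(?<path>.+)$') {
        $key = $Matches['hash']
        # Keep only the first path seen for each checksum.
        if (-not $table.ContainsKey($key)) {
            $table[$key] = $Matches['path']
        }
    }
}

$table
```

Because the duplicates never reach ConvertFrom-StringData, no "is already defined" error is raised and no try/catch is needed.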

Related

Powershell get all errors not just first

I have a command like so:
(($remoteFiles | Where-Object { -not ($_ | Select-String -Quiet -NotMatch -Pattern '^[a-f0-9]{32}( )') }) -replace '^[a-f0-9]{32}( )', '$0= ' -join "`n") | ConvertFrom-StringData
sometimes it throws a
ConvertFrom-StringData : Data item 'a3512c98c9e159c021ebbb76b238707e' in line 'a3512c98c9e159c021ebbb76b238707e = My Pictures/Tony/Automatic Upload/Tony’s iPhone/2022-10-08 21-46-21 (2).mov'
is already defined.
BUT I believe there to be more and the error is only thrown on the FIRST occurrence, is there a way to get all of the errors so I can act upon them?
is there a way to get all of the errors
I'm afraid there is not, because what ConvertFrom-StringData reports on encountering a problem is a statement-terminating error, which means that it aborts its execution instantly, without considering further input.
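A small sketch of what that means in practice, using made-up input: the error can be caught with try/catch, but no partial result is returned, so only the first duplicate is ever reported. The variable names are illustrative.

```powershell
# Made-up input with one duplicate key ("a").
$text = "a = 1`nb = 2`na = 10"

$result  = $null
$message = $null
try {
    # The cmdlet aborts at the first duplicate key, so $result is never assigned.
    $result = $text | ConvertFrom-StringData -ErrorAction Stop
}
catch {
    $message = $_.Exception.Message
}

"Result is null: $($null -eq $result)"
"Error: $message"
```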
You'd have to perform your own analysis of the input in order to detect multiple problems, such as duplicate keys; e.g.:
@'
a = 1
b = 2
a = 10
c = 3
b = 20
'@ | ForEach-Object {
    $_ -split '\r?\n' |
        Group-Object { ($_ -split '=')[0].Trim() } |
        Where-Object Count -gt 1 |
        ForEach-Object {
            Write-Error "Duplicate key: $($_.Name)"
        }
}
Output:
Write-Error: Duplicate key: a
Write-Error: Duplicate key: b
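The same grouping can also be used to drop the duplicates before conversion instead of only reporting them. The following is a sketch under the assumption that the first value for each key should win; `$deduped` and `$data` are illustrative names.

```powershell
$text = @'
a = 1
b = 2
a = 10
c = 3
b = 20
'@

# Group the lines by key and keep only the first line of each group.
$deduped = $text -split '\r?\n' |
    Group-Object { ($_ -split '=')[0].Trim() } |
    ForEach-Object { $_.Group[0] }

# With duplicates removed, the conversion no longer fails.
$data = ($deduped -join "`n") | ConvertFrom-StringData
$data
```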

How do you group unique values from imported csv in a foreach loop

I've got a txt file with the following content:
#test.txt
'ALDHT21;MIMO;1111;BOK;Tree'
'ALDHT21;MIMO;1211;BOK;Tree'
'PRGHT21;AIMO;1351;STE;Water'
'PRGHT21;AIMO;8888;FRA;Stone'
'ABCDT22;DIDO;8888;STE;Stone'
'PRA2HT21;ADDO;8888;STE;Stone'
';ADDO;1317;STE;Stone'
To make it easier to explain, let's give the above content headers:
'Group;Code;ID;Signature;Type'
With the help of Powershell, I'm trying to create a foreach loop of each unique "Signature" to return two variables with unique data from rows where the "Signature" exists in and then mashed together with some delimiters.
Based on the file content, here are the expected results:
First loop:
$Signature = "BOK"
$Groups = "Tree:ALDHT21"
$Codes = "Tree:MIMO"
Next loop:
$Signature = "FRA"
$Groups = "Stone:PRGHT21"
$Codes = "Stone:AIMO"
Last loop:
$Signature = "STE"
$Groups = "Stone:PRA2HT21,Stone:ABCDT22,Water:PRGHT21"
$Codes = "Stone:ADDO,Stone:DIDO,Water:AIMO"
Notice the last loop should skip the last entry in the file because it contains an empty Group.
My attempt didn't quite hit the mark and I'm struggling to find a good way to accomplish this:
$file = "C:\temp\test.txt"
$uniqueSigs = (gc $file) -replace "'$|^'" | ConvertFrom-Csv -Delimiter ';' -Header Group,Code,ID,Signature,Type | group Signature
foreach ($sigs in $uniqueSigs) {
    $Groups = ""
    foreach ($Group in $sigs.Group) {
        $Groups += "$($Group.Type):$($Group.Group),"
    }
    $Groups = $Groups -replace ",$"
    [PSCustomObject] @{
        Signatur = $sigs.Name
        Groups = $Groups
    }
    $Codes = ""
    foreach ($Group in $sigs.Group) {
        $Codes += "$($Group.Type):$($Group.Code),"
    }
    $Codes = $Codes -replace ",$"
    [PSCustomObject] @{
        Signatur = $sigs.Name
        Codes = $Codes
    }
    $Signature = $sigs.Name
    If ($Group.Group){
        write-host "$Signature "-" $Groups "-" $Codes "
    }
}
Result from my bad attempt:
BOK - Tree:ALDHT21,Tree:ALDHT21 - Tree:MIMO,Tree:MIMO
FRA - Stone:PRGHT21 - Stone:AIMO
Any help appreciated. :)
Your variables are somewhat confusingly named; the following streamlined solution uses fewer variables and perhaps produces the desired result:
$file = "test.txt"
(Get-Content $file) -replace "'$|^'" | ConvertFrom-Csv -Delimiter ';' -Header Group,Code,ID,Signature,Type |
    Group-Object Signature |
    ForEach-Object {
        # Create and output an object with group information.
        # Skip empty .Group properties among the group's member objects.
        # Get the concatenation of all .Group and .Code column
        # values each, skipping empty groups and eliminating duplicates.
        $groups = (
            $_.Group.ForEach({ if ($_.Group) { "$($_.Type):$($_.Group)" } }) |
                Select-Object -Unique
        ) -join ","
        $codes = (
            $_.Group.ForEach({ "$($_.Type):$($_.Code)" }) |
                Select-Object -Unique
        ) -join ","
        # Create and output an object comprising the signature
        # and the concatenated groups and codes.
        [PSCustomObject] @{
            Signature = $_.Name
            Groups = $groups
            Codes = $codes
        }
        # Note: This is just *for-display* output.
        # Don't use Write-Host to output *data*.
        Write-Host ($_.Name, $groups, $codes -join ' - ')
    }
Output:
BOK - Tree:ALDHT21 - Tree:MIMO
FRA - Stone:PRGHT21 - Stone:AIMO
STE - Water:PRGHT21,Stone:ABCDT22,Stone:PRA2HT21 - Water:AIMO,Stone:DIDO,Stone:ADDO
Signature Groups                                     Codes
--------- ------                                     -----
BOK       Tree:ALDHT21                               Tree:MIMO
FRA       Stone:PRGHT21                              Stone:AIMO
STE       Water:PRGHT21,Stone:ABCDT22,Stone:PRA2HT21 Water:AIMO,Stone:DIDO,Stone:ADDO
Note that the for-display Write-Host output surprisingly precedes the default output formatting for the [pscustomobject] instances, which is due to the asynchronous behavior of the implicitly applied Format-Table formatting explained in this answer.
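To illustrate why Write-Host should not be used to output data, here is a minimal sketch (the values are taken from the sample output; `$captured` is an illustrative name). Only the object written to the success output stream ends up in the variable; the Write-Host text goes to the host display instead.

```powershell
$captured = & {
    # For-display only: not part of the script block's output.
    Write-Host 'BOK - Tree:ALDHT21 - Tree:MIMO'
    # Data: this object is what the caller receives.
    [pscustomobject]@{ Signature = 'BOK'; Groups = 'Tree:ALDHT21'; Codes = 'Tree:MIMO' }
}

$captured.Signature
```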

how to assign the output of a string separated by column to a variable in PowerShell

I am trying to write a PowerShell script to get the required contents from a log/text file. The file looks like below:
node1 : data1
node2 : data2
Administrators : Data is Not Available at this moment
desiredouput : data3
format : format-type
node1 : data4
node2 : data5
Administrators : user1, user2, user3, user4, user5, user6
desiredoutput : data6
format : format-type
node1 : data7
node2 : data8
Administrators : user1, user2, user3, user4, user5, user6, user7, user8, user9, user10, user11,
user12, user13, user14, user15, user16, user17, user18, user19, user20
desiredoutput : data9
format : format-type
.....
the sequence continues
.....
As you can see after every five lines, the new data will be displayed to the same variable on the left.
I want to fetch the data after : in each and every line and assign that to a variable. Here is the code I am writing:
$deliverycontent = @()
$filecontent = Get-Content -Path <path to the text file>
$datacount = $filecontent.Length
for ($i=0; $i -lt $datacount; $i+=5)
{
    $temp = "" |select Header1, Header2, Header3, Header4, Header5
    $temp.Header1 = ($filecontent[$i]).Split(":")[1]
    $temp.Header2 = ($filecontent[$i+1]).Split(":")[1]
    $temp.Header3 = ($filecontent[$i+2]).Split(":")[1]
    $temp.Header4 = ($filecontent[$i+3]).Split(":")[1]
    $temp.Header5 = ($filecontent[$i+4]).Split(":")[1]
    $deliverycontent += $temp
}
When I run the script above, the data is not assigned properly, especially for Administrators: its value is a long string that wraps onto the next line in the text file, so the PowerShell output is not as expected. How can I assign the entire Administrators string to a single variable even when it continues on the following line?
The desired output is:
Header1: data1
Header2: data2
Header3 : Data is Not Available at this moment
Header4: data3
Header5: format-type
Header1: data4
Header2: data5
Header3 : user1, user2, user3, user4, user5, user6
Header4: data6
Header5: format-type
Header1: data7
Header2: data8
Header3 : user1, user2, user3, user4, user5, user6, user7, user8, user9, user10, user11, user12,
user13, user14, user15, user16, user17, user18, user19, user20
Header4: data6
Header5: format-type
....
<the sequence follows>
....
How can I achieve this?
The multi-line Administrator value is actually relatively easy to fix if things are formatted as you suggest without variance. What you can do is read in everything as a multi-line string (by adding the -raw parameter to your Get-Content command). Then remove any New Line/Carriage Return characters that are not followed by Something : (a word followed by a space and a colon). You can do that with a RegEx Negative Lookahead. Then just split it all on the remaining new lines and you'll end up with a file that will work with the rest of your script just fine.
$deliverycontent = @()
$filecontent = Get-Content -Path <path to the text file> -raw
$filecontent = $filecontent -replace '[\r\n]+(?!\w+ :)' -split '[\r\n]+'
$datacount = $filecontent.Length
$deliverycontent = for ($i=0; $i -lt $datacount; $i+=5)
{
    $temp = "" |select Header1, Header2, Header3, Header4, Header5
    $temp.Header1 = ($filecontent[$i]).Split(":")[1]
    $temp.Header2 = ($filecontent[$i+1]).Split(":")[1]
    $temp.Header3 = ($filecontent[$i+2]).Split(":")[1]
    $temp.Header4 = ($filecontent[$i+3]).Split(":")[1]
    $temp.Header5 = ($filecontent[$i+4]).Split(":")[1]
    $temp
}
I also changed how $deliverycontent collects data so that it isn't constantly rebuilding the array, which is what happens when you do $deliverycontent += $temp.
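As a small illustration of that difference (a sketch with made-up objects; `$viaAppend` and `$viaLoop` are illustrative names), both forms below end up with the same five objects, but the second lets PowerShell collect the loop's output once instead of copying the array on every iteration.

```powershell
# Appending with += creates a new, larger array on each iteration.
$viaAppend = @()
foreach ($i in 1..5) {
    $viaAppend += [pscustomobject]@{ Index = $i }
}

# Assigning the loop itself collects whatever the loop body outputs.
# @(...) guarantees an array even if the loop emits only one object.
$viaLoop = @(foreach ($i in 1..5) {
    [pscustomobject]@{ Index = $i }
})

"$($viaAppend.Count) objects via +=, $($viaLoop.Count) objects via loop output"
```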
Once the offending new line/carriage return is removed as TheMadTechnician demonstrates, you have a few options from there.
If you're using PowerShell 7 or later, you can make use of ConvertFrom-StringData's -Delimiter parameter:
$text = Get-Content $datafile -Raw
$text -replace '[\r\n]+(?!\w+ :)' -split '(?=node1)' | ForEach-Object {
    [PSCustomObject]$($_ | ConvertFrom-StringData -Delimiter :)
}
If you care about the order of the properties, you can use an ordered hashtable.
$text -replace '[\r\n]+(?!\w+ :)' -split '(?=node1)' | ForEach-Object {
    $ht = [ordered]@{}
    $_ -split '\r?\n' | Foreach-Object {$ht += $_ | ConvertFrom-StringData -Delimiter :}
    [PSCustomObject]$ht
}
In Windows PowerShell 5.1, you'll need to replace the colon with an equals sign, as ConvertFrom-StringData doesn't have -Delimiter there:
$text = Get-Content $datafile -Raw
$text -replace '[\r\n]+(?!\w+ :)' -replace ':','=' -split '(?=node1)' | ForEach-Object {
    [PSCustomObject]$($_ | ConvertFrom-StringData)
}
Again, to maintain property order.
$text -replace '[\r\n]+(?!\w+ :)' -replace ':','=' -split '(?=node1)' | ForEach-Object {
    $ht = [ordered]@{}
    $_ -split '\r?\n' | Foreach-Object {$ht += $_ | ConvertFrom-StringData}
    [PSCustomObject]$ht
}
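To see what the -replace / -split step shared by these answers does, it can be tried on a small in-memory sample. This is a sketch with made-up text in the shape of the log file; note that the wrapped line is joined without inserting a space.

```powershell
# Made-up sample: the Administrators value wraps onto a second line.
$text = "node1 : data1`nAdministrators : user1, user2,`nuser3, user4`nformat : format-type"

# Remove every line break that is NOT followed by "<word> :",
# then split on the line breaks that remain.
$lines = $text -replace '[\r\n]+(?!\w+ :)' -split '[\r\n]+'

$lines
```

With this sample, the second and third physical lines come back as a single element, so every record occupies a fixed number of lines again.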

How to avoid double quote when using export-csv in Powershell [duplicate]

I am using ConvertTo-Csv to get comma separated output
get-process | convertto-csv -NoTypeInformation -Delimiter ","
It outputs like:
"__NounName","Name","Handles","VM","WS",".....
However I would like to get output without quotes, like
__NounName,Name,Handles,VM,WS....
Here is a way to remove the quotes
get-process | convertto-csv -NoTypeInformation -Delimiter "," | % {$_ -replace '"',''}
But it has a serious drawback: if one of the items contains a ", it will be removed!
Hmm, I have Powershell 7 preview 1 on my mac, and Export-Csv has a -UseQuotes option that you can set to AsNeeded. :)
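A minimal sketch of that option, assuming PowerShell 7 or later (the parameter does not exist in Windows PowerShell 5.1). ConvertTo-Csv accepts the same -UseQuotes parameter; the sample rows are made up.

```powershell
# Made-up rows; one value contains the delimiter and therefore still needs quotes.
$rows = @(
    [pscustomobject]@{ ID = 5; Name = 'Stephanie'; Note = 'plain' }
    [pscustomobject]@{ ID = 4; Name = 'Melanie';   Note = 'a comma, is in here' }
)

# AsNeeded quotes only the fields that require it.
$csv = $rows | ConvertTo-Csv -UseQuotes AsNeeded
$csv
```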
I was working on a table today and thought about this very question as I was previewing the CSV file in notepad and decided to see what others had come up with. It seems many have over-complicated the solution.
Here's a real simple way to remove the quote marks from a CSV file generated by the Export-Csv cmdlet in PowerShell.
Create a TEST.csv file with the following data.
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"
"8","Steve","Idaho"
"9","Dolly","Tennessee"
Save As: TEST.csv
Store file contents in a $Test variable
$Test = Get-Content .\TEST.csv
Load $Test variable to see results of the get-content cmdlet
$Test
Load $Test variable again and replace all ( "," ) with a comma, then trim start and end by removing each quote mark
$Test.Replace('","',",").TrimStart('"').TrimEnd('"')
Save/Replace TEST.csv file
$Test.Replace('","',",").TrimStart('"').TrimEnd('"') | Out-File .\TEST.csv -Force -Confirm:$false
Test new file Output with Import-Csv and Get-Content:
Import-Csv .\TEST.csv
Get-Content .\TEST.csv
To sum it all up, the work can be done with 2 lines of code:
$Test = Get-Content .\TEST.csv
$Test.Replace('","',",").TrimStart('"').TrimEnd('"') | Out-File .\TEST.csv -Force -Confirm:$false
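One caveat worth checking before relying on this: the approach assumes no field contains the delimiter. The sketch below uses a made-up line with a comma inside a value to show how the stripped line gains an extra field.

```powershell
# Made-up CSV line whose second field contains a comma.
$line = '"5","Doe, Jane","Arizona"'

# Same operation as above, applied to a single line.
$stripped = $line.Replace('","', ',').TrimStart('"').TrimEnd('"')

# Without the quotes, the embedded comma is indistinguishable from a delimiter.
$fields  = $stripped -split ','
$reparse = $stripped | ConvertFrom-Csv -Header ID, Name, State

"Stripped line: $stripped"
"Field count:   $($fields.Count)"
"Parsed Name:   $($reparse.Name)"
```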
I ran into this issue, found this question, but was not satisfied with the answers because they all seem to suffer if the data you are using contains a delimiter, which should remain quoted. Getting rid of the unneeded double quotes is a good thing.
The solution below appears to solve this issue for a general case, and for all variants that would cause issues.
I found this answer elsewhere, Removing quotes from CSV created by PowerShell, and have used it to code up an example answer for the SO community.
Attribution: Credit for the regex goes 100% to Russ Loski.
Code in a Function, Remove-DoubleQuotesFromCsv
function Remove-DoubleQuotesFromCsv
{
    param (
        [Parameter(Mandatory=$true)]
        [string]
        $InputFile,
        [string]
        $OutputFile
    )
    if (-not $OutputFile)
    {
        $OutputFile = $InputFile
    }
    $inputCsv = Import-Csv $InputFile
    $quotedData = $inputCsv | ConvertTo-Csv -NoTypeInformation
    $outputCsv = $quotedData | % {$_ -replace `
        '\G(?<start>^|,)(("(?<output>[^,"]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
        ,'${start}${output}'}
    $outputCsv | Out-File $OutputFile -Encoding utf8 -Force
}
Test Code
$csvData = @"
id,string,notes,number
1,hello world.,classic,123
2,"a comma, is in here","test data 1",345
3,",a comma, is in here","test data 2",346
4,"a comma, is in here,","test data 3",347
5,"a comma, is in here,","test data 4`r`nwith a newline",347
6,hello world2.,classic,123
"@
$data = $csvData | ConvertFrom-Csv
"`r`n---- data ---"
$data
$quotedData = $data | ConvertTo-Csv -NoTypeInformation
"`r`n---- quotedData ---"
$quotedData
# this regular expression comes from:
# http://www.sqlmovers.com/removing-quotes-from-csv-created-by-powershell/
$fixedData = $quotedData | % {$_ -replace `
'\G(?<start>^|,)(("(?<output>[^,"\n]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
,'${start}${output}'}
"`r`n---- fixedData ---"
$fixedData
$fixedData | Out-File e:\test.csv -Encoding ascii -Force
"`r`n---- e:\test.csv ---"
Get-Content e:\test.csv
Test Output
---- data ---
id string notes number
-- ------ ----- ------
1 hello world. classic 123
2 a comma, is in here test data 1 345
3 ,a comma, is in here test data 2 346
4 a comma, is in here, test data 3 347
5 a comma, is in here, test data 4... 347
6 hello world2. classic 123
---- quotedData ---
"id","string","notes","number"
"1","hello world.","classic","123"
"2","a comma, is in here","test data 1","345"
"3",",a comma, is in here","test data 2","346"
"4","a comma, is in here,","test data 3","347"
"5","a comma, is in here,","test data 4
with a newline","347"
"6","hello world2.","classic","123"
---- fixedData ---
id,string,notes,number
1,hello world.,classic,123
2,"a comma, is in here",test data 1,345
3,",a comma, is in here",test data 2,346
4,"a comma, is in here,",test data 3,347
5,"a comma, is in here,","test data 4
with a newline","347"
6,hello world2.,classic,123
---- e:\test.csv ---
id,string,notes,number
1,hello world.,classic,123
2,"a comma, is in here",test data 1,345
3,",a comma, is in here",test data 2,346
4,"a comma, is in here,",test data 3,347
5,"a comma, is in here,","test data 4
with a newline","347"
6,hello world2.,classic,123
This is pretty similar to the accepted answer but it helps to prevent unwanted removal of "real" quotes.
$delimiter = ','
Get-Process | ConvertTo-Csv -Delimiter $delimiter -NoTypeInformation | foreach { $_ -replace '^"','' -replace "`"$delimiter`"",$delimiter -replace '"$','' }
This will do the following:
Remove quotes that begin a line
Remove quotes that end a line
Replace quotes that wrap a delimiter with the delimiter alone.
Therefore, the only way this would go wrong is if one of the values actually contained not only quotes, but specifically a quote-delimiter-quote sequence, which hopefully should be pretty uncommon.
Once the file is generated, you can run
set-content FILENAME.csv ((get-content FILENAME.csv) -replace '"')
Depending on how pathological (or "full-featured") your CSV data is, one of the posted solutions will already work.
The solution posted by Kory Gill is almost perfect - the only issue remaining is that quotes are removed also for cells containing the line separator \r\n, which is causing issues in many tools.
The solution is adding a newline to the character class expression:
$fixedData = $quotedData | % {$_ -replace `
'\G(?<start>^|,)(("(?<output>[^,"\n]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
,'${start}${output}'}
I wrote this for my needs:
function ConvertTo-Delimited {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
        [psobject[]]$InputObject,
        [string]$Delimiter='|',
        [switch]$ExcludeHeader
    )
    Begin {
        if ( $ExcludeHeader -eq $false ) {
            @(
                $InputObject[0].PsObject.Properties | `
                    Select-Object -ExpandProperty Name
            ) -Join $Delimiter
        }
    }
    Process {
        foreach ($item in $InputObject) {
            @(
                $item.PsObject.Properties | `
                    Select-Object Value | `
                    ForEach-Object {
                        if ( $null -ne $_.Value ) {$_.Value.ToString()}
                        else {''}
                    }
            ) -Join $Delimiter
        }
    }
    End {}
}
Usage:
$Data = @(
    [PSCustomObject]@{
        A = $null
        B = Get-Date
        C = $null
    }
    [PSCustomObject]@{
        A = 1
        B = Get-Date
        C = 'Lorem'
    }
    [PSCustomObject]@{
        A = 2
        B = Get-Date
        C = 'Ipsum'
    }
    [PSCustomObject]@{
        A = 3
        B = $null
        C = 'Lorem Ipsum'
    }
)
# with headers
PS> ConvertTo-Delimited $Data
A|B|C
1|7/17/19 9:07:23 PM|Lorem
2|7/17/19 9:07:23 PM|Ipsum
||
# without headers
PS> ConvertTo-Delimited $Data -ExcludeHeader
1|7/17/19 9:08:19 PM|Lorem
2|7/17/19 9:08:19 PM|Ipsum
||
Here's another approach:
Get-Process | ConvertTo-Csv -NoTypeInformation -Delimiter "," |
foreach { $_ -replace '^"|"$|"(?=,)|(?<=,)"','' }
This replaces matches with the empty string, in each line. Breaking down the regex above:
| is like an OR, used to unite the following 4 sub-regexes
^" matches quotes in the beginning of the line
"$ matches quotes in the end of the line
"(?=,) matches quotes that are immediately followed by a comma
(?<=,)" matches quotes that are immediately preceded by a comma
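As a quick check of that breakdown, the pattern can be applied to a single made-up CSV line. In this sketch, quotes at the line edges and next to commas are removed, while the doubled quotes inside the second field are left as they are.

```powershell
# Made-up line; the second field contains escaped (doubled) quotes.
$line = '"1","a ""quoted"" word","x"'

$unquoted = $line -replace '^"|"$|"(?=,)|(?<=,)"', ''
$unquoted
```

Note that a field containing the delimiter itself would still lose its quotes with this pattern, so it suits data known to be free of embedded commas.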
I found that Kory's answer didn't work for the case where the original string included more than one blank field in a row. I.e. "ABC",,"0" was fine but "ABC",,,"0" wasn't handled properly. It stopped replacing quotes after the ",,,". I fixed it by adding "|(?<output>)" near the end of the first parameter, like this:
% {$_ -replace `
'\G(?<start>^|,)(("(?<output>[^,"]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$))|(?<output>))', `
'${start}${output}'}
I haven't spent much time looking for removing the quotes. But, here is a workaround.
get-process | Export-Csv -NoTypeInformation -Verbose -Path $env:temp\test.csv
$csv = Import-Csv -Path $env:temp\test.csv
This is a quick workaround and there may be a better way to do this.
A slightly modified variant of JPBlanc's answer:
I had an existing csv file which looked like this:
001,002,003
004,005,006
I wanted to export only the first and third column to a new csv file. And for sure I didn't want any quotes ;-)
It can be done like this:
Import-Csv -Path .\source.csv -Delimiter ',' -Header A,B,C | select A,C | ConvertTo-Csv -NoTypeInformation -Delimiter ',' | % {$_ -replace '"',''} | Out-File -Encoding utf8 .\target.csv
Couldn't find an answer to a similar question so I'm posting what I've found here...
For exporting as Pipe Delimited with No Quotes for string qualifiers, use the following:
$objtable | convertto-csv -Delimiter "|" -notypeinformation | select -Skip $headers | % { $_ -replace '"\|"', "|"} | % { $_ -replace '""', '"'} | % { $_ -replace "^`"",''} | % { $_ -replace "`"$",''} | out-file "$OutputPath$filename" -fo -en ascii
This was the only thing I could come up with that could handle quotes and commas within the text; especially things like a quote and comma next to each other at the beginning or ending of a text field.
This function takes a powershell csv object from the pipeline and outputs like convertto-csv but without adding quotes (unless needed).
function convertto-unquotedcsv {
    param([Parameter(ValueFromPipeline=$true)]$csv, $delimiter=',', [switch]$noheader=$false)
    begin {
        $NeedQuotesRex = "($([regex]::escape($delimiter))|[\n\r\t])"
        if ($noheader) { $names = @($true) } else { $names = @($false) }
    }
    process {
        $psop = $_.psobject.properties
        if (-not $names) {
            $names = $psop.name | % {if ($_ -match $NeedQuotesRex) {'"' + $_ + '"'} else {$_}}
            $names -join $delimiter # unquoted csv header
        }
        $values = $psop.value | % {if ($_ -match $NeedQuotesRex) {'"' + $_ + '"'} else {$_}}
        $values -join $delimiter # unquoted csv line
    }
    end {
    }
}
$names gets an array of noteproperty names and $values gets an array of noteproperty values. It took that special step to output the header. The process block gets the csv object one piece at a time.
Here is a test run
$delimiter = ','; $csvData = @"
id,string,notes,"points per 1,000",number
4,"a delimiter$delimiter is in here,","test data 3",1,348
5,"a comma, is in here,","test data 4`r`nwith a newline",0.5,347
6,hello world2.,classic,"3,000",123
"@
$csvdata | convertfrom-csv | sort number | convertto-unquotedcsv -delimiter $delimiter
id,string,notes,"points per 1,000",number
6,hello world2.,classic,"3,000",123
5,"a comma, is in here,","test data 4
with a newline",0.5,347
4,"a delimiter, is in here,",test data 3,1,348

I need help formatting output with PowerShell's Out-File cmdlet

I have a series of documents that are going through the following function designed to count word occurrences in each document. This function works fine outputting to the console, but now I want to generate a text file containing the information, with the file name prepended to each word in the list.
My current console output is:
"processing document1 with x unique words occuring as follows"
"word1 12"
"word2 8"
"word3 3"
"word4 4"
"word5 1"
I want a delimited file in this format:
document1;word1;12
document1;word2;8
document1;word3;3
document1;word4;4
document1;word1;1
document2;word1;16
document2;word2;11
document2;word3;9
document2;word4;9
document2;word1;13
While the function below gets me the lists of words and occurrences, I'm having a hard time figuring out where or how to insert the filename variable so that it prints at the head of each line. MSDN has been less than helpful, and most of the places I try to insert the variable result in errors (see below).
function Count-Words ($docs) {
    $document = get-content $docs
    $document = [string]::join(" ", $document)
    $words = $document.split(" `t",[stringsplitoptions]::RemoveEmptyEntries)
    $uniq = $words | sort -uniq
    $words | % {$wordhash=@{}} {$wordhash[$_] += 1}
    Write-Host $docs "contains" $wordhash.psbase.keys.count "unique words distributed as follows."
    $frequency = $wordhash.psbase.keys | sort {$wordhash[$_]}
    -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File c:\out-file-test.txt -append
    $grouped = $words | group | sort count
}
Do I need to create a string to pass to the out-file cmdlet? is this just something I've been putting in the wrong place on the last few tries? I'd like to understand WHY it's going in a particular place as well. Right now I'm just guessing, because I know I have no idea where to put the out-file to achieve my selected results.
I've tried formatting my command per powershell help, using -$docs and -FilePath, but each time I add anything to the out-file above that runs successfully, I get the following error:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument "c:\out-file-test.txt" does not bel
ong to the set "unicode,utf7,utf8,utf32,ascii,bigendianunicode,default,oem" specified by the ValidateSet attribute. Sup
ply an argument that is in the set and then try the command again.
At C:\c.ps1:39 char:71
+ -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File <<<< -$docs -width 1024 c:\users\x46332\co
unt-test.txt -append
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand
I rewrote most of your code. You should utilize objects to make it easier formatting the way you want. This one splits on "space" and groups words together. Try this:
Function Count-Words ($paths) {
    $output = @()
    foreach ($path in $paths) {
        $file = Get-ChildItem $path
        ((Get-Content $file) -join " ").Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries) | Group-Object | Select-Object -Property @{n="FileName";e={$file.BaseName}}, Name, Count | % {
            $output += "$($_.FileName);$($_.Name);$($_.Count)"
        }
    }
    $output | Out-File test-out2.txt -Append
}
$filepaths = ".\test.txt", ".\test2.txt"
Count-Words -paths $filepaths
It outputs in the format you asked for (document;word;count). If you want the document name to include the extension, change $file.BaseName to $file.Name. Test output:
test;11;1
test;9;2
test;13;1
test2;word11;5
test2;word1;4
test2;12;1
test2;word2;2
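The core of that approach can be tried without any files. The sketch below uses made-up text and a hypothetical `$documentName`, and splits with the -split operator on whitespace; this is a substitution for the .Split(" `t", ...) call in the question, which as far as I know is resolved to a different overload in PowerShell 7 and may then split on the two-character string rather than on each character.

```powershell
$documentName = 'document1'   # hypothetical document name
$text = "word1 word2`tword1 word3  word1 word2"   # made-up content

$lines = ($text -split '\s+') |
    Where-Object { $_ } |                      # drop empty entries
    Group-Object |                             # one group per distinct word
    Sort-Object Count -Descending |
    ForEach-Object { '{0};{1};{2}' -f $documentName, $_.Name, $_.Count }

$lines
```

Building each line as a single string first means Out-File only has to write the strings it receives, for example `$lines | Out-File -FilePath .\wordcounts.txt -Append` (the file name here is only an example).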
Slightly different approach:
function Get-WordCounts ($doc)
{
    $text_ = [IO.File]::ReadAllText($doc.fullname)
    $WordHash = @{}
    $text_ -split '\b' -match '\w+'|
        foreach {$WordHash[$_]++}
    $WordHash.GetEnumerator() |
        foreach {
            New-Object PSObject -Property @{
                Word = $_.Key
                Count = $_.Value
            }
        }
}
$docs = gci c:\testfiles\*.txt |
    sort name
&{
    foreach ($doc in dir $docs)
    {
        Get-WordCounts $doc |
            sort Count -Descending |
            foreach {
                (&{$doc.Name;$_.Word;$_.Count}) -join ';'
            }
    }
} | out-file c:\somedir\wordcounts.txt
Try this:
$docs = @("document1", "document2", ...)
$docs | % {
    $doc = $_
    Get-Content $doc `
        | % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
        | Group-Object `
        | select @{n="Document";e={$doc}}, Name, Count
} | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
If you want to make this into a function you could do it like this:
function Count-Words($docs) {
    foreach ($doc in $docs) {
        Get-Content $doc `
            | % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
            | Group-Object `
            | select @{n="Document";e={$doc}}, Name, Count
    }
}
$files = @("document1", "document2", ...)
Count-Words $files | Export-CSV output.csv -Delimiter ";" -NoTypeInfo