PowerShell - transpose results from a hashtable - powershell

I need to check the warranty of many servers, but the output returned by the module I found at https://www.powershellgallery.com/packages/HPWarranty/2.6.2 seems to be a hashtable, and the first column contains what I want to be my column headers.
The script below returns the following, with the same five fields repeating every five rows (output1.csv):
#TYPE System.Management.Automation.PSCustomObject
"Component","Codecount"
"SerialNumber","CZ36092P5H"
"ProductNumber","727021-B21"
"OverallEntitlementStartDate","2016-03-04"
"OverallEntitlementEndDate","2019-04-02"
"ActiveEntitlement","true"
"SerialNumber","CZ36092P5K"
"ProductNumber","727021-B21"
"OverallEntitlementStartDate","2016-03-04"
"OverallEntitlementEndDate","2019-04-02"
"ActiveEntitlement","true"
How can I transpose the output so that SerialNumber, ProductNumber, OverallEntitlementStartDate, OverallEntitlementEndDate and ActiveEntitlement are the columns?
# variables
$dest_path = "C:\Scripts\HPE\HPWarranty"
$export_date = Get-Date -Format o | ForEach-Object {$_ -replace ":", "-"}
$myScriptName = $MyInvocation.MyCommand.Name
$transcriptPath = $dest_path + "\" + $myScriptName + "_transcript_" + $export_date + ".txt"
$csvPath = $dest_path + "\" + "hpe_list1.csv"
#Start transcript of script activities and set transcript location
start-transcript -append -path $transcriptPath | Out-Null
# import serials & part numbers to be processed
$csv_info = Import-Csv $csvPath
foreach ($line in $csv_info) {
$hash = (Get-HPEntWarrantyEntitlement -ProductNumber $line.ProductNumber -SerialNumber $line.SerialNumber)
&{$hash.getenumerator() |
ForEach-Object {new-object psobject -Property @{Component = $_.name;Codecount=$_.value}}
} | Export-Csv "C:\Scripts\HPE\HPWarranty\output1.csv" -Append
}
# Stop Transcript
Stop-Transcript | Out-Null
hpe_list1.csv that the script processes contains the details for two servers:
ProductNumber,SerialNumber
727021-B21,CZ36092P5H
727021-B21,CZ36092P5K

Cast the output hashtable to a [pscustomobject]:
$WarrantyInfo = foreach ($line in $csv_info) {
[pscustomobject](Get-HPEntWarrantyEntitlement -ProductNumber $line.ProductNumber -SerialNumber $line.SerialNumber)
}
$WarrantyInfo | Export-Csv "C:\Scripts\HPE\HPWarranty\output1.csv"
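A minimal sketch of the same cast that runs without the HPWarranty module. It assumes the cmdlet returns one hashtable per server; the `$results` array below is a stand-in built from the values in the question. Because hashtable keys are unordered, the sketch also names the columns explicitly with Select-Object, which is an addition on my part rather than something the answer requires.

```powershell
# Stand-in for what Get-HPEntWarrantyEntitlement is assumed to return:
# one hashtable per server (values copied from the question's output).
$results = @(
    @{ SerialNumber = 'CZ36092P5H'; ProductNumber = '727021-B21'; OverallEntitlementStartDate = '2016-03-04'; OverallEntitlementEndDate = '2019-04-02'; ActiveEntitlement = 'true' }
    @{ SerialNumber = 'CZ36092P5K'; ProductNumber = '727021-B21'; OverallEntitlementStartDate = '2016-03-04'; OverallEntitlementEndDate = '2019-04-02'; ActiveEntitlement = 'true' }
)

# Hashtable keys have no guaranteed order, so fix the column order here.
$columns = 'SerialNumber', 'ProductNumber', 'OverallEntitlementStartDate', 'OverallEntitlementEndDate', 'ActiveEntitlement'

# Each hashtable becomes one object, i.e. one CSV row.
$rows = foreach ($hash in $results) {
    [pscustomobject]$hash | Select-Object -Property $columns
}

# ConvertTo-Csv shows the text that Export-Csv would write to disk.
$csvLines = $rows | ConvertTo-Csv -NoTypeInformation
$csvLines
```

Swapping ConvertTo-Csv for Export-Csv with a path writes the same rows to a file.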

Related

Problem error checking -match and IF statement

The script below imports a CSV exported from our MIS system so that we can upload it to Google Classroom. In turn, this will allow bulk creation of classes with our custom class names, based on word matching against regex.csv.
As you can see from the exporteddata.csv, 10552 is blank.
Is it possible for this to be omitted from the final export and added into its own errors.csv file?
Any help would be great!
Script.ps1
$data = Import-Csv "$PSScriptRoot\data.csv" -Delimiter ','
$patterns = Import-Csv "$PSScriptRoot\Regex\regex.csv" -Delimiter ','
$interimexportedData = "$PSScriptRoot\classesinterim.csv"
$exportclasses = "$PSScriptRoot\exporteddata.csv"
## Imports the initial SIMS export of classes and creates a 'preferred' name for the class, then exports to a CSV.
$data | Select-Object *,@{Name='preference'; Expression={
foreach ($p in $patterns) {
if ($_.title -match $p.'regex_key') {
$p.preference + " " + "-" + " " + $_.title
return
}
}
}
} | Select-Object -property sourcedID, preference | Export-Csv $interimexportedData -NoTypeInformation
## The below re-imports the csv file and renames the header
Import-Csv $interimexportedData |
Select-Object -property sourcedID, @{ expression={$_.preference}; label='title' } |
Export-Csv -NoTypeInformation $exportclasses
## Delete the classesinterim.csv from the folder
Remove-Item $interimexportedData
data.csv
"sourcedId","title"
9443,"10A/BS1"
9444,"10A/FR1"
10598,"10A/Ft"
9445,"10A/GG1"
9446,"10A/HI1"
9447,"10A/ME1"
9451,"10A/ME2"
9448,"10A/RM1"
9449,"10A/SCTrX"
9452,"10A/SCTrY"
10552,"10A/SOS"
9450,"10A/SP1"
exporteddata.csv
"sourcedId","title"
"9443","Business Studies - 10A/BS1"
"9444","French - 10A/FR1"
"10598","Form Time - 10A/Ft"
"9445","Geography - 10A/GG1"
"9446","History - 10A/HI1"
"9447","Media Studies - 10A/ME1"
"9451","Media Studies - 10A/ME2"
"9448","Resistant Materials - 10A/RM1"
"9449","Science - 10A/SCTrX"
"9452","Science - 10A/SCTrY"
"10552",""
regex.csv
"regex_key","preference"
BS,"Business Studies"
FR, "French"
Ar,"Art"
Bt,"Eng & Maths Booster"
Bs,"Business"
Cn,"Construction"
Co,"Computing"
Use Where-Object to filter out objects with a blank value:
$data | Select-Object *,@{Name='preference'; Expression={
foreach ($p in $patterns) {
if ($_.title -match $p.'regex_key') {
$p.preference + " " + "-" + " " + $_.title
return
}
}
}
# preference is $null (not '') when no pattern matches, so test for any non-empty value
} | Where-Object { $_.preference }
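The question also asks for the unmatched rows to go into their own errors.csv. One way to get both sets in a single pass is the `Where()` collection method in 'Split' mode (PowerShell 4 and later). This is a sketch with made-up sample rows standing in for the Select-Object output, and the temp-folder output paths are placeholders, not the asker's real paths.

```powershell
# Sample rows standing in for the Select-Object output; 10552 matched no pattern.
$rows = @(
    [pscustomobject]@{ sourcedId = '9443';  preference = 'Business Studies - 10A/BS1' }
    [pscustomobject]@{ sourcedId = '10552'; preference = $null }
    [pscustomobject]@{ sourcedId = '9444';  preference = 'French - 10A/FR1' }
)

# 'Split' returns two collections: rows that pass the test, then the rest.
$matched, $unmatched = $rows.Where({ $_.preference }, 'Split')

# Placeholder output folder; replace with $PSScriptRoot or similar.
$outDir = [System.IO.Path]::GetTempPath()
$matched   | Export-Csv (Join-Path $outDir 'exporteddata.csv') -NoTypeInformation
$unmatched | Export-Csv (Join-Path $outDir 'errors.csv') -NoTypeInformation
```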

How to export two variables into same CSV as joined via PowerShell?

I have a PowerShell script employing poshwsus module like below:
$FileOutput = "C:\WSUSReport\WSUSReport.csv"
$ProcessLog = "C:\WSUSReport\QueryLog2.txt"
$WSUSServers = "C:\WSUSReport\Computers.txt"
$WSUSPort = "8530"
import-module poshwsus
ForEach ($Server in Get-Content $WSUSServers)
{
& connect-poshwsusserver $Server -port $WSUSPort | out-file $ProcessLog -append
$r1 = & Get-PoshWSUSClient | select @{name="Computer";expression={$_.FullDomainName}},@{name="LastUpdated";expression={if ([datetime]$_.LastReportedStatusTime -gt [datetime]"1/1/0001 12:00:00 AM") {$_.LastReportedStatusTime} else {$_.LastSyncTime}}}
$r2 = & Get-PoshWSUSUpdateSummaryPerClient -UpdateScope (new-poshwsusupdatescope) -ComputerScope (new-poshwsuscomputerscope) | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount
}
What I need to do is export a CSV that combines both results, with these columns (like an "inner join"):
Computer, NeededCount, DownloadedCount, NotApplicableCount, NotInstalledCount, InstalledCount, FailedCount, LastUpdated
I tried the line below inside the foreach loop, but it didn't work as I expected.
$r1 + $r2 | export-csv -NoTypeInformation -append $FileOutput
I would appreciate any help or advice.
EDIT --> The output I've got:
ComputerName LastUpdate
X A
Y B
X
Y
So no error, first two rows from $r2, last two rows from $r1, it is not joining the tables as I expected.
Thanks!
I've found my guidance in this post: Inner Join in PowerShell (without SQL)
Modified my query accordingly like below, works like a charm.
$FileOutput = "C:\WSUSReport\WSUSReport.csv"
$ProcessLog = "C:\WSUSReport\QueryLog.txt"
$WSUSServers = "C:\WSUSReport\Computers.txt"
$WSUSPort = "8530"
import-module poshwsus
function Join-Records($tab1, $tab2){
$prop1 = $tab1 | select -First 1 | % {$_.PSObject.Properties.Name} #properties from t1
$prop2 = $tab2 | select -First 1 | % {$_.PSObject.Properties.Name} #properties from t2
$join = $prop1 | ? {$prop2 -Contains $_}
$unique1 = $prop1 | ?{ $join -notcontains $_}
$unique2 = $prop2 | ?{ $join -notcontains $_}
if ($join) {
$tab1 | % {
$t1 = $_
$tab2 | % {
$t2 = $_
foreach ($prop in $join) {
if (!$t1.$prop.Equals($t2.$prop)) { return; }
}
$result = @{}
$join | % { $result.Add($_,$t1.$_) }
$unique1 | % { $result.Add($_,$t1.$_) }
$unique2 | % { $result.Add($_,$t2.$_) }
[PSCustomObject]$result
}
}
}
}
ForEach ($Server in Get-Content $WSUSServers)
{
& connect-poshwsusserver $Server -port $WSUSPort | out-file $ProcessLog -append
$r1 = & Get-PoshWSUSClient | select @{name="Computer";expression={$_.FullDomainName}},@{name="LastUpdated";expression={if ([datetime]$_.LastReportedStatusTime -gt [datetime]"1/1/0001 12:00:00 AM") {$_.LastReportedStatusTime} else {$_.LastSyncTime}}}
$r2 = & Get-PoshWSUSUpdateSummaryPerClient -UpdateScope (new-poshwsusupdatescope) -ComputerScope (new-poshwsuscomputerscope) | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount
Join-Records $r1 $r2 | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount, LastUpdated | export-csv -NoTypeInformation -append $FileOutput
}
I think this could be made simpler. Since Select-Object's -Property parameter accepts an array of values, you can create an array of the properties you want to display. The array can be constructed by comparing your two objects' properties and outputting a unique list of those properties.
$selectProperties = $r1.psobject.properties.name | Compare-Object $r2.psobject.properties.name -IncludeEqual -PassThru
$r1,$r2 | Select-Object -Property $selectProperties
Compare-Object by default will output only differences between a reference object and a difference object. Adding the -IncludeEqual switch displays different and equal comparisons. Adding the -PassThru parameter outputs the actual objects that are compared rather than the default PSCustomObject output.
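For larger result sets, the nested-loop Join-Records function above compares every pair of rows. A common alternative is to index one side in a hashtable keyed on the join column. The following is only a sketch: the sample rows are made up, and it assumes Computer is the sole join key and is unique within $r1.

```powershell
# Made-up sample data standing in for $r1 and $r2.
$r1 = @(
    [pscustomobject]@{ Computer = 'X'; LastUpdated = 'A' }
    [pscustomobject]@{ Computer = 'Y'; LastUpdated = 'B' }
)
$r2 = @(
    [pscustomobject]@{ Computer = 'X'; NeededCount = 3; FailedCount = 0 }
    [pscustomobject]@{ Computer = 'Y'; NeededCount = 1; FailedCount = 2 }
)

# Index $r1 by the join key (assumes each Computer appears once).
$lastUpdatedByComputer = @{}
foreach ($row in $r1) {
    $lastUpdatedByComputer[$row.Computer] = $row.LastUpdated
}

# Inner join: keep $r2 rows that have a match in $r1 and append LastUpdated.
$joined = foreach ($row in $r2) {
    if ($lastUpdatedByComputer.ContainsKey($row.Computer)) {
        $row | Select-Object *, @{ Name = 'LastUpdated'; Expression = { $lastUpdatedByComputer[$_.Computer] } }
    }
}
$joined
```

Piping `$joined` to Export-Csv then produces one row per computer with the columns from both sides.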

Retrieve data from last line in vmware.log file?

I currently have a script that retrieves the last modified date of the .vmx in a VM's datastore in vCenter. I need to make changes to instead use and display the last date in the vmware.log file (located in the same datastore as the .vmx)
I'm not sure how to grab that line and convert it to a XX/XX/XXXX format. In the log file, it shows it as Dec 23 10 for example. If this is not possible, no worries. I just need to pull the last line in the log file and export it to a .csv file. Below is my current code:
add-pssnapin VMware.VimAutomation.Core
# ---------- Only modify the fields in this area -------------
$vCenter = 'qlab-copsmgr' #name of the vCenter
$dataCenter = 'Fly-away Kit' #name of the DataCenter
$outputFile = $vCenter + '-LastDateUsed.csv' #desired output file name
# ---------- No modification is needed in the below code. Do not edit -------------
$columnName = "Name,DataStore,Date Last Used" | Out-File .\$OutputFile -Encoding ascii
Connect-VIServer $vCenter -WarningAction SilentlyContinue
$vmList = Get-VM | where { $_.PowerState -eq "PoweredOff"} | select Name
$vmList = $vmList -replace 'Name : ', '' -replace '@{Name=', '' -replace '}', ''
ForEach ($VM in $vmList)
{
# Get configuration and path to vmx file
$VMconfig = Get-VM $VM | Get-View | select config
$VMXpath = $VMconfig.config.files.VMpathName
# Remove and/or replace unwanted strings
$VMXpath = $VMXpath -replace '\[','' -replace '\] ','\' -replace '@{Filename=','/' -replace '}','' -replace '/','\'
# List the vmx file in the datastore
$VMXinfo = ls vmstores:\$VCenter@443\$DataCenter\$VMXpath | Where {$_.LastWriteTime} | select -first 1 | select FolderPath, LastWriteTime
# Remove and/or replace unwanted strings
$VMXinfo = $VMXinfo -replace 'DatastoreFullPath=', '' -replace '@{', '' -replace '}', '' -replace ';', ',' -replace 'LastWriteTime=', ''
# Output vmx information to .csv file
$output = $VM + ', ' + $VMXinfo
$output
echo $output >> $OutputFile
}
I also needed to pull the last event from the vmware.log file in order to backtrack the power off time for VMs where there is no vCenter event history. I looked at file timestamps but found that some VM processes and possibly backup solutions can make them useless.
I tried reading the file in place but ran into issues with the PSDrive type not supporting Get-Content. So, for better or worse, I started with one of LucD's scripts, the 'Retrieve the logs' script from http://www.lucd.info/2011/02/27/virtual-machine-logging/, which pulls a VM's vmware.log file and copies it to local storage. I then modified it to copy the vmware.log file to a local temp folder, read the last line from the file, delete the file, and return the last line of the log as a PS object.
Note: this is slow and I'm sure my hacks to LucD's script are not elegant, but it does work and I hope it helps someone.
Note: This converts the time value from the log to a PS date object by simply piping the timestamp string from the file into Get-Date. I've read that this does not work as expected with non-US date formatting. Those outside the US might want to look into this, or just pass the raw timestamp string from the log instead of converting it.
#Examples:
#$lastEventTime = (Get-VM -Name "SomeVM" | Get-VMLogLastEvent).EventTime
#$lastEventTime = Get-VMLogLastEvent -VM "SomeVM" -Path "C:\alternatetemp\"
function Get-VMLogLastEvent{
param(
[parameter(Mandatory=$true,ValueFromPipeline=$true)][PSObject[]]$VM,
[string]$Path=$env:TEMP
)
process{
$report = @()
foreach($obj in $VM){
if($obj.GetType().Name -eq "string"){
$obj = Get-VM -Name $obj
}
$logpath = ($obj.ExtensionData.LayoutEx.File | ?{$_.Name -like "*/vmware.log"}).Name
$dsName = $logPath.Split(']')[0].Trim('[')
$vmPath = $logPath.Split(']')[1].Trim(' ')
$ds = Get-Datastore -Name $dsName
$drvName = "MyDS" + (Get-Random)
$localLog = $Path + "\" + $obj.Name + ".vmware.log"
New-PSDrive -Location $ds -Name $drvName -PSProvider VimDatastore -Root '\' | Out-Null
Copy-DatastoreItem -Item ($drvName + ":" + $vmPath) -Destination $localLog -Force:$true
Remove-PSDrive -Name $drvName -Confirm:$false
$lastEvent = Get-Content -Path $localLog -Tail 1
Remove-Item -Path $localLog -Confirm:$false
$row = "" | Select VM, EventType, Event, EventTime
$row.VM = $obj.Name
($row.EventTime, $row.EventType, $row.Event) = $lastEvent.Split("|")
$row.EventTime = $row.EventTime | Get-Date
$report += $row
}
$report
}
}
That should cover your request, but to expound further on why I needed the detail, which reading between the lines may also benefit you, I'll continue.
I inherited hundreds of legacy VMs that have been powered off from various past acquisitions and divestitures and many of which have been moved between vCenter instances losing all event log detail. When I started my cleanup effort in just one datacenter I had over 60TB of powered off VMs. With the legacy nature of these there was also no detail available on who owned or had any knowledge of these old VMs.
For this I hacked another script I found, also from LucD here: https://communities.vmware.com/thread/540397.
This will take in all the powered off VMs, attempt to determine the time powered off via vCenter event history. I modified it to fall back to the above Get-VMLogLastEvent function to get the final poweroff time of the VM if event log detail is not available.
Error catching could be improved - this will error on VMs where for one reason or another there is no vmware.log file. But quick and dirty I've found this to work and provides the detail on what I need for over 90%.
Again, this relies on the above function, and for me at least the errors simply fall through, passing along null values. One could probably remove the errors by checking that vmware.log exists before attempting to copy it, though this would add a touch more latency because of the slow PSDrive interface to datastores.
$Report = @()
$VMs = Get-VM | Where {$_.PowerState -eq "PoweredOff"}
$Datastores = Get-Datastore | Select Name, Id
$PowerOffEvents = Get-VIEvent -Entity $VMs -MaxSamples ([int]::MaxValue) | where {$_ -is [VMware.Vim.VmPoweredOffEvent]} | Group-Object -Property {$_.Vm.Name}
foreach ($VM in $VMs) {
$lastPO = ($PowerOffEvents | Where { $_.Group[0].Vm.Vm -eq $VM.Id }).Group | Sort-Object -Property CreatedTime -Descending | Select -First 1
$lastLogTime = "";
# If no event log detail, revert to vmware.log last entry which takes more time...
if (($lastPO.PoweredOffTime -eq "") -or ($lastPO.PoweredOffTime -eq $null)){
$lastLogTime = (Get-VMLogLastEvent -VM $VM).EventTime
}
$row = "" | select VMName,Powerstate,OS,Host,Cluster,Datastore,NumCPU,MemMb,DiskGb,PoweredOffTime,PoweredOffBy,LastLogTime
$row.VMName = $vm.Name
$row.Powerstate = $vm.Powerstate
$row.OS = $vm.Guest.OSFullName
$row.Host = $vm.VMHost.name
$row.Cluster = $vm.VMHost.Parent.Name
$row.Datastore = $Datastores | Where{$_.Id -eq ($vm.DatastoreIdList | select -First 1)} | Select -ExpandProperty Name
$row.NumCPU = $vm.NumCPU
$row.MemMb = $vm.MemoryMB
$row.DiskGb = Get-HardDisk -VM $vm | Measure-Object -Property CapacityGB -Sum | select -ExpandProperty Sum
$row.PoweredOffTime = $lastPO.CreatedTime
$row.PoweredOffBy = $lastPO.UserName
$row.LastLogTime = $lastLogTime
$report += $row
}
# Output to screen
$report | Sort Cluster, Host, VMName | Select VMName, Cluster, Host, NumCPU, MemMb, @{N='DiskGb';E={[math]::Round($_.DiskGb,2)}}, PoweredOffTime, PoweredOffBy | ft -a
# Output to CSV - change path/filename as appropriate
$report | Sort Cluster, Host, VMName | Export-Csv -Path "output\Powered_Off_VMs_Report.csv" -NoTypeInformation -UseCulture
Cheers!
I pray this pays back some of the karma I've used.
Meyeaard
I have made a script that checks line by line and, if a matching string is found, changes it to the desired format:
# Example input; you can instead use Get-Content <path> on a text file and assign the result to $lines
$lines = @"
ernfoewnfnsf
ernfoewnfnsf
Dec 23 10 sgdsgdfgsdadasd
"@ -split "\r?\n"
# Checks line by line for a match at the start of the line: one capital letter, two lowercase letters, a space, two digits, a space, two digits, a space
$lines | ForEach-Object{
if ($_ -match "^[A-Z][a-z]{2}\s\d{2}\s\d{2}\s")
{
$match = [convert]::ToDateTime($matches[0])
$_ -replace $matches[0], "$($match.ToShortDateString()) " | out-file { PATH } -APPEND
}
else
{
$_ | out-file { PATH } -APPEND
}
}
Just replace { PATH } with the path to your output file and this should work for you.
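If the conversion should not depend on the machine's regional settings, the timestamp can be parsed with an explicit format and the invariant culture. This sketch assumes, as the question does, that the three fields in "Dec 23 10" are month, day, and two-digit year; if the third field is actually something else (for example the hour), the format string would need to change.

```powershell
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$line = 'Dec 23 10 sgdsgdfgsdadasd'

# Assumed layout: three-letter month, two-digit day, two-digit year.
if ($line -match '^[A-Z][a-z]{2} \d{2} \d{2}') {
    $date = [datetime]::ParseExact($Matches[0], 'MMM dd yy', $culture)
    # Format as MM/dd/yyyy regardless of the current culture.
    $formatted = $date.ToString('MM/dd/yyyy', $culture)
    $formatted
}
```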

Remove New Line Character from CSV file's string column

I have a CSV file with a string column whose values span multiple lines. I want to collapse those multiple lines into one line.
For example
1, "asdsdsdsds", "John"
2, "dfdhifdkinf
dfjdfgkdnjgknkdjgndkng
dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"
I want my output to be
1, "asdsdsdsds", "John"
2, "dfdhifdkinf dfjdfgkdnjgknkdjgndkng dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"
I want to achieve this output using PowerShell
Thanks.
Building on Ansgar's answer, here's how to do it when:
You don't know the column names
Your CSV file may contain CR or LF independently
(Import-Csv $csvInput) | % {
$line = $_
foreach ($prop in $line.PSObject.Properties) {
$line.($prop.Name) = ($prop.Value -replace '[\r\n]',' ')
}
$line
} | Export-Csv $csvOutput -NoTypeInformation
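One detail of the character class `[\r\n]`: a Windows line break (CR followed by LF) becomes two spaces. If a single space per line break is preferred, the pattern can match CRLF as one unit. The sketch below uses in-memory objects in place of Import-Csv so it runs on its own; the sample rows mirror the question.

```powershell
# In-memory rows standing in for Import-Csv output; row 2 mixes CRLF and LF.
$rows = @(
    [pscustomobject]@{ ID = '1'; Value = 'asdsdsdsds'; Name = 'John' }
    [pscustomobject]@{ ID = '2'; Value = "dfdhifdkinf`r`ndfjdfgkdnjgknkdjgndkng`ndkfdkjfnjdnf"; Name = 'Roy' }
)

$cleaned = foreach ($row in $rows) {
    foreach ($prop in $row.PSObject.Properties) {
        # CRLF, lone CR, or lone LF each become exactly one space.
        $prop.Value = $prop.Value -replace '\r\n?|\n', ' '
    }
    $row
}

$cleaned | ConvertTo-Csv -NoTypeInformation
```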
Try this:
$csv = 'C:\path\to\your.csv'
(Import-Csv $csv -Header 'ID','Value','Name') | % {
$_.Value = $_.Value -replace "`r`n",' '
$_
} | Export-Csv $csv -NoTypeInformation
If your CSV contains headers, remove -Header 'ID','Value','Name' from the import and replace Value with the actual column name.
If you don't want double quotes around the fields, you can remove them by replacing Export-Csv with something like this:
... | ConvertTo-Csv -NoTypeInformation | % { $_ -replace '"' } | Out-File $csv
To remove the header from the output you add another filter before Out-File to skip the first line:
... | select -Skip 1 | Out-File $csv
You can import the csv, do a specialized select, and write the result into a new CSV.
import-csv Before.csv -Header "ID","Change" | Select ID,@{Name="NoNewLines"; Expression={$_.Change -replace "`n"," "}} | export-csv After.csv
The key part is in the select statement, which allows you to pass a specialized hash table (Name is the name of the property, Expression is a scriptblock that computes it).
You may need to fiddle with headers a bit to get the exact output you want.
The problems with Export-CSV are twofold:
Early versions (PowerShell 1 and 2) do not allow you to append data to the CSV
If the data being piped to it contains newline characters, the data is useless in Excel
The solution to both of the above is to use Convertto-CSV instead. Here is a sample:
{bunch of stuff} | ConvertTo-CSV | %{$_ -replace "`n","<NL>"} | %{$_ -replace "`r","<CR>"} >>$AppendFile
Note that this lets you do whatever editing you need on the data (in this case, replacing newline characters) and use redirectors to append.
FYI: I've created a CSV Cleaner: https://stackoverflow.com/a/32016543/361842
This can be used to replace any unwanted characters / should be straight-forward to adapt to your needs.
Code copied below; though I recommend referring to the above thread to see any feedback from others.
clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null
function Clean-CsvStream {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline=$true)]
[string]$CsvRow
,
[Parameter(Mandatory = $false)]
[char]$Delimiter = ','
,
[Parameter(Mandatory = $false)]
[regex]$InvalidCharRegex
,
[Parameter(Mandatory = $false)]
[string]$ReplacementString
)
begin {
[bool]$IsSimple = [string]::IsNullOrEmpty($InvalidCharRegex)
if(-not $IsSimple) {
[System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
[System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
[Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
$Parser.SetDelimiters($Delimiter)
$Parser.HasFieldsEnclosedInQuotes = $true
[long]$seekStart = 0
}
}
process {
if ($IsSimple) {
$CsvRow
} else { #if we're not replacing anything, keep it simple
$seekStart = $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Current)
$writeStream.WriteLine($CsvRow)
$writeStream.Flush()
$seekStart = $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Begin)
write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
}
}
end {
if(-not $IsSimple) {
try {$Parser.Close(); $Parser.Dispose()} catch{}
try {$writeStream.Close(); $writeStream.Dispose()} catch{}
try {$memStream.Close(); $memStream.Dispose()} catch{}
}
}
}
$csv = @(
(new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':'

I need help formatting output with PowerShell's Out-File cmdlet

I have a series of documents that are going through the following function, which is designed to count word occurrences in each document. This function works fine outputting to the console, but now I want to generate a text file containing the information, with the file name prepended to each word in the list.
My current console output is:
"processing document1 with x unique words occuring as follows"
"word1 12"
"word2 8"
"word3 3"
"word4 4"
"word5 1"
I want a delimited file in this format:
document1;word1;12
document1;word2;8
document1;word3;3
document1;word4;4
document1;word1;1
document2;word1;16
document2;word2;11
document2;word3;9
document2;word4;9
document2;word1;13
While the function below gets me the lists of words and occurrences, I'm having a hard time figuring out where or how to insert the filename variable so that it prints at the head of each line. MSDN has been less than helpful, and most of the places I try to insert the variable result in errors (see below).
function Count-Words ($docs) {
$document = get-content $docs
$document = [string]::join(" ", $document)
$words = $document.split(" `t",[stringsplitoptions]::RemoveEmptyEntries)
$uniq = $words | sort -uniq
$words | % {$wordhash=@{}} {$wordhash[$_] += 1}
Write-Host $docs "contains" $wordhash.psbase.keys.count "unique words distributed as follows."
$frequency = $wordhash.psbase.keys | sort {$wordhash[$_]}
-1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File c:\out-file-test.txt -append
$grouped = $words | group | sort count
Do I need to create a string to pass to the out-file cmdlet? is this just something I've been putting in the wrong place on the last few tries? I'd like to understand WHY it's going in a particular place as well. Right now I'm just guessing, because I know I have no idea where to put the out-file to achieve my selected results.
I've tried formatting my command per powershell help, using -$docs and -FilePath, but each time I add anything to the out-file above that runs successfully, I get the following error:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument "c:\out-file-test.txt" does not bel
ong to the set "unicode,utf7,utf8,utf32,ascii,bigendianunicode,default,oem" specified by the ValidateSet attribute. Sup
ply an argument that is in the set and then try the command again.
At C:\c.ps1:39 char:71
+ -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File <<<< -$docs -width 1024 c:\users\x46332\co
unt-test.txt -append
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand
I rewrote most of your code. You should utilize objects to make it easier formatting the way you want. This one splits on "space" and groups words together. Try this:
Function Count-Words ($paths) {
$output = @()
foreach ($path in $paths) {
$file = Get-ChildItem $path
((Get-Content $file) -join " ").Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries) | Group-Object | Select-Object -Property @{n="FileName";e={$file.BaseName}}, Name, Count | % {
$output += "$($_.FileName);$($_.Name);$($_.Count)"
}
}
$output | Out-File test-out2.txt -Append
}
$filepaths = ".\test.txt", ".\test2.txt"
Count-Words -paths $filepaths
It outputs in the format you asked for (document;word;count). If you want the document name to include the extension, change $file.BaseName to $file.Name. Test output:
test;11;1
test;9;2
test;13;1
test2;word11;5
test2;word1;4
test2;12;1
test2;word2;2
Slightly different approach:
function Get-WordCounts ($doc)
{
$text_ = [IO.File]::ReadAllText($doc.fullname)
$WordHash = @{}
$text_ -split '\b' -match '\w+'|
foreach {$WordHash[$_]++}
$WordHash.GetEnumerator() |
foreach {
New-Object PSObject -Property @{
Word = $_.Key
Count = $_.Value
}
}
}
$docs = gci c:\testfiles\*.txt |
sort name
&{
foreach ($doc in dir $docs)
{
Get-WordCounts $doc |
sort Count -Descending |
foreach {
(&{$doc.Name;$_.Word;$_.Count}) -join ';'
}
}
} | out-file c:\somedir\wordcounts.txt
Try this:
$docs = @("document1", "document2", ...)
$docs | % {
$doc = $_
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select @{n="Document";e={$doc}}, Name, Count
} | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
If you want to make this into a function you could do it like this:
function Count-Words($docs) {
foreach ($doc in $docs) {
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select @{n="Document";e={$doc}}, Name, Count
}
}
$files = @("document1", "document2", ...)
Count-Words $files | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
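The answers above share one idea: build the whole "document;word;count" string inside the pipeline, so that Out-File only ever receives finished lines. Here is a self-contained sketch of that idea using in-memory text instead of files; the document names and contents are made up. It uses the -split operator rather than the .Split() method, since -split behaves the same in Windows PowerShell and PowerShell 7.

```powershell
# Made-up documents: name -> text (ordered so the output order is predictable).
$documents = [ordered]@{
    document1 = 'alpha beta alpha gamma alpha beta'
    document2 = 'beta beta delta'
}

$lines = foreach ($name in $documents.Keys) {
    # Split on whitespace and drop empty entries.
    ($documents[$name] -split '\s+') |
        Where-Object { $_ } |
        Group-Object |
        Sort-Object Count -Descending |
        # Prefix every word with its document name while the object is still in scope.
        ForEach-Object { '{0};{1};{2}' -f $name, $_.Name, $_.Count }
}

$lines
```

Piping `$lines` to Out-File with a path then writes the delimited file in one step.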