I'm trying to count the number of words per PDF file in a source folder and export the name and wordcount to a csv. But my output csv seems to count the number of PDFs (123) although the content of my object seems right.
Snippet
$source = 'C:\Data\SCRIPTS\R\TextMining\PDFs'
$results = @{}
Get-ChildItem -Path $source -Filter *.pdf -Recurse | ForEach-Object{
$count = Get-Content $_.FullName | Measure-Object -Word
$results.Add($_.FullName, $count.Words)}
$results
Export-Csv C:\Data\SCRIPTS\R\TextMining\PageClustering\PDFs\PGs\PGs_WC.csv -InputObject $results -notypeinformation
I can display the filename and wordcount to the console but the pipe to csv comes out with errors.
Output
IsReadOnly IsFixedSize IsSynchronized Keys Values SyncRoot Count
FALSE FALSE FALSE System.Collections.Hashtable+KeyCollection System.Collections.Hashtable+ValueCollection System.Object 123
I'm learning to use PS - what am I doing wrong?
Please try the following:
$source = 'C:\Data\SCRIPTS\R\TextMining\PDFs'
$results = @()
Get-ChildItem -Path $source -Filter *.pdf -Recurse | ForEach-Object{
$count = Get-Content $_.FullName | Measure-Object -Word
$results += New-Object PSObject -Property @{
'Name' = $_.FullName
'Wert' = $count.Words
}
}
$results
$results | Export-Csv C:\Data\SCRIPTS\R\TextMining\PageClustering\PDFs\PGs\PGs_WC.csv -notype
Since $Results is a hashtable, you'll want to export the elements inside it, rather than the hashtable itself. In order to do so, you'll need to pipe the Values array to Export-Csv:
$results.Values |Export-Csv C:\Data\SCRIPTS\R\TextMining\PageClustering\PDFs\PGs\PGs_WC.csv -NoTypeInformation
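In the question's snippet the hashtable values are bare word counts, so exporting only the values would drop the file names. A minimal sketch of one alternative, assuming you want one row per file with both columns: enumerate the entries with GetEnumerator() and map Key and Value to named columns. The sample paths, counts, column names, and output file name below are made up for illustration.

```powershell
# Hypothetical sample data standing in for the $results hashtable from the question
$results = @{
    'C:\PDFs\a.pdf' = 120
    'C:\PDFs\b.pdf' = 45
}

# Hypothetical output path in the temp folder
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'PGs_WC_demo.csv'

# GetEnumerator() emits one entry per key, each exposing Key and Value properties
$results.GetEnumerator() |
    Select-Object @{ Name = 'Name';  Expression = { $_.Key } },
                  @{ Name = 'Words'; Expression = { $_.Value } } |
    Export-Csv -Path $csvPath -NoTypeInformation

# Read the file back to check that it has one row per hashtable entry
$rows = @(Import-Csv -Path $csvPath)
```

Import-Csv returns every field as a string, so convert Words back with [int] if you need to do arithmetic on it later.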
Related
I'm trying (badly) to work through combining CSV files into one file and prepending a column that contains the file name. I'm new to PowerShell, so hopefully someone can help here.
I tried initially to do the well documented approach of using Import-Csv / Export-Csv, but I don't see any options to add columns.
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv CombinedFile.txt -UseQuotes Never -NoTypeInformation -Append
Next I'm trying to loop through the files and append the name, which kind of works, but for some reason this stops after the first row is generated. Since it's not a CSV process, I have to use the switch to skip the first title row of each file.
$getFirstLine = $true
Get-ChildItem -Filter *.csv | Where-Object {$_.Name -NotMatch "Combined.csv"} | foreach {
$filePath = $_
$collection = Get-Content $filePath
foreach($lines in $collection) {
$lines = ($_.Basename + ";" + $lines)
}
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "Combined.csv" $linesToWrite
}
This is where the -PipelineVariable parameter comes in real handy. You can set a variable to represent the current iteration in the pipeline, so you can do things like this:
Get-ChildItem -Filter *.csv -PipelineVariable File | Where-Object {$_.Name -NotMatch "Combined.csv"} | ForEach-Object { Import-Csv $File.FullName } | Select *,@{l='OriginalFile';e={$File.Name}} | Export-Csv Combined.csv -Notypeinfo
Merging your CSVs into one and adding a column for the file's name can be done as follows, using a calculated property on Select-Object:
Get-ChildItem -Filter *.csv | ForEach-Object {
$fileName = $_.Name
Import-Csv $_.FullName | Select-Object @{
Name = 'FileName'
Expression = { $fileName }
}, *
} | Export-Csv path/to/merged.csv -NoTypeInformation
I use PowerShell to automate extracting selected data from a CSV file.
My $target_servers also contains the same server name twice, but with different data in each row.
Here is my code:
$target_servers = Get-Content -Path D:\Users\Tools\windows\target_prd_servers.txt
foreach($server in $target_servers) {
Import-Csv $path\Serverlist_Template.csv | Where-Object {$_.Hostname -Like $server} | Export-Csv -Path $path/windows_prd.csv -Append -NoTypeInformation
}
After executing the above code it extracts CSV data based on a TXT file, but my problem is some of the results are duplicated.
I am expecting around 28 results but it gave me around 49.
As commented, -Append is the culprit here and you should check if the newly added records are not already present in the output file:
# read the Hostname column of the target csv file as array to avoid duplicates
$existingHostsNames = @((Import-Csv -Path "$path/windows_prd.csv").Hostname)
$target_servers = Get-Content -Path D:\Users\Tools\windows\target_prd_servers.txt
foreach($server in $target_servers) {
Import-Csv "$path\Serverlist_Template.csv" |
Where-Object {($_.Hostname -eq $server) -and ($existingHostsNames -notcontains $_.HostName)} |
Export-Csv -Path "$path/windows_prd.csv" -Append -NoTypeInformation
}
You can convert your data to an array of objects and then use select -Unique, like this:
$target_servers = Get-Content -Path D:\Users\Tools\windows\target_prd_servers.txt
$data = @()
foreach($server in $target_servers) {
$data += Import-Csv $path\Serverlist_Template.csv| Where-Object {$_.Hostname -Like $server}
}
$data | select -Unique | Export-Csv -Path $path/windows_prd.csv -Append -NoTypeInformation
This works only if the duplicated rows have the same value in every column. If they don't, pass select the column names that matter to you. For example:
$data | select Hostname -Unique | Export-Csv -Path $path/windows_prd.csv -Append -NoTypeInformation
This gives you a list of unique hostnames.
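Another option sidesteps -Append entirely: read the template once, keep the rows whose Hostname appears in the server list, and write the file in one go. This is a minimal sketch with made-up sample data, assuming the duplicates come from a name that appears more than once in the TXT file. The -in operator needs PowerShell 3.0 or later and does exact, case-insensitive matching rather than the wildcard matching of -like.

```powershell
# Hypothetical sample data; in the question these come from the TXT and CSV files
$target_servers = 'srv01', 'srv02', 'srv01'      # note the repeated name
$template = @(
    [pscustomobject]@{ Hostname = 'srv01'; Role = 'web' }
    [pscustomobject]@{ Hostname = 'srv01'; Role = 'db' }
    [pscustomobject]@{ Hostname = 'srv02'; Role = 'app' }
    [pscustomobject]@{ Hostname = 'srv03'; Role = 'app' }
)

# Each template row is tested exactly once, so a repeated name cannot emit it twice
$selected = @($template | Where-Object { $_.Hostname -in $target_servers })

# Hypothetical output path; a single Export-Csv call, so -Append is not needed
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'windows_prd_demo.csv'
$selected | Export-Csv -Path $outFile -NoTypeInformation
```

Both srv01 rows are kept because they are distinct rows in the template; only the repetition caused by the server list goes away.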
I'm a bit new to PowerShell. I have a working script returning -Line, -Character and -Word to a csv file. I can't figure out how to add the full name of the file into the csv.
get-childitem -recurse -Path C:\Temp\*.* | foreach-object { $name = $_.FullName; get-content $name | Measure-Object -Line -Character -Word} | Export-Csv -Path C:\Temp\FileAttributes.csv
I've tried using Write-Host and Select-Object, but I'm not sure about the syntax.
I've been using the following as a reference.
Results
This is what I'm after
Use Select-Object with a calculated property:
Get-Childitem -recurse -Path C:\Temp\*.* | ForEach-Object {
$fullName = $_.FullName
Get-Content $fullName | Measure-Object -Line -Character -Word |
Select-Object @{ Name = 'FullName'; Expression={ $fullName } }, *
} | Export-Csv -Path C:\Temp\FileAttributes.csv
Note:
Pass -ExcludeProperty Property to Select-Object to omit the empty Property column.
Pass -NoTypeInformation to Export-Csv to suppress the virtually useless first line (the type annotation) in the CSV.
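Both notes applied together might look like the sketch below. The demo folder, the sample file, and its contents are made up so that the example runs on its own; -File assumes PowerShell 3.0 or later.

```powershell
# Hypothetical demo folder containing one small text file
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'measure_demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'a.txt') -Value 'one two three', 'four five'

$csvPath = Join-Path $dir 'FileAttributes.csv'

# Collect the measurements first, then export them
$rows = Get-ChildItem -Path $dir -Filter *.txt -File | ForEach-Object {
    $fullName = $_.FullName
    Get-Content $fullName | Measure-Object -Line -Character -Word |
        # -ExcludeProperty drops the empty Property column from the * expansion
        Select-Object @{ Name = 'FullName'; Expression = { $fullName } }, * -ExcludeProperty Property
}

# -NoTypeInformation suppresses the #TYPE line at the top of the CSV
$rows | Export-Csv -Path $csvPath -NoTypeInformation
```

The resulting CSV should have the columns FullName, Lines, Words, and Characters.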
So I'm trying to process CSV files and give the output a new name. I can do it with one file by explicitly specifying the file name. But is there a way, such as a wildcard, to make the script process multiple files at the same time? Let's just say I want to process anything with .csv as an extension. Here's my script that's used to process a specific file:
$objs = @();
$output = Import-csv -Path D:\TEP\FilesProcessing\Test\file1.csv | ForEach {
$Object = New-Object PSObject -Property @{
Time = $_.READ_DTTM
Value = $_.{VALUE(KWH)}
Tag = [String]::Concat($_.SUBSTATION,'_',$_.CIRCUITNAME,'_',$_.PHASE,'_',$_.METERID,'_KWH')
}
$objs += $Object;
}
$objs
$objs | Export-CSv -NoTypeInformation D:\TEP\FilesProcessing\Test\file1_out.csv
You can combine Get-ChildItem and Import-Csv.
Here's an example that specifies different input and output directories to avoid name collisions:
$inputPath = "D:\TEP\FilesProcessing\Test"
$outputPath = "D:\TEP\FilesProcessing\Output"
Get-ChildItem (Join-Path $inputPath "*.csv") | ForEach-Object {
$outputFilename = Join-Path $outputPath $_.Name
Import-Csv $_.FullName | ForEach-Object {
New-Object PSObject -Property @{
"Time" = $_.READ_DTTM
"Value" = $_.{VALUE(KWH)}
"Tag" = "{0}_{1}_{2}_{3}_KWH" -f $_.SUBSTATION,$_.CIRCUITNAME,$_.PHASE,$_.METERID
}
} | Export-Csv $outputFilename -NoTypeInformation
}
Note that there's no need for creating an array and repeatedly appending it. Just output the custom objects you want and export afterwards.
Use the Get-Childitem and cut out all the unnecessary intermediate variables so that you code it in a more Powershell type way. Something like this:
Get-ChildItem 'D:\TEP\FilesProcessing\Test\*.csv' | % {
    Import-csv $_.FullName | % {
        New-Object PSObject -Property @{
            Time = $_.READ_DTTM
            Value = $_.{VALUE(KWH)}
            Tag = '{0}_{1}_{2}_{3}_KWH' -f $_.SUBSTATION, $_.CIRCUITNAME, $_.PHASE, $_.METERID
        }
    # Anchor the pattern with $ so that only the trailing extension is replaced
    } | Export-CSv ($_.FullName -replace '\.csv$', '_out.csv') -NoTypeInformation
}
Get-ChildItem is very useful for situations like this.
You can add wildcards directly into the path:
Get-ChildItem -Path D:\TEP\FilesProcessing\Test\*.csv
You can recurse a path and use -Include to filter files:
Get-ChildItem -Path D:\TEP\FilesProcessing\Test\ -recurse -include *.csv
This should get you what you need.
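# Calculated properties: each Expression script block runs once per imported row
$Props = @(
    @{ Name = 'Time';  Expression = { [datetime]::Parse($_.READ_DTTM) } }
    @{ Name = 'Value'; Expression = { $_.'VALUE(KWH)' } }
    @{ Name = 'Tag';   Expression = { ($_.SUBSTATION, $_.CIRCUITNAME, $_.PHASE, $_.METERID, 'KWH') -join '_' } }
)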
$data = Get-ChildItem -Path D:\TEP\FilesProcessing\Test\*.csv | Foreach-Object {Import-CSV -Path $_.FullName}
$data | Select-Object -Property $Props | Export-CSv -NoTypeInformation D:\TEP\FilesProcessing\Test\file1_out.csv
Also, when using PowerShell, avoid doing these things:
$objs = @();
$objs += $Object;
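The reason is that PowerShell arrays have a fixed size, so every += allocates a new array and copies the existing elements into it, which gets slow for large inputs. A small sketch of the usual alternative, assuming PowerShell 3.0 or later for [pscustomobject]; the Id property is made up for illustration.

```powershell
# Slow pattern: each += builds a new array and copies all previous elements
$objs = @()
foreach ($i in 1..5) {
    $objs += [pscustomobject]@{ Id = $i }
}

# Usually preferable: let the loop emit objects and capture the output once
$objs2 = foreach ($i in 1..5) {
    [pscustomobject]@{ Id = $i }
}
```

Both variables end up holding the same five objects; only the way the collection is built differs.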
I need to take a slew of csv files from a directory and get them into an array in Powershell (to eventually manipulate and write back to a CSV).
The problem is there are 5 file types. I need around 8 columns from each. The columns are essentially the same, but have different headings.
Is there an easy way to do this? I started creating a custom object with my 8 fields, looping through the files importing each one, looking at the filename (which tells me the column names I need) and then a bunch of ifs to add it to my custom object array.
I was wondering if there is a simpler way...like with a template saying which columns from each file.
I wound up doing this. It may not be the most efficient approach, but it works. I wound up writing out each file separately and combining them at the end, as PS really got bogged down (over a million rows combined).
$Newcsv = @()
$path = "c:\scrap\BWFILES\"
$files = gci -path $path -recurse -filter *.csv | Where-Object { ! ($_.psiscontainer) }
$counter=1
foreach($file in $files)
{
$csv = Import-Csv $file.FullName
if ($file.Name -like '*SAV*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"SV"}},DMBRCH,DMACCT,DMSHRT
}
if ($file.Name -like '*TIME*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"TM"}},TMBRCH,TMACCT,TMSHRT
}
if ($file.Name -like '*TRAN*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"TR"}},DMBRCH,DMACCT,DMSHRT
}
if ($file.Name -like '*LN*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"LN"}},LNBRCH,LNNOTE,LNSHRT
}
# $(...) is needed to expand a property inside a double-quoted string
$Newcsv | Export-Csv "C:\scrap\$($file.BaseName)$counter.csv" -force -notypeinformation
$counter++
}
$getFirstLine = $true   # keep the header row of the first file only
get-childItem "c:\scrap\*.csv" | foreach {
$filePath = $_.FullName
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "c:\scrap\combined.csv" $linesToWrite
}
With a hashtable for reference, a little RegEx matching, and using the automatic variable $Matches in a ForEach-Object loop (alias % used) that could all be shortened to:
$path = "c:\scrap\BWFILES\"
$Reference = @{
'SAV' = 'SV'
'TIME' = 'TM'
'TRAN' = 'TR'
'LN'='LN'
}
Set-Content -Value "PRODUCT,BRCH,ACCT,SHRT" -Path 'c:\scrap\combined.csv'
gci -path $path -recurse -filter *.csv | Where-Object { !($_.psiscontainer) -and $_.Name -match ".*(SAV|TIME|TRAN|LN).*"}|%{
$Product = $Reference[($Matches[1])]
Import-CSV $_.FullName | Select-Object @{Name="PRODUCT";Expression={$Product}},*BRCH,@{l='Acct';e={$_.LNNOTE, $_.DMACCT, $_.TMACCT|?{$_}}},*SHRT | ConvertTo-Csv -NoTypeInformation | Select -Skip 1 | Add-Content 'c:\scrap\combined.csv'
}
That should produce the exact same file. Only kind of tricky part was the LNNOTE/TMACCT/DMACCT field since obviously you can't just do the same as like *SHRT.