How to convert a text file containing double quotes to CSV format using PowerShell

I have a text file (with the header Actual_Output, saved as actual.txt) containing data such as:
Actual_Output
W
à
é
"
'
(
_
ç
²"
^
^
*
END
I want to convert it into a CSV file using PowerShell. I am doing it this way:
$DB = Import-Csv E:\actual.txt
$outarray = @()
foreach ($Data in $DB)
{
    $First = $Data.Actual_Output
    $outarray += New-Object PsObject -Property @{
        'Actual_Output' = $First
    }
    Write-Host "Actual_Output: " $First
    Write-Host ""
}
$outarray | Export-Csv 'E:\result.csv' -NoTypeInformation -Encoding utf8
I am getting the output shown in the screenshot.
I want each value to be listed in a separate cell. The double quote " is what causes the problem here. Please help me resolve this. Sorry if my description of the issue is unclear.

Tested this, and it seems to work better:
Get-Content actual.txt | select -Skip 1 |
    foreach {
        New-Object PSObject -Property @{Actual_Output = $_}
    } | Export-Csv result.csv -NoTypeInformation -Encoding UTF8
The file isn't well-formed as CSV initially, so Import-CSV isn't able to parse it correctly.
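To illustrate why this works: a lone " in the input opens a quoted field that a CSV parser keeps reading across the following lines, whereas wrapping each raw line in an object lets the CSV writer escape the quote itself. The sketch below is a minimal in-memory version; the sample values in $lines are placeholders standing in for the body of actual.txt, and ConvertTo-Csv is used instead of Export-Csv only so the result can be inspected without writing a file.

```powershell
# Hypothetical sample lines standing in for actual.txt after the header is skipped
$lines = 'W', '"', "'", 'x"', 'END'

# Treat every raw line as one value; no CSV parsing happens on the way in
$csv = $lines |
    ForEach-Object { [PSCustomObject]@{ Actual_Output = $_ } } |
    ConvertTo-Csv -NoTypeInformation

# Each value is wrapped in quotes and embedded quotes are doubled,
# so the single character " is written as """"
$csv
```

Because the quoting is done on output, each value should land in its own cell when the resulting file is opened in a spreadsheet.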

Related

Exporting a variable with multiple values to CSV

I'm using PowerShell to try and get a specific field from a CSV, store it as a variable, and output it as another csv. This is mainly because I want to use it as part of a larger script, but I'm having problems...
Import-Csv C:\EmailsListNoBlanks.csv | ForEach-Object{
$Email = $_.Member -split ';'
}
$Email | Out-File C:\EmailListCOMP.csv
However in my CSV I'm only ever getting the last 4 values, whereas I'm expecting a few hundred...
Is there something I'm missing here?
Thanks
Matt
I tried something similar to what you are attempting to do, and this is what I came up with to have each item on its own line in the output:
Import-Csv C:\EmailsListNoBlanks.csv | ForEach-Object {
$Email += ($_.Member -split ';') + ("`n")
}
$Email | Out-File C:\EmailListCOMP.csv
The output is not in a csv format, but just a regular text file. The `n adds a newline to the text so that the output is one entry per line.
Try this:
Add-Type -AssemblyName System.Collections
$Email = [System.Collections.Generic.List[string]]::new()
Import-Csv C:\EmailsListNoBlanks.csv | ForEach-Object {
    # AddRange appends every address; Add would coerce the split array into a single space-joined string
    $Email.AddRange([string[]]($_.Member -split ';'))
}
$Email | Out-File C:\EmailListCOMP.csv
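The reason the original script only keeps the last few values is that $Email = ... inside the loop overwrites the variable on every row. A minimal sketch of an alternative is to let the pipeline collect the output instead; the addresses below are made-up sample data, and ConvertFrom-Csv stands in for Import-Csv so the snippet runs without a file.

```powershell
# Hypothetical sample rows standing in for C:\EmailsListNoBlanks.csv
$rows = 'Member', 'a@example.com;b@example.com', 'c@example.com' | ConvertFrom-Csv

# Emit the split values to the pipeline and assign the whole result once,
# instead of reassigning $Email inside the loop
$Email = $rows | ForEach-Object { $_.Member -split ';' }

$Email.Count   # 3: every address from every row is kept
```

With a real file, replacing the first line with Import-Csv and piping $Email to Out-File as before should give one address per line.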

Powershell script to match string between 2 files and merge

I have 2 files that contain strings, each string in both files is delimited by a colon. Both files share a common string and I want to be able to merge both files (based on the common string) into 1 new file.
Examples:
File1.txt
tom:mioihsdihfsdkjhfsdkjf
dick:khsdkjfhlkjdhfsdfdklj
harry:lkjsdlfkjlksdjfsdlkjs
File2.txt
mioihsdihfsdkjhfsdkjf:test1
lkjsdlfkjlksdjfsdlkjs:test2
khsdkjfhlkjdhfsdfdklj:test3
File3.txt (results should look like this)
tom:mioihsdihfsdkjhfsdkjf:test1
dick:khsdkjfhlkjdhfsdfdklj:test3
harry:lkjsdlfkjlksdjfsdlkjs:test2
$File1 = @"
tom:mioihsdihfsdkjhfsdkjf
dick:khsdkjfhlkjdhfsdfdklj
harry:lkjsdlfkjlksdjfsdlkjs
"@
$File2 = @"
mioihsdihfsdkjhfsdkjf:test1
lkjsdlfkjlksdjfsdlkjs:test2
khsdkjfhlkjdhfsdfdklj:test3
"@
# You are probably going to want to use Import-Csv here
# I am using ConvertFrom-Csv as I have "inlined" the contents of the files in the variables above
$file1_contents = ConvertFrom-Csv -InputObject $File1 -Delimiter ":" -Header name, code # specifying a header as there isn't one provided
$file2_contents = ConvertFrom-Csv -InputObject $File2 -Delimiter ":" -Header code, test
# There are almost certainly better ways to do this... but this does work so... meh.
$results = @()
# Loop over one file finding the matches in the other file
foreach ($row in $file1_contents) {
    $matched_row = $file2_contents | Where-Object code -eq $row.code
    if ($matched_row) {
        # Create a hashtable with the values you want from source and matched rows
        $result = @{
            name = $row.name
            code = $row.code
            test = $matched_row.test
        }
        # Append the matched up row to the final result set
        $results += New-Object PSObject -Property $result
    }
}
# Convert back to CSV format, with a _specific_ column ordering
# Although you'll probably want to use Export-Csv instead
$results |
Select-Object name, code, test |
ConvertTo-Csv -Delimiter ":"
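As one possible variation on the loop above, the second file can be loaded into a hashtable keyed on the shared string, so each line of the first file needs a single lookup rather than a Where-Object scan. This is only a sketch: the variable names $lookup and $merged are illustrative, the data is the sample from the question held in arrays, and with real files Get-Content would supply the lines.

```powershell
# Sample data from the question, held in memory instead of File1.txt / File2.txt
$file1 = 'tom:mioihsdihfsdkjhfsdkjf', 'dick:khsdkjfhlkjdhfsdfdklj', 'harry:lkjsdlfkjlksdjfsdlkjs'
$file2 = 'mioihsdihfsdkjhfsdkjf:test1', 'lkjsdlfkjlksdjfsdlkjs:test2', 'khsdkjfhlkjdhfsdfdklj:test3'

# Build a code -> test lookup from the second file
$lookup = @{}
foreach ($line in $file2) {
    $code, $test = $line -split ':', 2
    $lookup[$code] = $test
}

# Walk the first file and append the matching value when the code is known
$merged = foreach ($line in $file1) {
    $name, $code = $line -split ':', 2
    if ($lookup.ContainsKey($code)) {
        "${name}:${code}:$($lookup[$code])"
    }
}

$merged
```

Lines in the first file without a match are skipped here, which mirrors the behavior of the if ($matched_row) check above.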

Export custom object to CSV

I'm quite new to powershell and struggling with outputting data to a CSV file.
I have a larger code piece but created the below small working example that contains the issue:
$results = @()
$tmp_avs = @('tmp', 'tmp2')
$hostname = 'hostname'
$results += New-Object -TypeName PSObject -Property (@{Hostname=$hostname; avs=$tmp_avs})
$res = $results | ? {$_.avs.Count -gt 0} | Format-Table
$res | Export-Csv -NoTypeInformation "test.csv"
When printing the $res object above in PowerShell I get the output:
avs Hostname
--- --------
{tmp, tmp2} hostname
That is also the output I would like to receive in the CSV file, but currently I get something like this:
"ClassId2e4f51ef21dd47e99d3c952918aff9cd","pageHeaderEntry","pageFooterEntry","autosizeInfo","shapeInfo","groupingEntry"
"033ecb2bc07a4d43b5ef94ed5a35d280",,,,"Microsoft.PowerShell.Commands.Internal.Format.TableHeaderInfo",
"9e210fe47d09416682b841769c78b8a3",,,,,
"27c87ef9bbda4f709f6b4002fa4af63c",,,,,
"4ec4f0187cb04f4cb6973460dfe252df",,,,,
"cf522b78d86c486691226b40aa69e95c",,,,,
Is there a possibility to export the $res object in a proper CSV format?
EDIT:
I removed the Format-Table now, which results in the following in the CSV format:
"avs","Hostname"
"System.Object[]","hostname"
There is System.Object[] written instead of the values?
The values are an array. If you run $tmp_avs.ToString(), you will also get System.Object[].
To resolve this, replace avs=$tmp_avs with avs=$($tmp_avs -join " "), where " " is the joining character between the elements of your array. This converts the array to a string.
Code:
$results = @()
$tmp_avs = @('tmp', 'tmp2')
$hostname = 'hostname'
$results = New-Object -TypeName PSObject -Property (@{Hostname=$hostname; avs=$($tmp_avs -join " ")})
$res = $results | ? {$_.avs.Count -gt 0}
$res | Export-Csv -NoTypeInformation "test.csv"
Output:
avs,Hostname
tmp tmp2, hostname
If you do not want to see System.Object[] but a delimited list, you can convert your array to a string so that it is written correctly to the CSV.
Try adding $tmp_avs = $tmp_avs -join ";" to your script below the line $tmp_avs = @('tmp', 'tmp2').
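Putting both points together, a minimal sketch might look like the following. It leaves Format-Table out of the export path, because Format-Table emits formatting objects rather than the original data (which is what the ClassId... columns in the question are), and it joins the array before export. [PSCustomObject] is used here only to keep the column order predictable, and ConvertTo-Csv stands in for Export-Csv so the result can be inspected directly.

```powershell
$tmp_avs  = @('tmp', 'tmp2')
$hostname = 'hostname'

# Join the array into one string so the CSV cell holds the values, not the type name
$row = [PSCustomObject]@{
    Hostname = $hostname
    avs      = $tmp_avs -join ';'
}

# No Format-Table here: the data object itself goes to the CSV cmdlet
$csv = $row | ConvertTo-Csv -NoTypeInformation
$csv
```

Format-Table can still be used separately for console display; it just should not sit between the data and Export-Csv.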

Remove New Line Character from CSV file's string column

I have a CSV file with a string column where the value spans multiple lines. I want to aggregate those multiple lines into one line.
For example
1, "asdsdsdsds", "John"
2, "dfdhifdkinf
dfjdfgkdnjgknkdjgndkng
dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"
I want my output to be
1, "asdsdsdsds", "John"
2, "dfdhifdkinf dfjdfgkdnjgknkdjgndkng dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"
I want to achieve this output using PowerShell
Thanks.
Building on Ansgar's answer, here's how to do it when:
You don't know the column names
Your CSV file may contain CR or LF independently
(Import-Csv $csvInput) | % {
$line = $_
foreach ($prop in $line.PSObject.Properties) {
$line.($prop.Name) = ($prop.Value -replace '[\r\n]',' ')
}
$line
} | Export-Csv $csvOutput -NoTypeInformation
Try this:
$csv = 'C:\path\to\your.csv'
(Import-Csv $csv -Header 'ID','Value','Name') | % {
$_.Value = $_.Value -replace "`r`n",' '
$_
} | Export-Csv $csv -NoTypeInformation
If your CSV contains headers, remove -Header 'ID','Value','Name' from the import and replace Value with the actual column name.
If you don't want double quotes around the fields, you can remove them by replacing Export-Csv with something like this:
... | ConvertTo-Csv -NoTypeInformation | % { $_ -replace '"' } | Out-File $csv
To remove the header from the output you add another filter before Out-File to skip the first line:
... | select -Skip 1 | Out-File $csv
You can import the csv, do a specialized select, and write the result into a new CSV.
Import-Csv Before.csv -Header "ID","Change" | Select ID,@{Name="NoNewLines"; Expression={$_.Change -replace "`n"," "}} | Export-Csv After.csv
The key part is in the select statement, which allows you to pass a specialized hash table (Name is the name of the property, Expression is a scriptblock that computes it).
You may need to fiddle with headers a bit to get the exact output you want.
The problems with Export-Csv are twofold:
Early versions (PowerShell 1 and 2) do not allow you to append data to the CSV
If the data being piped to it contains newline characters, the data is useless in Excel
The solution to both of the above is to use ConvertTo-Csv instead. Here is a sample:
{bunch of stuff} | ConvertTo-CSV | %{$_ -replace "`n","<NL>"} | %{$_ -replace "`r","<CR>"} >>$AppendFile
Note that this lets you edit the data however you like (in this case, replacing newline characters) and use redirection operators to append to the file.
FYI: I've created a CSV Cleaner: https://stackoverflow.com/a/32016543/361842
This can be used to replace any unwanted characters / should be straight-forward to adapt to your needs.
Code copied below; though I recommend referring to the above thread to see any feedback from others.
clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null
function Clean-CsvStream {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline=$true)]
[string]$CsvRow
,
[Parameter(Mandatory = $false)]
[char]$Delimiter = ','
,
[Parameter(Mandatory = $false)]
[regex]$InvalidCharRegex
,
[Parameter(Mandatory = $false)]
[string]$ReplacementString
)
begin {
[bool]$IsSimple = [string]::IsNullOrEmpty($InvalidCharRegex)
if(-not $IsSimple) {
[System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
[System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
[Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
$Parser.SetDelimiters($Delimiter)
$Parser.HasFieldsEnclosedInQuotes = $true
[long]$seekStart = 0
}
}
process {
if ($IsSimple) { #if we're not replacing anything, keep it simple
$CsvRow
} else {
$seekStart = $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Current)
$writeStream.WriteLine($CsvRow)
$writeStream.Flush()
$seekStart = $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Begin)
write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
}
}
end {
if(-not $IsSimple) {
try {$Parser.Close(); $Parser.Dispose()} catch{}
try {$writeStream.Close(); $writeStream.Dispose()} catch{}
try {$memStream.Close(); $memStream.Dispose()} catch{}
}
}
}
$csv = @(
(new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':'

I need help formatting output with PowerShell's Out-File cmdlet

I have a series of documents that are going through the following function, which is designed to count word occurrences in each document. This function works fine when outputting to the console, but now I want to generate a text file containing the information, with the file name prepended to each word in the list.
My current console output is:
"processing document1 with x unique words occuring as follows"
"word1 12"
"word2 8"
"word3 3"
"word4 4"
"word5 1"
I want a delimited file in this format:
document1;word1;12
document1;word2;8
document1;word3;3
document1;word4;4
document1;word1;1
document2;word1;16
document2;word2;11
document2;word3;9
document2;word4;9
document2;word1;13
While the function below gets me the lists of words and occurrences, I'm having a hard time figuring out where or how to insert the filename variable so that it prints at the head of each line. MSDN has been less than helpful, and most of the places I try to insert the variable result in errors (see below).
function Count-Words ($docs) {
    $document = get-content $docs
    $document = [string]::join(" ", $document)
    $words = $document.split(" `t",[stringsplitoptions]::RemoveEmptyEntries)
    $uniq = $words | sort -uniq
    $words | % {$wordhash=@{}} {$wordhash[$_] += 1}
    Write-Host $docs "contains" $wordhash.psbase.keys.count "unique words distributed as follows."
    $frequency = $wordhash.psbase.keys | sort {$wordhash[$_]}
    -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File c:\out-file-test.txt -append
    $grouped = $words | group | sort count
}
Do I need to create a string to pass to the Out-File cmdlet? Is this just something I've been putting in the wrong place on the last few tries? I'd like to understand WHY it goes in a particular place as well. Right now I'm just guessing, because I have no idea where to put the Out-File call to achieve the results I want.
I've tried formatting my command per the PowerShell help, using -$docs and -FilePath, but each time I add anything to the Out-File call above (which runs successfully as written), I get the following error:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument "c:\out-file-test.txt" does not belong to the set "unicode,utf7,utf8,utf32,ascii,bigendianunicode,default,oem" specified by the ValidateSet attribute. Supply an argument that is in the set and then try the command again.
At C:\c.ps1:39 char:71
+ -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File <<<< -$docs -width 1024 c:\users\x46332\count-test.txt -append
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand
I rewrote most of your code. You should use objects, which make it easier to format the output the way you want. This one splits on spaces and groups identical words together. Try this:
Function Count-Words ($paths) {
    $output = @()
    foreach ($path in $paths) {
        $file = Get-ChildItem $path
        ((Get-Content $file) -join " ").Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries) | Group-Object | Select-Object -Property @{n="FileName";e={$file.BaseName}}, Name, Count | % {
            $output += "$($_.FileName);$($_.Name);$($_.Count)"
        }
    }
    $output | Out-File test-out2.txt -Append
}
$filepaths = ".\test.txt", ".\test2.txt"
Count-Words -paths $filepaths
It outputs in the format you asked for (document;word;count). If you want the document name to include the extension, change $file.BaseName to $file.Name. Test output:
test;11;1
test;9;2
test;13;1
test2;word11;5
test2;word1;4
test2;12;1
test2;word2;2
Slightly different approach:
function Get-WordCounts ($doc)
{
    $text_ = [IO.File]::ReadAllText($doc.fullname)
    $WordHash = @{}
    $text_ -split '\b' -match '\w+' |
        foreach {$WordHash[$_]++}
    $WordHash.GetEnumerator() |
        foreach {
            New-Object PSObject -Property @{
                Word = $_.Key
                Count = $_.Value
            }
        }
}
$docs = gci c:\testfiles\*.txt |
sort name
&{
foreach ($doc in dir $docs)
{
Get-WordCounts $doc |
sort Count -Descending |
foreach {
(&{$doc.Name;$_.Word;$_.Count}) -join ';'
}
}
} | out-file c:\somedir\wordcounts.txt
Try this:
$docs = @("document1", "document2", ...)
$docs | % {
    $doc = $_
    # The [char[]] cast keeps the split on space OR tab in newer PowerShell versions too
    Get-Content $doc `
    | % { $_.split([char[]]" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
    | Group-Object `
    | select @{n="Document";e={$doc}}, Name, Count
} | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
If you want to make this into a function you could do it like this:
function Count-Words($docs) {
    foreach ($doc in $docs) {
        # The [char[]] cast keeps the split on space OR tab in newer PowerShell versions too
        Get-Content $doc `
        | % { $_.split([char[]]" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
        | Group-Object `
        | select @{n="Document";e={$doc}}, Name, Count
    }
}
$files = @("document1", "document2", ...)
Count-Words $files | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
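For experimenting with the document;word;count format without touching the file system, the same grouping idea can be sketched on in-memory text. The document names and contents below are invented sample data, and the -split operator is used here in place of the .Split() method; with real files, Get-Content would supply the text and Out-File would write $lines.

```powershell
# Hypothetical document contents keyed by document name
$docs = [ordered]@{
    document1 = 'apple banana apple'
    document2 = "cherry`tapple cherry cherry"
}

$lines = foreach ($name in $docs.Keys) {
    # Split on any run of whitespace and drop empty entries
    ($docs[$name] -split '\s+') -ne '' |
        Group-Object |
        Sort-Object Count -Descending |
        ForEach-Object { '{0};{1};{2}' -f $name, $_.Name, $_.Count }
}

$lines
```

The document name is available inside the inner ForEach-Object because $name belongs to the enclosing foreach loop, which is the same reason $doc is captured before the inner pipeline in the answers above.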