Related
In my csv file I'm getting data in incorrect format for a few rows, sometimes a line is broken into two lines as shown in below table. For EmpId 2, line is broken into two lines. How can I find such records and merge them into one line in correct format to fix the issue for such records using PowerShell. Expected output is shown in below table.
Input File data:
EmpId,EmpName,EmpLocation
1,"Jack","Austin"
2,"Pet
er","NYC"
3,"Raj","Delhi"
Expected Output:
EmpId,EmpName,EmpLocation
1,"Jack","Austin"
2,"Peter","NYC"
3,"Raj","Delhi"
My instinct was to do something similar to Karthick's answer, however I first took a look at the output of Import-Csv. Surprisingly it puts the line break in the individual property where it was found like:
Import-Csv C:\temp\Broken.csv | fl
EmpId : 1
EmpName : Jack
EmpLocation : Austin
EmpId : 2
EmpName : Pet
er
EmpLocation : NYC
EmpId : 3
EmpName : Raj
EmpLocation : Delhi
Notice "peter" is broken across 2 lines.
So I saw some potential to bring the objects in and modify the underlying property values instead of trying to fix up the string data. I cooked up the below:
$CSVData = Import-Csv C:\temp\Broken.csv
$CSVData |
ForEach-Object{
ForEach( $Property in $_.PSObject.Properties.Name )
{
$_.($Property) = $_.($Property) -replace "(`r|`n)"
}
}
$CSVData
# If you want to re-export:
$CSVData | Export-Csv -Path c:\temp\Fixed.csv -NoTypeInformation
This code should work regardless of which field has the line break. Give it a shot and let me know. Thanks!
You can try the below. This worked for me. I assumed the first line is the header.
$filepath = "D:\file.csv"
[string[]]$data = Get-Content $filepath
$data_Final = New-Object System.Collections.ArrayList
for($i = $j = 0; $i -lt $data.Count; $(if($i -eq $j){$i++}else{$i=$j+1}), ($j=$i)) {
While ( ($data[$i] -split ",").Count -ne 3 ) {
$j = $j+1
# Concatenate the target line ($i) with successive line(s) ($j) until the elements Count to 3
$data[$i] = $data[$i] + $data[$j]
}
$data_Final.Add($data[$i]) | Out-Null
}
$inputData = $data_Final | ConvertFrom-Csv
# Or, if you want to fix the csv uncomment the below
# $data_Final | ConvertFrom-Csv | Export-Csv $filepath -NoTypeInformation
I have a CSV file with 2 columns and multiple rows. I have attached the contents of CSV file used for testing, with 2 columns and 30 rows having data from 2 person. But in real application the number of rows increases with the number of test made. So in practical application the dimension will be 2 columns and (15 times N) row where N is the number of person.
I need the output as N + 1 (header) rows and 15 columns (parameters). Can someone help me with a program to convert it using Powershell?
Gentle remainder that I took 2 readings for testing. In application the number of reading is not sure.
My input text file which is to be converted. The parameter and the corresponding values are separated by comma.
Name,Test
Age,18
Gender,Male
Time 1,379
Time 2,290
Time 3,305
Time 4,290
Time 5,319
Time 6,340
Time 7,436
Time 8,263
Time 9,290
Time 10,381
Responses,0
Average Reaction Time,329
Name,Test
Age,18
Gender,Male
Time 1,365
Time 2,340
Time 3,254
Time 4,270
Time 5,249
Time 6,350
Time 7,309
Time 8,527
Time 9,356
Time 10,407
Responses,1
Reaction Time,375
My code snippet for delimiting comma and transposing columns and rows
import-csv $file -delimiter "," | export-csv $outfile
(gc $outfile | select -Skip 1) | sc $outfile
$filedata = import-csv $outfile -Header Parameter , Value
$filedata | export-csv $outfile -NoTypeInformation
$Csv = import-csv $outfile
$Rows = #()
$Rows += $csv.Parameter -join ","
$Rows += $Csv.Value -join ","
Get-Process | Tee-Object -Variable ExportMe | Format-Table
$Rows | Set-Content $outfile
This is my current CSV file
Name,Age,Gender,Time 1,Time 2,Time 3,Time 4,Time 5,Time 6,Time 7,Time 8,Time 9,Time 10,Responses,Average Time,Name,Age,Gender,Time 1,Time 2,Time 3,Time 4,Time 5,Time 6,Time 7,Time 8,Time 9,Time 10,Responses,Average Time
Test,18,Male,379,290,305,290,319,340,436,263,290,381,0,329,Test,18,Male,365,340,254,270,249,350,309,527,356,407,1,375
I am expecting an output CSV like this
Name,Age,Gender,Time 1,Time 2,Time 3,Time 4,Time 5,Time 6,Time 7,Time 8,Time 9,Time 10,Responses,Average Time
Test,18,Male,379,290,305,290,319,340,436,263,290,381,0,329
Test,18,Male,365,340,254,270,249,350,309,527,356,407,1,375
I have also attached a snap of my actual and received output.
Thanks in Advance
I'd strongly suggest not trying to format the CSV by hand.
If you know that there are always exactly 15 rows of properties per person, you can do a nested loop to "chop up" your csv:
# import original csv
$rows = Import-Csv $file -Header Name,Value
# outer loop increments by 15 (span of one person) every time
$objects = for($i = 0;$i -lt $rows.Count;$i += 15){
# prepare an ordered dictionary to hold the properties
$props = [ordered]#{}
# generate an inner loop from the offset to offset+14
$i..($i+14)|%{
# copy each row to our dictionary
$props[$rows[$_].Name] = $rows[$_].Value
}
# cast our dictionary to an object
[pscustomobject]$props
}
# convert back to csv
$objects |Export-Csv $outfile -NoTypeInformation
Edit 1:
So I've figure out how to get the unique headers in CSV 2 to append to CSV 1.
$header = ($table | Get-Member -MemberType NoteProperty).Name
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
$header is an array of headers from CSV 1 ($table). $header_add is an array of headers from CSV 2 ($table_add). $header_diff houses the unique headers in CSV 2 by the end of the code block.
So as far as I'm aware, my next step would be:
$append = ($table_add | Select-Object $header_diff)
My problem now is how do I append these objects to my CSV 1 ($table 1) object? I don't quite see a way for Add-Member to do this in a particularly nice fashion.
Original:
Here's the headers for the two CSV files I'm trying to combine.
CSV 1:
Date, Name, Assigned Router, City, Country, # of Calls , Calls in , Calls out
CSV 2:
Date, Name, Assigned Router, City, Country, # of Minutes, Minutes in, Minutes out
So a quick rundown of what these files are; both files contain call information for a set of names for one day (the date column has the same date for each row; this is because this eventually gets sent to a master .xlsx file with all dates combined). All of the columns up to Country contain the same values in the same order in both files. The files simply separate the # of calls and # of minutes data. I was wondering if there was a convenient way to move the unlike columns from one CSV to another.
I've tried using something along the lines of:
Import-Csv (Get-ChildItem <directory> -Include <common pattern in file pair>) | Export-Csv <output path> -NoTypeInformation
This didn't combine all of the matching headers and append the unique ones afterwards. Only the first file that's processed kept its unique headers. The second file that was processed had all of those headers and data discarded in the output. Shared header data in the second CSV was added as additional rows.
An example output of my described fail output:
PS > $small | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
PS > $small_add | Format-Table
Column_1 Column_4 Column_5
-------- -------- --------
1 x x
1 y y
1 z z
PS > Import-Csv (Get-ChildItem ./*.* -Include "small*.csv") | Select-Object * -unique | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
1
1
1
I was wondering if I could do something like the following algorithm:
Import-Csv CSV_1 and CSV_2 to separate variables
Compare CSV_2 headers to CSV_1 headers, storing the unlike headers in CSV_2 into a separate variable
Select-Object all CSV_1 headers and unlike CSV_2 headers
Pipe the Select-Object output to Export-Csv
The only other method I could only think of is doing it line by line where I would:
Import-Csv both
remove all of the shared columns from CSV_2
change it from the custom object Powershell uses for CSVs to a string
append each line of CSV_2 to each line of CSV_1
It feels a bit unrefined and inflexible (flexibility can probably be dealt with by how columns/headers are isolated so there's no problem appending strings).
* This answer focuses on a high-level-of-abstraction OO solution.
* The OP's own solution relies more on string processing, which has the potential to be faster.
# The input file paths.
$files = 'csv1.csv', 'csv2.csv'
$outFile = 'csvMerged.csv'
# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv $files[0]
$doc2 = Import-Csv $files[1]
# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
Where-Object SideIndicator -eq '=>'
).InputObject
# Initialize an ordered hashtable that will be used to temporarily store
# each document 2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] #{}
# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach($rowD1 in $doc1) {
# Get the corresponding row from document 2.
$rowD2 = $doc2[$i++]
# Extract the values from the unique document-2 columns and store them in the ordered
# hashtable.
foreach($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
# Add the properties represented by the hashtable entries to the
# document-1 row at hand and output the augmented object (-PassThru).
$rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile
To put the above to the test, you can use the following sample input:
# Create sample input CSV files
#'
Date,Name,Assigned Router,City,Country,# of Calls,Calls in,Calls out
dt,nm,ar,ct,cy,cc,ci,co
dt2,nm2,ar2,ct2,cy2,cc2,ci2,co2
'# > csv1.csv
# Same column layout and data as above through column 'Country', then different.
#'
Date,Name,Assigned Router,City,Country,# of Minutes,Minutes in,Minutes out
dt,nm,ar,ct,cy,mc,mi,mo
dt2,nm2,ar2,ct2,cy2,mc2,mi2,mo2
'# > csv2.csv
The code should produce the following content in csvMerged.csv:
"Date","Name","Assigned Router","City","Country","# of Calls","Calls in","Calls out","# of Minutes","Minutes in","Minutes out"
"dt","nm","ar","ct","cy","cc","ci","co","mc","mi","mo"
"dt2","nm2","ar2","ct2","cy2","cc2","ci2","co2","mc2","mi2","mo2"
Edit 1:
# Read 2 CSVs into PowerShell CSV object
$table = Import-Csv test.csv
$table_add = Import-Csv test_add.csv
# Isolate unique headers in second CSV
$unique_headers = (Compare-Object -ReferenceObject $table[0].PSObject.Properties.Name -DifferenceObject $table_add[0].PSObject.Properties.Name | Where-Object SideIndicator -eq "=>").InputObject
# Convert CSVs to strings, with second CSV only containing unique columns
$table_str = ($table | ConvertTo-Csv -NoTypeInformation)
$table_add_str = ($table_add | Select-Object $unique_headers | ConvertTo-Csv -NoTypeInformation)
# Append CSV 2's unique columns to CSV 1
# Set line counter
$line = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines
While (($table_str[$line] -ne $null) -and ($table_add_str[$line] -ne $null)) {
If ($line -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_add_str[$line]
}
If ($line -ne 0) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_add_str[$line])
}
$line = $line + 1
}
$table_sum_str | Set-Content -Path $outpath -Encoding UTF8
Using Measure-Command, the above code on my machine for the most part takes anywhere between 14-17 milliseconds to run. Running Measure-Command on mklement's yields effectively the same times from just eyeballing it.
Note that for both solutions, the data in the 2 CSV files must be in the same order. If you want to add 2 CSVs together that have complimentary data but in different orders, you need to use mklement's object oriented approach and add mechanisms to match the data to a location or name.
Original:
For those who don't want to use a hash table to do this:
# Make sure you're in same directory as files:
# CSV 1
$table = Import-Csv test.csv
# CSV 2
$table_add = Import-Csv test_add.csv
# Get array with CSV 1 headers
$header = ($table | Get-Member -MemberType NoteProperty).Name
# Get array with CSV 2 headers
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
# Add arrays of both headers together
$header_diff = $header + $header_add
# Sort the headers, remove duplicate headers (first couple ones), keep unique ones
$header_diff = ($header_diff | Sort-Object -Unique)
# Remove all of CSV 1's unique headers and shared headers
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
# Generate a CSV table containing only CSV 2's unique headers
$table_diff = ($table_add | Select-Object $header_diff)
# Convert CSV 1 from a custom PSObject to a string
$table_str = ($table | Select-Object * | ConvertTo-Csv)
# Convert CSV 2 (unique headers only) from custom PSObject to a string
$table_diff_str = ($table_diff | Select-Object * | ConvertTo-Csv)
# Set line counter
$line = 0
# Set flag for if headers have been processed
$headproc = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines.
While (($table_str[$line] -ne $null) -and ($table_diff_str[$line] -ne $null)) {
If ($headproc -eq 1) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_diff_str[$line])
}
If ($headproc -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_diff_str[$line]
$headproc = 1
}
$line = $line + 1
}
$table_sum_str | ConvertFrom-Csv | Select-Object * | Export-Csv -Path "./test_sum.csv" -Encoding UTF8 -NoTypeInformation
Ran a quick comparison using Measure-Command between this and mklement0's script.
PS > Measure-Command {./self.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 26
Ticks : 267771
TotalDays : 3.09920138888889E-07
TotalHours : 7.43808333333333E-06
TotalMinutes : 0.000446285
TotalSeconds : 0.0267771
TotalMilliseconds : 26.7771
PS > Measure-Command {./mklement.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 18
Ticks : 185058
TotalDays : 2.141875E-07
TotalHours : 5.1405E-06
TotalMinutes : 0.00030843
TotalSeconds : 0.0185058
TotalMilliseconds : 18.5058
I assume speed differences are because I spend time creating a separate CSV PSObject to isolate columns instead of comparing them directly. mklement's also has the advantage of keeping the columns in the same order.
I'm newbie in Powershell. I tried to process / transpose row-column against a medium size csv based record (around 10000 rows). The original CSV consist of around 10000 rows with 3 columns ("Time","Id","IOT") as below:
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:03:56","25","0"
"00:03:56","26","1"
"00:03:56","27","0"
"00:03:56","28","0"
"00:03:56","29","0"
"00:03:56","30","1953"
"00:03:56","31","22"
"00:03:56","32","39"
"00:03:56","33","8"
"00:03:56","34","5"
"00:03:56","35","269"
"00:03:56","36","5"
"00:03:56","37","0"
"00:03:56","38","0"
"00:03:56","39","0"
"00:03:56","40","1251"
"00:03:56","41","103"
"00:03:56","42","0"
"00:03:56","43","0"
"00:03:56","44","0"
"00:03:56","45","0"
"00:03:56","46","38"
"00:03:56","47","14"
"00:03:56","48","0"
"00:03:56","49","0"
"00:03:56","2013","0"
"00:03:56","2378","0"
"00:03:56","2380","32"
"00:03:56","2758","0"
"00:03:56","3127","0"
"00:03:56","3128","0"
"00:09:16","23","22"
"00:09:16","24","0"
"00:09:16","25","0"
"00:09:16","26","2"
"00:09:16","27","0"
"00:09:16","28","0"
"00:09:16","29","21"
"00:09:16","30","48"
"00:09:16","31","0"
"00:09:16","32","4"
"00:09:16","33","4"
"00:09:16","34","7"
"00:09:16","35","382"
"00:09:16","36","12"
"00:09:16","37","0"
"00:09:16","38","0"
"00:09:16","39","0"
"00:09:16","40","1882"
"00:09:16","41","42"
"00:09:16","42","0"
"00:09:16","43","3"
"00:09:16","44","0"
"00:09:16","45","0"
"00:09:16","46","24"
"00:09:16","47","22"
"00:09:16","48","0"
"00:09:16","49","0"
"00:09:16","2013","0"
"00:09:16","2378","0"
"00:09:16","2380","19"
"00:09:16","2758","0"
"00:09:16","3127","0"
"00:09:16","3128","0"
...
...
...
I tried to do the transpose using code based from powershell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be
Basically my powershell code is as below:
$b = #()
foreach ($Time in $a.Time | Select -Unique) {
$Props = [ordered]#{ Time = $time }
foreach ($Id in $a.Id | Select -Unique){
$IOT = ($a.where({ $_.Id -eq $Id -and $_.time -eq $time })).IOT
$Props += #{ $Id = $IOT }
}
$b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView
Above code could give me the result as I expected which are all "Id" values will become column headers while all "Time" values will become unique row and "IOT" values as the intersection from "Id" x "Time" as below:
"Time","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","2013","2378","2380","2758","3127","3128"
"00:03:56","26","0","0","1","0","0","0","1953","22","39","8","5","269","5","0","0","0","1251","103","0","0","0","0","38","14","0","0","0","0","32","0","0","0"
"00:09:16","22","0","0","2","0","0","21","48","0","4","4","7","382","12","0","0","0","1882","42","0","3","0","0","24","22","0","0","0","0","19","0","0","0"
While it only involves a few hundreds rows, the result comes out quickly as expected, but the problem now when processing the whole csv file with 10000 rows, the script above 'keep executing' and doesn't seem able to finish for long time (hours) and couldn't spit out any results.
So probably if some powershell experts from stackoverflow could help to asses the code above and probably could help to modify to speed up the results?
Many thanks for the advise
10000 records is a lot but I don't think it is enough to advise streamreader* and manually parsing the CSV. The biggest thing going against you though is the following line:
$b += New-Object -TypeName PSObject -Property $Props
What PowerShell is doing here is making a new array and appending that element to it. This is a very memory intensive operation that you are repeating 1000's of times. Better thing to do in this case is use the pipeline to your advantage.
$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID | Sort-Object {[int]$_} -Unique
$data | Group-Object Time | ForEach-Object{
$props = [ordered]#{Time = $_.Name}
foreach($header in $headers){
$props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
}
[pscustomobject]$props
} | export-csv d:\temp\testing.csv -NoTypeInformation
$data will be your entire file in memory as an object. Need to get all the $headers that will be the column headers.
Group the data by each Time. Then inside each time object we get the value for every ID. If the ID does not exist during that time then the entry will show as null.
This is not the best way but should be faster than yours. I ran 10000 records in under a minute (51 second average over 3 passes). Will benchmark to show you if I can.
I just ran your code once with my own data and it took 13 minutes. I think it is safe to say that mine performs faster.
Dummy data was made with this logic FYI
1..100 | %{
$time = get-date -Format "hh:mm:ss"
sleep -Seconds 1
1..100 | % {
[pscustomobject][ordered]#{
time = $time
id = $_
iot = Get-Random -Minimum 0 -Maximum 7
}
}
} | Export-Csv d:\temp\data.csv -notypeinformation
* Not a stellar example for your case of streamreader. Just pointing it out to show that it is the better way to read large files. Just need to parse string line by line.
Hope you can help me with this little puzzle.
I have ONE txt file looking like this:
firstnumbers
348.92
237
230
329.31
secondnumbers
18.21
48.92
37
30
29.31
So a txt file with one Column that has 2 strings and some numbers on each line.
I want to take the total of each column and put it into each variable like say $a and $b
Yes it is 1 column, just to make sure no misunderstanding
It's pretty easy, if I use 2 files with each column of numbers without the headers(strings)
$a = (Get-Content 'firstnumbers.txt' | Measure-Object -Sum).Sum
$b = (Get-Content 'secondnumbers.txt' | Measure-Object -Sum).Sum
But it would be a little more cool to have them in one txt file, like the aforementioned with a header over each row of numbers.
I've tried removing the the headers with i.e. $a.Replace("first", $null).Replace("sec", $null) and then doing a $b.Split(" ")[1,2,3,4,5] ending with | measure -sum
That gives me the correct number of firstnumbers - but it won't work if I don't keep the specific set of numbers each time. They'll change and there's gonna be more or less of them.
It should be pretty easy I'm guessing. I just can't to seem wrap my head around it at the moment.
Any advice would be awesome!
cheers
Something like this should work:
$file = "C:\path\to\your.txt"
[IO.File]::ReadAllText($file) | % {
$_ -replace "`n+([0-9])", ' $1' -split "`n"
} | ? { $_ -ne "" } | % {
$a = $_ -split " ", 2
$v = $a[1] -split " " | Measure-Object -Sum
"{0}`t{1}" -f ($a[0], $v.Sum)
}
Output:
firstnumbers 1145,23
secondnumbers 163,44
Here's another approach, rather than parsing the text as one big blob, you could test each line to see if it contains a # or text, if it's text, then it triggers the creation of a new entry in a hashtable where the sums are stored:
# C:\Temp> get-content .\numbers.txt | foreach{
$val=0;
if([Decimal]::TryParse($_,[ref]$val)){
$sums[$key]+=$val
}else{
$sums += #{"$_"=0}; #add new entry to hashtable
$key=$_;}
} -end {$sums}
Name Value
---- -----
secondnumbers 163.44
firstnumbers 1145.23
Edit: As noted in the comments, the $sums variable persists for each run which causes problems if you run this command twice. You could call Remove-variable sums after each run, or add it to the end processing block like this:
# C:\Temp> get-content .\numbers.txt | foreach{
$val=0;
if([Decimal]::TryParse($_,[ref]$val)){
$sums[$key]+=$val
}else{
$sums += #{"$_"=0}; #add new entry to hashtable
$key=$_;}
} -end {$sums; remove-variable sums;}