Converting CSV Column to Rows - powershell

I'm trying to create a PowerShell script that transposes a CSV file from columns to rows.
I found examples of doing the opposite (converting a row-based CSV to columns), but nothing for columns to rows. My problem is that I don't know in advance how many columns I'll have. I tried adapting the rows-to-columns script to go from columns to rows, but without success.
$a = Import-Csv "input.csv"
$a | FT -AutoSize
$b = @()
foreach ($Property in $a.Property | Select -Unique) {
    $Props = [ordered]@{ Property = $Property }
    foreach ($Server in $a.Server | Select -Unique) {
        $Value = ($a.where({ $_.Server -eq $Server -and
                    $_.Property -eq $Property })).Value
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView
$b | Export-Csv "output.csv" -NoTypeInformation
For example, my CSV can look like this:
"ID","DATA1"
"12345","11111"
"54321","11111"
"23456","44444"
or this (the number of columns can vary):
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",
and I would like the script to convert it like this:
"ID","DATA"
"12345","11111"
"12345","22222"
"12345","33333"
"54321","11111"
"23456","44444"
"23456","55555"

The trick is to query the members of the table to get the column names. Once you do that then the rest is straightforward:
function Flip-Table ($Table) {
    Process {
        $Row = $_
        # Get all the column names, excluding the ID field.
        $Columns = ($Row | Get-Member -Type NoteProperty | Where-Object Name -ne ID).Name
        foreach ($Column in $Columns) {
            # Skip empty cells.
            if ($Row.$Column) {
                $Properties = [Ordered] @{
                    "ID"   = $Row.ID
                    "DATA" = $Row.$Column
                }
                New-Object PSObject -Property $Properties
            }
        }
        # Optionally force a full garbage collection every 100 input rows.
        $Count++
        if (($Count % 100) -eq 0) {
            [System.GC]::GetTotalMemory($true) | Out-Null
        }
    }
}
Import-Csv input.csv | Flip-Table | Export-Csv -NoTypeInformation output.csv
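One detail worth knowing: Get-Member returns properties sorted by name, so a header such as DATA10 would be processed before DATA2. A minimal sketch, assuming the key column is named ID, that reads the columns from each object's PSObject.Properties collection instead, which keeps the original CSV column order. The function name ConvertTo-IdDataRow is hypothetical, and the sample input is the one from the question.

```powershell
# Hypothetical helper name; unpivots every non-ID column into ID/DATA rows.
function ConvertTo-IdDataRow {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [psobject]$Row
    )
    process {
        # PSObject.Properties keeps the CSV header order (Get-Member sorts by name).
        foreach ($prop in $Row.PSObject.Properties) {
            # Skip the key column and empty or missing cells.
            if ($prop.Name -eq 'ID' -or [string]::IsNullOrEmpty($prop.Value)) { continue }
            [PSCustomObject]@{ ID = $Row.ID; DATA = $prop.Value }
        }
    }
}

# Sample input from the question.
$csv = @'
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",
'@ | ConvertFrom-Csv

$flipped = @($csv | ConvertTo-IdDataRow)
$flipped | Format-Table -AutoSize
```

For a file on disk, the same function slots into the pipeline as `Import-Csv input.csv | ConvertTo-IdDataRow | Export-Csv -NoTypeInformation output.csv`.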

Well, here is mine. I'm not as fancy as the rest:
$in = Get-Content input.csv | Select -Skip 1
$out = New-Object System.Collections.ArrayList
foreach ($row in $in) {
    # Naive split: assumes no commas inside quoted values; strip the surrounding quotes.
    $parts = @($row.Split(',') | ForEach-Object { $_.Trim('"') })
    $id = $parts[0]
    foreach ($data in $parts[1..$parts.Count]) {
        if ($data -ne '' -and $data -ne $null) {
            $temp = New-Object PSCustomObject -Property @{ 'ID' = $id; 'Data' = $data }
            $out.Add($temp) | Out-Null
        }
    }
}
$out | Export-CSV output.csv -NoTypeInformation

You can do something like this
# Convert csv to object
$csv = ConvertFrom-Csv @"
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555"
"@
# Ignore common members and the ID property
$excludedMembers = @(
    'GetHashCode',
    'GetType',
    'ToString',
    'Equals',
    'ID'
)
$results = @()
# Iterate over each csv row
foreach ($row in $csv) {
    $members = $row | Get-Member
    # Iterate over each member of the 'row object' apart from our
    # exclusions and empty values
    foreach ($member in $members |
        Where { $excludedMembers -notcontains $_.Name -and $row.($_.Name) }) {
        # Add to array of objects
        $results += @{ ID = $row.ID; DATA = $row.($member.Name) }
    }
}
# Write the csv string
$outstring = "ID,DATA"
$results | foreach { $outstring += "`n$($_.ID),$($_.DATA)" }
# New csv object
$csv = $outstring | ConvertFrom-Csv
Probably not the most elegant solution, but it should do what you need.
I left some comments explaining what it does.
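The exclusion list is only needed because plain Get-Member also returns the methods every object has. A small sketch, using one row of the sample data, showing that filtering on -MemberType NoteProperty returns just the CSV columns, so only ID has to be excluded, and that the ID/DATA objects can be built directly instead of through an intermediate CSV string:

```powershell
$row = @'
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
'@ | ConvertFrom-Csv

# Only NoteProperty members are returned, so methods such as GetType or
# ToString never appear and only ID has to be excluded.
$dataColumns = @(($row | Get-Member -MemberType NoteProperty).Name | Where-Object { $_ -ne 'ID' })

# Build the ID/DATA objects directly, skipping empty cells.
$results = @(foreach ($name in $dataColumns) {
    if ($row.$name) { [PSCustomObject]@{ ID = $row.ID; DATA = $row.$name } }
})
$results
```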

If you only want to accept a limited number of DATA columns (e.g. 5), you could do:
ForEach ($i in 1..5) {$CSV | ? {$_."Data$i"} | Select ID, @{N='Data'; E={$_."Data$i"}}}
And if you have a potentially unlimited number of DATA columns:
ForEach ($Data in ($CSV | Select "Data*" -First 1).PSObject.Properties.Name) {
    $CSV | ? {$_.$Data} | Select ID, @{N='Data'; E={$_.$Data}}
}

Related

Check all lines in a huge CSV file in PowerShell

I want to work with a CSV file of more than 300,000 lines. I need to verify the information line by line and then write it to a .txt file as a table, to see which files are missing on which servers. For example:
Name,Server
File1,Server1
File2,Server1
File3,Server1
File1,Server2
File2,Server2
...
File345,Server76
File346,Server32
I want to display the result as a table; for the example above it would look like this:
Name Server1 Server2 ... Server32 ... Server76
File1 X X
File2 X X
File3 X
...
File345 X
File346 X
Currently, I have a function that creates objects whose members are the server names (the number of members can change), and I use a StreamReader to split the data. My CSV has more than 2 columns, so index 0 is the server name and index 5 is the file name:
$stream = [System.IO.StreamReader]::new($File)
$stream.ReadLine() | Out-Null
while ((-not $stream.EndOfStream)) {
    $line = $stream.ReadLine()
    $strTempo = $null
    $strTempo = $line -split ","
    $index = $listOfFile.Name.IndexOf($strTempo[5])
    if ($index -ne -1) {
        $property = $strTempo[0].Replace("-", "_")
        $listOfFile[$index].$property = "X"
    }
    else {
        $obj = CreateEmptyObject ($listOfConfiguration)
        $obj.Name = $strTempo[5]
        $listOfFile.Add($obj) | Out-Null
    }
}
When I export this I get a pretty good result, but the script takes a long time (between 20 minutes and 1 hour).
I don't know how to optimize the script; I'm a beginner with PowerShell.
Thanks in advance for any tips.
You might use HashSets for this:
$Servers = [System.Collections.Generic.HashSet[String]]::New()
$Files = @{}
Import-Csv -Path $Path | ForEach-Object {
    $Null = $Servers.Add($_.Server)
    if ($Files.Contains($_.Name)) { $Null = $Files[$_.Name].Add($_.Server) }
    else { $Files[$_.Name] = [System.Collections.Generic.HashSet[String]]$_.Server }
}
$Table = foreach ($Name in $Files.get_Keys()) {
    $Properties = [Ordered]@{ Name = $Name }
    ForEach ($Server in $Servers) {
        $Properties[$Server] = if ($Files[$Name].Contains($Server)) { 'X' }
    }
    [PSCustomObject]$Properties
}
$Table | Format-Table -Property @{ expression='*' }
Note that, in contrast to PowerShell's usual behavior, the .NET HashSet class is case-sensitive by default. To create a case-insensitive HashSet, use the following constructor:
[System.Collections.Generic.HashSet[String]]::New([StringComparer]::OrdinalIgnoreCase)
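A quick way to see the difference, using made-up server names: the default set treats two spellings of the same name as distinct entries, while the set built with StringComparer.OrdinalIgnoreCase keeps only one.

```powershell
# Default HashSet[string]: ordinal, case-sensitive comparison.
$caseSensitive = [System.Collections.Generic.HashSet[string]]::new()
$null = $caseSensitive.Add('Server1')
$null = $caseSensitive.Add('SERVER1')

# Same additions with a case-insensitive comparer.
$caseInsensitive = [System.Collections.Generic.HashSet[string]]::new([StringComparer]::OrdinalIgnoreCase)
$null = $caseInsensitive.Add('Server1')
$null = $caseInsensitive.Add('SERVER1')   # Add returns $false: already present

$caseSensitive.Count     # 2
$caseInsensitive.Count   # 1
```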
See if this works faster. Change the file name as required:
$Path = "C:\temp\test1.txt"
$table = Import-Csv -Path $Path
$columnNames = $table | Select-Object -Property Server -Unique | foreach { $_.Server } | Sort-Object
Write-Host "names = " $columnNames
$groups = $table | Group-Object { $_.Name }
$outputTable = [System.Collections.ArrayList]@()
foreach ($group in $groups)
{
    Write-Host "Group = " $group.Name
    $newRow = New-Object -TypeName psobject
    $newRow | Add-Member -NotePropertyName Name -NotePropertyValue $group.Name
    $servers = $group.Group | Select-Object -Property Server | foreach { $_.Server }
    Write-Host "servers = " $servers
    foreach ($item in $columnNames)
    {
        # -contains tests whole values; .Contains() on a single string would be a substring test
        if ($servers -contains $item)
        {
            $newRow | Add-Member -NotePropertyName $item -NotePropertyValue 'X'
        }
        else
        {
            # Every row needs every column: Format-Table takes its columns from the first object
            $newRow | Add-Member -NotePropertyName $item -NotePropertyValue ''
        }
    }
    $outputTable.Add($newRow) | Out-Null
}
$outputTable | Format-Table
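Why -contains rather than .Contains() matters for a list like $servers: when a file exists on only one server, the pipeline returns a single string instead of an array, and calling .Contains() on a string is String.Contains, a substring test. A short illustration with made-up server names:

```powershell
# Illustrative value: a file found on one server only, so the pipeline
# produced a single [string] instead of an array.
$servers = 'Server12'

$substringMatch  = $servers.Contains('Server1')   # True: String.Contains looks for a substring
$membershipMatch = $servers -contains 'Server1'   # False: -contains compares whole values

# With several servers $servers is an array, and -contains still compares whole values.
$severalServers = @('Server2', 'Server12')
$arrayMatch = $severalServers -contains 'Server1'  # False

$substringMatch, $membershipMatch, $arrayMatch
```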

Expanding System.Object[] for Export to CSV

Hello again and sorry!
To keep it short: what am I doing wrong?
I'm attempting to export a list of users, filtered into a custom object, to a CSV, and it puts all of the names into the same cell. Is there no way to change this? I ask only because all the other pages I've looked at keep telling me to use -join to join them as strings, which does the exact same thing. Is it not possible to output them as multiple rows, one per user?
$GPMem = Get-ADGroupMember -Identity security.group | Select-Object -ExpandProperty Name
[array]$TPpl = $GPMem | Where-Object {$_ -like "T*"}
[array]$RPpl = $GPMem | Where-Object {$_ -like "r*"}
[array]$CPpl = $GPMem | Where-Object {$_ -like "c*"}
[pscustomobject]@{
    TPeople = (@($TPpl) | Out-String).Trim()
    TPCount = $TPpl.Count
    RPeople = (@($RPpl) | Out-String).ToString()
    RPCount = $TPpl.Count
    CPeople = $CPpl
    CPCount = $TPpl.Count
} | Export-Csv -Path C:\Users\abraham\Desktop\csv.csv -NoTypeInformation -Force
Here is how to insert a ; or a newline into the values of a column ... [grin]
$ThingList = @('One Thing', 'Two Thing', 'Three Thing')
$ThingList -join '; '
'=' * 20
$ThingList -join [System.Environment]::NewLine
output ...
One Thing; Two Thing; Three Thing
====================
One Thing
Two Thing
Three Thing
Create three more arrays for the counts (each array will be exported to a column), then find the longest array and generate one object per line:
[array]$TPpl = @("T1", "T2", "T3")
[array]$TPpl_count = @($TPpl.Count)
[array]$RPpl = @("R1", "R2", "R3", "R4")
[array]$RPpl_count = @($RPpl.Count)
[array]$CPpl = @("C1", "C2", "C3", "C4", "C5")
[array]$CPpl_count = @($CPpl.Count)
$leng = [array]$TPpl.Count, $RPpl.Count, $CPpl.Count
$max = ($leng | measure -Maximum).Maximum
$csv = for ($i = 0; $i -lt $max; $i++) {
    # [PSCustomObject] keeps the properties (CSV columns) in the order written
    [PSCustomObject]@{
        "TPeople" = $(if ($TPpl[$i]) { $TPpl[$i] })
        "TPCount" = $(if ($TPpl_count[$i]) { $TPpl_count[$i] })
        "RPeople" = $(if ($RPpl[$i]) { $RPpl[$i] })
        "RPCount" = $(if ($RPpl_count[$i]) { $RPpl_count[$i] })
        "CPeople" = $(if ($CPpl[$i]) { $CPpl[$i] })
        "CPCount" = $(if ($CPpl_count[$i]) { $CPpl_count[$i] })
    }
}
$csv | Export-Csv C:\Temp\test.csv -NoTypeInformation
Your final code should be:
$GPMem = Get-ADGroupMember -Identity security.group | Select-Object -ExpandProperty Name
[array]$TPpl = $GPMem | Where-Object {$_ -like "T*"}
[array]$RPpl = $GPMem | Where-Object {$_ -like "r*"}
[array]$CPpl = $GPMem | Where-Object {$_ -like "c*"}
[array]$TPpl_count = @($TPpl.Count)
[array]$RPpl_count = @($RPpl.Count)
[array]$CPpl_count = @($CPpl.Count)
$leng = [array]$TPpl.Count, $RPpl.Count, $CPpl.Count
$max = ($leng | measure -Maximum).Maximum
$csv = for ($i = 0; $i -lt $max; $i++) {
    # [PSCustomObject] keeps the properties (CSV columns) in the order written
    [PSCustomObject]@{
        "TPeople" = $(if ($TPpl[$i]) { $TPpl[$i] })
        "TPCount" = $(if ($TPpl_count[$i]) { $TPpl_count[$i] })
        "RPeople" = $(if ($RPpl[$i]) { $RPpl[$i] })
        "RPCount" = $(if ($RPpl_count[$i]) { $RPpl_count[$i] })
        "CPeople" = $(if ($CPpl[$i]) { $CPpl[$i] })
        "CPCount" = $(if ($CPpl_count[$i]) { $CPpl_count[$i] })
    }
}
$csv | Export-Csv C:\Temp\test.csv -NoTypeInformation
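The padding idea above can also be written once for any number of columns. A sketch under the assumption that the member lists below are placeholders rather than real AD data: the arrays are kept in an ordered dictionary, the longest one sets the row count, shorter ones are padded with empty strings, and each count goes on the first row only. The count columns are named TPeopleCount and so on here, which differs from the TPCount names used above.

```powershell
# Placeholder data standing in for the filtered group members.
$columns = [ordered]@{
    TPeople = @('T1', 'T2', 'T3')
    RPeople = @('R1', 'R2', 'R3', 'R4')
    CPeople = @('C1', 'C2', 'C3', 'C4', 'C5')
}

# The CSV needs as many rows as the longest array has items.
$max = 0
foreach ($values in $columns.Values) {
    if ($values.Count -gt $max) { $max = $values.Count }
}

$rows = for ($i = 0; $i -lt $max; $i++) {
    $props = [ordered]@{}
    foreach ($name in $columns.Keys) {
        $values = $columns[$name]
        # Pad shorter columns with an empty string.
        $props[$name] = if ($i -lt $values.Count) { $values[$i] } else { '' }
        # The count goes on the first row only.
        $props["${name}Count"] = if ($i -eq 0) { $values.Count } else { '' }
    }
    [PSCustomObject]$props
}

$rows | Format-Table -AutoSize
# $rows | Export-Csv C:\Temp\test.csv -NoTypeInformation
```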

Compare .csv and add only rows with updated values

I have two .csv files with matching columns. I am trying to compare the two and produce a final output .csv that contains only the differences.
corpold.csv is a previously imported file.
corpnew.csv is the new import file.
I need to export a CSV that includes all items that are not in corpold.csv and only the changed items that exist in both CSVs, and that excludes any rows that exist in corpold.csv but not in corpnew.csv.
$reference = Import-Csv -Path D:\corpold.csv
$lookup = $reference | Group-Object -AsHashTable -AsString -Property EMPID
$results = Import-Csv -Path D:\corpnew.csv | foreach {
$email = $_.EMAIL_ADDRESS
$status = $_.ACTIVE
$fs = $_.FIRST_NAME
$ls = $_.LAST_NAME
$id = $_.EMPID
$title = $_.JOB_TITLE
$code = $_.JOB_CODE
$type = $_.USER_TYPE
$designee = $_.DESIGNEE
$stores = $_.STORES
$hiredate = $_.HIRE_DATE
$dept = $_.DEPARTMENT
$grp = $_.GROUP
if ($lookup.ContainsKey($id)) {
# if it exists in yesterday's file
# trying to figure out how to compare and only provide results into
# the Export-Csv that have changed while excluding any items in
# corpold that do not exist in corpnew
} else {
# if it does not exist, output all fields
[PSCustomObject]@{
ACTIVE = $status
EMAIL_ADDRESS = $email
FIRST_NAME = $fs
LAST_NAME = $ls
EMPID = $id
JOB_TITLE = $title
JOB_CODE = $code
USER_TYPE = $type
DESIGNEE = $designee
STORES = $stores
HIRE_DATE = $hiredate
DEPARTMENT = $dept
GROUP = $grp
}
}
}
# Sample outputs
$results
$results | Export-Csv -Path D:\delta.csv -NoTypeInformation
There are two operations to be done here: finding new and deleted entries, and comparing the objects that exist in both files.
Compare objects and find new/deleted entries
To compare objects you can use the Compare-Object cmdlet like this (assuming $results holds the rows imported from corpnew.csv):
Compare-Object -ReferenceObject $reference -DifferenceObject $results -Property EMPID -IncludeEqual
This gives you the list of EMPIDs, each with a SideIndicator showing whether the object exists only in the first file (<=), only in the second (=>), or in both (==). You can filter by SideIndicator and then process each group accordingly.
An alternative is to use Where-Object like this:
$reference | Where-Object empid -NotIn $results.empid
$reference | Where-Object empid -In $results.empid
$results | Where-Object empid -NotIn $reference.empid
The first gives you the entries that exist only in the first file, the second the entries that exist in both, and the last the entries that exist only in the second file.
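A self-contained illustration of the SideIndicator filtering, using tiny placeholder CSVs that reuse the question's EMPID and FIRST_NAME column names:

```powershell
$old = @'
EMPID,FIRST_NAME
1,Ann
2,Bob
'@ | ConvertFrom-Csv

$new = @'
EMPID,FIRST_NAME
2,Bob
3,Cat
'@ | ConvertFrom-Csv

$diff = Compare-Object -ReferenceObject $old -DifferenceObject $new -Property EMPID -IncludeEqual

# <= only in $old (deleted), => only in $new (added), == in both
$added   = ($diff | Where-Object SideIndicator -eq '=>').EMPID
$deleted = ($diff | Where-Object SideIndicator -eq '<=').EMPID
$inBoth  = ($diff | Where-Object SideIndicator -eq '==').EMPID

"added: $added; deleted: $deleted; in both: $inBoth"
```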
Find edited entries
What you basically have to do is iterate over the entries that exist in both files and check whether any of the columns has changed. If so, add the entry to $changedEntries.
Example script:
$IDsInBoth = $results | Where-Object empid -In $reference.empid | Select-Object -ExpandProperty EMPID
$AllProperties = $results | Get-Member | Where-Object MemberType -eq "NoteProperty" | Select-Object -ExpandProperty Name
$changedEntries = @()
$IDsInBoth | ForEach-Object {
    $changed = $false
    $newEntry = $results | Where-Object EMPID -eq $_
    $oldEntry = $reference | Where-Object EMPID -eq $_
    foreach ($p in $AllProperties) {
        if ($oldEntry."$p" -ne $newEntry."$p") {
            $changed = $true
        }
    }
    if ($changed) {
        $changedEntries += $newEntry
    }
}
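The loop above runs Where-Object over both files for every ID, which gets slow as the files grow. A sketch of the same check using a hashtable keyed on EMPID, so each lookup is a single hash access; it also assembles the full delta the question asks for (new rows plus changed rows, ignoring rows that exist only in the old file). The inline CSVs are placeholder data, and -cne is used so that a change in letter case also counts as a change.

```powershell
$reference = @'
EMPID,EMAIL_ADDRESS,ACTIVE
1,ann@example.com,Y
2,bob@example.com,Y
9,old@example.com,Y
'@ | ConvertFrom-Csv

$current = @'
EMPID,EMAIL_ADDRESS,ACTIVE
1,ann@example.com,Y
2,bob@example.com,N
3,cat@example.com,Y
'@ | ConvertFrom-Csv

# Index the old file by EMPID for constant-time lookups.
$lookup = @{}
foreach ($row in $reference) { $lookup[$row.EMPID] = $row }

$columns = $current[0].PSObject.Properties.Name

$delta = @(foreach ($row in $current) {
    $old = $lookup[$row.EMPID]
    if ($null -eq $old) { $row; continue }              # new employee
    foreach ($col in $columns) {
        if ($old.$col -cne $row.$col) { $row; break }   # at least one column changed
    }
})

$delta
# $delta | Export-Csv -Path D:\delta.csv -NoTypeInformation
```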

Reading txt-file, change rows to columns, save txt file

I have a .txt file (semicolon-separated) containing over 3 million records, where columns 1 to 4 hold general information and columns 5 and 6 hold detailed information. There can be up to 4 different detail entries for the same general information in columns 1 to 4.
My sample input:
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv1;Dave;Sandbox;kwus91;Storage;120GB
Srv1;Dave;Sandbox;kwus91;Variant;16
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv2;Pete;GWZ;aiwq71;Processes;234
Srv3;Micael;P12;mxuq01;Memory;16GB
Srv3;Micael;P12;mxuq01;Processes;239
Srv3;Micael;P12;mxuq01;Storage;160GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
Desired output:
Server;Owner;Company;Username;Memory;Processes;Storage;Variant
Srv1;Dave;Sandbox;kwus91;4GB;135;120GB;16
Srv2;Pete;GWZ;aiwq71;8GB;234;;
Srv3;Micael;P12;mxuq01;16GB;239;160GB;
Srv4;Stefan;MTC;spq61ep;;;120GB;
If a value doesn't exist for a given set of general information (columns 1-4), that field has to stay blank.
My current code:
$a = Import-csv .\Input.txt -Delimiter ";"
$a | FT -AutoSize
$b = @()
foreach ($Server in $a.Server | Select -Unique) {
    $Props = [ordered]@{ Server = $Server }
    $Owner = ($a.where({ $_.Server -eq $Server})).Owner | Select -Unique
    $Company = ($a.where({ $_.Server -eq $Server})).Company | Select -Unique
    $Username = ($a.where({ $_.Server -eq $Server})).Username | Select -Unique
    $Props += @{Owner = $Owner}
    $Props += @{Company = $Company}
    $Props += @{Username = $Username}
    foreach ($Property in $a.Property | Select -Unique) {
        $Value = ($a.where({ $_.Server -eq $Server -and
                    $_.Property -eq $Property})).Value
        $Props += @{ $Property = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Export-Csv .\Output.txt -NoTypeInformation -Delimiter ";"
After a lot of trial and error, my script works.
But it takes a lot of time.
Is there a way to improve performance for around 3 million lines in the txt file? I expect roughly 2.5 million unique values for $Server.
I'm running Windows 7 64-bit with PowerShell 4.0.
Try something like this:
# Import data
$List = import-csv "C:\temp\file.csv" -Delimiter ";"
# Get all property names that have a non-empty value
$ListProperty = ($List | where Value -ne '' | select property -Unique).Property
# Group by server
$Groups = $List | group Server
# Loop over every row and store data by group and property name
$List | %{
    $Current = $_
    # Take non-empty values and group them by property name
    $Group = ($Groups | where Name -eq $Current.Server).Group | where Value -ne '' | group Property
    # Add every property with its first non-empty value
    $ListProperty | %{
        $PropertyName = $_
        $PropertyValue = ($Group | where Name -eq $PropertyName | select -first 1).Group.Value
        $Current | Add-Member -Name $PropertyName -MemberType NoteProperty -Value $PropertyValue
    }
    $Current
} | select * -ExcludeProperty Property, Value -unique | export-csv "c:\temp\result.csv" -notype -Delimiter ";"
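The script above scans $Groups with Where-Object once for every input row, so the work grows roughly with the square of the file size. A single-pass sketch under two stated assumptions: the four key columns together identify a record, and the only Property values are the four listed in $propertyNames (a value outside that list would add a column to just one record). Rows are accumulated in an ordered hashtable and written out once. It uses only features available since PowerShell 3.0. For the real file, piping Import-Csv into ForEach-Object instead of collecting $data first avoids holding all 3 million input rows in memory; it has not been benchmarked at that size.

```powershell
# Assumption: these are the only Property values that occur.
$propertyNames = 'Memory', 'Processes', 'Storage', 'Variant'

# Sample rows from the question; for the real file use Import-Csv .\Input.txt -Delimiter ';'
$data = @'
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv1;Dave;Sandbox;kwus91;Storage;120GB
Srv1;Dave;Sandbox;kwus91;Variant;16
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv2;Pete;GWZ;aiwq71;Processes;234
Srv3;Micael;P12;mxuq01;Memory;16GB
Srv3;Micael;P12;mxuq01;Processes;239
Srv3;Micael;P12;mxuq01;Storage;160GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
'@ | ConvertFrom-Csv -Delimiter ';'

# One output record per Server/Owner/Company/Username combination, in first-seen order.
$records = [ordered]@{}
foreach ($row in $data) {
    $key = $row.Server, $row.Owner, $row.Company, $row.Username -join ';'
    $record = $records[$key]
    if ($null -eq $record) {
        $record = [ordered]@{
            Server   = $row.Server
            Owner    = $row.Owner
            Company  = $row.Company
            Username = $row.Username
        }
        # Pre-create every property column so missing values stay blank.
        foreach ($name in $propertyNames) { $record[$name] = '' }
        $records[$key] = $record
    }
    $record[$row.Property] = $row.Value
}

$result = @(foreach ($record in $records.Values) { [PSCustomObject]$record })
$result | Format-Table -AutoSize
# $result | Export-Csv .\Output.txt -NoTypeInformation -Delimiter ';'
```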

Filter csv file by array of hashtables

What I am trying to do is filter a CSV file (which happens to be a web log) with an array of hashtables (user information from a database).
# $data is from a database (about 500 items). Type: System.Data.DataTable
$users = @()
foreach ($row in $data)
{
    $userItem = @{
        LoginId = $row[0]
        LastName = $row[3]
        FirstName = $row[4]
        LastAccess = $null
    }
    $users += $userItem
}
# Log files are about 14,000 lines long
$logfiles = Get-ChildItem $logFolder -Recurse | where {$_.Extension -eq ".log"} | Sort-Object BaseName -Descending
foreach ($log in $logfiles)
{
    $csvLog = Import-Csv $log.FullName -Header ("Blank","LoginId","Date")
    $u = $users | Select {&_.LoginId}
    $filteredcsvLog = $cvsLog | Where-Object { $u -contains $_.LoginId }
    # This returns null
    ....
}
This does not seem to work. What am I missing? My guess is that I need to flatten the array into [string[]], but I can't seem to do that either.
Rather than do an array of hashtables, I would do a hashtable of custom objects e.g.:
$users = @{}
foreach ($row in $data)
{
    $userItem = new-object psobject -property @{
        LoginId = $row[0]
        LastName = $row[3]
        FirstName = $row[4]
        LastAccess = $null
    }
    $users[$userItem.LoginId] = $userItem
}
Then the filtering is easier and faster:
foreach ($log in $logfiles)
{
    $csvLog = Import-Csv $log.FullName -Header ("Blank","LoginId","Date")
    $filteredcsvLog = $csvLog | Where-Object { $users[$_.LoginId] }
    ....
}
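A self-contained sketch of the hashtable approach, with placeholder users and log lines standing in for the database rows and the log files. It uses ContainsKey for an explicit key test and then updates the matching user object through the same hashtable; the LastAccess update is only an illustration of what the stored objects make possible, with later log lines overwriting earlier ones. Note that PowerShell's @{} hashtables compare string keys case-insensitively.

```powershell
# Placeholder users standing in for the database rows.
$userList = @(
    [PSCustomObject]@{ LoginId = 'jdoe';   LastName = 'Doe';   FirstName = 'Jane'; LastAccess = $null }
    [PSCustomObject]@{ LoginId = 'bsmith'; LastName = 'Smith'; FirstName = 'Bob';  LastAccess = $null }
)
$users = @{}
foreach ($u in $userList) { $users[$u.LoginId] = $u }

# Placeholder log lines; no header row, so the columns are named as in the question.
$csvLog = @'
,jdoe,2015-01-02
,guest,2015-01-02
,bsmith,2015-01-03
'@ | ConvertFrom-Csv -Header 'Blank', 'LoginId', 'Date'

# One hash lookup per log line instead of a scan of the user list.
$filteredCsvLog = @($csvLog | Where-Object { $users.ContainsKey($_.LoginId) })

# The hashtable also gives direct access to each matching user object.
foreach ($entry in $filteredCsvLog) { $users[$entry.LoginId].LastAccess = $entry.Date }

$filteredCsvLog
```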