Compare .csv and add only rows with updated values - powershell
I have two .csv files with matching columns. I am trying to compare them and produce a final output .csv that contains only the differences.
corpold.csv is a previous imported file.
corpnew.csv is the new import file.
I need to export a CSV that includes every row that is not in corpold.csv and every row that exists in both CSVs but has changed, and that excludes any row that exists in corpold.csv but not in corpnew.csv.
$reference = Import-Csv -Path D:\corpold.csv
$lookup = $reference | Group-Object -AsHashTable -AsString -Property EMPID
$results = Import-Csv -Path D:\corpnew.csv | foreach {
    $email = $_.EMAIL_ADDRESS
    $status = $_.ACTIVE
    $fs = $_.FIRST_NAME
    $ls = $_.LAST_NAME
    $id = $_.EMPID
    $title = $_.JOB_TITLE
    $code = $_.JOB_CODE
    $type = $_.USER_TYPE
    $designee = $_.DESIGNEE
    $stores = $_.STORES
    $hiredate = $_.HIRE_DATE
    $dept = $_.DEPARTMENT
    $grp = $_.GROUP
    if ($lookup.ContainsKey($id)) {
        # if exists in yesterdays file
        # trying to figure out how to compare and only provide results into
        # the Export-Csv that have changed while excluding any items in
        # corpold that do not exist in corpnew
    } else {
        # if it does not exist update all fields
        [PSCustomObject]@{
            ACTIVE = $status
            EMAIL_ADDRESS = $email
            FIRST_NAME = $fs
            LAST_NAME = $ls
            EMPID = $id
            JOB_TITLE = $title
            JOB_CODE = $code
            USER_TYPE = $type
            DESIGNEE = $designee
            STORES = $stores
            HIRE_DATE = $hiredate
            DEPARTMENT = $dept
            GROUP = $grp
        }
    }
}
# Sample outputs
$results
$results | Export-Csv -Path D:\delta.csv -NoTypeInformation
There are two operations to be done here: find differences and compare objects which exist in both files.
Compare objects and find new/deleted entries
To compare objects, you can use the Compare-Object cmdlet like this:
Compare-Object -ReferenceObject $reference -DifferenceObject $results -Property EMPID -IncludeEqual
This will give you a list of EMPID values, each with a SideIndicator showing whether the object exists only in the first set (<=), only in the second (=>), or in both (==). You can filter by SideIndicator and then process each group accordingly.
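Filtering by SideIndicator could look like the following sketch. The two single-column tables are made-up sample data standing in for the imported files, and the variable names ($old, $new, $onlyInOld and so on) are illustrative only.

```powershell
# Hypothetical sample data standing in for the two imported CSV files.
$old = @'
EMPID
1
2
3
'@ | ConvertFrom-Csv

$new = @'
EMPID
2
3
4
'@ | ConvertFrom-Csv

# Compare on EMPID only; -IncludeEqual also returns the rows present on both sides.
$diff = Compare-Object -ReferenceObject $old -DifferenceObject $new -Property EMPID -IncludeEqual

# Split the comparison result by SideIndicator.
$onlyInOld = ($diff | Where-Object SideIndicator -eq '<=').EMPID
$onlyInNew = ($diff | Where-Object SideIndicator -eq '=>').EMPID
$inBoth    = ($diff | Where-Object SideIndicator -eq '==').EMPID
```

Note that comparing on EMPID alone says nothing about whether the other columns changed; that check is a separate step.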
An alternative is to use Where-Object like this:
$reference | Where-Object empid -NotIn $results.empid
$reference | Where-Object empid -In $results.empid
$results | Where-Object empid -NotIn $reference.empid
The first line gives you the entries that exist only in the first file, the second gives the entries that exist in both, and the last gives the entries that exist only in the second file.
Find edited entries
What you basically have to do is iterate over all the entries that exist in both files and check whether any column has changed. If so, add the entry to $changedEntries.
Example script:
$IDsInBoth = $results | Where-Object empid -In $reference.empid | Select-Object -ExpandProperty EMPID
$AllProperties = $results | Get-Member | Where-Object MemberType -eq "NoteProperty" | Select-Object -ExpandProperty Name
$changedEntries = @()
$IDsInBoth | ForEach-Object {
    $changed = $false
    $newEntry = $results | Where-Object EMPID -eq $_
    $oldEntry = $reference | Where-Object EMPID -eq $_
    foreach ($p in $AllProperties) {
        if ($oldEntry."$p" -ne $newEntry."$p") {
            $changed = $true
        }
    }
    if ($changed) {
        $changedEntries += $newEntry
    }
}
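Both steps (new entries and changed entries) can also be combined in a single pass by indexing the old file in a hashtable. This is only a sketch under assumptions: the three-column sample data is invented, and the names $old, $new, $oldById and $delta are not from the original script. Rows that exist only in the old data are never visited, so they are excluded automatically.

```powershell
# Hypothetical sample data; in practice these would come from Import-Csv.
$old = @'
EMPID,FIRST_NAME,JOB_TITLE
1,Ann,Clerk
2,Bob,Manager
3,Cy,Driver
'@ | ConvertFrom-Csv

$new = @'
EMPID,FIRST_NAME,JOB_TITLE
1,Ann,Clerk
2,Bob,Director
4,Dee,Analyst
'@ | ConvertFrom-Csv

# Index the old rows by EMPID so each lookup is a hashtable access, not a scan.
$oldById = @{}
foreach ($row in $old) { $oldById[$row.EMPID] = $row }

# Column names are taken from the first row of the new data.
$columns = $new[0].PSObject.Properties.Name

$delta = foreach ($row in $new) {
    $previous = $oldById[$row.EMPID]
    if ($null -eq $previous) {
        $row        # EMPID not in the old data: keep the whole row
        continue
    }
    foreach ($c in $columns) {
        if ($row.$c -ne $previous.$c) {
            $row    # at least one column differs: keep the new version
            break
        }
    }
}
```

With this sample data, $delta holds the rows for EMPID 2 (changed title) and EMPID 4 (new), and could then be piped to Export-Csv.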
Related
Compare Two CSVs, match the columns on 2 or more Columns, export specific columns from both csvs with powershell
I have 2 CSVs.
left.csv
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
right.csv
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
End results: Combined.csv
Ref_ID,First_Name,Last_Name,DOB,Document_Type,Filename
321364060,User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
946497594,User2,Acker,05/28/1960,Transcript,R4IKTRYN.pdf
887327716,User3,Aco,06/26/1950,Transcript,R4IKTHMK.pdf
588496260,User4,John,05/23/1960,Letter,R4IKTHSL.pdf
I need to match them on First_Name, Last_Name, DOB, then return Ref_ID, first_name, last_name, DOB from left.csv and Document_Type, Filename from right.csv.
Use Compare-Object: that only returns columns from one of the CSVs, not columns from both.
Use Join-Object: this was my great hope, but it only lets me match on one column, and I need to match on multiple columns (I can't figure out how to do multiple).
I'm not sure where to go from here and am open to suggestions.
$left = Import-Csv C:\left.csv
$right = Import-Csv C:\right.csv
Compare-Object -ReferenceObject $left -DifferenceObject $right -Property First_Name,Last_Name,DOB -IncludeEqual -ExcludeDifferent |
    ForEach-Object {
        $iItem = $_
        $ileft = $left.Where({$_.First_Name -eq $iItem.First_Name -and $_.Last_Name -eq $iItem.Last_Name -and $_.DOB -eq $iItem.DOB})
        $iright = $right.Where({$_.First_Name -eq $iItem.First_Name -and $_.Last_Name -eq $iItem.Last_Name -and $_.DOB -eq $iItem.DOB})
        [pscustomobject]@{
            Ref_ID = $ileft.Ref_ID
            first_name = $ileft.first_name
            last_name = $ileft.last_name
            DOB = $ileft.DOB
            Document_Type = $iright.Document_Type
            Filename = $iright.Filename
        }
    } | Export-Csv C:\Combined.csv -NoTypeInformation
You could create your own key from each csv, then add from each csv to a new hashtable using this key. Step through this in a debugger (ISE or VSCode) and tailor it to what you need... Add appropriate error checking as you need depending on the sanity of your data. Some statements below are just for debugging so you can inspect what's happening as it runs.
# Ref_ID,First_Name,Last_Name,DOB
$csv1 = @'
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
'@
# First_Name,Last_Name,DOB,City,Document_Type,Filename
$csv2 = @'
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
'@
# hashtable
$data = @{}
$c1 = $csv1 -split "`r`n"
$c1.count
foreach ($item in $c1) {
    $fields = $item -split ','
    $key = $fields[1]+$fields[2]+$fields[3]
    $key
    # add new hashtable for given key
    $data.Add($key, [ordered]@{})
    # add data from c1 to the hashtable
    $data[$key].ID = $fields[0]
    $data[$key].First = $fields[1]
    $data[$key].Last = $fields[2]
    $data[$key].DOB = $fields[3]
}
$c2 = $csv2 -split "`r`n"
$c2.count
foreach ($item in $c2) {
    $fields = $item -split ','
    $key = $fields[0]+$fields[1]+$fields[2]
    $key
    # add data from c2 to the hashtable
    $data[$key].Type = $fields[4]
    $data[$key].FileName = $fields[5]
}
$data.Count
foreach ($key in $data.Keys) {
    '====================='
    $data[$key]
}
Some good answers already, and here's another. Import your myriad objects into a single (dis)array:
$left = @"
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
"@
$right = @"
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
"@
$disarray = @(
    $left | ConvertFrom-Csv
    $right | ConvertFrom-Csv
)
Use Group-Object to organize them into groups having identical key values:
$keyProps = @('First_Name', 'Last_name', 'DOB')
$disarray |
    Group-Object -Property $keyProps |
    Where-Object Count -gt 1 |
Then merge the objects, adding any missing properties to the output $mergedObject:
ForEach-Object {
    $mergedObject = $_.group[0]
    foreach ($obj in $_.group[1..($_.group.count-1)]) {
        $newProps = ($obj | Get-Member -MemberType NoteProperty).name |
            Where-Object { $_ -notin ($mergedobject | Get-Member -MemberType NoteProperty).name }
        foreach ($propName in $newProps) {
            $mergedObject | Add-Member -MemberType NoteProperty -Name $propName -Value $obj.$propName -Force
        }
    }
    Write-Output $mergedObject
}
This doesn't differ wildly from the answers you already have, but eliminating the "left"/"right" distinction might be helpful. The above code should handle three or more sources thrown into $disarray, merging all objects containing identical $keyProps. Note that there are corner cases to consider. For instance, what happens if one object has 'City=Chicago' for a user and another has 'City=New York'?
Try this Join-Object. It has a few more features along with joining based on multiple columns:
$Left = ConvertFrom-Csv @"
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
"@
$Right = ConvertFrom-Csv @"
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
"@
$Left | Join $Right `
    -On First_Name, Last_Name, DOB `
    -Property Ref_ID, Filename, First_Name, DOB, Last_Name `
    | Format-Table
Last_Name Ref_ID    DOB                    Filename     First_Name
--------- ------    ---                    --------     ----------
Micah     321364060 1969-11-01 12:00:00 AM T4IJZSYO.pdf User1
Acker     946497594 1960-05-28 12:00:00 AM R4IKTRYN.pdf User2
Aco       887327716 1950-06-26 12:00:00 AM R4IKTHMK.pdf User3
John      588496260 1960-05-23 12:00:00 AM R4IKTHSL.pdf User4
Adding the answer I found:
$left = Import-Csv .\left.csv
$right = Import-Csv .\right.csv
$right | foreach {
    $r = $_;
    $left |
        where { $_.First_Name -eq $r.First_Name -and $_.Last_Name -eq $r.Last_Name -and $_.DOB -eq $r.DOB } |
        select Ref_Id, First_Name, Last_Name, DOB,
            @{Name="City";Expression={$r.City}},
            @{Name="Document_Type";Expression={$r.Document_Type}},
            @{Name="FileName";Expression={$r.FileName}}
} | format-table
Converting CSV Column to Rows
I'm trying to create a PowerShell script that transposes a CSV file from columns to rows. I found examples of doing the opposite (converting a row-based CSV to columns) but nothing on columns to rows. My problem is that I don't know exactly how many columns I'll have. I tried adapting the row-to-column script to go from columns to rows, but without success.
$a = Import-Csv "input.csv"
$a | FT -AutoSize
$b = @()
foreach ($Property in $a.Property | Select -Unique) {
    $Props = [ordered]@{ Property = $Property }
    foreach ($Server in $a.Server | Select -Unique){
        $Value = ($a.where({ $_.Server -eq $Server -and $_.Property -eq $Property })).Value
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView
$b | Export-Csv "output.csv" -NoTypeInformation
For example my CSV can look like this:
"ID","DATA1"
"12345","11111"
"54321","11111"
"23456","44444"
or this (the number of columns can vary):
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",
and I would like the script to convert it like this:
"ID","DATA"
"12345","11111"
"12345","22222"
"12345","33333"
"54321","11111"
"23456","44444"
"23456","55555"
The trick is to query the members of the table to get the column names. Once you do that then the rest is straightforward:
function Flip-Table ($Table) {
    Process {
        $Row = $_
        # Get all the columns names, excluding the ID field.
        $Columns = ($Row | Get-Member -Type NoteProperty | Where-Object Name -ne ID).Name
        foreach ($Column in $Columns) {
            if ($Row.$Column) {
                $Properties = [Ordered] @{
                    "ID" = $Row.ID
                    "DATA" = $Row.$Column
                }
                New-Object PSObject -Property $Properties
            }
        }
        # Garbage collection won't kick in until the end of the script, so
        # invoke it every 100 input rows.
        $Count++;
        if (($Count % 100) -eq 0) {
            [System.GC]::GetTotalMemory('forceFullCollection') | out-null
        }
    }
}
Import-Csv input.csv | Flip-Table | Export-Csv -NoTypeInformation output.csv
Well, here is mine. I'm not as fancy as the rest:
$in = Get-Content input.csv | Select -Skip 1
$out = New-Object System.Collections.ArrayList
foreach($row in $in){
    $parts = $row.Split(',')
    $id = $parts[0]
    foreach($data in $parts[1..$parts.Count]){
        if($data -ne '' -AND $data -ne $null){
            $temp = New-Object PSCustomObject -Property @{'ID' = $id; 'Data' = $data}
            $out.Add($temp) | Out-Null
        }
    }
}
$out | Export-CSV output.csv -NoTypeInformation
You can do something like this:
# Convert csv to object
$csv = ConvertFrom-Csv @"
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555"
"@
# Ignore common members and the ID property
$excludedMembers = @( 'GetHashCode', 'GetType', 'ToString', 'Equals', 'ID' )
$results = @()
# Iterate around each csv row
foreach ($row in $csv) {
    $members = $row | Get-Member
    # Iterate around each member from the 'row object' apart from our
    # exclusions and empty values
    foreach ($member in $members | Where { $excludedMembers -notcontains $_.Name -and $row.($_.Name)}) {
        # add to array of objects
        $results += @{ ID=$row.ID; DATA=$row.($member.Name)}
    }
}
# Write the csv string
$outstring = "ID,DATA"
$results | foreach { $outstring += "`n$($_.ID),$($_.DATA)" }
# New csv object
$csv = $outstring | ConvertFrom-Csv
Probably not the most elegant solution, but it should do what you need. I left some comments explaining what it does.
If you only want to accept a limited number of DATA columns (e.g. 5), you could do:
ForEach ($i in 1..5) {$CSV | ? {$_."Data$i"} | Select ID, @{N='Data'; E={$_."Data$i"}}}
And if you have a potentially unlimited number of DATA columns:
ForEach ($Data in ($CSV | Select "Data*" -First 1).PSObject.Properties.Name) {
    $CSV | ? {$_.$Data} | Select ID, @{N='Data'; E={$_.$Data}}
}
Transpose CSV Row to Column based on Column Value
I am trying to convert an 80K-row CSV with 30 columns into a sorted and filtered CSV based on specific column data from the original CSV. For example, my data is in the format below:
PatchName MachineName IPAddress DefaultIPGateway Domain Name USERID UNKNOWN NOTAPPLICABLE INSTALLED APPLICABLE REBOOTREQUIRED FAILED
KB456982 XXX1002 xx.yy.65.148 xx.yy.64.1 XYZ.NET XYZ\ayzuser YES
KB589631 XXX1003 xx.yy.65.176 xx.yy.64.1 XYZ.NET XYZ\cdfuser YES
KB456982 ABC1004 xx.zz.83.56 xx.zz.83.1 XYZ.NET XYZ\mnguser YES
KB456982 8797XCV xx.yy.143.187 xx.yy.143.184 XYZ.NET WPX\abcuser YES
Here MachineName would be filtered to unique values, and PatchName would be transposed into the last column headers, each holding the name of whichever of the UNKNOWN, NOTAPPLICABLE, INSTALLED, FAILED, REBOOTREQUIRED columns contained YES. Expected result:
MachineName IPAddress DefaultIPGateway Domain Name USERID KB456982 KB589631
XXX1002 xx.yy.65.148 xx.yy.64.1 XYZ.NET XYZ\ayzuser UNKNOWN
XXX1003 xx.yy.65.176 xx.yy.64.1 XYZ.NET XYZ\cdfuser NOTAPPLICATBLE
ABC1004 xx.zz.83.56 xx.zz.83.1 XYZ.NET XYZ\mnguser UNKNOWN
8797XCV xx.yy.143.187 xx.yy.143.184 XYZ.NET WPX\abcuser FAILED
I am looking for help to achieve this. So far I am able to transpose the PatchName rows to columns, but I am not able to include all the other columns and apply the condition. [It takes 40 minutes to process this.]
$b = @()
foreach ($Property in $a.MachineName | Select -Unique) {
    $Props = [ordered]@{ MachineName = $Property }
    foreach ($Server in $a.PatchName | Select -Unique){
        $Value = ($a.where({ $_.PatchName -eq $Server -and $_.MachineName -eq $Property })).NOTAPPLICABALE
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
This is what I came up with:
$data = Import-Csv -LiteralPath 'C:\path\to\data.csv'
$lookup = @{}
$allPatches = $data.PatchName | Select-Object -Unique
# Make 1 lookup entry for each computer, to keep the username and IP and so on.
# Add the patch details from the current row (might add more than one patch per computer)
foreach ($row in $data) {
    if (-not $lookup.ContainsKey($row.MachineName)) {
        $lookup[$row.MachineName] = ($row | Select-Object -Property MachineName, IPAddress, DefaultIPGateway, DomainName, UserID)
    }
    $patchStatus = $row.psobject.properties |
        Where-Object { $_.name -in @('applicable', 'notapplicable', 'installed', 'rebootrequired', 'failed', 'unknown') -and -not [string]::IsNullOrWhiteSpace($_.value) } |
        Select-Object -ExpandProperty Name
    $lookup[$row.MachineName] | Add-Member -NotePropertyName $row.PatchName -NotePropertyValue $patchStatus
}
# Pull the computer details out of the lookup, and add all the remaining patches
# so they will convert to CSV properly, then export to CSV
$lookup.Values | ForEach-Object {
    $computer = $_
    foreach ($patch in $allPatches | where-object {$_ -notin $computer.psobject.properties.name}) {
        $computer | Add-Member -NotePropertyName $patch -NotePropertyValue ''
    }
    $computer
} | Export-Csv -LiteralPath 'c:\path\to\output.csv' -NoTypeInformation
Reading txt-file, change rows to columns, save txt file
I have a txt file (semicolon separated) containing over 3 million records, where columns 1 to 4 hold general information and columns 5 and 6 hold detailed information. There can be up to 4 different detail records for the same general information in columns 1 to 4. My sample input:
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv1;Dave;Sandbox;kwus91;Storage;120GB
Srv1;Dave;Sandbox;kwus91;Variant;16
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv2;Pete;GWZ;aiwq71;Processes;234
Srv3;Micael;P12;mxuq01;Memory;16GB
Srv3;Micael;P12;mxuq01;Processes;239
Srv3;Micael;P12;mxuq01;Storage;160GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
Desired output:
Server;Owner;Company;Username;Memory;Processes;Storage;Variant
Srv1;Dave;Sandbox;kwus91;4GB;135;120GB;16
Srv2;Pete;GWZ;aiwq71;8GB;234;;
Srv3;Micael;P12;mxuq01;16GB;239;160GB;
Srv4;Stefan;MTC;spq61ep;;;120GB;
If a value doesn't exist for the general information (columns 1-4), it has to stay blank. My current code:
$a = Import-csv .\Input.txt -Delimiter ";"
$a | FT -AutoSize
$b = @()
foreach ($Server in $a.Server | Select -Unique) {
    $Props = [ordered]@{ Server = $Server }
    $Owner = ($a.where({ $_.Server -eq $Server})).Owner | Select -Unique
    $Company = ($a.where({ $_.Server -eq $Server})).Company | Select -Unique
    $Username = ($a.where({ $_.Server -eq $Server})).Username | Select -Unique
    $Props += @{Owner = $Owner}
    $Props += @{Company = $Company}
    $Props += @{Username = $Username}
    foreach ($Property in $a.Property | Select -Unique){
        $Value = ($a.where({ $_.Server -eq $Server -and $_.Property -eq $Property})).Value
        $Props += @{ $Property = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Export-Csv .\Output.txt -NoTypeInformation -Delimiter ";"
After a lot of trying and getting errors, my script works, but it takes a lot of time. Is it possible to improve performance for around 3 million lines in the txt file?
I'm calculating with more or less 2.5 Million unique values for $Server. I'm running Windows 7 64bit with PowerShell 4.0.
Try something like this:
#Import Data and create empty columns
$List=import-csv "C:\temp\file.csv" -Delimiter ";"
#get all properties name with value not empty
$ListProperty=($List | where Value -ne '' | select property -Unique).Property
#group by server
$Groups=$List | group Server
#loop every rows and store data by group and Property Name
$List | %{
    $Current=$_
    #Take value not empty and group by Property Name
    $Group=($Groups | where Name -eq $Current.Server).Group | where Value -ne '' | group Property
    #Add all property and first value not empty
    $ListProperty | %{
        $PropertyName=$_
        $PropertyValue=($Group | where Name -eq $PropertyName | select -first 1).Group.Value
        $Current | Add-Member -Name $PropertyName -MemberType NoteProperty -Value $PropertyValue
    }
    $Current
} | select * -ExcludeProperty Property, Value -unique | export-csv "c:\temp\result.csv" -notype -Delimiter ";"
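Since the question is about speed, a single pass that accumulates one ordered hashtable per server may be worth trying, because it avoids rescanning the whole list for every row. The following is only a sketch with a shortened, in-memory copy of the sample input; $byServer, $propertyNames and $pivot are illustrative names, and it has not been measured against 3 million rows.

```powershell
# Shortened sample input; a real run would use Import-Csv -Delimiter ';' on the file.
$rows = @'
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
'@ | ConvertFrom-Csv -Delimiter ';'

# Every property name that occurs becomes an output column.
$propertyNames = $rows.Property | Select-Object -Unique

# One ordered hashtable per server, filled in a single pass over the rows.
$byServer = [ordered]@{}
foreach ($row in $rows) {
    if (-not $byServer.Contains($row.Server)) {
        $entry = [ordered]@{
            Server   = $row.Server
            Owner    = $row.Owner
            Company  = $row.Company
            Username = $row.Username
        }
        # Pre-create every property column as blank so all rows share the same shape.
        foreach ($name in $propertyNames) { $entry[$name] = '' }
        $byServer[$row.Server] = $entry
    }
    $byServer[$row.Server][$row.Property] = $row.Value
}

# Convert each hashtable to an object that Export-Csv can write.
$pivot = foreach ($entry in $byServer.Values) { [pscustomobject]$entry }
```

$pivot can then be piped to Export-Csv with -Delimiter ';' as in the original script.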
Filter csv file by array of hashtables
What I am trying to do is filter a csv file (it happens to be a weblog) with an array of hashtables (user information from a database).
#$data is from a database. (about 500 items) Type: System.Data.DataTable
$users = @()
foreach($row in $data) {
    $userItem = @{
        LoginId = $row[0]
        LastName = $row[3]
        FirstName = $row[4]
        LastAccess = $null
    }
    $users += $userItem
}
#Log files are about 14,000 lines long
$logfiles = Get-ChildItem $logFolder -Recurse | where {$_.Extension -eq ".log"} | Sort-Object BaseName -Descending
foreach($log in $logfiles) {
    $csvLog = Import-Csv $log.FullName -Header ("Blank","LoginId","Date")
    $u = $users | Select {&_.LoginId}
    $filteredcsvLog = $cvsLog | Where-Object { $u -contains $_.LoginId} #This returns null
    ....
}
This does not seem to work. What am I missing? My guess is that I need to flatten the array into [string[]], but I can't seem to do that either.
Rather than do an array of hashtables, I would do a hashtable of custom objects, e.g.:
$users = @{}
foreach($row in $data) {
    $userItem = new-object psobject -property @{
        LoginId = $row[0]
        LastName = $row[3]
        FirstName = $row[4]
        LastAccess = $null
    }
    $users[$userItem.LoginId] = $userItem
}
Then the filtering is easier and faster:
foreach($log in $logfiles) {
    $csvLog = Import-Csv $log.FullName -Header ("Blank","LoginId","Date")
    $filteredcsvLog = $csvLog | Where-Object { $users[$_.LoginId] }
    ....
}
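If the array of hashtables has to stay, the flattening into [string[]] that the question guesses at can be sketched as below. The user entries and log lines are invented sample data, and $loginIds, $log and $filtered are illustrative names.

```powershell
# Hypothetical users, shaped like the array of hashtables in the question.
$users = @(
    @{ LoginId = 'alice'; LastName = 'A' },
    @{ LoginId = 'bob';   LastName = 'B' }
)

# Flatten to a plain string array so -contains compares strings with strings.
[string[]]$loginIds = $users | ForEach-Object { $_.LoginId }

# Hypothetical log lines with the same three headers used in the question.
$log = @'
x,alice,2020-01-01
x,carol,2020-01-02
x,bob,2020-01-03
'@ | ConvertFrom-Csv -Header 'Blank','LoginId','Date'

# Keep only the log rows whose LoginId belongs to a known user.
$filtered = $log | Where-Object { $loginIds -contains $_.LoginId }
```

For a few hundred users this is fine; the hashtable lookup above avoids the linear scan that -contains performs for every log line.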