Compare .csv and add only rows with updated values - powershell

I have two .csv files with matching columns. I am trying to compare them and produce a final output .csv that contains only the differences.
corpold.csv is a previously imported file.
corpnew.csv is the new import file.
I need to export a CSV that includes all items that are not in corpold.csv and only the changed items that exist in both CSVs, and that excludes any rows that exist in corpold.csv but not in corpnew.csv.
$reference = Import-Csv -Path D:\corpold.csv
$lookup = $reference | Group-Object -AsHashTable -AsString -Property EMPID
$results = Import-Csv -Path D:\corpnew.csv | foreach {
    $email = $_.EMAIL_ADDRESS
    $status = $_.ACTIVE
    $fs = $_.FIRST_NAME
    $ls = $_.LAST_NAME
    $id = $_.EMPID
    $title = $_.JOB_TITLE
    $code = $_.JOB_CODE
    $type = $_.USER_TYPE
    $designee = $_.DESIGNEE
    $stores = $_.STORES
    $hiredate = $_.HIRE_DATE
    $dept = $_.DEPARTMENT
    $grp = $_.GROUP
    if ($lookup.ContainsKey($id)) {
        # if it exists in yesterday's file:
        # trying to figure out how to compare and only pass rows that
        # have changed to Export-Csv, while excluding any items in
        # corpold that do not exist in corpnew
    } else {
        # if it does not exist, output all fields
        [PSCustomObject]@{
            ACTIVE = $status
            EMAIL_ADDRESS = $email
            FIRST_NAME = $fs
            LAST_NAME = $ls
            EMPID = $id
            JOB_TITLE = $title
            JOB_CODE = $code
            USER_TYPE = $type
            DESIGNEE = $designee
            STORES = $stores
            HIRE_DATE = $hiredate
            DEPARTMENT = $dept
            GROUP = $grp
        }
    }
}
# Sample outputs
$results
$results | Export-Csv -Path D:\delta.csv -NoTypeInformation

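There are two operations to be done here: finding new/deleted entries, and comparing the objects that exist in both files. The snippets below assume $results holds the full contents of corpnew.csv (for example, $results = Import-Csv -Path D:\corpnew.csv), not just the new rows produced by the loop in the question.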
Compare objects and find new/deleted entries
To compare objects, you can use the Compare-Object cmdlet like this:
Compare-Object -ReferenceObject $reference -DifferenceObject $results -Property EMPID -IncludeEqual
This gives you a list of EMPID values with a SideIndicator showing whether each object exists only in the first set (<=), only in the second (=>), or in both (==). You can filter by SideIndicator and then process each group accordingly.
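A small self-contained illustration of filtering by SideIndicator, using a few made-up rows in place of the two CSV files (the sample data and the $onlyInOld/$onlyInNew/$inBoth names exist only for this sketch):

```powershell
# Hypothetical sample data standing in for corpold.csv and corpnew.csv
$reference = @'
EMPID,FIRST_NAME
1,Ann
2,Bob
3,Cid
'@ | ConvertFrom-Csv

$results = @'
EMPID,FIRST_NAME
2,Bob
3,Cyd
4,Dee
'@ | ConvertFrom-Csv

# Compare on EMPID only; -IncludeEqual also reports IDs present in both files
$comparison = Compare-Object -ReferenceObject $reference -DifferenceObject $results -Property EMPID -IncludeEqual

# Split the IDs by SideIndicator
$onlyInOld = ($comparison | Where-Object SideIndicator -eq '<=').EMPID   # deleted
$onlyInNew = ($comparison | Where-Object SideIndicator -eq '=>').EMPID   # new
$inBoth    = ($comparison | Where-Object SideIndicator -eq '==').EMPID   # candidates for "changed"
```

Because only EMPID is compared, EMPID 3 lands in $inBoth even though its FIRST_NAME differs; detecting that change is the second step.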
An alternative is to use Where-Object:
$reference | Where-Object empid -NotIn $results.empid
$reference | Where-Object empid -In $results.empid
$results | Where-Object empid -NotIn $reference.empid
The first gives you entries that exist only in the first file, the second gives entries that exist in both, and the last gives entries that exist only in the second file.
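With large files, -In and -NotIn scan the whole ID array for every row. Here is a sketch of the same three-way split using a HashSet[string] for constant-time lookups; this is a swapped-in technique rather than part of the answer, it assumes PowerShell 5 or later for ::new(), and the sample data is made up:

```powershell
# Hypothetical sample data; in practice these come from Import-Csv
$reference = @'
EMPID
1
2
3
'@ | ConvertFrom-Csv

$results = @'
EMPID
2
3
4
'@ | ConvertFrom-Csv

# Build hash sets of the IDs once, so each membership test is O(1)
# instead of a linear scan of the array like -In / -NotIn
$oldIds = [System.Collections.Generic.HashSet[string]]::new([string[]]$reference.EMPID)
$newIds = [System.Collections.Generic.HashSet[string]]::new([string[]]$results.EMPID)

$onlyInOld = $reference | Where-Object { -not $newIds.Contains($_.EMPID) }
$inBoth    = $results   | Where-Object { $oldIds.Contains($_.EMPID) }
$onlyInNew = $results   | Where-Object { -not $oldIds.Contains($_.EMPID) }
```

Note that HashSet[string] compares case-sensitively by default, unlike -In.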
Find edited entries
Essentially, you iterate over all entries that exist in both files and check whether any of the columns has changed. If so, add the entry to $changedEntries.
Example script:
$IDsInBoth = $results | Where-Object empid -In $reference.empid | Select-Object -ExpandProperty EMPID
$AllProperties = $results | Get-Member | Where-Object MemberType -eq "NoteProperty" | Select-Object -ExpandProperty Name
$changedEntries = @()
$IDsInBoth | ForEach-Object {
    $changed = $false
    $newEntry = $results | Where-Object EMPID -eq $_
    $oldEntry = $reference | Where-Object EMPID -eq $_
    foreach ($p in $AllProperties) {
        if ($oldEntry."$p" -ne $newEntry."$p") {
            $changed = $true
        }
    }
    if ($changed) {
        $changedEntries += $newEntry
    }
}
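Both steps can also be folded into a single pass. This sketch indexes corpold by EMPID in a hashtable, then emits a row from corpnew if its EMPID is new or if any column value differs; rows that exist only in corpold are never emitted. It assumes EMPID is unique within each file and that both files share the same columns; the sample data is hypothetical. It uses -cne, so a change in letter case alone counts as an update; swap in -ne if that isn't wanted.

```powershell
# Hypothetical sample data; replace with Import-Csv -Path D:\corpold.csv / D:\corpnew.csv
$old = @'
EMPID,FIRST_NAME,JOB_TITLE
1,Ann,Clerk
2,Bob,Manager
3,Cid,Clerk
'@ | ConvertFrom-Csv

$new = @'
EMPID,FIRST_NAME,JOB_TITLE
2,Bob,Director
3,Cid,Clerk
4,Dee,Clerk
'@ | ConvertFrom-Csv

# Index yesterday's rows by EMPID (assumes EMPID is unique in each file)
$oldById = @{}
foreach ($row in $old) { $oldById[$row.EMPID] = $row }

# Column names taken from the new file, in CSV order
$columns = $new[0].PSObject.Properties.Name

$delta = foreach ($row in $new) {
    $previous = $oldById[$row.EMPID]
    if ($null -eq $previous) {
        $row          # new employee: not in corpold.csv
        continue
    }
    # existing employee: emit it once if at least one column differs
    foreach ($col in $columns) {
        if ($row.$col -cne $previous.$col) { $row; break }
    }
}
# Rows that exist only in the old file are never emitted.
# $delta | Export-Csv -Path D:\delta.csv -NoTypeInformation
```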

Related

Compare Two CSVs, match the columns on 2 or more Columns, export specific columns from both csvs with powershell

I have two CSVs:
left.csv
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
right.csv
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
Desired result:
Combined.csv
Ref_ID,First_Name,Last_Name,DOB,Document_Type,Filename
321364060,User1,Micah,11/01/1969,Transcript,T4IJZSYO.pdf
946497594,User2,Acker,05/28/1960,Transcript,R4IKTRYN.pdf
887327716,User3,Aco,06/26/1950,Transcript,R4IKTHMK.pdf
588496260,User4,John,05/23/1960,Letter,R4IKTHSL.pdf
I need to match them on First_Name, Last_Name and DOB, then return Ref_ID, First_Name, Last_Name and DOB from left.csv and Document_Type and Filename from right.csv.
Using Compare-Object: it only returns columns from one of the CSVs, not from both.
Using Join-Object: this was my great hope, but it only lets me match on one column, and I need to match on multiple columns (I can't figure out how to do multiple).
I'm not sure where to go from here; open to suggestions.
$left = Import-Csv C:\left.csv
$right = Import-Csv C:\right.csv
Compare-Object -ReferenceObject $left -DifferenceObject $right -Property First_Name,Last_Name,DOB -IncludeEqual -ExcludeDifferent |
    ForEach-Object {
        $iItem = $_
        $ileft = $left.Where({$_.First_Name -eq $iItem.First_Name -and $_.Last_Name -eq $iItem.Last_Name -and $_.DOB -eq $iItem.DOB})
        $iright = $right.Where({$_.First_Name -eq $iItem.First_Name -and $_.Last_Name -eq $iItem.Last_Name -and $_.DOB -eq $iItem.DOB})
        [pscustomobject]@{
            Ref_ID=$ileft.Ref_ID
            first_name=$ileft.first_name
            last_name=$ileft.last_name
            DOB=$ileft.DOB
            Document_Type=$iright.Document_Type
            Filename=$iright.Filename
        }
    } | Export-Csv C:\Combined.csv -NoTypeInformation
You could create your own key from each CSV row, then add the data from each CSV to a new hashtable using that key.
Step through this in a debugger (ISE or VS Code) and tailor it to what you need.
Add appropriate error checking as you need depending on the sanity of your data.
Some statements below are just for debugging so you can inspect what's happening as it runs.
# Ref_ID,First_Name,Last_Name,DOB
$csv1 = @'
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
'@
# First_Name,Last_Name,DOB,City,Document_Type,Filename
$csv2 = @'
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
'@
# hashtable
$data = @{}
# split on CRLF or LF, whichever line endings the script file was saved with
$c1 = $csv1 -split '\r?\n'
$c1.count
foreach ($item in $c1)
{
    $fields = $item -split ','
    $key = $fields[1]+$fields[2]+$fields[3]
    $key
    # add new hashtable for given key
    $data.Add($key, [ordered]@{})
    # add data from c1 to the hashtable
    $data[$key].ID = $fields[0]
    $data[$key].First = $fields[1]
    $data[$key].Last = $fields[2]
    $data[$key].DOB = $fields[3]
}
$c2 = $csv2 -split '\r?\n'
$c2.count
foreach ($item in $c2)
{
    $fields = $item -split ','
    $key = $fields[0]+$fields[1]+$fields[2]
    $key
    # add data from c2 to the hashtable (assumes every c2 key also exists in c1)
    $data[$key].Type = $fields[4]
    $data[$key].FileName = $fields[5]
}
$data.Count
foreach ($key in $data.Keys)
{
    '====================='
    $data[$key]
}
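The same key idea, sketched with ConvertFrom-Csv (or Import-Csv for real files) so the column names come from the headers instead of split indexes. The '|' separator in the key is a choice made for this sketch; it keeps, say, 'ab'+'c' and 'a'+'bc' from producing the same key. Get-JoinKey is a hypothetical helper name, and the data is trimmed from the question. PowerShell hashtable keys are case-insensitive by default, so 'user1' and 'User1' would match.

```powershell
$left = @'
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
565465465,User5,Jack,07/08/2020
'@ | ConvertFrom-Csv

$right = @'
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
'@ | ConvertFrom-Csv

# Hypothetical helper: composite key from the three match columns
function Get-JoinKey($row) { '{0}|{1}|{2}' -f $row.First_Name, $row.Last_Name, $row.DOB }

# Index right.csv by the composite key
$rightByKey = @{}
foreach ($r in $right) { $rightByKey[(Get-JoinKey $r)] = $r }

# Walk left.csv once and keep only rows that have a match
$combined = foreach ($l in $left) {
    $match = $rightByKey[(Get-JoinKey $l)]
    if ($null -ne $match) {
        [PSCustomObject]@{
            Ref_ID        = $l.Ref_ID
            First_Name    = $l.First_Name
            Last_Name     = $l.Last_Name
            DOB           = $l.DOB
            Document_Type = $match.Document_Type
            Filename      = $match.Filename
        }
    }
}
# $combined | Export-Csv C:\Combined.csv -NoTypeInformation
```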
Some good answers already, and here's another.
Import your myriad objects into a single (dis)array:
$left = @"
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
"@
$right = @"
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
"@
$disarray = @(
    $left | ConvertFrom-Csv
    $right | ConvertFrom-Csv
)
Use Group-Object to organize them into groups having identical key values:
$keyProps = @('First_Name', 'Last_name', 'DOB')
$disarray |
Group-Object -Property $keyProps |
Where-Object Count -gt 1 |
Then merge the objects, adding any missing properties to the output $mergedObject:
ForEach-Object {
    $mergedObject = $_.group[0]
    foreach ($obj in $_.group[1..($_.group.count-1)]) {
        $newProps = ($obj | Get-Member -MemberType NoteProperty).name |
            Where-Object {
                $_ -notin ($mergedobject | Get-Member -MemberType NoteProperty).name
            }
        foreach ($propName in $newProps) {
            $mergedObject | Add-Member -MemberType NoteProperty -Name $propName -Value $obj.$propName -Force
        }
    }
    Write-Output $mergedObject
}
This doesn't differ wildly from the answers you already have, but eliminating the "left"/"right" distinction might be helpful: the above code should handle three or more sources thrown into $disarray, merging all objects that have identical $keyProps values.
Note that there are corner cases to consider. For instance, what happens if one object has 'City=Chicago' for a user and another has 'City=New York'?
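One way to surface that corner case before merging: for each group of matching keys, look for properties that carry more than one distinct non-empty value. A rough sketch with made-up objects; the shape of the $conflicts output is just an illustration:

```powershell
# Hypothetical objects: the same person appears twice with different cities
$disarray = @(
    [PSCustomObject]@{ First_Name = 'User1'; Last_Name = 'Micah'; DOB = '11/01/1969'; City = 'Chicago' }
    [PSCustomObject]@{ First_Name = 'User1'; Last_Name = 'Micah'; DOB = '11/01/1969'; City = 'New York' }
    [PSCustomObject]@{ First_Name = 'User2'; Last_Name = 'Acker'; DOB = '05/28/1960'; City = 'Parker' }
)
$keyProps = @('First_Name', 'Last_Name', 'DOB')

$conflicts = $disarray |
    Group-Object -Property $keyProps |
    Where-Object Count -gt 1 |
    ForEach-Object {
        $group = $_
        # every property name that appears on any object in this group
        $names = $group.Group | ForEach-Object { $_.PSObject.Properties.Name } | Select-Object -Unique
        foreach ($name in $names) {
            # distinct non-empty values of this property across the group
            $values = @($group.Group | ForEach-Object { $_.$name } | Where-Object { $_ } | Select-Object -Unique)
            if ($values.Count -gt 1) {
                [PSCustomObject]@{ Key = $group.Name; Property = $name; Values = $values -join ' / ' }
            }
        }
    }
```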
Try this Join-Object.
It has a few more features along with joining based on multiple columns:
$Left = ConvertFrom-Csv @"
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
"@
$Right = ConvertFrom-Csv @"
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
"@
$Left | Join $Right `
-On First_Name, Last_Name, DOB `
-Property Ref_ID, Filename, First_Name, DOB, Last_Name `
| Format-Table
Last_Name Ref_ID    DOB                    Filename     First_Name
--------- ------    ---                    --------     ----------
Micah     321364060 1969-11-01 12:00:00 AM T4IJZSYO.pdf User1
Acker     946497594 1960-05-28 12:00:00 AM R4IKTRYN.pdf User2
Aco       887327716 1950-06-26 12:00:00 AM R4IKTHMK.pdf User3
John      588496260 1960-05-23 12:00:00 AM R4IKTHSL.pdf User4
Adding an answer I found:
$left = Import-Csv .\left.csv
$right = Import-Csv .\right.csv
$right | foreach {
    $r = $_;
    $left | where{ $_.First_Name -eq $r.First_Name -and $_.Last_Name -eq $r.Last_Name -and $_.DOB -eq $r.DOB } |
        select Ref_Id,
            First_Name,
            Last_Name,
            DOB,
            @{Name="City";Expression={$r.City}},
            @{Name="Document_Type";Expression={$r.Document_Type}},
            @{Name="FileName";Expression={$r.FileName}}
} | format-table

Converting CSV Column to Rows

I'm trying to create a PowerShell script that transposes a CSV file from columns to rows.
I found examples of doing the opposite (converting a row-based CSV to columns), but nothing on going from columns to rows. My problem is that I don't know exactly how many columns I'll have. I tried adapting the rows-to-columns approach to go from columns to rows, but without success.
$a = Import-Csv "input.csv"
$a | FT -AutoSize
$b = @()
foreach ($Property in $a.Property | Select -Unique) {
    $Props = [ordered]@{ Property = $Property }
    foreach ($Server in $a.Server | Select -Unique){
        $Value = ($a.where({ $_.Server -eq $Server -and
            $_.Property -eq $Property })).Value
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView
$b | Export-Csv "output.csv" -NoTypeInformation
For example, my CSV can look like this:
"ID","DATA1"
"12345","11111"
"54321","11111"
"23456","44444"
or this (the number of columns can vary):
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",
and I would like the script to convert it like this:
"ID","DATA"
"12345","11111"
"12345","22222"
"12345","33333"
"54321","11111"
"23456","44444"
"23456","55555"
The trick is to query the members of the table to get the column names. Once you do that then the rest is straightforward:
function Flip-Table {
    Begin { $Count = 0 }
    Process {
        $Row = $_
        # Get all the column names, excluding the ID field.
        $Columns = ($Row | Get-Member -Type NoteProperty | Where-Object Name -ne ID).Name
        foreach ($Column in $Columns) {
            # skip empty cells
            if ($Row.$Column) {
                $Properties = [Ordered] @{
                    "ID"   = $Row.ID
                    "DATA" = $Row.$Column
                }
                New-Object PSObject -Property $Properties
            }
        }
        # Optionally force a full garbage collection every 100 input rows
        # to keep memory usage down on very large files.
        $Count++
        if (($Count % 100) -eq 0) {
            [System.GC]::GetTotalMemory($true) | Out-Null
        }
    }
}
Import-Csv input.csv | Flip-Table | Export-Csv -NoTypeInformation output.csv
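A variation on the same trick that reads the columns from PSObject.Properties instead of Get-Member. Get-Member returns names sorted alphabetically, while PSObject.Properties keeps the CSV's column order, which matters if the data columns aren't named DATA1, DATA2, and so on. A sketch using the sample input from the question:

```powershell
$csv = @'
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",
'@ | ConvertFrom-Csv

$flipped = foreach ($row in $csv) {
    # PSObject.Properties keeps the CSV column order (Get-Member sorts by name)
    foreach ($prop in $row.PSObject.Properties) {
        # skip the ID column itself and any empty cells
        if ($prop.Name -ne 'ID' -and $prop.Value) {
            [PSCustomObject]@{ ID = $row.ID; DATA = $prop.Value }
        }
    }
}
# $flipped | Export-Csv output.csv -NoTypeInformation
```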
Well, here is mine. I'm not as fancy as the rest:
$in = Get-Content input.csv | Select -Skip 1
$out = New-Object System.Collections.ArrayList
foreach($row in $in){
    # split on commas and strip the surrounding double quotes from each field
    # (assumes no commas inside quoted values)
    $parts = $row.Split(',') | ForEach-Object { $_.Trim('"') }
    $id = $parts[0]
    foreach($data in $parts[1..$parts.Count]){
        if($data -ne '' -AND $data -ne $null){
            $temp = New-Object PSCustomObject -Property @{'ID' = $id;
                'Data' = $data}
            $out.Add($temp) | Out-Null
        }
    }
}
$out | Export-CSV output.csv -NoTypeInformation
You can do something like this:
# Convert csv to object
$csv = ConvertFrom-Csv @"
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555"
"@
# Ignore common members and the ID property
$excludedMembers = @(
    'GetHashCode',
    'GetType',
    'ToString',
    'Equals',
    'ID'
)
$results = @()
# Iterate around each csv row
foreach ($row in $csv) {
    $members = $row | Get-Member
    # Iterate around each member from the 'row object' apart from our
    # exclusions and empty values
    foreach ($member in $members |
        Where { $excludedMembers -notcontains $_.Name -and $row.($_.Name)}) {
        # add to array of objects
        $results += @{ ID=$row.ID; DATA=$row.($member.Name)}
    }
}
# Write the csv string
$outstring = "ID,DATA"
$results | foreach { $outstring += "`n$($_.ID),$($_.DATA)" }
# New csv object
$csv = $outstring | ConvertFrom-Csv
Probably not the most elegant solution, but it should do what you need.
I left some comments explaining what it does.
If you only want to accept a limited number of DATA columns (e.g. 5), you could do:
ForEach ($i in 1..5) {$CSV | ? {$_."Data$i"} | Select ID, @{N='Data'; E={$_."Data$i"}}}
And if you have a potentially unlimited number of DATA columns:
ForEach ($Data in ($CSV | Select "Data*" -First 1).PSObject.Properties.Name) {
    $CSV | ? {$_.$Data} | Select ID, @{N='Data'; E={$_.$Data}}
}

Transpose CSV Row to Column based on Column Value

I am trying to convert an 80K-row CSV with 30 columns into a sorted and filtered CSV based on specific column data from the original CSV.
For example, my data is in the format below:
PatchName MachineName IPAddress DefaultIPGateway Domain Name USERID UNKNOWN NOTAPPLICABLE INSTALLED APPLICABLE REBOOTREQUIRED FAILED
KB456982 XXX1002 xx.yy.65.148 xx.yy.64.1 XYZ.NET XYZ\ayzuser YES
KB589631 XXX1003 xx.yy.65.176 xx.yy.64.1 XYZ.NET XYZ\cdfuser YES
KB456982 ABC1004 xx.zz.83.56 xx.zz.83.1 XYZ.NET XYZ\mnguser YES
KB456982 8797XCV xx.yy.143.187 xx.yy.143.184 XYZ.NET WPX\abcuser YES
Here MachineName should be made unique, and each PatchName should be transposed into a column header at the end, holding the name of the status column (UNKNOWN, NOTAPPLICABLE, INSTALLED, FAILED, REBOOTREQUIRED) in which YES occurred.
Expected result:
MachineName IPAddress DefaultIPGateway Domain Name USERID KB456982 KB589631
XXX1002 xx.yy.65.148 xx.yy.64.1 XYZ.NET XYZ\ayzuser UNKNOWN
XXX1003 xx.yy.65.176 xx.yy.64.1 XYZ.NET XYZ\cdfuser NOTAPPLICABLE
ABC1004 xx.zz.83.56 xx.zz.83.1 XYZ.NET XYZ\mnguser UNKNOWN
8797XCV xx.yy.143.187 xx.yy.143.184 XYZ.NET WPX\abcuser FAILED
I'm looking for help to achieve this. So far I am able to transpose the PatchName rows to columns, but I can't include all the other columns or apply the condition. [It takes 40 minutes to process this.]
$b = @()
foreach ($Property in $a.MachineName | Select -Unique) {
    $Props = [ordered]@{ MachineName = $Property }
    foreach ($Server in $a.PatchName | Select -Unique){
        $Value = ($a.where({ $_.PatchName -eq $Server -and $_.MachineName -eq $Property })).NOTAPPLICABLE
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
This is what I came up with:
$data = Import-Csv -LiteralPath 'C:\path\to\data.csv'
$lookup = @{}
$allPatches = $data.PatchName | Select-Object -Unique
# Make 1 lookup entry for each computer, to keep the username and IP and so on.
# Add the patch details from the current row (might add more than one patch per computer)
foreach ($row in $data)
{
    if (-not $lookup.ContainsKey($row.MachineName))
    {
        $lookup[$row.MachineName] = ($row | Select-Object -Property MachineName, IPAddress, DefaultIPGateway, DomainName, UserID)
    }
    $patchStatus = $row.psobject.properties |
        Where-Object {
            $_.name -in @('applicable', 'notapplicable', 'installed', 'rebootrequired', 'failed', 'unknown') -and
            -not [string]::IsNullOrWhiteSpace($_.value)
        } |
        Select-Object -ExpandProperty Name
    $lookup[$row.MachineName] | Add-Member -NotePropertyName $row.PatchName -NotePropertyValue $patchStatus
}
# Pull the computer details out of the lookup, and add all the remaining patches
# so they will convert to CSV properly, then export to CSV
$lookup.Values | ForEach-Object {
    $computer = $_
    foreach ($patch in $allPatches | where-object {$_ -notin $computer.psobject.properties.name})
    {
        $computer | Add-Member -NotePropertyName $patch -NotePropertyValue ''
    }
    $computer
} | Export-Csv -LiteralPath 'c:\path\to\output.csv' -NoTypeInformation

Reading txt-file, change rows to columns, save txt file

I have a semicolon-separated txt file containing over 3 million records, where columns 1 to 4 hold general information and columns 5 and 6 hold detailed information. There can be up to 4 different detail entries for the same general information in columns 1 to 4.
My sample input:
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv1;Dave;Sandbox;kwus91;Storage;120GB
Srv1;Dave;Sandbox;kwus91;Variant;16
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv2;Pete;GWZ;aiwq71;Processes;234
Srv3;Micael;P12;mxuq01;Memory;16GB
Srv3;Micael;P12;mxuq01;Processes;239
Srv3;Micael;P12;mxuq01;Storage;160GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
Desired output:
Server;Owner;Company;Username;Memory;Processes;Storage;Variant
Srv1;Dave;Sandbox;kwus91;4GB;135;120GB;16
Srv2;Pete;GWZ;aiwq71;8GB;234;;
Srv3;Micael;P12;mxuq01;16GB;239;160GB;
Srv4;Stefan;MTC;spq61ep;;;120GB;
If a value doesn't exist for a given set of general information (columns 1-4), it has to stay blank.
My current code:
$a = Import-csv .\Input.txt -Delimiter ";"
$a | FT -AutoSize
$b = @()
foreach ($Server in $a.Server | Select -Unique) {
    $Props = [ordered]@{ Server = $Server }
    $Owner = ($a.where({ $_.Server -eq $Server})).Owner | Select -Unique
    $Company = ($a.where({ $_.Server -eq $Server})).Company | Select -Unique
    $Username = ($a.where({ $_.Server -eq $Server})).Username | Select -Unique
    $Props += @{Owner = $Owner}
    $Props += @{Company = $Company}
    $Props += @{Username = $Username}
    foreach ($Property in $a.Property | Select -Unique){
        $Value = ($a.where({ $_.Server -eq $Server -and
            $_.Property -eq $Property})).Value
        $Props += @{ $Property = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Export-Csv .\Output.txt -NoTypeInformation -Delimiter ";"
After a lot of trial and error, my script works.
But it takes a long time.
Is there a way to improve performance for around 3 million lines in the txt file? I'm expecting roughly 2.5 million unique values for $Server.
I'm running Windows 7 64bit with PowerShell 4.0.
Try something like this:
# Import the data
$List=import-csv "C:\temp\file.csv" -Delimiter ";"
# Get the names of all properties that have a non-empty value
$ListProperty=($List | where Value -ne '' | select property -Unique).Property
# Group by server
$Groups=$List | group Server
# Loop over every row and add the data by group and property name
$List | %{
    $Current=$_
    # Take the non-empty values for this server and group them by property name
    $Group=($Groups | where Name -eq $Current.Server).Group | where Value -ne '' | group Property
    # Add every property, with its first non-empty value
    $ListProperty | %{
        $PropertyName=$_
        $PropertyValue=($Group | where Name -eq $PropertyName | select -first 1).Group.Value
        $Current | Add-Member -Name $PropertyName -MemberType NoteProperty -Value $PropertyValue
    }
    $Current
} | select * -ExcludeProperty Property, Value -unique | export-csv "c:\temp\result.csv" -notype -Delimiter ";"
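The answer above still searches $Groups once per input row, which gets slow at millions of rows. Below is a sketch of a single-pass alternative (a swapped-in technique, not taken from the answers here): it keeps one ordered dictionary per server in a lookup and collects the property names as it goes. It still holds everything in memory, it assumes PowerShell 5 or later for ::new(), and the sample rows are a subset of the question's input.

```powershell
# Sample rows from the question; in practice use Import-Csv .\Input.txt -Delimiter ';'
$rows = @'
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
'@ | ConvertFrom-Csv -Delimiter ';'

$byServer = [ordered]@{}   # one entry per server, in first-seen order
$propertyNames = [System.Collections.Generic.List[string]]::new()

foreach ($row in $rows) {
    $entry = $byServer[$row.Server]
    if ($null -eq $entry) {
        $entry = [ordered]@{ Server = $row.Server; Owner = $row.Owner; Company = $row.Company; Username = $row.Username }
        $byServer[$row.Server] = $entry
    }
    $entry[$row.Property] = $row.Value
    if (-not $propertyNames.Contains($row.Property)) { $propertyNames.Add($row.Property) }
}

# Emit every server with the same column set so Export-Csv writes all headers;
# missing properties come out as $null, i.e. blank in the CSV
$output = foreach ($entry in $byServer.Values) {
    $obj = [ordered]@{}
    foreach ($key in 'Server', 'Owner', 'Company', 'Username') { $obj[$key] = $entry[$key] }
    foreach ($name in $propertyNames) { $obj[$name] = $entry[$name] }
    [PSCustomObject]$obj
}
# $output | Export-Csv .\Output.txt -NoTypeInformation -Delimiter ';'
```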

Filter csv file by array of hashtables

What I am trying to do is filter a CSV file (it happens to be a web log) with an array of hashtables (user information from a database).
# $data is from a database (about 500 items). Type: System.Data.DataTable
$users = @()
foreach($row in $data)
{
    $userItem = @{
        LoginId = $row[0]
        LastName = $row[3]
        FirstName = $row[4]
        LastAccess = $null
    }
    $users += $userItem
}
# Log files are about 14,000 lines long
$logfiles = Get-ChildItem $logFolder -Recurse | where {$_.Extension -eq ".log"} | Sort-Object BaseName -Descending
foreach($log in $logfiles)
{
    $csvLog = Import-Csv $log.FullName -Header ("Blank","LoginId","Date")
    $u = $users | Select {&_.LoginId}
    $filteredcsvLog = $cvsLog | Where-Object { $u -contains $_.LoginId}
    # This returns null
    ....
}
This does not seem to work; what am I missing? My guess is that I need to flatten the array into [string[]], but I can't seem to do that either.
Rather than an array of hashtables, I would use a hashtable of custom objects, e.g.:
$users = @{}
foreach($row in $data)
{
    $userItem = new-object psobject -property @{
        LoginId = $row[0]
        LastName = $row[3]
        FirstName = $row[4]
        LastAccess = $null
    }
    $users[$userItem.LoginId] = $userItem
}
Then the filtering is easier and faster:
foreach($log in $logfiles)
{
    $csvLog = Import-Csv $log.FullName -Header ("Blank","LoginId","Date")
    $filteredcsvLog = $csvLog | Where-Object { $users[$_.LoginId] }
    ....
}
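On the [string[]] flattening the question asks about: piping the hashtables through ForEach-Object { $_.LoginId } yields plain strings, whereas Select with a script block creates new objects (each with one calculated property) rather than strings, so -contains never matches them. Separately, the question's filter line pipes $cvsLog, a misspelling of $csvLog, which by itself would make the result empty. A sketch with made-up users and log rows:

```powershell
# Hypothetical stand-ins for the database users and one parsed log file
$users = @(
    @{ LoginId = 'alice'; LastName = 'A'; FirstName = 'Al' }
    @{ LoginId = 'bob';   LastName = 'B'; FirstName = 'Bo' }
)
$csvLog = @'
,alice,2014-01-01
,carol,2014-01-02
,bob,2014-01-03
'@ | ConvertFrom-Csv -Header 'Blank', 'LoginId', 'Date'

# Flatten the LoginId values into a plain string array
[string[]]$u = $users | ForEach-Object { $_.LoginId }

# Note: $csvLog, not $cvsLog
$filteredcsvLog = $csvLog | Where-Object { $u -contains $_.LoginId }
```

For the ~500 users in the question, -contains on an array is workable; the hashtable lookup in the answer above scales better as the user list grows.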