Powershell: Compare two arrays, find missing items and changed values

I have an issue I'm hoping someone might be able to provide some guidance on.
I have two csv files, File1 and File2.
File1:
value1,value2
A,10
B,30
C,45
D,39
File2:
value1,value2
A,10
B,32
C,44
E,7
F,3
What I am looking for is two things. I need to check if any items in value1 have been removed between the files, and/or I need to check if the corresponding number value2 has decreased.
First, I need to compare the files and find which lines have been added or removed in File1 vs File2. This is easy enough with Compare-Object if I compare only the value1 items.
From that first comparison I can see that, going from File1 to File2, lines E and F have been added and line D has been removed. Perfect.
However, it's the next part I'm struggling with. I then need to compare value2 in each file and determine whether the number has decreased (or potentially increased or stayed the same).
The tricky part is that I can't just compare line 4, value2 in File1 to line 4, value2 in File2, because one is for D and the other is for E. I don't know how to match the items first and then compare value2 for only the items that match. And what happens when there is no line to match (because the line was newly added to File2, or was removed and only exists in File1)?
In the end what I'm trying to come up with is a list of all the value1's that have been removed, and a list of all the value1's whose corresponding value2 has decreased since File1. I do not care about additions or increases.
Hope someone can provide some guidance.
Thank you!

To provide a single-pipeline alternative to Santiago's helpful answer:
It relies on Add-Member, Group-Object, and calculated properties via Select-Object:
(@'
value1,value2
A,10
B,30
C,45
D,39
'@ | ConvertFrom-Csv | Add-Member -PassThru Source 1) +
(@'
value1,value2
A,10
B,32
C,44
E,7
F,3
'@ | ConvertFrom-Csv | Add-Member -PassThru Source 2) |
  Group-Object value1 | ForEach-Object {
    if ($_.Count -eq 1) {
      # The item exists in only one file: flag which side it came from.
      $_.Group[0] | Add-Member -PassThru Status @{ 1 = '<='; 2 = '=>' }[$_.Group[0].Source]
    }
    else {
      $grp = $_
      # Cast to [int] so the values compare numerically rather than as strings.
      $change = ([int] $_.Group[0].value2).CompareTo([int] $_.Group[1].value2)
      $_.Group[0] |
        Select-Object value1,
          @{ name = 'value2'; expression = { $grp.Group[0].value2, $grp.Group[1].value2 } },
          @{ name = 'Status'; expression = { $change } }
    }
  }
The above yields the following:
value1 value2   Status
------ ------   ------
A      {10, 10} 0
B      {30, 32} -1
C      {45, 44} 1
D      39       <=
E      7        =>
F      3        =>
<= and => have the same meaning as in the output of Compare-Object: <= indicates value1 values exclusive to the LHS, => those exclusive to the RHS.
-1 / 0 / 1 map onto an increase / equality / decrease in the value2 property value.

I decided to edit the code a bit. Since I think you're doing some sort of data analysis, in my opinion the best way to display the information is to merge both objects.
It may take a bit more processing time depending on how big the CSVs are, but I think this is the cleaner way of showing the data, especially if it is going to be exported to Excel, where filtering the result object would be very easy.
Also, mklement0's answer is much more clever than mine. You could say this is a more classic coding approach, while he is using the whole PowerShell toolset at his disposal. Big props to his answer too.
# Use Import-Csv here, this is just for testing
$csv1 = @'
value1,value2
A,10
B,30
C,45
D,39
'@ | ConvertFrom-Csv
$csv2 = @'
value1,value2
A,10
B,32
C,44
E,7
F,3
'@ | ConvertFrom-Csv
# Convert CSV1 to hashtable
$map = @{}
foreach($line in $csv1)
{
    $map.Add($line.value1,$line.value2)
}
$result = [system.collections.generic.list[pscustomobject]]::new()
# Keys that exist only in CSV1 have been removed
$map.Keys.ForEach({
    if($_ -notin $csv2.value1)
    {
        $result.Add(
        [pscustomobject]@{
            value1 = $_
            oldval2 = $map[$_]
            newval2 = $null
            status = 'REMOVED'
        })
    }
})
foreach($line in $csv2)
{
    $out = [ordered]@{
        value1 = $line.value1
        newval2 = $line.value2
    }
    if(-not $map.ContainsKey($line.value1))
    {
        $out.oldval2 = $null
        $out.status = 'ADDED'
        $result.Add([pscustomobject]$out)
        continue
    }
    $out.oldval2 = $map[$line.value1]
    # CSV values are strings: cast to [int] so the comparison is numeric
    switch($line.value2)
    {
        {[int]$_ -lt [int]$map[$line.value1]}
        {
            $out.status = 'DECREASED'
            continue
        }
        {[int]$_ -gt [int]$map[$line.value1]}
        {
            $out.status = 'INCREASED'
            continue
        }
        Default
        {
            $out.status = 'EQUAL'
        }
    }
    $result.Add([pscustomobject]$out)
}
Looking at $result:
PS /> $result
value1 oldval2 newval2 status
------ ------- ------- ------
D      39              REMOVED
A      10      10      EQUAL
B      30      32      INCREASED
C      45      44      DECREASED
E              7       ADDED
F              3       ADDED
If you wanted to see the values 'REMOVED' or 'DECREASED':
PS /> $result.where({$_.status -match 'REMOVED|DECREASED'})
value1 oldval2 newval2 status
------ ------- ------- ------
D      39              REMOVED
C      45      44      DECREASED
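If only the two lists from the question are needed (removed items and decreased values), a smaller sketch along the same hashtable idea could look like the following. The variable names $file1, $file2, $new, $removed and $decreased are made up for illustration, and the cast to [int] assumes value2 always holds whole numbers.

```powershell
# Sample data from the question; use Import-Csv for real files.
$file1 = @'
value1,value2
A,10
B,30
C,45
D,39
'@ | ConvertFrom-Csv
$file2 = @'
value1,value2
A,10
B,32
C,44
E,7
F,3
'@ | ConvertFrom-Csv

# Index File2 by value1; cast value2 to [int] so the comparison is numeric.
$new = @{}
foreach ($row in $file2) { $new[$row.value1] = [int]$row.value2 }

# Items present in File1 but absent from File2.
$removed = @($file1 |
    Where-Object { -not $new.ContainsKey($_.value1) } |
    ForEach-Object { $_.value1 })

# Items present in both files whose value2 went down.
$decreased = @($file1 |
    Where-Object { $new.ContainsKey($_.value1) -and $new[$_.value1] -lt [int]$_.value2 } |
    ForEach-Object { $_.value1 })

"Removed:   $($removed -join ', ')"
"Decreased: $($decreased -join ', ')"
```

With the sample data this reports D as removed and C as decreased; additions and increases are simply never collected.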

Related

re-arrange and combine powershell custom objects

I have a system that currently reads data from a CSV file produced by a separate system that is going to be replaced.
The imported CSV file looks like this
PS> Import-Csv .\SalesValues.csv
Sale Values AA BB
----------- -- --
10          6  5
5           3  4
3           1  9
To replace this process I hope to produce an object that looks identical to the CSV above, but I do not want to continue to use a CSV file.
I already have a script that reads data in from our database and extracts the data that I need to use. I won't detail the fairly long script that precedes this point, but in effect it looks like this:
$SQLData = Custom-SQLFunction "SELECT * FROM SALES_DATA WHERE LIST_ID = $LISTID"
$SQLData will contain ~5000+ DataRow objects that I need to query.
One of those DataRow objects looks something like this:
lead_id : 123456789
entry_date : 26/10/2018 16:51:16
modify_date : 01/11/2018 01:00:02
status : WRONG
user : mrexample
vendor_lead_code : TH1S15L0NGC0D3
source_id : A543212
list_id : 333004
list_name : AA Some Text
gmt_offset_now : 0.00
SaleValue : 10
list_name is going to be prefixed with AA or BB.
SaleValue can be any integer 3 and up, however realistically extremely unlikely to be higher than 100 (as this is a monthly donation) and will be one of 3,5,10 in the vast majority of occurrences.
I already have a script that takes the content of list_name and populates two separate psobjects ($AASalesValues and $BBSalesValues) that count how often each 'SaleValue' occurs across the data set.
Because I cannot reliably anticipate the value of any SaleValue, I have to create the psobjects' properties dynamically, like this:
foreach ($record in $SQLData) {
    if ($record.list_name -match "BB") {
        if ($record.SaleValue -gt 0) {
            if ($BBSalesValues | Get-Member -Name $($record.SaleValue) -MemberType Properties) {
                $BBSalesValues.$($record.SaleValue) = $BBSalesValues.$($record.SaleValue)+1
            } else {
                $BBSalesValues | Add-Member -Name $($record.SaleValue) -MemberType NoteProperty -Value 1
            }
        }
    }
}
The two resultant objects look like this:
PS> $AASalesValues
10 5  3 50
-- -  - --
17 14 3 1
PS> $BBSalesvalues
3  10 5  4
-  -- -  -
36 12 11 1
I now have the data that I need, however I need to format it in a way that replicates the format of the CSV so I can pass it directly to another existing powershell script that is configured to expect the data in the format that the CSV is in, but I do not want to write the data to a file.
I'd prefer to pass this directly to the next part of the script.
Ultimately what I want to do is to produce a new object/some output that looks like the output from Import-Csv command at the top of this post.
I'd like a new object, say $OverallSalesValues, to look like this:
PS>$overallSalesValues
Sale Values AA BB
50          1  0
10          17 12
5           14 11
4           0  1
3           3  36
In the above example the values from $AASalesValues are listed under the AA column and the values from $BBSalesValues under the BB column, with the rows matching the headers of the two original objects.
I did try this with hashtables but I was unable to work out how to both create them from dynamic values and format them to how I needed them to look.

Finally got there.
$TotalList = @()
foreach($n in 3..200){
    if($AASalesValues.$n -or $BBSalesValues.$n){
        $AACount = $AASalesValues.$n
        $BBcount = $BBSalesValues.$n
        $values = [PSCustomObject]@{
            'Sale Value'= $n
            AA = $AACount
            BB = $BBcount
        }
        $TotalList += $values
    }
}
$TotalList
produces an output of
Sale Value AA BB
---------- -- --
3 3 36
4 2
5 14 11
10 18 12
50 1
Just need to add a bit to include '0' values instead of $null.
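One possible way to get 0 instead of $null is to cast each count to [int], since [int]$null evaluates to 0. The sketch below is only an illustration: the two sample objects are hypothetical stand-ins built from the counts shown in the question, the loop runs downwards so the rows come out in the descending order the question asked for, and the column is named 'Sale Values' to match the CSV header.

```powershell
# Hypothetical stand-ins for the two count objects from the question.
$AASalesValues = [pscustomobject]@{ '10' = 17; '5' = 14; '3' = 3; '50' = 1 }
$BBSalesValues = [pscustomobject]@{ '3' = 36; '10' = 12; '5' = 11; '4' = 1 }

# Walk the possible sale values from high to low and emit one row per value
# that occurs in at least one of the two objects.
$TotalList = foreach ($n in 200..3) {
    $aa = $AASalesValues."$n"
    $bb = $BBSalesValues."$n"
    if ($aa -or $bb) {
        [pscustomobject]@{
            'Sale Values' = $n
            AA            = [int]$aa   # [int]$null evaluates to 0
            BB            = [int]$bb
        }
    }
}
$TotalList
```

Collecting the loop output directly into $TotalList also avoids rebuilding the array on every += call.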

I'm going to assume that $record contains a list of the database results for either $AASalesValues or $BBSalesValues, not both, otherwise you'd need some kind of selector to avoid counting records of one group with the other group.
Group the records by their SaleValue property as LotPings suggested:
$BBSalesValues = $record | Group-Object SaleValue -NoElement
That will give you a list of the SaleValue values with their respective count.
PS> $BBSalesValues
Count Name
----- ----
36 3
12 10
11 5
1 4
You can then update your CSV data with these values like this:
$file = 'C:\path\to\data.csv'
# read CSV into a hashtable mapping the sale value to the complete record
# (so that we can lookup the record by sale value)
$csv = @{}
Import-Csv $file | ForEach-Object {
    $csv[$_.'Sale Values'] = $_
}
# Add records for missing sale values
$($AASalesValues; $BBSalesValues) | Select-Object -Expand Name -Unique | ForEach-Object {
    if (-not $csv.ContainsKey($_)) {
        $csv[$_] = New-Object -Type PSObject -Property @{
            'Sale Values' = $_
            'AA' = 0
            'BB' = 0
        }
    }
}
# update records with values from $AASalesValues
$AASalesValues | ForEach-Object {
    [int]$csv[$_.Name].AA += $_.Count
}
# update records with values from $BBSalesValues
$BBSalesValues | ForEach-Object {
    [int]$csv[$_.Name].BB += $_.Count
}
# write updated records back to file
$csv.Values | Export-Csv $file -NoType
Even with your updated question the approach would be pretty much the same, you'd just add another level of grouping for collecting the sales numbers:
$sales = @{}
$record | Group-Object {$_.list_name.Split()[0]} | ForEach-Object {
    $sales[$_.Name] = $_.Group | Group-Object SaleValue -NoElement
}
and then adjust the merging to something like this:
$file = 'C:\path\to\data.csv'
# read CSV into a hashtable mapping the sale value to the complete record
# (so that we can lookup the record by sale value)
$csv = @{}
Import-Csv $file | ForEach-Object {
    $csv[$_.'Sale Values'] = $_
}
# Add records for missing sale values
$sales.Values | Select-Object -Expand Name -Unique | ForEach-Object {
    if (-not $csv.ContainsKey($_)) {
        $prop = @{'Sale Values' = $_}
        $sales.Keys | ForEach-Object {
            $prop[$_] = 0
        }
        $csv[$_] = New-Object -Type PSObject -Property $prop
    }
}
# update records with values from $sales
$sales.GetEnumerator() | ForEach-Object {
    $name = $_.Key
    $_.Value | ForEach-Object {
        [int]$csv[$_.Name].$name += $_.Count
    }
}
# write updated records back to file
$csv.Values | Export-Csv $file -NoType

Add up the data if the reference from another file is correct

I have two CSV Files which look like this:
test.csv:
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
The second looks like this:
test2.csv:
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
Now the goal is to get a summary of all points from test.csv, assigned to the "signs" of test2.csv. The numbers are what links the two files, as you can see.
Should be something like this:
ABC = 8
DEF = 2
GHI = 4
I have tried to test this out but cannot get the goal. What I have so far is:
$var = "C:\PathToCSV"
$csv1 = Import-Csv "$var\test.csv"
$csv2 = Import-Csv "$var\test2.csv"
# Process: group by 'Col1' then sum 'Col2' for each group
# and create output objects on the fly
$test1 = $csv1 | Group-Object Col1 | ForEach-Object {
    New-Object psobject -Property @{
        Col1 = $_.Name
        Sum = ($_.Group | Measure-Object Col2 -Sum).Sum
    }
}
But this gives me back the following output:
Ps> $test1
Sum Col1
--- ----
4 1111
2 1122
4 1121
4 1133
I am not able to get the summary and the mapping of the signs.

Not sure if I understand your question correctly, but I'm going to assume that for each value from the column "signs" you want to lookup the values from the column "Number" in the second CSV and then calculate the sum of the column "Col2" for all matches.
For that I'd build a hashtable with the pre-calculated sums for the unique values from "Col1":
$h1 = @{}
$csv1 | ForEach-Object {
    $h1[$_.Col1] += [int]$_.Col2
}
and then build a second hashtable to sum up the lookup results for the values from the second CSV:
$h2 = @{}
$csv2 | ForEach-Object {
    $h2[$_.signs] += $h1[$_.Number]
}
However, that produced a different value for "ABC" than what you stated as the desired result in your question when I processed your sample data:
Name Value
---- -----
ABC 12
GHI 4
DEF 2
Or did you mean you want to sum up the corresponding values for the unique numbers for each sign? For that you'd change the second code snippet to something like this:
$h2 = @{}
$csv2 | Group-Object signs | ForEach-Object {
    $name = $_.Name
    $_.Group | Select-Object -Unique -Expand Number | ForEach-Object {
        $h2[$name] += $h1[$_]
    }
}
That would produce the desired result from your question:
Name Value
---- -----
ABC 8
GHI 4
DEF 2
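For completeness, here is a self-contained sketch of the second variant that runs against the sample data from the question and prints the result in the "ABC = 8" form. The names $summary, Sign and Points are invented for this example, and ConvertFrom-Csv stands in for the Import-Csv calls.

```powershell
# Sample data from the question; use Import-Csv for real files.
$csv1 = @'
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
'@ | ConvertFrom-Csv
$csv2 = @'
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
'@ | ConvertFrom-Csv

# Total of Col2 for each distinct number in the first file.
$h1 = @{}
foreach ($row in $csv1) { $h1[$row.Col1] += [int]$row.Col2 }

# One object per sign, counting each number only once per sign.
$summary = $csv2 | Group-Object signs | ForEach-Object {
    $group   = $_
    $numbers = $group.Group | Select-Object -ExpandProperty Number -Unique
    $points  = [int]($numbers | ForEach-Object { $h1[$_] } | Measure-Object -Sum).Sum
    [pscustomobject]@{ Sign = $group.Name; Points = $points }
}

$summary | Sort-Object Sign | ForEach-Object { '{0} = {1}' -f $_.Sign, $_.Points }
```

Returning objects rather than a hashtable keeps the result easy to sort, filter or export afterwards.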

powershell compare two files and list their columns with side indicator as match/mismatch

I have found a PowerShell script that does most of what I have in mind. What I would like to add is another column that translates the side indicators ("==", "<=", "=>") into MATCH (if "==") and MISMATCH (if "<=" or "=>").
Any advice on how I would do this?
Here is the link of the script (Credits to Florent Courtay)
How can i reorganise powershell's compare-object output?
$a = Compare-Object (Import-Csv 'C:\temp\f1.csv') (Import-Csv 'C:\temp\f2.csv') -property Header,Value
$a | Group-Object -Property Header | % { New-Object -TypeName psobject -Property @{Header=$_.name;newValue=$_.group[0].Value;oldValue=$_.group[1].Value}}
========================================================================
The output I have in mind:
Header1 Old Value New Value STATUS
------- --------- --------- --------
String1 Value 1   Value 2   MATCH
String2 Value 3   Value 4   MATCH
String3 NA        Value 5   MISMATCH
String4 Value 6   NA        MISMATCH

Here's a self-contained solution; simply replace the ConvertFrom-Csv calls with your Import-Csv calls:
# Sample CSV input.
$csv1 = @'
Header,Value
a,1
b,2
c,3
'@
$csv2 = @'
Header,Value
a,1a
b,2
d,4
'@
Compare-Object (ConvertFrom-Csv $csv1) (ConvertFrom-Csv $csv2) -Property Header, Value |
  Group-Object Header | Sort-Object Name | ForEach-Object {
    $newValIndex, $oldValIndex = ((1, 0), (0, 1))[$_.Group[0].SideIndicator -eq '=>']
    [pscustomobject] @{
      Header = $_.Name
      OldValue = ('NA', $_.Group[$oldValIndex].Value)[$null -ne $_.Group[$oldValIndex].Value]
      NewValue = ('NA', $_.Group[$newValIndex].Value)[$null -ne $_.Group[$newValIndex].Value]
      Status = ('MISMATCH', 'MATCH')[$_.Group.Count -gt 1]
    }
  }
The above yields:
Header OldValue NewValue Status
------ -------- -------- ------
a      1        1a       MATCH
c      3        NA       MISMATCH
d      NA       4        MISMATCH
Note:
The assumption is that a given Header column value appears at most once in each input file.
The Sort-Object Name call is needed to sort the output by Header values (thanks, LotPings), because, due to how Compare-Object orders its output (right-side-only items first), the order of groups created by Group-Object would not automatically reflect the 1st CSV's order of header values (d would appear before c).

Import-CSV does not preserve line indents

I am using Import-CSV to get the data from a csv file that looks like:
P1,1,3,4
 P2,4,5,6
 P3,1,2,3
 P4,8.7,6,3
I would like to keep the white-space in front of the text as it indicates the hierarchy. Import-CSV returns:
P1,1,3,4
P2,4,5,6
P3,1,2,3
P4,8.7,6,3
Is there a way to keep the white space?

Import-Csv discards leading whitespace in unquoted fields, so the fields have to be quoted to keep it. The safest form quotes every item in each row:
"P1","1","3","4"
" P2","4","5","6"
" P3","1","2","3"
" P4","8.7","6","3"
You can take a shortcut and only wrap the entries with leading spaces in quotes:
P1,1,3,4
" P2",4,5,6
" P3",1,2,3
" P4",8.7,6,3
Then Import-CSV will function as you're expecting, headers added for demonstration:
Import-CSV leading_spaces.csv -Header "Field1","Field2","Field3","Field4"
Gives you your desired output:
Field1 Field2 Field3 Field4
------ ------ ------ ------
P1     1      3      4
 P2    4      5      6
 P3    1      2      3
 P4    8.7    6      3
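The effect of the quotes can be checked without touching a file. This is a rough sketch using ConvertFrom-Csv on in-memory text, on the assumption that it parses fields the same way Import-Csv does; the Field1 to Field4 header names are the ones used above and the brackets only make the whitespace visible.

```powershell
# Same two rows, once unquoted and once with the indented field quoted.
$unquoted = @'
P1,1,3,4
 P2,4,5,6
'@ | ConvertFrom-Csv -Header Field1, Field2, Field3, Field4

$quoted = @'
P1,1,3,4
" P2",4,5,6
'@ | ConvertFrom-Csv -Header Field1, Field2, Field3, Field4

# Wrap the values in brackets so a leading space is easy to spot.
'unquoted: [{0}]' -f $unquoted[1].Field1
'quoted:   [{0}]' -f $quoted[1].Field1
```

The quoted row keeps its leading space, which is what lets the hierarchy survive the import.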

As per James C's comment, you can do this with Get-Content:
$myData = Get-Content .\test2.txt
foreach($line in ($myData | Select-Object -Skip 1)){
    [array]$results += [pscustomobject]@{
        $myData[0].Split(",")[0] = $line.Split(",")[0]
        $myData[0].Split(",")[1] = $line.Split(",")[1]
        $myData[0].Split(",")[2] = $line.Split(",")[2]
        $myData[0].Split(",")[3] = $line.Split(",")[3]
    }
}

Maybe this will help, it will create a new object for each row:
get-content test.csv | % {
    $row = New-Object PSObject
    $i = 0
    $_ -split "," | %{
        $row | add-member Noteproperty "column$i" $_
        $i++
    }
    $row
}
Output will look like this:
column0 column1 column2 column3
------- ------- ------- -------
P1      1       3       4
 P2     4       5       6
 P3     1       2       3
 P4     8.7     6       3

compare columns in two csv files

With all of the examples out there you would think I could have found my solution. :-)
Anyway, I have two csv files; one with two columns, one with 4. I need to compare one column from each one using powershell. I thought I had it figured out but when I did a compare of my results, it comes back as false when I know it should be true. Here's what I have so far:
$newemp = Import-Csv -Path "C:\Temp\newemp.csv" -Header login_id, lastname, firstname, other | Select-Object "login_id"
$ps = Import-Csv -Path "C:\Temp\Emplid_LoginID.csv" | Select-Object "login id"
If ($newemp -eq $ps)
{
    write-host "IDs match" -foregroundcolor green
}
Else
{
    write-host "Not all IDs match" -backgroundcolor yellow -foregroundcolor black
}
I had to specify headers for the first file because it doesn't have any. What's weird is that I can call each variable to see what it holds and they end up with the same info, but for some reason the comparison still comes up as false. This occurs even if there is only one row (not counting the header row).
I started to parse them as arrays but wasn't quite sure that was the right thing. What's important is that I compare row1 of the first file with row1 of the second file. I can't just do a simple -match or -contains.
EDIT: One annoying thing is that the variables seem to hold the header row as well. When I call each one, the header is shown. But if I call both variables, I only see one header but two rows.
I just added the following check but getting the same results (False for everything):
$results = Compare-Object -ReferenceObject $newemp -DifferenceObject $ps -PassThru | ForEach-Object { $_.InputObject }

Using latkin's answer from here I think this would give you the result set you're looking for. As per latkin's comment, the property comparison is redundant for your purposes but I left it in as it's good to know. Additionally the header is specified even for the csv with headers to prevent the header row being included in the comparison.
$newemp = Import-Csv -Path "C:\Temp\_sotemp\Book1.csv" -Header loginid |
    Select-Object "loginid"
$ps = Import-Csv -Path "C:\Temp\_sotemp\Book2.csv" -Header loginid |
    Select-Object "loginid"
#get list of (imported) CSV properties
$props1 = $newemp | gm -MemberType NoteProperty | select -expand Name | sort
$props2 = $ps | gm -MemberType NoteProperty | select -expand Name | sort
#first check that properties match
#omit this step if you know for sure they will be
if(Compare-Object $props1 $props2){
    throw "Properties are not the same! [$props1] [$props2]"
}
#pass properties list to Compare-Object
else{
    Compare-Object $newemp $ps -Property $props1
}

In the second line I see a space in "login id", while the first line uses "login_id". Could that be the issue? Try using the same header name in the .csv files themselves. With matching headers it works without providing -Header or Select-Object statements. Below is my experiment based on your input.
emp.csv
loginid firstname lastname
------------------------------
abc123 John patel
zxy321 Kohn smith
sdf120 Maun scott
tiy123 Dham rye
k2340 Naam mason
lk10j5 Shaan kelso
303sk Doug smith
empids.csv
loginid
-------
abc123
zxy321
sdf120
tiy123
PS C:\>$newemp = Import-csv C:\scripts\emp.csv
PS C:\>$ps = Import-CSV C:\scripts\empids.csv
PS C:\>$results = Compare-Object -ReferenceObject $newemp -DifferenceObject $ps | foreach { $_.InputObject}
Shows the difference objects that are not in $ps
loginid firstname lastname SideIndicator
------- --------- -------- -------------
k2340 Naam mason <=
lk10j5 Shaan kelso <=
303sk Doug smith <=
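Since the question asks for row 1 of one file to be compared with row 1 of the other, a position-by-position check is another option. The following is only a sketch: the sample rows are invented, ConvertFrom-Csv stands in for Import-Csv, and the names $left, $right and $mismatches are hypothetical. It uses -ExpandProperty so that plain strings are compared; applying -eq to two arrays of objects, as in the question, filters the left-hand array rather than testing the arrays for equality.

```powershell
# Invented sample rows standing in for the two CSV files.
$newemp = @'
abc123,patel,John,x
zxy321,smith,Kohn,y
'@ | ConvertFrom-Csv -Header login_id, lastname, firstname, other
$ps = @'
emplid,login id
1,abc123
2,zzz999
'@ | ConvertFrom-Csv

# -ExpandProperty returns the bare strings instead of wrapper objects.
$left  = @($newemp | Select-Object -ExpandProperty login_id)
$right = @($ps | Select-Object -ExpandProperty 'login id')

# Walk both lists by position and collect every row that differs.
$rowCount = [Math]::Max($left.Count, $right.Count)
$mismatches = for ($i = 0; $i -lt $rowCount; $i++) {
    if ($left[$i] -ne $right[$i]) {
        [pscustomobject]@{ Row = $i + 1; NewEmp = $left[$i]; PS = $right[$i] }
    }
}

if ($mismatches) {
    Write-Host "Not all IDs match" -BackgroundColor Yellow -ForegroundColor Black
    $mismatches
}
else {
    Write-Host "IDs match" -ForegroundColor Green
}
```

If one file is longer than the other, the extra rows show up as mismatches with an empty value on the shorter side.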

I am not sure if this is what you are looking for, but I have used PowerShell to do some CSV comparison for myself.
$test = Import-Csv .\Desktop\Vmtools-compare.csv
foreach ($i in $test) {
    foreach ($n in $i.name) {
        foreach ($m in $test) {
            $check = "yes"
            if ($n -eq $m.prod) {
                $check = "no"
                break
            }
        }
        if ($check -ne "no") {$n}
    }
}
This is what my Excel CSV file looks like:
prod name
1 3
2 5
3 8
4 2
5 0
and the script outputs this:
8
0
So basically the script takes each number under the name column and checks it against the prod column. If the number is found there, it is not displayed; otherwise it is.
I have also done it the opposite way:
$test = Import-Csv c:\test.csv
foreach ($i in $test) {
    foreach ($n in $i.name) {
        foreach ($m in $test) {
            if ($n -eq $m.prod) {echo $n}
        }
    }
}
This is what my Excel CSV file looks like:
prod name
1 3
2 5
3 8
4 2
5 0
and the script outputs this:
3
5
2
So this version of the script shows the matching entries only.
You can play around with the code to look at different columns.
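As a possible shorthand for the nested loops, the same two checks can be written with the -in and -notin operators, assuming PowerShell 3.0 or later. The sketch uses the sample table above as in-memory CSV text; $missing and $matching are names made up for this example.

```powershell
# The sample table from above as CSV text; use Import-Csv for a real file.
$test = @'
prod,name
1,3
2,5
3,8
4,2
5,0
'@ | ConvertFrom-Csv

# Values in the name column that never occur in the prod column.
$missing = @($test.name | Where-Object { $_ -notin $test.prod })

# Values in the name column that do occur in the prod column.
$matching = @($test.name | Where-Object { $_ -in $test.prod })

"Missing:  $($missing -join ', ')"
"Matching: $($matching -join ', ')"
```

For the sample data this gives 8 and 0 as missing and 3, 5 and 2 as matching, the same as the two loop versions.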