Comparison of 2 CSV files using PowerShell

The two files have the same format, as follows:
ServiceName Status computer State
AdobeARMservice OK NEE Running
Amazon Assistan OK NEE Running
The requirement: I have to match rows on service name and computer name. If both are the same, I then have to check whether the State of that service is the same in both files, and display the rows where it differs.
$preser = import-csv C:\info.csv
$postser = import-csv C:\serviceinfo.csv
foreach($ser1 in $preser)
{
foreach($ser2 in $postser)
{
if(($ser1.computer -eq $ser2.computer) -and ($ser1.ServiceName -eq $ser2.ServiceName))
{
if($ser1.State -eq $ser2.State)
{
}
else
{
write-host $ser1,$ser2
}
}
}
}
This code works, but because the files are very large, it takes a long time to run.
Is there an alternative method that reduces the execution time?
Thank you

Although Import-Csv on very large files will take its time, maybe this will be faster:
$preser = Import-Csv -Path 'C:\info.csv'
$postser = Import-Csv -Path 'C:\serviceinfo.csv'
# build a lookup Hashtable for $preser
$hash = @{}
foreach ($item in $preser) {
# combine the ServiceName and Computer to form the hash key
$key = '{0}#{1}' -f $item.ServiceName, $item.computer
$hash[$key] = $item
}
# now loop through the items in $postser
foreach ($item in $postser) {
$key = '{0}#{1}' -f $item.ServiceName, $item.computer
if ($hash.ContainsKey($key)) {
if ($hash[$key].State -ne $item.State) {
# create a new object for output
$out = $hash[$key] | Select-Object * -ExcludeProperty State
$out | Add-Member -MemberType NoteProperty -Name 'State in Preser' -Value $hash[$key].State
$out | Add-Member -MemberType NoteProperty -Name 'State in Postser' -Value $item.State
$out
}
}
}
The output on screen will look something like this:
ServiceName : AdobeARMservice
Status : OK
computer : NEE
State in Preser : Running
State in Postser : Stopped
Of course, you can capture this output and save it as new csv if you do
$result = foreach ($item in $postser) {
# rest of the above foreach loop
}
# output on screen
$result
# output to new csv
$result | Export-Csv -Path 'C:\ServiceInfoDifference.csv' -NoTypeInformation

There are a few ways to do this:
1. Sorting the columns
If the columns are unsorted in the files, sort them first, and then try finding a match by using linear search.
2. Binary search
What you are currently doing is an implementation of a linear search. You can implement binary search (works best on sorted lists) to find a result faster.
Taken from dfinkey's GitHub repo (the Comparator class it relies on is not a built-in type and has to be defined separately):
function binarySearch {
param($sortedArray, $seekElement, $comparatorCallback)
$comparator = New-Object Comparator $comparatorCallback
$startIndex = 0
$endIndex = $sortedArray.length - 1
while ($startIndex -le $endIndex) {
$middleIndex = $startIndex + [Math]::floor(($endIndex - $startIndex) / 2)
# If we've found the element just return its position.
if ($comparator.equal($sortedArray[$middleIndex], $seekElement)) {
return $middleIndex
}
# Decide which half to choose for seeking next: left or right one.
if ($comparator.lessThan($sortedArray[$middleIndex], $seekElement)) {
# Go to the right half of the array.
$startIndex = $middleIndex + 1
}
else {
# Go to the left half of the array.
$endIndex = $middleIndex - 1
}
}
return -1
}
3. Hashes
I am not completely sure of this method, but you can load the columns into hashtables and then compare them. Hashtable lookups are generally faster than searching through arrays.
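For option 2, a hand-written search may not be needed: .NET ships [Array]::Sort and [Array]::BinarySearch. The following is only a sketch under assumptions. The sample rows are made up, and the composite key format simply mirrors the ServiceName/computer matching from the question.

```powershell
# Hypothetical sample data standing in for the rows of the first CSV file.
$preser = @(
    [pscustomobject]@{ ServiceName = 'Spooler';  computer = 'NEE'; State = 'Running' }
    [pscustomobject]@{ ServiceName = 'AdobeARM'; computer = 'NEE'; State = 'Running' }
    [pscustomobject]@{ ServiceName = 'WSearch';  computer = 'NEE'; State = 'Stopped' }
)

# Build one composite key per row, then sort the keys and the rows together.
$comparer = [System.StringComparer]::Ordinal
[string[]]$keys = foreach ($row in $preser) { '{0}#{1}' -f $row.ServiceName, $row.computer }
[object[]]$rows = $preser
[Array]::Sort($keys, $rows, $comparer)

# Look up a key; a negative index means "not found".
$index = [Array]::BinarySearch($keys, 'Spooler#NEE', $comparer)
if ($index -ge 0) { $rows[$index].State }
```

The same comparer has to be used for sorting and searching, otherwise the search can miss keys that are present.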

Related

Fast compare two large csv(boths rows and columns) in powershell

I have two large CSVs to compare. Both CSVs are basically data from the same system, one day apart. There are around 12k rows and 30 columns.
The aim is to identify which column data has changed for each primary key (#ID).
My idea was to loop through the CSVs to identify which rows have changed and dump these into a separate CSV. Once done, I loop through the changed rows again and identify the exact change in each column.
$NewCSV = Import-Csv -Path ".\Data_A.csv"
$OldCSV = Import-Csv -Path ".\Data_B.csv"
foreach ($LineNew in $NewCSV)
{
ForEach ($LineOld in $OldCSV)
{
If($LineNew -eq $LineOld)
{
Write-Host $LineNew, " Match"
}else{
Write-Host $LineNew, " Not Match"
}
}
}
But as soon as I run the loop, it takes forever for 12k rows. I was hoping there is a more efficient, quicker way to compare large files in PowerShell.

Well, you can give this a try. I'm not claiming it will be fast, for the reasons vonPryz has already pointed out, but it should give you a good side-by-side view of what has changed from OldCsv to NewCsv.
Note: Those cells that have the same value on both CSVs will be ignored.
$NewCSV = Import-Csv -Path ".\Data_A.csv"
$OldCSV = Import-Csv -Path ".\Data_B.csv" | Group-Object ID -AsHashTable -AsString
$properties = $newCsv[0].PSObject.Properties.Name
$result = foreach($line in $NewCSV)
{
if($ref = $OldCSV[$line.ID])
{
foreach($prop in $properties)
{
if($line.$prop -ne $ref.$prop)
{
[pscustomobject]@{
ID = $line.ID
Property = $prop
OldValue = $ref.$prop
NewValue = $line.$prop
}
}
}
continue
}
Write-Warning "ID $($line.ID) could not be found on Old Csv!!"
}

As vonPryz hints in the comments, you've written an algorithm with quadratic time complexity (O(n²) in Big-O notation): every time the input size doubles, the number of computations performed increases 4-fold.
To avoid this, I'd suggest using a hashtable or other dictionary type to hold each data set, and use the primary key from the input as the dictionary key. This way you get constant-time lookup of corresponding records, and the time complexity of your algorithm becomes near-linear (O(2n + k)):
$NewCSV = @{}
Import-Csv -Path ".\Data_A.csv" |ForEach-Object {
$NewCSV[$_.ID] = $_
}
$OldCSV = @{}
Import-Csv -Path ".\Data_B.csv" |ForEach-Object {
$OldCSV[$_.ID] = $_
}
Now that we can efficiently resolve each row by its ID, we can inspect the whole of the data sets with an independent loop over each:
foreach($entry in $NewCSV.GetEnumerator()){
if(-not $OldCSV.ContainsKey($entry.Key)){
# $entry.Value is a new row, not seen in the old data set
# skip the comparison below, since there is no old row to compare against
continue
}
$newRow = $entry.Value
$oldRow = $OldCSV[$entry.Key]
# do the individual comparison of the rows here
}
Do another loop like the one above, but enumerate $OldCSV and check for keys that are missing from $NewCSV, to detect deletions.
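The deletion check could look roughly like this. It is a minimal sketch: the two hashtables below are hypothetical stand-ins for the ones built from Data_A.csv and Data_B.csv, and the Name column is invented for illustration.

```powershell
# Hypothetical stand-ins for the two hashtables built from the CSV files.
$OldCSV = @{
    '1' = [pscustomobject]@{ ID = '1'; Name = 'alpha' }
    '2' = [pscustomobject]@{ ID = '2'; Name = 'beta' }
}
$NewCSV = @{
    '1' = [pscustomobject]@{ ID = '1'; Name = 'alpha' }
}

# Rows whose key exists in the old data set but not in the new one were deleted.
$deleted = foreach ($entry in $OldCSV.GetEnumerator()) {
    if (-not $NewCSV.ContainsKey($entry.Key)) {
        $entry.Value
    }
}
$deleted
```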

PS Object unescape character

I get a small error when running my code. I assign a string to a custom object, but PowerShell parses the string itself and throws an error.
Code:
foreach ($item in $hrdblistofobjects) {
[string]$content = Get-Content -Path $item
[string]$content = $content.Replace("[", "").Replace("]", "")
#here is line 43 which is shown as error as well
foreach ($object in $listofitemsdb) {
$result = $content -match $object
$OurObject = [PSCustomObject]@{
ObjectName = $null
TestObjectName = $null
Result = $null
}
$OurObject.ObjectName = $item
$OurObject.TestObjectName = $object #here is line 52 which is other part of error
$OurObject.Result = $result
$Resultsdb += $OurObject
}
}
This code loads an item and checks if an object exists within an item. Basically if string part exists within a string part and then saves result to a variable. I am using this code for other objects and items but they don't have that \p part which I am assuming is the issue. I can't put $object into single quotes for obvious reasons (this was suggested on internet but in my case it's not possible). So is there any other option how to unescape \p? I tried $object.Replace("\PMS","\\PMS") but that did not work either (this was suggested somewhere too).
EDIT:
$Resultsdb = @(foreach ($item in $hrdblistofobjects) {
[string]$content = Get-Content -Path $item
[string]$content = $content.Replace("[", "").Replace("]", "")
foreach ($object in $listofitemsdb) {
[PSCustomObject]@{
ObjectName = $item
TestObjectName = $object
Result = $content -match $object
}
}
}
)

$Resultsdb is not defined as an array, hence you get that error when you try to add one object to another object when that doesn't implement the addition operator.
You shouldn't be appending to an array in a loop anyway. That will perform poorly, because with each iteration it creates a new array with the size increased by one, copies all elements from the existing array, puts the new item in the new free slot, and then replaces the original array with the new one.
A better approach is to just output your objects in the loop and collect the loop output in a variable:
$Resultsdb = foreach ($item in $hrdblistofobjects) {
...
foreach ($object in $listofitemsdb) {
[PSCustomObject]@{
ObjectName = $item
TestObjectName = $object
Result = $content -match $object
}
}
}
Run the loop in an array subexpression if you need to ensure that the result is an array; otherwise it will be empty or a single object when the loop returns fewer than two results.
$Resultsdb = @(foreach ($item in $hrdblistofobjects) {
...
})
Note that you need to suppress other output on the default output stream in the loop, so that it doesn't pollute your result.
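To illustrate the point about stray output, here is a small sketch. The $seen list and the sample names are hypothetical; the point is only that ArrayList.Add() returns the new index, which would otherwise end up in the collected result.

```powershell
# Hypothetical list that the loop fills as a side effect.
$seen = New-Object System.Collections.ArrayList

$Resultsdb = @(foreach ($name in 'a', 'b') {
    # ArrayList.Add() returns the new index; discard it so it is not collected.
    $null = $seen.Add($name)
    [PSCustomObject]@{ ObjectName = $name }
})
$Resultsdb.Count
```

Without the `$null =` assignment, $Resultsdb would hold four items (two integers and two objects) instead of two.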

I changed the match part to this and it's working fine: $result = $content -match $object.Replace("\PMS","\\PMS")
Sorry for errors in posting. I will amend that.
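A more general alternative to replacing one specific sequence might be [regex]::Escape, which escapes every regex metacharacter in the search string, or a plain substring test that avoids regex altogether. The values below are hypothetical and only there to make the sketch runnable.

```powershell
# Hypothetical values; the backslash sequence \P is what trips the regex parser.
$content = 'table HR\PMS_USERS exists'
$object  = 'HR\PMS_USERS'

# Option 1: escape every regex metacharacter in the search string.
$viaRegex = $content -match [regex]::Escape($object)

# Option 2: skip regex entirely and do a literal substring test.
$viaContains = $content.Contains($object)

$viaRegex, $viaContains
```

Note that -match is case-insensitive by default, while String.Contains() is case-sensitive, so the two options are not fully interchangeable.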

I'm trying to get something that looks like UNIX ls output in PowerShell. This is getting there:
Get-ChildItem | Format-Wide -AutoSize -Property Name
but it's still outputting the items in row-major instead of column-major order:
PS C:\Users\Mark Reed> Get-ChildItem | Format-Wide -AutoSize -Property Name
Contacts Desktop Documents Downloads Favorites
Links Music Pictures Saved Games
Searches Videos
Desired output:
PS C:\Users\Mark Reed> My-List-Files
Contacts Downloads Music Searches
Desktop Favorites Pictures Videos
Documents Links Saved Games
The difference is in the sorting: 1 2 3 4 5/6 7 8 9 reading across the lines, vs 1/2/3 4/5/6 7/8/9 reading down the columns.
I already have a script that will take an array and print it out in column-major order using Write-Host, though I found a lot of PowerShellish idiomatic improvements to it by reading Keith's and Roman's takes. But my impression from reading around is that's the wrong way to go about this. Instead of calling Write-Host, a script should output objects, and let the formatters and outputters take care of getting the right stuff written to the user's console.
When a script uses Write-Host, its output is not capturable; if I assign the result to a variable, I get a null variable and the output is written to the screen anyway. It's like a command in the middle of a UNIX pipeline writing directly to /dev/tty instead of standard output or even standard error.
Admittedly, I may not be able to do much with the array of Microsoft.PowerShell.Commands.Internal.Format.* objects I get back from e.g. Format-Wide, but at least it contains the output, which doesn't show up on my screen in rogue fashion, and which I can recreate at any time by passing the array to another formatter or outputter.
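The capture behavior described above can be seen with two tiny functions (both names are hypothetical and exist only for this sketch):

```powershell
# Writes to the host, bypassing the output stream.
function Show-WithHost { Write-Host 'hello' }

# Emits a string object on the output stream.
function Show-WithOutput { 'hello' }

$a = Show-WithHost      # 'hello' still appears on screen; nothing is captured
$b = Show-WithOutput    # nothing appears on screen; the string is captured

$null -eq $a
$b
```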

This is a simple-ish function that formats column major. You can do this all in PowerShell Script:
function Format-WideColMajor {
[CmdletBinding()]
param(
[Parameter(ValueFromPipeline)]
[AllowNull()]
[AllowEmptyString()]
[PSObject]
$InputObject,
[Parameter()]
$Property
)
begin {
$list = new-object System.Collections.Generic.List[PSObject]
}
process {
$list.Add($InputObject)
}
end {
if ($Property) {
$output = $list | Foreach {"$($_.$Property)"}
}
else {
$output = $list | Foreach {"$_"}
}
$conWidth = $Host.UI.RawUI.BufferSize.Width - 1
$maxLen = ($output | Measure-Object -Property Length -Maximum).Maximum
$colWidth = $maxLen + 1
$numCols = [Math]::Floor($conWidth / $colWidth)
$numRows = [Math]::Ceiling($output.Count / $numCols)
for ($i=0; $i -lt $numRows; $i++) {
$line = ""
for ($j = 0; $j -lt $numCols; $j++) {
$item = $output[$i + ($j * $numRows)]
$line += "$item$(' ' * ($colWidth - $item.Length))"
}
$line
}
}
}

Compare objects based on subset of properties

Say I have two PowerShell hashtables, one big and one small, and for a specific purpose I want to call them equal if, for every key in the small one, the big hashtable holds the same value.
Also I don't know the names of the keys in advance. I can use the following function that uses Invoke-Expression but I am looking for nicer solutions, that don't rely on this.
Function Compare-Subset {
Param(
[hashtable] $big,
[hashtable] $small
)
$keys = $small.keys
Foreach($k in $keys) {
$expression = '$val = $big.' + "$k" + ' -eq ' + '$small.' + "$k"
Invoke-Expression $expression
If(-not $val) {return $False}
}
return $True
}
$big = @{name='Jon'; car='Honda'; age='30'}
$small = @{name = 'Jon'; car='Honda'}
Compare-Subset $big $small

A simple $true/$false can easily be gotten. This will return $true if there are no differences:
[string]::IsNullOrWhiteSpace($($small|Select -Expand Keys|Where{$Small[$_] -ne $big[$_]}))
It checks for all keys in $small to see if the value of that key in $small is the same of the value for that key in $big. It will only output any values that are different. It's wrapped in a IsNullOrWhitespace() method from the [String] type, so if any differences are found it returns false. If you want to list differences just remove that method.
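The same check can also be written as a small function that indexes the hashtables directly, so no Invoke-Expression is needed. This is a sketch; the function name Test-HashtableSubset is made up, and treating a key that is missing from the big hashtable as a mismatch is an assumption.

```powershell
function Test-HashtableSubset {
    param(
        [hashtable] $Big,
        [hashtable] $Small
    )
    foreach ($key in $Small.Keys) {
        # A key missing from $Big counts as a mismatch.
        if (-not $Big.ContainsKey($key)) { return $false }
        # Index by key instead of building an expression string.
        if ($Big[$key] -ne $Small[$key]) { return $false }
    }
    return $true
}

$big   = @{ name = 'Jon'; car = 'Honda'; age = '30' }
$small = @{ name = 'Jon'; car = 'Honda' }
Test-HashtableSubset -Big $big -Small $small
```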

This could be the start of something. Not sure what output you are looking for but this will output the differences between the two groups. Using the same sample data that you provided:
$results = Compare-Object ($big.GetEnumerator() | % { $_.Name }) ($small.GetEnumerator() | % { $_.Name })
$results | ForEach-Object{
$key = $_.InputObject
Switch($_.SideIndicator){
"<="{"Only reference object has the key: '$key'"}
"=>"{"Only difference object has the key: '$key'"}
}
}
In primetime you would want something different but just to show you the above would yield the following output:
Only reference object has the key: 'age'

Powershell array of arrays loop process

I need help with loop processing an array of arrays. I have finally figured out how to do it, and I am doing it as such...
$serverList = $1Servers,$2Servers,$3Servers,$4Servers,$5Servers
$serverList | % {
% {
Write-Host $_
}
}
I can't get it to process correctly. What I'd like to do is create a CSV from each array, and title the lists accordingly. So 1Servers.csv, 2Servers.csv, etc... The thing I can not figure out is how to get the original array name into the filename. Is there a variable that holds the list object name that can be accessed within the loop? Do I need to just do a separate single loop for each list?

You can try :
$1Servers = "Mach1","Mach2"
$2Servers = "Mach3","Mach4"
$serverList = $1Servers,$2Servers
$serverList | % {$i=0}{$i+=1;$_ | % {New-Object -Property @{"Name"=$_} -TypeName PsCustomObject} |Export-Csv "c:\temp\$($i)Servers.csv" -NoTypeInformation }
I take each list, and create new objects that I export in a CSV file. The way I create the file name is not so nice, I don't take the var name I just recreate it, so if your list is not sorted it will not work.
It would perhaps be more efficient if you store your servers in a hash table :
$1Servers = @{Name="1Servers"; Computers="Mach1","Mach2"}
$2Servers = @{Name="2Servers"; Computers="Mach3","Mach4"}
$serverList = $1Servers,$2Servers
$serverList | % {$name=$_.name;$_.computers | % {New-Object -Property @{"Name"=$_} -TypeName PsCustomObject} |Export-Csv "c:\temp\$($name).csv" -NoTypeInformation }

Much like JPBlanc's answer, I kinda have to kludge the filename... (FWIW, I can't see how you can get that out of the array itself).
I did this example w/ foreach instead of foreach-object (%). Since you have actual variable names you can address w/ foreach, it seems a little cleaner, if nothing else, and hopefully a little easier to read/maintain:
$1Servers = "apple.contoso.com","orange.contoso.com"
$2Servers = "peach.contoso.com","cherry.contoso.com"
$serverList = $1Servers,$2Servers
$counter = 1
foreach ( $list in $serverList ) {
$fileName = "{0}Servers.csv" -f $counter++
"FileName: $fileName"
foreach ( $server in $list ) {
"-- ServerName: $server"
}
}

I was able to resolve this issue myself. Because I wasn't able to get the object name through, I just changed the nature of the object. So now my server lists consist of two columns, one of which is the name of the list itself.
So...
$1Servers += [pscustomobject]@{
Servername = $entry.Servername
Domain = $entry.Domain
}
Then...
$serverList = $usaServers,$devsubServers,$wtencServers,$wtenclvServers,$pcidevServers
Then I am able to use that second column to name the lists within my foreach loop.
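Another way to keep the list name next to the list is an ordered hashtable keyed by that name. The sketch below is not the code from this thread: the list names, server names, and output folder are all hypothetical.

```powershell
# Hypothetical list names and server names, keyed by the name each file should get.
$serverLists = [ordered]@{
    '1Servers' = 'Mach1', 'Mach2'
    '2Servers' = 'Mach3', 'Mach4'
}

# Hypothetical output folder under the temp directory.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'ServerLists'
$null = New-Item -ItemType Directory -Path $outDir -Force

foreach ($entry in $serverLists.GetEnumerator()) {
    # The hashtable key doubles as the file name.
    $path = Join-Path $outDir ('{0}.csv' -f $entry.Key)
    $entry.Value |
        ForEach-Object { [pscustomobject]@{ Name = $_ } } |
        Export-Csv -Path $path -NoTypeInformation
}
```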
