Powershell, comparing 2 files to find the amount of unique entries in both files - powershell

I have 2 files. The contents of
File 1 is: 4,22,1,2,3,14,12,13.
File 2 is: 1,50,2,12,3,6,9.
I'm trying to write a script that outputs the total number of unique entries across both files, plus the unique numbers in file 1 and in file 2. I am currently using:
$howmany = compare-object $(get-content C:\test\file1.txt) $(get-content C:\test\file2.txt)
Write-Host "Total unique entries in both files is:" $howmany.Count
This does the total unique entries in both files but I can't figure out how to find the total unique entries in file 1 and file 2.
I want the output to be something like:
Total unique entries in file 1 is: 4
Total unique entries in file 2 is: 3
Unique numbers in file 1 are: 4 22 14 13
Unique numbers in file 2 are: 50 6 9

This uses the AsHashTable and AsString parameters to return the groups in a hash table, that is, as a collection of key-value pairs.
In the resulting hash table, each property value is a key, and the group elements are the values. Because each key is a property of the hash table object, you can use dot notation to display the values.
$unique = $howmany | Group-Object -Property sideindicator -AsHashTable -AsString
File1
Since the output is an array, the -join operator is used to join the numbers into a single string:
($unique.'<=' | Select-Object -ExpandProperty inputobject) -join ','
File2
($unique.'=>' | Select-Object -ExpandProperty inputobject) -join ','
File1 - Count unique items
($unique.'<=' | Select-Object -ExpandProperty inputobject).count
File2 - Count unique items
($unique.'=>' | Select-Object -ExpandProperty inputobject).count
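Putting the pieces together, here is a minimal sketch that prints the output format asked for in the question. It assumes the two number lists are held in arrays instead of being read with Get-Content, so it runs without the files. The names $onlyIn1 and $onlyIn2 are mine, not part of the answer above. The @() wrappers keep each result an array even when a side has only one entry.

```powershell
# Stand-ins for (Get-Content C:\test\file1.txt) and (Get-Content C:\test\file2.txt)
$file1 = 4,22,1,2,3,14,12,13
$file2 = 1,50,2,12,3,6,9

# Differences between the two lists, grouped by which side they came from
$howmany = Compare-Object $file1 $file2
$unique  = $howmany | Group-Object -Property SideIndicator -AsHashTable -AsString

# '<=' marks entries found only in the first list, '=>' only in the second
$onlyIn1 = @($unique.'<=' | Select-Object -ExpandProperty InputObject)
$onlyIn2 = @($unique.'=>' | Select-Object -ExpandProperty InputObject)

Write-Host "Total unique entries in file 1 is: $($onlyIn1.Count)"
Write-Host "Total unique entries in file 2 is: $($onlyIn2.Count)"
Write-Host "Unique numbers in file 1 are: $onlyIn1"
Write-Host "Unique numbers in file 2 are: $onlyIn2"
```

With the sample data this reports 4 entries for file 1 and 3 for file 2.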

(I know you have an accepted answer, I just wanted to write an alternative. I can't make it work the way I was trying to approach it, but this is close).
function uniques {param($a,$b) $a|? {$b -notcontains $_}}
$f1 = (gc C:\test\file1.txt) | select -unique
$f2 = (gc C:\test\file2.txt) | select -unique
Write-Host "Total unique entries in both files is: $(($f1+$f2 |select -Unique).Count)"
Write-Host "Total unique entries in file 1 is: $((uniques $f1 $f2).Count)"
Write-Host "Total unique entries in file 2 is: $((uniques $f2 $f1).Count)"
Write-Host "Unique numbers in file 1 are: $(uniques $f1 $f2)"
Write-Host "Unique numbers in file 2 are: $(uniques $f2 $f1)"
NB. Your initial code, and therefore @Kiran's answer, has a bug if one of the files contains a duplicate number. E.g. if file1 contains 4,22,1,2,3,14,12,13,4 with a duplicate 4 in it, you'll get 5 unique numbers: 4,22,14,13,4. That's why this has |select -unique for both files when reading them.
NB. My version might fail if a file has only one number, or there is only one unique number. Put @() around things to make sure they stay as arrays if that matters.
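A small sketch of the @() guard, assuming in-memory string arrays in place of the file contents (the variable names are only for illustration). Wrapping the pipeline in @() means .Count is reliable whether the filter returns several items, one item, or nothing.

```powershell
# Stand-ins for the de-duplicated file contents
$f1 = '1','2','3'
$f2 = '2','3'

# Only '1' is missing from $f2; @() keeps the single result as a one-element array
$onlyInF1 = @($f1 | Where-Object { $f2 -notcontains $_ })

# Nothing in $f2 is missing from $f1; @() turns "no output" into an empty array
$onlyInF2 = @($f2 | Where-Object { $f1 -notcontains $_ })

Write-Host "Total unique entries in file 1 is: $($onlyInF1.Count)"
Write-Host "Total unique entries in file 2 is: $($onlyInF2.Count)"
```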

Related

sorting columns in CSV file with powershell

I have a CSV file with 1600 lines, of which the top ten lines are given below:
N,EQ,ADANIPORTS,ADANI PORT & SEZ LTD,384.5,385,387.8,375,376.75,792818726.1,2085488,Y, ,40850,452.35,350.45
N,EQ,ASIANPAINT,ASIAN PAINTS LIMITED,1394.75,1395,1411,1385.05,1393.5,1284559258,919355,Y, ,36117,1490.6,1090.1
N,EQ,AXISBANK,AXIS BANK LIMITED,631.75,638.05,643.4,634,639.9,9599936309,15035968,Y, ,144038,644.65,447.5
N,EQ,BAJAJ-AUTO,BAJAJ AUTO LIMITED,2685.55,2683.9,2697,2664,2682.25,1476618943,551229,Y, ,23611,3468.35,2605
N,EQ,BAJAJFINSV,BAJAJ FINSERV LTD.,7092.1,7092,7129,7025.25,7050.65,909166393.3,128111,Y, ,19707,7200,4500
N,EQ,BAJFINANCE,BAJAJ FINANCE LIMITED,2893.85,2892,2943.4,2891.05,2916.6,3884349778,1327710,Y, ,52356,2943.4,1511.2
N,EQ,BHARTIARTL,BHARTI AIRTEL LIMITED,369.9,370,370.8,365,368.95,768282183.8,2089422,Y, ,26515,564.8,331
N,EQ,BPCL,BHARAT PETROLEUM CORP LT,357.75,358.25,362,353.5,356.95,1738725370,4865929,Y, ,77863,551.55,353.5
N,EQ,CIPLA,CIPLA LTD,657.95,658,658,645,651.2,1235846442,1904031,Y, ,38575,665,507.2
N,EQ,COALINDIA,COAL INDIA LTD,289.05,287.85,293.6,287.8,291,791484837,2713583,Y, ,55421,316.95,235.85
I want to sort the 10th column in descending order so that I can find the top 20.
The file name is Pd240818.csv
My PowerShell code is as below.
# To remove unwanted few lines
sls ",BE,",",EQ," .\Pd240818.csv | select -exp line | Where-Object {$_ -notmatch ',EQ, ,'} > .\temp.csv
#Sorting line is as follows
gc .\temp.csv | Where-Object {$_ -notmatch 'MKT,'}|%{$_.split(",")[9]}|Sort-Object -Descending| Select-Object -first 20 > temp.txt
Sorted
I get temp.txt as follows:
99988.7
99896.5
9989273.6
99769.75
996134.55
9933960.45
99228.65
99199.95
989418.15
988423057.7
9884111.1
98572145.2
982146.5
981497584.9
97982.75
9786178.9
9775915.05
9760482.5
97384498.85
971033.85
Whereas if I sort the same column in Excel, I get the following:
28818819313
9599936309
8459873415
6175554483
5889553012
5690666055
5439638100
5121938441
5079530750
5042021707
4972762046
4889394601
4742835986
3884349778
3690976213
3486309023
3388956937
3336437125
3206801588
3114870807
Where am I going wrong? How do I correct it?
The clue is seeing numbers of different lengths, all sorted together.
This is a common problem, where numbers are sorted as text instead of as numeric values. When we sort words, it does not matter how long they are: we put all the a's together, then all the b's, and so on. Do that with numbers, putting all the 9s together, then all the 8s, and you get this varying-length sort:
99896.5
9989273.6
99769.75
The solution is to convert the text to numbers, while sorting, then they will sort on the value:
.. | Sort-Object -Descending -Property { $_ -as [decimal] } | ..
Then the output is more like you want:
988423057.7
981497584.9
98572145.2
97384498.85
9989273.6
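To see the difference side by side, here is a small sketch using a handful of values taken from the two listings in the question, held in an array so it runs without the CSV file. The variable names are illustrative only.

```powershell
# A few column-10 values as they come out of Split(","), i.e. as strings
$values = '99988.7', '9989273.6', '988423057.7', '28818819313'

# Text sort: compared character by character, so length is ignored
$asText = $values | Sort-Object -Descending

# Numeric sort: each string is converted to [decimal] for the comparison only;
# the items that come out are still the original strings
$asNumber = $values | Sort-Object -Descending -Property { $_ -as [decimal] }

Write-Host "As text:   $($asText -join ', ')"
Write-Host "As number: $($asNumber -join ', ')"
```

The numeric sort puts 28818819313 first, which matches the top of the Excel listing.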

Count unique numbers in CSV (PowerShell or Notepad++)

How to find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE
1,2,3,4,2 | Sort-Object | Get-Unique
I can get the unique numbers but I'm not able to get this to work with CSV files. If for example I use
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.
You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique (you need Get-Unique only if your input already is sorted).
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as an array of lines (strings).
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
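Both variants can be checked against the sample data without a file. The sketch below assumes the CSV text sits in a here-string and uses ConvertFrom-Csv in place of Import-Csv; the variable names are illustrative.

```powershell
# The sample data from the question
$csvText = @'
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
'@

# Object-based variant (PSv3+): collect every column value of every row, drop blanks
$countFromObjects = ($csvText | ConvertFrom-Csv |
    ForEach-Object { $_.psobject.Properties.Value -ne '' } |
    Sort-Object -Unique).Count

# Text-based variant: split into lines, skip the header, split each line on commas
$countFromText = ($csvText -split '\r?\n' | Select-Object -Skip 1 |
    ForEach-Object { $_ -split ',' -ne '' } |
    Sort-Object -Unique).Count

Write-Host "Unique values (objects): $countFromObjects"
Write-Host "Unique values (text):    $countFromText"
```

Both counts come out as 6 for this data (0, 3, 4, 5, 7, 9). Note that the values are compared as strings, which is fine for counting distinct entries.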
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order), it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1], so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] @{ foo = 'bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.

How to read a CSV file but exclude certain columns containing blanks using Get-Content

I want to read a CSV file and exclude rows where dynamically selected columns contain blanks but not all rows of those dynamically selected columns contain blanks.
Trying to use the where clause in the statement below (but not working):
Get-Content $Source -ReadCount 1000 |
Where {
ForEach($NotEqualBlankCol in $BlankColumns)
{
$NotEqualBlankCol -ne $null -and $NotEqualBlankCol -ne ''}
} |
ConvertFrom-Csv |
Sort-Object -Property $SortByColNames.Replace('"', '') -Unique |
.
.
.
| Out-File $Destination
$BlankColumns is my dynamic string array, which I would like to loop through; it contains the names of the CSV columns to check for blanks. It can be one column or more. When there is more than one, all of the selected columns need to be blank for a row to be excluded from the final CSV output.
How do I do it using Get-Content? Any help would be appreciated.
Using Get-Content
OK. What this will do is read in the contents of a file X lines at a time. It will parse each line into its individual columns, then check the specified columns for blanks. If any of the flagged columns contains a blank, the line will be filtered out. Consider the test data I used for this:
id,first_name,last_name,email,gender,ip_address
1,Christina,Tucker,ctucker0@bbc.co.uk,Female,91.33.192.187
2,Jacqueline,Torres,jtorres1@shop-pro.jp,Female,205.70.183.107
3,Kathy,Perez,kperez2@hugedomains.com,Female,35.175.154.127
4,"",Holmes,eholmes3@canalblog.com,,
5,Ernest,Walker,ewalker4@marketwatch.com,Male,140.110.129.21
6,,Garza,cgarza5@jugem.jp,,
7,,Cunningham,jcunningham6@ox.ac.uk,Female,
8,,Clark,lclark7@posterous.com,,
9,,Ortiz,lortiz8@shareasale.com,,
Notice that the first_name and gender are blank for some of these folks. id 1,2,3,5,10 have complete data. The rest should be filtered.
$BlankColumns = "first_name","gender"
$headers = (Get-Content $path -TotalCount 1).Split(",")
$potentialBlankHeaderIndecies = 0..($headers.Count - 1) | Where-Object{$BlankColumns -contains $headers[$_]}
$potentialBlankHeaderIndecies
Get-Content $path -ReadCount 3 | Foreach-Object{
# Check to see if any of the indexes from a split are empty
$_ | Where-Object{
[bool[]](($_.Split(","))[$potentialBlankHeaderIndecies] | ForEach-Object{
![string]::IsNullOrEmpty($_.Trim('"'))
}) -notcontains $false
}
}
The output of this code is the file, as strings, with the unwanted entries removed. You can just pipe this into a variable, a file or whatever you need.
To go into a little more detail: we take the header names we want to check and then read in the first line of the CSV file, which should contain the column names. Using that, we determine the column indexes that we want to scrutinize. Then we read in the whole file and parse it line by line. For each line we split on the comma and check the elements matching the identified headers, testing whether each of those elements is blank or null. We trim quotes in case the value is the string "", which I will assume you would count as blank. Each element is evaluated as a Boolean according to whether or not it is empty. If at least one is empty, the line fails the Where-Object clause and gets omitted.
Using Import-CSV
$BlankColumns = "first_name","gender"
Import-CSV $path | Where-Object{
$line = $_
($BlankColumns | ForEach-Object{
![string]::IsNullOrEmpty(($line.$_.Trim('"')))
}) -notcontains $false
}
Very similar approach just a lot less overhead since we are dealing with objects now instead of strings.
Now you could use Export-CSV or ConvertTo-CSV depending on your needs in the rest of the project.
Changing the filter criteria.
Both examples above filter out rows where any of the checked columns contain blanks. If you want to omit only rows where all of them are blank, change the line }) -notcontains $false to }) -contains $true
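The two criteria can be compared on a cut-down version of the test data. This is only a sketch: it assumes three rows held in a here-string, uses ConvertFrom-Csv in place of Import-CSV so no file is needed, and the names $keptIfNoneBlank and $keptUnlessAllBlank are made up for the illustration.

```powershell
# Three rows: complete, both columns blank, only first_name blank
$csvText = @'
id,first_name,gender
1,Christina,Female
4,"",
7,,Female
'@

$BlankColumns = 'first_name', 'gender'
$rows = $csvText | ConvertFrom-Csv

# Keep a row only if none of the checked columns is blank
$keptIfNoneBlank = @($rows | Where-Object {
    $line = $_
    ($BlankColumns | ForEach-Object { -not [string]::IsNullOrEmpty($line.$_) }) -notcontains $false
})

# Keep a row unless all of the checked columns are blank
$keptUnlessAllBlank = @($rows | Where-Object {
    $line = $_
    ($BlankColumns | ForEach-Object { -not [string]::IsNullOrEmpty($line.$_) }) -contains $true
})

Write-Host "None blank:    $($keptIfNoneBlank.id -join ',')"
Write-Host "Not all blank: $($keptUnlessAllBlank.id -join ',')"
```

The first filter keeps only id 1; the second keeps ids 1 and 7, because row 7 still has a gender.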

Powershell counting same values from csv

Using PowerShell, I can import the CSV file and count how many objects are equal to "a". For example,
@(Import-Csv location | Where-Object{$_.id -eq "a"}).Count
Is there a way to go through every column and row looking for the same String "a" and adding onto count? Or do I have to do the same command over and over for every column, just with a different keyword?
So I made a dummy file that contains 5 columns of people names. Now to show you how the process will work I will show you how often the text "Ann" appears in any field.
$file = "C:\temp\MOCK_DATA (3).csv"
gc $file | %{$_ -split ","} | Group-Object | Where-Object{$_.Name -like "Ann*"}
Don't focus on the code but the output below.
Count Name Group
----- ---- -----
5 Ann {Ann, Ann, Ann, Ann...}
9 Anne {Anne, Anne, Anne, Anne...}
12 Annie {Annie, Annie, Annie, Annie...}
19 Anna {Anna, Anna, Anna, Anna...}
"Ann" appears 5 times on its own. However, it is a part of other names as well. Let's use a simple regex to find all the values that are only "Ann".
(select-string -Path 'C:\temp\MOCK_DATA (3).csv' -Pattern "\bAnn\b" -AllMatches | Select-Object -ExpandProperty Matches).Count
That will return 5 since \b is for a word boundary. In essence it is only looking at what is between commas or beginning or end of each line. This omits results like "Anna" and "Annie" that you might have. Select-Object -ExpandProperty Matches is important to have if you have more than one match on a single line.
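A quick way to check what \b does and does not exclude is to run the pattern against a single line. The sketch below uses made-up sample lines and [regex]::Matches, which is case-sensitive, whereas Select-String ignores case by default. One thing worth knowing: \b also matches next to spaces and hyphens, so a field like "Ann-Marie" would still be counted. If that matters, splitting on the comma and comparing whole fields is stricter.

```powershell
# Hypothetical sample lines, not taken from the mock data file
$simple = 'Ann,Anna,Annie,Ann,Anne'
$tricky = 'Ann-Marie,Ann Smith,Anna'

# Word-boundary matches: standalone "Ann" only, "Anna"/"Annie"/"Anne" are skipped
$simpleCount = [regex]::Matches($simple, '\bAnn\b').Count

# The hyphen and the space are word boundaries too, so both fields match here
$trickyCount = [regex]::Matches($tricky, '\bAnn\b').Count

# Stricter alternative: compare each comma-separated field as a whole
$wholeFieldCount = @($tricky -split ',' | Where-Object { $_ -eq 'Ann' }).Count

Write-Host "Simple line:  $simpleCount"
Write-Host "Tricky line:  $trickyCount"
Write-Host "Whole fields: $wholeFieldCount"
```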
Small Caveat
It should not matter, but in trying to keep the code simple I ignore the possibility that your header matches the value you are looking for. That is unlikely, which is why I don't account for it. If it is a possibility, we could use Get-Content instead with Select -Skip 1.
Try cycling through properties like this:
(Import-Csv location | %{$record = $_; $record | Get-Member -MemberType Properties |
?{$record.$($_.Name) -eq 'a';}}).Count
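An alternative sketch of the same idea, assuming PSv3+ and a small made-up CSV held in a here-string (ConvertFrom-Csv stands in for Import-Csv so it runs without a file). Instead of Get-Member, it reads every column value of each row through psobject.Properties and counts the cells equal to 'a'.

```powershell
# Hypothetical data: 'a' appears three times across different columns
$csvText = @'
id,name,code
a,b,a
c,a,d
'@

# Emit every cell of every row, then keep only the cells equal to 'a'
$count = @($csvText | ConvertFrom-Csv |
    ForEach-Object { $_.psobject.Properties.Value } |
    Where-Object { $_ -eq 'a' }).Count

Write-Host "Cells equal to 'a': $count"
```

Because -eq is case-insensitive for strings, 'A' would be counted as well; use -ceq if case matters.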

Returning multiple values from similar hash keys powershell

I have the following code that pulls in some server information from a text file and spits it into a hashtable.
Get-Content $serverfile | Foreach-Object {
if($_ -match '^([^\W+]+)\s+([^\.+]+)')
{
$sh[$matches[1]] = $matches[2]
}
}
$sh.GetEnumerator()| sort -Property Name
This produces the following:
Name Value
---- -----
Disk0 40
Disk1 40
Disk2 38
Disk3 43
Memory 4096
Name Value
Number_of_disks 1
Number_of_network_cards 2
Number_of_processors 1
ServerName WIN02
Depending on the server there may be one Disk0 or many more.
My challenge here is to pull each Disk* value from each of the varying number of Disk keys and return the values in a comma separated list, for example;
$disks = 40,40,38,43
I have tried varying approaches to this problem however none have met the criteria of being dynamic and including the ',' after each disk.
Any help would be appreciated.
I assume that when you say "Depending on the server there may be one Disk0 or many more", you mean "one Disk or many more", each with a different number? You can't have more than one Disk0, because key names can't be duplicated in a hash.
This will give you a list of all the hash values for keys starting with "Disk":
$sh.Keys | ?{$_ -match '^Disk'} | %{$sh.$_}
If you actually want to get a comma-separated list (a single string value), you can use the -join operator:
$disks = ($sh.Keys | ?{$_ -match '^Disk'} | %{$sh.$_}) -join ','
However, if the reason you want a comma-separated list is in order to get an array of the values, you don't really need the comma-separated list; just assign the results (which are already an array) to the variable:
$disks = $sh.Keys | ?{$_ -match '^Disk'} | %{$sh.$_}
Note, BTW, that hashes are not ordered. There's no guarantee that the order of the keys listed will be the same as the order in which you added them or in ascending alphanumeric order. So, in the above example, your result could be 38,40,43,40. If order does matter (i.e. you're counting on the values in $disks to be in the order of their respective Disk numbers), you have two options.
Filter the listing of the keys through Sort-Object:
$sh.Keys | ?{$_ -match '^Disk'} | sort | %{$sh.$_}
(You can put the | sort between $sh.Keys and | ?{..., but it's more efficient this way...which makes little difference here but would matter with larger data sets.)
Use an ordered dictionary, which functions pretty much the same as a hash, but maintains the keys in the order added:
$sh = New-Object System.Collections.Specialized.OrderedDictionary
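Both options can be sketched with values filled in by hand instead of parsed from $serverfile. The Disk10 key and the names $plain, $sortedDisks and $disks are assumptions for the illustration. One addition to the first option: a plain sort compares the key names as text, which would put Disk10 before Disk2, so the sketch sorts on the numeric suffix instead.

```powershell
# Option 1: plain hashtable, keys sorted by the number after "Disk"
$plain = @{ Disk0 = 40; Disk1 = 40; Disk2 = 38; Disk10 = 43; Memory = 4096 }
$sortedDisks = ($plain.Keys |
    Where-Object { $_ -match '^Disk\d+$' } |
    Sort-Object { [int]($_ -replace '^Disk') } |
    ForEach-Object { $plain[$_] }) -join ','

# Option 2: ordered dictionary, keys come back in the order they were added
$sh = New-Object System.Collections.Specialized.OrderedDictionary
$sh['ServerName'] = 'WIN02'
$sh['Disk0'] = 40
$sh['Disk1'] = 40
$sh['Disk2'] = 38
$sh['Disk3'] = 43
$disks = ($sh.Keys |
    Where-Object { $_ -match '^Disk' } |
    ForEach-Object { $sh[$_] }) -join ','

Write-Host "Sorted hashtable:   $sortedDisks"
Write-Host "Ordered dictionary: $disks"
```

Both lines print 40,40,38,43 for this data.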