Sorting columns in a CSV file with PowerShell

I have a CSV file with 1600 lines; the first ten lines are given below.
N,EQ,ADANIPORTS,ADANI PORT & SEZ LTD,384.5,385,387.8,375,376.75,792818726.1,2085488,Y, ,40850,452.35,350.45
N,EQ,ASIANPAINT,ASIAN PAINTS LIMITED,1394.75,1395,1411,1385.05,1393.5,1284559258,919355,Y, ,36117,1490.6,1090.1
N,EQ,AXISBANK,AXIS BANK LIMITED,631.75,638.05,643.4,634,639.9,9599936309,15035968,Y, ,144038,644.65,447.5
N,EQ,BAJAJ-AUTO,BAJAJ AUTO LIMITED,2685.55,2683.9,2697,2664,2682.25,1476618943,551229,Y, ,23611,3468.35,2605
N,EQ,BAJAJFINSV,BAJAJ FINSERV LTD.,7092.1,7092,7129,7025.25,7050.65,909166393.3,128111,Y, ,19707,7200,4500
N,EQ,BAJFINANCE,BAJAJ FINANCE LIMITED,2893.85,2892,2943.4,2891.05,2916.6,3884349778,1327710,Y, ,52356,2943.4,1511.2
N,EQ,BHARTIARTL,BHARTI AIRTEL LIMITED,369.9,370,370.8,365,368.95,768282183.8,2089422,Y, ,26515,564.8,331
N,EQ,BPCL,BHARAT PETROLEUM CORP LT,357.75,358.25,362,353.5,356.95,1738725370,4865929,Y, ,77863,551.55,353.5
N,EQ,CIPLA,CIPLA LTD,657.95,658,658,645,651.2,1235846442,1904031,Y, ,38575,665,507.2
N,EQ,COALINDIA,COAL INDIA LTD,289.05,287.85,293.6,287.8,291,791484837,2713583,Y, ,55421,316.95,235.85
I want to sort the 10th column in descending order so that I can find the top 20 values.
The file name is Pd240818.csv.
My PowerShell code is as follows.
# To remove unwanted few lines
sls ",BE,",",EQ," .\Pd240818.csv | select -exp line | Where-Object {$_ -notmatch ',EQ, ,'} > .\temp.csv
#Sorting line is as follows
gc .\temp.csv | Where-Object {$_ -notmatch 'MKT,'}|%{$_.split(",")[9]}|Sort-Object -Descending| Select-Object -first 20 > temp.txt
After sorting, I get temp.txt as follows:
99988.7
99896.5
9989273.6
99769.75
996134.55
9933960.45
99228.65
99199.95
989418.15
988423057.7
9884111.1
98572145.2
982146.5
981497584.9
97982.75
9786178.9
9775915.05
9760482.5
97384498.85
971033.85
Whereas if I sort the same column in Excel, I get the following:
28818819313
9599936309
8459873415
6175554483
5889553012
5690666055
5439638100
5121938441
5079530750
5042021707
4972762046
4889394601
4742835986
3884349778
3690976213
3486309023
3388956937
3336437125
3206801588
3114870807
Where am I going wrong, and how can I correct it?

The clue is that numbers of different lengths are all sorted together.
This is a common problem: the numbers are being sorted as text instead of as numeric values. Here, $_.split(",")[9] returns strings, so Sort-Object compares them character by character. When we sort words, their length does not matter; we put all the a's together, then all the b's, and so on. Do the same with numbers, putting all the 9s together, then all the 8s, and you get this varying-length sort:
99896.5
9989273.6
99769.75
The solution is to convert the text to numbers while sorting, so that the values sort numerically:
.. | Sort-Object -Descending -Property { $_ -as [decimal] } | ..
Then the output is more like what you want:
988423057.7
981497584.9
98572145.2
97384498.85
9989273.6
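To make the difference concrete, here is a minimal sketch that sorts a handful of the values from the question both ways. The variable names ($values, $asText, $asNumber) are made up for this illustration and are not part of the original script.

```powershell
# A few sample values taken from the question's output, held as strings,
# which is what $_.split(",")[9] produces.
$values = '99988.7', '9989273.6', '988423057.7', '97982.75'

# Text sort: compares character by character, so length is ignored.
$asText = $values | Sort-Object -Descending

# Numeric sort: the script block converts each string to [decimal]
# for comparison only; the output items are still the original strings.
$asNumber = $values | Sort-Object -Descending -Property { $_ -as [decimal] }

$asNumber
# 988423057.7
# 9989273.6
# 99988.7
# 97982.75
```

In the original pipeline, the same -Property script block would simply replace the plain Sort-Object -Descending stage.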

Powershell - Import-CSV Group-Object SUM a number from grouped objects and then combine all grouped objects to single rows

I have a question similar to this one but with a twist:
Powershell Group Object in CSV and exporting it
My file has 42 existing headers. The delimiter is a standard comma, and there are no quotation marks in this file.
master_account_number,sub,txn,cur,last,first,address,address2,city,state,zip,ssn,credit,email,phone,cell,workphn,dob,chrgnum,cred,max,allow,neg,plan,downpayment,pmt2,min,clid,cliname,owner,merch,legal,is_active,apply,ag,offer,settle_perc,min_pay,plan2,lstpmt,orig,placedate
The file's data (the first 6 columns) looks like this:
master_account_number,sub,txn,cur,last,first
001,12,35,50.25,BIRD, BIG
001,34,47,100.10,BIRD, BIG
002,56,9,10.50,BUNNY, BUGS
002,78,3,20,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY
I am only working with the first column master_account_number and the 4th column cur.
I want to check for duplicates in the "master_account_number" column. If any are found, I want to add up the totals from the 4th column, "cur", for only those duplicates, and then combine the rows that were summed into one. The summed value from the duplicates should replace the cur value in the combined row.
With that said, the output should look like this.
master_account_number,sub,txn,cur,last,first
001,12,35,150.35,BIRD, BIG
002,56,9,30.50,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY
Now that we have that out of the way, here is how this question differs. I want to keep all 42 columns intact in the output file. In the other question I referenced above, the input had 5 columns and the output had 4 columns, and this is not what I'm trying to achieve. I have so many more headers that I'd hate to have to specify all 42 columns individually. That seems inefficient anyhow.
As for what I have so far for code... not much.
$revNB = "\\server\path\example.csv"
$global:revCSV = import-csv -Path $revNB | ? {$_.is_active -eq "Y"}
$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{Expression={ ($_.Group|Measure-Object cur -Sum).Sum }}
Ultimately I want the output to look identical to the input, only the output should merge duplicate account numbers rows, and add all the "cur" values, where the merged row contains the sum of the grouped cur values, in the cur field.
Last update: I tried Rich's solution and got an error. I modified what he had to this: $dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object Name, @{Name='curSum'; Expression={ ($_.Group | Measure-Object cur -Sum).Sum}}
This gets me exactly what my own code got me, so I am still looking for a solution. I need to output this CSV with all 42 headers, even for items with no duplicates.
Other things I've tried:
This doesn't give me the data I need in the columns; the columns are there, but they are blank.
$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{ expression={$_.Name}; label='master_account_number' },
    sub_account_number,
    charge_txn,
    @{Name='current_balance'; Expression={ ($_.Group | Measure-Object current_balance -Sum).Sum } },
    last
You're pretty close, but you used current_balance where you probably meant cur.
Here's a start:
$dupesGrouped = $revCSV | Group-Object master_account_number |
    Select-Object Name, @{N='curSum'; E={ ($_.Group | Measure-Object cur -Sum).Sum } },
        @{N='last'; E={ ($_.Group | Select-Object last -first 1).last} }
You can add the other fields by adding a Name/Expression hashtable for each field you want to summarize. I assumed you would want to select the first occurrence of a repeated last name for the same master_account_number. The output will be incorrect if the last name differs within the same master_account_number.
If you only need to change part of the data, you can instead copy the first row of each group and overwrite just the cur value, which keeps all the other columns without listing them.
$dupesGrouped = $revCSV | Group-Object master_account_number | ForEach-Object {
    # copy the first row of the group so the original data is not changed
    $new = $_.Group[0].psobject.Copy()
    # update the value of the cur property with the sum for the group
    $new.cur = ($_.Group | Measure-Object cur -Sum).Sum
    # output
    $new
}
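As a rough, self-contained check of the copy-and-overwrite approach, the sketch below runs it on a cut-down version of the sample data (six columns rather than 42, and with the space after the comma in the name fields removed). The here-string and the variable names $csv, $rows and $merged are assumptions made for this illustration only.

```powershell
# Cut-down sample data; the real file has 42 columns.
$csv = @'
master_account_number,sub,txn,cur,last,first
001,12,35,50.25,BIRD,BIG
001,34,47,100.10,BIRD,BIG
002,56,9,10.50,BUNNY,BUGS
003,54,7,250,DUCK,DAFFY
'@

$rows = $csv | ConvertFrom-Csv

$merged = $rows | Group-Object master_account_number | ForEach-Object {
    # Copy the first row of the group; every column comes along unchanged.
    $new = $_.Group[0].psobject.Copy()
    # Overwrite only cur with the sum over the whole group.
    $new.cur = ($_.Group | Measure-Object cur -Sum).Sum
    $new
}

# All original columns are still present, with one row per account number.
$merged | ConvertTo-Csv -NoTypeInformation
```

Because the rows are copied rather than rebuilt with Select-Object, no column names have to be listed. To write the result to a file, Export-Csv -NoTypeInformation could be used in place of ConvertTo-Csv.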

Word frequency elegantly in Powershell

Donald Knuth once got the task to write a literate program computing the word frequency of a file.
Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
Doug McIlroy famously rewrote the 10 pages of Pascal in a few lines of sh:
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
As a little exercise, I converted this to Powershell:
(-split ((Get-Content -Raw test.txt).ToLower() -replace '[^a-zA-Z]',' ')) |
Group-Object |
Sort-Object -Property count -Descending |
Select-Object -First $Args[0] |
Format-Table count, name
I like that Powershell combines sort | uniq -c into a single Group-Object.
The first line looks ugly, so I wonder if it can be written more elegantly? Maybe there is a way to load the file with a regex delimiter somehow?
One obvious way to shorten the code would be to use aliases, but that does not help readability.
I would do it this way.
PS C:\users\me> Get-Content words.txt
One one
two
two
three,three.
two;two
PS C:\users\me> (Get-Content words.txt) -Split '\W' | Group-Object
Count Name  Group
----- ----  -----
    2 One   {One, one}
    4 two   {two, two, two, two}
    2 three {three, three}
    1       {}
EDIT: Some code from Bruce Payette's Windows Powershell in Action
# top 10 most frequent words, hash table
$s = gc songlist.txt
$s = [string]::join(" ", $s)
$words = $s.Split(" `t", [stringsplitoptions]::RemoveEmptyEntries)
$uniq = $words | sort -Unique
$words | % {$h=@{}} {$h[$_] += 1}
$frequency = $h.keys | sort {$h[$_]}
-1..-10 | %{ $frequency[$_]+" "+$h[$frequency[$_]]}
# or
$grouped = $words | group | sort count
$grouped[-1..-10]
Thanks js2010 and LotPings for important hints. To document what is probably the best solution:
$Input -split '\W+' |
Group-Object -NoElement |
Sort-Object count -Descending |
Select-Object -First $Args[0]
Things I learned:
$Input contains stdin. This is closer to McIlroy's code than calling Get-Content on some file.
-split can actually take regex delimiters.
The -NoElement parameter lets me get rid of the Format-Table line.
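The points above can be tried out without a file or stdin by wrapping the pipeline in a small function. This is only a sketch: Get-WordFrequency and its -Text and -Top parameters are hypothetical names invented for the example, and the Where-Object stage is an addition that drops the empty string -split can leave behind at the end of the text (the empty group visible in the earlier answer's output).

```powershell
function Get-WordFrequency {
    param(
        [string]$Text,   # text to analyse (hypothetical parameter)
        [int]$Top = 10   # how many words to return (hypothetical parameter)
    )
    $Text -split '\W+' |                  # split on runs of non-word characters
        Where-Object { $_ } |             # drop empty strings left by -split
        Group-Object -NoElement |         # count occurrences (case-insensitive by default)
        Sort-Object Count -Descending |
        Select-Object -First $Top
}

$result = Get-WordFrequency -Text 'One one two two three,three. two;two' -Top 1
$result
# Returns a single group with Count 4 and Name two
```

Usage with a file would look like Get-WordFrequency -Text (Get-Content -Raw test.txt) -Top 10.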
Windows 10 64-bit. PowerShell 5
How to find which whole word (for example "the", but not "-the-" or "weather"), regardless of case, is used most frequently in a text file, and how many times it is used, using PowerShell:
Replace 1.txt with your file.
$z = gc 1.txt -raw
-split $z | group -n | sort c* | select -l 1
Results:
Count Name
----- ----
   30 THE

Count unique numbers in CSV (PowerShell or Notepad++)

How to find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE
1,2,3,4,2 | Sort-Object | Get-Unique
I can get the unique numbers but I'm not able to get this to work with CSV files. If for example I use
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.
You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique (you need Get-Unique only if your input already is sorted).
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as an array of lines (strings).
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order), it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1], so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] @{ foo ='bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.
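For anyone who wants to try the property-value approach without creating C:\test.csv, here is a small sketch that feeds the sample data from the question through ConvertFrom-Csv instead of Import-Csv. The variable names $csv, $unique and $count are assumptions for this illustration.

```powershell
# The sample data from the question, as an in-memory here-string.
$csv = @'
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
'@

$unique = $csv | ConvertFrom-Csv |
    # Emit every non-empty column value of each row.
    ForEach-Object { $_.psobject.Properties.Value -ne '' } |
    # Sort and de-duplicate in one step.
    Sort-Object -Unique

# @() guards against a single result being returned as a scalar.
$count = @($unique).Count
$count
# 6
```

The six unique values are 0, 3, 4, 5, 7 and 9, which matches the count expected in the question.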

Powershell, comparing 2 files to find the amount of unique entries in both files

I have 2 files. The contents of
File 1 is: 4,22,1,2,3,14,12,13.
File 2 is: 1,50,2,12,3,6,9.
I'm trying to write a script that outputs the total number of unique entries across both files, and the unique numbers in file 1 and in file 2 along with their counts. I am currently using:
$howmany = compare-object $(get-content C:\test\file1.txt) $(get-content C:\test\file2.txt)
Write-Host "Total unique entries in both files is:" $howmany.Count
This does the total unique entries in both files but I can't figure out how to find the total unique entries in file 1 and file 2.
I want the output to be something like:
Total unique entries in file 1 is: 4
Total unique entries in file 2 is: 3
Unique numbers in file 1 are: 4 22 14 13
Unique numbers in file 2 are: 50 6 9
You can group the Compare-Object results by their SideIndicator property. This uses the -AsHashTable and -AsString parameters of Group-Object to return the groups in a hash table, that is, as a collection of key-value pairs.
In the resulting hash table, each property value is a key, and the group elements are the values. Because each key is a property of the hash table object, you can use dot notation to display the values.
$unique = $howmany | Group-Object -Property sideindicator -AsHashTable -AsString
File 1 - unique items (since the output is an array, the -join operator is used to join the numbers into a single string):
($unique.'<=' | Select-Object -ExpandProperty inputobject) -join ','
File 2 - unique items:
($unique.'=>' | Select-Object -ExpandProperty inputobject) -join ','
File 1 - count of unique items:
($unique.'<=' | Select-Object -ExpandProperty inputobject).count
File 2 - count of unique items:
($unique.'=>' | Select-Object -ExpandProperty inputobject).count
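The grouping steps above can be sketched end to end with in-memory arrays standing in for the two files. The arrays $file1 and $file2 and the names $onlyIn1 and $onlyIn2 are assumptions for this example; it also assumes each file holds one number per line, which is what Get-Content would then return.

```powershell
# Stand-ins for Get-Content C:\test\file1.txt and file2.txt (one number per line).
$file1 = '4', '22', '1', '2', '3', '14', '12', '13'
$file2 = '1', '50', '2', '12', '3', '6', '9'

# Compare-Object emits only the differences, tagged with SideIndicator.
$howmany = Compare-Object $file1 $file2

# Group the differences by side: '<=' is only in file 1, '=>' only in file 2.
$unique = $howmany | Group-Object -Property SideIndicator -AsHashTable -AsString

# @() keeps the results as arrays even if only one item comes back.
$onlyIn1 = @($unique.'<=' | Select-Object -ExpandProperty InputObject)
$onlyIn2 = @($unique.'=>' | Select-Object -ExpandProperty InputObject)

Write-Host "Total unique entries in file 1 is: $($onlyIn1.Count)"
Write-Host "Total unique entries in file 2 is: $($onlyIn2.Count)"
Write-Host "Unique numbers in file 1 are: $onlyIn1"
Write-Host "Unique numbers in file 2 are: $onlyIn2"
```

With this data the counts are 4 and 3, matching the output requested in the question.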
(I know you have an accepted answer, I just wanted to write an alternative. I can't make it work the way I was trying to approach it, but this is close).
function uniques {param($a,$b) $a|? {$b -notcontains $_}}
$f1 = (gc C:\test\file1.txt) | select -unique
$f2 = (gc C:\test\file2.txt) | select -unique
Write-Host "Total unique entries in both files is: $(($f1+$f2 |select -Unique).Count)"
Write-Host "Total unique entries in file 1 is: $((uniques $f1 $f2).Count)"
Write-Host "Total unique entries in file 2 is: $((uniques $f2 $f1).Count)"
Write-Host "Unique numbers in file 1 are: $(uniques $f1 $f2)"
Write-Host "Unique numbers in file 2 are: $(uniques $f2 $f1)"
NB. Your initial code, and therefore @Kiran's answer, has a bug if one of the files contains a duplicate number. For example, if file1 contains 4,22,1,2,3,14,12,13,4, with a duplicate 4 in it, you'll get 5 unique numbers: 4,22,14,13,4. That's why this version uses |select -unique on both files when reading them.
NB. My version might fail if a file has only one number, or if there is only one unique number. Wrap things in @() to make sure they stay arrays, if that matters.

Powershell counting same values from csv

Using PowerShell, I can import the CSV file and count how many objects are equal to "a". For example,
@(Import-csv location | where-Object{$_.id -eq "a"}).Count
Is there a way to go through every column and row looking for the same String "a" and adding onto count? Or do I have to do the same command over and over for every column, just with a different keyword?
I made a dummy file that contains 5 columns of people's names. To show how the process works, I will count how often the text "Ann" appears in any field.
$file = "C:\temp\MOCK_DATA (3).csv"
gc $file | %{$_ -split ","} | Group-Object | Where-Object{$_.Name -like "Ann*"}
Don't focus on the code but the output below.
Count Name  Group
----- ----  -----
    5 Ann   {Ann, Ann, Ann, Ann...}
    9 Anne  {Anne, Anne, Anne, Anne...}
   12 Annie {Annie, Annie, Annie, Annie...}
   19 Anna  {Anna, Anna, Anna, Anna...}
"Ann" appears 5 times on its own. However, it is part of other names as well. Let's use a simple regex to find all the values that are only "Ann".
(select-string -Path 'C:\temp\MOCK_DATA (3).csv' -Pattern "\bAnn\b" -AllMatches | Select-Object -ExpandProperty Matches).Count
That will return 5, since \b matches a word boundary. In a file like this one, where each field is a single name, that effectively restricts the match to a whole value between commas or at the beginning or end of a line. This omits results like "Anna" and "Annie" that you might have. Select-Object -ExpandProperty Matches is important if there can be more than one match on a single line.
Small caveat
It should not matter, but to keep the code simple I have not accounted for the possibility that your header matches the value you are looking for. If that is a possibility, you could use Get-Content instead, with Select -Skip 1.
Try cycling through properties like this:
(Import-Csv location | %{$record = $_; $record | Get-Member -MemberType Properties |
?{$record.$($_.Name) -eq 'a';}}).Count
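The same idea of cycling through every property can be tried on in-memory data. The sketch below is a variation rather than the exact code above: it reads the values through psobject.Properties instead of Get-Member, and the sample CSV and the names $csv and $count are made up for this illustration.

```powershell
# Made-up sample data with the value 'a' scattered across columns.
$csv = @'
id,name,code
a,b,a
c,a,d
a,e,f
'@

$count = @(
    $csv | ConvertFrom-Csv |
        # Emit every column value of every row.
        ForEach-Object { $_.psobject.Properties.Value } |
        # Keep only cells that equal 'a' (-eq is case-insensitive by default).
        Where-Object { $_ -eq 'a' }
).Count

$count
# 4
```

Since -eq compares whole values, cells such as "ab" are not counted, so no word-boundary regex is needed here.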