Sort CSV with PowerShell: delete duplicates, keeping the one with a special value in the 3rd column

How do I delete duplicate entries in a CSV based on one column, keeping the row that has a specific value in another column?
Example: I have a CSV with:
Name;Employeenumber;Accessrights
Max;123456;ReadOnly
Berta;133556;Write
Jhonny;161771;ReadOnly
Max;123456;Write
I want to end up with:
Name;Employeenumber;Accessrights
Max;123456;Write
Berta;133556;Write
Jhonny;161771;ReadOnly
I tried Get-Content with Select-Object -Unique, but that doesn't solve the problem: it should keep only the rows with the value "Write" in the Accessrights property.
So I have no clue how to proceed.

You can use a combination of sorting and grouping:
@'
Name;Employeenumber;Accessrights
Max;123456;ReadOnly
Berta;133556;Write
Jhonny;161771;ReadOnly
Max;123456;Write
'@ |
ConvertFrom-Csv -Delimiter ';' |
Sort-Object -Property Name, Accessrights -Descending |
Group-Object -Property Name |
ForEach-Object {
$_.Group[0]
}
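The answer above works because "Write" happens to sort after "ReadOnly" alphabetically. A minimal sketch of a more explicit variant, assuming the rule is simply "prefer the row whose Accessrights is Write", sorts each group by a Boolean test instead of by the text itself:

```powershell
# Sketch: keep one row per Name, preferring the row whose Accessrights is 'Write'.
# Unlike sorting on the Accessrights text, this doesn't rely on alphabetical order.
$rows = @'
Name;Employeenumber;Accessrights
Max;123456;ReadOnly
Berta;133556;Write
Jhonny;161771;ReadOnly
Max;123456;Write
'@ | ConvertFrom-Csv -Delimiter ';'

$result = $rows |
    Group-Object -Property Name |
    ForEach-Object {
        # -eq is case-insensitive; $true sorts after $false,
        # so -Descending puts the 'Write' row of each group first
        $_.Group |
            Sort-Object -Property { $_.Accessrights -eq 'Write' } -Descending |
            Select-Object -First 1
    }

$result | Format-Table
```

To work with the real file, you could replace the here-string with Import-Csv -Path <your file> -Delimiter ';' and pipe $result to Export-Csv -Delimiter ';' -NoTypeInformation.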


How to merge header and sub-header in a CSV in PowerShell?

Assume I have a sample.csv file with 2 rows and multiple columns:
Account, Id, Date
Jan, Dec, Feb
Now I want to convert this file into a single row, with the columns in the following order, and save it as output.csv using a PowerShell script:
Account,Id,Date,Jan,Feb,Dec
I tried many ways, for example:
Import-Csv 'sample.csv' | Group-Object ID | ForEach-Object { [PsCustomObject]@{ Header = $_.Group.Dept -join ',' } } | Export-Csv 'output.csv' -NoTypeInformation
Without going into detail on the issues you may encounter, here is what you can try:
@"
Account, Id, Date
Jan, Dec, Feb
one, two, three
"@ | ConvertFrom-Csv -PipelineVariable "Header" | Select-Object -First 1 |
ForEach-Object -Process {
($Header[0].PSObject.Properties.Name, $_.PSObject.Properties.Value | Write-Output) -join ","
}
The hidden PSObject member, which Get-Member doesn't list by default, exposes the object's properties as metadata. What we are after are the column names (.Name) and the first row's values (.Value). Select-Object -First 1 ensures that only the first row is used, not any other rows you may have. We then join the names and values, and all that's left to do is export the result to a new CSV.
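A minimal sketch of that last export step, assuming the goal is one line of "header names, then first-row values" written to a file. The temp path stands in for output.csv, and Set-Content is used because the result is a plain text line rather than objects for Export-Csv:

```powershell
# Sketch: write the header names followed by the first row's values as one line.
# Assumes a header line plus at least one data row; the output path is a placeholder.
$csvText = @'
Account,Id,Date
Jan,Dec,Feb
'@

$firstRow = $csvText | ConvertFrom-Csv | Select-Object -First 1
$names  = $firstRow.PSObject.Properties.Name    # column names: Account, Id, Date
$values = $firstRow.PSObject.Properties.Value   # first-row values: Jan, Dec, Feb
$line   = (@($names) + @($values)) -join ','

$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'output.csv'
Set-Content -Path $outFile -Value $line
$line
```

This keeps the values in the order they appear in the file; reordering them (for example to Jan,Feb,Dec as in the question) would need an explicit list of the desired order.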

PowerShell help: How can I remove duplicates using multiple columns simultaneously, not sequentially?

I have tried several variations based on other Stack Overflow articles. Below are a sample of my data, the desired output, and some cobbled-together code; I'm hoping for some direction from the community:
C:\Scripts\contacts.csv:
id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
1,jane,smith,jsmith@notreal.com
2,jane,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com
I need to turn this into a file where the "email" column is unique per "id". In other words, there can be duplicate addresses, but only with different ids.
Desired output, C:\Scripts\contacts-trimmed.csv:
id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com
I have tried this with a few different variations:
Import-Csv C:\Scripts\contacts.csv | sort first_name | Sort-Object -Property id,email -Unique | Export-Csv C:\Scripts\contacts-trim.csv -NoTypeInformation
Any help or direction would be much appreciated.
You'll want to use the Group-Object cmdlet to, well, group together records with the same values:
$records = @'
id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
1,jane,smith,jsmith@notreal.com
2,jane,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com
'@ |ConvertFrom-Csv
# group records based on id and email column
$records |Group-Object id,email |ForEach-Object {
# grab only the first record from each group
$_.Group |Select-Object -First 1
} |Export-Csv .\no_duplicates.csv -NoTypeInformation
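Group-Object collects every record before emitting groups. For large files, a streaming alternative (a different technique from the answer above, sketched here under the assumption of PowerShell 5 or later for ::new()) keeps a set of (id, email) keys already seen and passes through only the first record for each key:

```powershell
# Sketch: keep the first record for each (id, email) pair while streaming.
$records = @'
id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
1,jane,smith,jsmith@notreal.com
2,jane,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com
'@ | ConvertFrom-Csv

# Case-insensitive keys, matching Group-Object's default comparison
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

$unique = $records | Where-Object {
    # HashSet.Add() returns $true only the first time a key is added
    $seen.Add("$($_.id)|$($_.email)")
}

$unique | Format-Table
```

As with the Group-Object version, the first record of each pair wins, so for id 2 this keeps jane rather than the john shown in the desired output; sort the records beforehand if a particular row should win.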

Count unique numbers in CSV (PowerShell or Notepad++)

How do I find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE:
1,2,3,4,2 | Sort-Object | Get-Unique
I get the unique numbers, but I can't get this to work with CSV files. For example, if I use:
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.
You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique. Get-Unique removes only adjacent duplicates, so it is only useful if your input is already sorted.
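A small illustration of the difference: Get-Unique compares each item only with the one before it, so a duplicate that isn't adjacent survives unless the input is sorted first, while Sort-Object -Unique sorts and deduplicates in one step:

```powershell
# Get-Unique only drops an item that equals the immediately preceding one.
$unsorted = 1, 2, 1 | Get-Unique
$sorted   = 1, 2, 1 | Sort-Object | Get-Unique
$viaSort  = 1, 2, 1 | Sort-Object -Unique

"unsorted: $($unsorted -join ',')"   # unsorted: 1,2,1
"sorted:   $($sorted -join ',')"     # sorted:   1,2
"viaSort:  $($viaSort -join ',')"    # viaSort:  1,2
```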
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as an array of lines (strings).
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
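The question also asks about all files in a folder. A minimal sketch extending the Import-Csv approach to every *.csv file in a folder, assuming one count per file is wanted; $folder is a hypothetical placeholder, populated here with the sample data so the block runs on its own. Writing the count back into each file is left out:

```powershell
# Sketch: count unique non-empty values in every CSV file of a folder.
# $folder is hypothetical; a temp folder with one sample file is created for the demo.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ('csvdemo_' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $folder | Out-Null
@'
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
'@ | Set-Content -Path (Join-Path $folder 'test.csv')

$counts = Get-ChildItem -Path $folder -Filter *.csv | ForEach-Object {
    $file = $_   # named reference to the current file, for clarity
    $n = (Import-Csv -Path $file.FullName |
        ForEach-Object { $_.psobject.properties.value -ne '' } |
        Sort-Object -Unique |
        Measure-Object).Count   # Measure-Object reports 0 when no values are found
    [pscustomobject]@{ File = $file.Name; UniqueCount = $n }
}

$counts | Format-Table
Remove-Item -Path $folder -Recurse   # clean up the demo folder
```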
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order); it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1], so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] @{ foo = 'bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.

Use Import-Csv to read changeable column titles by location

I'm trying to see if there is a way to read the column values in a CSV file based on the column's position. The reason is that the titles in the file I'm handed keep changing...
For example, let's say column A of the CSV file (viewed in Excel) looks like this:
ColumnOne
ValueOne
ValueTwo
ValueThree
Now the user changes the title:
Column 1
ValueOne
ValueTwo
ValueThree
Now I want to create an array of the first column. Normally what I do is the following:
$arrayFirstColumn = Import-Csv 'C:\test\test1.csv' | where-object {$_.ColumnOne} | select-object -expand 'ColumnOne'
However, if ColumnOne is changed to Column 1, this code breaks. How can I create this array so that the column title can change, given that the column's position always stays the same?
You can specify headers of your own on import:
Import-Csv 'C:\path\to\your.csv' -Header 'MyHeaderA','MyHeaderB',...
As long as you don't export the data back to a CSV (or don't need the original headers in the output CSV), you can use whatever names you like. Note that with -Header, the file's original header line is imported as the first data row, so you'll usually want to skip it (for example with Select-Object -Skip 1). You can also specify as many header names as you like: if there are fewer names than columns in the CSV, the additional columns are omitted; if there are more, the columns for the additional headers will be empty.
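A minimal sketch of the -Header approach applied to the question's data. The temp file path and the second column ("Other") are hypothetical sample data, and the imported title row is skipped:

```powershell
# Sketch: read the first column by position, whatever its title is.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'test1.csv'
@'
Column 1,Other
ValueOne,a
ValueTwo,b
ValueThree,c
'@ | Set-Content -Path $path

# With -Header, the file's own title line comes back as a data row, so skip it.
$arrayFirstColumn = Import-Csv -Path $path -Header 'First', 'Second' |
    Select-Object -Skip 1 |
    Where-Object { $_.First } |
    Select-Object -ExpandProperty First

$arrayFirstColumn
Remove-Item -Path $path   # clean up the demo file
```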
If you need to preserve the original headers, you can store the header name(s) you need to work with in variable(s) like this:
$csv = Import-Csv 'C:\test\test1.csv'
$firstCol = $csv | Select-Object -First 1 | ForEach-Object {
$_.PSObject.Properties | Select-Object -First 1 -Expand Name
}
$arrayFirstColumn = $csv | Where-Object {$_.$firstCol} |
Select-Object -Expand $firstCol
Or you could simply read the first line from the CSV and split it to get an array with the headers:
$headers = (Get-Content 'C:\test\test1.csv' -TotalCount 1) -split ','
$firstCol = $headers[0]
One option:
$ImportFile = 'C:\test\test1.csv'
$FirstColumn = ((Get-Content $ImportFile -TotalCount 2 | ConvertFrom-Csv).psobject.properties.name)[0]
$FirstColumn
$arrayFirstColumn = Import-Csv $ImportFile | where-object {$_.$FirstColumn} | select-object -expand $FirstColumn
If you are using PowerShell v2.0, the expression for $FirstColumn in mjolinor's answer would be:
$FirstColumn = ((Get-Content $ImportFile -TotalCount 2 | ConvertFrom-Csv).psobject.properties | ForEach-Object {$_.name})[0]
(Apologies for starting a new answer; I do not yet have enough reputation to add a comment to mjolinor's post)

Sort-Object by greatest numerical value from Import-Csv

I want the greatest value (MailboxSize) at the top of the file. I have a CSV as input.
When I run the following sort command:
Import-Csv import.csv| Sort-Object MailboxSize,DisplayName -Descending | Export-Csv SORT.csv
I get the following result:
"DisplayName","MailboxSize"
"persone6","9941"
"persone3","8484"
"persone1","7008"
"persone4","4322"
"persone5","3106"
"persone7","27536"
"persone10","24253"
"persone8","1961"
"persone9","17076"
"persone11","17012"
"persone2","15351"
"persone12","11795"
"persone14","1156"
"persone13","1008"
But I want this result:
"persone7","27536"
"persone10","24253"
"persone9","17076"
"persone11","17012"
"persone2","15351"
"persone12","11795"
"persone6","9941"
"persone3","8484"
"persone1","7008"
"persone4","4322"
"persone5","3106"
"persone14","1156"
"persone13","1008"
When importing a CSV file, all property values are strings. You have to cast MailboxSize to an int before you can sort it numerically. Try:
Import-Csv import.csv |
Sort-Object {[int]$_.MailboxSize}, DisplayName -Descending |
Export-Csv SORT.csv
You should also use the -NoTypeInformation switch with Export-Csv to avoid the #TYPE ..... line (the first line of an exported CSV file in Windows PowerShell; PowerShell 6 and later omit it by default).
Sample:
$data = @"
"DisplayName","MailboxSize"
"persone6","9941"
"persone3","8484"
"persone1","7008"
"persone4","4322"
"persone5","3106"
"persone7","27536"
"persone10","24253"
"persone8","1961"
"persone9","17076"
"persone11","17012"
"persone2","15351"
"persone12","11795"
"persone14","1156"
"persone13","1008"
"@ | ConvertFrom-Csv
$data |
Sort-Object {[int]$_.MailboxSize}, DisplayName -Descending |
Export-Csv SORT.csv -NoTypeInformation
SORT.csv
"DisplayName","MailboxSize"
"persone7","27536"
"persone10","24253"
"persone9","17076"
"persone11","17012"
"persone2","15351"
"persone12","11795"
"persone6","9941"
"persone3","8484"
"persone1","7008"
"persone4","4322"
"persone5","3106"
"persone8","1961"
"persone14","1156"
"persone13","1008"
I'm guessing the usernames are fake, but be aware that the same issue applies to DisplayName if your usernames actually are personeXX, where XX is an integer. For example:
persone7 27536
persone20 27536
persone13 27536
To sort them properly, you'd have to use a script block with Sort-Object, or write your own function, that splits off the numeric part of the value and sorts by it.
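A minimal sketch of the script-block approach, assuming names end in digits (names without trailing digits are treated as 0). Hashtables passed to Sort-Object let each key have its own direction: size descending, name number ascending, so persone7 comes before persone13. The sample rows are hypothetical and keep MailboxSize as strings, as Import-Csv would:

```powershell
# Sketch: numeric MailboxSize (descending), then the trailing number in DisplayName (ascending).
$data = @(
    [pscustomobject]@{ DisplayName = 'persone7';  MailboxSize = '27536' }
    [pscustomobject]@{ DisplayName = 'persone20'; MailboxSize = '27536' }
    [pscustomobject]@{ DisplayName = 'persone13'; MailboxSize = '27536' }
)

# Extract the trailing digits of DisplayName as an int (0 if there are none)
$nameNumber = { if ($_.DisplayName -match '(\d+)$') { [int]$Matches[1] } else { 0 } }

$sorted = $data | Sort-Object -Property @(
    @{ Expression = { [int]$_.MailboxSize }; Descending = $true }
    @{ Expression = $nameNumber; Descending = $false }
)

$sorted | Format-Table
```

A plain string sort on DisplayName would instead give persone13, persone20, persone7.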