Count unique numbers in CSV (PowerShell or Notepad++) - powershell

How to find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE
1,2,3,4,2 | Sort-Object | Get-Unique
I can get the unique numbers but I'm not able to get this to work with CSV files. If for example I use
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.

You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique (you need Get-Unique only if your input already is sorted).
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as a line of strings.
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order), it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1]
, so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] #{ foo ='bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.

Related

Sort CSV powershell script delete duplicate, keep the one with a special value in 3rd column

How do I delete double entriys in a csv by one column and leave the one with one special value in one of the columns?
Example: I got a csv with
Name;Employeenumber;Accessrights
Max;123456;ReadOnly
Berta;133556;Write
Jhonny;161771;ReadOnly
Max;123456;Write
I want to end up with:
Name;Employeenumber;Accessrights
Max;123456;Write
Berta;133556;Write
Jhonny;161771;ReadOnly
I tried by Get-Content Select-Object -unique, but that does not solve the problem that it should only keep the ones with the value "write" at the property Accessrights.
So I have no clue at all
You can use a combination of sorting and grouping ....
#'
Name;Employeenumber;Accessrights
Max;123456;ReadOnly
Berta;133556;Write
Jhonny;161771;ReadOnly
Max;123456;Write
'# |
ConvertFrom-Csv -Delimiter ';' |
Sort-Object -Property Name, Accessrights -Descending |
Group-Object -Property Name |
ForEach-Object {
$_.Group[0]
}

Using Powershell, how can I export and delete csv rows, where a particular value is *not found* in a *different* csv?

I have two files. One is called allper.csv
institutiongroup,studentid,iscomplete
institutionId=22343,123,FALSE
institutionId=22343,456,FALSE
institutionId=22343,789,FALSE
The other one is called actswithpersons.csv
abc,123;456
def,456
ghi,123
jkl,123;456
Note: The actswithpersons.csv does not have headers - they are going to be added in later via an excel power query so don't want them in there now. The actswithpersons csv columns are delimited with commas - there are only two columns, and the second one contains multiple personids - again Excel will deal with this later.
I want to remove all rows from allper.csv where the personid doesn't appear in actswithpersons.csv, and export them to another csv. So in the desired outcome, allper.csv would look like this
institutiongroup,studentid,iscomplete
institutionId=22343,123,FALSE
institutionId=22343,456,FALSE
and the export.csv would look like this
institutiongroup,studentid,iscomplete
institutionId=22343,789,FALSE
I've got as far as the below, which will put into the shell whether the personid is found in the actswithpersons.csv file.
$donestuff = (Get-Content .\ActsWithpersons.csv | ConvertFrom-Csv); $ids=(Import-Csv .\allper.csv);foreach($id in $ids.personid) {echo $id;if($donestuff -like "*$id*" )
{
echo 'Contains String'
}
else
{
echo 'Does not contain String'
}}
However, I'm not sure how to go the last step, and export & remove the unwanted rows from allper.csv
I've tried (among many things)
$donestuff = (Get-Content .\ActsWithpersons.csv | ConvertFrom-Csv);
Import-Csv .\allper.csv |
Where-Object {$donestuff -notlike $_.personid} |
Export-Csv -Path export.csv -NoTypeInformation
This took a really long time and left me with an empty csv. So, if you can give any guidance, please help.
Since your actswithpersons.csv doesn't have headers, in order for you to import as csv, you can specify the -Header parameter in either Import-Csv or ConvertFrom-Csv; with the former cmdlet being the better solution.
With that said, you can use any header name for those 2 columns then filter by the given column name (ID in this case) after your import of allper.csv using Where-Object:
$awp = (Import-Csv -Path '.\actswithpersons.csv' -Header 'blah','ID').ID.Split(';')
Import-Csv -Path '.\allper.csv' | Where-Object -Property 'Studentid' -notin $awp
This should give you:
institutiongroup studentid iscomplete
---------------- --------- ----------
institutionId=22343 789 FALSE
If you're looking to do it with Get-Content you can split by the delimiters of , and ;. This should give you just a single row of values which you can then compare the entirety of variable ($awp) using the same filter as above which will give you the same results:
$awp = (Get-Content -Path '.\actswithpersons.csv') -split ",|;"
Import-Csv -Path '.\allper.csv' | Where-Object -Property 'Studentid' -notin $awp

Powershell: Import-csv, rename all headers

In our company there are many users and many applications with restricted access and database with evidence of those accessess. I don´t have access to that database, but what I do have is automatically generated (once a day) csv file with all accessess of all my users. I want them to have a chance to check their access situation so i am writing a simple powershell script for this purpose.
CSV:
user;database1_dat;database2_dat;database3_dat
john;0;0;1
peter;1;0;1
I can do:
import-csv foo.csv | where {$_.user -eq $user}
But this will show me original ugly headres (with "_dat" suffix). Can I delete last four characters from every header which ends with "_dat", when i can´t predict how many headers will be there tomorrow?
I am aware of calculated property like:
Select-Object #{ expression={$_.database1_dat}; label='database1' }
but i have to know all column names for that, as far as I know.
Am I convicted to "overingeneer" it by separate function and build whole "calculated property expression" from scratch dynamically or is there a simple way i am missing?
Thanks :-)
Assuming that file foo.csv fits into memory as a whole, the following solution performs well:
If you need a memory-throttled - but invariably much slower - solution, see Santiago Squarzon's helpful answer or the alternative approach in the bottom section.
$headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '\r?\n', 2
# You can pipe the result to `where {$_.user -eq $user}`
ConvertFrom-Csv ($headerRow -replace '_dat(?=;|$)'), $dataRows -Delimiter ';'
Get-Content -Raw reads the entire file into memory, which is much faster than reading it line by line (the default).
-split '\r?\n', 2 splits the resulting multi-line string into two: the header line and all remaining lines.
Regex \r?\n matches a newline (both a CRLF (\r\n) and a LF-only newline (\n))
, 2 limits the number of tokens to return to 2, meaning that splitting stops once the 1st token (the header row) has been found, and the remainder of the input string (comprising all data rows) is returned as-is as the last token.
Note the $null as the first target variable in the multi-assignment, which is used to discard the empty token that results from the separator regex matching at the very start of the string.
$headerRow -replace '_dat(?=;|$)'
-replace '_dat(?=;|$)' uses a regex to remove any _dat column-name suffixes (followed by a ; or the end of the string); if substring _dat only ever occurs as a name suffix (not also inside names), you can simplify to -replace '_dat'
ConvertFrom-Csv directly accepts arrays of strings, so the cleaned-up header row and the string with all data rows can be passed as-is.
Alternative solution: algorithmic renaming of an object's properties:
Note: This solution is slow, but may be an option if you only extract a few objects from the CSV file.
As you note in the question, use of Select-Object with calculated properties is not an option in your case, because you neither know the column names nor their number in advance.
However, you can use a ForEach-Object command in which you use .psobject.Properties, an intrinsic member, for reflection on the input objects:
Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | ForEach-Object {
# Initialize an aux. ordered hashtable to store the renamed
# property name-value pairs.
$renamedProperties = [ordered] #{}
# Process all properties of the input object and
# add them with cleaned-up names to the hashtable.
foreach ($prop in $_.psobject.Properties) {
$renamedProperties[($prop.Name -replace '_dat(?=.|$)')] = $prop.Value
}
# Convert the aux. hashtable to a custom object and output it.
[pscustomobject] $renamedProperties
}
You can do something like this:
$textInfo = (Get-Culture).TextInfo
$headers = (Get-Content .\test.csv | Select-Object -First 1).Split(';') |
ForEach-Object {
$textInfo.ToTitleCase($_) -replace '_dat'
}
$user = 'peter'
Get-Content .\test.csv | Select-Object -Skip 1 |
ConvertFrom-Csv -Delimiter ';' -Header $headers |
Where-Object User -EQ $user
User Database1 Database2 Database3
---- --------- --------- ---------
peter 1 0 1
Not super efficient but does the trick.

Powershell CSV removing rows and then remove from whole file if A column matches

I've created the following small script to remove 2++ strings from a CSV.
Each row is a log of a given person and a answer they give.
The CSV has X columns.
The column named FIRST identifies the person.
What I need to do is when I delete a row matching the answer, I also need to delete the person from the whole CSV if it had one of the two strings.
What I've made so far, removes the row of people having the answers but the person is still left in the overall CSV with other answers. I want to remove the person fully if the questions have been answered.
Can somebody help me out with making the addition or changes to make this happen?
INPUT File
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
SCRIPT
$CSVfile = "C:\Temp\Test\Test.csv"
$CSVfile_filtered = "C:\Temp\Test\Test.csv"
$regex001 = "AA"
$regex002 = "BB"
$filterArray = #($regex001,$regex002)
Get-Content $CSVfile | Select-String -pattern $filterArray -notmatch | Set-Content $CSVfile_filtered
The file should then remove 10005, 10011 and both lines of 10007. But my version only removes one of the 10007 since it only matches one of the two patterns.
Using more of PowerShell's built-in cmdlets can make this a little easier to manage.
# Assuming searching only properties ADDR and ADDR2
$filter = 'AA','BB'
# Grouping by First and Last values to easily remove duplicates
# -match uses regex so | is needed for an OR of multiple items
Import-Csv Test.csv | Group-Object First,Last |
Where {!($_.Group.ADDR,$_.Group.ADDR2 -match ($filter -join '|'))} |
Foreach-Object Group |
Export-Csv output.csv -NoType
You would think strictly using text manipulation would be simpler, but it adds other scenarios to consider:
You will need to track users that have duplicate entries and potentially back track to remove them (if not grouping). This could require reading the file contents twice.
Your header row could match the string you want to filter so you will need to add it to the output if filtering removes it.
Keeping the scenarios above in mind, you can still use a grouping concept:
$filter = 'AA','BB'
$file = Get-Content Test.csv
# $file[0] is the header row
# -split string uses regex and splits at the second comma
# -split results' [0] element is First,Last values
$file[0],($file |
Select-Object -Skip 1 |
Group-Object {($_ -split '(?<=^[^,]*,[^,]*),')[0]} |
where {!($_.Group -match ($filter -join '|'))} |
Foreach-Object Group) | Set-Content output.csv
If I got it right you could do something like this:
$SearchPattern = 'AA', 'BB'
$INPUTCSV = #'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'# | ConvertFrom-Csv
$ActualSearchPattern =
$INPUTCSV |
Where-Object {
$_.LAST -in $SearchPattern -or
$_.ADDR -in $SearchPattern -or
$_.ADDR2 -in $SearchPattern -or
$_.GENDER -in $SearchPattern -or
$_.HOME -in $SearchPattern -or
$_.Work -in $SearchPattern
} |
Select-Object -ExpandProperty FIRST
$INPUTCSV |
Where-Object -Property FIRST -NotIn -Value $ActualSearchPattern |
Format-Table -AutoSize
There might be more sophisticated or more elegant ways but I cannot think about one at the moment. ;-)
There is a nice PowerShell module you can use to manipulate the content of a csv or xlsx file: ImportExcel
This give you a lot of options to manipulate the sheets, columns etc.

How to read a CSV file but exclude certain columns containing blanks using Get-Content

I want to read a CSV file and exclude rows where dynamically selected columns contain blanks but not all rows of those dynamically selected columns contain blanks.
Trying to use the where clause in the statement below (but not working):
Get-Content $Source -ReadCount 1000 |
Where {
ForEach($NotEqualBlankCol in $BlankColumns)
{
$NotEqualBlankCol -ne $null -and $NotEqualBlankCol -ne ''}
} |
ConvertFrom-Csv |
Sort-Object -Property $SortByColNames.Replace('"', '') -Unique |
.
.
.
| Out-File $Destination
$BlankColumns is my dynamic object string array which I would like to loop through containing the column names of the CSV that are blank. it can be 1 column or more. When more then all of the selected columns need to be blank to qualify as a row that does not need to be included in the final CSV file output.
How do I do it using Get-Content? Any help would be appreciated.
Using Get-Content
Ok. So what this will do it read in the contents of a file X lines at a time. It will parse each line into its indiviual columns. Then it will check the specified columns for blanks. If any of the flagged columns contains a black then it will be filtered out. Consider the test data I used for this
id,first_name,last_name,email,gender,ip_address
1,Christina,Tucker,ctucker0#bbc.co.uk,Female,91.33.192.187
2,Jacqueline,Torres,jtorres1#shop-pro.jp,Female,205.70.183.107
3,Kathy,Perez,kperez2#hugedomains.com,Female,35.175.154.127
4,"",Holmes,eholmes3#canalblog.com,,
5,Ernest,Walker,ewalker4#marketwatch.com,Male,140.110.129.21
6,,Garza,cgarza5#jugem.jp,,
7,,Cunningham,jcunningham6#ox.ac.uk,Female,
8,,Clark,lclark7#posterous.com,,
9,,Ortiz,lortiz8#shareasale.com,,
Notice that the first_name and gender are blank for some of these folks. id 1,2,3,5,10 have complete data. The rest should be filtered.
$BlankColumns = "first_name","gender"
$headers = (Get-Content $path -TotalCount 1).Split(",")
$potentialBlankHeaderIndecies = 0..($headers.Count - 1) | Where-Object{$BlankColumns -contains $headers[$_]}
$potentialBlankHeaderIndecies
Get-Content $path -ReadCount 3 | Foreach-Object{
# Check to see if any of the indexes from a split are empty
$_ | Where-Object{
[bool[]](($_.Split(","))[$potentialBlankHeaderIndecies] | ForEach-Object{
![string]::IsNullOrEmpty($_.Trim('"'))
}) -notcontains $false
}
}
The output of this code is the file, as string, with the removed entries. You can just pipe this into a variable, file or what even you need.
To go into a little more detail we take the header names we want to check and this read in the first line of the csv file. That should contain the column names. Using that we determine the column indexes that we want to scrutinize. The we read in the whole file and parse it line by line. For each line we split on the comma and check the elements matching the identified headers. Check each of those elements if they are blank or null. We trim quotes in case it is a string "" which I will assume you would count as blank. Of all the elements we evaluate as a Boolean whether or not it is empty. If at least one is then it fails the where-object clause and gets ommited.
Using Import-CSV
$BlankColumns = "first_name","gender"
Import-CSV $path | Where-Object{
$line = $_
($BlankColumns | ForEach-Object{
![string]::IsNullOrEmpty(($line.$_.Trim('"')))
}) -notcontains $false
}
Very similar approach just a lot less overhead since we are dealing with objects now instead of strings.
Now you could use Export-CSV or ConvertFrom-CSV depending on your needs in the rest of the project.
Changing the filter criteria.
Both examples above filter columns where any of the columns contain blanks. If you want to omit only where all are blank change the line }) -notcontains $false to }) -contains $true