Using PowerShell, how can I export and delete CSV rows where a particular value is *not found* in a *different* CSV?

I have two files. One is called allper.csv
institutiongroup,studentid,iscomplete
institutionId=22343,123,FALSE
institutionId=22343,456,FALSE
institutionId=22343,789,FALSE
The other one is called actswithpersons.csv
abc,123;456
def,456
ghi,123
jkl,123;456
Note: The actswithpersons.csv does not have headers - they are going to be added in later via an excel power query so don't want them in there now. The actswithpersons csv columns are delimited with commas - there are only two columns, and the second one contains multiple personids - again Excel will deal with this later.
I want to remove all rows from allper.csv where the personid doesn't appear in actswithpersons.csv, and export them to another csv. So in the desired outcome, allper.csv would look like this
institutiongroup,studentid,iscomplete
institutionId=22343,123,FALSE
institutionId=22343,456,FALSE
and the export.csv would look like this
institutiongroup,studentid,iscomplete
institutionId=22343,789,FALSE
I've got as far as the below, which will put into the shell whether the personid is found in the actswithpersons.csv file.
$donestuff = (Get-Content .\ActsWithpersons.csv | ConvertFrom-Csv); $ids=(Import-Csv .\allper.csv);foreach($id in $ids.personid) {echo $id;if($donestuff -like "*$id*" )
{
echo 'Contains String'
}
else
{
echo 'Does not contain String'
}}
However, I'm not sure how to go the last step, and export & remove the unwanted rows from allper.csv
I've tried (among many things)
$donestuff = (Get-Content .\ActsWithpersons.csv | ConvertFrom-Csv);
Import-Csv .\allper.csv |
Where-Object {$donestuff -notlike $_.personid} |
Export-Csv -Path export.csv -NoTypeInformation
This took a really long time and left me with an empty csv. So, if you can give any guidance, please help.

Since your actswithpersons.csv doesn't have headers, in order for you to import as csv, you can specify the -Header parameter in either Import-Csv or ConvertFrom-Csv; with the former cmdlet being the better solution.
With that said, you can use any header name for those 2 columns then filter by the given column name (ID in this case) after your import of allper.csv using Where-Object:
$awp = (Import-Csv -Path '.\actswithpersons.csv' -Header 'blah','ID').ID.Split(';')
Import-Csv -Path '.\allper.csv' | Where-Object -Property 'Studentid' -notin $awp
This should give you:
institutiongroup studentid iscomplete
---------------- --------- ----------
institutionId=22343 789 FALSE
If you'd rather use Get-Content, you can split each line on both delimiters (, and ;). This gives you a single flat list of values. It also contains the first-column values such as abc, which is harmless here because they never match a student ID. You can then compare against that list ($awp) with the same filter as above, which gives the same results:
$awp = (Get-Content -Path '.\actswithpersons.csv') -split ",|;"
Import-Csv -Path '.\allper.csv' | Where-Object -Property 'Studentid' -notin $awp
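The snippets above only list the unmatched rows; the question also asks for them to be removed from allper.csv. A minimal sketch of that last step, assuming the same file layout as in the question (the sample data is inlined so the snippet runs on its own, and the header names 'Act' and 'ID' are arbitrary placeholders):

```powershell
# Sample data standing in for the two files from the question
$allper = @'
institutiongroup,studentid,iscomplete
institutionId=22343,123,FALSE
institutionId=22343,456,FALSE
institutionId=22343,789,FALSE
'@ | ConvertFrom-Csv

$awpRows = @'
abc,123;456
def,456
ghi,123
jkl,123;456
'@ | ConvertFrom-Csv -Header 'Act', 'ID'

# Flatten the semicolon-separated IDs into one list of unique values
$awp = $awpRows.ID -split ';' | Select-Object -Unique

# Split the rows in one pass: the first result matches the filter, the second does not
$keep, $export = $allper.Where({ $_.studentid -in $awp }, 'Split')

# With real files, write both sets back out (paths are placeholders):
# $keep   | Export-Csv -Path '.\allper.csv' -NoTypeInformation
# $export | Export-Csv -Path '.\export.csv' -NoTypeInformation
```

Because the rows are held in a variable before anything is written, overwriting allper.csv afterwards does not clash with reading it. The .Where() method with the 'Split' mode needs PowerShell 4 or later.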

Related

Powershell CSV removing rows and then remove from whole file if A column matches

I've created the following small script to remove 2++ strings from a CSV.
Each row is a log of a given person and a answer they give.
The CSV has X columns.
The column named FIRST identifies the person.
What I need to do is when I delete a row matching the answer, I also need to delete the person from the whole CSV if it had one of the two strings.
What I've made so far, removes the row of people having the answers but the person is still left in the overall CSV with other answers. I want to remove the person fully if the questions have been answered.
Can somebody help me out with making the addition or changes to make this happen?
INPUT File
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
SCRIPT
$CSVfile = "C:\Temp\Test\Test.csv"
$CSVfile_filtered = "C:\Temp\Test\Test.csv"
$regex001 = "AA"
$regex002 = "BB"
$filterArray = @($regex001,$regex002)
Get-Content $CSVfile | Select-String -pattern $filterArray -notmatch | Set-Content $CSVfile_filtered
The file should then remove 10005, 10011 and both lines of 10007. But my version only removes one of the 10007 since it only matches one of the two patterns.
Using more of PowerShell's built-in cmdlets can make this a little easier to manage.
# Assuming searching only properties ADDR and ADDR2
$filter = 'AA','BB'
# Grouping by First and Last values to easily remove duplicates
# -match uses regex so | is needed for an OR of multiple items
Import-Csv Test.csv | Group-Object First,Last |
Where {!($_.Group.ADDR,$_.Group.ADDR2 -match ($filter -join '|'))} |
Foreach-Object Group |
Export-Csv output.csv -NoType
You would think strictly using text manipulation would be simpler, but it adds other scenarios to consider:
You will need to track users that have duplicate entries and potentially back track to remove them (if not grouping). This could require reading the file contents twice.
Your header row could match the string you want to filter so you will need to add it to the output if filtering removes it.
Keeping the scenarios above in mind, you can still use a grouping concept:
$filter = 'AA','BB'
$file = Get-Content Test.csv
# $file[0] is the header row
# -split string uses regex and splits at the second comma
# -split results' [0] element is First,Last values
$file[0],($file |
Select-Object -Skip 1 |
Group-Object {($_ -split '(?<=^[^,]*,[^,]*),')[0]} |
where {!($_.Group -match ($filter -join '|'))} |
Foreach-Object Group) | Set-Content output.csv
If I got it right you could do something like this:
$SearchPattern = 'AA', 'BB'
$INPUTCSV = @'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'@ | ConvertFrom-Csv
$ActualSearchPattern =
$INPUTCSV |
Where-Object {
$_.LAST -in $SearchPattern -or
$_.ADDR -in $SearchPattern -or
$_.ADDR2 -in $SearchPattern -or
$_.GENDER -in $SearchPattern -or
$_.HOME -in $SearchPattern -or
$_.Work -in $SearchPattern
} |
Select-Object -ExpandProperty FIRST
$INPUTCSV |
Where-Object -Property FIRST -NotIn -Value $ActualSearchPattern |
Format-Table -AutoSize
There might be more sophisticated or more elegant ways but I cannot think about one at the moment. ;-)
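One possible variation on the approach above avoids listing every column by hand. This is only a sketch, assuming that FIRST is the identifying column and that a value must equal a filter string exactly (as with -in above); it enumerates each row's properties through psobject.Properties:

```powershell
$rows = @'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'@ | ConvertFrom-Csv

$filter = 'AA', 'BB'

# Collect the FIRST value of every row where any other column equals a filter value
$flagged = $rows | Where-Object {
    $row  = $_
    $hits = $row.psobject.Properties |
        Where-Object { $_.Name -ne 'FIRST' -and $_.Value -in $filter }
    [bool]$hits
} | Select-Object -ExpandProperty FIRST

# Keep only the rows whose person was never flagged
$remaining = $rows | Where-Object FIRST -NotIn $flagged
```

With the sample data, 10005, 10007 and 10011 are flagged, so both 10007 rows are dropped and only the row for person 1 remains. Pipe $remaining to Export-Csv to write the result.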
There is a nice PowerShell module you can use to manipulate the content of a CSV or XLSX file: ImportExcel
This gives you a lot of options for manipulating sheets, columns, etc.

Powershell - Finding the output of get-contents and searching for all occurrences in another file using wild cards

I'm trying to get the output of two separate files although I'm stuck on the wild card or contains select-string search from file A (Names) in file B (name-rank).
The contents of file A is:
adam
george
william
assa
kate
mark
The contents of file B is:
12-march-2020,Mark-1
12-march-2020,Mark-2
12-march-2020,Mark-3
12-march-2020,william-4
12-march-2020,william-2
12-march-2020,william-7
12-march-2020,kate-54
12-march-2020,kate-12
12-march-2020,kate-44
And I need to match on every occurrence of the names after the '-' so my ordered output should look like this which is a combination of both files as the output:
mark
Mark-1
Mark-2
Mark-3
william
william-2
william-4
william-7
Kate
kate-12
kate-44
kate-54
So far I only have the following and I'd be grateful for any pointers or assistance please.
import-csv (c:\temp\names.csv) |
select-string -simplematch (import-csv c:\temp\names-rank.csv -header "Date", "RankedName" | select RankedName) |
set-content c:\temp\names-and-ranks.csv
I imagine the select-string isn't going to be enough and I need to write a loop instead.
The data in your example doesn't give much to work with, and the desired output isn't very intuitive; most of the time with PowerShell you would combine the data into a much richer output at the end.
But anyway, with what is given here and what you want, the code below will get what you need. I have left comments in the code for you.
$pathDir='C:\Users\myUser\Downloads\trash'
$names="$pathDir\names.csv"
$namesRank="$pathDir\names-rank.csv"
$nameImport = Import-Csv -Path $names -Header names
$nameRankImport= Import-Csv -Path $namesRank -Header date,rankName
#create an empty array to collect the result
$list=@()
foreach($name in $nameImport){
#get all the match names
$match=$nameRankImport.RankName -like "$($name.names)*"
#add the name from the First list
$list+=($name.names)
#if there are any matches, add them too
if($match){
$list+=$match
}
}
#Because it's a single column of strings, Export-Csv will not give us what we want, so use Set-Content
$list | Set-Content -Path "$pathDir\names-and-ranks.csv" -Force
For this I would use a combination of Group-Object and Where-Object to first group all "RankedName" items by the name before the dash, then filter on those names to be part of the names we got from the 'names.csv' file and output the properties you need.
# read the names from the file as string array
$names = Get-Content -Path 'c:\temp\names.csv' # just a list of names, so really not a CSV
# import the CSV file and loop through
Import-Csv -Path 'c:\temp\names-rank.csv' -Header "Date", "RankedName" |
Group-Object { ($_.RankedName -split '-')[0] } | # group on the name before the dash in the 'RankedName' property
Where-Object { $_.Name -in $names } | # use only the groups that have a name that can be found in the $names array
ForEach-Object {
$_.Name # output the group name (which is one of the $names)
$_.Group.RankedName -join [environment]::NewLine # output the group's 'RankedName' property joined with a newline
} |
Set-Content -Path 'c:\temp\names-and-ranks.csv'
Output:
Mark
Mark-1
Mark-2
Mark-3
william
william-4
william-2
william-7
kate
kate-54
kate-12
kate-44
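Neither answer sorts the ranked names numerically, which the desired output in the question seems to want (william-2 before william-4). A rough sketch of one way to do that, assuming the number after the last dash is always an integer and that names without any match should be skipped; the data is inlined and shortened so the snippet runs on its own:

```powershell
$names = 'adam', 'george', 'william', 'assa', 'kate', 'mark'

$ranked = @'
12-march-2020,Mark-1
12-march-2020,Mark-3
12-march-2020,Mark-2
12-march-2020,william-4
12-march-2020,william-2
12-march-2020,kate-54
12-march-2020,kate-12
'@ | ConvertFrom-Csv -Header 'Date', 'RankedName'

# Index the ranked names by the part before the dash.
# A hashtable literal compares string keys case-insensitively, so 'mark' finds 'Mark'.
$byName = @{}
foreach ($row in $ranked) {
    $key = ($row.RankedName -split '-')[0]
    if (-not $byName.ContainsKey($key)) { $byName[$key] = @() }
    $byName[$key] += $row.RankedName
}

# Emit each name from the names list, followed by its matches sorted by rank number
$result = foreach ($name in $names) {
    if ($byName.ContainsKey($name)) {
        $name
        $byName[$name] | Sort-Object { [int]($_ -split '-')[-1] }
    }
}
```

$result is a flat list of strings in the order of the names list, so it can be written out with Set-Content as in the answers above.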

Select specific column based on data supplied using Powershell

I have a csv file that may have unknown headers, one of the columns will contain email addresses for example.
Is there a way to select only the column that contains the email addresses and save it as a list to a variable?
One csv could have the header say email, another could say emailaddresses, another could say email addresses another file might not even have the word email in the header. As you can see, the headers are different. So I want to be able to detect the correct column first and use that data further in the script. Once the column is identified based on the data it contains, select that column only.
I've tried the where-object and select-string cmdlets. With both, the output is the entire array and not just the data in the column I am wanting.
$CSV = import-csv file.csv
$CSV | Where {$_ -like "*@domain.com"}
This outputs the entire array as all rows will contain this data.
Sample Data for visualization
id,first_name,bagel,last_name
1,Base,bcruikshank0@homestead.com,Cruikshank
2,Regan,rbriamo1@ebay.co.uk,Briamo
3,Ryley,rsacase2@mysql.com,Sacase
4,Siobhan,sdonnett3@is.gd,Donnett
5,Patty,pesmonde4@diigo.com,Esmonde
Bagel is obviously what we are trying to find. And we will play pretend in that we have no knowledge of the column's name or position ahead of time.
Find column dynamically
# Import the CSV
$data = Import-CSV $path
# Take the first row and get its columns
$columns = $data[0].psobject.properties.name
# Cycle the columns to find the one that has an email address for a row value
# Use a VERY crude regex to validate an email address.
$emailColumn = $columns | Where-Object{$data[0].$_ -match ".+@.+\..+"}
# Example of using the found column(s) to display data.
$data | Select-Object $emailColumn
Basically, read in the CSV as normal and use the first row's data to figure out which column holds the email addresses. One caveat: if more than one column matches, all of them are returned.
To enforce only 1 result a simple pipe to Select-Object -First 1 will handle that. Then you just have to hope the first one is the "right" one.
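Relying on the first row alone can pick the wrong column if that row happens to be empty or unusual. A hedged sketch of a slightly more robust variant: sample several rows and take the column with the most matches. The sample size of 50 and the pattern are assumptions, and the pattern is still a crude check rather than a real email validator:

```powershell
$data = @'
id,first_name,bagel,last_name
1,Base,bcruikshank0@homestead.com,Cruikshank
2,Regan,rbriamo1@ebay.co.uk,Briamo
3,Ryley,rsacase2@mysql.com,Sacase
'@ | ConvertFrom-Csv

# Crude pattern: something, an @, something, a dot, something (no spaces, no extra @)
$emailPattern = '^[^@\s]+@[^@\s]+\.[^@\s]+$'

# Look at up to 50 rows (arbitrary sample size)
$sample  = @($data | Select-Object -First 50)
$columns = $sample[0].psobject.Properties.Name

# Count matching cells per column, then keep the column with the most hits
$best = $columns | ForEach-Object {
    $col = $_
    [pscustomobject]@{
        Column = $col
        Hits   = @($sample | Where-Object { $_.$col -match $emailPattern }).Count
    }
} | Sort-Object Hits -Descending | Select-Object -First 1

if ($best.Hits -gt 0) {
    $emailColumn = $best.Column
    $emails      = $data.$emailColumn   # just the values of that column
}
```

If no cell in the sample matches, $emailColumn and $emails are left unset, which the rest of a script can test for.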
If you're using Import-Csv, the result is a PSCustomObject.
$CsvObject = Import-Csv -Path 'C:\Temp\Example.csv'
$Header = ($CsvObject | Get-Member | Where-Object { $_.Name -like '*email*' }).Name
$CsvObject.$Header
This filters for the header containing email, then selects that column from the object.
Edit for requirement:
$Str = @((Get-Content -Path 'C:\Temp\Example.csv') -like '*@domain.com*')
$Headers = @((Get-Content -Path 'C:\Temp\Example.csv' -TotalCount 1) -split ',')
$Str | ConvertFrom-Csv -Delimiter ',' -Header $Headers
Other method:
$PathFile="c:\temp\test.csv"
$columnName=$null
$content=Get-Content $PathFile
foreach ($item in $content)
{
$SplitRow= $item -split ','
$Cpt=0..($SplitRow.Count - 1) | where {$SplitRow[$_] -match ".+@.+\..+"} | select -first 1
# Compare against $null explicitly: index 0 is a valid column but evaluates as false
if ($null -ne $Cpt)
{
$columnName=($content[0] -split ',')[$Cpt]
break
}
}
if ($columnName)
{
import-csv $PathFile | select $columnName
}
else
{
"No email column found"
}

How to read a CSV file but exclude certain columns containing blanks using Get-Content

I want to read a CSV file and exclude rows where dynamically selected columns contain blanks but not all rows of those dynamically selected columns contain blanks.
Trying to use the where clause in the statement below (but not working):
Get-Content $Source -ReadCount 1000 |
Where {
ForEach($NotEqualBlankCol in $BlankColumns)
{
$NotEqualBlankCol -ne $null -and $NotEqualBlankCol -ne ''}
} |
ConvertFrom-Csv |
Sort-Object -Property $SortByColNames.Replace('"', '') -Unique |
.
.
.
| Out-File $Destination
$BlankColumns is my dynamic object string array which I would like to loop through containing the column names of the CSV that are blank. it can be 1 column or more. When more then all of the selected columns need to be blank to qualify as a row that does not need to be included in the final CSV file output.
How do I do it using Get-Content? Any help would be appreciated.
Using Get-Content
OK. This reads the contents of the file X lines at a time and parses each line into its individual columns. It then checks the specified columns for blanks. If any of the flagged columns contains a blank, the line is filtered out. Consider the test data I used for this:
id,first_name,last_name,email,gender,ip_address
1,Christina,Tucker,ctucker0@bbc.co.uk,Female,91.33.192.187
2,Jacqueline,Torres,jtorres1@shop-pro.jp,Female,205.70.183.107
3,Kathy,Perez,kperez2@hugedomains.com,Female,35.175.154.127
4,"",Holmes,eholmes3@canalblog.com,,
5,Ernest,Walker,ewalker4@marketwatch.com,Male,140.110.129.21
6,,Garza,cgarza5@jugem.jp,,
7,,Cunningham,jcunningham6@ox.ac.uk,Female,
8,,Clark,lclark7@posterous.com,,
9,,Ortiz,lortiz8@shareasale.com,,
Notice that first_name and gender are blank for some of these folks. Of the rows shown, ids 1, 2, 3 and 5 have complete data; the rest should be filtered out.
$BlankColumns = "first_name","gender"
$headers = (Get-Content $path -TotalCount 1).Split(",")
$potentialBlankHeaderIndecies = 0..($headers.Count - 1) | Where-Object{$BlankColumns -contains $headers[$_]}
$potentialBlankHeaderIndecies
Get-Content $path -ReadCount 3 | Foreach-Object{
# Check to see if any of the indexes from a split are empty
$_ | Where-Object{
[bool[]](($_.Split(","))[$potentialBlankHeaderIndecies] | ForEach-Object{
![string]::IsNullOrEmpty($_.Trim('"'))
}) -notcontains $false
}
}
The output of this code is the file, as strings, with the unwanted entries removed. You can pipe this into a variable, a file or whatever you need.
In a little more detail: we take the header names we want to check, then read in the first line of the CSV file, which should contain the column names. Using that, we determine the indexes of the columns we want to scrutinize. Then we read in the whole file and parse it line by line. For each line we split on the comma and check the elements at the identified indexes to see whether each is blank or null. We trim quotes in case the value is the string "", which I assume you would count as blank. Each element is evaluated as a Boolean (whether or not it is non-empty). If at least one is empty, the line fails the Where-Object clause and gets omitted. Note that a plain split on commas will misparse quoted fields that themselves contain commas.
Using Import-CSV
$BlankColumns = "first_name","gender"
Import-CSV $path | Where-Object{
$line = $_
($BlankColumns | ForEach-Object{
![string]::IsNullOrEmpty(($line.$_.Trim('"')))
}) -notcontains $false
}
Very similar approach just a lot less overhead since we are dealing with objects now instead of strings.
Now you could use Export-Csv or ConvertTo-Csv, depending on your needs in the rest of the project.
Changing the filter criteria.
Both examples above filter out rows where any of the checked columns is blank. If you want to omit a row only when all of them are blank, change the line }) -notcontains $false to }) -contains $true
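To make the difference between the two criteria concrete, here is a small sketch using a cut-down version of the test data. Test-ColumnFilled is a hypothetical helper introduced only for this illustration, not part of the answer above:

```powershell
$rows = @'
id,first_name,last_name,gender
1,Christina,Tucker,Female
4,,Holmes,
7,,Cunningham,Female
'@ | ConvertFrom-Csv

$BlankColumns = 'first_name', 'gender'

# Hypothetical helper: emits one boolean per checked column ($true = has a value)
function Test-ColumnFilled {
    param($Row, [string[]]$Columns)
    foreach ($c in $Columns) { -not [string]::IsNullOrWhiteSpace($Row.$c) }
}

# Drop a row when ANY checked column is blank
$anyBlankRemoved = @($rows | Where-Object {
    (Test-ColumnFilled $_ $BlankColumns) -notcontains $false
})

# Drop a row only when ALL checked columns are blank
$allBlankRemoved = @($rows | Where-Object {
    (Test-ColumnFilled $_ $BlankColumns) -contains $true
})
```

Row 7 has a gender but no first_name, so it is dropped by the first filter and kept by the second; row 4 is blank in both columns and is dropped either way.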

Add Column to CSV Windows PowerShell

I have a fairly standard csv file with headers I want to add a new column & set all the rows to the same data.
Original:
column1, column2
1,b
2,c
3,5
After
column1, column2, column3
1,b, setvalue
2,c, setvalue
3,5, setvalue
I can't find anything on this if anybody could point me in the right direction that would be great. Sorry very new to Power Shell.
Here's one way to do that using Calculated Properties:
# Parentheses make Import-Csv finish reading before Export-Csv overwrites the same file
(Import-Csv file.csv) |
Select-Object *,@{Name='column3';Expression={'setvalue'}} |
Export-Csv file.csv -NoTypeInformation
You can find more on calculated properties here: http://technet.microsoft.com/en-us/library/ff730948.aspx.
In a nutshell, you import the file, pipe the content to the Select-Object cmdlet, select all existing properties (e.g. '*'), then add a new one.
Shay Levy's answer also works for me!
If you don't want to provide a value for each object yet the code is even easier...
(Import-Csv file.csv) |
Select-Object *,"column3" |
Export-Csv file.csv -NoTypeInformation
None of the scripts I've seen are dynamic in nature, so they're fairly limited in their scope & what you can do with them.. that's probably because most PS Users & even Power Users aren't programmers. You very rarely see the use of arrays in Powershell. I took Shay Levy's answer & improved upon it.
Note here: The Import needs to be consistent (two columns for instance), but it would be fairly easy to modify this to dynamically count the columns & generate headers that way too. For this particular question, that wasn't asked. Or simply don't generate a header unless it's needed.
Needless to say, the code below will pull in as many CSV files as exist in the folder, add a header, and then later strip it. The reason I add the header is for consistency in the data; it also makes manipulating the columns later down the line fairly straightforward (if you choose to do so). You can modify this to your heart's content, and feel free to use it for other purposes too. This is generally the format I stick with for just about any of my PowerShell needs. The use of a counter basically allows you to manipulate individual files, so there are a lot of possibilities here.
$chargeFiles = 'C:\YOURFOLDER\BLAHBLAH\'
$existingReturns = Get-ChildItem $chargeFiles
for ($i = 0; $i -lt $existingReturns.count; $i++)
{
$CSV = Import-Csv -Path $existingReturns[$i].FullName -Header Header1,Header2
$csv | select *, @{Name='Header3';Expression={'Header3 Static'}} |
select *, @{Name='Header4';Expression={'Header4 Static Tet'}} |
select *, @{Name='Header5';Expression={'Header5 Static Text'}} |
CONVERTTO-CSV -DELIMITER "," -NoTypeInformation |
SELECT-OBJECT -SKIP 1 | % {$_ -replace '"', ""} |
OUT-FILE -FilePath $existingReturns[$i].FullName -FORCE -ENCODING ASCII
}
You could also use Add-Member:
$csv = Import-Csv 'input.csv'
foreach ($row in $csv)
{
$row | Add-Member -NotePropertyName 'MyNewColumn' -NotePropertyValue 'MyNewValue'
}
$csv | Export-Csv 'output.csv' -NoTypeInformation
For some applications, I found it useful to build a hashtable and use its .Values as the new column (this allows cross-reference validation against another object that is being enumerated).
In this case, #powershell on freenode brought my attention to an ordered hashtable (since the column header must be used).
Here is an example without any validation of the .Values:
$newcolumnobj = [ordered]@{}
#input data into a hash table so that we can more easily reference the `.values` as an object to be inserted in the CSV
$newcolumnobj.add("volume name", $currenttime)
#enumerate $deltas (this will be the object that contains the volume information, `$volumedeltas`)
# add just the new deltas to the newcolumn object
foreach ($item in $deltas){
$newcolumnobj.add($item.volume,$item.delta)
}
$originalcsv = #(import-csv $targetdeltacsv)
#thanks to pscookiemonster in #powershell on freenode
for($i=0; $i -lt $originalcsv.count; $i++){
$originalcsv[$i] | Select-Object *, @{l="$currenttime"; e={$newcolumnobj.item($i)}}
}
Example is related to How can I perform arithmetic to find differences of values in two CSVs?
Create a CSV file with nothing in it:
$csv >> "$PSScriptRoot/dpg.csv"
Define the CSV file's path. Here $PSScriptRoot is the folder that contains the running script:
$csv = "$PSScriptRoot/dpg.csv"
Now add columns to it:
$csv | select vds, protgroup, vlan, ports | Export-Csv $csv