Delete line in CSV if two columns not equal - powershell

I have large CSV files. Here is an example of the content:
Name;Number;Type;AlterName
Prag;1418;2;2012;Prag
Prag;1836;3;2012;Prag
Prag;1836;514;2012;Moscow
...
I need to delete every line where Name and AlterName are not equal.
In this case:
Prag;1836;514;2012;Moscow

Simply filter with Where-Object and keep only the rows where the two fields are equal:
Import-Csv 'C:\path\to\input.csv' -Delimiter ';' |
Where-Object { $_.Name -eq $_.AlterName } |
Export-Csv 'C:\path\to\output.csv' -Delimiter ';' -NoType
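The filter can be tried on in-memory data before touching the real files. This is only a sketch: the sample header in the question lists four names while each row has five fields, so the snippet assumes a five-name header with a hypothetical Year column. Without a matching header, Import-Csv would map AlterName to the wrong field.

```powershell
# Sample rows from the question; the "Year" header name is an assumption.
$rows = @'
Name;Number;Type;Year;AlterName
Prag;1418;2;2012;Prag
Prag;1836;3;2012;Prag
Prag;1836;514;2012;Moscow
'@ | ConvertFrom-Csv -Delimiter ';'

# Keep only rows whose Name and AlterName match
# (-eq compares strings case-insensitively by default).
$kept = $rows | Where-Object { $_.Name -eq $_.AlterName }

# Show the result as semicolon-delimited CSV text.
$kept | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

Use -ceq instead of -eq if the comparison has to be case-sensitive.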

Related

Get Unique Column and Count from CSV file in Powershell

I have a CSV File called Products.csv
Product_ID,Category
1,A
2,A
3,A
4,B
I want a PowerShell script that shows the unique categories along with their counts and exports the result to CSV.
i.e.
A,3
B,1
I have used the following code to extract the Unique Categories, but cannot get the Count:
Import-Csv Products.csv -DeLimiter ","|
Select 'Category' -Unique |
Export-Csv Summary.csv -DeLimiter "," -NoTypeInformation
Can anyone help me out?
Thanks.
You can use Group-Object to get the count.
Import-Csv Products.csv -DeLimiter "," |
Group-Object Category | Foreach-Object {
"{0},{1}" -f $_.Name,$_.Count
}
If you want the counts as CSV output, your data needs headers. Group-Object outputs a Name property, which holds the grouped property value, and a Count property, which holds the number of items in that group.
Import-Csv Products.csv -DeLimiter "," |
Group-Object Category | Select-Object Name,Count |
Export-Csv Summary.csv -Delimiter ',' -NoType
You can take the above code a step further and use Select-Object's calculated properties. Then you can create custom named columns and/or values with expressions.
Import-Csv Products.csv -DeLimiter "," |
Group-Object Category |
Select-Object @{n='Product_ID';e={$_.Name}},Count |
Export-Csv Summary.csv -Delimiter ',' -NoType

export csv rows where duplicate values found in column

I have a csv file where I am trying to export rows into another csv file only where the values in the id column have duplicates.
I have the following csv file...
"id","blablah"
"valOne","valTwo"
"valOne","asdfdsa"
"valThree","valFour"
"valFive","valSix"
"valFive","qwreweq"
"valSeven","valEight"
I need the output csv file to look like the following...
"valOne","valTwo"
"valOne","asdfdsa"
"valFive","valSix"
"valFive","qwreweq"
Here is the code I have so far:
$inputCsv = Import-CSV './test.csv' -delimiter ","
#$output = @()
$inputCsv | Group-Object -prop id, blablah | Where-Object {$_.id -gt 1} |
Select-Object
#@{n='id';e={$_.Group[0].id}},
#@{n='blablah';e={$_.Group[0].blablah}}
#Export-Csv 'C:\scripts\powershell\output.csv' -NoTypeInformation
#Write-Host $output
#$output | Export-Csv 'C:\scripts\powershell\output.csv' -NoTypeInformation
I've searched multiple how-tos but can't seem to find the right syntax. Can anyone help with this?
Just group on the ID property alone. Any group with a Count greater than 1 contains duplicates, so expand those groups back into rows and export them.
$inputCsv = Import-CSV './test.csv' -delimiter ","
$inputCsv |
Group-Object -Property ID |
Where-Object count -gt 1 |
Select-Object -ExpandProperty group |
Export-Csv output.csv -NoTypeInformation
output.csv will contain
"id","blablah"
"valOne","valTwo"
"valOne","asdfdsa"
"valFive","valSix"
"valFive","qwreweq"
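The attempt in the question filters on $_.id, but the objects that Group-Object emits expose Name, Count and Group rather than the original columns, so that filter never sees the id values. A minimal sketch with the sample data held in memory makes the intermediate objects visible:

```powershell
# Sample data from the question, parsed from a here-string instead of a file.
$rows = @'
"id","blablah"
"valOne","valTwo"
"valOne","asdfdsa"
"valThree","valFour"
"valFive","valSix"
"valFive","qwreweq"
"valSeven","valEight"
'@ | ConvertFrom-Csv

# One object per distinct id: Name is the id value, Count the number of rows.
$groups = $rows | Group-Object -Property id
$groups | Select-Object Name, Count

# Keep groups with more than one row and expand them back into the original rows.
$dupes = $groups | Where-Object Count -gt 1 | Select-Object -ExpandProperty Group
$dupes | ConvertTo-Csv -NoTypeInformation
```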

Export-Csv adding unwanted header double quotes

I have a source CSV file (without a header, all columns delimited by a comma) which I am trying to split into separate CSV files based on the value in the first column, using that column value as the output file name.
Input file:
S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00
S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000009,2016,M06 01/09/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000010,2015,W42 11/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000012,2015,W10 01/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W11 08/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W12 15/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
My PowerShell script looks like this:
Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
Group-Object -Property "service_id" |
Foreach-Object {
$path = $_.Name + ".csv";
$_.group | Export-Csv -Path $path -NoTypeInformation
}
Output files:
S00000009.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000009","2016","M04 01/07/2016","0.00","0.00","0.00","0.00","0.00","0.00","750.00","0.00","0.00"
"S00000009","2016","M05 01/08/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
"S00000009","2016","M06 01/09/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
S00000010.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000010","2015","W28 05/10/2015","2275.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"
"S00000010","2015","W41 04/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
"S00000010","2015","W42 11/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
It is generating the new files using the header value in column 1 (service_id).
There are 2 problems.
The output CSV file contains a header row which I don't need.
The columns are enclosed with double quotes which I don't need.
First of all, Export-Csv always writes a header row and, in Windows PowerShell, encloses every field in double quotes, since both are part of the CSV structure it produces. If you don't want them, you can write the lines as plain text instead:
$temp = Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def | Group-Object -Property "service_id" |
Foreach-Object {
$path=$_.name+".csv"
$temp0 = $_.group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
$temp1 = $temp0.replace("""","")
$temp1 > $path
}
But this output is not a "real" csv file.
Hope that helps.
For your particular scenario you could probably use a simpler approach. Read the input file as a plain text file, group the lines by splitting off the first field, then write the groups to output files named after the groups:
Get-Content 'INPUT_FILE.csv' |
Group-Object { $_.Split(',')[0] } |
ForEach-Object { $_.Group | Set-Content ($_.Name + '.csv') }
Another solution, which uses plain numbers instead of named headers (they aren't wanted in the output anyway), avoids unnecessary temporary variables, and removes only the field-delimiting double quotes:
Import-Csv INPUT_FILE.csv -Header (1..12) |
Group-Object -Property "1" | Foreach-Object {
($_.Group | ConvertTo-Csv -NoType | Select-Object -Skip 1).Trim('"') -replace '","',',' |
Set-Content -Path ("{0}.csv" -f $_.Name)
}
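On PowerShell 7.0 or later, the quote stripping can be left to the -UseQuotes parameter of ConvertTo-Csv instead of string replacement. The following is a sketch under that version assumption, not a drop-in replacement. It uses shortened rows (five fields instead of twelve) and collects the output lines in a hashtable, a hypothetical $files variable, rather than writing files.

```powershell
# Requires PowerShell 7.0+; -UseQuotes does not exist in Windows PowerShell 5.1.
# Shortened sample rows: five fields per line instead of the original twelve.
$rows = @'
S00000009,2016,M04 01/07/2016,0.00,750.00
S00000009,2016,M05 01/08/2016,0.00,600.00
S00000010,2015,W28 05/10/2015,2275.00,0.00
'@ | ConvertFrom-Csv -Header (1..5)

# Map each output file name to its unquoted, header-less CSV lines.
$files = @{}
$rows | Group-Object -Property '1' | ForEach-Object {
    $lines = $_.Group |
        ConvertTo-Csv -UseQuotes Never |   # no quotes around any field
        Select-Object -Skip 1              # drop the header row
    $files["$($_.Name).csv"] = @($lines)
}

$files['S00000009.csv']
```

To write real files, pipe the lines to Set-Content -Path with the computed name inside the loop, as the answers above do. Note that -UseQuotes Never also leaves fields containing the delimiter unquoted, so it is only safe when the data has no embedded commas.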

Extract differences of CSV files into a separate file

I have a CSV file (with headers) filled with assortment data. The file will be updated once every day. I need to find the differences in those files (the old and the new one) and extract them into a separate file.
For instance: in the old file there could be a price of "18,50" and now it's an updated one of "17,90". The script should now extract this row into a new file.
So far, I was able to import both CSV files (via Import-Csv) but my current solution is to compare each row by findstr.
The problems are:
In 9 of 10 cases the strings are too long to compare.
What if a new row is inserted? I guess the comparison would no longer work unless the row is appended at the end of the file.
My current code is:
foreach ($oldData in (Import-Csv $PSScriptRoot\old.csv -Delimiter ";" -Encoding "default")) {
foreach ($newData in (Import-Csv $PSScriptRoot\new.csv -Delimiter ";" -Encoding "default")) {
findstr.exe /v /c:$oldData $newData > $PSScriptRoot\diff.txt
}
}
Read both files into separate variables and use Compare-Object for the comparison:
$fields = 'idArtikel', 'Preis', ...
$csv1 = Import-Csv $PSScriptRoot\old.csv -Delimiter ';'
$csv2 = Import-Csv $PSScriptRoot\new.csv -Delimiter ';'
Compare-Object -ReferenceObject $csv1 -DifferenceObject $csv2 -Property $fields -PassThru | Where-Object {
$_.SideIndicator -eq '=>'
} | Select-Object $fields | Export-Csv 'C:\path\to\diff.csv' -Delimiter ';'
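To see what the '=>' filter keeps, here is a small sketch with made-up data. Only the column names idArtikel and Preis come from the answer; the values and the two-column layout are assumptions. Rows that changed and rows that are new both show up on the '=>' side, while unchanged rows are dropped.

```powershell
# Hypothetical old and new data sets, parsed from here-strings instead of files.
$old = @'
idArtikel;Preis
1;18,50
2;9,99
'@ | ConvertFrom-Csv -Delimiter ';'

$new = @'
idArtikel;Preis
1;17,90
2;9,99
3;4,20
'@ | ConvertFrom-Csv -Delimiter ';'

$fields = 'idArtikel', 'Preis'

# Rows are compared on the listed properties only; '=>' marks rows
# that exist in $new (the difference set) but not in $old.
$diff = Compare-Object -ReferenceObject $old -DifferenceObject $new -Property $fields -PassThru |
    Where-Object { $_.SideIndicator -eq '=>' } |
    Select-Object $fields

$diff | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

Because the comparison is based on property values rather than line position, a row inserted in the middle of the new file does not disturb the result.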
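Alternatively, join the two files on idArtikel. Note that Join is not a built-in cmdlet; it comes from the Join-Object script described in the answer linked below.
$csv1 | Join $csv2 idArtikel -Merge {$Right.$_} | Export-CSV 'C:\path\to\diff.csv' -Delimiter ';'
For details on Join (Join-Object), see: https://stackoverflow.com/a/45483110/1701026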

Breaking up CSV Files

So I am looking at breaking up a CSV using PowerShell. The CSV is delimited by |, which isn't a problem, and I am looking to break it up into multiple smaller CSVs while retaining the original. The breaks would be based on whether the value in a single column matches one of a list of values.
What I have done so far is to import the csv (delimited by |) and then
foreach($line in $csv) {
if($columnValue -like $target1) {
export-csv filename1.csv -Delimiter `| $line -append)}
elseif($columnValue -like $target2) {
export-csv filename2.csv -Delimiter `| $line -append)}
etc.
However, I do not think it is exporting correctly, and I do not want the quotes (yes, I know they are standard, but I do not want them). I also want the header from the original CSV to be applied to the child CSVs, and it's not being applied.
Sorry if there's a better way to format the code; I'm still new here.
Here is where I suggest the awesomeness of the switch statement. When given a collection, it tests each item against every condition and runs the script block of each condition that matches.
Switch($csv){
{$_.column -match $target1} {$_ | Export-CSV filename1.csv -append -delimiter '|'}
{$_.column -match $target2} {$_ | Export-CSV filename2.csv -append -delimiter '|'}
{$_.column -match $target3} {$_ | Export-CSV filename3.csv -append -delimiter '|'}
}
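The behaviour of switch over a collection can be checked without writing files. In this sketch the rows, the target patterns and the $buckets hashtable are all hypothetical; only the column name "column" and the | delimiter follow the thread.

```powershell
# Hypothetical pipe-delimited data.
$csv = @'
id|column
1|apple
2|banana
3|apple pie
'@ | ConvertFrom-Csv -Delimiter '|'

# Collect matching rows per target instead of exporting them.
$buckets = @{ apple = @(); banana = @() }

# switch enumerates $csv; for each row, every condition is evaluated,
# and each condition that returns $true runs its action block.
switch ($csv) {
    { $_.column -match 'apple' }  { $buckets.apple  += $_ }
    { $_.column -match 'banana' } { $buckets.banana += $_ }
}

$buckets.apple | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
```

Since no clause ends with break or continue, a row that matches several conditions lands in several buckets, which is also true of the Export-Csv version above.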
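# Import once with the source delimiter, then filter per criterion and export.
$data = Import-Csv $csvfile -Delimiter '|'
$data | Where-Object { $_.val -eq $criteria1 } | Export-Csv -Path "File1.csv" -Delimiter '|' -NoTypeInformation
$data | Where-Object { $_.val -eq $criteria2 } | Export-Csv -Path "File2.csv" -Delimiter '|' -NoTypeInformation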