Extract differences of CSV files into a separate file - powershell

I have a CSV file (with headers) filled with assortment data. The file will be updated once every day. I need to find the differences in those files (the old and the new one) and extract them into a separate file.
For instance: in the old file there could be a price of "18,50" and now it's an updated one of "17,90". The script should now extract this row into a new file.
So far, I was able to import both CSV files (via Import-Csv) but my current solution is to compare each row by findstr.
The problems are:
In 9 of 10 cases the strings are too long to compare.
What if a new row is inserted? I guess the comparison would no longer work unless the row is appended at the end of the file.
My current code is:
foreach ($oldData in (Import-Csv $PSScriptRoot\old.csv -Delimiter ";" -Encoding "default")) {
foreach ($newData in (Import-Csv $PSScriptRoot\new.csv -Delimiter ";" -Encoding "default")) {
findstr.exe /v /c:$oldData $newData > $PSScriptRoot\diff.txt
}
}

Read both files into separate variables and use Compare-Object for the comparison:
$fields = 'idArtikel', 'Preis', ...
$csv1 = Import-Csv $PSScriptRoot\old.csv -Delimiter ';'
$csv2 = Import-Csv $PSScriptRoot\new.csv -Delimiter ';'
Compare-Object -ReferenceObject $csv1 -DifferenceObject $csv2 -Property $fields -PassThru | Where-Object {
$_.SideIndicator -eq '=>'
} | Select-Object $fields | Export-Csv 'C:\path\to\diff.csv' -Delimiter ';' -NoTypeInformation
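To see how this behaves for changed and newly inserted rows, here is a minimal sketch with inline sample data instead of files. The two columns and their values are assumptions made up for illustration; only the field names idArtikel and Preis come from the answer above.

```powershell
# Hypothetical sample data standing in for old.csv and new.csv.
$old = @'
idArtikel;Preis
1;18,50
2;5,00
'@ | ConvertFrom-Csv -Delimiter ';'

$new = @'
idArtikel;Preis
1;17,90
2;5,00
3;9,99
'@ | ConvertFrom-Csv -Delimiter ';'

$fields = 'idArtikel', 'Preis'

# Rows whose field values exist only in the new data are marked '=>'.
# This covers both changed rows (id 1) and inserted rows (id 3),
# regardless of where they sit in the file.
$diff = Compare-Object -ReferenceObject $old -DifferenceObject $new -Property $fields |
    Where-Object SideIndicator -eq '=>' |
    Select-Object $fields

$diff | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

Because the comparison is done on property values rather than on line position, the unchanged row (id 2) is left out even though the new data has an extra row.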

Alternatively, using the Join-Object cmdlet (not built into PowerShell):
$csv1 | Join $csv2 idArtikel -Merge {$Right.$_} | Export-CSV 'C:\path\to\diff.csv' -Delimiter ';'
For details on Join (Join-Object), see: https://stackoverflow.com/a/45483110/1701026

Related

Remove unnecessary commas in a column in csv file by using PowerShell

I am trying to remove unnecessary commas in a column in the CSV file. For now, I know of a few cases and have hard-coded them, but I want the code to be dynamic. Any suggestions are greatly appreciated.
$FilePath = "C:\Test\"
Get-ChildItem $FilePath -Filter *.csv | ForEach-Object {
(Get-Content $_.FullName -Raw) | Foreach-Object {
$_ -replace ',"Frederick, Fred",' , ',"Frederick Fred",' `
-replace ',"Brian, Josiah",' , ',"Brian Josiah",' `
-replace ',"Lisinopril ,Tablet / 20MG",' , ',"Lisinopril Tablet / 20MG",'
} | Set-Content $_.FullName
}
Try this. Note that I only worked with the CSV sample that you gave here, so it might not work with other CSV files.
Also make sure that you replace %YOURCSVFILE% with the real path of your file.
#import the csv
$csv = Import-Csv -Path %YOURCSVFILE% -Delimiter ','
#going each row and replacing commas
foreach ($desc in $csv){
$desc.Desc = $desc.Desc -replace ',',''
}
#exporting the csv
$csv | Export-csv -NoTypeInformation "noCommas.csv"
Here are a few more alternatives for you:
Method 1. Loop through the rows with foreach(..) and capture the output:
$result = foreach ($row in (Import-Csv -Path 'D:\Test\FileWithCommasInDescription.csv')) {
$row.Desc = $row.Desc -replace ','
$row # output the updated item
}
$result | Export-Csv -Path 'D:\Test\FileWithoutCommasInDescription.csv' -NoTypeInformation
Method 2. Use ForEach-Object and the automatic variable $_. Pipe the results through:
Import-Csv -Path 'D:\Test\FileWithCommasInDescription.csv' | ForEach-Object {
$_.Desc = $_.Desc -replace ','
$_ # output the updated item
} | Export-Csv -Path 'D:\Test\FileWithoutCommasInDescription.csv' -NoTypeInformation
Method 3. Use a calculated property:
Import-Csv -Path 'D:\Test\FileWithCommasInDescription.csv' |
Select-Object ID, @{Name = 'Desc'; Expression = {$_.Desc -replace ','}}, Nbr -ExcludeProperty Desc |
Export-Csv -Path 'D:\Test\FileWithoutCommasInDescription.csv' -NoTypeInformation
All will result in a new CSV file
"ID","Desc","Nbr"
"12","Frederick Fred","11"
"21","Brian Josiah","31"
"13","Lisinopril Tablet / 20MG","17"
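Since the question asks for something dynamic, the same idea can be applied to every column rather than only Desc. The following is a sketch under assumptions: the inline sample mirrors the rows above, and replacing each comma (plus surrounding whitespace) with a single space is one possible cleanup rule, not necessarily the one you want.

```powershell
# Hypothetical inline sample; in practice this would come from Import-Csv.
$csvText = @'
ID,Desc,Nbr
12,"Frederick, Fred",11
13,"Lisinopril ,Tablet / 20MG",17
'@

$rows = $csvText | ConvertFrom-Csv

foreach ($row in $rows) {
    # Walk over every column of the row without naming it.
    foreach ($prop in $row.PSObject.Properties) {
        # Replace a comma and any whitespace around it with one space.
        $prop.Value = ($prop.Value -replace '\s*,\s*', ' ').Trim()
    }
}

$rows | ConvertTo-Csv -NoTypeInformation
```

This avoids hard-coding either the values or the column name, at the cost of also touching columns where a comma might be intended.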

Converting at least two text files with different rows into one csv - powershell

I am trying to convert two TXT files into one CSV file using a PowerShell script. When the files have the same structure and the same number of rows, the case looks easy. But in my case the TXT files have different structures.
The pipe sign in both TXT files is not a delimiter; it should be treated as a normal character that is part of the string.
File URL.txt
L5020|http://linktosite.de|URL
L100|http://sitelink.de|URL
L50|http://abcde.de|URL
L511|http://bbcccddeee.de|URL
L300|http://link123456.de|URL
L5450|http://randomlink.de|URL_DE
L5460|http://randomwebsitelink.de|URL_DE
File URL1.txt
L5020|http://linktosite.de|URL|P555
L100|http://sitelink.de|URL|P523
L50|http://abcde.de|URL|P53
L511|http://bbcccddeee.de|URL|P540
The CSV I expect should look like the one below, with ";" as the delimiter:
HEADER1;HEADER2
L5020|http://linktosite.de|URL;L5020|http://linktosite.de|URL|P555
L100|http://sitelink.de|URL;L100|http://sitelink.de|URL|P523
L50|http://abcde.de|URL;L50|http://abcde.de|URL|P53
L511|http://bbcccddeee.de|URL;L511|http://bbcccddeee.de|URL|P540
L300|http://link123456.de|URL;
L5450|http://randomlink.de|URL_DE;
L5460|http://randomwebsitelink.de|URL_DE;
I tried something like that
$URL = "C:\Users\XXX\Desktop\URL.txt"
$URLcontent = Get-Content $URL
$URL1 = "C:\Users\XXX\Desktop\URL1.txt"
$URLcontent1 = Get-Content $URL1
$results = @() # Empty array to store new created rows in
$csv = Import-CSV "C:\Users\XXX\Desktop\map.csv" -Delimiter ';'
foreach ($row in $csv) {
$properties = [ordered]@{
HEADER1 = $URLcontent
HEADER2 = $URLcontent1
}
# insert the new row as an object into the results-array
$results += New-Object psobject -Property $properties
}
# foreach-loop filled the results-array - export it as a CSV-file
$results | Export-Csv "C:\Users\XXXX\Desktop\map_final.csv" -NoTypeInformation
And something like that:
import-csv URL.txt -Header 'HEADER1' | Export-CSV "C:\Users\xxx\Desktop\URL.csv" -Delimiter ';' -NoTypeInformation
import-csv URL1.txt -Header 'HEADER2' | Export-CSV "C:\Users\xxx\Desktop\URL1.csv" -Delimiter ';' -NoTypeInformation
Get-ChildItem "C:\Users\xx\Desktop" -Filter "URL*.csv" | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\combinedcsvs.csv -NoTypeInformation -Append
Without any success...
BR
Based on the updates in your question, if you want to build something yourself, you probably want to do something like this:
$Url1 = @(Get-Content .\URL1.txt)
$i = 0
Get-Content .\URL.txt | Foreach-Object {
[pscustomobject]@{
HEADER1 = $_
HEADER2 = If ($i -lt $URL1.Count) { $URL1[$i++] }
}
} | Export-Csv .\combinedcsvs.csv -Delimiter ';' -NoTypeInformation -Append
In case you do not want to go through the hassle of reinventing the wheel (with all its pitfalls, including performance tuning), you can use the Join-Object cmdlet I mentioned in the comment:
Import-Csv .\URL.txt -Header HEADER1 |
LeftJoin (Import-Csv .\URL1.txt -Header HEADER2) |
Export-Csv .\combinedcsvs.csv -Delimiter ';' -NoTypeInformation -Append
Note 1: I am not sure why you are trying to import something like map.csv; I don't think that is required.
Note 2: If you still want to go your own way, try to avoid using the increase assignment operator (+=) to build a collection, as it is a very expensive operation.
Note 3: It is generally not a good idea to join lines on their line index, as the lists might not be sorted or might contain duplicates. It is therefore better to join lists on a specific property, like the Url:
 
Import-Csv .\URL.txt -Delimiter '|' -Header Lid,Url,Type |
LeftJoin (Import-Csv .\URL1.txt -Delimiter '|' -Header Lid2,Url,Type2,Pid) -On Url |
Format-Table # or: Export-Csv .\combinedcsvs.csv -Delimiter ';' -NoTypeInformation
Lid   Url                         Type   Lid2  Type2 Pid
---   ---                         ----   ----  ----- ---
L5020 http://linktosite.de        URL    L5020 URL   P555
L100  http://sitelink.de          URL    L100  URL   P523
L50   http://abcde.de             URL    L50   URL   P53
L511  http://bbcccddeee.de        URL    L511  URL   P540
L300  http://link123456.de        URL
L5450 http://randomlink.de        URL_DE
L5460 http://randomwebsitelink.de URL_DE
Or on all three (Lid, Url and Type) properties:
Import-Csv .\URL.txt -Delimiter '|' -Header Lid,Url,Type |
LeftJoin (Import-Csv .\URL1.txt -Delimiter '|' -Header Lid,Url,Type,Pid) -On Lid,Url,Type |
Format-Table # or: Export-Csv .\combinedcsvs.csv -Delimiter ';' -NoTypeInformation
Lid   Url                         Type   Pid
---   ---                         ----   ---
L5020 http://linktosite.de        URL    P555
L100  http://sitelink.de          URL    P523
L50   http://abcde.de             URL    P53
L511  http://bbcccddeee.de        URL    P540
L300  http://link123456.de        URL
L5450 http://randomlink.de        URL_DE
L5460 http://randomwebsitelink.de URL_DE
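If installing a separate Join-Object module is not an option, a left join on Url can be approximated with a plain hashtable lookup. This is only a sketch: the variable names ($left, $right, $byUrl, $joined) are made up for illustration, the inline data is a shortened copy of the sample files, and it assumes each Url occurs at most once in the right-hand list.

```powershell
# Shortened, hypothetical copies of URL.txt and URL1.txt.
$left = @'
L5020|http://linktosite.de|URL
L300|http://link123456.de|URL
'@ | ConvertFrom-Csv -Delimiter '|' -Header Lid, Url, Type

$right = @'
L5020|http://linktosite.de|URL|P555
'@ | ConvertFrom-Csv -Delimiter '|' -Header Lid, Url, Type, Pid

# Index the right-hand rows by Url for constant-time lookups.
$byUrl = @{}
foreach ($r in $right) { $byUrl[$r.Url] = $r }

# Emit every left-hand row; Pid stays empty when there is no match.
$joined = foreach ($l in $left) {
    $hit = $byUrl[$l.Url]
    [pscustomobject]@{
        Lid  = $l.Lid
        Url  = $l.Url
        Type = $l.Type
        Pid  = if ($hit) { $hit.Pid } else { $null }
    }
}

$joined | Format-Table
```

Collecting the loop output directly into $joined also sidesteps the += cost mentioned in the notes above.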
If you only want to combine lines where both files contain data, you can do the following:
$f1 = Get-Content file1.txt
$f2 = Get-Content file2.txt
$output = for ($i = 0; $i -lt [math]::Min($f1.count,$f2.count); $i++) {
$f2[$i],$f1[$i] -join '|'
}
$output | Set-Content newfile.txt
If you want to combine all coinciding lines plus add extra lines from one of the files, you can do the following:
$output = for ($i = 0; $i -lt [math]::Max($f1.count,$f2.count); $i++) {
if ($f1[$i] -and $f2[$i]) {
$f2[$i],$f1[$i] -join '|'
}
else {
$f2[$i],$f1[$i] | Where {$_}
}
}
$output | Set-Content newfile.txt

Export-Csv adding unwanted header double quotes

I have got a source CSV file (without a header, all columns delimited by a comma) which I am trying to split out into separate CSV files based upon the value in the first column, using that column value as the output file name.
Input file:
S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00
S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000009,2016,M06 01/09/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000010,2015,W42 11/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000012,2015,W10 01/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W11 08/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W12 15/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
My PowerShell script looks like this:
Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
Group-Object -Property "service_id" |
Foreach-Object {
$path = $_.Name + ".csv";
$_.group | Export-Csv -Path $path -NoTypeInformation
}
Output files:
S00000009.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000009","2016","M04 01/07/2016","0.00","0.00","0.00","0.00","0.00","0.00","750.00","0.00","0.00"
"S00000009","2016","M05 01/08/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
"S00000009","2016","M06 01/09/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
S00000010.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000010","2015","W28 05/10/2015","2275.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"
"S00000010","2015","W41 04/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
"S00000010","2015","W42 11/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
It is generating the new files using the header value in column 1 (service_id).
There are 2 problems.
The output CSV file contains a header row which I don't need.
The columns are enclosed with double quotes which I don't need.
First of all, Export-Csv always writes a header row, and in Windows PowerShell it also quotes every field. If you don't want them, you can convert the rows to CSV text and strip both yourself:
$temp = Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def | Group-Object -Property "service_id" |
Foreach-Object {
$path=$_.name+".csv"
$temp0 = $_.group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
$temp1 = $temp0.replace("""","")
$temp1 > $path
}
But this output is not a "real" csv file.
Hope that helps.
For your particular scenario you could probably use a simpler approach. Read the input file as a plain text file, group the lines by splitting off the first field, then write the groups to output files named after the groups:
Get-Content 'INPUT_FILE.csv' |
Group-Object { $_.Split(',')[0] } |
ForEach-Object { $_.Group | Set-Content ($_.Name + '.csv') }
Another solution,
using no named headers but simply numbers (as they aren't wanted in output anyway)
avoiding unnecessary temporary files.
removing only field delimiting double quotes.
Import-Csv INPUT_FILE.csv -Header (1..12) |
Group-Object -Property "1" | Foreach-Object {
($_.Group | ConvertTo-Csv -NoType | Select-Object -Skip 1).Trim('"') -replace '","',',' |
Set-Content -Path ("{0}.csv" -f $_.Name)
}
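On PowerShell 7 or later, the quote stripping can be left to the cmdlet itself through the -UseQuotes parameter; this parameter does not exist in Windows PowerShell 5.1, so treat the following as a sketch that assumes a newer version. The shortened sample rows, the column name amount, and the temporary output folder are assumptions for illustration.

```powershell
# Requires PowerShell 7+ (for -UseQuotes). Hypothetical, shortened sample rows.
$rows = @'
S00000009,2016,M04 01/07/2016,750.00
S00000010,2015,W28 05/10/2015,2275.00
S00000009,2016,M05 01/08/2016,600.00
'@ | ConvertFrom-Csv -Header service_id, year, period, amount

# Write the demo output to a scratch folder instead of the current directory.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'split-by-service-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

$rows | Group-Object -Property service_id | ForEach-Object {
    $path = Join-Path $outDir ($_.Name + '.csv')
    # -UseQuotes Never suppresses the quotes; Select-Object -Skip 1 drops the header line.
    $_.Group |
        ConvertTo-Csv -UseQuotes Never |
        Select-Object -Skip 1 |
        Set-Content -Path $path
}
```

Note that with quoting switched off entirely, a field that itself contains a comma would no longer round-trip, which is not an issue for the sample data shown here.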

Delete line in CSV if two columns not equal

I have big CSV files; here is an example of the content:
Name;Number;Type;AlterName
Prag;1418;2;2012;Prag
Prag;1836;3;2012;Prag
Prag;1836;514;2012;Moscow
...
I need to delete every line where Name and AlterName are not equal.
In this case:
Prag;1836;514;2012;Moscow
Simply check if the fields are equal.
Import-Csv 'C:\path\to\input.csv' -Delimiter ';' |
Where-Object { $_.Name -eq $_.AlterName } |
Export-Csv 'C:\path\to\output.csv' -Delimiter ';' -NoType
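One thing to watch for: the sample rows in the question have five fields while the header names only four, so with that exact file AlterName would likely be read from the fourth field (2012) rather than the fifth. The sketch below assumes the real data has a fifth column and supplies an explicit header for it; the name Year is purely hypothetical.

```powershell
# Data lines from the question, without the (too short) header row.
$lines = @'
Prag;1418;2;2012;Prag
Prag;1836;3;2012;Prag
Prag;1836;514;2012;Moscow
'@ -split '\r?\n'

# Supply all five column names explicitly; 'Year' is an assumed name.
$kept = $lines |
    ConvertFrom-Csv -Delimiter ';' -Header Name, Number, Type, Year, AlterName |
    Where-Object { $_.Name -eq $_.AlterName }

$kept | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

When reading from a file that still contains its original header line, that line would have to be skipped first, for example with Select-Object -Skip 1 on the output of Get-Content.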

Breaking up CSV Files

So I am looking at breaking up a CSV using PowerShell. The CSV is delimited by |, which isn't a problem, and I am looking to break it up into multiple smaller CSVs while retaining the original. The breaks would be based on the value in a single column matching one of a list of values.
What I have done so far is to import the csv (delimited by |) and then
foreach($line in $csv) {
if($columnValue -like $target1) {
export-csv filename1.csv -Delimiter `| $line -append)}
elseif($columnValue -like $target2) {
export-csv filename2.csv -Delimiter `| $line -append)}
etc.
However, I do not think it is exporting correctly, and I do not want the quotes (yes, I know they are standard, but I do not want them). I also want the header from the original CSV to be applied to the child CSVs, and it is not being applied.
sorry if theres a better way to format the code still new here
Here is where I suggest the awesomeness of the switch statement. It compares something against multiple potential matches and executes the script block of every match.
Switch($csv){
{$_.column -match $target1} {$_ | Export-CSV filename1.csv -append -delimiter '|'}
{$_.column -match $target2} {$_ | Export-CSV filename2.csv -append -delimiter '|'}
{$_.column -match $target3} {$_ | Export-CSV filename3.csv -append -delimiter '|'}
}
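Alternatively, import the file once and filter it per target file:
$data = Import-Csv $csvfile -Delimiter '|'
$data | Where-Object { $_.val -eq $criteria1 } | Export-Csv -Path "File1.csv" -Delimiter '|' -NoTypeInformation
$data | Where-Object { $_.val -eq $criteria2 } | Export-Csv -Path "File2.csv" -Delimiter '|' -NoTypeInformation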
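To address the remaining wishes from the question (header kept in every child file, no quotes), one possible approach is to group the rows and write each group as CSV text with the quotes stripped. This is a sketch under assumptions: the column name category, the sample rows, the target-to-file mapping, and the temporary output folder are all made up for illustration.

```powershell
# Hypothetical pipe-delimited sample data.
$csv = @'
id|category|note
1|fruit|apple
2|tool|hammer
3|fruit|pear
'@ | ConvertFrom-Csv -Delimiter '|'

# Assumed mapping from column value to child file name.
$targets = @{ fruit = 'filename1.csv'; tool = 'filename2.csv' }

$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'split-by-column-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

$csv |
    Where-Object { $targets.ContainsKey($_.category) } |
    Group-Object -Property category |
    ForEach-Object {
        $path = Join-Path $outDir $targets[$_.Name]
        # ConvertTo-Csv emits the header line first, so each child file keeps it.
        # Removing every double quote is naive: it assumes no field contains
        # a quote or the delimiter itself.
        $_.Group |
            ConvertTo-Csv -Delimiter '|' -NoTypeInformation |
            ForEach-Object { $_ -replace '"', '' } |
            Set-Content -Path $path
    }
```

Writing each group in one go also avoids repeated -Append calls, and the original file is never modified.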