I would like to remove duplicates in a CSV file using PowerShell. I know that there are posts about this already but I can't seem to find one that helps.
I'm trying to merge 2 CSV Files that have the same header and then remove the duplicates of the resulting file based on the IDs listed in the first column and then put it to the same CSV file.
The properties of the file are as follows:
And when I try to use the sort and unique method, I get the following (not a table):
Here is my code so far:
####
#MERGE
$getFirstLine = $true
get-childItem "C:\IGHandover\Raw\IG_INC*.csv"| foreach {
$filePath = $_
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "C:\IGHandover\new.csv" $linesToWrite
}
####
#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique |
Set-Content "C:\IGHandover\new.csv"
Don't use Get-Content or Set-Content to import or export CSV files. Use Import-Csv and Export-Csv instead:
Import-Csv (Get-ChildItem 'C:\IGHandover\Raw\IG_INC*.csv') |
Sort-Object -Unique inc_number |
Export-Csv 'C:\IGHandover\new.csv' -NoClobber -NoTypeInformation
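To see what the Sort-Object -Unique step does on its own, here is a minimal sketch on made-up rows. The summary column and its values are assumptions for illustration, not taken from the real files. It keeps one row per distinct inc_number; which of the duplicate rows survives should not be relied on.

```powershell
# Hypothetical sample rows standing in for the merged CSV files
$rows = @'
inc_number,summary
INC002,VPN drops
INC001,Printer offline
INC001,Printer offline (duplicate)
'@ | ConvertFrom-Csv

# Sort on the key column and keep one row per distinct key value
$unique = $rows | Sort-Object -Property inc_number -Unique

$unique | Format-Table
```

If the row that is kept matters (for example, the newest one), the rows would need to be ordered or filtered explicitly before this step.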
I guess you want to update a table (HandoverINC.csv) with records from a new table (New.csv): replace any record in HandoverINC.csv that has the same primary key (inc_number) as a record in New.csv, and add any new records from New.csv to HandoverINC.csv (basically what is called a full join in SQL).
Using the Join-Object described at: https://stackoverflow.com/a/45483110/1701026
Import-CSV .\HandoverINC.csv | FullJoin (Import-CSV .\New.csv) inc_number {$Right.$_} | Export-CSV .\HandoverINC.csv
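If installing the Join-Object function is not an option, the same update-or-add behaviour can be sketched with an ordered dictionary keyed on inc_number. This is not how Join-Object works internally, just a plain alternative; the status column and the sample values are assumptions.

```powershell
# Hypothetical contents of HandoverINC.csv and New.csv
$old = @'
inc_number,status
INC001,Open
INC002,Open
'@ | ConvertFrom-Csv

$new = @'
inc_number,status
INC002,Closed
INC003,Open
'@ | ConvertFrom-Csv

# Later rows overwrite earlier rows that share the same key,
# so records from $new replace matching records from $old.
$byKey = [ordered]@{}
foreach ($row in @($old) + @($new)) {
    $byKey[$row.inc_number] = $row
}

$merged = @($byKey.Values)
$merged | Format-Table
```

With real files, $old and $new would come from Import-Csv and $merged would be piped to Export-Csv -NoTypeInformation.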
As suggested by Lieven Keersmaekers and Vivek Kumar, I've made a few changes in my code:
Put the merged contents to a temporary file
Import the csv file with the merge contents
Sort the column of reference and use the unique parameter
Export the results to a new csv file
I found that my code was similar to Vincent K's:
#MERGE
$getFirstLine = $true
get-childItem "C:\IGHandover\Raw\IG_INC*.csv"|
foreach {
$filePath = $_
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}}
$getFirstLine = $false
Add-Content "C:\IGHandover\HandoverINCtemp.csv" $linesToWrite }
#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\HandoverINCtemp.csv" | Sort inc_number -Unique |
Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force
Remove-Item "C:\IGHandover\HandoverINCtemp.csv"
To simplify (merging and removing duplicates with the same header), as suggested by Vincent:
Import-Csv (Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv") | Sort inc_number -Unique |
Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force
I hope this helps anyone who'd like to do the same with their files.
I'm trying (badly) to work through combining CSV files into one file and prepending a column that contains the file name. I'm new to PowerShell, so hopefully someone can help here.
I tried initially to do the well documented approach of using Import-Csv / Export-Csv, but I don't see any options to add columns.
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv CombinedFile.txt -UseQuotes Never -NoTypeInformation -Append
Next I'm trying to loop through the files and append the name, which kind of works, but for some reason this stops after the first row is generated. Since it's not a CSV process, I have to use the switch to skip the first title row of each file.
$getFirstLine = $true
Get-ChildItem -Filter *.csv | Where-Object {$_.Name -NotMatch "Combined.csv"} | foreach {
$filePath = $_
$collection = Get-Content $filePath
foreach($lines in $collection) {
$lines = ($_.Basename + ";" + $lines)
}
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "Combined.csv" $linesToWrite
}
This is where the -PipelineVariable parameter comes in real handy. You can set a variable to represent the current iteration in the pipeline, so you can do things like this:
Get-ChildItem -Filter *.csv -PipelineVariable File | Where-Object {$_.Name -NotMatch "Combined.csv"} | ForEach-Object { Import-Csv $File.FullName } | Select *,@{l='OriginalFile';e={$File.Name}} | Export-Csv Combined.csv -Notypeinfo
Merging your CSVs into one and adding a column for the file's name can be done as follows, using a calculated property on Select-Object:
Get-ChildItem -Filter *.csv | ForEach-Object {
$fileName = $_.Name
Import-Csv $_.FullName | Select-Object @{
Name = 'FileName'
Expression = { $fileName }
}, *
} | Export-Csv path/to/merged.csv -NoTypeInformation
I am trying to remove unnecessary commas in a column of a CSV file. For now, I have hard-coded the few cases I know about, but I want the code to be dynamic. Any suggestions are greatly appreciated.
$FilePath = "C:\Test\"
Get-ChildItem $FilePath -Filter *.csv | ForEach-Object {
(Get-Content $_.FullName -Raw) | Foreach-Object {
$_ -replace ',"Frederick, Fred",' , ',"Frederick Fred",' `
-replace ',"Brian, Josiah",' , ',"Brian Josiah",' `
-replace ',"Lisinopril ,Tablet / 20MG",' , ',"Lisinopril Tablet / 20MG",'
} | Set-Content $_.FullName
}
Try this. Note that I worked with the CSV sample that you gave here, so it might not work with other CSV files.
Also make sure that you replace %YOURCSVFILE% with the real path of your file.
#import the csv
$csv = Import-Csv -Path %YOURCSVFILE% -Delimiter ','
#going each row and replacing commas
foreach ($desc in $csv){
$desc.Desc = $desc.Desc -replace ',',''
}
#exporting the csv
$csv | Export-csv -NoTypeInformation "noCommas.csv"
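If the affected column is not known in advance, the same idea can be applied to every column by walking each row's properties. This is only a sketch: the input rows are rebuilt from the hard-coded replacements in the question, and the ID and Nbr values are assumed.

```powershell
# Sample rows reconstructed from the question (ID/Nbr values are assumptions)
$rows = @'
ID,Desc,Nbr
12,"Frederick, Fred",11
21,"Brian, Josiah",31
13,"Lisinopril ,Tablet / 20MG",17
'@ | ConvertFrom-Csv

foreach ($row in $rows) {
    # PSObject.Properties exposes every column of the row, whatever it is called
    foreach ($prop in $row.PSObject.Properties) {
        $prop.Value = $prop.Value -replace ','
    }
}

$rows | ConvertTo-Csv -NoTypeInformation
```

Note that this removes commas from all fields, which may be too aggressive if some columns legitimately contain them (for example, decimal commas).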
Here are a few more alternatives for you:
Method 1. Loop through the rows with foreach(..) and capture the output:
$result = foreach ($row in (Import-Csv -Path 'D:\Test\FileWithCommasInDescription.csv')) {
$row.Desc = $row.Desc -replace ','
$row # output the updated item
}
$result | Export-Csv -Path 'D:\Test\FileWithoutCommasInDescription.csv' -NoTypeInformation
Method 2. Use ForEach-Object and the automatic variable $_. Pipe the results through:
Import-Csv -Path 'D:\Test\FileWithCommasInDescription.csv' | ForEach-Object {
$_.Desc = $_.Desc -replace ','
$_ # output the updated item
} | Export-Csv -Path 'D:\Test\FileWithoutCommasInDescription.csv' -NoTypeInformation
Method 3. Use a calculated property:
Import-Csv -Path 'D:\Test\FileWithCommasInDescription.csv' |
Select-Object ID, @{Name = 'Desc'; Expression = {$_.Desc -replace ','}}, Nbr -ExcludeProperty Desc |
Export-Csv -Path 'D:\Test\FileWithoutCommasInDescription.csv' -NoTypeInformation
All will result in a new CSV file:
"ID","Desc","Nbr"
"12","Frederick Fred","11"
"21","Brian Josiah","31"
"13","Lisinopril Tablet / 20MG","17"
Below is the data I have in 2 csv
CSV_1
"Username","UserCreationStatus","GroupAdditionStatus"
"WA92J4063641OAD","Success","Success"
CSV_2
"GroupName","GroupCreationStatus"
"WA92GRP-ADAdminAccount-CAP-OAD","Already exist"
I need to merge them into a single CSV file like below
"Username","UserCreationStatus","GroupAdditionStatus","GroupName","GroupCreationStatus"
"WA92J4063641OAD","Success","Success","WA92GRP-ADAdminAccount-CAP-OAD","Already exist"
I tried the below code
Get-ChildItem -Path $RootPath -Filter *.csv | Select-Object * | Import-Csv | Export-Csv $RootPath\merged.csv -NoTypeInformation -Append
But getting below error
Import-Csv : You must specify either the -Path or -LiteralPath parameters, but not both.
Please let me know what is wrong here
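You could do something like this.
It pairs the two files line by line, so it assumes both have the same number of lines.
Let me know if you have any questions.
$CSV1 = ".\first.csv"
$CSV2 = ".\second.csv"
$NewCSV = ".\new.csv"
# @() keeps a single-line file as an array, so indexing returns whole lines
$Data1 = @(Get-Content -Path $CSV1)
$Data2 = @(Get-Content -Path $CSV2)
# Join line N of the first file with line N of the second file
for ($index = 0; $index -lt $Data1.Count; $index++)
{
Add-Content -Value "$($Data1[$index]),$($Data2[$index])" -Path $NewCSV
}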
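An object-based variant of the same row-by-row pairing is sketched below, using the sample data from the question. It copies the columns of row N from both inputs into one combined object; the variable names are made up for illustration, and it assumes the two files have no column names in common.

```powershell
# Sample data from the question; with real files use Import-Csv instead
$users = @(@'
"Username","UserCreationStatus","GroupAdditionStatus"
"WA92J4063641OAD","Success","Success"
'@ | ConvertFrom-Csv)

$groups = @(@'
"GroupName","GroupCreationStatus"
"WA92GRP-ADAdminAccount-CAP-OAD","Already exist"
'@ | ConvertFrom-Csv)

$merged = for ($i = 0; $i -lt $users.Count; $i++) {
    # Collect the columns of both rows in order, then emit one object
    $row = [ordered]@{}
    foreach ($p in $users[$i].PSObject.Properties)  { $row[$p.Name] = $p.Value }
    foreach ($p in $groups[$i].PSObject.Properties) { $row[$p.Name] = $p.Value }
    [pscustomobject]$row
}

$merged | ConvertTo-Csv -NoTypeInformation
```

Piping $merged to Export-Csv -NoTypeInformation instead of ConvertTo-Csv would write the combined file.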
I use powershell to automate extracting of selected data from a CSV file.
My $target_servers also contains the same server name twice, but with different data in each row.
Here is my code:
$target_servers = Get-Content -Path D:\Users\Tools\windows\target_prd_servers.txt
foreach($server in $target_servers) {
Import-Csv $path\Serverlist_Template.csv | Where-Object {$_.Hostname -Like $server} | Export-Csv -Path $path/windows_prd.csv -Append -NoTypeInformation
}
After executing the above code it extracts CSV data based on a TXT file, but my problem is some of the results are duplicated.
I am expecting around 28 results but it gave me around 49.
As commented, -Append is the culprit here and you should check if the newly added records are not already present in the output file:
# read the Hostname column of the target csv file as array to avoid duplicates
$existingHostsNames = @((Import-Csv -Path "$path/windows_prd.csv").Hostname)
$target_servers = Get-Content -Path D:\Users\Tools\windows\target_prd_servers.txt
foreach($server in $target_servers) {
Import-Csv "$path\Serverlist_Template.csv" |
Where-Object {($_.Hostname -eq $server) -and ($existingHostsNames -notcontains $_.HostName)} |
Export-Csv -Path "$path/windows_prd.csv" -Append -NoTypeInformation
}
You can convert your data to an array of objects and then use select -Unique, like this:
$target_servers = Get-Content -Path D:\Users\Tools\windows\target_prd_servers.txt
$data = @()
foreach($server in $target_servers) {
$data += Import-Csv $path\Serverlist_Template.csv| Where-Object {$_.Hostname -Like $server}
}
$data | select -Unique | Export-Csv -Path $path/windows_prd.csv -Append -NoTypeInformation
This works only if duplicated rows have the same value in every column. If not, you can pass select the column names that matter to you. For example:
$data | select Hostname -Unique | Export-Csv -Path $path/windows_prd.csv -Append -NoTypeInformation
It will give you a list of unique hostnames.
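Another way to avoid the duplicates is to read the template once and filter it in a single pass, so a server name that is listed twice in the text file cannot select the same rows twice. This is a sketch with made-up server names and a made-up Role column, and it assumes exact name matches rather than -Like patterns.

```powershell
# Hypothetical stand-ins for target_prd_servers.txt and Serverlist_Template.csv
$target_servers = 'srv01', 'srv02', 'srv01'
$inventory = @'
Hostname,Role
srv01,web
srv01,backup
srv02,db
srv03,cache
'@ | ConvertFrom-Csv

# De-duplicate the wanted names, then test each CSV row once
$wanted = $target_servers | Select-Object -Unique
$result = @($inventory | Where-Object { $wanted -contains $_.Hostname })

$result | Format-Table
```

Both srv01 rows are kept because they differ in the template, but each appears only once. Writing $result with Export-Csv without -Append also keeps reruns from piling up rows in the output file.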
I have a CSV file which is structured like this:
"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"
The value in the second field (separator ';') marks the lines that belong together, and the value 140000001 or 140000671 in the fifth field of the SA5 line is the trigger that decides which output file the group goes to.
So the result should be:
1st file: 140000001.txt
"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"
2nd file: 140000671.txt
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
For now I found a snippet which splits the big file by the second field:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\\*"
$header = Get-Content -Path $src | select -First 1
Get-Content -Path $src | select -Skip 1 | foreach {
$file = "$(($_ -split ";")[1]).txt"
Write-Verbose "Writing to $file"
$file = $file.Replace('"',"")
if (-not (Test-Path -Path $dstDir\$file))
{
Out-File -FilePath $dstDir\$file -InputObject $header -Encoding ascii
}
$file -replace '"', ""
Out-File -FilePath $dstDir\$file -InputObject $_ -Encoding ascii -Append
}
For the rest I'm in the dark.
Please help.
The Import-CSV cmdlet will work here, if you don't already know about it. I would use that, as it returns all the rows as different objects in an array, with the properties being the column values. And you don't have to manually remove the quotes and such. Assuming the second column is a date time value, and should be unique for each group of 4 consecutive rows, then this will work:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
$DateTimeGroups = $csv | Group-Object -Property 'ColumnTwoHeader'
foreach ($group in $DateTimeGroups) {
$filename = $group.Group.'ColumnFiveHeader' | select -Unique
$group.Group | Export-CSV "$dstDir\$filename.txt" -Append -NoTypeInformation
}
However, this will break if two of those "groups of 4 consecutive rows" have the same value for the second column and the fifth column. There isn't a way to fix this unless you are certain that there will always be 4 consecutive rows in each time group. In which case:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
if ($csv.count % 4 -ne 0) {
Write-Error "CSV does not have a proper number of rows. Attempting to continue will be bad :)"
return
}
for ($i = 0 ; $i -lt $csv.Count ; $i=$i+4) {
$group = $csv[$i..($i+3)] # 4 rows: indexes $i through $i+3
$group | Export-Csv "$dstDir\$($group[3].'ColumnFiveHeader').txt" -Append -NoTypeInformation
}
Just be sure to replace ColumnTwoHeader and ColumnFiveHeader with the appropriate values.
If performance is not a concern, combining Import-Csv / Export-Csv with Group-Object allows the most concise, direct expression of your intent, using PowerShell's ability to convert CSV to objects and back:
$src = "C:\temp\ORD001.txt" # Input CSV file
$dstDir = "C:\temp\files" # Output directory
# Delete previous output files, if necessary.
Remove-Item -Path "$dstDir\*" -WhatIf
# Import the source CSV into custom objects with properties named for the columns.
# Note: The assumption is that your CSV header line defines columns "Col1", "Col2", ...
Import-Csv $src -Delimiter ';' |
# Group the resulting objects by column 2
Group-Object -Property Col2 |
ForEach-Object { # Process each resulting group.
# Determine the output filename via the group's last row's column 5 value.
$outFile = '{0}\{1}.txt' -f $dstDir, $_.Group[-1].Col5
# Append the group at hand to the target file.
$_.Group | Export-Csv -Append -Encoding Ascii $outFile -Delimiter ';' -NoTypeInformation
}
Note:
The assumption - in line with your sample data - is that it is always the last row in a group of lines sharing the same column-2 value whose column 5 contains the root of the output filename (e.g., 140000001)
Sorry, but I don't have a header row. It's a semicolon-separated txt file for an interface.
You can simply read the file with Get-Content, and then search for the trigger in the line.
I hope this small example can help:
$file = Get-Content CSV_File.txt
$140000001 = @()
$140000671 = @()
$bTrig = @()
foreach($line in $file){
$bTrig += $line
if($line -match ';"140000001";'){
$140000001 += $bTrig
$bTrig = @()
}
elseif($line -match ';"140000671";'){
$140000671 += $bTrig
$bTrig = @()
}
}
if($bTrig.Count -ne 0){Write-Warning "No trigger for $bTrig"}
$140000001 | Out-File 140000001.txt -Encoding ascii
$140000671 | Out-File 140000671.txt -Encoding ascii
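A more general sketch that does not hard-code the trigger values: group the raw lines by their second field, then read the trigger from the fifth field of each group's last line. No header row is needed. It assumes, as in the sample data, that the SA5 line is the last line of its group; $linesByTrigger is a made-up name.

```powershell
# Sample lines copied from the question; in practice: $lines = Get-Content $src
$lines = @'
"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"
'@ -split '\r?\n'

# Group the raw lines by their second field
$groups = $lines | Group-Object -Property { ($_ -split ';')[1] }

# Trigger value -> all lines that belong in that output file
$linesByTrigger = @{}
foreach ($g in $groups) {
    # Fifth field of the group's last line, with the surrounding quotes removed
    $trigger = (($g.Group[-1] -split ';')[4]).Trim('"')
    if (-not $linesByTrigger.ContainsKey($trigger)) { $linesByTrigger[$trigger] = @() }
    $linesByTrigger[$trigger] += @($g.Group)
}

# To write the files, something like:
# foreach ($key in $linesByTrigger.Keys) { $linesByTrigger[$key] | Out-File "$key.txt" -Encoding ascii }
```

Because the original lines are kept as text, the output files retain the exact quoting and the varying number of fields per line.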