Remove certain duplicate values from csv file


I am trying to import a CSV file and then create an .xlsx file from the data. My goal is to show each value of Column1 only once, not in every row. The CSV file is already sorted, so checking whether the previous row has the same value would be possible.
CSV
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"Value1A";"Value2B";"Value2C"
"Value1A";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
Expected outcome
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"";"Value2B";"Value2C"
"";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
Current outcome
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"Value1A";"Value2B";"Value2C"
"Value1A";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
Only the duplicate cells in Column1 should be empty.
My code to import the data and add it to Excel:
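$csv = "C:\path\to\file.csv"
$i = 1
# $serverInfoSheet is an Excel worksheet object created earlier (not shown)
Import-Csv $csv -Delimiter ';' | Select-Object -Property Column1,Column2,Column3 | ForEach-Object {
    if ($i -eq 1) {
        # First record only: write the column names as the header row
        $j = 1
        foreach ($prop in $_.PSObject.Properties) {
            $serverInfoSheet.Cells.Item($i, $j++).Value = $prop.Name
        }
        $i++
    }
    # Write this record's values on the next row
    $j = 1
    foreach ($prop in $_.PSObject.Properties) {
        $serverInfoSheet.Cells.Item($i, $j++).Value = $prop.Value
    }
    $i++
}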
For further context, imagine Column1 is a date and Column2 and Column3 are employees.
Example of the expected outcome:
"12/01/2020";"Mark";"Tony"
"";"Mark";"Andrew"
"";"Tony";"Vanessa"
"12/02/2020";"Tony";"Michael"
I don't want the date repeated on every row, because it makes the Excel sheet harder to read.

$Csv = @'
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"Value1A";"Value2B";"Value2C"
"Value1A";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
'@
$Csv | ConvertFrom-Csv -Delimiter ';' |
    ForEach-Object -Begin { $Last1 = $Null } {
        # Blank Column1 when it repeats the previous row's value
        if ( $_.Column1 -eq $Last1 ) { $_.Column1 = '' }
        else { $Last1 = $_.Column1 }
        $_
    } | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
Output:
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"";"Value2B";"Value2C"
"";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
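If the blanking step needs to be reused, for example just before the rows are written into the Excel sheet in the question's loop, the same previous-value check can be wrapped in a pipeline function. This is a sketch that assumes the rows arrive already sorted by the key column; `Hide-RepeatedValue` is a hypothetical helper name, not a built-in cmdlet. It copies each row with `Select-Object *`, so the original objects are left unchanged.

```powershell
# Hypothetical helper (not a built-in cmdlet): blanks a column's value when it
# equals the value in the previous row. Assumes input is sorted by that column.
function Hide-RepeatedValue {
    param(
        [Parameter(Mandatory)] [string] $Property,
        [Parameter(Mandatory, ValueFromPipeline)] [psobject] $InputObject
    )
    begin { $last = $null }
    process {
        # Copy the row so the caller's objects are left unchanged
        $row = $InputObject | Select-Object -Property *
        if ($row.$Property -eq $last) {
            $row.$Property = ''
        } else {
            $last = $row.$Property
        }
        $row
    }
}

# Sample rows based on the date/employee example from the question
$rows = @'
"Column1";"Column2";"Column3"
"12/01/2020";"Mark";"Tony"
"12/01/2020";"Mark";"Andrew"
"12/01/2020";"Tony";"Vanessa"
"12/02/2020";"Tony";"Michael"
'@ | ConvertFrom-Csv -Delimiter ';' | Hide-RepeatedValue -Property Column1

$rows | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

In the question's Excel loop, the input could then be `Import-Csv $csv -Delimiter ';' | Hide-RepeatedValue -Property Column1` instead of the plain `Import-Csv` call.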

Related

How can I add string and create new column in my csv file using PowerShell

In my existing CSV file I have a column called "SharePoint ID", and it looks like this:
1.ylkbq
2.KlMNO
3.
4.MSTeam
6.
7.MSTEAM
8.LMNO83
I'm wondering how I can create a new column in my CSV called "SharePoint Email" and then append "@gmail.com" only to the actual IDs, such as "ylkbq", "KlMNO" and "LMNO83", instead of to every row, including the blank ones. I'd also like to skip "MSTeam" and not transfer it to the new column, since it's not an ID.
$file = "C:\AuditLogSearch\New folder\OriginalFile.csv"
$file2 = "C:\AuditLogSearch\New folder\newFile23.csv"
$add = "@GMAIL.COM"
$properties = @{
    Name = 'Sharepoint Email'
    Expression = {
        switch -Regex ($_.'SharePoint ID') {
            # Not sure what to do here
        }
    }
}, '*'
Import-Csv -Path $file |
    Select-Object $properties |
    Export-Csv $file2 -NoTypeInformation

Using a calculated property with Select-Object, this is how it could look:
$add = "@GMAIL.COM"
$expression = {
    switch($_.'SharePoint ID')
    {
        {[string]::IsNullOrWhiteSpace($_) -or $_ -match 'MSTeam'}
        {
            # Blank value or matches MSTeam: leave this empty
            break
        }
        Default # We can assume these are IDs, append $add
        {
            $_.Trim() + $add
        }
    }
}
Import-Csv $file | Select-Object *, @{
    Name = 'SharePoint Email'
    Expression = $expression
} | Export-Csv $file2 -NoTypeInformation
Sample output
Index SharePoint ID SharePoint Email
----- ------------- ----------------
1     ylkbq         ylkbq@GMAIL.COM
2     KlMNO         KlMNO@GMAIL.COM
3
4     MSTeam
5
6     MSTEAM
7     LMNO83        LMNO83@GMAIL.COM
Since I initially misread the question, here is a more concise expression that reduces it to a single if statement:
$expression = {
    $id = $_.'SharePoint ID'
    if (-not [string]::IsNullOrWhiteSpace($id) -and $id -notmatch 'MSTeam')
    {
        $id.Trim() + $add
    }
}
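To try the concise expression without reading or writing files, it can be run against inline sample data built from the question's values. This sketch assumes the column header is exactly `SharePoint ID`; the `Index` column only stands in for the row numbers shown in the question. `-notmatch` is case-insensitive by default, so both `MSTeam` and `MSTEAM` are skipped.

```powershell
$add = "@GMAIL.COM"

# Append $add only to real IDs: skip blank cells and anything matching MSTeam
$expression = {
    $id = $_.'SharePoint ID'
    if (-not [string]::IsNullOrWhiteSpace($id) -and $id -notmatch 'MSTeam') {
        $id.Trim() + $add
    }
}

# Inline sample data standing in for OriginalFile.csv
$rows = @'
Index,SharePoint ID
1,ylkbq
2,KlMNO
3,
4,MSTeam
6,
7,MSTEAM
8,LMNO83
'@ | ConvertFrom-Csv

# Rows for which the expression outputs nothing get an empty SharePoint Email
$result = $rows | Select-Object *, @{ Name = 'SharePoint Email'; Expression = $expression }
$result | Format-Table -AutoSize
```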

Find out Text data in CSV File Numeric Columns in Powershell

I am very new to PowerShell.
I am trying to validate my CSV file by checking whether there are any text values in my numeric fields. I can define which columns are numeric.
My source data looks like this:
ColA ColB ColC ColD
23 23 ff 100
2.30E+01 34 2.40E+01 23
df 33 ss df
34 35 36 37
I need output like this (only the text values, for each column that contains any):
ColA      ColC      ColD
2.30E+01  ff        df
df        2.40E+01
          ss
I have tried some code, but I'm not getting the results I want; I only get output like this:
System.Object[]
---------------
xxx fff' ddd 3.54E+03
...
This is what I tried:
cls
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$arraycol = @()
$FileCol = @("ColA","ColB","ColC","ColD")
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
# Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
############## Test Datatype (Is-Numeric) ##########
foreach ($col in $FileCol)
{
    foreach ($line in $dif_file) {
        $val = $line.$col
        $isnum = Is-Numeric($val)
        if ($isnum -eq $false) {
            $arrResult += $line.$col
            $arraycol += $col
        }
    }
}
[pscustomobject]@{$arraycol = "$arrResult"} | out-file "C:\Users\$env:username\Desktop\Errors1.csv"
####################
Can someone point me in the right direction?
Thanks

You can try something like this:
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
# Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
# $columns = $dif_file | Get-Member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
# Use this to specify certain columns
$columns = "ColB", "ColC", "ColD"
foreach ($row in $dif_file) {
    foreach ($col in $columns) {
        if ($col -in $columns) {
            if (!(Is-Numeric $row.$col)) {
                $row.$col = ""
            }
        }
    }
}
$dif_file | Export-Csv C:\temp\formatted.txt
This gets the names of the columns to check (either all of them via Get-Member, or a fixed list), looks up the value of each of those columns in every row and replaces it with "" if it is not numeric, and then exports the updated file.

I think the challenge here is that columns without any text data should not be displayed. You can do the following:
$csv = Import-Csv "C:\Users\$env:username\desktop\f2.csv"
$finalprops = [collections.generic.list[string]]@()
$out = foreach ($line in $csv) {
    # Names of the properties whose value is not numeric in this row
    $props = $line.psobject.properties | Where {$_.Value -notmatch '^[\d\.]+$'} |
        Select-Object -Expand Name
    # Remember every property name that had text data in any row
    $props | Where {$_ -notin $finalprops} | Foreach-Object { $finalprops.add($_) }
    if ($props) {
        $line | Select $props
    }
}
$out | Select-Object ($finalprops | Sort)
Given the nature of Format-Table and other tabular output, you only see the properties of the first object in the collection. So if object1 has only ColA, but object2 has ColA and ColB, you only see ColA. That is why the last line selects the combined list of properties on every object.

The output order you want is quite different from the input CSV: you're collecting bad text data by column rather than by order of first occurrence, which requires some extra steps.
test.csv file contents:
ColA,ColB,ColC,ColD
23,23,ff,100
2.30E+01,34,2.40E+01,23
df,33,ss,df
34,35,36,37
Sample code tested to meet your description:
$csvIn = Import-Csv "$PSScriptRoot\test.csv";
# create working data set with headers in same order as input file
$data = [ordered]@{};
$csvIn[0].PSObject.Properties | foreach {
    $data.Add($_.Name, (New-Object System.Collections.ArrayList));
};
# add fields with text data
$csvIn | foreach {
    $_.PSObject.Properties | foreach {
        if ($_.Value -notmatch '^-?[\d\.]+$') {
            $null = $data[$_.Name].Add($_.Value);
        }
    }
}
$removes = @(); # `good` columns (numeric data only) to remove
$rowCount = 0;  # number of bad values in the column with the most
$data.GetEnumerator() | foreach {
    $badCount = $_.Value.Count;
    if ($badCount -eq 0) { $removes += $_.Key; }
    if ($badCount -gt $rowCount) { $rowCount = $badCount; }
}
$removes | foreach { $data.Remove($_); }
0..($rowCount - 1) | foreach {
    $h = [ordered]@{};
    foreach ($key in $data.Keys) {
        $h.Add($key, $data[$key][$_]);
    }
    [PSCustomObject]$h;
} |
    Export-Csv -NoTypeInformation -Path "$PSScriptRoot\text-data.csv";
output file contents:
"ColA","ColC","ColD"
"2.30E+01","ff","df"
"df","2.40E+01",
,"ss",

@Jawad, this is what I finally tried:
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$columns = "ColA","ColB","ColC","ColD"
$dif_file_path = "C:\Users\$env:username\desktop\f1.csv"
$dif_file = Import-Csv -Path $dif_file_path -Delimiter "," | select $columns
$columns = $dif_file | Get-Member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
foreach ($row in $dif_file) {
    foreach ($col in $columns) {
        $val = $row.$col
        $isnum = Is-Numeric($val)
        if ($isnum -eq $false) {
            $arrResult += $col + " " + $row.$col
        }
    }
}
$arrResult | out-file "C:\Users\$env:username\desktop\Errordata.csv"
I get the correct results in my output file, but the order is ambiguous, like this:
ColA ss
ColB 5.74E+03
ColA ss
ColC rrr
ColB 3.54E+03
ColD ss
ColB 8.31E+03
ColD cc
Any idea how to get a proper format? Thanks.
Note: with your suggested code, I get the complete source file with all the data, not just the erroneous values.
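One way to get the erroneous values grouped by column, as in the expected output at the top of the question, is to loop over the columns first and the rows second (the follow-up loops rows first, which interleaves the columns) and to collect objects with a column name and a value instead of joined strings. This is a sketch that keeps the question's `^[\d\.]+$` numeric test; the inline data stands in for f1.csv, and the export line is left commented out.

```powershell
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}

$columns = "ColA", "ColB", "ColC", "ColD"

# Inline sample data standing in for f1.csv
$data = @'
ColA,ColB,ColC,ColD
23,23,ff,100
2.30E+01,34,2.40E+01,23
df,33,ss,df
34,35,36,37
'@ | ConvertFrom-Csv

# Columns in the outer loop, so all values of one column come out together
$errors = foreach ($col in $columns) {
    foreach ($row in $data) {
        if (-not (Is-Numeric ($row.$col))) {
            [pscustomobject]@{ Column = $col; Value = $row.$col }
        }
    }
}

$errors | Format-Table -AutoSize
# $errors | Export-Csv "C:\Users\$env:username\desktop\Errordata.csv" -NoTypeInformation
```

Because each result is an object rather than a string, Export-Csv writes proper `Column` and `Value` columns instead of space-joined text.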

Compare and merge 2 csv files based on 2 first columns with possible duplicate values

I have two CSV files that I've been asked to merge wherever the values in the first column match. Both files may contain duplicate values; if they do, a new row should be created for each of those values. If no match is found, a "no match" value should be written instead (my code writes 'Not Found').
This is the code I'm using; it works except for handling the duplicate values:
Function GetFirstColumnNameFromFile
{
    Param ($CsvFileWithPath)
    $FirstFileFirstColumnTitle = ((Get-Content $CsvFileWithPath -TotalCount 2 | ConvertFrom-Csv).psobject.properties | ForEach-Object {$_.name})[0]
    Write-Output $FirstFileFirstColumnTitle
}
Function CreateMergedFileWithCsv2ColumnOneColumn
{
    Param ($firstColumnFirstFile, $FirstFileFirstColumnTitle, $firstFile, $secondFile, $resultsFile)
    Write-Host "Creating hash table with columns values `"Csv2ColumnOne`" `"Csv2ColumnTwo`" From $secondFile"
    $hashColumnOneColumnTwo2ndFile = @{}
    Import-Csv $secondFile | Where-Object {$firstColumnFirstFile -contains $_.'Csv2ColumnOne'} | ForEach-Object {$hashColumnOneColumnTwo2ndFile[$_.'Csv2ColumnOne'] = $_.Csv2ColumnTwo}
    Write-Host "Complete."
    Write-Host "Creating Merge file with file $firstFile
and column `"Csv2ColumnTwo`" from file $secondFile"
    Import-Csv $firstFile | Select-Object *, @{n='Csv2ColumnOne'; e={
        if ($hashColumnOneColumnTwo2ndFile.ContainsKey($_.$FirstFileFirstColumnTitle)) {
            $hashColumnOneColumnTwo2ndFile[$_.$FirstFileFirstColumnTitle]
        } Else {
            'Not Found'
        }}} | Export-Csv $resultsFile -NoType -Force
    Write-Host "Complete."
}
Function MatchFirstTwoColumnsTwoFilesAndCombineOtherColumnsOneFile
{
    Param ($firstFile, $secondFile, $resultsFile)
    [string]$FirstFileFirstColumnTitle = GetFirstColumnNameFromFile $firstFile
    $FirstFileFirstColumn = Import-Csv $firstFile | Where-Object {$_.$FirstFileFirstColumnTitle} | Select-Object -ExpandProperty $FirstFileFirstColumnTitle
    CreateMergedFileWithCsv2ColumnOneColumn $FirstFileFirstColumn $FirstFileFirstColumnTitle $firstFile $secondFile $resultsFile
}
Function Main
{
    $firstFile = 'C:\Scripts\Tests\test1.csv'
    $secondFile = 'C:\Scripts\Tests\test2.csv'
    $resultsFile = 'C:\Scripts\Tests\testResults.csv'
    MatchFirstTwoColumnsTwoFilesAndCombineOtherColumnsOneFile $firstFile $secondFile $resultsFile
}
Main
The contents of the first CSV file are:
firstName,secondName
1234,Value1
2345,Value1
3456,Value1
4567,Value1
7645,Value3
The contents of the second CSV file are:
Csv2ColumnOne,Csv2ColumnTwo,Csv2ColumnThree
1234,abc,Value1
1234,asd,Value1
3456,qwe,Value1
4567,mnb,Value1
The result is:
"firstName","secondName","Csv2ColumnOne"
"1234","Value1","asd"
"2345","Value1","Not Found"
"3456","Value1","qwe"
"4567","Value1","mnb"
"7645","Value3","Not Found"
Since the second file has a duplicate value of 1234, the result file should be:
"firstName","secondName","Csv2ColumnOne"
"1234","Value1","abc"
"1234","Value1","asd"
"2345","Value1","Not Found"
"3456","Value1","qwe"
"4567","Value1","mnb"
"7645","Value3","Not Found"
Is there a way I can do this?
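The question's code stores a single value per key in `$hashColumnOneColumnTwo2ndFile`, so a later row with the same key overwrites the earlier one, which is why 1234 only shows `asd`. A sketch of one way around this, assuming the match is on `firstName` and `Csv2ColumnOne` and that the merged column should carry the `Csv2ColumnTwo` value as in the expected result: map each key to a list of all its values, then emit one output row per value. Inline data replaces test1.csv and test2.csv, and `$lookup` and `$merged` are illustrative names.

```powershell
# Inline data standing in for test1.csv and test2.csv
$first = @'
firstName,secondName
1234,Value1
2345,Value1
3456,Value1
4567,Value1
7645,Value3
'@ | ConvertFrom-Csv

$second = @'
Csv2ColumnOne,Csv2ColumnTwo,Csv2ColumnThree
1234,abc,Value1
1234,asd,Value1
3456,qwe,Value1
4567,mnb,Value1
'@ | ConvertFrom-Csv

# Map each key to ALL of its Csv2ColumnTwo values, so duplicates are kept
$lookup = @{}
foreach ($row in $second) {
    if (-not $lookup.ContainsKey($row.Csv2ColumnOne)) {
        $lookup[$row.Csv2ColumnOne] = [System.Collections.Generic.List[string]]::new()
    }
    $lookup[$row.Csv2ColumnOne].Add($row.Csv2ColumnTwo)
}

# One output row per matching value; 'Not Found' when the key is missing
$merged = foreach ($row in $first) {
    $key = $row.firstName
    $hits = if ($lookup.ContainsKey($key)) { $lookup[$key] } else { 'Not Found' }
    foreach ($value in $hits) {
        [pscustomobject]@{
            firstName     = $row.firstName
            secondName    = $row.secondName
            Csv2ColumnOne = $value
        }
    }
}

$merged | ConvertTo-Csv -NoTypeInformation
```

With the real files, `$first` and `$second` would come from `Import-Csv`, and the last line would become `$merged | Export-Csv $resultsFile -NoTypeInformation`.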

Loop through csv compare content with an array and then add content to csv

I don't know how to append a string to a CSV file. Here is what I'm doing:
I have two CSV files: one with a list of hostnames and IDs, and another with a list of hostnames and some numbers.
Example file 1:
Hostname | ID
IWBW140004 | 3673234
IWBW130023 | 2335934
IWBW120065 | 1350213
Example file 2:
ServiceCode | Hostname | ID
4 | IWBW120065 |
4 | IWBW140004 |
4 | IWBW130023 |
First I read the contents of file 1 into a two-dimensional array:
$pcMatrix = @(,@())
Import-Csv $outputFile | ForEach-Object {
    foreach ($property in $_.PSObject.Properties) {
        $pcMatrix += ,($property.Value.Split(";")[1], $property.Value.Split(";")[2])
    }
}
Then I read the contents of file 2 and compare them with my array:
Import-Csv $Group".csv" | ForEach-Object {
    foreach ($property in $_.PSObject.Properties) {
        for ($i = 0; $i -lt $pcMatrix.Length; $i++) {
            if ($pcMatrix[$i][0] -eq $property.Value.Split('"')[1]) {
                # Add-Content here
            }
        }
    }
}
What do I need to do to write $pcMatrix[$i][1] into the ID column of the current row in file 2?
Thanks for your suggestions.
Yanick

It seems like you are over-complicating this task.
If I understand you correctly, you want to populate the ID column in file 2 with the ID that corresponds to the matching hostname in file 1. The easiest way to do that is to load all the values from the first file into a HashTable and use it to look up the ID for each row in the second file:
# Read the first file and populate the HashTable:
$File1 = Import-Csv .\file1.txt -Delimiter "|"
$LookupTable = @{}
$File1 | ForEach-Object {
    $LookupTable[$_.Hostname] = $_.ID
}
# Now read the second file and update the ID values:
$File2 = Import-Csv .\file2.txt -Delimiter "|"
$File2 | ForEach-Object {
    $_.ID = $LookupTable[$_.Hostname]
}
# Then write the updated rows back to a new CSV file:
$File2 | Export-CSV -Path .\file3.txt -NoTypeInformation -Delimiter "|"
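A self-contained sketch of the same HashTable lookup, using inline comma-separated data in place of the pipe-delimited files. It also marks hostnames that are missing from file 1 with a `Not Found` placeholder (an assumption; the code above would simply leave the ID empty), and `IWBW999999` is a hypothetical hostname added only to show that case. If the real files contain spaces around the `|` delimiter, as the examples suggest, the names and values may need trimming before they match.

```powershell
# Inline data standing in for file1.txt and file2.txt
$File1 = @'
Hostname,ID
IWBW140004,3673234
IWBW130023,2335934
IWBW120065,1350213
'@ | ConvertFrom-Csv

# IWBW999999 is a hypothetical hostname that is not in file 1
$File2 = @'
ServiceCode,Hostname,ID
4,IWBW120065,
4,IWBW140004,
4,IWBW130023,
4,IWBW999999,
'@ | ConvertFrom-Csv

# Hostname -> ID lookup built from the first file
$LookupTable = @{}
foreach ($row in $File1) {
    $LookupTable[$row.Hostname] = $row.ID
}

# Fill in the ID column; mark hostnames that file 1 does not know
foreach ($row in $File2) {
    if ($LookupTable.ContainsKey($row.Hostname)) {
        $row.ID = $LookupTable[$row.Hostname]
    } else {
        $row.ID = 'Not Found'
    }
}

$File2 | Format-Table -AutoSize
```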

Get a variable by dynamic variable name

How does one access data imported from a CSV file using dynamic note property names? That is, the column names are not known beforehand. They do match a pattern and are extracted from the CSV file when the script runs.
As an example, consider this CSV file:
"Header 1","Header A","Header 3","Header B"
0,0,0,0
1,2,3,4
5,6,7,8
I'd like to extract only the columns whose names end with a letter. To do this, I read the header row and extract the names with a regex, like so:
$reader = new-object IO.StreamReader("C:\tmp\data.csv")
$line = $reader.ReadLine()
$headers = @()
$line.Split(",") | % {
    $m = [regex]::match($_, '("Header [A-Z]")')
    if ($m.Success) { $headers += $m.value }
}
This will get all the column names I care about:
"Header A"
"Header B"
Now, to access the CSV data I import it like so:
$csvData = import-csv "C:\tmp\data.csv"
Import-Csv creates custom objects with properties named after the header row, and the fields can be accessed by NoteProperty name like so:
$csvData | % { $_."Header A" } # Works fine
This obviously requires knowing the column name in advance. I'd like to use the column names I extracted and stored in $headers. How would I do that?
Some things I've tried so far:
$csvData | % { $_.$headers[0] } # Error: Cannot index into a null array.
$csvData | % { $np = $headers[0]; $_.$np } # Doesn't print anything.
$csvData | % { $_.$($headers[0]) } # Doesn't print anything.
I could change the script so that it writes another script that does know the column names. Is that my only solution?

I think you want this:
[string[]]$headers = $csvdata | gm -MemberType "noteproperty" |
?{ $_.Name -match "Header [a-zA-Z]$"} |
select -expand Name
$csvdata | select $headers
This chooses the headers that match the condition (in this case, the ones ending with a letter) and then selects the CSV data for those headers.

The first thing (and the only one, sorry) that came to my mind is:
$csvData | % { $_.$(( $csvData | gm | ? { $_.membertype -eq "noteproperty"} )[0].name) }
to get the first column's values, and
$csvData | % { $_.$(( $csvData | gm | ? { $_.membertype -eq "noteproperty"} )[1].name) }
for the second column, and so on. Note that Get-Member (gm) lists the properties sorted by name rather than in column order, so an index may not match the column's position in the file.
Is this what you need?

You can use a custom script to parse the CSV manually:
$content = Get-Content "C:\tmp\data.csv"
$header = $content | Select -First 1
$columns = $header.Split(",")
$indexes = @()
for ($i = 0; $i -lt $columns.Count; $i++)
{
    # keep columns whose name (without the surrounding quotes) ends with a letter
    if ($columns[$i].Trim('"') -match "[A-Za-z]$")
    {
        $indexes += $i
    }
}
$outputFile = "C:\tmp\outdata.csv"
Remove-Item $outputFile -ea 0
foreach ($line in $content)
{
    $rowcol = $line.Split(",")
    [string]::Join(",", ($indexes | foreach { $rowcol[$_] })) | Add-Content $outputFile
}
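The attempts in the question most likely fail for two reasons. `$_.$headers[0]` is parsed as `($_.$headers)[0]`, which looks up a property named after the whole array, gets `$null`, and then tries to index it. The other two attempts fail because the regex match keeps the surrounding double quotes, so `$headers[0]` is `"Header A"` with literal quote characters, while Import-Csv strips them from the property names. A minimal sketch that takes the names from the imported objects instead, so no quotes are involved, and then reads each value through a variable property name (inline data stands in for C:\tmp\data.csv):

```powershell
# Inline data standing in for C:\tmp\data.csv
$csvData = @'
"Header 1","Header A","Header 3","Header B"
0,0,0,0
1,2,3,4
5,6,7,8
'@ | ConvertFrom-Csv

# Column names from the first object, in file order and without quotes
$headers = $csvData[0].PSObject.Properties.Name | Where-Object { $_ -match '[A-Za-z]$' }

# Read each value through a variable property name
$values = foreach ($row in $csvData) {
    foreach ($name in $headers) {
        $row.$name
    }
}

# Or keep only those columns as objects
$subset = $csvData | Select-Object -Property $headers
$subset | Format-Table -AutoSize
```

Keeping the question's StreamReader approach should also work if each captured name is stripped with `$m.Value.Trim('"')` before it is added to `$headers`.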
