Specific match between two CSV files - powershell

I have two CSV files like this:
CSV1:
Name
test;
test & example;
test & example & testagain;
CSV2:
Name
test1;
test&example;
test & example&testagain;
I want to compare each line of CSV1 with each line of CSV2 and, if the first 5 letters match, write the result.
I'm able to compare them, but only when the names match exactly:
$CSV1 = Import-Csv -Path ".\client.csv" -Delimiter ";"
$CSV2 = Import-Csv ".\client1.csv" -Delimiter ";"
foreach ($record in $CSV1) {
$result = $CSV2 | Where {$_.name -like $record.name}
$result
}

You can do so with Compare-Object and a custom property definition.
Compare-Object $CSV1 $CSV2 -Property {$_.name -replace '^(.{5}).*', '$1'} -PassThru
$_.name -replace '^(.{5}).*', '$1' will take the first 5 characters from the property name (or less if the string is shorter than 5 characters) and remove the rest. This property is then used for comparing the records from $CSV1 and $CSV2. The parameter -PassThru makes the cmdlet emit the original data rather than objects with just the custom property. In theory you could also use $_.name.Substring(0, 5) instead of a regular expression replacement for extracting the first 5 characters. However, that would throw an error if the name is shorter than 5 characters like in the first record from $CSV1.
By default Compare-Object outputs the differences between the input objects, so you also need to add the parameters -IncludeEqual and -ExcludeDifferent to get just the matching records.
Pipe the result through Select-Object * -Exclude SideIndicator to remove the property SideIndicator from the output.
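Putting these pieces together, the full command might look like the sketch below. The inline objects are an assumption standing in for the imported client.csv and client1.csv, and $firstFive is just an illustrative variable name. Note that Compare-Object pairs each record at most once, so it can report fewer matches than an approach that tests every record of one file against every record of the other.

```powershell
# Sample data standing in for the two imported CSV files (assumed, based on the question)
$CSV1 = 'test', 'test & example', 'test & example & testagain' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }
$CSV2 = 'test1', 'test&example', 'test & example&testagain' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }

# Calculated property: the first 5 characters of Name (the whole name if it is shorter)
$firstFive = { $_.Name -replace '^(.{5}).*', '$1' }

# -IncludeEqual -ExcludeDifferent keeps only the matches, -PassThru returns the original
# records, and Select-Object strips the SideIndicator property that Compare-Object adds
$same = Compare-Object $CSV1 $CSV2 -Property $firstFive -IncludeEqual -ExcludeDifferent -PassThru |
    Select-Object * -ExcludeProperty SideIndicator

$same
```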

foreach ($record in $CSV1) {
$CSV2 | Where {"$($_.name)12345".SubString(0, 5) -eq "$($record.name)12345".SubString(0, 5)} |
ForEach {[PSCustomObject]@{Name1 = $Record.Name; Name2 = $_.Name}}
}
or:
... | Where {($_.name[0..4] -Join '') -eq ($record.name[0..4] -Join '')} | ...
Using this Join-Object cmdlet:
$CSV1 | Join $CSV2 `
-Using {($Left.name[0..4] -Join '') -eq ($Right.name[0..4] -Join '')} `
-Property @{Name1 = {$Left.Name}; Name2 = {$Right.Name}}
All the above result in:
Name1 Name2
----- -----
test & example; test & example&testagain;
test & example & testagain; test & example&testagain;

Related

How can I find matching string name in the same object and then add value that is contain on each of the matching property?

I created a powershell command to collect and sort a txt file.
Input example:
a,1
a,1
b,3
c,4
z,5
The output that I have to get:
a,2
b,3
c,4
z,5
Here is my code so far:
$filename = 'test.txt'
Get-Content $filename | ForEach-Object {
$Line = $_.Trim() -Split ','
New-Object -TypeName PSCustomObject -Property @{
Alphabet= $Line[0]
Value= [int]$Line[1]
}
}
example with negative value input
a,1,1
a,1,2
b,3,1
c,4,1
z,5,0
Import your text file as a CSV file using Import-Csv with given column names (-Header), which parses the lines into objects with the column names as property names.
Then use Group-Object to group the objects by shared .Letter values, i.e. the letter that is in the first field of each line.
Using ForEach-Object, process each group of objects (lines), and output a single string that contains the shared letter and the sum of all .Number property values across the objects that make up the group, obtained via Measure-Object -Sum:
@'
a,1
a,1
b,3
c,4
z,5
'@ > ./test.txt
Import-Csv -Header Letter, Number test.txt |
Group-Object Letter |
ForEach-Object {
'{0},{1}' -f $_.Name, ($_.Group | Measure-Object -Sum -Property Number ).Sum
}
Note: The above OOP approach is flexible, but potentially slow.
Here's a plain-text alternative that will likely perform better:
Get-Content test.txt |
Group-Object { ($_ -split ',')[0] } |
ForEach-Object {
'{0},{1}' -f $_.Name, ($_.Group -replace '^.+,' | Measure-Object -Sum).Sum
}
See also:
-split, the string splitting operator
-replace, the regular-expression-based string replacement operator
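As a small illustration of what the two operators do in the plain-text approach, here is a sketch on a few sample lines. The variable names are made up for this example.

```powershell
$lines = 'a,1', 'a,1', 'b,3'

# -split returns the fields of one line; index 0 is the letter used as the group key
$keys = $lines | ForEach-Object { ($_ -split ',')[0] }

# -replace without a replacement operand removes the match; applied to an array,
# it processes every element, leaving only the text after the last comma
$values = $lines -replace '^.+,'

$keys -join ' '      # a a b
$values -join ' '    # 1 1 3
```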
Grouping with summing of multiple fields:
It is easy to extend the OOP approach: add another header field to name the additional column, and add another output field that sums that added column's values for each group too:
@'
a,1,10
a,1,10
b,3,30
'@ > ./test.txt
Import-Csv -Header Letter, NumberA, NumberB test.txt |
Group-Object Letter |
ForEach-Object {
'{0},{1},{2}' -f $_.Name,
($_.Group | Measure-Object -Sum -Property NumberA).Sum,
($_.Group | Measure-Object -Sum -Property NumberB).Sum
}
Output (note the values in the a line):
a,2,20
b,3,30
Extending the plain-text approach requires a bit more work:
@'
a,1,10
a,1,10
b,3,30
c,4,40
z,5,50
'@ > ./test.txt
Get-Content test.txt |
Group-Object { ($_ -split ',')[0] } |
ForEach-Object {
'{0},{1},{2}' -f $_.Name,
($_.Group.ForEach({ ($_ -split ',')[1] }) | Measure-Object -Sum).Sum,
($_.Group.ForEach({ ($_ -split ',')[2] }) | Measure-Object -Sum).Sum
}
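One way to go about this is to use Group-Object to count identical lines, and then replace the number after the comma with that count. Note that this only produces the sum when the duplicated lines are identical and each carries the value 1, as in the sample input.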
$filename = 'test.txt'
Get-Content $filename | Group-Object |
ForEach-Object -Process {
if ($_.Count -ne 1) {
$_.Name -replace '\d+$',$_.Count
}
else {
$_.Name
}
} | ConvertFrom-Csv -Header 'Alphabet','Value'

Issue with Array subtraction in Powershell

I have two CSVs as follows:
File1:
A
20180809000
20180809555
20180809666
20180809777
20180809888
File2:
A
20180809000
20180809555
20180809666
20180809777
I want to find difference of File1 - File2 which should output 20180809888. I tried the following:
$a1= Import-Csv -Path $file1 | select A
$a2 = Import-Csv -Path $file2 | select A
$a1| where {$a2 -notcontains $_}
But it outputs the entire file 1:
A
--------------
20180809000
20180809555
20180809666
20180809777
20180809888
I tried intersection also, but that outputs null.
The simplest solution is to use:
> Compare-Object (Get-Content .\File1.csv) (Get-Content .\File2.csv) -PassThru
20180809888
Or using Import-Csv
> Compare-Object (Import-Csv .\File1.csv).A (Import-Csv .\File2.csv).A -Passthru
20180809888
Or
> (Compare-Object (Import-Csv .\File1.csv) (Import-Csv .\File2.csv) -Passthru).A
20180809888
Your last line should be the following:
$a1.A.where{$_ -notin $a2.A}
To preserve the column, you can do the following for the last line:
$a1.where{$_.A -notin $a2.A}
The problem with this approach arises when the second file has more data than the first file. In that case, you would need to do something like this for your last line:
$a1 | compare $a2 | select -expand inputobject
select A will still return an object with a property named A.
# Returns an object list with property A
Import-Csv -Path $file | select A # (shorthand for Select-Object -Property A)
# A
# ---
# value1
# value2
# ...
You can get the array of values of property A using dot notation, e.g.:
# Returns the list of values of the A property
(Import-Csv -Path $file).A
# value1
# value2
# ...
The following should work:
$a1= (Import-Csv -Path $file1).A
$a2 = (Import-Csv -Path $file2).A
$a1 | where {$a2 -notcontains $_}
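To see why the original attempt returned every row, here is a small sketch comparing the two approaches side by side. The inline data is an assumption, shortened from the question; $byObject and $byValue are illustrative names.

```powershell
# Inline data standing in for File1 and File2 (shortened sample)
$file1 = @'
A
20180809000
20180809555
20180809888
'@ | ConvertFrom-Csv

$file2 = @'
A
20180809000
20180809555
'@ | ConvertFrom-Csv

# Comparing whole objects: no object in $file2 is the same instance as one in $file1,
# so -notcontains is true for every row and nothing is filtered out
$byObject = @($file1 | Where-Object { $file2 -notcontains $_ })

# Comparing the values of property A (plain strings) gives the expected difference
$byValue = @($file1.A | Where-Object { $file2.A -notcontains $_ })

$byObject.Count   # 3
$byValue          # 20180809888
```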

Where-Object leaving blank rows

I'm again stuck on something that should be so simple. I have a CSV file in which I need to do a few string modifications and export it back out. The data looks like this:
FullName
--------
\\server\project\AOI
\\server\project\AOI\Folder1
\\server\project\AOI\Folder2
\\server\project\AOI\Folder3\User
I need to do the following:
Remove the "\\server\project" from each line but leave the rest of the line
Delete all rows which do not have a Folder (e.g., in the example above, the first row would be deleted but the other three would remain)
Delete any row with the word "User" in the path
Add a column called T/F with a value of "FALSE" for each record
Here is my initial attempt at this:
Get-Content C:\Folders.csv |
% {$_.replace('\\server\project\','')} |
Where-Object {$_ -match '\\'} |
#Removes User Folders rows from CSV
Where-Object {$_ -notmatch 'User'} |
Out-File C:\Folders-mod.csv
This works to a certain extent, except it deletes my header row and I have not found a way to add a column using Get-Content. For that, I have to use Import-Csv, which is fine, but it seems inefficient to be constantly reloading the same file. So I tried rewriting the above using Import-Csv instead of Get-Content:
$Folders = Import-Csv C:\Folders.csv
foreach ($Folder in $Folders) {
$Folder.FullName = $Folder.FullName.Replace('\\server\AOI\', '') |
Where-Object {$_ -match '\\'} |
Where-Object {$_ -notmatch 'User Files'}
}
$Folders | Export-Csv C:\Folders-mod.csv -NoTypeInformation
I haven't added the coding for adding the new column yet, but this keeps the header. However, I end up with a bunch of empty rows where the Where-Object deletes the line, and the only way I can find to get rid of them is to run the output file through a Get-Content command. This all seems overly complicated for something that should be simple.
So, what am I missing?
Thanks to TheMadTechnician for pointing out what I was doing wrong. Here is my final script (with additional column added):
$Folders= Import-CSV C:\Folders.csv
ForEach ($Folder in $Folders)
{
$Folder.FullName = $Folder.FullName.replace('\\server\project\','')
}
$Folders | Where-Object {$_.FullName -match '\\' -and $_.FullName -notmatch 'User'} |
Select-Object *,@{Name='T/F';Expression={'FALSE'}} |
Export-CSV C:\Folders.csv -NoTypeInformation
I would do this with a Table Array and pscustomobject.
#Create an empty Array
$Table = @()
#Manipulate the data
$Fullname = Get-Content C:\Folders.csv |
ForEach-Object {$_.replace('\\server\project\', '')} |
Where-Object {$_ -match '\\'} |
#Removes User Folders rows from CSV
Where-Object {$_ -notmatch 'User'}
#Define custom objects
Foreach ($name in $Fullname) {
$Table += [pscustomobject]@{'Fullname' = $name; 'T/F' = 'FALSE'}
}
#Export results to new csv
$Table | Export-CSV C:\Folders-mod.csv -NoTypeInformation
here's yet another way to do it ... [grin]
$FileList = @'
FullName
\\server\project\AOI
\\server\project\AOI\Folder1
\\server\project\AOI\Folder2
\\server\project\AOI\Folder3\User
'@ | ConvertFrom-Csv
$ThingToRemove = '\\server\project'
$FileList |
Where-Object {
# toss out any blank lines
$_ -and
# toss out any lines with "user" in them
$_ -notmatch 'User'
} |
ForEach-Object {
[PSCustomObject]@{
FullName = $_.FullName -replace [regex]::Escape($ThingToRemove)
'T/F' = $False
}
}
output ...
FullName T/F
-------- ---
\AOI False
\AOI\Folder1 False
\AOI\Folder2 False
notes ...
putting a slash in the property name is ... icky [grin]
that requires wrapping the property name in quotes every time you need to access it. try another name - perhaps "Correct".
you can test for blank array items [lines] with $_ all on its own
the [regex]::Escape() stuff is really quite handy
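As a quick illustration of that last note, here is a sketch of what [regex]::Escape() produces for the path prefix used above. Without escaping, the -replace operator would read sequences such as \s and \p as regex escapes rather than as literal text. The $path and $pattern variables are made up for this example.

```powershell
$ThingToRemove = '\\server\project'
$path = '\\server\project\AOI\Folder1'

# Escape() backslash-escapes regex metacharacters so the text is matched literally
$pattern = [regex]::Escape($ThingToRemove)
$pattern                            # \\\\server\\project

# -replace without a replacement operand deletes the match
$path -replace $pattern             # \AOI\Folder1

# The String.Replace() method is already literal, so it needs no escaping
$path.Replace($ThingToRemove, '')   # \AOI\Folder1
```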

Get first two items positionally from Import-CSV row

I have a series of files that have changed some header naming and column counts over time. However, the files always have the first column as the start date and second column as the end date.
I would like to get just these two columns, but the name has changed over time.
What I have tried is this:
$FileContents=Import-CSV -Path "$InputFilePath"
foreach ($line in $FileContents)
{
$StartDate=$line[0]
$EndDate=$line[1]
}
...but $FileContents is (I believe) an array of a type (objects?) that I'm not sure how to positionally access in PowerShell. Any help would be appreciated.
Edit: The files switched from comma delimiter to pipe delimiter a while back and there are 1000s of files to work with, so I use Import-CSV because it can implicitly read either format.
You could use the -Header parameter to give the first two columns of the CSV the header names you want. Then skip the first line, which holds the old header.
$FileContents = Import-CSV -Path "$InputFilePath" -Header "StartDate","EndDate" | Select-Object "StartDate","EndDate" -Skip 1
foreach ($line in $FileContents) {
$StartDate = $line.StartDate
$EndDate = $line.EndDate
}
Here's an example:
Example.csv
a,b,c
1,2,3
4,5,6
Import-CSV -Path Example.csv -Header "StartDate","EndDate" | Select-Object "StartDate","EndDate" -Skip 1
StartDate EndDate
--------- -------
1 2
4 5
If you use Import-Csv, PowerShell will indeed create an object for each row. The "columns" are called properties. You can select properties with Select-Object, but you have to name the properties you want to select. Since you don't know the property names in advance, you can read them from the objects themselves. Note that Get-Member lists members sorted alphabetically by name, so it matches the column order only by coincidence; the PSObject.Properties collection keeps the original column order, so its first two entries match the first two columns in your CSV.
Use the following sample code and apply it to your script:
$csv = @'
1,2,3,4,5
a,b,c,d,e
g,h,i,j,k
'@
$csv = $csv | ConvertFrom-Csv
$properties = $csv[0].PSObject.Properties.Name | Select-Object -First 2
$csv | Select-Object -Property $properties
How about this:
$FileContents=get-content -Path "$InputFilePath"
for ($i=0;$i -lt $FileContents.count;$i++){
$textrow = ($FileContents[$i]).split(",")
$StartDate=$textrow[0]
$EndDate=$textrow[1]
#do what you want with the variables
write-host $startdate
write-host $EndDate
}
This assumes you are referencing a comma-delimited CSV file.
Another solution with ForEach-Object (% is an alias for it) and -split:
Get-Content "example.csv" | select -skip 1 | %{$row=$_ -split ',', 3; [pscustomobject]@{NewCol1=$row[0];NewCol2=$row[1]}}
You can build the calculated properties into the select too, like this:
Get-Content "example.csv" | select @{N="Newcol1";E={($_ -split ',', 3)[0]}}, @{N="Newcol2";E={($_ -split ',', 3)[1]}} -skip 1
With ConvertFrom-Csv:
Get-Content "example.csv" | ConvertFrom-Csv -Delimiter ',' -Header col1, col2 | select -skip 1
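Since the question mentions that the files use either a comma or a pipe as the delimiter, the -Header idea can be combined with a delimiter check. The following is only a sketch: Get-DateRange is a hypothetical helper name, the sample files are made up for the example, and it assumes that a pipe character never appears in the header of a comma-delimited file.

```powershell
# Hypothetical helper: returns the first two columns as StartDate/EndDate,
# whichever of the two delimiters the file uses
function Get-DateRange {
    param([Parameter(Mandatory)][string]$Path)

    # Peek at the header line to decide on the delimiter
    $header = Get-Content -Path $Path -TotalCount 1
    $delimiter = if ($header -like '*|*') { '|' } else { ',' }

    # -Header names the first two columns (further columns are dropped);
    # Select-Object -Skip 1 discards the original header row
    Import-Csv -Path $Path -Delimiter $delimiter -Header StartDate, EndDate |
        Select-Object -Skip 1 -Property StartDate, EndDate
}

# Sample files (made up for this example) written to the temp folder
$commaFile = Join-Path ([IO.Path]::GetTempPath()) 'sample-comma.csv'
$pipeFile  = Join-Path ([IO.Path]::GetTempPath()) 'sample-pipe.csv'
Set-Content -Path $commaFile -Value 'Begin,Finish,Note', '2018-01-01,2018-01-31,x'
Set-Content -Path $pipeFile  -Value 'From|To', '2018-02-01|2018-02-28'

$rows = $commaFile, $pipeFile | ForEach-Object { Get-DateRange $_ }
$rows
```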

Comparing two files: Single column in FirstFile - Multiple columns in SecondFile

I've figured out how to compare single columns in two files, but I can't figure out how to compare two files with one column in the first and multiple columns in the second. Both contain emails.
First file.csv (contains a single column with emails)
john@email.com
jack@email.com
jill@email.com
Second file.csv (contains multiple columns with emails)
john@email.nl,john@email.eu,john@email.com
jill@email.se,jill@email.com,jill@email.us
By comparing, I would like to output the difference. This would result in:
Output.csv
jack@email.com
Anyone able to help me? :)
Single columns comparison and output difference
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Since the first file already contains one email address per line, you can import it right away.
Take the second file and split the lines that contain several addresses.
This generates a new array of separate addresses.
Judging from your output, you only seek addresses that are within the first csv but not in the second.
Your code could look like this:
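$firstFile = Get-Content 'FirstFile.csv'
# Split every line of the second file into individual addresses
$secondFile = (Get-Content 'SecondFile.csv').Split(',')
# Keep the addresses that occur only in the first file. Set-Content writes the plain
# strings; Export-Csv would write each string's Length property instead.
$firstFile | Where-Object { $_ -notin $secondFile } | Set-Content 'Output.csv'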
If you want to keep your code, you could consider a script like this:
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
Rename-Item .\users-emails.csv users-emails.csv.bk
(Get-Content .\users-emails.csv.bk).replace(',', "`r`n") | Set-Content .\users-emails.csv
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv
Rename-Item .\users-emails.csv.bk users-emails.csv
or, more simply:
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
(Get-Content .\users-emails.csv).replace(',', "`r`n") | Set-Content .\users-emails.csv.bk
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv.bk | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv.bk
None of the suggestions so far works :(
Still hoping :)
Will delete comment when happy :p
Can you try this?
$One = (Get-Content .\FirstFile.csv).Split(',')
$Two = (Get-Content .\SecondFile.csv).Split(',')
$CsvPath = '.\Output.csv'
$Diff = @()
(Compare-Object ($One | Sort-Object) ($two | Sort-Object)| `
Where-Object {$_.SideIndicator -eq '<='}).inputobject | `
ForEach-Object {$Diff += New-Object PSObject -Property @{email=$_}}
$Diff | Export-Csv -Path $CsvPath -NoTypeInformation
Output.csv will contain the entries that exist in FirstFile but not in SecondFile.