Compare column values and merge - powershell

I am trying to merge two CSV files that share a common column name, but one has 22 rows and the other just 16.
1st CSV:
Name           Service_StatusA
IEClient       running
IE Nomad       running
Data Usage     running
Print Spooler  running
Server         running

2nd CSV:
Name           Service_StatusB
IEClient       Manual
IE Nomad       running
Print Spooler  Manual
Server         running
I want to merge these into a single CSV:
Name           Service_StatusA  Service_StatusB
IEClient       running          Manual
IE Nomad       running          running
Data Usage     running
Print Spooler  running          Manual
Server         running          running
$file1 = Import-Csv -Path .\PC1.csv
$file2 = Import-Csv -Path .\PC2.csv
$report = @()
foreach ($line in $file1)
{
    $match = $file2 | Where-Object {$_.Name -eq $line.Name}
    if ($match)
    {
        $row = "" | Select-Object 'Name','Service_StatusA','Service_StatusB'
        $row.Name = $line.Name
        $row.'Service_StatusA' = $line.'Service_StatusA'
        $row.'Service_StatusB' = $match.'Service_StatusB'
        $report += $row
    }
}
$report | Export-Csv .\mergetemp.csv -NoTypeInformation -Force
How do I compare the row values before merging?

In SQL database terms you want a left join, but your code is doing an inner join. In set terms, you are producing the intersection of PC1.csv and PC2.csv (only the rows which appear in both), but you want the union of PC1.csv with that intersection (all rows from PC1.csv, with matching data from PC2.csv added where it exists).
You want every row in the first CSV to become a row in the output CSV. That should be the starting point: always output something in your loop. At the moment you only output from inside the if () test. Matching rows in the second CSV should have their data added in if they exist, but should not change the amount of output.
$file1 = Import-Csv -Path .\PC1.csv
$file2 = Import-Csv -Path .\PC2.csv
$report = foreach ($line in $file1)
{
    # always make an output line for each row in file1
    $row = "" | Select-Object 'Name','Service_StatusA','Service_StatusB'
    $row.Name = $line.Name
    $row.'Service_StatusA' = $line.'Service_StatusA'

    # if there is a matching line in file2, add its data in
    $match = $file2 | Where-Object {$_.Name -eq $line.Name}
    if ($match)
    {
        $row.'Service_StatusB' = $match.'Service_StatusB'
    }

    # always output a row for a row in file1
    $row
}
$report | Export-Csv .\mergetemp.csv -NoTypeInformation -Force
(It is possible that what you want is a SQL full outer join, where rows in PC2.csv that are not in PC1.csv also create an output row, but your example does not show that; see the sketch below.)
(I took out $report += because it's more code that runs slower, which is an annoying combination; assigning the foreach output directly avoids rebuilding the array on every iteration.)
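If you do want the full outer join, a minimal sketch (assuming the same PC1.csv / PC2.csv layout as above) is to also emit a row for each name that appears only in the second file:
$file1 = Import-Csv -Path .\PC1.csv
$file2 = Import-Csv -Path .\PC2.csv
$report = @(
    foreach ($line in $file1)
    {
        $row = "" | Select-Object 'Name','Service_StatusA','Service_StatusB'
        $row.Name = $line.Name
        $row.'Service_StatusA' = $line.'Service_StatusA'
        $match = $file2 | Where-Object {$_.Name -eq $line.Name}
        if ($match) { $row.'Service_StatusB' = $match.'Service_StatusB' }
        $row
    }
    # names that only exist in file2 still produce a row, with Service_StatusA left empty
    foreach ($line in $file2 | Where-Object {$file1.Name -notcontains $_.Name})
    {
        $row = "" | Select-Object 'Name','Service_StatusA','Service_StatusB'
        $row.Name = $line.Name
        $row.'Service_StatusB' = $line.'Service_StatusB'
        $row
    }
)
$report | Export-Csv .\mergetemp.csv -NoTypeInformation -Force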

Related

PowerShell: list CSV file rows where at least one value between the 3rd and last column is equal to "0" or "1"

In my PowerShell script, I'm working with a CSV file that looks like this (with a number of rows and columns that can vary, but there will always be at least the headers and the first 2 columns):
OS;IP;user0;user1;user3
Windows;10.0.0.1;;;
Linux;hostname2;0;;1
Linux;10.0.0.3;;0;0
Linux;hostname4;;;
Windows;hostname5;1;1;1
I basically list servers in the first column and users in the first row (CSV header). This represents a user "access granting" matrix for servers (1 for "give access", 0 for "remove access", and blank for "don't change").
I'm looking for a way to extract only the rows that include a value equal to "1" or "0" in any column from the 3rd to the last (inclusive), i.e. to eventually get the list of servers where access rights should be changed.
So taking the above example, I only want the following lines returned:
Linux;hostname2;0;;1
Linux;10.0.0.3;;0;0
Windows;hostname5;1;1;1
Any hints to make this possible? Or the opposite (getting the ones without any 0 or 1)?
Even if it means using "Get-Content" instead of "Import-CSV". I don't care about the 1st (headers) row; I know how to exclude that.
Thank you!
--- Final solution, thanks to @Tomalak's answer:
$AccessMatrix = Import-Csv $CSVfile -Delimiter ';'
$columns = $AccessMatrix | Get-Member -MemberType NoteProperty | Select-Object -Skip 2 -ExpandProperty Name
$AccessMatrix = $AccessMatrix | ForEach-Object {
    $row = $_
    foreach ($col in $columns) {
        if ($row.$col.Trim() -eq "1" -or $row.$col.Trim() -eq "0") {
            $row # this pushes the $row onto the pipeline
            break
        }
    }
}
The following uses Get-Member to select the names of all columns after the first two.
Then, using ForEach-Object, we can output only those rows that have a value in any of those columns.
$data = ConvertFrom-Csv "OS;IP;user0;user1;user3
Windows;10.0.0.1;;;
Linux;hostname2;0;;1
Linux;10.0.0.3;;0;0
Linux;hostname4;;;
Windows;hostname5;1;1;1" -Delimiter ";"
$columns = $data | Get-Member -MemberType NoteProperty | Select-Object -Skip 2 -ExpandProperty Name
$data | ForEach-Object {
    $row = $_
    foreach ($col in $columns) {
        if ($row.$col -ne "") {
            $row # this pushes the $row onto the pipeline
            break
        }
    }
}
The break statement stops the execution of the inner foreach loop, because there is no point in checking further once the first column with a value has been found.
This is equivalent to the above, if you prefer Where-Object:
$data | Where-Object {
    $row = $_
    foreach ($col in $columns) {
        if ($row.$col -ne "") {
            return $true
        }
    }
}
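One caveat worth noting (my addition, not part of the original answer): Get-Member returns property names sorted alphabetically, not in file order. The approach works here only because OS and IP happen to sort before the userN columns. If your real headers don't sort that way, a safer sketch reads the column names in their original order from the first row's PSObject.Properties collection:
# take the column names in the order they appear in the CSV, skipping the first two
$columns = $data[0].PSObject.Properties.Name | Select-Object -Skip 2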

Powershell script to match string between 2 files and merge

I have 2 files that contain strings; each string in both files is delimited by a colon. Both files share a common string, and I want to be able to merge both files (based on the common string) into 1 new file.
Examples:
File1.txt
tom:mioihsdihfsdkjhfsdkjf
dick:khsdkjfhlkjdhfsdfdklj
harry:lkjsdlfkjlksdjfsdlkjs
File2.txt
mioihsdihfsdkjhfsdkjf:test1
lkjsdlfkjlksdjfsdlkjs:test2
khsdkjfhlkjdhfsdfdklj:test3
File3.txt (results should look like this)
tom:mioihsdihfsdkjhfsdkjf:test1
dick:khsdkjfhlkjdhfsdfdklj:test3
harry:lkjsdlfkjlksdjfsdlkjs:test2
$File1 = #"
tom:mioihsdihfsdkjhfsdkjf
dick:khsdkjfhlkjdhfsdfdklj
harry:lkjsdlfkjlksdjfsdlkjs
"#
$File2 = #"
mioihsdihfsdkjhfsdkjf:test1
lkjsdlfkjlksdjfsdlkjs:test2
khsdkjfhlkjdhfsdfdklj:test3
"#
# You are probably going to want to use Import-Csv here
# I am using ConvertFrom-Csv as I have "inlined" the contents of the files in the variables above
$file1_contents = ConvertFrom-Csv -InputObject $File1 -Delimiter ":" -Header name, code # specifying a header as there isn't one provided
$file2_contents = ConvertFrom-Csv -InputObject $File2 -Delimiter ":" -Header code, test

# There are almost certainly better ways to do this... but this does work so... meh.
$results = @()

# Loop over one file finding the matches in the other file
foreach ($row in $file1_contents) {
    $matched_row = $file2_contents | Where-Object code -eq $row.code
    if ($matched_row) {
        # Create a hashtable with the values you want from source and matched rows
        $result = @{
            name = $row.name
            code = $row.code
            test = $matched_row.test
        }
        # Append the matched up row to the final result set
        $results += New-Object PSObject -Property $result
    }
}

# Convert back to CSV format, with a _specific_ column ordering
# Although you'll probably want to use Export-Csv instead
$results |
    Select-Object name, code, test |
    ConvertTo-Csv -Delimiter ":"
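The answer notes there are almost certainly better ways; one of them (a sketch using the same $file1_contents / $file2_contents variables as above) is to index file2 by its code column in a hashtable first, so each lookup is constant-time instead of rescanning file2 for every row of file1:
# build a code -> test lookup from file2
$lookup = @{}
foreach ($row in $file2_contents) { $lookup[$row.code] = $row.test }

$results = foreach ($row in $file1_contents) {
    if ($lookup.ContainsKey($row.code)) {
        [PSCustomObject]@{
            name = $row.name
            code = $row.code
            test = $lookup[$row.code]
        }
    }
}
$results | ConvertTo-Csv -Delimiter ":"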

Parse line of text and match with parse of CSV

As a continuation of a script I'm running, I'm working on the following.
I have a CSV file that has formatted information, example as follows:
File named Import.csv:
Name,email,x,y,z
\I\RS\T\Name1\c\x,email@jksjks,d,f
\I\RS\T\Name2\d\f,email@jsshjs,d,f
...
This file is large.
I also have another file called Note.txt.
Name1
Name2
Name3
...
I'm trying to take the content of Import.csv and, for each line in Note.txt, if that line matches a row in Import.csv, copy that row into a CSV with append, continuing until every matched row has been added.
I need to find the best way to do this without importing the CSV multiple times, since it is large.
What I got does the opposite though, I think:
$Dir = PathToFile
$import = Import-Csv $Dir\import.csv
$NoteFile = "$Dir\Note.txt"
$Note = GC $NoteFile
$Name = (($Import.Name).Split("\"))[4]
foreach ($j in $import) {
foreach ($i in $Note) {
$j | where {$Name -eq "$i"} | Export-Csv "$Dir\Result.csv" -NoTypeInfo -Append
}
}
This takes too long and I'm not getting the extraction I need.
That's because you only assign $name once, outside of the outer foreach loop, so you're basically performing the same X comparisons for each line in the CSV.
I would rewrite the nested loops as a single Where-Object filter, using the -contains operator:
$Import | Where-Object { $Note -contains $_.Name.Split('\')[4] } | Export-Csv "$Dir\Result.csv" -NoTypeInformation -Append
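If Note.txt is long, -contains scans the whole $Note array for every CSV row; a variant (a sketch, assuming the same $Dir layout as above) loads the names into a HashSet so each lookup is constant-time:
# load Note.txt into a HashSet for O(1) membership tests
$NoteSet = [System.Collections.Generic.HashSet[string]]::new([string[]](Get-Content "$Dir\Note.txt"))
Import-Csv "$Dir\import.csv" |
    Where-Object { $NoteSet.Contains($_.Name.Split('\')[4]) } |
    Export-Csv "$Dir\Result.csv" -NoTypeInformation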
Group the imported data by your distinguishing feature, filter the groups by name, then expand the remaining groups and write the data to the output file:
Import-Csv "$Dir\import.csv" |
Group-Object { $_.Name.Split('\')[4] } |
Where-Object { $Note -contains $_.Name } |
Select-Object -Expand Group |
Export-Csv "$Dir\Result.csv" -NoType

Powershell removing columns and rows from CSV

I'm having trouble making some changes to a series of CSV files, all with the same data structure. I'm trying to combine all of the files into one CSV file or one tab-delimited text file (I don't really mind which); however, each file needs to have 2 empty rows removed and two of the columns removed. Below is an example:
col1,col2,col3,col4,col5,col6  <- remove
col1,col2,col3,col4,col5,col6  <- remove
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
(the col3 and col5 columns should also be removed)
End Result:
col1,col2,col4,col6
col1,col2,col4,col6
This is my attempt at doing this (I'm very new to PowerShell):
$ListofFiles = "example.csv" # this is a list of all the CSV files
ForEach ($file in $ListofFiles)
{
    $content = Get-Content ($file)
    $content = $content[2..($content.Count)]
    $contentArray = @()
    [string[]]$contentArray = $content -split ","
    $content = $content[0..2 + 4 + 6]
    Add-Content '...\output.txt' $content
}
Where am I going wrong here...
Your example file should be read before the foreach, to fetch the file list:
$ListofFiles = Get-Content "example.csv"
Inside the foreach you are getting the content of the main file,
$content = Get-Content ($ListofFiles)
instead of
$content = Get-Content $file
For removing rows I would recommend this (select -Index keeps only the row numbers you list):
$obj = Get-Content C:\t.csv | select -Index 0,1,3
For removing columns, split each line and keep only the column indexes you want (here 0, 1, 3 and 5):
$obj | %{ (($_.split(","))[0,1,3,5]) -join "," } | Out-File test.csv -Append
Note that splitting on a bare comma assumes no field contains a quoted comma; for CSVs with quoted fields, the Import-Csv approach below is safer.
Given that the initial files look like
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
,,,,,
,,,,,
you can also try this one-liner:
Import-Csv D:\temp\*.csv -Header 'C1','C2','C3','C4','C5','C6' | where {$_.c1 -ne ''} | select -Property 'C1','C2','C5' | Export-Csv 'd:\temp\final.csv' -NoTypeInformation
Given that your CSVs all have the same structure, you can open them directly, providing the header, then remove the objects with missing data, then export all the objects to a CSV file.
It is sufficient to specify fictitious column names (the count can even exceed the number of columns in the file), change the where clause as you need, and exclude the columns that you do not want to keep.
gci "c:\yourdirwithcsv" -file -filter *.csv |
%{ Import-Csv $_.FullName -Header C1,C2,C3,C4,C5,C6 |
where C1 -ne '' |
select -ExcludeProperty C3, C4 |
export-csv "c:\temp\merged.csv" -NoTypeInformation
}
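If the two rows to drop really are the first two rows of every file, as in the question's example, rather than empty rows, a sketch that skips them by position and keeps only the wanted columns might look like this (the directory and header names are placeholders):
Get-ChildItem "c:\yourdirwithcsv" -File -Filter *.csv | ForEach-Object {
    Get-Content $_.FullName |
        Select-Object -Skip 2 |                     # drop the first two rows
        ConvertFrom-Csv -Header C1,C2,C3,C4,C5,C6 | # no header line left, so supply one
        Select-Object C1,C2,C4,C6                   # keep col1, col2, col4, col6
} | Export-Csv "c:\temp\merged.csv" -NoTypeInformation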

Remove rows present in one .csv from another .csv (windows, powershell, notepad++)

I have Notepad++, PowerShell, and Excel 2007. I have two .csv files named database.csv and import.csv. Import.csv contains new entries that I want to put into my database online. Database.csv contains the current records in that database. Both files contain a simple comma-newline delimited list of unique values.
However, the database may already contain some entries in the new file. And, the new file contains entries that are not in the database. And, the database file contains entries that are still retained for recording purposes, but are not in the input file.
Simply combining them results in duplicates of any record that has an ongoing existence. It also results in single copies of records only present in the database and records only present in the input file.
What I want is a file that only contains records that are only present in the input file.
Any advice?
Assuming your csv files have the columns a, b, & c:
$db = Import-Csv database.csv
$import = Import-Csv import.csv
$new = Compare-Object -ReferenceObject $db -DifferenceObject $import -Property a,b,c -PassThru |
    ? { $_.SideIndicator -eq "=>" } |
    Select a,b,c
Just replace a, b, and c with the names of the columns you want to compare.
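If the files really are just one value per line (headers aside), a sketch without any named columns at all, comparing the raw lines, may be enough:
$db     = Get-Content database.csv | Select-Object -Skip 1
$import = Get-Content import.csv   | Select-Object -Skip 1

# "=>" marks values present only in import.csv
Compare-Object -ReferenceObject $db -DifferenceObject $import |
    Where-Object { $_.SideIndicator -eq "=>" } |
    Select-Object -ExpandProperty InputObject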
PowerShell:
Get-Content <database file> -TotalCount 1 |
    Set-Content C:\somedir\ToUpload.csv

$import = @{}
Get-Content <import file> |
    select -Skip 1 |
    foreach {
        $import[$_] = $true
    }

Get-Content <database file> |
    select -Skip 1 |
    foreach {
        if ($import[$_])
        {
            $import.Remove($_)
        }
    }

$import.Keys |
    Add-Content C:\somedir\ToUpload.csv
Alternatively, reading both files into memory:
Get-Content <database file> -TotalCount 1 |
    Set-Content C:\somedir\ToUpload.csv

$import = Get-Content <import file> |
    select -Skip 1
$database = Get-Content <database file> |
    select -Skip 1

$import |
    where {$database -notcontains $_} |
    Add-Content C:\somedir\ToUpload.csv
The solutions using import / export csv will work, but impose additional memory and processing overhead compared to dealing with the files as text data. The difference may be trivial or substantial, depending on the size of the files and the number of columns in the csv files. IMHO.
Compare-Object sometimes struggles with custom objects imported from CSV if you don't have any specific properties to match on.
If you want performance (for large CSV files), you could try this:
$i = @{}
[IO.File]::ReadAllLines("C:\input.csv") | % { $i[$_] = $true }

$reader = New-Object System.IO.StreamReader "C:\db.csv"
# Skip header. This way the output file (new.csv) will get input.csv's header
$reader.ReadLine() | Out-Null

while (($line = $reader.ReadLine()) -ne $null) {
    # Remove row if it exists in db.csv
    if ($i.ContainsKey($line)) {
        $i.Remove($line)
    }
}
$reader.Close()

$i.Keys | Add-Content c:\new.csv
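One side note (my addition, not from the original answer): a plain hashtable does not guarantee enumeration order, so the surviving rows, including input.csv's header, may come out in arbitrary order. If order matters, an ordered dictionary is an almost drop-in replacement:
# [ordered] preserves insertion order, so the header and rows come out as they went in;
# note OrderedDictionary exposes .Contains() rather than .ContainsKey()
$i = [ordered]@{}
[IO.File]::ReadAllLines("C:\input.csv") | % { $i[$_] = $true }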