Remove rows present in one .csv from another .csv (Windows, PowerShell, Notepad++)

I have Notepad++, PowerShell, and Excel 2007. I have two .csv files named
database.csv and import.csv. Import.csv contains new entries that I want to put
into my online database. Database.csv contains the current records in that database.
Both files contain a simple comma/newline-delimited list of unique values.
However, the database may already contain some of the entries in the new file, the new
file contains entries that are not in the database, and the database file contains
entries that are retained for record-keeping purposes but are not in the input file.
Simply combining the files results in duplicates of every record that exists in both.
It also results in single copies of records that are present only in the database and of
records that are present only in the input file.
What I want is a file that contains only the records that are present only in the input file.
Any advice?

Assuming your csv files have the columns a, b, & c:
$db = Import-Csv database.csv
$import = Import-Csv import.csv
$new = Compare-Object -ReferenceObject $db -DifferenceObject $import -Property a,b,c -PassThru | ? { $_.SideIndicator -eq "=>" } | Select a,b,c
Just replace a, b, and c with the names of the columns you want to compare.
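To try the Compare-Object approach without touching any files, here is a self-contained sketch that builds both tables from inline text with ConvertFrom-Csv. The column names a, b, c come from the answer; the row values are made up for illustration. Rows marked "=>" exist only in the DifferenceObject, i.e. only in the import data.

```powershell
# Hypothetical stand-in for database.csv
$db = @'
a,b,c
1,x,red
2,y,green
'@ | ConvertFrom-Csv

# Hypothetical stand-in for import.csv
$import = @'
a,b,c
2,y,green
3,z,blue
'@ | ConvertFrom-Csv

# -PassThru returns the original objects with a SideIndicator note property added;
# "=>" means the row appears only in -DifferenceObject ($import)
$new = Compare-Object -ReferenceObject $db -DifferenceObject $import -Property a, b, c -PassThru |
    Where-Object { $_.SideIndicator -eq '=>' } |
    Select-Object a, b, c

$new | Format-Table
```

Only the row 3,z,blue is left, because 2,y,green is in both inputs and 1,x,red is only in the database.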

PowerShell:
Get-Content <database file> -TotalCount 1 |
Set-Content C:\somedir\ToUpload.csv
$import = @{}
Get-Content <import file> |
select -Skip 1 |
foreach {
    $import[$_] = $true
}
Get-Content <database file> |
select -Skip 1 |
foreach {
    if ($import[$_])
    {
        $import.Remove($_)
    }
}
$import.Keys |
Add-Content C:\somedir\ToUpload.csv
Alternatively, reading both files into memory:
Get-Content <database file> -TotalCount 1 |
Set-Content C:\somedir\ToUpload.csv
$import = Get-Content <import file> |
select -Skip 1
$database = Get-Content <database file> |
select -Skip 1
$import |
where {$database -notcontains $_} |
Add-Content C:\somedir\ToUpload.csv
The solutions using Import-Csv / Export-Csv will work, but they impose additional memory and processing overhead compared to dealing with the files as text data. The difference may be trivial or substantial, depending on the size of the files and the number of columns in them, IMHO.
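A sketch of the text-only idea using a .NET HashSet for the membership test, which avoids a linear `-notcontains` scan for every import row. It assumes both files share the same header and format rows identically (same column order, quoting and spacing), since lines are compared as opaque strings; the inline arrays are hypothetical stand-ins for Get-Content output.

```powershell
# Hypothetical stand-ins for Get-Content <database file> / Get-Content <import file>
$databaseLines = @('id,name', '1,alice', '2,bob')
$importLines   = @('id,name', '2,bob', '3,carol')

# Put every existing database row (header skipped) into a case-sensitive set
$existing = [System.Collections.Generic.HashSet[string]]::new()
foreach ($line in ($databaseLines | Select-Object -Skip 1)) {
    [void]$existing.Add($line)
}

# Keep the header, then only the import rows that are not already in the database
$toUpload = @($importLines[0]) + @(
    $importLines | Select-Object -Skip 1 | Where-Object { -not $existing.Contains($_) }
)

$toUpload
```

With real files, `$toUpload | Set-Content C:\somedir\ToUpload.csv` would write the result, keeping the import file's row order.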

Compare-Object sometimes struggles with custom objects imported from CSV if you don't have any specific properties to match on.
If you want performance (for large CSV files), you could try this:
# An ordered dictionary keeps input.csv's line order, so its header stays on the first line
$i = [ordered]@{}
[IO.File]::ReadAllLines("C:\input.csv") | % { $i[$_] = $true }
$reader = New-Object System.IO.StreamReader "C:\db.csv"
# Skip header. This way the output file (new.csv) will get input.csv's header
$reader.ReadLine() | Out-Null
while (($line = $reader.ReadLine()) -ne $null) {
    # Remove row if it exists in db.csv
    if ($i.Contains($line)) {
        $i.Remove($line)
    }
}
$reader.Close()
$i.Keys | Add-Content c:\new.csv

Related

Export the matched output to a CSV as Column 1 and Column 2 using Powershell

I have the code below to match the patterns and save the results in a CSV file. I need to save the regex1 and regex2 matches as col1 and col2 in the CSV instead of saving everything in the first column.
$inputfile = ( Get-Content D:\Users\naham1224\Desktop\jil.txt )
$FilePath = "$env:USERPROFILE\Desktop\jil2.csv"
$regex1 = "(insert_job: [A-Za-z]*_*\S*)"
$regex2 = "(machine: [A-Z]*\S*)"
$inputfile |
Select-String -Pattern $regex2,$regex1 -AllMatches |
ForEach-Object {$_.matches.groups[1].value} |
Add-Content $FilePath
The input file (input.txt) contains:
/* ----------------- AUTOSYS_DBMAINT ----------------- */
insert_job: AUTOSYS_DBMAINT job_type: CMD
command: %AUTOSYS%\bin\DBMaint.bat
machine: PWISTASASYS01
owner: svc.autosys@cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "Runs DBmaint process on AE Database - if fails - MTS - will run next scheduled time"
std_out_file: ">$$LOGS\dbmaint.txt"
std_err_file: ">$$LOGS\dbmaint.txt"
alarm_if_fail: 0
alarm_if_terminated: 0
send_notification: 0
notification_msg: "Check DBMaint output in autouser.PD1\\out directory"
notification_emailaddress: jnatal@cbs.com
/* ----------------- TEST_ENV ----------------- */
insert_job: TEST_ENV job_type: CMD
command: set
machine: PWISTASASYS01
owner: svc.autosys@cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "output env"
std_out_file: ">C:\Users\svc.autosys\Documents\env.txt"
std_err_file: ">C:\Users\svc.autosys\Documents\env.txt"
alarm_if_fail: 1
alarm_if_terminated: 1
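Current output: [screenshot not included; all matched values end up in the first column]
Expected output: [screenshot not included; insert_job and machine values in two separate columns]
I have tried various ways to do this, but with no luck. Any suggestions and help are greatly appreciated.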
Here is how I would do this:
$inputPath = 'input.txt'
$outputPath = 'output.csv'
# RegEx patterns to extract data.
$patterns = @(
'(insert_job): ([A-Za-z]*_*\S*)'
'(machine): ([A-Z]*\S*)'
)
# Create an ordered Hashtable to collect columns for one row.
$row = [ordered] @{}
# Loop over all occurrences of the patterns in the input file
Select-String -Path $inputPath -Pattern $patterns -AllMatches | ForEach-Object {
# Extract key and value from current match
$key = $_.matches.Groups[ 1 ].Value
$value = $_.matches.Value
# Save one column of current row.
$row[ $key ] = $value
# If we have all columns of current row, output it as PSCustomObject.
if( $row.Count -eq $patterns.Count ) {
# Convert hashtable to PSCustomObject and output (implicitly)
[PSCustomObject] $row
# Clear Hashtable in preparation for next row.
$row.Clear()
}
} | Export-Csv $outputPath -NoTypeInformation
Output CSV:
"insert_job","machine"
"insert_job: AUTOSYS_DBMAINT","machine: PWISTASASYS01"
"insert_job: TEST_ENV","machine: PWISTASASYS01"
Remarks:
Using Select-String with parameter -Path we don't have to read the input file beforehand.
An ordered Hashtable (a dictionary) is used to collect all columns, until we have an entire row to output. This is the crucial step to produce multiple columns instead of outputting all data in a single column.
Converting the Hashtable to a PSCustomObject is necessary because Export-Csv expects objects, not dictionaries.
While the CSV looks like your "expected output", and you may have good reason to expect it like that, in a CSV file the values normally shouldn't repeat the column names. To remove the column names from the values, simply replace $value = $_.matches.Value with $value = $_.matches.Groups[ 2 ].Value, which results in output like this:
"insert_job","machine"
"AUTOSYS_DBMAINT","PWISTASASYS01"
"TEST_ENV","PWISTASASYS01"
As for what you have tried:
Add-Content writes only plain text files from string input. While you could use it to create CSV files, you would have to add separators and escape strings all by yourself, which is easy to get wrong and more hassle than necessary. Export-Csv, on the other hand, takes objects as input and handles all of the CSV format details automatically.
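To see the escaping that Export-Csv takes care of, a small sketch with ConvertTo-Csv, which produces the same CSV text in memory instead of writing it to a file. The object and its description value are made up; the point is that a value containing a comma is quoted, so it stays in one column.

```powershell
# A made-up row whose description contains a comma
$row = [PSCustomObject]@{
    insert_job  = 'TEST_ENV'
    description = 'output env, test only'
}

# ConvertTo-Csv returns the header line and one data line per object, with fields quoted
$lines = $row | ConvertTo-Csv -NoTypeInformation
$lines
```

Writing `'TEST_ENV,output env, test only'` with Add-Content instead would give a line that CSV readers split into three columns.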
As zett42 mentioned, Add-Content is not the best fit for this. Since you are looking for multiple values separated by commas, Export-Csv is something you can use. Export-Csv takes objects from the pipeline, converts them to lines of comma-separated properties, adds a header line, and saves them to a file.
I took a slightly different approach with my solution. I've combined the different regex patterns into one, which gives us one match that contains both the job and machine names.
$outputPath = "$PSScriptRoot\output.csv"
# one regex to match both job and machine in separate matching groups
$regex = '(?s)insert_job: (\w+).+?machine: (\w+)'
# Filter for input files
$inputfiles = Get-ChildItem -Path $PSScriptRoot -Filter input*.txt
# Loop through each file
$inputfiles |
ForEach-Object {
$path = $_.FullName
Get-Content -Raw -Path $path | Select-String -Pattern $regex -AllMatches |
ForEach-Object {
# Loop through each match found in the file.
# Should be 2, one for AUTOSYS_DBMAINT and another for TEST_ENV
$_.Matches | ForEach-Object {
# Create objects with the values we want that we can output to csv file
[PSCustomObject]@{
# remove next line if not needed in output
InputFile = $path
Job = $_.Groups[1].Value # 1st matching group contains job name
Machine = $_.Groups[2].Value # 2nd matching group contains machine name
}
}
}
} | Export-Csv $outputPath -NoTypeInformation # Pipe our objects to Export-Csv
Contents of output.csv
"InputFile","Job","Machine"
"C:\temp\powershell\input1.txt","AUTOSYS_DBMAINT","PWISTASASYS01"
"C:\temp\powershell\input1.txt","TEST_ENV","PWISTATEST2"
"C:\temp\powershell\input2.txt","AUTOSYS_DBMAINT","PWISTASAPROD1"
"C:\temp\powershell\input2.txt","TEST_ENV","PWISTATTEST1"
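To check the combined pattern without any input files, you can run it against an inline here-string with [regex]::Matches. The text below is a trimmed-down copy of the question's sample (two job blocks); the regex is the one from the answer.

```powershell
# Trimmed copy of the question's sample input
$text = @'
insert_job: AUTOSYS_DBMAINT job_type: CMD
command: %AUTOSYS%\bin\DBMaint.bat
machine: PWISTASASYS01
insert_job: TEST_ENV job_type: CMD
command: set
machine: PWISTASASYS01
'@

# (?s) lets .+? cross line breaks; the lazy quantifier stops at the first "machine:"
$regex = '(?s)insert_job: (\w+).+?machine: (\w+)'

# One object per match: group 1 is the job name, group 2 the machine name
$rows = foreach ($m in [regex]::Matches($text, $regex)) {
    [PSCustomObject]@{
        Job     = $m.Groups[1].Value
        Machine = $m.Groups[2].Value
    }
}

$rows | Format-Table
```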

Powershell script to match string between 2 files and merge

I have 2 files that contain strings, each string in both files is delimited by a colon. Both files share a common string and I want to be able to merge both files (based on the common string) into 1 new file.
Examples:
File1.txt
tom:mioihsdihfsdkjhfsdkjf
dick:khsdkjfhlkjdhfsdfdklj
harry:lkjsdlfkjlksdjfsdlkjs
File2.txt
mioihsdihfsdkjhfsdkjf:test1
lkjsdlfkjlksdjfsdlkjs:test2
khsdkjfhlkjdhfsdfdklj:test3
File3.txt (results should look like this)
tom:mioihsdihfsdkjhfsdkjf:test1
dick:khsdkjfhlkjdhfsdfdklj:test3
harry:lkjsdlfkjlksdjfsdlkjs:test2
$File1 = @"
tom:mioihsdihfsdkjhfsdkjf
dick:khsdkjfhlkjdhfsdfdklj
harry:lkjsdlfkjlksdjfsdlkjs
"@
$File2 = @"
mioihsdihfsdkjhfsdkjf:test1
lkjsdlfkjlksdjfsdlkjs:test2
khsdkjfhlkjdhfsdfdklj:test3
"@
# You are probably going to want to use Import-Csv here
# I am using ConvertFrom-Csv as I have "inlined" the contents of the files in the variables above
$file1_contents = ConvertFrom-Csv -InputObject $File1 -Delimiter ":" -Header name, code # specifying a header as there isn't one provided
$file2_contents = ConvertFrom-Csv -InputObject $File2 -Delimiter ":" -Header code, test
# There are almost certainly better ways to do this... but this does work so... meh.
$results = @()
# Loop over one file finding the matches in the other file
foreach ($row in $file1_contents) {
$matched_row = $file2_contents | Where-Object code -eq $row.code
if ($matched_row) {
# Create a hashtable with the values you want from source and matched rows
$result = @{
name = $row.name
code = $row.code
test = $matched_row.test
}
# Append the matched up row to the final result set
$results += New-Object PSObject -Property $result
}
}
# Convert back to CSV format, with a _specific_ column ordering
# Although you'll probably want to use Export-Csv instead
$results |
Select-Object name, code, test |
ConvertTo-Csv -Delimiter ":"
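The Where-Object inside the loop rescans all of file 2 for every row of file 1. A common alternative is to index file 2 by its code in a hashtable once and then look each code up directly. This sketch reuses the sample data and the header names (name, code, test) from the answer, and formats each output line with the -f operator so the result is plain `name:code:test` text without the quotes ConvertTo-Csv would add.

```powershell
$File1 = @'
tom:mioihsdihfsdkjhfsdkjf
dick:khsdkjfhlkjdhfsdfdklj
harry:lkjsdlfkjlksdjfsdlkjs
'@
$File2 = @'
mioihsdihfsdkjhfsdkjf:test1
lkjsdlfkjlksdjfsdlkjs:test2
khsdkjfhlkjdhfsdfdklj:test3
'@

# Index file 2 once: code -> test value
$testByCode = @{}
$File2 | ConvertFrom-Csv -Delimiter ':' -Header code, test |
    ForEach-Object { $testByCode[$_.code] = $_.test }

# Walk file 1 once and emit name:code:test for every code that has a match
$merged = $File1 | ConvertFrom-Csv -Delimiter ':' -Header name, code |
    Where-Object { $testByCode.ContainsKey($_.code) } |
    ForEach-Object { '{0}:{1}:{2}' -f $_.name, $_.code, $testByCode[$_.code] }

$merged
```

With real files, `$merged | Set-Content File3.txt` would write the result.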

Powershell removing columns and rows from CSV

I'm having trouble making some changes to a series of CSV files, all with the same data structure. I'm trying to combine all of the files into one CSV file or one tab-delimited text file (I don't really mind); however, each file needs to have 2 empty rows removed and two of the columns removed. Below is an example:
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
Columns to remove: col3 and col5
End Result:
col1,col2,col4,col6
col1,col2,col4,col6
This is my attempt at doing this (I'm very new to Powershell)
$ListofFiles = "example.csv" #this is an list of all the CSV files
ForEach ($file in $ListofFiles)
{
$content = Get-Content ($file)
$content = $content[2..($content.Count)]
$contentArray = @()
[string[]]$contentArray = $content -split ","
$content = $content[0..2 + 4 + 6]
Add-Content '...\output.txt' $content
}
Where am I going wrong here...
Your example file should be read before the foreach, to fetch the file list:
$ListofFiles = get-content "example.csv"
Inside the foreach, you are getting the content of the main file:
$content = Get-Content ($ListofFiles)
instead of
$content = Get-Content $file
For removing rows, I would recommend this:
$obj = get-content C:\t.csv | select -Index 0,1,3
For removing columns (zero-based column indexes 0, 1, 3, 5):
$obj | %{(($_.split(","))[0,1,3,5]) -join "," } | out-file test.csv -Append
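The column-removal step works on plain text: split each line on commas, keep the wanted indexes and join them back. A minimal sketch with the question's sample rows; it assumes no field contains a quoted comma, because a plain Split(',') would cut such a value in two.

```powershell
# Sample rows from the question
$lines = @(
    'col1,col2,col3,col4,col5,col6'
    'col1,col2,col3,col4,col5,col6'
)

# Zero-based indexes 0,1,3,5 are col1, col2, col4 and col6
$kept = $lines | ForEach-Object { ($_.Split(',')[0, 1, 3, 5]) -join ',' }
$kept
```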
Given that the initial files look like this:
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
,,,,,
,,,,,
You can also try this one-liner:
Import-Csv D:\temp\*.csv -Header 'C1','C2','C3','C4','C5','C6' | where {$_.c1 -ne ''} | select -Property 'C1','C2','C4','C6' | Export-Csv 'd:\temp\final.csv' -NoTypeInformation
Since your CSVs all have the same structure, you can import them directly by supplying the header, filter out the objects with missing data, and then export all the objects to one CSV file.
It is enough to specify placeholder column names (you can even list more names than the file has columns), adjust the where filter as needed, and exclude the columns you do not want to keep.
gci "c:\yourdirwithcsv" -File -Filter *.csv |
%{ Import-Csv $_.FullName -Header C1,C2,C3,C4,C5,C6 } |
where C1 -ne '' |
select -Property * -ExcludeProperty C3, C5 |
Export-Csv "c:\temp\merged.csv" -NoTypeInformation

Fastest way to combine multiple csv files based on 1st column value

So let's say I have 5 CSV files (created in order from 1 to 5) with 8-10 columns each. Each file has about 300,000 rows (give or take).
Each file should be matched on the (unique) value in its first column, and the records combined, including their column titles. If a row in files 2 through 5 has a column-1 value that is not found in column 1 of file 1, that entire row should be excluded from the merge.
Below is an example of two (out of 5) of the CSV files:
File1
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4
File2:
ColumnTitle1,ColumnTitle11,ColumnTitle12,ColumnTitle13,ColumnTitle14,ColumnTitle15,ColumnTitle16,ColumnTitle17,ColumnTitle18
Column1Value752789,Column11Value1,Column12Value1,Column13Value1,Column14Value1,Column15Value1,Column16Value1,Column17Value1,Column18Value1
Column1Value3145,Column11Value2,Column12Value2,Column13Value2,Column14Value2,Column15Value2,Column16Value2,Column17Value2,Column18Value2
Column1Value573,Column11Value3,Column12Value3,Column13Value3,Column14Value3,Column15Value3,Column16Value3,Column17Value3,Column18Value3
Column1Value832657,Column11Value4,Column12Value4,Column13Value4,Column14Value4,Column15Value4,Column16Value4,Column17Value4,Column18Value4
Column1Value62317,Column11Value5,Column12Value5,Column13Value5,Column14Value5,Column15Value5,Column16Value5,Column17Value5,Column18Value5
Column1Value93,Column11Value6,Column12Value6,Column13Value6,Column14Value6,Column15Value6,Column16Value6,Column17Value6,Column18Value6
Column1Value423568,Column11Value7,Column12Value7,Column13Value7,Column14Value7,Column15Value7,Column16Value7,Column17Value7,Column18Value7
If I were to just merge these two files (2 out of the 5) it would look something like this:
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10,ColumnTitle11,ColumnTitle12,ColumnTitle13,ColumnTitle14,ColumnTitle15,ColumnTitle16,ColumnTitle17,ColumnTitle18
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1,Column11Value2,Column12Value2,Column13Value2,Column14Value2,Column15Value2,Column16Value2,Column17Value2,Column18Value2
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2,Column11Value3,Column12Value3,Column13Value3,Column14Value3,Column15Value3,Column16Value3,Column17Value3,Column18Value3
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3,Column11Value5,Column12Value5,Column13Value5,Column14Value5,Column15Value5,Column16Value5,Column17Value5,Column18Value5
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4,Column11Value6,Column12Value6,Column13Value6,Column14Value6,Column15Value6,Column16Value6,Column17Value6,Column18Value6
Adding files 3 - 5 would increase the columns to around 50 (give or take).
I'm not sure if this is the quickest method, but here is the logic I am thinking (which I'm not sure how to do using powershell):
Go one file at a time to match and merge with file 1:
    store file1 in a variable
    store file2 in a variable
    loop through the lines in file1
        where value1 in column1 from file1 is found in column1 from file2:
            append the row from file2 to the row in file1
            remove the row from file2 (to shrink the search during the next loop iteration)
    clear the variable holding file2
    store the next file in the variable
    repeat the find-and-append loop
All roads lead to Rome. One of them is:
#Hashtable to store master-objects in
$data = @{}
#Import-Csv -Path "MyMasterList.csv" | Foreach-Object { $data[$_.ColumnTitle1] = $_ }
#Sample data below
@"
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4
"@ | ConvertFrom-Csv | % { $data[$_.ColumnTitle1] = $_ }
Get-ChildItem -Path "C:\MyOtherCSVs" -Filter "*.csv" | ForEach-Object { Import-Csv -Path $_.FullName } | ForEach-Object {
$ID = $_.ColumnTitle1
#If row is in MasterList
if($data.ContainsKey($ID)) {
#Get matching object
$obj = $data[$ID]
#Foreach line in csv
$_.psobject.Properties | Where-Object { $_.Name -ne 'ColumnTitle1' } | ForEach-Object {
#Foreach property, add to master-object
Add-Member -InputObject $obj -MemberType NoteProperty -Name $_.Name -Value $_.Value
}
#Put modified object back into hashtable
$data[$ID] = $obj
}
}
$data.Values | Export-Csv -Path "MergedCSV.csv" -NoTypeInformation
Be sure to pack some extra memory with large CSV-files.
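The answer above keeps every master row, including rows that never found a partner in the other files. If you want a strict inner join (only IDs present in the master file and in every other file), one option is to count the matches per ID and filter at the end. This is a sketch on tiny hypothetical data; `$others` is assumed to hold one imported table per additional file, and the `$hits` counter is an illustrative name, not part of the original answer. It also assumes IDs are unique within each file, as the question states, since Add-Member fails on a duplicate property name.

```powershell
# Hypothetical master file (file 1)
$master = @'
ColumnTitle1,ColumnTitle2
A,a2
B,b2
'@ | ConvertFrom-Csv

# Hypothetical second file; files 3 to 5 would be added to $others the same way
$file2 = @'
ColumnTitle1,ColumnTitle11
B,b11
C,c11
'@ | ConvertFrom-Csv
$others = @(, $file2)   # the leading comma keeps $file2 as one element

# Index master rows by ID and count how many other files matched each ID
$data = @{}
$hits = @{}
foreach ($row in $master) {
    $data[$row.ColumnTitle1] = $row
    $hits[$row.ColumnTitle1] = 0
}

foreach ($file in $others) {
    foreach ($row in $file) {
        $id = $row.ColumnTitle1
        # Rows whose ID is not in the master file are dropped
        if (-not $data.ContainsKey($id)) { continue }
        foreach ($p in $row.PSObject.Properties) {
            if ($p.Name -ne 'ColumnTitle1') {
                Add-Member -InputObject $data[$id] -NotePropertyName $p.Name -NotePropertyValue $p.Value
            }
        }
        $hits[$id]++
    }
}

# Inner join: keep only IDs that were found in every other file
$merged = @($data.Keys | Where-Object { $hits[$_] -eq $others.Count } | ForEach-Object { $data[$_] })
$merged | Format-Table
```

Because every surviving object then has the same set of columns, Export-Csv (which takes its header from the first object it receives) writes all of them.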

Loop through csv compare content with an array and then add content to csv

I don't know how to append a string to a CSV file. Here is what I am doing:
I have two CSV files: one with a list of hostnames and IDs, and another with a list of hostnames and some numbers.
Example file 1:
Hostname | ID
IWBW140004 | 3673234
IWBW130023 | 2335934
IWBW120065 | 1350213
Example file 2:
ServiceCode | Hostname | ID
4 | IWBW120065 |
4 | IWBW140004 |
4 | IWBW130023 |
Now I read the content of file 1 into a two-dimensional array:
$pcMatrix = @(,@())
Import-Csv $outputFile |ForEach-Object {
foreach($property in $_.PSObject.Properties){
$pcMatrix += ,($property.Value.Split(";")[1],$property.Value.Split(";")[2])
}
}
Then I read the content of file 2 and compare it with my array:
Import-Csv $Group".csv" | ForEach-Object {
foreach($property in $_.PSObject.Properties){
for($i = 0; $i -lt $pcMatrix.Length; $i++){
if($pcMatrix[$i][0] -eq $property.Value.Split('"')[1]){
#Add-Content here
}
}
}
}
What do I need to do to append $pcMatrix[$i][1] to the ID column of the current row in file 2?
Thanks for your suggestions.
Yanick
It seems like you are over-complicating this task.
If I understand you correctly, you want to populate the ID column in file 2 with the ID that corresponds to the matching hostname from file 1. The easiest way to do that is to put all the values from the first file into a hashtable and use it to look up the ID for each row in the second file:
# Read the first file and populate the HashTable:
$File1 = Import-Csv .\file1.txt -Delimiter "|"
$LookupTable = @{}
$File1 |ForEach-Object {
$LookupTable[$_.Hostname] = $_.ID
}
# Now read the second file and update the ID values:
$File2 = Import-Csv .\file2.txt -Delimiter "|"
$File2 |ForEach-Object {
$_.ID = $LookupTable[$_.Hostname]
}
# Then write the updated rows back to a new CSV file:
$File2 | Export-CSV -Path .\file3.txt -NoTypeInformation -Delimiter "|"
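A self-contained version of the same lookup-table approach, with the question's sample rows inlined via ConvertFrom-Csv. It assumes the files use a bare "|" delimiter without the padding spaces shown in the question's tables; with padded fields, the spaces would have to be trimmed before the hostnames could match.

```powershell
# Hypothetical file 1 contents: Hostname -> ID
$File1 = @'
Hostname|ID
IWBW140004|3673234
IWBW130023|2335934
IWBW120065|1350213
'@ | ConvertFrom-Csv -Delimiter '|'

# Hypothetical file 2 contents with an empty ID column
$File2 = @'
ServiceCode|Hostname|ID
4|IWBW120065|
4|IWBW140004|
4|IWBW130023|
'@ | ConvertFrom-Csv -Delimiter '|'

# Build the Hostname -> ID lookup once
$LookupTable = @{}
$File1 | ForEach-Object { $LookupTable[$_.Hostname] = $_.ID }

# Fill in the ID column of file 2 from the lookup
$File2 | ForEach-Object { $_.ID = $LookupTable[$_.Hostname] }
$File2 | Format-Table
```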