How to convert text files in a folder into a CSV file - PowerShell

I am trying to write a script to take a bunch of text files in a folder (all in the same format) and output them to a CSV file. Each file has the same "header" information. I have been able to get the information into a more easily usable format (removing the first and last lines, which aren't needed), but am having some trouble after that.
Here is the beginning of the text file. There will be more than just these 7 lines; each file has a total of 36 lines:
TYPE VOID
DOB 20200131
DATE 20200131
TIME 21:19:42
TERMINAL 3
ORGTERM 2
EMPLOYEE 1234 John Doe
And here is what I have so far, though I know that it doesn't work:
$currentdir = '.\'
$results = @()
$outputfilename = 'data.csv'
foreach ($req in Get-ChildItem($currentdir)) {
    (Get-Content $req)[1..((Get-Content $req).count - 2)] |
        ForEach-Object {
            $header = $_[0] -split '`t'
            $data = $_[1] -split '`t'
            $results = $header, $data
        }
}
The final product would look something like this (spreadsheet view; the AUTHORIZE record has no ORGTERM, so that cell is blank):
    A          B         C         D         E         F        G
1   TYPE       DOB       DATE      TIME      TERMINAL  ORGTERM  EMPLOYEE
2   VOID       20200131  20200131  21:19:42  3         2        1234 John Doe
3   AUTHORIZE  20200131  20200131  23:29:22  2                  4678 Jane Doe
Full sample of VOID file:
BEGIN
TYPE VOID
DOB 20200131
DATE 20200131
TIME 21:19:42
TERMINAL 3
ORGTERM 2
EMPLOYEE 1234 Jane Doe
TABLE TBL 101
CHECK 20030
PAYMENT 20029
AUTHAMT 20.68
BATCHAMT 20.68
CARDTYPE MASTERCARD
CARDMASK XXXXXXXXXXXXXXXXX
{XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}
EXP 0423
REF 482
STANDALONE YES
PINDEX 1
APPROVEAMT 20.68
LOGTIME 21:07:01
FOHFEATS 10000000000000000000000000000000
TERMCAPS 00000000000000000000000000000000
FOHVERSION 15.1.34.2.97
ACTIONCODE 000
LASTSEND 1580585993
ORIGDATE 20200131
ORIGTIME 21:02:11
ORIGTYPE AUTHORIZE
ORIGREF 482
ORGREFTIME 21:02:11
TENDER_NUM 12
CRCY 840
VPD Sequence #: 107
REVID 2
REVNAME 712 Bar
END
Sample AUTHORIZE file:
BEGIN
TYPE AUTHORIZE
DOB 20200131
DATE 20200131
TIME 23:29:22
TERMINAL 2
EMPLOYEE 1234 Jane Doe
TABLE Table 121
CHECK 20045
PAYMENT 20038
AUTHAMT 72.42
BATCHAMT 72.42
CARDTYPE VISA
CARDMASK XXXXXXXXXXXXXXXX
{XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}
EXP 0124
REF 485900
STANDALONE YES
PINDEX 1
LOGTIME 23:29:22
FOHFEATS 10000000000000000000000000000000
TERMCAPS 00000000000000000000000000000000
FOHVERSION 15.1.34.2.97
LASTSEND 1580586235
TENDER_NUM 13
CRCY 840
REVID 1
REVNAME 712 Restaurant
COMMERROR TRUE
END
Sample ADJUST file:
BEGIN
TYPE ADJUST
DOB 20200131
DATE 20200131
TIME 22:18:27
TERMINAL 8
ORGTERM 8
EMPLOYEE 789 Judy Garland
TABLE BAR GUEST
CHECK 80161
PAYMENT 80036
BATCHAMT 30.43
BATCHTIP 6
CARDTYPE MASTERCARD
CARDMASK XXXXXXXXXXXX8699
{XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}
EXP 0323
REF 1504602
STANDALONE YES
PINDEX 1
LOGTIME 22:18:27
FOHFEATS 10000000000000000000000000000000
TERMCAPS 00000000000000000000000000000000
FOHVERSION 15.1.34.2.97
LASTSEND 1580638928
TENDER_NUM 12
CRCY 840
REVID 4
REVNAME 712 Second Bar
END

here's one way to merge those text files into a CSV. it presumes the files are in a specific dir and can be loaded by matching the names OR by simply grabbing all the files.
what it does ...
sets the source dir
sets the file filter
grabs all the matching files
iterates through the file list
loads each file into a $Var
uses the way that PoSh handles a collection on the LEFT side of a -match: that gives you the matching items, not the usual [bool] (see the short demo after this list)
builds a PSCustomObject; it does that by matching the line that starts with the target word, taking the 1st item in the returned array, replacing the unwanted part of the line with nothing, and finally assigning that value to the desired property
this is rather inefficient, but i can't think of a better way. [blush]
sends the PSCO out to the $Results collection
shows what is in $Results on the screen
exports $Results to a CSV file
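a quick demo of that left-side -match behavior (any string array will do; note -match is case-insensitive by default, which is why '^type' matches 'TYPE') ...
$Lines = 'TYPE VOID', 'DOB 20200131'
$Lines -match '^type'                            # returns @('TYPE VOID'), not a [bool]
($Lines -match '^type')[0] -replace '^type\s+'   # returns 'VOID'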
here's the code ...
$SourceDir = $env:TEMP
$Filter = 'harlan_*.txt'
$FileList = Get-ChildItem -LiteralPath $SourceDir -Filter $Filter -File
$Results = foreach ($FL_Item in $FileList)
{
    $Lines = Get-Content -LiteralPath $FL_Item.FullName
    [PSCustomObject]@{
        Type     = ($Lines -match '^type')[0] -replace '^type\s{1,}'
        DOB      = ($Lines -match '^dob')[0] -replace '^dob\s{1,}'
        Date     = ($Lines -match '^date')[0] -replace '^date\s{1,}'
        Time     = ($Lines -match '^time')[0] -replace '^time\s{1,}'
        Terminal = ($Lines -match '^terminal')[0] -replace '^terminal\s{1,}'
        OrgTerm  = ($Lines -match '^orgterm')[0] -replace '^orgterm\s{1,}'
        Employee = ($Lines -match '^employee')[0] -replace '^employee\s{1,}'
    }
}
# show on screen
$Results
# save to CSV
$Results |
Export-Csv -LiteralPath "$SourceDir\Harlan_-_MergedFiles.csv" -NoTypeInformation
display on screen ...
Type : ADJUST
DOB : 20200131
Date : 20200131
Time : 22:18:27
Terminal : 8
OrgTerm : 8
Employee : 789 Judy Garland
Type : AUTHORIZE
DOB : 20200131
Date : 20200131
Time : 23:29:22
Terminal : 2
OrgTerm :
Employee : 1234 Jane Doe
Type : VOID
DOB : 20200131
Date : 20200131
Time : 21:19:42
Terminal : 3
OrgTerm : 2
Employee : 1234 Jane Doe
content of the csv file ...
"Type","DOB","Date","Time","Terminal","OrgTerm","Employee"
"ADJUST","20200131","20200131","22:18:27","8","8","789 Judy Garland"
"AUTHORIZE","20200131","20200131","23:29:22","2","","1234 Jane Doe"
"VOID","20200131","20200131","21:19:42","3","2","1234 Jane Doe"

To capture all fields in the files without hardcoding the headers and combine them into a CSV file, the code below should do it.
The snag is that there is one line in each file that does not have a 'header'; it is just a string {XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}.
I'm guessing that should be the card number, so I'm manually inserting the header CARDNUMBER there. If this is something else, please change that in the code.
$files = Get-ChildItem -Path 'D:\Test' -File
$result = foreach ($file in $files) {
    $obj = [PsCustomObject]@{}
    Get-Content -Path $file.FullName | Where-Object { $_ -notmatch '^(BEGIN|END)$' } | ForEach-Object {
        # There is a line without a 'header' name. Is this the card number?
        if ($_ -like '{*}') {
            $name  = 'CARDNUMBER' # <-- add your own preferred header name here
            $value = $_
        }
        else {
            $name, $value = $_ -split '\s+', 2
        }
        $obj | Add-Member -MemberType NoteProperty -Name $name -Value $value
    }
    # output the object for this file to be collected in the $result variable
    $obj
}
# output on screen
$result
# output to CSV file
$result | Export-Csv -Path 'D:\output.csv' -NoTypeInformation
You need to set the paths for Get-ChildItem and Export-Csv to match your own situation, of course.
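One caveat to be aware of: Export-Csv takes its column list from the first object in the collection, so a field that appears only in later files (BATCHTIP or COMMERROR, for instance) would be dropped if the first file lacks it. A minimal sketch of a guard, reusing the $result and output path from the code above:
# collect the union of all property names, then re-select every object against it
$allNames = $result | ForEach-Object { $_.PSObject.Properties.Name } | Select-Object -Unique
$result | Select-Object $allNames | Export-Csv -Path 'D:\output.csv' -NoTypeInformation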

If I'm reading this correctly, you have some files, each with a single record of data delimited between the aptly positioned words "BEGIN" and "END", and you want each file to be translated into a single CSV file?
I think I've cooked up something worthwhile, though I'm sure it's not perfect.
$Select = 'TYPE','DOB','DATE','TIME','TERMINAL','ORGTERM','EMPLOYEE'
ForEach( $InputFile in (Get-ChildItem $CurrentDirectory) )
{
    $OutputFile = $InputFile.BaseName + '.csv'
    $Table = Get-Content $InputFile
    $TempHash = [Ordered]@{}
    ForEach( $Column in $Table )
    {
        If( $Column -notmatch '(^BEGIN$|^END$)' )
        {
            $TempArr = $Column.Split( ' ', 2, [System.StringSplitOptions]::RemoveEmptyEntries ) | ForEach{ $_.Trim() }
            If( $Select -contains $TempArr[0] )
            {
                $TempHash.Add( $TempArr[0], $TempArr[1] )
            }
        }
    }
    # Now $TempHash should have enough to create the object and export to CSV
    [PSCustomObject]$TempHash | Export-Csv -Path $OutputFile -NoTypeInformation
}
A few points:
I'm ignoring the lines BEGIN & END.
I'm manipulating each line thereafter into an array, which for the most part should be 2 elements.
If the first element [0] is in the collection of fields you're looking for, I'll add it as a key/value pair to the hash. Otherwise do nothing.
After processing the lines, convert the hash to a PSCustomObject and export to a CSV file.
I only tested it on a single file I created from your question. I wrapped up the outer loop just as pseudo code.
This works, but the output looks a little choppy, with things like numbers being stored as strings. That said, as a rev one I think we've got something to work with.
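If the string-typed numbers matter, one option is to cast the affected values just before the [PSCustomObject] cast; this is a sketch assuming TERMINAL and ORGTERM are the fields you want numeric. Note Export-Csv writes everything as text anyway, so this only pays off when you consume the objects directly, e.g. for numeric sorting:
If( $TempHash.Contains('TERMINAL') ) { $TempHash['TERMINAL'] = [int]$TempHash['TERMINAL'] }
If( $TempHash.Contains('ORGTERM') )  { $TempHash['ORGTERM']  = [int]$TempHash['ORGTERM'] }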
If I misread your comment and you want a single output CSV file, the adjustment is just to declare the filename before the loop and use the -Append parameter on the Export-Csv cmdlet. See below, though I didn't test it any further:
$OutputFile = 'YourOutput.csv'
$Select = 'TYPE','DOB','DATE','TIME','TERMINAL','ORGTERM','EMPLOYEE'
ForEach( $InputFile in (Get-ChildItem $CurrentDirectory) )
{
    $Table = Get-Content $InputFile
    $TempHash = [Ordered]@{}
    ForEach( $Column in $Table )
    {
        If( $Column -notmatch '(^BEGIN$|^END$)' )
        {
            $TempArr = $Column.Split( ' ', 2, [System.StringSplitOptions]::RemoveEmptyEntries ) | ForEach{ $_.Trim() }
            If( $Select -contains $TempArr[0] )
            {
                $TempHash.Add( $TempArr[0], $TempArr[1] )
            }
        }
    }
    # Now $TempHash should have enough to create the object and export to CSV
    [PSCustomObject]$TempHash | Export-Csv -Path $OutputFile -NoTypeInformation -Append
}
Sorry about the variable names, that could obviously use a refactor...
Let me know what you think.

Related

Get a logfile for a specific date

I want to save to a folder on my computer ("C:\logFiles") a specific date range from a log file generated by a program on another PC.
The path I will get the log file from is "C:\Sut\Stat\03-2021.log".
Example: the file "C:\Sut\Stat\03-2021.Sutwin.log" contains all the logs for the month of March, but I just want the logs of the last 7 days, from 19-03-2021 to 26-03-2021.
I found this script on the internet, but it doesn't work for me; I need some help.
Example of the .log file in the attached photo.
Rest of the details from the first screenshot:
my PC name: c01234
name of the PC containing the log file: c06789
file I will get the infos from: 03-2021.Sutwin.log (exists on PC c06789)
I want to transfer the content of just the last 7 days to a folder on my PC c01234 named Week11_LogFile
$log = "2015-05-09T06:39:34 Some information here
2015-05-09T06:40:34 Some information here
" -split "`n" | Where {$_.trim()}
#using max and min value for the example so all correct dates will comply
$upperLimit = [datetime]::MaxValue #replace with your own date
$lowerLimit = [datetime]::MinValue #replace with your own date
$log | foreach {
    $dateAsText = ($_ -split '\s', 2)[0]
    try
    {
        $date = [datetime]::Parse($dateAsText)
        if (($lowerLimit -lt $date) -and ($date -lt $upperLimit))
        {
            $_ # output the current item because it belongs to the requested time frame
        }
    }
    catch
    {
        # date is malformed (maybe the line is empty or there is a typo), skip it
        # (a bare catch is used here: .NET parse failures surface in PowerShell as
        # method-invocation errors, so a typed [InvalidOperationException] catch would miss them)
    }
}
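For the week in question, the placeholder limits could be set like this (assuming the dates parse as yyyy-MM-dd, as in the sample lines):
$lowerLimit = [datetime]'2021-03-19'
$upperLimit = [datetime]'2021-03-26'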
Based on your images, your log files look like simple tab-delimited files.
Assuming that's the case, this should work:
# Import the data as a tab-delimited file and add a DateTime column with a parsed value
$LogData = Import-Csv $Log -Delimiter "`t" |
    Select-Object -Property *, @{n='DateTime'; e={[datetime]::ParseExact($_.Date + $_.Time, 'dd. MMM yyHH:mm:ss', $null)}}
# Filter the data, drop the DateTime column, and write the output to a new tab-delimited file
$LogData | Where-Object { ($lowerLimit -lt $_.DateTime) -and ($_.DateTime -lt $upperLimit) } |
    Select-Object -ExcludeProperty DateTime |
    Export-Csv $OutputFile -Delimiter "`t"
The primary drawback here is that on Windows PowerShell (v5.1 and below) you can only export the data quoted. On PowerShell 7 and higher you can use -UseQuotes Never to keep the fields from being wrapped in double quotes, if that's important.
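For example, on PowerShell 7+ the final export could read:
$LogData | Where-Object { ($lowerLimit -lt $_.DateTime) -and ($_.DateTime -lt $upperLimit) } |
    Select-Object -ExcludeProperty DateTime |
    Export-Csv $OutputFile -Delimiter "`t" -UseQuotes Never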
The only other drawback is that if these log files are huge then it will take a long time to import and process them. You may be able to improve performance by making the above a one-liner like so:
Import-Csv $Log -Delimiter "`t" |
    Select-Object -Property *, @{n='DateTime'; e={[datetime]::ParseExact($_.Date + $_.Time, 'dd. MMM yyHH:mm:ss', $null)}} |
    Where-Object { ($lowerLimit -lt $_.DateTime) -and ($_.DateTime -lt $upperLimit) } |
    Select-Object -ExcludeProperty DateTime |
    Export-Csv $OutputFile -Delimiter "`t"
But if the log files are extremely large then you may run into unavoidable performance problems.
It's a shame your example of a line in the log file does not reveal the exact date format.
2015-05-09 could be yyyy-MM-dd or yyyy-dd-MM, so I'm guessing yyyy-MM-dd in the code below.
# this is the UNC path where the log file is to be found
# you need permissions of course to read that file from the remote computer
$remotePath = '\\c06789\C$\Sut\Stat\03-2021.log' # or use the computers IP address instead of its name
$localPath = 'C:\logFiles\Week11_LogFile.log' # the output file
# set the start date for the week you are interested in
$startDate = Get-Date -Year 2021 -Month 3 -Day 19
# build an array of formatted dates for an entire week
$dates = for ($i = 0; $i -lt 7; $i++) { '{0:yyyy-MM-dd}' -f $startDate.AddDays($i) }
# create a regex string from that using an anchor '^' and the dates joined with regex OR '|'
$regex = '^({0})' -f ($dates -join '|')
# read the log file and select all lines starting with any of the dates in the regex
((Get-Content -Path $remotePath) | Select-String -Pattern $regex).Line | Set-Content -Path $localPath

How to fix broken lines records of a file using PowerShell?

In my CSV file I'm getting data in an incorrect format for a few rows: sometimes a line is broken into two lines, as shown in the table below. For EmpId 2, the line is broken into two lines. How can I find such records and merge them back into one correctly formatted line using PowerShell? The expected output is shown in the second table.
Input File data:
EmpId,EmpName,EmpLocation
1,"Jack","Austin"
2,"Pet
er","NYC"
3,"Raj","Delhi"
Expected Output:
EmpId,EmpName,EmpLocation
1,"Jack","Austin"
2,"Peter","NYC"
3,"Raj","Delhi"
My instinct was to do something similar to Karthick's answer; however, I first took a look at the output of Import-Csv. Surprisingly, it puts the line break inside the individual property where it was found, like:
Import-Csv C:\temp\Broken.csv | fl
EmpId : 1
EmpName : Jack
EmpLocation : Austin
EmpId : 2
EmpName : Pet
er
EmpLocation : NYC
EmpId : 3
EmpName : Raj
EmpLocation : Delhi
Notice "Peter" is broken across 2 lines.
So I saw some potential to bring the objects in and modify the underlying property values instead of trying to fix up the string data. I cooked up the below:
$CSVData = Import-Csv C:\temp\Broken.csv
$CSVData |
    ForEach-Object {
        ForEach( $Property in $_.PSObject.Properties.Name )
        {
            $_.($Property) = $_.($Property) -replace "(`r|`n)"
        }
    }
$CSVData
# If you want to re-export:
$CSVData | Export-Csv -Path c:\temp\Fixed.csv -NoTypeInformation
This code should work regardless of which field has the line break. Give it a shot and let me know. Thanks!
You can try the below. This worked for me. I assumed the first line is the header.
$filepath = "D:\file.csv"
[string[]]$data = Get-Content $filepath
$data_Final = New-Object System.Collections.ArrayList
for ($i = $j = 0; $i -lt $data.Count; $(if ($i -eq $j) { $i++ } else { $i = $j + 1 }), ($j = $i)) {
    While ( ($data[$i] -split ",").Count -ne 3 ) {
        $j = $j + 1
        # Concatenate the target line ($i) with successive line(s) ($j) until the elements count to 3
        $data[$i] = $data[$i] + $data[$j]
    }
    $data_Final.Add($data[$i]) | Out-Null
}
$inputData = $data_Final | ConvertFrom-Csv
# Or, if you want to fix the csv uncomment the below
# $data_Final | ConvertFrom-Csv | Export-Csv $filepath -NoTypeInformation
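One caveat, assuming your real data may contain quoted fields with embedded commas (the sample does not): the ($data[$i] -split ",").Count test would miscount those. A hedged variant of the While loop that only counts commas outside double quotes:
While ( [regex]::Matches($data[$i], ',(?=(?:[^"]*"[^"]*")*[^"]*$)').Count -ne 2 ) {
    $j = $j + 1
    $data[$i] = $data[$i] + $data[$j]
}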

.txt Log File Data Extraction Output to CSV with REGEX

I have asked this question before, and LotPings came up with a perfect result. When speaking to the user this relates to, it turned out I had only been given half the information in the first place!
Knowing now exactly what is required, I will explain the scenario again...
Things to bear in mind:
Terminal will always be A followed by 3 digits, i.e. A123.
User ID is at the top of the log file and only appears once; it will always start with 89 and be six digits long. The line will always start SELECTED FOR OPERATOR 89XXXX.
There are two date patterns in the file (one is the date of the search, the other a DOB), and each needs extracting to a separate column. Not all records have a DOB, and some only have the year.
Enquirer doesn't always begin with a 'C' and needs the whole of the remaining line.
The search result always has 'ENQUIRY', and the extraction comes after that.
Here is the log file
L TRANSACTIONS LOGGED FROM 01/05/2018 0001 TO 31/05/2018 2359
SELECTED FOR OPERATOR 891234
START TERMINAL USER ENQUIRER TERMINAL IP
========================================================================================================================
01/05/18 1603 A555 CART87565 46573 RBCO NPC SERVICES GW/10/0043
SEARCH ENQUIRY RECORD NO : S48456/06P CHAPTER CODE =
RECORD DISPLAYED : S48853/98D
PRINT REQUESTED : SINGLE RECORD
========================================================================================================================
03/05/18 1107 A555 CERT16574 BTD/54/1786 16475
REF ENQUIRY DHF ID : 58/94710W CHAPTER CODE =
RECORD DISPLAYED : S585988/84H
========================================================================================================================
24/05/18 1015 A555 CERT15473 19625 CBRS DDS SERVICES NM/18/0199
IMAGE ENQUIRY NAME : TREVOR SMITH CHAPTER CODE =
DATE OF BIRTH : / /1957
========================================================================================================================
24/05/18 1025 A555 CERT15473 15325 CBRS DDS SERVICES NM/12/0999
REF ENQUIRY DDS ID : 04/102578R CHAPTER CODE =
========================================================================================================================
Here is an example of what needs to be extracted from the log file, and under what header, in the resulting CSV.
The PowerShell script LotPings wrote works perfectly; I just need the User ID to be extracted from the top line, to account for not all records having a DOB, and to handle there being more than one type of enquiry, i.e. REF ENQUIRY, SEARCH ENQUIRY, IMAGE ENQUIRY.
$FileIn = '.\SO_51209341_data.txt'
$TodayCsv = '.\SO_51209341_data.csv'
$RE1 = [RegEx]'(?m)(?<Date>\d{2}\/\d{2}\/\d{2}) (?<Time>\d{4}) +(?<Terminal>A\d{3}) +(?<User>C[A-Z0-9]+) +(?<Enquirer>.*)$'
$RE2 = [RegEx]'\s+SEARCH REF\s+NAME : (?<Enquiry>.+?) (PAGE|CHAPTER) CODE ='
$RE3 = [RegEx]'\s+DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'
$Sections = (Get-Content $FileIn -Raw) -split "={30,}`r?`n" -ne ''
$Csv = ForEach ($Section in $Sections) {
    $Row = @{} | Select-Object Date, Time, Terminal, User, Enquirer, Enquiry, DOB
    $Cnt = 0
    if ($Section -match $RE1) {
        ++$Cnt
        $Row.Date     = $Matches.Date
        $Row.Time     = $Matches.Time
        $Row.Terminal = $Matches.Terminal
        $Row.User     = $Matches.User
        $Row.Enquirer = $Matches.Enquirer.Trim()
    }
    if ($Section -match $RE2) {
        ++$Cnt
        $Row.Enquiry = $Matches.Enquiry
    }
    if ($Section -match $RE3) {
        ++$Cnt
        $Row.DOB = $Matches.DOB
    }
    if ($Cnt -eq 3) { $Row }
}
$csv | Format-Table
$csv | Export-Csv $Todaycsv -NoTypeInformation
With such precise data the first answer could have been:
## Q:\Test\2018\07\12\SO_51311417.ps1
$FileIn = '.\SO_51311417_data.txt'
$TodayCsv = '.\SO_51311417_data.csv'
$RE0 = [RegEx]'SELECTED FOR OPERATOR\s+(?<UserID>\d{6})'
$RE1 = [RegEx]'(?m)(?<Date>\d{2}\/\d{2}\/\d{2}) (?<Time>\d{4}) +(?<Terminal>A\d{3}) +(?<Enquirer>.*)$'
$RE2 = [RegEx]'\s+(SEARCH|REF|IMAGE) ENQUIRY\s+(?<SearchResult>.+?)\s+(PAGE|CHAPTER) CODE'
$RE3 = [RegEx]'\s+DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'
$Sections = (Get-Content $FileIn -Raw) -split "={30,}`r?`n" -ne ''
$UserID = "n/a"
$Csv = ForEach ($Section in $Sections) {
    If ($Section -match $RE0) {
        $UserID = $Matches.UserID
    } Else {
        $Row = @{} | Select-Object Date, Time, Terminal, UserID, Enquirer, SearchResult, DOB
        $Cnt = 0
        If ($Section -match $RE1) {
            $Row.Date     = $Matches.Date
            $Row.Time     = $Matches.Time
            $Row.Terminal = $Matches.Terminal
            $Row.Enquirer = $Matches.Enquirer.Trim()
            $Row.UserID   = $UserID
        }
        If ($Section -match $RE2) {
            $Row.SearchResult = $Matches.SearchResult
        }
        If ($Section -match $RE3) {
            $Row.DOB = $Matches.DOB
        }
        $Row
    }
}
$csv | Format-Table
$csv | Export-Csv $Todaycsv -NoTypeInformation
Sample output
Date Time Terminal UserID Enquirer SearchResult DOB
---- ---- -------- ------ -------- ------------ ---
01/05/18 1603 A555 891234 CART87565 46573 RBCO NPC SERVICES GW/10/0043 RECORD NO : S48456/06P
03/05/18 1107 A555 891234 CERT16574 BTD/54/1786 16475 DHF ID : 58/94710W
24/05/18 1015 A555 891234 CERT15473 19625 CBRS DDS SERVICES NM/18/0199 NAME : TREVOR SMITH / /1957
24/05/18 1025 A555 891234 CERT15473 15325 CBRS DDS SERVICES NM/12/0999 DDS ID : 04/102578R

Powershell: How to merge unique headers from one CSV to another?

Edit 1:
So I've figured out how to get the unique headers in CSV 2 to append to CSV 1.
$header = ($table | Get-Member -MemberType NoteProperty).Name
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
$header is an array of headers from CSV 1 ($table). $header_add is an array of headers from CSV 2 ($table_add). $header_diff houses the unique headers in CSV 2 by the end of the code block.
So as far as I'm aware, my next step would be:
$append = ($table_add | Select-Object $header_diff)
My problem now is how do I append these objects to my CSV 1 ($table) object? I don't quite see a way for Add-Member to do this in a particularly nice fashion.
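A sketch of one possibility, since the rows in $table and $table_add line up one-to-one: graft the unique properties onto each CSV 1 row with Add-Member (assuming both files have the same row count):
for ($i = 0; $i -lt $table.Count; $i++) {
    foreach ($name in $header_diff) {
        $table[$i] | Add-Member -NotePropertyName $name -NotePropertyValue $table_add[$i].$name
    }
}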
Original:
Here's the headers for the two CSV files I'm trying to combine.
CSV 1:
Date, Name, Assigned Router, City, Country, # of Calls, Calls in, Calls out
CSV 2:
Date, Name, Assigned Router, City, Country, # of Minutes, Minutes in, Minutes out
So a quick rundown of what these files are; both files contain call information for a set of names for one day (the date column has the same date for each row; this is because this eventually gets sent to a master .xlsx file with all dates combined). All of the columns up to Country contain the same values in the same order in both files. The files simply separate the # of calls and # of minutes data. I was wondering if there was a convenient way to move the unlike columns from one CSV to another.
I've tried using something along the lines of:
Import-Csv (Get-ChildItem <directory> -Include <common pattern in file pair>) | Export-Csv <output path> -NoTypeInformation
This didn't combine all of the matching headers and append the unique ones afterwards. Only the first file processed kept its unique headers; the second file's unique headers and their data were discarded in the output, and its shared header data was added as additional rows.
An example of the failed output I described:
PS > $small | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
PS > $small_add | Format-Table
Column_1 Column_4 Column_5
-------- -------- --------
1 x x
1 y y
1 z z
PS > Import-Csv (Get-ChildItem ./*.* -Include "small*.csv") | Select-Object * -unique | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
1
1
1
I was wondering if I could do something like the following algorithm:
Import-Csv CSV_1 and CSV_2 to separate variables
Compare CSV_2 headers to CSV_1 headers, storing the unlike headers in CSV_2 into a separate variable
Select-Object all CSV_1 headers and unlike CSV_2 headers
Pipe the Select-Object output to Export-Csv
The only other method I could think of is doing it line by line, where I would:
Import-Csv both
remove all of the shared columns from CSV_2
change it from the custom object PowerShell uses for CSVs to a string
append each line of CSV_2 to each line of CSV_1
It feels a bit unrefined and inflexible (the flexibility can probably be dealt with by how the columns/headers are isolated, so there's no problem appending strings).
* This answer focuses on a high-level-of-abstraction OO solution.
* The OP's own solution relies more on string processing, which has the potential to be faster.
# The input file paths.
$files = 'csv1.csv', 'csv2.csv'
$outFile = 'csvMerged.csv'
# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv $files[0]
$doc2 = Import-Csv $files[1]
# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
Where-Object SideIndicator -eq '=>'
).InputObject
# Initialize an ordered hashtable that will be used to temporarily store
# each document 2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] @{}
# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach ($rowD1 in $doc1) {
    # Get the corresponding row from document 2.
    $rowD2 = $doc2[$i++]
    # Extract the values from the unique document-2 columns and store them in the ordered
    # hashtable.
    foreach ($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
    # Add the properties represented by the hashtable entries to the
    # document-1 row at hand and output the augmented object (-PassThru).
    $rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile
To put the above to the test, you can use the following sample input:
# Create sample input CSV files
@'
Date,Name,Assigned Router,City,Country,# of Calls,Calls in,Calls out
dt,nm,ar,ct,cy,cc,ci,co
dt2,nm2,ar2,ct2,cy2,cc2,ci2,co2
'@ > csv1.csv
# Same column layout and data as above through column 'Country', then different.
@'
Date,Name,Assigned Router,City,Country,# of Minutes,Minutes in,Minutes out
dt,nm,ar,ct,cy,mc,mi,mo
dt2,nm2,ar2,ct2,cy2,mc2,mi2,mo2
'@ > csv2.csv
The code should produce the following content in csvMerged.csv:
"Date","Name","Assigned Router","City","Country","# of Calls","Calls in","Calls out","# of Minutes","Minutes in","Minutes out"
"dt","nm","ar","ct","cy","cc","ci","co","mc","mi","mo"
"dt2","nm2","ar2","ct2","cy2","cc2","ci2","co2","mc2","mi2","mo2"
Edit 1:
# Read 2 CSVs into PowerShell CSV object
$table = Import-Csv test.csv
$table_add = Import-Csv test_add.csv
# Isolate unique headers in second CSV
$unique_headers = (Compare-Object -ReferenceObject $table[0].PSObject.Properties.Name -DifferenceObject $table_add[0].PSObject.Properties.Name | Where-Object SideIndicator -eq "=>").InputObject
# Convert CSVs to strings, with second CSV only containing unique columns
$table_str = ($table | ConvertTo-Csv -NoTypeInformation)
$table_add_str = ($table_add | Select-Object $unique_headers | ConvertTo-Csv -NoTypeInformation)
# Append CSV 2's unique columns to CSV 1
# Set line counter
$line = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines
While (($table_str[$line] -ne $null) -and ($table_add_str[$line] -ne $null)) {
    If ($line -eq 0) {
        $table_sum_str = $table_str[$line] + "," + $table_add_str[$line]
    }
    If ($line -ne 0) {
        $table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_add_str[$line])
    }
    $line = $line + 1
}
$table_sum_str | Set-Content -Path $outpath -Encoding UTF8
Using Measure-Command, the above code on my machine mostly takes between 14 and 17 milliseconds to run. Running Measure-Command on mklement0's yields effectively the same times, just from eyeballing it.
Note that for both solutions, the data in the 2 CSV files must be in the same order. If you want to add 2 CSVs together that have complementary data but in different orders, you need to use mklement0's object-oriented approach and add a mechanism to match the data by position or name.
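A hedged sketch of such a mechanism, reusing mklement0's variable names and assuming Name is a unique shared key:
# index document 2 by the shared key instead of relying on row position
$index = @{}
foreach ($row in $doc2) { $index[$row.Name] = $row }
foreach ($rowD1 in $doc1) {
    $rowD2 = $index[$rowD1.Name]
    foreach ($pname in $doc2OnlyColNames) {
        $rowD1 | Add-Member -NotePropertyName $pname -NotePropertyValue $rowD2.$pname
    }
}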
Original:
For those who don't want to use a hash table to do this:
# Make sure you're in same directory as files:
# CSV 1
$table = Import-Csv test.csv
# CSV 2
$table_add = Import-Csv test_add.csv
# Get array with CSV 1 headers
$header = ($table | Get-Member -MemberType NoteProperty).Name
# Get array with CSV 2 headers
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
# Add arrays of both headers together
$header_diff = $header + $header_add
# Sort the headers, remove duplicate headers (first couple ones), keep unique ones
$header_diff = ($header_diff | Sort-Object -Unique)
# Remove all of CSV 1's unique headers and shared headers
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
# Generate a CSV table containing only CSV 2's unique headers
$table_diff = ($table_add | Select-Object $header_diff)
# Convert CSV 1 from a custom PSObject to a string
$table_str = ($table | Select-Object * | ConvertTo-Csv)
# Convert CSV 2 (unique headers only) from custom PSObject to a string
$table_diff_str = ($table_diff | Select-Object * | ConvertTo-Csv)
# Set line counter
$line = 0
# Set flag for if headers have been processed
$headproc = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines.
While (($table_str[$line] -ne $null) -and ($table_diff_str[$line] -ne $null)) {
    If ($headproc -eq 1) {
        $table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_diff_str[$line])
    }
    If ($headproc -eq 0) {
        $table_sum_str = $table_str[$line] + "," + $table_diff_str[$line]
        $headproc = 1
    }
    $line = $line + 1
}
$table_sum_str | ConvertFrom-Csv | Select-Object * | Export-Csv -Path "./test_sum.csv" -Encoding UTF8 -NoTypeInformation
Ran a quick comparison using Measure-Command between this and mklement0's script.
PS > Measure-Command {./self.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 26
Ticks : 267771
TotalDays : 3.09920138888889E-07
TotalHours : 7.43808333333333E-06
TotalMinutes : 0.000446285
TotalSeconds : 0.0267771
TotalMilliseconds : 26.7771
PS > Measure-Command {./mklement.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 18
Ticks : 185058
TotalDays : 2.141875E-07
TotalHours : 5.1405E-06
TotalMinutes : 0.00030843
TotalSeconds : 0.0185058
TotalMilliseconds : 18.5058
I assume the speed difference is because I spend time creating a separate CSV PSObject to isolate columns instead of comparing them directly. mklement0's also has the advantage of keeping the columns in the same order.

Edit one .CSV using Information from Another

I have two .csv files, one with a listing of employee ID's and a department identification number, and another with a listing of all equipment registered to them. The two files share the employee ID field, and I would like to take the department number from the first file and add it to each piece of the corresponding employee's equipment in the second file (or possibly output a third file with the joined information if that is the most expedient method). So far I have pulled the information I need from the first file and am storing it in a hash table, which I believe I should be able to use to compare to the other file, but I'm not sure exactly how to go about that. The other questions I have found on the site that may be related seem to be exclusively about checking for duplicates/changes between the two files. Any help would be much appreciated. Here is the code I have for creating the hashtable:
Import-Csv "filepath\filename.csv" | ForEach-Object -Begin {
    $ids = @{}
} -Process {
    $ids.Add($_.UserID, $_.'Cost Center')
}
Edit:
Here is a sample of data:
First CSV:
UserID | Legal Name | Department
---------------------------------
XXX123| Namey Mcnamera | 1234
XXX321| Chet Manley | 4321
XXX000| Ron Burgundy | 9999
Second CSV:
Barcode | User ID | Department
--------------------------------
000000000000 | xxx123 | 0000
111111111111 | xxx123 | 0000
222222222222 | xxx123 | 0000
333333333333 | xxx321 | 0000
444444444444 | xxx321 | 0000
555555555555 | xxx000 | 0000
The second csv also has several more columns of data, but these three are the only ones I care about.
Edit 2:
Using this code from @wOxxOm (edited to add -Force parameters, as I was receiving an error when attempting to write to the department column due to an entry already existing):
$csv1 = Import-Csv "filename.csv"
$csv2 = Import-Csv "filename.csv"
$indexKey = 'UserID'
$index1 = @{}; foreach ($row in $csv1) { $index1[$row.$indexKey] = $row.'department' }
$copyfield = 'department'
foreach ($row in $csv2) {
    if ($matched = $index1[$row.'User ID']) {
        Add-Member @{$copyField = $matched.$copyfield} -InputObject $row -Force
    }
}
Export-Csv 'filepath.csv' -NoTypeInformation -Encoding UTF8 -InputObject $csv2 -Force
outputs the following information:
Count Length LongLength Rank SyncRoot IsReadOnly IsFixedSize IsSynchronized
48 48 48 1 System.Object[] FALSE TRUE FALSE
EDIT 3:
Got everything worked out with help from @Ross Lyons. Working code is as follows:
# First spreadsheet
$users = Import-Csv "filepath.csv"
# Asset listing
$assets = Import-Csv "filepath.csv"
[System.Array]$data = ""
# iterating through each row in the first spreadsheet
foreach ($user in $users) {
    # iterating through each row in the second spreadsheet
    foreach ($asset in $assets) {
        # compare user IDs in each spreadsheet
        if ($user.UserID -eq $asset.'User ID') {
            # if they match up, copy the department data, user ID and barcode from the appropriate spreadsheets
            $data += $user.UserID + "," + $user."Department" + "," + $asset."Barcode" + ","
        }
    }
}
$data | Format-Table | Out-File "exportedData.csv" -encoding ascii -Force
Ok first, be gentle please, I'm still learning myself! Let me know if the following works or if anything is glaringly wrong...
# this is your first spreadsheet with usernames & department numbers
$users = Import-Csv "spreadsheet1.csv"
# this is your second spreadsheet with equipment info & user IDs, but no department numbers
$assets = Import-Csv "spreadsheet2.csv"
# set a variable for your export data to null, so we can use it later
$export = ""
# iterating through each row in the first spreadsheet
foreach ($user in $users) {
    # iterating through each row in the second spreadsheet
    foreach ($asset in $assets) {
        # compare user IDs in each spreadsheet (note: the if body needs braces, and
        # property access inside strings needs subexpressions like "$($user.UserID)")
        if ($user.UserID -like $asset.'User ID') {
            # if they match up, copy the department data, user ID and barcode from the appropriate spreadsheets
            $data = "$($user.UserID)" + "," + "$($user.Department)" + "," + "$($asset.barcode)" + "," + "~"
            # splits the data based on the "~" that we stuck in at the end of the string
            $export = $data -split "~" | Out-File "exportedData.csv" -Encoding ascii
        }
    }
}
Let me know what you think. Yes, I know this is probably not the best or most efficient way of doing it, but I think it will get the job done.
If this doesn't work, let me know and I'll have another crack at it.
The hashtable key should be the common field; its value should be the entire row, which you can simply access later as $hashtable[$key]:
$csv1 = Import-Csv 'r:\1.csv'
$csv2 = Import-Csv 'r:\2.csv'
# build the index
$indexKey = 'employee ID'
$index1 = @{}; foreach ($row in $csv1) { $index1[$row.$indexKey] = $row }
# use the index
$copyField = 'department number'
foreach ($row in $csv2) {
    if ($matched = $index1[$row.$indexKey]) {
        Add-Member @{$copyField = $matched.$copyField} -InputObject $row
    }
}
# pipe the rows so each one becomes a CSV record; Export-Csv -InputObject $csv2
# would serialize the array object itself (its Count/Length/Rank properties)
$csv2 | Export-Csv 'r:\merged.csv' -NoTypeInformation -Encoding UTF8
The code doesn't use pipelines for overall speedup.