How can I transpose and parse a large vertical text file into a CSV file with headers?

I have a large text file (*.txt) in the following format:
; KEY 123456
; Any Company LLC
; 123 Main St, Anytown, USA
SEC1 = xxxxxxxxxxxxxxxxxxxxx
SEC2 = xxxxxxxxxxxxxxxxxxxxx
SEC3 = xxxxxxxxxxxxxxxxxxxxx
SEC4 = xxxxxxxxxxxxxxxxxxxxx
SEC5 = xxxxxxxxxxxxxxxxxxxxx
SEC6 = xxxxxxxxxxxxxxxxxxxxx
This is repeated for about 350 - 400 keys. These are HASP keys and the SEC codes associated with them. I am trying to parse this file into a CSV file with KEY and SEC1 - SEC6 as the headers, with the rows being filled in. This is the format I am trying to get to:
KEY,SEC1,SEC2,SEC3,SEC4,SEC5,SEC6
123456,xxxxxxxxxx,xxxxxxxxxxx,xxxxxxxxxx,xxxxxxxxxx,xxxxxxxxxx,xxxxxxxxxx
456789,xxxxxxxxxx,xxxxxxxxxx,xxxxxxxxxx,xxxxxxxxxx,xxxxxxxxxx,xxxxxxxxxx
I have been able to get a script to export to a CSV with only one key in the text file (my test file), but when I try to run it on the full list, it only exports the last key and sec codes.
$keysheet = '.\AllKeys.txt'
$holdarr = @{}
Get-Content $keysheet | ForEach-Object {
if ($_ -match "KEY") {
$key, $value = $_.TrimStart("; ") -split " "
$holdarr[$key] = $value }
elseif ($_ -match "SEC") {
$key, $value = $_ -split " = "
$holdarr[$key] = $value }
}
$hash = New-Object PSObject -Property $holdarr
$hash | Export-Csv -Path '.\allsec.csv' -NoTypeInformation
When I run it on the full list, it also adds a couple of extra columns with what looks like properties instead of values.
Any help to get this to work would be appreciated.
Thanks.

Here's the approach I suggest:
$output = switch -Regex -File './AllKeys.txt' {
'^; KEY (?<key>\d+)' {
if ($o) {
[pscustomobject]$o
}
$o = @{
KEY = $Matches['key']
}
}
'^(?<sec>SEC.*?)\s' {
$o[$Matches['sec']] = ($_ | ConvertFrom-StringData)[$Matches['sec']]
}
default {
Write-Warning -Message "No match found: $_"
}
}
# catch the last object
$output += [pscustomobject]$o
$output | Export-Csv -Path './some.csv' -NoTypeInformation

This would be one approach.
& {
$entry = $null
switch -Regex -File '.\AllKeys.txt' {
"KEY" {
if ($entry ) {
[PSCustomObject]$entry
}
$entry = @{}
$key, $value = $_.TrimStart("; ") -split " "
$entry[$key] = [int]$value
}
"SEC" {
$key, $value = $_ -split " = "
$entry[$key] = $value
}
}
[PSCustomObject]$entry
} | sort KEY | select KEY,SEC1,SEC2,SEC3,SEC4,SEC5,SEC6 |
Export-Csv -Path '.\allsec.csv' -NoTypeInformation

Let's leverage the strength of ConvertFrom-StringData, which
Converts a string containing one or more key and value pairs to a hash table.
So what we will do is:
Split the file into blocks of text
Edit the "; KEY" line
Remove any blank lines or semicolon lines
Pass the result to ConvertFrom-StringData to create a hashtable
Convert that to a PowerShell object
$path = "c:\temp\keys.txt"
# Split the file into its key/sec collections. Drop any blank entries created in the split
(Get-Content -Raw $path) -split ";\s+KEY\s+" | Where-Object{-not [string]::IsNullOrWhiteSpace($_)} | ForEach-Object{
# Split the block into lines again
$lines = $_ -split "`r`n" | Where-Object{$_ -notmatch "^;" -and -not [string]::IsNullOrWhiteSpace($_)}
# Edit the first line so we have a full block of key=value pairs.
$lines[0] = "key=$($lines[0])"
# Use ConvertFrom-StringData to do the leg work after we join the lines back as a single string.
[pscustomobject](($lines -join "`r`n") | ConvertFrom-StringData)
} |
# Cannot guarantee column order, so we force it with this select statement.
Select-Object KEY,SEC1,SEC2,SEC3,SEC4,SEC5,SEC6
Use Export-Csv to your heart's content now.
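To see what ConvertFrom-StringData contributes in the steps above, here is a minimal sketch on a single hand-written block. The sample values and variable names are illustrative assumptions, not taken from the real key file.

```powershell
# One block as it looks after the "; KEY" line has been rewritten to "key=..."
$block = @'
key=123456
SEC1=aaaa
SEC2=bbbb
'@

# ConvertFrom-StringData turns the name=value lines into a hashtable
$table = $block | ConvertFrom-StringData

# Casting the hashtable gives an object whose properties become CSV columns
$row = [pscustomobject]$table

# Select-Object fixes the column order before converting to CSV text
$csv = $row | Select-Object key, SEC1, SEC2 | ConvertTo-Csv -NoTypeInformation
$csv
```

Swapping ConvertTo-Csv for Export-Csv -Path writes the same rows to a file instead of the console.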

Related

How to split through the whole list using PowerShell

In my CSV file I have "SharePoint Site" column and a few other columns. I'm trying to split the ID from "SharePoint Site" columns and put it to the new column call "SharePoint ID" but not sure how to do it so I'll be really appreciated If I can get any help or suggestion.
$downloadFile = Import-Csv "C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv"
(($downloadFile -split "/") -split "_") | Select-Object -Index 5
CSV file
SharePoint Site
Include:[https://companyname-my.sharepoint.com/personal/elksn7_nam_corp_kl_com]
Include:[https://companyname-my.sharepoint.com/personal/tzksn_nam_corp_kl_com]
Include:[https://companyname.sharepoint.com/sites/msteams_c578f2/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2Fmsteams%5Fc578f2%2FShared%20Documents%2FBittner%2DWilfong%20%2D%20Litigation%20Hold%2FWork%20History&viewid=b3e993a1%2De0dc%2D4d33%2D8220%2D5dd778853184]
Include:[https://companyname.sharepoint.com/sites/msteams_c578f2/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2Fmsteams%5Fc578f2%2FShared%20Documents%2FBittner%2DWilfong%20%2D%20Litigation%20Hold%2FWork%20History&viewid=b3e993a1%2De0dc%2D4d33%2D8220%2D5dd778853184]
Include:[All]
After splitting, this should appear under a new column called "SharePoint ID"
SharePoint ID
2. elksn
3. tzksn
4. msteams_c578f2
5. msteams_c578f2
6. All

Try this:
# Import csv into an array
$Sites = (Import-Csv C:\temp\Modified-Audit-Log-Records.csv).'SharePoint Site'
# Create Export variable
$Export = @()
# ForEach loop that goes through the SharePoint sites one at a time
ForEach($Site in $Sites){
# Clean up the input to leave only the hyperlink
$Site = $Site.replace('Include:[','')
$Site = $Site.replace(']','')
# Split the hyperlink on '/' and take the fifth element (arrays are zero-based, so index 4 follows the fourth slash)
$SiteID = $Site.split('/')[4]
# For the 'SharePoint Site' entry Include:[All], $SiteID will be empty after the split, because 'All' contains no slashes.
# This If statement detects when $Site is 'All' and sets $SiteID to that.
if($Site -eq 'All'){
$SiteID = $Site
}
# Create variable to export Site ID
$SiteExport = @()
$SiteExport = [pscustomobject]@{
'SharePoint ID' = $SiteID
}
# Add each SiteExport to the Export array
$Export += $SiteExport
}
# Write out the export
$Export

A concise solution that appends a Sharepoint ID column to the existing columns by way of a calculated property:
Import-Csv 'C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv' |
Select-Object *, @{
Name = 'SharePoint ID'
Expression = {
$tokens = $_.'SharePoint Site' -split '[][/]'
if ($tokens.Count -eq 3) { $tokens[1] } # matches 'Include:[All]'
else { $tokens[5] -replace '_nam_corp_kl_com$' }
}
}
Note:
To see all resulting column values, pipe the above to Format-List.
To re-export the results to a CSV file, pipe to Export-Csv

You have 3 distinct patterns you are trying to extract data from. I believe regex would be an appropriate tool.
If you are wanting the new csv to just have the single ID column.
$file = "C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv"
$IdList = switch -Regex -File ($file){
'Include:.+(?=/(\w+?)_)(?<=personal)' {$matches.1}
'Include:(?=\[(\w+)\])' {$matches.1}
'Include:.+(?=/(\w+?)/)(?<=sites)' {$matches.1}
}
$IdList |
ConvertFrom-Csv -Header "Sharepoint ID" |
Export-Csv -Path $newfile -NoTypeInformation
If you want to add a column to your existing CSV
$file = "C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv"
$properties = '*',@{
Name = 'Sharepoint ID'
Expression = {
switch -Regex ($_.'sharepoint Site'){
'Include:.+(?=/(\w+?)_)(?<=personal)' {$matches.1}
'Include:(?=\[(\w+)\])' {$matches.1}
'Include:.+(?=/(\w+?)/)(?<=sites)' {$matches.1}
}
}
}
Import-Csv -Path $file |
Select-Object $properties |
Export-Csv -Path $newfile -NoTypeInformation
Regex details
.+ Match any amount of any character
(?=...) Positive look ahead
(...) Capture group
\w+ Match one or more word characters
? Lazy quantifier
(?<=...) Positive look behind

This would need more testing, but it works with the input we have. The main concept is to use System.Uri to parse the strings. From what I'm seeing, the segment you are looking for is always the third one (index [2]). Depending on the preceding segment, either split it on _ or trim the trailing /; if IsAbsoluteUri is $false, leave the string as is.
$csv = Import-Csv path/to/test.csv
$result = foreach($line in $csv)
{
$uri = [uri]($line.'SharePoint Site' -replace '^Include:\[|]$')
$id = switch($uri)
{
{-not $_.IsAbsoluteUri} {
$_
break
}
{ $_.Segments[1] -eq 'personal/' } {
$_.Segments[2].Split('_')[0]
break
}
{ $_.Segments[1] -eq 'sites/' } {
$_.Segments[2].TrimEnd('/')
}
}
[pscustomobject]@{
'SharePoint Site' = $line.'SharePoint Site'
'SharePoint ID' = $id
}
}
$result | Format-List

Remove the need to use out-file only to import the file immediately using PowerShell just to convert the base type

I am attempting to turn the file below into one that contains no comments '#', no blank lines, no unneeded spaces, and only one entry per line. I'm unsure how to run the following code without the need to output the file and then reimport it. There should be code that doesn't require that step but I can't find it. The way I wrote my script also doesn't look right to me even though it works. As if there was a more elegant way of doing what I'm attempting but I just don't see it.
Before File Change: TNSNames.ora
#Created 9_27_16
#Updated 8_30_19
AAAA.world=(DESCRIPTION =(ADDRESS_LIST =
(ADDRESS =
(COMMUNITY = tcp.world)
(PROTOCOL = TCP)
(Host = www.url1111.com)
(Port = 1111)
)
)
(CONNECT_DATA = (SID = SID1111)
)
)
#Created 9_27_16
BBBB.world=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url2222.COM)(Port=2222))(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url22222.COM)(Port=22222)))(CONNECT_DATA=(SID=SID2222)))
CCCC.world=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=url3333.COM)(Port=3333))(CONNECT_DATA=(SID=SID3333)))
DDDD.url =(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=URL4444 )(Port=4444))(ADDRESS=(COMMUNITY=TCP.world)(PROTOCOL=TCP)(Host=URL44444 )(Port=44444)))(CONNECT_DATA=(SID=SID4444 )(GLOBAL_NAME=ASDF.URL)))
#Created 9_27_16
#Updated 8_30_19
After File Change:
AAAA.world=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=www.url1111.com)(Port=1111)))(CONNECT_DATA=(SID=SID1111)))
BBBB.world=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url2222.COM)(Port=2222))(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url22222.COM)(Port=22222)))(CONNECT_DATA=(SID=SID2222)))
CCCC.world=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=url3333.COM)(Port=3333))(CONNECT_DATA=(SID=SID3333)))
DDDD.url=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=URL4444)(Port=4444))(ADDRESS=(COMMUNITY=TCP.world)(PROTOCOL=TCP)(Host=URL44444)(Port=44444)))(CONNECT_DATA=(SID=SID4444)(GLOBAL_NAME=ASDF.URL)))
Code:
# Get the file
[System.IO.FileInfo] $File = 'C:\temp\TNSNames.ora'
[string] $data = (Get-Content $File.FullName | Where-Object { !$_.StartsWith('#') }).ToUpper()
# Convert the data. This part is where any (CONNECT_DATA entry ends up on its own line.
$Results = $data.Replace(" ", "").Replace("`t", "").Replace(")))", ")))`n")
# Convert $Results from BaseType of System.Object to System.Array
$Path = '.\.vscode\StringResults.txt'
$Results | Out-File -FilePath $Path
$Results = Get-Content $Path
# Find all lines that start with '(CONNECT_DATA'
for ($i = 0; $i -lt $Results.Length - 1; $i++) {
if ($Results[$i + 1].StartsWith("(CONNECT_DATA")) {
# Add the '(CONNECT_DATA' line to the previous line
$Results[$i] = $Results[$i] + $Results[$i + 1]
# Blank out the '(CONNECT_DATA' line
$Results[$i + 1] = ''
}
}
# Remove all blank lines
$FinalForm = $null
foreach ($Line in $Results) {
if ($Line -ne "") {
$FinalForm += "$Line`n"
}
}
$FinalForm

The crux of your problem is that you have declared $data as a [string]. That is fine, because some of your replace operations probably work better on a single string. It's just that $Results then also ends up being a single string, so when you try to index into $Results near the bottom, those operations fail. You can easily turn $Results into a string array with the -split operator, which eliminates the need to save the string to disk and read it back in. See the comments below.
# Get the file
[System.IO.FileInfo] $File = 'C:\temp\TNSNames.ora'
[string] $data = (Get-Content $File.FullName | Where-Object { !$_.StartsWith('#') }).ToUpper()
# Convert the data. This part is where any (CONNECT_DATA entry ends up on its own line.
$Results = $data.Replace(' ', '').Replace("`t", '').Replace(')))', ")))`n")
# You do not need to do this next section. Essentially this is just saving your multiline string
# to a file and then using Get-Content to read it back in as a string array
# Convert $Results from BaseType of System.Object to System.Array
# $Path = 'c:\temp\StringResults.txt'
# $Results | Out-File -FilePath $Path
# $Results = Get-Content $Path
# Instead split your $Results string into multiple lines using -split
# this will do the same thing as above without writing to file
$Results = $Results -split "\r?\n"
# Find all lines that start with '(CONNECT_DATA'
for ($i = 0; $i -lt $Results.Length - 1; $i++) {
if ($Results[$i + 1].StartsWith('(CONNECT_DATA')) {
# Add the '(CONNECT_DATA' line to the previous line
$Results[$i] = $Results[$i] + $Results[$i + 1]
# Blank out the '(CONNECT_DATA' line
$Results[$i + 1] = ''
}
}
# Remove all blank lines
$FinalForm = $null
foreach ($Line in $Results) {
if ($Line -ne '') {
$FinalForm += "$Line`n"
}
}
$FinalForm
Also, for fun, try this out
((Get-Content 'C:\temp\tnsnames.ora' |
Where-Object {!$_.StartsWith('#') -and ![string]::IsNullOrWhiteSpace($_)}) -join '' -replace '\s' -replace '\)\s?\)\s?\)', ")))`n" -replace '\r?\n\(Connect_data','(connect_data').ToUpper()
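The role of -split in the fix above can be seen in isolation. This is a minimal sketch in which a made-up sample string stands in for $Results; the entry names are placeholders.

```powershell
# A single multi-line string, like $Results after the Replace() calls
$text = "AAAA=(X)))`nBBBB=(Y)))`n"
$text.GetType().Name

# -split takes a regex; \r?\n covers both Windows and Unix line endings
$lines = $text -split "\r?\n"
$lines.GetType().Name

# Drop the empty entry left behind by the trailing newline
$lines = $lines | Where-Object { $_ -ne '' }

# Indexing now returns whole lines rather than single characters
$lines[1]
```

Because the result is a real string array, the later loop that calls StartsWith on $Results[$i + 1] operates on lines, with no round trip through a file.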

Powershell - Store hash table in file and read its content

As a follow-up, suggested by Doug, to my previous question on anonymizing a file (PowerShell - Find and replace multiple patterns to anonymize file), I need to save all hash table values in a single file "tmp.txt" for further processing.
Example: after processing the input file with string like:
<requestId>>qwerty-qwer12-qwer56</requestId>
the tmp.txt file contains:
qwerty-qwer12-qwer56 : RequestId-1
and this is perfect. The problem is that when working with many strings, the tmp.txt file contains more pairs than it should. In my example below, tmp.txt should contain "RequestId-x" 4 times, but there are 6. Also, when there are 2 or more matches on the same line, only the first is updated/replaced. Any idea where these extra lines come from? And why doesn't the script continue checking until the end of the line?
Here is my test code:
$log = "C:\log.txt"
$tmp = "C:\tmp.txt"
Clear-Content $log
Clear-Content $tmp
@'
<requestId>qwerty-qwer12-qwer56</requestId>qwertykeyId>Qwd84lPhjutf7Nmwr56hJndcsjy34imNQwd84lPhjutZ7Nmwr56hJndcsjy34imNPozDr5</ABC reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId>
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>qwerty-qwer12-qwer56</requestId>abcde reportId>plmkjh8765FGH4rt6As</msg:reportId>
<requestId>1234qw-12qw12-12qw56</requestId>
keyId>Qwd84lPhjutf7Nmwr56hJndcsjy34imNQwd84lPhjutZ7Nmwr56hJndcsjy34imNPozDr5</
keyId>Qwd84lPhjutf7Nmwr56hJndcsjy34imNQwd84lPhjutZ7Nmwr56hJndcsjy34imNPozDr5</
keyId>Zdjgi76Gho3sQw0ib5Mjk3sDyoq9zmGdZdjgi76Gho3sQw0ib5Mjk3sDyoq9zmGdLkJpQw</
reportId>plmkjh8765FGH4rt6As</msg:reportId>
reportId>plmkjh8765FGH4rt6As</msg:reportId>
reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId>
'@ | Set-Content $log -Encoding UTF8
$requestId = @{
Count = 1
Matches = @()
}
$keyId = @{
Count = 1
Matches = @()
}
$reportId = @{
Count = 1
Matches = @()
}
$output = switch -Regex -File $log {
'(\w{6}-\w{6}-\w{6})' {
if(!$requestId.matches.($matches.1))
{
$req = $requestId.matches += @{$matches.1 = "RequestId-$($requestId.count)"}
$requestId.count++
$req.keys | %{ Add-Content $tmp "$_ : $($req.$_)" }
}
$_ -replace $matches.1,$requestId.matches.($matches.1)
}
'keyId>(\w{70})</' {
if(!$keyId.matches.($matches.1))
{
$kid = $keyId.matches += @{$matches.1 = "keyId-$($keyId.count)"}
$keyId.count++
$kid.keys | %{ Add-Content $tmp "$_ : $($kid.$_)" }
}
$_ -replace $matches.1,$keyId.matches.($matches.1)
}
'reportId>(\w{19})</msg:reportId>' {
if(!$reportId.matches.($matches.1))
{
$repid = $reportId.matches += @{$matches.1 = "Report-$($reportId.count)"}
$reportId.count++
$repid.keys | %{ Add-Content $tmp "$_ : $($repid.$_)" }
}
$_ -replace $matches.1,$reportId.matches.($matches.1)
}
default {$_}
}
$output | Set-Content $log -Encoding UTF8
Get-Content $log
Get-Content $tmp

If you don't care about the order in which they were found, which I assume you wouldn't if you don't want duplicates, just export them all at the end. I would still keep them in an "object" form so you can easily import/export them. Csv would be an ideal candidate for the data.
$requestId,$keyid,$reportid | Foreach-Object {
foreach($key in $_.matches.keys)
{
[PSCustomObject]@{
Original = $key
Replacement = $_.matches.$key
}
}
}
The data output to console for this example
Original Replacement
-------- -----------
qwerty-qwer12-qwer56 RequestId-1
zxcvbn-zxcv12-zxcv56 RequestId-2
1234qw-12qw12-12qw56 RequestId-3
Qwd84lPhjutf7Nmwr56hJndcsjy34imNQwd84lPhjutZ7Nmwr56hJndcsjy34imNPozDr5 keyId-1
Zdjgi76Gho3sQw0ib5Mjk3sDyoq9zmGdZdjgi76Gho3sQw0ib5Mjk3sDyoq9zmGdLkJpQw keyId-2
poGd56Hnm9q3Dfer6Jh Report-1
plmkjh8765FGH4rt6As Report-2
Just pipe it into Export-Csv
$requestId,$keyid,$reportid | Foreach-Object {
foreach($key in $_.matches.keys)
{
[PSCustomObject]@{
Original = $key
Replacement = $_.matches.$key
}
}
} | Export-Csv $tmp -NoTypeInformation
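For the "read its content" half of the question, the exported CSV can be loaded back into a hashtable. The following is a rough sketch, assuming the Original/Replacement column names used above; the temp-file path and the two sample pairs are made up for illustration.

```powershell
# Hypothetical location for the mapping file
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'tmp-map.csv'

# Sample mapping standing in for the collected matches
$map = [ordered]@{
    'qwerty-qwer12-qwer56' = 'RequestId-1'
    'zxcvbn-zxcv12-zxcv56' = 'RequestId-2'
}

# Store: one row per original/replacement pair
$map.GetEnumerator() |
    ForEach-Object { [pscustomobject]@{ Original = $_.Key; Replacement = $_.Value } } |
    Export-Csv -Path $tmp -NoTypeInformation

# Read back: rebuild a lookup table keyed by the original value
$restored = @{}
Import-Csv -Path $tmp | ForEach-Object { $restored[$_.Original] = $_.Replacement }

$restored['qwerty-qwer12-qwer56']
```

Keeping the data as CSV rather than free-form "key : value" lines means Import-Csv does the parsing when the file is read back.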

String trim and split

I have a text file that I read and I need to get the values from.
Example text file:
[Site 01]
DBServer=LocalHost
DBName=Database01
Username=admin
Password=qwerty
[Site 02]
DBServer=192.168.0.10
DBName=Database02
Username=admin
Password=qwerty
Currently my code reads through the file and creates an array entry for each DBServer= line that is found; this text file can have many sites:
$NumOfSites = Get-Content $Sites |
Select-String -Pattern "DBServer=" -Context 0,3
$i = 0
$NumOfSites | ForEach-Object {
$svr = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'DBServer=').Trim()[1] }
$db = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'DBName='.Trim())[1] }
$uid = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'Username='.Trim())[1] }
$pswd = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'Password='.Trim())[1] }
$i = $i+1
}
I can't get each attribute to split out properly without some extra spaces or something nicely as a string variable.
I just need to extract the info to put into an SQL connection line as variables from the format of the file example I have.

Other than the record headers (i.e. [Site 01]), the rest can be handled by ConvertFrom-StringData just fine. We can convert the records to objects directly by splitting on the header rows. ConvertFrom-StringData turns a multi-line string into a hashtable, and you can just cast that as a [PSCustomObject] and voilà, you have objects that are easy to use.
$NumOfSites = Get-Content $Sites -raw
$SiteObjects = $NumOfSites -split '\[.+?\]'|%{[PSCustomObject](ConvertFrom-StringData -StringData $_)}
Then you can manipulate $SiteObjects however you see fit (output to CSV if you want, or filter on any property using Where-Object). Or, if you're looking to make connections, you can loop through it, building your connections as needed...
ForEach($Connection in $SiteObjects){
$ConStr = "Server = {0}; Database = {1}; Integrated Security = False; User ID = {2}; Password = {3};" -f $Connection.DBServer.Trim(), $Connection.DBName.Trim(), $Connection.Username.Trim(), $Connection.Password.Trim()
# <Do stuff with SQL>
}
Edit: Updating my answer since the sample text was changed to add <pre> and </pre>. We just need to remove those, and since the OP is getting errors about methods on null values we'll filter for null as well.
$NumOfSites = Get-Content $Sites -raw
$SiteObjects = $NumOfSites -replace '<.*?>' -split '\[.+?\]' | ?{$_} |%{[PSCustomObject](ConvertFrom-StringData -StringData $_)}
ForEach($Connection in $SiteObjects){
$svr = $Connection.DBServer.Trim()
$db = $Connection.DBName.Trim()
$uid = $Connection.Username.Trim()
$pswd = $Connection.Password.Trim()   # avoid $pwd, which is PowerShell's automatic current-location variable
}
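The same approach can be tried without a file. This is a small self-contained sketch that uses the sample text from the question as an in-memory string; the $text and $sites names are only illustrative.

```powershell
# Sample input from the question, held in a here-string instead of a file
$text = @'
[Site 01]
DBServer=LocalHost
DBName=Database01
Username=admin
Password=qwerty
[Site 02]
DBServer=192.168.0.10
DBName=Database02
Username=admin
Password=qwerty
'@

# Split on the [Site ..] headers, skip the empty chunk before the first header,
# and convert each remaining name=value block into an object
$sites = $text -split '\[.+?\]' |
    Where-Object { $_.Trim() } |
    ForEach-Object { [pscustomobject](ConvertFrom-StringData -StringData $_) }

# Each site is now an object with DBServer, DBName, Username and Password properties
foreach ($site in $sites) {
    'Server = {0}; Database = {1}; User ID = {2};' -f $site.DBServer, $site.DBName, $site.Username
}
```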

Here's a suggestion if you only care about getting the value after the equals:
Get-Content Example.txt |
ForEach-Object {
Switch -Regex ($_) {
'dbs.+=' { $svr = ($_ -replace '.+=').Trim() }
# ... etc. for the other keys ...
}
}
Get-Content piped to ForEach-Object will interpret each line as its own object.
Edit:
You were most of the way there, but it's unnecessary to -split the lines
$NumOfSites = Get-Content $Sites | Select-String -pattern "DBServer=" -Context 0,3
$NumOfSites | ForEach-Object {
Switch -Wildcard ($_) {
'DBS*=' { $svr = ($_ -replace '.+=').Trim() }
'DBN*=' { $db = ($_ -replace '.+=').Trim() }
'U*=' { $uid = ($_ -replace '.+=').Trim() }
'P*=' { $pw = ($_ -replace '.+=').Trim() }
}
}

How to merge all contents in two csv files where records match off 1 column

I have two csv files. They both have SamAccountName in common. User records may or may not have a match found for every record between both files (THIS IS VERY IMPORTANT TO NOTE).
I am trying to basically just merge all columns (and their values) into one file (based from the SamAccountNames found in the first file...).
If the SamAccountName is not found in the 2nd file, it should add all null values for that user record in the merged file (since the record was found in the first file).
If the SamAccountName is found in the 2nd file, but not in the first, it should ignore merging that record.
Number of columns in each file may vary (5, 10, 2, so forth...).
Function MergeTwoCsvFiles
{
Param ([String]$baseFile, [String]$fileToBeMerged, [String]$columnTitleLineInFileToBeMerged)
$baseFileCsvContents = Import-Csv $baseFile
$fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
$baseFileContents = Get-Content $baseFile
$baseFileContents[0] += "," + $columnTitleLineInFileToBeMerged
$baseFileCsvContents | ForEach-Object {
$matchFound = $False
$baseSameAccountName = $_.SamAccountName
[String]$mergedLineInFile = $_
[String]$lineMatchFound = $fileToBeMergedCsvContents | Where-Object {$_.SamAccountName -eq $baseSameAccountName}
Write-Host '$mergedLineInFile =' $mergedLineInFile
Write-Host '$lineMatchFound =' $lineMatchFound
Exit
}
}
The problem is, the record in the file is being written as a hash table instead of a string like line (if you were to view it as .txt). So I'm not really sure how to do this...
Adding results csv example files...
First CSV File
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"
Second CSV File
"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"
Expected Merged CSV File
"SamAccountName","sn","GivenName","employeeNumber","userAccountControl","mail"
"PBrain","Pinky","Brain","","",""
"JSteward","John","Steward","43518790","512","JSteward@mystuff.com"
"JDoe","John","Doe","902187u4","1114624","JDoe@mystuff.com"
"SDoo","Scooby","Doo","","",""
Note: This will be part of a loop process in merging multiple files, so I would like to avoid hardcoding the title names (with $_.SamAccountName as an exception)
Trying suggestion from "restless 1987" (Not Working)
$baseFileCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMergedCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\lookup.csv'
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()
$baseFileContents = Get-Content 'D:\Scripts\Powershell\Tests\base.csv'
$recordsMatched = compare-object $baseFileCsvContents $fileToBeMergedCsvContents -Property SamAccountName
switch ($recordsMatched)
{
'<=' {}
'=>' {}
'==' {$resultsFileContents += $_}
}
$resultsFileCsv = $resultsFileContents | ConvertTo-Csv
$resultsFileCsv | Export-Csv $resultsFile -NoTypeInformation -Force
Output gives a blank file :(

The code below outputs the desired results based on the inputs you provided.
function CombineSkip1($s1, $s2){
$s3 = $s1 -split ','
$s2 -split ',' | select -Skip 1 | % {$s3 += $_}
$s4 = $s3 -join ', '
$s4
}
Write-Output "------Combine files------"
# content
$c1 = Get-Content D:\junk\test1.csv
$c2 = Get-Content D:\junk\test2.csv
# users in both files, could be a better way to do this
$t1 = $c1 | ConvertFrom-Csv
$t2 = $c2 | ConvertFrom-Csv
$users = $t1 | Select SamAccountName
# generate final, combined output
$combined = @()
$combined += CombineSkip1 $c1[0] $c2[0]
$c2PropCount = ($c2[0] -split ',').Count - 1
$filler = (', ""' * $c2PropCount)
for ($i = 1; $i -lt $c1.Count; $i++){
$user = $c1[$i].Split(',')[0]
$u2 = $c2 | where {([string]$_).StartsWith($user)}
if ($u2)
{
$combined += CombineSkip1 $c1[$i] $u2
}
else
{
$combined += ($c1[$i] + $filler)
}
}
# write to output and file
Write-Output $combined
$combined | Set-Content -Path D:\junk\test3.csv -Force

You can use Compare-Object for that purpose, with -Property SamAccountName. For example:
$a = 1,2,3,4,5
$b = 4,5,6,7
# -IncludeEqual is required to get the '==' entries as well
$side = Compare-Object $a $b -IncludeEqual
switch ($side.SideIndicator) {
'<=' { 'only in $a' }
'=>' { 'only in $b' }
'==' { 'on both sides' }
}
Once you have all the data in your output variable, pipe it to ConvertTo-Csv and write it to a file.

After an entire day, I finally came up with something that works...
...
Edit
Reason: breaking the inner loop and removing the found element from the array will be much faster when merging files with thousands of records...
Function GetTitlesFromFileToBeMerged
{
Param ($csvFile)
[String]$fileToBeMergedTitles = Get-Content $csvFile -TotalCount 1
[String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`",`"", "|").Trim()
[String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`"", "").Trim()
[String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "SamAccountName", "").Trim()
[String[]]$listOfColumnTitles = $fileToBeMergedTitles.Split('|',[System.StringSplitOptions]::RemoveEmptyEntries)
Write-Output $listOfColumnTitles
}
$baseFile = 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMerged = 'D:\Scripts\Powershell\Tests\lookup.csv'
$baseFileCsvContents = Import-Csv $baseFile
$baseFileContents = Get-Content $baseFile
$fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
[System.Collections.Generic.List[System.Object]]$fileToBeMergedContents = Get-Content $fileToBeMerged
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()
[String]$baseFileTitles = $baseFileContents[0]
[String]$fileToBeMergedTitles = (Get-Content $fileToBeMerged -TotalCount 1) -replace "`"SamAccountName`",", ""
$resultsFileContents += $baseFileTitles + "," + $fileToBeMergedTitles
[String]$lineMatchNotFound = ""
$arrayFileToBeMergedTitles = GetTitlesFromFileToBeMerged $fileToBeMerged
For ($valueNum = 0; $valueNum -lt $arrayFileToBeMergedTitles.Length; $valueNum++)
{
$lineMatchNotFound += ",`"`""
}
$baseLineCounter = 1
$baseFileCsvContents | ForEach-Object {
$baseSameAccountName = $_.SamAccountName
[String]$baseLineInFile = $baseFileContents[$baseLineCounter]
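$lineMatchCounter = 0   # zero-based index of the current line in $fileToBeMergedContents, so RemoveAt removes the matched line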
$lineMatchFound = ""
:inner
ForEach ($line in $fileToBeMergedContents) {
If ($line -like "*$baseSameAccountName*") {
[String]$lineMatchFound = "," + ($line -replace '^"[^"]*",', "")
$fileToBeMergedContents.RemoveAt($lineMatchCounter)
break inner
}; $lineMatchCounter++
}
If (!($lineMatchFound))
{
[String]$lineMatchFound = $lineMatchNotFound
}
$mergedLine = $baseLineInFile + $lineMatchFound
$resultsFileContents += $mergedLine
$baseLineCounter++
}
ForEach ($line in $resultsFileContents)
{
Write-Host $line
}
$resultsFileContents | Set-Content $resultsFile -Force
I'm very sure this is not the best approach and there is something better that would handle this much faster. If anyone has any ideas, I'm open to them. Thanks.
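One alternative that avoids string handling and the nested loop is to index the second file in a hashtable and build each merged row as an object. This is a minimal sketch under stated assumptions, not a drop-in replacement: the in-memory sample data (a shortened version of the example files) and all variable names are made up for illustration.

```powershell
# Shortened sample data; in practice these would come from Import-Csv
$base = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
'@ | ConvertFrom-Csv

$lookup = @'
"SamAccountName","employeeNumber","userAccountControl"
"JSteward","43518790","512"
"CStrife","54627","512"
'@ | ConvertFrom-Csv

# Index the second file by SamAccountName for direct lookups
$index = @{}
foreach ($row in $lookup) { $index[$row.SamAccountName] = $row }

# Column names to append, read from the data rather than hardcoded
$extraColumns = $lookup[0].PSObject.Properties.Name |
    Where-Object { $_ -ne 'SamAccountName' }

$merged = foreach ($row in $base) {
    # Start with every column of the base record, in its original order
    $out = [ordered]@{}
    foreach ($p in $row.PSObject.Properties) { $out[$p.Name] = $p.Value }

    # Append the lookup columns, or empty strings when there is no match
    $match = $index[$row.SamAccountName]
    foreach ($col in $extraColumns) {
        $out[$col] = if ($match) { $match.$col } else { '' }
    }
    [pscustomobject]$out
}

$merged | ConvertTo-Csv -NoTypeInformation
```

Records that exist only in the second file (CStrife here) are never visited, so they are ignored, and replacing ConvertTo-Csv with Export-Csv -Path writes the merged file.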
