How can I create a file with combined lines from two different text files, producing a new file like this one:
First line from text file A
First line from text file B
Second line from text file A
Second line from text file B
...
For a solution that:
keeps memory use constant (doesn't load the whole files into memory up front), and
performs acceptably with larger files,
direct use of .NET APIs is needed:
# Input files, assumed to be in the current directory.
# Important: always use FULL paths when calling .NET methods.
$dir = $PWD.ProviderPath
# .GetEnumerator() yields enumerators that the loop below advances manually.
$fileA = [System.IO.File]::ReadLines("$dir/fileA.txt").GetEnumerator()
$fileB = [System.IO.File]::ReadLines("$dir/fileB.txt").GetEnumerator()
# Create the output file.
$fileOut = [System.IO.File]::CreateText("$dir/merged.txt")
# Iterate over the files' lines in tandem, and write each pair
# to the output file.
while ($fileA.MoveNext(), $fileB.MoveNext() -contains $true) {
if ($null -ne $fileA.Current) { $fileOut.WriteLine($fileA.Current) }
if ($null -ne $fileB.Current) { $fileOut.WriteLine($fileB.Current) }
}
# Dispose of (close) the files.
$fileA.Dispose(); $fileB.Dispose(); $fileOut.Dispose()
Note: .NET APIs use UTF-8 by default, but you can pass the desired encoding, if needed.
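For example, a minimal sketch (reusing the variable names from above) of passing an explicit encoding to both sides; note that File.CreateText() has no encoding parameter, so a StreamWriter is constructed instead:
# Sketch: read and write with an explicit encoding (UTF-8 with BOM here; substitute any [System.Text.Encoding]).
$enc = [System.Text.UTF8Encoding]::new($true)   # $true = emit a BOM
$fileA = [System.IO.File]::ReadLines("$dir/fileA.txt", $enc).GetEnumerator()
$fileB = [System.IO.File]::ReadLines("$dir/fileB.txt", $enc).GetEnumerator()
$fileOut = [System.IO.StreamWriter]::new("$dir/merged.txt", $false, $enc)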
See also: The relevant .NET API help topics:
System.IO.File.ReadLines
System.IO.File.CreateText
A solution that uses only PowerShell features:
Note: Using PowerShell-only features, you can lazily enumerate only one file's lines at a time, so the other file must be read into memory in full.
(However, you could again use a lazy enumerable via the .NET API, i.e. [System.IO.File]::ReadLines() as shown above, or read both files into memory in full up front.)
The key to acceptable performance is to have only one Set-Content call (plus possibly one Add-Content call) that processes all output lines.
However, given that Get-Content (without -Raw) is itself quite slow, because it decorates each line read with additional properties, the solution based on .NET APIs will perform noticeably better.
# Read the 2nd file into an array of lines up front.
# Note: -ReadCount 0 greatly speeds up reading, by returning
# the lines directly as a single array.
$fileBLines = Get-Content fileB.txt -ReadCount 0
$i = 0 # Initialize the index into array $fileBLines.
# Lazily enumerate the lines of file A.
Get-Content fileA.txt | ForEach-Object {
$_ # Output the line from file A.
# If file B hasn't run out of lines yet, output the corresponding file B line.
if ($i -lt $fileBLines.Count) { $fileBLines[$i++] }
} | Set-Content Merged.txt
# If file B still has lines left, append them now:
if ($i -lt $fileBLines.Count) {
Add-Content Merged.txt -Value $fileBLines[$i..($fileBLines.Count-1)]
}
Note: Windows PowerShell's Set-Content cmdlet defaults to "ANSI" encoding, whereas PowerShell (Core) (v6+) uses BOM-less UTF-8; use the -Encoding parameter as needed.
$file1content = Get-Content -Path "IN_1.txt"
$file2content = Get-Content -Path "IN_2.txt"
$filesLength = @($file1content.Length, $file2content.Length)
for ($i = 0; $i -lt ($filesLength | Measure-Object -Maximum).Maximum; $i++)
{ Add-Content -Path "OUT.txt" $file1content[$i]
Add-Content -Path "OUT.txt" $file2content[$i]
}
Related
I have a pipe-separated data file with a huge amount of data and I want to remove columns 3, 7, and 9.
The script below works 100% fine, but it's too slow: it takes 5 minutes for a 22 MB file.
Adeel|01|test|1234589|date|amount|00|123345678890|test|all|01|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|05|test|1234589|date|amount|00|123345678890|test|all|05|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|09|test|1234589|date|amount|00|123345678890|test|all|09|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|12|test|1234589|date|amount|00|123345678890|test|all|12|
param
(
# Input data file
[string]$Path = 'O:\Temp\test.txt',
# Columns to be removed, any order, dupes are allowed
[int[]]$Remove = (3,6)
)
# sort indexes descending and remove dupes
$Remove = $Remove | Sort-Object -Unique -Descending
# read input lines
Get-Content $Path | .{process{
# split and add to ArrayList which allows to remove items
$list = [Collections.ArrayList]($_ -split '\|')
# remove data at the indexes (from tail to head due to descending order)
foreach($i in $Remove) {
$list.RemoveAt($i)
}
# join and output
#$list -join '|'
$contentUpdate=$list -join '|'
Add-Content "O:\Temp\testoutput.txt" $contentUpdate
}
}
Get-Content is comparatively slow. Use of the pipeline adds additional overhead.
When performance matters, StreamReader and StreamWriter can be a better choice:
param (
# Input data file
[string] $InputPath = 'input.txt',
# Output data file
[string] $OutputPath = 'output.txt',
# Columns to be removed, any order, dupes are allowed
[int[]] $Remove = (1, 2, 2),
# Column separator
[string] $Separator = '|',
# Input file encoding
[Text.Encoding] $Encoding = [Text.Encoding]::Default
)
$ErrorActionPreference = 'Stop'
# Gets rid of dupes and provides fast lookup ability
$removeSet = [Collections.Generic.HashSet[int]] $Remove
$reader = $writer = $null
try {
$reader = [IO.StreamReader]::new(( Convert-Path -LiteralPath $InputPath ), $encoding )
$null = New-Item $OutputPath -ItemType File -Force # as Convert-Path requires existing path
while( $null -ne ($line = $reader.ReadLine()) ) { # explicit $null test so empty lines don't end the loop
if( -not $writer ) {
# Construct writer only after first line has been read, so $reader.CurrentEncoding is available
$writer = [IO.StreamWriter]::new(( Convert-Path -LiteralPath $OutputPath ), $false, $reader.CurrentEncoding )
}
$columns = $line.Split( $separator )
$isAppend = $false
for( $i = 0; $i -lt $columns.Length; $i++ ) {
if( -not $removeSet.Contains( $i ) ) {
if( $isAppend ) { $writer.Write( $separator ) }
$writer.Write( $columns[ $i ] )
$isAppend = $true
}
}
$writer.WriteLine() # Write (CR)LF
}
}
finally {
# Make sure to dispose the reader and writer so files get closed.
if( $writer ) { $writer.Dispose() }
if( $reader ) { $reader.Dispose() }
}
Convert-Path is used because .NET has a different current directory than PowerShell, so it's best practice to pass absolute paths to .NET APIs.
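To illustrate (a minimal sketch; .\input.txt stands in for any existing relative path):
# PowerShell's current location and .NET's current directory are tracked separately:
$PWD.ProviderPath                              # PowerShell's current file-system location
[System.IO.Directory]::GetCurrentDirectory()   # .NET's notion of the current directory - may differ
Convert-Path -LiteralPath .\input.txt          # resolves a relative path to a full, provider-native path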
If this still isn't fast enough, consider writing this in C# instead. Especially with such "low level" code, C# tends to be faster. You may embed C# code in PowerShell using Add-Type -TypeDefinition $csCode.
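As a rough sketch of how that could look for this task (the ColumnStripper type, its Strip() method and the paths are made-up illustrations, not an existing API):
# Sketch: embed a small C# helper via Add-Type (hypothetical type/method names).
$csCode = @'
using System.IO;

public static class ColumnStripper
{
    public static void Strip(string inputPath, string outputPath, int[] remove, char separator)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] columns = line.Split(separator);
                bool first = true;
                for (int i = 0; i < columns.Length; i++)
                {
                    if (System.Array.IndexOf(remove, i) >= 0) { continue; } // skip removed column indexes
                    if (!first) { writer.Write(separator); }
                    writer.Write(columns[i]);
                    first = false;
                }
                writer.WriteLine();
            }
        }
    }
}
'@
Add-Type -TypeDefinition $csCode
[ColumnStripper]::Strip('O:\Temp\test.txt', 'O:\Temp\testoutput.txt', (1, 2), '|')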
As another optimization, instead of using String.Split(), which creates more substrings than actually needed, you may use String.IndexOf() and String.Substring() to extract only the necessary columns.
Last but not least, you may experiment with the StreamReader and StreamWriter constructors that let you specify a buffer size.
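For instance, a sketch of the relevant constructor overloads with a larger buffer (1 MB here, an arbitrary value to experiment with), mirroring the parameters used in the script above:
$bufferSize = 1MB   # arbitrary; tune experimentally
$reader = [IO.StreamReader]::new(( Convert-Path -LiteralPath $InputPath ), $Encoding, $true, $bufferSize )
$writer = [IO.StreamWriter]::new(( Convert-Path -LiteralPath $OutputPath ), $false, $reader.CurrentEncoding, $bufferSize )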
Just a more native PowerShell solution/syntax:
Import-Csv .\Test.txt -Delimiter "|" -Header @(1..12) |
Select-Object -ExcludeProperty $Remove |
ConvertTo-Csv -Delimiter "|" -UseQuotes Never |
Select-Object -Skip 1 |
Set-Content -Path .\testoutput.txt
For general performance hints, see: PowerShell scripting performance considerations. Meaning that the answer from @zett42 probably holds the fastest solution. But there are a few reasons you might want to deviate from that solution, if only because of this note:
⚠ Note
Many of the techniques described here are not idiomatic PowerShell and may reduce the readability of a PowerShell script. Script authors are advised to use idiomatic PowerShell unless performance dictates otherwise.
(Correctly) using the PowerShell pipeline might save a lot of memory, because every item is processed immediately and released from memory at the end of the stream (e.g. when sent to disk), whereas .NET solutions generally require loading everything into memory. Meaning that, at the moment your PC runs out of physical memory and memory pages are swapped to disk, PowerShell might even outperform .NET solutions.
As the helpful comment from @mklement0 implies ("note that calling Add-Content in every iteration is slow, because the file has to be opened and closed every time. Instead, add another pipeline segment with a single Set-Content call"): the Set-Content cmdlet should be at the end of the pipeline, after the (last) pipe (|) character.
The syntax ... .{process{ ... is probably an attempt at Speeding Up the Pipeline. This might indeed improve PowerShell performance, but if you want to implement it properly, you probably don't want to dot-source the script block but invoke it via the call operator (&), see: #8911 Performance problem: (implicitly) dot-sourced code is slowed down by variable lookups.
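A sketch of that pattern applied to the script above (reusing $Path and the already-sorted $Remove, and streaming all output to a single Set-Content call):
Get-Content $Path | & { process {
    $list = [Collections.ArrayList]($_ -split '\|')
    foreach ($i in $Remove) { $list.RemoveAt($i) }   # remove from tail to head
    $list -join '|'
}} | Set-Content 'O:\Temp\testoutput.txt'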
Anyway, the bottleneck is likely the input/output device (disk); as long as PowerShell is able to keep up with that device, there is probably little performance to gain from further tweaking.
Besides the fact that later PowerShell versions (7.2) generally perform better than Windows PowerShell (5.1), some cmdlets have been improved, like the newer ConvertTo-Csv, which has additional -UseQuotes <QuoteKind> and -QuoteFields <String[]> parameters. If you are stuck with Windows PowerShell, you might check this question: Delete duplicate lines from text file based on column
Although there is an easy way to read delimited files without headers using the Import-Csv cmdlet (with the -Header parameter), there is no equally easy way to skip the CSV header with its counterpart, Export-Csv. This can be worked around with: ConvertTo-Csv | Select-Object -Skip 1 | Set-Content -Path .\output.txt, see also: #17527 Add -NoHeader switch to Export-Csv and ConvertTo-Csv
I have a PowerShell script that scans log files and replaces text when a match is found. The list is currently 500 lines, and I plan to double or triple this. The log files can range from 400 KB to 800 MB in size.
Currently, when using the script below, a 42 MB file takes 29 minutes, and I'm looking for help if anyone can see any way to make this faster.
I tried changing ForEach-Object to ForEach-ObjectFast, but it caused the script to take significantly longer. I also tried changing the first ForEach-Object to a for loop, but it still took ~29 minutes.
$lookupTable = @{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
'bbb:ccc:456'='WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
$line = $_
$lookupTable.GetEnumerator() | ForEach-Object {
if ($line -match $_.Key)
{
$line = $line -replace $_.Key, $_.Value
}
}
$line
} | Set-Content -Path $outputfile
Since you say your input file could be 800 MB in size, reading and updating the entire content in memory may simply not fit.
The way to go, then, is to use a fast line-by-line method, and the fastest I know of is the switch statement:
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder..
$inputfile = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt' # absolute full file path because we use .Net here
# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }
$lookupTable = @{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
}
# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')
# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true) # $true for Append
switch -Regex -File $inputfile {
$regexLookup {
# do the replacement using the value in the lookup table.
# because in one line there may be multiple matches to replace
# get a System.Text.RegularExpressions.Match object to loop through all matches
$line = $_
$match = [regex]::Match($line, $regexLookup)
while ($match.Success) {
# because we escaped the keys, to find the correct entry we now need to unescape
$line = $line -replace $match.Value, $lookupTable[[regex]::Unescape($match.Value)]
$match = $match.NextMatch()
}
$streamWriter.WriteLine($line)
}
default { $streamWriter.WriteLine($_) } # write unchanged
}
# dispose of the StreamWriter object
$streamWriter.Dispose()
I've tried solving the following case:
many small text files (in subfolders) need their content (lines) matched to lines that exist in another (large) text file. The small files then need to be updated or copied with those matching lines.
I was able to come up with some running code for this, but I need to improve it or use a completely different method, because it is extremely slow and would take >40h to get through all files.
One idea I already had was to use a SQL Server to bulk-import all files in a single table with [relative path],[filename],[jap content] and the translation file in a table with [jap content],[eng content] and then join [jap content] and bulk-export the joined table as separate files using [relative path],[filename]. Unfortunately I got stuck right at the beginning due to formatting and encoding issues so I dropped it and started working on a PowerShell script.
Now in detail:
Over 40k txt files spread across multiple subfolders with multiple lines each, every line can exist in multiple files.
Content:
UTF8 encoded Japanese text that also can contain special characters like \\[*+(), each Line ending with a tabulator character. Sounds like csv files but they don't have headers.
One large File with >600k Lines containing the translation to the small files. Every line is unique within this file.
Content:
Again UTF8 encoded Japanese text. Each line formatted like this (without brackets):
[Japanese Text][tabulator][English Text]
Example:
テスト[1] Test [1]
The end result should be a copy or an updated version of all these small files where their lines get replaced with the matching ones from the translation file, while maintaining their relative path.
What I have at the moment:
$translationfile = 'B:\Translation.txt'
$inputpath = 'B:\Working'
$translationarray = [System.Collections.ArrayList]@()
$translationarray = @(Get-Content $translationfile -Encoding UTF8)
Get-Childitem -path $inputpath -Recurse -File -Filter *.txt | ForEach-Object -Parallel {
$_.Name
$filepath = ($_.Directory.FullName).substring(2)
$filearray = [System.Collections.ArrayList]@()
$filearray = @(Get-Content -path $_.FullName -Encoding UTF8)
$filearray = $filearray | ForEach-Object {
$result = $using:translationarray -match ("^$_" -replace '[[+*?()\\.]','\$&')
if ($result) {
$_ = $result
}
$_
}
If(!(test-path B:\output\$filepath)) {New-Item -ItemType Directory -Force -Path B:\output\$filepath}
#$("B:\output\"+$filepath+"\")
$filearray | Out-File -FilePath $("B:\output\" + $filepath + "\" + $_.Name) -Force -Encoding UTF8
} -ThrottleLimit 10
I would appreciate any help and ideas, but please keep in mind that I rarely write scripts, so anything too complex might fly right over my head.
Thanks
As zett42 states, using a hash table is your best option for mapping the Japanese-only phrases to the dual-language lines.
Additionally, use of .NET APIs for file I/O can speed up the operation noticeably.
# Be sure to specify all paths as full paths, not least because .NET's
# current directory usually differs from PowerShell's
$translationfile = 'B:\Translation.txt'
$inPath = 'B:\Working'
$outPath = (New-Item -Type Directory -Force 'B:\Output').FullName
# Build the hashtable mapping the Japanese phrases to the full lines.
# Note that ReadLines() defaults to UTF-8
$ht = @{ }
foreach ($line in [IO.File]::ReadLines($translationfile)) {
$ht[$line.Split("`t")[0] + "`t"] = $line
}
Get-ChildItem $inPath -Recurse -File -Filter *.txt | Foreach-Object -Parallel {
# Translate the lines to the matching lines including the $translation
# via the hashtable.
# NOTE: If an input line isn't represented as a key in the hashtable,
# it is passed through as-is.
$lines = foreach ($line in [IO.File]::ReadLines($_.FullName)) {
($using:ht)[$line] ?? $line
}
# Synthesize the output file path, ensuring that the target dir. exists.
$outFilePath = (New-Item -Force -Type Directory ($using:outPath + $_.Directory.FullName.Substring(($using:inPath).Length))).FullName + '/' + $_.Name
# Write to the output file.
# Note: If you want UTF-8 files *with BOM*, use -Encoding utf8bom
Set-Content -Encoding utf8 $outFilePath -Value $lines
} -ThrottleLimit 10
Note: Your use of ForEach-Object -Parallel implies that you're using PowerShell [Core] 7+, where BOM-less UTF-8 is the consistent default encoding (unlike in Windows PowerShell, where default encodings vary wildly).
Therefore, in lieu of the .NET [IO.File]::ReadLines() API in a foreach loop, you could also use the more PowerShell-idiomatic switch statement with the -File parameter for efficient line-by-line text-file processing.
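For example, a minimal sketch of that approach for a single file pair (hypothetical paths), reusing the $ht lookup table built above:
# switch -File also enumerates the lines lazily, one at a time.
$lines = switch -File 'B:\Working\sample.txt' {
    default { $ht[$_] ?? $_ }   # translated line if present in the table, otherwise pass the line through
}
Set-Content -Encoding utf8 'B:\Output\sample.txt' -Value $lines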
So I am now tasked with getting constant reports that are more than 1 million lines long.
My last question did not explain everything, so I'm trying to ask a better question here.
I'm getting a dozen-plus daily reports that are coming in as CSV files. I don't know what the headers are or anything like that as I get them.
They are huge; I can't open them in Excel.
I wanted to basically break them up into smaller files of the same report, each maybe 100,000 lines long.
The code I wrote below does not work as I keep getting a
Exception of type 'System.OutOfMemoryException' was thrown.
I am guessing I need a better way to do this.
I just need this file broken down to a more manageable size.
It does not matter how long it takes, as I can run it overnight.
I found this on the internet and tried to manipulate it, but I can't get it to work.
$PSScriptRoot
write-host $PSScriptRoot
$loc = $PSScriptRoot
$location = $loc
# how many rows per CSV?
$rowsMax = 10000;
# Get all CSV under current folder
$allCSVs = Get-ChildItem "$location\Split.csv"
# Read and split all of them
$allCSVs | ForEach-Object {
Write-Host $_.Name;
$content = Import-Csv "$location\Split.csv"
$insertLocation = ($_.Name.Length - 4);
for($i=1; $i -le $content.length ;$i+=$rowsMax){
$newName = $_.Name.Insert($insertLocation, "splitted_"+$i)
$content|select -first $i|select -last $rowsMax | convertto-csv -NoTypeInformation | % { $_ -replace '"', ""} | out-file $location\$newName -fo -en ascii
}
}
The key is not to read large files into memory in full, which is what you're doing by capturing the output from Import-Csv in a variable ($content = Import-Csv "$location\Split.csv").
That said, while using a single pipeline would solve your memory problem, performance will likely be poor, because you're converting from and back to CSV, which incurs a lot of overhead.
Even reading and writing the files as text with Get-Content and Set-Content is slow, however.
Therefore, I suggest a .NET-based approach for processing the files as text, which should substantially speed up processing.
The following code demonstrates this technique:
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
$csvFile = $_.FullName
# Construct a file-path template for the sequentially numbered chunk
# files; e.g., "...\file_split_001.csv"
$csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
# Set how many lines make up a chunk.
$chunkLineCount = 10000
# Read the file lazily and save every chunk of $chunkLineCount
# lines to a new file.
$i = 0; $chunkNdx = 0
foreach ($line in [IO.File]::ReadLines($csvFile)) {
if ($i -eq 0) { ++$i; $header = $line; continue } # Save header line.
if ($i++ % $chunkLineCount -eq 1) { # Create new chunk file.
# Close previous file, if any.
if (++$chunkNdx -gt 1) { $fileWriter.Dispose() }
# Construct the file path for the next chunk, by
# instantiating the template with the next sequence number.
$csvFileChunk = $csvFileChunkTemplate -f $chunkNdx
Write-Verbose "Creating chunk: $csvFileChunk"
# Create the next chunk file and write the header.
$fileWriter = [IO.File]::CreateText($csvFileChunk)
$fileWriter.WriteLine($header)
}
# Write a data row to the current chunk file.
$fileWriter.WriteLine($line)
}
$fileWriter.Dispose() # Close the last file.
}
Note that the above code creates BOM-less UTF-8 files; if your input contains ASCII-range characters only, these files will effectively be ASCII files.
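If you do need a BOM, a sketch of the one change required (only the writer construction differs):
# Swap in a StreamWriter with an explicit UTF-8-with-BOM encoding instead of File.CreateText():
$fileWriter = [IO.StreamWriter]::new($csvFileChunk, $false, [Text.UTF8Encoding]::new($true))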
Here's the equivalent single-pipeline solution, which is likely to be substantially slower.
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
$csvFile = $_.FullName
# Construct a file-path template for the sequentially numbered chunk
# files; e.g., ".../file_split_001.csv"
$csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
# Set how many lines make up a chunk.
$chunkLineCount = 10000
$i = 0; $chunkNdx = 0
Get-Content -LiteralPath $csvFile | ForEach-Object {
if ($i -eq 0) { ++$i; $header = $_; return } # Save header line.
if ($i++ % $chunkLineCount -eq 1) { # Create new chunk file.
# Construct the file path for the next chunk.
$csvFileChunk = $csvFileChunkTemplate -f ++$chunkNdx
Write-Verbose "Creating chunk: $csvFileChunk"
# Create the next chunk file and write the header.
Set-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $header
}
# Write data row to the current chunk file.
Add-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $_
}
}
Another option, from the Linux world, is the split command. To get it on Windows, just install Git Bash; then you'll be able to use many Linux tools from your CMD/PowerShell.
Below is the syntax to achieve your goal:
split -l 100000 --numeric-suffixes --suffix-length 3 --additional-suffix=.csv sourceFile.csv outputfile
It's very fast. If you want, you can wrap split.exe as a cmdlet of your own (see the sketch below).
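For example, a thin wrapper could look like this (a sketch; it assumes split.exe from Git Bash is on your PATH, and the function name is made up):
function Split-TextFile {
    param(
        [Parameter(Mandatory)] [string] $Path,    # source file to split
        [int] $Lines = 100000,                    # lines per chunk
        [string] $Prefix = 'outputfile'           # prefix for the generated chunk files
    )
    split -l $Lines --numeric-suffixes --suffix-length 3 --additional-suffix=.csv $Path $Prefix
}
# Usage:
Split-TextFile -Path .\sourceFile.csv -Lines 100000 -Prefix chunk_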
This question already has answers here:
In PowerShell, what's the best way to join two tables into one?
(5 answers)
Batch FOR LOOP a line of text from each file and ECHO both lines
(4 answers)
Merge Two text files line by line using batch script
(1 answer)
Closed 9 months ago.
Is it possible using CMD and Powershell to combine 2 files into 1 file like this:
file1-line1 tab file2-line1
file1-line2 tab file2-line2
file1-line3 tab file2-line3
So it takes file 1 line 1, then inserts a tab, and then inserts file 2 line 1, and does this for all subsequent lines in each file?
In PowerShell, and assuming both files have exactly the same number of lines:
$f1 = Get-Content file1
$f2 = Get-Content file2
for ($i = 0; $i -lt $f1.Length; ++$i) {
$f1[$i] + "`t" + $f2[$i]
}
Probably the simplest solution is to use a Windows port of the Linux paste utility (e.g. paste.exe from the UnxUtils):
paste C:\path\to\file1.txt C:\path\to\file2.txt
From the man page:
DESCRIPTION
Write lines consisting of the sequentially corresponding lines from each FILE, separated by TABs, to standard output.
For a PowerShell(ish) solution, I'd use two stream readers:
$sr1 = New-Object IO.StreamReader 'C:\path\to\file1.txt'
$sr2 = New-Object IO.StreamReader 'C:\path\to\file2.txt'
while ($sr1.Peek() -ge 0 -or $sr2.Peek() -ge 0) {
if ($sr1.Peek() -ge 0) { $txt1 = $sr1.ReadLine() } else { $txt1 = '' }
if ($sr2.Peek() -ge 0) { $txt2 = $sr2.ReadLine() } else { $txt2 = '' }
"{0}`t{1}" -f $txt1, $txt2
}
This avoids having to read the two files entirely into memory before merging them, which bears the risk of memory exhaustion for large files.
@echo off
setlocal EnableDelayedExpansion
rem Next line have a tab after the equal sign:
set "TAB= "
Rem First file is read with FOR /F command
Rem Second file is read via Stdin
< file2.txt (for /F "delims=" %%a in (file1.txt) do (
Rem Read next line from file2.txt
set /P "line2="
Rem Echo lines of both files separated by tab
echo %%a%TAB%!line2!
))
Further details at this post
A generalized solution supporting multiple files, building on Ansgar Wiechers' great, memory-efficient System.IO.StreamReader solution:
PowerShell's ability to invoke members (properties, methods) directly on a collection and have them automatically invoked on all items in the collection (member-access enumeration, v3+) allows for easy generalization:
# Make sure .NET has the same current dir. as PS.
[System.IO.Directory]::SetCurrentDirectory($PWD)
# The input file paths.
$files = 'file1', 'file2', 'file3'
# Create stream-reader objects for all input files.
$readers = [IO.StreamReader[]] $files
# Keep reading while at least 1 file still has more lines.
while ($readers.EndOfStream -contains $false) {
# Read the next line from each stream (file).
# Streams that are already at EOF fortunately just return "".
$lines = $readers.ReadLine()
# Output the lines separated with tabs.
$lines -join "`t"
}
# Close the stream readers.
$readers.Close()
Get-MergedLines (source code below; invoke with -? for help) wraps the functionality in a function that:
accepts a variable number of filenames - both as an argument and via the pipeline
uses a configurable separator to join the lines (defaults to a tab)
allows trimming trailing separator instances
function Get-MergedLines() {
<#
.SYNOPSIS
Merges lines from 2 or more files with a specifiable separator (default is tab).
.EXAMPLE
> Get-MergedLines file1, file2 '<->'
.EXAMPLE
> Get-ChildItem file? | Get-MergedLines
#>
param(
[Parameter(Mandatory, ValueFromPipeline, ValueFromPipelineByPropertyName)]
[Alias('PSPath')]
[string[]] $Path,
[string] $Separator = "`t",
[switch] $TrimTrailingSeparators
)
begin { $allPaths = @() }
process { $allPaths += $Path }
end {
# Resolve all paths to full paths, which may include wildcard resolution.
# Note: By using full paths, we needn't worry about .NET's current dir.
# potentially being different.
$fullPaths = (Resolve-Path $allPaths).ProviderPath
# Create stream-reader objects for all input files.
$readers = [System.IO.StreamReader[]] $fullPaths
# Keep reading while at least 1 file still has more lines.
while ($readers.EndOfStream -contains $false) {
# Read the next line from each stream (file).
# Streams that are already at EOF fortunately just return "".
$lines = $readers.ReadLine()
# Join the lines.
$mergedLine = $lines -join $Separator
# Trim (remove) trailing separators, if requested.
if ($TrimTrailingSeparators) {
$mergedLine = $mergedLine -replace ('^(.*?)(?:' + [regex]::Escape($Separator) + ')+$'), '$1'
}
# Output the merged line.
$mergedLine
}
# Close the stream readers.
$readers.Close()
}
}
Powershell solution:
$file1 = Get-Content file1
$file2 = Get-Content file2
$outfile = "file3.txt"
for($i = 0; $i -lt $file1.length; $i++) {
"$($file1[$i])`t$($file2[$i])" | out-file $outfile -Append
}
There are a number of recent locked [duplicate] questions that link into this question like:
Merging two csvs into one with columns [duplicate]
Merge 2 csv files in powershell [duplicate]
which I do not agree with, because they differ in that this question concerns text files and the others concern csv files. As a general rule, I would advise against manipulating files that represent objects (like xml, json and csv). Instead, I recommend importing these files (to objects), making the required changes, and converting/exporting the results back to a file.
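A minimal sketch of that import-modify-export pattern (the file and column names are hypothetical):
$rows = Import-Csv .\input.csv                      # import the file to objects
foreach ($row in $rows) { $row.SomeColumn = $row.SomeColumn.Trim() }  # make the intended changes (hypothetical column name)
$rows | Export-Csv .\output.csv -NoTypeInformation  # export the result back to a file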
One example where all the general solutions given here will produce incorrect output for those "duplicates" is where, e.g., both csv files have a common column (property) name.
The general Join-Object (see also: In Powershell, what's the best way to join two tables into one?) will join two object lists when the -on parameter is simply omitted. Therefore this solution better fits the other (csv) "duplicate" questions. Take Merge 2 csv files in powershell [duplicate] from @Ender as an example:
$A = ConvertFrom-Csv @'
ID,Name
1,Peter
2,Dalas
'@
$B = ConvertFrom-Csv @'
Class
Math
Physic
'@
$A | Join $B
ID Name Class
-- ---- -----
1 Peter Math
2 Dalas Physic
In comparison with the "text" merge solutions given in this answer, the general Join-Object cmdlet is able to deal with different file lengths and lets you decide what to include (LeftJoin, RightJoin or FullJoin). Besides, you have control over which columns you want to include ($A | Join $B -Property ID, Name), the order ($A | Join $B -Property ID, Class, Name), and a lot more that cannot be done by just concatenating text.
Specific to this question:
As this specific question concerns text files rather than csv files, you will need to add a header (property) name (e.g. -Header File1) while importing the file and remove the header (Select-Object -Skip 1) when exporting the result:
$File1 = Import-Csv .\File1.txt -Header File1
$File2 = Import-Csv .\File2.txt -Header File2
$File3 = $File1 | Join $File2
$File3 | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
Select-Object -Skip 1 | Set-Content .\File3.txt