How do I concatenate two text files in PowerShell? - powershell

I am trying to replicate the functionality of the cat command in Unix.
I would like to avoid solutions where I explicitly read both files into variables, concatenate the variables together, and then write out the concatenated variable.

Simply use the Get-Content and Set-Content cmdlets:
Get-Content inputFile1.txt, inputFile2.txt | Set-Content joinedFile.txt
You can concatenate more than two files with this style, too.
If the source files are named similarly, you can use wildcards:
Get-Content inputFile*.txt | Set-Content joinedFile.txt
Note 1: PowerShell 5 and older versions allowed this to be done more concisely using the aliases cat and sc for Get-Content and Set-Content respectively. However, these aliases are problematic because cat is a system command in *nix systems, and sc is a system command in Windows systems - therefore using them is not recommended, and in fact sc is no longer even defined as of PowerShell Core (v7). The PowerShell team recommends against using aliases in general.
Note 2: Be careful with wildcards - if you try to output to inputFiles.txt (or similar that matches the pattern), PowerShell will get into an infinite loop! (I just tested this.)
Note 3: Outputting to a file with > does not preserve character encoding! This is why using Set-Content is recommended.

Do not use >; it messes up the character encoding. Use:
Get-Content files.* | Set-Content newfile.file

In cmd, you can do this:
copy one.txt+two.txt+three.txt four.txt
In PowerShell this would be:
cmd /c copy one.txt+two.txt+three.txt four.txt
While the PowerShell way would be to use gc, the above will be pretty fast, especially for large files. And it can be used on on non-ASCII files too using the /B switch.

You could use the Add-Content cmdlet. Maybe it is a little faster than the other solutions, because I don't retrieve the content of the first file.
gc .\file2.txt| Add-Content -Path .\file1.txt

To concat files in command prompt it would be
type file1.txt file2.txt file3.txt > files.txt
PowerShell converts the type command to Get-Content, which means you will get an error when using the type command in PowerShell because the Get-Content command requires a comma separating the files. The same command in PowerShell would be
Get-Content file1.txt,file2.txt,file3.txt | Set-Content files.txt

I used:
Get-Content c:\FileToAppend_*.log | Out-File -FilePath C:\DestinationFile.log
-Encoding ASCII -Append
This appended fine. I added the ASCII encoding to remove the nul characters Notepad++ was showing without the explicit encoding.

If you need to order the files by specific parameter (e.g. date time):
gci *.log | sort LastWriteTime | % {$(Get-Content $_)} | Set-Content result.log

You can do something like:
get-content input_file1 > output_file
get-content input_file2 >> output_file
Where > is an alias for "out-file", and >> is an alias for "out-file -append".

Since most of the other replies often get the formatting wrong (due to the piping), the safest thing to do is as follows:
add-content $YourMasterFile -value (get-content $SomeAdditionalFile)
I know you wanted to avoid reading the content of $SomeAdditionalFile into a variable, but in order to save for example your newline formatting i do not think there is proper way to do it without.
A workaround would be to loop through your $SomeAdditionalFile line by line and piping that into your $YourMasterFile. However this is overly resource intensive.

To keep encoding and line endings:
Get-Content files.* -Raw | Set-Content newfile.file -NoNewline
Note: AFAIR, whose parameters aren't supported by old Powershells (<3? <4?)

I think the "powershell way" could be :
set-content destination.log -value (get-content c:\FileToAppend_*.log )

Related

How to display the file a match was found in using get-content and select-string one liner

I am attempting to search a directory of perl scripts and compile a list of all the other perl scripts executed from those files(intentionally trying to do this through Powershell). A simplistic dependency mapper, more or less.
With the below line of code I get output of every line where a reference to a perl file is found, but what I really need is same output AND the file in which each match was found.
Get-Content -Path "*.pl" | Select-String -Pattern '\w+\.pl' | foreach {Write-Host "$_"}
I have succeeded using some more complicated code but I think I can simplify it and accomplish most of the work through a couple lines of code(The code above accomplishes half of that).
Running this on a windows 10 machine powershell v5.1
I do things like this all the time. You don't need to use get-content.
ls -r *.pl | Select-String \w+\.pl
file.pl:1:file2.pl
You don't need to use ls or Get-ChildItem either; Select-String can take a path parameter:
Select-String -Pattern '\w+\.pl' -Path *.pl
which shortens to this in the shell:
sls \w+\.pl *.pl
(if your regex is more complex it might need spaces around it).
For the foreach {write-host part, you're writing a lot of code to turn useful objects back into less-useful strings, and forcibly writing them to the host instead of the standard output stream. You can pick out the data you want with:
sls \w+\.pl *.pl | select filename, {$_.matches[0]}
which will keep them as objects with properties, but render by default as a table.

Strip lines from text file based on content

I like to use one of the packaged HOSTS (MVPS,) files to protect myself from some of the nastier domains. Unfortunately, sometimes these files are a bit overzealous for me (blocking googleadsservices is a pain sometimes). I want an easy way to strip certain lines out of these files. In Linux I use:
cat hosts |grep -v <pattern> >hosts.new
And the file is rewritten minus the lines referencing the pattern I specified in the grep. So I just set it up to replace hosts with hosts.new on reboot and I'm done.
Is there an easy way to do this in PowerShell?
In PowerShell you'd do
(Get-Content hosts) -notmatch $pattern | Out-File hosts.new
or
(cat hosts) -notmatch $pattern > hosts.new
for short.
Of course, since Out-File (and with it the redirection operator) default to Unicode format, you may actually want to use Set-Content instead of Out-File:
(Get-Content hosts) -notmatch $pattern | Set-Content hosts.new
or
(gc hosts) -notmatch $pattern | sc hosts.new
And since the input file is read in a grouping expression (the parentheses around Get-Content hosts) you could actually write the output back to the source file:
(Get-Content hosts) -notmatch $pattern | Set-Content hosts
To complement Ansgar Wiechers' helpful answer (which offers pragmatic and concise solutions based on reading the entire input file into memory up-front):
PowerShell's grep equivalent is the Select-String cmdlet and, just like grep, it directly accepts a filename argument (PSv3+ syntax):
Select-String -NotMatch <pattern> hosts | ForEach-Object Line | Set-Content hosts.new
Select-String -NotMatch <pattern> hosts is short for
Select-String -NotMatch -Pattern <pattern> -LiteralPath hosts and is the virtual equivalent of
grep -v <pattern> hosts
However, Select-String doesn't output strings, it outputs [Microsoft.PowerShell.Commands.MatchInfo] instances that wrap matching lines (stored in property .Line) along with metadata about the match.
ForEach-Object Line extracts just the matching lines (the value of property .Line) from these objects.
Set-Content hosts.new writes the matching lines to file hosts.new, using "ANSI" encoding in Windows PowerShell - i.e., it uses the legacy code page implied by the active system locale, typically a supranational 8-bit superset of ASCII - and UTF-8 encoding (without BOM) in PowerShell Core.
Use the -Encoding parameter to specify a different encoding.
>, by contrast (an effective alias of the Out-File cmdlet), creates:
UTF16-LE ("Unicode") files by default in Windows PowerShell.
UTF-8 files (without BOM) in PowerShell Core - in other words: in PowerShell Core, using
> hosts.new in lieu of | Set-Content hosts.new will do.
Note: While both > / Out-File and Set-Content are suitable for sending string inputs to an output file, they are not generally suitable for sending other data types to a file for programmatic processing: > / Out-File output objects the way they would print to the console / terminal, which is pretty format for display, whereas Set-Content stringifies (simply put: calls .ToString() on) the input objects, which often results in loss of information.
For non-string data, consider a (more) structured data format such as XML (Export-CliXml), JSON (ConvertTo-Json) or CSV (Export-Csv).

ANSI Encoding via PowerShell [duplicate]

In PowerShell, what's the difference between Out-File and Set-Content? Or Add-Content and Out-File -append?
I've found if I use both against the same file, the text is fully mojibaked.
(A minor second question: > is an alias for Out-File, right?)
Here's a summary of what I've deduced, after a few months experience with PowerShell, and some scientific experimentation. I never found any of this in the documentation :(
[Update: Much of this now appears to be better documented.]
Read and write locking
While Out-File is running, another application can read the log file.
While Set-Content is running, other applications cannot read the log file. Thus never use Set-Content to log long running commands.
Encoding
Out-File saves in the Unicode (UTF-16LE) encoding by default (though this can be specified), whereas Set-Content defaults to ASCII (US-ASCII) in PowerShell 3+ (this may also be specified). In earlier PowerShell versions, Set-Content wrote content in the Default (ANSI) encoding.
Editor's note: PowerShell as of version 5.1 still defaults to the culture-specific Default ("ANSI") encoding, despite what the documentation claims. If ASCII were the default, non-ASCII characters such as ü would be converted to literal ?, but that is not the case: 'ü' | Set-Content tmp.txt; (Get-Content tmp.txt) -eq '?' yields $False.
PS > $null | out-file outed.txt
PS > $null | set-content set.txt
PS > md5sum *
f3b25701fe362ec84616a93a45ce9998 *outed.txt
d41d8cd98f00b204e9800998ecf8427e *set.txt
This means the defaults of two commands are incompatible, and mixing them will corrupt text, so always specify an encoding.
Formatting
As Bartek explained, Out-File saves the fancy formatting of the output, as seen in the terminal. So in a folder with two files, the command dir | out-file out.txt creates a file with 11 lines.
Whereas Set-Content saves a simpler representation. In that folder with two files, the command dir | set-content sc.txt creates a file with two lines. To emulate the output in the terminal:
PS > dir | ForEach-Object {$_.ToString()}
out.txt
sc.txt
I believe this formatting has a consequence for line breaks, but I can't describe it yet.
File creation
Set-Content doesn't reliably create an empty file when Out-File would:
In an empty folder, the command dir | out-file out.txt creates a file, while dir | set-content sc.txt does not.
Pipeline Variable
Set-Content takes the filename from the pipeline; allowing you to set a number of files' contents to some fixed value.
Out-File takes the data as from the pipeline; updating a single file's content.
Parameters
Set-Content includes the following additional parameters:
Exclude
Filter
Include
PassThru
Stream
UseTransaction
Out-File includes the following additional parameters:
Append
NoClobber
Width
For more information about what those parameters are, please refer to help; e.g. get-help out-file -parameter append.
Out-File has the behavior of overwriting the output path unless the -NoClobber and/or the -Append flag is set. Add-Content will append content if the output path already exists by default (if it can). Both will create the file if one doesn't already exist.
Another interesting difference is that Add-Content will create an ASCII encoded file by default and Out-File will create a little endian unicode encoded file by default.
> is an alias syntactic sugar for Out-File. It's Out-File with some pre-defined parameter settings.
Well, I would disagree... :)
Out-File has -Append (-NoClober is there to avoid overwriting) that will Add-Content. But this is not the same beast.
command | Add-Content will use .ToString() method on input. Out-File will use default formatting.
so:
ls | Add-Content test.txt
and
ls | Out-File test.txt
will give you totally different results.
And no, '>' is not alias, it's redirection operator (same as in other shells). And has very serious limitation... It will cut lines same way they are displayed. Out-File has -Width parameter that helps you avoid this. Also, with redirection operators you can't decide what encoding to use.
HTH
Bartek
Set-Content supports -Encoding Byte, while Out-File does not.
So when you want to write binary data or result of Text.Encoding#GetBytes() to a file, you should use Set-Content.
Wanted to add about difference on encoding:
Windows with PowerShell 5.1:
Out-File - Default encoding is utf-16le
Set-Content - Default encoding is us-ascii
Linux with PowerShell 7.1:
Out-File - Default encoding is us-ascii
Set-Content - Default encoding is us-ascii
Out-file -append or >> can actually mix two encodings in the same file. Even if the file is originally ASCII or ANSI, it will add Unicode by default to the bottom of it. Add-content will check the encoding and match it before appending. Btw, export-csv defaults to ASCII (no accents), and set-content/add-content to ANSI.
TL;DR, use Set-Content as it's more consistent over Out-File.
Set-Content behavior is the same over different powershell versions
Out-File as #JagWireZ says produces different encodings for the default settings, even on the same OS(Windows) the docs for powershell 5.1 and powershell 7.3 state that the encoding changed from unicode to utf8NoBOM
Some issues like Malformed XML arise from using Out-File, that could of course be fixed by setting the desired encoding, however it's likely to forget to set the encoding and end up with issues.

Concatenate files using PowerShell

I am using PowerShell 3.
What is best practice for concatenating files?
file1.txt + file2.txt = file3.txt
Does PowerShell provide a facility for performing this operation directly? Or do I need each file's contents be loaded into local variables?
If all the files exist in the same directory and can be matched by a simple pattern, the following code will combine all files into one.
Get-Content .\File?.txt | Out-File .\Combined.txt
I would go this route:
Get-Content file1.txt, file2.txt | Set-Content file3.txt
Use the -Encoding parameter on Set-Content if you need something other than ASCII which is the default for Set-Content.
If you need more flexibility, you could use something like
Get-ChildItem -Recurse *.cs | ForEach-Object { Get-Content $_ } | Out-File -Path .\all.txt
Warning: Concatenation using a simple Get-Content (whether or not using -Raw flag) works for text files; Powershell is too helpful for that:
Without -Raw, it "fixes" (i.e. breaks, pun intended) line breaks, or what Powershell thinks is a line break.
With -Raw, you get a terminating line end (normally CR+LF) at the
end of each file part, which is added at the end of the pipeline. There's an option for that in newer Powershells' Set-Content.
To concatenate a binary file (that is, an arbitrary file that was split for some reason and needs to be put together again), use either this:
Get-Content -Raw file1, file2 | Set-Content -NoNewline destination
or something like this:
Get-Content file1 -Encoding Byte -Raw | Set-Content destination -Encoding Byte
Get-Content file2 -Encoding Byte -Raw | Add-Content destination -Encoding Byte
An alternative is to use the CMD shell and use
copy file1 /b + file2 /b + file3 /b + ... destinationfile
You must not overwrite any part, that is, use any of the parts as destination. The destination file must be different from any of the parts. Otherwise you're up for a surprise and must find a backup copy of the file part.
a generalization based on #Keith answer:
gc <some regex expression> | sc output
Here is an interesting example of how to make a zip-in-image file based on Powershell 7
Get-Content -AsByteStream file1.png, file2.7z | Set-Content -AsByteStream file3.png
Get-Content -AsByteStream file1.png, file2.7z | Add-Content -AsByteStream file3.png
gc file1.txt, file2.txt > output.txt
I think this is as short as it gets.
In case you would like to ensure the concatenation is done in a specific order, use the Sort-Object -Property <Some Name> argument. For example, concatenate based on the name sorting in an ascending order:
Get-ChildItem -Path ./* -Include *.txt -Exclude output.txt | Sort-Object -Property Name | ForEach-Object { Get-Content $_ } | Out-File output.txt
IMPORTANT: -Exclude and Out-File MUST contain the same values, otherwise, it will recursively keep on adding to output.txt until your disk is full.
Note that you must append a * at the end of the -Path argument because you are using -Include, as mentioned in Get-ChildItem documentation.

PowerShell Set-Content and Out-File - what is the difference?

In PowerShell, what's the difference between Out-File and Set-Content? Or Add-Content and Out-File -append?
I've found if I use both against the same file, the text is fully mojibaked.
(A minor second question: > is an alias for Out-File, right?)
Here's a summary of what I've deduced, after a few months experience with PowerShell, and some scientific experimentation. I never found any of this in the documentation :(
[Update: Much of this now appears to be better documented.]
Read and write locking
While Out-File is running, another application can read the log file.
While Set-Content is running, other applications cannot read the log file. Thus never use Set-Content to log long running commands.
Encoding
Out-File saves in the Unicode (UTF-16LE) encoding by default (though this can be specified), whereas Set-Content defaults to ASCII (US-ASCII) in PowerShell 3+ (this may also be specified). In earlier PowerShell versions, Set-Content wrote content in the Default (ANSI) encoding.
Editor's note: PowerShell as of version 5.1 still defaults to the culture-specific Default ("ANSI") encoding, despite what the documentation claims. If ASCII were the default, non-ASCII characters such as ü would be converted to literal ?, but that is not the case: 'ü' | Set-Content tmp.txt; (Get-Content tmp.txt) -eq '?' yields $False.
PS > $null | out-file outed.txt
PS > $null | set-content set.txt
PS > md5sum *
f3b25701fe362ec84616a93a45ce9998 *outed.txt
d41d8cd98f00b204e9800998ecf8427e *set.txt
This means the defaults of two commands are incompatible, and mixing them will corrupt text, so always specify an encoding.
Formatting
As Bartek explained, Out-File saves the fancy formatting of the output, as seen in the terminal. So in a folder with two files, the command dir | out-file out.txt creates a file with 11 lines.
Whereas Set-Content saves a simpler representation. In that folder with two files, the command dir | set-content sc.txt creates a file with two lines. To emulate the output in the terminal:
PS > dir | ForEach-Object {$_.ToString()}
out.txt
sc.txt
I believe this formatting has a consequence for line breaks, but I can't describe it yet.
File creation
Set-Content doesn't reliably create an empty file when Out-File would:
In an empty folder, the command dir | out-file out.txt creates a file, while dir | set-content sc.txt does not.
Pipeline Variable
Set-Content takes the filename from the pipeline; allowing you to set a number of files' contents to some fixed value.
Out-File takes the data as from the pipeline; updating a single file's content.
Parameters
Set-Content includes the following additional parameters:
Exclude
Filter
Include
PassThru
Stream
UseTransaction
Out-File includes the following additional parameters:
Append
NoClobber
Width
For more information about what those parameters are, please refer to help; e.g. get-help out-file -parameter append.
Out-File has the behavior of overwriting the output path unless the -NoClobber and/or the -Append flag is set. Add-Content will append content if the output path already exists by default (if it can). Both will create the file if one doesn't already exist.
Another interesting difference is that Add-Content will create an ASCII encoded file by default and Out-File will create a little endian unicode encoded file by default.
> is an alias syntactic sugar for Out-File. It's Out-File with some pre-defined parameter settings.
Well, I would disagree... :)
Out-File has -Append (-NoClober is there to avoid overwriting) that will Add-Content. But this is not the same beast.
command | Add-Content will use .ToString() method on input. Out-File will use default formatting.
so:
ls | Add-Content test.txt
and
ls | Out-File test.txt
will give you totally different results.
And no, '>' is not alias, it's redirection operator (same as in other shells). And has very serious limitation... It will cut lines same way they are displayed. Out-File has -Width parameter that helps you avoid this. Also, with redirection operators you can't decide what encoding to use.
HTH
Bartek
Set-Content supports -Encoding Byte, while Out-File does not.
So when you want to write binary data or result of Text.Encoding#GetBytes() to a file, you should use Set-Content.
Wanted to add about difference on encoding:
Windows with PowerShell 5.1:
Out-File - Default encoding is utf-16le
Set-Content - Default encoding is us-ascii
Linux with PowerShell 7.1:
Out-File - Default encoding is us-ascii
Set-Content - Default encoding is us-ascii
Out-file -append or >> can actually mix two encodings in the same file. Even if the file is originally ASCII or ANSI, it will add Unicode by default to the bottom of it. Add-content will check the encoding and match it before appending. Btw, export-csv defaults to ASCII (no accents), and set-content/add-content to ANSI.
TL;DR, use Set-Content as it's more consistent over Out-File.
Set-Content behavior is the same over different powershell versions
Out-File as #JagWireZ says produces different encodings for the default settings, even on the same OS(Windows) the docs for powershell 5.1 and powershell 7.3 state that the encoding changed from unicode to utf8NoBOM
Some issues like Malformed XML arise from using Out-File, that could of course be fixed by setting the desired encoding, however it's likely to forget to set the encoding and end up with issues.