How to remove whitespace using PowerShell in multiple text files in the same folder? - powershell

The folder name is: c:\home\alltext\
Inside it are 2 text files with different names (each contains extra whitespace that I want to trim):
text1.txt
text2.txt
I don't want to use Notepad++ and edit each .txt one by one, especially if I end up with more than 2 files.
I tried PowerShell, but it puts the contents of both text1 and text2 together into the same text file.
How can I trim them in one command and keep each .txt file separate?
This is my command:
(get-content c:\home\alltext\*.txt).trim() -ne '' | Set-content c:\home\alltext\*.txt

You need to process the input files one by one:
Get-ChildItem c:\home\alltext\*.txt | ForEach-Object {
  Set-Content -LiteralPath $_.FullName -Value (($_ | Get-Content).Trim() -ne '')
}
Note that PowerShell never preserves the original character encoding when reading text files, so you may have to use the -Encoding parameter with Set-Content.
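For example, a sketch of the same loop with an explicit output encoding (assuming UTF-8 is the desired encoding; adjust as needed):
Get-ChildItem c:\home\alltext\*.txt | ForEach-Object {
  # Re-save each file with blank lines removed, explicitly as UTF-8 (assumption).
  Set-Content -LiteralPath $_.FullName -Encoding Utf8 -Value (($_ | Get-Content).Trim() -ne '')
}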
As for what you tried:
(get-content c:\home\alltext\*.txt).trim() -ne '' streams the non-blank lines of all files matching wildcard expression c:\home\alltext\*.txt, across file boundaries.
Perhaps surprisingly, not only does Set-Content's (positionally implied) -Path parameter accept wildcard expressions too, it writes the same content (the stringified versions of whatever input it receives) to whatever files happen to match that wildcard expression.
This problematic behavior is discussed in GitHub issue #6729; unfortunately, it was decided to retain the current behavior.

Related

How to split a text file into two in PowerShell?

I have one text file with a script that I want to split into two.
Below is the dummy script:
--serverone
this is first part of my script
--servertwo
this is second part of my script
I want to create two text files that would look like this:
file1
--serverone
this is first part of my script
file2
--servertwo
this is second part of my script
So far, I have added a special character within the script that I know doesn't exist ("}"):
$script = get-content -Path "C:\Users\shamvil\Desktop\test.txt"
$newscript = $script.Replace("--servertwo","}--servertwo")
$newscript.split("}")
but I don't know how to save the split parts into two separate files.
This might not be the best approach, so I am also open to different solutions as well.
Please help, thanks!
Use a regex-based -split operation:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
ForEach-Object { $fileName = 'file' + (++$i); Set-Content $fileName $_ }
This assumes that each block of lines that starts with a line that starts with -- is to be saved to a separate file.
Get-Content -Raw reads the entire file into a single, multi-line string.
As for the separator regex passed to -split:
The (?m) inline regex option makes anchors ^ and $ match on each line
^(?=--) therefore matches every line that starts with --, using a look-ahead assertion ((?=...)), which by definition doesn't capture what it matches, to ensure that the -- isn't removed from the resulting blocks (by default, what matches the separator regex is not included).
-ne '' filters out the extra empty element that results from the separator expression matching at the very start of the string.
Note that Set-Content knows nothing about the character encoding of the input file and uses its default encoding; use -Encoding as needed.
zett42 points out that the file-writing part can be streamlined with the help of a delay-bind script-block parameter:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
Set-Content -LiteralPath { (Get-Variable i -Scope 1).Value++; "file$i" }
The Get-Variable call to access and increment the $i variable in the parent scope is necessary, because delay-bind script blocks (as well as script blocks for calculated properties) run in a child scope - perhaps surprisingly, as discussed in GitHub issue #7157.
A shorter - but even more obscure - option is to use ([ref] $i).Value++ instead; see this answer for details.
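A sketch of that variant, based on the description above:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
  Set-Content -LiteralPath { ([ref] $i).Value++; "file$i" }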
zett42 also points to a proposed future enhancement that would obviate the need to maintain the sequence numbers manually, via the introduction of an automatic $PSIndex variable that reflects the sequence number of the current pipeline object: see GitHub issue #13772.

Issues merging multiple CSV files in Powershell

I found a nifty command here - http://www.stackoverflow.com/questions/27892957/merging-multiple-csv-files-into-one-using-powershell that I am using to merge CSV files -
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\merged\merged.csv -NoTypeInformation -Append
Now this does what it says on the tin and works great for the most part. I have 2 issues with it however, and I am wondering if there is a way they can be overcome:
Firstly, the merged csv file has CRLF line endings, and I am wondering how I can make the line endings just LF, as the file is being generated?
Also, it looks like there are some shenanigans with quote marks being added/moved around. As an example:
Sample row from initial CSV:
"2021-10-05"|"00:00"|"1212"|"160477"|"1.00"|"3.49"LF
Same row in the merged CSV:
"2021-10-05|""00:00""|""1212""|""160477""|""1.00""|""3.49"""CRLF
So you can see that the first field has lost its trailing quote, other fields have doubled quotes, and the end of the row has an additional quote. I'm not quite sure what is going on here, so any help would be much appreciated!
For dealing with the quotes, the cause of the “problem” is that your CSV does not use the default field delimiter that Import-CSV assumes - the C in CSV stands for comma, and you’re using the vertical bar. Add the parameter -Delimiter "|" to both the Import-CSV and Export-CSV cmdlets.
I don’t think you can do anything about the line-end characters (CRLF vs LF); that’s almost certainly operating-system dependent.
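For example, the merge command from the question with the delimiter supplied to both cmdlets (a sketch; paths as in the question):
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName |
  Import-Csv -Delimiter "|" |
  Export-Csv .\merged\merged.csv -Delimiter "|" -NoTypeInformation -Append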
Jeff Zeitlin's helpful answer explains the quote-related part of your problem well.
As for your line-ending problem:
As of PowerShell 7.2, there are no PowerShell-native features that allow you to control the newline format of file-writing cmdlets such as Export-Csv.
However, if you use plain-text processing, you can use multi-line strings built with the newline format of interest and save / append them with Set-Content and its -NoNewLine switch, which writes the input strings as-is, without a (newline) separator.
In fact, to significantly speed up processing in your case, plain-text handling is preferable, since in essence your operation amounts to concatenating text files, the only twist being that the header lines of all but the first file should be skipped; using plain-text handling also bypasses your quote problem:
$tokenCount = 1
Get-ChildItem -Filter *.csv |
  Get-Content -Raw |
  ForEach-Object {
    # Get the file content and replace CRLF with LF.
    # Include the first line (the header) only for the first file.
    $content = ($_ -split '\r?\n', $tokenCount)[-1].Replace("`r`n", "`n")
    $tokenCount = 2 # Subsequent files should have their header ignored.
    # Make sure that each file content ends in a LF.
    if (-not $content.EndsWith("`n")) { $content += "`n" }
    # Output the modified content.
    $content
  } |
  Set-Content -NoNewLine ./merged/merged.csv # add -Encoding as needed.

Batch File to Find and Replace in text file using whole word only?

I am writing a script which at one point has to check in a text file and remove certain strings. So far I have this:
powershell -Command "(gc myFile.txt) -replace 'foo', 'bar' | Out-File -encoding ASCII myFile.txt"
The only problem is that this can find and replace, but will not remove the line altogether.
The second problem is that if, say, I am removing the line that has Mark, it must not remove a line that has something like Markus.
I don't know if this is possible with the PowerShell interface?
Your current code will only replace foo with bar; that is what -replace does.
Removing the whole line if it matches requires a different approach - almost the reverse: you can use -notmatch to output only the lines that do not match your filter, effectively removing the matching ones.
Also, using regex word boundaries will then only match Mark but not Markus:
(Get-Content file.txt) | Where-Object {$_ -notmatch "\bMark\b"} | Set-Content file.txt
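Folded back into the batch-file invocation from the question, that could look like this (file name taken from the question; single quotes around the regex avoid having to escape the outer double quotes):
powershell -Command "(Get-Content myFile.txt) | Where-Object {$_ -notmatch '\bMark\b'} | Set-Content -Encoding ASCII myFile.txt"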

Extract lines matching a pattern from all text files in a folder to a single output file

I am trying to extract each line starting with "%%" in all files in a folder and then copy those lines to a separate text file. I am currently using the PowerShell code below, but I am not getting any results.
$files = Get-ChildItem "folder" -Filter *.txt
foreach ($file in $files)
{
    if ($_ -like "*%%*")
    {
        Set-Content "Output.txt"
    }
}
I think that mklement0's suggestion to use Select-String is the way to go. Adding to his answer, you can pipe the output of Get-ChildItem into Select-String so that the entire process becomes a PowerShell one-liner.
Something like this:
Get-ChildItem "folder" -Filter *.txt | Select-String -Pattern '^%%' | Select -ExpandProperty line | Set-Content "Output.txt"
The Select-String cmdlet offers a much simpler solution (PSv3+ syntax):
(Select-String -Path folder\*.txt -Pattern '^%%').Line | Set-Content Output.txt
Select-String accepts a filename/path pattern via its -Path parameter, so, in this simple case, there is no need for Get-ChildItem.
If, by contrast, your input file selection is recursive or uses more complex criteria, you can pipe Get-ChildItem's output to Select-String, as demonstrated in Dave Sexton's helpful answer.
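For instance, a recursive variant might look like this (a sketch; -Recurse is an added assumption):
(Get-ChildItem "folder" -Filter *.txt -Recurse | Select-String -Pattern '^%%').Line |
  Set-Content Output.txt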
Note that, according to the docs, Select-String by default assumes that the input files are UTF-8-encoded, but you can change that with the -Encoding parameter; also consider the output encoding discussed below.
Select-String's -Pattern parameter expects a regular expression rather than a wildcard expression.
^%% only matches literal %% at the start (^) of a line.
Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] objects that contain information about each match; each object's .Line property contains the full text of an input line that matched.
Set-Content Output.txt sends all matching lines to the single output file Output.txt.
Set-Content uses the system's legacy Windows codepage (an 8-bit single-byte encoding - even though the documentation mistakenly claims that ASCII files are produced).
If you want to control the output encoding explicitly, use the -Encoding parameter; e.g., ... | Set-Content Output.txt -Encoding Utf8.
By contrast, >, the output redirection operator always creates UTF-16LE files (an encoding PowerShell calls Unicode), as does Out-File by default (which can be changed with -Encoding).
Also note that > / Out-File apply PowerShell's default formatting to the input objects to obtain the string representation to write to the output file, whereas Set-Content treats the input as strings (calls .ToString() on input objects, if necessary). In the case at hand, since all input objects are already strings, there is no difference (except for the character encoding, potentially).
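To make the difference concrete, a quick sketch (file names are arbitrary; the encodings noted in the comments are the Windows PowerShell defaults described above):
'hello' | Set-Content enc-default.txt                # Set-Content default encoding
'hello' > enc-redirect.txt                           # UTF-16LE ("Unicode"), per the above
'hello' | Set-Content enc-utf8.txt -Encoding Utf8    # explicit UTF-8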
As for what you've tried:
$_ inside your foreach ($file in $files) refers to a file (a [System.IO.FileInfo] object), so you're effectively evaluating your wildcard expression *%%* against the input file's name rather than its contents.
Aside from that, wildcard pattern *%%* will match %% anywhere in the input string, not just at its start (you'd have to use %%* instead).
The Set-Content "Output.txt" call is missing input, because it is not part of a pipeline and, in the absence of pipeline input, no -Value argument was passed.
Even if you did provide input, however, output file Output.txt would get rewritten as a whole in each iteration of your foreach loop.
First you have to use Get-Content in order to get the content of the file. Then you do the string match and, based on that, append the matching lines to the output file. Use Get-Content and put another loop inside the foreach to iterate over all the lines in each file.
I hope this logic helps you.
ls *.txt | %{
    $f = $_
    gc $f.FullName | %{
        if($_.StartsWith("%%")){
            $_ >> Output.txt
        } #end if
    } #end gc
} #end ls
Alias
ls - Get-ChildItem
gc - Get-Content
% - ForEach-Object
$_ - the current pipeline object
>> - append redirection operator
# - Comment
http://ss64.com/ps/

Matching lines in file from list

I have two text files, Text1.txt and Text2.txt
Text2.txt is a list of keywords, one keyword per line. I want to read from Text1.txt and any time a keyword in the Text2.txt list shows up, pipe that entire line of text to a new file, output.txt
Without using Text2.txt I figured out how to do it manually in PowerShell.
Get-Content .\Text1.txt | Where-Object {$_ -match 'CAPT'} | Set-Content output.txt
That seems to work: it searches for "CAPT" and returns the entire line of text, but I don't know how to replace the manual text search with a variable that pulls from Text2.txt.
Any ideas?
Using some simple regex you can build an alternation pattern from all the keywords in the file Text2.txt:
$pattern = (Get-Content .\Text2.txt | ForEach-Object{[regex]::Escape($_)}) -Join "|"
Get-Content .\Text1.txt | Where-Object {$_ -match $pattern} | Set-Content output.txt
In case your keywords contain special regex characters, we need to be sure they are escaped; the .NET regex method [regex]::Escape() handles that.
This is not an efficient approach for large files, but it is certainly a simple method. If your keywords were all similar, like CAPT CAPS CAPZ, then we could improve it, but I don't think it would be worth it, depending on how often the keywords change.
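If you prefer to let a single cmdlet do the line matching, the same escaped pattern can also be fed to Select-String (a sketch; file names as above):
$pattern = (Get-Content .\Text2.txt | ForEach-Object { [regex]::Escape($_) }) -join '|'
Select-String -Path .\Text1.txt -Pattern $pattern |
  Select-Object -ExpandProperty Line |
  Set-Content output.txt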
Changing the pattern
If you wanted to match just the first 4 characters of the lines in your keyword file (Text2.txt), that is just a matter of making a change in the loop:
$pattern = (Get-Content .\Text2.txt | ForEach-Object{[regex]::Escape($_.Substring(0,4))}) -Join "|"