How to split a text file into two in PowerShell?

I have one text file with a script that I want to split into two.
Below is the dummy script:
--serverone
this is first part of my script
--servertwo
this is second part of my script
I want to create two text files that would look like this:
file1
--serverone
this is first part of my script
file2
--servertwo
this is second part of my script
So far, I have added a special character ("}") to the script that I know doesn't otherwise exist in it:
$script = get-content -Path "C:\Users\shamvil\Desktop\test.txt"
$newscript = $script.Replace("--servertwo","}--servertwo")
$newscript.split("}")
but I don't know how to save the two parts to separate files.
This might not be the best approach, so I am also open to a different solution.
Please help, thanks!

Use a regex-based -split operation:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
ForEach-Object { $fileName = 'file' + (++$i); Set-Content $fileName $_ }
This assumes that each block of lines beginning with a line that starts with -- is to be saved to its own file.
Get-Content -Raw reads the entire file into a single, multi-line string.
As for the separator regex passed to -split:
The (?m) inline regex option makes the anchors ^ and $ match at the start and end of each line.
^(?=--) therefore matches every line that starts with --, using a by definition non-capturing look-ahead assertion ((?=...)) to ensure that the -- isn't removed from the resulting blocks (by default, what matches the separator regex is not included).
-ne '' filters out the extra empty element that results from the separator expression matching at the very start of the string.
Note that Set-Content knows nothing about the character encoding of the input file and uses its default encoding; use -Encoding as needed.
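A self-contained sketch of the same -split technique, for experimenting: it recreates the sample input in a scratch folder under the temp directory (the folder name split-demo and the choice of UTF-8 are illustrative assumptions), passes -Encoding explicitly, and adds -NoNewline (PSv5+), because each block produced by -split already ends in a newline and Set-Content would otherwise append an extra one.

```powershell
# Scratch folder (hypothetical name) so that nothing else is touched.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'split-demo'
$null = New-Item -ItemType Directory -Force -Path $dir

# Recreate the sample input from the question.
$inFile = Join-Path $dir 'test.txt'
Set-Content -LiteralPath $inFile -Encoding utf8 -Value @(
  '--serverone'
  'this is first part of my script'
  '--servertwo'
  'this is second part of my script'
)

$i = 0
# Split before every line that starts with '--'; the look-ahead keeps the '--'.
# -ne '' drops the empty element produced by the match at the very start.
(Get-Content -Raw -LiteralPath $inFile) -split '(?m)^(?=--)' -ne '' |
  ForEach-Object {
    $outFile = Join-Path $dir ('file' + (++$i))
    # Each block already ends in a newline, so don't let Set-Content add one.
    Set-Content -LiteralPath $outFile -Value $_ -Encoding utf8 -NoNewline
  }
```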
zett42 points out that the file-writing part can be streamlined with the help of a delay-bind script-block parameter:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
Set-Content -LiteralPath { (Get-Variable i -Scope 1).Value++; "file$i" }
The Get-Variable call to access and increment the $i variable in the parent scope is necessary, because delay-bind script blocks (as well as script blocks for calculated properties) run in a child scope - perhaps surprisingly, as discussed in GitHub issue #7157.
A shorter - but even more obscure - option is to use ([ref] $i).Value++ instead; see this answer for details.
zett42 also points to a proposed future enhancement that would obviate the need to maintain the sequence numbers manually, via the introduction of an automatic $PSIndex variable that reflects the sequence number of the current pipeline object: see GitHub issue #13772.
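A sketch of the ([ref] $i).Value++ variant combined with the delay-bind -LiteralPath script block; the sample text and the scratch folder split-ref-demo are assumptions for illustration, and Join-Path makes the script block return a full path rather than one relative to the current location.

```powershell
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'split-ref-demo'
$null = New-Item -ItemType Directory -Force -Path $dir

# Made-up input with two '--' blocks.
$text = "--serverone`nfirst part`n--servertwo`nsecond part`n"

$i = 0
$text -split '(?m)^(?=--)' -ne '' |
  Set-Content -NoNewline -LiteralPath {
    # This delay-bind block runs in a child scope; [ref] $i refers to the
    # parent's $i variable object, so the increment persists between calls.
    ([ref] $i).Value++
    Join-Path $dir "file$i"
  }
```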

Related

Read value of variable in .ps1 and update the same variable in another .ps1

I'm trying to find an efficient way to read the value of a string variable in a PowerShell .ps1 file and then update the same variable/value in another .ps1 file. In my specific case, I would update a variable for the version # on script one and then I would want to run a script to update it on multiple other .ps1 files. For example:
1_script.ps1 - Script I want to read variable from
$global:scriptVersion = "v1.1"
2_script.ps1 - script I would want to update variable on (Should update to v1.1)
$global:scriptVersion = "v1.0"
I would want to update 2_script.ps1 to set the variable to "v1.1" as read from 1_script.ps1. My current method is using get-content with a regex to find a line starting with my variable, then doing a bunch of replaces to get the portion of the string I want. This does work, but it seems like there is probably a better way I am missing or didn't get working correctly in my tests.
My Modified Regex Solution Based on Answer by @mklement0:
I slightly modified @mklement0's solution because dot-sourcing the first script was causing it to run:
$file1 = ".\1_script.ps1"
$file2 = ".\2_script.ps1"
$fileversion = (Get-Content $file1 | Where-Object {$_ -match '(?m)(?<=^\s*\$global:scriptVersion\s*=\s*")[^"]+'}).Split("=")[1].Trim().Replace('"','')
(Get-Content -Raw $file2) -replace '(?m)(?<=^\s*\$global:scriptVersion\s*=\s*")[^"]+',$fileversion | Set-Content $file2 -NoNewLine

Generally, the most robust way to parse PowerShell code is to use the language parser. However, reconstructing source code, with modifications after parsing, may situationally be hampered by the parser not reporting the details of intra-line whitespace - see this answer for an example and a discussion.[1]
Pragmatically speaking, using a regex-based -replace solution is probably good enough in your simple case (note that the value to update is assumed to be enclosed in "..." - but matching could be made more flexible to support '...' quoting too):
# Dot-source the first script in order to obtain the new value.
# Note: This invariably executes *all* top-level code in the script.
. .\1_script.ps1
# Outputs to the display.
# Append
# | Set-Content -Encoding utf8 2_script.ps1
# to save back to the input file.
(Get-Content -Raw 2_script.ps1) -replace '(?m)(?<=^\s*\$global:scriptVersion\s*=\s*")[^"]+', $global:scriptVersion
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
[1] Syntactic elements are reported in terms of line and column position, and columns are character-based, meaning that spaces and tabs are treated the same, so that a difference of, say, 3 character positions can represent 3 spaces, 3 tabs, or any mix of them - the parser won't tell you. However, if your approach allows keeping the source code as a whole while only removing and splicing in certain elements, that won't be a problem, as shown in iRon's helpful answer.
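If executing 1_script.ps1 via dot-sourcing is undesirable (the asker ran into exactly that), a sketch that reuses the same regex on both files: applying -match to the single string returned by Get-Content -Raw populates the automatic $Matches variable, whose element 0 is the version text. The scratch folder version-demo and the generated sample files are assumptions that make the example self-contained.

```powershell
# Hypothetical scratch copies of the two scripts from the question.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'version-demo'
$null = New-Item -ItemType Directory -Force -Path $dir
$file1 = Join-Path $dir '1_script.ps1'
$file2 = Join-Path $dir '2_script.ps1'
Set-Content -LiteralPath $file1 -Value '$global:scriptVersion = "v1.1"'
Set-Content -LiteralPath $file2 -Value '$global:scriptVersion = "v1.0"'

# Matches only the text between the quotes (look-behind for the assignment).
$regex = '(?m)(?<=^\s*\$global:scriptVersion\s*=\s*")[^"]+'

# Read the new value from script 1 without executing it.
if ((Get-Content -Raw -LiteralPath $file1) -match $regex) {
  $newVersion = $Matches[0]
  # The parentheses read script 2 in full before it is overwritten.
  (Get-Content -Raw -LiteralPath $file2) -replace $regex, $newVersion |
    Set-Content -LiteralPath $file2 -NoNewline
}
```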

To complement the helpful answer from @mklement0: in case you do go for the PowerShell abstract syntax tree (AST) class, you might use the Extent.StartOffset/Extent.EndOffset properties to reconstruct your script:
Using NameSpace System.Management.Automation.Language
$global:scriptVersion = 'v1.1' # . .\Script1.ps1
$Script2 = { # = Get-Content -Raw .\Script2.ps1
[CmdletBinding()]param()
begin {
$global:scriptVersion = "v1.0"
}
process {
$_
}
end {}
}.ToString()
$Ast = [Parser]::ParseInput($Script2, [ref]$null, [ref]$null)
$Extent = $Ast.Find(
{
$args[0] -is [AssignmentStatementAst] -and
$args[0].Left.VariablePath.UserPath -eq 'global:scriptVersion' -and
$args[0].Operator -eq 'Equals'
}, $true
).Right.Extent
-Join (
$Script2.SubString(0, $Extent.StartOffset),
('"{0}"' -f $global:scriptVersion), # Right.Extent spans the quotes too, so re-add them
$Script2.SubString($Extent.EndOffset)
) # |Set-Content .\Script2.ps1
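Along the same lines, the parser can also read the value out of the first script without running it. A minimal sketch, assuming the value is a plain quoted string literal; the inline $script1 text stands in for Get-Content -Raw .\1_script.ps1, and full type names are used so that no using namespace statement is needed.

```powershell
# Stand-in for: $script1 = Get-Content -Raw .\1_script.ps1
$script1 = '$global:scriptVersion = "v1.1"'

# Parse only; nothing in the script is executed.
$ast = [System.Management.Automation.Language.Parser]::ParseInput($script1, [ref]$null, [ref]$null)

# Locate the assignment to $global:scriptVersion.
$assignment = $ast.Find({
    param($node)
    $node -is [System.Management.Automation.Language.AssignmentStatementAst] -and
    $node.Left.VariablePath.UserPath -eq 'global:scriptVersion'
  }, $true)

# The right-hand side wraps a string-constant node; its .Value has no quotes.
$version = $assignment.Right.Find({
    param($node)
    $node -is [System.Management.Automation.Language.StringConstantExpressionAst]
  }, $true).Value

$version
```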

How to remove white space using powershell in multiple text file in the same folder?

folder name is: c:\home\alltext\
inside it: 2 text files with different names (each text file contains extra whitespace that I want to trim):
text1.txt
text2.txt
I don't want to use Notepad++ and process the .txt files one by one, especially if I have more than 2 files.
I tried PowerShell, but it combines text1 and text2 together into the same text file.
How can I trim them with one command and keep them as individual .txt files?
This is my command:
(get-content c:\home\alltext\*.txt).trim() -ne '' | Set-content c:\home\alltext\*.txt

You need to process the input files one by one:
Get-ChildItem c:\home\alltext\*.txt | ForEach-Object {
# @(...) ensures an array, so that -ne '' filters lines even in single-line files
Set-Content -LiteralPath $_.FullName -Value (@(($_ | Get-Content).Trim()) -ne '')
}
Note that PowerShell never preserves the original character encoding when reading text files, so you may have to use the -Encoding parameter with Set-Content.
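A self-contained sketch of the same one-file-at-a-time processing, written as a foreach loop: it trims each line of a single file, drops lines that end up empty, and writes back to that same file with an explicit encoding. The scratch folder trim-demo, its sample files, and UTF-8 as the target encoding are assumptions for illustration.

```powershell
# Hypothetical stand-in for c:\home\alltext\ with two sample files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'trim-demo'
$null = New-Item -ItemType Directory -Force -Path $dir
Set-Content -LiteralPath (Join-Path $dir 'text1.txt') -Value '  alpha  ', '', '   ', ' beta'
Set-Content -LiteralPath (Join-Path $dir 'text2.txt') -Value ' gamma '

foreach ($file in Get-ChildItem -LiteralPath $dir -Filter *.txt) {
  # Trim the lines of this file only; @(...) keeps the result an array,
  # so -ne '' filters out the now-empty lines even for one-line files.
  $lines = @(Get-Content -LiteralPath $file.FullName | ForEach-Object { $_.Trim() }) -ne ''
  # Write back to the same file, choosing the encoding explicitly.
  Set-Content -LiteralPath $file.FullName -Value $lines -Encoding utf8
}
```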
As for what you tried:
(get-content c:\home\alltext\*.txt).trim() -ne '' streams the non-blank lines of all files matching wildcard expression c:\home\alltext\*.txt, across file boundaries.
Perhaps surprisingly, not only does Set-Content's (positionally implied) -Path parameter accept wildcard expressions too, it writes the same content (the stringified versions of whatever input it receives) to whatever files happen to match that wildcard expression.
This problematic behavior is discussed in GitHub issue #6729; unfortunately, it was decided to retain the current behavior.

How to remove a multi line block of text from $pattern in Powershell

I'm getting the contents of a text file which is partly created by gsutil and I'm trying to put its contents in $body but I want to omit a block of text that contains special characters. The problem is that I'm not able to match this block of text in order for it to be removed. So when I print out $body it still contains all the text that I'm trying to omit.
Here's a part of my code:
$pattern = @"
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this you and any
users that download such composite files will need to have a compiled
crcmod installed (see "gsutil help crcmod").
"@
$pattern = ([regex]::Escape($pattern))
$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch
So basically I need it to display everything inside the text file except for the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)) but it won't remove that entire block of text.
It has to be because of the special characters, probably the " , . () because if I make the pattern simple such as:
$pattern = @"
NOTE: You are uploading one or more
"@
then it works and this part of text is removed from $body.
It'd be nice if everything inside $pattern between the @" and "@ was treated literally. I'd like the simplest solution without functions, etc. I'd really appreciate it if someone could help me out with this.

With the complete text of your question stored in file .\SO_55538262.txt,
this script with a manually escaped pattern:
$pattern = '(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.'
$body = (Get-Content .\SO_55538262.txt -raw) -replace $pattern
$body
returns:
I'm getting the contents of a text file which is partly created by gsutil and I'm trying to put its contents in $body but I want to omit a block of text that contains special characters. The problem is that I'm not able to match this block of text in order for it to be removed. So when I print out $body it still contains all the text that I'm trying to omit.
Here's a part of my code:
$pattern = @"
"@
$pattern = ([regex]::Escape($pattern))
$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch
So basically I need it to display everything inside the text file except for the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)) but it won't remove that entire block of text.
It has to be because of the special characters, probably the " , . () because if I make the pattern simple such as:
$pattern = @"
NOTE: You are uploading one or more
"@
then it works and this part of text is removed from $body.
It'd be nice if everything inside $pattern between the @" and "@ was treated literally. I'd like the simplest solution without functions, etc.
Explanation of the RegEx from regex101.com:
(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.
(?sm) match the remainder of the pattern with the following effective flags: gms
s modifier: single line. Dot matches newline characters
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
^ asserts position at start of a line
== matches the characters == literally (case sensitive)
\> matches the character > literally (case sensitive)
NOTE: You matches the characters NOTE: You literally (case sensitive)
.*?
. matches any character
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
"gsutil help crcmod" matches the characters "gsutil help crcmod" literally (case sensitive)
\) matches the character ) literally (case sensitive)
\. matches the character . literally (case sensitive)
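Regarding the original attempt: Select-String -NotMatch only filters whole input strings (with -Raw, the entire file text), so it can never cut a block out of the middle of a string. If literal matching is the goal, as the question asks, a sketch using the String.Replace() method avoids regex escaping entirely. It assumes the block appears in the file with the same line breaks as in the here-string, apart from CRLF vs. LF differences, which it normalizes; the block is abbreviated and the sample text is made up for illustration.

```powershell
# Abbreviated block; a single-quoted here-string treats $ and " literally.
$block = @'
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
'@

# Hypothetical stand-in for: $text = Get-Content -Raw C:\temp\file.txt
$text = "before`r`n" + ($block -replace '\r?\n', "`r`n") + "`r`nafter`r`n"

# Normalize CRLF to LF on both sides, so differing line endings between
# the file and this script can't prevent a match...
$textLf  = $text -replace '\r\n', "`n"
$blockLf = $block -replace '\r\n', "`n"

# ...then remove the block with the literal String.Replace() method:
# no regex is involved, so ( ) . " need no escaping.
$body = $textLf.Replace($blockLf, '')
$body
```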

An easy way to tackle this task (without regex) would be to use the -notin operator, since Get-Content returns your file content as a string[] (one element per line):
#requires -Version 4
$set = @('==> NOTE: You are uploading one or more large file(s), which would run'
'significantly faster if you enable parallel composite uploads. This'
'feature can be enabled by editing the'
'"parallel_composite_upload_threshold" value in your .boto'
'configuration file. However, note that if you do this you and any'
'users that download such composite files will need to have a compiled'
'crcmod installed (see "gsutil help crcmod").')
$filteredContent = @(Get-Content -Path $path).
Where({ $_.Trim() -notin $set }) # trim added for misc whitespace
v2 compatible solution:
@(Get-Content -Path $path) |
Where-Object { $set -notcontains $_.Trim() }
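Instead of quoting each line by hand, the set can be derived from a here-string by splitting it into lines. A sketch with an abbreviated block and made-up input lines; note that, unlike a block-based replacement, this drops a matching line wherever it appears, not only as part of the complete block.

```powershell
# Abbreviated block from the question, kept readable as a here-string.
$block = @'
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
'@

# One set element per line of the here-string.
$set = $block -split '\r?\n'

# Hypothetical input lines standing in for Get-Content -Path $path.
$content = 'keep me',
  '==> NOTE: You are uploading one or more large file(s), which would run',
  '  significantly faster if you enable parallel composite uploads. This  ',
  'keep me too'

# Keep only lines that (after trimming) are not part of the set.
$filtered = @($content | Where-Object { $_.Trim() -notin $set })
$filtered
```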

Extract lines matching a pattern from all text files in a folder to a single output file

I am trying to extract each line starting with "%%" from all files in a folder and then copy those lines to a separate text file. I'm currently using this PowerShell code, but I am not getting any results.
$files = Get-ChildItem "folder" -Filter *.txt
foreach ($file in $files)
{
if ($_ -like "*%%*")
{
Set-Content "Output.txt"
}
}

I think that mklement0's suggestion to use Select-String is the way to go. Adding to his answer, you can pipe the output of Get-ChildItem into Select-String so that the entire process becomes a PowerShell one-liner.
Something like this:
Get-ChildItem "folder" -Filter *.txt | Select-String -Pattern '^%%' | Select -ExpandProperty line | Set-Content "Output.txt"

The Select-String cmdlet offers a much simpler solution (PSv3+ syntax):
(Select-String -Path folder\*.txt -Pattern '^%%').Line | Set-Content Output.txt
Select-String accepts a filename/path pattern via its -Path parameter, so, in this simple case, there is no need for Get-ChildItem.
If, by contrast, your input file selection is recursive or uses more complex criteria, you can pipe Get-ChildItem's output to Select-String, as demonstrated in Dave Sexton's helpful answer.
Note that, according to the docs, Select-String by default assumes that the input files are UTF-8-encoded, but you can change that with the -Encoding parameter; also consider the output encoding discussed below.
Select-String's -Pattern parameter expects a regular expression rather than a wildcard expression.
^%% only matches literal %% at the start (^) of a line.
Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] objects that contain information about each match; each object's .Line property contains the full text of an input line that matched.
Set-Content Output.txt sends all matching lines to the single output file Output.txt.
In Windows PowerShell, Set-Content uses the system's legacy Windows codepage (an 8-bit single-byte encoding - even though the documentation mistakenly claims that ASCII files are produced); PowerShell (Core) 6+ defaults to BOM-less UTF-8 instead.
If you want to control the output encoding explicitly, use the -Encoding parameter; e.g., ... | Set-Content Output.txt -Encoding Utf8.
By contrast, in Windows PowerShell, >, the output redirection operator, always creates UTF-16LE files (an encoding PowerShell calls Unicode), as does Out-File by default (which can be changed with -Encoding).
Also note that > / Out-File apply PowerShell's default formatting to the input objects to obtain the string representation to write to the output file, whereas Set-Content treats the input as strings (calls .ToString() on input objects, if necessary). In the case at hand, since all input objects are already strings, there is no difference (except for the character encoding, potentially).
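A self-contained sketch of the Select-String approach with an explicit output encoding; the scratch folder sls-demo, the sample files, and the output path are assumptions. The output file is written outside the searched folder so that it can't itself match *.txt on a later run.

```powershell
# Hypothetical stand-in for "folder" with two sample files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'sls-demo'
$null = New-Item -ItemType Directory -Force -Path $dir
Set-Content -LiteralPath (Join-Path $dir 'a.txt') -Value '%% one', 'skip', 'x %% not at start'
Set-Content -LiteralPath (Join-Path $dir 'b.txt') -Value '%% two'

# Keep the output file out of the searched folder.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'sls-output.txt'

# '^%%' is a regex: %% at the start of a line; .Line is the full matching line.
(Select-String -Path (Join-Path $dir '*.txt') -Pattern '^%%').Line |
  Set-Content -LiteralPath $outFile -Encoding utf8
```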
As for what you've tried:
$_ inside your foreach ($file in $files) refers to a file (a [System.IO.FileInfo] object), so you're effectively evaluating your wildcard expression *%%* against the input file's name rather than its contents.
Aside from that, wildcard pattern *%%* will match %% anywhere in the input string, not just at its start (you'd have to use %%* instead).
The Set-Content "Output.txt" call is missing input, because it is not part of a pipeline and, in the absence of pipeline input, no -Value argument was passed.
Even if you did provide input, however, output file Output.txt would get rewritten as a whole in each iteration of your foreach loop.
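Fixing those points while keeping the original loop structure gives something like the following sketch: it reads each file's contents, tests each line with the wildcard pattern %%*, and writes the output file once at the end. The scratch folder loop-demo, the sample file, and the output path are illustrative assumptions.

```powershell
# Hypothetical stand-in for "folder" from the question.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'loop-demo'
$null = New-Item -ItemType Directory -Force -Path $folder
Set-Content -LiteralPath (Join-Path $folder 'a.txt') -Value '%% keep', 'drop', 'a %% drop'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'loop-output.txt'

$files = Get-ChildItem -LiteralPath $folder -Filter *.txt

# Collect the matching lines from all files first...
$matchingLines = foreach ($file in $files) {
  # ...reading each file's *contents*, not its name.
  foreach ($line in Get-Content -LiteralPath $file.FullName) {
    # Wildcard '%%*' means: starts with %%.
    if ($line -like '%%*') { $line }
  }
}

# ...then write the output file once, instead of rewriting it per iteration.
Set-Content -LiteralPath $outFile -Value $matchingLines
```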

First you have to use Get-Content in order to get the content of the file. Then you do the string match and, based on that, set the content of the output file. Use Get-Content and put another loop inside the foreach to iterate over all the lines in the file.
I hope this logic helps you

ls *.txt -Exclude Output.txt | %{
$f = $_
gc $f.fullname | %{
if($_.StartsWith("%%")){
$_ >> Output.txt
}#end if
}#end gc
}#end ls
Aliases:
ls - Get-ChildItem
gc - Get-Content
% - ForEach-Object
$_ - Automatic variable for the current pipeline object
>> - Redirection construct
# - Comment
http://ss64.com/ps/

Replacing contents of a text file using PowerShell

I've looked all around this site and can't quite seem to find anything that fits my situation. Basically, I am trying to write an addition to the NETLOGON file that will replace text in a text file on all of our users' desktops. The current text is static across the board.
The text I want it changed to will be unique to each user. I want to change the current text (user1) to the user's AD username (i.e. johnd, janed, etc.). I am using Windows Server 2008 R2 and all the workstations are Windows 7 Professional SP1 64 bit.
Here's what I have tried so far (with a few variables, which none have worked for one reason or the other):
gc c:\Users\%USERNAME%\desktop\VPN.txt' -replace "user1",$env:username | out-file c:\Users\%USERNAME%\desktop\VPN.txt
I didn't get an error, but it also did not go back to the normal "PS C:>" prompt, just ">>>" and the file did not change as anticipated.

If that is exactly how your code looks, the problem is the opening single quote without a matching closing one. Even after fixing that, you will still have two other problems, and the solution to one of them is already in your code. The >>> is PowerShell's continuation prompt: the parser knows that the code is not complete and gives you the option to continue it. If you were purposely writing a single statement across multiple lines, you would consider this a feature.
$path = "c:\Users\$($env:username)\desktop\VPN.txt"
(Get-Content $path) -replace "user1",$env:username | out-file $path
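Enclosed the path in quotes and stored it in a variable, since you use the path twice.
%name% is Command Prompt (cmd.exe) syntax for environment variables. In PowerShell, environment variables are accessed with the $env: prefix, which you did use once in your snippet.
-replace is a regex-based replacement operator that can operate on Get-Content's output, but you need to enclose the Get-Content call in (...) first: otherwise -replace is interpreted as a (nonexistent) parameter of Get-Content. The parentheses also ensure that the file has been read in full before it is written back.
Secondly, since -replace is regex-based and your search string doesn't need any regex features, you could just use the .Replace() string method instead (note that .Replace() is case-sensitive, whereas -replace is case-insensitive by default).
Set-Content is generally preferred over Out-File for performance reasons.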
All that being said...
you could also try something like this.
$path = "c:\Users\$($env:username)\desktop\VPN.txt"
(Get-Content $path).Replace("user1",$env:username) | Set-Content $path
Do you want to replace only the first occurrence?
You could use a little regex here, with a tweak in how you use Get-Content:
$path = "c:\Users\$($env:username)\desktop\VPN.txt"
(Get-Content $path | Out-String) -replace "(?s)(.*?)user1(.*)",('$1{0}$2' -f $env:username) | out-file $path
The (?s) inline option lets . match newlines too, so the regex matches the entire file. There are two capture groups:
(.*?) - Up until the first "user1"
(.*) - Everything after that
Then we use the format operator to sandwich the new username in between those capture groups.
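Another option for replacing only the first occurrence is the .NET Regex.Replace(input, replacement, count) instance method with a count of 1, which avoids capture groups altogether. A sketch with made-up sample text, where $userName stands in for $env:username; [regex]::Escape() makes the search string literal, and doubling any $ in the replacement keeps it from being read as a group reference.

```powershell
# Hypothetical stand-in for (Get-Content -Raw $path).
$text = "name=user1`nbackup=user1`n"
$userName = 'johnd'   # stands in for $env:username

# Literal search pattern, compiled into a Regex instance.
$regex = [regex] [regex]::Escape('user1')

# count = 1: only the first match is replaced.
$result = $regex.Replace($text, $userName.Replace('$', '$$'), 1)
$result
```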

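To perform the replacement only on the first line of the file, check the ReadCount property that Get-Content attaches to each line (1 for the first line); $original (a regex) and $content are placeholders for your search pattern and replacement text: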
(Get-Content $fileName) | % {
if ($_.ReadCount -eq 1) {
$_ -replace "$original", "$content"
}
else {
$_
}
} | Set-Content $fileName
