I'm using the following bit of code to process a SQL Script and split it up using the GO command:
[string]$batchDelimiter = "[gG][oO]"
$scriptContent = Get-Content $sqlScript | Out-String
$batches = $scriptContent -split "\s*$batchDelimiter\s*\r?\n"
foreach($batch in $batches)
{
if(![string]::IsNullOrEmpty($batch.Trim()))
{
$SqlCmd.CommandText = $batch
$reader = $SqlCmd.ExecuteNonQuery()
}
}
The problem I have is when a GO command appears in the middle of a comment block:
/*
IF OBJECT_ID('AmyTempMapRetroDateFK') IS NOT NULL
DROP FUNCTION AmyTempMapRetroDateFK
GO
*/
Is there a way of removing all of the comment blocks before processing the script? I've seen a few examples in c# but nothing for Powershell.
Assuming that there are no nested comments (PSv3+ syntax):
(Get-Content -Raw $sqlScript) -split '(?s)/\*.*?\*/' -split '\r?\ngo\r?\n' -notmatch '^\s*$' |
ForEach-Object { $SqlCmd.CommandText = $_.Trim(); $reader = $SqlCmd.ExecuteNonQuery() }
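Note: If there's a chance that the final line doesn't end in a line break, use '\r?\ngo(\r?\n|$)' instead of '\r?\ngo\r?\n'. (The capture group adds whitespace-only or empty elements to the -split output, but the -notmatch filter removes them.)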
Get-Content -Raw, available since PSv3, reads the entire file into a single string - it is the simpler and more efficient equivalent of Get-Content $sqlScript | Out-String
-split '(?s)/\*.*?\*/' splits the input string by /* ... */ spans; note the inline option, (?s), which is required to make . match newlines too; non-greedy quantifier .*? is needed to only match up to the next */ instance; the result is an array of line blocks with the comment blocks excluded.
-split '\r?\ngo\r?\n' then further splits that array by the word go preceded and followed by a newline.
Note that -split is case-insensitive by default, so you needn't worry about case variations such as GO.
(You could use alias -isplit to make the case-insensitive behavior more explicit; similarly,
-csplit can be used for case-sensitive matching.)
-notmatch '^\s*$' filters out blank / empty elements from the resulting array, and sends the filtered array through the pipeline (|).
The ForEach-Object cmdlet then operates on each array element - now containing an individual SQL command - via automatic variable $_, which always represents the input object at hand.
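The pipeline described above can be tried without a database connection. The following is a minimal sketch that only collects the batches instead of executing them; the sample script text and the variable names $sample and $batches are made up for illustration, and a non-capturing group (?:...) is used so that -split does not emit the matched line breaks as extra elements.

```powershell
# Hypothetical sample script; the statements are invented for illustration.
$sample = @'
SELECT 1
GO
/*
DROP FUNCTION Foo
GO
*/
SELECT 2
GO
SELECT 3
'@

# 1. Remove /* ... */ spans, 2. split on GO lines, 3. drop blank elements, 4. trim.
$batches = @(
    $sample -split '(?s)/\*.*?\*/' -split '\r?\ngo(?:\r?\n|$)' -notmatch '^\s*$' |
        ForEach-Object { $_.Trim() }
)

$batches.Count   # the GO inside the comment block does not produce a batch
$batches
```

In a real run, each element of $batches would be assigned to $SqlCmd.CommandText as in the answer above.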
A simplified version of the solution marked as the best answer, adapted here to remove PS comment blocks:
(get-content .\myscript.ps1 -raw) -replace "(?s)<#.+?#>",'' > myscript_Clean.ps1
Not sure why GO statements should interfere here. Applied to SQL comment blocks, this should do it:
(get-content .\myscript.sql -raw) -replace "(?s)/\*.+?\*/",'' > myscript_Clean.sql
Perhaps you could split on /* and join the resulting array.
Then split that on */, and join the resulting array.
The joins would be easier to read with a `r`n (carriage return, newline) delimiter.
Related
I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned that Get-Content splits and streams the lines in a file by CR, CRLF, and LF alike, so that it can read data from different operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the trailing newline that Out-File appended when the test file was created; Get-Content -Raw preserves that newline as part of the string.
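If the goal is only to count rows, counting the CRLF sequences directly avoids building an array of lines. This is a minimal sketch on an in-memory string; it assumes that every row is terminated by CRLF and that the content fits in memory, and the sample text and variable names are made up for illustration.

```powershell
# Sample data: lone CR and lone LF appear inside rows, CRLF terminates each row.
$text = "row1`rstill row1`r`nrow2`nstill row2`r`nrow3`r`n"

# Count only the places where CR and LF appear together.
$crlfCount = [regex]::Matches($text, "`r`n").Count
$crlfCount

# For a file, the same idea would be:
# [regex]::Matches((Get-Content -Raw .\Test.txt), "`r`n").Count
```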
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader-based reader using the ReadBlock method and splitting lines specifically on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to workaround the memory exception errors:
function Get-Lines {
[CmdletBinding()][OutputType([string])] param(
[Parameter(ValueFromPipeLine = $True)][string] $Filename,
[String] $NewLine = [Environment]::NewLine
)
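Begin {
# A 10-character buffer is very small; a larger one (e.g. 4096) is likely faster for big files.
[Char[]] $Buffer = New-Object Char[] 10
}
Process {
# Open the reader here: a pipeline-bound $Filename is not yet available in the Begin block.
$Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item $Filename).FullName
$Rest = '' # Note that a multiple character newline (as CRLF) could be split at the end of the buffer
try {
While ($True) {
$Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
if (!$Length) { Break }
# Escape the separator so that it is matched literally by the regex-based -Split.
$Split = ($Rest + [string]::new($Buffer, 0, $Length)) -Split [regex]::Escape($NewLine)
If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
$Rest = $Split[-1]
}
$Rest
}
finally { $Reader.Dispose() } # Release the file handle even if the pipeline is stopped
}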
}
Usage
To prevent the memory exceptions, it is important that you do not assign the results to a variable or enclose the call in parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of
Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line 1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.
I want to use the method given in the answer of this question:
PowerShell - Remove all lines of text file until a certain string is found
However I don't get my string from "Get-Content"; I get it from "Out-String". How can I convert my "Out-String" variable into a "Get-Content" format without needing to "Set-Content"/"Get-Content" a temporary file? Or how can I get the same end result without even converting?
It really hurts my brains that a "Get-Member" on the variable from either 'Out-String' or 'Get-Content' returns a TypeName of System.String but you cannot use them the same way...
Here is the simplified code I've been trying to understand - let's use that:
# Let's work with the text from 'Get-Help' output:
$myString = (Get-Help | out-string)
# I only want the text from the "SEE ALSO" section:
$cut = $myString.Where({ $_ -like ("*SEE ALSO*") },'SkipUntil')
$cut # This shows the whole thing!!! :-(
$cut | gm | findstr TypeName # says 'TypeName: System.String'
# Dirty conversion to "Get-Content" format:
Set-Content "tmp.file" -value $cut
$recut = (Get-Content "tmp.file").Where({ $_ -like ("*SEE ALSO*") },'SkipUntil')
$recut # Now this shows what I want, even though the below returns 'TypeName: System.String' as well !!!
(Get-Content "tmp.file") | gm | findstr TypeName
The problem is get-help (with no parameters) or out-string is outputting one multiline string (with windows line endings). I even tried out-string -stream. This is unusual for a powershell command. Get-content would split up the lines for you automatically.
(get-help).count
1
One way to resolve it is to split on the line endings. I'm also skipping blank lines at the end. (This split pattern works with unix/osx text too.)
((get-help) -split '\r?\n').Where({ $_ -like '*SEE ALSO*' },'SkipUntil') | where { $_ }
SEE ALSO:
about_Updatable_Help
Get-Help
Save-Help
Update-Help
Or:
((get-help) -split '\r?\n').Where({ $_ -match 'SEE ALSO' },'SkipUntil').Where{ $_ }
In this case, you do not even need Out-String, but I will stick to your example:
$myString = (Get-Help | Out-String)
$mystring -match "(?ms)^.*(SEE\sALSO.*)$" | Out-Null
$Matches[1]
The key in the regex is (?ms). The m option (multi-line) makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string, and the s option (single-line) makes . match line breaks too, so that .* can span multiple lines. The result of the -match operator is piped to Out-Null to keep it out of the terminal. You might want to evaluate it, though: if it is $true, $Matches[1] will contain your desired string.
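The effect of the s option can be seen on a small string. This is a minimal sketch with made-up sample text and variable names; it compares a pattern without (?s) against the (?ms) pattern from the answer.

```powershell
$text = "first line`nSEE ALSO`nGet-Help`nSave-Help"

# Without (?s), '.' stops at a line break, so the capture ends with the matching line.
$withoutS = if ($text -match '(?m)(SEE\sALSO.*)$') { $Matches[1] }

# With (?s), '.' also matches line breaks, so the capture runs to the end of the string.
$withS = if ($text -match '(?ms)^.*(SEE\sALSO.*)$') { $Matches[1] }

$withoutS
$withS
```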
The password for Century10 is the 161st word within the file on the desktop.
NOTE:
- The password will be lowercase no matter how it appears on the screen.
The question above is where I am facing my challenges. I tried the command below.
Get-Content C:\Users\Century9\Desktop\Word_File.txt | Select-Object -Index 161
The result was empty. I understand that I need to assign a value to the string, as it is now seen as one whole entity. But how do I do it?
If the token of interest is the 161st word in the file, use the following approach, which splits the file into words irrespective of line breaks[1]:
$pass = (-split (Get-Content -Raw Word_File.txt))[160]
Append .ToLower() if you want to convert the token to all-lowercase.
Note that the above loads the entire file into memory as a single string, using -Raw.
Since array indices are 0-based, it is index [160] that returns the 161st element.
The unary form of the -split operator splits the input into an array of tokens by whitespace.
Note: If you want to split by the stricter definition of what constitutes a word in a regular-expression context, use the following instead:
$pass = ((Get-Content -Raw Word_File.txt) -split '\W+' -ne '')[160]
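The difference between the two splitting approaches shows up as soon as the text contains punctuation. This is a minimal sketch on a made-up sample string (the variable names are illustrative too): the unary -split keeps punctuation attached to the tokens, whereas '\W+' treats every run of non-word characters as a separator.

```powershell
$text = "alpha, beta`n  gamma-delta  epsilon."

# Whitespace-based tokens: 'alpha,', 'beta', 'gamma-delta', 'epsilon.'
$byWhitespace = -split $text

# Word-character-based tokens: 'alpha', 'beta', 'gamma', 'delta', 'epsilon'
# (-ne '' removes the empty element produced by the trailing '.')
$byWordChars = $text -split '\W+' -ne ''

$byWhitespace.Count
$byWordChars.Count
```

Which definition is appropriate depends on how the file in question separates its words.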
[1] If your input file contains each word on its own line:
Your solution was on the right track, except that you should pass 160 to Select-Object -Index, because the -Index parameter expects 0-based indices, not 1-based line numbers:
# Extract the 161st line.
$pass = Get-Content Word_File.txt | Select-Object -Index 160
To convert to lowercase:
$pass = (Get-Content Word_File.txt | Select-Object -Index 160).ToLower()
The above will fail if the input file has fewer than 161 lines (with error message You cannot call a method on a null-valued expression).
If you prefer to receive no output quietly instead, use the following (which uses built-in aliases select for Select-Object and foreach for ForEach-Object for brevity):
$pass = Get-Content Word_File.txt | select -Index 160 | foreach ToLower
Try running this:
((Get-Content -Path C:\Users\Century9\Desktop\Word_File.txt -TotalCount 161)[-1]).ToLower()
I'm trying to edit this line of a file ("VoIP.Enabled "1"); I want to change the 1 to a zero. I tried this:
$dewprefs = Get-Content .\dewrito_prefs.cfg
$dewprefs | Select-String "VoIP.Enabled" | ForEach-Object {$_ -replace "1","0"} | Set-Content .\dewrito_prefs.cfg
However, when I run this script, it edits the right line but removes the 100 other lines, leaving only the line I wanted to edit.
Any help on this matter would be highly appreciated
Select-String acts as a filter: that is, the input it is given is only passed out if it matches a pattern.
Therefore, only the line of interest is written to the output file.
Do not use Select-String if all input lines - though possibly modified - should be passed through; use only ForEach-Object, and conditionally modify each input line:
$dewprefs = Get-Content .\dewrito_prefs.cfg
$dewprefs |
ForEach-Object { if ($_ -match 'VoIP\.Enabled') { $_ -replace '1', '0' } else { $_ } } |
Set-Content .\dewrito_prefs.cfg
$_ -match 'VoIP\.Enabled' now does what Select-String did in your original command: it matches only if the input line at hand contains literal VoIP.Enabled (note how the . is escaped as \. to ensure that it is treated as a literal in the context of a regular expression).
Note how both branches of the if statement produce output:
$_ -replace '1', '0' outputs the result of replacing all instances of 1 in the input line with 0
$_ simply passes the input line through as-is.
Most likely you could replace the if statement with a single -replace expression, however, and, assuming that the file is small enough to be read as a whole (quite likely, in the case of a configuration file), you can use a variant of Stu's helpful simplification.
Taking full advantage of the fact that -replace supports regexes (regular expressions), the code can update lines based on a key name such as VoIP.Enabled only, without needing to know that key's current value.
$key = 'VoIP.Enabled'
$newValue = '0'
# Construct a regex that matches the entire target line.
$regex = '^\s*' + [regex]::Escape($key) + '\b.*$'
# Build the replacement line.
$modifiedLine = "$key $newValue"
(Get-Content .\dewrito_prefs.cfg) -replace $regex, $modifiedLine | Set-Content .\dewrito_prefs.cfg
Note that writing the output back to the input file only works because the input file was read into memory as a whole, up front, due to enclosing the Get-Content call in (...).
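The key-based replacement can be checked on an in-memory array before touching the real file. This is a minimal sketch; the surrounding lines in $lines are hypothetical, and it assumes that the value should stay enclosed in double quotes, as in the line quoted in the question.

```powershell
# Hypothetical file content; only the VoIP.Enabled line comes from the question.
$lines = 'Game.Name "demo"', 'VoIP.Enabled "1"', 'VoIP.Volume "10"'

$key = 'VoIP.Enabled'
# Match the whole line that starts with the (escaped) key, whatever its current value is.
$regex = '^\s*' + [regex]::Escape($key) + '\b.*$'

# -replace is applied to every element; non-matching lines pass through unchanged.
$updated = $lines -replace $regex, "$key `"0`""
$updated
```

Because -replace passes non-matching elements through as-is, the number of lines is preserved, unlike with the Select-String approach.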
This will work too, with PowerShell v3+, and is a little more succinct:
(Get-Content .\dewrito_prefs.cfg).replace('"VoIP.Enabled "1"', '"VoIP.Enabled "0"') |
Set-Content .\dewrito_prefs.cfg
Your quotes are a little strange (3 double quotes in total?), I've mimicked what you've asked, however.
I am replacing multiple strings in a file. The following works, but is it the best way to do it? I'm not sure if doing multiple block expressions is a good way.
(Get-Content $tmpFile1) |
ForEach-Object {$_ -replace 'replaceMe1.*', 'replacedString1'} |
% {$_ -replace 'replaceMe2.*', 'replacedString2'} |
% {$_ -replace 'replaceMe3.*', 'replacedString3'} |
Out-File $tmpFile2
You don't really need a separate ForEach-Object for each replace operation. The operators can be chained in a single command:
@(Get-Content $tmpFile1) -replace 'replaceMe1.*', 'replacedString1' -replace 'replaceMe2.*', 'replacedString2' -replace 'replaceMe3.*', 'replacedString3' |
Out-File $tmpFile2
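Chaining works because -replace, given an array on its left-hand side, returns an array with the replacement applied to each element, which the next -replace then receives. This is a minimal sketch on made-up sample lines rather than a file.

```powershell
# Made-up sample lines standing in for the file content.
$lines = 'replaceMe1 abc', 'keep me', 'replaceMe2 def'

# Each -replace processes the whole array and hands its result to the next one.
$result = $lines -replace 'replaceMe1.*', 'replacedString1' -replace 'replaceMe2.*', 'replacedString2'
$result
```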
I'm going to assume that your patterns and replacements don't really just have a digit on the end that is different, so you might want to execute different code based on which regex actually matched.
If so you can consider using a single regular expression but using a function instead of a replacement string. The only catch is you have to use the regex Replace method instead of the operator.
PS C:\temp> set-content -value @"
replaceMe1 something
replaceMe2 something else
replaceMe3 and another
"@ -path t.txt
PS C:\temp> Get-Content t.txt |
ForEach-Object { ([regex]'replaceMe([1-3])(.*)').Replace($_,
{ Param($m)
$head = switch($m.Groups[1]) { 1 {"First"}; 2 {"Second"}; 3 {"Third"} }
$tail = $m.Groups[2]
"Head: $head, Tail: $tail"
})}
Head: First, Tail: something
Head: Second, Tail: something else
Head: Third, Tail: and another
This may be overly complex for what you need today, but it is worth remembering you have the option to use a function.
The -replace operator uses regular expressions, so you can merge your three script blocks into one like this:
Get-Content $tmpFile1 `
| ForEach-Object { $_ -replace 'replaceMe([1-3]).*', 'replacedString$1' } `
| Out-File $tmpFile2
That will search for the literal text 'replaceMe' followed by a '1', '2', or '3' and replace it with 'replacedString' followed by whichever digit was found (the '$1').
Also, note that -replace works like -match, not -like; that is, it works with regular expressions, not wildcards. When you use 'replaceMe1.*' it doesn't mean "the text 'replaceMe1.' followed by zero or more characters" but rather "the text 'replaceMe1' followed by zero or more occurrences ('*') of any character ('.')". The following demonstrates text that will be replaced even though it wouldn't match with wildcards:
PS> 'replaceMe1_some_extra_text_with_no_period' -replace 'replaceMe1.*', 'replacedString1'
replacedString1
The wildcard pattern 'replaceMe1.*' would be written in regular expressions as 'replaceMe1\..*', which you'll see produces the expected result (no replacement performed):
PS> 'replaceMe1_some_extra_text_with_no_period' -replace 'replaceMe1\..*', 'replacedString1'
replaceMe1_some_extra_text_with_no_period
You can read more about regular expressions in the .NET Framework here.
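The wildcard-versus-regex distinction described above can be verified side by side. This is a minimal sketch with illustrative variable names; it also uses [regex]::Escape, which is one way to obtain a regex that matches a piece of text literally without escaping it by hand.

```powershell
$text = 'replaceMe1_some_extra_text_with_no_period'

# Wildcard matching: the '.' is literal, so the text must start with 'replaceMe1.'
$wildcardMatch = $text -like 'replaceMe1.*'

# Regex matching: '.' stands for any character, so this matches.
$regexMatch = $text -match 'replaceMe1.*'

# Escaping the literal part produces 'replaceMe1\.', which no longer matches.
$escapedMatch = $text -match ([regex]::Escape('replaceMe1.') + '.*')

$wildcardMatch, $regexMatch, $escapedMatch
```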