windows powershell findstr incorrect behaviour - powershell

I am looking for a search string inside a json file:
> type .\input.json
[
{"name": "moish"},
{"name": "oren"}
]
> type .\input.json | findstr /n /l "\`"name\`": \`"or"
2: {"name": "moish"},
3: {"name": "oren"}
How come the moish entry is found? What am I missing?

Note: The quoted lines below, originally a direct part of the answer, turned out not to apply to the problem at hand, because the escaping in the question is correct (the only thing missing was placing /c: directly before the string to make findstr.exe search for it as a whole).
See this answer for a more comprehensive analysis of the problem.
Escape the literal quotation marks by doubling them:
type input.json |findstr /n /l """name"": ""or"
... or use single-quotes to qualify the search term:
type input.json |findstr /n /l '"name": "or'
.... or perhaps use the native PowerShell cmdlet Select-String instead of findstr:
Select-String -LiteralPath input.json -Pattern '"name": "or'

Prepend /c: to your search string in order to make findstr treat it as a single string to search for:
Get-Content .\input.json | findstr /n /l /c:"\`"name\`": \`"or" # Note the /c:
Note the use of the Get-Content cmdlet for reading a file line by line; type is a built-in alias for it in PowerShell.
Note:
By default, if a search string contains spaces, findstr searches for the occurrence of any of the space-separated words, i.e., "name": and "or here, causing both lines to match. /c: signals that the string should be searched for as a whole (either as a regular expression, by default, or as a literal string, with /l).
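For example, with two throwaway sample lines sent via the pipeline (purely illustrative values):
# Without /c:, the space-separated words are ORed - both lines match:
'one two', 'three four' | findstr /l "two three"
# With /c:, the string must occur as a whole - neither line matches:
'one two', 'three four' | findstr /l /c:"two three"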
Except for the missing /c:, your search string was correct, but you could have simplified by using a verbatim (single-quoted) string ('...'):
... | findstr /n /l /c:'\"name\": \"or'
Sadly, the additional \-escaping of the embedded " chars. is a requirement either way, up to at least PowerShell 7.2.x, even though it shouldn't be necessary.
It is due to a long-standing bug in how PowerShell passes arguments with embedded double-quotes to external programs; a - possibly opt-in - fix may be coming - see this answer.
If you're using a 7.2.x version or a 7.3 preview version of PowerShell with the experimental feature named PSNativeCommandArgumentPassing enabled and the $PSNativeCommandArgumentPassing preference variable set to either 'Standard' or 'Windows', the \-escaping is no longer needed, because PowerShell then (finally) does it for you; that is, findstr /n /l /c:'"name": "or' would suffice.
PowerShell alternative: Select-String:
As shown in Mathias R. Jessen's answer, you may alternatively use the Select-String cmdlet, the more powerful PowerShell counterpart to findstr.exe, not least to avoid quoting headaches (see above) and potential character-encoding issues.
Like findstr.exe, Select-String uses regular expressions by default; use -SimpleMatch to opt for literal matching.
Unlike findstr.exe, Select-String is case-insensitive by default (as PowerShell generally is). Use -CaseSensitive to make matching case-sensitive.
Select-String wraps matching lines in objects that include metadata about each match; if you're interested in the line text only, use -Raw in PowerShell (Core) 7+, or pipe to ForEach-Object Line in Windows PowerShell.
While piping lines read from a file via Get-Content works, it is much slower than passing the file path as an argument directly to Select-String, via its -LiteralPath parameter (you may also pipe file-info objects obtained with Get-ChildItem to it).
This has the added advantage that the display representation of the matching lines includes the file name and line number (see below).
Thus, the equivalent of your (corrected) findstr.exe call is:
Select-String -LiteralPath .\input.json -CaseSensitive -SimpleMatch -Pattern '"name": "or'
# Alternative:
Get-ChildItem .\input.json |
Select-String -CaseSensitive -SimpleMatch -Pattern '"name": "or'
You'll get the following output in the console (note the output-line prefix consisting of the file name and the line number):
input.json:3: {"name": "oren"}
Note that this is the for-display representation of the object of type [Microsoft.PowerShell.Commands.MatchInfo] that Select-String emitted for the matching line.
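If you want to process the match programmatically or emit only the line text, you can work with the MatchInfo object's properties directly; a minimal sketch, based on the command above:
# Capture the MatchInfo object and access its properties:
$m = Select-String -LiteralPath .\input.json -CaseSensitive -SimpleMatch -Pattern '"name": "or'
$m.Filename     # -> input.json
$m.LineNumber   # -> 3
$m.Line         # -> {"name": "oren"}

# To emit only the line text:
#   PowerShell (Core) 7+:
Select-String -LiteralPath .\input.json -SimpleMatch -Pattern '"name": "or' -Raw
#   Windows PowerShell:
Select-String -LiteralPath .\input.json -SimpleMatch -Pattern '"name": "or' | ForEach-Object Line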

Related

How do I run test-path on all paths in the system variable PATH using powershell?

I would like to run Test-Path, or something similar that accomplishes my purpose, to find the invalid paths in my Path variable.
The main thing I have done is search for
test path system variable for invalid entries
This did not find anything.
This example is just to show I have tried something, but I don't really know what the best command is.
Test-Path -Path %Path% -PathType Any
Update
These scripts enabled me to find a couple of bad paths and fix them.
Building on Mathias R. Jessen's great solution in a comment:
# Output those PATH entries that refer to nonexistent dirs.
# Works on both Windows and Unix-like platforms.
$env:PATH -split [IO.Path]::PathSeparator -ne '' |
Where-Object { -not (Test-Path $_) }
Using the all uppercase form PATH of the variable name and [IO.Path]::PathSeparator as the separator to -split by makes the command cross-platform:
On Unix-like platforms environment-variable names are case-sensitive, so using $env:PATH (all-uppercase) is required; by contrast, environment-variable names are not case-sensitive on Windows, so $env:PATH works there too, even though the actual case of the name is Path.
On Unix-like platforms, : separates the entries in $env:PATH, whereas it is ; on Windows - [IO.Path]::PathSeparator returns the platform-appropriate character.
-ne '' filters out any empty tokens resulting from the -split operation, which could result from directly adjacent separators in the variable value (e.g., ;;) - such empty entries have no effect and can be ignored.
Note: With an array as the LHS, such as returned by -split, PowerShell comparison operators such as -eq and -ne act as filters and return an array of matching items rather than a Boolean - see about_Comparison_Operators.
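A quick illustration of that filtering behavior (sample values only):
# With an array LHS, -ne acts as a filter and returns the non-matching items:
'C:\Windows', '', 'C:\Tools' -ne ''    # -> C:\Windows, C:\Tools
# With a scalar LHS, it returns a Boolean, as usual:
'C:\Windows' -ne ''                    # -> True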
The Where-Object call filters the input directory paths down to those that do not exist, and outputs them (which prints to the display by default).
Note that, strictly speaking, Test-Path's first positional parameter is -Path, which interprets its argument as a wildcard expression.
For full robustness, Test-Path -LiteralPath $_ is needed, to rule out inadvertent interpretation of literal paths that happen to contain [ as wildcards - though with entries in $env:PATH that seems unlikely.
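For completeness, the fully robust variant would therefore look like this:
# -LiteralPath prevents [ and ] in an entry from being treated as wildcard metacharacters.
$env:PATH -split [IO.Path]::PathSeparator -ne '' |
  Where-Object { -not (Test-Path -LiteralPath $_) }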

How to escape square brackets in file paths with Invoke-WebRequest's -OutFile parameter

When you include something like [1] in a file name, as in File[1].txt, and use it with Invoke-WebRequest's -OutFile parameter, you get the error Cannot perform operation because the wildcard path File[1].txt did not resolve to a file.
This is caused by the behavior documented here.
With other cmdlets you would use -LiteralPath to force the path to be taken literally but in this case that is not an option.
I have tried escaping the [ and ] characters with ` or \ but it still gives the same error.
To simplify testing you can reproduce the same issue with Out-File, Test-Path, etc.
#Fails
Out-File -FilePath "file[1].txt"
Out-File -FilePath "file`[1`].txt"
Out-File -FilePath "file\[1\].txt"
#Succeeds
Out-File -LiteralPath "file[1].txt"
#Fails
Test-Path -Path "file[1].txt"
#Succeeds
Test-Path -LiteralPath "file[1].txt"
How can I escape characters that would be used to express wildcards in -Path, -FilePath, -OutFile, etc. so that they function like the string was specified with -LiteralPath since -LiteralPath isn't available with Invoke-WebRequest?
Update:
In PowerShell (Core) 7.1+, file paths passed to the -OutFile parameter of Invoke-WebRequest and Invoke-RestMethod are now interpreted literally:
That is, -OutFile now acts like -LiteralPath,[1] and there is no longer a need to escape [ and ] characters, so that the following example command works as-is:
# PowerShell 7.1+ only.
Invoke-WebRequest http://example.org -OutFile File[1].txt
Therefore, the following applies only to Windows PowerShell (and to now-obsolete PowerShell (Core) versions v7.0 and below):
Escaping the [ and ] characters as `[ and `] so that they are treated literally when interpreted as a wildcard expression with -Path (-FilePath) and -OutFile unfortunately only half works at the moment, due to a bug discussed in the bottom section:
Performing the escaping ensures that the target parameter accepts the path (the command doesn't break anymore) ...
... but on creation of the file it is mistakenly the escaped representation that is used as the literal filename - see bottom section.
Workaround for now (tip of the hat to hashbrown for helping to simplify it):
Make Invoke-RestMethod / Invoke-WebRequest save to a temporary file...
... and then rename (move) the temporary file to the desired output file path.
# Literal output file path.
$outFile = '.\file[1].txt'
# Simulate a call to Invoke-RestMethod / Invoke-WebRequest -OutFile.
# Save to a *temporary file*, created on demand - such
# a temporary file path can be assumed to never contain '[' or ']'
'hi' | Out-File -FilePath ($tempFile = New-TemporaryFile)
# Rename (move) the temporary file to the desired target path.
Move-Item -Force -LiteralPath $tempFile -Destination $outFile
In Windows PowerShell v4-, use [IO.Path]::GetTempFileName() in lieu of New-TemporaryFile.
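That is, in Windows PowerShell v4 and below the workaround might look something like this (a sketch; same hypothetical file name as above):
$outFile = '.\file[1].txt'
# [IO.Path]::GetTempFileName() creates an empty temporary file and returns its path.
$tempFile = [IO.Path]::GetTempFileName()
'hi' | Out-File -FilePath $tempFile
# Rename (move) the temporary file to the desired target path.
Move-Item -Force -LiteralPath $tempFile -Destination $outFile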
Escaping [literal] paths for use as wildcard patterns:
Use any of the following string-literal representations, which ultimately result in the same string with verbatim content file`[1`].txt, which, when interpreted as a wildcard expression, is the escaped equivalent of literal string file[1].txt:
'file`[1`].txt'
"file``[1``].txt"
file``[1``].txt
To create this escaping programmatically, use:
$literalName = 'file[1].txt'
$escapedName = [WildcardPattern]::Escape($literalName) # -> 'file`[1`].txt'
What matters is that the target cmdlet sees the [ and ] as `-escaped in the -Path (-FilePath) argument it is passed for them to be treated verbatim.
If you use "..." quoting or an unquoted argument (which mostly behaves as if it were enclosed in "..."), PowerShell's string parsing gets in the way: ` is also used as the escape character inside expandable strings ("..."), so in order to pass ` through, you must escape it itself, as ``.
Otherwise something like `[ inside "..." turns into just [ - the ` is "eaten" - because `[ is an escaped [ from "..."'s perspective, and escaping a character that doesn't need escaping turns into just that character; in short: both "file`[1`].txt" and file`[1`].txt turn into plain file[1].txt, as if you had never used `.
By contrast, ` characters are used verbatim inside '...'-quoted strings and need no escaping.
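A quick way to see the difference at the prompt:
'file`[1`].txt'   # -> file`[1`].txt  (inside '...', ` is taken verbatim)
"file`[1`].txt"   # -> file[1].txt    (inside "...", ` is removed by string parsing)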
Flawed file-creation behavior of many cmdlets with -Path:
The bug mentioned above - that on file creation the escaped representation is mistakenly used as the literal filename - affects most cmdlets, unfortunately: That is, they unexpectedly retain the ` characters in the escaped pattern on creating a file, so that by specifying -Path 'file[1].txt' you'll end up with a file literally named file`[1`].txt.
Fortunately, most cmdlets do support -LiteralPath, so use of -LiteralPath file[1].txt is the better choice anyway and avoids this bug.
Some of the affected cmdlets:
Invoke-WebRequest and Invoke-RestMethod
Out-File and therefore also redirection operators > and >>, which effectively call Out-File behind the scenes.
Note that Set-Content and Add-Content do not exhibit this problem.
All(?) Export-* cmdlets.
Others?
The bug has been reported in GitHub issue #9475.
[1] This was technically a breaking change, but it was considered acceptable, due to the counterintuitive nature of the original behavior. Unfortunately, the counterintuitive behavior still surfaces in many other contexts - including still with Out-File unless -LiteralPath is explicitly used. See GitHub issue #17106 for a summary.

Strip lines from text file based on content

I like to use one of the packaged HOSTS (MVPS,) files to protect myself from some of the nastier domains. Unfortunately, sometimes these files are a bit overzealous for me (blocking googleadsservices is a pain sometimes). I want an easy way to strip certain lines out of these files. In Linux I use:
cat hosts |grep -v <pattern> >hosts.new
And the file is rewritten minus the lines referencing the pattern I specified in the grep. So I just set it up to replace hosts with hosts.new on reboot and I'm done.
Is there an easy way to do this in PowerShell?
In PowerShell you'd do
(Get-Content hosts) -notmatch $pattern | Out-File hosts.new
or
(cat hosts) -notmatch $pattern > hosts.new
for short.
Of course, since Out-File (and with it the redirection operator) default to Unicode format, you may actually want to use Set-Content instead of Out-File:
(Get-Content hosts) -notmatch $pattern | Set-Content hosts.new
or
(gc hosts) -notmatch $pattern | sc hosts.new
And since the input file is read in a grouping expression (the parentheses around Get-Content hosts) you could actually write the output back to the source file:
(Get-Content hosts) -notmatch $pattern | Set-Content hosts
To complement Ansgar Wiechers' helpful answer (which offers pragmatic and concise solutions based on reading the entire input file into memory up-front):
PowerShell's grep equivalent is the Select-String cmdlet and, just like grep, it directly accepts a filename argument (PSv3+ syntax):
Select-String -NotMatch <pattern> hosts | ForEach-Object Line | Set-Content hosts.new
Select-String -NotMatch <pattern> hosts is short for
Select-String -NotMatch -Pattern <pattern> -LiteralPath hosts and is the virtual equivalent of
grep -v <pattern> hosts
However, Select-String doesn't output strings, it outputs [Microsoft.PowerShell.Commands.MatchInfo] instances that wrap matching lines (stored in property .Line) along with metadata about the match.
ForEach-Object Line extracts just the matching lines (the value of property .Line) from these objects.
Set-Content hosts.new writes the matching lines to file hosts.new, using "ANSI" encoding in Windows PowerShell - i.e., it uses the legacy code page implied by the active system locale, typically a supranational 8-bit superset of ASCII - and UTF-8 encoding (without BOM) in PowerShell Core.
Use the -Encoding parameter to specify a different encoding.
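For instance, to write the output file as UTF-8 instead (note that in Windows PowerShell, -Encoding UTF8 produces a BOM; in PowerShell Core it does not):
Select-String -NotMatch <pattern> hosts | ForEach-Object Line |
  Set-Content hosts.new -Encoding UTF8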
>, by contrast (an effective alias of the Out-File cmdlet), creates:
UTF16-LE ("Unicode") files by default in Windows PowerShell.
UTF-8 files (without BOM) in PowerShell Core - in other words: in PowerShell Core, using
> hosts.new in lieu of | Set-Content hosts.new will do.
Note: While both > / Out-File and Set-Content are suitable for sending string inputs to an output file, they are not generally suitable for sending other data types to a file for programmatic processing: > / Out-File output objects the way they would print to the console / terminal, which is a format intended for display, whereas Set-Content stringifies (simply put: calls .ToString() on) the input objects, which often results in loss of information.
For non-string data, consider a (more) structured data format such as XML (Export-CliXml), JSON (ConvertTo-Json) or CSV (Export-Csv).
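For example (hypothetical file names; a sketch of each approach):
# Export objects losslessly (within CLIXML's fidelity limits), then re-import them later:
Get-ChildItem | Export-Clixml .\files.xml
$files = Import-Clixml .\files.xml

# Or export selected properties as CSV for consumption by other tools:
Get-ChildItem | Select-Object Name, Length | Export-Csv .\files.csv -NoTypeInformation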

How to properly escape quotes in powershell v2?

How do you properly escape quotes in powershell v2 (called from within a batch file)?
I have tried:
powershell -Command "(gc file1.txt) -join "`n" | Out-File file2.txt"
and
powershell -Command "(gc file1.txt) -join ""`n"" | Out-File file2.txt"
and
powershell -Command "(gc file1.txt) -join '"`n`" | Out-File file2.txt"
but they all fail.
Editor's note: The purpose of the command is to transform Windows CRLF line breaks to Unix LF-only ones, so as to create a file that will be processed on Linux.
From a batch file (cmd.exe), you must \-escape embedded " instances
(even though PowerShell-internally it is ` (the backtick) that serves as the escape character):
As wOxxOm points out in a comment on the question, in Windows PowerShell using """ to embed a single " is also an option.
However, given that most command-line utilities support \", \" is easier to remember. Note, though, that both """ and \" can break on the cmd.exe side, in which case "^"" (sic) is required ("" in PowerShell (Core) 7+).
powershell -Command "(gc file1.txt) -join \"`n\" | Set-Content -NoNewLine file2.txt"
Note:
Set-Content -NoNewline requires PSv5+.
Set-Content writes "ANSI"-encoded files by default (e.g., based on code page Windows-1252 on US-English systems); use the -Encoding parameter to change that.
Since you're only dealing with strings, Set-Content is preferable to Out-File, which is only needed if you have non-string objects that must have PowerShell's default formatting applied to them first.
Consider using powershell -NoProfile ... to suppress loading of PowerShell's profile files, both for faster execution and for a more predictable execution environment.
PSv2 solution:
Unfortunately, prior to PSv5, only the Write-Host cmdlet supports the -NoNewline parameter (introduced in v2), which is of no help here, so the .NET framework must be used:
powershell -Command "[IO.File]::WriteAllText(\"$PWD/file2.txt\", ((gc $PWD/file1.txt) -join \"`n\") + \"`n\")"
Note the need to use path prefix $PWD explicitly, because the .NET Framework's current directory typically differs from PowerShell's.
Also, the output file's encoding will be UTF-8, without a BOM, but you can pass a different encoding as the 3rd argument to [IO.File]::WriteAllText(), such as [System.Text.Encoding]::Default to match Set-Content's default behavior (as of Windows PowerShell v5.1).
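For instance, appending [Text.Encoding]::Default as the 3rd argument makes the output match Set-Content's Windows PowerShell default ("ANSI"):
powershell -Command "[IO.File]::WriteAllText(\"$PWD/file2.txt\", ((gc $PWD/file1.txt) -join \"`n\") + \"`n\", [Text.Encoding]::Default)"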
Optional reading: platform-specific line breaks "`n" vs. "`r`n" vs. [Environment]::Newline
Implicit choice of newlines (line breaks):
when reading, PowerShell accepts LF-only (Unix-style) and CRLF (Windows-style) newlines (line breaks) interchangeably.
when writing (e.g., when sending an array of lines / objects to a file with > / Out-File / Set-Content), PowerShell uses the platform-appropriate newline sequence.
Note, however, that any newline sequences embedded in a given string input object are sent to the file as-is.
As for escape sequences / constants:
[Environment]::Newline contains the platform-appropriate newline.
"`n" is always just LF (\n)
as evidenced by "`n".Length returning 1 and [char] 10 -eq [char] "`n" returning $True (10 is the decimal Unicode/ASCII code point of LF).
The documentation isn't explicit about this: Get-Help about_Special_Characters mentions "new line" and "line break", without mentioning what specific character [sequence] that represents. (As mentioned, LF by itself is just as valid a line break as CRLF is in PowerShell).
Therefore, to create CRLF sequences, you must use "`r`n".
If you need to match newlines in either format, you can use regex '\r?\n', with operators such as -match, -replace, and -split
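For example ($text being a hypothetical input string):
# Split a string into lines regardless of CRLF vs. LF line endings:
"line1`r`nline2`nline3" -split '\r?\n'    # -> line1, line2, line3
# Normalize all newlines in $text to LF-only:
$text = $text -replace '\r?\n', "`n"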
As for multi-line string literals in scripts (including here-documents):
They reflect their source file's newline style. That is, if your script uses CRLF newlines, so will the newlines embedded in the multi-line string.
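One way to check which style ended up in a given literal (assuming the snippet lives in a saved script file):
# In a script saved with CRLF newlines, the embedded newline is CRLF:
$s = @"
line1
line2
"@
$s -match "`r`n"   # -> True (False if the script file uses LF-only newlines)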
Here's one way to do it from the PowerShell command line:
(Get-Content Input.txt -Raw) -replace "`r`n","`n" | Out-File Output.txt -Encoding ASCII -NoNewline
As others have noted this is a PowerShell v5 solution (-Raw appeared in v3, and -NoNewline appeared in v5).
Here's a PowerShell v2 version of the same thing:
$content = [IO.File]::ReadAllText("C:\Path\Input.txt") -replace "`r`n","`n"
[IO.File]::WriteAllText("C:\Path\Output.txt", $content)
(The paths are needed because the .NET methods don't use PowerShell's "current location".)

How do I concatenate two text files in PowerShell?

I am trying to replicate the functionality of the cat command in Unix.
I would like to avoid solutions where I explicitly read both files into variables, concatenate the variables together, and then write out the concatenated variable.
Simply use the Get-Content and Set-Content cmdlets:
Get-Content inputFile1.txt, inputFile2.txt | Set-Content joinedFile.txt
You can concatenate more than two files with this style, too.
If the source files are named similarly, you can use wildcards:
Get-Content inputFile*.txt | Set-Content joinedFile.txt
Note 1: PowerShell 5 and older versions allowed this to be done more concisely using the aliases cat and sc for Get-Content and Set-Content respectively. However, these aliases are problematic because cat is a system command in *nix systems, and sc is a system command in Windows systems - therefore using them is not recommended, and in fact sc is no longer even defined as of PowerShell Core (v7). The PowerShell team recommends against using aliases in general.
Note 2: Be careful with wildcards - if you try to output to inputFiles.txt (or similar that matches the pattern), PowerShell will get into an infinite loop! (I just tested this.)
Note 3: Outputting to a file with > does not preserve character encoding! This is why using Set-Content is recommended.
Do not use >; it messes up the character encoding. Use:
Get-Content files.* | Set-Content newfile.file
In cmd, you can do this:
copy one.txt+two.txt+three.txt four.txt
In PowerShell this would be:
cmd /c copy one.txt+two.txt+three.txt four.txt
While the PowerShell way would be to use gc, the above will be pretty fast, especially for large files. And it can be used on non-ASCII files too, using the /B switch, as shown below.
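For instance, a byte-for-byte (binary) concatenation would be:
cmd /c copy /B one.txt+two.txt+three.txt four.txt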
You could use the Add-Content cmdlet. Maybe it is a little faster than the other solutions, because I don't retrieve the content of the first file.
gc .\file2.txt| Add-Content -Path .\file1.txt
To concat files in command prompt it would be
type file1.txt file2.txt file3.txt > files.txt
In PowerShell, type is an alias for Get-Content, which means you will get an error when using the type command this way, because Get-Content requires a comma separating the files. The same command in PowerShell would be
Get-Content file1.txt,file2.txt,file3.txt | Set-Content files.txt
I used:
Get-Content c:\FileToAppend_*.log | Out-File -FilePath C:\DestinationFile.log -Encoding ASCII -Append
This appended fine. I added the ASCII encoding to remove the nul characters Notepad++ was showing without the explicit encoding.
If you need to order the files by specific parameter (e.g. date time):
gci *.log | sort LastWriteTime | % {$(Get-Content $_)} | Set-Content result.log
You can do something like:
get-content input_file1 > output_file
get-content input_file2 >> output_file
Where > is effectively an alias for Out-File, and >> for Out-File -Append.
Since most of the other replies often get the formatting wrong (due to the piping), the safest thing to do is as follows:
add-content $YourMasterFile -value (get-content $SomeAdditionalFile)
I know you wanted to avoid reading the content of $SomeAdditionalFile into a variable, but in order to preserve, for example, your newline formatting, I do not think there is a proper way to do it without.
A workaround would be to loop through $SomeAdditionalFile line by line, piping each line into $YourMasterFile, as sketched below. However, this is overly resource-intensive.
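Such a line-by-line approach might look like this (a sketch; slow, because the target file is opened once per line):
Get-Content $SomeAdditionalFile | ForEach-Object {
  Add-Content -Path $YourMasterFile -Value $_
}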
To keep encoding and line endings:
Get-Content files.* -Raw | Set-Content newfile.file -NoNewline
Note: AFAIR, these parameters aren't supported by older PowerShell versions (-Raw requires v3+, -NoNewline requires v5+).
I think the "powershell way" could be :
set-content destination.log -value (get-content c:\FileToAppend_*.log )