How can I filter filename patterns in PowerShell? - powershell

I need to do something similar to Unix's ls | grep 'my[rR]egexp?' in PowerShell. The similar expression ls | Select-String -Pattern 'my[rR]egexp?' seems to search the contents of the listed files, rather than simply filtering the filenames themselves.
The Select-String documentation hasn't been of much help either.

Very simple:
ls | where Name -match 'myregex'
There are other options, though:
(ls) -match 'myregex'
Or, depending on how complex your regex is, you could maybe also solve it with a simple wildcard match:
ls wild[ck]ard*.txt
which is faster than the options above. And if you can express the filter as a wildcard without character classes, you can also use the -Filter parameter of Get-ChildItem (ls), which filters at the file system level and is therefore even faster. Note also that PowerShell is case-insensitive by default, so a character class like [rR] is unnecessary.
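A minimal sketch comparing the three approaches above against a throwaway directory. The directory and file names are made up for illustration, and the variable names ($byRegex, $byWildcard, $byFilter) are not from the answer.

```powershell
# Create a scratch directory with a few sample files (hypothetical names).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
'myregex.txt', 'myRegexp.log', 'other.txt' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir $_) | Out-Null
}

# 1. Regex match on the Name property (-match is case-insensitive by default).
$byRegex = Get-ChildItem -Path $dir | Where-Object { $_.Name -match 'myregexp?' }

# 2. Wildcard match, evaluated by PowerShell.
$byWildcard = Get-ChildItem -Path (Join-Path $dir 'my*')

# 3. -Filter, handed down to the file system provider.
$byFilter = Get-ChildItem -Path $dir -Filter 'my*'

# Show the names found by the regex variant.
$byRegex.Name

# Clean up the scratch directory.
Remove-Item -Path $dir -Recurse
```

All three variants should return the same two files here; which one to prefer depends on how much of the pattern can be expressed as a wildcard.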

While researching based on @Joey's answer, I stumbled upon another way to achieve the same (based on Select-String itself):
ls -Name | Select-String -Pattern 'my[Rr]egexp?'
The -Name argument seems to make ls return each result as a plain string rather than a FileInfo object, so Select-String treats it as the string to be searched rather than as a file whose contents should be searched.
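To illustrate the point about strings, here is a small sketch in which a hard-coded list of names stands in for the output of ls -Name. The names are hypothetical.

```powershell
# Strings piped to Select-String are searched as text, not opened as files.
# This array is a stand-in for what `ls -Name` would emit.
$names = 'myregex.txt', 'myRegexp.log', 'other.txt'

$hits = $names | Select-String -Pattern 'my[Rr]egexp?'

# Each hit is a MatchInfo object; its Line property holds the matching string.
$hits | ForEach-Object { $_.Line }
```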

Related

Equivalent of `ls -t | head`

Intro
On Linux, I'll often use something like this to see the recently changed files in a directory with many files:
ls -t | head
I can do the following in PowerShell which is similar:
Get-ChildItem | Sort-Object -Property LastWriteTime | Select-Object -Last 15
That's a bit long so I then have the following in $PROFILE:
function Recent ()
{
Get-ChildItem | Sort-Object -Property LastWriteTime | Select-Object -Last 15
}
And maybe also:
Set-Alias lst Recent
or
Set-Alias ls-t Recent
as a shorter variant.
Question
Is there a built-in way to list the recently changed files that's more concise than the approach I've shown above?
Is there some other best practice that y'all would recommend?
As already mentioned in the comments, you can go from:
Get-ChildItem | Sort-Object -Property LastWriteTime | Select-Object -Last 15
to
gci | Sort-Object LastWriteTime | Select -l 15
What is at play?
gci is an alias for Get-ChildItem. To view all aliases available, you can type Get-Alias in your current session.
Sort-Object LastWriteTime makes use of positional arguments. When an unnamed argument is given to a PowerShell cmdlet, it is mapped to the first positional parameter.
Select -l 15: -l stands for -Last. This works because PowerShell accepts any unambiguous prefix of a parameter name and maps it to the matching parameter. Of all the parameters of the Select-Object cmdlet, only -Last can match, since no other parameter of that cmdlet starts with the letter L. Note that in this case, l is not defined as an alias for Last; it is PowerShell parameter disambiguation.
Best practices
What you do in your session stays in your session.
You can use aliases and parameter disambiguation as much as you please.
That being said, when developing a script or a module, you should avoid aliases, disambiguated parameters and positional parameters altogether.
Some problems that might occur:
Parameter disambiguation might fail if the cmdlet introduces another parameter that could also be a match. For instance, Get-Service -inputObject something works well, but Get-Service -in "test" will fail because it is ambiguous: -in can match -inputObject but also -include. And while Get-Service -inp "test" would work, it is not very readable compared to simply using the full parameter name.
Aliases might not be available cross-platform. For instance, while sort works as an alias for Sort-Object on Windows, it does not on Linux (where it is a native command). This kind of difference might produce unexpected results and break your script depending on context. Also, some aliases might be dropped in the future, and they make the script less readable.
Finally, positional parameters should also be avoided in scripts and modules.
Using named parameters will make your scripts clearer and more readable for everyone.
To summarize, while working in a session, you can use aliases, parameter disambiguation and positional parameters as you please, but when working on scripts or modules, they should be avoided.
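As a sketch of what the script-friendly form could look like, the function below uses full cmdlet and parameter names throughout. The function name Get-RecentItem and its -Path and -Count parameters are hypothetical, and unlike the one-liner above it sorts newest first and takes the top entries, which is closer to ls -t | head.

```powershell
function Get-RecentItem {
    [CmdletBinding()]
    param(
        # Directory to list; defaults to the current location.
        [string]$Path = '.',
        # How many of the most recently written items to return.
        [int]$Count = 10
    )

    # No aliases, no abbreviated parameters, no positional arguments.
    Get-ChildItem -Path $Path |
        Sort-Object -Property LastWriteTime -Descending |
        Select-Object -First $Count
}
```

Interactively this could still be called through a short alias, for example Set-Alias -Name lst -Value Get-RecentItem in $PROFILE.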
References
Select-Object
Select-Object
[-InputObject <PSObject>]
[[-Property] <Object[]>]
[-ExcludeProperty <String[]>]
[-ExpandProperty <String>]
[-Unique]
[-Last <Int32>]
[-First <Int32>]
[-Skip <Int32>]
[-Wait]
[<CommonParameters>]
Types of Cmdlet Parameters
A positional parameter requires only that you type the arguments in
relative order. The system then maps the first unnamed argument to the
first positional parameter. The system maps the second unnamed
argument to the second unnamed parameter, and so on. By default, all
cmdlet parameters are named parameters.
Powershell Parameter Disambiguation and a surprise
For instance, instead of saying Get-ChildItem -Recurse, you can say
Get-ChildItem -R. Get-ChildItem only has one (non-dynamic) parameter
that started with the letter ‘R’.. Since only one parameter matches,
PowerShell figures you must mean that one. As a side note, dynamic
parameters like -ReadOnly are created at run-time and are treated a
bit differently.
I would do:
ls | sort lastw*
or
ls | sort lastw <#press tab#>
The most recent ones appear at the bottom anyway.

Powershell String search

I am trying to search for a keyword/pattern in a file whose lines start with a date.
A line looks like this:
11/02/15 02:28:49%%PROGRAM$$SUCCESS$$End.
So I tried the command below,
Select-String -Path C:\Path\To\File.txt -Pattern $(Get-Date -format d) | Select-String -Pattern SUCCESS
so that I get the lines that contain SUCCESS and start with the current date.
It works on my test box, but when I tried the same on a big file (~200 MB), it gives no results. I also tried the following:
Get-Content -Path C:\Path\To\File.txt | Select-String -Pattern $(Get-Date -format d) | Select-String -Pattern SUCCESS
Any help would be greatly appreciated!
Some things to consider here. As PetSerAl brings to light, Get-Date -Format d depends on the culture, so you need to be careful about relying on the output of that.
If the files you're searching are generated using Get-Date -Format d then it makes sense to do the search that way as long as the files will always be searched on a machine with the same culture they were generated with.
By the way on my machine it's 11/2/2015 not 11/02/15 and I am in the US.
Also, when you use Select-String -Pattern it's a regular expression, so you need to make sure that there are no special characters in the string. In the case of PetSerAl's date, the dots . would be interpreted as special characters. To avoid that use [RegEx]::Escape().
Select-String returns a match object (or objects), so piping it directly into another Select-String may not work. Consider making a single pattern out of it.
Just a guess here, but it kind of seems like the pattern you want is to match the current date string at the beginning of the line and then find SUCCESS anywhere after that in the line.
I think for that you could use a pattern like this: 11/02/15.+?SUCCESS
So code like this:
Get-Content -Path C:\Path\To\File.txt | Select-String -Pattern "$([RegEx]::Escape((Get-Date -Format d))).+?SUCCESS"
Would do the trick I think, again assuming culture issues don't mess you up.
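One way to sidestep the culture issue is to format the date explicitly instead of relying on Get-Date -Format d. The sketch below assumes the log uses MM/dd/yy, as in the sample line from the question; the in-memory $lines array is a stand-in for the real file, and the ^ anchor is added here to pin the date to the start of the line.

```powershell
# Format today's date with the invariant culture so '/' stays a literal slash.
$today = (Get-Date).ToString('MM/dd/yy', [cultureinfo]::InvariantCulture)

# Illustrative log lines (single quotes keep $$ from being expanded).
$lines = @(
    ($today + ' 02:28:49%%PROGRAM$$SUCCESS$$End.'),
    ($today + ' 02:30:00%%PROGRAM$$FAILURE$$End.'),
    '11/02/15 02:28:49%%PROGRAM$$SUCCESS$$End.'
)

# Escape the date, anchor it to the start of the line, then look for SUCCESS.
$pattern = '^' + [regex]::Escape($today) + '.+?SUCCESS'
$hits = $lines | Select-String -Pattern $pattern

$hits | ForEach-Object { $_.Line }
```

For the real file, the same $pattern could be passed to Select-String -Path directly, which avoids piping every line through Get-Content.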

In function repeat an action for each entered parameter

My main script runs gci once on a drive specified via the -path parameter, then builds several different tables from that output. Below is the part of my script that builds a specific table for a directory specified via the -folder parameter, for example:
my-globalfunction -path d:\ -folder d:\folder
It works fine, but only for a single folder path. The goal is for the user to be able to enter multiple folder paths and get a table for each -folder value, like this:
This clause in your Where-Object would be the issue:
$_.FullName.StartsWith($folder, [System.StringComparison]::OrdinalIgnoreCase)
The array of folders passed are most likely being cast as one long string which would never match. I had a regex solution posted but remembered a simpler way after looking at what your logic was trying to do.
Simpler Way
An even easier way is to pass this information straight to Get-ChildItem, since it accepts string arrays for -Path. That way I don't think you even need two parameters, since you never use the results from $fol again anyway. This assumes you are looking for all subfolders of $folder:
$gdfolders = Get-ChildItem -Path $folder -Recurse -Force | Where-Object{$_.psiscontainer}
That returns all subfolders of the paths provided. With PowerShell 3.0 or higher this is even easier:
$gdfolders = Get-ChildItem -Path $folder -Recurse -Force -Directory
Update from comments
The code you have shown is incomplete, which is what led me to the solution above. If you do use the variable $fol somewhere else that you do not show, let's go back to my earlier regex solution, which fits better with what you already have.
$regex = "^($(($folder | ForEach-Object{[regex]::Escape($_)}) -join "|")).+"
....
$gdfolders = $fol | Where-Object{($_.Attributes -eq "Directory") -and ($_.FullName -match $regex)}
This builds a regex that, as I assume your logic intends, locates folders whose paths begin with any of the paths passed.
Using your example input of "d:\folder1", "d:\folder2" the variable $regex would work out to ^(d:\\folder1|d:\\folder2).+. Special characters, like \, are escaped automatically by the static method [regex]::Escape, which is applied to each element. We then use -join to place a pipe between the elements, which in this regex group means match what's on the left OR on the right. The caret ^ states that the match has to occur at the beginning of the path, so it matches paths that start with either "d:\folder1" or "d:\folder2". At the end of the regex string we have .+, which means match one or more characters. This should ensure we don't match the actual folder "d:\folder1" but merely its children.
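The construction can be checked in isolation. In this sketch the regex is built with string concatenation rather than a subexpression inside a double-quoted string, which should be equivalent; the $paths list is made up to show which inputs match.

```powershell
# Example input from the answer.
$folder = 'd:\folder1', 'd:\folder2'

# Escape each path, join the alternatives with |, anchor at the start,
# and require at least one more character after the folder itself.
$escaped = $folder | ForEach-Object { [regex]::Escape($_) }
$regex = '^(' + ($escaped -join '|') + ').+'
$regex    # ^(d:\\folder1|d:\\folder2).+

# Hypothetical full names to test against (-match is case-insensitive).
$paths = 'd:\folder1', 'd:\folder1\sub', 'D:\Folder2\a\b', 'd:\other\sub'
$matched = $paths | Where-Object { $_ -match $regex }
$matched
```

Only 'd:\folder1\sub' and 'D:\Folder2\a\b' pass: the bare 'd:\folder1' is rejected by the trailing .+, and 'd:\other\sub' does not start with either folder.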
Side Note
The quotes in the line with ’Size (MB)’ are typographic quotes (’) rather than the plain single quote ('). If you have issues around that code, consider changing the quotes.

Select-String pattern not matching

I have the text of a couple hundred Word documents saved into individual .txt files in a folder. I am having an issue where a MergeField in the Word document wasn't formatted correctly, and now I need to find all the instances in the folder where the incorrect formatting occurs. The incorrect formatting is the string \#,$##,##0.00\* So, I'm trying to use PowerShell as follows:
select-string -path MY_PATH\.*txt -pattern '\#,$##,##0.00\*'
select-string -path MY_PATH\.*txt -pattern "\#`,`$##`,##0.00\*"
But neither of those commands finds any results, even though I'm sure the string exists in at least one file. I feel like the error is occurring because there are special characters in the parameter (specifically $ and ,) that I'm not escaping correctly, but I'm not sure how else to format the pattern. Any suggestions?
If you are actually looking for \#,$##,##0.00\* then be aware that Select-String uses regex and your string contains a lot of regex metacharacters. Your pattern should be
\\\#,\$\#\#,\#\#0\.00\\\*
Or you can use the static method Escape of regex to do the dirty work for you. Single quotes keep PowerShell from trying to expand the $ in the string.
[regex]::Escape('\#,$##,##0.00\*')
To put this all together, with the path wildcard written as *.txt, you would get the following:
select-string -path MY_PATH\*.txt -pattern ([regex]::Escape('\#,$##,##0.00\*'))
Or even simpler would be to use the -SimpleMatch parameter, since it does not interpret the string as a regex and just searches for it as is.
select-string -path MY_PATH\*.txt -SimpleMatch '\#,$##,##0.00\*'
My try, similar to Matt's:
select-string -path .\*.txt -pattern '\\#,\$##,##0\.00\\\*'
result:
test.txt:1:\#,$##,##0.00\*
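The difference between the raw pattern, the escaped pattern and -SimpleMatch can be seen without any files. In this sketch the two sample lines are invented stand-ins for the exported Word text.

```powershell
# The literal text to find; single quotes prevent any variable expansion.
$needle = '\#,$##,##0.00\*'

# Invented sample lines: the first contains the bad format string, the second does not.
$lines = 'MERGEFIELD Amount \#,$##,##0.00\*', 'MERGEFIELD Amount \# 0.00'

# Used as a raw regex, the $ acts as an end-of-line anchor, so nothing matches.
$raw = $lines | Select-String -Pattern $needle

# Escaping every metacharacter turns the text into a literal regex.
$viaRegex = $lines | Select-String -Pattern ([regex]::Escape($needle))

# -SimpleMatch skips regex interpretation altogether.
$viaSimple = $lines | Select-String -SimpleMatch -Pattern $needle

$viaRegex | ForEach-Object { $_.Line }
```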

piping get-childitem into select-string in powershell

I am sorting a large directory of files and I am trying to select individual lines from the output of an ls command and show those only, but I get weird results and I am not familiar enough with powershell to know what I'm doing wrong.
this approach works:
ls > data.txt
select-string 2012 data.txt
rm data.txt
but it seems wasteful to me to create a file just to read the data that I already have to fill into the file. I want to pipe the output directly to select-string.
I have tried this approach:
ls | select-string 2012
but that does not give me the appropriate output.
My guess is that I need to convert the output from ls into something select-string can work with, but I have no idea how to do that, or even whether that is actually the correct approach.
PowerShell is object-oriented, not pure text like cmd. If you want the file objects (the lines of the listing) that were modified in 2012, use:
Get-ChildItem | Where-Object { $_.LastWriteTime.Year -eq 2012 }
If you want to get fileobjects with "2012" in the filename, try:
Get-ChildItem *2012*
When you use
ls | select-string 2012
you're actually searching for lines with "2012" INSIDE every file that ls / get-childitem listed.
If you really need to use select-string on the output from get-childitem, convert it to a string, split it into lines, and then search those lines. Like this:
(Get-ChildItem | Out-String) -split "`n" | Select-String 2012
I found another simple way to convert objects to strings:
Get-ChildItem | Out-String -stream | Select-String 2012
in this very interesting article:
http://blogs.msdn.com/b/powershell/archive/2006/04/25/how-does-select-string-work-with-pipelines-of-objects.aspx
If you wanted Select-String to work on the Monad formatted output, you'll need to get that as a string. Here is the thing to grok about
our outputing. When your command sequence emits a stream of strings,
we emit it without processing. If instead, your command sequence
emits a stream of objects, then we redirect those objects to the
command Out-Default. Out-Default looks at the type of the object and
the registered formating metadata to see if there is a default view
for that object type. A view defines a FORMATTER and the metadata for
that command. Most objects get vectored to either Format-Table or
Format-List (though they could go to Format-Wide or Format-Custom).
THESE FORMATTERS DO NOT EMIT STRINGS! You can see this for yourself
by the following: [...] These formating records are then vectored to an
OUT-xxx command to be rendered into the appropriate data for a
particular output device. By default, they go to Out-Host but you can
pipe this to Out-File, Out-Printer or Out-String. (NOTE: these
OUT-xxx commands are pretty clever, if you pipe formating objects to
them, they'll render them. If you pipe raw object to them, they'll
first call the appropriate formatter and then render them.)
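The rendering step described in the quoted article can be tried on a scratch directory. This is only a sketch: the directory and file names are hypothetical, and the pattern is made more specific than a bare 2012 so that it cannot accidentally hit the randomly generated directory name in the listing header.

```powershell
# Scratch directory with two sample files (hypothetical names).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
New-Item -ItemType File -Path (Join-Path $dir 'report-2012.txt') | Out-Null
New-Item -ItemType File -Path (Join-Path $dir 'notes.txt') | Out-Null

# Out-String -Stream runs the default formatter and emits the rendered listing
# one line at a time, so Select-String searches the text you would see on screen.
$hits = Get-ChildItem -Path $dir | Out-String -Stream | Select-String -Pattern 'report-2012'

$hits | ForEach-Object { $_.Line }

# Clean up the scratch directory.
Remove-Item -Path $dir -Recurse
```

Because the search runs over formatted text, it can match any column of the listing, including dates; filtering on object properties with Where-Object is usually more precise.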