powershell filter filenames with regex - powershell

I am building a list of files that I'm putting into my $list variable.
Then I want to filter the list based on the $filter variable. The current solution works, but it doesn't work with a regex.
$filter = @("test.txt","Fake","AnotherFile\d{1..6}")
######### HTML TESTS #############
[string]$list = @"
FakeFile.txt
test120119.txt
AnotherFile120119.txt
LastFile.txt
"@
[array]$files = $list -split '\r?\n'
$files = $files | Where-Object {$_} | Where {$_ -notin $filter} # filter out empty items from the array...
$files
My idea is to put regex patterns in the $filter variable so I can catch filenames that have datestamps in them such as test120119.txt in the $list variable above.
How can I change my code to allow for regex? I tried some variations of select-string without splitting my $list, but was not fruitful. I also tried changing my -notin to -notmatch but this doesn't work at all of course.

If you want to use regex, I think it would be easier to just fully commit to regex with your $filter array.
$filter = "^test\d{0,6}\.txt","^Fake","^AnotherFile\d{0,6}\.txt" -join '|'
$list = @"
FakeFile.txt
test120119.txt
AnotherFile120119.txt
LastFile.txt
"@
$files = $list -split '\r?\n'
$files | Where {$_ -notmatch $filter}
Keep in mind that special regex characters must be escaped if you want them treated literally. The [regex]::Escape() method can do this for you, but do not apply it to items where you deliberately included regex syntax, because it would escape that too.
Once you have your list of regex items, you can join them with the regex alternation (OR) character, |.
Not all operators interpret their right-hand operand as a regex. -match and -notmatch are among the few that do. -match and -notmatch are case-insensitive by default. For a case-sensitive match, use the -c variants of the operators, namely -cmatch and -cnotmatch.
The regex items can be tweaked to your liking. More requirements would need to be given in order to come up with an exact solution. Here are some examples to consider:
\d{0,6} matches 0 to 6 consecutive digits. 122619 will match successfully, but so will 1226. If you want only 0 or 6 digits to match, you can use (\d{6})?.
^ anchors the match at the beginning of the input string. Because | has the lowest precedence in a regex, a single leading ^ applies only to the first alternative. So if you want every alternative to match from the beginning of the string, either include ^ in each item or group the items after one initial ^ with (). ^item1|^item2 will return the same capture group 0 match as ^(item1|item2).
\. matches a literal . character; an unescaped . matches any character.
Omitting anchor characters like ^ and $ gives a lot of flexibility but also potentially unwanted results. 'FakeFile' -match 'Fake' returns true but so does 'MyFakeFile' -match 'Fake'. However, 'MyFakeFile' -match 'Fake$' returns false and 'MyFake' -match 'Fake$' returns true.
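To illustrate the escaping and anchoring points, here is a minimal sketch that builds one anchored filter from a mix of literal names and hand-written regex fragments. The variable names $literalNames and $regexParts are made up for this example, and (\d{6})? is just one possible choice for the date stamp.

```powershell
# Hypothetical inputs: names to match literally, plus a hand-written regex fragment.
$literalNames = 'test.txt', 'Fake'
$regexParts   = 'AnotherFile(\d{6})?\.txt'

# Escape only the literal names, then anchor all alternatives with one ^( ... ).
$alternatives = @($literalNames | ForEach-Object { [regex]::Escape($_) }) + $regexParts
$filter = '^(' + ($alternatives -join '|') + ')'

$files = 'FakeFile.txt', 'test120119.txt', 'AnotherFile120119.txt', 'LastFile.txt'
$kept  = @($files | Where-Object { $_ -notmatch $filter })
$kept
```

Because 'test.txt' was escaped and therefore matched literally, test120119.txt is not filtered out here; only FakeFile.txt and AnotherFile120119.txt are removed. To catch the date-stamped name as well, that item would have to be written as a regex fragment instead of being escaped.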

Related

Sort-Object -Unique

I'm making a script that collects all the subkeys from a specific location and converts the REG_BINARY keys to text, but for some reason I can't remove the duplicate results or sort them alphabetically.
PS: Unfortunately I need the solution to be executable from the command line.
Code:
$List = ForEach ($i In (Get-ChildItem -Path 'HKCU:SOFTWARE\000' -Recurse)) {$i.Property | ForEach-Object {([System.Text.Encoding]::Unicode.GetString($i.GetValue($_)))} | Select-String -Pattern ':'}; ForEach ($i In [char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ') {$List = $($List -Replace("$i`:", "`n$i`:")).Trim()}; $List | Sort-Object -Unique
Test.reg:
Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\SOFTWARE\000\Test1]
"HistorySZ1"="Test1"
"HistoryBIN1"=hex:43,00,3a,00,5c,00,54,00,65,00,73,00,74,00,5c,00,44,00,2e,00,\
7a,00,69,00,70,00,5c,00,00,00,43,00,3a,00,5c,00,54,00,65,00,73,00,74,00,5c,\
00,43,00,2e,00,7a,00,69,00,70,00,5c,00,00,00,43,00,3a,00,5c,00,54,00,65,00,\
73,00,74,00,5c,00,42,00,2e,00,7a,00,69,00,70,00,5c,00,00,00,43,00,3a,00,5c,\
00,54,00,65,00,73,00,74,00,5c,00,41,00,2e,00,7a,00,69,00,70,00,5c,00,00,00
[HKEY_CURRENT_USER\SOFTWARE\000\Test2]
"HistorySZ2"="Test2"
"HistoryBIN2"=hex:4f,00,3a,00,5c,00,54,00,65,00,73,00,74,00,5c,00,44,00,2e,00,\
7a,00,69,00,70,00,5c,00,00,00,43,00,3a,00,5c,00,54,00,65,00,73,00,74,00,5c,\
00,43,00,2e,00,7a,00,69,00,70,00,5c,00,00,00,44,00,3a,00,5c,00,54,00,65,00,\
73,00,74,00,5c,00,42,00,2e,00,7a,00,69,00,70,00,5c,00,00,00,41,00,3a,00,5c,\
00,54,00,65,00,73,00,74,00,5c,00,41,00,2e,00,7a,00,69,00,70,00,5c,00,00,00
The path strings that are encoded in your array of bytes are separated with NUL characters (code point 0x0).
Therefore, you need to split your string by this character into an array of individual paths, on which you can then perform operations such as Sort-Object:
You can represent a NUL character as "`0" in an expandable PowerShell string, or - inside a regex to pass to the -split operator - \0:
# Convert the byte array stored in the registry to a string.
$text = [System.Text.Encoding]::Unicode.GetString($i.GetValue($_))
# Split the string into an *array* of strings by NUL.
# Note: -ne '' filters out empty elements (the one at the end, in your case).
$list = $text -split '\0' -ne ''
# Sort the list.
$list | Sort-Object -Unique
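For a version that can be tried without touching the registry, here is a small sketch under the assumption that the value holds NUL-terminated UTF-16LE paths, as in the Test.reg sample. The $sample string is made-up test data, not something read from HKCU.

```powershell
# Made-up sample: three NUL-terminated paths, one of them a duplicate.
$sample = "C:\Test\D.zip\`0C:\Test\A.zip\`0C:\Test\D.zip\`0"
$bytes  = [System.Text.Encoding]::Unicode.GetBytes($sample)

# Decode the bytes, split on NUL, drop the empty trailing element.
$text  = [System.Text.Encoding]::Unicode.GetString($bytes)
$paths = $text -split '\0' -ne ''

# Now that each path is its own array element, sorting and de-duplicating works.
$sorted = @($paths | Sort-Object -Unique)
$sorted
```

With this sample the result is two entries, the A.zip path followed by the D.zip path.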
After many attempts I discovered that it is necessary to use the -Split operator to actually break the string into separate lines, so that the result can be sorted:
{$List = ($List -Replace("$i`:", "`n$i`:")) -Split("`n")}

Powershell text search - multiple matches

I have a group of .txt files that contain one or two of the following strings.
"red", "blue", "green", "orange", "purple", .... many more (50+) possibilities in the list.
If it helps, I can tell if the .txt file contains one or two items, but don't know which one/ones they are. The string patterns are always on their own line.
I'd like the script to tell me specifically which one or two string matches (from the master list) it found, and the order in which it found them. (Which one was first)
Since I have a lot of text files to search, I'd like to write the output results to a CSV file as I search.
FILENAME1,first_match,second_match
file1.txt,blue,red
file2.txt,red, blue
file3.txt,orange,
file4.txt,purple,red
file5.txt,purple,
...
I've tried using many individual Select-Strings returning Boolean results to set variables with any matches found, but with the number of possible strings it gets ugly real fast. My search results for this issue have provided me with no new ideas to try. (I'm sure I'm not asking in the correct way.)
Do I need to loop through each line of text in each file?
Am I stuck with the process of elimination method by checking for the existence of each search string?
I'm looking for a more elegant approach to this problem. (if one exists)
Not very intuitive, but elegant...
The following switch statement
$regex = "(purple|blue|red)"
Get-ChildItem $env:TEMP\test\*.txt | Foreach-Object{
    $result = $_.FullName
    switch -Regex -File $_
    {
        $regex {$result = "$($result),$($matches[1])"}
    }
    $result
}
returns
C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file1.txt,blue,red
C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file2.txt,red,blue
where
file1 contains first blue, then red
file2 contains first red, then blue
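Since the question mentions 50+ possible strings that always sit on their own line, the pattern could also be built from a list and anchored. The following is only a sketch of that idea: $colors, $lines and the file name are placeholders, the lines are held in memory instead of being read with -File, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical master list; in practice this could hold 50+ entries.
$colors = 'red', 'blue', 'green', 'orange', 'purple'

# Anchored alternation, because each string is on its own line.
$regex = '^(' + (($colors | ForEach-Object { [regex]::Escape($_) }) -join '|') + ')$'

# Sample content standing in for the lines of one file.
$lines = 'header', 'blue', 'some text', 'red'

# switch -Regex walks the lines in order, so $found keeps the order of the matches.
$found = @()
switch -Regex ($lines) {
    $regex { $found += $matches[1] }
}

$row = [pscustomobject]@{
    FileName    = 'file1.txt'   # placeholder name
    FirstMatch  = $found[0]
    SecondMatch = $found[1]
}
$row
```

Rows built this way can be piped to Export-Csv -NoTypeInformation to get one line per file.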
You can use a regex search to get the index (start position within the line) of each match and combine it with the line number that Select-String returns; sorting on both gives you the order of the matches.
Select-String supports an array as value for -Pattern, but unfortunately it stops processing a line after the first pattern that matches, even when you use -AllMatches (bug?). Because of this we have to search once per word/pattern. Try:
#List of words. Had to escape them because Select-String doesn't return Matches-objects (with Index/location) for SimpleMatch
$words = "purple","blue","red" | ForEach-Object { [regex]::Escape($_) }
#Can also use a list with word/sentence per line using $words = Get-Content patterns.txt | % { [regex]::Escape($_.Trim()) }
#Get all files to search
Get-ChildItem -Filter "test.txt" -Recurse | Foreach-Object {
    #Has to loop words because Select-String -Pattern "blue","red" won't return match for both pattern. It stops on a line after first match
    foreach ($word in $words) {
        $_ | Select-String -Pattern $word |
        #Select the properties we care about
        Select-Object Path, Line, Pattern, LineNumber, @{n="Index";e={$_.Matches[0].Index}}
    }
} |
#Sort by File (to keep file-matches together), then LineNumber and Index to get the order of matches
Sort-Object Path, LineNumber, Index |
Export-Csv -NoTypeInformation -Path Results.csv -Encoding UTF8
Results.csv
"Path","Line","Pattern","LineNumber","Index"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","blue","3","10"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","red","3","15"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","red","4","10"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","blue","4","15"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","purple","6","10"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","red","6","17"
"C:\Users\frode\Downloads\test.txt","file5.txt,purple,","purple","7","10"

Iterate & Search for a string in child items

I am trying to build a PowerShell script that iterates through a list of files, searches for a match, and removes it. I'm not having much luck; here is my script:
$path = "D:\Test\"
$filter = "*.txt"
$files = Get-ChildItem -path $path -filter $filter
foreach ($item in $files)
{
$search = Get-content -path $path$item
$search| select-string -pattern "T|"
}
At the moment the script just returns the whole content of the file rather than only the lines matched by select-string.
Each file in the folder has a trailer record at the end, e.g. T|1410. I need to iterate through all the files and delete that last line. Some of these files will be 200 MB+. Can someone guide me, please?
I've edited my script and am now using the following method:
$path = "D:\Test\"
$filter = "*.txt"
$files = Get-ChildItem -path $path -filter $filter
foreach ($item in $files)
{
$search = Get-content $path$item
($search)| ForEach-Object { $_ -replace 'T\|[0-9]*', '' } | Set-Content $path$item
}
I am using PowerShell v2.
However, this adds a new empty line at the end of the file and also leaves the replaced line in place as an empty line. How can I avoid this, and how can I start the search from the bottom of the file?
-pattern "T|"
That pattern matches a "T" or nothing. But there is nothing between every pair of characters in any string. To avoid the usual regular expression handling of | as an alternates separator, use a backslash to match a literal |:
-pattern "T\|"
Alternately, use Select-String's -SimpleMatch switch to stop the argument to -Pattern being treated as a regular expression.
As Richard mentioned, you have to escape the | character.
You could also use the [regex]::Escape() method for that:
[regex]::Escape("T|")
Aside from escaping the characters, the other option you have available is the -SimpleMatch switch. From TechNet:
Uses a simple match rather than a regular expression match. In a simple match, Select-String searches the input for the text in the Pattern parameter. It does not interpret the value of the Pattern parameter as a regular expression statement.
If you don't want to have to worry about escaping the characters and are not using regex this would be the way to go.
$search | select-string -pattern "T|" -SimpleMatch
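To see the three variants side by side, here is a short sketch run against a few made-up lines (the $lines data is invented for the example):

```powershell
# Made-up sample lines; only the last one is a trailer record.
$lines = 'A|100', 'B|200', 'T|1410'

# Unescaped: "T|" means "T or the empty string", so every line matches.
$unescaped = @($lines | Select-String -Pattern 'T|')

# Escaped: only lines containing a literal "T|" match.
$escaped = @($lines | Select-String -Pattern ([regex]::Escape('T|')))

# -SimpleMatch: the pattern is treated as plain text, no escaping needed.
$simple = @($lines | Select-String -Pattern 'T|' -SimpleMatch)

"{0} {1} {2}" -f $unescaped.Count, $escaped.Count, $simple.Count
```

The unescaped pattern returns all three lines, which is why the original script appeared to print the whole file, while the other two return only the trailer line.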

powershell multiple block expressions

I am replacing multiple strings in a file. The following works, but is it the best way to do it? I'm not sure if doing multiple block expressions is a good way.
(Get-Content $tmpFile1) |
ForEach-Object {$_ -replace 'replaceMe1.*', 'replacedString1'} |
% {$_ -replace 'replaceMe2.*', 'replacedString2'} |
% {$_ -replace 'replaceMe3.*', 'replacedString3'} |
Out-File $tmpFile2
You don't really need a separate ForEach-Object for each replace operation. The -replace operators can be chained in a single command:
@(Get-Content $tmpFile1) -replace 'replaceMe1.*', 'replacedString1' -replace 'replaceMe2.*', 'replacedString2' -replace 'replaceMe3.*', 'replacedString3' |
Out-File $tmpFile2
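The chaining works because -replace, when its left-hand side is an array, is applied to every element and returns an array. A small sketch with in-memory lines instead of a file (the $lines content is invented for the example):

```powershell
# Sample lines standing in for the file content.
$lines = 'replaceMe1 abc', 'keep this', 'replaceMe3 xyz'

# Each -replace receives the array produced by the previous one.
$result = $lines -replace 'replaceMe1.*', 'replacedString1' `
                 -replace 'replaceMe2.*', 'replacedString2' `
                 -replace 'replaceMe3.*', 'replacedString3'
$result
```

Lines that match none of the patterns, such as 'keep this', pass through unchanged.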
I'm going to assume that your patterns and replacements don't really just have a digit on the end that is different, so you might want to execute different code based on which regex actually matched.
If so you can consider using a single regular expression but using a function instead of a replacement string. The only catch is you have to use the regex Replace method instead of the operator.
PS C:\temp> set-content -value @"
replaceMe1 something
replaceMe2 something else
replaceMe3 and another
"@ -path t.txt
PS C:\temp> Get-Content t.txt |
    ForEach-Object { ([regex]'replaceMe([1-3])(.*)').Replace($_,
        { Param($m)
            $head = switch($m.Groups[1]) { 1 {"First"}; 2 {"Second"}; 3 {"Third"} }
            $tail = $m.Groups[2]
            "Head: $head, Tail: $tail"
        })}
Head: First, Tail: something
Head: Second, Tail: something else
Head: Third, Tail: and another
This may be overly complex for what you need today, but it is worth remembering you have the option to use a function.
The -replace operator uses regular expressions, so you can merge your three script blocks into one like this:
Get-Content $tmpFile1 `
| ForEach-Object { $_ -replace 'replaceMe([1-3]).*', 'replacedString$1' } `
| Out-File $tmpFile2
That will search for the literal text 'replaceMe' followed by a '1', '2', or '3' and any remaining characters on the line (the '.*'), and replace the whole match with 'replacedString' followed by whichever digit was found (the '$1').
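One detail worth checking when using '$1': the replacement string should be in single quotes, otherwise PowerShell tries to expand $1 as a variable before the regex engine sees it. A quick sketch with made-up input lines:

```powershell
# Sample lines; the last one contains no match.
$lines = 'replaceMe1 first', 'replaceMe3 third', 'unrelated'

# Single quotes keep PowerShell from expanding $1 as a variable;
# the regex engine then substitutes the first capture group.
$replaced = $lines -replace 'replaceMe([1-3]).*', 'replacedString$1'
$replaced
```

This yields replacedString1, replacedString3 and the untouched 'unrelated' line.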
Also, note that -replace works like -match, not -like; that is, it works with regular expressions, not wildcards. When you use 'replaceMe1.*' it doesn't mean "the text 'replaceMe1.' followed by zero or more characters" but rather "the text 'replaceMe1' followed by zero or more occurrences ('*') of any character ('.')". The following demonstrates text that will be replaced even though it wouldn't match with wildcards:
PS> 'replaceMe1_some_extra_text_with_no_period' -replace 'replaceMe1.*', 'replacedString1'
replacedString1
The wildcard pattern 'replaceMe1.*' would be written in regular expressions as 'replaceMe1\..*', which you'll see produces the expected result (no replacement performed):
PS> 'replaceMe1_some_extra_text_with_no_period' -replace 'replaceMe1\..*', 'replacedString1'
replaceMe1_some_extra_text_with_no_period
You can read more about regular expressions in the .NET Framework documentation.

Match strings stored in variables using PowerShell

I am attempting to create a backup script that will move files that are older than 30 days, but I want to be able to exclude folders from the list.
$a = "C:\\Temp\\Exclude\\test"
$b = "C:\\Temp\\Exclude"
if I run the following:
$a -match $b
Following PowerShell Basics: Conditional Operators -Match -Like -Contains & -In -NotIn:
$Guy ="Guy Thomas 1949"
$Guy -match "Th"
This returns true.
I'd say use wildcards and the -like operator; it can save you a lot of headaches:
$a -like "$b*"
The -match operator uses a regex pattern, and the path contains a regex special character (the backslash, which is the regex escape character). If you still want to use -match, make sure to escape the string:
$a -match [regex]::escape($b)
This will work, but keep in mind that it can match in the middle of the string. You can add the '^' anchor to tell the regex engine to match from the beginning of the string:
$a -match ("^"+[regex]::escape($b))
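Here is a small comparison of the variants, as a sketch only. It assumes the paths are written with single backslashes (PowerShell strings do not need backslashes doubled), and $other is a made-up string that contains the excluded path in the middle.

```powershell
$a     = 'C:\Temp\Exclude\test'
$b     = 'C:\Temp\Exclude'
$other = 'D:\Backup\C:\Temp\Exclude'   # made-up string containing $b in the middle

$likeResult     = $a -like "$b*"                             # wildcard prefix test
$escapedResult  = $a -match [regex]::Escape($b)              # regex, unanchored
$middleResult   = $other -match [regex]::Escape($b)          # also matches mid-string
$anchoredResult = $other -match ('^' + [regex]::Escape($b))  # anchored at the start

$likeResult, $escapedResult, $middleResult, $anchoredResult
```

The first three comparisons return True and the anchored one returns False, which shows why the '^' matters. Note that -like has its own special characters ([ and ]), so paths containing those would need separate handling.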