Search two different string patterns in one line - powershell

THE SCENARIO
I have a *.txt file containing 3 lines:
test-1234.htm
test-5678.htm
somefile.htm
I need a script which will find specific string patterns in that *.txt file.
Currently, the following script finds all lines ending in .htm in the *.txt file and stores the results in the specified results.log file.
dir *.txt | Select-String -pattern "\.htm$" |Select-Object -Expandproperty line | Out-File results.log -Encoding utf8 -Width 500
QUESTION
How to modify it, so it only finds all "test-****.htm" lines?
(Will only log lines containing "test-" and ".htm")

Change the pattern argument to test-\d*\.htm
'\d' - "Matches any digit character (0-9)"
'*' - "Match 0 or more of the preceding token"
so it will match any number of digits. Note the backslash before the dot: an unescaped '.' matches any character, not just a literal dot.
If you want to match at least 4 digits, you can use test-\d{4,}\.htm
I would recommend playing with regex using this site: http://regexr.com/
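As a quick check of the suggestion above, here is a minimal sketch that runs the digit pattern against the three sample lines from the question. The lines are held in an in-memory array (an assumption made so the snippet runs without a *.txt file), and the trailing $ is carried over from the original script.

```powershell
# Sample lines from the question, held in memory instead of a *.txt file.
$lines = 'test-1234.htm', 'test-5678.htm', 'somefile.htm'

# \d{4,} = four or more digits; \. = a literal dot; $ = end of line.
$hits = $lines |
    Select-String -Pattern 'test-\d{4,}\.htm$' |
    Select-Object -ExpandProperty Line

$hits   # test-1234.htm and test-5678.htm
```

To use it on the real file, replace the array with dir *.txt as in the original command and keep the Out-File step.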

Thank you all for the hints!
Ultimately I got it working with:
test-.{4,}\.htm$
This way I was also able to include lines with digits+characters (for example "test-a12c.htm") and lines where there was something before the keyword "test" (for example "this is a test-13bg.htm").
@user1432893 The site you've provided helped me with this! THX!
@Dave Sexton With ^ in the argument it didn't work.
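The difference between the patterns discussed in this thread can be seen side by side in the following sketch. The sample lines are assumptions built from the examples mentioned above. It also suggests why a leading ^ did not work: the anchor rejects any line that has text before "test-".

```powershell
# Hypothetical sample lines based on the examples in this thread.
$lines = 'test-1234.htm', 'test-a12c.htm', 'this is a test-13bg.htm', 'somefile.htm'

# -match applied to an array returns the elements that match.
$digitsOnly = @($lines -match 'test-\d{4,}\.htm$')   # digits only after "test-"
$anyChars   = @($lines -match 'test-.{4,}\.htm$')    # any 4+ characters after "test-"
$anchored   = @($lines -match '^test-.{4,}\.htm$')   # line must START with "test-"

"digits only: $($digitsOnly.Count)"   # 1 line
"any chars:   $($anyChars.Count)"     # 3 lines
"anchored:    $($anchored.Count)"     # 2 lines; "this is a test-13bg.htm" is dropped
```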

Related

Rename files in a folder using powershell, keeping the start and end of string in original filename

Currently trying to create a script that renames specific files within a chosen folder so that the resulting renamed files look like the following:
Original Filename: 45.09 - WrapperA12_rev1.DXF
Resultant Filename: 45.09_1.DXF
So the rev number is included as a suffix to the base filename, the extension is kept, and the first 5 characters of the filename are kept (including the ".").
I can get fairly close by removing the hyphens, spaces and letters from the original filename using the -replace argument, but the resultant filename using the example above would be "45.0912_1", where the file extension is ".0912_1". This makes sense, but any attempt I've made to append the file extension (".DXF") to the filename hasn't worked.
$listdxf=gci -path $pathfolder -Filter *.DXF | Select-Object
$prenameDXF=$listdxf|rename-item -WhatIf -newname {$_.name -replace('[a-z]') -replace('-') -replace('\s','')}
$prenameDXF
Any feedback on how I would go about doing this would be greatly appreciated.
For further clarification: the original filenames will always start with the 4 digits and the dot (XX.XX), and these need to be kept in the output name. The only other number I want is the revision number at the end of the filename, which may vary (e.g. 0, 0.1, 1, 1.1). The rev number will ALWAYS follow the underscore in the original filename. All other numbers, letters, etc. in the original filename need to be removed. I'm assuming the solution might involve one variable holding the first 5 characters (XX.XX) as a substring and another holding the characters that follow the "_", then combining the two and adding the ".DXF" file extension.
LATEST UPDATE: Following the responses here, I've been able to get the functionality nearly exactly where I need it to be.
I've been using the regex provided below and, with some slight changes, adapted it to allow for a few other things (spaces after "rev", and a rev number separated by a dot if present, i.e. rev1.1 etc.). However, I'm currently struggling to find a way of simply returning "0" if no "rev" is present in the file name. For example, if a filename is 31.90 - SADDLE SHIM.DXF, I want the rename regex to return 31.90_0. The expression I'm currently using is as follows: '(\d{2}\.\d{2}).*?rev(\s?\d+\.\d+|\s?\d+).*(?=\.DXF)', '$1_$2'
I have tried putting a pipeline (if) after the capture block following the "rev" and then putting (0) in a new capture block, but that's not working. Any feedback on this would be greatly appreciated. Thanks again for the replies.
It looks like this regex could do the trick to rename your files with your desired format: (?<=\.\d+)\s.+(?=_rev)|rev.
Get-ChildItem -Filter *-*_rev*.dxf |
Rename-Item -NewName { $_.Name -replace '(?<=\.\d+)\s.+(?=_rev)|rev' }
However, the above assumes every file name starts with some digits, followed by a dot, followed by more digits (not necessarily 5 characters in total), and that a whitespace character follows those digits. It also assumes the name ends with _rev followed by the revision digits, just before the .dxf extension.
This regex could work too: (?<=^[\d.]{5})\s.+(?=_rev)|rev. However, this one assumes the name starts with exactly 5 characters made up of digits and dots.
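To see what the first regex removes, the following sketch applies it to plain strings instead of real files. The second file name is made up for illustration. With -replace and no replacement string, every match is simply deleted.

```powershell
# The first name is from the question; the second is a hypothetical extra.
$names = '45.09 - WrapperA12_rev1.DXF', '12.34 - Bracket_rev2.dxf'

# Deletes (a) everything from the first whitespace after the leading
# number up to "_rev", and (b) the literal text "rev".
$renamed = $names -replace '(?<=\.\d+)\s.+(?=_rev)|rev'

$renamed   # 45.09_1.DXF and 12.34_2.dxf
```

Because -replace is case-insensitive by default, "rev" is also removed when it is written as "REV" or "Rev".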
Per your update, you could try using switch with the -regex option. $Matches will contain the matches and you can reference the match groups by using the group number as the key (e.g. $Matches[1]). You may also reference as a property (e.g., $Matches.1)
Get-ChildItem c:\temp\powershell\testrename -File |
    Rename-Item -NewName {
        switch -Regex ($_.Name) {
            '(\d{2}\.\d{2}).*?rev(\s?\d+\.\d+|\s?\d+).*(?=\.DXF)' {
                "$($Matches.1)_$($Matches.2).DXF"
                break
            }
            '(\d{2}\.\d{2}).*(?=\.DXF)' {
                "$($Matches.1)_0.DXF"
                break
            }
            default {
                $_
            }
        }
    } -WhatIf
Remove -WhatIf once done testing to perform rename action
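The same switch logic can be tried on plain strings before touching any files. The sketch below wraps it in a hypothetical helper, Get-NewDxfName (not part of the answer above). It also makes one assumption-driven change: \s? is moved outside the second capture group, because in the expression above an optional space after "rev" is captured and would end up in the new name.

```powershell
# Hypothetical helper: returns the new name for a single file name string.
function Get-NewDxfName {
    param([string]$Name)

    switch -Regex ($Name) {
        # Revision present; \s? is outside group 2 so the space is not captured.
        '(\d{2}\.\d{2}).*?rev\s?(\d+(?:\.\d+)?).*(?=\.DXF)' {
            "$($Matches[1])_$($Matches[2]).DXF"
            break
        }
        # No revision in the name: fall back to revision 0.
        '(\d{2}\.\d{2}).*(?=\.DXF)' {
            "$($Matches[1])_0.DXF"
            break
        }
        # Anything else is returned unchanged.
        default { $_ }
    }
}

Get-NewDxfName '45.09 - WrapperA12_rev1.DXF'      # 45.09_1.DXF
Get-NewDxfName '45.09 - WrapperA12_rev 1.1.DXF'   # 45.09_1.1.DXF
Get-NewDxfName '31.90 - SADDLE SHIM.DXF'          # 31.90_0.DXF
```

One caveat with either version: a name that contains "rev" inside another word before the revision marker would be matched at that earlier position.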

Select-String - find a string that spans multiple lines

I am trying to read lines in a file and search for a pattern that spans two lines. Looking at the file in notepad++ I see a LF char in the file.
Example log.txt:
I want to find this
value here: OK
My simple code does not work and returns nothing:
select-string -Path "log.txt" -Pattern "find this\n*value here: OK"
I have tried many combos of various things here including .+ and \r that I found posted on various threads. I can get the first line by using:
select-string -Path "log.txt" -Pattern "find this\n*"
Result of above is: I want to find this
Adding anything more to the line above results in nothing being returned. Any ideas how to do this using select-string? I was trying to avoid using get content due to the potential size of the files I am working with.
So I think I understand your question. If you have a file that has a line that you want to key off of then the next line is the line that you want to look at:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 1).Context.PostContext
I wasn't sure if that carriage return was an artifact of your formatting or not. If it is not then this would work better:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 2).Context.PostContext[1]
Here is a way to do it if you don't know how many lines will be between the two bits:
$file = Get-Content 'Log.txt' -Raw
$file -match '(?smi)I want to find this.*(value here: OK)'
$matches[1]
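The reason the original attempt returns nothing is that Select-String with -Path tests each line on its own, so a pattern containing \n never sees both lines at once. A minimal sketch of the single-string approach follows, using an in-memory string in place of Get-Content 'Log.txt' -Raw (the sample text is an assumption based on the question).

```powershell
# LF-separated sample text standing in for: Get-Content 'Log.txt' -Raw
$text = "Stuff`nI want to find this`nvalue here: OK`nThings"

# \r?\n accepts both LF and CRLF line endings.
if ($text -match 'find this\r?\n(value here: \w+)') {
    $found = $Matches[1]
}

$found   # value here: OK
```

Keep in mind that -Raw loads the whole file into memory, which matters for the large files mentioned in the question; the -Context approach above avoids that.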
Since you might want a multi line regex solution you need to read in the text file as one string.
Using the test file:
Stfuf
Bagel
I want to find this
value here: OK
Things
I was able to get the result using a simple matching pattern that satisfies your example text.
(Get-Content -Raw c:\temp\test.txt | Select-String -Pattern "([\w ]+)\s*value here: OK").Matches.Groups[1].Value
regex101.com
Basically, this captures the text that precedes any amount of whitespace (including newlines) followed by the static text "value here: OK". It could be made better with positive lookaheads, but this seems to work fine.

Batch File to Find and Replace in text file using whole word only?

I am writing a script which at one point has to check in a text file and remove certain strings. So far I have this:
powershell -Command "(gc myFile.txt) -replace 'foo', 'bar' | Out-File -encoding ASCII myFile.txt"
The only problem is that this can find and replace text but will not remove the line altogether.
The second problem is that say I am removing the line that has Mark, it needs to not remove a line that has something like Markus.
I don't know if this is possible with the powershell interface?
Your current code will only replace foo with bar, this is what replace does.
Removing the whole line if it matches requires a different, almost backwards, approach: use -notmatch to output only the lines that do not match your filter, effectively removing the ones that do.
Also using regex word boundaries will then only match Mark but not Markus:
(Get-Content file.txt) | Where-Object {$_ -notmatch "\bMark\b"} | Set-Content file.txt
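The effect of the word boundaries can be checked on a few in-memory lines before running the command against a real file. The sample lines here are assumptions for illustration.

```powershell
# Hypothetical file contents, one array element per line.
$lines = 'Mark', 'Markus', 'Hello Mark!', 'Other'

# \b marks a word boundary, so "Mark" matches only as a whole word.
$kept = $lines | Where-Object { $_ -notmatch '\bMark\b' }

$kept   # Markus and Other
```

-notmatch ignores case, so a line containing "mark" would be dropped as well; -cnotmatch is the case-sensitive variant. In the one-liner above, the parentheses around Get-Content matter: they make PowerShell read the whole file before Set-Content starts writing to it.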

Q: Powershell - read and report special characters from file

I've got a huge directory listing of files, and I need to see what special characters exist in the file names - specifically nonstandard characters like you'd get using ALT codes.
I can export a directory listing to a file easily enough with:
get-childitem -path D:\files\ -File -Recurse >output.txt
What I need to do, however, is pull out the special characters, and only the special characters, from the text file. The only way I can think of to easily quantify everything "special" (since there are a ton of possibilities in that character set) would be to compare the text against a list of characters I'd want to keep, stored in a joined variable (a-z, 0-9, etc.).
I can't quite figure out how to pull out the "good" characters, leaving only the special ones. Any ideas on where to start?
I take "special" characters to be anything that falls outside US ASCII.
That basically means any character with a numerical value of 128 or more, easy to inspect in a Where-Object filter:
Get-ChildItem -File -Recurse |Where-Object {
$_.Name.ToCharArray() -gt 127
}
This will return all files containing "special" characters in their name.
If you want to extract the special characters themselves, per file, use ForEach-Object:
Get-ChildItem -File -Recurse |ForEach-Object {
    if(($Specials = $_.Name.ToCharArray() -gt 127)){
        New-Object psobject -Property @{File=$_.FullName;Specials=$(-join $Specials)}
    }
}
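If the goal is a single list of the distinct special characters across all names, a sketch along these lines may help. It works on hypothetical in-memory names rather than Get-ChildItem output, builds the non-ASCII characters from their code points so the snippet does not depend on the script file's encoding, and casts each character to [int] explicitly for the comparison.

```powershell
# Hypothetical file names; [char]0xE9 is "é" and [char]0xEF is "ï".
$names = 'report.txt', "caf$([char]0xE9).txt", "na$([char]0xEF)ve_$([char]0xE9).doc"

# Collect every character above code point 127, then drop repeats.
$specials = $names |
    ForEach-Object { $_.ToCharArray() } |
    Where-Object { [int]$_ -gt 127 } |
    Select-Object -Unique

-join $specials   # the two distinct special characters, é and ï
```

For real data, $names could be replaced with (Get-ChildItem -File -Recurse).Name.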
Look at piping your results to Select-String. With Select-String you can specify a list of regex values to search for.

getting specific part from line using powershell

I am trying to find specific lines in files. When I get a match using Select-String I do not want the entire line, I just want one specific part from the line (error part).
Is there a parameter I can use to do this?
For example:
If I did
select-string USERINTERACTION file.txt
and the file contained a line with:
MainControlInterleaf-D: 21:59:14:631: myErrorShowTracer (300) -> Info:: USERINTERACTION: <this is the error part> from type <1> occured
I'd like to get a result of just <this is the error part> instead of the entire line being returned.
EDIT:
One more thing I forgot: if there are differences between the lines, what do i need to change in the code?
For example:
log-29-10-2013_00-11-52.txt:2737:MainControlInterleaf-D: 02:50:50:097: myErrorShowTracer (300) -> Info:: USERINTERACTION: <this is the error1> from type <1> occured
log-29-10-2013_00-11-52.txt:2732:MainControlInterleaf-D: 02:50:39:933: myErrorQuitTracer (350) -> Info:: USERINTERACTION <this is the error2<br> OK ... try again.<br>
Unless file.txt is a really big file, this should work:
$regex = '.+USERINTERACTION: (.+) from type <1> occured'
(get-content file.txt) -match $regex -replace $regex,'$1'
How about
Select-String -Pattern foo -Path X:\myfile.txt
Select-String outputs a MatchInfo object, you want to use something like this:
Select-String 'USERINTERACTION:\s*(.*?occured)' file.txt | Foreach {$_.matches.groups[1].value}
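For the follow-up in the EDIT, where the lines differ, one option is to make the varying parts of the pattern optional. The sketch below is only one possible pattern, written against the two sample lines from the question (held in memory here, without the file name and line number prefix); real logs may need further adjustment.

```powershell
# The two sample log lines from the question's EDIT.
$lines = @(
    'MainControlInterleaf-D: 02:50:50:097: myErrorShowTracer (300) -> Info:: USERINTERACTION: <this is the error1> from type <1> occured'
    'MainControlInterleaf-D: 02:50:39:933: myErrorQuitTracer (350) -> Info:: USERINTERACTION <this is the error2<br> OK ... try again.<br>'
)

# :?  makes the colon optional; the trailing " from type <n> occured"
# is optional too and is left out of capture group 1 when present.
$pattern = 'USERINTERACTION:?\s*(.+?)(?: from type <\d+> occured)?$'

$errors = $lines |
    Select-String -Pattern $pattern |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$errors
```

With these two lines, the first result is <this is the error1> and the second is everything after USERINTERACTION on that line.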
You are trying to return specific results, but do not provide the script with specific criteria.
Try RegEx matching (remember that you can use pipes in RegEx to form logical OR gates and deal with different patterns that way).
Here's a fairly decent starting point for getting into this kind of solution, if you are unaccustomed to RegEx in POSH (or at all).