Powershell Extract Number from File Name

File names could be:
1234_billing.txt
1234billling.txt
123_billing.txt
123billing.txt
How can I extract only the number in all 4 cases?
I've tried -split and $_.BaseName.Substring() but can't seem to get it correct.

Assuming that the filenames are in the array variable $flist, the following will do the trick:
foreach ($file in $flist) {
    if ($file -match "\d+") {
        # $Matches is a hashtable; index 0 holds the full matched text
        $Matches[0]
    }
}
The -match operator takes a regex pattern as its right operand; in this case we use the pattern \d+ to match one or more consecutive digits. The operator returns either $true or $false and populates the automatic $Matches variable, with the matched substring stored in $Matches[0]. There's more about the -match operator at Get-Help about_Operators, and everyone can use a handy reference for regular expressions.
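As a minimal sketch of the approach above, assuming $flist holds the four sample names from the question as plain strings (the $numbers variable is only there for illustration):

```powershell
# Sample names from the question; $flist is assumed to hold plain strings.
$flist = '1234_billing.txt', '1234billling.txt', '123_billing.txt', '123billing.txt'

$numbers = foreach ($file in $flist) {
    # -match populates the automatic $Matches hashtable;
    # index 0 holds the whole matched text.
    if ($file -match '\d+') {
        $Matches[0]
    }
}

$numbers
```

If $flist holds file objects from Get-ChildItem instead, matching against $file.BaseName keeps any digits in the extension out of the picture.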

Related

Is it possible to alter part of a variable based on its value?

I have a script that grabs a list of AD usernames for members of a student group and allocates those as an array of $students
Later the script will need to take those usernames and input them into a URL
$students = Get-ADGroupMember -Identity "GG_LHS All Students" | Select-Object -ExpandProperty SamAccountName | Sort-Object SamAccountName
foreach ($student in $students)
{
foreach ($OneDriveAdmin in $OneDriveAdmins)
Set-SPOUser -Site https://mydomain-my.sharepoint.com/personal/$($student)_mydomain_co_uk
In the cases where we have duplicate usernames, our naming scheme adds increments in the format of .1 and .2, but I need to change the ".1" to a "_1" to work in the URL.
My initial thinking is an IF statement during the $students declaration
IF SamAccountName is like '.1' replace '.1' with '_1'
Is this possible to do via powershell?

To offer a streamlined alternative to Santiago Squarzon's helpful answer, using the (also regex-based) -replace operator:
# Sample student account names
$students = 'jdoe.1', 'jsixpack', 'jroe.2'
# Transform all names, if necessary, and loop over them.
foreach ($student in $students -replace '\.(?=\d+$)', '_') {
    $student
}
Regex notes: \. matches a verbatim ., and (?=...) is a look-ahead assertion that matches one or more (+) digits (\d) at the end ($) of the string. What the look-ahead assertion matches doesn't become part of the overall match, so it is sufficient to replace only the . char.
Output:
jdoe_1
jsixpack
jroe_2
Note:
-replace, like -match, accepts an array as its LHS, in which case the operation is performed on each element, and a (usually transformed) new array is returned.
If the regex on the RHS in a given replacement operation doesn't match, the input string is passed through (returned as-is), so it is safe to attempt replacement on strings that don't match the pattern of interest.

You could add this check in your loop: if the student name ends with a dot followed by one or more digits (\.(\d+)$), replace the dot with an underscore while keeping the same digits (-replace '\.(\d+)$', '_$1'):
foreach($student in $students) {
    if($student -match '\.(\d+)$') {
        # Reuse the anchored pattern so that only the trailing ".N" is rewritten
        $student = $student -replace '\.(\d+)$', '_$1'
    }
    # rest of your code here
}
See https://regex101.com/r/fZAOur/1 for more info.

PowerShell script that searches for a string in a .txt and if it finds it, looks for the next line containing another string and does a job with it

I have the line
Select-String -Path ".\*.txt" -Pattern "6,16" -Context 20 | Select-Object -First 1
that would return 20 lines of context looking for a pattern of "6,16".
I need to look for the next line containing the string "ID number:" after the line of "6,16", read what is the text right next to "ID number:", find if this exact text exists in another "export.txt" file located in the same folder (so in ".\export.txt"), and see if it contains "6,16" on the same line as the one containing the text in question.
I know it may seem confusing, but what I mean is for example:
example.txt:5218: ID number:0002743284
shows whether this is true:
export.txt:9783: 0002743284 *some text on the same line for example* 6,16

If I understand the question correctly, you're looking for something like:
Select-String -List -Path *.txt -Pattern '\b6,16\b' -Context 0, 20 |
    ForEach-Object {
        if ($_.Context.PostContext -join "`n" -match '\bID number:(\d+)') {
            Select-String -List -LiteralPath export.txt -Pattern "$($Matches[1]).+$($_.Pattern)"
        }
    }
Select-String's -List switch limits the matching to one match per input file; -Context 0,20 also includes the 20 lines following the matching one in the output (but none (0) before).
Note that I've placed \b, a word-boundary assertion at either end of the search pattern, 6,16, to rule out accidental false positives such as 96,169.
$_.Context.PostContext contains the array of lines following the matching line (which itself is stored in $_.Line):
-join "`n" joins them into a multi-line string, so as to ensure that the subsequent -match operation reports the captured results in the automatic $Matches variable, notably reporting the ID number of interest in $Matches[1], the text captured by the first (and only) capture group ((\d+)).
The captured ID is then used in combination with the original search pattern to form a regex that looks for both on the same line, and is passed to a second Select-String call that searches through export.txt
Note: An object representing the matching line, if any, is output by default; to return just $true or $false, replace -List with -Quiet.

There's a lot wrong with what you're expecting and the code you've tried so let's break it down and get to the solution. Kudos for attempting this on your own. First, here's the solution, read below this code for an explanation of what you were doing wrong and how to arrive at the code I've written:
# Get matching lines plus the following line from the example.txt seed file
$seedMatches = Select-String -Path .\example.txt -Pattern "6,\s*16" -Context 0, 1
# Obtain the ID number from the line following each match
$idNumbers = foreach( $match in $seedMatches ) {
    $postMatchFields = $match.Context.PostContext -split ":\s*"
    # Note: .IndexOf(object) is case-sensitive when looking for strings
    # Returns -1 if not found
    $idFieldIndex = $postMatchFields.IndexOf("ID number")
    # Return the "ID number" to `$idNumbers` if "ID number" is found in $postMatchFields
    if( $idFieldIndex -gt -1 ) {
        $postMatchFields[$idFieldIndex + 1]
    }
}
# Match lines in export.txt where both the $id and "6,16" appear
$exportMatches = foreach( $id in $idNumbers ) {
    Select-String -Path .\export.txt -Pattern "^(?=.*\b$id\b)(?=.*\b6,\s*16\b).*$"
}
mklement0's answer essentially condenses this into less code, but I wanted to break this down fully.
First, Select-String -Path ".\*.txt" will look in all .txt files in the current directory. You'll want to narrow that down to a specific naming pattern you're looking for in the seed file (the file we want to find the ID to look for in the other files). For this example, I'll use example.txt and export.txt for the paths which you've used elsewhere in your question, without using globbing to match on filenames.
Next, -Context gives context of the surrounding lines from the match. You only care about the next line match so 0, 1 should suffice for -Context (0 lines before, 1 line after the match).
Finally, I've added \s* to the -Pattern to match on whitespace, should the 16 ever be padded from the ,. So now we have our Select-String command ready to go:
$seedMatches = Select-String -Path .\example.txt -Pattern "6,\s*16" -Context 0, 1
Next, we will need to loop over the matching results from the seed file. You can use foreach or ForEach-Object, but I'll use foreach in the example below.
For each $match in $seedMatches we'll need to get the $idNumbers from the lines following each match. When $match is ToString()'d, it will spit out the matched line and any surrounding context lines. Since we only have one line following the match for our context, we can grab $match.Context.PostContext for this.
Now we can get the $idNumber. We can split example.txt:5218: ID number:0002743284 into an array of strings by using the -split operator to split the string on the :\s* pattern (\s* matches on any or no whitespace). Once we have this, we can get the index of "ID number" and get the value of the field immediately following it. Now we have our $idNumbers. I'll also add some protection below to ensure the ID number field is actually found before continuing.
$idNumbers = foreach( $match in $seedMatches ) {
    $postMatchFields = $match.Context.PostContext -split ":\s*"
    # Note: .IndexOf(object) is case-sensitive when looking for strings
    # Returns -1 if not found
    $idFieldIndex = $postMatchFields.IndexOf("ID number")
    # Return the "ID number" to `$idNumbers` if "ID number" is found in $postMatchFields
    if( $idFieldIndex -gt -1 ) {
        $postMatchFields[$idFieldIndex + 1]
    }
}
Now that we have $idNumbers, we can look in export.txt for lines that contain both this ID number and 6,\s*16, once again using Select-String. This time, I'll put the code first since it's nothing new, then explain the regex a bit:
$exportMatches = foreach( $id in $idNumbers ) {
    Select-String -Path .\export.txt -Pattern "^(?=.*\b$id\b)(?=.*\b6,\s*16\b).*$"
}
$exportMatches will now contain the lines which contain both the target ID number and the 6,16 value on the same line. Note that order wasn't specified so the expression uses positive lookaheads to find both the $id and 6,16 values regardless of their order in the string. I won't break down the exact expression but if you plug ^(?=.*\b0123456789\b)(?=.*\b6,\s*16\b).*$ into https://regexr.com it will break down and explain the regex pattern in detail.
The full code is at the top of this answer.
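To see how the two lookaheads behave, here is a small sketch that applies the same pattern to in-memory strings instead of export.txt. The sample lines and the $lines and $matching variables are made up for illustration:

```powershell
# Hypothetical ID and sample lines; only the pattern comes from the answer.
$id      = '0002743284'
$pattern = "^(?=.*\b$id\b)(?=.*\b6,\s*16\b).*$"

$lines = @(
    '0002743284 some text 6,16'      # ID first, then 6,16
    '6, 16 other text 0002743284'    # reversed order, padded 16
    '0002743284 some text 96,169'    # 6,16 only as part of a larger number
    '9990002743284 6,16'             # ID only as part of a larger number
)

# With an array on the left, -match returns the elements that match.
$matching = $lines -match $pattern
$matching
```

Only the first two lines should come back: each lookahead scans the whole line independently, so the order does not matter, and the \b assertions reject the last two lines.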

How do I find all files in a folder whose names contain words from a list?

I have a massive list of files whose names contain a number.
On the other hand, I have a list of numbers.
I need to find, using PowerShell (or any other Windows resource) the list of files that contain in their names any of the numbers from the other list.
I know how to find one by one using
Get-ChildItem | Where-Object {$_.Name -like "*123*"}
But I don't know how to search by the whole list without using the -or operator.

get-childitem *123*,*456*,*789*
Patterns from a file:
get-childitem -name | select-string (get-content patterns.txt)

An efficient approach is to use -match, the regular-expression matching operator, with alternation (|) to search for one of multiple patterns in a single operation:
$numbers = 42, 43, 44 # ...
Get-ChildItem | Where-Object Name -match ($numbers -join '|')
Alternatively, js2010's helpful answer shows that you can directly use Get-ChildItem's (implied) -Path parameter (whose type is [string[]], i.e., an array of paths), with an array of wildcard expressions:
$numbers = 42, 43, 44 # ...
Get-ChildItem ($numbers -replace '^|$', '*')
The above uses the -replace operator to enclose each number in *...*; that is, the above is the equivalent of:
Get-ChildItem *42*, *43*, *44*
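The alternation approach can be tried without touching the file system. This is only a sketch: the $names array is hypothetical sample data standing in for (Get-ChildItem).Name.

```powershell
# Hypothetical file names; in practice these would come from (Get-ChildItem).Name
$names   = 'report_42.txt', 'invoice_100.txt', 'scan44b.pdf', 'notes.txt'
$numbers = 42, 43, 44

# Build a single alternation pattern: '42|43|44'
$pattern = $numbers -join '|'

# With an array on the left, -match acts as a filter and returns the matching elements
$found = $names -match $pattern
$found
```

Note that this is substring matching, so 42 would also match a name containing 1420. If whole numbers are required, wrapping the alternation in digit lookarounds, as in "(?<!\d)(?:$pattern)(?!\d)", is one option.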

Try this:
$files = ( Get-ChildItem 'path' )
$numbers = 1 .. 100 # or your list contents
foreach( $n in $numbers ) {
    foreach( $f in $files.BaseName ) {
        if( $f -like "*$n*" ) {
            "Found $f"
        }
    }
}

As js2010's helpful answer and mklement0 mention, we can exploit the string array in the Get-ChildItem -Path parameter to do our filtering. These are nice, quick, elegant solutions and would work well for limited sets of strings.
The quirk comes in with @JBourne's comment, in which he mentions that he has hundreds of numbers to match. When we are matching hundreds of numbers against hundreds of file names, these methods all slow down in proportion to the product of the two counts. @Vish's very easy-to-understand answer demonstrates this: with, say, 100 numbers and 1,000 files, you perform 100 x 1,000 = 100,000 evaluations. I assume that the internal code for Get-ChildItem does something similar when handling string[] arrays on the input.
If we are interested in pure performance, we can't use arrays. Arrays are efficient for storing items and accessing indexed locations, but are terrible for random querying. What we could use instead is a slightly more complicated method based on regex and hashtables. Although hashtables are a key/value system, and in this case we don't need a "value", they are highly efficient for looking up large numbers of keys, typically in O(1) time per lookup. Our example thus goes from an O(n*f) problem (n numbers, f files) to an O(f) problem: we perform only 1 x 1,000 = 1,000 evaluations.
To start with, we need our list of keys:
$FileWithListOfNumbers = @"
123 = Matched file with 123
456 = Matched file with 456
789 = Matched file with 789
"@
$KeyHashtable = ConvertFrom-StringData $FileWithListOfNumbers
This will load our hashtable with a list of keys. Next, we iterate through our files and use Regex for matching our filenames:
Get-ChildItem | % {
    if($_.Name -match '\D*(\d+)\D*')
    {
        #Filename contains a number, perform a key lookup to see if it matches
        if($KeyHashtable.ContainsKey($Matches[1]))
        {
            Write-Host $_.Name
        }
    }
}
By using Regex for matching (rather than a file system provider to filter) we can use match groups to "pull" out the number. You may have to adjust the Regex based on your specific needs and file naming convention, but it is:
-match '\D*(\d+)\D*'
\D* - Match 0 or more non-digits
( - Start of capture group
\d+ - Match 1 or more digits
) - End of capture group
\D* - Match 0 or more non-digits
That number we "pull" is stored in the special $Matches variable in the second array location $Matches[1]. We then perform a key lookup with the number to see if it matches anything we are looking for.
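A variation on the same idea, offered only as a sketch: it swaps the hashtable for a HashSet (since no values are needed) and checks every run of digits in a name rather than just the first. The $wanted, $names and $hits variables and the sample data are hypothetical, and [type]::new() assumes PowerShell 5.0 or later.

```powershell
# Hypothetical sample data standing in for the number list and the file names
$wanted = [System.Collections.Generic.HashSet[string]]::new([string[]]('123', '456', '789'))
$names  = 'a123.txt', 'b_999_456.log', 'c12.txt', 'plain.txt'

$hits = foreach ($name in $names) {
    # Look at every run of digits in the name, not only the first one
    foreach ($m in [regex]::Matches($name, '\d+')) {
        # Contains() is a hash lookup, so its cost does not grow with the list size
        if ($wanted.Contains($m.Value)) {
            $name
            break   # one hit is enough; move on to the next name
        }
    }
}

$hits
```

Here b_999_456.log is found through its second number, which the single-capture regex above would miss.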

Question regarding incrementing a string value in a text file using Powershell

Just beginning with Powershell. I have a text file that contains the string "CloseYear/2019" and looking for a way to increment the "2019" to "2020". Any advice would be appreciated. Thank you.

If the question is how to update text within a file, you can do the following, which will replace specified text with more specified text. The file (t.txt) is read with Get-Content, the targeted text is updated with the String class Replace method, and the file is rewritten using Set-Content.
(Get-Content t.txt).Replace('CloseYear/2019','CloseYear/2020') | Set-Content t.txt
Additional Considerations:
General incrementing would require an object type that supports incrementing. You can isolate the numeric data using -split, increment it, and create a new, joined string. This solution assumes working with 32-bit integers but can be updated to other numeric types.
$str = 'CloseYear/2019'
-join ($str -split "(\d+)" | Foreach-Object {
    if ($_ -as [int]) {
        [int]$_ + 1
    }
    else {
        $_
    }
})
Putting it all together, the following would result in incrementing all complete numbers (123 as opposed to 1 and 2 and 3 individually) in a text file. Again, this can be tailored to target more specific numbers.
$contents = Get-Content t.txt -Raw # Raw to prevent an array output
-join ($contents -split "(\d+)" | Foreach-Object {
    if ($_ -as [int]) {
        [int]$_ + 1
    }
    else {
        $_
    }
}) | Set-Content t.txt
Explanation:
-split uses regex matching to split on the matched result resulting in an array. By default, -split removes the matched text. Creating a capture group using (), ensures the matched text displays as is and is not removed. \d+ is a regex mechanism matching a digit (\d) one or more (+) successive times.
Using the -as operator, we can test that each item in the split array can be cast to [int]. If successful, the if statement will evaluate to true, the text will be cast to [int], and the integer will be incremented by 1. If the -as operator is not successful, the pipeline object will remain as a string and just be output. Note that text that converts to the integer 0 is treated as false by the if statement, so a bare 0 is output unchanged.
The -join operator just joins the resulting array (from the Foreach-Object) into a single string.
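An alternative way to express the same increment, shown here only as a sketch and not as part of the answer above, is to let [regex]::Replace call a script block for each run of digits. The $result variable is illustrative.

```powershell
$str = 'CloseYear/2019'

# The script block is invoked once per match and receives the Match object;
# whatever it outputs replaces the matched text.
$result = [regex]::Replace($str, '\d+', { param($m) [int]$m.Value + 1 })

$result
```

Because every run of digits is handed to the script block, a value of 0 is incremented as well. As with the -split version, leading zeros are lost in the conversion to [int].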

AdminOfThings' answer is very detailed and the correct answer.
I wanted to provide another answer for options.
Depending on what your end goal is, you might need to convert the date to a datetime object for future use.
Example:
$yearString = 'CloseYear/2019'
#convert to datetime
[datetime]$dateConvert = [datetime]::new((($yearString -split "/")[-1]),1,1)
#add year
$yearAdded = $dateConvert.AddYears(1)
#if you want to display "CloseYear" with the new date and write-host
$out = "CloseYear/{0}" -f $yearAdded.Year
Write-Host $out
This approach would allow you to use $dateConvert and $yearAdded as a datetime allowing you to accurately manipulate dates and cultures, for example.

How do I change foreach to for in PowerShell?

I want to check whether words from one text file exist in another and print "match" or "not match". My 1st text file is: xxaavv6J, my 2nd file is 6J6SCa.yB.
When there is a match, it returns this:
Match found:
Match found:
Match found:
Match found:
Match found:
Match found: 6J
Match found:
Match found:
Match found:
My expectation is to print just "match" or "not match".
$X = Get-Content "C:\Users\2.txt"
$Data = Get-Content "C:\Users\d.txt"
$Split = $Data -split '(..)'
$Y = $X.Substring(0, 6)
$Z = $Y -split '(..)'
foreach ($i in $Z) {
    foreach ($j in $Split) {
        if ($i -like $j) {
            Write-Host ("Match found: {0}" -f $i, $j)
        }
    }
}

The operation -split '(..)' does not produce the result you think it does. If you take a look at the output of the following command you'll see that you're getting a lot of empty results:
PS C:\> 'xxaavv6J' -split '(..)' | % { "-$_-" }
--
-xx-
--
-aa-
--
-vv-
--
-6J-
--
Those empty values are the additional matches you're getting from $i -like $j.
I initially expected -split '(..)' to produce only 5 empty strings for the input string "xxaavv6J", which is what -split '..' (without the grouping parentheses) returns. The difference comes from the capture group: when the delimiter pattern contains capturing parentheses, -split returns the captured delimiter text in addition to the results of the split operation.
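The difference is easy to check side by side. A small sketch (the variable names are illustrative):

```powershell
$s = 'xxaavv6J'

# No capture group: the two-character delimiters are discarded,
# leaving only the empty strings around them.
$plain = $s -split '..'

# Capture group: the captured delimiters are interleaved with the empty strings.
$captured = $s -split '(..)'

# Filtering out the empty strings leaves just the two-character chunks.
$chunks = $captured | Where-Object { $_ -ne '' }

"plain count:    $($plain.Count)"
"captured count: $($captured.Count)"
"chunks:         $($chunks -join ',')"
```

The counts should come out as 5 and 9, the latter matching the nine lines of output shown above.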
Anyway, to get the behavior you want replace
... -split '(..)'
with
... |
Select-String '..' -AllMatches |
Select-Object -Expand Matches |
Select-Object -Expand Value
You can also replace the nested loop with something like this:
foreach ($i in $Z) {
    if ($Split -contains $i) {
        Write-Host "Match found: ${i}"
    }
}

A slightly different approach using regex '.Match()' should also do it.
I have added a lot of explaining comments for you:
$Test = Get-Content "C:\Users\2.txt" -Raw        # Read as single string. Contains "xxaavv6J"
$Data = (Get-Content "C:\Users\d.txt") -join ''  # Read as array and join the lines with an empty string.
                                                 # This will remove newlines. Contains "6J6SCa.yB"
# Split the data and make sure every substring has two characters.
# In each substring, the regex special characters need to be escaped.
# When this is done, we join the substrings together using the pipe symbol.
$Data = ($Data -split '(.{2})' |                        # split on every two characters
    Where-Object { $_.Length -eq 2 } |                  # don't care about any left over character
    ForEach-Object { [Regex]::Escape($_) } ) -join '|'  # join with the '|' which is an OR in regular expression
# $Data is now a string to use with regular expression: "6J|6S|Ca|\.y"
# Using '.Match()' works case-sensitively. To have it compare case-insensitively, we do this:
$Data = '(?i)' + $Data
# See if we can find one or more matches
$regex = [regex]$Data
$match = $regex.Match($Test)
# If we have found at least one match
# (test .Success; .Groups.Count is never 0, even for a failed match):
if ($match.Success) {
    while ($match.Success) {
        # matched text: $match.Value
        # match start: $match.Index
        # match length: $match.Length
        Write-Host ("Match found: {0}" -f $match.Value)
        $match = $match.NextMatch()
    }
}
else {
    Write-Host "Not Found"
}
Result:
Match found: 6J

Further to Ansgar Wiechers' excellent answer: if you are running Windows PowerShell 4.0 or later, you could apply the .Where() method described in Kirk Munro's exhaustive article ForEach and Where magic methods:
With the release of Windows PowerShell 4.0, two new “magic” methods
were introduced for collection types that provide a new syntax for
accessing ForEach and Where capabilities in Windows PowerShell.
These methods are aptly named ForEach and Where. I call
these methods “magic” because they are quite magical in how they work
in PowerShell. They don’t show up in Get-Member output, even if you
apply -Force and request -MemberType All. If you roll up your
sleeves and dig in with reflection, you can find them; however, it
requires a broad search because they are private extension methods
implemented on a private class. Yet even though they are not
discoverable without peeking under the covers, they are there when you
need them, they are faster than their older counterparts, and they
include functionality that was not available in their older
counterparts, hence the “magic” feeling they leave you with when you
use them in PowerShell. Unfortunately, these methods remain
undocumented even today, almost a year since they were publicly
released, so many people don’t realize the power that is available in
these methods.
…
The Where method
Where is a method that allows you to filter a collection of objects.
This is very much like the Where-Object cmdlet, but the Where
method is also like Select-Object and Group-Object as well,
includes several additional features that the Where-Object cmdlet
does not natively support by itself. This method provides faster
performance than Where-Object in a simple, elegant command. Like
the ForEach method, any objects that are output by this method are
returned in a generic collection of type
System.Collections.ObjectModel.Collection`1[psobject].
There is only one version of this method, which can be described as
follows:
Where(scriptblock expression[, WhereOperatorSelectionMode mode[, int numberToReturn]])
As indicated by the square brackets, the expression script block is
required and the mode enumeration and the numberToReturn integer
argument are optional, so you can invoke this method using 1, 2, or 3
arguments. If you want to use a particular argument, you must provide
all arguments to the left of that argument (i.e. if you want to
provide a value for numberToReturn, you must provide values for
mode and expression as well).
Applied to your case (using the simplest variant Where(scriptblock expression) of the .Where() method):
$X = '6J6SCa.yB' # Get-Content "C:\Users\2.txt"
$Data = 'xxaavv6J' # Get-Content "C:\Users\d.txt"
$Split = ($Data -split '(..)').Where({$_ -ne ''})
$Y = $X.Substring(0, 6)
$Z = ($Y -split '(..)').Where{$_ -ne ''} # without parentheses
For instance, Ansgar's example changes as follows:
PS > ('xxaavv6J' -split '(..)').Where{$_ -ne ''} | % { "-$_-" }
-xx-
-aa-
-vv-
-6J-
