Split and regex match with Powershell - powershell

Say I have a filename string, something like:
test_ABC_19000101_010101.987.txt,
Where "test" could be any combination of whitespace, letters, digits, etc. I wish to extract the 19000101_010101 part (date and time) with PowerShell. Currently I assign the result of -split "_ABC_" to a variable, take the second element of the array, and then split that string several more times. Is there a way to accomplish this in one go?
PS
"_ABC_" is constant, occurring unchanged in all instances of filename(s).

A more concise - albeit perhaps more obscure - alternative to Santiago Squarzon's helpful answer:
# Construct a regex that consumes the entire file name while
# using capture groups for the parts of interest.
$re = '.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})\..+'
[datetime] (
# In the replacement string, use $1, $2, ... to refer to what the
# first, second, ... capture group captured.
'test_ABC_19000101_010101.987.txt' -replace $re, '$1-$2-$3T$4:$5:$6.$7'
)
Output:
Monday, January 1, 1900 1:01:01 AM
The -replace operation results in string '1900-01-01T01:01:01.987', which is a (culture-invariant) format that you can use as-is with a [datetime] cast.
Note that with a Get-ChildItem call as input you could slightly simplify the regex by providing $_.BaseName rather than $_.Name as the -replace LHS, which obviates the need to also match the extension (\..+) in the regex.
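As a minimal sketch of that simplification, the snippet below simulates a single Get-ChildItem result with a [System.IO.FileInfo] cast and drops the extension-matching tail from the regex. The variable names ($file, $reBase, $isoString, $timestamp) are illustrative and not part of the original answer.

```powershell
# Simulate a file object such as Get-ChildItem would emit.
# Assumption: the file does not need to exist on disk for .BaseName to work.
$file = [System.IO.FileInfo] 'test_ABC_19000101_010101.987.txt'

# .BaseName has the extension removed already, so the regex can end at the milliseconds.
$reBase = '.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})'

# Rearrange the captured parts into a culture-invariant timestamp string, then cast.
$isoString = $file.BaseName -replace $reBase, '$1-$2-$3T$4:$5:$6.$7'
$timestamp = [datetime] $isoString
```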
An aside re the [datetime] cast: [datetime] '...' results in a [datetime] instance that is an unspecified timestamp (its .Kind property value is Unspecified), i.e. it is undefined whether it represents a Local or a Utc timestamp.
To get a Local timestamp, use
[datetime]::Parse('...', [cultureinfo]::InvariantCulture, 'AssumeLocal')
(use 'AssumeLocal, AdjustToUniversal' to get a Utc timestamp).
Alternatively, you can cast to [datetimeoffset] - a type that is generally preferable to [datetime] - which interprets a string cast to it as local by default. (You can then access its .LocalDateTime / .UtcDateTime properties to get Local / Utc [datetime] instances).
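The differences described in this aside can be inspected through the .Kind property. The following is only a sketch with illustrative variable names; the actual clock values of the Local and Utc variants depend on the time zone of the machine running it.

```powershell
$s = '1900-01-01T01:01:01.987'

# Plain cast: the Kind is Unspecified.
$unspecified = [datetime] $s

# Explicit parse that treats the string as local time.
$local = [datetime]::Parse($s, [cultureinfo]::InvariantCulture, 'AssumeLocal')

# Same, but converted to UTC afterwards.
$utc = [datetime]::Parse($s, [cultureinfo]::InvariantCulture, 'AssumeLocal, AdjustToUniversal')

# [datetimeoffset] exposes both views of the same instant.
$dto = [datetimeoffset] $s

# Show the Kind of each variant.
$unspecified.Kind, $local.Kind, $utc.Kind, $dto.LocalDateTime.Kind, $dto.UtcDateTime.Kind
```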

This regex seems like overkill, but I think it should work as long as _ABC_ is constant, there is a _ to separate the date from the time, and there is a . to separate the time from the milliseconds:
$re = [regex]'(?<=_ABC_)(?<date>\d*)_(?<time>\d*)\.(?<millisec>\d*)(?=\.)'
@'
test_ABC_19000101_010101.987.txt
t' az# 0est_ABC_20000101_090101.123.txt
tes8as712t_ABC_21000101_080101.456.txt
te098d $st_ABC_22000101_070101.789.txt
[test]_ABC_23000101_060101.101.txt
t?\est_ABC_24000101_050101.112.txt
'@ -split '\r?\n' | ForEach-Object {
$groups = $re.Match($_).Groups
$date = $groups['date']
$time = $groups['time']
$msec = $groups['millisec']
[datetime]::ParseExact(
"$date $time $msec",
"yyyyMMdd HHmmss fff",
[cultureinfo]::InvariantCulture
)
}
See https://regex101.com/r/8oSpqf/1 for details.
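If some file names might not contain a timestamp at all, it may be worth checking the match before parsing. The sketch below is one possible variation, not the answer's own code: it tightens the digit counts to \d{8}, \d{6} and \d{3} and tests .Success first. The helper name Get-StampFromName is hypothetical.

```powershell
# Same idea as above, but with fixed digit counts (an assumption about the input format).
$reStrict = [regex] '(?<=_ABC_)(?<date>\d{8})_(?<time>\d{6})\.(?<millisec>\d{3})(?=\.)'

# Hypothetical helper: returns a [datetime], or nothing if the name has no timestamp.
function Get-StampFromName([string] $Name) {
    $m = $reStrict.Match($Name)
    if (-not $m.Success) { return }   # no timestamp found: emit nothing
    $text = '{0} {1} {2}' -f $m.Groups['date'].Value, $m.Groups['time'].Value, $m.Groups['millisec'].Value
    [datetime]::ParseExact($text, 'yyyyMMdd HHmmss fff', [cultureinfo]::InvariantCulture)
}

$good = Get-StampFromName 'test_ABC_19000101_010101.987.txt'
$none = Get-StampFromName 'no_timestamp_here.txt'
```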

If there will never be multiple sequences in the filename that look like the timestamp (8 digits, _, 6 digits), then you could match on that pattern of digits.
PS C:\> 'test_ABC_19000101_010101.987.txt' -match '^.*ABC_(\d{8}_\d{6})\..*'
True
PS C:\> $Matches
Name Value
---- -----
1 19000101_010101
0 test_ABC_19000101_010101.987.txt
PS C:\> $Matches[1]
19000101_010101
You would use the filename instead of the explicit string.
If you want to get a [System.DateTime] from it:
PS C:\> [datetime]::ParseExact($Matches[1], 'yyyyMMdd_HHmmss', $null)
Monday, January 1, 1900 01:01:01
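Applied to several names at once, the same -match test can double as a filter, so that names without a timestamp are skipped rather than causing a parse error. This is only a sketch: the $names array stands in for real file names, and InvariantCulture is passed explicitly instead of $null.

```powershell
# Stand-in for real file names (for example, from Get-ChildItem).
$names = 'test_ABC_19000101_010101.987.txt', 'no_timestamp_here.txt'

$stamps = foreach ($name in $names) {
    # -match populates $Matches only when it returns $true.
    if ($name -match '^.*ABC_(\d{8}_\d{6})\..*') {
        [datetime]::ParseExact($Matches[1], 'yyyyMMdd_HHmmss', [cultureinfo]::InvariantCulture)
    }
}
```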

Related

Get the files whose date in the filename is greater than some specific date using Powershell script

I have a specific date, "2021/11/28", and I want the list of files from the example file names below whose embedded date is later than 2021/11/28. Note that this is about the date in the file name, not the file's creation time.
"test_20211122_aba.*"
"abc_20211129_efg.*"
"hij_20211112_lmn.*"
"opq_20211130_rst.*"
I'm expecting to get
"abc_20211129_efg.*"
"opq_20211130_rst.*"
Really appreciate your help.
You don't strictly need to parse your strings into dates ([datetime] instances): Because the date strings embedded in your file names are in a format where their lexical sorting is equivalent to chronological sorting, you can compare the string representations directly:
# Simulate output from a Get-ChildItem call.
$files = [System.IO.FileInfo[]] (
"test_20211122_aba1.txt",
"abc_20211129_efg2.txt",
"hij_20211112_lmn3.txt",
"hij_20211112_lmn4.txt",
"opq_20211130_rst5.txt"
)
# Filter the array of files.
$resultFiles =
$files | Where-Object {
$_.Name -match '(?:^|.*\D)(\d{8})(?:\D.*|$)' -and
$Matches[1] -gt ('2021/11/28' -replace '/')
}
# Print the names of the filtered files.
$resultFiles.Name
$_.Name -match '(?:^|.*\D)(\d{8})(?:\D.*|$)' looks for (the last) run of exactly 8 digits in each file name via a capture group ((...)), reflected in the automatic $Matches variable's entry with index 1 ($Matches[1]) afterwards, if found.
'2021/11/28' -replace '/' removes all / characters from the input string, so that the cutoff date has the same format as the date strings in the file names. For brevity, the solution above performs this replacement in each loop iteration. In practice, you would perform it once, before the loop, and assign the result to a variable for use inside the loop.
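A sketch of that "compute it once" variant follows. It works on plain name strings rather than file objects to stay short, and the variable names $cutoff, $names and $newer are illustrative.

```powershell
# Normalize the cutoff date once, outside the filter.
$cutoff = '2021/11/28' -replace '/'   # yields '20211128'

# Stand-in for file names.
$names = 'test_20211122_aba1.txt',
         'abc_20211129_efg2.txt',
         'hij_20211112_lmn3.txt',
         'opq_20211130_rst5.txt'

# Keep only names whose embedded 8-digit date sorts after the cutoff.
$newer = $names | Where-Object {
    $_ -match '(?:^|.*\D)(\d{8})(?:\D.*|$)' -and $Matches[1] -gt $cutoff
}
```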

Is there a way to replace the first two occurrences of a value in a string?

I am going to be as clear with my question as I can.
I might be missing something very obvious here but I just don't know how to find a solution...
I have a string and I would like to replace the first two occurrences of ":" with "/":
String:
$string = '2020:10:07 08:45:49'
Desired String:
2020/10/07 08:45:49
I have tried using .Replace as seen below:
$string = $string.Replace([regex]":","/",2)
But I am given this error every time:
Cannot find an overload for "replace" and the argument count: "3".
I have seen others use .Replace in this way before so I'm not sure what is so different about my usage. Can anyone point me in the right direction?
PowerShell is a .NET-based language.
In .NET, String.Replace has no overload that takes a count argument (Python's str.replace does).
You can use this:
$string = '2020:10:07 08:45:49'
#Replace 2+ spaces you have in source with single space
$string = $string -replace '\s+', ' '
# Variant 0 - Best - ALWAYS use proper types. There is a DateTime type for dates and times!
# Parse to a DateTime object with an exact format (HH = 24-hour clock; hh is the 12-hour clock and fails for afternoon times)
$dt = [DateTime]::ParseExact($string, 'yyyy:MM:dd HH:mm:ss', [System.Globalization.CultureInfo]::InvariantCulture)
# Convert the DateTime back to a string
$result = $dt.ToString('yyyy\/MM\/dd HH:mm:ss')
.NET's String.Split has an optional count parameter that limits the result to at most that many pieces. You can use it:
# Variant1
$result = [string]::Join('/',$string.Split(':', 3))
# Variant2
$result = $string.Split(':', 3) -join '/'
String.Replace() does not support regex patterns, nor does it accept a maximum count.
Use the -replace regex operator instead:
$string = '2020:10:07 08:45:49'
$string -replace '(?<=^[^:]*):(.*?):','/$1/'
This will replace only the first and second occurrence of : with /
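While String.Replace() has no count parameter, the .NET Regex class does offer an instance Replace(input, replacement, count) overload, which may be easier to read than the lookbehind. A minimal sketch, with $result as an illustrative name:

```powershell
$string = '2020:10:07 08:45:49'

# Build a Regex object for ':' and replace at most the first 2 matches.
$result = ([regex] ':').Replace($string, '/', 2)
$result
```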
Specifically for date/time representations, you may want to parse it as such, at which point you can easily re-format it:
$string = '2020:10:07 08:45:49'
$datetime = [datetime]::ParseExact($string, "yyyy:MM:dd HH:mm:ss", $null)
# Now we can create a new string with the desired format
Get-Date $datetime -Format 'yyyy/MM/dd HH:mm:ss'
# This might be easier than figuring out regex patterns
'{0:dd/MMM/yyyy-HH.mm.ss}' -f $datetime

Powershell format date after .AddDays() method

I can't find a way to format a date after using .AddDays()
CODE
[datetime] $searchDate = '2020-01-10'
$searchDate = '{0:yyyy-MM-dd}' -f $searchDate.AddDays(1)
returns "Saturday, January 11, 2020 12:00:00 AM" while I'm looking for 2020-01-11
tl;dr
# NOTE: [datetime] must be on the RHS if you want to assign a different type later.
$searchDate = [datetime] '2020-01-10'
$searchDate = '{0:yyyy-MM-dd}' -f $searchDate.AddDays(1)
Of course, you can combine that into a single assignment:
$searchDate = '{0:yyyy-MM-dd}' -f ([datetime] '2020-01-10').AddDays(1)
Or, via Get-Date:
$searchDate = Get-Date ([datetime] '2020-01-10').AddDays(1) -Format yyyy-MM-dd
Your own solution simply bypasses the conceptual problem with your code, which Jeroen Mostert describes well in a comment on the question.
[datetime] $searchDate = '2020-01-10'
By placing the cast ([datetime]) to the left of the variable ($searchDate) in your assignment, you type-constrain it.
This means that any values assigned later are invariably and implicitly coerced (converted) to the specified type ([datetime], in this case).
Therefore, you mustn't use the same variable to assign your string representation of a date, obtained with the -f operator, as that string representation is automatically reconverted to [datetime].
That is, after executing
$searchDate = '{0:yyyy-MM-dd}' -f $searchDate.AddDays(1), $searchDate again contains a [datetime] instance, not the string of interest.
Another solution is to simply assign to a different variable, one that either isn't type-constrained or is constrained to [string].
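The effect of the type constraint can be seen side by side in a short sketch. The variable names $constrained and $unconstrained are illustrative only.

```powershell
# Cast on the LHS: the variable is type-constrained to [datetime].
[datetime] $constrained = '2020-01-10'
$constrained = '{0:yyyy-MM-dd}' -f $constrained.AddDays(1)
$constrained.GetType().Name      # DateTime: the string was converted back

# Cast on the RHS: only the value is converted, the variable stays unconstrained.
$unconstrained = [datetime] '2020-01-10'
$unconstrained = '{0:yyyy-MM-dd}' -f $unconstrained.AddDays(1)
$unconstrained.GetType().Name    # String
```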
Solved with Get-Date $searchDate -Format yyyy-MM-dd

Question regarding incrementing a string value in a text file using Powershell

Just beginning with Powershell. I have a text file that contains the string "CloseYear/2019" and looking for a way to increment the "2019" to "2020". Any advice would be appreciated. Thank you.
If the question is how to update text within a file, you can do the following, which will replace specified text with more specified text. The file (t.txt) is read with Get-Content, the targeted text is updated with the String class Replace method, and the file is rewritten using Set-Content.
(Get-Content t.txt).Replace('CloseYear/2019','CloseYear/2020') | Set-Content t.txt
Additional Considerations:
General incrementing would require an object type that supports incrementing. You can isolate the numeric data using -split, increment it, and create a new, joined string. This solution assumes working with 32-bit integers but can be updated to other numeric types.
$str = 'CloseYear/2019'
-join ($str -split "(\d+)" | Foreach-Object {
if ($_ -as [int]) {
[int]$_ + 1
}
else {
$_
}
})
Putting it all together, the following would result in incrementing all complete numbers (123 as opposed to 1 and 2 and 3 individually) in a text file. Again, this can be tailored to target more specific numbers.
$contents = Get-Content t.txt -Raw # Raw to prevent an array output
-join ($contents -split "(\d+)" | Foreach-Object {
if ($_ -as [int]) {
[int]$_ + 1
}
else {
$_
}
}) | Set-Content t.txt
Explanation:
-split uses regex matching to split on the matched result resulting in an array. By default, -split removes the matched text. Creating a capture group using (), ensures the matched text displays as is and is not removed. \d+ is a regex mechanism matching a digit (\d) one or more (+) successive times.
Using the -as operator, we can test that each item in the split array can be cast to [int]. If successful, the if statement will evaluate to true, the text will be cast to [int], and the integer will be incremented by 1. If the -as operator is not successful, the pipeline object will remain as a string and just be output.
The -join operator just joins the resulting array (from the Foreach-Object) into a single string.
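To make the explanation concrete, the sketch below first shows what -split with a capture group returns, then points out an edge case: because '0' -as [int] yields 0, which PowerShell treats as false, a lone 0 would not be incremented by the if test above. As one possible alternative (not part of the answer), [regex]::Replace with a script block as the match evaluator increments every run of digits, including 0.

```powershell
$str = 'CloseYear/2019'

# The capture group keeps the digits as their own array element.
$parts = $str -split '(\d+)'     # 'CloseYear/', '2019', ''

# Edge case: 0 is a valid integer but evaluates to $false in a condition.
$zeroIsTruthy = [bool] ('0' -as [int])

# Alternative: let a script block compute each replacement.
$increment = { param($m) [int] $m.Value + 1 }
$result     = [regex]::Replace($str, '\d+', $increment)
$zeroResult = [regex]::Replace('Rev/0', '\d+', $increment)
```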
AdminOfThings' answer is very detailed and the correct answer.
I wanted to provide another answer for options.
Depending on what your end goal is, you might need to convert the date to a datetime object for future use.
Example:
$yearString = 'CloseYear/2019'
#convert to datetime
[datetime]$dateConvert = [datetime]::new((($yearString -split "/")[-1]),1,1)
#add year
$yearAdded = $dateConvert.AddYears(1)
#if you want to display "CloseYear" with the new date and write-host
$out = "CloseYear/{0}" -f $yearAdded.Year
Write-Host $out
This approach would allow you to use $dateConvert and $yearAdded as a datetime allowing you to accurately manipulate dates and cultures, for example.

Culture based formatting of time variable

In this example it seems to me that the first two outputs should match, giving me formatting based on my defined culture. The last should be different because French formatting is different. Instead, the last two are the same, and are both getting some kind of default formatting. So, how do I do Culture based formatting when the time is a variable rather than formatting directly with Get-Date? It seems like it should be the same, but it's not.
get-date -format ((Get-Culture).DateTimeFormat.FullDateTimePattern)
$time = Get-Date
$pattern = 'FullDateTimePattern'
$formattedTime = $time -f (Get-Culture).DateTimeFormat.$pattern
Write-Host "$formattedTime"
$culture = New-Object system.globalization.cultureinfo('fr-FR')
$formattedTime = $time -f ($culture).DateTimeFormat.$pattern
Write-Host "$formattedTime"
The output I get is
July 9, 2019 11:22:01 AM
07/09/2019 11:22:01
07/09/2019 11:22:01
What I want to get is
July 9, 2019 11:26:46 AM
July 9, 2019 11:26:46 AM
Tuesday 9 July 2019 11:26:46
EDIT: So, based on I.T Delinquent's response, I tried this
$pattern = 'longDateTimePattern'
$date = Get-Date
$format = (Get-Culture).DateTimeFormat.$pattern
$string = ($date).ToString($format)
Write-Host $string
$culture = New-Object system.globalization.cultureinfo('de-DE')
$format = $culture.DateTimeFormat.$pattern
$string = ($date).ToString($format)
Write-Host $string
And it gave me identical results, because it's not 'longDateTimePattern', it's 'longDatePattern'. Given that the pattern could become a user-supplied string, I had better validate it.
Your attempt at using the -f operator is flawed (see bottom section).
To get the desired output, use the [datetime] type's appropriate .ToString() overload:
$time.ToString($culture.DateTimeFormat.$pattern, $culture)
Passing $culture as the 2nd argument ensures that the formatting is applied in the context of that culture.
If your intent is truly to use a format from another culture and apply it in the context of the current culture, simply omit the 2nd argument (as an alternative to the Get-Date -Format approach in your question):
$time.ToString($culture.DateTimeFormat.$pattern)
If there's no need to involve a different culture, the task becomes much simpler, by way of the standard date-time format strings, where single-character strings such as "D" refer to standard formats, such as LongDatePattern:
$time.ToString("D")
You can also pass these strings to Get-Date -Format:
Get-Date -Format D
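The standard format strings also combine with an explicit culture argument. The following sketch uses a fixed date so that the result is reproducible. The exact French text depends on the culture data installed on the system, so no output is shown for it.

```powershell
# Fixed timestamp instead of Get-Date, for reproducibility.
$time = [datetime]::new(2019, 7, 9, 11, 26, 46)

# Long date pattern ("D") rendered by two different cultures.
$frLong  = $time.ToString('D', [cultureinfo]::GetCultureInfo('fr-FR'))
$invLong = $time.ToString('D', [cultureinfo]::InvariantCulture)   # Tuesday, 09 July 2019
```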
As for what you tried:
In order for the -f operator to work correctly, your LHS must be a string template with placeholders ({0} for the first one, {1} for the second, ...), to be replaced with the RHS operands.
Using a simple example:
Format the RHS, an [int], as a number with 2 decimal places.
PS> '{0:N2}' -f 1
1.00
Therefore, $time -f (Get-Culture).DateTimeFormat.$pattern doesn't perform (explicit) formatting at all, because the LHS - $time - contains no placeholders.
That is, the RHS is ignored, and the LHS is returned as a string: It is effectively the same as calling $time.ToString() in the context of the invariant culture (because the result of applying the -f operator is always a string and PowerShell uses the invariant culture in many string-related contexts).
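A short sketch with a fixed date shows both cases: a LHS without placeholders, where the RHS has no effect, and a proper template. Variable names are illustrative.

```powershell
$time = [datetime]::new(2019, 7, 9, 11, 26, 46)

# No placeholder on the LHS: 'D' is ignored and $time is simply stringified
# with the invariant culture.
$ignored = $time -f 'D'                   # 07/09/2019 11:26:46

# With a placeholder, the format string inside it is applied.
$formatted = '{0:yyyy-MM-dd}' -f $time
```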
While you can incorporate a specific date-time format string into a template-string placeholder - by following the placeholder index with : and a format string, as shown above ({0:N2}) - you cannot also provide a culture context for it.
You'd have to (temporarily) switch to the desired culture first:
# Save the currently effective culture and switch to the French culture
$prev = [cultureinfo]::CurrentCulture
[cultureinfo]::CurrentCulture = 'fr-FR'
# Format with the desired format string.
"{0:$($culture.DateTimeFormat.$pattern)}" -f $time
[cultureinfo]::CurrentCulture = $prev
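If switching the current culture feels too invasive, one alternative that the answer does not mention is to call .NET's String.Format directly, which has an overload that accepts a format provider as its first argument. This is a sketch under that assumption. The invariant culture is used for the second call only because its output is predictable.

```powershell
$time    = [datetime]::new(2019, 7, 9, 11, 26, 46)
$culture = [cultureinfo]::GetCultureInfo('fr-FR')

# Same template syntax as -f, but with an explicit culture context.
$frText  = [string]::Format($culture, '{0:D}', $time)
$invText = [string]::Format([cultureinfo]::InvariantCulture, '{0:D}', $time)   # Tuesday, 09 July 2019
```

Unlike the save-and-restore approach, this leaves [cultureinfo]::CurrentCulture untouched, so nothing needs to be reset if an error occurs in between.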
I think this has something to do with how the Get-Date result is passed via the variable; it seems to lose the format capability. In fact, trying Write-Host ($date -Format $format) gives an error:
Unexpected token '$format' in expression or statement
Here are my setup variables:
$pattern = 'FullDateTimePattern'
$format = (Get-Culture).DateTimeFormat.$pattern
$date = Get-Date
As stated above, Write-Host ($date -f $format) incorrectly outputs 07/09/2019 12:24:38. However, any of the options below works and correctly outputs 09 July 2019 12:24:38:
Write-Host (Get-Date -Format $format)
Write-Host (Get-Date).ToString($format)
Write-Host ($date).ToString($format)
Hope this helps :)