PowerShell concatenate output of Get-ChildItem

I have a working script that searches for files with a regular expression. The script returns 2 lines per file: the parent folder name, and the filename (matching the regex).
Get-ChildItem -Path "D:\test\" -Recurse -File |
Where-Object { $_.BaseName -match '^[0-9]+$' } |
ForEach-Object { $_.FullName -Split '\',-3,'SimpleMatch' } |
select -last 2 |
Out-File "D:\wim.txt"
A certain system needs the output on one line, joined with, for example, \ or a similar character. How can I achieve this, please?
Many thanks !

Get-ChildItem -Path D:\test -Recurse -File |
Where-Object { $_.BaseName -match '^[0-9]+$' } |
ForEach-Object { ($_.FullName -split '\\')[-2,-1] -join '\' } |
Out-File D:\wim.txt
($_.FullName -Split '\\')[-2,-1] extracts the last 2 components from the file path
and -join '\' joins them back together.
Note that, aside from the line-formatting issue, your original command does not work as intended, because | select -last 2 is applied to the overall output, not per matching file; thus, even if there are multiple matching files, you'll only ever get the parent directory and filename of the last matching file.
The command above therefore extracts the last 2 \-separated path components inside the ForEach-Object block, directly on the result of the -split operation, so that 2 (joined) components are returned per file.
As an aside, the -3 in $_.FullName -split '\', -3, 'SimpleMatch' does not extract the last 3 tokens. In Windows PowerShell, a negative <Max-substrings> value is effectively treated the same as 0, meaning that all resulting tokens are returned. Given that -split defaults to using regexes, where a literal \ must be escaped as \\, $_.FullName -split '\', -3, 'SimpleMatch' is therefore the same as $_.FullName -split '\\', which is what the solution above uses.
Note that a green-lit -split enhancement gives negative <Max-substrings> values new meaning, applying the positive-number logic analogously to the end of the input string; e.g., -3 means: return the last 2 components plus whatever is left of the input string before them (with the resulting tokens still reported from left to right). This behavior is available in PowerShell 7 and later.
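To see the per-file split-and-join in isolation, here is a minimal sketch that runs the same expression on plain strings instead of Get-ChildItem output. The sample paths are invented for illustration.

```powershell
# Hypothetical sample paths standing in for $_.FullName values.
$samplePaths = 'D:\test\alpha\123.txt', 'D:\test\beta\gamma\456.txt'

# Split each path on a literal backslash (escaped as \\ for the regex),
# keep the last two components, and join them with a backslash again.
$joined = $samplePaths | ForEach-Object { ($_ -split '\\')[-2,-1] -join '\' }

$joined   # one line per path: alpha\123.txt and gamma\456.txt
```

Because the indexing happens inside the ForEach-Object block, every input path yields exactly one output line.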

Related

How to remove last X number of character from file name

Looking for help writing a script that will remove a specific number of characters from the end of a file name. In my specific dilemma, I have dozens of files with the following format:
1234567 XKJDFDA.pdf
5413874 KJDFSXZ.pdf
... etc. etc.
I need to remove the last 7 alpha characters to leave the 7 digits standing as the file name. Through another posted question I was able to find a script that would remove the first X number of digits from the beginning of the file name but I'm having an incredibly difficult time modifying it to remove from the end:
get-childitem *.pdf | rename-item -newname { [string]($_.name).substring(x) }
Any and all relevant help would be greatly appreciated.
Respectfully,
$RootFolder = '\\server.domain.local\share\folder'
Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' |
Where-Object { $_.psIsContainer -eq $false } | # No folders
ForEach-Object {
if ($_.Name -match '^(?<BeginningDigits>\d{7})\s.+\.pdf$' ) {
$local:newName = "$($Matches['BeginningDigits'])$($_.Extension)"
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
}
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
If the file name matches "<FilenameBegin><SevenDigits><Space><Something>.pdf<FilenameEnd>", it is renamed to "<SevenDigits>.<KeepExtension>". This uses regular expressions with named capture groups (<BeginningDigits> is the group name). Note that, because of the regex, this is the most CPU-intensive of the algorithms shown here; for a one-time run or a small number of files that does not matter. If you have many files, I'd recommend adding Where-Object { $_.BaseName.Length -gt 7 } | before the ForEach-Object block that contains if (.. -match ..), so that files whose base name is too short are filtered out before the regex check (a string-length check is cheaper than a regex match). You can also remove \.pdf from the regex, because Get-ChildItem already filters on that extension.
If you strictly need match "<7digits><space><7alpha>.pdf", you should replace RegExp expression with '^(?<BeginningDigits>\d{7})\s[A-Z]{7}\.pdf$'
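As a rough illustration of the named capture group, the sketch below applies the strict pattern to plain strings rather than to files. The third name is a made-up non-matching example, and nothing is renamed.

```powershell
# Strict pattern from the text: 7 digits, one whitespace character, 7 letters, then .pdf
$strictPattern = '^(?<BeginningDigits>\d{7})\s[A-Z]{7}\.pdf$'

# Sample names; 'notes.pdf' is a hypothetical file that should be skipped.
$names = '1234567 XKJDFDA.pdf', '5413874 KJDFSXZ.pdf', 'notes.pdf'

$newNames = foreach ($name in $names) {
    if ($name -match $strictPattern) {
        # $Matches is populated by -match; the named group holds the 7 digits.
        "$($Matches['BeginningDigits']).pdf"
    }
}

$newNames   # 1234567.pdf and 5413874.pdf
```

Note that -match is case-insensitive by default, so [A-Z] also accepts lowercase letters; -cmatch would make the check case-sensitive.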
$RootFolder = '\\server.domain.local\share\folder'
@( Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' ) |
Where-Object { $_.psIsContainer -eq $false } | # No folders
Where-Object { $_.BaseName.Length -gt 7 } | # Only files whose base name (name without extension) is longer than 7 characters
ForEach-Object {
$local:newName = [string]::Join('', $_.BaseName.ToCharArray()[0..6] ) + $_.Extension # First 7 characters, keeping the extension
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
Alternative: using string split-join. Renames all files whose name without extension is longer than 7 characters to the first 7 characters (regardless of whether they are digits), keeping the extension.
This is an inefficient algorithm, because Substring is faster; it merely helps with learning subarray selection using [x..y].
Note that we check that the string length is > 7 in Where-Object { $_.BaseName.Length -gt 7 } before using [x..y]. Otherwise we can hit an error when the name is shorter than 7 characters and we try to take the 7th character.
$RootFolder = '\\server.domain.local\share\folder'
@( Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' ) |
Where-Object { $_.psIsContainer -eq $false } | # No folders
Where-Object { $_.BaseName.Length -gt 7 } | # Only files whose base name (name without extension) is longer than 7 characters
ForEach-Object {
$local:newName = $_.BaseName.Substring(0,7) + $_.Extension # First 7 characters, keeping the extension
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
Alternative: using Substring. Renames all files whose name without extension is longer than 7 characters to the first 7 characters (regardless of whether they are digits), keeping the extension.
.Substring(0,7) # 0 - position of the first character, 7 - how many characters to take. Note that we check that the string length is > 7 in Where-Object { $_.BaseName.Length -gt 7 } before using Substring. Otherwise we can hit an error when the name is shorter than 7 characters.
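The length check can also be folded into a small helper. The sketch below is only an illustration: Get-ShortBaseName is a hypothetical function name, not something from the answer above, and it works on plain strings rather than files.

```powershell
# Hypothetical helper: returns the first $Length characters of a base name,
# or the name unchanged when it is not longer than $Length.
function Get-ShortBaseName {
    param(
        [string]$BaseName,
        [int]$Length = 7
    )
    if ($BaseName.Length -gt $Length) {
        # Substring(startIndex, length) is only called when enough characters exist.
        $BaseName.Substring(0, $Length)
    }
    else {
        $BaseName
    }
}

$short = Get-ShortBaseName '1234567 XKJDFDA'   # long enough: truncated to 7 characters
$kept  = Get-ShortBaseName '12345'             # too short: returned unchanged
```

Calling Substring(0,7) directly on '12345' would throw an ArgumentOutOfRangeException, which is what the guard avoids.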
A much simpler alternative to PowerShell is using Command Prompt. If your filenames are along the lines of "00001_01.jpg", "00002_01.jpg", "00003_01.jpg", you can use the following command:
ren ?????_01.jpg ?????.jpg
where the number of ? matches the first part of the filename that you want to keep.
You can read more about this and other Command Prompt methods of batch renaming files in Windows at this useful website.
EDIT: edited to preserve the extension
There's another substring() method that takes 2 args, startIndex and length. https://learn.microsoft.com/en-us/dotnet/api/system.string.substring?view=netframework-4.8
'hithere'.substring
OverloadDefinitions
-------------------
string Substring(int startIndex)
string Substring(int startIndex, int length)
Thus, to delete a total of 8 characters from the right of the basename, including the space:
get-childitem *.pdf | rename-item -newname { $_.basename.substring(0,
$_.basename.length-8) + $_.extension } -whatif
What if: Performing the operation "Rename File" on target
"Item: /Users/js/1234567890.pdf
Destination: /Users/js/12.pdf".
You can write a quick bash function for this.
function fileNameShortener() {
mv "$1" "${1:0:4}"
}
This takes the first argument, which is stored in $1, and creates a substring that starts at index 0 and is 4 characters long. This is then used in the mv command to move the initial file to the new filename. To generalise further:
function fileNameShortener2() {
mv "$1" "${1:$2:$3}"
}
Here the starting point and the length of the substring are given as the second and third function arguments.
fileNameShortener2 fileName.txt 0 -5
This sample would remove the last 5 characters from the filename (a negative length is counted back from the end of the string, which requires bash 4.2 or later).

Rename file - delete all characters AFTER 2nd underscore

I need to replace the time\date stamp that's included in the filename after 2nd underscore (needs to be in the same format yyyyMMddHHmmss)
example file: 123456_123456_20190716163001.xml
sometimes the file in question gets created with an additional character which invalidates the file, in this case I need to replace this with the current timestamp.
example: 123456_123456_current Timestamp here.xml
The file should never exceed 32 characters(including extension)
I found a script but it deletes everything after the 1st underscore not the 2nd and I'm struggling to find a way to replace the text with the current timestamp.
Get-ChildItem c:\test -Filter 123456_123456*.xml | Foreach-Object -Process {
$NewName = [Regex]::Match($_.Name,"^[^_]*").Value + '.xml'
$_ | Rename-Item -NewName $NewName
}
timestamp after 2nd underscore to be updated to the current timestamp if original file exceeds 32 characters
123456_123456_current Timestamp here.xml
this takes advantage of the way a [fileinfo] object is structured. the .BaseName is easy to get to & use .Split() on. then one can use -join to put it back into one basename & finally add the extension onto the basename.
# fake reading in a file info object
# in real life, use Get-ChildItem or Get-Item
$FileObject = [System.IO.FileInfo]'123456_123456_current Timestamp here.xml'
$NewName = -join (($FileObject.BaseName.Split('_')[0,1] -join '_'), $FileObject.Extension)
$NewName
output = 123456_123456.xml
Sticking with the regex theme, you can do the following:
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {$_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"}
If duplicate file names are a concern, you can build in an increment to $CurrentTime.
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"
if (test-path $NewName) {
$CurrentTime = [double]$CurrentTime + 1
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"
}
$NewName
}
Explanation:
$RegexReplace contains the regex expression that will need to be matched for the ideal rename operation to happen. The regex mechanisms are explained below:
.*?_.*?_: Matches a minimal number of characters (lazy matching) followed by an underscore and then another minimal number of characters followed by an underscore.
.*: Greedily matches any characters
\.: Literally matches the dot character (.).
(): The parentheses here represent capture groups with the first set being 1 and the second set being 2. These are later referenced as ${1} and ${2} in the -replace operation.
Since Rename-Item -NewName supports delay-bind script blocks, we can just pipe Get-ChildItem output directly to it. The current pipeline object is $_.
The -replace operation uses the variable $CurrentTime, which must be expanded for the rename to succeed. For that reason, we use double quotes around the replacement string. Since we do not want PowerShell to expand the capture-group references ${1} and ${2} as variables, we backtick-escape their $ characters, so that the regex engine receives them literally.
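The replacement can be tried on a plain string first. This is a minimal sketch: the file name with a stray trailing character and the fixed timestamp are sample values chosen so that the result is reproducible.

```powershell
# Fixed sample value; real code would use: Get-Date -Format 'yyyyMMddHHmmss'
$CurrentTime = '20240101120000'
$RegexReplace = '(.*?_.*?_).*(\..*)'

# Hypothetical invalid name: one extra character after the timestamp.
$oldName = '123456_123456_20190716163001x.xml'

# `$ keeps PowerShell from expanding ${1} and ${2}; $CurrentTime is expanded.
$NewName = $oldName -replace $RegexReplace, "`${1}$CurrentTime`${2}"

$NewName   # 123456_123456_20240101120000.xml
```

The result is 32 characters long, which fits the limit mentioned in the question.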

Retain initial characters in file names, remove all remaining characters using powershell

I have a batch of files with names like: 78887_16667_MR12_SMITH_JOHN_713_1.pdf
I need to retain the first three sets of numbers and remove everything between the third "_" and "_1.pdf".
So this: 78887_16667_MR12_SMITH_JOHN_713_1.pdf
Becomes this: 78887_16667_MR12_1.pdf
Ideally, I'd like to be able to just use the 3rd "_" as the break as the third set of numbers sometimes includes 3 characters, sometimes 4 characters (like the example) and other times, 5 characters.
If I used something like this:
Get-ChildItem Default_*.pdf | Rename-Item -NewName {$_.name -replace...
...and then I'm stuck: can I state that everything between the 3rd "_" and the 6th "_" should be replaced with "" (nothing)? My understanding is that I'd also include ".Extension" to keep the extension.
You can use the -split operator to split your name into _-separated tokens, extract the tokens of interest, and then join them again with the -join operator:
PS> ('78887_16667_MR12_SMITH_JOHN_713_1.pdf' -split '_')[0..2 + -1] -join '_'
78887_16667_MR12_1.pdf
0..2 extracts the first 3 tokens, and -1 the last one (you could write this array of indices as 0, 1, 2, -1 as well).
Applied in the context of renaming files:
Get-ChildItem -Filter *.pdf | Rename-Item -NewName {
($_.Name -split '_')[0..2 + -1] -join '_'
} -WhatIf
Common parameter -WhatIf previews the rename operation; remove it to perform actual renaming.
mklement0 has given you a good and working answer. Here is another way to do it using a regex.
Get-ChildItem -Filter *.pdf |
ForEach-Object {
if ($_.Name -match '(.*?_.*?_.*?)_.*(_1.*)') {
Rename-Item -Path $_.FullName -NewName $($Matches[1..2] -join '') -WhatIf
}
}
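To check what the two capture groups contain before renaming anything, the regex can be run against the sample name from the question as a plain string. This is only a sketch of the matching step.

```powershell
$name = '78887_16667_MR12_SMITH_JOHN_713_1.pdf'

if ($name -match '(.*?_.*?_.*?)_.*(_1.*)') {
    # Group 1: the first three _-separated tokens; group 2: the trailing _1 plus extension.
    $prefix  = $Matches[1]
    $suffix  = $Matches[2]
    $newName = $Matches[1..2] -join ''
}

$newName   # 78887_16667_MR12_1.pdf
```

The lazy .*? quantifiers keep group 1 as short as possible, while the greedy .* in the middle consumes everything up to the last "_1".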

Splitting in Powershell

I want to be able to split some text out of a txtfile:
For example:
Brackets#Release 1.11.6#Path-to-Brackets
Atom#v1.4#Path-to-Atom
I just want to have the "Release 1.11.6" part. I am doing a where-object starts with Brackets but I don't know the full syntax. Here is my code:
"Get-Content -Path thisfile.txt | Where-Object{$_ < IM STUCK HERE > !
You could do this:
((Get-Content thisfile.txt | Where-Object { $_ -match '^Brackets' }) -Split '#')[1]
This uses the -match operator to filter out any lines that don't start with Brackets (the ^ special regex character indicates that what follows must be at the beginning of the line). Then it uses the -Split operator to split those lines on # and then it uses the array index [1] to get the second element of the split (arrays start at 0).
Note that this will throw an error if the split on # doesn't return at least two elements and it assumes that the text you want is always the second of those elements.
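If lines with fewer than two fields are possible, one option is to check the element count before indexing. The sketch below uses an in-memory array of sample lines in place of Get-Content thisfile.txt.

```powershell
# Sample lines standing in for the contents of thisfile.txt.
$lines = 'Brackets#Release 1.11.6#Path-to-Brackets', 'Atom#v1.4#Path-to-Atom'

$release = foreach ($line in $lines) {
    if ($line -match '^Brackets') {
        $parts = $line -split '#'
        # Only index into the result when a second element actually exists.
        if ($parts.Count -ge 2) { $parts[1] }
    }
}

$release   # Release 1.11.6
```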
$bracketsRelease = Get-Content -path thisfile.txt | foreach-object {
if ( $_ -match 'Brackets#(Release [^#]+)#' )
{
$Matches[1]
}
}
or
(select-string -Path file.txt -Pattern 'Brackets#(Release [^#]+)#').Matches[0].Groups[1].value

PowerShell - lowercase text between two characters

I have a lot of .txt files where I need to lowercase content in between two characters - after "%" and before ";".
The code below makes all content in the files lowercase and I need it to only do it in all instances between the two characters as mentioned.
$path=".\*.txt"
Get-ChildItem $path -Recurse | foreach{
(Get-Content $_.FullName).ToLower() | Out-File $_.FullName
}
Here is an example using regex replace with a callback function to perform the lowercasing:
$path=".\*.txt"
$callback = { param($match) $match.Groups[1].Value.ToLower() }
$rex = [regex]'(?<=%)(.*)(?=;)'
Get-ChildItem $path -Recurse | ForEach-Object {
$rex.Replace((Get-Content $_ -raw), $callback) | Out-File $_.FullName
}
Explanation:
The regex uses a positive lookbehind to find the position of % and a lookahead for the position of ; and captures everything in between in a group.
The captured group gets passed to the callback function, which invokes ToLower() on it.
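The callback can be tested on a plain string before touching any files. In the sketch below, the sample text is invented, and the lazy pattern (.*?) is a variation not taken from the answer, shown because the greedy (.*) spans from the first % to the last ; on a line when several pairs occur.

```powershell
$callback = { param($match) $match.Groups[1].Value.ToLower() }

# Hypothetical sample line with two %...; pairs.
$sample = 'SET %FOO;BAR %BAZ;END'

# Greedy pattern from the answer: one match from the first % to the last ;
$greedy = [regex]'(?<=%)(.*)(?=;)'
$greedyResult = $greedy.Replace($sample, $callback)

# Lazy variant (an assumption, not the answer's pattern): one match per pair.
$lazy = [regex]'(?<=%)(.*?)(?=;)'
$lazyResult = $lazy.Replace($sample, $callback)

$greedyResult   # SET %foo;bar %baz;END
$lazyResult     # SET %foo;BAR %baz;END
```

Which variant is appropriate depends on whether a line can contain more than one % and ; pair.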