Filtering file names with Get-Content - powershell

EDIT:
This seems to work. Hope it helps someone:
# Out-File sits after the loop, so every matching file is appended to the master file
# instead of each iteration overwriting it.
Get-ChildItem C:\Users\$env:username\Desktop\RD |
Where-Object { $_.Name -match '^[^8]*141\.txt$' } |
ForEach-Object { Get-Content -LiteralPath $_.FullName } |
Out-File C:\Users\$env:username\Desktop\RD\141Master.txt
I'm trying to work out filtering file names for a more explicit appending process. So I can do this:
Get-Content C:\erik\*.txt | Out-File C:\erik\whatever.txt
And all the text files append. Then I can do this:
Get-Content C:\erik\*101.txt | Out-File C:\erik\whatever.txt
And all the files with 101 in them append. But when I try something like this:
Get-Content C:\erik\^[^8]*141.txt | Out-File C:\erik\whatever.txt
I get:
Get-Content : An object at the specified path
C:\Users\edarling\Desktop\RD\^[^8]*141.txt does not exist, or has been
filtered by the -Include or -Exclude parameter. At line:1 char:1
+ Get-Content C:\Users\edarling\Desktop\RD\^[^8]*141.txt | Out-File C:\Users\edarl ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (System.String[]:String[]) [Get-Content], Exception
+ FullyQualifiedErrorId : ItemNotFound,Microsoft.PowerShell.Commands.GetContentCommand
I've been trying to pipe Get-ChildItem to Get-Content, but can't quite figure it out. Any suggestions out there?
Thanks

Get-Content supports only globbing, not regular expressions. With the former you can only do things like this:
match all file names that end with 141.txt: *141.txt
match all file names that begin with foo and end with a d or t: foo*[dt]
match all file names that consist of two arbitrary characters followed by the extension .doc: ??.doc
Globbing does not allow you to form an expression to match a name that does not contain particular characters. To Get-Content your expression ^[^8]*141.txt means "a file name that begins with a caret followed by either another caret or the character 8 and ends with 141.txt".
If you need to filter by regular expression you have to use the -match operator:
Get-ChildItem 'C:\some\folder' | ? { $_.Name -match '^[^8]*141\.txt$' }
Note that in regular expressions you need to escape dots if you want to match literal dots (\.). An unescaped dot matches any character except a line feed. You should also anchor your expression on both sides. Otherwise a regular expression ^[^8]*141.txt would match not only abc141.txt, but also something like 141_txt.doc.
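To see the effect of escaping and anchoring without touching the file system, here is a minimal sketch that applies both patterns to plain strings. The sample names are made up for illustration.

```powershell
# Hypothetical file names, used only to compare the two patterns.
$names = 'abc141.txt', '141_txt.doc', '8abc141.txt', 'x141.txt.bak'

# Unescaped dot, no end anchor: also accepts 141_txt.doc and x141.txt.bak.
$looseMatches = $names | Where-Object { $_ -match '^[^8]*141.txt' }

# Escaped dot, anchored at both ends: only names that end in 141.txt and contain no 8 before it.
$strictMatches = $names | Where-Object { $_ -match '^[^8]*141\.txt$' }

"Loose:  $($looseMatches -join ', ')"
"Strict: $($strictMatches -join ', ')"
```

The same strict pattern can then be used in the Where-Object filter on $_.Name.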

Related

Powershell Remove Text Between before the file extension and an underscore

I have a few hundred PDF files that have text in their file names which needs to be removed. Each file name has several underscores, depending on how long the name is. My goal is to remove the text between the last _ and the .pdf file extension, along with that last _.
For example I have:
AB_NAME_NAME_NAME_NAME_DS_123_EN_6.pdf
AC_NAME_NAME_NAME_DS_321_EN_10.pdf
AD_NAME_NAME_DS_321_EN_101.pdf
And would like the last underscore-delimited token (_6, _10, _101) to be removed to become:
AB_NAME_NAME_NAME_NAME_DS_123_EN.pdf
AC_NAME_NAME_NAME_DS_321_EN.pdf
AD_NAME_NAME_DS_321_EN.pdf
I am a novice at PowerShell, but I have done some research and found the question "Powershell - Rename filename by removing the last few characters" helpful. It doesn't get me exactly what I need, though, because I cannot hardcode the number of characters to remove: they may have different lengths (2-4).
Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName {$_.name.substring(0,$_.BaseName.length-3) + $_.Extension}
It seems like there may be a way to do this using .split or regex but I was not able to find a solution. Thanks.
You can use the LastIndexOf() method of the [string] class to get the index of the last instance of a character. In your case this should do it:
Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName { $_.BaseName.substring(0,$_.BaseName.lastindexof('_')) + $_.Extension }
Using the -replace operator with a regex enables a concise solution:
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '_[^_]+(?=\.)' } -WhatIf
-WhatIf previews the renaming operation. Remove it to perform actual renaming.
_[^_]+ matches a _ character followed by one or more non-_ characters ([^_])
If you wanted to match more specifically by (decimal) digits only (\d), use _\d+ instead.
(?=\.) is a look-ahead assertion ((?=...)) that matches a literal . (\.), i.e., the start of the filename extension without including it in the match.
By not providing a replacement operand to -replace, it is implicitly the empty string that replaces what was matched, which effectively removes the last _-prefixed token before the filename extension.
You can make the regex more robust by also handling file names with "double" extensions; e.g., the above solution would replace filename a_bc.d_ef.pdf with a.d.pdf, i.e., perform two replacements. To prevent that, use the following regex instead:
$_.Name -replace '_[^_]+(?=\.[^.]+$)'
The look-ahead assertion now ensures that only the last extension matches: a literal . (\.) followed by one or more (+) characters other than literal . ([^.], a negated character set ([^...])) at the end of the string ($).
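As a quick check of the difference between the two patterns, the following sketch runs both against plain strings rather than real files. The first name is shortened from the question's examples and the second is the "double extension" case discussed above.

```powershell
# Sample names: one regular case and one with a "double" extension.
$names = 'AB_NAME_DS_123_EN_6.pdf', 'a_bc.d_ef.pdf'

# Look-ahead for any literal dot: replaces before every dot.
$simple = $names | ForEach-Object { $_ -replace '_[^_]+(?=\.)' }

# Look-ahead for the last extension only: a single replacement per name.
$robust = $names | ForEach-Object { $_ -replace '_[^_]+(?=\.[^.]+$)' }

"Simple: $($simple -join ', ')"
"Robust: $($robust -join ', ')"
```

Both patterns give the same result for names with a single extension; they differ only for the second name.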
Just to show another alternative,
the part to remove from the Name is the last element of the BaseName split at _,
which is the negative index [-1] of the split result:
Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{$_.BaseName.Split('_')[-1]}
6
10
101
Because Split() drops the _ itself, it has to be prepended again in the pattern that gets removed:
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '_'+$_.BaseName.split('_')[-1] } -whatif
EDIT: a modified variant splits the BaseName at the underscore
without removing the splitting character, by using the -split operator with
a regex containing a zero-length lookahead:
> Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{($_.BaseName -split'(?=_\d+)')[-1]}
_6
_10
_101
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace ($_.BaseName -split'(?=_)')[-1] } -whatif

How to get Get-ChildItem to handle path with non-breaking space

I have the following code that works for most files. The input file (FoundLinks.csv) is a UTF-8 file with one file path per line. It is full paths of files on a particular drive that I need to process.
$inFiles = @()
$inFiles += @(Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv")
foreach ($inFile in $inFiles) {
Write-Host("Processing: " + $inFile)
$objFile = Get-ChildItem -LiteralPath $inFile
New-Object PSObject -Prop @{
FullName = $objFile.FullName
ModifyTime = $objFile.LastWriteTime
}
}
But even though I've used -LiteralPath, it continues to not be able to process files that have a non-breaking space in the file name.
Processing: q:\Executive\CLC\Budget\Co  2018 Budget - TO Bob (GA Prophix).xlsx
Get-ChildItem : Cannot find path 'Q:\Executive\CLC\Budget\Co  2018 Budget - TO Bob (GA Prophix).xlsx'
because it does not exist.
At ListFilesWithModifyTime.ps1:6 char:29
+ $objFile = Get-ChildItem <<<< -LiteralPath $inFile
+ CategoryInfo : ObjectNotFound: (Q:\Executive\CL...A Prophix).xlsx:String) [Get-ChildItem], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
I know my input file has the non-breaking space in the path because I'm able to open it in Notepad, copy the offending path, paste into Word, and turn on paragraph marks. It shows a normal space followed by a NBSP just before 2018.
Is PowerShell not reading in the NBSP? Am I passing it wrong to -LiteralPath? I'm at my wit's end. I saw this solution, but in that case they are supplying the path as a literal in the script, so I can't see how I could use that approach.
I've also tried: -Encoding UTF8 parameter on Get-Content, but no difference.
I'm not even sure how I can check $inFile in the code just to confirm if it still contains the NBSP.
Grateful for any help to get unstuck!
Confirmed that $inFile has NBSP
Thank you all! As per @TheMadTechnician, I have updated the code like this, and also reduced my input file to only the one file having a problem.
$inFiles = @()
$inFiles += @(Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" -Encoding UTF8)
foreach ($inFile in $inFiles) {
Write-Host("Processing: " + $inFile)
# list out all chars to confirm it has an NBSP
$inFile.ToCharArray()|%{"{0} -> {1}" -f $_,[int]$_}
$objFile = Get-ChildItem -LiteralPath $inFile
New-Object PSObject -Prop @{
FullName = $objFile.FullName
ModifyTime = $objFile.LastWriteTime
}
}
And so now I can confirm that $inFile in fact still contains the NBSP just as it gets passed to Get-ChildItem. Yet Get-ChildItem says the file does not exist.
More I've tried:
Same if I use Get-Item instead of Get-ChildItem
Same if I use -Path instead of -LiteralPath
Windows explorer and Excel can deal with the file successfully.
I'm on a Windows 7 machine, Powershell 2.
Thanks again for all the responses!
It's still unclear why Sandra's code didn't work: PowerShell v2+ is capable of retrieving files with paths containing non-ASCII characters; perhaps a non-NTFS filesystem with different character encoding was involved?
However, the following workaround turned out to be effective:
$objFile = Get-ChildItem -Path ($inFile -replace ([char] 0xa0), '?')
The idea is to replace the non-breaking space char. (Unicode U+00A0; hex. 0xa0) in the input file path with the wildcard character ?, which represents any single char.
For Get-ChildItem to perform wildcard matching, -Path rather than -LiteralPath must be used (note that -Path is actually the default if you pass a path argument positionally, as the first argument).
Hypothetically, the wildcard-based paths could match multiple files; if that were the case, the individual matches would have to be examined to identify the specific match that has a non-breaking space in the position of the ?.
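The substitution itself can be checked on a plain string, without access to the original drive. This is a minimal sketch: the path below is a shortened, hypothetical stand-in for the one in the question, and -like is used only to imitate the wildcard matching that -Path performs.

```powershell
$nbsp = [char]0xA0

# Hypothetical path containing a normal space followed by a non-breaking space.
$path = "Q:\Budget\Co $($nbsp)2018 Budget.xlsx"

# List the character codes to confirm that the NBSP (160) is really present.
$codes = $path.ToCharArray() | ForEach-Object { [int]$_ }
$hasNbsp = $codes -contains 160

# Replace the NBSP with the single-character wildcard ?.
$pattern = $path -replace $nbsp, '?'

# The wildcard pattern still matches the original path.
$matchesOriginal = $path -like $pattern

"Contains NBSP: $hasNbsp"
"Pattern matches original: $matchesOriginal"
```

The resulting $pattern is what would be passed to Get-ChildItem -Path in the workaround above.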
Get-ChildItem is for listing children so you would be giving it a directory, but it seems you are giving it a file, so when it says it cannot find the path, it's because it can't find a directory with that name.
Instead, you would want to use Get-Item -LiteralPath to get each individual item (this would be the same items you would get if you ran Get-ChildItem on its parent).
I think swapping in Get-Item would make your code work as is.
After testing, I think the above is in fact false, so sorry for that, but I will leave the below in case it's helpful, even though it may not solve your immediate problem.
But let's take a look at how it can be simplified with the pipeline.
First, you're starting with an empty array, then calling a command (Get-Content) which likely already returns an array, wrapping that in an array, then concatenating it to the empty one.
You could just do:
$inFiles = Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv"
Yes, there is a chance that $inFiles will contain only a single item and not an array at all.
But the nice thing is that foreach won't mind one bit!
You can do something like this and it just works:
foreach ($string in "a literal single string") {
Write-Host $string
}
But Get-Item (and Get-ChildItem for that matter) accept pipeline input, so they accept multiple items.
That means you could do this:
$inFiles = Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" | Get-Item
foreach ($inFile in $inFiles) {
Write-Host("Processing: " + $inFile)
New-Object PSObject -Prop @{
FullName = $inFile.FullName
ModifyTime = $inFile.LastWriteTime
}
}
But even more than that, there is a pipeline-aware cmdlet for processing items, called ForEach-Object, to which you pass a [ScriptBlock], in which $_ represents the current item, so we could do it like this:
Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" |
Get-Item |
ForEach-Object -Process {
Write-Host("Processing: " + $_)
New-Object PSObject -Prop @{
FullName = $_.FullName
ModifyTime = $_.LastWriteTime
}
}
All in one pipeline!
But further, you're creating a new object with the 2 properties you want.
PowerShell has a nifty cmdlet called Select-Object which takes an input object and returns a new object containing only the properties you want; this would make for a cleaner syntax:
Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" |
Get-Item |
Select-Object -Property FullName,LastWriteTime
This is the power of the pipeline passing real objects from one command to another.
I realize this last example does not write the processing message to the screen, however you could re-add that in if you wanted:
Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" |
Get-Item |
ForEach-Object -Process {
Write-Host("Processing: " + $_)
$_ | Select-Object -Property FullName,LastWriteTime
}
But you might also consider that many cmdlets support verbose output and try to just add -Verbose to some of your existing cmdlets. Sadly, it won't really help in this case.
One final note, when you pass items to the filesystem cmdlets via pipeline, the parameter they bind to is in fact -LiteralPath, not -Path, so your special characters are still safe.
I just ran into the same issue. It looks like Get-ChildItem (aka gci) expects the path in Unicode (UTF-16). So either convert the CSV file to Unicode, or convert the lines that contain the path to Unicode within your script.
Tested on PS 5.1.22621.608

Getting an Error when I try to change file name in PowerShell

I found similar commands to these online. I want to replace the parentheses in my file names with either a space or an empty string.
The files I'm trying to change look like the following:
Nehemiah (1).mp3
Nehemiah (2).mp3
Nehemiah (11).mp3
Really I'd like them to look like the following:
Nehemiah 01.mp3
Nehemiah 02.mp3
Nehemiah 11.mp3
Here are the scripts I've tried.
Dir | Rename-Item –NewName { $_.name –replace “(“,”” }
Dir *.mp3 | rename-item -newname { $_.name -replace " ("," " }
Neither of these work.
Here is the error message I'm getting.
Rename-Item : The input to the script block for parameter 'NewName'
failed. The regular expression pattern ( is not valid. At line:1
char:34
+ Dir *.mp3 | rename-item -newname { $_.name -replace " ("," " }
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (C:\Users...ehemiah (1).mp3:PSObject) [Rename-Item], ParameterBindingException
+ FullyQualifiedErrorId : ScriptBlockArgumentInvocationFailed,Microsoft.PowerShell.Commands.RenameItemCommand
As you have seen in the comments from Mathias R Jessen, -replace supports regular expressions and you need to account for that. The static method Escape can help by automatically escaping regex metacharacters in strings, so that they stay readable and behave the way they look.
I did manage to find this on MSDN that talks about the characters that are escaped using [regex]::escape():
Escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes. This instructs the regular expression engine to interpret these characters literally rather than as metacharacters.
However since you don't actually need to be using regex here I would suggest you use the string method .replace() as that will accomplish the same without the extra overhead.
Get-ChildItem "*.mp3" | Rename-Item -NewName {($_.name).Replace(" ("," ")}
Character padding changes things a little though. I would opt for regex there, since it is a lot easier than splitting strings and putting them back together.
Get-ChildItem "*.mp3" |
Where-Object{$_.name -match "\((\d+)\)"} |
Rename-Item -NewName {
[regex]::Replace($_.Name,"\((\d+)\)",{param($match) ($match.Groups[1].Value).PadLeft(2,"0")})
}
Thanks to PetSerAl for helping with this backreference replacement solution.
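To try the padding logic without renaming anything, here is a minimal sketch that runs the same [regex]::Replace call on plain strings. The names are the ones from the question; the variable names are only illustrative.

```powershell
# File names from the question, as plain strings (no files are touched).
$names = 'Nehemiah (1).mp3', 'Nehemiah (2).mp3', 'Nehemiah (11).mp3'

$renamed = foreach ($name in $names) {
    # The script block receives each regex match and returns its replacement:
    # the captured digits, left-padded with zeros to two characters.
    [regex]::Replace($name, '\((\d+)\)', { param($m) $m.Groups[1].Value.PadLeft(2, '0') })
}

$renamed
```

Once the output looks right, the same call can go inside the -NewName script block, ideally with -WhatIf on Rename-Item for a first dry run.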
I'm sure this can probably be done in a simpler manner, but it works for me.
[regex]$rxp = '\((\d+)\)'; gci | ?{$_.Name -match $rxp} | %{ ren $_.FullName ($_.name -replace $rxp, $matches[1].padLeft(2,'0')) }
Here's a breakdown:
# Define regex. The outer parentheses are literal. The inner are a
# capture group, capturing the track number in the filename.
[regex]$rxp = '\((\d+)\)'
# Get-ChildItems. For each child item that matches the regex...
gci | ?{$_.Name -match $rxp} | %{
# Rename the file, replacing the offending regex match with the match
# obtained from the where-object selector (the ? block)
ren $_.FullName ($_.name -replace $rxp, $matches[1].padLeft(2,'0'))
}
PowerShell doesn't make it easy to massage the data captured by a backreference. However, the -match operator populates the automatic $matches variable (a hashtable of the captured groups), which can be manipulated more easily.
This makes it possible not only to remove the parentheses, but also to zero pad your single-digit numbers.
Try this.
ls | Where {$_.Name -match '-'} | Rename-Item -NewName { $_.Name -replace '-','_' }
I'm listing every file in the directory that matches (contains) a dash - then I pipe that into Rename-Item with new-name replacing the - for an underscore _
PS: The bonus is, I can quickly put back the dash from the underscore.

Iterate & Search for a string in child items

I am trying to build a PowerShell script that iterates through a list of files, searches for a match, and removes it. I'm not having much luck. Here is my script:
$path = "D:\Test\"
$filter = "*.txt"
$files = Get-ChildItem -path $path -filter $filter
foreach ($item in $files)
{
$search = Get-content -path $path$item
$search| select-string -pattern "T|"
}
At the moment the script is just returning the whole content of the file and not the select string.
Basically, each file in the folder has a trailer record at the end, e.g. T|1410. I need to iterate through all the files and delete that last line. Some of these files will be 200 MB or more. Can someone guide me, please?
I've edited my script and now I am using the following method.
$path = "D:\Test\"
$filter = "*.txt"
$files = Get-ChildItem -path $path -filter $filter
foreach ($item in $files)
{
$search = Get-content $path$item
($search)| ForEach-Object { $_ -replace 'T\|[0-9]*', '' } | Set-Content $path$item
}
I am using Powershell v.2
However, this adds a new empty line at the end of the file and leaves the replaced line empty instead of removing it. How can I avoid this, and how can I start the search from the bottom of the file?
-pattern "T|"
That pattern matches a "T" or nothing. But there is nothing between every pair of characters in any string, so every line matches. To avoid the usual regular expression handling of | as the alternation separator, use a backslash to match a literal |:
-pattern "T\|"
Alternately, use Select-String's -SimpleMatch switch to stop the argument to -Pattern being treated as a regular expression.
As Richard mentioned, you have to escape the | character.
You could also use the regex::escape function for that:
[regex]::Escape("T|")
Aside from escaping the characters the other option you have available is the -SimpleMatch switch. From TechNet
Uses a simple match rather than a regular expression match. In a simple match, Select-String searches the input for the text in the Pattern parameter. It does not interpret the value of the Pattern parameter as a regular expression statement.
If you don't want to have to worry about escaping the characters and are not using regex this would be the way to go.
$search | select-string -pattern "T|" -SimpleMatch
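The three variants can be compared side by side on a few in-memory lines. This is a minimal sketch; the sample lines are invented to resemble a file with a T| trailer record.

```powershell
# Hypothetical file content: header, data, and trailer record.
$lines = 'H|header', 'D|some data', 'T|1410'

# Unescaped: "T or nothing" matches every line.
$unescaped = @($lines | Select-String -Pattern 'T|')

# Escaped pipe: only lines containing a literal "T|".
$escaped = @($lines | Select-String -Pattern 'T\|')

# -SimpleMatch: the pattern is treated as plain text, no escaping needed.
$simple = @($lines | Select-String -Pattern 'T|' -SimpleMatch)

"Unescaped: $($unescaped.Count) line(s)"
"Escaped:   $($escaped.Count) line(s)"
"Simple:    $($simple.Count) line(s)"
```

Each result is a MatchInfo object; its Line property holds the matching text.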

PowerShell: String replacement doesn't work every time?

I am writing a script to prepare a csv for uploading it into SQL.
I have two almost identical commands (1 and 3) but the first doesn't work, the third one does. There is no error, it just doesn't do what I expect it to do.
I do 3 steps, first I get rid of all the pipes in the source file (this is what does not work), then I read and write the CSV (this gets rid of the different handling of using quotes and not using quotes and it changes the delimiter to pipes). Lastly I get rid of all quotations, since there should be no more pipes other than the delimiters.
# ---------------------------------------------------------
# 1. Get rid of all pipes
#----------------------------------------------------------
get-content ($csvfile + ".csv") | ForEach-Object { $_ -replace "|",""} | Set-Content ($csvfile + "2.csv")
# ---------------------------------------------------------
# 2. Make standard CSV but use Pipes as delimiter
#----------------------------------------------------------
Import-csv -path ($csvfile + "2.csv") -Delimiter ',' | Export-CSV -path ($csvfile + "3.csv") -Delimiter '|'
# ---------------------------------------------------------
# 3. Get rid of all Quotes
#----------------------------------------------------------
get-content ($csvfile + "3.csv") | ForEach-Object { $_ -replace '"',""} | Set-Content ($csvfile + "4.csv")
The two "get rid of" steps are the same. Step 3 works: it removes all the quotes. Step 1 does not: the pipes are still in. I tried different characters, but for some reason none works at this position.
What am I missing?
Thank You!
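Just escape the | symbol with \, as in \|, and it will work fine. -replace takes a regular expression, and | is a regex metacharacter (the alternation operator), so an unescaped "|" matches only the empty string and nothing is removed.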
get-content ($csvfile + ".csv") | ForEach-Object { $_ -replace "\|",""} |
Set-Content ($csvfile + "2.csv")
So if I have input like dadad|xcvxvv|sdffgfg then after the command the output would look like dadadxcvxvvsdffgfg
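The behaviour is easy to reproduce on the sample string alone. The sketch below compares the unescaped pattern, the escaped pattern, and the literal String.Replace() method, which does not use regular expressions at all.

```powershell
$line = 'dadad|xcvxvv|sdffgfg'

# Unescaped | is regex alternation between two empty patterns: nothing is removed.
$unchanged = $line -replace '|', ''

# Escaped \| matches the literal pipe character.
$viaRegex = $line -replace '\|', ''

# String.Replace() treats its arguments literally, so no escaping is needed.
$viaMethod = $line.Replace('|', '')

"Unescaped:  $unchanged"
"Escaped:    $viaRegex"
"Replace():  $viaMethod"
```

Step 3 in the question works without escaping because the double quote has no special meaning in a regular expression.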