PowerShell `Select-String` selecting some but not others - powershell

I have a folder containing Excel files in .xls and .xlsx formats. I have verified that the strings exist in some of the files. When I run the code below, I can find some string patterns but not others. For example, I can find 'randomword' but not '5625555555' or 'P-888452'. When I run -NotMatch on '5625555555' or 'P-888452', I do get a list of file names that do not match (although they are returned duplicated across many rows), so I know the pattern is registering. What could be happening here? Why does it work with some strings (mostly letters, it seems) but not with others (those that contain integers)?
gci "path" -Filter "*.xls" -Recurse -File | Select-String '\bANYTEXTORINT\b' | Select FileName
I also do not get an error when I run the code. The command simply completes and returns to the prompt with no results. I do get results for 'randomword', though: three files that contain that pattern are returned.

Related

Rename files in a folder using powershell, keeping the start and end of string in original filename

Currently trying to create a script that renames specific files within a chosen folder so that the resulting renamed files look like the following:
Original Filename: 45.09 - WrapperA12_rev1.DXF
Resultant Filename: 45.09_1.DXF
So the rev number is included as a suffix to the base filename, the extension is kept and the first 5 characters of the filename is kept (including the ".").
I can get fairly close by removing the hyphens, spaces and letters from the original filename using the -replace argument, but the resultant filename using the example above would be "45.0912_1", where the file extension is ".0912_1". This makes sense, but any attempt I've made to append the file extension (".DXF") to the filename hasn't worked.
$listdxf=gci -path $pathfolder -Filter *.DXF | Select-Object
$prenameDXF=$listdxf|rename-item -WhatIf -newname {$_.name -replace('[a-z]') -replace('-') -replace('\s','')}
$prenameDXF
Any feedback on how I would go about doing this would be greatly appreciated.
For further clarification; the original filenames will always have the 4 numbers and the dot at the start of the filename - these need to be kept for the output name, the only other number I want is the number at the end of the filename that will always refer to the revision number, however this number may be variable (i.e; it could be 0 or 0.1,1,1.1 etc.). The Rev number will ALWAYS follow the underscore in the original filename. All other numbers and letters etc. in the original filename need to be removed. I'm assuming the solution might include assigning a variable to just return the first 4 numbers (i.e; XX.XX) as a substring maybe, while assigning a variable to the last few characters that follow the "_". Then maybe combine the two and add the ".DXF" file extension.
LATEST UPDATE: Following the responses here, I've been able to get the functionality nearly exactly where I need it to be.
I've been using the regex provided below, and with some slight changes adapted it to allow for some other things (to allow for spaces after "rev" and to allow for the rev number to be separated by a dot if present, i.e; rev1.1 etc.), but currently struggling to find a way of simply returning "0" if no "rev" is present in the file name. For example, if a filename is as follows: 31.90 - SADDLE SHIM.DXF - I wish for the rename regex script to return 31.90_0. The expression I'm currently using is as follows: '(\d{2}\.\d{2}).*?rev(\s?\d+\.\d+|\s?\d+).*(?=\.DXF)', '$1_$2'
I have tried putting a pipeline (if) after the capture block following the "rev" and then putting (0) in a new capture block, but that's not working. Any feedback on this would be greatly appreciated. Thanks again for the replies.
It looks like this regex could do the trick to rename your files with your desired format: (?<=\.\d+)\s.+(?=_rev)|rev.
Get-ChildItem -Filter *-*_rev*.dxf |
Rename-Item -NewName { $_.Name -replace '(?<=\.\d+)\s.+(?=_rev)|rev' }
However, the above assumes that every filename starts with some digits, followed by a dot, followed by more digits (not necessarily 5 characters in total, including the dot), and that a whitespace character follows those digits. It also assumes that the name ends with _rev followed by the revision digits, just before the .dxf extension.
This regex could work too: (?<=^[\d.]{5})\s.+(?=_rev)|rev. However, this one only matches when the filename starts with exactly 5 characters that are digits or dots.
Per your update, you could try using switch with the -regex option. $Matches will contain the matches and you can reference the match groups by using the group number as the key (e.g. $Matches[1]). You may also reference as a property (e.g., $Matches.1)
Get-ChildItem c:\temp\powershell\testrename -File |
    Rename-Item -NewName {
        switch -Regex ($_.Name) {
            '(\d{2}\.\d{2}).*?rev(\s?\d+\.\d+|\s?\d+).*(?=\.DXF)' {
                "$($Matches.1)_$($Matches.2).DXF"
                break
            }
            '(\d{2}\.\d{2}).*(?=\.DXF)' {
                "$($Matches.1)_0.DXF"
                break
            }
            default {
                $_
            }
        }
    } -WhatIf
Remove -WhatIf once done testing to perform rename action
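If you want to check the name mapping before touching any files, the same switch logic can be wrapped in a small function and fed plain strings. This is only a sketch: Get-NewDxfName is a hypothetical helper name, and the first pattern is a slight variation on the one above that keeps the optional space after "rev" outside the capture group, so "rev 1.1" yields 1.1 rather than " 1.1".

```powershell
# Hypothetical helper: maps an original file name to the new name.
# It works on strings only and never touches the file system.
function Get-NewDxfName([string]$Name) {
    switch -Regex ($Name) {
        # Name contains "rev": keep the leading NN.NN and the revision number.
        # Variation on the pattern above: \s? sits outside the capture group.
        '(\d{2}\.\d{2}).*?rev\s?(\d+(?:\.\d+)?).*(?=\.DXF)' {
            return "$($Matches[1])_$($Matches[2]).DXF"
        }
        # No "rev" in the name: fall back to revision 0.
        '(\d{2}\.\d{2}).*(?=\.DXF)' {
            return "$($Matches[1])_0.DXF"
        }
        # Anything else is returned unchanged.
        default { return $Name }
    }
}

Get-NewDxfName '45.09 - WrapperA12_rev1.DXF'   # 45.09_1.DXF
Get-NewDxfName '31.90 - SADDLE SHIM.DXF'       # 31.90_0.DXF
```

Once the output looks right, the function could be called from the -NewName script block, e.g. { Get-NewDxfName $_.Name }, still with -WhatIf for a first pass.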

Need to batch convert a large quantity of text files from ANSI to Unicode

I have a lot of ANSI text files that vary in size (from a few KB up to 1GB+) that I need to convert to Unicode.
At the moment, this has been done by loading the files into Notepad and then doing "Save As..." and selecting Unicode as the Encoding. Obviously this is very time consuming!
I'm looking for a way to convert all the files in one hit (in Windows). The files are in a directory structure so it would need to be able to traverse the full folder structure and convert all the files within it.
I've tried a few options but so far nothing has really ticked all the boxes:
ansi2unicode command line utility. This has been the closest to what I'm after as it processes files recursively in a folder structure...but it keeps crashing whilst running before it's finished converting.
CpConverter GUI utility. Works OK to a point but struggles with multiple files in a folder structure - only seems to be able to handle files in one folder
There's a DOS command that works OK on smaller files but doesn't seem to be able to cope with large files.
Tried GnuWin sed utility but it crashes every time I try and install it
So I'm still looking! If anyone has any recommendations I'd be really grateful
Thanks...
OK, so in case anyone else is interested, I found a way to do this using PowerShell:
Get-ChildItem "c:\some path\" -Filter *.csv -recurse |
Foreach-Object {
Write-Host (Get-Date).ToString() $_.FullName
Get-Content $_.FullName | Set-Content -Encoding unicode ($_.FullName + '_unicode.csv')
}
This recurses through the entire folder structure and converts all CSV files to Unicode (UTF-16 LE); the converted files are written to the same locations as the originals, with "_unicode.csv" appended to the full filename (so data.csv becomes data.csv_unicode.csv). You can change the value of the -Encoding parameter if you want to convert to something different (e.g. utf8).
It also outputs a list of all the files converted along with a timestamp against each
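For the very large files, a variation that may be worth trying is to copy the text in fixed-size chunks through .NET stream classes, with the source encoding stated explicitly instead of relying on the Get-Content default (which differs between Windows PowerShell and PowerShell 7). The following is a sketch under assumptions: Convert-FileToUnicode is a hypothetical function name, and Windows-1252 as the default "ANSI" code page is a guess that you should replace with whatever your files actually use.

```powershell
# Hypothetical helper: re-encodes one text file to UTF-16 LE ("Unicode").
# Pass full paths; .NET resolves relative paths against the process
# working directory, not the current PowerShell location.
function Convert-FileToUnicode {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        # Assumption: the "ANSI" files are Windows-1252. Adjust as needed.
        [System.Text.Encoding] $SourceEncoding = [System.Text.Encoding]::GetEncoding(1252)
    )
    $reader = [System.IO.StreamReader]::new($Path, $SourceEncoding)
    try {
        # $false = overwrite rather than append; Encoding.Unicode is UTF-16 LE with a BOM.
        $writer = [System.IO.StreamWriter]::new($Destination, $false, [System.Text.Encoding]::Unicode)
        try {
            # Copy in 64K-character chunks so the whole file is never held in memory.
            $buffer = [char[]]::new(65536)
            while (($count = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
                $writer.Write($buffer, 0, $count)
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }
}
```

Inside the Foreach-Object block above, the Get-Content | Set-Content line could then be swapped for Convert-FileToUnicode -Path $_.FullName -Destination ($_.FullName + '_unicode.csv').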

Powershell, rename-item doesn't work as expected

I have a bunch of jpg image files named in the following pattern:
0001-rand01_012.jpg
0002-rand03_034.jpg
I want to rename them by removing the first 5 characters to get the form:
rand01_012.jpg
etc..
I use the following command:
Get-ChildItem | Rename-Item -NewName {$_.name.Substring(5)}
When using this with -whatif flag i get the expected message saying:
Performing the operation "Rename File" on target "Item: C:\Users\xxxx\Documents\xx xx\temp2\0123-rand16_030.jpg Destination: C:\Users\
xxxx\Documents\xx xx\temp2\rand16_030.jpg".
But removing the whatif gives me errors of this type:
Rename-Item : The input to the script block for parameter 'NewName' failed. Exception calling "Substring" with "1" argument(s): "startIndex cannot be
larger than length of string.
followed by a whole bunch of:
Rename-Item : Cannot create a file when that file already exists.
The files themselves are renamed with random number of characters removed rather than 5 as was intended. So they have ended up like:
01.jpg
01.jpg
.
.
.
d14_001.jpg
etc.
I have used this command to rename such files in the past with success. The fact that I'm getting such random results is making me pull my hair out.
tl;dr
Make sure you only process the files of interest:
(Get-ChildItem -File [0-9][0-9][0-9][0-9]-*.jpg) |
Rename-Item -NewName {$_.name.Substring(5)} -WhatIf
The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
In PowerShell [Core] 6+, placing (...) around Get-ChildItem is no longer technically necessary, but advisable.[1]
That way:
You rule out unrelated files up front.
Even if something goes wrong, you can correct the problem and run the command again to reprocess only the failed files, without affecting the previously renamed files.
The most likely reason for something going wrong is more than 1 input file resulting in the same filename after removing the first 5 chars.
It sounds like you've mistakenly run the command repeatedly, so you've cut off 5 chars. multiple times:
0001-rand01_01.jpg -> rand01_01.jpg -> 1_01.jpg
Once a filename has fewer than 5 chars., you'll get the startIndex-related error, because the [string] class's .Substring() method doesn't accept an index beyond the length of the string (try 'ab'.Substring(3)).
That said, since you're running Get-ChildItem without a filter, which returns all (non-hidden) child items, you may be processing unrelated files or even directories whose names are too short.
The Cannot create a file when that file already exists. errors are just follow-on errors that result from the script block that normally returns the new name effectively returning the empty string, so Rename-Item is somewhat obscurely complaining that you can't rename a file to its current name.
That said, you can even get Cannot create a file when that file already exists errors during the first run, namely if more than 1 input file with its first 5 chars. chopped off results in the same filename.
E.g., 0001-rand01_012.jpg and 0002-rand01_012.jpg would both be renamed to rand01_012.jpg, which fails once the first one has been renamed.
That is, for your command to work as intended, all filenames that result from dropping the first 5 chars. must be unique.
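One way to check that uniqueness requirement up front is to compute the planned names as plain strings and look for duplicates before renaming anything. This is just a sketch with made-up sample names; in practice the list would come from (Get-ChildItem -File ...).Name.

```powershell
# Sample names (made up for illustration); the first two collide after trimming.
$names = '0001-rand01_012.jpg', '0002-rand01_012.jpg', '0003-rand02_001.jpg', 'a.jpg'

# Plan the renames, skipping names too short to leave anything after 5 chars.
$plan = foreach ($n in $names) {
    if ($n.Length -gt 5) {
        [pscustomobject]@{ Old = $n; New = $n.Substring(5) }
    }
}

# Any target name that occurs more than once would trigger
# "Cannot create a file when that file already exists".
$collisions = $plan | Group-Object New | Where-Object Count -gt 1
$collisions | Select-Object Name, Count
```

If $collisions is empty, the rename should be safe to run once.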
Here's an MCVE (Minimal, Complete, and Verifiable Example):
Setup:
# Create and change to temp dir.
Set-Location (mkdir temp)
# Create sample input files named 0001-rand01_01.jpg, 0002-rand01_02.jpg, ...
# Note how the suffixes after the first 5 char. must be *unique*.
1..9 | %{ $null > "000${_}-rand01_0${_}.jpg" }
1st run:
# No errors
> (Get-ChildItem -File) | Rename-Item -NewName { $_.Name.Substring(5) }
# Show new names
> Get-ChildItem | Select Name
Name
----
rand01_01.jpg
rand01_02.jpg
rand01_03.jpg
rand01_04.jpg
rand01_05.jpg
rand01_06.jpg
rand01_07.jpg
rand01_08.jpg
rand01_09.jpg
A 2nd run yields:
Name
----
1_01.jpg
1_02.jpg
1_03.jpg
1_04.jpg
1_05.jpg
1_06.jpg
1_07.jpg
1_08.jpg
1_09.jpg
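On a 3rd run, every remaining name is reduced to the same string (e.g., '1_01.jpg'.Substring(5) yields jpg), so only the first rename succeeds and all the others fail with Rename-Item : Cannot create a file when that file already exists. Once a name is shorter than 5 characters, further runs produce the startIndex error for it instead.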
[1] Enclosing Get-ChildItem in (...) ensures that the matching files are collected in an array, up front, before Rename-Item is invoked.
This explicitly prevents already-renamed files from getting re-enumerated by Get-ChildItem and thus interfering with the iteration. Explicit use of (...) is technically no longer necessary in PowerShell [Core] 6+ (it is necessary in Windows PowerShell (5.1-)), because Get-ChildItem is implemented in a way that always internally collects info about all files up front, across platforms, because it sorts them by name, which is inherently only possible after all names have been collected.
In light of that, whether you use (...) or not should functionally amount to the same, although using (...) is advisable, because it doesn't rely on what amounts to an implementation detail (the documentation doesn't mention how the outputs are ordered).

Q: Powershell - read and report special characters from file

I've got a huge directory listing of files, and I need to see what special characters exist in the file names - specifically nonstandard characters like you'd get using ALT codes.
I can export a directory listing to a file easily enough with:
get-childitem -path D:\files\ -File -Recurse >output.txt
What I need to do, however, is pull out the special characters, and only the special characters, from the text file. The only way I can think of to easily quantify everything "special" (since there are a ton of possibilities in that character set) would be to compare the text against a list of characters I'd want to keep, stored in a joined variable (a-z, 0-9, etc.).
I can't quite figure out how to pull out the "good" characters, leaving only the special ones. Any ideas on where to start?
I take "special" characters to be anything that falls outside US ASCII.
That basically means any character with a numerical value of 128 or more, easy to inspect in a Where-Object filter:
Get-ChildItem -File -Recurse |Where-Object {
$_.Name.ToCharArray() -gt 127
}
This will return all files containing "special" characters in their name.
If you want to extract the special characters themselves, per file, use ForEach-Object:
Get-ChildItem -File -Recurse |ForEach-Object {
    if(($Specials = $_.Name.ToCharArray() -gt 127)){
        New-Object psobject -Property @{File=$_.FullName;Specials=$(-join $Specials)}
    }
}
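To try the character test without a directory full of files, the same idea can be run against a plain string. In this sketch Get-NonAsciiChars is a hypothetical helper name, the [int] cast just makes the numeric comparison explicit, and the sample name is built from code points so the script file's own encoding doesn't matter.

```powershell
# Hypothetical helper: returns only the characters above US-ASCII (code point > 127).
function Get-NonAsciiChars([string]$Text) {
    -join ($Text.ToCharArray() | Where-Object { [int]$_ -gt 127 })
}

# Sample name with two accented letters (U+00EF and U+00E9).
$sample = 'na' + [char]0xEF + 've caf' + [char]0xE9 + '.txt'
Get-NonAsciiChars $sample   # returns the two accented characters
Get-NonAsciiChars 'plain.txt'   # returns an empty string
```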
Look at piping your results to Select-String. With Select-String you can specify a list of regex values to search for.

Rename Files with Index(Excel)

Anyone have any ideas on how to rename files by finding an association with an index file?
I have a file/folder structure like the following:
Folder name = "Doe, John EO11-123"
Several files under this folder
The index file(MS Excel) has several columns. It contains the names in 2 columns(First and Last). It also has a column containing the number EO11-123.
What I would like to do is write maybe a script to look at the folder names in a directory, compare/find an associated value in the index file(like that number EO11-123) and then rename all the files under the folder using a 4th column value in the index.
So,
Folder name = "Doe, John EO11-123", index column1 contains same value "EO11-123", use column2 value "111111_000000" and rename all the files under that directory folder to "111111_000000_0", "111111_000000_1", "111111_000000_2" and so on.
This possible with powershell or vbscript?
Ok, I'll answer your questions in your comment first. Importing the data into PowerShell allows you to make an array in powershell that you can match against, or better yet make a HashTable to reference for your renaming purposes. I'll get into that later, but it's way better than trying to have PowerShell talk to Excel and use Excel's search functions because this way it's all in PowerShell and there's no third party application dependencies. As for importing, that script is a function that you can load into your current session, so you run that function and it will automatically take care of the import for you (it opens Excel, then opens the XLS(x) file, saves it as a temp CSV file, closes Excel, imports that CSV file into PowerShell, and then deletes the temp file).
Now, you did not state what your XLS file looks like, so I'm going to assume it's got a header row, and looks something like this:
FirstName | Last Name | Identifier | FileCode
Joe | Shmoe | XA22-573 | JS573
John | Doe | EO11-123 | JD123
If that's not your format, you'll need to either adapt my code, or your file, or both.
So, how do we do this? First, download, save, and if needed unblock the script to Import-XLS. Then we will dot source that file to load the function into the current PowerShell session. Once we have the function we will run it and assign the results to a variable. Then we can make an empty hashtable, and for each record in the imported array create an entry in the hashtable where the 'Identifier' property (in your example above that would be the one that has the value "EO11-123" in it), make that the Key, then make the entire record the value. So, so far we have this:
#Load function into current session
. C:\Path\To\Import-XLS.ps1
$RefArray = Import-XLS C:\Path\To\file.xls
$RefHash = @{}
$RefArray | ForEach{ $RefHash.Add($_.Identifier, $_) }
Now you should be able to reference the identifier to access any of the properties for the associated record such as:
PS C:\> $RefHash['EO11-123'].FileCode
JD123
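If you'd like to try the hashtable lookup without Excel or Import-XLS, the same structure can be built from inline CSV text. This is only a stand-in for the imported data, using the assumed column layout from above (with LastName written as one word).

```powershell
# Stand-in for the Import-XLS result: same assumed columns, inline CSV text.
$RefArray = @'
FirstName,LastName,Identifier,FileCode
Joe,Shmoe,XA22-573,JS573
John,Doe,EO11-123,JD123
'@ | ConvertFrom-Csv

# Key each record by its Identifier so a folder's code finds it directly.
$RefHash = @{}
foreach ($record in $RefArray) {
    $RefHash[$record.Identifier] = $record
}

$RefHash['EO11-123'].FileCode   # JD123
```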
Now, we just need to extract that name from the folder, and rename all the files in it. Pretty straight forward from here.
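Get-ChildItem c:\Path\to\Folders -Directory | Where{$_.Name -match "(?<= )(\S+)$"} |
ForEach{
    $Files = Get-ChildItem $_.FullName
    # Look up the record by the captured identifier (no quotes, so $Matches[1] is evaluated)
    $NewName = $RefHash[$Matches[1]].FileCode
    # Start at 0 so that the first file in $Files is included
    For($i = 0; $i -lt $Files.Count; $i++){
        # ${NewName} keeps the underscore from being read as part of the variable name
        $Files[$i] | Rename-Item -NewName "${NewName}_$i"
    }
}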
Edit: Ok, let's break down the rename process here. It is a lot of piping here, so I'll try and take it step by step. First off we have Get-ChildItem that gets a list of folders for the path you specify. That part's straight forward enough. Then it pipes to a Where statement, that filters the results checking each one's name to see if it matches the Regular Expression "(?<= )(\S+)$". If you are unfamiliar with how regular expressions work you can see a fairly good breakdown of it at https://regex101.com/r/zW8sW1/1. What that does is matches any folders that have more than one "word" in the name, and captures the last "word". It saves that in the automatic variable $Matches, and since it captured text, that gets assigned to $Matches[1]. Now the code breaks down here because your CSV isn't laid out like I had assumed, and you want the files named differently. We'll have to make some adjustments on the fly.
So, those folders that pass the filter will get piped into a ForEach loop (which I had a typo in previously and had a ( instead of {, that's fixed now). So for each of those folders it starts off by getting a list of files within that folder and assigning them to the variable $Files. It also sets up the $NewName variable, but since you don't have a column in your CSV named 'FileCode' that line won't work for you. It uses the $Matches automatic variable that I mentioned earlier to reference the hashtable that we set up with all of the Identifier codes, and then looks at a property of that specific record to set up the new name to assign to files. Since what you want and what I assumed are different, and your CSV has different properties, we'll re-work both the previous Where statement and this line a little bit. Here's how that bit of the script will now read:
Get-ChildItem c:\Path\to\Folders -directory | Where{$_.Name -match "^(.+?), .*? (\S+)$"}|
ForEach{
$Files = Get-ChildItem $_.FullName
$NewName = $Matches[2] + "_" + $Matches[1]
That now matches the folder name in the Where statement and captures 2 things. The first thing it grabs is everything at the beginning of the name before the comma. Then it skips everything until it gets to the last piece of text at the end of the name and captures everything after the last space. New breakdown on RegEx101: https://regex101.com/r/zW8sW1/2
So you want the ID_LName, which can be gotten from the folder name, there's really no need to even use your CSV file at this point I don't think. We build the new name of the files based off the automatic $Matches variable using the second capture group and the first capture group and putting an underscore between them. Then we just iterate through the files with a For loop basing it off how many files were found. So we start with the first file in the array $Files (record 0), add that to the $NewName with an underscore, and use that to rename the file.
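To see what the two capture groups and the resulting names look like without renaming anything, you can run the match against the example folder name as a plain string. This is a sketch only; the three-file count is made up for illustration.

```powershell
# Example folder name from the question.
$folderName = 'Doe, John EO11-123'

if ($folderName -match '^(.+?), .*? (\S+)$') {
    # $Matches[1] is the text before the comma, $Matches[2] the last "word".
    $NewName = $Matches[2] + '_' + $Matches[1]

    # Names that three files (indexes 0..2) would receive.
    # ${NewName} keeps the underscore out of the variable name.
    $planned = 0..2 | ForEach-Object { "${NewName}_$_" }
}

$planned   # EO11-123_Doe_0, EO11-123_Doe_1, EO11-123_Doe_2
```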