How to strip illegal characters before trying to save filenames?

I was able to find out how to use the GetInvalidFileNameChars() method in a PowerShell script. However, it seems to also filter out whitespace (which is what I DON'T want).
EDIT: Maybe I'm not asking this clearly enough. I want the function below to KEEP the spaces that already exist in filenames. Currently, the script filters out spaces.
Function Remove-InvalidFileNameChars {
    param([Parameter(Mandatory=$true,
        Position=0,
        ValueFromPipeline=$true,
        ValueFromPipelineByPropertyName=$true)]
        [String]$Name
    )
    return [RegEx]::Replace($Name, "[{0}]" -f ([RegEx]::Escape([String][System.IO.Path]::GetInvalidFileNameChars())), '')
}

Casting the character array to System.String actually seems to join the array elements with spaces, meaning that
[string][System.IO.Path]::GetInvalidFileNameChars()
does the same as
[System.IO.Path]::GetInvalidFileNameChars() -join ' '
when you actually want
[System.IO.Path]::GetInvalidFileNameChars() -join ''
As @mjolinor mentioned (+1), this is caused by the output field separator ($OFS), which defaults to a single space.
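The same effect can be reproduced with any character array. The sketch below uses an arbitrary sample array; it shows that a [string] cast joins with $OFS (a single space when $OFS is unset), while -join uses the separator you pass it:

```powershell
# Sample character array (arbitrary letters)
$chars = [char[]]'abc'

# [string] cast with $OFS unset: elements are joined with a single space
$castDefault = [string]$chars        # 'a b c'

# Explicit empty separator: -join uses it, so no spaces appear
$joined = $chars -join ''            # 'abc'

# Setting $OFS changes later array-to-string casts in this scope
$OFS = ''
$castEmptyOfs = [string]$chars       # 'abc'

# Remove the variable again to restore the default (single-space) behaviour
Remove-Variable OFS
```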
Evidence:
PS C:\> [RegEx]::Escape([string][IO.Path]::GetInvalidFileNameChars())
"\ \ \|\ \ ☺\ ☻\ ♥\ ♦\ ♣\ ♠\ \\ \t\ \n\ ♂\ \f\ \r\ ♫\ ☼\ ►\ ◄\ ↕\ ‼\ ¶\ §\ ▬\ ↨\ ↑\ ↓\ →\ ←\ ∟\ ↔\ ▲\ ▼\ :\ \*\ \?\ \\\ /
PS C:\> [RegEx]::Escape(([IO.Path]::GetInvalidFileNameChars() -join ' '))
"\ \ \|\ \ ☺\ ☻\ ♥\ ♦\ ♣\ ♠\ \\ \t\ \n\ ♂\ \f\ \r\ ♫\ ☼\ ►\ ◄\ ↕\ ‼\ ¶\ §\ ▬\ ↨\ ↑\ ↓\ →\ ←\ ∟\ ↔\ ▲\ ▼\ :\ \*\ \?\ \\\ /
PS C:\> [RegEx]::Escape(([IO.Path]::GetInvalidFileNameChars() -join ''))
"\| ☺☻♥♦\t\n♂\f\r♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼:\*\?\\/
PS C:\> $OFS=''
PS C:\> [RegEx]::Escape([string][IO.Path]::GetInvalidFileNameChars())
"\| ☺☻♥♦\t\n♂\f\r♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼:\*\?\\/
Change your function to something like this:
Function Remove-InvalidFileNameChars {
    param(
        [Parameter(Mandatory=$true,
            Position=0,
            ValueFromPipeline=$true,
            ValueFromPipelineByPropertyName=$true)]
        [String]$Name
    )
    $invalidChars = [IO.Path]::GetInvalidFileNameChars() -join ''
    $re = "[{0}]" -f [RegEx]::Escape($invalidChars)
    return ($Name -replace $re)
}
and it should do what you want.
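One thing to watch if you pipe several names in: a function without begin/process blocks runs its body once, after the pipeline ends, so only the last piped value is processed. A minimal variant, assuming you want per-item pipeline behaviour, builds the regex once in begin and applies it in process. The sample names are hypothetical; they use / because it is invalid on Windows and on Unix-like systems alike (on Windows the list also includes : * ? " < > | and others):

```powershell
function Remove-InvalidFileNameChars {
    param(
        [Parameter(Mandatory = $true, Position = 0,
                   ValueFromPipeline = $true,
                   ValueFromPipelineByPropertyName = $true)]
        [string]$Name
    )
    begin {
        # Build the character class once; -join '' keeps the space out of it
        $invalidChars = [IO.Path]::GetInvalidFileNameChars() -join ''
        $re = '[{0}]' -f [RegEx]::Escape($invalidChars)
    }
    process {
        # Runs once per piped-in name
        $Name -replace $re
    }
}

# Hypothetical sample names; spaces survive, '/' is removed
$cleaned = 'my file/name.txt', 'draft/v2 final.txt' | Remove-InvalidFileNameChars
```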

My current favourite way to accomplish this is:
$Path.Split([IO.Path]::GetInvalidFileNameChars()) -join '_'
This replaces all invalid characters with _ and is very human readable, compared to alternatives such as:
$Path -replace "[$([RegEx]::Escape([IO.Path]::GetInvalidFileNameChars() -join ''))]+","_"
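Note that the two forms are not quite equivalent: Split() yields one _ per invalid character, while the regex with a trailing + collapses a run of invalid characters into a single _. A small comparison, using a hypothetical name with two consecutive slashes (/ is invalid on every platform):

```powershell
$invalid = [IO.Path]::GetInvalidFileNameChars()
$name = 'reports//2024 q1.txt'   # hypothetical input

# Split on any invalid char and re-join: each invalid char becomes its own '_'
$viaSplit = $name.Split($invalid) -join '_'          # 'reports__2024 q1.txt'

# Regex character class plus '+': a run of invalid chars becomes a single '_'
$pattern = '[{0}]+' -f [regex]::Escape($invalid -join '')
$viaRegex = $name -replace $pattern, '_'             # 'reports_2024 q1.txt'
```

Which one you want depends on whether the number of removed characters should stay visible in the result.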

I suspect it has to do with non-display characters being coerced to [string] for the regex operation (and ending up expressed as spaces).
See if this doesn't work better:
([char[]]$name | where { [IO.Path]::GetInvalidFileNameChars() -notcontains $_ }) -join ''
That will do a straight char comparison, and seems to be more reliable (embedded spaces are not removed).
$name = 'abc*\ def.txt'
([char[]]$name | where { [IO.Path]::GetInvalidFileNameChars() -notcontains $_ }) -join ''
abc def.txt
Edit: I believe @Ansgar is correct that the space is caused by casting the character array to a string. The space is introduced by $OFS.
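A slight variant of the same char comparison fetches the invalid-character array once instead of calling GetInvalidFileNameChars() for every character. This is a sketch; the sample name uses / so that it behaves the same on Windows and Unix-like systems:

```powershell
# Fetch the invalid-character array once rather than once per character
$invalid = [IO.Path]::GetInvalidFileNameChars()

$name = 'abc/ def.txt'   # hypothetical input; '/' is invalid everywhere
# Keep only characters that are not in the invalid set, then concatenate them
$clean = -join ([char[]]$name | Where-Object { $invalid -notcontains $_ })
```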

I wanted spaces to replace all the illegal characters, so an existing space is simply replaced with a space:
$Filename = $ADUser.SamAccountName
[IO.Path]::GetinvalidFileNameChars() | ForEach-Object {$Filename = $Filename.Replace($_," ")}
$Filename = "folder\" + $Filename.trim() + ".txt"

Please try this one-liner, which uses the same underlying GetInvalidFileNameChars() method.
To match:
'?Some "" File Name <:.txt' -match ("[{0}]"-f (([System.IO.Path]::GetInvalidFileNameChars()|%{[regex]::Escape($_)}) -join '|'))
To replace:
'?Some "" File Name <:.txt' -replace ("[{0}]"-f (([System.IO.Path]::GetInvalidFileNameChars()|%{[regex]::Escape($_)}) -join '|')),'_'

[System.IO.Path]::GetInvalidFileNameChars() returns an array of invalid chars. If it is returning the space character for you (which it does not do for me), you could always iterate over the array and remove it.
$chars = @()
foreach ($c in [System.IO.Path]::GetInvalidFileNameChars())
{
    if ($c -ne ' ')
    {
        $chars += $c
    }
}
Then you can use $chars as you would have used the output from GetInvalidFileNameChars().
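The same filtering can be written as a Where-Object one-liner. In the sketch below, the sample name is hypothetical, and the filtered list is then used to build the character class exactly as the unfiltered array would be:

```powershell
# Keep every invalid char except the space (a no-op if space is not in the list)
$chars = [IO.Path]::GetInvalidFileNameChars() | Where-Object { $_ -ne ' ' }

# Use the filtered list as you would the original array
$pattern = '[{0}]' -f [regex]::Escape($chars -join '')
$clean = 'a b/c.txt' -replace $pattern
```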

Very slightly different, combined, flexible approach. I was finding that GetInvalidFileNameChars() was not getting all the illegal chars for my needs.
$arrInvalidChars = '[]/|\+={}-$%^&*()'.ToCharArray()
$cleanName = 'a[]|\+={9}-$%^&*()\b'
$arrInvalidChars | % { $cleanName = $cleanName.replace($_,'_')}
Returns $cleanName = 'a_______9__________b'
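To get the "combined" behaviour, you can append your own list to the characters the OS reports. The extra characters and the sample name below are illustrative assumptions; pick whatever your naming rules forbid:

```powershell
# Illustrative extra characters; adjust to your own naming rules
$extra = '[]+={}-$%^&()'.ToCharArray()

# Combine the OS-reported invalid characters with the extra ones
$allInvalid = [IO.Path]::GetInvalidFileNameChars() + $extra

$cleanName = 'a[b]/c d.txt'   # hypothetical input
foreach ($c in $allInvalid) {
    # String.Replace(char, char) swaps every occurrence of $c for '_'
    $cleanName = $cleanName.Replace($c, [char]'_')
}
```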

Related

How to format the List output in specific format in Powershell?

I'm trying to create a list of IDs in a specific format, as shown below:
Required format:
{id1},{id2}
My code:
[System.Collections.Generic.List[System.String]]$IDList = @()
foreach ($key in $keys) {
    $id = $sampleHashTable[$key]
    $IDList.Add($id)
}
echo $IDList
My output:
id1
id2
How to convert my output into required format?

To complement Mathias' helpful answer with a PowerShell (Core) 7+ alternative, using the Join-String cmdlet:
# Sample input values
[System.Collections.Generic.List[System.String]] $IDList = 'id1', 'id2'
# -> '{id1},{id2}'
$IDList | Join-String -FormatString '{{{0}}}' -Separator ','
-FormatString accepts a format string as accepted by the .NET String.Format method, as also used by PowerShell's -f operator, with placeholder {0} representing each input string; literal { and } characters must be escaped by doubling, which is why the enclosing { and } are represented as {{ and }}.
Alternatives that work in Windows PowerShell too:
Santiago Squarzon proposes this:
'{{{0}}}' -f ($IDList -join '},{')
Another - perhaps somewhat obscure - option is to use the regex-based -replace operator:
$IDList -replace '^', '{' -replace '$', '}' -join ','

You can surround each list item in {} and then join them all together like this:
$IDList.ForEach({"{${_}}"}) -join ','
If you want to avoid empty strings in the list, remember to test whether the key actually exists in the hashtable before adding to the list:
foreach ($key in $keys) {
    if ($sampleHashTable.ContainsKey($key)) {
        $id = $sampleHashTable[$key]
        $IDList.Add($id)
    }
}
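Putting the question's loop and the formatting together, a self-contained sketch might look like this. The hashtable contents and keys are hypothetical sample data (key 'c' is deliberately missing), and ForEach-Object is used for the wrapping step:

```powershell
# Hypothetical sample data: 'c' has no entry in the hashtable
$sampleHashTable = @{ a = 'id1'; b = 'id2' }
$keys = 'a', 'b', 'c'

$IDList = [System.Collections.Generic.List[string]]::new()
foreach ($key in $keys) {
    # ContainsKey is a method, so it takes parentheses, not brackets
    if ($sampleHashTable.ContainsKey($key)) {
        $IDList.Add($sampleHashTable[$key])
    }
}

# Wrap each ID in braces, then join with commas
$formatted = ($IDList | ForEach-Object { "{$_}" }) -join ','
```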

PowerShell split input and replace/combine?

I want a PowerShell script that automatically generates an output by splitting the input on the period "." and adding "DC=" in front of each item that results from the split.
$DomainFQDN = "prod.mydomain.com" # This varies depending on the input. It could be "prod.boston.us.mydomain.com" as the input.
$DistinguishedName = $DomainFQDN -split "\."
...
...
...I just don't know how to proceed
How do I get an output of "DC=prod,DC=mydomain,DC=com" for prod.mydomain.com as the input or DC=prod,DC=boston,DC=us,DC=mydomain,DC=com for prod.boston.us.mydomain.com?

Well, you can use a foreach loop over $DistinguishedName and then -join the results, like this (if you want to output the joined string directly):
$AddDC = foreach ($e in $DistinguishedName) { "DC=$e" }
Write-Host $($AddDC -join ",")
-join works like -split in reverse: you just specify the string you want to join with.
Another way to do it is to store the joined string in a variable:
$AddDC = foreach ($e in $DistinguishedName) { "DC=$e" }
$new_string = $AddDC -join ","
Write-Host $new_string
You can consult this page for more info.

If I understand correctly, this is what you need:
$fqdn='prod.boston.us.mydomain.com'
$dn="DC=$($fqdn.replace('.',',DC='))"
$dn

$DomainFQDN = "prod.mydomain.com"
$DomainFQDN = $DomainFQDN.Split(".")
For ($i = 0; $i -lt $DomainFQDN.Count; $i++) {
$DomainFQDN[$i] = "DC=" + $DomainFQDN[$i]
}
$DomainFQDN = $DomainFQDN -join ","
Write-Host $DomainFQDN
Output:
DC=prod,DC=mydomain,DC=com
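I couldn't get the -Split "." operator to behave like .Split("."). The reason is that -split treats its delimiter as a regular expression, in which . matches any character, so it has to be escaped as "\." (as in the question) or used with the SimpleMatch option. Anyway, the loop above should work for you.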
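A quick comparison of the split variants, using the sample FQDN from the question:

```powershell
$fqdn = 'prod.mydomain.com'

# Regex delimiter '.' matches every character, leaving only empty strings
$regexDot = $fqdn -split '.'
# An escaped dot, or the SimpleMatch option, splits on literal periods
$escaped = $fqdn -split '\.'
$simple = $fqdn -split '.', 0, 'SimpleMatch'
# String.Split is a plain .NET method, so '.' is already literal there
$method = $fqdn.Split('.')
```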
I feel like I should mention that a real FQDN would not be DC= on every line. It would look more like:
DC=Com,DC=MyDomain,OU=Prod

I usually do a single replace operation in an expandable string to convert from an FQDN to the distinguished name of the domain root:
$DistinguishedName = "DC=$($DomainFQDN.TrimEnd('.') -replace '\.',',DC=')"
The TrimEnd('.') call strips the trailing dot from rooted FQDNs (such as "prod.mydomain.com."), and the replace operation then turns each remaining dot into ,DC=
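Wrapped in a small function for reuse, the one-liner handles both plain and rooted names. The function name ConvertTo-DomainDN is hypothetical; its body is the expression from this answer:

```powershell
# Hypothetical helper name; the body is the single-replace expression above
function ConvertTo-DomainDN {
    param([Parameter(Mandatory)][string]$DomainFQDN)
    "DC=$($DomainFQDN.TrimEnd('.') -replace '\.',',DC=')"
}

$plain  = ConvertTo-DomainDN 'prod.mydomain.com'
$rooted = ConvertTo-DomainDN 'prod.boston.us.mydomain.com.'   # note the trailing dot
```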

Is there a better way to convert all control characters to entities in PowerShell 5?

Context: Azure, Windows Server 2012, PowerShell 5
I've got the following code to convert all control characters (ASCII and Unicode whitespace other than \x20 itself) to their ampersand-hash equivalents.
function ConvertTo-AmpersandHash {
    param ([Parameter(Mandatory)][String]$Value)
    # there's got to be a better way of doing this.
    $AMPERHASH = '&#'
    $SEMICOLON = ';'
    for ($i = 0x0; $i -lt 0x20; $i++) {
        $value = $value -replace [char]$i,($AMPERHASH + $i + $SEMICOLON)
    }
    for ($i = 0x7f; $i -le 0xa0; $i++) {
        $value = $value -replace [char]$i,($AMPERHASH + $i + $SEMICOLON)
    }
    return $Value
}
As the embedded comment suggests, I'm sure there's a better way to do this. As it stands, the function does some 66 iterations for each incoming string. Would regular expressions work better or faster?
LATER
-replace '([\x00-\x1f\x7f-\xa0])',('&#' + [byte][char]$1 + ';')
looks promising, but $1 is evaluating to zero all the time, giving me &#0; all the time.
LATER STILL
Thinking that -replace couldn't iterate internally, I came up with:
$t = [char]0 + [char]1 + [char]2 + [char]3 + [char]4 + [char]5 + [char]6
$r = '([\x00-\x1f\x7f-\xa0])'
while ($t -match [regex]$r) {
$t = $t -replace [regex]$r, ('&#' + [byte][char]$1 + ';')
}
echo $t
However, out of that I still get &#0; for every character.

FINALLY
function ConvertTo-AmpersandHash {
    param ([Parameter(Mandatory)][String]$Value)
    $AMPERHASH = '&#'
    $SEMICOLON = ';'
    $patt = '([\x00-\x1f\x7f-\xa0]{1})'
    while ($Value -match [regex]$patt) {
        $Value = $Value -replace $Matches[0], ($AMPERHASH + [byte][char]$Matches[0] + $SEMICOLON)
    }
    return $Value
}
That works better. Faster too. Any improvements on that?

Kory Gill's answer with the library call is surely a better approach, but to address your regex question: in PowerShell 5 you can't evaluate code in the replacement with the -replace operator (PowerShell 6.1 and later do accept a script block as the replacement).
To do that in PowerShell 5, you need to use the .NET [regex]::Replace() method and pass it a script block that computes the replacement; the script block receives the match as its parameter. For example:
PS C:\> [regex]::Replace([string][char]2,
'([\x00-\x20\x7f-\xa0])',
{param([string]$m) '&#' + [byte][char]$m + ';'})
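Applied to the question's function, a single-pass version might look like the sketch below. It keeps the question's character ranges (so the plain space 0x20 is left alone, unlike the example above) and uses [int] instead of [byte] for the code point; the sample input is hypothetical:

```powershell
function ConvertTo-AmpersandHash {
    param ([Parameter(Mandatory)][String]$Value)
    # Same ranges as the question: C0 controls and DEL..NBSP; 0x20 is not matched
    $pattern = '[\x00-\x1f\x7f-\xa0]'
    # Single pass: the script block acts as a MatchEvaluator, called once per match
    [regex]::Replace($Value, $pattern, {
        param($m)   # a System.Text.RegularExpressions.Match
        '&#{0};' -f [int][char]($m.Value)
    })
    # In PowerShell 6.1+ the -replace operator also accepts a script block ($_ is the Match):
    # $Value -replace $pattern, { '&#{0};' -f [int][char]($_.Value) }
}

# Hypothetical input: a tab between a and b, plus a trailing DEL
$encoded = ConvertTo-AmpersandHash ("a`tb" + [char]0x7f)
```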

Your question is a little unclear to me, and could be a duplicate of What is the best way to escape HTML-specific characters in a string (PowerShell)?.
It would be nice if you explicitly stated the exact string you have and what you want it converted to; as it is, one has to read the code to guess.
I am guessing one or more of these functions will do what you want:
$a = "http://foo.org/bar?baz & also <value> conversion"
"a"
$a
$b = [uri]::EscapeDataString($a)
"b"
$b
$c = [uri]::UnescapeDataString($b)
"c"
$c
Add-Type -AssemblyName System.Web
$d = [System.Web.HttpUtility]::HtmlEncode($a)
"d"
$d
$e = [System.Web.HttpUtility]::HtmlDecode($d)
"e"
$e
Gives:
a
http://foo.org/bar?baz & also <value> conversion
b
http%3A%2F%2Ffoo.org%2Fbar%3Fbaz%20%26%20also%20%3Cvalue%3E%20conversion
c
http://foo.org/bar?baz & also <value> conversion
d
http://foo.org/bar?baz &amp; also &lt;value&gt; conversion
e
http://foo.org/bar?baz & also <value> conversion

I have a small function that does the replacing I need.
$SpecChars lists all the characters that are going to be replaced with nothing:
Function Convert-ToFriendlyName
{
    param ($Text)
    # Unwanted characters (here: backslash and space) converted to a regex alternation:
    $SpecChars = '\', ' ', '\\'
    $remspecchars = [string]::join('|', ($SpecChars | % {[regex]::escape($_)}))
    # Convert the given text to title case
    $name = (Get-Culture).TextInfo.ToTitleCase("$Text".ToLower())
    # Remove unwanted characters
    $name = $name -replace $remspecchars, ""
    $name
}
Example: Convert-ToFriendlyName "My\Name\isRana\Dip " returns "MyNameIsranaDip".
Hope it helps you.

(PowerShell) split string with escaped separator characters

The -split operator is often used to split Active Directory distinguished names and canonical names into RDNs, conveniently forgetting about the escaped separator characters that might be used in OU and CN names, for example:
Distinguished Name Example with an escaped comma:
CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com
Canonical Name Example with an escaped slash:
Domain.com/Test/Slash\/Test/Test User
There are several splitting examples on the internet that do not even mention this trap; they might work for a long time, but sooner or later this programming flaw will cause a lot of painful troubleshooting.
I don’t think there is an easy way to correctly split escaped strings using a Regular Expression (see also: Is there a pure regex split of a string containing escape sequences?).

Using negative lookbehind:
$text = 'CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com'
$text -split '(?<!\\),'
CN=Test User
OU=Comma\,Test
OU=Test
DC=domain
DC=com
$text = 'Domain.com/Test/Slash\/Test/Test User'
$text -split '(?<!\\)/'
Domain.com
Test
Slash\/Test
Test User

To summarize and complement the existing, helpful answers:
mjolinor's answer works well if you needn't worry about \\ appearing in the input as an escaped \.
If \\ were present, the solution would misinterpret the , in \\, as escaped (rather than an escaped \ followed by an unescaped ,).
iRon's own answer addresses that problem with a more sophisticated regex.
Additionally, you may want to remove the escape characters after splitting; building on a regex provided by Wiktor Stribiżew here and adding a -replace operation with regex \\(.):
PS> 'foo,bar\,baz,bang\\,last' -split '(?<=(?<!\\)(?:\\\\)*),' -replace '\\(.)', '$1'
foo
bar,baz
bang\
last
Here's a simple utility function that wraps the above, with a configurable separator and escape character:
function Split-Text {
    param(
        [Parameter(Mandatory=$True)] [string] $Text,
        [Parameter(Mandatory=$True)] [string] $Separator,
        [string] $EscapeChar = '\'
    )
    $Text -split
        ('(?<=(?<!{0})(?:{0}{0})*){1}' -f [regex]::Escape($EscapeChar), [regex]::Escape($Separator)) `
        -replace ('{0}(.)' -f [regex]::Escape($EscapeChar)), '$1'
}
# Sample call - yields the same as above.
Split-Text 'foo,bar\,baz,bang\\,last' ','
# With "/" as the separator - analogous output.
Split-Text 'foo/bar\/baz/bang\\/last' '/'

I think there is still a little trap, as RDNs could potentially end with a backslash (which will be escaped with an additional backslash):
$text = 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
$text -split '(?<!\\),'
CN=Test User
OU=EndSlash\\,OU=Comma\,Test
DC=domain
DC=com
In other words the concerned separator should only be skipped if there is an odd number of backslashes in front of it.
To cover this, I think the complete regular expressions should be:
(?<![^\\](\\\\)*\\), (for Distinguished Names) and
(?<![^\\](\\\\)*\\)/ (for Canonical Names).
$text = 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
$text -split '(?<![^\\](\\\\)*\\),'
CN=Test User
OU=EndSlash\\
OU=Comma\,Test
DC=domain
DC=com
$text = 'Domain.com/Slash\/Test/EndSlash\\/Test/Test User'
$text -split '(?<![^\\](\\\\)*\\)/'
Domain.com
Slash\/Test
EndSlash\\
Test
Test User

Therefore, I have created a little function that adds an escape feature to the existing -split operator:
Function Split {
    param(
        [Parameter(Mandatory = $True, ValueFromPipeline = $true)][String]$String,
        [Parameter(Mandatory = $False, Position = 0)][String]$Delimiter = " ",
        [Parameter(Mandatory = $False, Position = 1)][Int]$MaxSubstrings = 0,
        [Parameter(Mandatory = $False, Position = 2)][String]$Escape,
        [Parameter(Mandatory = $False, Position = 3)][String]$Options = ""
    )
    If ($Escape) {$String = $String.Replace("$Escape$Delimiter", [String][Char]27)}
    $Split = $String -Split $Delimiter, $MaxSubstrings, $Options
    If ($Escape) {$Split | ForEach {$_.Replace([String][Char]27, "$Escape$Delimiter")}} Else {$Split}
}
"CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com" | Split "," -Escape "\"
"Domain.com/Test/Slash\/Test/Test User" | Split "/" -Escape "\"

Join statement in powershell

I am learning PowerShell and came across an example, but I am totally unable to understand it.
Here is the code:
if($($wordProgress -join '') -like $targetWord)

Going back to your previous question, $wordProgress is a strongly typed array. So $($wordProgress -join '') joins the array values, and -like compares the joined value to $targetWord.
It's in an if statement, so if the comparison returns true, PowerShell runs whatever is in the following {} block.
Here is an example of this in action.
[int[]]$nums = 1,2,3,4
Write-Host "Not Joined = "
$nums
Write-Host "Joined = "
($nums -join '')
If($($nums -join '') -like '1234'){
Write-host "Do something!"
}
Also, as you are new to PowerShell, I recommend you start by learning about the Get-Help cmdlet.
Here is how you would use it to learn about the -join operator:
Get-Help about_Join

I'm guessing $wordProgress is a char array (a list of characters).
$() is a subexpression; it evaluates the enclosed code first so the result can be used in the surrounding expression.
-join '' joins the array values with an empty delimiter (so the values are simply placed one after another) to create a string.
-like matches the left side (the string created from the char array) against the pattern on the right side; -like also understands wildcards such as * and ?.
This is all inside an if-test, so if the joined string matches the $targetWord, it would run the code that should come after your expression.
Sample:
PS > [char[]]$wordProgress = "a","b","c"
PS > $targetWord = 'abc'
PS > $($wordProgress -join '')
abc
PS > if($($wordProgress -join '') -like $targetWord) { "MATCH" }
MATCH