PowerShell. Remove Forbidden Characters - powershell

What is a good way to convert uppercase letters to lowercase and strip everything except letters, numbers, and dashes from a string in PowerShell?
We are automating the deployment of Azure resources whose names are derived from arbitrary strings. Many Azure resource names may contain only lowercase letters, numbers, and dashes.

From the docs, valid names are:
Lowercase letters, numbers, and hyphens. Can't start or end with hyphen. Consecutive hyphens aren't allowed. Length 1-63 characters
To make sure a proposed name abides by these rules, one way would be to use a series of -replace operations, trim any hyphens at the beginning or end of the string, convert to lowercase, and finally truncate the result to the 63-character limit.
Something like this:
$proposedName = '_Container___Name-123-'
$containerName = (($proposedName -replace '[^-a-z0-9]', '-' -replace # any character not hyphen, letter or digit --> '-'
'-{2,}', '-').Trim("-").ToLower() -replace # remove leading and trailing hyphens and convert to lowercase
'(.{63}).*', '$1').TrimEnd("-") # truncate to 63 characters and trim any trailing hyphens
$containerName # --> container-name-123
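If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: the function name ConvertTo-ValidResourceName and the -MaxLength parameter are made up for illustration, and it lowercases first and truncates with Substring instead of the regex used above.

```powershell
# Hypothetical helper (name and parameters are illustrative, not an Azure or PowerShell built-in).
function ConvertTo-ValidResourceName {
    param(
        [Parameter(Mandatory)][string]$Name,
        [int]$MaxLength = 63   # assumed default, taken from the 1-63 character rule quoted above
    )

    # Lowercase first, then replace every character that is not a hyphen,
    # a-z or 0-9 with a hyphen and collapse runs of hyphens into one.
    $clean = $Name.ToLowerInvariant() -creplace '[^-a-z0-9]', '-' -replace '-{2,}', '-'

    # Names can't start or end with a hyphen.
    $clean = $clean.Trim('-')

    # Truncate, then trim again in case the cut left a trailing hyphen.
    if ($clean.Length -gt $MaxLength) {
        $clean = $clean.Substring(0, $MaxLength).TrimEnd('-')
    }
    return $clean
}

ConvertTo-ValidResourceName -Name '_Container___Name-123-'   # --> container-name-123
ConvertTo-ValidResourceName -Name 'My Storage (Prod) #1'     # --> my-storage-prod-1
```

Note that an input made up entirely of invalid characters comes back as an empty string, so you may still want to check the result before using it.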

Related

How to replace a character using sed with different lengths in preceding string

I have a file in which I want to replace "_" with "-" wherever it is part of a gene name. Examples of the gene names and my intended output are:
aa1c1_123 -> aa1c1-123
aa1c2_456 -> aa1c2-456
aa1c10_789 -> aa1c10-789
In essence, the first four characters are fixed, followed by 1 or 2 characters depending on the chromosome, an underscore, and then the remainder of the gene ID, which can vary in length and content. Importantly, the gene information column also contains other strings with underscores (e.g. "gene_id", "transcript_id", "five_prime_utr"), so a blanket sed -i.bak 's/_/-/g' file.gtf
can't be used.
Perhaps not the most elegant way, but this should work:
sed -i.bak 's/\([0-9a-z]\{4\}[0-9][0-9]\?\)_/\1-/g' file.gtf
i.e. capture a group (referenced by \1 in the substitution) of 4 characters consisting of lowercase letters and digits, followed by exactly one digit and optionally a second digit, all followed by an underscore; if found, replace the match with the group's content and a dash. This should exclude your other occurrences, which consist only of letters and an underscore. Note that \? is a GNU sed extension.

Find and replace a href value with PowerShell?

I have a HTML file with a load of links in it.
They are in the format
http:/oldsite/showFile.asp?doc=1234&lib=lib1
I'd like to replace them with
http://newsite/?lib=lib1&doc=1234
(1234 and lib1 are variable)
Any idea on how to do that?
Thanks
P
I don't think your examples are correct.
http:/oldsite/showFile.asp?doc=1234&lib=lib1 should be
http:/oldsite/showFile.asp?doc=1234&lib=lib1
and
http://newsite/?lib=lib1&doc=1234 should be http://newsite?lib=lib1&doc=1234
To do the replacement on these, you can do
'http:/oldsite/showFile.asp?doc=1234&lib=lib1' -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1'
which returns http://newsite?lib=lib1&doc=1234
To replace these in a file you can use:
(Get-Content -Path 'X:\TheHtmlFile.html' -Raw) -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1' |
Set-Content -Path 'X:\TheNewHtmlFile.html'
Regex details:
http:/oldsite/showFile Match the characters “http:/oldsite/showFile” literally
\. Match the character “.” literally
asp Match the characters “asp” literally
\? Match the character “?” literally
( Match the regular expression below and capture its match into backreference number 1
doc= Match the characters “doc=” literally
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
& Match the character “&” literally
( Match the regular expression below and capture its match into backreference number 2
lib= Match the characters “lib=” literally
\w Match a single character that is a “word character” (letters, digits, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
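To try the replacement without touching a file, here is a small self-contained sketch. The sample HTML is invented for illustration, and it uses named groups (doc and lib) instead of the numbered ones above, which is just a readability choice.

```powershell
# Made-up sample content standing in for the real HTML file.
$html = @'
<a href="http:/oldsite/showFile.asp?doc=1234&lib=lib1">First</a>
<a href="http:/oldsite/showFile.asp?doc=42&lib=archive">Second</a>
<a href="http://elsewhere/page.html">Untouched</a>
'@

# Same pattern as above, with the two capture groups given names.
$pattern = 'http:/oldsite/showFile\.asp\?(?<doc>doc=\d+)&(?<lib>lib=\w+)'

# ${lib} and ${doc} refer to the named groups; the single quotes keep
# PowerShell from trying to expand them as variables.
$updated = $html -replace $pattern, 'http://newsite?${lib}&${doc}'
$updated
```

-replace substitutes every match in the string, so both old links are rewritten in one pass and the third link is left alone.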
Read in the file, loop through each line, replace the old value with the new value, and send the output to a new file:
gc file.html | % { $_.Replace('oldsite...','newsite...') } | out-file new-file.html

How would I change multiple filenames in Powershell?

I am trying to remove parts of a name "- xx_xx" from the end of multiple files. I'm using this and it works well.
dir | Rename-Item -NewName { $_.Name -replace " - xx_xx","" }
However, there are other parts like:
" - yy_yy"
" - zz_zz"
What can I do to remove all of these at once instead of running it again and again changing the part of the name I want removed?
Easiest way
You can keep on stringing -replace statements until the cows come home, if you need to.
$myLongFileName = "Something xx_xx yy_yy zz_zz" -replace "xx_xx","" -replace "yy_yy"
More Terse Syntax
If there are many such pieces, you can also keep them in an array and join them into a single regex alternation, so that any one of them is matched:
$stuffWeDontWantInOurFile = @("xx_xx", "yy_yy", "zz_zz")
$myLongFileName -replace ($stuffWeDontWantInOurFile -join '|'), ""
Yet another way
If your file elements are separated by spaces or dashes or something predictable, you can split the file name on that.
$myLongFileName = "Something xx_xx yy_yy zz_zz"
PS> $myLongFileName.Split()
Something
xx_xx
yy_yy
zz_zz
PS> $myLongFileName.Split()[0] #select just the first piece
Something
For spaces, you use the Split() method with no arguments.
For dashes or another character, pass it as an argument, like so: Split("-"). Between these techniques, you should be able to do what you want to do.
If as you say, the pattern - xx_xx is always at the end of the file name, I'd suggest using something like this:
Get-ChildItem -Path '<TheFolderWhereTheFilesAre>' -File |
Rename-Item -NewName {
'{0}{1}' -f ($_.BaseName -replace '\s*-\s*.._..$'), $_.Extension
} -WhatIf
Remove the -WhatIf switch once you are satisfied with the results shown in the console.
Result:
D:\Test\blah - xx_yy.txt --> D:\Test\blah.txt
D:\Test\somefile - zy_xa.txt --> D:\Test\somefile.txt
Regex details:
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- Match the character “-” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. Match any single character that is not a line break character
. Match any single character that is not a line break character
_ Match the character “_” literally
. Match any single character that is not a line break character
. Match any single character that is not a line break character
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
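If you would rather remove only a known list of suffixes instead of anything shaped like .._.., a possible variation is to build the alternation from an array. This is a sketch that works on plain strings so it can be run safely; the file names and the $suffixes list are made up for illustration.

```powershell
# Suffixes to strip (illustrative values taken from the question).
$suffixes = 'xx_xx', 'yy_yy', 'zz_zz'

# Escape each entry so regex metacharacters are taken literally, then join with '|'.
$alternation = ($suffixes | ForEach-Object { [regex]::Escape($_) }) -join '|'
$pattern = '\s*-\s*(?:' + $alternation + ')$'

# Sample names standing in for real files.
$names = 'report - xx_xx.txt', 'notes - yy_yy.txt', 'photo - zz_zz.jpg', 'keep - me.txt'

$renamed = foreach ($name in $names) {
    $base = [System.IO.Path]::GetFileNameWithoutExtension($name)
    $ext  = [System.IO.Path]::GetExtension($name)
    ($base -replace $pattern, '') + $ext
}
$renamed
```

The first three names lose their suffix and 'keep - me.txt' is left unchanged. The same $pattern could be dropped into the Rename-Item script block above in place of '\s*-\s*.._..$'.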

Text file search for match strings regex

I am trying to understand how regex works and what are the possibilities of working with it.
So I have a txt file and I am trying to search for 8-character strings containing numbers. For now I use a quite simple option:
clear
Get-ChildItem random.txt | Select-String -Pattern [0-9][a-z] | foreach {$_.line}
It sort of works but I am trying to find a better option. ATM it takes too long to read through the left out text since it writes entire lines and it does not filter them by length.
You can use a lookahead to assert that a string contains at least 1 digit, then specify the length of the match and finally anchor it with ^ (start of string) and $ (end of string) if the string is on a line of its own, or \b (word boundary) if it's part of an HTML document as your comments seem to suggest:
Get-ChildItem C:\files\ |Select-String -Pattern '^(?=.*\d)\w{8}$'
Get-ChildItem C:\files\ |Select-String -Pattern '\b(?=.*\d)\w{8}\b'
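To print only the matched strings rather than whole lines, one option is to add -AllMatches and read the Matches property. The sketch below runs on made-up sample lines instead of files. It also tightens the lookahead to (?=\w*\d): with \b anchors, (?=.*\d) can be satisfied by a digit anywhere later on the line, not just inside the 8-character word.

```powershell
# Sample input lines (invented for illustration); piping file objects from Get-ChildItem works the same way.
$lines = 'id AB12CD34 ok', 'nothing here', 'ABCDEFGH 12345678 x9'

# \b ... \b      : whole words only
# (?=\w*\d)      : the word itself must contain at least one digit
# \w{8}          : exactly 8 word characters
$found = $lines |
    Select-String -Pattern '\b(?=\w*\d)\w{8}\b' -AllMatches |
    ForEach-Object { $_.Matches.Value }

$found   # AB12CD34 and 12345678
```

ABCDEFGH is skipped because it contains no digit, even though a digit appears later on the same line.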
The pattern [0-9][a-z] matches a digit followed by a letter. If you want to match a sequence of 8 characters use .{8}. The dot in regular expressions matches any character except newlines. A number in curly brackets matches the preceding expression the given number of times.
If you want to match non-whitespace characters, use \S instead of the dot. If you want to match only digits and letters, use [0-9a-z] (a character class) instead.
For a more thorough introduction please go find a tutorial. The subject is way too complex to be covered by a single answer on SO.
What you're currently searching for is a single digit (0-9) followed by a single letter (a-z). Select-String is case-insensitive by default, so uppercase letters match as well.
This, for example, will match any run of 8 word characters (letters, digits, or underscore):
\w{8}
I often forget what some regex classes are, so I use this as a point of reference, and it may be useful to you as a learning tool: http://regexr.com/
It can also validate what you're typing inline via a text field, so you can see whether what you're doing works.
If you need more of a tutorial than a reference, I found this extremely useful when I learned: regexone.com

Check characters inside string for their Unicode value

I would like to replace characters with certain Unicode values in a variable with dash. I have two ideas which might work, but I do not know how to check for the value of character:
1/ processing variable as string, checking every characters value and placing these characters in a new variable (replacing those characters which are invalid)
2/ use this magic :-)
$variable =~ s/[$char_range]/-/g;
char_range should be similar to [0-9] or [A-Z], but it should be values for utf-8 characters. I need the range from 0x00 to 0x7F, to be exact.
The following expression should replace anything that is not ASCII (up to U+FFFF) with a hyphen, which is (I think) what you want to do:
s/[\N{U+0080}-\N{U+FFFF}]/-/g
There's no such thing as UTF-8 characters. There are only characters that you encode into UTF-8. Even then, you don't want to make ranges outside of the magical ones that Perl knows about. You're likely to get more than you expect.
To get the ordinal value for a character, use ord:
use utf8;
use feature 'say'; # needed for say
my $code_number = ord '😸'; # U+1F638
say sprintf "%#x", $code_number;
However, I don't think that's what you need. It sounds like you want to replace characters in the ASCII range with a -. You can specify ranges of code numbers:
s/[\000-\177]/-/g; # in octal
s/[\x00-\x7f]/-/g; # in hexadecimal
You can specify wide character ordinal values in braces:
s/[\x80-\x{10ffff}]/-/g; # wide characters, replace non-ASCII in this case
When the characters have a common property, you can use that:
s/\p{ASCII}/-/g;
However, if you are replacing things character for character, you might want a transliteration:
$string =~ tr/\000-\177/-/;