I have an HTML file with a lot of links in it.
They are in the format:
http:/oldsite/showFile.asp?doc=1234&lib=lib1
I'd like to replace them with
http://newsite/?lib=lib1&doc=1234
(1234 and lib1 are variable)
Any idea on how to do that?
Thanks
P
I don't think your examples are correct.
http:/oldsite/showFile.asp?doc=1234&lib=lib1 should be
http:/oldsite/showFile.asp?doc=1234&lib=lib1
and
http://newsite/?lib=lib1&doc=1234 should be http://newsite?lib=lib1&doc=1234
To do the replacement on these, you can do
'http:/oldsite/showFile.asp?doc=1234&lib=lib1' -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1'
which returns http://newsite?lib=lib1&doc=1234
To replace these in a file you can use:
(Get-Content -Path 'X:\TheHtmlFile.html' -Raw) -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1' |
Set-Content -Path 'X:\TheNewHtmlFile.html'
Regex details:
http:/oldsite/showFile Match the characters “http:/oldsite/showFile” literally
\. Match the character “.” literally
asp Match the characters “asp” literally
\? Match the character “?” literally
( Match the regular expression below and capture its match into backreference number 1
doc= Match the characters “doc=” literally
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
& Match the character “&” literally
( Match the regular expression below and capture its match into backreference number 2
lib= Match the characters “lib=” literally
\w Match a single character that is a “word character” (letters, digits, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
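Links inside an HTML file are often written with the entity &amp; rather than a bare &, and the pattern above would not match those. The following is a minimal sketch, assuming both forms may occur. The sample HTML, the group names doc, lib and sep, and the choice to keep whichever separator was found are assumptions for illustration. The target URL follows the form given in the question.

```powershell
# Made-up sample input: one link with a bare &, one with the &amp; entity.
$html = @'
<a href="http:/oldsite/showFile.asp?doc=1234&lib=lib1">one</a>
<a href="http:/oldsite/showFile.asp?doc=42&amp;lib=lib2">two</a>
'@

# Named groups capture the values; (?<sep>&amp;|&) accepts either separator form.
$pattern = 'http:/oldsite/showFile\.asp\?doc=(?<doc>\d+)(?<sep>&amp;|&)lib=(?<lib>\w+)'

# ${name} in the (single-quoted) replacement refers to the named group.
$result = $html -replace $pattern, 'http://newsite/?lib=${lib}${sep}doc=${doc}'
$result
```

Keeping the replacement string in single quotes matters here: in double quotes PowerShell would try to expand ${lib} as a variable before the regex engine ever sees it.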
Read in the file, loop through each line and replace the old value with the new value, then send the output to a new file:
gc file.html | % { $_.Replace('oldsite...','newsite...') } | out-file new-file.html
SELECT * FROM "main_parse_user"
WHERE ("main_parse_user"."bio"::text ~* '\mFounder of JoJoWorld | Python'
OR "main_parse_user"."first_name"::text ~* '\mFounder of JoJoWorld | Python')
I'm searching for text with this query, and sometimes the search terms contain '|'.
How can I make '|' be treated as an ordinary character?
For search text without such characters, everything works correctly.
You'll have to escape characters that have a special meaning in regular expressions with a backslash to deprive them of their special meaning. Per the documentation:
\k (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., \\ matches a backslash character
What is a good way in PowerShell to convert uppercase letters in a string to lowercase while keeping only letters, numbers and dashes?
We are automating the deployment of Azure resources based on some strings. Many Azure resources allow only lowercase letters, numbers and dashes in their names.
From the docs, valid names are:
Lowercase letters, numbers, and hyphens. Can't start or end with hyphen. Consecutive hyphens aren't allowed. Length 1-63 characters
To make sure a proposed name abides by these rules, one way would be to use a series of -replace actions, trim any hyphens from the beginning or end of the string, convert to lowercase and finally truncate the remaining string to the 63-character limit.
Something like this:
$proposedName = '_Container___Name-123-'
$containerName = $proposedName -replace '[^-a-z0-9]', '-' -replace '-{2,}', '-' # any character not hyphen, letter or digit --> '-', then collapse consecutive hyphens
$containerName = $containerName.Trim('-').ToLower() # remove leading and trailing hyphens and convert to lowercase
$containerName = ($containerName -replace '(.{63}).*', '$1').TrimEnd('-') # truncate to 63 characters and trim any trailing hyphens
$containerName # --> container-name-123
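If the same clean-up is needed in several deployment scripts, the steps above could be wrapped in a small function. This is only a sketch: the function name ConvertTo-ResourceName and its -MaxLength parameter are hypothetical, and the check for an empty result is an addition that the rules quoted above imply (length 1-63) but the original snippet does not perform.

```powershell
# Hypothetical helper; the name and the -MaxLength parameter are illustrative only.
function ConvertTo-ResourceName {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name,
        [int]$MaxLength = 63
    )

    # Lowercase first, turn every disallowed character into a hyphen,
    # then collapse runs of hyphens into a single one.
    $clean = $Name.ToLowerInvariant() -replace '[^-a-z0-9]', '-' -replace '-{2,}', '-'
    $clean = $clean.Trim('-')

    # Truncate, then trim again in case the cut landed right after a hyphen.
    if ($clean.Length -gt $MaxLength) {
        $clean = $clean.Substring(0, $MaxLength).TrimEnd('-')
    }

    # Nothing usable left (for example the input was '___').
    if ($clean.Length -eq 0) {
        throw "'$Name' contains no characters that are valid in a resource name."
    }
    $clean
}

ConvertTo-ResourceName -Name '_Container___Name-123-'   # container-name-123
```

Lowercasing before the replacement keeps the character class honest: -replace is case-insensitive by default, so [^-a-z0-9] on its own would also let uppercase letters through.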
I am trying to remove parts of a name "- xx_xx" from the end of multiple files. I'm using this and it works well.
dir | Rename-Item -NewName { $_.Name -replace " - xx_xx","" }
However, there are other parts like:
" - yy_yy"
" - zz_zz"
What can I do to remove all of these at once instead of running it again and again changing the part of the name I want removed?
Easiest way
You can keep on stringing -replace statements until the cows come home, if you need to.
$myLongFileName = "Something xx_xx yy_yy zz_zz" -replace "xx_xx","" -replace "yy_yy","" -replace "zz_zz",""
More Terse Syntax
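If every file has these, you can also put the pieces you want to remove in an array, separated by commas, and join them with | into a single regex alternation. The join is needed because -replace expects one pattern string, not an array of patterns.
$stuffWeDontWantInOurFile = @("xx_xx", "yy_yy", "zz_zz")
$myLongFileName -replace ($stuffWeDontWantInOurFile -join "|"), ""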
Yet another way
If your file elements are separated by spaces or dashes or something predictable, you can split the file name on that.
$myLongFileName = "Something xx_xx yy_yy zz_zz"
PS> $myLongFileName.Split()
Something
xx_xx
yy_yy
zz_zz
PS> $myLongFileName.Split()[0] #select just the first piece
Something
For spaces, you use the Split() method with no arguments.
If it were dashes or another character, you'd provide it like so Split("-"). Between these techniques, you should be able to do what you want to do.
If as you say, the pattern - xx_xx is always at the end of the file name, I'd suggest using something like this:
Get-ChildItem -Path '<TheFolderWhereTheFilesAre>' -File |
Rename-Item -NewName {
'{0}{1}' -f ($_.BaseName -replace '\s*-\s*.._..$'), $_.Extension
} -WhatIf
Remove the -WhatIf switch once you are satisfied with the results shown in the console.
Result:
D:\Test\blah - xx_yy.txt --> D:\Test\blah.txt
D:\Test\somefile - zy_xa.txt --> D:\Test\somefile.txt
Regex details:
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- Match the character “-” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. Match any single character that is not a line break character
. Match any single character that is not a line break character
_ Match the character “_” literally
. Match any single character that is not a line break character
. Match any single character that is not a line break character
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
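The pattern above removes any suffix of the shape "- .._..". If only a known set of suffixes should be removed, the list can be turned into a regex alternation instead. A minimal sketch, assuming the three suffixes named in the question. The sample base names are made up, and the rename pipeline is left commented out so that running the snippet touches no files.

```powershell
# The suffixes named in the question; extend the list as needed.
$suffixes = 'xx_xx', 'yy_yy', 'zz_zz'

# Escape each entry so it is matched literally, then join with | (regex "or").
$alternation = ($suffixes | ForEach-Object { [regex]::Escape($_) }) -join '|'
$pattern = '\s*-\s*(?:{0})$' -f $alternation

# Made-up base names to show the effect without touching any files.
$baseNames = 'blah - xx_xx', 'somefile - yy_yy', 'other - zz_zz', 'keep - ab_cd'
$cleaned = $baseNames | ForEach-Object { $_ -replace $pattern }
$cleaned   # blah, somefile, other, keep - ab_cd

# Applied to real files it might look like this (commented out on purpose):
# Get-ChildItem -Path '<TheFolderWhereTheFilesAre>' -File |
#     Rename-Item -NewName { ($_.BaseName -replace $pattern) + $_.Extension } -WhatIf
```

Names that do not end in one of the listed suffixes, such as 'keep - ab_cd', are left unchanged.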
Need to replace strings after pattern matching. Using powershell v4.
Log line is -
"08:02:37.961" level="DEBUG" "Outbound message: [32056][Sent: HTTP]" threadId="40744"
Need to remove level and threadId completely. Expected line is -
"08:02:37.961" "Outbound message: [32056][Sent: HTTP]"
Have already tried following but did not work -
$line.Replace('level="\w+"','')
AND
$line.Replace('threadId="\d+"','')
Help needed with correct replace command. Thanks.
Try this regex:
$line = '"08:02:37.961" level="DEBUG" "Outbound message: [32056][Sent: HTTP]" threadId="40744"'
$line -replace '(\s*(level|threadId)="[^"]+")'
Result:
"08:02:37.961" "Outbound message: [32056][Sent: HTTP]"
Regex details:
( # Match the regular expression below and capture its match into backreference number 1
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
# Match either the regular expression below (attempting the next alternative only if this one fails)
level # Match the characters “level” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
threadId # Match the characters “threadId” literally
)
=" # Match the characters “="” literally
[^"] # Match any character that is NOT a “"”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
" # Match the character “"” literally
)
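.Replace() is a plain string method and doesn't use regex, so it looks for the pattern text literally (see https://learn.microsoft.com/en-us/dotnet/api/system.string.replace?view=netframework-4.8). The -replace operator does use regex.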
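The difference can be seen side by side on the log line from the question. This is a small sketch; the variable names $literal and $cleaned are just for illustration.

```powershell
# The log line from the question, single-quoted so the inner double quotes stay literal.
$line = '"08:02:37.961" level="DEBUG" "Outbound message: [32056][Sent: HTTP]" threadId="40744"'

# String.Replace() searches for the literal text level="\w+", which is not in the line.
$literal = $line.Replace('level="\w+"', '')
$literal -eq $line   # True: nothing was replaced

# -replace treats the pattern as a regex; leaving out the replacement removes the match.
$cleaned = $line -replace '\s*(?:level|threadId)="[^"]*"'
$cleaned   # "08:02:37.961" "Outbound message: [32056][Sent: HTTP]"
```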
I am trying to understand how regex works and what are the possibilities of working with it.
So I have a txt file and I am trying to search for 8-character-long strings containing numbers. For now I use a quite simple option:
clear
Get-ChildItem random.txt | Select-String -Pattern [0-9][a-z] | foreach {$_.line}
It sort of works but I am trying to find a better option. ATM it takes too long to read through the left out text since it writes entire lines and it does not filter them by length.
You can use a lookahead to assert that a string contains at least 1 digit, then specify the length of the match and finally anchor it with ^ (start of string) and $ (end of string) if the string is on a line of its own, or \b (word boundary) if it's part of an HTML document as your comments seem to suggest:
Get-ChildItem C:\files\ |Select-String -Pattern '^(?=.*\d)\w{8}$'
Get-ChildItem C:\files\ |Select-String -Pattern '\b(?=\w*\d)\w{8}\b'
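Since the question mentions that whole lines are printed, it may help to output only the matched strings. A minimal sketch, assuming the strings are embedded in longer lines: it uses the word-boundary form with the lookahead written as (?=\w*\d), so the digit has to occur inside the 8-character word itself and not somewhere later on the line. The sample text is made up.

```powershell
# Made-up sample text standing in for the contents of random.txt.
$text = @'
some words abc12345 and abcdefgh here
12345678 short1 toolongword9
'@

# Exactly 8 word characters, at least one of them a digit.
$pattern = '\b(?=\w*\d)\w{8}\b'

# -AllMatches keeps every hit on a line; then emit just the matched text.
$found = $text -split '\r?\n' |
    Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }
$found   # abc12345, 12345678

# Against a file the first stage would be, for example:
# Get-ChildItem random.txt | Select-String -Pattern $pattern -AllMatches | ...
```

Here abcdefgh is skipped because it has no digit, while short1 and toolongword9 are skipped because they are not exactly 8 characters long.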
The pattern [0-9][a-z] matches a digit followed by a letter. If you want to match a sequence of 8 characters use .{8}. The dot in regular expressions matches any character except newlines. A number in curly brackets matches the preceding expression the given number of times.
If you want to match non-whitespace characters use \S instead of .. If you want to match only digits and letters use [0-9a-z] (a character class) instead of ..
For a more thorough introduction please go find a tutorial. The subject is way too complex to be covered by a single answer on SO.
What you're currently searching for is a single number ranging from 0-9 followed by a single lowercase letter ranging from a-z.
This, for example, will match any run of 8 word characters (letters, digits or underscores):
\w{8}
I often forget what some regex classes are, so I use this as a point of reference, and it may be useful to you as a learning tool: http://regexr.com/
It can also validate what you're typing inline via a text field so you can see if what you're doing works or not.
If you need more of a tutorial than a reference, I found this extremely useful when I learned: regexone.com