PowerShell: Replace underscores with either a period or # sign - powershell

In PowerShell, I'm looking to convert a SharePoint Location string (john_smith_domain_com) to the proper addressing of john.Smith#domain.com.
I'm not sure which is the best way to go about doing this.
I know $var.Replace("_",".") would work to replace all the "_"s with "."s, but that doesn't help the "#" going between the name and the domain.
Unfortunately, the domain isn't always the same, or I'd be able to just .replace it easily.

You can combine two -replace operations (this assumes a two-component domain name, such as domain.com):
# -> 'john.smith#domain.com'
'john_smith_domain_com' -replace '_(?![^_]+_[^_]+$)', '.' -replace '_', '#'
Regex _(?![^_]+_[^_]+$) matches all _ chars. except the second-to-last one.
For an explanation of the regex and the ability to interact with it, see this regex101.com page.
After all of these have been replaced with ., only the second-to-last one is left, which can then be replaced with #.
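As a minimal sketch, the two -replace operations above can be wrapped in a small function and tried on more than one input. The function name Convert-LocationToAddress and the second sample string are made up for illustration; the two-component-domain assumption still applies.

```powershell
# Hypothetical helper name; the two -replace operations are the ones shown above.
function Convert-LocationToAddress {
    param([string] $Location)
    # Step 1: replace every _ except the second-to-last one with a period.
    # Step 2: replace the one remaining _ with #.
    $Location -replace '_(?![^_]+_[^_]+$)', '.' -replace '_', '#'
}

Convert-LocationToAddress 'john_smith_domain_com'       # -> john.smith#domain.com
Convert-LocationToAddress 'mary_ann_jones_example_org'  # -> mary.ann.jones#example.org
```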
As JohnLBevan notes, if there are domain names with a varying number of components, you have two options:
If you can assume that all user names are in the form <firstName>.<lastName> (as governed by a policy), you can replace the second _ with #:
# -> 'john.smith#domain.co.uk'
'john_smith_domain_co_uk' -replace '^([^_]+)_([^_]+)_', '$1.$2#' -replace '_', '.'
Otherwise, as John notes:
You may want something that's aware of the more common TLDs / caters for more common domain formats. Some options are in this post.
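One way to read that suggestion is to check the input against a list of known domain suffixes, longest first. The sketch below is only an illustration of that idea: the function name and the three-entry suffix list are assumptions, and a real list would need to be far more complete.

```powershell
# Hypothetical helper; the default suffix list is illustrative only.
function Convert-LocationWithSuffixList {
    param(
        [string]   $Location,
        [string[]] $KnownSuffixes = @('co_uk', 'com', 'org')
    )
    # Try longer suffixes first so that 'co_uk' wins over a shorter entry.
    foreach ($suffix in ($KnownSuffixes | Sort-Object Length -Descending)) {
        # Group 1: the user name; group 2: one domain label plus the known suffix.
        $pattern = '^(.+)_([^_]+_' + [regex]::Escape($suffix) + ')$'
        if ($Location -match $pattern) {
            return ($Matches[1] -replace '_', '.') + '#' + ($Matches[2] -replace '_', '.')
        }
    }
    throw "No known suffix matched '$Location'."
}

Convert-LocationWithSuffixList 'john_smith_domain_co_uk'  # -> john.smith#domain.co.uk
Convert-LocationWithSuffixList 'john_smith_domain_com'    # -> john.smith#domain.com
```

Unlike the policy-based variant, this does not depend on how many parts the user name has, but it fails for any suffix that is missing from the list.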

Related

PowerShell Core Sorting Incorrectly [duplicate]

I started with Project Gutenberg's "The Complete Works of William Shakespeare by William Shakespeare", a UTF-8 text file available from http://www.gutenberg.org/ebooks/100. In PowerShell, I ran
Get-Content -Tail 50 $filename | Sort-Object -CaseSensitive
which - I believe - piped the last 50 lines (i.e., strings delimited by line breaks) of the file to Sort-Object, which was configured to sort alphabetically with strings beginning with lowercase letters before strings beginning with uppercase letters.
Why is the output in the following image (especially in the P's) not sorting according to the -CaseSensitive switch? What is a solution?
Note: This answer focuses on the general case of sorting entire strings (by all of their characters, not just by the first one).
You're looking for ordinal sorting, where characters are sorted numerically by their Unicode code point ("ASCII value") and therefore all uppercase letters, as a group, sort before all lowercase letters.
As of Windows PowerShell v5.1 / PowerShell Core v7.0, Sort-Object invariably uses lexical sorting[1] (using the invariant culture by default, but this can be changed with the -Culture parameter), where case-sensitive sorting simply means that the lowercase form of a given letter comes directly before its uppercase form, not all letters collectively; e.g., b sorts before B, but they both come after both a and A (also, the logic is reversed from the ordinal case, where it is uppercase letters that come first):
PS> 'B', 'b', 'A', 'a' | Sort-Object -CaseSensitive
a
A
b
B
There is a workaround, however, which (a) sorts uppercase letters before lowercase ones and (b) comes at the expense of performance:
For better performance via direct ordinal sorting you need to use the .NET framework directly - see below, which also offers a solution to sort the lowercase letters first.
Enhancing Sort-Object to also support ordinal sorting is being discussed in this GitHub issue.
# PSv4+ syntax
# Note: Uppercase letters come first.
PS> 'B', 'b', 'A', 'a' |
Sort-Object { -join ([int[]] $_.ToCharArray()).ForEach('ToString', 'x4') }
A
B
a
b
The solution maps each input string to a string composed of the 4-digit hex. representations of the characters' code points, e.g. 'aB' becomes '00610042', representing code points 0x61 and 0x42; comparing these representations is then equivalent to sorting the string by its characters' code points.
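To make the mapping visible, the sort key can be computed on its own. This is a sketch using a hypothetical helper name, Get-OrdinalSortKey, and a ForEach-Object pipeline instead of the .ForEach() method; the resulting key is the same.

```powershell
# Hypothetical helper: builds the 4-digit-hex sort key described above.
function Get-OrdinalSortKey {
    param([string] $Text)
    -join ($Text.ToCharArray() | ForEach-Object { ([int] $_).ToString('x4') })
}

Get-OrdinalSortKey 'aB'   # -> 00610042

# Sorting by the key puts uppercase letters first.
'B', 'b', 'A', 'a' | Sort-Object { Get-OrdinalSortKey $_ }   # -> A, B, a, b
```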
Use of .NET for direct, better-performing ordinal sorting:
# Get the last 50 lines as a list.
[Collections.Generic.List[string]] $lines = Get-Content -Tail 50 $filename
# Sort the list in place, using ordinal sorting
$lines.Sort([StringComparer]::Ordinal)
# Output the result.
# Note that uppercase letters come first.
$lines
[StringComparer]::Ordinal returns an object that implements the [System.Collections.IComparer] interface.
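The same call can be tried without a file. The sample words below are made up for illustration.

```powershell
# In-memory sample data instead of Get-Content output.
$words = [Collections.Generic.List[string]] ('Zebra', 'apple', 'Mango', 'banana')

# Sort in place by code point: all uppercase-initial words come first.
$words.Sort([StringComparer]::Ordinal)

$words   # -> Mango, Zebra, apple, banana
```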
Using this solution in a pipeline is possible, but requires sending the array of lines as a single object through the pipeline, which the -ReadCount parameter provides:
Get-Content -Tail 50 $filename -ReadCount 0 | ForEach-Object {
($lines = [Collections.Generic.List[string]] $_).Sort([StringComparer]::Ordinal)
$lines # output the sorted lines
}
Note: As stated, this sorts uppercase letters first.
To sort all lowercase letters first, you need to implement custom sorting by way of a [System.Comparison[string]] delegate, which in PowerShell can be implemented as a script block ({ ... }) that accepts two input strings and returns their sorting ranking (-1 (or any negative value) for less-than, 0 for equal, 1 (or any positive value) for greater-than):
$lines.Sort({ param([string]$x, [string]$y)
    # Determine the shorter of the two lengths.
    $count = if ($x.Length -lt $y.Length) { $x.Length } else { $y.Length }
    # Loop over all characters in corresponding positions.
    for ($i = 0; $i -lt $count; ++$i) {
        if ([char]::IsLower($x[$i]) -ne [char]::IsLower($y[$i])) {
            # Sort all lowercase chars. before uppercase ones.
            return (1, -1)[[char]::IsLower($x[$i])]
        } elseif ($x[$i] -ne $y[$i]) { # compare code points (numerically)
            return $x[$i] - $y[$i]
        }
        # So far the two strings compared equal, continue.
    }
    # The strings compared equal in all corresponding character positions,
    # so the difference in length, if any, is the decider (longer strings sort
    # after shorter ones).
    return $x.Length - $y.Length
})
Note: For English text, the above should work fine, but in order to support all Unicode text, potentially containing surrogate code-unit pairs and differing normalization forms (composed vs. decomposed accented characters), even more work is needed.
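For experimenting, a compact variant of the lowercase-first comparison can be stored in a variable and applied to a small in-memory list. This is a sketch under the same English-text assumption; the variable names and sample words are made up for illustration.

```powershell
# A [Comparison[string]] delegate that sorts lowercase characters first.
$lowerFirst = [Comparison[string]] {
    param([string] $x, [string] $y)
    $count = [Math]::Min($x.Length, $y.Length)
    for ($i = 0; $i -lt $count; ++$i) {
        $xLower = [char]::IsLower($x[$i])
        $yLower = [char]::IsLower($y[$i])
        if ($xLower -ne $yLower) {
            # Exactly one of the two characters is lowercase; it sorts first.
            if ($xLower) { return -1 } else { return 1 }
        }
        if ($x[$i] -cne $y[$i]) {
            # Same case class: compare code points numerically.
            return [int] $x[$i] - [int] $y[$i]
        }
    }
    # Equal so far: the shorter string sorts first.
    return $x.Length - $y.Length
}

$list = [Collections.Generic.List[string]] ('Banana', 'apple', 'Apple', 'banana', 'app')
$list.Sort($lowerFirst)
$list   # -> app, apple, banana, Apple, Banana
```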
[1] On Windows, so-called word sorting is performed by default: "Certain non-alphanumeric characters might have special weights assigned to them. For example, the hyphen (-) might have a very small weight assigned to it so that coop and co-op appear next to each other in a sorted list."; on Unix-like platforms, string sorting is the default, where no special weights apply to non-alphanumeric chars. - see the docs.
One way to get the desired result is to take the first character of each string and cast it to an int. This yields the character's numeric code (its ASCII code for ASCII characters), which you can then sort numerically into the desired order.
Get-Content -Tail 50 $filename | Sort-Object -Property @{Expression={[int]$_[0]};Ascending=$true}
This passes a calculated property to the -Property parameter of Sort-Object: $_ is the current string (line) in the pipeline, [0] takes its first character, [int] casts that character to its numeric code, and Ascending=$true sorts by that value in ascending order.
This provides the following output.
You may wish to trim the whitespace from the output; however, I'll leave that up to you to decide.
 
DONATIONS or determine the status of compliance for any particular state
Foundation, how to help produce our new eBooks, and how to subscribe to
Gutenberg-tm eBooks with only a loose network of volunteer support.
International donations are gratefully accepted, but we cannot make any
Most people start at our Web site which has the main PG search facility:
Project Gutenberg-tm eBooks are often created from several printed
Please check the Project Gutenberg Web pages for current donation
Professor Michael S. Hart was the originator of the Project Gutenberg-tm
Section 5. General Information About Project Gutenberg-tm electronic
This Web site includes information about Project Gutenberg-tm, including
While we cannot and do not solicit contributions from states where we
against accepting unsolicited donations from donors in such states who
approach us with offers to donate.
concept of a library of electronic works that could be freely shared
considerable effort, much paperwork and many fees to meet and keep up
editions, all of which are confirmed as not protected by copyright in
have not met the solicitation requirements, we know of no prohibition
how to make donations to the Project Gutenberg Literary Archive
including checks, online payments and credit card donations. To donate,
methods and addresses. Donations are accepted in a number of other ways
necessarily keep eBooks in compliance with any particular paper edition.
our email newsletter to hear about new eBooks.
please visit: www.gutenberg.org/donate
statements concerning tax treatment of donations received from outside
the United States. U.S. laws alone swamp our small staff.
the U.S. unless a copyright notice is included. Thus, we do not
visit www.gutenberg.org/donate
with anyone. For forty years, he produced and distributed Project
www.gutenberg.org
we have not received written confirmation of compliance. To SEND
with these requirements. We do not solicit donations in locations where
works.
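The first-character technique can also be tried on a few in-memory strings. The sample words are made up for illustration; note that only the first character takes part in the comparison.

```powershell
# Sort by the numeric code of the first character only.
$sorted = 'banana', 'Apple', 'apple', 'Banana' |
    Sort-Object -Property @{ Expression = { [int] $_[0] }; Ascending = $true }

$sorted   # -> Apple, Banana, apple, banana
```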
Update
The following sorts lowercase first and removes blank lines. Essentially, I'm just multiplying the ASCII code of an uppercase first character by an arbitrary factor so that it is numerically higher than that of any lowercase character.
In the sample text, no lines start with special characters or punctuation; this would probably need to be modified to handle those scenarios correctly.
Get-Content -Tail 50 $filename | Where-Object { -not [string]::IsNullOrEmpty($_) } | Sort-Object -Property {
    if ($_[0] -cmatch "[A-Z]")
    {
        5 * [int]$_[0]
    }
    else
    {
        [int]$_[0]
    }
}
This will output:
against accepting unsolicited donations from donors in such states who
approach us with offers to donate.
considerable effort, much paperwork and many fees to meet and keep up
concept of a library of electronic works that could be freely shared
editions, all of which are confirmed as not protected by copyright in
how to make donations to the Project Gutenberg Literary Archive
have not met the solicitation requirements, we know of no prohibition
including checks, online payments and credit card donations. To donate,
methods and addresses. Donations are accepted in a number of other ways
necessarily keep eBooks in compliance with any particular paper edition.
our email newsletter to hear about new eBooks.
please visit: www.gutenberg.org/donate
statements concerning tax treatment of donations received from outside
the U.S. unless a copyright notice is included. Thus, we do not
the United States. U.S. laws alone swamp our small staff.
visit www.gutenberg.org/donate
with these requirements. We do not solicit donations in locations where
works.
www.gutenberg.org
with anyone. For forty years, he produced and distributed Project
we have not received written confirmation of compliance. To SEND
DONATIONS or determine the status of compliance for any particular state
Foundation, how to help produce our new eBooks, and how to subscribe to
Gutenberg-tm eBooks with only a loose network of volunteer support.
International donations are gratefully accepted, but we cannot make any
Most people start at our Web site which has the main PG search facility:
Please check the Project Gutenberg Web pages for current donation
Professor Michael S. Hart was the originator of the Project Gutenberg-tm
Project Gutenberg-tm eBooks are often created from several printed
Section 5. General Information About Project Gutenberg-tm electronic
This Web site includes information about Project Gutenberg-tm, including
While we cannot and do not solicit contributions from states where we
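A possible alternative to the arbitrary multiplier, not taken from the answer above, is to give Sort-Object two sort keys: first whether the leading character is uppercase, then its numeric code. The sample words are made up for illustration, and empty lines are assumed to have been filtered out beforehand.

```powershell
# Key 1: $false (not uppercase) sorts before $true (uppercase).
# Key 2: numeric code of the first character.
$sorted = 'Banana', 'apple', 'Apple', 'banana' |
    Sort-Object -Property { [char]::IsUpper($_[0]) }, { [int] $_[0] }

$sorted   # -> apple, banana, Apple, Banana
```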
Comparing Jacob and mklement0's responses, Jacob's solution has the advantages of being visually simple, being intuitive, using pipelines, and being extendable to sorting by second character of first word, or first character of second word, etc. mklement0's solution has the advantages of being faster and giving me ideas of how to sort lowercase then uppercase.
Below I want to share my extension of Jacob's solution, which sorts by first character of second word. Not particularly useful for the Complete Works of Shakespeare, but very useful for a comma-separated table.
Function Replace-Nulls($line) {
    # Collect any stray output in $dump_var so that only $line is returned.
    $dump_var = @(
        if ( !($line) ) {
            $line = [char]0 + " " + [char]0 + " [THIS WAS A LINE OF NULL WHITESPACE]"
        } # End if
        if ( !(($line.split())[1]) ) {
            $line += " " + [char]8 + " [THIS WAS A LINE WITH ONE WORD AND THE REST NULL WHITESPACE]"
        } # End if
    ) # End definition of dump_var
    return $line
} # End Replace-Nulls
echo "."
$cleaned_output = Get-Content -Tail 20 $filename | ForEach-Object { Replace-Nulls $_ }
# Sort by the first character of the second word.
$cleaned_output | Sort-Object -Property {[int]((($_).split())[1])[0]}

Partial String Replacement using PowerShell

Problem
I am working on a script that has a user provide a specific IP address, and I want to mask this IP in some fashion so that it isn't stored in the logs. My problem is that I can easily do this when I know what the first three values of the IP typically are; however, I want to avoid storing/hard-coding those values in the code if at all possible. I also want to be able to replace the values even if the first three are unknown to me.
Examples:
10.11.12.50 would display as XX.XX.XX.50
10.12.11.23 would also display as XX.XX.XX.23
I have looked up partial string replacements, but none of the questions or problems that I found came close to doing this. I have tried doing things like:
# This ended up replacing all of the numbers
$tempString = $str -replace '[0-9]', 'X'
I know that I am partway there, but I am aiming to replace only the first 3 sets of digits (basically, every digit that comes before a '.'), and I haven't been able to achieve this.
Question
Is what I'm trying to do possible to achieve with PowerShell? Is there a best practice way of achieving this?
Here's an example of how you can accomplish this:
Get-Content 'File.txt' |
    ForEach-Object { $_ -replace '\d{1,3}\.\d{1,3}\.\d{1,3}', 'xx.xx.xx' }
This example matches one to three digits followed by a literal period, and repeats that pattern, so it matches anything from 0-999.0-999.0-999 and replaces it with xx.xx.xx.
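If the input is a bare IP address rather than a line of log text, an anchored variant is one option. The sketch below is an assumption beyond the answer above: the function name Hide-IPPrefix is made up, and the pattern only masks strings that consist of exactly four dot-separated groups of 1-3 digits.

```powershell
# Hypothetical helper: masks the first three groups of a bare IPv4-style string.
function Hide-IPPrefix {
    param([string] $Address)
    # The lookahead requires a final .<1-3 digits> group but leaves it in place.
    $Address -replace '^\d{1,3}\.\d{1,3}\.\d{1,3}(?=\.\d{1,3}$)', 'XX.XX.XX'
}

Hide-IPPrefix '10.11.12.50'   # -> XX.XX.XX.50
Hide-IPPrefix '10.12.11.23'   # -> XX.XX.XX.23
Hide-IPPrefix 'not an ip'     # -> not an ip (unchanged)
```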
TheIncorrigible1's helpful answer is an exact way of solving the problem (replacement only happens if 3 consecutive .-separated groups of 1-3 digits are matched.)
A looser, but shorter solution that replaces everything but the last .-prefixed digit group:
PS> '10.11.12.50' -replace '.+(?=\.\d+$)', 'XX.XX.XX'
XX.XX.XX.50
(?=\.\d+$) is a (positive) lookahead assertion ((?=...)) that matches the enclosed subexpression (a literal . followed by 1 or more digits (\d) at the end of the string ($)), but doesn't capture it as part of the overall match.
The net effect is that only what .+ captured - everything before the lookahead assertion's match - is replaced with 'XX.XX.XX'.
Applied to the above example input string, 10.11.12.50:
(?=\.\d+$) matches the .-prefixed digit group at the end, .50.
.+ matches everything before .50, which is 10.11.12.
Since the (?=...) part isn't captured, it is therefore not included in what is replaced, so it is only substring 10.11.12 that is replaced, namely with XX.XX.XX, yielding XX.XX.XX.50 as a result.
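To see what the pattern actually matches before anything is replaced, [regex]::Match can be used on the same input. A small sketch:

```powershell
# The lookahead is checked but not included in the match.
$match = [regex]::Match('10.11.12.50', '.+(?=\.\d+$)')
$match.Value   # -> 10.11.12

# Replacing only that match leaves the final group untouched.
$masked = '10.11.12.50' -replace '.+(?=\.\d+$)', 'XX.XX.XX'
$masked        # -> XX.XX.XX.50
```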

Using either Trim or Replace in PowerShell to clean up anything not contained in quotation marks

I'm trying to write a script that will remove everything except text contained in quotation marks from a result-set generated by a SQL query. I'm not sure whether trim or -replace will do this.
Here is a sampling of the result-set:
a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer
Error";
I would like it to end up looking like this:
Client Service
Client Training
Payer Error
I've tried everything I know to do in my limited PowerShell and RegEx familiarity and still haven't been able to figure out a good solution.
Any help would be greatly appreciated.
$s = 'a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error";'
Replace the start of string up to the first quote, or the last quote up to the end of string. Then what you're left with is:
Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error
Now the bits you don't want are "in quotation marks" and that's easy to match with ".*?" so replace that with a space.
Overall, two replaces:
$s -replace '^[^"]*"|"[^"]*$' -replace '".*?"', ' '
Client Service Client Training Payer Error
Here's a version that uses Regex to capture the strings, including their quotes, into an array and then removes the quote marks with -replace:
$text = 'a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error";'
([regex]::matches($Text, '(")(.*?)(")')).value -replace '"'
There's without a doubt a regex to get the strings without their quotes in the first place, but I'm a bit of a regex novice.
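One such regex, sketched here as a possibility rather than the only option, puts the text between the quotes into a capture group and reads that group from each match. It assumes the quoted strings contain no embedded quotation marks.

```powershell
$text = 'a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error";'

# Group 1 holds the text between each pair of quotes, without the quotes.
$names = [regex]::Matches($text, '"([^"]*)"') | ForEach-Object { $_.Groups[1].Value }

$names   # -> Client Service, Client Training, Payer Error (one per line)
```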

How to simply extract leading N parts of a path?

I've got a bunch of directory names and file names, some are absolute path, some are relative path. I just wish to get the 2 leading parts of each path. Input:
D:\a\b\c\d.txt\
c:\a
\my\desk\n.txt
you\their\mine
I expect to get:
D:\a
c:\a
\my\desk
you\their
Is there a convenient way in PowerShell to achieve this?
You can sometimes get your hand slapped for suggesting string manipulation, as it can be "unreliable". However, your test data contains 3 different possibilities. Also, I've never seen someone looking for the first parts of a path.
I present a simple solution that nets your desired output as you have it in your question:
"D:\a\b\c\d.txt\","c:\a","\my\desk\n.txt","you\their\mine" | ForEach-Object{
($_ -split "(?<=\S)\\")[0..1] -join "\"
}
I needed to use a lookbehind since your sample output contains a leading slash that you wanted to retain. It splits every string on slashes that have a non-whitespace character in front of them.
This would not return the correct path for UNC paths. Split-Path would be the obvious choice if you only wanted a single portion of the path. I suppose you could nest the call to get 2, but at this time I am unable to find a simple way to account for all of your examples with the same logic.
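The split-and-join approach can be generalized to N leading parts. The function name and the -Count parameter below are made up for illustration, and the UNC limitation still applies.

```powershell
# Hypothetical helper: returns the first $Count parts of a backslash-separated path.
function Get-LeadingPathPart {
    param(
        [string] $Path,
        [int]    $Count = 2
    )
    # Split on backslashes preceded by a non-whitespace character, so that
    # a leading backslash stays attached to the first part.
    ($Path -split '(?<=\S)\\')[0..($Count - 1)] -join '\'
}

Get-LeadingPathPart 'D:\a\b\c\d.txt\'          # -> D:\a
Get-LeadingPathPart '\my\desk\n.txt'           # -> \my\desk
Get-LeadingPathPart 'you\their\mine' -Count 1  # -> you
```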

PHP - How to identify e-mail addresses from input containing lines of misc data

Apologizing in advance for yet another email pattern matching query.
Here is what I have so far:
$text = strtolower($intext);
$lines = preg_split("/[\s]*[\n][\s]*/", $text);
$pattern = '/[A-Za-z0-9_-]+#[A-Za-z0-9_-]+\.([A-Za-z0-9_-][A-Za-z0-9_]+)/';
$pattern1= '/^[^#]+#[a-zA-Z0-9._-]+\.[a-zA-Z]+$/';
foreach ($lines as $email) {
preg_match($pattern,$email,$goodies);
$goodies[0]=filter_var($goodies[0], FILTER_SANITIZE_EMAIL);
if(filter_var($goodies[0], FILTER_VALIDATE_EMAIL)){
array_push($good,$goodies[0]);
}
}
$pattern works fine, but .rr.com addresses (and more issues, I am sure) are stripped of .com
$pattern1 only grabs emails that are on a line by themselves.
I am pasting in a whole page of miscellaneous text into a textarea that contains some emails from an old data file I am trying to recover.
Everything works great except for the emails with more than one "." either before or after the "#".
I am sure there must be more issues as well.
I have tried several patterns I have found, as well as some I tried to write.
Can someone show me the light here before I pull my remaining hair out?
How about this?
/((?:\w+[.]*)*(?:\+[^# \t]*)?#(?:\w+[.])+\w+)/
Explanation: (?:\w+[.]*)* recognizes 0 or more instances of strings of word characters (alphanumeric + _) optionally separated by strings of periods. Next, (?:\+[^# \t]*)? recognizes a plus sign followed by zero or more characters that are not a #, a space, or a tab. Then we have the # sign, and finally (?:\w+[.])+\w+, which matches a sequence of word character strings separated by periods and ending in a word character string (i.e., [subdomain.]domain.topleveldomain).