How to Remove a section of text from a string using powershell? - powershell

I am building an email that contains a section of content which I sometimes need to remove, so I am trying to replace everything from #HOUSESTART through #HOUSEEND using -replace, but it is not working.
$body contains this section along with much more of the entire html email:
"<p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal'><b><u><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
"Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'>#HOUSESTART<o:p></o:p></span></u></b></p>
<p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal'><b><u><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
"Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'>PLEASE
NOTE</span></u></b><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
"Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'>:
As a house manager, you have two email addresses. Your secondary email
address is #EMAIL. The only place you will need to use this email address
is when you are enrolling any device in the Targeted Threat Protection.<span
style='mso-spacerun:yes'>  </span><b><span style='background:yellow;mso-highlight:
yellow'>(Only used in manager welcome emails.)</span><o:p></o:p></b></span></p>
<p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal'><b><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
"Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'>#HOUSEEND</span></b><span
style='mso-ascii-font-family:Calibri;mso-fareast-font-family:"Times New Roman";
mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><o:p></o:p></span></p>
I am using this command to try and remove everything between #HOUSESTART and #HOUSEEND but it is not removing it.
$body = $body -replace "#HOUSESTART.*#HOUSEEND"," "
Any help would be greatly appreciated.

By default, the metacharacter . in .NET regexes matches any character except newlines.
Therefore, if you want .* to match across multiple lines, i.e., to match newlines too, you must use the inline regex option s ((?s) at the very start of the regex):
$body = $body -replace '(?s)#HOUSESTART.*#HOUSEEND', ' '
Note:
* I'm using '...' (single quotes, i.e. verbatim strings) rather than "..." (expandable (interpolating) string), to avoid confusion between what PowerShell may interpret up front, and what the regex engine will see.
* .* matches greedily, so that everything to the input's last instance of #HOUSEEND is matched; if there can be multiple instances, and you want to match only through the next one, use the non-greedy .*?
Note that $body must be a single, multi-line string for this to work.
For instance, if you use something like $body = Get-Content file.txt to set $body, you'll end up with an array of strings, each of which the -replace operation is applied to, which won't work. In that case, use the -Raw switch to ensure that the file is read as a single, multi-line string: $body = Get-Content -Raw file.txt.
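To make the difference between the three variants concrete, here is a minimal sketch. The sample $body is made up for illustration (two marked sections separated by text that should survive); it is not the asker's real email.

```powershell
# Made-up sample: two marked sections with text to keep around them.
$body = "keep1`n#HOUSESTART`nsecret A`n#HOUSEEND`nkeep2`n#HOUSESTART`nsecret B`n#HOUSEEND`nkeep3"

# Without (?s), '.' stops at newlines, so nothing matches and nothing changes.
$noOption = $body -replace '#HOUSESTART.*#HOUSEEND', ' '

# Greedy: removes from the first #HOUSESTART through the LAST #HOUSEEND.
$greedy = $body -replace '(?s)#HOUSESTART.*#HOUSEEND', ' '

# Non-greedy: removes each section on its own and keeps the text in between.
$lazy = $body -replace '(?s)#HOUSESTART.*?#HOUSEEND', ' '

$noOption -eq $body   # True (unchanged)
$greedy               # keep1, a single space, keep3 (on three lines)
$lazy                 # keep1, space, keep2, space, keep3 (on five lines)
```

If the email only ever contains one such section, the greedy and non-greedy forms behave the same.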

Related

Powershell add text after 20 characters

I want to add text after exactly 20 characters, including blanks. Does someone have a short solution with Add-Content, or can someone post a link where I can read about a way to do so?
My file looks something like this:
/path1/path1/path1 /path2/path2/path2 /path3/path3/path3
Then an application will read these paths (it is not my application and I cannot edit it in any way). It reads the paths by their position, so if the second path starts 10 characters later, it won't recognize it. I cannot simply replace or edit a path, since the paths do not always have the same length. Why the application reads it that way, don't ask me.
So I need to add a string at the start, then the next string at exactly character 20, and then the next at character 40.
You could use the regex -replace operator to inject a new substring after 20 characters:
PS ~> $inject = "Hello Manuel! ..."
PS ~> $string = "Injected text goes: and then there's more"
PS ~> $string -replace '(?<=^.{20})',$inject
Injected text goes: Hello Manuel! ...and then there's more
The regex pattern (?<=^.{20}) describes a position in the string where exactly 20 characters occur between the start of the string and the current position, and the -replace operator then replaces the empty string at said position with the value in $inject.
This did it for me
$data.PadRight(20," ") | Out-File -FilePath F:\test\path.txt -NoNewline -Append
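For the fixed-position layout described in the question, the padding approach can also build the whole line in one go. This is only a sketch: the three paths are made-up examples, and it assumes that no path is longer than 20 characters (PadRight pads but never truncates, so a longer path would shift every field after it).

```powershell
# Made-up paths of different lengths.
$paths = '/path1/path1/path1', '/path2/path2', '/path3/path3/path3'

# Pad every field to exactly 20 characters, then concatenate the fields.
$line = -join ($paths | ForEach-Object { $_.PadRight(20) })

$line.Length             # 60
$line.Substring(20, 12)  # /path2/path2        (second field starts at offset 20)
$line.Substring(40, 18)  # /path3/path3/path3  (third field starts at offset 40)
```

The resulting $line can then be written out the same way as above, e.g. with Out-File -NoNewline.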

Find NRIC from a large text file and replace NRIC first 4 characters with X

I have a requirement to find the NRIC field in a text file and mask its first 4 characters using PowerShell. So far, below is my code.
$FileName = "E:\test.txt"
$patternNRIC = '[SGTGsftg]\d{7}\w'
$file= New-Object System.IO.StreamReader -Arg $FileName
while($s= $file.ReadLine())
{
$s = $s -replace $patternNRIC ,'XXXX'
Write-Host $s -BackgroundColor Magenta
}
$file.Close()
The problem with the above is that it replaces the whole NRIC with XXXX, which I don't want. I want to replace only the first 4 characters while keeping the rest intact.
Going by the National Registration Identity Card definition, hiding just the first 4 characters of the NRIC doesn't guarantee much privacy for the card owner (there are probably far fewer than 100 persons that match the disclosed part, and if you know the birthday and the fact that the first following digit is likely a 0, it is probably down to a single person that matches the information!).
Excerpt from Wikipedia:
only the last three or four digits and the letters are publicly displayed or published as the first three digits can easily give away a person's age.
Anyways:
PowerShell's -replace operator is case-insensitive by default (so there is no need to list both lowercase and uppercase characters)
To check for the characters that follow without replacing them, use a lookahead assertion
I recommend that you also use word boundaries (\b) to make sure that the NRIC has the expected length and is not part of a longer sequence
$Test = 'I have requirement to find NRIC field, like S1234567G, and mask the first 4 characters.'
$Pattern = '\b[STFG]\d{3}(?=\d{4}\w\b)'
$Test -Replace $Pattern, 'XXXX'
Result
I have requirement to find NRIC field, like XXXX4567G, and mask the first 4 characters.
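To show what the lookahead and the word boundaries buy you, here is a small sketch with made-up sample lines; only the pattern is taken from above. Applied to an array, -replace processes each element in turn.

```powershell
$pattern = '\b[STFG]\d{3}(?=\d{4}\w\b)'

# Made-up test lines:
#   1. two valid NRICs, one of them in lowercase
#   2. an NRIC glued to a preceding letter (no word boundary in front)
#   3. a sequence with one digit too many
$lines = 'S1234567G and t7654321z', 'XS1234567G', 'S12345678G'

$masked = $lines -replace $pattern, 'XXXX'

$masked[0]   # XXXX4567G and XXXX4321z
$masked[1]   # XS1234567G  (unchanged)
$masked[2]   # S12345678G  (unchanged)
```

In the original loop, the same pattern can be dropped in for $patternNRIC without any other change.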

String variable position being overwritten in write-host

If I run the below code, $SRN can be written as output or added to another variable, but trying to include either another variable or regular text causes it to be overwritten from the beginning of the line. I'm assuming it's something to do with how I'm assigning $autocode and $SRN initially but can't tell what it's trying to do.
# Load the property set to allow us to get to the email body.
$item.load($psPropertySet) # Load the data.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\n" # Get the body text, remove blank lines, split on line breaks to create an array (otherwise it is a single string).
$autocode = $bod[4].split('-')[2] # Get line 4 (should be Title), split on dash, look for 3rd element, this should contain our automation code.
$SRN = $bod[1] -replace 'ID: ','' # Get line 2 (should be ID), find and replace the preceding text.
# Skip processing if autocode does not match our list of handled ones.
if ($autocode -cin $autocodes)
{
write-host "$SRN $autocode"
write-host "$autocode $SRN"
write-host "$SRN test"
$var = "$SRN $autocode"
$var
}
The code results in the output below; you can see that when $SRN isn't at the start of the line, it is fine. I am unsure where the extra spaces come from, either:
KRNE8385
KRNE SR1788385
test8385
KRNE8385
I would expect to see this:
SR1788385 KRNE
KRNE SR1788385
SR1788385 test
SR1788385 KRNE
LotPings pointed me down the right path, both variables still had either "0D" or "\r" in them. My regex replace was only getting rid of them on blank lines, and I split the array on "\n" only. Changing line 3 in the original code to the below appears to have resolved the issue. First time seeing Format-Hex, but it appears to be excellent for troubleshooting such issues.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\r\n"
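For anyone wondering why the output looked overwritten: a carriage return sends the console cursor back to the start of the line, so in "$SRN $autocode" the space and KRNE are printed over the first characters of SR1788385, which also accounts for the extra space. Below is a minimal sketch with a made-up body text standing in for $item.Body.Text. Splitting on '\r?\n' instead of "\r\n" is my own suggestion, on the assumption that the body could contain either CRLF or bare LF line endings.

```powershell
# Made-up stand-in for $item.Body.Text, using CRLF line endings.
$text = "Subject: demo`r`nID: SR1788385`r`nTitle: A-B-KRNE"

# Splitting on "\n" alone leaves a trailing carriage return on every line.
$srnBad = ($text -split "\n")[1] -replace 'ID: ', ''

# Splitting on an optional \r followed by \n removes it.
$srnGood = ($text -split '\r?\n')[1] -replace 'ID: ', ''

$srnBad.Length                # 10: nine visible characters plus the hidden CR
[int]$srnBad[-1]              # 13, the character code of CR (0x0D)
$srnGood.Length               # 9
$srnBad.Trim() -eq $srnGood   # True: Trim() also strips the stray CR
```

Piping $srnBad to Format-Hex shows the trailing 0D byte directly.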

Using either Trim or Replace in PowerShell to clean up anything not contained in quotation marks

I'm trying to write a script that will remove everything except the text contained in quotation marks from a result set generated by a SQL query. I'm not sure whether Trim or -replace will do this.
Here is a sampling of the result-set:
a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer
Error";
I would like it to end up looking like this:
Client Service
Client Training
Payer Error
I've tried everything I know to do in my limited PowerShell and RegEx familiarity and still haven't been able to figure out a good solution.
Any help would be greatly appreciated.
$s = 'a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error";'
Replace the start of string up to the first quote, or the last quote up to the end of string. Then what you're left with is:
Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error
Now the bits you don't want are "in quotation marks" and that's easy to match with ".*?" so replace that with a space.
Overall, two replaces:
$s -replace '^[^"]*"|"[^"]*$' -replace '".*?"', ' '
Client Service Client Training Payer Error
Here's a version that uses regex to capture the strings, including their quotes, into an array and then removes the quote marks with -replace:
$text = 'a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error";'
([regex]::matches($Text, '(")(.*?)(")')).value -replace '"'
There is without a doubt a regex that gets the strings without their quotes in the first place, but I'm a bit of a regex novice.
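One way to get the strings without their quotes in the first place is to put only the part between the quotes into a capture group and read that group from each match. A minimal sketch, using the same sample text:

```powershell
$text = 'a:5:{s:3:"Client Service";a:4:{s:15:"Client Training";b:0;s:11:"Payer Error";'

# Group 1 holds the text between each pair of quotes; the quotes themselves
# are part of the overall match but not of the group.
$values = [regex]::Matches($text, '"(.*?)"') | ForEach-Object { $_.Groups[1].Value }

$values
# Client Service
# Client Training
# Payer Error
```

This yields an array with one element per quoted string, so no follow-up -replace is needed.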

PHP - How to identify e-mail addresses from input containing lines of misc data

Apologizing in advance for yet another email pattern matching query.
Here is what I have so far:
$text = strtolower($intext);
$lines = preg_split("/[\s]*[\n][\s]*/", $text);
$pattern = '/[A-Za-z0-9_-]+@[A-Za-z0-9_-]+\.([A-Za-z0-9_-][A-Za-z0-9_]+)/';
$pattern1= '/^[^@]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+$/';
$good = array(); // collects the validated addresses
foreach ($lines as $email) {
    // skip lines in which the pattern finds nothing
    if (!preg_match($pattern, $email, $goodies)) {
        continue;
    }
    $goodies[0] = filter_var($goodies[0], FILTER_SANITIZE_EMAIL);
    if (filter_var($goodies[0], FILTER_VALIDATE_EMAIL)) {
        array_push($good, $goodies[0]);
    }
}
$pattern works fine, except that .rr.com addresses (and more, I am sure) are stripped of their .com.
$pattern1 only grabs emails that are on a line by themselves.
I am pasting a whole page of miscellaneous text into a textarea; the text contains some emails from an old data file I am trying to recover.
Everything works great except for the emails with more than one "." either before or after the "@".
I am sure there must be more issues as well.
I have tried several patterns I have found as well as some I tried to write.
Can someone show me the light here before I pull my remaining hair out?
How about this?
/((?:\w+[.]*)*(?:\+[^@ \t]*)?@(?:\w+[.])+\w+)/
Explanation: (?:\w+[.]*)* recognizes 0 or more instances of strings of word characters (alphanumeric + _) optionally separated by strings of periods. Next, (?:\+[^@ \t]*)? recognizes a plus sign followed by zero or more non-whitespace, non-at-sign characters. Then we have the @ sign, and finally (?:\w+[.])+\w+, which matches a sequence of word character strings separated by periods and ending in a word character string (i.e., [subdomain.]domain.topleveldomain).
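To stay with the language used in the rest of this page, here is a quick way to try the pattern out in PowerShell rather than PHP. It is written with @ as the separator between local part and domain, and for plain ASCII input like this the .NET regex engine should treat it the same way. The sample text and addresses are made up. Note one limitation that follows from \w: a hyphen in the local part is not matched, so only the piece after the last hyphen would be picked up.

```powershell
# The suggested pattern without PHP's / delimiters.
$pattern = '((?:\w+[.]*)*(?:\+[^@ \t]*)?@(?:\w+[.])+\w+)'

# Made-up input mixing addresses with other text.
$text = 'old data: bob@mail.rr.com, first.last+tag@example.org and not-an-email'

$found = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }

$found
# bob@mail.rr.com
# first.last+tag@example.org
```

The multi-dot domain (.rr.com) and the dotted local part both survive intact here.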