How to remove special characters from a text file with PowerShell?

I have a text file and have to remove all weird characters from it. I've already tried the following:
(get-content C:\Users\JuanMa\Desktop\UNB\test.txt) -replace ('.','') | out-file C:\Users\JuanMa\Desktop\UNB\test2.txt
But this leads to an empty output - the file test2.txt remains empty.
This is my text file:
.!..p.ÿÿ.!..! .!. PESCATORE
.!. LEMON SPICE S.R.L.
600 SUR DE MULTIPLAZA ESCAZU
3-102-599284
TEL: 2289-8010 FAX: 2289-5129
INFO@PESCATORECR.COM
.!..! Terminal POS: BARRA
.!.
.! ------------FACTURA-----------
.! .!0 Mesa: B07
.!..! NUMERO : 0068371
.!.Mesa # : B07 Fecha: 25/09/2018
Mesero : CARLOS
Cajero : JOHN Hora : 22:35:06
# Pers : 1 Comandas: 1
Apertura: 22:34 Tiempo/E: 1 Min
.! .!..! .! CANT DESCRIPCION MONTOS
.!.---------------------------------------
1.00 LIMONADA HIERBABUE 2,033.00
.! SubTotal : 2,033.00
%IVA : 264.00
%SER : 203.00
.! .!. TOTALES : 2,501.00
.!..! (COLONES)
En Dolares : 4.55
.!.>> Pago: EFECTIVO> 2,555.00
>> Recibe: 2,555.00
>> Cambio: 54.00
.!
www.gruposinertech.com Vers.15.09A
.!.
AUTORIZADO MEDIANTE RESOLUCION
11-97 DE LA D.G.T.D
.i
.#
Thanks for your help!

Try:
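(get-content -Raw C:\Users\JuanMa\Desktop\UNB\test.txt).Replace('.','') | out-file C:\Users\JuanMa\Desktop\UNB\test2.txt
Get-Content returns an array of lines by default, but if you specify -Raw it returns a single string. The .Replace() string method does a literal (non-regex) replacement, so here '.' means an actual period.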
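A small sketch of the difference between the regex-based -replace operator and the literal String.Replace() method. The sample line is made up for illustration; real input would come from Get-Content.

```powershell
# Hypothetical sample line (not taken from the real file).
$line = 'Vers.15.09A  TOTAL: 2,501.00'

# -replace takes a regex, and an unescaped '.' matches any character,
# so every character is removed.
$regexAny = $line -replace '.', ''

# String.Replace() is literal: only the periods are removed.
$literal = $line.Replace('.', '')

# Escaping the period makes -replace remove only periods as well.
$regexEscaped = $line -replace '\.', ''

$regexAny       # empty string
$literal        # Vers1509A  TOTAL: 2,50100
$regexEscaped   # Vers1509A  TOTAL: 2,50100
```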

howdy Juan Manuel Sanchez,
the following will trim the unwanted chars from the beginning of each line in the array of lines you get from Get-Content. it acts on each line in the array without needing to iterate thru the array explicitly.
it's VERY fragile since it hard codes the items. also, it removes all the left hand padding spaces.
$GC_Array -creplace '^[.! pÿ]{1,}' -replace '^0 {2,}'
-creplace is the case-sensitive version of replace
^ means start at the beginning of the line
[] is the character set to replace
char list = dot, exclamation point, space, lowercase p, accented y
{1,} means one or more
the 2nd replace targets start-of-line, a zero digit, & two or more spaces
hope that helps,
lee
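A minimal sketch of how that one-liner behaves on an array. The three lines are invented stand-ins for Get-Content output, and \xFF is used here as the regex escape for the accented y so the example does not depend on the script file's encoding.

```powershell
# Hypothetical lines standing in for the output of Get-Content.
$GC_Array = @(
    '.!..p..!..! .!. PESCATORE',
    '.! .!0   Mesa: B07',
    'Mesero : CARLOS'
)

# Applied to an array, -creplace/-replace process each element and return an array.
# Omitting the replacement operand replaces the match with nothing.
$cleaned = $GC_Array -creplace '^[.! p\xFF]{1,}' -replace '^0 {2,}'

$cleaned
# PESCATORE
# Mesa: B07
# Mesero : CARLOS
```

The uppercase P of PESCATORE survives only because -creplace is case-sensitive; with plain -replace the lowercase p in the set would also match it.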

The -replace operator uses regular expressions, in which a period matches ANY character, so your command strips out everything. If you want to remove literal periods, then prefix the period with a backslash:
(get-content C:\Users\JuanMa\Desktop\UNB\test.txt) -replace ('\.','') | out-file C:\Users\JuanMa\Desktop\UNB\test2.txt
Unfortunately this removes ALL periods, so the periods you may want to keep, e.g. in numbers are lost.
To clean out multiple bad characters, include them in square brackets. This removes 'ÿ' and '!':
(get-content C:\Users\JuanMa\Desktop\UNB\test.txt) -replace ('[ÿ!]','') | out-file C:\Users\JuanMa\Desktop\UNB\test2.txt
You can chain these -replace operators to do multiple substitutions. They run left to right, so the leading .! has to be removed before the step that strips every remaining !:
# 1. Remove a literal .! at the start of the line (the period is escaped)
# 2. Remove any remaining ÿ or ! characters
(get-content C:\Users\JuanMa\Desktop\UNB\test.txt) `
-replace ('^\.!','') `
-replace ('[ÿ!]','') |
out-file C:\Users\JuanMa\Desktop\UNB\test2.txt
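If the unwanted characters are really control codes that merely display as periods (an assumption; the question does not say what the bytes are), a character class over code points may be less fragile than listing symbols. A sketch with a made-up line that starts with an ESC byte and another control byte:

```powershell
# Hypothetical line: ESC (0x1B), '!', a control byte (0x01), then real text.
$line = '{0}!{1} TOTALES : 2,501.00' -f [char]0x1B, [char]0x01

# Remove every character outside printable ASCII (0x20-0x7E).
# Caveat: this also removes legitimate accented letters.
$clean = $line -replace '[^\x20-\x7E]', ''

$clean   # ! TOTALES : 2,501.00
```

Printable residue such as the remaining ! still needs an explicit rule like the ones shown in the answers above.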

Related

Replacing lines with single quote and other special characters in powershell

I'm trying to replace certain lines in several txt documents that might be in subfolders or the current folder. Some of the lines include characters like ' and parenthesis are giving me problems. The lines will repeat multiple times in each file.
This line seems to work
ls *.txt -rec | %{$f=$_; (gc $f.PSPath) | %{$_ -replace " in chips\)", ")"} | sc $f.PSPath}
this one also works
ls *.txt -rec | %{$f=$_; (gc $f.PSPath) | %{$_ -replace [regex]::Escape("won ("), "won "} | sc $f.PSPath}
but this one i cant make it work
ls *.txt -rec | %{$f=$_; (gc $f.PSPath) | %{$_ -replace ": Hold'em No Limit ($0.50/$1.00 USD)", " - Holdem(No Limit) - $0.50/$1.00"} | sc $f.PSPath}
I have tried with \ before the parenthesis putting the text i want to find with [regex]::Escape() but nothing has worked so far.
What am i missing in order to achieve this?
Bonus problem:
The next problem that i also haven't figured out is that i need to remove both opening and closing parenthesis from a line but has to keep them in other part of the line so for example:
Original line:
Seat 5: WTFWY (big blind) won ($17.10)
Wanted output
Seat 5: WTFWY (big blind) won $17.10
I was trying to look for "0)" and "won (" and replace them that way, but the "0)" part could be any number and there has to be a more elegant way to do it than to do one for each number with parenthesis. Any ideas for this?
As a general rule in PowerShell, only use double-quoted strings when you need the capabilities of an expansion string.
You can use the [Regex]::Escape() method on any string you want to match literally.
As an alternative to escaping quotation marks within strings, I find here-strings are easier to read and I don't have to remember how to escape quotes.
Replacement text is usually literal, but it can reference the capture groups of the regex match by using the format $<group number> (for example $1) in the replacement string. To specify a literal $ in the replacement, use $$.
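A short sketch of these quoting rules. It assumes the variables $0 and $1 are not defined in the session and that strict mode is off; the 'price: 5' string is invented for illustration.

```powershell
# Double quotes expand $0 and $1 as (normally undefined) variables,
# so the amounts are gone before -replace ever sees the text.
$expanded = "($0.50/$1.00 USD)"

# Single quotes keep the text verbatim.
$verbatim = '($0.50/$1.00 USD)'

# In the replacement operand, $$ inserts a literal $ and $1 inserts capture group 1.
$swapped = 'price: 5' -replace '(\d+)', '$$$1.00'

$expanded   # (.50/.00 USD)
$verbatim   # ($0.50/$1.00 USD)
$swapped    # price: $5.00
```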
So, for your first problem string:
$find = [Regex]::Escape(@'
: Hold'em No Limit ($0.50/$1.00 USD)
'@)
$replace = ' - Holdem(No Limit) - $$0.50/$$1.00'
@'
Special: Hold'em No Limit ($0.50/$1.00 USD) - Limited Time
'@ -replace $find , $replace
Output:
PS > $find = [Regex]::Escape(@'
>> : Hold'em No Limit ($0.50/$1.00 USD)
>> '@)
>>
>> $replace = ' - Holdem(No Limit) - $$0.50/$$1.00'
>>
>> @'
>> Special: Hold'em No Limit ($0.50/$1.00 USD) - Limited Time
>> '@ -replace $find , $replace
>>
Special - Holdem(No Limit) - $0.50/$1.00 - Limited Time
PS >
For the Bonus question:
Escape a sample of the text you want to replace:
[Regex]::Escape('($17.10)') ==> \(\$17\.10\)
Replace the literal dollar digits with a match for one or more digits (\d+) and the decimal digits with a match for exactly two digits (\d{2}):
\(\$\d+\.\d{2}\)
Use parentheses to define the amount won as a capture group:
\((\$\d+\.\d{2})\)
Test the matching:
PS > 'Seat 5: WTFWY (big blind) won ($17.10)' -match '\((\$\d+\.\d{2})\)'
True
PS > $matches
Name Value
---- -----
1 $17.10
0 ($17.10)
Execute your replace operation:
PS > 'Seat 5: WTFWY (big blind) won ($17.10)' -replace '\((\$\d+\.\d{2})\)' , '$1'
Seat 5: WTFWY (big blind) won $17.10
PS >
More than one way to skin a cat...
Another way to deal with your first string would be to focus only on the text you wish to modify by chaining together two replacement operations: the first to modify : Hold'em and the second to modify ($0.50/$1.00 USD), as we did in the bonus question:
@'
Special: Hold'em No Limit ($0.50/$1.00 USD) - Limited Time
'@ -replace ": Hold'em" , ' - Holdem' -replace '\((\$0\.50/\$1\.00) USD\)' , '$1'
Output:
PS > @'
>> Special: Hold'em No Limit ($0.50/$1.00 USD) - Limited Time
>> '@ -replace ": Hold'em" , ' - Holdem' -replace '\((\$0\.50/\$1\.00) USD\)' , '$1'
Special - Holdem No Limit $0.50/$1.00 - Limited Time
PS >
But actually, ($ at the beginning and USD) at the end are enough to distinguish your targeted substring, and so we can simplify to:
@'
Special: Hold'em No Limit ($0.50/$1.00 USD) - Limited Time
'@ -replace ": Hold'em" , ' - Holdem' -replace '\((\$.+) USD\)' , '$1'
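The replacements from this answer could be bundled into one pipeline-friendly function, roughly as sketched below. Convert-HandHistoryLine is a hypothetical name, and the generalised amount pattern [\d.,]+ is an assumption about what the stake values look like.

```powershell
# Hypothetical helper: applies the replacements to each incoming line.
function Convert-HandHistoryLine {
    param([Parameter(ValueFromPipeline)][string]$Line)
    process {
        # ": Hold'em" needs double quotes only because it contains an apostrophe.
        $Line = $Line -replace ": Hold'em", ' - Holdem'
        # Drop the parentheses and " USD" around stakes such as ($0.50/$1.00 USD).
        $Line = $Line -replace '\((\$[\d.,]+/\$[\d.,]+) USD\)', '$1'
        # Drop the parentheses around the amount won, whatever the number is.
        $Line = $Line -replace 'won \((\$\d+\.\d{2})\)', 'won $1'
        $Line
    }
}

$sample = @(
    'Special: Hold''em No Limit ($0.50/$1.00 USD) - Limited Time',
    'Seat 5: WTFWY (big blind) won ($17.10)'
)
$converted = $sample | Convert-HandHistoryLine
$converted
# Special - Holdem No Limit $0.50/$1.00 - Limited Time
# Seat 5: WTFWY (big blind) won $17.10
```

In the question's pipeline this would take the place of the inner %{ ... } block, e.g. (gc $f.PSPath) | Convert-HandHistoryLine | sc $f.PSPath.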

Powershell input a dash "-" in each line of the file at character 79 if a number or letter is present

I could really use some help. I got most of my code to work but I am having issues finding a way to only add a - at every character location of 79 - but only if a number or letter exists. The code below adds the dash at the correct location but I can't figure out a way to not add a dash if the character at location 79 is a space.
(gc C:\test\tst1.txt) -replace ".{79}" , "$&-" | sc C:\test\out.txt
You can do the following to insert - at character 79 that meets the required conditions:
(Get-Content C:\test\tst1.txt) -replace '^.{79}(?<=[a-z0-9])','$&-' |
Set-Content C:\test\out.txt
(?<=) is a positive lookbehind for a character from the character set a-z (case-insensitive) or 0-9
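The lookbehind is easier to see with a smaller width. The sketch below uses the same pattern with 5 in place of 79, on invented sample strings.

```powershell
# Same pattern as above, with a width of 5 instead of 79.
$lines  = 'abcde fgh', 'abcd efgh', 'abc'
$result = $lines -replace '^.{5}(?<=[a-z0-9])', '$&-'

$result
# abcde- fgh   (5th character is a letter: a dash is inserted after it)
# abcd efgh    (5th character is a space: unchanged)
# abc          (shorter than 5 characters: unchanged)
```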
If you want to insert - after every 79 characters that meet the conditions, you can do the following:
# (?i) makes the match case-insensitive; unlike -replace, [regex]::Replace() is case-sensitive by default
$regex = '(?i)(.{78}[^a-z0-9])|(.{78}[a-z0-9])'
$sb = { param($m)
    if ($m.Groups[2].Success) {
        "{0}-" -f $m.Groups[2].Value
    } else {
        $m.Groups[1].Value
    }
}
Get-Content c:\test\tst1.txt | Foreach-Object {
    [regex]::Replace($_,$regex,$sb)
} | Set-Content C:\test\out.txt
This scenario will likely have a more verbose solution, since the .NET regex engine that PowerShell uses does not support match resets (\K).
Note that your comments and your post description do not have the same requirements. The code above adds a - after character position 79 and shifts everything that follows to the right. It does not replace anything. Performing an actual replace would look like the following:
(Get-Content C:\test\tst1.txt) -replace '(^.{79})(?<=[a-z0-9]).','$1-' |
Set-Content C:\test\out.txt
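A sketch contrasting the every-N-characters insert with the in-place replace on short, invented input. Smaller widths (4 and 5) stand in for 79, and the IgnoreCase option is passed explicitly because [regex]::Replace() is case-sensitive by default.

```powershell
# Insert a dash after every block of 4 characters whose 4th character is a letter or digit.
$regex = '(.{3}[^a-z0-9])|(.{3}[a-z0-9])'
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    if ($m.Groups[2].Success) { $m.Groups[2].Value + '-' } else { $m.Groups[1].Value }
}
$options  = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$inserted = [regex]::Replace('ABCD ef gh12', $regex, $evaluator, $options)

# In-place variant: the character after position 5 is overwritten, not shifted.
$replaced = 'abcdefgh' -replace '(^.{5})(?<=[a-z0-9]).', '$1-'

$inserted   # ABCD- ef gh12-
$replaced   # abcde-gh
```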

add quotation mark to a text file powershell

I need to add quotation marks around each line of a text file that contains 500 lines of text.
The format is inconsistent. It has dashes, dots, numbers, and letters. For example
1527c705-839a-4832-9118-54d4Bd6a0c89
16575getfireshot.com.FireShotCaptureWebpageScreens
3EA2211E.GestetnerDriverUtility
I have tried to code this
$Flist = Get-Content "$home\$user\appfiles\out.txt"
$Flist | %{$_ -replace '^(.*?)', '"'}
I got the result which only added to the beginning of a line.
"Microsoft.WinJS.2.0
The expected result should be
"Microsoft.WinJS.2.0"
How to add quotation-mark to the end of each line as well?
There is no strict need to use a regex (regular expression) in your case (requires PSv4+):
(Get-Content $home\$user\appfiles\out.txt).ForEach({ '"{0}"' -f $_ })
Array method .ForEach() processes each input line via the script block ({ ... }) passed to it.
'"{0}"' -f $_ effectively encloses each input line ($_) in double quotes, via -f, the string-format operator.
If you did want to use a regex:
(Get-Content $home\$user\appfiles\out.txt) -replace '^|$', '"'
Regex ^|$ matches both the start (^) and the end ($) of the input string and replaces both with a " char., effectively enclosing the input string in double quotes.
As for what you tried:
^(.*?)
just matches the very start of the string (^), and nothing else, given that .*? - due to using the non-greedy duplication symbol ? - matches nothing else.
Therefore, replacing what matched with " only placed a " at the start of the input string, not also at the end.
You can use regex to match both:
The beginning of the line ^(.*?)
OR |
The End of the line $
I.e. ^(.*?)|$
$Flist = Get-Content "$home\$user\appfiles\out.txt"
$Flist | %{$_ -replace '^(.*?)|$', '"'}
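The approaches from both answers can be compared side by side on in-memory data. A minimal sketch, using two of the sample lines from the question instead of reading out.txt:

```powershell
# Two sample lines from the question, standing in for Get-Content output.
$lines = '1527c705-839a-4832-9118-54d4Bd6a0c89', '3EA2211E.GestetnerDriverUtility'

$viaFormat = $lines.ForEach({ '"{0}"' -f $_ })     # string-format operator, no regex
$viaRegex  = $lines -replace '^|$', '"'            # start or end of string
$viaLazy   = $lines -replace '^(.*?)|$', '"'       # same effect; the lazy group matches nothing

$viaFormat
# "1527c705-839a-4832-9118-54d4Bd6a0c89"
# "3EA2211E.GestetnerDriverUtility"
```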

Escaping forward slash problems

In the following semi-pseudo code, the forward-slash of the first element in the array $system is always read as a back-slash.
I have tried the various escape characters such as ` and \ but to no avail. Is this a known problem in PowerShell? How to solve?
$system = @("Something/Anything", "Super Development","Quality Assurance")
//the following is looped with $y
$string| ConvertTo-json | FT | Out-File -append C:\Test\Results\$($system[$y])_All.csv
//error:
Message : Could not find a part of the path 'C:\Test\Results\Something\Anything_All.csv'
As @autosvet already mentioned in the comments to your question, there are several reserved characters that can't be used in filenames/paths on Windows, namely:
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
These characters can't be escaped, only replaced. You can use the GetInvalidFileNameChars() method for programmatically determining the characters that need to be replaced:
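# Join the invalid characters into one string and wrap them in a character class ([...]),
# so that any single invalid character matches
$chars = [IO.Path]::GetInvalidFileNameChars() -join ''
$invalid = '[{0}]' -f [regex]::Escape($chars)
$string | ConvertTo-json | FT |
Out-File -Append "C:\Test\Results\$($system[$y] -replace $invalid, '_')_All.csv"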
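The character-class approach can be checked without touching the file system. A sketch using the array from the question; note that the set returned by GetInvalidFileNameChars() depends on the platform, but the forward slash is in it on both Windows and Unix-like systems.

```powershell
# Build a character class from the invalid file name characters of the current platform.
$chars   = [IO.Path]::GetInvalidFileNameChars() -join ''
$invalid = '[{0}]' -f [regex]::Escape($chars)

$system    = @('Something/Anything', 'Super Development', 'Quality Assurance')
$safeNames = $system -replace $invalid, '_'

$safeNames
# Something_Anything
# Super Development
# Quality Assurance
```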

Powershell's -replace() does not preserve line breaks, or I'm doing it wrong

I have a text file with some lines of text and I want to insert into another text file.
aaa.txt:
aaaaaaaaaa
bbbbbbbbb
ccccccccc
dddddddd
eeeeeeee
bbb.txt:
slkdfjlskdfj dlfjsldkfj slkdfjs
{{replace}}
sdlkfjslkfj sldkfjsld kfjsldk fjsldk f
sldkfjslkfjlskjflskdjf
sdkfjslkjflsklsdjkf
sldfkjslkfjlskfj
But when I replace {{replace}} with the contents of aaa.txt it puts all the text on one line- I want to preserve the line breaks from aaa.txt:
PS> $bbb = cat .\bbb.txt
PS> $bbb -replace('{{replace}}',(cat .\aaa.txt))
slkdfjlskdfj dlfjsldkfj slkdfjs
aaaaaaaaaa bbbbbbbbb ccccccccc dddddddd eeeeeeee
sdlkfjslkfj sldkfjsld kfjsldk fjsldk f
sldkfjslkfjlskjflskdjf
sdkfjslkjflsklsdjkf
sldfkjslkfjlskfj
-replace replaces individual strings. cat .\aaa.txt returns an array of strings, which -replace then has to convert to a single string before -replace can do something with it -- hence your result. In PowerShell v3, the -raw parameter was added to Get-Content to circumvent this behavior, so if you have that, it's as simple as this:
$bbb -replace '{{replace}}', (cat -raw .\aaa.txt)
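This seems to work (with $aaa = cat .\aaa.txt), but only when the text contains no spaces of its own, because it turns every space into a line break, including the spaces in the other lines of bbb.txt. Note that a Windows line break is "`r`n" (CR then LF):
($bbb -replace('{{replace}}',$aaa)) -replace " ","`r`n"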
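For PowerShell versions without -Raw, or when the lines are already in an array, joining them explicitly avoids the space problem. A minimal sketch with invented in-memory stand-ins for the two files:

```powershell
# Hypothetical in-memory stand-ins for aaa.txt and bbb.txt.
$aaa = 'aaaaaaaaaa', 'bbbbbbbbb', 'ccccccccc'
$bbb = 'first line', '{{replace}}', 'last line'

# Join the inserted lines with the platform newline before handing them to -replace;
# otherwise the array is flattened into one string separated by spaces.
$nl     = [Environment]::NewLine
$result = $bbb -replace [regex]::Escape('{{replace}}'), ($aaa -join $nl)

$result
# first line
# aaaaaaaaaa
# bbbbbbbbb
# ccccccccc
# last line
```

One caveat: a $ inside the inserted text is still interpreted as a substitution token by -replace, so text containing dollar signs would need them doubled ($$) first.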