Replace CRLF in file - powershell

I am actually trying to turn a .txt file that looks like this:
text1
text 2
text3
text4
into something like this:
text1,text2
text3,text4
I have tried a lot of things, like:
foreach($line in $file){
$line=$line -replace'`r`n', ','
}
Or directly by replacing in the file and not the string:
$file=Get-Content('testo.txt')
$file=$file -replace '`r`n',''
But nothing worked. I have never been able to replace the CRLF.
If someone has an idea or anything!

Something even simpler I think. This does depend on the size of the text file, since it all has to be read into memory, but it will work with a variable number of "text" lines and whitespace in between. I am going to assume that your data does not already contain commas. (As an aside, the reason your attempts never matched anything: Get-Content returns an array of lines with the line endings already stripped, so there is no CRLF left in any string to replace, and the single-quoted '`r`n' is the literal backtick text rather than CR and LF.)
$file = (Get-Content "c:\temp\data.txt") -join "," -replace ",{2,}","`r`n"
Set-Content -Path "C:\temp\Newfile.txt" -Value $file
Take the file and convert it into a comma-delimited string. What will happen is that you will see groups of consecutive commas where the blank lines were. Using your test data:
text1,text 2,,,text3,text4
Then we use regex to replace any consecutive group of 2 or more commas with a newline, getting the desired output.
text1,text 2
text3,text4

In fact I have done something like this:
$Writer = New-Object IO.StreamWriter "$($PWD.Path)\$OutputFile"
$Writer.Write( [String]::Join(".", $data) )
$Writer.Close()
$data=get-content temp.txt
$data=$data -replace '[#]{2,}',"`r`n"
$data=$data -replace '[#]',","
$data=$data -replace '( ){1,},',"no data"
$data>>temp1.txt
It works and does what I want. It is not exactly efficient, but I wanted it to work quickly.
I will try all the answers when I have the time and say if it works or not.
And maybe I will change this part of my script if I find something more efficient.
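Something along these lines might be more direct (an untested sketch, not what is in my script; it assumes blank lines separate the pairs and reuses the temp.txt/temp1.txt names from above):
# Sketch only: read the non-blank lines and join them two at a time
$lines = Get-Content temp.txt | Where-Object { $_.Trim() }
$pairs = for ($i = 0; $i -lt $lines.Count; $i += 2) {
    $lines[$i..([Math]::Min($i + 1, $lines.Count - 1))] -join ','
}
$pairs | Set-Content temp1.txt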
Thanks to all!

Related

POWERSHELL: Simplest way to save string variable to csv file

I know a little bit of Bash scripting, but I am very new to PowerShell. When I execute the code below using bash, everything is fine. But when I use PowerShell, each echoed line ends up in just a single cell when I open the file in Excel. Why is it like this? How can I accomplish my objective in the simplest way?
echo "1,2,3" > file.csv
echo "A,B,C" >> file.csv
UNDESIRED: (screenshot) each full line, e.g. "1,2,3", appears in a single Excel cell.
DESIRED: (screenshot) each value in its own column: 1 | 2 | 3 on the first row, A | B | C on the second.
I tried to Google it. From what I understand, they convert the string variables to something like a PSObject and then convert that to CSV format. I tried it and it worked, but I had to forcibly include a header.
New-Object -Type PSObject -Property @{
    'X' = $A
    'Y' = $B
    'Z' = $C
} | Export-Csv 'C:\Temp\test.csv' -NoType
When I opened the csv file in Notepad, every value had double quotation marks around it (which I would prefer not to have).
You see, that is way more complicated compared to Linux Scripting. Can someone teach me the simplest way to do what I want? Thank you very much!
If in your system locale the ListSeparator character is NOT the comma, double-clicking a comma-delimited csv file will open Excel with all values in the same column.
I believe this is what happens here.
You can check by typing
[cultureinfo]::CurrentCulture.TextInfo.ListSeparator
in PowerShell
To have Excel 'understand' a CSV when you double-click it, add -UseCulture switch to the cmdlet:
Export-Csv 'C:\Temp\test.csv' -UseCulture -NoTypeInformation
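On a system where the ListSeparator is ';', for example, the exported file would then look something like this (illustrative only; the column order from a plain hashtable is not guaranteed):
"X";"Y";"Z"
"value1";"value2";"value3"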
As for the quotes around the values:
They are not always necessary, but sometimes essential, for instance if the value has leading or trailing space characters, or if the value contains the delimiter character itself.
Just leave them as-is, Excel knows how to handle that.
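As a small illustration (the values are made up for this example), a field that contains the delimiter itself only survives the round trip because of the quoting:
# Hypothetical object whose first value contains a comma
[pscustomobject]@{ X = 'a,b'; Y = 'c' } | ConvertTo-Csv -NoTypeInformation
# "X","Y"
# "a,b","c"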
If you really insist on having a csv without quotes, please first have a look at the answers given here about that subject.
Edit
If you are absolutely sure all fields can do without quoting, you can do this:
$sep = [cultureinfo]::CurrentCulture.TextInfo.ListSeparator
"1,2,3", "A,B,C" -replace ',', $sep | Out-File -FilePath 'D:\Test\file.csv' -Encoding utf8

Pipes in replace causing line to be duplicated

I have a script that I need to replace a couple of lines in. The first replace is going fine but the second is wiping out my file and duplicating the line multiple times.
My code
(get-content $($sr)) -replace 'remoteapplicationname:s:SHAREDAPP',"remoteapplicationcmdline:s:$($sa)" | Out-File $($sr)
(get-content $($sr)) -replace 'remoteapplicationprogram:s:||SHAREDAPP',"remoteapplicationprogram:s:||$($sa)" | Out-File $($sr)
The first replace works perfectly. The second one is causing this:
remoteapplicationprogram:s:||stagaredrremoteapplicationprogram:s:||stagarederemoteapplicationprogram:s:||stagareddremoteapplicationprogram:s:||stagarediremoteapplicationprogram:s:||stagaredrremoteapplicationprogram:s:||stagarederemoteapplicationprogram:s:||stagaredcremoteapplicationprogram:s:||stagaredtremoteapplicationprogram:s:||stagaredcremoteapplicationprogram:s:||stagaredlremoteapplicationprogram:s:||stagarediremoteapplicationprogram:s:||stagaredpremoteapplicationprogram:s:||stagaredbremoteapplicationprogram:s:||stagaredoremoteapplicationprogram:s:||stagaredaremoteapplicationprogram:s:||stagaredrremoteapplicationprogram:s:||stagareddremoteapplicationprogram:s:||stagared:remoteapplicationprogram:s:||stagarediremoteapplicationprogram:s:||stagared:remoteapplicationprogram:s:||stagared1remoteapplicationprogram:s:||stagared
etc...
Is this because of the ||? If so, how do I get around it?
Thanks!
To begin with, you should be using slightly more meaningful names for your variables. Especially if you want someone else to be reviewing your code.
The gist of your issue is that -replace supports regexes (regular expressions), and you have regex control characters in your pattern string. Consider the following simple example, and notice everywhere the replacement string is found:
PS C:\Users\Matt> "ABCD" -replace "||", "bagel"
bagelAbagelBbagelCbagelDbagel
-replace is also an array operator, so it works on every line of the input file, which is nice. For simplicity's sake, if you are not using a regex, you should just consider using the string method .Replace(), but it is case-sensitive, so that might not be ideal. So let's escape those control characters in the easiest way possible:
$patternOne = [regex]::Escape('remoteapplicationname:s:SHAREDAPP')
$patternTwo = [regex]::Escape('remoteapplicationprogram:s:||SHAREDAPP')
(get-content $sr) -replace $patternOne, "remoteapplicationcmdline:s:$sa" | Out-File $sr
(get-content $sr) -replace $patternTwo, "remoteapplicationprogram:s:||$sa" | Out-File $sr
Now we get both patterns matched as you have them written. Output $patternTwo on the console to see what has changed in it! $patternOne, as written, has no regex control characters in it, but it does not hurt to use the escape method if you are just expecting simple matching.
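For reference, the escaped version of the second pattern ends up looking roughly like this (the pipes become literal):
PS C:\> [regex]::Escape('remoteapplicationprogram:s:||SHAREDAPP')
remoteapplicationprogram:s:\|\|SHAREDAPP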
Aside from the main issue pointed out, there is also some redundancy and a misconception that can be addressed here. I presume you are updating a source file to replace all occurrences of those strings, yes? Well, you don't need to read the file in twice, given that you can chain -replace:
$patternOne = [regex]::Escape('remoteapplicationname:s:SHAREDAPP')
$patternTwo = [regex]::Escape('remoteapplicationprogram:s:||SHAREDAPP')
(get-content $sr) -replace $patternOne, "remoteapplicationcmdline:s:$sa" -replace $patternTwo, "remoteapplicationprogram:s:||$sa" |
Set-Content $sr
Perhaps that will do what you intended.
You might notice that I've removed the subexpression operators ($(...)) around your variables. While they have their place, they don't need to be used here. They are only needed inside more complicated strings, for instance when you need to expand an object property.
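For completeness, a quick sketch (with a made-up $file variable) of where the subexpression operator does matter:
# Simple expansion stops at the dot; the subexpression expands the property
$file = Get-Item 'C:\temp\app.rdp'   # hypothetical file
"Size: $file.Length"      # -> Size: C:\temp\app.rdp.Length  (property name is treated as literal text)
"Size: $($file.Length)"   # -> Size: <file size in bytes>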

extract 2 column ranges from TXT; insert comma delimiter; save as CSV

I have a .txt file that does not have delimiters. I want to extract 2 column ranges from the file, and separate them by a comma delimiter. i then want to save the resulting data as a CSV.
For example, here is a 'raw' string:
abcdefghij
I want the script to convert it to this:
abc,h
I know GC / SC; I just need to know how to do the string manipulation.
Assuming that your columns are defined by a start and end index, you could do something like this:
get-content raw.txt | %{ "$($_[0..2] -join ''),$($_[7..7] -join '')" }
Using the -replace operator:
$text = 'abcdefghij'
$text -replace '(.{3}).{4}(.).+','$1,$2'
abc,h
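The same pattern could be applied to the whole file and saved in one pass (a sketch; out.csv is just an assumed output name):
(Get-Content raw.txt) -replace '(.{3}).{4}(.).+', '$1,$2' | Set-Content out.csv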
Oh boy, this looks like fun.
How about:
"abcdefghij" | %{$_.Remove(3), $_[7] -join ","}
I think it would be more helpful to see the actual data you're trying to extract from. There are dozens of ways to do string manipulation and your choice is going to depend heavily on what it is you're trying to extract.

stripping extra text qualifier from a CSV - part 2

For part 1, see this SO post
I have a CSV that has certain fields separated by the " symbol as a TextQualifier.
See below for an example. Note that each integer (e.g. 1, 2, 3, etc.) is supposed to be a string. The qualified strings are surrounded by the " symbol.
1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2""
Notice how the last qualified string has a " symbol as part of the string.
User mjolinor suggested this PowerShell script, which works to fix the above scenario, but it does not fix the "Part 2" scenario below.
(get-content file.txt -ReadCount 0) -replace '([^,]")"','$1' |
set-content newfile.txt
Here is part 2 of the question. I need a solution for this:
The extra " symbol can appear randomly in the string. Here's another example:
1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"
Can you suggest an elegant way to automate the cleaning of the CSV to eliminate redundant " qualifiers?
You just need a different regex:
(get-content file.txt -ReadCount 0) -replace '(?<!,)"(?!,|$)',''|
set-content newfile.txt
That one will replace any double quote that is not immediately preceded by a comma and not immediately followed by either a comma or the end of the line.
$text = '1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"'
$text -replace '(?<!,)"(?!,|$)',''
1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2"

Find and Replace in a Large File

I want to find a piece of text in a large xml file and replace it with some other text. The file is around 50GB in size. I want to do this from the command line. I am looking at PowerShell and want to know if it can handle a file that large.
Currently I am trying something like this but it does not like it
Get-Content C:\File1.xml | Foreach-Object {$_ -replace "xmlns:xsi=\"http:\/\/www\.w3\.org\/2001\/XMLSchema-instance\"", ""} | Set-Content C:\File1.xml
The text I want to replace is xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" with an empty string "".
Questions
1. Can PowerShell handle large files?
2. I don't want the replace to happen in memory and would prefer streaming, assuming that will not bring the server to its knees.
3. Are there any other approaches I can take (different tools/strategy)?
Thanks
I had a similar need (and similar lack of powershell experience) but cobbled together a complete answer from the other answers on this page plus a bit more research.
I also wanted to avoid the regex processing, since I didn't need it either -- just a simple string replace -- but on a large file, so I didn't want it loaded into memory.
Here's the command I used (adding linebreaks for readability):
Get-Content sourcefile.txt |
    Foreach-Object { $_.Replace('http://example.com', 'http://another.example.com') } |
    Set-Content result.txt
Worked perfectly! Never sucked up much memory (it very obviously didn't load the whole file into memory), and just chugged along for a few minutes then finished.
Aside from worrying about reading the file in chunks to avoid loading it into memory, you need to dump to disk often enough that you aren't storing the entire contents of the resulting file in memory.
Get-Content sourcefile.txt -ReadCount 10000 |
    Foreach-Object {
        $line = $_.Replace('http://example.com', 'http://another.example.com')
        Add-Content -Path result.txt -Value $line
    }
The -ReadCount <number> parameter sets the number of lines to read at a time. ForEach-Object then processes and writes each chunk as it is read. For a 30GB file filled with SQL inserts, I topped out around 200MB of memory and 8% CPU. Piping it all into Set-Content, by contrast, hit 3GB of memory before I killed it.
It does not like it because you can't read from a file and write back to it at the same time using Get-Content/Set-Content. I recommend using a temp file and then at the end, rename file1.xml to file1.xml.bak and rename the temp file to file1.xml.
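A rough sketch of that temp-file approach (the .tmp name is made up):
# Stream through a temp file, then swap the files around
Get-Content C:\File1.xml |
    Foreach-Object { $_ -replace 'xmlns:xsi="http://www\.w3\.org/2001/XMLSchema-instance"', '' } |
    Set-Content C:\File1.xml.tmp
Rename-Item C:\File1.xml C:\File1.xml.bak
Rename-Item C:\File1.xml.tmp C:\File1.xml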
Yes as long as you don't try to load the whole file at once. Line-by-line will work but is going to be a bit slow. Use the -ReadCount parameter and set it to 1000 to improve performance.
Which command line? PowerShell? If so then you can invoke your script like so .\myscript.ps1 and if it takes parameters then c:\users\joe\myscript.ps1 c:\temp\file1.xml.
In general for regexes I would use single quotes if you don't need to reference PowerShell variables. Then you only need to worry about regex escaping and not PowerShell escaping as well. If you need to use double quotes then the back-tick character is the escape char in double-quotes e.g. "`$p1 is set to $p1". In your example single quoting simplifies your regex to (note: forward slashes aren't metacharacters in regex):
'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
Absolutely you want to stream this since 50GB won't fit into memory. However, this poses an issue if you process line-by-line. What if the text you want to replace is split across multiple lines?
If you don't have the split line issue then I think PowerShell can handle this.
This is my take on it, building on some of the other answers here:
Function ReplaceTextIn-File {
    Param(
        $infile,
        $outfile,
        $find,
        $replace
    )

    if (-Not $outfile)
    {
        $outfile = $infile
    }

    $temp_out_file = "$outfile.temp"

    Get-Content $infile | Foreach-Object { $_.Replace($find, $replace) } | Set-Content $temp_out_file

    if (Test-Path $outfile)
    {
        Remove-Item $outfile
    }
    Move-Item $temp_out_file $outfile
}
And called like so:
ReplaceTextIn-File -infile "c:\input.txt" -find 'http://example.com' -replace 'http://another.example.com'
The escape character in powershell strings is the backtick ( ` ), not backslash ( \ ). I'd give an example, but the backtick is also used by the wiki markup. :(
The only thing you should have to escape is the quotes - the periods and such should be fine without.
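For what it's worth, a couple of backtick examples (added here as a sketch, since the original answer couldn't show them):
$name = 'world'
"He said `"hello $name`""           # backtick-quote embeds a literal "  -> He said "hello world"
"The variable `$name holds $name"   # backtick-$ suppresses expansion    -> The variable $name holds world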