PowerShell script to write back to source files from drag and drop

I need to create a PowerShell script that removes quotes from CSV files in a user-friendly, drag-and-drop way. I have the basics of the script down, courtesy of this page:
http://blogs.technet.com/b/heyscriptingguy/archive/2011/11/02/remove-unwanted-quotation-marks-from-csv-files-by-using-powershell.aspx
And I've already successfully made .ps1 files drag-and-droppable courtesy of this Stack Overflow question:
Drag and Drop to a Powershell script
The author of the answer implies that it's just as easy to drop a single file, many files, or folders with lots of files in them. However, I have yet to figure this out in a way that can also write back to the source file. Here's my current code:
Param([string[]]$file)
(gc $file) | % {$_ -replace '"', ""} | out-file C:\Users\pfoster\Desktop\Output\test.txt -Fo -En ascii
Currently, this will only accept a single file, and it outputs the result as a .txt to a specified file regardless of the source file type (I can change that to CSV easily, but I'd like the script to mirror the source). Ideally, I'd like it to accept files and folders, and to rewrite the source file. I have a feeling this would involve Get-ChildItem, but I'm not sure how to implement that in the current scenario. I've also tried Out-File $file, and that didn't work either.
Thanks for the help!

For writing the modified content back to the original files try something like this:
foreach ($file in $ARGS) {
    (Get-Content $file) -replace '"', '' | Out-File $file -Encoding ASCII -Force
}
Use a foreach loop, because you need the file name in more than one place in the pipeline. Reading the content in a grouping expression (the parentheses around Get-Content) and then piping the modified content into the Out-File cmdlet makes sure that the output file is only written after the content has already been read.
Don't use a redirection operator ((Get-Content $file) >$file), because that would first open the file for writing (effectively truncating it) and afterwards read the content from the now empty file.
Beware that this approach may cause problems with large files, because each file is read completely into RAM before it's processed and written back to disk. If a file doesn't fit into the available RAM, the computer will start swapping, causing significant performance degradation.
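To also handle dropped folders, as asked above, one possibility (a sketch only, not tested against the original setup) is to expand each dropped item with Get-ChildItem before applying the same replacement; the *.csv filter is an assumption based on the question:
# Expand dropped folders into the CSV files they contain (the *.csv filter
# is an assumption), then strip the quotes from each file in place.
foreach ($item in $ARGS) {
    if (Test-Path -LiteralPath $item -PathType Container) {
        # A folder was dropped: collect every CSV file underneath it.
        $files = (Get-ChildItem -LiteralPath $item -Filter *.csv -Recurse).FullName
    } else {
        # A single file was dropped.
        $files = $item
    }
    foreach ($file in $files) {
        (Get-Content $file) -replace '"', '' | Out-File $file -Encoding ASCII -Force
    }
}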

Related

Read and write to same txt file in loop with StreamReader

I have a working script in PowerShell:
$file = Get-Content -Path HKEY_USERS.txt -Raw
foreach($line in [System.IO.File]::ReadLines("EXCLUDE_HKEY_USERS.txt"))
{
    $escapedLine = [Regex]::Escape($line)
    $pattern = $("(?sm)^$escapedLine.*?(?=^\[HKEY)")
    $file -replace $pattern, ' ' | Set-Content HKEY_USERS-filtered.txt
    $file = Get-Content -Path HKEY_USERS-filtered.txt -Raw
}
For each line in EXCLUDE_HKEY_USERS.txt it performs some changes on the file HKEY_USERS.txt. So with every loop iteration it writes to this file and re-reads the same file to pick up the changes. However, Get-Content is notorious for memory leaks, so I wanted to refactor it to use StreamReader and StreamWriter, but I'm having a hard time making it work.
As soon as I do:
$filePath = 'HKEY_USERS-filtered.txt';
$sr = New-Object IO.StreamReader($filePath);
$sw = New-Object IO.StreamWriter($filePath);
I get:
New-Object : Exception calling ".ctor" with "1" argument(s): "The process cannot access the file
'HKEY_USERS-filtered.txt' because it is being used by another process."
So it looks like I cannot use StreamReader and StreamWriter on same file simultaneously. Or can I?
tl;dr
Get-Content -Raw reads a file as a whole into a single string, which is fast and avoids the per-line memory overhead described below.
[System.IO.File]::ReadLines() is a faster and more memory-efficient alternative to line-by-line reading with Get-Content (without -Raw), but you need to ensure that the input file is passed as a full path, because .NET's working directory usually differs from PowerShell's.
Convert-Path resolves a given relative path to a full, file-system-native one.
A PowerShell-native alternative to using [System.IO.File]::ReadLines() is the switch statement with the -File parameter, which performs similarly well while avoiding the working-directory discrepancy pitfall, and offers additional features; a brief sketch follows the code below.
There is no need to save the modified file content to disk after each iteration - just update a variable (here, $fileContent) and, after exiting the loop, save its value to the output file.
$fileContent = Get-Content -Path HKEY_USERS.txt -Raw
# Be sure to specify a *full* path.
$excludeFile = Convert-Path -LiteralPath 'EXCLUDE_HKEY_USERS.txt'
foreach($line in [System.IO.File]::ReadLines($excludeFile)) {
    $escapedLine = [Regex]::Escape($line)
    $pattern = "(?sm)^$escapedLine.*?(?=^\[HKEY)"
    # Modify the content and save the result back to variable $fileContent
    $fileContent = $fileContent -replace $pattern, ' '
}
# After all modifications have been performed, save to the output file
$fileContent | Set-Content HKEY_USERS-filtered.txt
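For comparison, the switch -File alternative mentioned in the tl;dr could look roughly like this (a sketch reusing the variables from the code above; the default branch runs once per line, with $_ holding the current line):
switch -File $excludeFile {
    default {
        $escapedLine = [Regex]::Escape($_)
        # Same replacement as above, applied for every line of the exclude file.
        $fileContent = $fileContent -replace "(?sm)^$escapedLine.*?(?=^\[HKEY)", ' '
    }
}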
Building on Santiago Squarzon's helpful comments:
Get-Content does not cause memory leaks, but it can consume a lot of memory that isn't garbage-collected until an unpredictable later point in time.
The reason is that - unless the -Raw switch is used - it decorates each line read with PowerShell ETS (Extended Type System) properties containing metadata about the file of origin, such as its path (.PSPath) and the line number (.ReadCount).
This both consumes extra memory and slows the command down - GitHub issue #7537 asks for a way to opt out of this wasteful decoration, as it typically isn't needed.
However, reading with -Raw is efficient, because the entire file content is read into a single, multi-line string, which means that the decoration is only performed once.
So it looks like I cannot use StreamReader and StreamWriter on same file simultaneously. Or can I?
No, you cannot. You cannot simultaneously read from a file and overwrite it.
To update / replace an existing file you have two options (note that, for a fully robust solution, all attributes of the original file (except the last write time and size) should be retained, which requires extra work):
Read the old content into memory in full, perform the desired modification in memory, then write the modified content back to the original file, as shown in the top section.
There is a slight risk of data loss, however, namely if the process of writing back to the file gets interrupted.
More safely, write the modified content to a temporary file and, upon successful completion, replace the original file with the temporary one.
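A rough sketch of that temp-file approach, with illustrative file names:
# Write the modified content to a temporary file first ...
$fileContent | Set-Content -LiteralPath 'HKEY_USERS-filtered.txt.tmp'
# ... and only replace the target file once the write has completed successfully.
Move-Item -LiteralPath 'HKEY_USERS-filtered.txt.tmp' -Destination 'HKEY_USERS-filtered.txt' -Force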

Need to batch convert a large quantity of text files from ANSI to Unicode

I have a lot of ANSI text files that vary in size (from a few KB up to 1GB+) that I need to convert to Unicode.
At the moment, this has been done by loading the files into Notepad and then doing "Save As..." and selecting Unicode as the Encoding. Obviously this is very time consuming!
I'm looking for a way to convert all the files in one hit (in Windows). The files are in a directory structure so it would need to be able to traverse the full folder structure and convert all the files within it.
I've tried a few options but so far nothing has really ticked all the boxes:
ansi2unicode command line utility. This has been the closest to what I'm after, as it processes files recursively in a folder structure... but it keeps crashing while running, before it's finished converting.
CpConverter GUI utility. Works OK to a point but struggles with multiple files in a folder structure - only seems to be able to handle files in one folder
There's a DOS command that works OK on smaller files but doesn't seem to be able to cope with large files.
Tried GnuWin sed utility but it crashes every time I try and install it
So I'm still looking! If anyone has any recommendations, I'd be really grateful.
Thanks...
OK, so in case anyone else is interested, I found a way to do this using PowerShell:
Get-ChildItem "c:\some path\" -Filter *.csv -recurse |
Foreach-Object {
Write-Host (Get-Date).ToString() $_.FullName
Get-Content $_.FullName | Set-Content -Encoding unicode ($_.FullName + '_unicode.csv')
}
This recurses through the entire folder structure and converts all CSV files to Unicode; the converted files are written to the same locations as the originals, but with "_unicode.csv" appended to the filename. You can change the value of the -Encoding parameter if you want to convert to something different (e.g. UTF8).
It also outputs a list of all the files converted, along with a timestamp for each.
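If you'd rather keep the original file names instead of creating *_unicode.csv copies, a possible variation (untested, shown only as a sketch) is to convert each file via a temporary file and then swap it in place of the original:
Get-ChildItem "c:\some path\" -Filter *.csv -Recurse |
    Foreach-Object {
        # Convert to a temp file, then replace the original with it.
        $temp = $_.FullName + '.tmp'
        Get-Content $_.FullName | Set-Content -Encoding Unicode $temp
        Move-Item $temp $_.FullName -Force
    }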

How can I replace every occurrence of a String in a file with PowerShell?

This question is similar to the earlier question "How can I replace every occurrence of a String in a file with PowerShell?", except my challenge is to replace the text in multiple files. I tried using the solution from the earlier question and used a command similar to the one below.
(Get-Content .\*.txt).replace("old text", "new text") | Set-Content .\*.txt
It seems to work, but each file's size has increased drastically, to roughly the total of all the files in the directory. When I open any file, though, it looks normal.
Does anyone have ideas on how to fix this? My litmus test would be: if I revert my text changes, the file sizes shouldn't change at all.
You must process the files one at a time:
Get-Item *.txt |
    ForEach-Object {
        $f = $_.FullName; (Get-Content $f).replace("old text", "new text") | Set-Content $f
    }
Note that this will fail with completely empty (zero-byte) files.
Also, irrespective of what the encoding of the input files was, the output files will have Default encoding, according to the system's legacy code page (typically, a single-byte, extended-ASCII encoding).
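If the files need to keep a particular encoding, one option is to pass it to Set-Content explicitly; for example (UTF8 is chosen arbitrarily here):
Get-Item *.txt |
    ForEach-Object {
        $f = $_.FullName
        # Same per-file replacement as above, but with an explicit output encoding.
        (Get-Content $f).replace("old text", "new text") | Set-Content $f -Encoding UTF8
    }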
As for what you tried:
(Get-Content .\*.txt) sends the lines from all *.txt files as a single array of lines through the pipeline.
Set-Content .\*.txt then sends that one array (with replacements made) as a whole to every *.txt file in the current directory.

Powershell - Getting a directory to output a file at a time

I'm super new at all of this so please excuse my lack of technical elegance and all around idiocy.
dir c:\Users\me\desktop\Test\*.txt | %{ $sourceFile = $_; get-content $_} | Out-File "$sourceFile.results"
How can I modify this command line so that, instead of one file with the contents of all the text files, I get a one-to-one ratio, where each output file represents the contents of one text file?
I realize that this object is ridiculous in terms of application but I'm conceptually trying to piece this together bit by bit so I can really understand.
P.S. What's with the %? Haha another ridiculous question, doesn't seem worth a separate post, what does it do?
dir | % { Out-File -FilePath "new_$($_.Name)" -InputObject (gc $_.FullName) }
Only one pipeline is needed. (% is the built-in alias for ForEach-Object, and gc is the alias for Get-Content.) This command prepends "new_" to the filename because I was writing to the same directory; you can remove that prefix if it's not needed.

Find and Replace in a Large File

I want to find a piece of text in a large XML file and replace it with some other text. The size of the file is around 50GB. I want to do this on the command line. I am looking at PowerShell and want to know if it can handle that size.
Currently I am trying something like this, but it does not like it:
Get-Content C:\File1.xml | Foreach-Object {$_ -replace "xmlns:xsi=\"http:\/\/www\.w3\.org\/2001\/XMLSchema-instance\"", ""} | Set-Content C:\File1.xml
The text I want to replace is xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" with an empty string "".
Questions:
Can PowerShell handle large files?
I don't want the replace to happen in memory and prefer streaming, assuming that will not bring the server to its knees.
Are there any other approaches I can take (different tools/strategy)?
Thanks
I had a similar need (and a similar lack of PowerShell experience) but cobbled together a complete answer from the other answers on this page, plus a bit more research.
I also wanted to avoid the regex processing, since I didn't need it either -- just a simple string replace -- but on a large file, so I didn't want it loaded into memory.
Here's the command I used (adding linebreaks for readability):
Get-Content sourcefile.txt |
    Foreach-Object {$_.Replace('http://example.com', 'http://another.example.com')} |
    Set-Content result.txt
Worked perfectly! Never sucked up much memory (it very obviously didn't load the whole file into memory), and just chugged along for a few minutes then finished.
Aside from worrying about reading the file in chunks to avoid loading it into memory, you need to dump to disk often enough that you aren't storing the entire contents of the resulting file in memory.
Get-Content sourcefile.txt -ReadCount 10000 |
    Foreach-Object {
        $line = $_.Replace('http://example.com', 'http://another.example.com')
        Add-Content -Path result.txt -Value $line
    }
The -ReadCount <number> parameter sets the number of lines to read at a time. ForEach-Object then writes each batch of lines as it is read. For a 30GB file filled with SQL inserts, I topped out around 200MB of memory and 8% CPU. Piping it all into Set-Content, by contrast, hit 3GB of memory before I killed it.
It does not like it because you can't read from a file and write back to it at the same time using Get-Content/Set-Content. I recommend using a temp file and then at the end, rename file1.xml to file1.xml.bak and rename the temp file to file1.xml.
Yes as long as you don't try to load the whole file at once. Line-by-line will work but is going to be a bit slow. Use the -ReadCount parameter and set it to 1000 to improve performance.
Which command line? PowerShell? If so then you can invoke your script like so .\myscript.ps1 and if it takes parameters then c:\users\joe\myscript.ps1 c:\temp\file1.xml.
In general for regexes I would use single quotes if you don't need to reference PowerShell variables. Then you only need to worry about regex escaping and not PowerShell escaping as well. If you need to use double-quotes then the back-tick character is the escape char in double-quotes e.g. "`$p1 is set to $ps1". In your example single quoting simplifies your regex to (note: forward slashes aren't metacharacters in regex):
'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
Absolutely you want to stream this since 50GB won't fit into memory. However, this poses an issue if you process line-by-line. What if the text you want to replace is split across multiple lines?
If you don't have the split line issue then I think PowerShell can handle this.
This is my take on it, building on some of the other answers here:
Function ReplaceTextIn-File{
    Param(
        $infile,
        $outfile,
        $find,
        $replace
    )
    if( -Not $outfile)
    {
        $outfile = $infile
    }
    $temp_out_file = "$outfile.temp"
    Get-Content $infile | Foreach-Object {$_.Replace($find, $replace)} | Set-Content $temp_out_file
    if( Test-Path $outfile)
    {
        Remove-Item $outfile
    }
    Move-Item $temp_out_file $outfile
}
And called like so:
ReplaceTextIn-File -infile "c:\input.txt" -find 'http://example.com' -replace 'http://another.example.com'
The escape character in PowerShell strings is the backtick ( ` ), not the backslash ( \ ). I'd give an example, but the backtick is also used by the wiki markup. :(
The only thing you should have to escape is the quotes - the periods and such should be fine without.
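Since the markup got in the way above, here is a small illustration of both quoting styles for the search string from the question (only the embedded double quotes need escaping in the double-quoted form):
# Double-quoted string: escape the inner quotes with backticks.
$find = "xmlns:xsi=`"http://www.w3.org/2001/XMLSchema-instance`""
# Single-quoted string: no escaping needed at all.
$find = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'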