Batch modifying directory paths with PowerShell - powershell

I need to update a directory path in many .txt files
Input files:
1.txt
using c:\data\1.dta
its own data
2.txt
using c:\data\2.dta
its own data
3.txt
using c:\data\3.dta
its own data
Expected Output files:
1.txt
using C:\Data\Subfile\1.dta
its own data
2.txt
using C:\Data\Subfile\2.dta
its own data
3.txt
using C:\Data\Subfile\3.dta
its own data
I've tried -replace, but the results are strange: either all files end up with the same result, or every file contains all of the new paths (see below).
I want to replace the old path with the new path in all files. The code is as follows:
$pathway='C:\Data\Subfile\*.txt'
$oldpath='c:\\data\\'
$newpath='C:\Data\Subfile\'
$content=Get-Content -path $pathway
Method 1:
$newline=((Get-Content -path $pathway -TotalCount 1) -replace $oldpath,$newpath)
$content[0]= $newline
This method puts every updated path into every file:
Wrong output:
1.txt
using C:\Data\Subfile\1.txt
using C:\Data\Subfile\2.txt
using C:\Data\Subfile\3.txt
its own data
2.txt
using C:\Data\Subfile\1.txt
using C:\Data\Subfile\2.txt
using C:\Data\Subfile\3.txt
its own data
Method 2:
$content[0]=$content[0]-replace $oldpath,$newpath
This method gives every file the same new path:
Wrong output:
1.txt
using C:\Data\Subfile\1.txt
its own data
2.txt
using C:\Data\Subfile\1.txt
its own data
3.txt
using C:\Data\Subfile\1.txt
its own data
$content | Set-Content -Path $pathway
Can someone help me with that? I want each file to get its own corresponding new path: for 1.txt I want C:\Data\Subfile\1.txt, for 2.txt I want C:\Data\Subfile\2.txt, and so on.
Thanks a lot!

I'm a bit unclear on what you want the final content to be. Is it using C:\Data\Subfile\1.txt or using C:\Data\Subfile\1.dta? I think you are asking for the following, but if not, let me know. You may run into speed/performance issues depending on how large your files are.
If these are your input files with their content:
C:\data\Subfile\1.txt
using c:\data\1.dta
its own data...
C:\data\Subfile\2.txt
using c:\data\2.dta
its own data...
C:\data\Subfile\3.txt
using c:\data\3.dta
its own data...
then this:
Get-ChildItem c:\data\Subfile\*.txt | ForEach-Object {
    # Read in all content lines and replace c:\data\ with c:\Data\Subfile\
    $content = Get-Content $_.FullName | ForEach-Object { $_ -replace 'c:\\Data\\', 'c:\Data\Subfile\' }
    # Write the new data back to the file
    $content | Set-Content $_.FullName
}
This results in the following:
C:\data\Subfile\1.txt
using c:\Data\Subfile\1.dta
its own data...
C:\data\Subfile\2.txt
using c:\Data\Subfile\2.dta
its own data...
C:\data\Subfile\3.txt
using c:\Data\Subfile\3.dta
its own data...
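As an aside, the root cause of both wrong outputs in the question is that Get-Content with a wildcard path reads all matching files into one combined array, and Set-Content with the same wildcard writes that one array back to every matching file. A quick way to see the read half of that (paths as in the question):
# One array containing the lines of ALL matching files, concatenated
$content = Get-Content -Path 'C:\Data\Subfile\*.txt'
$content.Count   # total line count across every file, not just one
Processing the files one at a time via Get-ChildItem, as above, avoids both problems.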

With lookarounds you can precisely define where to insert text without repeating the search pattern.
foreach ($File in Get-ChildItem 'C:\Data\Subfile\*.txt') {
    (Get-Content $File -Raw) -replace '(?<=C:\\data\\)(?=\d\.dta)', 'Subfile\' |
        Set-Content $File
}
"(?<=C:\\data\\) is a positive lookbehind zero length assertion,
(?=\d\.dta) is a positive lookahead zero length assertion,
the replacement text is inserted in between these two.
this is more secure than other approaches as it is repeatable without inserting Subfile\ again.
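For instance, a quick way to convince yourself that the replacement is idempotent (sample line taken from the question):
$line  = 'using c:\data\1.dta'
$once  = $line -replace '(?<=c:\\data\\)(?=\d\.dta)', 'Subfile\'
$twice = $once -replace '(?<=c:\\data\\)(?=\d\.dta)', 'Subfile\'
$once    # using c:\data\Subfile\1.dta
$twice   # unchanged: 'Subfile\' now follows 'c:\data\', so the lookahead no longer matches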

here is a way to do the job. [grin] what it does ...
the part between the #region/#endregion markers just makes the files to work with
reads the list of files
iterates thru that list
loads the content of each one
replaces the old dir with the new one
finally writes out the new content
here's the code ...
#region - Make files to work with
$Null = New-Item -Path "$env:TEMP\TestFiles" -ItemType Directory -ErrorAction SilentlyContinue
$1stFileName = "$env:TEMP\TestFiles\1.txt"
$1stFileContent = @'
using c:\data\1.dta
its own data
'@ -split [System.Environment]::NewLine |
    Set-Content -LiteralPath $1stFileName
$2ndFileName = "$env:TEMP\TestFiles\2.txt"
$2ndFileContent = @'
using c:\data\2.dta
its own data
'@ -split [System.Environment]::NewLine |
    Set-Content -LiteralPath $2ndFileName
$3rdFileName = "$env:TEMP\TestFiles\3.txt"
$3rdFileContent = @'
using c:\data\3.dta
its own data
'@ -split [System.Environment]::NewLine |
    Set-Content -LiteralPath $3rdFileName
#endregion - Make files to work with
#endregion - Make files to work with
$OldDir = 'c:\data'
$NewDir = 'c:\data\SubDir'
$SourceDir = "$env:TEMP\TestFiles"
$FileList = Get-ChildItem -LiteralPath $SourceDir -Filter '*.txt' -File
foreach ($FL_Item in $FileList)
{
    $NewContent = Get-Content -LiteralPath $FL_Item.FullName |
        ForEach-Object {
            $_.Replace($OldDir, $NewDir)
        }
    $NewContent |
        Set-Content -LiteralPath $FL_Item.FullName
}
content of file 1.txt before & after the script runs ...
# before ...
using c:\data\1.dta
its own data
# after ...
using c:\data\SubDir\1.dta
its own data
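One caveat worth noting: the .Replace() string method used above is case-sensitive, unlike the -replace operator, which is case-insensitive by default. That's fine here because the file content and $OldDir use the same casing, but if your real files mix case, -replace with an escaped pattern is the safer choice. A quick illustration (the sample string is hypothetical):
'using C:\DATA\1.dta'.Replace('c:\data', 'c:\data\SubDir')                    # no change - casing differs
'using C:\DATA\1.dta' -replace [regex]::Escape('c:\data'), 'c:\data\SubDir'   # replaced regardless of case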

Related

Replacing \x00 (ASCII 0, NUL) with an empty string in a huge CSV using PowerShell

I have this code that works like a charm for small files. It just reads the whole file into memory, replaces NUL, and writes back to the same file. This is not really practical for huge files, when the file size is larger than the available memory. Can someone help me convert it to a streaming model so that it won't choke on huge files?
Get-ChildItem -Path "Drive:\my\folder\path" -Depth 2 -Filter *.csv |
Foreach-Object {
$content = Get-Content $_.FullName
#Replace NUL and save content back to the original file
$content -replace "`0","" | Set-Content $_.FullName
}
The way you have this structured, the entire file contents have to be read into memory. Note: reading a file into memory uses 3-4x the file size in RAM, which is documented here.
Without getting into .NET classes, particularly [System.IO.StreamReader], Get-Content is actually very memory efficient; you just have to leverage the pipeline so you don't build up the data in memory.
Note: if you do decide to try StreamReader, the article will give you some syntax clues. Moreover, that topic has been covered by many others on the web.
Get-ChildItem -Path "C:\temp" -Depth 2 -Filter *.csv |
ForEach-Object{
$CurrentFile = $_
$TmpFilePath = Join-Path $CurrentFile.Directory.FullName ($CurrentFile.BaseName + "_New" + $CurrentFile.Extension)
Get-Content $CurrentFile.FullName |
ForEach-Object{ $_ -replace "`0","" } |
Add-Content $TmpFilePath
# Now that you've got the new file you can rename it & delete the original:
Remove-Item -Path $CurrentFile.FullName
Rename-Item -Path $TmpFilePath -NewName $CurrentFile.Name
}
This is a streaming model: Get-Content streams inside the outer ForEach-Object loop. There may be other ways to do it, but I chose this so I could keep track of the names and do the file swap at the end...
Note: per the same article, Get-Content is quite slow in terms of speed. However, your original code was likely already suffering that burden. Moreover, you can speed it up a bit using the -ReadCount parameter, which sends some number of lines down the pipe at a time. That of course uses more memory, so you'd have to find a level that keeps you within the boundaries of your available RAM. The performance improvement from -ReadCount is mentioned in this answer's comments.
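For illustration, here is the same loop sketched with -ReadCount (the 1000 is an arbitrary example batch size). Each pipeline object is then an array of lines, and -replace conveniently operates on every element of an array operand:
Get-ChildItem -Path "C:\temp" -Depth 2 -Filter *.csv |
    ForEach-Object {
        $CurrentFile = $_
        $TmpFilePath = Join-Path $CurrentFile.Directory.FullName ($CurrentFile.BaseName + "_New" + $CurrentFile.Extension)
        # Each object sent down the pipe is an array of up to 1000 lines
        Get-Content $CurrentFile.FullName -ReadCount 1000 |
            ForEach-Object { $_ -replace "`0","" } |
            Add-Content $TmpFilePath
        Remove-Item -Path $CurrentFile.FullName
        Rename-Item -Path $TmpFilePath -NewName $CurrentFile.Name
    }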
Update Based on Comments:
Here's an example of using StreamReader/Writer to perform the same operations as the previous example. This should be just as memory efficient as Get-Content, but much faster.
Get-ChildItem -Path "C:\temp" -Depth 2 -Filter *.csv |
ForEach-Object{
$CurrentFile = $_.FullName
$CurrentName = $_.Name
$TmpFilePath = Join-Path $_.Directory.FullName ($_.BaseName + "_New" + $_.Extension)
$StreamReader = [System.IO.StreamReader]::new( $CurrentFile )
$StreamWriter = [System.IO.StreamWriter]::new( $TmpFilePath )
While( !$StreamReader.EndOfStream )
{
$StreamWriter.WriteLine( ($StreamReader.ReadLine() -replace "`0","") )
}
$StreamReader.Close()
$StreamWriter.Close()
# Now that you've got the new file you can rename it & delete the original:
Remove-Item -Path $CurrentFile
Rename-Item -Path $TmpFilePath -NewName $CurrentName
}
Note: I have some sense this issue is rooted in encoding. The Stream constructors do accept an encoding enum as an argument.
Available Encodings:
[System.Text.Encoding]::BigEndianUnicode
[System.Text.Encoding]::Default
[System.Text.Encoding]::Unicode
[System.Text.Encoding]::UTF32
[System.Text.Encoding]::UTF7
[System.Text.Encoding]::UTF8
So if you wanted to instantiate the streams with, for example, UTF8:
$StreamReader = [System.IO.StreamReader]::new( $CurrentFile, [System.Text.Encoding]::UTF8 )
$StreamWriter = [System.IO.StreamWriter]::new( $TmpFilePath, [System.Text.Encoding]::UTF8 )
The streams default to UTF8. I think the system default is typically code page Windows-1252.
This would be the simplest way, using the least memory: one line at a time, written to another file. But it needs double the disk space.
Get-Content file.txt | ForEach-Object { $_ -replace "`0" } | Set-Content file2.txt

How can I (efficiently) match content (lines) of many small files with content (lines) of a single large file and update/recreate them

I've been trying to solve the following case:
Many small text files (in subfolders) need their content (lines) matched against lines that exist in another (large) text file. The small files then need to be updated or copied with those matching lines.
I was able to come up with some running code for this, but I need to improve it or use a completely different method, because it is extremely slow and would take >40h to get through all the files.
One idea I already had was to use SQL Server to bulk-import all files into a single table with [relative path],[filename],[jap content] and the translation file into a table with [jap content],[eng content], then join on [jap content] and bulk-export the joined table as separate files using [relative path],[filename]. Unfortunately I got stuck right at the beginning due to formatting and encoding issues, so I dropped it and started working on a PowerShell script.
Now in detail:
Over 40k .txt files spread across multiple subfolders, with multiple lines each; every line can exist in multiple files.
Content:
UTF-8 encoded Japanese text that can also contain special characters like \\[*+(), each line ending with a tab character. They sound like CSV files, but they don't have headers.
One large file with >600k lines containing the translations for the small files. Every line is unique within this file.
Content:
Again UTF8 encoded Japanese text. Each line formatted like this (without brackets):
[Japanese Text][tabulator][English Text]
Example:
テスト[tabulator]Test
The end result should be a copy or an updated version of all these small files, with their lines replaced by the matching lines of the translation file, while maintaining their relative paths.
What I have at the moment:
$translationfile = 'B:\Translation.txt'
$inputpath = 'B:\Working'
$translationarray = [System.Collections.ArrayList]@()
$translationarray = @(Get-Content $translationfile -Encoding UTF8)
Get-ChildItem -Path $inputpath -Recurse -File -Filter *.txt | ForEach-Object -Parallel {
    $_.Name
    $filepath = ($_.Directory.FullName).Substring(2)
    $filearray = [System.Collections.ArrayList]@()
    $filearray = @(Get-Content -Path $_.FullName -Encoding UTF8)
    $filearray = $filearray | ForEach-Object {
        $result = $using:translationarray -match ("^$_" -replace '[[+*?()\\.]','\$&')
        if ($result) {
            $_ = $result
        }
        $_
    }
    if (!(Test-Path B:\output\$filepath)) { New-Item -ItemType Directory -Force -Path B:\output\$filepath }
    #$("B:\output\" + $filepath + "\")
    $filearray | Out-File -FilePath $("B:\output\" + $filepath + "\" + $_.Name) -Force -Encoding UTF8
} -ThrottleLimit 10
I would appreciate any help and ideas, but please keep in mind that I rarely write scripts, so anything too complex might fly right over my head.
Thanks
As zett42 states, using a hash table is your best option for mapping the Japanese-only phrases to the dual-language lines.
Additionally, use of .NET APIs for file I/O can speed up the operation noticeably.
# Be sure to specify all paths as full paths, not least because .NET's
# current directory usually differs from PowerShell's
$translationfile = 'B:\Translation.txt'
$inPath = 'B:\Working'
$outPath = (New-Item -Type Directory -Force 'B:\Output').FullName
# Build the hashtable mapping the Japanese phrases to the full lines.
# Note that ReadLines() defaults to UTF-8
$ht = @{ }
foreach ($line in [IO.File]::ReadLines($translationfile)) {
    $ht[$line.Split("`t")[0] + "`t"] = $line
}
Get-ChildItem $inPath -Recurse -File -Filter *.txt | ForEach-Object -Parallel {
    # Translate the lines to the matching lines including the translation,
    # via the hashtable.
    # NOTE: If an input line isn't represented as a key in the hashtable,
    # it is passed through as-is.
    $lines = foreach ($line in [IO.File]::ReadLines($_.FullName)) {
        ($using:ht)[$line] ?? $line
    }
    # Synthesize the output file path, ensuring that the target dir. exists.
    $outFilePath = (New-Item -Force -Type Directory ($using:outPath + $_.Directory.FullName.Substring(($using:inPath).Length))).FullName + '/' + $_.Name
    # Write to the output file.
    # Note: If you want UTF-8 files *with BOM*, use -Encoding utf8bom
    Set-Content -Encoding utf8 $outFilePath -Value $lines
} -ThrottleLimit 10
Note: Your use of ForEach-Object -Parallel implies that you're using PowerShell [Core] 7+, where BOM-less UTF-8 is the consistent default encoding (unlike in Windows PowerShell, where default encodings vary wildly).
Therefore, in lieu of the .NET [IO.File]::ReadLines() API in a foreach loop, you could also use the more PowerShell-idiomatic switch statement with the -File parameter for efficient line-by-line text-file processing.
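For example, here is a minimal sketch of just the translation step rewritten with switch -File (shown outside the -Parallel block for brevity; $file stands for one input file, and $ht is the hashtable built above):
# switch -File reads the file line by line; $_ is the current line.
# The default branch runs for every line and emits the translated
# line when $ht contains it, or the original line otherwise.
$lines = switch -File $file.FullName {
    default { $ht[$_] ?? $_ }
}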

Trouble reading last line of CSV

I am getting CSV files (with no header) from another system. The last line ends the file (there is no newline after the last line of data). When I try Import-Csv, it will not read the last line of the file.
I do not have the ability to have the input file changed to include the newline.
I have noticed that Get-Content doesn't have a problem reading the entire file, but then it isn't a CSV, and I'm unable to reference the fields in the file.
Currently I'm doing:
$w = Import-CSV -path c:\temp\input.txt -header 'head1', 'head2', 'head3'
This will not read the last line of the file
This reads the entire file:
$w = Get-Content -path c:\temp\input.txt
But then I don't have the ability to reference the fields, like $w.head1
Is there a way to get Import-CSV to read the file including the last line?
OR Is there a way to read in the data using Get-Content, adding a header to it and then converting it back to a CSV?
I've tried use ConvertTo-CSV but have not had success:
$w = Get-Content -path c:\temp\input.txt
$csvdata = $w | ConvertTo-CSV # No header option for this function
I'd rather not create an intermediate file unless absolutely necessary.
You're very close! What you're after is not ConvertTo-Csv; you already have the file contents in CSV format, after all. So change that to ConvertFrom-Csv instead, which does support the -Header parameter. So something like this:
$w = Get-Content -path c:\temp\input.txt
$csvdata = $w | ConvertFrom-Csv -Header 'head1', 'head2', 'head3'
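The fields are then addressable just as they would be after Import-Csv; for example:
$csvdata = Get-Content -Path c:\temp\input.txt | ConvertFrom-Csv -Header 'head1', 'head2', 'head3'
$csvdata[0].head1   # first field of the first row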
If I understand correctly, you know the number of columns in the file and all it is missing is a header line. Since in your code you do not specify a -Delimiter parameter I'm assuming the delimiter character used in the file is a comma.
Best thing to do IMHO is to create a new output file and always keep the original.
$fileIn = 'c:\temp\input.txt'
$fileOut = 'c:\temp\input.csv'
# write the header line to a new file
Set-Content -Path $fileOut -Value 'head1,head2,head3'
# read the original file and append it to the one you have just created
Get-Content -Path $fileIn -Raw | Add-Content -Path $fileOut
If your file is really large, here is a faster alternative:
$fileIn = 'c:\temp\input.txt'
$fileOut = 'c:\temp\input.csv'
# write the header line to a new file
Set-Content -Path $fileOut -Value 'head1,head2,head3'
# read the original file and append it to the one you have just created
[System.IO.File]::AppendAllText($fileOut, ([System.IO.File]::ReadAllText($fileIn)))
If you really do want to take the risk and overwrite the original file, you can do this:
$file = 'c:\temp\input.txt'
$content = Get-Content -Path $file -Raw
# write the header line to the file, destroying what was in there
Set-Content -Path $file -Value 'head1,head2,head3'
# append the original content to it
$content | Add-Content -Path $file

PowerShell: Logging foreach changes

I have put together a script inspired by a number of sources. The purpose of the PowerShell script is to scan a directory for files (.SQL), copy all of them to a new directory (retaining the originals), scan each file against a list file (CSV format, containing 2 columns: OldValue,NewValue), and replace any strings that match. What works: moving, modifying, log creation.
What doesn't work:
Recording the changes made by the script in the .log.
Sample usage: .\ConvertSQL.ps1 -List .\EVar.csv -Files \SQLFiles\Rel_1
Param (
    [String]$List = "*.csv",
    [String]$Files = "*.sql"
)
function Get-TimeStamp {
    return "[{0:dd/MM/yyyy} {0:HH:mm:ss}]" -f (Get-Date)
}
$CustomFiles = "$Files\CUSTOMISED"
if (-not (Test-Path $CustomFiles))
{
    MD -Path $CustomFiles
}
Copy-Item "$Files\*.sql" -Recurse -Destination "$CustomFiles"
$ReplacementList = Import-Csv $List
Get-ChildItem $CustomFiles |
    ForEach-Object {
        $LogFile = "$CustomFiles\$_.$(Get-Date -Format dd_MM_yyyy).log"
        Write-Output "$_ has been modified on $(Get-TimeStamp)." | Out-File "$LogFile"
        $Content = Get-Content -Path $_.FullName
        foreach ($ReplacementItem in $ReplacementList)
        {
            $Content = $Content.Replace($ReplacementItem.OldValue, $ReplacementItem.NewValue)
        }
        Set-Content -Path $_.FullName -Value $Content
    }
Thank you very much.
Edit: I've cleaned up a bit and removed my test logging files.
Here's the snippet of code that I've been testing with little success. I put the following right under $Content = $Content.Replace($ReplacementItem.OldValue, $ReplacementItem.NewValue):
if ( $_.FullName -like '*TEST*' ) {
    "This is a test." | Add-Content $LogFile
}
I've also tried piping out the Set-Content using Out-File. The outputs I end up with are either a full copy of the contents of my CSV file or of the SQL file itself. I'll continue reading up on different methods. Out of hundreds to a thousand or so lines, I simply want to be able to identify which variables in the SQL have been changed.
Instead of piping output to Add-Content, pipe the log output to Out-File -Append.
Edit: compare the content using the Compare-Object cmdlet and evaluate its output to identify where the content in each string object differs; a sketch follows.
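A minimal sketch of that idea using the question's own variable names (the log format is illustrative only):
# Inside the per-file loop: keep the original lines, apply the
# replacements, then log only the lines that differ.
$Original = Get-Content -Path $_.FullName
$Content = $Original
foreach ($ReplacementItem in $ReplacementList)
{
    $Content = $Content.Replace($ReplacementItem.OldValue, $ReplacementItem.NewValue)
}
Compare-Object -ReferenceObject $Original -DifferenceObject $Content |
    ForEach-Object { "$(Get-TimeStamp) $($_.SideIndicator) $($_.InputObject)" } |
    Out-File $LogFile -Append
Set-Content -Path $_.FullName -Value $Content
Lines tagged <= are the pre-change text, => the post-change text.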

Replace letters in list of txt files - powershell

Hi, I'm trying to change the letters in various files (which I have listed in a text file). I can go through the files individually using the command below, but I was wondering if there's a way to loop through the list, amending each of the files' contents.
For example, I'd like to change test-pop-test to test-bar-test, where this is in the content of several files, not the name of the file.
The code I am using is below, amending the names before running.
(Get-Content c:\temp\list.txt) | ForEach-Object { $_ -replace "pop", "bar" } | Set-Content c:\temp\test2.txt
Each object is a text file that I would like it to loop through; list.txt contains a list of text files whose contents are to be amended. Not sure if I explained this very well. :)
Thanks in advance :)
A basic example:
$Temp = Get-ChildItem C:\Temp -File -Force
foreach ($f in $Temp)
{
    (Get-Content $f.FullName) |
        ForEach-Object { $_ -replace 'pop', 'bar' } |
        Set-Content $f.FullName
}
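If, as the question describes, list.txt contains the paths of the files to change, here is a sketch that loops over that list instead (the path names are the question's):
# Each line of list.txt is assumed to hold the path of one file to amend
foreach ($path in Get-Content c:\temp\list.txt)
{
    (Get-Content $path) |
        ForEach-Object { $_ -replace 'pop', 'bar' } |
        Set-Content $path
}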