Change value on specific lines of multiple files - PowerShell

I am testing software which has settings in text files.
Now I need to change a specific line in ~100 files.
I searched for hours and I am close to a solution, but I don't know how to get it done.
A solution in Notepad++ would be nice, but I tried it with PowerShell using the following commands:
# File to change
$file = '*.dat'
# Get file content and store it into $content variable
$content = Get-Content -Path $file
# Replace the line number 40 with "0"
$content[39] = '"0"'
# Set the new content
$content | Set-Content -Path $file
It changes the specific line, but it also writes the combined content of all the files back into every file in the folder. So with ~100 files of 200 lines each, every file now has 20,000 lines.
In all the files I want to change line number 40 from:
"0"
to
"1"
Because other lines also contain "0", I only want to change line 40 in each of the multiple files.

You probably have to iterate over these files. Example:
Get-ChildItem *.dat | ForEach-Object {
    $content = Get-Content -Path $_.FullName
    # Line 40 is index 39; set it to the desired value
    $content[39] = '"1"'
    $content | Set-Content -Path $_.FullName
}
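If you want to be defensive about it, here is a minimal sketch (assuming line 40 always holds a quoted value) that only rewrites a file when line 40 actually reads "0":
Get-ChildItem *.dat | ForEach-Object {
    $lines = Get-Content -Path $_.FullName
    # Only touch line 40 (index 39) if it currently holds "0"
    if ($lines[39] -eq '"0"') {
        $lines[39] = '"1"'
        $lines | Set-Content -Path $_.FullName
    }
}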

How can I (efficiently) match content (lines) of many small files with content (lines) of a single large file and update/recreate them

I've been trying to solve the following case:
Many small text files (in subfolders) need their content (lines) matched against lines that exist in another (large) text file. The small files then need to be updated or copied with those matching lines.
I was able to come up with some running code for this, but I need to improve it or use a completely different method, because it is extremely slow and would take >40h to get through all files.
One idea I already had was to use a SQL Server to bulk-import all files into a single table with [relative path],[filename],[jap content], and the translation file into a table with [jap content],[eng content], then join on [jap content] and bulk-export the joined table as separate files using [relative path],[filename]. Unfortunately I got stuck right at the beginning due to formatting and encoding issues, so I dropped that and started working on a PowerShell script.
Now in detail:
Over 40k txt files spread across multiple subfolders, with multiple lines each; every line can exist in multiple files.
Content:
UTF-8 encoded Japanese text that can also contain special characters like \[*+(), each line ending with a tab character. They look like CSV files, but they don't have headers.
One large file with >600k lines containing the translations for the small files. Every line is unique within this file.
Content:
Again UTF-8 encoded Japanese text. Each line is formatted like this (without the brackets):
[Japanese Text][tab][English Text]
Example:
ใƒ†ใ‚นใƒˆ[1] Test [1]
The end result should be a copy or an updated version of all these small files, where each line has been replaced with the matching line from the translation file, while maintaining the relative paths.
What I have at the moment:
$translationfile = 'B:\Translation.txt'
$inputpath = 'B:\Working'
$translationarray = [System.Collections.ArrayList]@()
$translationarray = @(Get-Content $translationfile -Encoding UTF8)
Get-ChildItem -Path $inputpath -Recurse -File -Filter *.txt | ForEach-Object -Parallel {
    $_.Name
    $filepath = ($_.Directory.FullName).Substring(2)
    $filearray = [System.Collections.ArrayList]@()
    $filearray = @(Get-Content -Path $_.FullName -Encoding UTF8)
    $filearray = $filearray | ForEach-Object {
        $result = $using:translationarray -match ("^$_" -replace '[[+*?()\\.]','\$&')
        if ($result) {
            $_ = $result
        }
        $_
    }
    if (!(Test-Path B:\output\$filepath)) { New-Item -ItemType Directory -Force -Path B:\output\$filepath }
    #$("B:\output\"+$filepath+"\")
    $filearray | Out-File -FilePath $("B:\output\" + $filepath + "\" + $_.Name) -Force -Encoding UTF8
} -ThrottleLimit 10
I would appreciate any help and ideas, but please keep in mind that I rarely write scripts, so anything too complex might fly right over my head.
Thanks
As zett42 states, using a hash table is your best option for mapping the Japanese-only phrases to the dual-language lines.
Additionally, use of .NET APIs for file I/O can speed up the operation noticeably.
# Be sure to specify all paths as full paths, not least because .NET's
# current directory usually differs from PowerShell's
$translationfile = 'B:\Translation.txt'
$inPath = 'B:\Working'
$outPath = (New-Item -Type Directory -Force 'B:\Output').FullName
# Build the hashtable mapping the Japanese phrases to the full lines.
# Note that ReadLines() defaults to UTF-8
$ht = @{ }
foreach ($line in [IO.File]::ReadLines($translationfile)) {
    $ht[$line.Split("`t")[0] + "`t"] = $line
}
Get-ChildItem $inPath -Recurse -File -Filter *.txt | ForEach-Object -Parallel {
    # Translate the lines to the matching lines, including the translation,
    # via the hashtable.
    # NOTE: If an input line isn't represented as a key in the hashtable,
    #       it is passed through as-is.
    $lines = foreach ($line in [IO.File]::ReadLines($_.FullName)) {
        ($using:ht)[$line] ?? $line
    }
    # Synthesize the output file path, ensuring that the target dir. exists.
    $outFilePath = (New-Item -Force -Type Directory ($using:outPath + $_.Directory.FullName.Substring(($using:inPath).Length))).FullName + '/' + $_.Name
    # Write to the output file.
    # Note: If you want UTF-8 files *with BOM*, use -Encoding utf8bom
    Set-Content -Encoding utf8 $outFilePath -Value $lines
} -ThrottleLimit 10
Note: Your use of ForEach-Object -Parallel implies that you're using PowerShell [Core] 7+, where BOM-less UTF-8 is the consistent default encoding (unlike in Windows PowerShell, where default encodings vary wildly).
Therefore, in lieu of the .NET [IO.File]::ReadLines() API in a foreach loop, you could also use the more PowerShell-idiomatic switch statement with the -File parameter for efficient line-by-line text-file processing.
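As a rough sketch (assuming the same $ht hashtable and the surrounding ForEach-Object -Parallel block), the lookup loop could instead be written with switch -File:
# Sketch only: replaces the foreach / ReadLines() loop above.
# switch -File reads the file line by line; $_ is the current line.
$lines = switch -File $_.FullName {
    default { ($using:ht)[$_] ?? $_ }
}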

Trouble reading last line of CSV

I am getting CSV files (with no header) from another system. The last line ends the file (there is no newline after the last line of data). When I try Import-Csv, it will not read the last line of the file.
I do not have the ability to have the input file changed to include the newline.
I have noticed that Get-Content doesn't have a problem reading the entire file, but then it isn't a CSV and I'm unable to reference the fields in the file.
Currently I'm doing:
$w = Import-CSV -path c:\temp\input.txt -header 'head1', 'head2', 'head3'
This will not read the last line of the file
This reads the entire file:
$w = Get-Content -path c:\temp\input.txt
But then I can't reference the fields like $w.head1.
Is there a way to get Import-CSV to read the file including the last line?
OR Is there a way to read in the data using Get-Content, adding a header to it and then converting it back to a CSV?
I've tried using ConvertTo-Csv but have not had success:
$w = Get-Content -path c:\temp\input.txt
$csvdata = $w | ConvertTo-CSV # No header option for this function
I'd rather not create an intermediate file unless absolutely necessary.
You're very close! What you're after is not ConvertTo-Csv; you already have the file contents in CSV format, after all. So change that to ConvertFrom-Csv instead, which incidentally does support the -Header parameter. So something like this:
$w = Get-Content -path c:\temp\input.txt
$csvdata = $w | ConvertFrom-Csv -Header 'head1', 'head2', 'head3'
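The parsed rows can then be addressed by the synthetic header names, just like Import-Csv output, for example:
# e.g. list the first column of every row
$csvdata | ForEach-Object { $_.head1 }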
If I understand correctly, you know the number of columns in the file and all that is missing is a header line. Since your code does not specify a -Delimiter parameter, I'm assuming the delimiter character used in the file is a comma.
Best thing to do IMHO is to create a new output file and always keep the original.
$fileIn = 'c:\temp\input.txt'
$fileOut = 'c:\temp\input.csv'
# write the header line to a new file
Set-Content -Path $fileOut -Value 'head1,head2,head3'
# read the original file and append it to the one you have just created
Get-Content -Path $fileIn -Raw | Add-Content -Path $fileOut
If your file is really large, here is a faster alternative:
$fileIn = 'c:\temp\input.txt'
$fileOut = 'c:\temp\input.csv'
# write the header line to a new file
Set-Content -Path $fileOut -Value 'head1,head2,head3'
# read the original file and append it to the one you have just created
[System.IO.File]::AppendAllText($fileOut, ([System.IO.File]::ReadAllText($fileIn)))
If you really do want to take the risk and overwrite the original file, you can do this:
$file = 'c:\temp\input.txt'
$content = Get-Content -Path $file -Raw
# write the header line to the file, destroying what was in there
Set-Content -Path $file -Value 'head1,head2,head3'
# append the original content to it
$content | Add-Content -Path $file

PowerShell to Break up CSV by Number of Rows

So I am now tasked with processing constant reports that are more than 1 million lines long.
My last question did not explain everything, so I'm trying to ask a better question here.
I'm getting a dozen+ daily reports that are coming in as CSV files. I don't know what the headers are or anything like that as I get them.
They are huge; I can't open them in Excel.
I basically want to break them up into the same report, just with each report maybe 100,000 lines long.
The code I wrote below does not work, as I keep getting an
Exception of type 'System.OutOfMemoryException' was thrown.
I am guessing I need a better way to do this.
I just need this file broken down to a more manageable size.
It does not matter how long it takes, as I can run it overnight.
I found this on the internet and tried to adapt it, but I can't get it to work.
$PSScriptRoot
write-host $PSScriptRoot
$loc = $PSScriptRoot
$location = $loc
# how many rows per CSV?
$rowsMax = 10000;
# Get all CSV under current folder
$allCSVs = Get-ChildItem "$location\Split.csv"
# Read and split all of them
$allCSVs | ForEach-Object {
    Write-Host $_.Name;
    $content = Import-Csv "$location\Split.csv"
    $insertLocation = ($_.Name.Length - 4);
    for ($i = 1; $i -le $content.length; $i += $rowsMax) {
        $newName = $_.Name.Insert($insertLocation, "splitted_" + $i)
        $content | select -first $i | select -last $rowsMax | convertto-csv -NoTypeInformation | % { $_ -replace '"', "" } | out-file $location\$newName -fo -en ascii
    }
}
The key is not to read large files into memory in full, which is what you're doing by capturing the output from Import-Csv in a variable ($content = Import-Csv "$location\Split.csv").
That said, while using a single pipeline would solve your memory problem, performance will likely be poor, because you're converting from and back to CSV, which incurs a lot of overhead.
Even reading and writing the files as text with Get-Content and Set-Content is slow, however.
Therefore, I suggest a .NET-based approach for processing the files as text, which should substantially speed up processing.
The following code demonstrates this technique:
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
    $csvFile = $_.FullName
    # Construct a file-path template for the sequentially numbered chunk
    # files; e.g., "...\file_split_001.csv"
    $csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
    # Set how many lines make up a chunk.
    $chunkLineCount = 10000
    # Read the file lazily and save every chunk of $chunkLineCount
    # lines to a new file.
    $i = 0; $chunkNdx = 0
    foreach ($line in [IO.File]::ReadLines($csvFile)) {
        if ($i -eq 0) { ++$i; $header = $line; continue } # Save header line.
        if ($i++ % $chunkLineCount -eq 1) { # Create new chunk file.
            # Close previous file, if any.
            if (++$chunkNdx -gt 1) { $fileWriter.Dispose() }
            # Construct the file path for the next chunk, by
            # instantiating the template with the next sequence number.
            $csvFileChunk = $csvFileChunkTemplate -f $chunkNdx
            Write-Verbose "Creating chunk: $csvFileChunk"
            # Create the next chunk file and write the header.
            $fileWriter = [IO.File]::CreateText($csvFileChunk)
            $fileWriter.WriteLine($header)
        }
        # Write a data row to the current chunk file.
        $fileWriter.WriteLine($line)
    }
    $fileWriter.Dispose() # Close the last file.
}
Note that the above code creates BOM-less UTF-8 files; if your input contains ASCII-range characters only, these files will effectively be ASCII files.
Here's the equivalent single-pipeline solution, which is likely to be substantially slower.
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
    $csvFile = $_.FullName
    # Construct a file-path template for the sequentially numbered chunk
    # files; e.g., ".../file_split_001.csv"
    $csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
    # Set how many lines make up a chunk.
    $chunkLineCount = 10000
    $i = 0; $chunkNdx = 0
    Get-Content -LiteralPath $csvFile | ForEach-Object {
        if ($i -eq 0) { ++$i; $header = $_; return } # Save header line.
        if ($i++ % $chunkLineCount -eq 1) { # Create new chunk file.
            # Construct the file path for the next chunk.
            $csvFileChunk = $csvFileChunkTemplate -f ++$chunkNdx
            Write-Verbose "Creating chunk: $csvFileChunk"
            # Create the next chunk file and write the header.
            Set-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $header
        }
        # Write data row to the current chunk file.
        Add-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $_
    }
}
Another option from the Linux world is the split command. To get it on Windows, just install Git Bash; then you'll be able to use many Linux tools from CMD/PowerShell.
Below is the syntax to achieve your goal:
split -l 100000 --numeric-suffixes --suffix-length 3 --additional-suffix=.csv sourceFile.csv outputfile
It's very fast. If you want, you can wrap split.exe as a cmdlet.
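For example, a minimal sketch of such a wrapper, assuming split.exe from Git Bash is on your PATH (Split-CsvFile is just an illustrative name; note that split does not repeat the CSV header in each chunk):
function Split-CsvFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $LinesPerChunk = 100000
    )
    # Delegate the actual splitting to GNU split; chunks are written next to
    # the source file as <Path>_part000.csv, <Path>_part001.csv, ...
    split.exe -l $LinesPerChunk --numeric-suffixes --suffix-length 3 --additional-suffix=.csv $Path "${Path}_part"
}

# Hypothetical usage:
Split-CsvFile -Path .\sourceFile.csv -LinesPerChunk 100000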

How to read particular range to last line of file using PowerShell?

How do I read a particular range of lines, i.e. if my file has 100 lines and I want to read from line 80 to the last line of the file using PowerShell? I'm not sure how many lines the file contains.
Just do this:
Get-Content "C:\temp\test.txt" | select -skip 80
Or just get the last 20 lines:
Get-Content -Path 'C:\temp\test.txt' -Tail 20
If you want to find out how many lines the file has so that you can use that to decide what to read, this could help:
$lines = Get-Content -Path 'C:\temp\test.txt'
$lines.count
So if you decide you want to get, for example, the last half of the lines, you could do something like this:
$lines = Get-Content -Path 'C:\temp\test.txt'
$half = [math]::Round($lines.count/2)
$lines | Select-Object -Last $half
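If you prefer index-based slicing once the lines are in memory, a small sketch (array indices are 0-based, so line 80 is index 79):
$lines = Get-Content -Path 'C:\temp\test.txt'
# Take everything from line 80 (index 79) through the last line
$lines[79..($lines.Count - 1)]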

find and delete lines without string pattern in text files

I'm trying to find out how to use PowerShell to find and delete lines without a certain string pattern in a set of files. For example, I have the following text file:
111111
22x222
333333
44x444
This needs to be turned into:
22x222
44x444
given that the string pattern 'x' does not appear in any of the other lines.
How can I issue such a command in PowerShell to process a bunch of text files?
Thanks.
dir | foreach { $out = cat $_ | select-string x; $out | set-content $_ }
The dir command lists the files in the current directory; the foreach goes through each file; cat reads the file and pipes into select-string; select-string finds the lines that contains the specific pattern, which in this case is "x"; the result of select-string is stored in $out; and finally, $out is written to the same file with set-content.
We need the temporary variable $out because you cannot read and write the same file at the same time.
This will process all txt files from the working directory. Each file's content is checked and only lines that contain 'x' are allowed to pass. The result is written back to the file.
Get-ChildItem *.txt | ForEach-Object {
    $content = Get-Content $_.FullName | Where-Object { $_ -match 'x' }
    $content | Out-File $_.FullName
}