Replace multiple character strings at once using a replace matrix - powershell

Within the content of a couple of large text files, I am aiming to replace all occurrences of a specific character string with a new character string, simultaneously for 300 different character strings.
Is there any way I can do this using a comma- or tab-separated search-and-replace matrix such as this? (The actual character strings vary widely in their length and type of characters, but do not contain , or TAB.)
currentstring1,newstring1
currentstring2,newstring2
currentstring3,newstring3
aB9_./cdef,newstring4
.
currentstring300,newstring300

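Here is something to get you started. If the replacement file is ~300 lines, then Import-Csv should be OK. However, if the file in which to replace strings is large, assigning the output of Get-Content to a variable will be a problem, because that holds the entire file in memory. Instead, stream the file line by line through the pipeline. Also note that -replace treats the search string as a regular expression, so strings such as aB9_./cdef are safer to replace literally with the .Replace() method (or to escape with [regex]::Escape()).
[cmdletbinding()]
Param()
$thefile = './largetextfile.txt'
$replfile = './repl.txt'
$reps = Import-Csv -Path $replfile -Header orgstring,repstring
foreach ($rep in $reps) {
    Write-Verbose $rep
}
# Stream the file through the pipeline instead of loading it into a variable
Get-Content -Path $thefile | ForEach-Object {
    Write-Verbose $_
    $newline = $_
    foreach ($rep in $reps) {
        # .Replace() matches literally and case-sensitively; -replace would treat orgstring as a regex
        $newline = $newline.Replace($rep.orgstring, $rep.repstring)
    }
    Write-Verbose $newline
    # Emit the result; pipe the loop into Set-Content with a different file name to save it
    $newline
}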
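The nested loop above scans every line once per replacement pair. A possible alternative, sketched here under assumptions, is to combine all search strings into one escaped regex alternation and look up the replacement for each match, so each line is scanned once. The `$pairs` table and the sample input string are hypothetical stand-ins for the contents of the replacement file.

```powershell
# Hypothetical sample pairs; in practice, fill this table from the replacement file.
$pairs = @{
    'currentstring1' = 'newstring1'
    'aB9_./cdef'     = 'newstring4'
}

# Escape each search string so it matches literally, and put longer strings first
# so that a shorter string cannot pre-empt a longer one that starts the same way.
$pattern = ($pairs.Keys |
    Sort-Object -Property Length -Descending |
    ForEach-Object { [regex]::Escape($_) }) -join '|'
$regex = [regex]::new($pattern)

# For every match, return the replacement stored under the matched text.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] { param($m) $pairs[$m.Value] }

$result = $regex.Replace('x currentstring1 y aB9_./cdef z', $evaluator)
$result
```

Because each position in the input is matched only once, a replacement string is never itself replaced by a later pair, which differs from applying the pairs one after another.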

On a Unix server:
1. Build the rename matrix as shown below in a text editor, then copy it.
2. In the server directory where the files are located, paste the multi-line rename matrix as is.
3. Press Enter.
4. Some characters (such as slashes) need to be escaped if they are present in a string, and the * at the end can be replaced with a more specific file pattern.
perl -pi -e 's/FINDTEXT1/REPLACETEXT1/g' *
perl -pi -e 's/FINDTEXT2/REPLACETEXT2/g' *
perl -pi -e 's/FINDTEXT3/REPLACETEXT3/g' *

Related

Re-assembling split file names with Powershell

I'm having trouble re-assembling certain filenames (and discarding the rest) from a text file. The filenames are split up (usually on three lines) and there is always a blank line after each filename. I only want to keep filenames that begin with OPEN or FOUR. An example is:
OPEN.492820.EXTR
A.STANDARD.38383
333
FOUR.383838.282.
STAND.848484.NOR
MAL.3939
CLOSE.3480384.ST
ANDARD.39393939.
838383
The output I'd like would be:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
Thanks for any suggestions!
The following worked for me, you can give it a try.
See https://regex101.com/r/JuzXOb/1 for the Regex explanation.
$source = 'fullpath/to/inputfile.txt'
$destination = 'fullpath/to/resultfile.txt'
[regex]::Matches(
(Get-Content $source -Raw),
'(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join($_ -split '\r?\n').ForEach('Trim') }) |
Out-File $destination
For testing:
$txt = @'
OPEN.492820.EXTR
A.STANDARD.38383
333
FOUR.383838.282.
STAND.848484.NOR
MAL.3939
CLOSE.3480384.ST
ANDARD.39393939.
838383
OPEN.492820.EXTR
A.EXAMPLE123
FOUR.383838.282.
STAND.848484.123
ZXC
'@
[regex]::Matches(
$txt,
'(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join($_ -split '\r?\n').ForEach('Trim') })
Output:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
OPEN.492820.EXTRA.EXAMPLE123
FOUR.383838.282.STAND.848484.123ZXC
Read the file one line at a time and keep concatenating them until you encounter a blank line, at which point you output the concatenated string and repeat until you reach the end of the file:
# this variable will keep track of the partial file names
$fileName = ''
# use a switch to read the file and process each line
switch -Regex -File ('path\to\file.txt') {
    # when we see a blank line...
    '^\s*$' {
        # ... we output the accumulated name if it starts with the right word
        if($fileName -cmatch '^(OPEN|FOUR)'){ $fileName }
        # and then start over
        $fileName = ''
    }
    default {
        # must be a non-blank line, concatenate it to the previous ones
        $fileName += $_
    }
}
# remember to check and output the last one
if($fileName -cmatch '^(OPEN|FOUR)'){
    $fileName
}
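For input small enough to hold in memory, the same blank-line logic can be sketched without a loop: split the text on runs of blank lines, join each block, and filter by prefix. The `$text` value below is sample data assembled from the question with the blank separator lines put back in; in practice it would come from Get-Content -Raw.

```powershell
# Sample input from the question, with a blank line after each split filename.
$text = @(
    'OPEN.492820.EXTR', 'A.STANDARD.38383', '333', '',
    'FOUR.383838.282.', 'STAND.848484.NOR', 'MAL.3939', '',
    'CLOSE.3480384.ST', 'ANDARD.39393939.', '838383', ''
) -join "`n"

$names = @(
    # Two or more consecutive newlines mark the end of one filename block
    $text -split '(?:\r?\n){2,}' |
        # Join the pieces of each block back into a single name
        ForEach-Object { ($_ -split '\r?\n').Trim() -join '' } |
        # Keep only names that start with OPEN or FOUR (case-sensitive)
        Where-Object { $_ -cmatch '^(OPEN|FOUR)' }
)
$names
```

With this sample, `$names` should contain the two names listed as the desired output in the question.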

Powershell - Count number of carriage returns line feed in .txt file

I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned that Get-Content splits and streams each line in a file by CR, CRLF, and LF so that it can read data between operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the trailing newline (whitespace) at the end of the file. That newline was appended by Out-File when the test file was created; Get-Content -Raw returns it as part of the string.
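If the goal is only to count rows, another option is to count the CRLF pairs directly rather than splitting and counting the pieces. The sketch below uses a made-up `$raw` string in which lone CR and LF characters appear inside rows, as the question describes; for a real file, `$raw` would be the result of Get-Content -Raw, which is subject to the same memory limits.

```powershell
# Hypothetical sample: two rows, each terminated by CRLF,
# with a lone CR inside the first row and a lone LF inside the second.
$raw = "row1 part a`rrow1 part b`r`nrow2 part a`nrow2 part b`r`n"

# Count only the places where CR is immediately followed by LF
$rowCount = [regex]::Matches($raw, "`r`n").Count
$rowCount
```

This counts terminators, so it assumes that every row, including the last one, ends with CRLF.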
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader using the ReadBlock method and specifically splitting lines on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to workaround the memory exception errors:
function Get-Lines {
    [CmdletBinding()][OutputType([string])] param(
        [Parameter(ValueFromPipeLine = $True)][string] $Filename,
        [String] $NewLine = [Environment]::NewLine
    )
    Begin {
        [Char[]] $Buffer = new-object Char[] 10 # tiny buffer for demonstration; use a larger one for real files
        $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item $Filename).FullName
        $Rest = '' # Note that a multiple character newline (as CRLF) could be split at the end of the buffer
    }
    Process {
        While ($True) {
            $Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
            if (!$Length) { Break }
            $Split = ($Rest + [string]::new($Buffer[0..($Length - 1)])) -Split $NewLine
            If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
            $Rest = $Split[-1]
        }
    }
    End {
        $Reader.Dispose() # release the file handle
        $Rest
    }
}
Usage
To prevent memory exceptions, it is important that you do not assign the results to a variable or wrap the call in parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line being reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line 1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.

How to read a file line by line and create another new file with that content using powershell

I have a file 'abc.txt' that contains below lines.
c:myfilepath\filepath\filepath1\file1.csv
c:myfilepath\filepath\filepath1\file2.csv
c:myfilepath\filepath\filepath1\file2.csv
How to loop through the above file 'abc.txt' and read line by line and create another file called 'xyz.txt' that should contains like below. The file name in the path in 'xyz.txt' should be different, see below (ex. newfile_file1.txt)
c:mynewfile\newfilepath\newfilepath1\newfile_file1.txt (<- this corresponds to file1.csv)
c:mynewfile\newfilepath\newfilepath1\newfile_file2.txt
c:mynewfile\newfilepath\newfilepath1\newfile_file2.txt
I've tried using Get-Content to loop through the file but I just get nothing returned. I'm unclear as to where to put the syntax and how to completely construct it.
This should do it (edited to get file names and paths as requested, and dynamic so the paths in the abc-file are used).
$f = Get-Content C:\temp\abc.txt # this is the contents-file
foreach ($r in $f)
{
$r2 = (Split-Path $r).Replace("\", "\new") + '\newfile_' + [io.path]::GetFileNameWithoutExtension($r) + '.txt'
$r2 = $r2.replace(":\", ":\mynewfile\")
Get-Content $r | Out-File -filepath $r2
}
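Assuming all of your file paths start with c:myfilepath\filepath\filepath1, you can replace that prefix, swap the extension, and then write the result with Out-File.
$File1 = Get-Content E:\abc.txt
$File1 -replace 'c:myfilepath\\filepath\\filepath1\\', 'c:mynewfile\newfilepath\newfilepath1\newfile_' -replace '\.csv$', '.txt' |
Out-File E:\xyz.txt
Note the double backslashes \\ in the search pattern: -replace takes a regular expression, so each literal backslash must be escaped. The replacement string is not a regex, so its backslashes stay single.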
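As an illustration of how the pattern and the replacement string interact, here is a small sketch that rewrites one sample line with a single anchored pattern and a capture group. The sample path is taken from the question; the variable names are placeholders.

```powershell
# One sample line from abc.txt, as shown in the question
$line = 'c:myfilepath\filepath\filepath1\file1.csv'

# (.+) captures the base file name; $1 re-inserts it in the replacement.
# Single quotes keep PowerShell from expanding $1 as a variable.
$newLine = $line -replace '^c:myfilepath\\filepath\\filepath1\\(.+)\.csv$',
                          'c:mynewfile\newfilepath\newfilepath1\newfile_$1.txt'
$newLine
```

Because the pattern is anchored with ^ and $, lines that do not have the expected prefix and extension are passed through unchanged rather than partially rewritten.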

Commas in Powershell hash table key

Take this .txt file:
11111-2222, My, file, is, here.xml
22222-1111, My, filename.xml
22222-2323, My filename 2.xml
22222-2323, Myfilename3.xml
This text file represents a map linking IDs with names of files in a filesystem directory (each row is an entry, separated by the first comma following the ID number). I have a PowerShell script that, at a high level, imports this .txt file as CSV and puts it in a map so that I can match a filename with its corresponding ID number, which I need to append to a PUT request to an endpoint.
My script is working great for lines 3 and 4 above in the .txt, but not for the filenames that have commas. Since we are delimiting with commas, PowerShell is cutting those filenames short, resulting in incorrect field values.
Code snippet:
$mapFile = "location\of\mytextfilehere.txt"
$contentObjects = "location\of\filesIwantToPost"
$map = @{}
Import-Csv $mapFile -Header ID,Filename | ForEach-Object { $map[$_.Filename] = $_.ID }
foreach ($file in $contentObjects) {
$content = Get-Content $object.PSPath
$putURI = "http://myendpoint:3000/" + $map[$file.Name]
$request = Invoke-WebRequest -Uri $putURI -Method PUT -Body $content
}
The above breaks when trying to PUT the file "My, file, is, here.xml" and "My, filename.xml". The map building saves only "My, " as the key value, since we are comma delimited.
Is there a way to help me deal with these commas and save these fields completely and correctly? Perhaps by delimiting my .txt with pipes instead of commas? Or is there a different alternate approach?
Your file is not CSV. Wrap names with quotes or parse it manually like below:
#test data
new-item sample.txt -ItemType File -Value "11111-2222, My, file, is, here.xml
22222-1111, My, filename.xml
22222-2323, My filename 2.xml
22222-2323, Myfilename3.xml"
#parse
cat sample.txt | %{
$ID, $File = $_ -split ',',2;
[PSCustomObject]@{ ID=$ID; File=$File.Trim() }
}
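To connect this back to the lookup table in the question, the same split-on-first-comma idea can fill the hash table directly. This is a minimal sketch: the `$lines` array stands in for the contents of the map file, and the variable names are placeholders.

```powershell
# Sample lines from the question's map file
$lines = @(
    '11111-2222, My, file, is, here.xml'
    '22222-1111, My, filename.xml'
    '22222-2323, My filename 2.xml'
)

$map = @{}
foreach ($line in $lines) {
    # Split into at most two parts, so commas in the filename are preserved
    $id, $name = $line -split ',', 2
    $map[$name.Trim()] = $id.Trim()
}

$map['My, file, is, here.xml']
```

The limit of 2 passed to -split is what keeps everything after the first comma together as the filename.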

How to read multiple data sets from one .csv file in powershell

I have a temp recorder that (daily) reads multiple sensors and saves the data to a single .csv with a whole bunch of header information before each set of date/time and temperature. the file looks something like this:
"readerinfo","onlylistedonce"
"downloadinfo",YYYY/MM/DD 00:00:00
"timezone",-8
"headerstuff","headersuff"
"sensor1","sensorstuff"
"serial#","0000001"
"about15lines","ofthisstuff"
"header1","header2"
datetime,temp
datetime,temp
datetime,temp
"sensor2","sensorstuff"
"serial#","0000002"
"about15lines","ofthisstuff"
"header1","header2"
datetime,temp
datetime,temp
datetime,temp
"downloadcomplete"
My aim is to pull out the date/time and temp data for each sensor and save it as a new file so that I can run some basic stats (hi/lo/avg temp) on it. (It would be beautiful if I could somehow identify which sensor the data came from based on a serial number listed in the header info, but that's less important than separating out the data into sets.) The lengths of the date/time lists change from sensor to sensor based on how long they've been recording, and the number of sensors also changes daily. Even if I could just split the sensor data, header info and all, into as many files as there are sensors, that would be a good start.
This isn't exactly a CSV file in the traditional sense. I imagine you already know this, given your description of the file contents.
If the lines with datetime,temp truly do not have any double quotes in them, per your example data, then the following script should work. This script is self-contained, since it declares the example data in-line.
IMPORTANT: You will need to modify the line containing the declaration of the $SensorList variable. You will have to populate this variable with the names of the sensors, or you can parameterize the script to accept an array of sensor names.
UPDATE: I changed the script to be parameterized.
Results
The results of the script are as follows:
sensor1.csv (with corresponding data)
sensor2.csv (with corresponding data)
Some green text will be written to the PowerShell host, indicating which sensor is currently detected
Script
The contents of the script should appear as follows. Save the script file to a folder, such as c:\test\test.ps1, and then execute it.
# Declare text as a PowerShell here-string
$Text = @"
"readerinfo","onlylistedonce"
"downloadinfo",YYYY/MM/DD 00:00:00
"timezone",-8
"headerstuff","headersuff"
"sensor1","sensorstuff"
"serial#","0000001"
"about15lines","ofthisstuff"
"header1","header2"
datetime,tempfromsensor1
datetime,tempfromsensor1
datetime,tempfromsensor1
"sensor2","sensorstuff"
"serial#","0000002"
"about15lines","ofthisstuff"
"header1","header2"
datetime,tempfromsensor2
datetime,tempfromsensor2
datetime,tempfromsensor2
"downloadcomplete"
"@.Split("`n");
# Declare the list of sensor names
$SensorList = @('sensor1', 'sensor2');
$CurrentSensor = $null;
# WARNING: Clean up all CSV files in the same directory as the script
Remove-Item -Path $PSScriptRoot\*.csv;
# Iterate over each line in the text file
foreach ($Line in $Text) {
#region Line matches double quote
if ($Line -match '"') {
# Parse the property/value pairs (where double quotes are present)
if ($Line -match '"(.*?)",("(?<value>.*)"|(?<value>.*))') {
$Entry = [PSCustomObject]@{
Property = $matches[1];
Value = $matches['value'];
};
if ($matches[1] -in $SensorList) {
$CurrentSensor = $matches[1];
Write-Host -ForegroundColor Green -Object ('Current sensor is: {0}' -f $CurrentSensor);
}
}
}
#endregion Line matches double quote
#region Line does not match double quote
else {
# Parse the datetime/temp pairs
if ($Line -match '(.*?),(.*)') {
$Entry = [PSCustomObject]@{
DateTime = $matches[1];
Temp = $matches[2];
};
# Write the sensor's datetime/temp to its file
Add-Content -Path ('{0}\{1}.csv' -f $PSScriptRoot, $CurrentSensor) -Value $Line;
}
}
#endregion Line does not match double quote
}
Using the data sample you provided, the output of this script would be as follows:
C:\sensoroutput_20140204.csv
sensor1,datetime,temp
sensor1,datetime,temp
sensor1,datetime,temp
sensor2,datetime,temp
sensor2,datetime,temp
sensor2,datetime,temp
I believe this is what you are looking for. The assumption here is the new line characters. The get-content line is reading the data and breaking it into "sets" by using 2 new line characters as the delimiter to split on. I chose to use the environment's (Windows) new line character. Your source file may have different new line characters. You can use Notepad++ to see which characters they are e.g. \r\n, \n, etc.
$newline = [Environment]::NewLine
$srcfile = "C:\sensordata.log"
$dstpath = 'C:\sensoroutput_{0}.csv' -f (get-date -f 'yyyyMMdd')
# Reads the file with Get-Content, using two consecutive newlines
# as the delimiter, so each element returned is one data set
$datasets = get-content $srcfile -delimiter ($newline * 2)
foreach ($ds in $datasets) {
$lines = ($ds -split $newline) # Split dataset into lines
$setname = $lines[0] -replace '\"(\w+).*', '$1' # Get the set or sensor name
$lines | % {
if ($_ -and $_ -notmatch '"') { # No empty lines and no lines with quotes
$data = ($setname, ',', $_ -join '') # Concats set name, datetime, and temp
Out-File -filepath $dstpath -inputObject $data -encoding 'ascii' -append
}
}
}
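Once the datetime/temp rows for a sensor have been separated out, the basic stats mentioned in the question (hi/lo/avg) could be computed along these lines. This is only a sketch: the timestamps and temperatures are made-up sample values, and the column names DateTime and Temp are assumptions, since the real file's headers are not shown.

```powershell
# Made-up sample rows in the datetime,temp layout described in the question
$rows = @(
    '2014/02/04 00:00:00,18'
    '2014/02/04 01:00:00,20'
    '2014/02/04 02:00:00,22'
) | ConvertFrom-Csv -Header DateTime, Temp

# Measure-Object converts the Temp strings to numbers and aggregates them
$stats = $rows | Measure-Object -Property Temp -Minimum -Maximum -Average
$stats | Select-Object Minimum, Maximum, Average
```

For a per-sensor file written by one of the scripts above, the array could be replaced with Import-Csv and the same -Header argument, adjusted to whatever columns that file actually contains.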