I have a large CSV file (1.6 GB). How can I delete a specific line, e.g. line 1005?
Note: The solutions below remove a single line from any text-based file by line number. As marsze points out, additional considerations apply to CSV files: care must be taken not to remove the header row, and rows may span multiple lines if they contain values with embedded newlines; a CSV parser is the better choice in that case.
If performance isn't paramount, here's a memory-friendly pipeline-based way to do it:
Get-Content file.txt |
Where-Object ReadCount -ne 1005 |
Set-Content -Encoding Utf8 new-file.txt
Get-Content adds a (somewhat obscurely named) .ReadCount property to each line it outputs, which contains the 1-based line number.
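Since .ReadCount is an ordinary property you can test against, the same pipeline extends to removing a range of lines; here's a minimal sketch (assuming lines 1000 through 1009 should go), using the script-block form of Where-Object:
Get-Content file.txt |
  Where-Object { $_.ReadCount -lt 1000 -or $_.ReadCount -gt 1009 } |
  Set-Content -Encoding Utf8 new-file.txt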
Note that the input file's character encoding isn't preserved by Get-Content, so you should control Set-Content's output encoding explicitly, as shown above, using UTF-8 as an example.
Unless you read the whole file into memory, you must output to a new file, at least temporarily; you can replace the original file with the temporary output file with:
Move-Item -Force new-file.txt file.txt
A faster, but memory-intensive alternative based on direct use of the .NET framework, which also allows you to update the file in place:
$file = 'file.txt'
$lines = [IO.File]::ReadAllLines("$PWD/$file")
Set-Content -Encoding UTF8 $file -Value $lines[0..1003 + 1005..($lines.Count-1)]
Note the need to use "$PWD/$file", i.e., to explicitly prepend the current directory path to the relative path stored in $file, because the .NET framework's idea of what the current directory is differs from PowerShell's.
While $lines = Get-Content $file would be functionally equivalent to $lines = [IO.File]::ReadAllLines("$PWD/$file"), it would perform noticeably worse.
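If you want to verify this on your own machine, here's a quick sketch using Measure-Command (absolute timings will vary with file size and hardware):
# Rough timing comparison; run each a few times to warm up the file cache.
(Measure-Command { $a = Get-Content file.txt }).TotalSeconds
(Measure-Command { $b = [IO.File]::ReadAllLines("$PWD/file.txt") }).TotalSeconds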
0..1003 creates an array of indices from 0 to 1003; + concatenates that array with indices 1005 through the rest of the input array; note that array indices are 0-based, whereas line numbers are 1-based.
Also note how the resulting array is passed to Set-Content as a direct argument via -Value, which is faster than passing it via the pipeline (... | Set-Content ...), where element-by-element processing would be performed.
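To illustrate the indexing technique with a toy array (removing the 5th element, i.e. index 4):
$lines = 'one', 'two', 'three', 'four', 'five', 'six'
$lines[0..3 + 5..($lines.Count-1)]   # -> one, two, three, four, six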
Finally, a memory-friendly method that is faster than the pipeline-based method:
$file = 'file.txt'
$outFile = [IO.File]::CreateText("$PWD/new-file.txt")
$lineNo = 0
try {
  foreach ($line in [IO.File]::ReadLines("$PWD/$file")) {
    if (++$lineNo -eq 1005) { continue }
    $outFile.WriteLine($line)
  }
} finally {
  $outFile.Dispose()
}
Note the use of "$PWD/..." in the .NET API calls, which ensures that a full path is passed, which is necessary, because .NET's working directory usually differs from PowerShell's.
As with the pipeline-based command, you may have to replace the original file with the new file afterwards.
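If you need this operation repeatedly, the streaming approach above lends itself to being wrapped in a helper function; here's a sketch (the function name and parameters are merely illustrative):
function Remove-LineByNumber {
  param(
    [Parameter(Mandatory)] [string] $LiteralPath,
    [Parameter(Mandatory)] [int] $LineNumber
  )
  # Convert-Path yields a full path, which the .NET APIs require.
  $inPath  = Convert-Path -LiteralPath $LiteralPath
  $outPath = "$inPath.tmp"
  $outFile = [IO.File]::CreateText($outPath)
  $lineNo = 0
  try {
    foreach ($line in [IO.File]::ReadLines($inPath)) {
      if (++$lineNo -eq $LineNumber) { continue }
      $outFile.WriteLine($line)
    }
  } finally {
    $outFile.Dispose()
  }
  # Replace the original file only after the new file was written successfully.
  Move-Item -Force $outPath $inPath
}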
I have one text file with a script that I want to split into two.
Below is the dummy script
--serverone
this is first part of my script
--servertwo
this is second part of my script
I want to create two text files that would look like
file1
--serverone
this is first part of my script
file2
--servertwo
this is second part of my script
So far, I have added a special character within the script that I know doesn't exist ("}"):
$script = get-content -Path "C:\Users\shamvil\Desktop\test.txt"
$newscript = $script.Replace("--servertwo","}--servertwo")
$newscript.split("}")
but I don't know how to save the split into two separate places.
This might not be the best approach, so I am open to a different solution as well.
Please help, thanks!
Use a regex-based -split operation:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
ForEach-Object { $fileName = 'file' + (++$i); Set-Content $fileName $_ }
This assumes that each block of lines that starts with a line that starts with -- is to be saved to a separate file.
Get-Content -Raw reads the entire file into a single, multi-line string.
As for the separator regex passed to -split:
The (?m) inline regex option makes anchors ^ and $ match on each line
^(?=--) therefore matches every line that starts with --, using a look-ahead assertion ((?=...)), which is by definition non-capturing, to ensure that the -- isn't removed from the resulting blocks (by default, what matches the separator regex is not included).
-ne '' filters out the extra empty element that results from the separator expression matching at the very start of the string.
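To see what the separator regex produces for the sample input from the question, here's a quick interactive sketch:
$blocks = (Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne ''
$blocks.Count   # -> 2 for the sample input
$blocks[0]      # -> the --serverone block (its 2 lines)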
Note that Set-Content knows nothing about the character encoding of the input file and uses its default encoding; use -Encoding as needed.
zett42 points out that the file-writing part can be streamlined with the help of a delay-bind script-block parameter:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
Set-Content -LiteralPath { (Get-Variable i -Scope 1).Value++; "file$i" }
The Get-Variable call to access and increment the $i variable in the parent scope is necessary, because delay-bind script blocks (as well as script blocks for calculated properties) run in a child scope - perhaps surprisingly, as discussed in GitHub issue #7157.
A shorter - but even more obscure - option is to use ([ref] $i).Value++ instead; see this answer for details.
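That is, the file-writing part could equivalently be written as follows (a sketch based on the technique from the linked answer):
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
  Set-Content -LiteralPath { ([ref] $i).Value++; "file$i" }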
zett42 also points to a proposed future enhancement that would obviate the need to maintain the sequence numbers manually, via the introduction of an automatic $PSIndex variable that reflects the sequence number of the current pipeline object: see GitHub issue #13772.
I am new to PowerShell scripting and I am looking for a way to add 2 new rows at the top of an already present CSV file.
What I have tried is replacing the header and rows with the new rows.
I am looking for a way to add 2 new rows above the header in CSV.
You mention that you want to add the new lines above the header, which means that no CSV-specific processing is needed - it sounds like you're asking how to prepend lines to an existing text file (which happens to contain CSV - note that the resulting file will no longer be a valid CSV file).
E.g., assuming a target file named some.csv:
Note: Best to make a backup of the target file before trying these commands.
If the input file is small enough to fit into memory as a whole:
Reading the entire target file into memory as a single string with Get-Content -Raw allows for a convenient and concise solution:
Set-Content -LiteralPath some.csv -NoNewLine -Value (
@'
New line 1 above header
New line 2 above header

'@ + (Get-Content -Raw some.csv)
)
Note that Set-Content applies a default character encoding (the active ANSI code page in Windows PowerShell, UTF-8 without BOM in PowerShell Core), irrespective of the current encoding of some.csv, so you may have to use the -Encoding parameter to specify the encoding explicitly.
Also note that the single-quoted here-string (@'<newline>...<newline>'@) uses the same newline style (CRLF (Windows-style) vs. LF (Unix-style)) as the enclosing script, which may not match the style used in some.csv - though PowerShell itself has no problem processing files with mixed newline styles. The empty line before the closing '@ ensures that a newline separates the new lines from the original header.
If the file is too large to fit into memory, use a streaming (line-by-line) approach:
$ErrorActionPreference = 'Stop'
# Create a temporary file and fill it with the 2 new lines.
$tempFile = [IO.Path]::GetTempFileName()
'New line 1 above header', 'New line 2 above header' | Set-Content $tempFile
# Then append the CSV file's lines one by one.
Get-Content some.csv | Add-Content $tempFile
# If that succeeded, replace the original file.
Move-Item -Force $tempFile some.csv
Note: Use of the Get-Content, Set-Content and Add-Content cmdlets is convenient, but slow; the next section shows a faster alternative.
If performance matters, use .NET types such as [IO.File] instead:
$ErrorActionPreference = 'Stop'
# Create a temporary file...
$tempFile = [IO.Path]::GetTempFileName()
# ... and fill it with the 2 new lines.
$streamWriter = [IO.File]::CreateText($tempFile)
foreach ($lineToPrepend in 'New line 1 above header', 'New line 2 above header') {
  $streamWriter.WriteLine($lineToPrepend)
}
# Then append the CSV file's lines one by one.
foreach ($csvLine in [IO.File]::ReadLines((Convert-Path some.csv))) {
  $streamWriter.WriteLine($csvLine)
}
$streamWriter.Dispose()
# If that succeeded, replace the original file.
Move-Item -Force $tempFile some.csv
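To spot-check the result, you could inspect the first few lines of the rewritten file afterwards:
Get-Content some.csv -TotalCount 3   # the 2 new lines, followed by the original header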
I am trying to extract each line starting with "%%" in all files in a folder and then copy those lines to a separate text file. I am currently using the PowerShell code below, but I am not getting any results.
$files = Get-ChildItem "folder" -Filter *.txt
foreach ($file in $files)
{
    if ($_ -like "*%%*")
    {
        Set-Content "Output.txt"
    }
}
I think that mklement0's suggestion to use Select-String is the way to go. Adding to his answer, you can pipe the output of Get-ChildItem into Select-String so that the entire process becomes a PowerShell one-liner.
Something like this:
Get-ChildItem "folder" -Filter *.txt | Select-String -Pattern '^%%' | Select -ExpandProperty line | Set-Content "Output.txt"
The Select-String cmdlet offers a much simpler solution (PSv3+ syntax):
(Select-String -Path folder\*.txt -Pattern '^%%').Line | Set-Content Output.txt
Select-String accepts a filename/path pattern via its -Path parameter, so, in this simple case, there is no need for Get-ChildItem.
If, by contrast, your input file selection is recursive or uses more complex criteria, you can pipe Get-ChildItem's output to Select-String, as demonstrated in Dave Sexton's helpful answer.
Note that, according to the docs, Select-String by default assumes that the input files are UTF-8-encoded, but you can change that with the -Encoding parameter; also consider the output encoding discussed below.
Select-String's -Pattern parameter expects a regular expression rather than a wildcard expression.
^%% only matches literal %% at the start (^) of a line.
Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] objects that contain information about each match; each object's .Line property contains the full text of an input line that matched.
Set-Content Output.txt sends all matching lines to the single output file, Output.txt.
In Windows PowerShell, Set-Content uses the system's legacy Windows code page (an 8-bit single-byte encoding), even though the documentation mistakenly claims that ASCII files are produced.
If you want to control the output encoding explicitly, use the -Encoding parameter; e.g., ... | Set-Content Output.txt -Encoding Utf8.
By contrast, >, the output redirection operator, always creates UTF-16LE files in Windows PowerShell (an encoding PowerShell calls Unicode), as does Out-File by default (which can be changed with -Encoding).
Also note that > / Out-File apply PowerShell's default formatting to the input objects to obtain the string representation to write to the output file, whereas Set-Content treats the input as strings (calls .ToString() on input objects, if necessary). In the case at hand, since all input objects are already strings, there is no difference (except for the character encoding, potentially).
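A quick sketch that makes this formatting difference tangible with a non-string input object:
$item = Get-Item .            # a [System.IO.DirectoryInfo] object
$item | Set-Content t1.txt    # writes $item.ToString(), i.e. just the path
$item > t2.txt                # writes PowerShell's formatted (table) representation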
As for what you've tried:
$_ inside your foreach ($file in $files) refers to a file (a [System.IO.FileInfo] object), so you're effectively evaluating your wildcard expression *%%* against the input file's name rather than its contents.
Aside from that, wildcard pattern *%%* will match %% anywhere in the input string, not just at its start (you'd have to use %%* instead).
The Set-Content "Output.txt" call is missing input, because it is not part of a pipeline and, in the absence of pipeline input, no -Value argument was passed.
Even if you did provide input, however, output file Output.txt would get rewritten as a whole in each iteration of your foreach loop.
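For completeness, here is a sketch of what a corrected version of your loop-based approach might look like; note that the Select-String solution above remains the simpler choice:
$files = Get-ChildItem folder -Filter *.txt
$matchingLines = foreach ($file in $files) {
  foreach ($line in Get-Content -LiteralPath $file.FullName) {
    if ($line -like '%%*') { $line }   # wildcard anchored at the start of the line
  }
}
Set-Content Output.txt -Value $matchingLines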
First you have to use Get-Content in order to get the content of the file. Then you do the string match and, based on that, set the content back to the file. Use Get-Content and put another loop inside the foreach to iterate over all the lines in the file. I hope this logic helps you.
ls *.txt | %{
    $f = $_
    gc $f.fullname | %{
        if($_.StartsWith("%%")){
            $_ >> Output.txt
        }#end if
    }#end gc
}#end ls
Aliases:
ls - Get-ChildItem
gc - Get-Content
% - ForEach-Object
$_ - Iterator variable for loop
>> - Redirection construct
# - Comment
http://ss64.com/ps/
I have a directory with about 10'000 text files of varying lengths. All are over 1 GB in size.
I need to extract the first line of each file and insert it into a new text file in the same directory.
I've tried the usual MS-DOS batch file method, and it crashes due to the files being too large.
Is there a way of doing this in PowerShell using StreamReader?
EDIT: Of course there is an inbuilt way:
$firstLine = Get-Content -Path $fileName -TotalCount 1
[Ack Raf's comment]
Original:
I would suggest looking at File.ReadLines: this method reads the contents of the file lazily – only reading content with each iteration over the returned enumerator.
I'm not sure if Select-Object -first 1 will pro-actively halt the pipeline after one line, if it does then that is the easiest way to get the first line:
$firstLine = [IO.File]::ReadLines($filename, [text.encoding]::UTF8) | Select-Object -first 1
Otherwise something like:
$lines = [IO.File]::ReadLines($filename, [Text.Encoding]::UTF8); # adjust to correct encoding
$lineEnum = $lines.GetEnumerator();
if ($lineEnum.MoveNext()) {
    $firstLine = $lineEnum.Current;
} else {
    # No lines in file
}
NB. this assumes at least PowerShell V3 to use .NET V4.
In order to read only one line, you can also use:
$file = new-object System.IO.StreamReader($filename)
$file.ReadLine()
$file.close()
Using -OutVariable, you can write it in one line:
$text = (New-Object System.IO.StreamReader($filename) -OutVariable file).ReadLine(); $file.Close()
Short and sweet:
cd c:\path\to\my\text\files\
Get-Content *.txt -First 1 > output.txt
Edit Nov 2018: According to the docs, "The TotalCount parameter limits the retrieval to the first n lines." This appears to minimize resource usage. Test it yourself and add your comments.
cd c:\path\to\my\text\files\
Get-Content *.txt -TotalCount 1 > output.txt
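One caveat worth testing for yourself: on a second run, output.txt itself matches *.txt, so its previous content would be read as input too; excluding it explicitly avoids that:
Get-ChildItem *.txt -Exclude output.txt | Get-Content -TotalCount 1 > output.txt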
I am writing a script which will basically do the following:
Read from a text file some arguments:
DriveLetter ThreeLetterCode ServerName VolumeLetter Integer
E.g. W MSS SERVER01 C 1
These values happen to form a folder destination W:\MSS\, and a filename which works in the following naming convention:
SERVERNAME_VOLUMELETTER_VOL-b00X-iYYY.spi - where the X is the integer above.
The value Y I need to work out later, as this happens to be the number of the incremental image (backup), and I need to work out which incremental is the latest.
So at the moment --> Count lines in file, and loop for this many lines.
$lines = Get-Content -Path PostBackupCheck-Textfile.txt | Measure-Object -Line
for ($i=0; $i -le $lines.Lines; $i++)
Within this loop I need to do a Get-Content to read the line I am currently looking at, i.e. line 0, line 1, line 2 (as there will be multiple lines in the format shown at the beginning), and split the line into an array, whereby each part of the filename, per the naming convention above, ends up in a[0], a[1], a[2], etc.
The reason for this is that I then need to sort the folder that contains these files, find the latest file by date, take the _iXXX.spi part, and place it into the array value a[X], so that I then have a complete filename to mount. This value will replace iYYY.spi.
It's a little complex, because I also have to make sure that, when I do a Get-ChildItem with -Include before sorting it all by date, I am only including filenames that match the arguments fed from the text file:
So,
SERVER01_C_VOL-b001-iYYY.spi and not anything else.
i.e. not SERVER01_D_VOL-b001-iYYY.spi
Then take the iYYY value from the sort on the Get-ChildItem -Include and place that into the appropriate array item.
I've literally no idea where to start, so any ideas are appreciated!
Hopefully I've explained in enough detail. I have also placed the code on Pastebin: http://pastebin.com/vtFifTW6
This doesn't need to be that complex. You can start by operating over lines in your file with a simple pipeline:
Get-Content PostBackupCheck-Textfile.txt |
  Foreach-Object {
    $drive, $folder, $server, $volume, [int]$i = -split $_
    ...
  }
The line inside the loop splits the current input line at spaces and assigns appropriate variables. This saves you the trouble of handling an array there. Everything that follows needs to be in said loop as well.
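To illustrate the multi-variable assignment combined with the unary form of -split, using the sample line from the question:
$drive, $folder, $server, $volume, [int]$i = -split 'W MSS SERVER01 C 1'
$server   # -> SERVER01
$i + 1    # -> 2, showing that $i is a number rather than a string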
You can then construct the file name pattern:
$filename = "${server}_${volume}_VOL-b$($i.ToString('000'))-i*.spi"
(The ${...} delimiters are needed so that the underscores aren't parsed as part of the variable names; note that, per the naming convention in the question, it is the volume letter, not the drive letter, that appears in the filename.)
which you can use to find all fitting files and sort them by date:
$lastFile = Get-ChildItem $filename | sort LastWriteTime | select -last 1
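From there, extracting the latest incremental number (the YYY part) from the newest file's name could look something like this sketch; the regex assumes the naming convention shown in the question:
if ($lastFile.Name -match '-i(\d+)\.spi$') {
  $latestIncrement = [int] $Matches[1]   # e.g. 17 for a name ending in -i017.spi
  "Latest incremental: $($lastFile.Name) (#$latestIncrement)"
}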