Parsing and splitting files based on the string - powershell

I have a very large file (hence .ReadLines) which I need to efficiently and quickly parse and split into other files. For each line which contains a keyword I need to copy that line and append to a specific file. This is what I have so far, the script runs but the files aren't getting populated.
$filename = "C:\dev\powershell\test1.csv"
foreach ($line in [System.IO.File]::ReadLines($filename)) {
if ($line | %{$_ -match "Apple"}){Out-File -Append Apples.txt}
elseif($line | %{$_ -match "Banana"}){Out-File -Append Bananas.txt}
elseif($line | %{$_ -match "Pear"}){Out-File -Append Pears.txt}
}
Example content of the csv file:
Apple,Test1,Cross1
Apple,Test2,Cross2
Apple,Test3,Cross3
Banana,Test4,Cross4
Pear,Test5,Cross5
I want Apples.txt to contain:
Apple,Test1,Cross1
Apple,Test2,Cross2
Apple,Test3,Cross3

Couple of things:
Your if conditions don't need %/foreach-object - -match will do on its own:
foreach ($line in [System.IO.File]::ReadLines($filename)) {
if($line -match "Apple"){
# output to apple.txt
}
elseif($line -match "Banana"){
# output to banana.txt
}
# etc...
}
The files aren't getting populated because you're not actually sending any output to Out-File:
foreach ($line in [System.IO.File]::ReadLines($filename)) {
if($line -match "Apple"){
# send $line to the file
$line |Out-File apple.txt -Append
}
# etc...
}
If your files are really massive and you expect a lot of matching lines, I'd recommend using a StreamWriter for the output files - otherwise Out-File will be opening and closing the file all the time:
$OutFiles = @{
'apple' = New-Object System.IO.StreamWriter $PWD\apples.txt
'banana' = New-Object System.IO.StreamWriter $PWD\bananas.txt
'pear' = New-Object System.IO.StreamWriter $PWD\pears.txt
}
foreach ($line in [System.IO.File]::ReadLines($filename)) {
foreach($keyword in $OutFiles.Keys){
if($line -match $keyword){
$OutFiles[$keyword].WriteLine($line)
break # stop checking the remaining keywords for this line
}
}
}
foreach($Writer in $OutFiles.Values){
try{
$Writer.Close()
}
finally{
$Writer.Dispose()
}
}
This way you also only have to maintain the $OutFiles hashtable if you need to update the keywords for example.
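A minimal, self-contained sketch of the StreamWriter approach follows. The temp folder and sample rows are assumptions made only so the snippet can run on its own, and it assumes PowerShell 5 or later for `::new()`. It uses an `[ordered]` hashtable and `break` so that the first matching keyword wins, mirroring the if/elseif chain from the question; that choice is mine, not part of the answer above.

```powershell
# Hypothetical sample data in a fresh temp folder, so nothing real is touched
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$filename = Join-Path $dir 'test1.csv'
[System.IO.File]::WriteAllLines($filename, [string[]]@(
    'Apple,Test1,Cross1', 'Apple,Test2,Cross2', 'Apple,Test3,Cross3',
    'Banana,Test4,Cross4', 'Pear,Test5,Cross5'))

# [ordered] keeps the keywords in declaration order, like an if/elseif chain
$OutFiles = [ordered]@{
    'Apple'  = [System.IO.StreamWriter]::new((Join-Path $dir 'Apples.txt'))
    'Banana' = [System.IO.StreamWriter]::new((Join-Path $dir 'Bananas.txt'))
    'Pear'   = [System.IO.StreamWriter]::new((Join-Path $dir 'Pears.txt'))
}
try {
    foreach ($line in [System.IO.File]::ReadLines($filename)) {
        foreach ($keyword in $OutFiles.Keys) {
            if ($line -match $keyword) {
                $OutFiles[$keyword].WriteLine($line)
                break   # first matching keyword wins; move on to the next line
            }
        }
    }
}
finally {
    # Dispose flushes and closes each writer, even if the loop throws
    foreach ($Writer in $OutFiles.Values) { $Writer.Dispose() }
}
```

Each output file is opened once and closed once, which is the point of preferring StreamWriter over repeated Out-File -Append calls.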

Related

Stuck with this PS Script

I have a text file that contains millions of records
I want to find every line that does not start with a given string, together with its line number (the string starts with a double quote followed by 01/01/2019).
Can you help me modify this code?
Get-Content "(path).txt" | Foreach { if ($_.Split(',')[-1] -inotmatch "^01/01/2019") { $_; } }
Thanks
Based on your comments the content will look something like the array.
So you want to read the content, filter it, and get the resulting line from that content:
# Get the content
# $content = Get-Content -Path 'pathtofile.txt'
$content = @('field1,field2,field3', '01/01/2019,b,c')
# Convert from csv
$csvContent = $content | ConvertFrom-Csv
# Add your filter based on the field
$results = $csvContent | Where-Object { $_.field1 -notmatch '01/01/2019'} | % { $_ }
# Convert your results back to csv if needed
$results | ConvertTo-Csv
If performance is an issue, .NET with CsvHelper can handle millions of records, much as Power BI does.
# install CsvHelper
nuget install CsvHelper
# import csvhelper
import-module CsvHelper.2.16.3.0\lib\net45\CsvHelper.dll
# write the content to the file just for this example
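@('field1,field2,field3', '01/01/2019,b,c') | sc -path "c:\temp\text.csv"
# use a List here: a plain array (@()) is fixed-size, so calling Add() on it throws
$results = [System.Collections.Generic.List[object]]::new()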
# open the file for reading
try {
$stream = [System.IO.File]::OpenRead("c:\temp\text.csv")
$sr = [System.IO.StreamReader]::new($stream)
$csv = [CsvHelper.CsvReader]::new($sr)
# read in the records
while($csv.Read()){
# add in the result
$result= @{}
[string] $value = "";
for($i = 0; $csv.TryGetField($i, [ref] $value ); $i++) {
$result.Add($i, $value);
}
# add your filter here for the results
$results.Add($result)
}
# dispose of everything once we are done
}finally {
$stream.Dispose();
$sr.Dispose();
$csv.Dispose();
}
My .txt file looks like this...
date,col2,col3
"01/01/2019 22:42:00", "column2", "column3"
"01/02/2019 22:42:00", "column2", "column3"
"01/01/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
This command does exactly what you are asking...
Get-Content -Path C:\myFile.txt | ? {$_ -notmatch "01/01/2019"} | Select -Skip 1
The output is:
"01/02/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
I skipped the top row. If you want to deal with particular columns, change myFile.txt to a .csv and import it.
Looking at the question and comments, you are dealing with a headerless CSV file it seems. Because the file contains millions of records, I think using Get-Content or Import-Csv could slow down too much. Using [System.IO.File]::ReadLines() would then be faster.
If indeed each line starts with a quoted date, you could use various methods of figuring out whether the line starts with "01/01/2019 or not. Here, I use the -notlike operator:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
foreach ($line in [System.IO.File]::ReadLines($fileIn)) {
if ($line -notlike '"01/01/2019*') {
# write to a NEW file
Add-Content -Path $fileOut -Value $line
}
}
Update
Judging from your comment, you are apparently using an older .NET framework, as the [System.IO.File]::ReadLines() became available as of version 4.0.
In that case, the below code should work for you:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
$reader = New-Object System.IO.StreamReader($fileIn)
$writer = New-Object System.IO.StreamWriter($fileOut)
while (($line = $reader.ReadLine()) -ne $null) {
if ($line -notlike '"01/01/2019*') {
# write to a NEW file
$writer.WriteLine($line)
}
}
$reader.Dispose()
$writer.Dispose()
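The question also asks for the line number of each non-matching line, which the answers above do not report. A minimal sketch under stated assumptions: the temp file and its sample rows are made up for illustration, and `$lineNumber` and `$nonMatching` are hypothetical names.

```powershell
# Hypothetical sample file so the sketch is self-contained
$fileIn = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllLines($fileIn, [string[]]@(
    '"01/01/2019 22:42:00", "column2", "column3"',
    '"01/02/2019 22:42:00", "column2", "column3"',
    '"01/01/2019 22:42:00", "column2", "column3"',
    '"02/01/2019 22:42:00", "column2", "column3"'))

$lineNumber = 0
$nonMatching = foreach ($line in [System.IO.File]::ReadLines($fileIn)) {
    $lineNumber++
    # Ordinal comparison: a plain prefix test, no wildcard or regex parsing
    if (-not $line.StartsWith('"01/01/2019', [System.StringComparison]::Ordinal)) {
        [pscustomobject]@{ LineNumber = $lineNumber; Line = $line }
    }
}
$nonMatching | Format-Table -AutoSize
```

For millions of records, collecting every hit in memory may not be desirable; writing each hit to a StreamWriter inside the loop, as in the update above, would avoid that.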

How to make changes to file content and save it to another file using powershell?

I want to do this
read the file
go through each line
if the line matches the pattern, do some changes with that line
save the content to another file
For now I use this script:
$file = [System.IO.File]::ReadLines("C:\path\to\some\file1.txt")
$output = "C:\path\to\some\file2.txt"
ForEach ($line in $file) {
if($line -match 'some_regex_expression') {
$line = $line.replace("some","great")
}
Out-File -append -filepath $output -inputobject $line
}
As you can see, here I write line by line. Is it possible to write the whole file at once ?
Good example is provided here :
(Get-Content c:\temp\test.txt) -replace '\[MYID\]', 'MyValue' | Set-Content c:\temp\test.txt
But my problem is that I have additional IF statement...
So, what could I do to improve my script ?
You could do it like that:
Get-Content -Path "C:\path\to\some\file1.txt" | foreach {
if($_ -match 'some_regex_expression') {
$_.replace("some","great")
}
else {
$_
}
} | Out-File -filepath "C:\path\to\some\file2.txt"
Get-Content reads a file line by line (array of strings) by default so you can just pipe it into a foreach loop, process each line within the loop and pipe the whole output into your file2.txt.
In this case an array or an ArrayList (lists are better for large collections) would be the most elegant solution. Simply add the strings to the list until the ForEach loop ends, then write the whole list to a file.
Here is an ArrayList example:
$file = [System.IO.File]::ReadLines("C:\path\to\some\file1.txt")
$output = "C:\path\to\some\file2.txt"
$outputData = New-Object System.Collections.ArrayList
ForEach ($line in $file) {
if($line -match 'some_regex_expression') {
$line = $line.replace("some","great")
}
[void]$outputData.Add($line)
}
$outputData |Out-File $output
I think the if statement can be avoided in a lot of cases by using regular expression groups (e.g. (.*)) and placeholders (e.g. $1, $2 etc.).
As in your example:
(Get-Content .\File1.txt) -Replace 'some(_regex_expression)', 'great$1' | Set-Content .\File2.txt
And for the "good example", where [MYID] might be somewhere inline:
(Get-Content c:\temp\test.txt) -Replace '^(.*)\[MYID\](.*)$', '$1MyValue$2' | Set-Content c:\temp\test.txt
(see also How to replace first and last part of each line with powershell)
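To illustrate why the if statement is often unnecessary: -replace applied to an array returns every element, and elements that do not match the pattern pass through unchanged. A small sketch with made-up sample lines (`$lines`, `$first` and `$second` are hypothetical names):

```powershell
# Hypothetical sample lines
$lines = @('id=[MYID];', 'nothing here', 'some_regex_expression')

# Lines without a match come back unchanged, so no if/else is needed
$first = $lines -replace '\[MYID\]', 'MyValue'

# Single quotes keep $1 from being expanded by PowerShell before the regex engine sees it
$second = $lines -replace 'some(_regex_expression)', 'great$1'

$first
$second
```

Note that this is not strictly equivalent to the if/replace version in the question, which replaces every occurrence of "some" on a matching line, not only the one inside the match.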

Use .NET for fast read/write of large files

I am trying to search through a number of large files and replace parts of the text, but I keep running into errors.
I tried this, but sometimes I get an 'out of memory' error in PowerShell:
#region The Setup
$file = "C:\temp\168MBfile.txt"
$hash = @{
ham = 'bacon'
toast = 'pancakes'
}
#endregion The Setup
$obj = [System.IO.StreamReader]$file
$contents = $obj.ReadToEnd()
$obj.Close()
foreach ($key in $hash.Keys) {
$contents = $contents -replace [regex]::Escape($key), $hash[$key]
}
try {
$obj = [System.IO.StreamWriter]$file
$obj.Write($contents)
} finally {
if ($obj -ne $null) {
$obj.Close()
}
}
Then I tried this (in the ISE), but it crashes with a popup message (sorry, I don't have the error on hand) and tries to restart the ISE:
$arraylist = New-Object System.Collections.ArrayList
$obj = [System.IO.StreamReader]$file
while (!$obj.EndOfStream) {
$line = $obj.ReadLine()
foreach ($key in $hash.Keys) {
$line = $line -replace [regex]::Escape($key), $hash[$key]
}
[void]$arraylist.Add($line)
}
$obj.Close()
$arraylist
Finally, I came across something like this, but I'm not sure how to use it properly, and I'm not even sure I am going about this the right way:
$sourcestream = [System.IO.File]::Open($file)
$newstream = [System.IO.File]::Create($file)
$sourcestream.Stream.CopyTo($newstream)
$sourcestream.Close()
Any advice would be greatly appreciated.
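You can start with a ReadCount of 1000 and tweak it based on the performance you get. In the pipeline form, write to a different file, because the source file is still open for reading while Set-Content runs:
get-content textfile -Readcount 1000 |
foreach-object {do something} |
set-content newtextfile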
or
(get-content textfile -Readcount 1000) -replace 'something','withsomething' |
set-content textfile
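Since the question asks about using .NET directly, here is a hedged sketch of a streaming alternative: read one line at a time, write to a temporary file, then swap the files, so memory use stays flat regardless of file size. The sample file, the `.tmp` suffix and the variable names are assumptions for illustration. It uses the case-sensitive String.Replace, whereas the -replace operator in the question is case-insensitive, and it assumes UTF-8 text and PowerShell 5 or later.

```powershell
# Hypothetical small sample standing in for the large file
$file = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllLines($file, [string[]]@(
    'ham and toast', 'just eggs', 'toast with ham'))
$hash = @{
    ham   = 'bacon'
    toast = 'pancakes'
}

$tempFile = "$file.tmp"
$reader = [System.IO.StreamReader]::new($file)
$writer = [System.IO.StreamWriter]::new($tempFile)
try {
    # Only one line is held in memory at a time
    while ($null -ne ($line = $reader.ReadLine())) {
        foreach ($key in $hash.Keys) {
            $line = $line.Replace($key, $hash[$key])   # literal, case-sensitive
        }
        $writer.WriteLine($line)
    }
}
finally {
    $reader.Dispose()
    $writer.Dispose()
}

# Swap the rewritten copy into place once both streams are closed
Remove-Item -LiteralPath $file
Move-Item -LiteralPath $tempFile -Destination $file
```

Writing to a separate file first also means a failure halfway through leaves the original untouched.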

Powershell Host File edit

Guys, I'm having some issues converting my Perl script to PowerShell and I need some help. In the hosts file of our machines, we have all of the URLs to our test environments blocked. In my Perl script, based on which environment is selected, it will comment out the line of the selected environment to allow access and block the others, so the testers can't mistakenly do things in the wrong environment.
I need help converting to powershell
Below is what I have in PERL:
sub editHosts {
print "Editing hosts file...\n";
my $file = 'C:\\Windows\\System32\\Drivers\\etc\\hosts';
my $data = readFile($file);
my @lines = split /\n/, $data;
my $row = '1';
open (FILE, ">$file") or die "Cannot open $file\n";
foreach my $line (@lines) {
if ($line =~ m/$web/) {
print FILE '#'."$line\n"; }
else {
if ($row > '21') {
$line =~ s/^\#*127\.0\.0\.1/127\.0\.0\.1/;
$line =~ s/[#;].*$//s; }
print FILE "$line\n"; }
$row++;
}
close(FILE);
}
Here is what i've tried in Powershell:
foreach ($line in get-content "C:\windows\system32\drivers\etc\hosts") {
if ($line -contains $web) {
$line + "#"
}
I've tried variation including set-content with what used to be in the host file, etc.
Any help would be appreciated!
Thanks,
Grant
-contains is a "set" operator, not a substring operator. Try .Contains() or -like.
This will comment out lines matching the variable $Web, while removing # from non-matches (except the header):
function Edit-Hosts ([string]$Web, $File = "C:\windows\system32\drivers\etc\hosts") {
#If file exists and $web is not empty/whitespace
if((Test-Path -Path $file -PathType Leaf) -and $web.Trim()) {
$row = 1
(Get-Content -Path $file) | ForEach-Object {
if($_ -like "*$web*") {
#Matched PROD, comment out line
"#$($_)"
} else {
#No match. If past header = remove comment
if($row -gt 21) { $_ -replace '^#' } else { $_ }
}
$row++
} | Set-Content -Path $file
} else {
Write-Error -Category InvalidArgument -Message "'$file' doesn't exist or Web-parameter is empty"
}
}
Usage:
Edit-Hosts -Web "PROD"
This is a similar answer to Frode F.'s answer, but I'm not yet able to comment to add my 2c worth, so have to provide an alternative answer instead.
It looks like one of the gotchas moving from perl to PowerShell, in this example, is that when we get the content of the file using Get-Content it is an "offline" copy, i.e. any edits are not made directly to the file itself. One approach is to compile the new content to the whole file and then write that back to disk.
I suppose that the print FILE "some text\n"; construct in perl might be similar to "some text" | Out-File $filename -Encoding ascii -Append in PowerShell, albeit you would use the latter either (1) to write line-by-line to a new/empty file or (2) accept that you are appending to existing content.
Two other things about editing the hosts file:
Make sure that your hosts file is ASCII encoded; I caused a major outage for a key enterprise application (50k+ users) in learning that...
You may need to remember to run your PowerShell / PowerShell ISE by right-clicking and choosing Run as Administrator else you might not be able to modify the file.
Anyway, here's a version of the previous answer using Out-File:
$FileName = "C:\windows\system32\drivers\etc\hosts"
$web = "PROD"
# Get "offline" copy of file contents
$FileContent = Get-Content $FileName
# The following creates an empty file and returns a file
# object (type [System.IO.FileInfo])
$EmptyFile = New-Item -Path $FileName -ItemType File -Force
foreach($Line in $FileContent) {
if($Line -match "$web") {
"# $Line" | Out-File $EmptyFile -Append -Encoding ascii
} else {
"$Line" | Out-File $EmptyFile -Append -Encoding ascii
}
}
Edit
The ($Line -match "$web") takes whatever is in the $web variable and treats it as a regular expression. In my example I was assuming that you were just wanting to match a simple text string, but you might well be trying to match an IP address, etc. You have a couple of options:
Use ($Line -like "*$web*") instead.
Convert what is in $web to be an escaped regex, i.e. one that will match literally. Do this with ($Line -match [Regex]::Escape($web)).
You also wanted to strip off comments from any line past row 21 of the hosts file, should that line not match $web. In perl you have used the s substitution operator; the PowerShell equivalent is -replace.
So... here is an updated version of that foreach loop:
$LineCount = 1
foreach($Line in $FileContent) {
if($Line -match [Regex]::Escape($web)) {
# ADD comment to any matched line
$Line = "#" + $Line
} elseif($LineCount -gt 21) {
# Uncomment the other lines
$Line = $Line -replace '^[# ]+',''
}
# Remove 'stacked up' comment characters, if any
$Line = $Line -replace '[#]+','#'
$Line | Out-File $EmptyFile -Append -Encoding ascii
$LineCount++
}
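To try the comment/uncomment logic without touching the real hosts file, the loop can be wrapped in a function that works on an array of strings. This is only a sketch: `Convert-HostsContent`, its parameters and the sample host names are hypothetical, and the header length is a parameter so the example can use a one-line header instead of 21 rows.

```powershell
function Convert-HostsContent {
    # Hypothetical helper: returns the rewritten lines, writes nothing to disk
    param(
        [string[]]$Lines,
        [string]$Web,
        [int]$HeaderRows = 21
    )
    $row = 0
    foreach ($line in $Lines) {
        $row++
        if ($line -match [regex]::Escape($Web)) {
            # Comment out the selected environment, avoiding stacked '#' characters
            '#' + ($line -replace '^[# ]+', '')
        } elseif ($row -gt $HeaderRows) {
            # Uncomment every other entry below the header
            $line -replace '^[# ]+', ''
        } else {
            $line
        }
    }
}

$sample = @(
    '# Sample hosts header',
    '#127.0.0.1 prod.example.test',
    '127.0.0.1 uat.example.test'
)
$result = Convert-HostsContent -Lines $sample -Web 'uat' -HeaderRows 1
$result
```

Once the output looks right, the same function could be fed from Get-Content and piped to Out-File -Encoding ascii, as in the answer above.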
More Information
Are there good references for moving from Perl to Powershell?
How to use operator '-replace' in PowerShell to replace strings of texts with special characters and replace successfully
about_Comparison_Operators
http://www.comp.leeds.ac.uk/Perl/sandtr.html
If you want to verify what is in there and then add entries, you could use the script below, which is designed to be run interactively and reports any existing entries that you specify in the variables:
Note: `t is PowerShell's escape sequence for a tab character.
$hostscontent
# Script to Verify and Add Host File Entries
$hostfile = gc 'C:\Windows\System32\drivers\etc\hosts'
$hostscontent1 = $hostfile | select-string "autodiscover.XXX.co.uk"
$hostscontent2 = $hostfile | select-string "webmail.XXX.co.uk"
$1 = "XX.XX.XXX.XX`tautodiscover.XXX.co.uk"
$2 = "webmail.XXX.co.uk"
# Replace this machines path with a path to your list of machines e.g. $machines = gc \\machine\machines.txt
$machines = gc 'c:\mytestmachine.txt'
ForEach ($machine in $machines) {
If ($hostscontent1 -ne $null) {
Start-Sleep -Seconds 1
Write-Host "$machine Already has Entry $1" -ForegroundColor Green
} Else {
Write-Host "Adding Entry $1 for $machine" -ForegroundColor Green
Start-Sleep -Seconds 1
Add-Content -Path C:\Windows\System32\drivers\etc\hosts -Value "XX.XX.XXX.XX`tautodiscover.XXX.co.uk" -Force
}
If ($hostscontent2 -ne $null) {
Start-Sleep -Seconds 1
Write-Host "$machine Already has Entry $2" -ForegroundColor Green
} Else {
Write-Host "Adding Entry $2 for $machine" -ForegroundColor Green
Start-Sleep -Seconds 1
Add-Content -Path C:\Windows\System32\drivers\etc\hosts -Value "XX.XX.XXX.XX`twebmail.XXX.co.uk" -Force
}
}

How to dump the foreach loop output into a file in PowerShell?

I have written the following script to read a CSV file and produce custom-formatted output.
Script is below:
$Content = Import-Csv Alert.csv
foreach ($Data in $Content) {
$First = $Data.DisplayName
$Second = $Data.ComputerName
$Third = $Data.Description
$Four = $Data.Name
$Five = $Data.ModifiedBy
$Six = $Data.State
$Seven = $Data.Sev
$Eight = $Data.Id
$Nine = $Data.Time
Write-Host "START;"
Write-Host "my_object="`'$First`'`;
Write-Host "my_host="`'$Second`'`;
Write-Host "my_long_msg="`'$Third`'`;
Write-Host "my_tool_id="`'$Four`'`;
Write-Host "my_owner="`'$Five`'`;
Write-Host "my_parameter="`'$Four`'`;
Write-Host "my_parameter_value="`'$Six`'`;
Write-Host "my_tool_sev="`'$Seven`'`;
Write-Host "my_tool_key="`'$Eight`'`;
Write-Host "msg="`'$Four`'`;
Write-Host "END"
}
The above script executes without any error.
Tried with Out-File and redirection operator in PowerShell to dump the output into a file, but I'm not finding any solution.
Write-Host writes to the console. That output cannot be redirected unless you run the code in another process. Either remove Write-Host entirely or replace it with Write-Output, so that the messages are written to the Success output stream.
Using a foreach loop also requires additional measures, because that loop type doesn't support pipelining. Either run it in a subexpression:
$(foreach ($Data in $Content) { ... }) | Out-File ...
or assign its output to a variable:
$output = foreach ($Data in $Content) { ... }
$output | Out-File ...
Another option would be replacing the foreach loop with a ForEach-Object loop, which supports pipelining:
$Content | ForEach-Object {
$First = $_.DisplayName
$Second = $_.ComputerName
...
} | Out-File ...
Don't use Out-File inside the loop, because repeatedly opening the file will perform poorly.
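A minimal runnable sketch of the "assign to a variable" option, with made-up sample objects standing in for Import-Csv Alert.csv and a temp file as the destination (both are assumptions for illustration). Bare strings inside the loop go to the Success output stream, so no Write-Host or Write-Output is needed:

```powershell
# Hypothetical stand-in for: $Content = Import-Csv Alert.csv
$Content = @(
    [pscustomobject]@{ DisplayName = 'Disk'; ComputerName = 'srv01' },
    [pscustomobject]@{ DisplayName = 'CPU';  ComputerName = 'srv02' }
)
$outFile = [System.IO.Path]::GetTempFileName()

# Everything the loop body emits is collected into $output
$output = foreach ($Data in $Content) {
    'START;'
    "my_object='$($Data.DisplayName)';"
    "my_host='$($Data.ComputerName)';"
    'END'
}

# The file is opened and written once, after the loop has finished
$output | Out-File -FilePath $outFile
```

The remaining fields from the question can be added to the loop body in the same way.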