How to split a file of 5000 lines into several files of 200 lines each with Perl?

I have a file of 5000 lines and I want to split it into several files of 200 lines each. I tried this:
#!/usr/bin/perl
use strict;
use warnings;
my $targetfile = '200_lines.txt';
my $filename = '5000_lines.txt';
open ( my $DATA, '<', $filename ) or die "Could not open file '$filename': $!";
while ( my $line = <$DATA> ) {
my $counter++;
open (my $FILE, '>>', $targetfile ) or die "Could not open file '$targetfile': $!";
print $FILE $line;
close $FILE;
if ( $counter % 200 == 0 ) {
$targetfile =~ s/200/$counter/;
}
}
My $counter variable stays at 1 and I don't know why. So I get just one file named 200_lines.txt with 5000 lines.

You can use sprintf to generate a new filename after every 200 lines and use $. to keep track of the current line number in the input file.
The script below splits the larger file into smaller files of 200 lines each.
#!/usr/bin/perl
use strict;
use warnings;
open my $rfh,'<','file_5000' or die "unable to open file : $! \n";
my $filecount=0;
my $wfh;
while(<$rfh>){
if(($.-1) % 200 == 0){
close($wfh) if($wfh);
open($wfh, '>', sprintf("file%02d", ++$filecount)) or die $!;
}
print $wfh "$_";
}
close($rfh);
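The mapping from line number to output file that the script above relies on can be looked at in isolation. This is only a sketch: chunk_name is a hypothetical helper, not part of the answer, and it assumes 1-based line numbers as reported by $. and the same file%02d naming.

```perl
use strict;
use warnings;

# Hypothetical helper: name of the chunk file that a given 1-based
# line number belongs to, for chunks of $size lines.
sub chunk_name {
    my ($line_no, $size) = @_;
    my $index = int( ($line_no - 1) / $size ) + 1;
    return sprintf("file%02d", $index);
}

print chunk_name(1,    200), "\n";   # file01
print chunk_name(200,  200), "\n";   # file01
print chunk_name(201,  200), "\n";   # file02
print chunk_name(5000, 200), "\n";   # file25
```

Lines 1 to 200 land in file01 and line 201 starts file02, which is why the script tests ($. - 1) % 200 rather than $. % 200.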

$counter is declared with my inside your while block, so a fresh variable is created on each iteration and it never gets past 1.
You don't really need a counter at all, as you can test $. for the current line number.
Also, don't call your filehandle $DATA. There's a special filehandle called DATA.
How about:
#!/usr/bin/perl
use strict;
use warnings;
open( my $input, '<', '5000_lines.txt' ) or die $!;
open( my $output, '>', '200_lines.txt' ) or die $!;
while ( my $line = <$input> ) {
print {$output} $line;
# After every 200th line, switch to the next file (unless the input is exhausted)
if ( $. % 200 == 0 and not eof($input) ) {
close($output);
open( $output, '>', ( $. + 200 ) . "_lines.txt" ) or die $!;
}
}
close($input);
close($output);
This creates files:
200_lines.txt
400_lines.txt
600_lines.txt
etc.
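The scoping point made at the start of this answer can be seen in a few lines. This is a minimal sketch with made-up variable names ($fresh, $kept), not code from the question:

```perl
use strict;
use warnings;

my @inside;
my @outside;
my $kept = 0;                 # declared once, outside the loop

for my $line (1 .. 3) {
    my $fresh = 0;            # a new variable on every iteration
    $fresh++;
    $kept++;
    push @inside,  $fresh;
    push @outside, $kept;
}

print "inside:  @inside\n";   # inside:  1 1 1
print "outside: @outside\n";  # outside: 1 2 3
```

Declaring the counter before the loop is enough to make it survive between iterations.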

You have a bunch of errors in your code.
#!/usr/bin/perl
use strict;
use warnings;
# Creating a file with 5000 lines
my $filename = '5000_lines.txt';
open ( my $DATA, '>', $filename ) or die "Could not open file '$filename': $!";
for (my $i=0;$i<5000;$i++){
print $DATA "$i\n";
}
close ( $DATA);
# Splitting it into files of 200 lines each
my $targetfile = '200_lines.txt';
open ( $DATA, '<', $filename ) or die "Could not open file '$filename': $!";
my $counter = 0;
my $num = 0;
my $flag = 1;
while ( my $line = <$DATA> ) {
if ($flag == 1){
open (FILE, '>', $targetfile.'_'.$num ) or die "Could not open file '${targetfile}_$num': $!";
}
print FILE $line;
$flag=0;
$counter++;
# Count the line first, then close the chunk once it holds 200 lines
if ( $counter % 200 == 0){
$num = $counter/200;
close FILE;
$flag=1;
}
}
close (FILE) unless $flag;
close ($DATA);
It will break the large file into smaller chunks of 200 lines each.

Related

Remove the first line from my directory

How can I remove the first line from each file in my list of files? This is my code.
Open my directory:
use strict;
use warnings;
use utf8;
use Encode;
use Encode::Guess;
use Devel::Peek;
my $new_directory = '/home/lenovo/corpus';
my $directory = '/home/lenovo/corpus';
open( my $FhResultat, '>:encoding(UTF-8)', $FichierResulat );
my $dir = '/home/corpus';
opendir (DIR, $directory) or die $!;
my @tab;
while (my $file = readdir(DIR)) {
next if ($file eq "." or $file eq ".." );
#print "$file\n";
my $filename_read = decode('utf8', $file);
#print $FichierResulat "$file\n";
push @tab, "$filename_read";
}
closedir(DIR);
open my file:
foreach my $val(@tab){
utf8::encode($val);
my $filename = $val;
open(my $in, '<:utf8', $filename) or die "Unable to open '$filename' for read: $!";
rename file
my $newfile = "$filename.new";
open(my $out, '>:utf8', $newfile) or die "Unable to open '$newfile' for write: $!";
Remove the first line:
my @ins = <$in>; # read the contents into an array
chomp @ins;
shift @ins; # remove the first element from the array
print $out @ins;
close($in);
close $out;
The problem: my new file is empty!
rename $newfile,$filename or die "unable to rename '$newfile' to '$filename': $!";
}
The code seems correct, but the result is an empty file.
The accepted pattern for doing this kind of thing is as follows:
use strict;
use warnings;
my $old_file = '/path/to/old/file.txt';
my $new_file = '/path/to/new/file.txt';
open(my $old, '<', $old_file) or die $!;
open(my $new, '>', $new_file) or die $!;
while (<$old>) {
next if $. == 1;
print $new $_;
}
close($old) or die $!;
close($new) or die $!;
rename($old_file, "$old_file.bak") or die $!;
rename($new_file, $old_file) or die $!;
In your case, we're using $. (the input line number variable) to skip over the first line.
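Since the question is about a whole directory, the same pattern can be wrapped in a subroutine and called once per file. This is a sketch under assumptions: drop_first_line is a hypothetical name, the directory is taken from the command line, and only plain files directly inside it are processed.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path without its first line,
# keeping the original as "$path.bak".
sub drop_first_line {
    my ($path) = @_;
    my $tmp = "$path.new";
    open(my $old, '<', $path) or die "Cannot read '$path': $!";
    open(my $new, '>', $tmp)  or die "Cannot write '$tmp': $!";
    while (my $line = <$old>) {
        next if $. == 1;          # skip the first line only
        print {$new} $line;
    }
    close($old) or die $!;
    close($new) or die $!;
    rename($path, "$path.bak") or die $!;
    rename($tmp,  $path)       or die $!;
}

# Example call for every regular file in a directory given on the command line.
if (@ARGV) {
    my ($dir) = @ARGV;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @files = grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    drop_first_line("$dir/$_") for @files;
}
```

Note that readdir returns bare names, so the directory has to be prepended before opening each file. Running the script twice on the same directory would also process the .bak copies left by the first run.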

Modify Perl script to run for each file with specified extension in a given directory

I need to modify a Perl script x937.pl to run on all files with extension .x937 within a specific directory. Currently, I use a separate script test.pl that calls my main script, and runs it for each file of that type. However, I need to combine both into one script.
Ideally, I would be able to specify a directory path in the script, and loop through all *.x937 files in that directory.
test.pl
#!/usr/bin/perl -w
use strict;
use Encode;
my @files = <*.x937>;
foreach my $file (@files) {
system('x937.pl', $file);
}
x937.pl
#!/usr/bin/perl -w
use strict;
use Encode;
use warnings;
my $tiff_flag = 0;
my $count = 0;
my $file = "output_$ARGV[0].txt";
unless ( open OPUT, '>' . $file ) {
die "Unable to create $file";
}
open FILE, '<:raw', $ARGV[0] or die "Error opening '$ARGV[0]' $!";
binmode( FILE ) or die 'Error setting binary mode on input file';
while ( read( FILE, $_, 4 ) ) {
my $rec_len = unpack( "N", $_ );
die "Bad record length: $rec_len" unless ( $rec_len > 0 );
read( FILE, $_, $rec_len );
if ( substr( $_, 0, 2 ) eq "\xF5\xF2" ) {
if ( $tiff_flag ) {
$count++;
open( TIFF, '>', 'output_' . $ARGV[0] . '_img' . sprintf( "%04d", $count ) . '.tiff' )
or die "Can't create image file $!";
binmode( TIFF ) or die 'Error setting binary mode on image file';
print TIFF substr( $_, 117 );
close TIFF;
}
$_ = substr( $_, 0, 117 );
}
print OPUT decode( 'cp1047', $_ ) . "\n";
}
close FILE;
close OPUT;
I think I managed to generate this correctly (on iPad, sat on the sofa) ... There could be some typos ; )
Usage: perl test_x397.pl <path>
test_x397.pl
#!/usr/bin/perl -w
use strict; use warnings;
use Encode;
my ($path) = @ARGV;
$path // die "No path specified";
(-e $path) or die "Path not found: $path";
(-d $path) or die "Not a directory: $path";
my @files = <$path/*.x937>;
foreach my $file (@files) {
process($file);
}
sub process {
my ($fname) = @_;
my ($dir, $file) = $fname =~ m{^(.*)/(.+)$};
my $tiff_flag = 0;
my $count = 0;
my $outfile = sprintf("%s/output_%s.txt", $dir, $file);
open (my $outfh, '>', $outfile) or die "Unable to create $outfile. $!";
open (my $infh, '<:raw', $fname) or die "Error opening '$fname'. $!"; # open the full path, not just the basename
my $buffer = undef;
while (read ($infh,$buffer,4)) {
my $rec_len = unpack("N", $buffer);
die "Bad record length: $rec_len" unless ($rec_len > 0);
read ($infh, $buffer, $rec_len);
if (substr($buffer, 0, 2) eq "\xF5\xF2") {
if ($tiff_flag) {
$count++;
my $tiff_filename = sprintf('%s/output_%s_img%04d.tiff', $dir, $file, $count);
open (my $tiffh, '>', $tiff_filename) or die "Can't create image file $!";
binmode($tiffh) or die 'Error setting binary mode on image file';
print $tiffh substr($buffer, 117);
close $tiffh;
}
$buffer = substr($buffer, 0, 117);
}
print $outfh decode ('cp1047', $buffer) . "\n";
}
close $infh;
close $outfh;
}
A few things to note:
Always use the three argument version of open
Using a scalar filehandle makes it easier to pass it around (not necessary in this example but good practice)
Don't modify $_. It can lead to nasty surprises in larger programs
You already used sprintf to make part of your tiff filename, so why not use it for the whole thing.
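As a possible variation on the last note, the output names can be built in one place with sprintf, using the core File::Basename module instead of a regex to split the path. This is only a sketch: output_names is a hypothetical helper and the example path is made up.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: derive the text and image output names
# from an input path and an image counter.
sub output_names {
    my ($fname, $count) = @_;
    my ($file, $dir) = fileparse($fname);   # $dir keeps its trailing slash
    my $text = sprintf('%soutput_%s.txt', $dir, $file);
    my $tiff = sprintf('%soutput_%s_img%04d.tiff', $dir, $file, $count);
    return ($text, $tiff);
}

my ($text, $tiff) = output_names('/data/in/batch1.x937', 3);
print "$text\n";   # /data/in/output_batch1.x937.txt
print "$tiff\n";   # /data/in/output_batch1.x937_img0003.tiff
```

Unlike a regex that requires a slash, fileparse also copes with a bare filename, for which it reports the directory as "./".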

Perl script only writes one row to the output file

I wrote a script to open a file on the web and pull out all rows with "Wireless" in the name. It writes the output to a different file, but it only records one line in the output file, where there should be multiple lines.
#!\Perl64\eg\perl -w
use warnings;
use strict;
use LWP::Simple;
my $save = "C:\\wireless\\";
my $file = get 'http://dhcp_server.test.com/cgi-bin/dhcp_utilization_csv_region.pl?region=test';
open( FILE, '>', $save . 'DHCP_Utilization_test.csv' ) or die $!;
binmode FILE;
print FILE $file;
close(FILE);
open( F, "C:\\wireless\\DHCP_Utilization_test.csv" ) || die "can't open file: $!";
my @file = <F>;
close(F);
my $line;
foreach $line (@file) {
chomp $line;
if ( $line =~ m/Wireless /g ) {
my ($ip, $rtr, $mask, $zip, $blc, $address, $city,
$state, $space, $country, $space2, $noc, $company, $extra,
$active, $used, $percent, $extra3, $nus, $construct
) = split( /,/, $line );
my $custom_directory = "C:\\wireless\\";
my $custom_filename = "wireless_DHCP.csv";
my $data = "$ip $mask $rtr $active $used $percent $nus $construct";
my $path = "$custom_directory\\$custom_filename";
open( my $handle, ">>", $path ) || die "can't open $path: $!";
binmode($handle); # for raw; else set the encoding
print $handle "$data\n";
close($handle) || die "can't close $path: $!";
}
}
I believe the problem is that you're on Windows: you save the file in binary mode (:raw) and then reopen it with the default :crlf layer.
open( FILE, '>', $save . 'DHCP_Utilization_test.csv' ) or die $!;
binmode FILE;
print FILE $file;
close(FILE);
open( F, "C:\\wireless\\DHCP_Utilization_test.csv" ) || die "can't open file: $!";
my @file = <F>;
close(F);
I therefore suspect that your @file array only contains one line for the entire file.
You can probably also tighten your code to something like the following:
#!\Perl64\eg\perl
use strict;
use warnings;
use autodie;
use LWP::Simple;
my $url = 'http://dhcp_server.test.com/cgi-bin/dhcp_utilization_csv_region.pl?region=test';
my $datafile = "C:\\wireless\\DHCP_Utilization_test.csv";
my $wireless = "C:\\wireless\\wireless_DHCP.csv";
getstore( $url, $datafile );
open my $infh, '<', $datafile;
open my $outfh, '>>', $wireless;
while (<$infh>) {
chomp;
next unless /Wireless /;
my ($ip, $rtr, $mask, $zip, $blc, $address, $city,
$state, $space, $country, $space2, $noc, $company, $extra,
$active, $used, $percent, $extra3, $nus, $construct
) = split /,/;
print $outfh "$ip $mask $rtr $active $used $percent $nus $construct\n";
}
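One way to check the suspicion that the whole file is read as a single line is to count how many records a read actually returns. The sketch below uses in-memory filehandles and invented sample data; count_records is a hypothetical helper, and whether the downloaded file really lacks "\n" line endings is an assumption that would need checking against the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: how many records does <$fh> return for this content?
sub count_records {
    my ($content) = @_;
    open(my $fh, '<', \$content) or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close($fh);
    return scalar @records;
}

print count_records("a,b\nc,d\ne,f\n"), "\n";   # 3
print count_records("a,b\rc,d\re,f\r"), "\n";   # 1
```

If the real file behaves like the second case, printing scalar(@file) right after reading it would show 1.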

How to read file in Perl and if it doesn't exist create it?

In Perl, I know this method :
open( my $in, "<", "inputs.txt" );
reads a file but it only does so if the file exists.
Doing the other way, the one with the +:
open( my $in, "+>", "inputs.txt" );
writes a file/truncates if it exists so I don't get the chance to read the file and store it in the program.
How do I read files in Perl considering if the file exists or not?
Okay, I've edited my code but still the file isn't being read. The problem is it doesn't enter the loop. Anything mischievous with my code?
open( my $in, "+>>", "inputs.txt" ) or die "Can't open inputs.txt : $!\n";
while (<$in>) {
print "Here!";
my @subjects = ();
my %information = ();
$information{"name"} = $_;
$information{"studNum"} = <$in>;
$information{"cNum"} = <$in>;
$information{"emailAdd"} = <$in>;
$information{"gwa"} = <$in>;
$information{"subjNum"} = <$in>;
for ( $i = 0; $i < $information{"subjNum"}; $i++ ) {
my %subject = ();
$subject{"courseNum"} = <$in>;
$subject{"courseUnt"} = <$in>;
$subject{"courseGrd"} = <$in>;
push @subjects, \%subject;
}
$information{"subj"} = \@subjects;
push @students, \%information;
}
print "FILE LOADED.\n";
close $in or die "Can't close inputs.txt : $!\n";
Use the proper file test operator:
use strict;
use warnings;
use autodie;
my $filename = 'inputs.txt';
unless(-e $filename) {
#Create the file if it doesn't exist
open my $fc, ">", $filename;
close $fc;
}
# Work with the file
open my $fh, "<", $filename;
while( my $line = <$fh> ) {
#...
}
close $fh;
But if the file is new (without contents), the body of the while loop will never run. It's easier to read the file only if it exists:
if(-e $filename) {
# Work with the file
open my $fh, "<", $filename;
while( my $line = <$fh> ) {
#...
}
close $fh;
}
You can use +>> for read/append. It creates the file if it doesn't exist and doesn't truncate it:
open(my $in,"+>>","inputs.txt");
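A handle opened with +>> starts out positioned at the end of the file, which would explain why the loop in the question is never entered. A minimal sketch of reading through such a handle, assuming a hypothetical helper named read_or_create, rewinds with seek first:

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of $path, creating it (empty) if missing.
sub read_or_create {
    my ($path) = @_;
    open(my $fh, '+>>', $path) or die "Can't open $path: $!";
    seek($fh, 0, 0) or die "Can't rewind $path: $!";   # move the read position to the start
    my @lines = <$fh>;
    close($fh) or die "Can't close $path: $!";
    return @lines;
}
```

Calling my @lines = read_or_create('inputs.txt'); returns an empty list the first time (and leaves an empty file behind), and the file's lines on later runs.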
First check whether the file exists or not. Check the sample code below :
#!/usr/bin/perl
use strict;
use warnings;
my $InputFile = $ARGV[0];
if ( -e $InputFile ) {
print "File Exists!";
open FH, "<$InputFile";
my @Content = <FH>;
open OUT, ">outfile.txt";
print OUT @Content;
close(FH);
close(OUT);
} else {
print "File Do not exists!! Create a new file";
open OUT, ">$InputFile";
print OUT "Hello World";
close(OUT);
}

How can I print lines from a file to separate files

I have a file which has lines like this:
1 107275 447049 scaffold1443 465 341154 -
There are several lines which start with 1, then a blank line separates them from the lines which start with 2, and so on.
I want to separate these lines to different files based on their number.
I wrote this script but it prints in every file only the first line.
#!/usr/bin/perl
#script for choosing chromosome
use strict;
my $filename= $ARGV[0];
open(FILE, $filename);
while (my $line = <FILE>) {
my @data = split('\t', $line);
my $length = @data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>', $num);
print $fh $line;
}
$num = $num + 1;
}
please, i need your help!
Use >> to open the file for appending, as > always truncates the file to zero bytes:
use strict;
my $filename = $ARGV[0];
open(my $FILE, "<", $filename) or die $!;
while (my $line = <$FILE>) {
my @data = split('\t', $line);
my $length = @data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>>', $num);
print $fh $line;
}
$num = $num + 1;
}
If I understand your question correctly, then paragraph mode might be useful. This breaks a record on two or more new-lines, instead of just one:
@ARGV or die "Supply a filename\n";
my $filename= $ARGV[0];
local $/ = ""; # Set paragraph mode
open(my $file, $filename) or die "Unable to open '$filename' for read: $!";
while (my $lines = <$file>) {
my $num = (split("\t", $lines))[0];
open(my $fh, '>', $num) or die "Unable to open '$num' for write: $!";
print $fh $lines;
close $fh;
}
close $file;
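To see what paragraph mode hands back without touching any files, the same read can be tried on an in-memory string. This is a sketch with invented sample data; blocks_by_number is a hypothetical helper that keys each blank-line-separated block by its first tab-separated field.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into blank-line-separated blocks,
# keyed by the first tab-separated field of each block.
sub blocks_by_number {
    my ($text) = @_;
    my %blocks;
    local $/ = "";                       # paragraph mode
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    while (my $block = <$fh>) {
        my ($num) = split /\t/, $block;
        $blocks{$num} .= $block;
    }
    close($fh);
    return %blocks;
}

my %blocks = blocks_by_number("1\ta\n1\tb\n\n2\tc\n2\td\n");
print join(",", sort keys %blocks), "\n";   # 1,2
```

Each read returns a whole group of lines at once, so every output file only needs to be opened a single time.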