extract every nth number - perl

i want to extract every 3rd number ( 42.034 , 41.630 , 40.158 as so on ) from the file
see example-
42.034 13.749 28.463 41.630 12.627 28.412 40.158 12.173 30.831 26.823
12.596 32.191 26.366 13.332 32.938 25.289 12.810 32.419 23.949 13.329
Any suggestions using perl script ?
Thanks,
dac

You can split file's contents to separate numbers and use the modulo operator to extract every 3rd number:
my $contents = do { local $/; open my $fh, "file" or die $!; <$fh> };
my #numbers = split /\s+/, $contents;
for (0..$#numbers) {
$_ % 3 == 0 and print "$numbers[$_]\n";
}

use strict;
use warnings;
use 5.010; ## for say
use List::MoreUtils qw/natatime/;
my #vals = qw/42.034 13.749 28.463 41.630 12.627 28.412 40.158 12.173 30.831
26.823 12.596 32.191 26.366 13.332 32.938 25.289 12.810 32.419 23.949 13.329/;
my $it = natatime 3, #vals;
say while (($_) = $it->());

This is probably the shortest way to specify that. If #list is your list of numbers
#list[ grep { $_ % 3 == 0 } 0..$#list ]

It's a one-liner!
$ perl -lane 'print for grep {++$i % 3 == 1} #F' /path/to/your/input
-n gives you line-by-line processing, -a autosplitting for field processing, and $i (effectively initialized to zero for our purposes) keeps count of the number of fields processed...

This method avoids reading the entire file into memory at once:
use strict;
my #queue;
while (<>) {
push #queue, / ( \d+ (?: \. \d* ) ? ) /gx;
while (#queue >= 3) {
my $third = (splice #queue, 0, 3)[2];
print $third, "\n"; # Or do whatever with it.
}
}

If the file has 10 numbers in every line you can use this:
perl -pe 's/([\d.]+) [\d.]+ [\d.]+/$1/g;' file
It's not a clean solution but it should "do the job".

Looks like this post lacked a solution that didn't read the whole file and used grep.
#!/usr/bin/perl -w
use strict;
my $re = qr/-?\d+(?:\.\d*)/; # Insert a more precise regexp here
my $n = 3;
my $count = 0;
while (<>) {
my #res = grep { not $count++ % $n } m/($re)/go;
print "#res\n";
};

I believe you’ll find that this work per spec, behaves politely, and never reads in more than it needs to.
#!/usr/bin/env perl
use 5.010_001;
use strict;
use autodie;
use warnings qw[ FATAL all ];
use open qw[ :std IO :utf8 ];
END { close STDOUT }
use Regexp::Common;
my $real_num_rx = $RE{num}{real};
my $left_edge_rx = qr{
(?: (?<= \A ) # or use \b
| (?<= \p{White_Space} ) # or use \D
)
}x;
my $right_edge_rx = qr{
(?= \z # or use \b
| \p{White_Space} # or use \D
)
}x;
my $a_number_rx = $left_edge_rx
. $real_num_rx
. $right_edge_rx
;
if (-t STDIN && #ARGV == 0) {
warn "$0: reading numbers from stdin,"
. " type ^D to end, ^C to kill\n";
}
$/ = " ";
my $count = 0;
while (<>) {
while (/($a_number_rx)/g) {
say $1 if $count++ % 3 == 0;
}
}

Related

How can I read content file, do some instructions, and then copy out all the interesting content to another file line by line?

File 1 :
1. "a:1 b c:10 d e f g:2 a:1 a:1 a:1"
2. "h i l m"
3. "e:1 b"
4. "f:2 e:5 a"
File 2 should be
1. "a:1 c:10 g:2"
2. "f:2 e:5"
So I would like to:
save just one sample in case of repeated elements (for ex. line 1 "a:1"),
check if the line includes the element I'd like to save (so it must be for ex. "a:1" and not just "a")
if there's just one interesting element (line 3) I'll not evaluate the line.
I've tried to put the file content into an array inside a while cycle but when I printed Out my content it was all printed in line 1.
this is my attempt:
use List::MoreUtils qw(uniq);
$in = "in.txt";
$out = "out.txt";
open (IN, "<", $in);
open (OUT, ">", $out);
while(defined ( $l = <IN>)){
#a = split (/\s/, $l);
#c= uniq(#a);
for ($i = 0; $i < #c; $i++){
if ($c[$i] =~ /.*:-?\d\.\d+/) {
print OUT $c[$i];
}
}
}
This will do what you ask.
It isn't clear whether the line numbers and quotation marks are part of your data, but I have written it so that it doesn't matter either way
The program expects the path to the input file as a parameter on the command line
use strict;
use warnings;
while (<>) {
my %seen;
my #wanted = grep { /:/ and not $seen{$_}++ } /[^\s"]+/g;
print "#wanted\n" if #wanted > 1;
}
output
a:1 c:10 g:2
f:2 e:5
This ugly one-liner also uses a hash but in a way that preserves the order of fields.
perl -ne '
%h=();
print qq($nr. "$_"\n)
if $_=join " ", grep !$h{$_}++, /\w+:\d+/g and / / and ++$nr
' in.txt > out.txt
output:
1. "a:1 c:10 g:2"
2. "f:2 e:5"
Here's one way you could do it:
#!/usr/bin/env perl
use strict;
use warnings;
my $i = 1;
while (<>) {
my %h; # create an empty hash every line
foreach (split /["\s]/) { # split on double quotes and spaces
$h{$_}++ if /:/; # if colon found, add element to hash
}
if (keys %h > 1) { # if more than one element in hash
print (($i++), q/. "/, (join " ", (keys %h)), qq/"\n/);
}
}
Usage: file.pl in.txt > out.txt
I wasn't sure what your exact criterion for including the line was but the above code works for your sample data. Because a hash is being used, the contents isn't necessarily in the right order. If you wanted to sort the values, that would be a minor modification.
output:
1. "c:10 a:1 g:2"
2. "f:2 e:5"
I had a bit of fun playing with this problem.
It may not help you that much as it is a little tricky to read, but this is what I ended up with:
use List::MoreUtils qw(uniq);
$in = "in.txt";
$out = "out.txt";
open (IN, "<", $in);
open (OUT, ">", $out);
foreach (<IN>) {
#result = map /.\:\d*/ ? $_ : (), uniq ( split ) ;
print OUT join(" ", #result) . "\n" unless ($#result < 1);
}
Output:
a:1 c:10 g:2
f:2 e:5
Here is a version that doesn't use uniq and uses the -n option to handle the while loop.
#!/usr/bin/perl -n
my %seen;
#result = map /.\:\d*/ ? $_ : (), grep {! $seen{$_}++ } ( split ) ;
print join(" ", #result) . "\n" unless ($#result < 1);
Output:
./myscript.pl in.txt
a:1 c:10 g:2
f:2 e:5
I just noticed its not supposed to print if there is only one result.
That's easy to fix by changing the $#result test.

Perl: Read from file till specified character(s) found

I have a very huge(10 GB) single line file(basically insert statement) which i cant load into memory.
I want to process that line(doing some regex) and taking meaning full values.
The values are in tuples(data is between-> (.*) ).
So i want to just read each tuple from the file and process it.
What i am thinking of doing is using getc like this:
getc FILEHANDLE
So i read each character and check if it matches my tuple ending character(in my case it is ), ).
Is there a more efficient and better way to perform this in optimized way?
Thanks.
You could set the special perl variable INPUT_RECORD_SEPARATOR $/ to match your tuple-ending character.
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/ say /;
open( my $fh, '<', 'foo.txt' ) or die;
my $tuple_ending_char = ')';
local $/ = $tuple_ending_char;
while (<$fh>) {
say $_;
}
You can try the following code also but it is not as elegant as davewood's solution.
use strict;
use Data::Dumper;
my $filename='/tmp/sample.txt';
if (open(my $fh, $filename)) {
my #file_stats = stat($fh);
my $bytes_remaining = $file_stats[7];
my $answer = "";
my $buffer_size=1024;
while (1) {
my $bytes_read = read($fh, $answer, $buffer_size);
my #tuples = ($answer =~ /\(.*?\),\s*/g);
print Dumper(\#tuples);
$answer =~ s/.*\)\s*,\s*([^\)]*)$/$1/g;
$bytes_remaining -= $bytes_read;
if ($bytes_remaining < 0) {$bytes_remaining = 0;}
if (($bytes_read == 0) ||($bytes_remaining <= 0)) {
last;
};
};
close($fh);
}

replace a string of characters with the line number

I have a text file that has approximately 3,000 lines. 99% of the time I need all 3,000 lines. However, periodically I will grep out the lines I need and direct the output to another text file to use.
The only problem I have in doing so, is: Embedded in the text file is a 6 character string of numbers that indicate the line number. In order to use the file, this area needs to be correctly renumbered...(I don't need to re-sort the data, but I need to replace the current six characters with the new line number. and it must be padded with zeros! Unfortuantely the entire rows is one long row of data with no field separators!
For example, my first three rows might look something like:
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000999MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000027SILLMORERANDOMDATAFOLLOWSAFTERTHIS
The six characters at positions 17-22 (Immediately following the "ZZ"), need be renumbered based on the current row number...so the above needs to look like:
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000002MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000003SILLMORERANDOMDATAFOLLOWSAFTERTHIS
Any ideas would be greatly appreciated!
Thanks,
KSL.
Here's the solution I came up with Perl. It assumes that the numbering is always 6 digits after the ZZ sequence.
In convert.pl:
use strict;
use warnings;
my $i = 1; # or the value you want to start numbering
while (<STDIN>) {
my $replace = sprintf("%06d", $i++);
$_ =~ s/ZZ\d{6}/ZZ$replace/g;
print $_;
}
In data.dat:
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000999MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000027SILLMORERANDOMDATAFOLLOWSAFTERTHIS
To run:
cat data.dat | perl convert.pl
Output
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000002MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000003SILLMORERANDOMDATAFOLLOWSAFTERTHIS
If I would solve this, I would create a simple python script to read those lines by filtering as grep does and using a internal counter from inside the python script.
As simple hints you can read each line in a string and access them using variablename[17:22] (17:22 is the position of the string you are trying to use).
Now, there is a method in the string in python which does the replace, just replace the values by the counter you create.
I hope this helps.
To do this in awk:
awk '{print substr($0,1,16) sprintf("%06d", NR) substr($0,23)}'
or
gawk 'match($0,/^(.*ZZ)[0-9]{6}(.*)/,a) {print a[1] sprintf("%06d",NR) a[2]}'
This is exactly the type of thing where unpack is useful.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
while( my $line = <> ){
chomp $line;
my #elem = unpack 'A16 A6 A*', $line;
$elem[1] = sprintf '%06d', $.;
# $. is the line number for the last used file handle
say #elem;
}
Actually looking at the lines, it looks like there is date information stored in the first 14 characters.
Assuming that at some point you might want to parse the lines for some reason you can use the following as an example of how you could use unpack to split up the lines.
#!/usr/bin/env perl
use v5.10.0; # say()
use strict;
use warnings;
use DateTime;
my #date_elem = qw'
year month day
hour minute second
';
my #elem_names = ( #date_elem, qw'
ZZ
line_number
random_data
');
while( my $line = <> ){
chomp $line;
my %data;
#data{ #elem_names } = unpack 'A4 (A2)6 A6 A*', $line;
# choose either this:
$data{line_number} = sprintf '%06d', $.;
say #data{#elem_names};
# or this:
$data{line_number} = $.;
printf '%04d' . ('%02d'x5) . "%2s%06d%s\n", #data{ #elem_names };
# the choice will affect the contents of %data
# this just shows the contents of %data
for( #elem_names ){
printf qq'%12s: "%s"\n', $_, $data{$_};
}
# you can create a DateTime object with the date elements
my $dt = DateTime->new(
(map{ $_, $data{$_} } #date_elem),
time_zone => 'floating',
);
say $dt;
print "\n";
}
Although it would be better to use a regular expression, so that you could throw out bogus data.
use v5.14; # /a modifier
...
my $rdate = join '', map{"(\\d{$_})"} 4, (2)x5;
my $rx = qr'$rdate (ZZ) (\d{6}) (.*)'xa;
while( my $line = <> ){
chomp $line;
my %data;
unless( #data{ #elem_names } = $line =~ $rx ){
die qq'unable to parse line "$line" ($.)';
}
...
It would be better still; to use named capture groups added in 5.10.
...
my $rx = qr'
(?<year> \d{4} ) (?<month> \d{2} ) (?<day> \d{2} )
(?<hour> \d{2} ) (?<minute> \d{2} ) (?<second> \d{2} )
ZZ
(?<line_number> \d{6} )
(?<random_data> .* )
'xa;
while( my $line = <> ){
chomp $line;
unless( $line =~ $rx ){
die qq'unable to parse line "$line" ($.)';
}
my %data = %+;
# for compatibility with previous examples
$data{ZZ} = 'ZZ';
...

Using goto LABEL for comparing two files

I am unable to get desired output.
Please help to correct my errors.
file1
A
B
C
D
E
F
file2
A
D
C
Desired Output (if found print '1' at relative position in larger file and if not print '0')
1
0
1
1
0
0
code
#!/usr/bin/perl -w
open(FH,$file);
#q=<FH>;
open(FH1,$file2);
#d=<FH1>;
open(OUT,">out.txt");
foreach $i(#q) {
foreach $j(#d) {
if ($i eq $j) {
$id=1 ;
goto LABEL;
} elsif ($i ne $j) {
$id=1;
goto LABEL;
}
}
}
print OUT "1\t";
LABEL:
print OUT "0\t";
}
close FH;
close FH1;
close OUT;
note: actual files are much much larger and contain uneven number of elements.
You were looking for
for $q (#q) {
my $found = 0;
for $d (#d) {
if ($q eq $d) {
$found = 1;
goto LABEL;
}
}
LABEL: print "$found\n";
}
The above is better written as follows:
for $q (#q) {
my $found = 0;
for $d (#d) {
if ($q eq $d) {
$found = 1;
last;
}
}
print "$found\n";
}
But those solutions perform poorly. You can avoid iterating over #d repeatedly by using a hash.
my %d = map { $_ => 1 } #d;
for $q (#q) {
print $d{$q} ? "1" : "0", "\n";
}
Consider the following approach:
use strict;
use warnings;
use autodie;
use feature 'say';
open my $fh1, '<', 'file1';
open my $fh2, '<', 'file2';
say <$fh1> eq <$fh2> ? '1' : '0'
until eof $fh1 or eof $fh2;
Notes:
use strict; use warnings; to maintain sanity
autodie to take care of failed file opens
Lexical filehandles are preferred to bareword filehandles
say for syntactic sugar to automatically append a newline at the end of every 1 or 0
Diamond operator to read in each filehandle line-by-line
eq to string-compare the two lines
Ternary operator (COND ? TRUE : FALSE) to decide whether to print 1 or 0
until is a negated while
eof to tell the loop when either of the two filehandles has been exhausted
As it was said don't use LABEL. And to be honest you don't need perl for that, because join and sed do the job (may be you need to sort the files first):
join -a1 -a2 -e "0" -o 2.1 file1.txt file2.txt | sed "s/[^0]/1/g"
May be you need to sort your files first - in this case have a look at this post: comparing to unsorted files.
To be honest LABEL is not your friend - don't do that. For me it sounds more like a job for the join. But if you want to solve it using Perl I would try the following:
If the input files are sorted (otherwise you can use sort to achieve that) compare them line by line and print the result:
while ($line_from_f1 = <F1>)
{
$line_from_f2=<F2>;
if ($line_from_f1 eq $line_from_f2)
{
print "1\n";
}
else
{
print "0\n";
}
}
Shorter version (untested):
while (<F1>)
{
print ($_ eq <F2>)."\n";
}
Note: These versions compare the files line by line - if a line is missing in the middle it does not work properly.

How can I print the lines in STDIN in random order in Perl?

I want to do the inverse of sort(1) : randomize every line of stdin to stdout in Perl.
I bet real Perl hackers will tear this apart, but here it goes nonetheless.
use strict;
use warnings;
use List::Util 'shuffle';
my #lines = ();
my $bufsize = 512;
while(<STDIN>) {
push #lines, $_;
if (#lines == $bufsize) {
print shuffle(#lines);
undef #lines;
}
}
print shuffle(#lines);
Difference between this and the other solution:
Will not consume all the input and then randomize it (memory hog), but will randomize every $bufsize lines (not truly random and slow as a dog compared to the other option).
Uses a module which returns a new list instead of a in place editing Fisher - Yates implementation. They are interchangeable (except that you would have to separate the print from the shuffle). For more information type perldoc -q rand on your shell.
This perl snippet does the trick :
#! /usr/bin/perl
# randomize cat
# fisher_yates_shuffle code copied from Perl Cookbook
# (By Tom Christiansen & Nathan Torkington; ISBN 1-56592-243-3)
use strict;
my #lines = <>;
fisher_yates_shuffle( \#lines ); # permutes #array in place
foreach my $line (#lines) {
print $line;
}
# fisher_yates_shuffle( \#array ) : generate a random permutation
# of #array in place
sub fisher_yates_shuffle {
my $array = shift;
my $i;
for ($i = #$array; --$i; ) {
my $j = int rand ($i+1);
next if $i == $j;
#$array[$i,$j] = #$array[$j,$i];
}
}
__END__
use List::Util 'shuffle';
print shuffle <>
Or if you worry about last lines lacking \n,
chomp(my #lines = <>);
print "$_\n" for shuffle #lines;