I am creating a script that should take in a data file and a column index, read and store that column and then perform some statistics on the data. I am unsure how to specify that I only want to store a specific column in Perl. Here is my code so far:
#! /usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Scalar::Util qw(looks_like_number);
my ($FILE, $COLUMN_TO_PARSE) = @ARGV;
#check if file arg is present
if(not defined $FILE){
die "Please specify file input $!";
}
#check if column arg is present
if(not defined $COLUMN_TO_PARSE){
die "Please specify column number $!";
}
unless(open(INPUT_FILE, "<", $FILE)){
die "Couldn't open ", $FILE ," for reading!", $!;
}
my @data;
while(<INPUT_FILE>){
# Only store $COLUMN_TO_PARSE, save to @data
}
close(INPUT_FILE);
For reference, the data coming in looks something like this(sorry for format):
01 8 0 35 0.64 22
02 8 0 37 0.68 9
03 8 0 49 0.68 49
So for example, if I ran
perl descriptiveStatistics.pl dataFile.txt 3
I would expect to have [35,37,49] in the @data array.
I stumbled upon this question, but it has to do with headers which I don't have, and not very helpful imo. Any suggestions?
I've used split() to split the input into a list of records. By default, split() works on $_ and splits on white space - which is exactly what we want here.
I've then used a list slice to get the column that you want, and pushed that onto your array.
#! /usr/bin/perl
use warnings;
use strict;
# Check parameters
@ARGV == 2 or die "Please specify input file and column number\n";
my ($file, $column_to_parse) = @ARGV;
open my $in_fh, '<', $file
or die "Couldn't open $file for reading: $!";
my @data;
while (<$in_fh>){
push @data, (split)[$column_to_parse];
}
If I was writing it for myself, I think I would replace the while loop with a map.
my @data = map { (split)[$column_to_parse] } <$in_fh>;
Update: To ensure that you have been given a valid column number (and I think that's a good idea) you might write something like this:
while (<$in_fh>){
my @fields = split;
die "Not enough columns in row $.\n" if $#fields < $column_to_parse;
push @data, $fields[$column_to_parse];
}
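As a rough sketch, the validated loop above can be wrapped in a subroutine that takes an already opened filehandle and a zero-based column index. The name extract_column is a hypothetical helper (it is not part of the original answer), and the demo reads the sample rows from an in-memory filehandle instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return one whitespace-separated column
# (zero-based index) from every line of an open filehandle.
sub extract_column {
    my ($fh, $column) = @_;
    my @data;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        die "Not enough columns in row $.\n" if $#fields < $column;
        push @data, $fields[$column];
    }
    return @data;
}

# Demo with an in-memory file holding the sample rows from the question.
my $sample = "01 8 0 35 0.64 22\n02 8 0 37 0.68 9\n03 8 0 49 0.68 49\n";
open my $in_fh, '<', \$sample or die "Couldn't open in-memory file: $!";
my @data = extract_column($in_fh, 3);
close $in_fh;
print "@data\n";    # prints "35 37 49"
```

Passing the filehandle in, rather than the filename, keeps the parsing separate from the argument checking and makes the loop easy to try out on test data.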
split is a good choice:
while (my $line = <INPUT_FILE>) {
my @items = split(/\t/, $line);
push @data,$items[$COLUMN_TO_PARSE];
}
You could design a regexp pattern for matching a column, which you repeat $COLUMN_TO_PARSE times, and which then captures the content of the column and pushes it onto your array @data.
Like this:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @data;
my $COLUMN_TO_PARSE = 3;
while (<DATA>) {
if (/([^\s]+\s+){$COLUMN_TO_PARSE}([^\s]+)/) {
push @data, $2;
} else {
print("error wrong line format: $_\n");
}
}
print Dumper(@data);
__DATA__
01 8 0 35 0.64 22
02 8 0 37 0.68 9
03 8 0 49 0.68 49
which gives the following dump for @data:
$VAR1 = '35';
$VAR2 = '37';
$VAR3 = '49';
$COLUMN_TO_PARSE is zero based, as in your example, and, as a side-effect, the regexp will fail if the requested column does not exist, thus giving you error-handling.
You can use split to get the data column-wise. Each column is stored at consecutive indices of the array.
while(<INPUT_FILE>){
my @columns = split(/\t/, $_); #Assuming delimiter to tab
print "First column====$columns[0]\n";
print "Second column====$columns[1]\n";
}
Then process whichever column you want and store it in an array.
I am new to perl, trying to read a file with columns and creating an array.
I am having a file with following columns.
file.txt
A 15
A 20
A 33
B 20
B 45
C 32
C 78
I wanted to create an array for each unique item present in A with its values assigned from second column.
eg:
@A = (15,20,33)
@B = (20,45)
@C = (32,78)
Tried following code, only for printing 2 columns
use strict;
use warnings;
my $filename = $ARGV[0];
open(FILE, $filename) or die "Could not open file '$filename' $!";
my %seen;
while (<FILE>)
{
chomp;
my $line = $_;
my @elements = split (" ", $line);
my $row_name = join "\t", @elements[0,1];
print $row_name . "\n" if ! $seen{$row_name}++;
}
close FILE;
Thanks
Firstly some general Perl advice. These days, we like to use lexical variables as filehandles and pass three arguments to open().
open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
And then...
while (<$fh>) { ... }
But, given that you have your filename in $ARGV[0], another tip is to use an empty file input operator (<>) which will return data from the files named in @ARGV without you having to open them. So you can remove your open() line completely and replace the while with:
while (<>) { ... }
Second piece of advice - don't store this data in individual arrays. Far better to store it in a more complex data structure. I'd suggest a hash where the key is the letter and the value is an array containing all of the numbers matching that letter. This is surprisingly easy to build:
use strict;
use warnings;
use feature 'say';
my %data; # I'd give this a better name if I knew what your data was
while (<>) {
chomp;
my ($letter, $number) = split; # splits $_ on whitespace by default
push @{ $data{$letter} }, $number;
}
# Walk the hash to see what we've got
for (sort keys %data) {
say "$_ : @{ $data{$_} }";
}
Change the loop to be something like:
while (my $line = <FILE>)
{
chomp($line);
my @elements = split (" ", $line);
push(@{$seen{$elements[0]}}, $elements[1]);
}
This will create/append a list of each item as it is found, and result in a hash where the keys are the left items, and the values are lists of the right items. You can then process or reassign the values as you wish.
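Both answers build the same hash of arrays. A minimal, self-contained sketch of that structure follows; group_by_key is a hypothetical helper name, and the input comes from an in-memory copy of the sample file.txt rather than from a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: group the second column by the first column,
# returning a reference to a hash of arrays.
sub group_by_key {
    my ($fh) = @_;
    my %groups;
    while (my $line = <$fh>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or short lines
        push @{ $groups{$key} }, $value;
    }
    return \%groups;
}

# Demo with the sample data from the question.
my $sample = "A 15\nA 20\nA 33\nB 20\nB 45\nC 32\nC 78\n";
open my $fh, '<', \$sample or die "Couldn't open in-memory file: $!";
my $groups = group_by_key($fh);
close $fh;

for my $key (sort keys %$groups) {
    print "$key: @{ $groups->{$key} }\n";
}
```

The array for a given letter is then available as @{ $groups->{A} }, so there is no need for separately named arrays.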
I'm a novice at using perl. What I want to do is compare two files. One is my index file that I am calling "temp." I am attempting to use this to search through a main file that I am calling "array." The index file has only numbers in it. There are lines in my array that have those numbers. I've been trying to find the intersection between those two files, but my code is not working. Here's what I've been trying to do.
#!/usr/bin/perl
print "Enter the input file:";
my $filename=<STDIN>;
open (FILE, "$filename") || die "Cannot open file: $!";
my @array=<FILE>;
close(FILE);
print "Enter the index file:";
my $temp=<STDIN>;
open (TEMP, "$temp") || die "Cannot open file: $!";
my @temp=<TEMP>;
close(TEMP);
my %seen= ();
foreach (@array) {
$seen{$_}=1;
}
my @intersection=grep($seen{$_}, @temp);
foreach (@intersection) {
print "$_\n";
}
If I can't use intersection, then what else can I do to move each line that has a match between the two files?
For those of you asking for the main file and the index file:
Main file:
1 CP TRT
...
14 C1 MPE
15 C2 MPE
...
20 CA1 MPE
Index file
20
24
22
17
18
...
I want to put those lines that contain one of the numbers in my index file into a new array. So using this example, only
20 CA1 MPE would be placed into a new array.
My main file and index file are both longer than what I've shown, but that hopefully gives you an idea on what I'm trying to do.
I am assuming something like this?
use strict;
use warnings;
use Data::Dumper;
# creating arrays instead of reading from file just for demo
# based on the assumption that your files are 1 number per line
# and no need for any particular parsing
my @array = qw/1 2 3 20 60 50 4 5 6 7/;
my @index = qw/10 12 5 3 2/;
my @intersection = ();
my %hash1 = map{$_ => 1} @array;
foreach (@index)
{
if (defined $hash1{$_})
{
push @intersection, $_;
}
}
print Dumper(\@intersection);
==== Out ====
$VAR1 = [
'5',
'3',
'2'
];
A few things:
Always have use strict; and use warnings; in your program. This will catch a lot of possible errors.
Always chomp after reading input. Lines read from a file or from STDIN keep their trailing \n; chomp removes it.
Learn a more modern form of Perl.
Use mnemonic variable names. $temp doesn't cut it.
Use spaces to help make your code more readable.
You never stated the errors you were getting. I assume it has to do with the fact that the input from your main file doesn't match your index file.
I use a hash to create an index that the index file can use via my ($index) = split /\s+/, $line;:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
use feature qw(say);
print "Input file name: ";
my $input_file = <STDIN>;
chomp $input_file; # Chomp Input!
print "Index file name: ";
my $index_file = <STDIN>;
chomp $index_file; # Chomp Input!
open my $input_fh, "<", $input_file;
my %hash;
while ( my $line = <$input_fh> ) {
chomp $line;
#
# Using split to find the item to index on
#
my ($index) = split /\s+/, $line;
$hash{$index} = $line;
}
close $input_fh;
open my $index_fh, "<", $index_file;
while ( my $index = <$index_fh> ) {
chomp $index;
#
# Now index can look up lines
#
if( exists $hash{$index} ) {
say qq(Index: $index Line: "$hash{$index}");
}
else {
say qq(Index "$index" doesn't exist in file.);
}
}
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
@ARGV = 'main_file';
open(my $fh_idx, '<', 'index_file');
chomp(my @idx = <$fh_idx>);
close($fh_idx);
while (defined(my $r = <>)) {
print $r if grep { $r =~ /^[ \t]*$_/ } @idx;
}
You may wish to replace those hardcoded file names with names read from <STDIN>.
FYI: The defined call inside a while condition might be "optional".
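For comparison, here is a minimal sketch of the lookup on in-memory arrays, assuming the number to match is always the first whitespace-separated field of a main-file line. The name lines_matching_index is hypothetical. It compares whole fields through a hash, so an index entry of 2 does not also match a line starting with 20, which a prefix regex would.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the lines of @$main_lines whose first
# whitespace-separated field appears in @$index_lines.
sub lines_matching_index {
    my ($main_lines, $index_lines) = @_;
    my %wanted;
    for my $entry (@$index_lines) {
        my ($id) = split ' ', $entry;    # also drops the newline
        $wanted{$id} = 1 if defined $id;
    }
    return grep {
        my ($id) = split ' ', $_;
        defined $id && $wanted{$id};
    } @$main_lines;
}

# Demo with the sample lines from the question.
my @main  = ("1 CP TRT\n", "14 C1 MPE\n", "15 C2 MPE\n", "20 CA1 MPE\n");
my @index = ("20\n", "24\n", "22\n", "17\n", "18\n");
print lines_matching_index(\@main, \@index);    # prints "20 CA1 MPE"
```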
I have this Perl program which picks data from specific columns starting from a certain row.
#!/usr/bin/perl
# This script is to pick the specific columns from a file, starting from a specific row
# FILE -> Name of the file to be passed at run time.
# rn -> Number of the row from where the data has to be picked.
use strict;
use warnings;
my $file = shift || "FILE";
my $rn = shift;
my $cols = shift;
open(my $fh, "<", $file) or die "Could not open file '$file' : $!\n";
while (<$fh>) {
$. <= $rn and next;
my @fields = split(/\t/);
print "$fields[$cols]\n";
}
My problem is that I am only able to get one column at a time. I want to be able to specify a selection of indices like this
0, 1, 3..6, 21..33
but it's giving me only the first column.
I am running this command to execute the script
perl extract.pl FILE 3 0, 1, 3..6, 21..33
In the absence of any other solutions I am posting some code that I have been messing with. It works with your command line as you have described it by concatenating all of the fields after the first and removing all spaces and tabs.
The column set is converted to a list of integers using eval, after first making sure that it consists of a comma-separated list of either single integers or start-end ranges separated by two or three full stops.
use strict;
use warnings;
use 5.014; # For non-destructive substitution and \h regex item
my $file = shift || "FILE";
my $rn = shift || 0;
my $cols = join('', @ARGV) =~ s/\h+//gr;
my $item_re = qr/ \d+ (?: \.\.\.? \d+)? /ax;
my $set_re = qr/ $item_re (?: , $item_re )* /x;
die qq{Invalid column set "$cols"} unless $cols =~ / \A $set_re \z /x;
my @cols = eval $cols;
open my $fh, '<', $file or die qq{Couldn't open "$file": $!};
while (<$fh>) {
next if $. <= $rn;
my @fields = split /\t/;
print "@fields[@cols]\n";
}
My problem is that I am only able to get one column at a time
You don't understand what perl is passing to your program from the command line:
use strict;
use warnings;
use 5.016;
my $str = "1..3";
my $x = shift @ARGV; # $ perl myprog.pl 1..3
if ($str eq $x) {
say "It's a string";
}
else {
say "It's a range";
}
my @cols = (0, 1, 2, 3, 4);
say for @cols[$str];
--output:--
$perl myprog.pl 1..3
Scalar value @cols[$str] better written as $cols[$str] at 1.pl line 16.
It's a string
Argument "1..3" isn't numeric in array slice at 1.pl line 16.
1
Anything you write on the command line will be passed to your program as a string, and perl won't automatically convert the string "1..3" into the range 1..3 (in fact your string would be the strange looking "1..3,"). After emitting some warnings, perl sees a number on the front of the string "1..3", so perl converts the string to the integer 1. So, you need to process the string yourself:
use strict;
use warnings;
use 5.016;
my @fields = (0, 1, 2, 3, 4);
my $str = shift @ARGV; # perl myprog.pl 0,1..3 => $str = "0,1..3"
my @cols = split /,/, $str;
for my $col (@cols) {
if($col =~ /(\d+) [.]{2} (\d+)/xms) {
say @fields[$1..$2]; # $1 and $2 are strings but perl will convert them to integers
}
else {
say $fields[$col];
}
}
--output:--
$ perl myprog.pl 0,1..3
0
123
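The same idea can be pushed a little further: instead of printing each piece as it is parsed, expand the whole specification into a flat list of indices and use it as an array slice. The sketch below does that without eval. The name expand_columns is hypothetical, and rejecting descending ranges is an assumption about the desired behaviour, not something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a spec such as "0,1,3..6" into a list
# of integers without using eval.
sub expand_columns {
    my ($spec) = @_;
    $spec =~ s/\s+//g;    # tolerate "0, 1, 3..6"
    my @cols;
    for my $item (split /,/, $spec) {
        if ($item =~ /\A(\d+)\.\.(\d+)\z/) {
            my ($from, $to) = ($1, $2);
            die "Descending range '$item'\n" if $from > $to;
            push @cols, $from .. $to;
        }
        elsif ($item =~ /\A\d+\z/) {
            push @cols, $item;
        }
        else {
            die "Invalid column item '$item'\n";
        }
    }
    return @cols;
}

my @cols = expand_columns("0, 1, 3..6, 21..23");
print "@cols\n";    # prints "0 1 3 4 5 6 21 22 23"
```

Because the shell splits "0, 1, 3..6" into several arguments, a script would typically call it as expand_columns(join '', @ARGV) after shifting off the file name and row number, and then print "@fields[@cols]".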
Perl presents the parameters entered on the command line in an array called @ARGV. Since this is an ordinary array, you could use the length of this array to get additional information. Outside a subroutine, the shift command shifts values from the beginning of the @ARGV array when you don't give it any parameters.
You could do something like this:
my $file = shift; # Adding || "FILE" doesn't work. See below
my $rn = shift;
my @cols = @ARGV;
Instead of cols being a scalar variable, it's now an array that can hold all of the columns you want. In other words, the first parameter is the file name, the second parameter is the row, and the last set of parameters are the columns you want:
while (<$fh>) {
    next if $. <= $rn;
    my @fields = split(/\t/);
    for my $column ( @cols ) {
        printf "%-10.10s", $fields[$column];
    }
    print "\n";
    last; # You printed the row. Do you want to stop?
}
Now, this isn't as fancy pants as your way of doing it where you can give ranges, etc, but it's fairly straight forward:
$ perl extract.pl FILE 3 0 1 3 4 5 6 21 22 23 24 25 26 27 28 29 30 31 32 33
Note I used printf instead of print so all of the fields will be the same width (assuming that they're strings and none is longer than 10 characters).
I tried looking for a Perl module that will handle range input like you want. I'm sure one exists, but I couldn't find it. You still need to allow for a range of input in @cols like I showed above, and then parse @cols to get the actual columns.
What's wrong with my $file = shift || "FILE";?
In your program, you're assuming three parameters. That means you need a file, a row, and at least one column parameter. You will never have a situation where not giving a file name will work since it means you don't have a row or a set of columns to print out.
So, you need to look at $#ARGV and verify it has at least three values in it. If it doesn't have three values, you need to decide what to do at that point. The easy solution is to just abort the program with a little message telling you the correct usage. You could verify if there are one, two, or three parameters and decide what to do there.
Another idea is to use Getopt::Long which will allow you to use named parameters. You can load the parameters with pre-defined defaults, and then change when you read in the parameters:
...
use Getopt::Long;
my $file = "FILE"; # File has a default
my ($rows, @cols); # No default values
my $help; # Allow user to request help
GetOptions (
    "file=s"     => \$file,
    "rows=i"     => \$rows,
    "cols=i{1,}" => \@cols, # {1,} lets one -col take several values
    "help"       => \$help,
);
if ( $help ) {
    print_help();
}
if ( not defined $rows ) {
    error_out ( "Need to define which row to fetch" );
}
if ( not @cols ) {
    error_out ( "Need to define which columns to fetch" );
}
The user could call this via:
$ perl extract.pl -file FILE -row 3 -col 0 -col 1 3 4 5 6 21 22 23 24 25 26 27 28 29 30 31 32 33
Note that if I use -col, by default, GetOptions will assume that all values after the -col are for that option. Also note I could, if I want, repeat -col for each column.
By the way, if you use Getopt::Long, you might as well use Pod::Usage. POD stands for Plain Old Documentation, which is Perl's way of documenting how a program is used. Might as well make this educational. Read up on POD Documentation, the POD Specifications, and the standard POD Style. This is how you document your Perl programs. You can use the perldoc command (betcha you didn't know it existed) to print out the embedded POD documentation, and use Pod::Usage to print it out for the user.
I think perl can do this, but I am pretty new to perl.
Hoping somebody can help me.
I have file like this (actual file is more than ten thousands lines, values are in ascending order, some values are duplicated).
1
2
2
35
45
I want to separate those lines into separate files based on the similarity of the values (for example difference of the value is less than 30).
outfile1
1
2
2
outfile2
35
45
Thanks
This is done very simply by just opening a new file every time it is necessary, i.e. for the first line of data and thereafter every time there is a gap of 30 or more.
This program expects the name of the input file as a parameter on the command line.
use strict;
use warnings;
use autodie;
my ($last, $fileno, $fh);
while (<>) {
my ($this) = /(\d+)/;
unless (defined $last and $this < $last + 30) {
open $fh, '>', 'outfile'.++$fileno;
}
print $fh $_;
$last = $this;
}
It should really be easy. Just remember the previous value in a variable so you can see whether the difference is large enough. You also have to count the output files created so far so you can name a new file when needed.
#!/usr/bin/perl
use warnings;
use strict;
my $threshold = 30;
my $previous;
my $count_out = 0;
my $OUTPUT;
while (<>) {
if (not defined $previous or $_ >= $previous + $threshold) {
open $OUTPUT, '>', "outfile" . ++$count_out or die $!;
}
print $OUTPUT $_;
$previous = $_;
}
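To check the grouping rule without creating any files, the same logic can be sketched on an in-memory list. The helper name group_by_gap is hypothetical, and the sketch assumes, as in the question, that the values are already in ascending order and that a gap of 30 or more starts a new group.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ascending list of numbers into groups,
# starting a new group whenever the gap to the previous value is
# $threshold or more.
sub group_by_gap {
    my ($threshold, @values) = @_;
    my @groups;
    my $previous;
    for my $value (@values) {
        if (!defined $previous || $value - $previous >= $threshold) {
            push @groups, [];    # open a new group
        }
        push @{ $groups[-1] }, $value;
        $previous = $value;
    }
    return @groups;
}

# Demo with the sample values from the question.
my @groups = group_by_gap(30, 1, 2, 2, 35, 45);
print "outfile", $_ + 1, ": @{ $groups[$_] }\n" for 0 .. $#groups;
```

Each inner array corresponds to one output file, so writing the files becomes a separate loop over @groups.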
I have a simple log file which is very messy and I need it to be neat. The file contains log headers, but they are all jumbled up together. Therefore I need to sort the log files according to the log headers. There are no static number of lines - that means that there is no fixed number of lines for the each header of the text file. And I am using perl grep to sort out the headers.
The Log files goes something like this:
Car LogFile Header
<text>
<text>
<text>
Car LogFile Header
<text>
Car LogFile Header
<and so forth>
I have done up/searched a simple algorithm but it does not seem to be working. Can someone please guide me? Thanks!
#!/usr/bin/perl
#use 5.010; # must be present to import the new 5.10 functions, notice
#that it is 5.010 not 5.10
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open(FH, $srce);
my @buf = <FH>;
close(FH);
my @lines = grep (/$string1/, @buffer);
After executing the code, there is no result shown at the terminal. Any ideas?
I think you want something like:
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open my $fh, '<', $srce or die "Could not open $srce: $!";
my @lines = sort grep /\Q$string1/, <$fh>;
print @lines;
Make sure you have the right file path and that the file has lines that match your test pattern.
It seems like you are missing a lot of very basic concepts and maybe cutting and pasting code you see elsewhere. If you're just starting out, pick up a Perl tutorial such as Learning Perl. There are other books and references listed in perlfaq2.
Always use:
use strict;
use warnings;
This would have told you that @buffer is not defined.
#!/usr/bin/perl
use strict;
use warnings;
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open(my $FH, $srce) or die "Failed to open file $srce ($!)";
my @buf = <$FH>;
close($FH);
my @lines = grep (/$string1/, @buf);
print @lines;
Perl is tricky for experts, so experts use the warnings it provides to protect them from making mistakes. Beginners need to use the warnings so they don't make mistakes they don't even know they can make.
(Because you didn't get a chance to chomp the input lines, you still have newlines at the end so the print prints the headings one per line.)
I don't think grep is what you want really.
As you pointed out in brian's answer, the grep will only give you the headers and not the subsequent lines.
I think you need an array where each element is the header and the subsequent lines up to the next header.
Something like: -
#!/usr/bin/perl
use strict;
use warnings;
my $srce = "./default.log";
my $string1 = "Car LogFile Header";
my @logs;
my $log_entry;
open(my $FH, $srce) or die "Failed to open file $srce ($!)";
my $found = 0;
while(my $buf = <$FH>)
{
if($buf =~ /$string1/)
{
if($found)
{
push @logs, $log_entry;
}
$found = 1;
$log_entry = $buf;
}
else
{
$log_entry = $log_entry . $buf;
}
}
if($found)
{
push @logs, $log_entry;
}
close($FH);
print sort @logs;
I think it's what is being asked for.
Perl's grep is not the same as the Unix grep command in that it does not print anything on the screen.
The general syntax is: grep Expr, LIST
Evaluates Expr for each element of LIST and returns a list consisting of those elements for which the expression evaluated to true.
In your case, all the array elements that match $string1 will be returned and stored in @lines.
You can then print the @lines array to actually see them.
You just stored everything in an array instead of printing it out. It's also not necessary to keep the whole file in memory. You can read and print the match results line by line, like this:
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open(FH, $srce);
while(my $line = <FH>) {
if($line =~ m/$string1/) {
print $line;
}
}
close FH;
Hello, I found a way to extract links from an HTML file:
#!/usr/bin/perl -w

# Links graber 1.0
#Author : peacengell
#28.02.13

####

my $file_links = "links.txt";
my @line;
my $line;

open( FILE, $file_links ) or die "Can't find File";

while (<FILE>) {
    chomp;
    $line = $_ ;

    @word = split (/\s+/, $line);
    @word = grep(/href/, @word);
    foreach $x (@word) {

        if ( $x =~ m /ul.to/ ){
            $x=~ s/href="//g;
            $x=~s/"//g;
            print "$x \n";
        }

    }

}
You can use it and modify it. Please let me know if you modify it.