I have a file:
434462PW1 5
76252PPP8 5,714.79
76252PMB2 16,950.17
76252PRC5 25,079.70
76252PNY1 30,324.50
62630WCQ8 1.09
62630WCZ8 1.09
62630WBX4 36,731.90
62630WCQ8 1.07
62630WCZ8 1.07
76252PGB9 1.07
62630WBN6 1.07
62630WBA4 1.07
I need the commas stripped out of the second value, and a comma added between the 1st and 2nd values.
434462PW1,5
76252PPP8,5714.79
76252PMB2,16950.17
76252PRC5,25079.70
76252PNY1,30324.50
62630WCQ8,1.09
62630WCZ8,1.09
62630WBX4,36731.90
62630WCQ8,1.07
62630WCZ8,1.07
76252PGB9,1.07
62630WBN6,1.07
62630WBA4,1.07
Here is the code. I'm having trouble stripping just the number values.
#!/usr/bin/perl
use strict ;
use warnings;
open my $handle, '<', "foofile";
chomp(my #positionArray = <$handle>);
foreach my $pos(#positionArray) {
if ($pos =~ /(\w{9})\s+(.*)/) {
if ($2=~/,/) {
my $without = $2=~s/,//g ;
print "$1,$without\n";
}
}
}
Since commas only appear in the 2nd column, you can simply delete all commas from each line. Also, since whitespace only exists between your 2 columns, you can then replace all space with a comma.
foreach my $pos (#positionArray) {
$pos =~ s/,//g;
$pos =~ s/\s+/,/;
print "$pos\n";
}
Another way is you can sorted out this issues using map function (Input and output # array variable).
chomp(my #positionArray = <$handle>);
my #out = map { $_=~s/\,//g; $_=~s/\s+/,/; $_; } #positionArray;
use Data::Dumper;
print Dumper \#out;
For inexplicable reason you made code more complicated than it may be
use strict ;
use warnings;
use feature 'say';
my $filename = 'foofile';
open my $fh, '<', $filename
or die "Couldn't open $filename $!";
my #lines = <$fh>;
close $fh;
chomp #lines; # snip eol
for (#lines) {
my($id,$val) = split;
$val =~ s/,//; # modifier 'g' might be used if value goes beyond thousands
say "$id,$val";
}
Output
434462PW1,5
76252PPP8,5714.79
76252PMB2,16950.17
76252PRC5,25079.70
76252PNY1,30324.50
62630WCQ8,1.09
62630WCZ8,1.09
62630WBX4,36731.90
62630WCQ8,1.07
62630WCZ8,1.07
76252PGB9,1.07
62630WBN6,1.07
62630WBA4,1.07
Related
I have a list of Accession numbers that I want to pair randomly using a Perl script below:
#!/usr/bin/perl -w
use List::Util qw(shuffle);
my $file = 'randomseq_acc.txt';
my #identifiers = map { (split /\n/)[1] } <$file>;
chomp #identifiers;
#Shuffle them and put in a hash
#identifiers = shuffle #identifiers;
my %pairs = (#identifiers);
#print the pairs
for (keys %pairs) {
print "$_ and $pairs{$_} are partners\n";
but keep getting errors.
The accession numbers in the file randomseq_acc.txt are:
1094711
1586007
2XFX_C
Q27031.2
P22497.2
Q9TVU5.1
Q4N4N8.1
P28547.2
P15711.1
AAC46910.1
AAA98602.1
AAA98601.1
AAA98600.1
EAN33235.2
EAN34465.1
EAN34464.1
EAN34463.1
EAN34462.1
EAN34461.1
EAN34460.1
I needed to add the closing right curly brace to be able to compile the script.
As arrays are indexed from 0, (split /\n/)[1] returns the second field, i.e. what follows newline on each line (i.e. nothing). Change it to [0] to make it work:
my #identifiers = map { (split /\n/)[0] } <$file>; # Still wrong.
The diamond operator needs a file handle, not a file name. Use open to associate the two:
open my $FH, '<', $file or die $!;
my #identifiers = map { (split /\n/)[0] } <$FH>;
Using split to remove a newline is not common. I'd probably use something else:
map { /(.*)/ } <$FH>
# or
map { chomp; $_ } <$FH>
# or, thanks to ikegami
chomp(my #identifiers = <$FH>);
So, the final result would be something like the following:
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(shuffle);
my $filename = '...';
open my $FH, '<', $filename or die $!;
chomp(my #identifiers = <$FH>);
my %pairs = shuffle(#identifiers);
print "$_ and $pairs{$_} are partners\n" for keys %pairs;
I am using the File::Grep module. I have following example:
#!/usr/bin/perl
use strict;
use warnings;
use File::Grep qw( fgrep fmap fdo );
my #matches = fgrep { 1.1.1 } glob "file.csv";
foreach my $str (#matches) {
print "$str\n";
}
But when I try to print $str value it gives me HEX value: GLOB(0xac2e78)
What's wrong with this code?
The documentation doesn't seem to be accurate, but judging from the source-code — http://cpansearch.perl.org/src/MNEYLON/File-Grep-0.02/Grep.pm — the list you get back from fgrep contains one element per file. Each element is a hash of the form
{
filename => $filename,
count => $num_matches_in_that_file,
matches => {
$line_number => $line,
...
}
}
I think it would be simpler to skip fgrep and its complicated return-value that has way more information than you want, in favor of fdo, which lets you just iterate over all lines of a file and do what you want:
fdo { my ( $file, $pos, $line ) = #_;
print $line if $line =~ m/1\.1\.1/;
} 'file.csv';
(Note that I removed the glob, by the way. There's not much point in writing glob "file.csv", since only one file can match that globstring.)
or even just dispense with this module and write:
{
open my $fh, '<', 'file.csv';
while (<$fh>) {
print if m/1\.1\.1/;
}
}
I assume you want to see all the lines in file.csv that contain 1.1.1?
The documentation for File::Grep isn't up to date, but this program will put into #lines all the matching lines from all the files (if there were more than one).
use strict;
use warnings;
use File::Grep qw/ fgrep /;
$File::Grep::SILENT = 0;
my #matches = fgrep { /1\.1\.1/ } 'file.csv';
my #lines = map {
my $matches = $_->{matches};
#{$matches}{ sort { $a <=> $b } keys %$matches};
} #matches;
print for #lines;
Update
The most Perlish way to do this is like so
use strict;
use warnings;
open my $fh, '<', 'file.csv' or die $!;
while (<$fh>) {
print if /1\.1\.1/;
}
My first file looks like:
CHR id position
1 rs58108140 10583
1 rs189107123 10611
1 rs180734498 13302
1 rs144762171 13327
1 chr1:13957:D 13957
And my second file looks like:
CHR SNP POS RiskAl OTHER_ALLELE RAF logOR Pval
10 rs1999138 110140096 T C 0.449034245446375 0.0924443 1.09e-06
6 rs7741604 20839503 C A 0.138318264238111 0.127947 1.1e-06
8 rs1486006 82553172 G C 0.833130882716561 0.147456 1.12727730194884e-06
My script reads in the first file and stores it in an array, and then I would like to find rsIDs from column 2 of the first file that are in column 2 in the second file. I think I am having a problem with how I'm matching the expressions. Here is my script:
#! perl -w
use strict;
use warnings;
my $F = shift #ARGV;
my #snps;
open IN, "$F";
while (<IN>) {
next if m/CHR/;
my #L = split;
push #snps, [$L[0], $L[1], $L[2]] if $L[0] !~ m/[XY]/;
}
close IN;
open IN, "DIAGRAMv3sansWTCCCqc0clumpd_noTCF7L2regOrLeadOrPlt1em6clumps- CHR_SNP_POS_RiskAl_OtherAl_RAF_logOR_Pval.txt";
while (<IN>) {
my #L = split;
next if m/CHR/;
foreach (#snps) {
next if ($L[0] != ${$_}[0]);
# if not on same chromosome
if ($L[0] = ${$_}[0]) {
# if on same chromosome
if ($L[1] =~ ${$_}[1]) {
print "$L[0] $L[1] ${$_}[2]\n";
last;
}
}
}
}
Your code doesn't seem to correspond to your description. You are comparing both the first and second columns of the file rather than just the second.
The main problems are:
You use $L[0] = ${$_}[0] to compare the first columns. This will do an assigmment instead of a comparison. You should use $L[0] == ${$_}[0] instead or, better, $L[0] == $_->[0]
You use $L[1] =~ ${$_}[1] to compare the second columns. This will check whether ${$_}[1] is a substring of $L[1]. You could use anchors like $L[1] =~ /^${$_}[1]$/ but it's much better to just do a string comparison as $L[1] eq $_->[1]
The easiest way is to read the second file first so as to build a list of values that you want included from the first file. I have written it so that it does what your code looks like it's supposed to do, i.e. match the first two columns.
That would look like this
use strict;
use warnings;
use autodie;
my ($f1, $f2) = #_;
my %include;
open my $fh2, '<', $f2;
while (<$fh2>) {
my #fields = split;
my $key = join '|', #fields[0,1];
++$include{$key};
}
close $fh2;
open my $fh1, '<', $f1;
while (<$fh1>) {
my #fields = split;
my $key = join '|', #fields[0,1];
print "#fields[0,1,2]\n" if $include{$key};
}
close $fh1;
output
Unfortunately your choice of sample data doesn't include any records in the first file that have matching keys in the second, so there is no output!
Update
This is a corrected version of your own program. It should work, but it is far more efficient and concise to use hashes, as above
use strict;
use warnings;
use autodie;
my ($filename) = #ARGV;
my #snps;
open my $in_fh, '<', $filename;
<$in_fh>; # Discard header line
while (<$in_fh>) {
my #fields = split;
push #snps, \#fields unless $fields[0] =~ /[XY]/;
}
close $in_fh;
open $in_fh, '<', 'DIAGRAMv3sansWTCCCqc0clumpd_noTCF7L2regOrLeadOrPlt1em6clumps- CHR_SNP_POS_RiskAl_OtherAl_RAF_logOR_Pval.txt';
<$in_fh>; # Discard header line
while (<$in_fh>) {
my #fields = split;
for my $snp (#snps) {
next unless $fields[0] == $snp->[0] and $fields[1] eq $snp->[1];
print "$fields[0] $fields[1] $snp->[2]\n";
last;
}
}
close $in_fh;
I am facing issues with perl chomp function.
I have a test.csv as below:
col1,col2
vm1,fd1
vm2,fd2
vm3,fd3
vm4,fd4
I want to print the 2nd field of this csv. This is my code:
#!/usr/bin/perl -w
use strict;
my $file = "test.csv";
open (my $FH, '<', $file);
my #array = (<$FH>);
close $FH;
foreach (#array)
{
my #row = split (/,/,$_);
my $var = chomp ($row[1]); ### <<< this is the problem
print $var;
}
The output of aboe code is :
11111
I really don't know where the "1" is comming from. Actually, the last filed can be printed as below:
foreach (#array)
{
my #row = split (/,/,$_);
print $row[1]; ### << Note that I am not printing "\n"
}
the output is:
vm_cluster
fd1
fd2
fd3
fd4
Now, i am using these field values as an input to the DB and the DB INSERT statement is failing due this invisible newline. So I thought chomp would help me here. instead of chomping, it gives me "11111".
Could you help me understand what am i doing wrong here.
Thanks.
Adding more information after reading loldop's responce:
If I write as below, then it will not print anything (not even the "11111" output mentioned above)
foreach (#array)
{
my #row = split (/,/,$_);
chomp ($row[1]);
my $var = $row[1];
print $var;
}
Meaning, chomp is removing the last string and the trailing new line.
The reason you see only a string of 1s is that you are printing the value of $val which is the value returned from chomp. chomp doesn't return the trimmed string, it modifies its parameter in-place and returns the number of characters removed from the end. Since it always removes exactly one "\n" character you get a 1 output for each element of the array.
You really should use warnings instead of the -w command-line option, and there is no reason here to read the entire file into an array. But well done on using a lexical filehandle with the three-parameter form of open.
Here is a quick refactoring of your program that will do what you want.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'test.csv';
open my $FH, '<', $file or die qq(Unable to open "$file": $!);
while (<$FH>) {
chomp;
my #row = split /,/;
print $row[1], "\n";
}
although, it is my fault at the beginning.
chomp function return 1 <- result of usage this function.
also, you can find this bad example below. but it will works, if you use numbers.
sometimes i use this cheat (don't do that! it is my bad-hack code!)
map{/filter/ && $_;}#all_to_filter;
instead of this, use
grep{/filter/}#all_to_filter;
foreach (#array)
{
my #row = split (/,/,$_);
my $var = chomp ($row[1]) * $row[1]; ### this is bad code!
print $var;
}
foreach (#array)
{
my #row = split (/,/,$_);
chomp ($row[1]);
my $var = $row[1];
print $var;
}
If you simply want to get rid of new lines you can use a regex:
my $var = $row[1];
$var=~s/\n//g;
So, I was quite frustrated with this easy looking task bugging me for the whole day long. I really appreciate everyone who responded.
Finaly I ended up using Text::CSV perl module and then calling each of the CSV field as array reference. There was no need left to run the chomp after using Text::CSV.
Here is the code:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<:encoding(utf8)", "vm.csv" or die "vm.csv: $!";
<$fh>; ## this is to remove the column headers.
while ( my $row = $csv->getline ($fh) )
{
print $row->[1];
}
and here is hte output:
fd1fd2fd3fd4
Later i was pulled these individual values and inserted into the DB.
Thanks everyone.
I quickly jotted off a Perl script that would average a few files with just columns of numbers. It involves reading from an array of filehandles. Here is the script:
#!/usr/local/bin/perl
use strict;
use warnings;
use Symbol;
die "Usage: $0 file1 [file2 ...]\n" unless scalar(#ARGV);
my #fhs;
foreach(#ARGV){
my $fh = gensym;
open $fh, $_ or die "Unable to open \"$_\"";
push(#fhs, $fh);
}
while (scalar(#fhs)){
my ($result, $n, $a, $i) = (0,0,0,0);
while ($i <= $#fhs){
if ($a = <$fhs[$i]>){
$result += $a;
$n++;
$i++;
}
else{
$fhs[$i]->close;
splice(#fhs,$i,1);
}
}
if ($n){ print $result/$n . "\n"; }
}
This doesn't work. If I debug the script, after I initialize #fhs it looks like this:
DB<1> x #fhs
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
1 GLOB(0x10443e60)
-> *Symbol::GEN1
FileHandle({*Symbol::GEN1}) => fileno(7)
So far, so good. But it fails at the part where I try to read from the file:
DB<3> x $fhs[$i]
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
DB<4> x $a
0 'GLOB(0x10443d80)'
$a is filled with this string rather than something read from the glob. What have I done wrong?
You can only use a simple scalar variable inside <> to read from a filehandle. <$foo> works. <$foo[0]> does not read from a filehandle; it's actually equivalent to glob($foo[0]). You'll have to use the readline builtin, a temporary variable, or use IO::File and OO notation.
$text = readline($foo[0]);
# or
my $fh = $foo[0]; $text = <$fh>;
# or
$text = $foo[0]->getline; # If using IO::File
If you weren't deleting elements from the array inside the loop, you could easily use a temporary variable by changing your while loop to a foreach loop.
Personally, I think using gensym to create filehandles is an ugly hack. You should either use IO::File, or pass an undefined variable to open (which requires at least Perl 5.6.0, but that's almost 10 years old now). (Just say my $fh; instead of my $fh = gensym;, and Perl will automatically create a new filehandle and store it in $fh when you call open.)
If you are willing to use a bit of magic, you can do this very simply:
use strict;
use warnings;
die "Usage: $0 file1 [file2 ...]\n" unless #ARGV;
my $sum = 0;
# The current filehandle is aliased to ARGV
while (<>) {
$sum += $_;
}
continue {
# We have finished a file:
if( eof ARGV ) {
# $. is the current line number.
print $sum/$. , "\n" if $.;
$sum = 0;
# Closing ARGV resets $. because ARGV is
# implicitly reopened for the next file.
close ARGV;
}
}
Unless you are using a very old perl, the messing about with gensym is not necessary. IIRC, perl 5.6 and newer are happy with normal lexical handles: open my $fh, '<', 'foo';
I have trouble understanding your logic. Do you want to read several files, which just contains numbers (one number per line) and print its average?
use strict;
use warnings;
my #fh;
foreach my $f (#ARGV) {
open(my $fh, '<', $f) or die "Cannot open $f: $!";
push #fh, $fh;
}
foreach my $fh (#fh) {
my ($sum, $n) = (0, 0);
while (<$fh>) {
$sum += $_;
$n++;
}
print "$sum / $n: ", $sum / $n, "\n" if $n;
}
Seems like a for loop would work better for you, where you could actually use the standard read (iteration) operator.
for my $fh ( #fhs ) {
while ( defined( my $line = <$fh> )) {
# since we're reading integers we test for *defined*
# so we don't close the file on '0'
#...
}
close $fh;
}
It doesn't look like you want to shortcut the loop at all. Therefore, while seems to be the wrong loop idiom.