I would like to remove the first character from the elements of an array in a Perl script.
I have this line of script:
@dash = split /\s+/, $dash;
The variable "dash" is read from a particular row of my file: Example
21 A10 A11 A12 A13 ..
Then I tried to push these values to my array called "flowers":
for $i (1..$#dash) {
push(@flowers, $line[$i]);
}
This seems to work for what I need in my subsequent lines of script, but I have found out that $dash contains an unwanted character in front of each value:
A10 A11 A12 A13 ..
instead of
10 11 12 13 .....
but I wanted @flowers to contain:
10 11 12 13 ....
How can I delete the first character before I push each value to my array (@flowers)?
chop(@flowers);
could have worked, but it only chops off the last character. When I tried to use
substr($dash, 0, 2)
it does produce 10, but the rest of the values (A11 A12 A13) are no longer in my @flowers.
Any help is appreciated.
This will operate on each element of the @dash array:
@dash = split /\s+/, $dash;
shift @dash;
@dash = map { substr($_, 1) } @dash;
Your substr($dash, 0, 2) was operating on the line as one string, not on each element of it.
And, unless you need the index for some other operation:
push @flowers, @dash;
That will push all elements of @dash onto @flowers, which looks like what you're doing.
Why not just change the regex in the split?
split /\s+\D?/, $dash;
Add them to @flowers this way if you want:
push( @flowers, split(/\s+\D?/, $dash) );
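One caveat with the split approach, sketched here against a shortened copy of the sample row from the question: the leading 21 is not preceded by whitespace, so it survives as the first field and still has to be shifted off. The variable @fields is only an illustrative name.

```perl
use strict;
use warnings;

my $dash = '21 A10 A11 A12 A13';

# \s+ consumes the gap; \D? swallows one optional non-digit right after it
my @fields = split /\s+\D?/, $dash;   # ('21', '10', '11', '12', '13')

shift @fields;                        # drop the leading 21, as the other answers do
my @flowers = @fields;

print "@flowers\n";                   # 10 11 12 13
```

Note that a trailing ".." in the real data would not be removed by this pattern, so it would need separate handling.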
You need some kind of loop, since you want to do something to each element of @dash other than the first. map is convenient here.
my @flowers = map substr($dash[$_], 1), 1..$#dash;
which is the short way of writing
my @flowers;
for (1..$#dash) {
    push @flowers, substr($dash[$_], 1);
}
I suggest that you just pull out all the digit sequences from $dash, like this:
my $dash = '21 A10 A11 A12 A13 .. ';
my @flowers = $dash =~ /\d+/g;
shift @flowers;
print "@flowers";
output
10 11 12 13
This is a possible solution:
use strict;
use warnings;
my $dash = "21 A10 A11 A12 A13"; #test data
my @dash = split /\s+/, $dash; #split into @dash array
shift @dash; #delete first array value
$_ = substr($_,1) for @dash; #for each item in array, remove the first character
print "@dash\n"; #prints: 10 11 12 13
I'm having some issues trying to combine two multi-row strings into one after performing regular expression manipulations on those strings. As an example, I start with data in this form:
TMS: xxxxxxx11110000
TDI: xxxxxxx00001111
TMS: xxxx00001111
TDI: xxxx11110000
To get it in the form I need, I search the file for the key word "TMS: ", extract just the data, use regular expressions to remove the "x's", reverse the data, and then place each bit on its own line and store it in a string. Resultant string would look like this:
0
0
0
0
1
1
1
1
I then search the file for "TDI: " and repeat that same process. The last step would be to concatenate the first string with the second string to get the following output (given the example above):
01
01
01
01
10
10
10
10
10
10
10
10
01
01
01
01
However, when I concatenate the two strings, what I'm getting as an output right now is
0
0
0
0
1
1
1
1
1
1
1
1
0
0
0
0
1
1
1
1
0
0
0
0
0
0
0
0
1
1
1
1
Is there a way to get the result I'm looking for without changing too much about my process? I've tried using the split command, chomp command, etc. without any luck.
It would be good to have a minimal example to see how you're approaching this problem. Additionally, there are a lot of things about your input file that are not clear. For example, are TMS and TDI always paired in the file, or do you have to check for that? Will you always take the next TDI instance to pair with the preceding TMS event, or can they be more disjointed? Does TMS always precede TDI, or can they be reversed?
One simple way to do this assuming that the data look just like you've indicated in your example, might be to read each line and store the data in one array for the TMS string and one array for the TDI string. If both arrays are full, then we have a pair to output, so output the pair and clear the arrays for the next events. Otherwise, read the next line to get the TDI data:
#!/usr/bin/env perl
use strict;
use warnings;
my (@first, @second);
while (my $elem = <DATA>) {
    ($elem =~ /^TMS/)
        ? (@first = read_string($elem))
        : (@second = read_string($elem));
    if (@second) {
        for my $index (0..$#first) {
            print "$first[$index]$second[$index]\n";
        }
        print "\n";
        @first = @second = ();
    }
}
sub read_string {
    my $string = shift;
    my @bits = grep {/\d/} split('', $string);
    return reverse(@bits);
}
__DATA__
TMS: xxxxxxx11110000
TDI: xxxxxxx00001111
TMS: xxxx00001111
TDI: xxxx11110000
Output from this would be:
01
01
01
01
10
10
10
10
10
10
10
10
01
01
01
01
What you want is a zip operation. Conveniently List::MoreUtils provides one for you.
use List::MoreUtils qw(zip6);
my @x = qw/a b c d/;
my @y = qw/1 2 3 4/;
# returns [a, 1], [b, 2], [c, 3], [d, 4]
my @z = zip6 @x, @y;
To get the input for zip either put your resultants into an array in the first place, or split your input string.
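If installing List::MoreUtils is not an option, the same pairing can be sketched in core Perl by splitting both multi-line strings and walking them by index. The strings $tms and $tdi below are made-up stand-ins for the one-bit-per-line results described in the question, and the sketch assumes both have the same number of lines.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two one-bit-per-line strings
my $tms = join "\n", qw(0 0 0 0 1 1 1 1);
my $tdi = join "\n", qw(1 1 1 1 0 0 0 0);

# Turn each multi-line string back into a list of bits
my @tms_bits = split /\n/, $tms;
my @tdi_bits = split /\n/, $tdi;

# Pair element i of one list with element i of the other
my @pairs = map { $tms_bits[$_] . $tdi_bits[$_] } 0 .. $#tms_bits;

print "$_\n" for @pairs;
```

Plain string concatenation of $tms and $tdi only appends one block of lines after the other, which is why the lists have to be paired element by element.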
hobbs's answer to Code Golf: Lasers was solving quite a different problem, but part of the solution was about how to "rotate" a multi-line string, and it could be useful here.
First, don't put each bit on its own line, just separate bits from different rows of input on different lines. Put the multi-line string into $_.
$_ = '0000111111110000
1111000000001111';
Now execute the following code:
$_ = do {
my $o;
$o .= "\n" while s/^./$o.=$&,""/meg;
$o };
(The substitution in hobbs's algorithm started with s/.$/.../. By using s/^./.../ instead, it becomes an algorithm for transposition rather than for rotation.)
Input:
$_ = '0000111111110000
1111000000001111';
Output:
01
01
01
01
10
10
10
10
10
10
10
10
01
01
01
01
This algorithm easily generalizes to any number of rows and columns in the input.
Input:
$_='ABCDE
12345
FGHIJ
67890';
Output:
A1F6
B2G7
C3H8
D4I9
E5J0
Input to my script is this file which contains data as below.
A food 75
B car 136
A car 69
A house 179
B food 75
C car 136
C food 85
For each distinct value of the second column, I want to print any line where the number in the third column is different.
Example output
C food 85
A car 69
Here is my Perl code.
#! /usr/local/bin/perl
use strict;
use warnings;
my %data = ();
open FILE, '<', 'data.txt' or die $!;
while ( <FILE> ) {
chomp;
$data{$1} = $2 while /\s*(\S+),(\S+)/g;
}
close FILE;
print $_, '-', $data{$_}, $/ for keys %data;
I am able to print the hash keys and values, but not able to get the desired output.
Any pointers on how to do that using Perl?
As far as I can tell from your question, you want a list of all the lines where there is an "odd one out" with the same item type and a different number in the third column from all the rest
I think this is what you need
It reads all the data into hash %data, so that $data{$type}{$n} is a (reference to an) array of all the data lines that use that object type and number
Then the hash is scanned again, looking for and printing all instances that have only a single line with the given type/number and where there are other values for the same object type (otherwise it would be the only entry and not an "odd one out")
use strict;
use warnings 'all';
use autodie;
my %data;
open my $fh, '<', 'data.txt';
while ( <$fh> ) {
    my ( $label, $type, $n) = split;
    push @{ $data{$type}{$n} }, $_;
}
for my $type ( keys %data ) {
    my $items = $data{$type};
    next unless keys %$items > 1;
    for my $n ( keys %$items ) {
        print $items->{$n}[0] if @{ $items->{$n} } == 1;
    }
}
output
C food 85
A car 69
Note that this may print multiple lines for a given object type if the input looks like, say
B car 22
A car 33
B car 136
C car 136
This has two "odd ones out" that appear only once for the given object type, so both B car 22 and A car 33 will be printed
Here are the pointers:
First, you need to remember lines somewhere before outputting them.
Second, you need to discard previously remembered line for the object according to the rules you set.
In your case, the rule is to discard when the number for the object differs from the previous remembered.
Both tasks can be accomplished with the hash.
For each line:
my ($letter, $object, $number)=split /\s+/, $line;
if (!defined($hash{$object}) || $hash{$object}[0]!=$number) {
$hash{$object}=[$number, $line];
}
Third, you need to output the hash:
for my $object(keys %hash) {
print $hash{$object}[1];
}
But there is the problem: a hash is an unordered structure, it won't return its keys in the order you put them into the hash.
So, the fourth: you need to add the ordering to your hash data, which can be accomplished like this:
$hash{$object}=[$number,$line,$.]; # $. is the row number over all the input files or STDIN, we use it for sorting
And in the output part you sort with the stored row number
(see sort for details about $a, $b variables):
for my $object(sort { $hash{$a}[2]<=>$hash{$b}[2] } keys %hash) {
print $hash{$object}[1];
}
Regarding the comments
I am certain that my code does not contain any errors.
If we look at the question before it was edited by some high rep users, it states:
[cite]
Now where if Numeric column(Third) has different value (Where in 2nd column matches) ...Then print only the mismatched number line. example..
A food 75
B car 136
A car 69
A house 179
B food 75
B car 136
C food 85
Example output (As number columns are not matching)
C food 85
[/cite]
I can only interpret "print only the mismatched number line" as: print the last line for the object where the number changed. That clearly matches the example the OP provided.
Even so, in my answer I addressed the possibility of misinterpretation, by stating that line omitting is done according to whatever rules the OP wants.
And below that I indicated what was the rule by that time in my opinion.
I think it well addressed the OP problem, because, after all, the OP wanted the pointers.
And now my answer is critiqued because it does not match the edited (long after and not by OP) requirements.
I disagree.
Regarding the whitespace: specifying /\s+/ for split is not an error here, despite some comments trying to assert that.
While I agree that " " is common for split, I disagree that there are a lot of cases where you must use " " instead of /\s+/.
/\s+/ is a regular expression, which is the conventional argument for split, while " " is a shorthand that actually masks the meaning.
With that in mind, I decided to use an explicit split /\s+/, $line in my example instead of split " ", $line or a bare split, specifically to show the inner workings of Perl.
I think that is important for anyone new to Perl.
It is perfectly ok to use /\s+/, but be careful if you expect to have leading whitespace in your data, consult perldoc -f split and decide whether /\s+/ suits your needs or not.
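The leading-whitespace difference mentioned above can be seen in a small sketch. The input line here is made up for illustration; it is one of the sample rows with two spaces added in front.

```perl
use strict;
use warnings;

my $line = "  A food 75";   # made-up line with leading whitespace

my @by_regex = split /\s+/, $line;   # keeps an empty leading field
my @by_space = split ' ',   $line;   # skips leading whitespace first

print scalar(@by_regex), "\n";   # 4 fields: '', 'A', 'food', '75'
print scalar(@by_space), "\n";   # 3 fields: 'A', 'food', '75'
```

With /\s+/ the letter ends up in the second field rather than the first, so a destructuring assignment such as my ($letter, $object, $number) would be shifted by one on such lines.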
I'm working on a Perl assignment that has three arrays, @array_A, @array_B and @array_C, with some values in them. I grep for the string "CAT" in array A and fetch its indices too:
my @index = grep { $#array_A[$_] =~ 'CAT' } 0..$#array_A;
print "Index : @index\n";
Output: Index : 2 5
I have to take this as an input and check the value of other two arrays at indices 2 and 5 and print it to a file.
Trick is the position of the string - "CAT" varies. (Index might be 5 , 7 and 9)
I'm not quite getting the logic here , looking for some help with the logic.
Here's an overly verbose example of how to extract the values you want as to show what's happening, while hopefully leaving some room for you to have to further investigate. Note that it's idiomatic Perl to use regex delimiters when using =~. eg: $name =~ /steve/.
use warnings;
use strict;
my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);
# note the difference in the next line... no # symbol...
my @indexes = grep { $a1[$_] =~ /CAT/ } 0..$#a1;
for my $index (@indexes){
    my $a2_value = $a2[$index];
    my $a3_value = $a3[$index];
    print "a1 index: $index\n" .
          "a2 value: $a2_value\n" .
          "a3 value: $a3_value\n" .
          "\n";
}
Output:
a1 index: 2
a2 value: c
a3 value: 3
a1 index: 5
a2 value: f
a3 value: 6
I'm writing a script in which I'm using a text file, where one column can contain two letters (A, B, C or D) separated by a ",". This column can also contain just one of those letters. I have to use both letters for further calculations in the rest of the script. This is a simplified example of my input file (here $variants):
C1 C2 C3 C4 C5 C6 ... C9
text 2 A D values and text in the other columns
text 4 B C values and text in the other columns
text 5 A B,D values and text in the other columns
So in line 3 of C4 there is a B and D. After C4 there are still a lot of columns, which cannot be changed since I need them in other parts of my script.
I have a second input file from which, based on the letters present in C3 and C4, some values are extracted. This is how this second input file looks like (here $frequency)
C1 C2 A a B b C c D d
text 1 0 1 0 0 0 0 0 0
text 2 1 0 5 4 0 0 0 0
text 3 0 0 0 0 10 11 3 6
text 4 1 0 9 4 0 2 0 0
text 5 5 3 0 0 6 7 4 0
This is how my output should look like:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
text 2 A D 1 0 0 0 empty
text 4 B C 9 4 0 2 empty
text 5 A B,D 5 3 0 0 4 0
So for line 1, there is A in C3, then the script extracts the values for A and a from $frequency and puts them in C5 and C6. The values from C4 are then put in C7 and C8 from the output file. Now in the 3rd line there is B,D in C4. So what the script needs to do now is putting the corresponding values from B and b in C7 and C8 and the values for D and d in C9 and C10.
The only thing where I have still problems in my script is in splitting up this C4 when there is a ','. The rest is working.
This is how the problematic part of my script looks like
while(<$variants>){
    next if /^\s*#/;
    next if /^\s*"/;
    chomp;
    my ($chr, $pos, $refall, @altall) = split /\t/; # How should I specify here the C4, as an array? So that I don't know
    my @ref_data = @{$frequency_data[$pos]}{$refall, lc($refall)};
    my @alt_data = @{$frequency_data[$pos]}{$altall, lc($altall)}; # this works for C3 ($refall), but not for C4 when there are two letters
    $pos = $#genes if $circular and $pos > $#genes; # adding annotation # this can be ignored here, since this line isn't part of my question
    print join("\t","$_ ", $genes[$pos] // q(), @ref_data, @alt_data), "\n"; # printing annotation
}
So could someone help me with splitting C4 by ',' while still using the information for extracting values from $variants?
I think the easiest would be treating columns 3 and 4 as lists from the get-go:
while(<$variants>){
    next if /^\s*#/;
    next if /^\s*"/;
    chomp;
    my ($chr, $pos, $refall_string, $altall_string, @other) = split /\t/;
    my @refall = split(",", $refall_string);
    my @altall = split(",", $altall_string);
    my @ref_data_all = (); # Treat C3 as array just in case...
    foreach my $refall (@refall) {
        push @ref_data_all, @{$frequency_data[$pos]}{ $refall, lc($refall) };
    }
    my @alt_data_all = ();
    foreach my $altall (@altall) {
        push @alt_data_all, @{$frequency_data[$pos]}{ $altall, lc($altall) };
    }
    $pos = $#genes if $circular and $pos > $#genes;
    print join("\t","$_ ", $genes[$pos] // q(),
        @ref_data_all, @alt_data_all), "\n";
}
I didn't test this, but the approach should be clear even if there are some minor bugs.
All you need is a couple of map calls.
If you write
map { $_, lc } split /,/, $refall
then you have split the field at any commas and duplicated each letter as upper case and lower case.
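As a small illustration of that expansion, here is a sketch that applies it to the "B,D" field and looks the resulting keys up in one frequency row. The hash %entry is a hypothetical stand-in for $frequency_data[5], filled in by hand from row 5 of the sample frequency table.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $frequency_data[5], copied from row 5 of the sample table
my %entry = (A => 5, a => 3, B => 0, b => 0, C => 6, c => 7, D => 4, d => 0);

my $altall = 'B,D';

# Split on commas, then emit each letter followed by its lower-case form
my @keys = map { $_, lc } split /,/, $altall;   # B b D d

# Look every key up in the frequency row
my @alt_data = map { $entry{$_} } @keys;        # 0 0 4 0

print "@keys\n@alt_data\n";
```

The four values match the last four columns of the third row in the expected output, and a field with a single letter simply yields two keys instead of four.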
This is the complete loop (tested).
while (<$variants>) {
    next if /^\s*#/;
    next if /^\s*"/;
    chomp;
    my ($chr, $pos, $refall, $altall) = split /\t/;
    my $entry = $frequency_data[$pos];
    my @ref_data = map { $entry->{$_} } map { $_, lc } split /,/, $refall;
    my @alt_data = map { $entry->{$_} } map { $_, lc } split /,/, $altall;
    $pos = $#genes if $circular and $pos > $#genes;
    print join("\t","$_ ", $genes[$pos] // q(), @ref_data, @alt_data), "\n";
}
Column A | Column B | Column C | Column D
35627799100 8 8 2
35627788000 60 34 45
35627799200 10 21 21
35627780000 60 5 8
Basically I have a file as shown above and would like to add up the contents of Column B, i.e. 8+60+10+60. To be frank, I'm not sure whether I need to remove the first line, since it is text, or whether I can use the split function and put the data in a hash, something along these lines:
my %hash = map {split/\s+/,$_,4} <$file>;
Thanks in advance for the help.
If you just want to sum up the second column, a hash is overkill. You can do something like this and calculate the sum directly while reading the file.
my $sum;
$sum += (split /\s+/, $_)[1] while <$file>;
Edit: If you have header rows or other rows with non-numeric values in column 2, then as the comments below indicate, you will run into problems. You can avoid this by trading split for a regular expression, like so:
my $sum = 0;
while (<STDIN>)
{
$sum += $1 if $_ =~ /^\S+\s+(\d+)/;
}
If it's possible that column 1 has no text (ie. the line starts with a single blank and the first non-blank represents the second column), then change the first part of the pattern from ^\S+ to ^\S*.
This is an example based on your data:
use strict;
use warnings;
my $sum_column_b = 0;
<DATA>; #drop header
while( my $line = <DATA>) {
$line =~ m/\s+(\d+)/; #regexpr to catch second column values
$sum_column_b += $1;
}
print $sum_column_b, "\n"; #<-- prints: 138
__DATA__
Column A | Column B | Column C | Column D
35627799100 8 8 2
35627788000 60 34 45
35627799200 10 21 21
35627780000 60 5 8