Add new hash keys and then print in a new file - perl

Previously, I post a question to search for an answer to using regex to match specifics sequence identification (ID).
Now I´m looking for some recommendations to print the data that I looking for.
If you want to see the complete file, here's a GitHub link.
This script takes two files to work. The first file is something like this (this is only a part of the file):
AGY29650_2_NA netOGlyc-4.0.0.13 CARBOHYD 2 2 0.0804934 . .
AGY29650_2_NA netOGlyc-4.0.0.13 CARBOHYD 4 4 0.0925522 . .
AGY29650_2_NA netOGlyc-4.0.0.13 CARBOHYD 13 13 0.0250116 . .
AGY29650_2_NA netOGlyc-4.0.0.13 CARBOHYD 23 23 0.565981 . .
...
This file tells me when there is a value >= 0.5, this information is in the sixth column. When this happens my script takes the first column (this is an ID, to match in with the second file) and the fourth column (this is a position of a letter in the second file).
Here my second file (this is only a part):
>AGY29650.2|NA spike protein
MTYSVFPLMCLLTFIGANAKIVTLPGNDA...EEYDLEPHKIHVH*
Like I said previously, the script takes the ID in the first file to match with the ID in the second file when these are the same and then searches for the position (fourth column) in the contents of the data.
Here an example, in file one the fourth row is a positive value (>=0.5) and the position in the fourth column is 23.
Then the script searches for position 23 in the data contents of the second file, here position 23 is a letter T:
MTYSVFPLMCLLTFIGANAKIV T LP
When the script match with the letter, the looking for 2 letters right and 2 letters left to the position of interest:
IVTLP
In the previous post, thank the help of some people in Stack I could solve the problem because of a difference between ID in each file (difference like this: AGY29650_2_NA (file one) and AGY29650.2 (file two)).
Now I looking for help to obtain the output that I need to complete the script.
The script is incomplete because I couldn't found the way to print the output of interest, in this case, the 5 letters in the second file (one letter of the position that appears in file one) 2 letters right, and 2 left.
I have thousands of files like the one and two, now I need some help to complete the script with any idea that you recommend.
Here is the script:
use strict;
use warnings;
use Bio::SeqIO;
​
my $file = $ARGV[0];
my $in = $ARGV[1];
my %fastadata = ();
my #array_residues = ();
my $seqio_obj = Bio::SeqIO->new(-file => $in,
-format => "fasta" );
while (my $seq_obj = $seqio_obj->next_seq ) {
my $dd = $seq_obj->id;
my $ss = $seq_obj->seq;
###my $ee = $seq_obj->desc;
$fastadata{$dd} = "$ss";
}
​
my $thres = 0.5; ### Selection of values in column N°5 with the following condition: >=0.5
​
# Open file
open (F, $file) or die; ### open the file or end the analyze
while(my $one = <F>) {### readline => F
$one =~ s/\n//g;
$one =~ s/\r//g;
my #cols = split(/\s+/, $one); ### split columns
next unless (scalar (#cols) == 7); ### the line must have 7 columns to add to the array
my $val = $cols[5];
​
if ($val >= 0.5) {
my $position = $cols[3];
my $id_list = $cols[0];
$id_list =~ s/^\s*([^_]+)_([0-9]+)_([a-zA-Z0-9]+)/$1.$2|$3/;
if (exists($fastadata{$id_list})) {
my $new_seq = $fastadata{$id_list};
my $subresidues = substr($new_seq, $position -3, 6);
}
}
}
close F;
I´m thinking in add a push function to generate the new data and then print in a new file.
My expected output is to print the position of a positive value (>=0.5), in this case, T (position 23) and the 2 letters right and 2 letters left.
In this case, with the data example in GitHub (link above) the expected output is:
IVTLP
Any recommendation or help is welcome.
Thank!

Main problem seems to be that the line has 8 columns not 7 as assumed in the script. Another small issue is that the extracted substring should have 5 characters not 6 as assumed by the script. Here is a modified version of the loop that works for me:
open (F, $file) or die; ### open the file or end the analyze
while(my $one = <F>) {### readline => F
chomp $one;
my #cols = split(/\s+/, $one); ### split columns
next unless (scalar #cols) == 8; ### the line must have 8 columns to add to the array
my $val = $cols[5];
if ($val >= 0.5) {
my $position = $cols[3];
my $id_list = $cols[0];
$id_list =~ s/^\s*([^_]+)_([0-9]+)_([a-zA-Z0-9]+)/$1.$2|$3/;
if (exists($fastadata{$id_list})) {
my $new_seq = $fastadata{$id_list};
my $subresidues = substr($new_seq, $position -3, 5);
print $subresidues, "\n";
}
}
}

Related

Regular expression to print a string from a command outpout

I have written a function that uses regex and prints the required string from a command output.
The script works as expected. But it's does not support a dynamic output. currently, I use regex for "icmp" and "ok" and print the values. Now, type , destination and return code could change. There is a high chance that command doesn't return an output at all. How do I handle such scenarios ?
sub check_summary{
my ($self) = #_;
my $type = 0;
my $return_type = 0;
my $ipsla = $self->{'ssh_obj'}->exec('show ip sla');
foreach my $line( $ipsla) {
if ( $line =~ m/(icmp)/ ) {
$type = $1;
}
if ( $line =~ m/(OK)/ ) {
$return_type = $1;
}
}
INFO ($type,$return_type);
}
command Ouptut :
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago
Updated to some clarifications -- we need only the last line
As if often the case, you don't need a regex to parse the output as shown. You have space-separated fields and can just split the line and pick the elements you need.
We are told that the line of interest is the last line of the command output. Then we don't need the loop but can take the last element of the array with lines. It is still unclear how $ipsla contains the output -- as a multi-line string or perhaps as an arrayref. Since it is output of a command I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop
my #lines = split '\n', $ipsla; # if $ipsla is a multi-line string
# my #lines = #$ipsla; # if $ipsla is an arrayref
pop #lines while $line[-1] !~ /\S/; # remove possible empty lines at end
my ($type, $return_type) = (split ' ', $lines[-1])[1,4];
Here are some comments on the code. Let me know if more is needed.
We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on white space, by split ' ', $lines[-1], and take the 2nd and 5th element (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them.
Just in case the output ends with empty lines we first remove them, by doing pop #lines as long as the last line has no non-space characters, while $lines[-1] !~ /\S/. That is the same as
while ( $lines[-1] !~ /\S/ ) { pop #lines }
Original version, edited for clarifications. It is also a valid way to do what is needed.
I assume that data starts after the line with only dashes. Set a flag once that line is reached, process the line(s) if the flag is set. Given the rest of your code, the loop
my $data_start;
foreach (#lines)
{
if (not $data_start) {
$data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces
}
else {
my ($type, $return_type) = (split)[1,4];
print "type: $type, return code: $return_type\n";
}
}
This is a sketch until clarifications come. It also assumes that there are more lines than one.
I'm not sure of all possibilities of output from that command so my regular expression may need tweaking.
I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes.
use strict;
use warnings;
sub check_summary {
my ($self) = #_;
my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef.
my $ipsla = $self->{ssh_obj}->exec('show ip sla');
foreach my $line (#$ipsla) {
chomp $line; # Remove newlines from last field
if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) {
$results{Code} = $1; # Code prefixing ID
$results{ID} = $2;
$results{Type} = $3;
$results{Destination} = $4;
$results{Stats} = $5;
$results{Return_Code} = $6;
$results{Last_Run} = $7;
}
}
# Testing
use Data::Dumper;
print Dumper(\%results);
}
# Demonstrate
check_summary();
# Commented for testing
#INFO ($type,$return_type);
Worked on the submitted test line.
EDIT:
Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them.
Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern which is how we were able to capture all the columns with one expression. The matches go into numbered variables...
$1
$2

Remove duplicate lines on file by substring - preserve order (PERL)

i m trying to write a perl script to deal with some 3+ gb text files, that are structured like :
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212123x534534534534xx4545454x232323xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
I want to perform two operations :
Count the number of delimiters per line and compare it to a static number (ie 5), those lines that exceed said number should be output to a file.control.
Remove duplicates on the file by substring($line, 0, 7) - first 7 numbers, but i want to preserve order. I want the output of that in a file.output.
I have coded this in simple shell script (just bash), but it took too long to process, the same script calling on perl one liners was quicker, but i m interested in a way to do this purely in perl.
The code i have so far is :
open $file_hndl_ot_control, '>', $FILE_OT_CONTROL;
open $file_hndl_ot_out, '>', $FILE_OT_OUTPUT;
# INPUT.
open $file_hndl_in, '<', $FILE_IN;
while ($line_in = <$file_hndl_in>)
{
# Calculate n. of delimiters
my $delim_cur_line = $line_in =~ y/"$delimiter"//;
# print "$commas \n"
if ( $delim_cur_line != $delim_amnt_per_line )
{
print {$file_hndl_ot_control} "$line_in";
}
# Remove duplicates by substr(0,7) maintain order
my substr_in = substr $line_in, 0, 11;
print if not $lines{$substr_in}++;
}
And i want the file.output file to look like
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
and the file.control file to look like :
(assuming delimiter control number is 6)
4352342xx23232xxx345545x45454x23232xxx
Could someone assist me? Thank you.
Posting edits : Tried code
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
open(my $fh1, ">>", "outputcontrol.txt");
open(my $fh2, ">>", "outputoutput.txt");
while ( <> ) {
my $count = ($_ =~ y/x//);
print "$count \n";
# print $_;
if ( $count != $delim_amnt_per_line )
{
print fh1 $_;
}
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print fh2;
}
I dont know if i m supposed to post new code in here. But i tried the above, based on your example. What baffles me (i m still very new in perl) is that it doesnt output to either filehandle, but if i redirected from the command line just as you said, it worked perfect. The problem is that i need to output into 2 different files.
It looks like entries with the same seven-character prefix may appear anywhere in the file, so it's necessary to use a hash to keep track of which ones have already been encountered. With a 3GB text file this may result in your perl process running out of memory, in which case a different approach is necessary. Please give this a try and see if it comes in under the bar
The tr/// operator (the same as y///) doesn't accept variables for its character list, so I've used eval to create a subroutine delimiters() that will count the number of occurrences of $delimiter in $_
It's usually easiest to pass the input file as a parameter on the command line, and redirect the output as necessary. That way you can run your program on different files without editing the source, and that's how I've written this program. You should run it as
$ perl filter.pl my_input.file > my_output.file
use strict;
use warnings 'all';
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
eval "sub delimiters { tr/$delimiter// }";
while ( <> ) {
next if delimiters() == $delim_amnt_per_line;
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print;
}
output
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx

How to find number of numerical data for each and every line in a file

Please help me to count the numerical data in each line of a file,
and also to find the line length. The code has to written in Perl.
For example if I have a line such as:
INPUT:I was born on 24th october,1994.
Output:2
You could do something like this:
perl -ne 'BEGIN{my $x} $x += () = /[0-9]+/g; END{print($x . "\n")}' file
-n: causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk:
LINE:
while (<>) {
... # your program goes here
}
-e: may be used to enter one line of program;
() will make /[0-9]+/g be evaluated in list context (i.e. () = /[0-9]+/g will return an array containing the sequences of one or more digits found in the default input), while $x += will make the result be evaluated again in scalar context (i.e. $x += () = /[0-9]+/g will add the number of sequences of one or more digits found in the default input to $x); END{print($x . "\n") will print $x after the whole file has been processed.
% cat file
string 123 string 1 string string string
456 string
% perl -ne 'BEGIN{my $x} $x += () = /[0-9]+/g; END{print($x . "\n")}' file
3
%
I'd do something like this
#!/usr/bin/perl
use warnings;
use strict;
my $file = 'num.txt';
open my $fh, '<', $file or die "Failed to open $file: $!\n";
while (my $line = <$fh>){
chomp $line;
my #num = $line =~ /([0-9.]+)/g;
print "On this line --- " .scalar(#num) . "\n";
}
close ($fh);
The input file I tested --
This should say 1
Line 2 should say 2
I want this line to say 5 so I have added 4 other numbers like 0.02 -1 and 5.23
The output as tested ----
On this line --- 1
On this line --- 2
On this line --- 5
Using the regex match ([0-9.]+) will match ANY number and include any decimals (I guess really you could use just ([0-9]+) since you are only counting them and not using the actually number represented.)
Hope it helps.

Aligning file output with "\t"

I have an assignment that requires me to print out some sorted lists and delimit the fields by '\t'. I've finished the assignment but I cannot seem to get all the fields to line up with just the tab character. Some of the output is below, names that are over a certain length break the fields. How can I still use '\t' and get everything aligned by only that much space?
open(DOB, ">dob.txt") || die "cannot open $!";
# Output name and DOB, sorted by month
foreach my $key (sort {$month{$a} <=> $month{$b}} keys %month)
{
my #fullName = split(/ /, $namelist{$key});
print DOB "$fullName[1], $fullName[0]\t$doblist{$key}\n";
}
close(DOB);
Current output:
Santiago, Jose 1/5/58
Pinhead, Zippy 1/1/67
Neal, Jesse 2/3/36
Gutierrez, Paco 2/28/53
Sailor, Popeye 3/19/35
Corder, Norma 3/28/45
Kirstin, Lesley 4/22/62
Fardbarkle, Fred 4/12/23
You need to know how many spaces are equivalent to a tab. Then you can work out how many tabs are covered by each entry.
If tabs take 4 spaces then the following code works:
$TAB_SPACE = 4;
$NUM_TABS = 4;
foreach my $key (sort {$month{$a} <=> $month{$b}} keys %month) {
my #fullName = split(/ /, $namelist{$key});
my $name = "$fullName[1], $fullName[0]";
# This rounds down, but that just means you need a partial tab
my $covered_tabs = int(length($name) / $TAB_SPACE);
print $name . ("\t" x ($NUM_TABS - $covered_tabs)) . $doblist{$key}\n";
}
You need to know how many tabs to pad out to, but you could work that out in a very similar way to actually printing the lines.

calculating velocity from massive simulation data

I have simulation data for the velocity of water molecules. The format of the data is as below. I would like to describe the format of the data for clarity purposes, and it easily would lead to what I want to calculate.
A water molecule is made of three atoms: Oxygen(O) and two Hydrogen (H). Here I would name them O, H1, and H2.
The data below starts with line title 0 and the number 4335, saying it contains 4335 atoms (4335/3 = 1445 water molecules).
The first three numbers starting from the third row ( 0.0923365 0.0341984 -0.1248516 ) representing velocity for oxygen (O) atom at three Cartesian directions Ox, Oy, Oz. The next three numbers, in the same row representing velocities for hydrogen (H1) ==> H1x, H1y, H1z. And finally the first three numbers in fourth row representing velocities for hydrogen (H2) ==> H2x,H2y,H2z. finally, the following three numbers in the same fourth row representing velocities for oxygen atom.
These sequence is goes on for all 4335 atoms in 2170 lines including the top two lines in the data file and it repeats for the following section starting from title 1.
title 0
4335 2.0001000e+04
0.0923365 0.0341984 -0.1248516 -0.8946258 1.6688854 0.8259304
0.2890579 0.8051153 -1.5612963 0.0625492 -0.1361579 0.2869132
0.2343408 -0.0665305 1.0745378 -0.8375892 0.6953992 0.5149021
-0.1628550 0.0131844 0.0688080 0.2429340 0.2168210 -0.0289806
-0.3677613 0.2054004 -0.1511643 -0.3487551 -0.1454157 0.0801884
-0.9039297 -0.0682939 -0.2337404 -0.5605327 -0.0369157 0.2243892
-0.3100274 -0.2673132 -0.2093299 0.1975043 -0.4572202 -0.8410826
-0.6995287 -0.4123909 0.0649209 -0.1910519 0.2289656 0.2443295
-0.0279093 0.5790939 -0.0104249 -1.1961776 -0.5387340 0.1445187
-0.3188485 0.3789352 -0.0112114 0.7831523 0.6043882 -0.7131590
-0.7214440 -0.5358508 -0.3035673 -0.1549275 -0.1402387 -0.0101964
-0.2027608 1.5107149 0.2963312 -1.5104872 -0.1554981 -1.3323215
0.1097982 -0.1553742 0.3803437 0.0816858 0.0265007 0.4215823
0.1157368 0.2100116 0.4712551 0.1799426 -0.1260255 -0.2131755
0.1811777 -0.9442581 -0.6036636 0.9681703 -0.1523646 -0.3502441
0.0976771 0.0019619 -0.1832204 -0.0055989 0.2701100 -0.4416720
0.8496723 0.4070951 -0.0819204 0.1156806 -0.1619873 -0.0016126
-0.4051959 0.4263505 -0.9460036 0.4412067 0.1002270 0.5864405
-0.3831136 0.3240860 -0.0005143 -0.5667163 0.2618876 0.0103317
-0.6442209 0.3965833 -0.0778050 -0.2404238 -0.1339887 -0.1662417
0.3421198 0.7480828 -1.8316993 -0.4454920 -0.0499657 -0.1951254
-0.2895359 -0.1934811 -0.2674928 0.1255802 1.3522828 -0.2829485
-0.4129106 -0.6842645 -1.0147657 -0.1278501 -0.0597648 -0.1478294
-0.2519974 0.0665314 -0.0690079 -0.0480210 -0.1179547 -0.2091919
-0.1942484 0.2583650 -0.0734658 -0.1216313 0.5158040 -0.0676843
-0.3063602 0.8148463 -0.1959571 -0.1009838 -0.3394633 -0.0866587
.
. (goes on until line 2170)
.
0.1028815 -0.0844088 -0.2156557 -0.1698745 -0.2018967 -0.3863209
0.1793070 -0.1005802 0.1800752 -0.1404713 0.2216020 0.2236271
0.5192825 -0.7398186 0.0418758 0.0347715 -0.3457840 -0.1300237
-0.3089482 1.1125441 -0.4020403 0.2739744 -0.9062766 0.0012294
0.1498538 0.0883857 -0.0094638 0.0963565 -1.1027019 0.0115313
-0.0432824 0.3330713 0.0304943
title 1
4335 2.0002000e+04
-0.2082078 0.1774843 -0.1023302 -0.1100437 0.5973607 1.0627041
-0.2216015 0.0448885 -0.8415924 0.1691296 0.6008261 -0.0373434
0.9387534 -0.3642305 0.6756270 -0.6000357 0.6632088 1.0567899
-0.3234407 -0.1781680 -0.1936070 -0.4799916 -0.1522612 -0.2347461
0.1045985 0.1999704 -0.1482928 -0.0439331 0.0413923 0.1605458
0.3403952 -0.2012104 0.4851457 -0.9665228 0.2202362 0.0046218
.
. (goes on until line 2170)
.
What I want to calculate is the resultant velocity for each molecule and I would like to do this using Perl. The algorithm goes in this way.
First store the velocities for oxygen (O) and hydrogens (H1 & H2) in Ox,Oy,Oz, H1x,H1y,H1z and H2x,H2y,H2z respectively.
Next we define:
velocity_x = Ox + Hx + Hx
velocity_y = Oy + Hy + Hy
velocity_z = Oz + Hz + Hz
Finally calculate
resultant_velocity = sqrt(velocity_x**2 + velocity_y**2 + velocity_z**2)
and store the "resultant_velocity" into new file (the file should be title_0.dat). And the program shall calculate the velocities starting from title 1 until title 200 in the file.
I am a newbie at Perl, but I would like to do this operation in Perl since I find that it is very interesting. I can write simple "read and write" operations in Perl but found no idea how to split and assign the values to the variables and carryout the calculation though the calculation is high school standard.
#!/usr/bin/perl -w
$data_file="malto.dat";
open(DAT, $data_file) || die("Could not open file!");
#raw_data=<DAT>;
close(DAT);
while(<#raw_data>){
#columns=split /\s+/,$_;
if($columns[0]=~ m/ATOM/){
print "$columns[5], $columns[6], $columns[7]\n";
}
}
I would like to get some guidance from experts so that I can enhance my understanding of Perl while working on the code.
Appreciate any help.
Regards
Perhaps the following will assist you:
use strict;
use warnings;
use Math::Complex;
my $dataFile = 'malto.dat';
{
local $/ = 'title ';
open my $fh, '<', $dataFile or die $!;
while (<$fh>) {
chomp;
my #data = split or next;
my $titleNum = 'Title ' . shift #data;
my $atom = shift(#data) . ' ' . shift #data;
my $resultantVel = calcResultantVel( \#data );
print $titleNum, "\n";
print $atom, "\n";
print 'ResultantVel: ' . $resultantVel, "\n\n";
}
close $fh;
}
sub calcResultantVel {
my ($dataRef) = #_;
my ($velocity_x, $velocity_y, $velocity_z);
while ( my #nums = splice( #$dataRef, 0, 9 ) ) {
$velocity_x += $nums[0] + $nums[3] + $nums[6];
$velocity_y += $nums[1] + $nums[4] + $nums[7];
$velocity_z += $nums[2] + $nums[5] + $nums[8];
}
return sqrt( $velocity_x**2 + $velocity_y**2 + $velocity_z**2 );
}
The word and space title is used as the record separator, so each read takes in a chunk of data that's delimited by title. The chomp removes the record separator, and then the record is split on whitespace.
The zeroth element is the title number, and that's shifted off #data. The first and second elements of #data are the atom count, and they're shifted off, too. The remaining array elements are the Cartesian directions, and a reference to that array is send to the subroutine calcResultantVel.
The subroutine takes a chunk of nine elements at a time: three for O atom, three for the first H atom, and three for the second H atom, and a running sum is kept based upon the definition you've provided. Finally, the resultant velocity is returned.
Here's some sample output:
Title 0
4335 2.0001000e+04
ResultantVel: 13.2945751170603
Title 1
4335 2.0001000e+04
ResultantVel: 12.7696611061461
You can visually verify that it's working correctly. Since you "...can write simple 'read and write' operations in Perl...," the next step is to have it write the desired results to a file.
Hope this helps!
Here's my advice: break the job down into small components, and write a method for each meaningful part of the work. To wit:
use strict;
use warnings;
main(#ARGV); # Pass data file name on command line. Don't hard-code it.
sub main {
my $data_f = shift;
open(my $data_h, '<', $data_f) or die "$!: $data_f";
while (my $section = get_section($data_h)){
# Also write methods that can be called here to make
# desired computations, print output, etc.
}
}
sub get_section {
# Takes a file handle.
# Returns a hash reference containing all of the data
# for an entire section of the file.
my $h = shift;
return if eof($h);
chomp (my $title = <$h>);
my ($n_atoms) = <$h> =~ /^(\d+)/;
return {
'title' => $title,
'n_atoms' => $n_atoms,
'molecules' => get_molecules($h, $n_atoms / 3),
};
}
sub get_molecules {
my #molecules;
return \#molecules;
}
I have not written the get_molecules() method. It takes a file handle and an integer (N of molecules). It could return a reference to an array-of-arrays or maybe an array-of-hashes, with each inner array/hash holding the info for a single molecule.
Thanks for your help and guide. I have tried to modify your code as below. It works at least for my need.
#!/usr/bin/perl
###############
#use strict;
#use warnings;
use Math::Complex;
open OUTPUT, '>', "velocityOnly.dat" or die "Can't create filehandle: $!";
my $dataFile = 'velF1F2.vel';
{
local $/ = 'title ';
open my $FH, '<', $dataFile or die $!;
while (<$FH>) {
chomp;
my #data = split or next;
my $titleNum = 'Title ' . shift(#data);
my $atom = shift(#data) . ' ' . shift(#data);
#my $resultantVel = calcResultantVel( \#data );
#print OUTPUT "$titleNum", "\n";
print "$titleNum", "\n";
for my $i (1..1445)
{
$j=(9*($i-1));
$velocity_x = $data[($j+0)] + $data[($j+3)] + $data[($j+6)];
$velocity_y = $data[($j+1)] + $data[($j+4)] + $data[($j+7)];
$velocity_z = $data[($j+2)] + $data[($j+5)] + $data[($j+8)];
$velo = sprintf '%.3f',sqrt( $velocity_x**2 + $velocity_y**2 + $velocity_z**2 );
chomp $velo;
print "$velo","\n";
print OUTPUT "$velo\n";
}
#print 'ResultantVel: ' . $resultantVel, "\n\n";
}
close $FH;
}
But I would like to extend further by adding some other functionality for doing some complex calculations. The code
Before that, I need some guide on making the below code into subroutine. I am bit lost here. Your CODE actually add all the X, Y and Z and finally find the velocity. But what I want is not that. Each 9 values subsequently represent coordinate for a water molecule which contain three atoms.
(The number 1445 is number of molecules. Each molecule contain three atoms and each atom has three coordinates.So for a water molecule has 9 Cartesian coordinates.)
the i here represent number of water molecule
for my $i (1..1445)
{
$j=(9*($i-1));
$velocity_x = $data[($j+0)] + $data[($j+3)] + $data[($j+6)];
$velocity_y = $data[($j+1)] + $data[($j+4)] + $data[($j+7)];
$velocity_z = $data[($j+2)] + $data[($j+5)] + $data[($j+8)];
$velo = sprintf '%.3f',sqrt( $velocity_x**2 + $velocity_y**2 + $velocity_z**2 );
chomp $velo;
print "$velo","\n";
print OUTPUT "$velo\n";
}