Rearrange print order of key-value data read from file in Perl - perl

My data looks like:
"latDD":33.732867,
"lonDD":-84.404525,
"callsign":"AAL1300 ",
"timeStamp":"2019-07-19T13:47:49.46Z",
"latDD":33.732867,
"lonDD":-84.404525,
"callsign":"AAL1300 ",
"timeStamp":"2019-07-19T13:47:50.186Z",
"latDD":33.781071,
"lonDD":-84.401736,
"callsign":"GT017 ",
"timeStamp":"2019-07-19T13:47:50.0Z",
"latDD":33.781071,
"lonDD":-84.401736,
"callsign":"GT017 ",
"timeStamp":"2019-07-19T13:47:50.0Z",
"latDD":33.732867,
"lonDD":-84.401664,
"callsign":"AAL1300 ",
I want to print callsign first, then print the latDD that is before callsign second, then print the lonDD that is before callsign third,
then the timeStamp that is after callsign fourth.
the finished data should look like:
"callsign":"AAL1300 ",
"latDD":33.732867,
"lonDD":-84.404525,
"timeStamp":"2019-07-19T13:47:49.46Z",
"callsign":"AAL1300 ",
"latDD":33.732867,
"lonDD":-84.404525,
"timeStamp":"2019-07-19T13:47:50.186Z",
"callsign":"GT017 ",
"latDD":33.781071,
"lonDD":-84.401736,
"timeStamp":"2019-07-19T13:47:50.0Z",
"callsign":"GT017 ",
"latDD":33.781071,
"lonDD":-84.401736,
"timeStamp":"2019-07-19T13:47:50.0Z",
I've tried shifting the print statements around in the code but since the data is in a specific order, it seems to only want to print the data out in the order that it is originally in.
use strict;
my $find3 = "latDD";
my $find4 = "lonDD";
my $find1 = '"callsign"';
my $find2 = "timeStamp";
open (NEW1, ">", "new1.txt" ) or die "could not open:$!";
open (FILE, "<", "test revam.txt") or die "could not open:$!";
while (<FILE>) {
print NEW1 if (/$find1/);
print NEW1 if (/$find2/);
print NEW1 if (/$find3/);
print NEW1 if (/$find4/);
}
close (FILE);
close (NEW1);

while (1) {
defined( my $line1 = <> ) or last;
defined( my $line2 = <> ) or die;
defined( my $line3 = <> ) or die;
defined( my $line4 = <> ) or die;
print $line3, $line1, $line2, $line4;
}

You need to keep track of one set of your data (one of each of line) and then when you have a last line in the set (in this case timestamp) print them all in the order you prefer.
#!/usr/bin/perl
use strict;
my %vals; # hash to hold a set of lines
# read data (I'm using __DATA__, you probably want a file or stdin)
for my $line (<DATA>) {
my ($key) = $line =~ /"(.+?)"/; # parse the key out of the line
$vals{$key} = $line; # save the line for this key
if ($key eq 'timeStamp') { # are we done with this set of lines?
printBlock(); # print them
%vals = (); # clear the hash for the next set of lines
}
}
printBlock();
sub printBlock {
print '-'x50 . "\n"; # OPTIONAL print a line between each set
print $vals{callsign};
print $vals{latDD};
print $vals{lonDD};
print $vals{timeStamp};
}
__DATA__
"latDD":33.732867,
"lonDD":-84.404525,
"callsign":"AAL1300 ",
"timeStamp":"2019-07-19T13:47:49.46Z",
"latDD":33.732867,
"lonDD":-84.404525,
"callsign":"AAL1300 ",
"timeStamp":"2019-07-19T13:47:50.186Z",
"latDD":33.781071,
"lonDD":-84.401736,
"callsign":"GT017 ",
"timeStamp":"2019-07-19T13:47:50.0Z",
"latDD":33.781071,
"lonDD":-84.401736,
"callsign":"GT017 ",
"timeStamp":"2019-07-19T13:47:50.0Z",
"latDD":33.732867,
"lonDD":-84.401664,
"callsign":"AAL1300 "

Related

Perl print match and no match strings from file1 match in file2 match with million records

This is what is required:
File1
rama
krishna
mahadev
bentick
william
with million records
File2
hello how are you
rama is king of ayadhya
krishna is king of dwarka
mahadev is great lord
this is what is you go
with million records
Output required
*strings matched-below
rama is king of ayadhya
krishna is king of dwarka
mahadev is great lord
***strings unmatched-below *****
-----------------------------
bentick
william
and so on
I tried this but did not work:
#!/usr/bin/perl -w
use strict;
if (scalar(#ARGV) != 2) {
printf STDERR "Usage: fgrep.pl smallfile bigfile\n";
exit(2);
}
my ($small_file, $big_file) = ($ARGV[0], $ARGV[1]);
my ($small_fp, $big_fp, %small_hash, $field);
open($small_fp, "<", $small_file) || die "Can't open $small_file: " . $!;
open($big_fp, "<", $big_file) || die "Can't open $big_file: " . $!;
# store contents of small file in a hash
while (<$small_fp>) {
chomp;
$small_hash{$_} = undef;
}
close($small_fp);
# loop through big file and find matches
while (<$big_fp>) {
# no need for chomp
$field = (split(/ /, $_))[1];
if (defined($field) && exists($small_hash{$field})) {
printf("%s", $_);
}
}
close($big_fp);
exit(0);
Correcting the small mistakes you made:
use strict;
use warnings;
if (scalar(#ARGV) != 2) {
printf STDERR "Usage: fgrep.pl smallfile bigfile\n";
exit(2);
}
my ($small_file, $big_file) = ($ARGV[0], $ARGV[1]);
my ($small_fp, $big_fp, %small_hash, $field);
open($small_fp, "<", $small_file) || die "Can't open $small_file: " . $!;
open($big_fp, "<", $big_file) || die "Can't open $big_file: " . $!;
# store contents of small file in a hash
while (<$small_fp>) {
s/\s+//g;
next unless $_;
$small_hash{$_} = undef;
}
close($small_fp);
# loop through big file and find matches
while (<$big_fp>) {
# no need for chomp
$field = (split(/ /, $_))[0];
if (defined($field) && exists($small_hash{$field})) {
printf("%s", $_);
$small_hash{$field}++;
}
}
close($big_fp);
print "\n ***unmatched Strings***\n";
foreach my $key (keys %small_hash) {
print "$key\n"
unless $small_hash{$key};
}
exit(0);
you left whitespace in the names.
the first word is (split(/ /, $_))[0] not (split(/ /, $_))[1]
you forgot to save which words you found and print them out

"if statement filter for Perl"

I'm using Perl to search a large file for specific Lat/Long data.
but I only want the Lat/Long data that belongs to the data that contains "GT017 " just before it
I've successfully grabbed the following data types
Call sign
Lat
Long
timestamp
But , I can not seem to limit the data acquired to only the Lat/Long/Timestamp that is tied to the Call sign GT017
I've tried if statements and sub routines but neither seem to allow me to only print the Lat/Long/timestamp that is specifically tied to the Call sign GT017
use strict;
#search for headers
my $find1= "GT017 ";
my $find2 = "timeStamp";
my $find3 = "latDD";
my $find4= "lonDD";
#above provides in response to my $find
#"callsign":"GT017 "
#"latDD":33.733200,
#"lonDD":-84.475667,
#"timeStamp":"2019-07-19T13:46:57.8Z",
open (NEW1, ">", "new1.txt" ) or die "could not open:$!";
open (FILE, "<", "test revab.txt") or die "could not open:$!";
while (<FILE>) {
#I want only a specific callsign's lat/long/timestamp printed
if ($find1 =~ /GT017/) {
print NEW1 if (/$find1/);
print NEW1 if (/$find2/);
print NEW1 if (/$find3/);
print NEW1 if (/$find4/);
}
}
close (FILE);
close (NEW1);
I get a large file with every Lat/Long/timestamp from the original file.
"latDD":33.733200,
"lonDD":-84.474266,
"timeStamp":"2019-07-19T13:46:58.22Z",
"latDD":33.733200,
"lonDD":-84.474266,
"timeStamp":"2019-07-19T13:46:58.8Z",
"latDD":33.708528,
"lonDD":-84.388506,
"timeStamp":"2019-07-19T13:46:58.33Z",
The below is what I want for each occurrence of "callsign":"GT017 ",
"callsign":"GT017 ",
"timeStamp":"2019-07-19T13:47:50.0Z",
"latDD":33.781071,
"lonDD":-84.401736,
Here is an example:
my $find = '"callsign":"GT017 "';
while (<FILE>) {
if (/\Q$find\E/) {
print NEW1 $_;
for (1..3) {
print NEW1 scalar <FILE>;
}
}
}
this will print the line with callsign and the following 3 lines.

match columns on different lines and sum

I have a csv with about 160,000 lines, it looks like this:
chr1,160,161,3,0.333333333333333,+
chr1,161,162,4,0.5,-
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,18,0.5,+
chr2,511,512,6,0.333333333333333,-
I would like to pair lines where column 1 is the same, column 3 matches column 2 and where column 6 is a '+' while on the other line it is a '-'. If this is true I would like to sum column 4 and column 5.
My desired out put would be
chr1,160,161,7,0.833333333333333,+
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,24,0.833333333333333,-
the best solution I can think of is to duplicate the file and then match columns between the file and it's duplicate with perl:
#!/usr/bin/perl
use strict;
use warnings;
open my $firstfile, '<', $ARGV[0] or die "$!";
open my $secondfile, '<', $ARGV[1] or die "$!";
my ($chr_a, $chr_b,$start,$end,$begin,$finish, $sum_a, $sum_b, $total_a,
$total_b,$sign_a,$sign_b);
while (<$firstfile>) {
my #col = split /,/;
$chr_a = $col[0];
$start = $col[1];
$end = $col[2];
$sum_a = $col[3];
$total_a = $col[4];
$sign_a = $col[5];
seek($secondfile,0,0);
while (<$secondfile>) {
my #seccol = split /,/;
$chr_b = $seccol[0];
$begin = $seccol[1];
$finish = $seccol[2];
$sum_b = $seccol[3];
$total_b = $seccol[4];
$sign_b = $seccol[5];
print join ("\t", $col[0], $col[1], $col[2], $col[3]+=$seccol[3],
$col[4]+=$seccol[4], $col[5]),
"\n" if ($chr_a eq $chr_b and $end==$begin and $sign_a ne $sign_b);
}
}
And that works fine, but ideally I'd like to be able to do this within the file itself without having to duplicate it, because I have many files and so I would like to run a script over all of them that is less time-consuming.
Thanks.
In the absence of a response to my comment, this program will do as you ask with the data you provide.
use strict;
use warnings;
my #last;
while (<DATA>) {
s/\s+\z//;
my #line = split /,/;
if (#last
and $last[0] eq $line[0]
and $last[2] eq $line[1]
and $last[5] eq '+' and $line[5] eq '-') {
$last[3] += $line[3];
$last[4] += $line[4];
print join(',', #last), "\n";
#last = ()
}
else {
print join(',', #last), "\n" if #last;
#last = #line;
}
}
print join(',', #last), "\n" if #last;
__DATA__
chr1,160,161,3,0.333333333333333,+
chr1,161,162,4,0.5,-
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,18,0.5,+
chr2,511,512,6,0.333333333333333,-
output
chr1,160,161,7,0.833333333333333,+
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,24,0.833333333333333,+

Open a file.txt and find the possible start and end positions of its genes

Hi I have a file and I would like to open it and find the start and end positions of its genes,also I have some extra imformations.The beginning of each gene is mapped by the following pattern. There is an 8 letter consensus known as the Shine-Dalgarno sequence (TAAGGAGG) followed by 4-10 bases downstream before the initiation codon (ATG). However there are variants of the Shine-Dalgarno sequence with the most common of which being [TA][AC]AGGA[GA][GA].The end of the gene is specified by the stop codon TAA, TAG and TGA. It must be taken care the stop codon is found after the correct Open.Reading Frame (ORF).
Now I have make a txt file with genome and I open it with this code,and the error begin when I go to read the genome and put start and end.Any help?Thanks a lot.:
#!/usr/bin/perl -w
use strict;
use warnings;
# Searching for motifs
# Ask the user for the filename of the file containing
my $proteinfilename = "yersinia_genome.fasta";
print "\nYou open the filename of the protein sequence data: yersinia_genome.fasta \n";
# Remove the newline from the protein filename
chomp $proteinfilename;
# open the file, or exit
unless (open(PROTEINFILE, $proteinfilename) )
{
print "Cannot open file \"$proteinfilename\"\n\n";
exit;
}
# Read the protein sequence data from the file, and store it
# into the array variable #protein
my #protein = <PROTEINFILE>;
# Close the file - we've read all the data into #protein now.
close PROTEINFILE;
# Put the protein sequence data into a single string, as it's easier
# to search for a motif in a string than in an array of
# lines (what if the motif occurs over a line break?)
my $protein = join( '', #protein);
# Remove whitespace.
$protein =~ s/\s//g;
# In a loop, ask the user for a motif, search for the motif,
# and report if it was found.
my $motif='TAAGGAGG';
do
{
print "\n Your motif is:$motif\n";
# Remove the newline at the end of $motif
chomp $motif;
# Look for the motif
if ( $protein =~ /$motif/ )
{
print "I found it!This is the motif: $motif in line $.. \n\n";
}
else
{
print "I couldn't find it.\n\n";
}
}
until ($motif =~ /TAAGGAGG/g);
my $reverse=reverse $motif;
print "Here is the reverse Motif: $reverse. \n\n";
#HERE STARTS THE PROBLEMS,I DONT KNOW WHERE I MAKE THE MISTAKES
#$genome=$motif;
#$genome = $_[0];
my $ORF = 0;
while (my $genome = $proteinfilename) {
chomp $genome;
print "processing $genome\n";
my $mrna = split(/\s+/, $genome);
while ($mrna =~ /ATG/g) {
# $start and $stop are 0-based indexes
my $start = pos($mrna) - 3; # back up to include the start sequence
# discard remnant if no stop sequence can be found
last unless $mrna=~ /TAA|TAG|TGA/g;
#m/^ATG(?:[ATGC]{3}){8,}?(?:TAA|TAG|TGA)/gm;
my $stop = pos($mrna);
my $genlength = $stop - $start;
my $genome = substr($mrna, $start, $genlength);
print "\t" . join(' ', $start+1, $stop, $genome, $genlength) . "\n";
# $ORF ++;
#print "$ORF\n";
}
}
exit;
Thanks,I have make it the solution is :
local $_=$protein;
while(/ATG/g){
my $start = pos()-3;
if(/T(?:TAA|TAG|TGA)/g){
my $stop = pos;
print $start, " " , $stop, " " ,$stop - $start, " " ,
substr ($_,$start,$stop - $start),$/;
}
}
while (my $genome = $proteinfilename) {
This creates an endless loop: you are copying the file name (not the $protein data) over and over.
The purpose of the while loop is unclear; it will never terminate.
Perhaps you simply mean
my ($genome) = $protein;
Here is a simplistic attempt at fixing the obvious problems in your code.
#!/usr/bin/perl -w
use strict;
use warnings;
my $proteinfilename = "yersinia_genome.fasta";
chomp $proteinfilename;
unless (open(PROTEINFILE, $proteinfilename) )
{
# die, don't print & exit
die "Cannot open file \"$proteinfilename\"\n";
}
# Avoid creating a potentially large temporary array
# Read directly into $protein instead
my $protein = join ('', <PROTEINFILE>);
close PROTEINFILE;
$protein =~ s/\s//g;
# As this is a static variable, no point in looping
my $motif='TAAGGAGG';
chomp $motif;
if ( $protein =~ /$motif/ )
{
print "I found it! This is the motif: $motif in line $.. \n\n";
}
else
{
print "I couldn't find it.\n\n";
}
my $reverse=reverse $motif;
print "Here is the reverse Motif: $reverse. \n\n";
# $ORF isn't used; removed
# Again, no point in writing a loop
# Also, $genome is a copy of the data, not the filename
my $genome = $protein;
# It was already chomped, so no need to do that again
my $mrna = split(/\s+/, $genome);
while ($mrna =~ /ATG/g) {
my $start = pos($mrna) - 3; # back up to include the start sequence
last unless $mrna=~ /TAA|TAG|TGA/g;
my $stop = pos($mrna);
my $genlength = $stop - $start;
my $genome = substr($mrna, $start, $genlength);
print "\t" . join(' ', $start+1, $stop, $genome, $genlength) . "\n";
}
exit;

Editing Artist Name in MP3 file Perl Script ID3V2 tag

I am trying to write a Perl code for editing the ID3V2 tags i.e artist name for start i can seek and read it but when i try to write new it just replace all the content of the file with the new artist name and corrupt the file i am new to this so kindly give me some direction as i am not suppose to use library below is the code i have tried.
#!/usr/bin/perl
use Fcntl qw( SEEK_SET );
my($fh, $filename, $byte_position, $byte_value);
$filename = $ARGV[0];
open(IN, "+>", $filename);
#open IN, $filename;
seek(IN,0,SEEK_SET);
read IN, $temp, 128;
print $temp;
print "\n";
seek(IN,14,SEEK_SET);
read IN, $temp, 16;
print "Artist is :" .$temp;
print "\n";
sysseek(IN,14,SEEK_SET);
#want to replace the Artist Name with new one.
syswrite (IN,$newArtist);
print "\n";
close(IN);
Your open call truncates the file. You need to open the file using +< rather than +> to open it for reading and writing without truncating it.
See perldoc -f open for more information.
Some thing that worked for me in case some one is looking for a solutions here.
my $myFile = shift or die "Usage: <file.mp3> <artist> <title>\n";
my $newArtist = shift or die "Usage: <file.mp3> <artist> <title>\n";
my $newTitle = shift or die "Usage:<file.mp3> <artist> <title> \n";
#Pack data for artist name and title. creating frame of size 4byte.
my $artist_len = length($newArtist);
my $title_len = length($newTitle);
my $artist = pack('N',$artist_len);
my $title = pack('N',$title_len);
print"\n New artist name will be >>". $newArtist."\n";
print"\n New title will be >> ". $newTitle ."\n";
my $encoding = ":raw :bytes";
open myMP3, "+<$encoding",$myFile or die "Error!Can NOT open the MP3 file.\n";
open OUT_FILE, "+>$encoding", 'new.mp3' or die "Error!Can NOT open the file to write.\n";
my $length = 512;
read (myMP3, my $buffer, $length);
chomp($buffer);
#reading the first 10 bytes of the ID3v2 Header
my $tagheader = substr($buffer, 0, 10);
my ($IDtag, $version, $revision, $flags, $size) = unpack('A3 h h h N4',$tagheader);
print OUT_FILE $tagheader;
my $len = 0;
my $ptr1 = 0;
my $ptr2 = 0;
while(1)
{
#reading 10 bytes for each frame and adding 10 bytes for next frame
$ptr1 += 10+$len;
$ptr2 = $ptr1+10;
#reading header
#4bytes frame ID,
#4bytes frame size
#2bytes flags
my $frameHeader = substr($buffer,$ptr1,10);
my($frameID, $frameSize, $flag) = unpack('A4 N4 h2',$frameHeader);
#TALB:album-name,TCON:content-type,TIT2:title,TPE1:Artist, TRCK:Track Number,TYER: year
if (($frameID eq 'TALB') || ($frameID eq 'TCON') || ($frameID eq 'TIT2') || ($frameID eq 'TPE1') || ($frameID eq 'TRCK') || ($frameID eq 'TYER'))
{
my $readFrame = substr($buffer, $ptr2, $frameSize);
my $myFrame = unpack('A*($frameSize)' ,$readFrame);
if ($frameID eq "TPE1")
{
print OUT_FILE $frameID;
print OUT_FILE $artist;
print OUT_FILE $flag;
print OUT_FILE $newArtist;
}
elsif ($frameID eq "TIT2")
{
print OUT_FILE $frameID;
print OUT_FILE $title;
print OUT_FILE $flag;
print OUT_FILE $newTitle;
}
else
{
print OUT_FILE $frameHeader;
print OUT_FILE $readFrame;
}
}
else
{
my $leng = length($buffer) - $ptr1;
my $music = substr($buffer, $ptr1, $leng);
print OUT_FILE $music;
seek(myMP3 , 10 , 0);
seek(OUT_FILE, 10 , 0);
while(<OUT_FILE>)
{
print myMP3 $_;
}
unlink 'new.mp3';
#close files
close(myMP3);
close(OUT_FILE);
die "\n Update success!\n";
}
$len=$frameSize;
}