Duplicate values in column

Duplicate values in column - perl

I have a original file which has following columns,
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,C,Sell,0.25,2000
02-May-2018,JPM,Sell,0.25,3000
02-May-2018,WFC,Sell,0.25,5000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,GOOG,Sell,0.25,8000
02-May-2018,GOOG,Sell,0.25,9000
02-May-2018,C,Sell,0.25,2000
02-May-2018,AAPL,Sell,0.25,3000
I am trying to print this original line if I see value in the second column more then 2 times.. for example, if I see AAPL more then 2 times desired result should print
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
So Far, I have written the following which prints results multiple times which is wrong.. can you please help on what I am doing wrong?
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
%count = ();
#symbol = ();
while ($line = <FILE>)
{
chomp $line;
(#data) = split(/,/,$line);
$count{$data[1]}++;
#keys = sort {$count{$a} cmp $count{$b}} keys %count;
for my $key (#keys)
{
if ( $count{$key} > 2 )
{
print "$line\n";
}
}
}

I'd do it something like this - store lines you've seen in a 'buffer' and print them out again if the condition is hit (before continuing to print as you go):
#!/usr/bin/env perl
use strict;
use warnings;
my %buffer;
my %count_of;
while ( my $line = <> ) {
my ( $date, $ticker, #values ) = split /,/, $line;
#increment the count
$count_of{$ticker}++;
if ( $count_of{$ticker} < 3 ) {
#count limit not hit, so stash the current line in the buffer.
$buffer{$ticker} .= $line;
next;
}
#print the buffer if the count has been hit
if ( $count_of{$ticker} == 3 ) {
print $buffer{$ticker};
}
#only gets to here once the limit is hit, so just print normally.
print $line;
}
With your input data, this outputs:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000

Simple answer:
push #{ $lines{(split",")[1]} }, $_ while <>;
print #{ $lines{$_} } for grep #{ $lines{$_} } > 2, sort keys %lines;
perl program.pl inputfile > outputfile

You need to read the input file twice, because you don't know the final counts until you get to the end of the file
use strict;
use warnings 'all';
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;
my %counts;
{
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
while ( <$fh> ) {
my #fields = split /,/;
++$counts{$fields[1]};
}
}
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
open my $out_fh, '>', $TMPFILE1 or die "Could not open $TMPFILE1: $!";
while ( <$fh> ) {
my #fields = split /,/;
print $out_fh $_ if $counts{$fields[1]} > 2;
}
output
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000

This should work:
use strict;
use warnings;
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
my %data;
while ( my $line = <FILE> ) {
chomp $line;
my #line = split /,/, $line;
push(#{$data{$line[1]}}, $line);
}
foreach my $key (keys %data) {
if(#{$data{$key}} > 2) {
print "$_\n" foreach #{$data{$key}};
}
}

Related

match string in file and replacement with other string

I have a file containing lines as follows
#comments abc
#comments xyz
SerialPort=100
Baudrate=9600
Parity=2
Databits=8
Stopbits=1
also I have array #in = ( SerialPort=500 , Baudrate=300, parity=0, Databits=16, Stopbits=0 ),these array elements read from browser, I am trying to write perl script to match "SerialPort" in file and replace SerialPort=100 in file with SerialPort=500 of array, I want match all other elments in loop I tried code not working please improve the code which is below, I think regular expression is not working and each time if condition to match and substitution resulting false, and also when I look at file after execution of script file consists of duplicates.
#!/usr/bin/perl
$old_file = "/home/work/conf";
open (fd_old, "<", $old_file) || die "cant open file";
#read_file = <fd_old>;
close (fd_old);
#temp = ();
$flag = 0;
foreach $infile ( #read_file )
{
foreach $rr ( #in )
{
($key, $value ) = split(/=/, $rr );
if ( $infile =~ s/\b$key\b(.*)/$rr/ )
{
push ( #temp , $infile );
$flag = 0;
}
else
{
$flag = 1;
}
}
if ( $flag )
{
push (#temp, $infile );
}
}
open ( fd, ">", $old_file ) || die "can't open";
print fd #temp;
close(fd);

Perl 101: use strict; use warnings;.
Prefix variable names with $.
$old_file is undef when you try to open it.
And spell falg correctly, which if you'd turned on those options, you'd have been told about.
Also: When asking questions on SO, it's helpful if you point out what's not working.

#Maruti: Never write a perl program without use strict; and use warnings;. I have modified your code. Just have a look.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my $old_file = "/home/work/conf";
open (my $fh, "<", $old_file) || die "cant open file";
my #read_file = <$fh>;
close ($fh);
my #temp = ();
my #in = ('SerialPort=500' , 'Baudrate=300', 'parity=0', 'Databits=16', 'Stopbits=0');
foreach my $infile ( #read_file )
{
foreach my $rr ( #in )
{
my ($key, $value) = split(/=/, $rr );
if ( $infile =~ m/\b$key\b\=\d+/ && $infile =~ /#.*/)
{
$infile =~ s/\b$key\b\=\d+/$rr/ig;
}
}
push (#temp, $infile );
}
open (my $out, ">", $old_file ) || die "can't open";
foreach my $res(#temp)
{
print $out $res;
}
close($out);

how to count the number of specific characters through each line from file?

I'm trying to count the number of 'N's in a FASTA file which is:
>Header
AGGTTGGNNNTNNGNNTNGN
>Header2
AGNNNNNNNGNNGNNGNNGN
so in the end I want to get the count of number of 'N's and each header is a read so I want to make a histogram so I would at the end output something like this:
# of N's # of Reads
0 300
1 240
etc...
so there are 300 sequences or reads that have 0 number of 'N's
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
my $line;
my $sequence;
my $length;
my $char_N_count = 0;
my #array;
my $count = 0;
if (!defined ($output_file)) {
die "USAGE: Input FASTA file\n";
}
open (IFH, "$file") or die "Cannot open input file$!\n";
open (OFH, ">$output_file") or die "Cannot open output file $!\n";
while($line = <IFH>) {
chomp $line;
next if $line =~ /^>/;
$sequence = $line;
#array = split ('', $sequence);
foreach my $element (#array) {
if ($element eq 'N') {
$char_N_count++;
}
}
print "$char_N_count\n";
}

Try this. I changed a few things like using scalar file handles. There are many ways to do this in Perl, so some people will have other ideas. In this case I used an array which may have gaps in it - another option is to store results in a hash and key by the count.
Edit: Just realised I'm not using $output_file, because I have no idea what you want to do with it :) Just change the 'print' at the end to 'print $out_fh' if your intent is to write to it.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
if (!defined ($output_file)) {
die "USAGE: $0 <input_file> <output_file>\n";
}
open (my $in_fh, '<', $file) or die "Cannot open input file '$file': $!\n";
open (my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
my #results = ();
while (my $line = <$in_fh>) {
next if $line =~ /^>/;
my $num_n = ($line =~ tr/N//);
$results[$num_n]++;
}
print "# of N's\t# of Reads\n";
for (my $i = 0; $i < scalar(#results) ; $i++) {
unless (defined($results[$i])) {
$results[$i] = 0;
# another option is to 'next' if you don't want to show the zero totals
}
print "$i\t\t$results[$i]\n";
}
close($in_fh);
close($out_fh);
exit;

Extracting specific multiple line of records that is pipe delimited in perl

I have a file that looks like
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
I want to separate the records by country. I have stored each line into array variable #fields
my #fields = split(/\|/, $_ );
making $fields[3] as my basis for sorting it. I wanted it to separate into 2 output text files
OUTPUT TEXT FILE 1:
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
OUTPUT TEXT FILE 2
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
Putting all that is from JPN to output text 1 & non-JPN country to output text file 2
here's the code that what trying to work out
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my #fields;
my $tmp_var;
my $count;
;
my ($line, $i);
my $filename = 'data.txt';
open(my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open(OUTPUTA, ">", 'JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
open(OUTPUTB, ">", 'Non-JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
my $fh;
while (<$input_fh>) {
chomp;
my #fields = split /\|/;
if ($fields[0] eq 'NAME') {
for ($i=1; $i < #fields; $i++) {
if ($fields[3] eq 'JPN') {
$fh = $_;
print OUTPUTA $fh;
}
else {
$fh = $_;
print OUTPUTB $fh;
}
}
}
}
close(OUTPUTA);
close(OUTPUTB)
Still has no luck on it :(

Here is the way I think ikegami was saying, but I've never tried this before (although it gave the correct results).
#!/usr/bin/perl
use strict;
use warnings;
open my $jpn_fh, ">", 'o33.txt' or die $!;
open my $other_fh, ">", 'o44.txt' or die $!;
my $fh;
while (<DATA>) {
if (/^NAME/) {
if (/JPN$/) {
$fh = $jpn_fh;
}
else {
$fh = $other_fh;
}
}
print $fh $_;
}
close $jpn_fh or die $!;
close $other_fh or die $!;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

You didn't say what you needed help with, so I'm assuming it's coming up with an algorithm. Here's a good one:
Open the file to read.
Open the file for the JPN entries.
Open the file for the non-JPN entries.
While not eof,
Read a line.
Parse the line.
If it's the first line of a record,
If the person's country is JPN,
Set current file handle to the file handle for JPN entries.
Else,
Set current file handle to the file handle for non-JPN entries.
Print the line to the current file handle.
my $jpn_qfn = '...';
my $other_qfn = '...';
open(my $jpn_fh, '>', $jpn_qfn)
or die("Can't create $jpn_qfn: $!\n");
open(my $other_fh, '>', $other_qfn)
or die("Can't create $other_qfn: $!\n");
my $fh;
while (<>) {
chomp;
my #fields = split /\|/;
if ($fields[0] eq 'NAME') {
$fh = $fields[3] eq 'JPN' ? $jpn_fh : $other_fh;
}
say $fh $_;
}

#!/usr/bin/env perl
use 5.012;
use autodie;
use strict;
use warnings;
# store per country output filehandles
my %output;
# since this is just an example, read from __DATA__ section
while (my $line = <DATA>) {
# split the fields
my #cells = split /[|]/, $line;
# if first field is NAME, this is a new record
if ($cells[0] eq 'NAME') {
# get the country code, strip trailing whitespace
(my $country = $cells[3]) =~ s/\s+\z//;
# if we haven't created and output file for this
# country, yet, do so
unless (defined $output{$country}) {
open my $fh, '>', "$country.out";
$output{$country} = $fh;
}
my $out = $output{$country};
# output this and the next two lines to
# country specific output file
print $out $line, scalar <DATA>, scalar <DATA>;
}
}
close $_ for values %output;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

Thanks for your Help heaps
I was able to solved this problem in perl,
many thanks
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my #fields;
my $tmp_var;
my ($rec_type, $country);
my $filename = 'data.txt';
open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open my $OUTPUTA, ">", 'o33.txt' or die $!;
open my $OUTPUTB, ">", 'o44.txt' or die $!;
my $Combline;
while (<$input_fh>) {
$_ = _trim($_);
#fields = split (/\|/, $_);
$rec_type = $fields[0];
$country = $fields[3];
if ($rec_type eq 'NAME') {
if ($country eq 'JPN') {
*Combline = $OUTPUTA;
}
else {
*Combline = $OUTPUTB;
}
}
print Combline;
}
close $OUTPUTA or die $!;
close $OUTPUTB or die $!;
sub _trim {
my $word = shift;
if ( $word ) {
$word =~ s/\s*\|/\|/g; #remove trailing spaces
$word =~ s/"//g; #remove double quotes
}
return $word;
}

Using perl, how do I search a text file for _NN (at the end of a word) and print the word in front?

This gives the whole line:
#!/usr/bin/perl
$file = 'output.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /_NN/;
}
close(txt);

#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
binmode(STDOUT, ":utf8") || die;
my $file = "output.txt";
open(TEXT, "< :utf8", $file) || die "Can't open $file: $!";
while(<TEXT>) {
print "$1\n" while /(\w+)_NN\b/g;
}
close(TEXT) || die "Can't close $file: $!";

Your answer script reads a bit awkwardly, and has a couple of potential errors. I'd rewrite the main logic loop like so:
foreach my $line (grep { /expend_VB/ } #sentences) {
my #nouns = grep { /_NN/ } split /\s+/, $line;
foreach my $word (#nouns) {
$word =~ s/_NN//;
print "$word\n";
}
print "$line\n" if scalar(#nouns);
}
You need to put the my declaration inside the loop - otherwise it will persist longer than you want it to, and could conceivably cause problems later.
foreach is a more common perl idiom for iterating over a list.

print "$1" if $line =~ /(\S+)_NN/;

#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my #sentences = <$tag_corpus>; # This breaks up each line into list
my #words;
for (my $i=0; $i <= #sentences; $i++) {
if ( defined( $sentences[$i] ) and $sentences[$i] =~ /($search_key)_VB.*/i) {
#words = split /\s/,$sentences[$i]; ## \s is a whitespace
for (my $j=0; $j <= #words; $j++) {
#FILTER if word is noun:
if ( defined( $words[$j] ) and $words[$j] =~ /_NN/) {
#PRINT word and sentence:
print "**",split(/_\S+/,$words[$j]),"**", "\n";
print split(/_\S+/,$sentences[$i]), "\n"
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";

Multiple Sorts within a text file

Looking for a non-case sensitive search using perl, So if a "!" is detected at the start of the line, a new sort begins (only on the section).
[test file]
! Sort Section
!
a
g
r
e
! New Sort Section
1
2
d
3
h
becomes,
[test file]
! Sort Section
!
a
e
g
r
! New Sort Section
1
2
3
d
h

Here's one way to do it:
use strict;
use warnings;
my $filename = shift or die 'filename!';
my #sections;
my $current;
# input
open my $fh, '<', $filename or die "open $filename: $!";
while ( <$fh> ) {
if ( m/^!/ ) {
$current = [ $_ ];
push #sections, $current;
}
else {
push #$current, $_;
}
}
close $fh;
# output
for ( #sections ) {
print shift #$_; # print first line
print sort #$_; # print rest
}

Another one, using an output file. More importantly, not loading an entire file into memory:
use strict;
use warnings;
sub output {
my( $lines, $fh ) = #_;
return unless #$lines;
print $fh shift #$lines; # print first line
print $fh sort { lc $a cmp lc $b } #$lines; # print rest
return;
}
# ==== main ============================================================
my $filename = shift or die 'filename!';
my $outfn = "$filename.out";
die "output file $outfn already exists, aborting\n" if -e $outfn;
# prereqs okay, set up input, output and sort buffer
open my $fh, '<', $filename or die "open $filename: $!";
open my $fhout, '>', $outfn or die "open $outfn: $!";
my $current = [];
# process data
while ( <$fh> ) {
if ( m/^!/ ) {
output $current, $fhout;
$current = [ $_ ];
}
else {
push #$current, $_;
}
}
output $current, $fhout;
close $fhout;
close $fh;

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Duplicate values in column - perl

Simple answer: push #{ $lines{(split",")[1]} }, $_ while <>; print #{ $lines{$_} } for grep #{ $lines{$_} } > 2, sort keys %lines; perl program.pl inputfile > outputfile

Related

match string in file and replacement with other string

how to count the number of specific characters through each line from file?

Extracting specific multiple line of records that is pipe delimited in perl

Using perl, how do I search a text file for _NN (at the end of a word) and print the word in front?

Multiple Sorts within a text file

Categories

Resources