Using Perl to create another Perl file - perl

I have an input file that looks like
*firsttitle
nameA
nameB
nameC
*secondtitle
xnameA
xnameB
xnameC
I want to create a Perl script that takes this file and basically will create another perl script that looks like
#!/usr/bin/perl
use strict;
use warnings;
my %tags = (
"firsttitle" => [ qw (nameA nameB nameC) ],
"secondtitle" => [ qw (xnameA xnameB xnameC) ]);
my $rx = join '|', keys %tags;
while (<>) {
s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
print;
}
My thought process is that I have to first match print out the regular perl code (#!,use..etc.).Then add " my%tags=(. Then take the input file and look for the * and that's the lookup for the hash and start parsing everything after until the next(*) or end of life. If it's another * then do it again. If it's EOF then add ");" and end. And then finish with printing the last bit of perl code. Help/ideas would be appreciated. If you're going to post code snippets could you go through and explain what each part is doing? Thanks!

Very simple script. First just parse through the input file. Lines that start with * will be titles, and all the following lines up until the next *-line will be values. We put this into a hash of arrays.
The map statement gives us a list of the hash key (the title), and it's values joined together with space. We put this in an array for printing. The printing itself is done with printf, which can be a bit difficult to use, since meta characters will mess us up. Any % that are to be literal must be written as %%. I also changed single quotes from the original to double quotes. I use single quotes on the printf pattern to avoid accidental interpolation of variables.
An alternative - possibly better one - is to not just printf at all, and simply concatenate the string in a normal fashion.
use strict;
use warnings;
my ($title, %hash);
while (<DATA>) {
chomp;
if (/^\*(.+)$/) {
$title = $1;
} else {
push #{$hash{$title}}, $_;
}
}
my #args = ( map { $_, join(' ', #{$hash{$_}}) } keys %hash );
printf '#!/usr/bin/perl
use strict;
use warnings;
my %%tags = (
"%s" => [ qw ( %s ) ],
"%s" => [ qw ( %s ) ]);
my $rx = join "|", keys %%tags;
while (<>) {
s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
print;
}', #args;
__DATA__
*firsttitle
nameA
nameB
nameC
*secondtitle
xnameA
xnameB
xnameC
Update:
This will use a different method of printing, which will be more stable.
my #args = ( map { " '$_' => [ qw ( #{$hash{$_}} ) ],\n" } keys %hash );
print '#!/usr/bin/perl
use strict;
use warnings;
my %tags = (
', #args, '
);
my $rx = join "|", keys %tags;
while (<>) {
s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
print;
}';

Related

How to remove array's newlines and add an element at the beginning of it in Perl?

First of I have to apologize for editing my initial post. But after I provide my code I did the question fuzzy.
So, I have this an array (#start_cod) containing lines separated by /n as follows:
print #start_cod;
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
I need to remove the newlines and add ">text" ONLY at the beginning of the array as follow:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
I tried:
s/\s+\z// for #start_cod;
print ">text#start_cod";
I tried also with chomp
chomp #start_cod;
print ">text#start_cod";
and
my #start_cod = split("\n",$start_cod);
$start_cod = join("",#start_cod);
print ">text$start_cod";
but I get
aaaaaaaaaaaaaaaaaaa>textcacacacacaacaccacaac>textaattatatattataattatatttat
Any suggestions on how to handle this in Perl Programming?
Here is my code which works 100%.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my %alliloux =();
$/="\n>";
while (<>) {
s/>//g;
my ($onoma, #seq) = split (/\n/, $_);
my ($sp, $head) = split (/\./, $onoma);
push #{ $alliloux{$sp} }, join "\n", ">$onoma", #seq;
}
foreach my $sp (keys %alliloux) {
chomp $sp;
my ($head, $dna) = split(/\t/, $sp);
my #start_cod = substr($dna, 3);
say #start_cod;
Input file:
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
>namex aattatatattataattatatttat
output after Perl run
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
Desired output:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
If I understand your question correctly, this should do what you want:
use strict;
use warnings;
my #start_cod = (
'aaaaaaaaaaaaaaaaaa',
'acacacacacaacaccacaac',
'aattatatattataattatatttat',
);
print ">text\n", #start_cod, "\n";
The print first prints ">text" and a newline once, then you get the #start_cod items on a line, and the last "\n" makes sure you have a newline after the last element.
Output:
>text
aaaaaaaaaaaaaaaaaaacacacacacaacaccacaacaattatatattataattatatttat
You might want to see Read FASTA into Hash. It's the same problem and very close to the code I wrote before I read it. Also, there are modules on CPAN that can handle FASTA.
I think you want to combine the sequences that start with the same name, disregarding the numbers. The sequences shouldn't have interior whitespace. In your code, you are constantly adding whitespace. You even join on a newline. So, you go to the doctor and say "My arm hurts when I do this", and the doctor says "So don't do that". :)
When you run into these sort of problems, check the results of your operations at each step to see if you get what you expect. Here's a much simplified version of a program that I think does what you want. I've removed most of the data structure because they are complicating your process.
In short, read a line and remove the newline at the end. That's one source of your newlines. Then, extract the sequence and concatenate that to the previous sequence. When you join with newlines, you are adding newlines. So, don't do that:
use v5.14;
use warnings;
use Data::Dumper;
my %alliloux = ();
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
# now split on whitespace, but only up to two parts.
# There's no array here.
my( $name, $seq ) = split /\s+/, $_, 2;
# remove the numbers at the end to get the prefix of the
# name.
my $prefix = $name =~ s/\d+\z//r;
# append the current sequence for this prefix to what we
# have already seen.f
$alliloux{$prefix} .= $seq;
}
say Dumper( \%alliloux );
foreach my $base ( keys %alliloux ) {
say ">text $alliloux{$base}";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
You don't need the intermediate array. You can build up your string as you go. You don't need to have all the parts before you do that.
Now, to figure out where you might be going wrong, do a little at once. Ensure that you've extracted the right thing. It's handle to put characters around the variables you interpolate so you can see whitespace at the beginning or end:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>"
}
Then, add another step, and ensure that works:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>"
my $prefix = $name =~ s/\d+\z//r;
say "Prefix: <$prefix>";
}
Repeat this process for each step. Then, when you come with a question, you've pinpointed the point where things diverge. Here's the same technique in your program:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
s/>//g;
my ($onoma, #seq) = split (/\n/, $_);
say "Onoma: <$onoma>";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
The output shows that you never had anything in #seq. You are splitting on a newline, but unless you've changed the default line ending, you'll only get a newline at the end:
Onoma: <name aaa>
Onoma: <name2 cccc>
Onoma: <name99 aattaatt>
Now there's nothing in #seq, so a line like join "\n", ">$onoma", #seq; is really just join "\n", ">$onoma". You could have seen that with a little checking.
The description lacks clarity of the problem.
By looking at the desired output the following code comes to mind. Please see if it does what you was looking for.
Even looking at your code it is not clear what you try to do -- some part of the code does not make much sense.
use strict;
use warnings;
use feature 'say';
my #start_cod;
while( <DATA> ) {
chomp;
next unless />\s?name.?\s+(.*)/;
push #start_cod, $1;
}
print ">text\n " . join('',#start_cod);
__DATA__
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
.
.
.
> namex aattatatattataattatatttat

How to print data in column form in Perl?

I have a program that prints the contents of arrays in rows. I would like it to print each array in a column next to each other.
This is the code:
#!/usr/local/bin/perl
use strict;
use warnings;
my #M_array;
my #F_array;
open (my $input, "<", 'ssbn1898.txt');
while ( <$input> ) {
chomp;
my ( $name, $id ) = split ( /,/ );
if ( $id eq "M" ) {
push ( #M_array, $name );
}
else {
push ( #F_array, $name );
}
}
close ( $input );
print "M: #M_array \n";
print "F: #F_array \n";
Is this possible or am I trying to do something that can't be done?
Desired format:
M F
Namem1 Namef1
Namem2 Namef2
You can add whatever separator you would like between your data by using the join function, the example below formats the data in your array separated by tabs:
...
use List::MoreUtils qw/pairwise/;
my $separator = "\t";
print join($separator, qw(M F)), "\n";
print join(
"\n",
pairwise { ( $a // '') . $separator . ( $b // '') } #M_array, #F_array
), "\n";
...
I think, you should use Perl formats. Have a look at the Perl documentation. You may want to use the #* format field in your case.
I extended your code in order to print the desired output at the end
use strict;
use warnings;
my #M_array;
my #F_array;
open (my $input, "<", 'ssbn1898.txt');
while ( <$input> ) {
chomp;
my ( $name, $id ) = split ( /,/ );
if ( $id eq "M" ) {
push ( #M_array, $name );
}
else {
push ( #F_array, $name );
}
}
close ( $input );
unshift #M_array, 'M';
unshift #F_array, 'F';
my $namem;
my $namef;
my $max = 0;
$max = (length($_) gt $max ? length($_) : $max) for #M_array;
my $w = '#' . '<' x $max;
eval "
format STDOUT =
$w #*
\$namem, \$namef
.
";
while ( #M_array or #F_array) {
$namem = shift #M_array || '';
$namef = shift #F_array || '';
write;
}
join is probably the simplest approach to take tabs will align your columns nicely.
join ( "\t", #array ),
Alternatively, perl allows formatting via (s)printf:
printf ( "%-10s %-10s", "first", "second" );
Or a more detailed 'format'
Given what you're trying to do is put your two arrays into columns though:
#!/usr/local/bin/perl
use strict;
use warnings;
my $format = "%-10s\t%-10s\n";
my #M_array = qw ( M1 M2 M3 M4 M5 );
my #F_array = qw ( F1 F2 F3 );
my $maxrows = $#M_array > $#F_array ? $#M_array : $#F_array;
printf ( $format, "M", "F" );
for my $rownum ( 0..$maxrows ) {
printf ( $format, $M_array[$rownum] // '', $F_array[$rownum] // '' );
}
This will print a header row, and then loop through you arrays printing one line at a time. // is a conditional operation that tests if something is defined. It's only available in newer perls though*. In older versions || will do the trick - it's almost the same, but handles '' and 0 slightly differently.
* Perl 5.10 onward, so is pretty safe, but worth mentioning because some system are still rocking around with perl 5.8 on them.
You may format output with the sprintf function, but there are some more problems to solve: What if the arrays don't have the same count of entries? For this, you need a place-holder. How much letters must fit into a column? How should it be aligned? Some code for illustration:
#!/usr/bin/perl
use strict;
use warnings;
my #m = (1, 2, 3);
my #f = (11, 22, 33, 44);
# calculate how many rows to display
my $max = #m;
if (#m < #f) {
$max = #f;
}
# placeholder for missing data
my $none = '-';
# formatting 20 chars per column, left aligned
my $fmt = "%-20s%-20s\n";
# print header
print sprintf($fmt, "M", "F");
# print data rows
foreach my $i (0..$max-1) {
print sprintf($fmt, ($m[$i] or $none), ($f[$i] or $none));
}
If you are interested in more sophisticated formatting (for instance center-aligned text), you should switch to the special formatting capabilities Perl provides for report generation.
Borrowing from #HunterMcMillen
use strict;
use warnings;
use feature "say";
local $, = "\t"; # separator when printing list
my $i = (#F_array > #M_array) ? $#F_array : $#M_array;
say qw(M F);
say $M_array[$i] //"", $F_array[$i] //"" for 0 .. $i;
I guess Text::Table is the required module which comes with the perl distribution(just need to install).Go through the below documentation -
Documentation of Text::Table
You need to pass the content as array to the add() method and it will do the wonders for you.

Merge CSV rows based on duplicate key and combine unique values using Perl Text::CSV

I have a big tab-separated file with duplicate products but with different colours and amounts. I’m trying to merge the data based on the key so that I end up with one product and the combined colours and amounts separated by a delimiter (comma in this case).
I'm using the Text::CSV module so that I have better control, and because it allows me to output the file with a different delimiters (from semicolon to pipe).
My question is, how do I merge the data properly? I don't want it simply to combine colours and amounts but remove duplicate values as well. So I was thinking a key/value with the Id/Amount and Id/Colour. But Id isn't unique so how do I do this? Do I create an array or use hashes?
Here is some sample source data, with the tab separators replaced by semicolons ;. Note that the marked row has no Colour so the empty value is not combined in the result.
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;
101;Fruits;50020;Strawberry;500;Red;1;
101;Fruits;50020;Strawberry;1000;Red;1;
201;Vegetables;60090;Tomato;50;Green;1;
201;Vegetables;60080;Onion;1;Purple;1;
201;Vegetables;60090;Tomato;100;Red;1;
201;Vegetables;60010;Carrot;100;Purple;1;
201;Vegetables;60050;Broccoli;500;Green;1;
201;Vegetables;60050;Broccoli;1000;Green;1;
201;Vegetables;60090;Tomato;500;Yellow;1;
101;Fruits;50060;Apple;500;Green;1;
101;Fruits;50010;Grape;500;Red;1;
201;Vegetables;60010;Carrot;500;White;1;
201;Vegetables;60050;Broccoli;2000;Green;1;
201;Vegetables;60090;Tomato;1000;Red;1;
101;Fruits;50020;Strawberry;100;Red;1;
101;Fruits;50060;Apple;1000;Red;1;
201;Vegetables;60010;Carrot;250;Yellow;1;
101;Fruits;50010;Grape;100;White;1;
101;Fruits;50030;Banana;500;Yellow;1;
201;Vegetables;60010;Carrot;1000;Yellow;1;
101;Fruits;50030;Banana;1000;Green;1;
101;Fruits;50020;Strawberry;200;Red;1;
101;Fruits;50010;Grape;200;White;1;
201;Vegetables;60010;Carrot;50;Orange;1;
201;Vegetables;60080;Onion;2;White;1;
And the desired result I'm trying to get:
101;Fruits;50010;Grape;100,500,200;Red,White;1;
201;Vegetables;60090;Tomato;50,500,1000,10;Yellow,Green,Red;1;
101;Fruits;50060;Apple;500,1000;Red,Green;1;
201;Vegetables;60010;Carrot;250,50,500,1000,100;Orange,Yellow,White,Purple;1;
201;Vegetables;60050;Broccoli;1000,500,2000;Green;1;
101;Fruits;50020;Strawberry;100,1000,200,500;Red;1;
101;Fruits;50030;Banana;500,1000;Yellow,Green;1;
201;Vegetables;60080;Onion;2,1;White,Purple;1;
This is my script so far. It's not finished (and not working) because I'm not sure how to continue. I don't think this can work right because I'm trying to use the same key for different colours.
use strict;
use warnings;
use Text::CSV;
use List::MoreUtils 'uniq';
my $inputfile = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";
open my $infile, '<', $inputfile or die "Sourcefile in use / not found :$!\n";
open my $outfile, '>', $outputfile or die "Outputfile in use :$!\n";
binmode($outfile, ":encoding(utf8)");
my $csv_in = Text::CSV->new({binary => 1,sep_char => ";",eol => $/});
my $csv_out = Text::CSV->new({binary => 1,sep_char => "|",always_quote => 1,eol => $/}); #,quote_null => 0 #
my %data;
while (my $elements = $csv_in->getline($infile)){
my $id = $elements->[2];
push #{ $data{$id} }, \#elements;
}
for my $id ( sort keys %data ){
my $set = $data{$id};
my #elements = #{ $set->[0] };
$elements[4] = join ',', uniq map { $_->[4] } #$set;
$elements[5] = join ',', uniq map { $_->[5] } #$set;
$csv_in->combine(#$elements);
$csv_out->print($outfile, $elements);
}
Edit: I'm using data::dumper for testing but eventually want it written to a file.
Hashes deal with unique keys. As you've correctly surmised - if you 'overwrite' colour, then ... the old value is replaced.
But hashes can contain array(ref)s. So you can do:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $id = 50010;
my %hash;
$hash{$id}{'colour'} = [ "red", "green", "blue" ];
push( #{ $hash{$id}{'colour'} }, "orange" );
print Dumper \%hash;
This'll work, provided you don't have any duplicates for the colours. (e.g. there's only one line for White Grapes with that ID.).
You may have to post-process with join, to turn the array into a string.
Or as an alternative, you could concatenate the colours onto the existing:
if ( defined $hash->{$id}->{colour} ) {
$hash->{$id}->{colour} .= ",$colour";
}
I would also note - I'm unclear what you're doing with $elements->[10] because there aren't 10 columns. I would also strongly suggest not using generic names for variables - like %hash - because it's just a bad habit to get into. Vague variable names are bad style and whilst it's largely academic when you're looking at a small chunk of code, it pays to get into the habit of making it clear what you can expect to be in a particular variable. (Especially true if it's not clear of data types)
I don't have time to write a proper commentary, but this program seems to do what you need. It uses the uniq function from the List::MoreUtils modules. It isn't a core module and so may need installing. I trust that it's not important what order the Amounts and Colours appear in the combined fields?
use strict;
use warnings;
use List::MoreUtils 'uniq';
print scalar <DATA>;
my %data;
while (<DATA>) {
chomp;
my #fields = split /;/;
my $id = $fields[2];
push #{ $data{$id} }, \#fields;
}
for my $id ( sort keys %data ) {
my $set = $data{$id};
my #fields = #{ $set->[0] };
$fields[4] = join ',', uniq map { $_->[4] } #$set;
$fields[5] = join ',', uniq map { $_->[5] } #$set;
print join(';', #fields, ''), "\n";
}
__DATA__
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;
101;Fruits;50020;Strawberry;500;Red;1;
101;Fruits;50020;Strawberry;1000;Red;1;
201;Vegetables;60090;Tomato;50;Green;1;
201;Vegetables;60080;Onion;1;Purple;1;
201;Vegetables;60090;Tomato;100;Red;1;
201;Vegetables;60010;Carrot;100;Purple;1;
201;Vegetables;60050;Broccoli;500;Green;1;
201;Vegetables;60050;Broccoli;1000;Green;1;
201;Vegetables;60090;Tomato;500;Yellow;1;
101;Fruits;50060;Apple;500;Green;1;
101;Fruits;50010;Grape;500;Red;1;
201;Vegetables;60010;Carrot;500;White;1;
201;Vegetables;60050;Broccoli;2000;Green;1;
201;Vegetables;60090;Tomato;1000;Red;1;
101;Fruits;50020;Strawberry;100;Red;1;
101;Fruits;50060;Apple;1000;Red;1;
201;Vegetables;60010;Carrot;250;Yellow;1;
101;Fruits;50010;Grape;100;White;1;
101;Fruits;50030;Banana;500;Yellow;1;
201;Vegetables;60010;Carrot;1000;Yellow;1;
101;Fruits;50030;Banana;1000;Green;1;
101;Fruits;50020;Strawberry;200;Red;1;
101;Fruits;50010;Grape;200;White;1;
201;Vegetables;60010;Carrot;50;Orange;1;
201;Vegetables;60080;Onion;2;White;1;
output
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;
101;Fruits;50010;Grape;500,100,200;Red,White;1;
101;Fruits;50020;Strawberry;500,1000,100,200;Red;1;
101;Fruits;50030;Banana;500,1000;Yellow,Green;1;
101;Fruits;50060;Apple;500,1000;Green,Red;1;
201;Vegetables;60010;Carrot;100,500,250,1000,50;Purple,White,Yellow,Orange;1;
201;Vegetables;60050;Broccoli;500,1000,2000;Green;1;
201;Vegetables;60080;Onion;1,2;Purple,White;1;
201;Vegetables;60090;Tomato;50,100,500,1000;Green,Red,Yellow;1;

Why is Perl's chomp not doing what I want it to do?

I am receiving output from some process (shown in #result_listosp below). When I try to chomp output is weird. I desire the following output:
origin-server-pool-1 http_TestABC https_TestABC
Code:
use strict;
use warnings;
my #result_listosp = ( # From backticks
"origin-server-pool-1\n",
"http_TestABC \n",
"https_TestABC\n",
);
chomp #result_listosp;
Output:
origin-server-pool-1http_TestABC https_TestABC
I'm not sure what you think chomp is supposed to do, but it's not to add spaces?!
And it does not remove trailing whitespace either. If you want to remove trailing whitespace (including newlines) use the following instead of chomp(#result_listosp):
s/\s+\z// for #result_listosp;
As for adding a space between elements, you can use
print(join(' ', #result_listosp), "\n");
or even just
print("#result_listosp\n");
The function chomp only removes the newline (\n in this case) character at the end of a line.
If you want to trim (remove whitespaces from the ends), you can do this:
#!/usr/bin/perl
use strict;
use warnings;
sub trim_elements {
for my $i (#_) {
$i =~ s/^\s+|\s+$//g;
}
}
my #result_listosp = ( # From backticks
"origin-server-pool-1\n",
"http_TestABC \n",
"https_TestABC\n",
);
trim_elements #result_listosp;
for my $i (#result_listosp) {
print $i;
}
As you can see, I didn't use parenthesis. That works only because the sub is declared before the call. If you declare the sub after the code, you need to use parenthesis.
Francisco
If you have newlines in each line and you want to remove them, use chomp. If you want to concatenate strings with a space in between then use join:
my #result_listosp = ( # From backticks
"origin-server-pool-1\n",
"http_TestABC \n",
"https_TestABC\n",
);
print join (" ", map { /^\s*(.*?)\s*$/ } #result_listosp), "\n";
Output
origin-server-pool-1 http_TestABC https_TestABC

Sort CSV based on a certain column?

I'm sure I've done this in the past and there is something small I'm forgetting, but how can I sort a CSV file on a certain column? I'm interested in answers with and without 3rd party Perl modules. Mainly methods without, since I don't always have access to install additional modules.
Example data:
name,25,female
name,24,male
name,27,female
name,21,male
desired end result after sorting on the 2nd numeric column:
name,21,male
name,24,male
name,25,female
name,27,female
As CSV is a pretty complex format, it is better to use a module that does the work for us.
Following is an example using the Text::CSV module:
#!/usr/bin/env perl
use strict;
use warnings;
use constant AGE => 1;
use Text::CSV;
my $csv = Text::CSV->new();
my #rows;
while ( my $row_ref = $csv->getline( \*DATA ) ) {
push #rows, $row_ref;
}
#rows = sort { $a->[AGE] <=> $b->[AGE] } #rows;
for my $row_ref (#rows) {
$csv->combine(#$row_ref);
print $csv->string(), "\n";
}
__DATA__
name,25,female
name,24,male
name,27,female
name,21,male
In the spirit of there always being another way to do it, bear in mind that plain old GNU sort might be enough.
$ sort -t, -k2 -n unsorted.txt
name,21,male
name,24,male
name,25,female
name,27,female
Where the command line args are:
-t, # use comma as the record separator
-k2 # sort on the second key (record) in the line
-n # sort using numerical comparison (like using <=> instead of cmp in perl)
If you want a Perl solution, wrap it in qx() ;-)
There is also DBD::CSV:
#!/usr/bin/perl
use strict; use warnings;
use DBI;
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
RaiseError => 1,
f_ext => '.csv',
csv_tables => { test => { col_names => [qw' name age sex '] } },
});
my $sth = $dbh->prepare(q{
SELECT name, age, sex FROM test ORDER BY age
});
$sth->execute;
while ( my #row = $sth->fetchrow_array ) {
print join(',' => #row), "\n";
}
$sth->finish;
$dbh->disconnect;
Output:
name,21,male
name,24,male
name,25,female
name,27,female
The original poster asked for no third-party modules (which I take to mean nothing from CPAN). Whilst this is restriction that will horribly limit your ability to write good modern Perl code, in this instance it's possible using the (core) Text::ParseWords module in place of the (non-core) Text::CSV. So, borrowing heavily from Alan's example, we get:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords;
my #rows;
while (<DATA>) {
push #rows, [ parse_line(',', 0, $_) ];
}
#rows = sort { $a->[1] <=> $b->[1] } #rows;
foreach (#rows) {
print join ',', #$_;
}
__DATA__
name,25,female
name,24,male
name,27,female
name,21,male
When you provide your own comparison code, you can sort on anything. Just extract the desired element with a regex, or probably a split in this case, and then compare on that. If you have a lot of elements, I would parse the data into a list of lists and then the comparison code can access it without parsing. That would eliminate parsing the same row over and over as it's compared with other rows.
using Raku (née Perl6)
This is a fairly quick-and-dirty solution, mainly intended for "hand-rolled" CSV. The code works as long as there's only one (1) age-per-row: Read lines $a, comb for 1-to-3 <digit> surrounded by commas and assign to #b, derive sorting index $c, use $c to reorder lines $a:
~$ raku -e 'my $a=lines(); my #b=$a.comb(/ \, <(\d**1..3)> \, /).pairs; my $c=#b.sort(*.values)>>.keys.flat; $a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female
I prepended a few dummy lines to the OP's input file see how the code above reacts with 1). a blank age field, 2). a blank "" string for age, 3). a bogus "9999" for age, and 4). a bogus "NA" for age. The code above fails catastrophically. To fix this you have to write a ternary that inserts a numeric placeholder value (e.g. zero) whenever the regex fails to match a line.
Below is a longer but more robust solution. Note--I use a placeholder value of 999 to move lines with blank/invalid ages to the bottom:
~$ raku -e 'my #a=lines(); my #b = do for #a {if $_ ~~ m/ \, <(\d**1..3)> \, / -> { +$/ } else { 999 }; }; my $c=#b.pairs.sort(*.values)>>.keys.flat; #a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female
name,,male
name,"",female
name,9999,male
name,NA,male
To sort in reverse, add .reverse to the end of the method chain that creates $c. Again, change the else placeholder argument to move lines absent a valid age to the top or to the bottom. Also, creation of #b above can be written using the ternary operator: my #b = do for #a {(m/ \, <(\d**1..3)> \, /) ?? +$/ !! 999 };, as an alternative.
Here's the unsorted input file for posterity:
$ cat sort_age.txt
name,,male
name,"",female
name,9999,male
name,NA,male
name,25,female
name,24,male
name,27,female
name,21,male
HTH.
https://raku.org/
I would do something like this:
#!/usr/bin/perl
use warnings;
use strict;
my #rows = map { chomp; [split /[,\s]+/, $_] } <DATA>; #read each row into an array
my #sorted = sort { $a->[1] <=> $b->[1] } #rows; # sort the rows (numerically) by second column
for (#sorted) {
print join(', ', #$_) . "\n"; # print them out as CSV
}
__DATA__
name,25,female
name,24,male
name,27,female
name,21,male