Unix script to transpose rows to columns with defined file layout as output - perl

Script tried:
perl -nle '
if($. == 1)
{ (@a)=/([\w - .]+)(?=,|\s*$)/g }
else
{
(@b)=/([\w - .]+)(?=,|\s*$)/g;
print "$a[0]|$b[0]|$b[1]|$b[2]|$a[$_]|$b[$_+3]" foreach (0..$#a)
}
' ip.txt >op.txt
Input data:
src,FI,QMA,PCG,PCC,PREI,G T
PIM2016.csv,MMR.S T - RED,334,114,120,34,123,725
Output with the latest script:
SRC|PIM2016.csv|MMRPPS|RED|SRC|334
SRC|PIM2016.csv|MMRPPS|RED|FI|114
SRC|PIM2016.csv|MMRPPS|RED|QMA|120
SRC|PIM2016.csv|MMRPPS|RED|PCG|34
SRC|PIM2016.csv|MMRPPS|RED|PCC|123
SRC|PIM2016.csv|MMRPPS|RED|PREI|725
SRC|PIM2016.csv|MMRPPS|RED|G T|
Required output:
SRC|PIM2016.csv|MMRPPS|S T - RED|FI|334
SRC|PIM2016.csv|MMRPPS|S T - RED|QMA|114
SRC|PIM2016.csv|MMRPPS|S T - RED|PCG|120
SRC|PIM2016.csv|MMRPPS|S T - RED|PCC|34
SRC|PIM2016.csv|MMRPPS|S T - RED|PREI|123
SRC|PIM2016.csv|MMRPPS|S T - RED|G T|725

I think your life gets a lot easier if you know about split()
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my (@head);
while (<>) {
    chomp;
    if ($. == 1) {
        @head = split /,/;
        next;
    }
    my @data = split /,/;
    say "$head[0]|$data[0]|$data[1]|$data[2]|$head[$_]|$data[$_+2]" for (1 .. $#head);
}
I've written it as a program rather than as a command line as I think it's too long to be run on the command line.
Also, I hope that the dot after "MMR" in your sample input is a typo for a comma. If that's not the case it gets a little more complex - but not very much.
Oh, and there's no "PPS" in your sample input, so I've no idea where that comes from in your sample output.
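For the case where the dot is not a typo, here is a minimal sketch (the sub name transpose and the hard-coded sample lines are only for illustration). It assumes the dot separates two values, splits the second field on its first dot so the rest of the logic stays the same, and uses uc so the first column reads SRC. It still prints MMR rather than MMRPPS, since PPS does not appear in the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch, assuming the dot in "MMR.S T - RED" really separates two values
# ("MMR" and "S T - RED") instead of being a typo for a comma.
sub transpose {
    my ($header_line, $data_line) = @_;
    chomp( my $h = $header_line );
    chomp( my $d = $data_line );
    my @head = split /,/, $h;
    my @data = split /,/, $d;
    # Replace the second field with its two halves, split on the first dot only.
    splice @data, 1, 1, split /\./, $data[1], 2;
    # uc() turns the "src" header into "SRC", as in the required output.
    return map { join( '|', uc( $head[0] ), @data[ 0 .. 2 ], $head[$_], $data[ $_ + 2 ] ) }
        1 .. $#head;
}

print "$_\n"
    for transpose( "src,FI,QMA,PCG,PCC,PREI,G T\n",
                   "PIM2016.csv,MMR.S T - RED,334,114,120,34,123,725\n" );
```

After the splice, @data has the same layout as in the comma case, which is why the output loop is unchanged.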

Related

Replace single space with multiple spaces in perl

I need to replace a single space with multiple spaces so that the second field always starts at a particular position (here, the second field must always start at position 36).
I have a perl script written for this:
while(<INP>)
{
my $md=35-index($_," ");
my $str;
$str.=" " for(1..$md);
$_=~s/ +/$str/;
print "$_" ;
}
Is there a better approach using just a regex in =~ s/// so that I can run it directly on the command line instead of as a script?
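Assuming that the fields in your data are separated by spaces:
while (<$fh>) {
    my ($first, @rest) = split;
    printf "%-35s %s\n", $first, "@rest";
}
The first field is now padded to 35 characters, left-aligned due to the - in the printf format, so together with the space after it the rest starts at offset 36 (use %-34s if the second field must start at column 36 counting from 1). See sprintf for the many details. The rest is printed with single spaces between the original space-separated fields, but it can be formatted as desired instead (tab-separated, fixed width...). Passing the data as an argument rather than interpolating it into the format keeps a % in the data from being read as a format directive.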
Or you can leave the "rest" after the first field untouched by splitting the line into two parts:
while (<$fh>) {
    my ($first, $rest) = /(\S+)\s+(.*)/;
    printf "%-35s %s\n", $first, $rest;
}
(or use split ' ', $_, 2 instead of regex)
Please give more detail if there are other requirements.
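Since the question asks for a pure s/// form, here is a sketch using the /e flag (the sub name align_second is hypothetical). The replacement is a sprintf call that pads the first field to 35 characters, so the second field starts at column 36 counting from 1, like the original script. It assumes the first field is shorter than 35 characters: sprintf pads but never truncates, so a longer field would run into the second one.

```perl
use strict;
use warnings;

# Pad the first whitespace-separated field to 35 characters, so the second
# field starts at column 36 (counting from 1), as in the original script.
# Assumption: the first field is shorter than 35 characters.
sub align_second {
    my ($line) = @_;
    # \s+ swallows any run of spaces or tabs after the first field.
    $line =~ s/^(\S+)\s+/sprintf('%-35s', $1)/e;
    return $line;
}

print align_second("ABCD TEST EFGH don't touch\n");
```

On the command line the same substitution becomes: perl -pe 's/^(\S+)\s+/sprintf("%-35s", $1)/e' file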
One approach is to use plain ol' Perl formats:
#!/usr/bin/perl
use warnings;
use strict;
my($first, $second, $remainder);
format STDOUT =
#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< #<<<<<< #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$first, $second,$remainder
.
while (<DATA>) {
($first, $second, $remainder) = split(/\s+/, $_, 3);
write;
}
exit 0;
__DATA__
ABCD TEST EFGH don't touch
FOO BAR FUD don't touch
Test output. I probably miscounted the columns, but you should get the idea:
$ perl dummy.pl
ABCD TEST EFGH don't touch
FOO BAR FUD don't touch
Another option would be Text::Table.

lowercase everything except content between single quotes - perl

Is there a way in Perl to lowercase all text in an input line except text within single quotes (there could be more than one quoted section) using a regex? I have achieved this with the code below, but would like to see if it can be done with a regex and map.
while (<>) {
my $m=0;
for (split(//)) {
if (/'/ and ! $m) {
$m=1;
print;
}
elsif (/'/ and $m) {
$m=0;
print;
}
elsif ($m) {
print;
}
else {
print lc;
}
}
}
**Sample input:**
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
**Sample output:**
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
You can give this a shot. All one regexp.
$str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
Or, cleaner and more documented (this is semantically equivalent to the above)
$str =~ s/
(?:
^ | # Match either the start of the string, or
'[^']*' # some text in quotes.
)\K # Then ignore that part,
# because we want to leave it be.
[^']* # Take the text after it, and
# lowercase it.
/lc($&)/gex;
The g flag tells the regexp to run as many times as necessary. e tells it that the substitution portion (lc($&), in our case) is Perl code, not just text. x lets us put those comments in there so that the regexp isn't total gibberish.
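A variant of the same idea that avoids both /e and $& (whose use could slow down every match in a program on Perls older than 5.20): capture the quoted part (or the empty start of the string) and the unquoted text after it, then lowercase only the second capture with the \L escape. The sub name is illustrative, and like the regex above, this sketch assumes the quotes are balanced.

```perl
use strict;
use warnings;

# $1 is either the empty start of the string or a complete '...' section,
# $2 is the unquoted run after it; \L lowercases only $2.
sub lc_outside_quotes {
    my ($str) = @_;
    $str =~ s/(^|'[^']*')([^']*)/$1\L$2/g;
    return $str;
}

print lc_outside_quotes(
    "and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))\n"
);
```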
Aren't you working too hard with a regex for such a simple job?
Why not use split for it instead?
#!/usr/bin/perl
while (<>)
{
@F = split "'";
@F = map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
print join "'", @F;
}
The above is written out for clarity; the last two lines are usually combined into:
print join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
Or, as a one-liner (in a bash shell). Note that -p prints $_, so the joined result must be assigned back to $_, and -a turns on autosplit into @F (Perl 5.20 and later imply it with -F). Conceptually, it looks like:
perl -apF/'/ -e '$_ = join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } 0..$#F;' YOUR_FILE
In reality, however, the single quotes have to be escaped for the shell:
perl -apF/\'/ -e '$_ = join "'"'"'", map { $_ % 2 ? $F[$_] : lc $F[$_] } 0..$#F;' YOUR_FILE
(Inside a single-quoted shell string, each single quote has to become the five characters '"'"'.)
If it doesn't help your job, it helps sleep.
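One caveat with the split versions: split drops trailing empty fields, so a line that ends in a closing quote with no newline after it (typically the last line of a file) loses that quote on output. A sketch with a limit of -1, which keeps the trailing empty field (the sub name is illustrative):

```perl
use strict;
use warnings;

sub lc_outside_quotes_split {
    my ($line) = @_;
    # -1 keeps trailing empty fields, so a final closing quote survives.
    my @parts = split /'/, $line, -1;
    # Even-indexed parts are outside quotes, odd-indexed parts are inside.
    return join "'", map { $_ % 2 ? $parts[$_] : lc $parts[$_] } 0 .. $#parts;
}

print lc_outside_quotes_split("X = 'KEEP'"), "\n";
```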
One more variant as a Perl one-liner, using the hex escape \x27 for the single quote so that no shell escaping is needed:
$ cat sql_str.txt
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
$ perl -ne ' { @F=split(/\x27/); for my $val (0..$#F) { $F[$val]=lc($F[$val]) if $val%2==0 } ; print join("\x27",@F) } ' sql_str.txt
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
$

perl: why not returning for case 0 or 1?

I am implementing a Fibonacci function in Perl, memoized with a hash table:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'recursion';
my %m_fib = (0,1,1,1);
while (my $a = <STDIN>) {
print "--".&fib($a)."\n";
}
sub fib {
foreach my $i (@_) {
if (not defined $m_fib{$i}) {
$m_fib{$i} = &fib($i - 1) + &fib($i - 2);
}
return $m_fib{$i};
}
}
It works well with input larger than 1, but produces no output for either 0 or 1.
The hash should be fine since it returns the correct results, so why won't it work if I feed it 0 or 1?
Your input contains the end-of-line character (\n). Remove it with chomp (see perldoc -f chomp):
while (my $a = <STDIN>) {
chomp $a;
print "--".&fib($a)."\n";
}
Edit: What the problem is
With any input, the defined test always fails, because the key is the string with the trailing newline (e.g. "20\n"), which is not present in the hash.
Perl can still do arithmetic with that input: "20\n" - 1 is 19.
With 0 or 1, no defined value is found, so your code calls fib(-1) and fib(-2), or fib(0) and fib(-1), respectively. fib(-1) is never in the hash, so this recurses endlessly towards ever more negative numbers; since no warnings 'recursion' suppresses the "Deep recursion" warning, the program just appears silent.
With 2, the test also fails and Perl performs the subtraction, calling fib(1) + fib(0) (without the \n). In those calls your test succeeds, as $m_fib{1} and $m_fib{0} do exist.
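The core of the problem can be shown in isolation with a small sketch (the helper name cached is made up for the demonstration): "1\n" behaves like 1 in arithmetic, but as a hash key it is a different string.

```perl
use strict;
use warnings;

my %m_fib = (0, 1, 1, 1);

# Returns 1 if the key has a defined value in the memo hash, 0 otherwise.
sub cached {
    my ($key) = @_;
    return defined $m_fib{$key} ? 1 : 0;
}

my $input = "1\n";    # what <STDIN> returns after typing 1 and Enter
print "arithmetic: ", $input - 1, "\n";          # the trailing newline is ignored
print "before chomp: ", cached($input), "\n";    # "1\n" is not a key
chomp $input;
print "after chomp:  ", cached($input), "\n";    # "1" is a key
```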
Edit 2
A small review with a few comments
Your function processes more than one argument but returns after the first one. You never call it with more than one argument (and even if you did, it would never process the second).
Some other comments are inline (you can review your code using Perl::Critic):
#!/usr/bin/perl
use strict;
use warnings;
# Not needed
# no warnings 'recursion';
my %m_fib = ( 0, 1, 1, 1 );
# From Perl::Critic
#
# Use "<>" or "<ARGV>" or a prompting module instead of "<STDIN>" at line 10, column 17.
# InputOutput::ProhibitExplicitStdin (Severity: 4)
# Perl has a useful magic filehandle called `*ARGV' that checks the
# command line and if there are any arguments, opens and reads those as
# files. If there are no arguments, `*ARGV' behaves like `*STDIN' instead.
# This behavior is almost always what you want if you want to create a
# program that reads from `STDIN'. This is often written in one of the
# following two equivalent forms:
#
# while (<ARGV>) {
# # ... do something with each input line ...
# }
# # or, equivalently:
# while (<>) {
# # ... do something with each input line ...
# }
#
# If you want to prompt for user input, try special purpose modules like
# IO::Prompt.
while ( my $a = <> ) {
chomp $a;
# use double quotes only when needed (e.g. for "\n")
print '--' . fib($a) . "\n";
}
sub fib {
my $i = shift;
if ( not defined $m_fib{$i} ) {
# it is not necessary to use & for subroutine calls, and
# it can be confused with the bitwise and operator
$m_fib{$i} = fib( $i - 1 ) + fib( $i - 2 );
}
return $m_fib{$i};
}
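The reviewed version still recurses forever if it is given a negative number (or a line that was not chomped). A minimal sketch that validates the argument first; the regex check and the error message are assumptions about how you want bad input handled:

```perl
use strict;
use warnings;

my %m_fib = ( 0, 1, 1, 1 );

sub fib {
    my $i = shift;
    # Reject anything that is not a plain non-negative integer, including an
    # unchomped "5\n" (\z, unlike $, does not allow a trailing newline).
    die "fib: '$i' is not a non-negative integer\n" unless $i =~ /^\d+\z/;
    if ( not defined $m_fib{$i} ) {
        $m_fib{$i} = fib( $i - 1 ) + fib( $i - 2 );
    }
    return $m_fib{$i};
}

print '--' . fib($_) . "\n" for 0, 1, 10;
```

For inputs above about 100 Perl will emit "Deep recursion" warnings again; that is the case where no warnings 'recursion' is useful.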

Perl parsing - mixture of chars, tabs and spaces

I have the following types of line in my code:
MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/
I want to parse them to get:
'MMAPI_CLOCK_OUTPUTS'
'1'
'clock outputs system'
So I tried:
elsif($TheLine =~ /\s*(.*)s*=s*(.*),s*\/*(.*)*\//)
but this doesn't get the last string 'clock outputs system'
What should the parsing code actually be?
You should escape the slashes, stars and the s for spaces. Instead of writing /, * or s in your regex, write \/, \* and \s:
/\s*(.*)\s=\s*(.*),\s\/\*(.*)\*\//
if($TheLine =~ m%^(\S+)\s+=\s+(\d+),\s+/\*(.*)\*/%) {
print "$1 $2 $3\n"
}
This uses % as an alternative delimiter in order to avoid leaning toothpick syndrome when you escape the / characters.
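Since the title mentions a mixture of tabs and spaces, here is a sketch that allows any amount of either around each token, using /x for readability and named captures (Perl 5.10+). The sub name parse_enum_line and the assumption that names are \w+ and values are plain integers are illustrative:

```perl
use strict;
use warnings;

# Returns (name, value, comment), or an empty list if the line doesn't match.
sub parse_enum_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \s*
        (?<name> \w+ )     \s* = \s*      # identifier, then = with any whitespace
        (?<value> \d+ )    \s* , \s*      # integer value, then a comma
        /\* \s* (?<comment> .*? ) \s* \*/ # C comment, surrounding spaces trimmed
    }x;
    return ( $+{name}, $+{value}, $+{comment} );
}

for my $line ( "MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/",
               "MMAPI_SYSTEM_MANAGEMENT\t=\t0,\t/* sys man system */" ) {
    print join( '|', parse_enum_line($line) ), "\n";
}
```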
Try this regex: /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/
Here is an example in which you can test it:
#!/usr/bin/perl
use strict;
use warnings;
my $str = "MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/\n
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/";
while ($str =~ /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/gm) {
print "$1 $2 $3 \n";
}
# Output:
# MMAPI_CLOCK_OUTPUTS 1 clock outputs system
# MMAPI_SYSTEM_MANAGEMENT 0 sys man system

How can I use awk or Perl to increment a number in a large XML file?

I have an XML file with the following line:
<VALUE DECIMAL_VALUE="0.2725" UNIT_TYPE="percent"/>
I would like to increment this value by .04 and keep the format of the XML in place. I know this is possible with a Perl or awk script, but I am having difficulty with the expressions to isolate the number.
If you're on a box with the xsltproc command in place I would suggest you use XSLT for this.
For a Perl solution I'd go for using the DOM. Check out the article "DOM Processing with Perl".
That said, if your XML file is produced in a predictable way, something naïve like the following could work:
perl -pe 's#(<VALUE DECIMAL_VALUE=")([0-9.]+)(" UNIT_TYPE="percent"/>)#"$1" . ($2 + 0.04) . "$3"#e;'
If you are absolutely sure that the format of your XML will never change, that the order of the attributes is fixed, that you can indeed get the regexp for the number right... then go for the non-parser based solution.
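If you do go the regex route, anchoring only on the DECIMAL_VALUE attribute rather than on the whole element makes it a little less brittle, since attribute order and other attributes no longer matter. It is still a sketch that assumes double-quoted attributes and plain decimal numbers, not an XML parser; the sub name is hypothetical:

```perl
use strict;
use warnings;

# Add $delta to every DECIMAL_VALUE="..." attribute on the line.
sub bump_decimal_value {
    my ( $line, $delta ) = @_;
    $line =~ s/\b(DECIMAL_VALUE=")([0-9]*\.?[0-9]+)(")/$1 . ($2 + $delta) . $3/ge;
    return $line;
}

print bump_decimal_value( qq{<VALUE DECIMAL_VALUE="0.2725" UNIT_TYPE="percent"/>\n}, 0.04 );
```

As a one-liner: perl -pe 's/\b(DECIMAL_VALUE=")([0-9]*\.?[0-9]+)(")/$1 . ($2 + 0.04) . $3/ge' file.xml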
Personally I would use XML::Twig (maybe because I wrote it ;--). It will process the XML as XML, while still respecting the original format of the file, and won't load it all in memory before starting to work.
Untested code below:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
XML::Twig->new( # call the sub for each VALUE element with a DECIMAL_VALUE attribute
twig_roots => { 'VALUE[@DECIMAL_VALUE]' => \&upd_decimal },
# print anything else as is
twig_print_outside_roots => 1,
)
->parsefile_inplace( 'foo.xml');
sub upd_decimal
{ my( $twig, $value)= @_; # twig is the XML::Twig object, $value the element
my $decimal_value= $value->att( 'DECIMAL_VALUE');
$decimal_value += 0.04;
$value->set_att( DECIMAL_VALUE => $decimal_value);
$value->print;
}
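This takes input on stdin (or from files named on the command line) and outputs to stdout:
use strict;
use warnings;
while (<>) {
    # [^"]* stops at the closing quote, so the attributes after it are kept
    if (/^(.*DECIMAL_VALUE=")([^"]*)(".*)$/) {
        my $newVal = $2 + 0.04;
        print "$1$newVal$3\n";
    } else {
        print $_;
    }
}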
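Something akin to the following awk script will work (save it to a file and run it with awk -f script.awk file.xml). It may need tweaking if there is extra spacing, but that is left as an exercise for the reader.
function update_after(in_string, locate_string, delta,    local_pos, leadin, leadout, new_value, quote_pos) {
    local_pos = index(in_string, locate_string);
    # awk strings are 1-indexed: keep everything before locate_string
    leadin = substr(in_string, 1, local_pos - 1);
    leadout = substr(in_string, local_pos + length(locate_string));
    # awk converts the leading numeric part of leadout ("0.2725") to a number
    new_value = leadout + delta;
    quote_pos = index(leadout, "\"");
    leadout = substr(leadout, quote_pos + 1);
    return leadin locate_string new_value "\"" leadout;
}
/^ *<VALUE/ {
    print update_after($0, "DECIMAL_VALUE=\"", 0.04);
    next;
}
# print every other line unchanged
{ print }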
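Here's a gawk version. Note that assigning to a field makes awk rebuild the whole line with single spaces, so indentation on the matching lines is lost: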
awk '/DECIMAL_VALUE/{
for(i=1;i<=NF;i++){
if( $i~/DECIMAL_VALUE/){
gsub(/DECIMAL_VALUE=|\042/,"",$i)
$i="DECIMAL_VALUE=\042"$i+0.04"\042"
}
}
}1' file