Merging columns in a text file - Perl

Input text file:
A | 1 | def | 1432
A | 1 | ffr | 1234
A | 1 | dfs | 3241
A | 2 | asf | 2213
Desired output:
A | 1 | def 1432,ffr 1234,dfs 3241
A | 2 | asf 2213
That is, the values that share the same 2nd-column value should be merged into a single row.

And here is a Perl attempt:
perl -F'\s+\|\s+' -alne '
$a{$F[1]} .= "$F[2] $F[3],";
END {
$_ = "A | $_ | $a{$_}", s/,$//, print for sort keys %a;
}' FILE

Your problem is not well specified, but here's a step towards a solution:
awk -F\| '{ a[$1 "|" $2] = a[$1 "|" $2 ] "," $3 $4 }
END { for( x in a ) print x a[x]}' input |
sed 's/,/|/' # Replace the first comma (the leading one awk adds) with "|"
Caveats: sed replaces the first comma on each line, so if either of the first 2 columns contains a comma, that comma is replaced instead of the leading comma awk inserts before the 3rd column of output. The script also groups on both of the first 2 columns rather than just the 2nd, and since for (x in a) visits keys in an unspecified order, the output order may differ from the input. There are probably other issues, but this may help.

awk '
BEGIN { FS = " \\| "; OFS = SUBSEP = " | " }
{
val[$1,$2] = val[$1,$2] sep[$1,$2] $3 " " $4
sep[$1,$2] = ","
}
END { for (key in val) print key, val[key] }
' file
Because for (key in val) visits keys in an unspecified order, this will likely not preserve the order of the input. Also, it uses both the 1st and 2nd columns as the key, but since you say the 1st column does not change, that is irrelevant.
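For comparison, here is a minimal Perl sketch that keeps the groups in the order they first appear in the input, which the awk versions do not guarantee. It assumes every row has exactly four |-separated columns and groups on the first two, like the second awk answer; the sub name merge_rows and the inline sample data are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge rows of the form "c1 | c2 | c3 | c4" that share c1 and c2,
# joining their "c3 c4" parts with commas, in first-seen order.
sub merge_rows {
    my (%merged, @order);
    for my $line (@_) {
        (my $l = $line) =~ s/\s+\z//;                   # drop line ending / trailing blanks
        next unless length $l;                          # skip empty lines
        my ($c1, $c2, $c3, $c4) = split /\s*\|\s*/, $l;
        my $key = "$c1 | $c2";
        push @order, $key unless exists $merged{$key};  # remember first appearance
        push @{ $merged{$key} }, "$c3 $c4";
    }
    return map { "$_ | " . join(',', @{ $merged{$_} }) } @order;
}

# Sample data from the question; to read a real file, pass <> instead.
my @input = (
    "A | 1 | def | 1432\n",
    "A | 1 | ffr | 1234\n",
    "A | 1 | dfs | 3241\n",
    "A | 2 | asf | 2213\n",
);
print "$_\n" for merge_rows(@input);
```

Storing the keys in @order as well as in the hash is what preserves the input order; the hash alone would return them in no particular order, just like awk's for (key in val).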


How to convert 23/1/17 to 23/01/2017 in a row of a CSV file with Unix?

I am looking for a way to convert all dates in a row of a CSV file into this format. For example, I want to convert 23/1/17 to 23/01/2017.
I use Unix.
Thank you.
My file is like this:
23/1/17
17/08/18
1/1/2
5/6/03
18/05/2019
and I want this:
23/01/2017
17/08/2018
01/01/2002
05/06/2003
18/05/2019
I used date_samples.csv as my test data:
23/1/17,17/08/18,1/1/02,5/6/03,18/05/2019
cat date_samples.csv | tr "," "\n" | awk 'BEGIN{FS=OFS="/"}{print $2,$1,$3}' | \
while read -r CMD; do
date -d "$CMD" +%d/%m/%Y >> temp
done; cat temp | tr "\n" "," > converted_dates.csv ; rm temp; truncate -s-1 converted_dates.csv
Output:
23/01/2017,17/08/2018,01/01/2002,05/06/2003,18/05/2019
This portion of the code converts each "," to a newline and rearranges every date from DD/MM/YY to MM/DD/YY, since GNU date -d does not accept dates in DD/MM/YY order. It then loops through the rearranged dates, converts each one to DD/MM/YYYY and temporarily stores the results in temp:
cat date_samples.csv | tr "," "\n" | awk 'BEGIN{FS=OFS="/"}{print $2,$1,$3}' | \
while read -r CMD; do
date -d "$CMD" +%d/%m/%Y >> temp
done;
The last line, cat temp | tr "\n" "," > converted_dates.csv ; rm temp; truncate -s-1 converted_dates.csv, converts the newlines back to ",", writes the result to converted_dates.csv, deletes temp and finally truncates the last byte, which removes the trailing comma.
Using awk:
awk -F, -v OFS=, '{ for (i=1;i<=NF;i++) { split($i,map,"/"); if (length(map[3])==1) { map[3]="0"map[3] } cmd="date -d \""map[2]"/"map[1]"/"map[3]"\" \"+%d/%m/%Y\""; cmd | getline dayte; close(cmd); $i=dayte } }1' file
Explanation:
awk -F, -v OFS=, '{ # -F, splits fields on commas; -v OFS=, keeps commas between fields when the record is rebuilt
for (i=1;i<=NF;i++) {
split($i,map,"/"); # Loop through each comma-separated field and split it into the array map, using "/" as the separator
if (length(map[3])==1) {
map[3]="0"map[3] # If the year is just one digit, pad it with a leading 0
}
cmd="date -d \""map[2]"/"map[1]"/"map[3]"\" \"+%d/%m/%Y\"" # Build the date command with the month first; %Y gives a 4-digit year
cmd | getline dayte; # Run the command and read its result into the variable dayte
close(cmd); # Close the pipe before running the next date command
$i=dayte # Replace the field with the value of dayte
}
}1' file
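If starting one date process per field is too slow, or GNU date is not available, the conversion can also be done with plain string handling. This is a different technique from the answers above: a minimal Perl sketch that pads day and month to two digits and, as an assumption based on the sample data, treats any 1- or 2-digit year as 20YY. It does not check that the date is valid (31/02/19 passes through unchanged in meaning), and normalise_date is a hypothetical helper name.

```perl
use strict;
use warnings;

# Convert a D/M/Y date to DD/MM/YYYY using plain string handling.
# ASSUMPTION: 1- or 2-digit years mean 20YY; the date itself is not validated.
sub normalise_date {
    my ($date) = @_;
    my ($d, $m, $y) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{1,2}|\d{4})\z}
        or return $date;                  # leave anything unexpected as-is
    $y += 2000 if length($y) < 4;
    return sprintf '%02d/%02d/%04d', $d, $m, $y;
}

# One comma-separated row, as in date_samples.csv above.
my $row = '23/1/17,17/08/18,1/1/2,5/6/03,18/05/2019';
print join(',', map { normalise_date($_) } split /,/, $row), "\n";
```

For a whole file, the same split/map/join can be applied to each chomped line read with while (<>).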

Perl: Perl6::Form format

I have a file like this:
SR Name Rollno Class
1 Sanjay 01 B
2 Rahul_Kumar_Khanna 09 A
Now I need to add "|" between the columns, so it should look like this:
SR | Name |Rollno | Class|
1 | Sanjay |01 | B |
2 | Rahul_Kumar_Khanna|09 | A |
I am using Perl6::Form:
my $text;
foreach my $line (@arr) {
    my ($SR, $Name, $Rollno, $Class) = split (" ", $line);
    my $len = length $Name;
    $text = form
        '| {||||||||} | {||||||||} | {||||||||} | {||||||||}|',
        $SR, $Name, $Rollno, $Class;
    print $text;
}
This is what I have done so far, but the name is not coming out properly: I have to add extra "|" characters to the name field for that. Is there any way to add the "|" characters by calculating the length, like below? I tried, but I get an error.
'| {||||||||} | {||||||||}x$len | {||||||||} | {||||||||}|',
Problem #1
'| {||||||||} | {||||||||}x$len | {||||||||} | {||||||||}|'
produces
| {||||||||} | {||||||||}x20 | {||||||||} | {||||||||}|
but you're trying to get
| {||||||||} | {||||||||||||||||||||} | {||||||||} | {||||||||}|
For that, you'd want
'| {||||||||} | {'.( "|" x $len ).'} | {||||||||} | {||||||||}|'
Problem #2
$len is the length of the name field of the current row, so it is different for every row. This is wrong, because you want the output to be the same width for every row: $len needs to be the length of the longest name field.
You will need to find the correct value for $len before even starting the loop.
use strict;
use warnings;
use Perl6::Form;

# Read in the data as an array of rows.
# Each row is an array of values.
my @rows = map { [ split ] } <>;
# Find the maximum width of each column.
my @col_lens = (0) x @{ $rows[0] };
for my $row (@rows) {
    # Skip the blank line after the header.
    next if !@$row;
    for my $col_idx (0..$#$row) {
        my $col_len = length $row->[$col_idx];
        if ($col_lens[$col_idx] < $col_len) {
            $col_lens[$col_idx] = $col_len;
        }
    }
}
my $form =
    join "",
    "| ",
    "{".( "|"x($col_lens[0]-2) )."}",
    " | ",
    "{".( "|"x($col_lens[1]-2) )."}",
    " | ",
    "{".( "|"x($col_lens[2]-2) )."}",
    " | ",
    "{".( "|"x($col_lens[3]-2) )."}",
    " |";
for my $row (@rows) {
    if (@$row) {
        print form($form, @$row);
    } else {
        print "\n";
    }
}
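If Perl6::Form is not a hard requirement, the same "measure first, then format" idea works with core Perl alone. The sketch below is a swapped-in technique, not Perl6::Form: it computes each column's maximum width first and then left-aligns every cell with sprintf's %-*s instead of centring it. format_table is a hypothetical helper, and blank rows are simply dropped.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Format rows (array refs) as a "|"-separated table in which every column
# is as wide as its widest cell; cells are left-aligned with sprintf.
sub format_table {
    my @rows  = grep { @$_ } @_;                # drop empty rows such as blank lines
    my $ncols = max map { scalar @$_ } @rows;
    my @width = (0) x $ncols;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if $len > $width[$i];
        }
    }
    return map {
        my $row = $_;
        join(' | ', map { sprintf '%-*s', $width[$_], $row->[$_] // '' } 0 .. $ncols - 1)
            . " |\n";
    } @rows;
}

my @rows = map { [ split ] } (
    'SR Name Rollno Class',
    '1 Sanjay 01 B',
    '2 Rahul_Kumar_Khanna 09 A',
);
print format_table(@rows);
```

The * in %-*s takes the field width from the argument list, so no format string has to be assembled by hand.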

Replacing CR LF in Perl with "|"

Using Perl, I want to replace the CRLF at the end of a line beginning with "ID" by "|".
To be more explicit: if a line begins with "ID", I replace the CRLF at the end of that line by |.
This is what I have done:
elsif ($line =~ /^ID:\n/) { print $outputFile $line."|"; }
I don't think this is right.
Depending on the platform, \n has different meanings. From perlport:
LF eq \012 eq \x0A eq \cJ eq chr(10) eq ASCII 10
CR eq \015 eq \x0D eq \cM eq chr(13) eq ASCII 13
        | Unix | DOS  | Mac  |
   ---------------------------
    \n  |  LF  |  LF  |  CR  |
    \r  |  CR  |  CR  |  LF  |
    \n *|  LF  | CRLF |  CR  |
    \r *|  CR  |  CR  |  LF  |
   ---------------------------
    * text-mode STDIO
You could do:
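elsif ($line =~ /^(ID\b[^\r\n]*)\R/) { print $outputFile "$1|" }
\R matches any kind of line break, including a CRLF pair as a single unit (it needs Perl 5.10 or later). The capture uses [^\r\n]* rather than .* because . also matches \r, so a greedy .* would leave the CR of a CRLF inside $1.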
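A small self-contained check of the corrected pattern, written as a hypothetical helper join_id_lines that works on a list of lines and returns a string instead of printing to $outputFile. It shows that a CRLF, LF or lone CR ending on an ID line is replaced by | without the CR being captured, while other lines (including ones like IDX that fail the \b test) are left untouched.

```perl
use strict;
use warnings;

# For each line that starts with the word "ID", replace its line ending
# (CRLF, LF or a lone CR) with "|"; all other lines are kept unchanged.
sub join_id_lines {
    my $out = '';
    for my $line (@_) {
        if ($line =~ /^(ID\b[^\r\n]*)\R/) {
            $out .= "$1|";
        } else {
            $out .= $line;
        }
    }
    return $out;
}

my $joined = join_id_lines("ID: 42\r\n", "name: foo\r\n", "IDX: not an ID line\r\n");
print $joined;
```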

Perl: Using Text::Wrap and specifying the end of line

Yes, I'm re-writing cowsay :)
#!/usr/bin/perl
use Text::Wrap;
$Text::Wrap::columns = 40;
my $FORTUNE = "The very long sentence that will be outputted by another command and it can be very long so it is word-wrapped The very long sentence that will be outputted by another command and it can be very long so it is word-wrapped";
my $TOP = " _______________________________________
/ \\
";
my $BOTTOM = "\\_______________________________________/
";
print $TOP;
print wrap('| ', '| ', $FORTUNE) . "\n";
print $BOTTOM;
This produces:
_______________________________________
/ \
| The very long sentence that will be
| outputted by another command and it
| can be very long so it is
| word-wrapped The very long sentence
| that will be outputted by another
| command and it can be very long so it
| is word-wrapped
\_______________________________________/
How can I get this?
_______________________________________
/ \
| The very long sentence that will be |
| outputted by another command and it |
| can be very long so it is |
| word-wrapped The very long sentence |
| that will be outputted by another |
| command and it can be very long so it |
| is word-wrapped |
\_______________________________________/
I could not find a way to do this in the documentation, but you can apply a small hack if you save the wrapped string in a variable first. It is possible to set a different line ending through a package variable:
$Text::Wrap::separator = "|$/";
You also need to prevent the module from expanding tabs and messing with the character count:
$Text::Wrap::unexpand = 0;
The separator above is simply a pipe | followed by the input record separator $/ (usually a newline). This adds a pipe to the end of each line, but no padding spaces, which have to be added manually:
my $text = wrap('| ', '| ', $FORTUNE) . "|\n"; # wrap() does not put the separator after the last line
$text =~ s/(^.+)\K\|/' ' x ($Text::Wrap::columns - length($1)) . '|'/gem;
print $text;
This matches each line up to its trailing |, then inserts padding before that | by repeating a space $Text::Wrap::columns minus the length of the matched text times. The /m modifier makes ^ match at the start of every line, not only at the start of the string. Because . does not match a newline, .+ never crosses a line boundary, so each match covers exactly one line. The /e modifier evaluates the replacement part as Perl code instead of treating it as a string, and /g applies the substitution to every line.
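An alternative that avoids the separator hack is to let Text::Wrap only break the lines and then pad each line yourself with sprintf. This is a minimal sketch using only core Perl; the sub name boxed and the exact border layout are assumptions, not part of Text::Wrap. It relies on the documented behaviour that wrap() keeps every line shorter than $Text::Wrap::columns, so setting columns to the inner width plus one yields lines of at most the inner width.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Wrap $text to at most $inner characters per line and draw a cowsay-like box
# around it, padding every line with sprintf so the right-hand "|" lines up.
sub boxed {
    my ($text, $inner) = @_;
    local $Text::Wrap::columns   = $inner + 1;  # wrap() keeps lines shorter than $columns
    local $Text::Wrap::unexpand  = 0;           # do not turn runs of spaces into tabs
    local $Text::Wrap::separator = "\n";
    my @lines = split /\n/, wrap('', '', $text);
    my $out = ' ' . ('_' x ($inner + 2)) . "\n";
    $out .= '/' . (' ' x ($inner + 2)) . "\\\n";
    $out .= sprintf "| %-*s |\n", $inner, $_ for @lines;
    $out .= '\\' . ('_' x ($inner + 2)) . "/\n";
    return $out;
}

my $fortune = 'The very long sentence that will be outputted by another command '
            . 'and it can be very long so it is word-wrapped';
print boxed($fortune, 37);
```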
Note that it is somewhat of a quick hack, so bugs are possible.
If you're willing to install a more powerful module, you can use Text::Format. It has many more customization options; the most relevant one here is rightFill, which fills the rest of each line with spaces.
Unfortunately, you can't customize the left and right sides with non-space characters. You can use a workaround by doing regex substitutions, just as Text::NWrap does in its source code.
#!/usr/bin/env perl
use utf8;
use Text::Format;
binmode STDOUT, ':encoding(UTF-8)'; # the top border uses the non-ASCII character "‾"
chop(my $FORTUNE = "The very long sentence that will be outputted by another command and it can be very long so it is word-wrapped " x 2);
my $TOP = "/" . '‾'x39 . "\\\n";
my $BOTTOM = "\\_______________________________________/\n";
my $formatter = Text::Format->new({ columns => 37, firstIndent => 0, rightFill => 1 });
my $text = $formatter->format($FORTUNE);
$text =~ s/^/| /mg;
$text =~ s/\n/ |\n/mg;
print $TOP;
print $text;
print $BOTTOM;

List the files with the minimum sequence number

I have some files in a directory as below (not necessarily sorted):
A_10
A_20
A_30
B_10
B_30
C_10
C_20
D_20
D_30
E_10
E_20
E_30
10, 20 and 30 are the sequence numbers of the files for A, B, C, D and E.
For each of A, B, C, D and E, I want to select only the file with the minimum sequence number.
The output should be:
A_10
B_10
C_10
D_20
E_10
Could anybody help me?
perl -le '
print join $/,
grep !$_{( split "_" )[0]}++,
sort glob "*_*"
'
or:
printf '%s\n' *_* | sort | awk -F_ '!_[$1]++'
or:
printf '%s\n' *_* | sort -t_ -uk1,1
In bash:
for x in A B C D E; do
ls -1 ${x}_* | sort | head -n1
done
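All of the answers above pick the first name in lexical order, which matches the numeric minimum only while the sequence numbers have the same number of digits (A_9 sorts after A_10). Here is a minimal Perl sketch that compares the numbers numerically instead, assuming names of the form PREFIX_NUMBER; min_per_prefix is a hypothetical helper:

```perl
use strict;
use warnings;

# Keep, for every PREFIX, the name PREFIX_NUMBER with the smallest NUMBER,
# comparing numerically so that A_9 beats A_10.
sub min_per_prefix {
    my %best;                                   # prefix => [name, number]
    for my $name (@_) {
        my ($prefix, $seq) = $name =~ /^(.+)_(\d+)\z/ or next;
        if (!exists $best{$prefix} || $seq < $best{$prefix}[1]) {
            $best{$prefix} = [$name, $seq];
        }
    }
    return map { $best{$_}[0] } sort keys %best;
}

print "$_\n" for min_per_prefix(glob '*_*');
```

Names that do not end in _ followed by digits are skipped.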