Treat Two Columns as One - perl

Sample Text:
$ cat X
Birth Death Name
02/28/42 07/03/69 Brian Jones
11/27/42 09/18/70 Jimi Hendrix
11/19/43 10/04/70 Janis Joplin
12/08/43 07/03/71 Jim Morrison
11/20/46 10/29/71 Duane Allman
After Processing With Perl, column & sed:
$ perl -lae 'print "$F[2]_$F[3] $F[0]"' X | column -t | sed 's/_/ /g'
Name          Birth
Brian Jones   02/28/42
Jimi Hendrix  11/27/42
Janis Joplin  11/19/43
Jim Morrison  12/08/43
Duane Allman  11/20/46
This is the exact output I want. But the issue is, I do not want to use column -t | sed 's/_/ /g' at the end.
My intuition is that this can be done with a Perl one-liner alone (without the need for sed or column).
Is it possible? How can I do that?
P.S. I also have an awk solution (awk '{print $3"_"$4" "$1}' X | column -t | sed 's/_/ /g') that gives this exact same result. However, I am looking for a Perl-only solution.

One way
perl -wlnE'say join " ", (split " ", $_, 3)[-1,0]' input.txt
This limits the split to three terms -- first two fields obtained by normally splitting by the given pattern, and then the rest, here comprising the name.
It won't line up nicely as in the shown output.
If the proper alignment is a must, then there's more to do since one must first see the whole file in order to know what the field width should be. Then the "one"-liner (command-line program) is
perl -MList::Util=max -wlne'
    push @recs, [ (split " ", $_, 3)[-1,0] ];
    END {
        $m = max map { length $_->[0] } @recs;
        printf("%-${m}s %s\n", @$_) for @recs
    }' input.txt
If a field width fixed a priori is acceptable, as brought up in a comment, we can do
perl -wlne'printf "%-20s %s\n", (split " ", $_, 3)[-1,0]' input.txt
The saving grace for the obvious shortcoming here (what about names longer than the chosen width?) is that only those particular lines will be out of alignment.

See if the following one-liner is an acceptable solution:
perl -ne '/(\S+)\s+\S+\s+(.*)/ and printf "%-13s %s\n", $2, $1' birth_data.dat
Input birth_data.dat
Birth Death Name
02/28/42 07/03/69 Brian Jones
11/27/42 09/18/70 Jimi Hendrix
11/19/43 10/04/70 Janis Joplin
12/08/43 07/03/71 Jim Morrison
11/20/46 10/29/71 Duane Allman
Output
Name          Birth
Brian Jones   02/28/42
Jimi Hendrix  11/27/42
Janis Joplin  11/19/43
Jim Morrison  12/08/43
Duane Allman  11/20/46

Related

Sed - replace words

I have a problem with replacing string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find the field that starts with Svc (up to the next |) and swap it with the field that starts with Stm (up to the next |).
My attempts so far only replaced individual characters, which is not my goal.
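awk -F'|' -v OFS='|' '{
    a = b = 0
    # remember the index of the Stm= field in a and of the Svc= field in b
    for (i = 1; i <= NF; i++) { a = $i ~ /^Stm=/ ? i : a; b = $i ~ /^Svc=/ ? i : b }
    # swap the two fields
    t = $a; $a = $b; $b = t
} 7' file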
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
The code exchanges the Stm=... and Svc=... columns, no matter which one comes first.
If a Perl solution is okay, this one assumes that exactly one column matches each search term:
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
    @i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
    $t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
    print join "|", @F;
  ' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane: split each input line on |; see also "Perl flags -pe, -pi, -p, -w, -d, -i, -t?"
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F: get the indices of the columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t: swap the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); courtesy of "How can I swap two Perl variables"
print join "|", @F: print the modified array
You need to use capture groups and backreferences in a string substitution.
The following swaps the two fields:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm=[^|]*\)\(.*\)\(Svc=[^|]*\)/\3\2\1/'
As pointed out in the comment from @Kent, this will not work if the fields appear in the opposite order.
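One way around the ordering limitation is to try both orders and stop after the first substitution that succeeds. This is only a sketch, and it assumes each line holds at most one Stm= field and one Svc= field. The function name swap_stm_svc is made up for illustration. A `t` command with no label branches to the end of the script when the preceding `s` made a replacement, which keeps the second expression from swapping the fields back.

```shell
# swap_stm_svc is a hypothetical helper name.
# Assumption: at most one Stm= and one Svc= field per line.
swap_stm_svc() {
    # 1st s: Stm comes before Svc; "t" skips the rest if it matched.
    # 2nd s: Svc comes before Stm.
    sed -e 's/\(Stm=[^|]*\)\(.*\)\(Svc=[^|]*\)/\3\2\1/' \
        -e t \
        -e 's/\(Svc=[^|]*\)\(.*\)\(Stm=[^|]*\)/\3\2\1/'
}

echo '|Svc=101|Seq=2|Stm=2|MsgType=556' | swap_stm_svc
```

Using [^|]* instead of .* inside the first and third groups keeps each capture within a single field.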

sed/awk/cut/grep - Best way to extract string

I have a results.txt file that is structured in this format:
Uncharted 3: Javithaxx l Rampant l Graveyard l Team Deathmatch HD (D1VpWBaxR8c)
Matt Darey feat. Kate Louise Smith - See The Sun (Toby Hedges Remix) (EQHdC_gGnA0)
The Matrix State (SXP06Oax70o)
Above & Beyond - Group Therapy Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)
I want to create a new file extracting the YouTube video ID given in the last characters of each line, like "8aOdRACuXiU".
I'm trying to build a URL like this in a new file:
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Note that I appended &hd=1 to the URL I am trying to build. I have tried using rev and cut, but rev munges my data. The hard part here is that each line in my text file can contain several sets of parentheses and I only care about getting the data between the last set. Each line has a variable length, so that isn't helpful either. What about using grep and .$ for the end of the line?
In summary, I want to extract the youtube ID from results.txt and export it to a new file in the following format: http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Using awk:
awk '{
v = substr( $NF, 2, length( $NF ) - 2 )
printf "%s%s%s\n", "http://www.youtube.com/watch?v=", v, "&hd=1"
}' infile
It yields:
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
$ sed 's!.*(\(.*\))!http://www.youtube.com/watch?v=\1\&hd=1!' results.txt
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Here, .*(\(.*\)) looks for the last occurrence of a pair of parentheses, and captures the characters inside those parentheses. The captured group is then inserted into the URL using \1.
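The same "last pair of parentheses" idea can also be sketched in plain POSIX shell with parameter expansion instead of a regular expression. This is a different technique from the sed command above, shown only for comparison. The function name youtube_urls is made up, and the sketch assumes every line ends with ")".

```shell
# youtube_urls is a hypothetical helper name.
# Assumption: each input line ends with "(ID)".
youtube_urls() {
    while IFS= read -r line; do
        id=${line##*\(}   # drop everything up to and including the last "("
        id=${id%\)}       # drop the trailing ")"
        printf 'http://www.youtube.com/watch?v=%s&hd=1\n' "$id"
    done
}

echo 'Above & Beyond - Group Therapy Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)' | youtube_urls
```

The ## form removes the longest matching prefix, which plays the same role as the greedy .* in the sed pattern.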
Using a perl one-liner :
perl -lne 'printf "http://www.youtube.com/watch?v=%s&hd=1\n", $& if /[^\(]+(?=\)$)/' file.txt
Or multi-line version :
perl -lne '
printf(
"http://www.youtube.com/watch?v=%s&hd=1\n",
$&
) if /[^\(]+(?=\)$)/
' file.txt

sed — joining a range of selected lines

I'm a beginner to sed. I know that it's possible to apply a command (or a set of commands) to a certain range of lines like so
sed '/[begin]/,/[end]/ [some command]'
where [begin] is a regular expression that designates the beginning line of the range and [end] is a regular expression that designates the ending line of the range (which is itself included in the range).
I'm trying to use this to specify a range of lines in a file and join them all into one line. Here's my best try, which didn't work:
sed '/[begin]/,/[end]/ {
N
s/\n//
}
'
I'm able to select the set of lines I want without any problem, but I just can't seem to merge them all into one line. If anyone could point me in the right direction, I would be really grateful.
One way using GNU sed:
sed -n '/begin/,/end/ { H;g; s/^\n//; /end/s/\n/ /gp }' file.txt
This is straightforward if you only want to select some lines and join them. Use Steve's answer or my pipe-to-tr alternative:
sed -n '/begin/,/end/p' | tr -d '\n'
It becomes a bit trickier if you want to keep the other lines as well. Here is how I would do it (with GNU sed):
join.sed
/\[begin\]/ {
:a
/\[end\]/! { N; ba }
s/\n/ /g
}
So the logic here is:
When [begin] line is encountered start collecting lines into pattern space with a loop.
When [end] is found stop collecting and join the lines.
Example:
seq 9 | sed -e '3s/^/[begin]\n/' -e '6s/$/\n[end]/' | sed -f join.sed
Output:
1
2
[begin] 3 4 5 6 [end]
7
8
9
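For comparison, the same collect-then-join logic can be sketched in POSIX awk, which avoids the GNU sed requirement. The function name join_range and the variable names inside and buf are made up for illustration. As in join.sed, the markers are assumed to be the literal strings [begin] and [end].

```shell
# join_range is a hypothetical helper name.
# Lines from one matching [begin] through the next one matching [end]
# are joined with single spaces; all other lines pass through unchanged.
join_range() {
    awk '
        inside                 { buf = buf " " $0 }          # keep collecting
        !inside && /\[begin\]/ { inside = 1; buf = $0 }      # start a range
        inside && /\[end\]/    { print buf; inside = 0; next }
        !inside                { print }                     # outside any range
        END                    { if (inside) print buf }     # unterminated range
    ' "$@"
}

printf '%s\n' 1 '[begin]' 3 4 '[end]' 5 | join_range
```

A range that never reaches [end] is still printed at end of input instead of being silently dropped.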
I like your question. I also like Sed. Regrettably, I do not know how to answer your question in Sed; so, like you, I am watching here for the answer.
Since no Sed answer has yet appeared here, here is how to do it in Perl:
perl -we 'my $flag = 0; while (<>) { chomp; $flag = 1 if /\[begin\]/; print if $flag; if (/\[end\]/) { print "\n" if $flag; $flag = 0; } } print "\n" if $flag;' file.txt

Lowercase to Uppercase of character in Shell Scripting [duplicate]

Possible Duplicate:
Character Lowercase to Uppercase in Shell Scripting
I have value as: james,adam,john I am trying to make it James,Adam,John (First character of each name should be Uppercase).
echo 'james,adam,john' | sed 's/\<./\u&/g'
is not working on all systems. On one system it works fine, but not on another.
A="james adam john"
B=( $A )
echo "${B[@]^}"
throws a syntax error. So I am doing it with a long while loop, which is too lengthy.
Is there any shortcut way to do this?
There are many ways to define "beginning of a name". This method chooses any letter after a word boundary and transforms it to upper case. As a side effect, this will also work with names such as "Sue Ellen", or "Billy-Bob".
echo "james,adam,john" | perl -pe 's/(\b\pL)/\U$1/g'
With Perl:
echo "james,adam,john" | \
perl -ne 'print join(",", map{ ucfirst } split(/,/))'
You can use awk like this to capitalize first letter of every word in your input:
echo "james,adam,john" | awk 'BEGIN { RS=","; FS=""; ORS=","; OFS=""; }
{ $1=toupper($1); print $0; }'
OUTPUT
James,Adam,John
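A variant of the awk approach that keeps the comma as the field separator is sketched below. It does not rely on an empty FS, whose behaviour is not specified by POSIX, and it ends the line with a newline instead of a trailing comma. The function name capitalize_fields is made up for illustration.

```shell
# capitalize_fields is a hypothetical helper name.
# Upper-cases the first character of every comma-separated field.
capitalize_fields() {
    awk -F, -v OFS=, '{
        for (i = 1; i <= NF; i++)
            $i = toupper(substr($i, 1, 1)) substr($i, 2)
        print
    }'
}

echo 'james,adam,john' | capitalize_fields
```

Assigning to a field makes awk rebuild the record with OFS, so the commas are preserved in the output.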
Same method as TLP but with GNU sed:
echo "james,adam,john,sue ellen,billy-bob" | sed -r 's/\b(.)/\u\1/g'
output:
James,Adam,John,Sue Ellen,Billy-Bob
If only the first letter of each comma-separated name should be capitalized, use this instead:
echo "james,adam,john,sue ellen,billy-bob" | sed 's/[^,]*/\u&/g'
output:
James,Adam,John,Sue ellen,Billy-bob

awk or sed CSV file manipulation

"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
I want to print only the first and second columns, without any extra characters; that is, I want to drop the double quotes, the commas, and the third column. Thanks in advance.
awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
The above script will even handle fields that have embedded double quotes or commas. The only downside (if you can call it that) is that the first field starts at $2, because the leading quote is itself treated as a separator and leaves $1 empty.
Proof of Concept
$ awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
a004-1b North
a004-1c south
a004-1d east
a010-1a midwest
a010-1b south
You need GNU Awk 4 for this to work:
$ gawk -vFPAT='[^",]+' '{print $1,$2}'
I love this new "field pattern" feature. It's my new hammer and everything is a nail. Read up on it at http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
(Written this way it doesn't account for embedded commas or quotes, because the question implies this is not needed.)
If you're using awk for this, why put a Perl tag on it?
In Perl:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
use Text::CSV;
my $csv = Text::CSV->new();
while ( my $row = $csv->getline( \*DATA ) ) {
    print 'row: ', Dumper $row;
}
__DATA__
"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
awk -F'"|,' '{print $2,$5}' sample
Not handling embedded double quotes:
sed -e 's/^"\([^"]*\)","\([^"]*\)".*/\1 \2/'
To handle them:
sed -n -e 's/^"//;s/"$//;s/","/ /;s/","/\n/;P'
The above works even for a 1 or 2 field input.
If you want it "pure" awk or sed, this won't fit the bill, but otherwise it works:
awk -F, '{print $1 " " $2}' | tr -d '"'