I need to search for a key and append the value to every key:value pair in a Unix file
Input file data:
1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|10A:ccard_no|20:cust_name|30:trans_amt|40:addr
My desired Output:
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Basically, I need the value of 10 or 10A appended to every key:value pair and split into new lines. To be clear, this won't always be the second field.
I am new to sed, awk and perl. I started with extracting the value using awk:
awk -v FS="|" -v key="59" '$2 == key {print $2}' target.txt
I need the value of 10 or 10A appended to every key:value pair
Going by these requirements, you may try this awk:
awk '
BEGIN{FS=OFS="|"}
match($0, /\|10A?:[^|]+/) {
s = substr($0, RSTART, RLENGTH)
sub(/.*:/, "", s)
}
{
for (i=1; i<=NF; ++i)
print s, $i
}' file
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
# Looks for 10 or 10A
perl -F'\|' -lane'my ($id) = map /^10A?:(.*)/s, @F; print "$id|$_" for @F'
# Looks for 10 or 10<non-digit><maybe more>
perl -F'\|' -lane'my ($id) = map /^10(?:\D[^:]*)?:(.*)/s, @F; print "$id|$_" for @F'
-n executes the program for each line of input.
-l removes LF on read and adds it on print.
-a splits the line on | (specified by -F) into @F.
The first statement extracts what follows : in the field with id 10 or 10-plus-something.
The second statement prints a line for each field.
Specifying file to process to Perl one-liner
If you are still stuck on where to get started, you will use a field-separator and output-field-separator (FS and OFS) set equal to '|' that will split each record into fields at each '|'. Your fields are available as $1, $2, ... $NF. You care about getting, e.g. account_no from field two ($2) so you split() field two with the separator ':' saving the split fields in an array (a used below). You want the second part from field two which will be in the 2nd array element a[2] to use as the new field-1 in output.
The rest is just looping over each field and outputting a[2] a separator and then the current field. You can do that with:
awk 'BEGIN{FS=OFS="|"} {split ($2,a,":"); for(i=1;i<=NF;i++) print a[2],$i}' file
Example Use/Output
With your example input in file, the result would be:
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Which appears to be what you are after. Let me know if you have further questions.
"10" or "10A" at Unknown Field
You can handle the fields containing "10" and "10A" in any order. You just add a loop to loop over the fields and determine which holds "10" or "10A" and save the 2nd element from the array resulting from split() from that field. The rest is the same, e.g.
awk '
BEGIN { FS=OFS="|" }
{ for (i=1;i<=NF;i++){
split ($i,a,":")
if (a[1]=="10"||a[1]=="10A"){
key=a[2]
break
}
}
for (i=1;i<=NF;i++)
print key, $i
}
' file1
Example Input
1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|20:cust_name|30:trans_amt|10A:ccard_no|40:addr
Example Use/Output
awk '
> BEGIN { FS=OFS="|" }
> { for (i=1;i<=NF;i++){
> split ($i,a,":")
> if (a[1]=="10"||a[1]=="10A"){
> key=a[2]
> break
> }
> }
> for (i=1;i<=NF;i++)
> print key, $i
> }
> ' file1
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|10A:ccard_no
ccard_no|40:addr
Which picks up the proper new field 1 for output from the 4th field containing "10A" for the second line above.
Let me know if this is what you needed.
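One thing to watch with the loop above: key keeps whatever value it picked up on an earlier line, so a line with no 10 or 10A field would be printed with the previous line's key. A minimal sketch that resets the key for every record, assuming such lines should simply be skipped (the function name prefix_with_key is made up for illustration):

```shell
# Hypothetical helper: reads pipe-delimited records on stdin.
prefix_with_key() {
  awk '
    BEGIN { FS = OFS = "|" }
    {
      key = ""                      # forget the key found on the previous line
      for (i = 1; i <= NF; i++) {
        n = split($i, a, ":")
        if (n >= 2 && (a[1] == "10" || a[1] == "10A")) {
          key = a[2]
          break
        }
      }
      if (key == "") next           # assumption: drop lines without a 10 or 10A field
      for (i = 1; i <= NF; i++)
        print key, $i
    }'
}
```

Run it as prefix_with_key < file1. If you would rather keep lines without a key, replace the next with whatever default suits your data.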
EDIT: To find the 10 or 10A value anywhere in the line and print accordingly, try the following.
awk '
BEGIN{
FS=OFS="|"
}
match($0,/(10|10A):[^|]*/){
split(substr($0,RSTART,RLENGTH),arr,":")
}
{
for(i=1;i<=NF;i++){
print arr[2],$i
}
}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program.
FS=OFS="|" ##Setting FS and OFS to | here.
}
match($0,/(10|10A):[^|]*/){ ##Using match function to find either 10: or 10A: followed by everything up to the next |.
split(substr($0,RSTART,RLENGTH),arr,":") ##Splitting matched substring into array arr with delimiter of : here.
}
{
for(i=1;i<=NF;i++){ ##Running for loop for each field for each line.
print arr[2],$i ##Printing 2nd element of arr, along with current field.
}
}' Input_file ##Mentioning Input_file name here.
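Note that the pattern (10|10A): is not tied to the start of a field, so a key such as 110: earlier in the line would also match. A minimal sketch of one way around that, assuming such keys can occur in your data: prepend a | to the record so that every field, including the first, is preceded by the delimiter, then match on it (prefix_anchored is a hypothetical name):

```shell
# Hypothetical helper: reads pipe-delimited records on stdin.
prefix_anchored() {
  awk '
    BEGIN { FS = OFS = "|" }
    {
      id = ""
      rec = "|" $0                     # now every field starts right after a |
      if (match(rec, /[|]10A?:[^|]*/)) {
        id = substr(rec, RSTART, RLENGTH)
        sub(/^[^:]*:/, "", id)         # strip the leading | and the key up to the first :
      }
      for (i = 1; i <= NF; i++)
        print id, $i
    }'
}
```

With the input 110:wrong|10:account_no|20:cust_name this picks account_no, whereas the unanchored pattern would pick wrong.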
With your shown samples, please try the following.
awk '
BEGIN{
FS=OFS="|"
}
{
split($2,arr,":")
print arr[2],$1
for(i=2;i<=NF;i++){
print arr[2],$i
}
}
' Input_file
Perl script implementation
use strict;
use warnings;
use feature 'say';
my $fname = shift || die "run as 'script.pl input_file key0 key1 ... key#'";
open my $fh, '<', $fname or die $!;
while( <$fh> ) {
chomp;
my %data = split(/[:\|]/, $_);
for my $key (@ARGV) {
if( $data{$key} ) {
say "$data{$key}|$_" for split(/\|/,$_);
}
}
}
close $fh;
Run as script.pl input_file 10 10A
Output
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Here's an alternate perl solution:
perl -pe '($id) = /(?<![^|])10A?:([^|]+)/; s/([^|]+)[|\n]/$id|$1\n/g'
($id) = /(?<![^|])10A?:([^|]+)/ this will capture the string after 10: or 10A: and save in $id variable. First such match in the line will be captured.
s/([^|]+)[|\n]/$id|$1\n/g every field is then prefixed with value in $id and | character
I have a file with following data
cat text.txt
281475473926267,46,47
281474985385546,310,311
281474984889537,248,249
281475473926267,16,17
281474985385546,20,28
281474984889537,112,68
The values in the 1st column are duplicated in some places.
I want the output as given below:
cat output.txt
281475473926267 16,17,46,47
281474985385546 20,28,310,311
281474984889537 68,112,248,249
It should print the unique values of column 1, then a space, and then the respective values of the other columns on one line, arranged in ascending order.
I tried below:
cat text.txt | perl -F, -lane ' $kv{$F[0]}{$F[1]}++; END { while(my($x,$y) = each(%kv)) { print "$x ",join(",",keys %$y) }}'
281474984889537 112,248
281474985385546 310,20
281475473926267 46,16
Here I am not able to print all the values next to the value in the 1st column.
For 281474984889537 it should print 68,112,248,249, but it's printing only 112,248.
Also, I am not sure how to arrange them in ascending order.
multi-step
$ awk -F, '{print $1,$2; print $1,$3}' file |
sort -k1n -k2n |
awk 'p!=$1{if(p) print p,a[p]; a[$1]=$2; p=$1; next}
{a[$1]=a[$1] "," $2}
END {print p,a[p]}' |
sort -k2n
281475473926267 16,17,46,47
281474985385546 20,28,310,311
281474984889537 68,112,248,249
With GNU awk for true multi-dimensional arrays and sorted_in:
$ cat tst.awk
BEGIN { FS="," }
{
for (i=2; i<=NF; i++) {
keyVals[$1][$i]
}
}
END {
PROCINFO["sorted_in"] = "@ind_num_asc"
for (key in keyVals) {
vals = ""
for (val in keyVals[key]) {
vals = (vals == "" ? "" : vals ",") val
}
print key, vals
}
}
$ awk -f tst.awk file
281474984889537 68,112,248,249
281474985385546 20,28,310,311
281475473926267 16,17,46,47
The above will work no matter how many fields you have on each line and it will remove duplicate values when they occur on multiple lines for the same key value.
This might work for you (GNU sed):
sed -r 'H;x;s/((\n[^\n,]*),[^\n]*)(.*)\2([^\n]*)\n?/\1\4\3/;x;$!d;x;s/.//;:b;h;s/\n.*//;s/[^,]*,//;s/,/\n/g;s/.*/echo "&"|sort -n|paste -sd,/e;G;s/^([^\n]*)\n([^\n,]*),[^\n]*/\2 \1/;P;:c;tc;s/[^\n]*\n//;tb;d' file
The script works in two parts. In the first part of the processing the lines of the file are held in memory and reduced in size by appending values of the same key to a single key. At the end of file the second part of processing is enacted. Each line is broken into two, the appended values are sorted and re-appended to the key, printed and removed, until all the lines have been processed.
To correct your Perl-oneliner, use this.
$ cat text.txt
281475473926267,46,47
281474985385546,310,311
281474984889537,248,249
281475473926267,16,17
281474985385546,20,28
281474984889537,112,68
$ cat text.txt | perl -F, -lanE ' @t1=@{$kv{$F[0]}}; push(@t1,@F[1..2]); $kv{$F[0]}=[@t1]; END { while(my($x,$y) = each(%kv)) { print "$x ",join(",",@{$y}) }}'
281474985385546 310,311,20,28
281475473926267 46,47,16,17
281474984889537 248,249,112,68
$
When you have more columns, a small change on the above one-liner from 1..2 to 1..$#F will do the trick. Check this out
$ cat > text2.txt
281475473926267,46,47,49
281474985385546,310,311
281474984889537,248,249,311,677,213
281475473926267,16,17
281474985385546,20,28
281474984889537,112,68,54,78,324,67
$ cat text2.txt | perl -F, -lanE ' @t1=@{$kv{$F[0]}}; push(@t1,@F[1..$#F]); $kv{$F[0]}=[@t1]; END { while(my($x,$y) = each(%kv)) { print "$x ",join(",",@{$y}) }}'
281474984889537 248,249,311,677,213,112,68,54,78,324,67
281474985385546 310,311,20,28
281475473926267 46,47,49,16,17
$
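The one-liners above collect every value for a key but leave them in input order. If you also need the ascending order asked for in the question, here is a minimal sketch that works without gawk, assuming it is acceptable for the keys themselves to come out in ascending numeric order rather than input order (group_sorted is a made-up name):

```shell
# Hypothetical helper: reads comma-separated records on stdin.
# Step 1 emits one "key value" pair per line, step 2 sorts the pairs
# numerically and drops duplicates, step 3 joins the values per key.
group_sorted() {
  awk -F, '{ for (i = 2; i <= NF; i++) print $1, $i }' |
  sort -u -k1,1n -k2,2n |
  awk '
    ($1 "") != (prev "") { if (NR > 1) print prev, vals; prev = $1; vals = $2; next }
    { vals = vals "," $2 }
    END { if (NR > 0) print prev, vals }'
}
```

Run it as group_sorted < text.txt. It handles any number of columns per line, and sort -u removes values that repeat for the same key.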
What is the best way to sort the group members in the /etc/group file?
e.g.
tomcat::201:root,tux23,alex
ftp::66000:tom,alex,mike
I need following output:
tomcat::201:alex,root,tux23
ftp::66000:alex,mike,tom
Thanks in advance,
tux
You can use a Perl one-liner to sort the usernames on every line:
perl -pe 's|([^:\n]+)$| join ",", sort split /,/, $1 |e' /etc/group
output
tomcat::201:alex,root,tux23
ftp::66000:alex,mike,tom
Here's a solution based on awk:
awk -F: '{ split($4, a, ",");
n = asort(a);
s = a[1];
for(i = 2; i <= n; ++i) { s = s "," a[i] }
print $1":"$2":"$3":"s
}' /etc/group
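Keep in mind that asort() is a GNU awk extension, so the script above will not run under other awks. A minimal sketch of the same idea in portable awk, using a small insertion sort in place of asort() (sort_members is a hypothetical name, and plain string comparison is assumed to be the ordering you want):

```shell
# Hypothetical helper: reads /etc/group style lines on stdin.
sort_members() {
  awk -F: '
    BEGIN { OFS = ":" }
    {
      n = split($4, m, ",")
      # insertion sort; appending "" forces a string comparison
      for (i = 2; i <= n; i++) {
        v = m[i]
        for (j = i - 1; j >= 1 && (m[j] "") > (v ""); j--)
          m[j + 1] = m[j]
        m[j + 1] = v
      }
      s = ""
      for (i = 1; i <= n; i++)
        s = s (i > 1 ? "," : "") m[i]
      $4 = s                          # rebuilds the record using OFS
      print
    }'
}
```

Run it as sort_members < /etc/group. Member lists are short, so the quadratic sort should not matter in practice.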
Another Perl one-liner:
perl -F: -lape 's#$F[3]#join ",",sort split /,/,$F[3]#e' /etc/group
or
perl -F: -lane 'print join ":",@F[0..2],join ",",sort split /,/,$F[3]' /etc/group
Another perl one liner:
perl -ne 'if (/(.*:\d+:)(.*)/) {print $1.join(",",sort(split(/,/,$2)))."\n";}' /etc/group
I have done my research, but was not able to find a solution to my problem.
I am trying to extract all valid words (those starting with a letter) from a string and concatenate them with an underscore ("_"). I am looking for a solution with awk, sed, grep, etc.
Something like:
echo "The string under consideration" | (awk/grep/sed) (pattern match)
Example 1
Input:
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
Desired output:
L2_Traffic_house_seen_during_ABCD_from
Example 2
Input:
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
Desired Output:
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
Example 3
Input:
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Desired Output:
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
This might work for you (GNU sed):
sed 's/[[:punct:]]/ /g;s/\<[[:alpha:]]/\n&/g;s/[^\n]*\n//;s/ [^\n]*//g;y/\n/_/' file
A perl one-liner. It searches any alphabetic character followed by any number of word characters enclosed in word boundaries. Use the /g flag to try several matches for each line.
Content of infile:
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Perl command:
perl -ne 'printf qq|%s\n|, join qq|_|, (m/\b([[:alpha:]]\w*)\b/g)' infile
Output:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
One way using awk, with the contents of script.awk:
BEGIN {
FS="[^[:alnum:]_]"
}
{
for (i=1; i<=NF; i++) {
if ($i !~ /^[0-9]/ && $i != "") {
if (i < NF) {
printf "%s_", $i
}
else {
print $i
}
}
}
}
Run like:
awk -f script.awk file.txt
Alternatively, here is the one liner:
awk -F "[^[:alnum:]_]" '{ for (i=1; i<=NF; i++) { if ($i !~ /^[0-9]/ && $i != "") { if (i < NF) printf "%s_", $i; else print $i; } } }' file.txt
Results:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
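A caveat with the script above: the newline is only printed when the last field is kept, so a line that ends in a number or in punctuation comes out with a trailing underscore and no newline. A minimal sketch that builds the joined string first and prints it once per line (join_words is a made-up name; A-Za-z is used instead of [:alpha:] on the assumption that the input is plain ASCII):

```shell
# Hypothetical helper: reads text lines on stdin.
join_words() {
  awk -F'[^A-Za-z0-9_]+' '
    {
      out = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[A-Za-z]/)                    # keep fields that start with a letter
          out = out (out == "" ? "" : "_") $i
      print out
    }'
}
```

Run it as join_words < file.txt. The separator is added before each kept word except the first, so nothing is left dangling at the end of the line.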
This solution requires some tuning and I think one needs gawk to have regexp as "record separator"
http://www.gnu.org/software/gawk/manual/html_node/Records.html#Records
gawk -v ORS='_' -v RS='[-: \"()]' '/^[a-zA-Z]/' file.dat
Is there an Awk or Perl one-liner that can remove all rows that do not have complete data? I find myself frequently needing something like this. For example, I currently have a tab-delimited file that looks like this:
1 asdf 2
9 asdx 4
3 ddaf 6
5 4
2 awer 4
How can a line that does not have a value in field 2 be removed?
How could a line that does not have a value in ANY one of its fields be removed?
I was trying to do something like this:
awk -F"\t" '{if ($2 != ''){print $0}}' 1.txt > 2.txt
Thanks.
In awk, if you know that every row should have exactly 3 elements:
awk -F'\t+' 'NF == 3' INFILE > OUTFILE
I would just look for consecutive tabs, or a leading or trailing tab:
perl -ne 'next if /\t\t/ or /^\t/ or /\t$/; print' tabfile
perl -lane 'print if $#F == 2' INFILE
For the specific solution:
awk -F'\t' '$2 != ""' input.txt > output.txt
For a solution that is generic:
awk -F'\t' -vCOLS=3 '{
valid=1;
for (i=1; i<=COLS; ++i) {
if ($i == "") {
valid=0;
break;
}
}
if (valid) {
print;
}
}' input.txt > output.txt
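If you do not want to pass the column count in, the same loop can run up to NF instead. This is only a sketch, and it assumes every row carries all of its tab delimiters, since a row that is missing a delimiter altogether simply has a smaller NF and would pass (drop_incomplete is a hypothetical name):

```shell
# Hypothetical helper: reads tab-delimited rows on stdin.
drop_incomplete() {
  awk -F'\t' '
    NF == 0 { next }             # blank line: nothing to keep
    {
      for (i = 1; i <= NF; i++)
        if ($i == "") next       # an empty field: skip the whole row
      print
    }'
}
```

Run it as drop_incomplete < input.txt > output.txt.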
perl -F'\t' -anle 'print if 3 == grep { length } @F' 1.txt > 2.txt
Just use awk 'NF == 3' INFILE > OUTFILE and it will just use white space (tabs, spaces etc..) as the field splitter.