I need to split a query string into an unbounded number of variables for debugging purposes.
The output comes from tshark, and the purpose is to live-debug Google Analytics events. The output from tshark looks like this:
82.387501 hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1
What I want is a more human-readable version:
utmhn: domain.com
utmt: event
utme: 5(x*y*z/start/klipp/166_SS/example)(10)
utmcs: UTF-8
or even better:
utmhn: domain.com
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS/example
)(10)
utmcs: UTF-8
But I can't get my head around sed (or awk) for this purpose...
file
82.387501 hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1
command
sed 's/.*utmhn=/utmhn: /
s/&utmt=/\nutmt: /
s/&utme=/\nutme: /
s/&utmcs=/\nutmcs: /
s:[%]2F:/:g
s:[%]20: :g
s:[\(]:(\n\t :
s:\*:\n\t :g
s:[\)]:\n\t ):
s/[~].*$//' samp1.txt
output
utmhn: domain.com
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS example
)(10)
utmcs: UTF-8
I'm not sure what to say about the %20 in your sample data versus the '/' character in your expected result. Did you type some of this in manually?
Another way, using Perl:
#!/usr/bin/perl -l
use strict; use warnings;
while (<>) {
    my @arr;
    my ($qs) = m/.*?GET.*?\?(\S+)\s/;
    my @pairs = split(/[&~]/, $qs);
    foreach my $pair (@pairs){
        my ($name, $value) = split(/=/, $pair);
        if ($name eq 'utme') {
            $value =~ s!(%2F|%20)!/!g;
            $value =~ s!\*!\n\t\t!g;
            $value =~ s!\(!(\n\t\t!;
            $value =~ s/\)\(/\n\t)(/;
        }
        # let's URI unescape stuff
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        if ($name eq 'utmhn') {
            print "$name: $value";
        }
        else {
            push @arr, "$name: $value";
        }
    }
    print join "\n", @arr;
    print "\n";
}
OUTPUT
utmhn: domain.com
utmwv: 5.3.7
utms: 22
utmn: 1234
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS/example
)(10)
utmcs: UTF-8
USAGE
tshark ... | ./script.pl
ADVANTAGES
I take care to display utmhn: domain.com on the first line
I run a URI unescape on values
It's not limited to "utmhn", "utmt", "utme", and "utmcs" only
Here's one way using GNU awk. Run like:
awk -f script.awk file.txt
Contents of script.awk:
BEGIN {
FS="[ \t=&~]+"
OFS="\t"
}
{
for (i=1; i<=NF; i++) {
if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) {
if ($i == "utme") {
sub(/\(/,"(\n\t ", $(i+1))
gsub(/\*/,"\n\t ", $(i+1))
sub(/\)/,"\n\t )", $(i+1))
}
print $i":", $(i+1)
}
}
}
Results:
utmhn: domain.com
utmt: event
utme: 5(
x
y
z%2Fstart%2Fklipp%2F166_SS%20example
)(10)
utmcs: UTF-8
Alternatively, here's the one-liner:
awk 'BEGIN { FS="[ \t=&~]+"; OFS="\t" } { for (i=1; i<=NF; i++) { if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) { if ($i == "utme") { sub(/\(/,"(\n\t ", $(i+1)); gsub(/\*/,"\n\t ", $(i+1)); sub(/\)/,"\n\t )", $(i+1)) } print $i":", $(i+1) } } }' file.txt
Assuming your data is in a file called "tst":
awk -F "&" '{ for ( i=2;i<=NF;i++ ){sub(/=/,":\t",$i);sub(/[~].*$/,"",$i);gsub(/%2F/,"/",$i);gsub(/%20/," ",$i);print $i} }' tst
This produces the output:
utms: 22
utmn: 1234
utmhn: domain.com
utmt: event
utme: 5(x*y*z/start/klipp/166_SS example)(10)
utmcs: UTF-8
It's a bit dirty, but it works.
$ cat tst.awk
BEGIN { FS="[&=~]"; OFS=":\t" }
{
for (i=1;i<=NF;i++) {
map[$i]=$(i+1)
}
sub(/\(/,"&\n\t ", map["utme"])
gsub(/\*/,"\n\t ", map["utme"])
gsub(/%2./,"/", map["utme"])
sub(/\)/,"\n\t&", map["utme"])
print "utmhn", map["utmhn"]
print "utmt", map["utmt"]
print "utme", map["utme"]
print "utmcs", map["utmcs"]
}
$
$ awk -f tst.awk file
utmhn: domain.com
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS/example
)(10)
utmcs: UTF-8
This might work for you (GNU sed):
sed 's/.*\(utmhn.*=\S*\).*/\1/;s/&/\n/g;s/=/:\t/g;s/(/&\n\t/;s/*/\n\t/g;s/%2F/\//g;s/%20/ /g;s/)/\n\t&/' file
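For comparison, the same breakdown can be sketched outside sed/awk/Perl. This is only a minimal Python sketch, not one of the answers above. It assumes each tshark line contains the token GET directly followed by the request target, and that the trailing ~ is not part of the last value. The function name format_hit is made up for illustration; urlsplit and parse_qsl come from the standard urllib.parse module and take care of the percent-decoding, so %20 comes out as a space rather than a '/'.

```python
from urllib.parse import urlsplit, parse_qsl


def format_hit(line):
    """Return one 'name: value' line per query parameter of a tshark GET line."""
    tokens = line.split()
    if "GET" not in tokens:
        return ""
    # Assumption: the request target directly follows "GET"; drop the trailing "~".
    target = tokens[tokens.index("GET") + 1].rstrip("~")
    rows = []
    for name, value in parse_qsl(urlsplit(target).query, keep_blank_values=True):
        if name == "utme":
            # Break the event payload apart: newline after the first "(",
            # one item per "*", and the first ")" moved onto its own line.
            value = value.replace("(", "(\n\t", 1).replace("*", "\n\t")
            value = value.replace(")", "\n)", 1)
        rows.append(f"{name}: {value}")
    return "\n".join(rows)
```

Because every parameter is decoded and kept, this is not limited to a fixed set of names; filtering to utmhn, utmt, utme and utmcs would be one extra condition in the loop.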
I need to search for a key and append the value to every key:value pair in a Unix file
Input file data:
1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|10A:ccard_no|20:cust_name|30:trans_amt|40:addr
My desired Output:
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Basically, I need the value of 10 or 10A appended to every key:value pair and split into new lines. To be clear, this won't always be the second field.
I am new to sed, awk and perl. I started with extracting the value using awk:
awk -v FS="|" -v key="59" '$2 == key {print $2}' target.txt
I need the value of 10 or 10A appended to every key:value pair
Going by these requirements, you may try this awk:
awk '
BEGIN{FS=OFS="|"}
match($0, /\|10A?:[^|]+/) {
s = substr($0, RSTART, RLENGTH)
sub(/.*:/, "", s)
}
{
for (i=1; i<=NF; ++i)
print s, $i
}' file
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
# Looks for 10 or 10A
perl -F'\|' -lane'my ($id) = map /^10A?:(.*)/s, @F; print "$id|$_" for @F'
# Looks for 10 or 10<non-digit><maybe more>
perl -F'\|' -lane'my ($id) = map /^10(?:\D[^:]*)?:(.*)/s, @F; print "$id|$_" for @F'
-n executes the program for each line of input.
-l removes LF on read and adds it on print.
-a splits the line on | (specified by -F) into @F.
The first statement extracts what follows : in the field with id 10 or 10-plus-something.
The second statement prints a line for each field.
Specifying file to process to Perl one-liner
If you are still stuck on where to get started, you will use a field-separator and output-field-separator (FS and OFS) set equal to '|' that will split each record into fields at each '|'. Your fields are available as $1, $2, ... $NF. You care about getting, e.g. account_no from field two ($2) so you split() field two with the separator ':' saving the split fields in an array (a used below). You want the second part from field two which will be in the 2nd array element a[2] to use as the new field-1 in output.
The rest is just looping over each field and outputting a[2], a separator, and then the current field. You can do that with:
awk 'BEGIN{FS=OFS="|"} {split ($2,a,":"); for(i=1;i<=NF;i++) print a[2],$i}' file
Example Use/Output
With your example input in file, the result would be:
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Which appears to be what you are after. Let me know if you have further questions.
"10" or "10A" at Unknown Field
You can handle the fields containing "10" and "10A" in any order. You just add a loop to loop over the fields and determine which holds "10" or "10A" and save the 2nd element from the array resulting from split() from that field. The rest is the same, e.g.
awk '
BEGIN { FS=OFS="|" }
{ for (i=1;i<=NF;i++){
split ($i,a,":")
if (a[1]=="10"||a[1]=="10A"){
key=a[2]
break
}
}
for (i=1;i<=NF;i++)
print key, $i
}
' file1
Example Input
1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|20:cust_name|30:trans_amt|10A:ccard_no|40:addr
Example Use/Output
awk '
> BEGIN { FS=OFS="|" }
> { for (i=1;i<=NF;i++){
> split ($i,a,":")
> if (a[1]=="10"||a[1]=="10A"){
> key=a[2]
> break
> }
> }
> for (i=1;i<=NF;i++)
> print key, $i
> }
> ' file1
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|10A:ccard_no
ccard_no|40:addr
Which picks up the proper new field 1 for output from the 4th field containing "10A" for the second line above.
Let me know if this is what you needed.
EDIT: To find the 10 or 10A value anywhere in the line and print accordingly, try the following.
awk '
BEGIN{
FS=OFS="|"
}
match($0,/(10|10A):[^|]*/){
split(substr($0,RSTART,RLENGTH),arr,":")
}
{
for(i=1;i<=NF;i++){
print arr[2],$i
}
}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program.
FS=OFS="|" ##Setting FS and OFS to | here.
}
match($0,/(10|10A):[^|]*/){ ##using match function to match either 10: till | OR 10A: till | here.
split(substr($0,RSTART,RLENGTH),arr,":") ##Splitting matched substring into array arr with delimiter of : here.
}
{
for(i=1;i<=NF;i++){ ##Running for loop for each field for each line.
print arr[2],$i ##Printing 2nd element of arr, along with current field.
}
}' Input_file ##Mentioning Input_file name here.
With your shown samples, please try following.
awk '
BEGIN{
FS=OFS="|"
}
{
split($2,arr,":")
print arr[2],$1
for(i=2;i<=NF;i++){
print arr[2],$i
}
}
' Input_file
Perl script implementation
use strict;
use warnings;
use feature 'say';
my $fname = shift || die "run as 'script.pl input_file key0 key1 ... key#'";
# "or" (not "||") so that die fires when open itself fails
open my $fh, '<', $fname or die $!;
while( <$fh> ) {
    chomp;
    my %data = split(/[:\|]/, $_);
    for my $key (@ARGV) {
        if( $data{$key} ) {
            say "$data{$key}|$_" for split(/\|/,$_);
        }
    }
}
close $fh;
Run as script.pl input_file 10 10A
Output
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Here's an alternate perl solution:
perl -pe '($id) = /(?<![^|])10A?:([^|]+)/; s/([^|]+)[|\n]/$id|$1\n/g'
($id) = /(?<![^|])10A?:([^|]+)/ this will capture the string after 10: or 10A: and save in $id variable. First such match in the line will be captured.
s/([^|]+)[|\n]/$id|$1\n/g every field is then prefixed with value in $id and | character
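The idea shared by the answers above (split on |, find the field whose id is 10 or 10A wherever it sits, then prefix every field with its value) can also be written as a short Python sketch. This is an illustration only; the function name prefix_fields and the keys parameter are hypothetical, and a line with neither key simply gets an empty prefix.

```python
def prefix_fields(line, keys=("10", "10A")):
    """Prefix every '|'-separated field with the value stored under one of `keys`."""
    fields = line.rstrip("\n").split("|")
    # Map each field id to its value, e.g. {"1A": "trans_ref_id", "10": "account_no", ...}
    values = dict(field.split(":", 1) for field in fields if ":" in field)
    # The first key that is present wins; empty prefix if none is found.
    prefix = next((values[k] for k in keys if k in values), "")
    return [f"{prefix}|{field}" for field in fields]
```

Since the lookup goes through a dictionary built from the whole line, the position of the 10/10A field does not matter, and the prefix is recomputed per line instead of carrying over from a previous one.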
Is there a way in Perl, using a regex, to lowercase all text in an input line except the parts within single quotes (there could be more than one quoted part)? I have achieved this using the code below, but would like to see if it can be done with regex and map.
while (<>) {
my $m=0;
for (split(//)) {
if (/'/ and ! $m) {
$m=1;
print;
}
elsif (/'/ and $m) {
$m=0;
print;
}
elsif ($m) {
print;
}
else {
print lc;
}
}
}
**Sample input:**
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
**Sample output:**
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
You can give this a shot. All one regexp.
$str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
Or, cleaner and more documented (this is semantically equivalent to the above)
$str =~ s/
(?:
^ | # Match either the start of the string, or
'[^']*' # some text in quotes.
)\K # Then ignore that part,
# because we want to leave it be.
[^']* # Take the text after it, and
# lowercase it.
/lc($&)/gex;
The g flag tells the regexp to run as many times as necessary. e tells it that the substitution portion (lc($&), in our case) is Perl code, not just text. x lets us put those comments in there so that the regexp isn't total gibberish.
Don't you play too hard with regexp for such a simple job?
Why not get the kid 'split' for it today?
#!/usr/bin/perl
while (<>)
{
@F = split "'";
@F = map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
print join "'", @F;
}
The above is for understanding. The latter two lines can reasonably be joined into:
print join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
Or, for more fun, make it a one-liner (in a bash shell). In concept, it looks like:
perl -paF/'/ -e '$_ = join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);' YOUR_FILE
In reality, however, we need to respect the shell and do some (hard) escaping:
perl -paF/\'/ -e '$_ = join "'"'"'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);' YOUR_FILE
(The single-quoted single quote needs to become 5 letters: '"'"')
If it doesn't help your job, it helps sleep.
One more variant with Perl one-liner. I'm using hex \x27 for single quotes
$ cat sql_str.txt
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
$ perl -ne ' { @F=split(/\x27/); for my $val (0..$#F) { $F[$val]=lc($F[$val]) if $val%2==0 } ; print join("\x27",@F) } ' sql_str.txt
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
$
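For readers more at home in Python, the regex-with-callback idea from the first answer (the /e flag running code on each match) translates fairly directly. The following is a sketch under the assumption that quotes are balanced and never escaped; the names lower_outside_quotes and _TOKEN are made up for illustration.

```python
import re

# Either a complete '...' run, or a run of characters containing no quote.
_TOKEN = re.compile(r"'[^']*'|[^']+")


def lower_outside_quotes(text):
    """Lowercase everything except text inside single quotes."""
    def fix(match):
        piece = match.group(0)
        # Quoted pieces are returned untouched; everything else is lowercased.
        return piece if piece.startswith("'") else piece.lower()
    return _TOKEN.sub(fix, text)
```

A stray, unbalanced quote matches neither alternative and is therefore left in place unchanged.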
I have a file with many lines that contain "x_y=XXXX", where XXXX can be a number from 0 to some N.
Now,
a) I would like to get only the XXXX part of the line in every such line.
b) I would like to get the average
Possibly both of these in one liners.
I am trying out something like
cat filename.txt | grep x_y | (this need to be filled)
I am not sure what to fill in.
In the past I have used commands like
perl -pi -e 's/x_y/m_n/g'
to replace all the instances of x_y.
But now, I would like to match for x_y=XXXX and get the XXXX out and then possibly average it out for the entire file.
Any help on this will be greatly appreciated. I am fairly new to perl and regexes.
TIMTOWTDI (as usual).
perl -nE '$s+=$1, ++$n if /x_y=(\d+)/; END { say "avg:", $s/$n }' data.txt
The following should do:
... | grep 'x_y=' | perl -ne '$x += (split /=/, $_)[1]; $y++ }{ print $x/$y, "\n"'
The }{ is colloquially referred to as eskimo operator and works because of the code which -n places around the -e (see perldoc perlrun).
Using awk:
/^[^_]+_[^=]+=[0-9]+$/ {sum=sum+$2; cnt++}
END {
print "sum:", sum, "items:", cnt, "avg:", sum/cnt
}
$ awk -F= -f cnt.awk data.txt
sum: 55 items: 10 avg: 5.5
Pure bash-solution:
#!/bin/bash
while IFS='=' read str num
do
if [[ $str == *_* ]]
then
sum=$((sum + num))
cnt=$((cnt + 1))
fi
done < data.txt
echo "scale=4; $sum/$cnt" | bc ;exit
Output:
$ ./cnt.sh
5.5000
As a one-liner, split up with comments.
perl -nlwe '
push @a, /x_y=(\d+)/g # push all matches onto an array
}{ # eskimo-operator, is evaluated last
$sum += $_ for @a; # get the sum
print "Average: ", $sum / @a; # divide by the size of the array
' input.txt
Will extract multiple matches on a line, if they exist.
Paste version:
perl -nlwe 'push @a, /x_y=(\d+)/g }{ $sum += $_ for @a; print "Average: ", $sum / @a;' input.txt
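The two steps asked for, (a) pulling out the XXXX values and (b) averaging them, can be kept separate in a small Python sketch. It is an illustration rather than a replacement for the one-liners above; the function names x_y_values and average are hypothetical. Unlike the Perl and awk versions, it returns None instead of dividing by zero when nothing matches.

```python
import re


def x_y_values(text):
    """Return every number that follows 'x_y=' anywhere in the text."""
    return [int(v) for v in re.findall(r"x_y=(\d+)", text)]


def average(values):
    """Arithmetic mean, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None
```

Like the last Perl answer, findall picks up several matches on one line if they exist.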
I have done my research, but have not been able to find a solution to my problem.
I am trying to extract all valid words (starting with a letter) in a string and concatenate them with an underscore ("_"). I am looking for a solution with awk, sed, grep, etc.
Something like:
echo "The string under consideration" | (awk/grep/sed) (pattern match)
Example 1
Input:
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
Desired output:
L2_Traffic_house_seen_during_ABCD_from
Example 2
Input:
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
Desired Output:
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
Example 3
Input:
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Desired Output:
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
This might work for you (GNU sed):
sed 's/[[:punct:]]/ /g;s/\<[[:alpha:]]/\n&/g;s/[^\n]*\n//;s/ [^\n]*//g;y/\n/_/' file
A perl one-liner. It searches any alphabetic character followed by any number of word characters enclosed in word boundaries. Use the /g flag to try several matches for each line.
Content of infile:
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Perl command:
perl -ne 'printf qq|%s\n|, join qq|_|, (m/\b([[:alpha:]]\w*)\b/g)' infile
Output:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
One way using awk, with the contents of script.awk:
BEGIN {
FS="[^[:alnum:]_]"
}
{
for (i=1; i<=NF; i++) {
if ($i !~ /^[0-9]/ && $i != "") {
if (i < NF) {
printf "%s_", $i
}
else {
print $i
}
}
}
}
Run like:
awk -f script.awk file.txt
Alternatively, here is the one liner:
awk -F "[^[:alnum:]_]" '{ for (i=1; i<=NF; i++) { if ($i !~ /^[0-9]/ && $i != "") { if (i < NF) printf "%s_", $i; else print $i; } } }' file.txt
Results:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
This solution requires some tuning and I think one needs gawk to have regexp as "record separator"
http://www.gnu.org/software/gawk/manual/html_node/Records.html#Records
gawk -v ORS='_' -v RS='[-: \"()]' '/^[a-zA-Z]/' file.dat
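The Perl answer's rule (a word boundary, then a letter, then any word characters) carries over to Python unchanged. A minimal sketch, with the hypothetical function name join_words; like the Perl and awk answers it keeps "to" and "X11" in Example 1, so it reproduces their output rather than the shorter desired output in the question.

```python
import re


def join_words(line):
    """Join all tokens that start with a letter using underscores."""
    # \b ensures a token such as "3a" or "0x880016" is skipped as a whole,
    # because there is no word boundary between the digit and the letter.
    return "_".join(re.findall(r"\b[A-Za-z]\w*", line))
```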
I am trying to grep a file for the first 2 matches of a string (there will only ever be a maximum 2 matches) including some context (grep -B 1 -A 5), split each set of 7 lines into two separate variables and write an if statement based on whether or not each set contains a different string.
In some cases, the file may contain only one match.
I know how to grep for the two matches, but not how to split them into separate variables. I can also write an if statement to check if the variable is empty (indicating a lack of a second match). I am not sure how to check each variable to see if it contains the second string. Any assistance would be helpful. Thanks!
Example:
grep -B1 -A5 "Resolution:" file.txt
Color LCD:
Resolution: 1440 x 900
Pixel Depth: 32-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Built-In: Yes
LED Cinema Display:
Resolution: 1920 x 1200
Depth: 32-Bit Color
Core Image: Hardware Accelerated
Mirror: Off
Online: Yes
Quartz Extreme: Supported
Desired result based on whether or not each match set contains "Main Display":
$mainDisplay = Color LCD
$secondDisplay = LED Cinema Display (or null indicating no second match)
Your file is valid YAML, so if you have the YAML Perl module installed, here is a one-liner:
eval $(perl -MYAML -0777 -e '$r=Load(<>);map { exists($r->{$_}->{"Main Display"}) ? print "main=\"$_\";\n" : print "second=\"$_\";\n" } keys %$r' < filename.txt)
echo =$main= =$second=
So, after the eval, the shell variables main and second are set.
Or, specifically for OS X, with the system_profiler command:
eval $(
system_profiler SPDisplaysDataType |\
grep -B1 -A5 'Resolution:' |\
perl -MYAML -0777 -e '$r=Load(<>);map { printf "%s=\"%s\"\n", exists($r->{$_}->{"Main Display"}) ? "main" : "second", $_ } keys %$r'
)
echo =$main=$second=
my($first, $second) = split /--\n/, qx/grep -B1 -A5 foo data.text/;
awk:
awk -F : '
/^[^[:space:]]/ {current = $1; devices[$1]++}
$1 ~ /Main Display/ {main = current}
END {
for (d in devices)
if (d == main)
print "mainDisplay=\"" d "\""
else
print "secondDisplay=\"" d "\""
}
'
outputs
mainDisplay="Color LCD"
secondDisplay="LED Cinema Display"
which you can capture and eval in the shell.
Here's a perl solution. Use it like so: script.pl Resolution:. Default search is "Resolution:".
The values are stored in %values, for example:
$values{Color LCD}{Resolution} == "1440 x 900";
use strict;
use warnings;
my $grep = shift || "Resolution:";
my %values;
my $pre;
while (my $line = <DATA>) {
chomp $line;
if ($line =~ /$grep/) {
my @data;
push @data, scalar <DATA> for (0 .. 4);
chomp @data;
for my $pair ($line, @data) {
if ($pair =~ /^([^:]+): (.*)$/) {
$values{$pre}{$1} = $2;
} else { die "Unexpected data: $pair" }
}
} else {
$pre = $line;
}
}
use Data::Dumper;
print Dumper \%values;
__DATA__
Color LCD:
Resolution: 1440 x 900
Pixel Depth: 32-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Built-In: Yes
LED Cinema Display:
Resolution: 1920 x 1200
Depth: 32-Bit Color
Core Image: Hardware Accelerated
Mirror: Off
Online: Yes
Quartz Extreme: Supported
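The grouping logic used by the awk and Perl answers (remember the current display name, attach the attributes that follow it, then check which group has "Main Display") can be sketched in Python as well. This is a hedged illustration: it assumes a display header is a line with nothing after its colon, that attribute lines look like "Name: value", and that grep's "--" group separators can be ignored. The function names parse_displays and classify are made up.

```python
def parse_displays(text):
    """Group 'Name: value' attribute lines under the preceding 'Header:' line."""
    displays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "--":  # blank line or grep group separator
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        if not value:
            # Nothing after the colon: treat the line as a new display header.
            current = displays.setdefault(name.strip(), {})
        elif current is not None:
            current[name.strip()] = value
    return displays


def classify(displays):
    """Return (main, second); either may be None if no such display exists."""
    main = next((n for n, attrs in displays.items() if "Main Display" in attrs), None)
    second = next((n for n in displays if n != main), None)
    return main, second
```

With only one match in the file, second comes back as None, which corresponds to the "null indicating no second match" case in the question.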