Grep specific text from log - sed

I'm trying to figure out how to grep only the "SIG:" hash part from the log line below on Linux:
20120927:10:57:23|89252871|3342|ESP individual score details for Message ID: <esp:msgid> -|<RBL:<0> SHA:<0> SHA_FLAGS:<0> UHA:<12> ISC:<0> BAYES:<0> SenderID:<0> DKIM:<0> TS:<-1> SIG:<309875857436-4372-986476-327698-7436-984376-43276-98437643-8276-84327-6743-6874-986-86743-86732-867432-687432-687> DSC:<0> ('TRU_spam1', 47):<0> ('TRU_legal_spam', 31):<0> ('TRU_marketing_spam', 34):<0> ('TRU_profanity_spam', 39):<0> ('TRU_medical_spam', 35):<0> ('TRU_playsites', 46):<0> ('TRU_money_spam', 37):<0> ('TRU_stock_spam', 41):<0> ('TRU_embedded_image_spam', 27):<0> ('TRU_urllinks', 49):<0> ('TRU_watch_spam', 42):<0> ('TRU_phish_spam', 38):<0> ('TRU_spam2', 48):<0> ('TRU_misc_spam', 36):<0> ('TRU_LOREAL', 55):<0> ('TRU_freehosting', 45):<0> ('TRU_lotto_spam', 32):<0> ('TRU_ru_spamsubj', 56):<0> ('TRU_adult_spam', 18):<0> ('URL Real-Time Signatures', 9):<0> ('TRU_scam_spam', 40):<0>:89252871>|
Desired output:
309875857436-4372-986476-327698-7436-984376-43276-98437643-8276-84327-6743-6874-986-86743-86732-867432-687432-687

With a Perl regex (works with GNU grep's -P option):
grep -oP '(?<=SIG:<)[^>]*(?=>)'

grep alone cannot help you much here. You can add cut to your toolbox:
grep -o 'SIG:<[^>]\+' | cut -f2 -d\<
First, select SIG and everything following it up to a >. Then, only return what's after the first <.

awk '{for(i=1;i<=NF;i++)if($i~/SIG:/){gsub("SIG:<","",$i);gsub(">","",$i);print $i;break}}' your_file

Use two greps, one to grab the right field and one to clean up:
<infile grep -o 'SIG:<[^>]*' | grep -o '[^<]*$'

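sed -n 's/.*SIG:<\([^>]\+\)>.*/\1/p' INPUTFILE
This might work for you. With -n and the p flag, only lines where the substitution succeeded are printed, so log lines without a SIG field are skipped.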
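For completeness, a sketch that needs no external tools at all and relies only on POSIX shell parameter expansion. The function name sig_from_log is hypothetical, and it assumes the hash itself never contains a ">".

```shell
# Hypothetical helper: print the SIG:<...> value from each line on stdin,
# using only POSIX parameter expansion (no grep, sed or awk processes).
sig_from_log() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *'SIG:<'*) ;;          # line contains a signature
      *) continue ;;         # skip lines without one
    esac
    rest=${line#*SIG:<}      # drop everything up to and including "SIG:<"
    printf '%s\n' "${rest%%>*}"  # drop everything from the first ">" onward
  done
}
```

Usage: sig_from_log < INPUTFILE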

Related

sed add different lines in different files

I'm trying to find a way to run multiple sed commands that add lines to the start of different files (on macOS).
This works when run from the terminal:
sed -i '' '1i\
\\version \"2.19.65\"\
\\language \"english\"\
\\include \"dynamics-defs.ily\"\
altosaxINotes = \\transpose c ef {\
\\relative c {\
' altosaxI.ily
But I want to add slightly different text to a different file:
sed -i '' '1i\
\\version \"2.19.65\"\
\\language \"english\"\
\\include \"dynamics-defs.ily\"\
altosaxIINotes = \\transpose c ef {\
\\relative c {\
' altosaxII.ily
I have about 30 or 40 of these to run, all slightly different. Is it possible to combine them all into one terminal command, or perhaps use Mac's Automator, or is there a better solution?
This might work for you (GNU sed):
# create a function f with one parameter
f(){ cat <<! >tempFile && sed -i '1e cat tempFile' ${1}.ily; }
\\version "2.19.65"
\\language "english"
\\include "dynamics-defs.ily"
${1}Notes = \\transpose c ef {
\\relative c {
!
# call the function
f altosaxI
The function f can then be included in a for-loop or a script.
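The 1e command is a GNU sed extension, so the BSD sed that ships with macOS will not run the function above. A sketch of a portable alternative that skips sed entirely: it prints the header with printf and then appends the original file contents. The function name prepend_header is hypothetical; the header text is copied from the question.

```shell
# Hypothetical helper: prepend the LilyPond header to NAME.ily.
# Uses only POSIX tools, so it does not rely on GNU sed's "e" command.
prepend_header() {
  file="$1.ily"
  tmp=$(mktemp) || return 1
  {
    # printf '%s\n' prints each argument verbatim, so single backslashes survive
    printf '%s\n' '\version "2.19.65"' '\language "english"' '\include "dynamics-defs.ily"'
    # inside double quotes, \\ becomes a single backslash
    printf '%s\n' "${1}Notes = \\transpose c ef {" '\relative c {'
    cat "$file"                       # original contents follow the header
  } > "$tmp" && cat "$tmp" > "$file"  # overwrite in place, keeping permissions
  rm -f "$tmp"
}

# Apply it to each part that exists; extend the list as needed.
for part in altosaxI altosaxII; do
  if [ -f "$part.ily" ]; then
    prepend_header "$part"
  fi
done
```

Copying the temporary file back with cat, rather than moving it over the original, keeps the original file's permissions.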

Sed: How to insert a pattern which includes single ticks?

I'm trying to use sed to replace a specific line within a configuration file:
The pattern for the line I want to replace is:
ALLOWED_HOSTS.*
The text I want to insert is:
'$PublicIP' (including the single ticks)
But when I run the command:
sed 's/ALLOWED_HOSTS.*/ALLOWED_HOSTS = ['$PublicIP']/g' /root/project/django/mysite/mysite/settings.py
The line is changed to:
ALLOWED_HOSTS = [1.1.1.1]
instead of:
ALLOWED_HOSTS = ['1.1.1.1']
How shall I edit the command to include the single ticks as well?
You could try to escape the single ticks, or, better, reassign the variable so that it includes the single ticks:
PublicIP="'$PublicIP'"
By the way, even this sed, without redefining the variable, works OK in my case:
$ a="3.3.3.3"
$ echo "ALLOWED_HOSTS = [2.2.2.2]" |sed 's/2.2.2.2/'"'$a'"'/g'
ALLOWED_HOSTS = ['3.3.3.3']
This works too:
$ echo "ALLOWED_HOSTS = [2.2.2.2]" |sed "s/2.2.2.2/'$a'/g"
ALLOWED_HOSTS = ['3.3.3.3']
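Applied to the settings file from the question, a sketch might look like this. The function name set_allowed_hosts is hypothetical, the ^ anchor assumes ALLOWED_HOSTS starts its line, and the escaping step only matters if the value could contain \, / or &, which are special on the replacement side of s///. It prints to stdout; add -i (GNU sed) or -i '' (BSD sed on macOS) to edit the file in place.

```shell
# Hypothetical helper: print FILE with its ALLOWED_HOSTS line replaced by
# ALLOWED_HOSTS = ['VALUE'] (single quotes included).
set_allowed_hosts() {
  value=$1
  file=$2
  # Backslash-escape \, / and & so sed treats them literally in the replacement.
  esc=$(printf '%s\n' "$value" | sed 's|[\\/&]|\\&|g')
  # Double quotes let $esc expand; the single quotes inside are plain characters.
  sed "s/^ALLOWED_HOSTS.*/ALLOWED_HOSTS = ['$esc']/" "$file"
}
```

Usage: set_allowed_hosts "$PublicIP" /root/project/django/mysite/mysite/settings.py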

Multiple mathematical operations on a file containing numbers

I have extracted the following data from a file using grep and sed pipes, and now I want to perform a calculation on the last two numbers and replace them with a single number.
Mathematical operations
Add the numbers together
divide by 2
multiply by 141
ROUNDUP to whole number
File Data
AJ29 IO_0_VRN_10 77.234 78.011
AJ30 IO_L1P_T0_100M 89.886 90.789
AJ31 IO_L1N_T0_100S 101.388 102.406
AK29 IO_L2P_T0_101M 66.163 66.828
AL29 IO_L2N_T0_101S 63.626 64.266
So the line starting AJ29 should appear as:
AJ29 IO_0_VRN_10 10945
I could do this in MS Excel or OpenOffice Calc, but I want to avoid MS and keep it in a single Linux script if possible. Hope you can help. The script I have so far is below; ideally I'd like to add a few more pipes to it to achieve this.
grep IOB xc7vx690tffg1930.pkg | sed 's/pin//g' | sed 's/IOB_[A-Za-z0-9]*//g' | sed 's/ /-/g' | sed 's/\t//g' | sed 's/^[-]*//g' | sed 's/-/ /g' | sed 's/ [0-9][0-9] //g' | sed 's/[[:space:]]\+/,/g' | sed 's/,X[0-9A-Z]*,//g' | sed 's/,[0-9]*[A-Z],//g' | sed 's/N\.A\.,/,/g' | sed 's/,$//g' | sed 's/,/ /g'
For calculations, use awk!
$ awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 141); NF--}1' file
AJ29 IO_0_VRN_10 10945
AJ30 IO_L1P_T0_100M 12738
AJ31 IO_L1N_T0_100S 14367
AK29 IO_L2P_T0_101M 9376
AL29 IO_L2N_T0_101S 9016
This replaces the penultimate field with the result of (penultimate + last) / 2 * 141. The %.0f format rounds the result to the nearest whole number, as described in "Awk printf number in width and round it up".
Also, it looks to me like you are piping way too many things: I counted one call to grep and 13 (!) to sed. You can probably use a single sed -e 'first block' -e 'second block' ... instead.
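For example, the 13 substitutions from the question can run, in the same order, inside one sed process. Each is a per-line s command, so the result should match the piped version. The wrapper name clean_pkg is hypothetical; \t and \+ are GNU sed extensions, exactly as in the original.

```shell
# Hypothetical wrapper: same grep filter, then all 13 substitutions in one sed.
# The -e expressions run in order on each line, just like the original pipeline.
clean_pkg() {
  grep IOB "$@" | sed \
    -e 's/pin//g' \
    -e 's/IOB_[A-Za-z0-9]*//g' \
    -e 's/ /-/g' \
    -e 's/\t//g' \
    -e 's/^[-]*//g' \
    -e 's/-/ /g' \
    -e 's/ [0-9][0-9] //g' \
    -e 's/[[:space:]]\+/,/g' \
    -e 's/,X[0-9A-Z]*,//g' \
    -e 's/,[0-9]*[A-Z],//g' \
    -e 's/N\.A\.,/,/g' \
    -e 's/,$//g' \
    -e 's/,/ /g'
}
```

Usage: clean_pkg xc7vx690tffg1930.pkg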
Explanation
In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
{...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
($(NF-1) + $NF)/2 * 141 perform the calculation (penultimate + last) / 2 * 141.
{$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
{...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
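One caveat: %.0f rounds to the nearest integer, whereas the question asks for ROUNDUP, which in Excel and OpenOffice Calc always rounds away from zero. The two differ whenever the fractional part is below .5; for AJ31 the exact value is 14367.477, so %.0f prints 14367 where ROUNDUP would give 14368. A sketch of a ceiling variant, assuming all values are positive (the wrapper name ceil_avg is hypothetical). It also rebuilds the line explicitly instead of decrementing NF, since not every awk implementation handles NF-- the same way.

```shell
# Hypothetical wrapper: replace the last two fields with
# ceil((penultimate + last) / 2 * 141), assuming positive values.
ceil_avg() {
  awk '{
    v = ($(NF-1) + $NF) / 2 * 141
    r = int(v)              # int() truncates toward zero
    if (r < v) r++          # any fractional part rounds up (ceiling for v > 0)
    out = ""
    for (i = 1; i <= NF - 2; i++) out = out $i OFS   # keep the leading fields
    print out r
  }' "$@"
}
```

On the sample data this should print 10945, 12738, 14368, 9376 and 9017 in the last column.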

sed replacement value between two matches

Hi, I want to use sed to replace a string that comes between two markers.
Example: -amystring -bxyz
I want to replace mystring with ****.
The value after -a can be anything, e.g. -amystring 123 -bxyz, -amystring 123<newline_char>, -a'mystring 123' -bxyz, -a'mystring 123'<newline_char>
I tried the following regex, but it does not work in all cases:
sed -re "s#(-w)([^\s\-]+)#\1**** #g"
Can anybody help me solve this?
MyString="YourStringWithoutRegExSpecialCharNotEscaped"
sed "s/-a${MyString} -b/-a**** -b/g"
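If your string may contain regex special characters such as . * [ ] ^ $ \ or /, escape them first, for example with
MyString=$(printf '%s\n' "${MyString}" | sed 's|[][\\/.*^$]|\\&|g')
before using it in sed. (Piping into read -r MyString does not work in bash, because read runs in a subshell and the variable is lost.)
Otherwise, you need to define the edge pattern more precisely.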
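Defining the edge pattern for the inputs listed in the question might look like the sketch below. It assumes the value after -a is either a single-quoted string or a run of space-separated words that ends before the next word starting with -, and it uses sed -E (extended regex, available in GNU and BSD sed). The function name mask_a is hypothetical.

```shell
# Hypothetical helper: replace the value of the -a option with ****.
# Alternative 1: a single-quoted value, e.g. -a'mystring 123'
# Alternative 2: unquoted words up to (not including) the next " -option"
mask_a() {
  sed -E "s/(^| )-a('[^']*'|[^' ][^ ]*( [^ -][^ ]*)*)/\1-a****/g"
}
```

For example, -amystring 123 -bxyz becomes -a**** -bxyz, and -a'mystring 123' at the end of a line becomes -a****.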

Wordnet synsets using perl

I installed WordNet::Similarity and WordNet::QueryData as an easy way to calculate the information content score and probability that come with these modules. But I'm stuck at this basic problem: given a word, print n words similar to it, which should be no more difficult than iterating through the synsets and doing a join.
Using the wn command and piping it through a whole lot of tr, sort and uniq, I can get all the words:
wn cat -synsn | grep -v Sense | tr '=' ' ' | tr '>' ' ' | tr '\t' ' ' | tr ',' '\n' | sort | uniq
OUTPUT
8 senses of cat
adult female
adult male
African tea
Arabian tea
big cat
bozo
cat
cat
CAT
Caterpillar
cat-o'-nine-tails
computed axial tomography
computed tomography
computerized axial tomography
computerized tomography
CT
excitant
felid
feline
gossip
gossiper
gossipmonger
guy
hombre
kat
khat
man
newsmonger
qat
quat
rumormonger
rumourmonger
stimulant
stimulant drug
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun cat
tracked vehicle
true cat
whip
woman
X-radiation
X-raying
but it's kind of nasty and needs further cleanup.
My script is below; what I want to get is all the words in cat#n#1 to cat#n#8.
SCRIPT
use WordNet::QueryData;
my $wn = WordNet::QueryData->new( noload => 1);
print "Senses: ", join(", ", $wn->querySense("cat#n")), "\n";
print "Synset: ", join(", ", $wn->querySense("cat", "syns")), "\n";
print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";
OUTPUT:
Senses: cat#n#1, cat#n#2, cat#n#3, cat#n#4, cat#n#5, cat#n#6, cat#n#7, cat#n#8
Synset: cat#n, cat#v
Hyponyms: domestic_cat#n#1, wildcat#n#3
SCRIPT
use WordNet::QueryData;
my $wn = WordNet::QueryData->new;
foreach my $word (qw/cat#n/) {
    my @senses = $wn->querySense($word);
    foreach my $wps (@senses) {
        my @gloss = $wn->querySense($wps, "syns");
        print "$wps : @gloss\n";
    }
}
OUTPUT:
cat#n#1 : cat#n#1 true_cat#n#1
cat#n#2 : guy#n#1 cat#n#2 hombre#n#1 bozo#n#2
cat#n#3 : cat#n#3
cat#n#4 : kat#n#1 khat#n#1 qat#n#1 quat#n#1 cat#n#4 Arabian_tea#n#1 African_tea#n#1
cat#n#5 : cat-o'-nine-tails#n#1 cat#n#5
cat#n#6 : Caterpillar#n#2 cat#n#6
cat#n#7 : big_cat#n#1 cat#n#7
cat#n#8 : computerized_tomography#n#1 computed_tomography#n#1 CT#n#2 computerized_axial_tomography#n#1 computed_axial_tomography#n#1 CAT#n#8
P.S.
I had never written Perl before, but I have been looking into Perl scripts since this morning and can now understand the basic stuff. I just need to know whether there is a cleaner way to do this using the API; I couldn't figure it out from the API docs or the user group archives.
Update:
I think I'll settle for:
wn cat -synsn | sed '1,6d' |sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d'
sed rocks!
I think you'll find the following helpful...
http://marimba.d.umn.edu/WordNet-Pairs/
What are the N most similar words to X, according to WordNet?
This data seeks to answer that question, where similarity is based on
measures from WordNet::Similarity. http://wn-similarity.sourceforge.net
-------------- verb data
These files were created with WordNet::Similarity version 2.05 using
WordNet 3.0. They show all the pairwise verb-verb similarities found
in WordNet according to the path, wup, lch, lin, res, and jcn measures.
The path, wup, and lch are path-based, while res, lin, and jcn are based
on information content.
As of March 15, 2011 pairwise measures for all verbs using the six
measures above are available, each in their own .tar file. Each *.tar
file is named as WordNet-verb-verb-MEASURE-pairs.tar, and is approx
2.0 - 2.4 GB compressed. In each of these .tar files you will find
25,047 files, one for each verb sense. Each file consists of 25,048 lines,
where each line (except the first) contains a WordNet verb sense and the
similarity to the sense featured in that particular file. Doing
the math here, you find that each .tar file contains about 625,000,000
pairwise similarity values. Note that these are symmetric (sim (A,B)
= sim (B,A)) so you have a bit more than 300 million unique values.
-------------- noun data
As of August 19, 2011 pairwise measures for all nouns using the path
measure are available. This file is named WordNet-noun-noun-path-pairs.tar.
It is approximately 120 GB compressed. In this file you will find
146,312 files, one for each noun sense. Each file consists of
146,313 lines, where each line (except the first) contains a WordNet
noun sense and the similarity to the sense featured in that particular
file. Doing the math here, you find that each .tar file contains
about 21,000,000,000 pairwise similarity values. Note that these
are symmetric (sim (A,B) = sim (B,A)) so you have around 10 billion
unique values.
We are currently running wup, res, and lesk, but do not have an
estimated date of availability yet.
Put this in a script, say synonym.sh, and make it executable (chmod +x synonym.sh):
#!/bin/sh
wn "$1" -synsn | sed '1,6d' | sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d' | sed 's/ //g' | grep -iv -- "$1" | tr '\n' ','
wn "$1" -synsv | sed '1,6d' | sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d' | sed 's/ //g' | grep -iv -- "$1" | tr '\n' ','; echo
Then call it from your Perl script:
system("/path/synonym.sh","kittens");
system("/path/synonym.sh","cats");
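Inside synonym.sh, the two pipelines differ only in -synsn versus -synsv, so they could also be written as a loop that runs a single sed with several -e expressions, applied in the same order as the piped seds. A sketch, assuming a POSIX shell; the function name synonyms is hypothetical, and the script would end with a call such as synonyms "$1".

```shell
# Hypothetical function: comma-separated noun and verb synonyms of word $1,
# with the same filters as the original synonym.sh pipelines.
synonyms() {
  for pos in n v; do
    wn "$1" "-syns$pos" |
      sed -e '1,6d' \
          -e 's/Sense [[:digit:]]//g' \
          -e 's/[[:space:]]*=> //' \
          -e '/^$/d' \
          -e 's/ //g' |
      grep -iv -- "$1" |
      tr '\n' ','
  done
  echo
}
```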