Extract only numbers from a line of a txt file - sed

The output of a certain command contains:
>> ..................546 Jobs Retrieved
List of jobs Retrieved: 1-4,6-12,14,2017-2018 ............
>>> 30 Jobs Done
Jobs terminated: retrieve them with: crab -getoutput <List of jobs>
List of jobs: 203,376,578,765,803,809,811
.....................
I want to extract only 203,376,578,765,803,809,811, which appears after the line "30 Jobs Done". After that I need to put these numbers as a string into a variable to use in another command. How can I do it?
I tried it this way. I saved the output to a status.log file and ran:
$ sed -e '1,/Jobs Done/d' status.log | grep "List of jobs:"
That gave me only the line
List of jobs: 578,765,811,836,1068,1096,1128
but I don't need the "List of jobs:" prefix.
Please help me.
Thank you very much in advance.

You can use this:
awk '/30 Jobs Done/ {f=1;next} f && /List of jobs:/ {print $4;exit}' file
203,376,578,765,803,809,811
When it finds "30 Jobs Done", it sets the flag f to true.
If it then finds "List of jobs:" while f is true, it prints field 4 and exits.
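Since the question also asks how to get the result into a variable, here is a minimal sketch that wraps the awk answer above, plus a sed-only variant closer to the original attempt, in shell functions and captures the output with command substitution. The names extract_jobs_awk and extract_jobs_sed are hypothetical, and the input is assumed to look like the sample output in the question.

```shell
#!/bin/sh
# Hypothetical helpers: print the comma-separated job list from the first
# "List of jobs:" line that follows the "Jobs Done" line.

# awk version: set a flag at "Jobs Done", then print field 4 of the next
# "List of jobs:" line and stop reading.
extract_jobs_awk() {
    awk '/Jobs Done/ { f = 1; next }
         f && /List of jobs:/ { print $4; exit }' "$1"
}

# sed version, closer to the question's attempt: delete everything up to
# "Jobs Done", then print only the text after the "List of jobs: " prefix.
extract_jobs_sed() {
    sed -e '1,/Jobs Done/d' "$1" |
        sed -n 's/^[[:space:]]*List of jobs: //p'
}

# Usage (status.log holds the saved command output):
#   jobs=$(extract_jobs_awk status.log)
#   echo "$jobs"    # the job list as one string
```

Command substitution strips trailing newlines, so the variable holds just the comma-separated list, ready to be passed quoted ("$jobs") to the next command.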

Using simple tools:
grep -E '^[[:space:]]*List of jobs: [0-9,]+$' status.log | cut -d: -f2
The grep -E pattern matches the whole line, and cut returns everything after the colon.
That means you will get a leading space in the result. If that's a problem:
grep -E '^[[:space:]]*List of jobs: [0-9,]+$' status.log | cut -d: -f2 | cut -c2-

You could do this:
grep -A2 "Jobs Done" yourfile | awk '/List of jobs:/{print $4}'
Grab the two lines following "Jobs Done" (-A2), then use awk to look for "List of jobs:" and print the 4th field.


sed with filename from pipe

In a folder I have many files with several parameters in their names, e.g. (showing just one parameter) file_a1.0.txt, file_a1.2.txt, etc.
These are generated by a C++ program, and I need to take the most recently generated one. I don't know in advance what the value of this parameter will be when the program terminates. After that I need to copy the 2nd line of this last file.
To copy the 2nd line of any file, I know that this sed command works:
sed -n 2p filename
I know also how to find the last generated file:
ls -rtl file_a*.txt | tail -1
Question:
How can I combine these two operations? It is certainly possible to pipe the output of the second one into sed, but I don't know how to pass a file name coming from a pipe as the input file of that sed command.
You can use this:
ls -rt1 file_a*.txt | tail -1 | xargs sed -n '2p'
(OR)
sed -n '2p' `ls -rt1 file_a*.txt | tail -1`
sed -n '2p' $(ls -rt1 file_a*.txt | tail -1)
You can put a command in backticks to insert its output at a particular point in another command, so
sed -n 2p `ls -rt name*.txt | tail -1 `
Alternatively, and preferably because it is easier to nest, use $(...):
sed -n 2p $(ls -rt name*.txt | tail -1)
-r makes ls reverse its sort order:
-r, --reverse
reverse order while sorting
It is not needed together with tail -1. Without -r, ls -t lists the newest file first, so head -1 can pick it directly. Any speed gain is small, though, because ls still has to read and sort all entries before it prints the first one:
sed -n 2p $(ls -t1 name*.txt | head -1 )
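Parsing ls output can break on unusual file names. As an alternative sketch (not taken from the answers above), a shell loop can compare modification times directly with the -nt ("newer than") test operator, which bash, dash and ksh support. The function name second_line_of_newest is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the 2nd line of the most recently modified
# file among its arguments, without parsing ls output.
second_line_of_newest() {
    newest=
    for f in "$@"; do
        [ -e "$f" ] || continue          # skip an unmatched glob pattern
        # Keep f if it is the first file seen or newer than the current pick.
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] || return 1         # no matching files at all
    sed -n 2p "$newest"
}

# Usage: second_line_of_newest file_a*.txt
```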
I was looking for a similar solution: feeding file names from grep results into sed. I've copied my search-and-replace answer here; perhaps it helps, since it calls sed for each file name coming through the pipe.
This command simply finds all the files that contain foo:
grep -i -l -r foo ./*
This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console so you can see what happened, and then runs sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because running sed -i over every file would change the timestamps even of files that were not modified. Feeding it the grep results means only the files containing the target text are touched (which may also improve speed).
Be sure to back up your files and test before using this. The read loop copes with embedded spaces, but file names containing newlines, or (with the default IFS) leading or trailing whitespace, will not be handled correctly.
FWIW, I had some problems with the tail method: it seems the entire listing is generated before tail picks out the last item.
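A NUL-delimited variant of the same grep-then-sed idea, offered as a sketch: it assumes GNU grep (for -Z), GNU sed (for -i without a backup suffix and the I flag) and GNU xargs (for -r). Because each file name ends with a NUL byte, names with spaces or even newlines pass through intact, and as in the loop above only files that contain foo are rewritten. replace_in_matching is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: case-insensitively replace foo with bar, but only in
# files under $1 that actually contain foo, so other files keep their mtime.
replace_in_matching() {
    dir=$1
    # -r: recurse, -i: ignore case, -l: list file names only,
    # -Z: terminate each name with a NUL byte instead of a newline.
    # xargs -0 splits on NUL; -r skips running sed when nothing matched.
    grep -rilZ -- foo "$dir" | xargs -0 -r sed -i 's/foo/bar/gI'
}

# Usage: replace_in_matching .
```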

Shell: sed pipeline

I'm trying to make a script that redirects data from one serial port to another.
I have done it using this command:
cat /dev/ttyS0 > /dev/ttyS1
Everything works, but now I would also like to log the data. I thought I'd use the tee command:
cat /dev/ttyS0 | tee /dev/ttyS1 log.txt
Now I want every line recorded in the log file to be preceded by the string "from S0 to S1:". I tried this:
cat /dev/ttyS0 | tee /dev/ttyS1 | sed 's/$/from S0 to S1/' | less > log.txt
But it does not work: the file remains empty.
What am I doing wrong?
Try:
cat /dev/ttyS0 | tee /dev/ttyS1 | sed 's/^/from S0 to S1: /' | tee log.txt
Since you want to prefix each line with the string, the $ in your sed has been replaced by ^. sed writes the substituted output to STDOUT, where the second tee reads it and writes it to log.txt.
Not sure if this helps, but I'd remove the pager from the pipeline and redirect the sed output directly to the file. Also, if you want to prepend text you need to match the beginning of a line (^) not the end of a line ($).
... | sed 's/^/from S0 to S1: /' > log.txt
Also, what does the input look like in the first place? Does it contain linebreaks that the pattern could match?
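Another likely reason for an empty log is buffering: when its output goes to a file or pipe rather than a terminal, sed collects output in a buffer and writes it in blocks, so with a slow serial stream nothing may appear for a long time. A minimal sketch, assuming GNU sed (its -u/--unbuffered option flushes after every line); relay_and_log is a hypothetical function name, and the useless cat is dropped.

```shell
#!/bin/sh
# Hypothetical helper: copy $1 to $2 unchanged and append a prefixed copy
# of every line to the log file $3.
relay_and_log() {
    src=$1 dst=$2 log=$3
    # tee writes the raw data to $dst and passes it on to sed;
    # sed -u (GNU) writes each prefixed line out immediately.
    tee "$dst" < "$src" | sed -u 's/^/from S0 to S1: /' >> "$log"
}

# Usage: relay_and_log /dev/ttyS0 /dev/ttyS1 log.txt
```

Note that sed still only emits a line once it has seen the newline that ends it, so a partial line stays out of the log until it is completed.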

sed: how to determine if line 1 is contained in line 2

My text file is sorted alphabetically. I want to determine if each line is contained within the following line, and if so, delete the first of the two. So, for example, if I had...
car
car and trailer
train
... I want to end up with...
car and trailer
train
I found the "sed one-liners" page(s), which have the code to find duplicate consecutive lines:
sed '$!N; /^\(.*\)\n\1$/!P; D'
... and I figured deleting the ^ would do the trick, but it didn't.
(It would also be nice to do this with non-consecutive lines, but my files run to thousands of lines, and it would probably take a script hours, or days, to run.)
The original command
sed '$!N; /^\(.*\)\n\1$/!P; D'
looks for an exact line match. Since you want to check whether the first line is contained in the second, you need to add some wildcards:
sed '$!N; /^\(.*\)\n.*\1.*$/!P; D'
Should do it.
sed is an excellent tool for simple substitutions on a single line; for anything else, just use awk:
awk '$0 !~ prev{print prev} {prev=$0} END{print}' file
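The awk one-liner above uses the previous line as a regular expression, so characters such as . or * in the data are treated as regex metacharacters. A sketch of the same consecutive-line logic using awk's index() for a plain substring test instead (drop_contained is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: drop a line when it occurs literally inside the
# line that follows it; reads the given files, or stdin if none.
drop_contained() {
    awk 'NR > 1 && index($0, prev) == 0 { print prev }
         { prev = $0 }
         END { if (NR > 0) print prev }' "$@"
}

# Usage: drop_contained file
```

For the car / car and trailer / train example this prints "car and trailer" and "train". Printing prev in END, rather than relying on $0 there, keeps it working across awk implementations.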
You said:
It would also be nice to do this with non-consecutive lines.
Here is a bash script that removes every line contained within another line (not necessarily the next one), matching case-insensitively:
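#!/bin/bash
# The sed I (case-insensitive) modifier and Q (quit with exit code) command are GNU extensions.
while IFS= read -r line; do
    echo "Searching for: $line"
    # Exit with status 99 if $line occurs inside a longer line (grep -i would also work)
    sed -n "/.$line/IQ99;/$line./IQ99" test.txt
    if [ $? -eq 99 ]; then
        echo "Removing: $line"
        sed -i "/^$line$/d" test.txt
    fi
done < test.txt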
Test:
$ cat test.txt
Boat
Car
Train and boat
car and cat
$ my_script
Searching for: Boat
Removing: Boat
Searching for: Car
Removing: Car
Searching for: Train and boat
Searching for: car and cat
$ cat test.txt
Train and boat
car and cat

How to "grep" out specific line ranges of a file

I often run grep -n on a file to find what I am looking for. Say the output is:
1234: whatev 1
5555: whatev 2
6643: whatev 3
If I want to then just extract the lines between 1234 and 5555, is there a tool to do that? For static files I have a script that runs wc -l on the file and then does the math to split it out with tail and head, but that doesn't work well with log files that are constantly being written to.
Try using sed, as described at
http://linuxcommando.blogspot.com/2008/03/using-sed-to-extract-lines-in-text-file.html. For example, use
sed '2,4!d' somefile.txt
to print lines 2 through 4 of somefile.txt. (And don't forget to check http://www.grymoire.com/Unix/Sed.html; sed is a wonderful tool.)
The following command will do what you asked for "extract the lines between 1234 and 5555" in someFile.
sed -n '1234,5555p' someFile
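For large or still-growing log files, a small variation may help: adding a q command makes sed exit right after the last wanted line instead of reading the rest of the file. A minimal sketch (print_range is a hypothetical name; p and q are standard sed commands):

```shell
#!/bin/sh
# Hypothetical helper: print lines $1..$2 of file $3, then stop reading.
print_range() {
    first=$1 last=$2 file=$3
    # With -n, q quits without printing again, so no line appears twice.
    sed -n "${first},${last}p;${last}q" "$file"
}

# Usage: print_range 1234 5555 someFile
```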
If I understand correctly, you want to find a pattern between two line numbers. The awk one-liner could be
awk '/whatev/ && NR >= 1234 && NR <= 5555' file
You don't need to run grep followed by sed.
Perl one-liner:
perl -ne 'if (/whatev/ && $. >= 1234 && $. <= 5555) {print}' file
Line numbers are OK if you can guarantee the position of what you want. Over the years, my favorite flavor of this has been something like this:
sed "/First Line of Text/,/Last Line of Text/d" filename
which deletes every line from the first line matching the first pattern through the next line matching the second pattern, including both of those lines.
Use sed -n with "p" instead of "d" to print those lines instead. This is far more useful for me, as I usually don't know where those lines are.
Put this in a file and make it executable:
#!/usr/bin/env bash
# Usage: SCRIPT START_PATTERN STOP_PATTERN FILE
# Line number of the first line matching the start pattern (empty if none).
start=$(grep -n -- "$1" "$3" | head -n1 | cut -d: -f1)
if [ -z "$start" ]; then
  echo "couldn't find start pattern!" 1>&2
  exit 1
fi
# Search for the stop pattern from the start line onward; grep -n numbers
# those lines relative to $start.
stop=$(tail -n +"$start" "$3" | grep -n -- "$2" | head -n1 | cut -d: -f1)
if [ -z "$stop" ]; then
  echo "couldn't find end pattern!" 1>&2
  exit 1
fi
# Convert the relative line number into an absolute one.
stop=$(( stop + start - 1 ))
sed "$start,$stop!d" "$3"
Execute the file with these arguments (quote any argument that contains spaces):
Starting grep pattern
Stopping grep pattern
File path
The arguments are grep patterns, not line numbers, so for your example pass the text you searched for, e.g. 'whatev 1' 'whatev 2' myfile.txt.
The output includes the lines matching the starting and stopping patterns.
If I want to then just extract the lines between 1234 and 5555, is there a tool to do that?
There is also ugrep, a GNU/BSD grep-compatible tool that additionally offers a -K (--range) option taking a range of line numbers, which does just that:
ugrep -K1234,5555 -n '' somefile.log
You can use the usual GNU/BSD grep options and regex patterns, and ugrep offers a lot more besides, such as -K.
If you want specific lines rather than a range, you can do it with Perl. For example, to get lines 1, 3 and 5 of a file such as /etc/passwd:
perl -ne 'print if $. == 1 || $. == 3 || $. == 5' /etc/passwd
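The same "pick specific lines" idea can be sketched with awk, which avoids a Perl dependency: the comma-separated list of line numbers is split once into an array, and each record is printed if its number is a key. print_lines is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $2 whose numbers appear
# in the comma-separated list $1 (e.g. 1,3,5).
print_lines() {
    awk -v lines="$1" '
        BEGIN {
            n = split(lines, want, ",")
            for (i = 1; i <= n; i++) keep[want[i]] = 1
        }
        (NR in keep)' "$2"
}

# Usage: print_lines 1,3,5 /etc/passwd
```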

Extracting a string from a file name

My script takes a file name in the form R#TYPE.TXT (# is a number and TYPE is two or three characters).
I want my script to give me TYPE. How can I get it? I guess I need to use awk or sed.
I'm using /bin/sh (which is a requirement).
You can use awk:
$ echo R1CcC.TXT | awk '{sub(/.*[0-9]/,"");sub(/\.TXT$/,"")}{print}'
CcC
or
$ echo R1CcC.TXT | awk '{gsub(/.*[0-9]|\.TXT$/,"");print}'
CcC
And if sed is really what you want:
$ echo R9XXX.TXT | sed 's/R[0-9]\(.*\)\.TXT/\1/'
XXX
I think this is what you are looking for.
$ echo R3cf.txt | sed "s/.[0-9]\(.*\)\..*/\1/"
cf
If TXT is always upper case and the file name always starts with R, you could do something like this:
$ echo R3cf.TXT | sed "s/R[0-9]\(.*\)\.TXT/\1/"
cf
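You can also do it with just the shell. The ${var%suffix} and ${var#prefix} expansions are POSIX, so they work in any POSIX-compliant /bin/sh (though not in the historical Bourne shell):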
f=R9ABC.TXT
f="${f%.TXT}" # remove the extension
type="${f#R[0-9]}" # remove the first bit
echo "$type" # ==> ABC
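The expansion above strips exactly one digit. If the number can have several digits (an assumption; the question only says "a number"), a sketch that still uses only POSIX parameter expansion can first isolate the run of leading digits and then remove it. file_type is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print TYPE from a name of the form R<digits>TYPE.TXT.
file_type() {
    t=${1%.TXT}                  # drop the .TXT extension
    t=${t#R}                     # drop the leading R
    digits=${t%%[!0-9]*}         # keep only the run of leading digits
    printf '%s\n' "${t#"$digits"}"   # remove those digits from the front
}

file_type R9ABC.TXT    # prints ABC
file_type R12XY.TXT    # prints XY
```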