help using command line to extract snippets of data on stdout - perl

I would like the option of extracting the following string/data:
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample
=or=
25
myproxy
sample
But it would help if I see both.
From this output using cut or perl or anything else that would work:
Found 3 items
drwxr-xr-x - foo_hd foo_users 0 2011-03-16 18:46 /work/foo/processed/25
drwxr-xr-x - foo_hd foo_users 0 2011-04-05 07:10 /work/foo/processed/myproxy
drwxr-x--- - foo_hd testcont 0 2011-04-08 07:19 /work/foo/processed/sample
Doing a cut -d" " -f6 will get me foo_users, testcont. I tried increasing the field to higher values and I'm just not able to get what I want.
I'm not sure if cut is good for this or something like perl?
The base directories will remain static /work/foo/processed.
Also, I need the first line Found Xn items removed. Thanks.

You can do a substitution from beginning to the first occurrence of / , (non greedily)
$ your_command | ruby -ne 'print $_.sub(/.*?\/(.*)/,"/\\1") if /\//'
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample
Or you can find a unique separator (field delimiter) to split on. for example, the time portion is unique , so you can split on that and get the last element. (2nd element)
$ ruby -ne 'print $_.split(/\s+\d+:\d+\s+/)[-1] if /\//' file
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample
With awk,
$ awk -F"[0-9][0-9]:[0-9][0-9]" '/\//{print $NF}' file
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample

perl -lanF"\s+" -e 'print #F[-1] unless /^Found/' file
Here is an explanation of the command-line switches used:
-l: remove line break from each line of input, then add one back on print
-a: auto-split each line of input into an #F array
-n: loop through each line of input
-F: the regexp pattern to use for the auto-split (with -a)
-e: the perl code to execute (for each line of input if using -n or -p)
If you want to just output the last portion of your directory path, and the basedir is always '/work/foo/processed', I would do this:
perl -nle 'print $1 if m|/work/foo/processed/(\S+)|' file

Try this out :
<Your Command> | grep -P -o '[\/\.\w]+$'
OR if the directory '/work/foo/processed' is always static then:
<Your Command>| grep -P -o '\/work\/foo\/processed\/.+$'
-o : Show only the part of a matching line that matches PATTERN.
-P : Interpret PATTERN as a Perl regular expression.
In this example, the last word in the input will be matched .
(The word can also contain dot(s)),so file names like 'text_file1.txt', can be matched).
Ofcourse, you can change the pattern, as per your requirement.

If you know the columns will be the same, and you always list the full path name, you could try something like:
ls -l | cut -c79-
which would cut out the 79th character until the end. That might work in this exact case, but I think it would be better to find the basename of the last field. You could easily do this in awk or perl. Respond if this is not what you want and I'll add the awk and perl versions.

take the output of your ls command and pipe it to awk
your command|awk -F'/' '{print $NF}'

your_command | perl -pe 's#.*/##'

Related

Get version of Podspec via command line (bash, zsh) [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d\ -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d\ - d = delimiter, \ = escaped space character, something like -d" " would have also worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to omit the match
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of ": " in $line from the front.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, %% tells bash to delete the longest match of ": " in $line from the end.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The substring to split on is ":\ " because the space character must be escaped with the backslash.
You can find more like these at the linux documentation project.
Modern BASH has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done
grep potato file | grep -o "[0-9].*"

sed or awk to change a specific number in a file on RHEL7

I need help figuring out the syntax or what command to use to find an replace a specific number in a file.
I need to replace the number 10 with 25 in a configuration file. I have tried the following:
sed 's/10/25/g' /etc/security/limits.conf
This changes other instances that contain 10 such as 1000 and 10000 to 2500 and 25000, I need to juct change the need to just change 10 to 25. Please help.
Thank you,
Joseph
The trick here is to limit the sed substitution to the line you want to change. For limits.conf you are best off matching the domain, type and item. So if you wanted to just change a limit for domain #student, type hard, item nproc, you'd use something like
sed '/#student.*hard.*nproc/s/10/25/g' /etc/security/limits.conf
sed -ri '/^#/!s/(^.*)([[:space:]]10$)/\1 25/' /etc/security/limits.conf
With regular expression interpretation enabled (-r or -E), process all lines that don't start with a # by using ! We then split the lines into two sections, and replace the line for the first section followed by a space and 25. The $ ensure that the entry to replace is anchored at the end of the line.
Awk is another option:
awk -i 'NF==4 && $4==10 { gsub("10","25",$4) }1' /etc/security/limits.conf
Check if the line has 4 space delimited fields (NF==4) and the 4th field ($4) is 10. If this condition is met, replace 10 with 25 using gsub and print all lines with 1
The -i is an inplace amend flag on more recent versions of awk. If a compliant version is not available, use:
awk 'NF==4 && $4==10 { gsub("10","25",$4) }1' /etc/security/limits.conf > /etc/security/limits.tmp && mv -f /etc/security/limits.tmp /etc/security/limits.conf
Use this Perl one-liner, where \b stands for word break (so that 10 will not match 210 or 102):
perl -pe 's/\b10\b/25/g' in_file > out_file
Or to change the file in-place:
perl -i.bak -pe 's/\b10\b/25/g' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
The regex uses modifier /g : Match the pattern repeatedly.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

grep -f forEXACT pattern

I want TO extract list of names from other bigger file (input), having that name and some additional information associated with that name. My problem is with grep -f option, as it is not matching the exact entries in input file but some other entries that contain similar name.
I tried:
$ grep -f list.txt -A 1 input >output
Following are the format of files;
list.txt
TE_final_35005
TE_final_1040
Input file
>TE_final_10401
ACGTACGTACGTACGT
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
Required output:
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
output I am getting:
>TE_final_10401
ACGTACGTACGTACGT
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
Although TE_final_10401 is not in the list.txt
How I can use ^ in list?
Please help to match the exact value or suggest other ways to do this.
Add the whole word switch (-w):
grep -w -A1 -f list.txt infile
Output:
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
A couple of things, remove the blanks lines from the files first:
sed -i '/^\s*$/d' file list
Then -w is used to match whole words only and -A1 will print the next line after the match:
$ grep -w -A1 -f list file > new_file
$ cat new_file
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
as others have mentioned, adding the -w flag is the cleanest and easiest approach based on your sample data. but since you explicitly asked how you could use ^ in list.txt, here's another option.
to add ^ and/or $ anchors to each line in list.txt:
$ cat list.txt
^>TE_final_35005[ ]*$
^>TE_final_1040[ ]*$
this searches for your patterns at the start of the line, preceded by a > character, and ignores any trailing spaces. then your previous command will work (assuming you either remove those blank lines or change your argument to -A 2).
if you'd like to add these anchors to the list file automatically (and delete any blank lines at the same time), use this awk construct:
awk '{if($0 != ""){print "^>"$0"[ ]*$"}}' list.txt >newlist.txt
or if you prefer sed inplace editing:
sed -i '/^[ ]*$/d;s/\(.*\)/^>\1[ ]*$/g' list.txt

In-place replacement

I have a CSV. I want to edit the 35th field of the CSV and write the change back to the 35th field. This is what I am doing on bash:
awk -F "," '{print $35}' test.csv | sed -i 's/^0/+91/g'
so, I am pulling the 35th entry using awk and then replacing the "0" in the starting position in the string with "+91". This one works perfet and I get desired output on the console.
Now I want this new entry to get written in the file. I am thinking of sed's "in -place" replacement feature but this fetuare needs and input file. In above command, I cannot provide input file because my primary command is awk and sed is taking the input from awk.
Thanks.
You should choose one of the two tools. As for sed, it can be done as follows:
sed -ri 's/^(([^,]*,){34})0([^,]*)/\1+91\3/' test.csv
Not sure about awk, but #shellter's comment might help with that.
The in-place feature of sed is misnamed, as it does not edit the file in place. Instead, it creates a new file with the same name. eg:
$ echo foo > foo
$ ln -f foo bar
$ ls -i foo bar # These are the same file
797325 bar 797325 foo
$ echo new-text > foo # Changes bar
$ cat bar
new-text
$ printf '/new/s//newer\nw\nq\n' | ed foo # Edit foo "in-place"; changes bar
9
newer-text
11
$ cat bar
newer-text
$ ls -i foo bar # Still the same file
797325 bar 797325 foo
$ sed -i s/new/newer/ foo # Does not edit in-place; creates a new file
$ ls -i foo bar
797325 bar 792722 foo
Since sed is not actually editing the file in place, but writing a new file and then renaming it to the old file, you might as well do the same.
awk ... test.csv | sed ... > test.csv.1 && mv test.csv.1 test.csv
There is the misperception that using sed -i somehow avoids the creation of the temporary file. It does not. It just hides the fact from you. Sometimes abstraction is a good thing, but other times it is unnecessary obfuscation. In the case of sed -i, it is the latter. The shell is really good at file manipulation. Use it as intended. If you do need to edit a file in place, don't use the streaming version of ed; just use ed
So, it turned out there are numerous ways to do it. I got it working with sed as below:
sed -i 's/0\([0-9]\{10\}\)/\+91\1/g' test.csv
But this is little tricky as it will edit any entry which matches the criteria. however in my case, It is working fine.
Similar implementation of above logic in perl:
perl -p -i -e 's/\b0(\d{10})\b/\+91$1/g;' test.csv
Again, same caveat as mentioned above.
More precise way of doing it as shown by Lev Levitsky because it will operate specifically on the 35th field
sed -ri 's/^(([^,]*,){34})0([^,]*)/\1+91\3/g' test.csv
For more complex situations, I will have to consider using any of the csv modules of perl.
Thanks everyone for your time and input. I surely know more about sed/awk after reading your replies.
This might work for you:
sed -i 's/[^,]*/+91/35' test.csv
EDIT:
To replace the leading zero in the 35th field:
sed 'h;s/[^,]*/\n&/35;/\n0/!{x;b};s//+91/' test.csv
or more simply:
|sed 's/^\(\([^,]*,\)\{34\}\)0/\1+91/' test.csv
If you have moreutils installed, you can simply use the sponge tool:
awk -F "," '{print $35}' test.csv | sed -i 's/^0/+91/g' | sponge test.csv
sponge soaks up the input, closes the input pipe (stdin) and, only then, opens and writes to the test.csv file.
As of 2015, moreutils is available in package repositories of several major Linux distributions, such as Arch Linux, Debian and Ubuntu.
Another perl solution to edit the 35th field in-place:
perl -i -F, -lane '$F[34] =~ s/^0/+91/; print join ",",#F' test.csv
These command-line options are used:
-i edit the file in-place
-n loop around every line of the input file
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the #F array. Defaults to splitting on whitespace.
-e execute the perl code
-F autosplit modifier, in this case splits on ,
#F is the array of words in each line, indexed starting with 0
$F[34] is the 35 element of the array
s/^0/+91/ does the substitution

How to "grep" out specific line ranges of a file

There are often times I will grep -n whatever file to find what I am looking for. Say the output is:
1234: whatev 1
5555: whatev 2
6643: whatev 3
If I want to then just extract the lines between 1234 and 5555, is there a tool to do that? For static files I have a script that does wc -l of the file and then does the math to split it out with tail & head but that doesn't work out so well with log files that are constantly being written to.
Try using sed as mentioned on
http://linuxcommando.blogspot.com/2008/03/using-sed-to-extract-lines-in-text-file.html. For example use
sed '2,4!d' somefile.txt
to print from the second line to the fourth line of somefile.txt. (And don't forget to check http://www.grymoire.com/Unix/Sed.html, sed is a wonderful tool.)
The following command will do what you asked for "extract the lines between 1234 and 5555" in someFile.
sed -n '1234,5555p' someFile
If I understand correctly, you want to find a pattern between two line numbers. The awk one-liner could be
awk '/whatev/ && NR >= 1234 && NR <= 5555' file
You don't need to run grep followed by sed.
Perl one-liner:
perl -ne 'if (/whatev/ && $. >= 1234 && $. <= 5555) {print}' file
Line numbers are OK if you can guarantee the position of what you want. Over the years, my favorite flavor of this has been something like this:
sed "/First Line of Text/,/Last Line of Text/d" filename
which deletes all lines from the first matched line to the last match, including those lines.
Use sed -n with "p" instead of "d" to print those lines instead. Way more useful for me, as I usually don't know where those lines are.
Put this in a file and make it executable:
#!/usr/bin/env bash
start=`grep -n $1 < $3 | head -n1 | cut -d: -f1; exit ${PIPESTATUS[0]}`
if [ ${PIPESTATUS[0]} -ne 0 ]; then
echo "couldn't find start pattern!" 1>&2
exit 1
fi
stop=`tail -n +$start < $3 | grep -n $2 | head -n1 | cut -d: -f1; exit ${PIPESTATUS[1]}`
if [ ${PIPESTATUS[0]} -ne 0 ]; then
echo "couldn't find end pattern!" 1>&2
exit 1
fi
stop=$(( $stop + $start - 1))
sed "$start,$stop!d" < $3
Execute the file with arguments (NOTE that the script does not handle spaces in arguments!):
Starting grep pattern
Stopping grep pattern
File path
To use with your example, use arguments: 1234 5555 myfile.txt
Includes lines with starting and stopping pattern.
If I want to then just extract the lines between 1234 and 5555, is
there a tool to do that?
There is also ugrep, a GNU/BSD grep compatible tool but one that offers a -K option (or --range) with a range of line numbers to do just that:
ugrep -K1234,5555 -n '' somefile.log
You can use the usual GNU/BSD grep options and regex patterns (but it also offers a lot more such as -K.)
If you want lines instead of line ranges, you can do it with perl: eg. if you want to get line 1, 3 and 5 from a file, say /etc/passwd:
perl -e 'while(<>){if(++$l~~[1,3,5]){print}}' < /etc/passwd