I want to add one to the last value at the end of a string in sed.
I'm thinking along the lines of
cat 0809_data.csv |sed -e 's/\([0-9]\{6\}\).*\(,[^,]*$\)/\1\2/g'| export YEARS = $(echo `grep -o '[^,]*$' + 1`|bc)
e.g. 123456, kjhsflk, lksjgrlks, 2.8 -> 123456, 3.8
Would this be more reasonable/feasible in awk?
This should work:
years=$(awk -F, 'BEGIN{ OFS=", "} {print $1, $4+1}' 0809_data.csv)
It would be really awkward to try to use sed and do arithmetic with part of the result. You'd have to pull the string apart and do the math and put everything back together. AWK does that neatly without any fuss.
Notice that cat is not necessary (even using sed in a command similar to the one in your question) and it's probably not necessary to export the variable unless you're calling another script and need it to be able to access it as a "global" variable. Also, shells generally do integer math so you don't need to use bc unless you need floats.
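As a minimal sketch of both points, assuming the sample row from the question is fed on stdin rather than read from 0809_data.csv (the names line, result and count are only illustrative):

```shell
# Sample row from the question (illustrative; normally read from the CSV file).
line='123456, kjhsflk, lksjgrlks, 2.8'

# awk splits on commas; adding 1 to " 2.8" coerces the field to a number.
result=$(printf '%s\n' "$line" |
  awk -F, 'BEGIN { OFS = ", " } { print $1, $4 + 1 }')
echo "$result"

# Shell arithmetic is integer-only: fine for 2 + 1, not for 2.8 + 1.
count=$((2 + 1))
echo "$count"
```

The float case is the one that needs awk (or bc); the integer case can stay in the shell.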
I'm trying to come up with a sed script to take all lines containing a pattern and move them to the end of the output. This is an exercise in learning hold vs pattern space and I'm struggling to come up with it (though I feel close).
I'm here:
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -E '/foo/H; //d; $G'
hi
bar
something
yo

foo1
foo2
But I want the output to be:
hi
bar
something
yo
foo1
foo2
I understand why this is happening. It is because the first time we find foo the hold space is empty so the H appends \n to the blank hold space and then the first foo, which I suppose is fine. But then the $G does it again, namely another append which appends \n plus what is in the hold space to the pattern space.
I tried a final delete command with /^$/d, but that didn't remove the blank line (I think this is because the pattern is matched not against the last line but against the now-multiline pattern space, which has a \n\n in it).
I'm sure the sed gurus have a fix for me.
This might work for you (GNU sed):
sed '/foo/H;//!p;$!d;x;//s/.//p;d' file
If the line contains the required string, append it to the hold space (HS); otherwise print it as normal. If it is not the last line, delete it; otherwise swap the HS with the pattern space (PS). If the required string is now in the PS (what was the HS), then, since every such line was appended with a leading newline, the first character will be a newline: delete that first character and print. Finally, delete whatever is left.
An alternative, using the -n flag:
sed -n '/foo/H;//!p;$!b;x;//s/.//p' file
N.B. When the d or b (without a parameter) command is executed, no further sed commands are run for that line; a new line is read into the PS and the sed script starts again from the first command, i.e. execution does not resume after the d command.
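To check the walkthrough above against the sample input from the question, a sketch like the following can be run (printf stands in for echo -e, and out is just an illustrative variable name):

```shell
# Lines matching foo are collected in the hold space and printed last;
# all other lines are printed as they are read.
out=$(printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' |
  sed '/foo/H;//!p;$!d;x;//s/.//p;d')
printf '%s\n' "$out"
```

There should be no blank line between yo and foo1, because s/.// removes the leading newline that H put at the front of the hold space.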
Why? Stuff like this is absolutely trivial in awk, awk is available everywhere that sed is, and the resulting awk script will be simpler, more portable, faster and better in almost every other way than a sed script to do the same task. All that hold space stuff was necessary in sed before awk was invented in 1977, but there's absolutely no use for it now other than as a mental exercise.
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" |
awk '/foo/{buf = buf $0 RS;next} {print} END{printf "%s",buf}'
hi
bar
something
yo
foo1
foo2
The above will work as-is in every awk on every UNIX installation and I bet you can figure out how it works very easily.
This feels like a hack and I think it should be possible to handle this situation more gracefully. The following works on GNU sed:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -r '/foo/{H;d;}; $G; s/\n\n/\n/g'
However, on OSX/BSD sed, it results in this odd output:
hi
bar
something
yonfoo1
foo2
Note that the two consecutive newlines were replaced with the literal character n.
The difference between OSX/BSD sed and GNU sed is explained in this article. The following works in both:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed '/foo/{H;d;}; $G; s/\n\n/\'$'\n''/'
TL;DR: BSD sed does not accept escape sequences such as \n in the RHS of the replacement expression, so you either have to put a literal LF/newline there on the command line, or do the above: split the sed script string where you need the newline on the RHS and put a dollar sign in front of '\n' so the shell interprets it as a line feed.
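The "true newline" option mentioned above can be sketched as follows. It avoids the $'\n' shell quoting by putting a backslash followed by a real line break in the replacement, which should be accepted by both GNU and BSD sed (out is an illustrative variable name):

```shell
# The replacement side is a backslash followed by a literal newline.
out=$(printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' |
  sed '/foo/{H;d;}; $G; s/\n\n/\
/')
printf '%s\n' "$out"
```

One caveat with this family of commands: if the last input line itself matches foo, d ends the cycle before $G runs, so the collected lines are never printed. The sample input does not hit that case.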
I have this:
$ cat f2
123-foo-456
abc-xx
foo-yy
ddd-ao
abc
6778
123
This gives me: (#1)
$ sed -n -e '/456/,/ddd/{/ddd/{!s/a/A/g;!s/o/Q/g};p}' f2
123-foo-456
abc-xx
foo-yy
ddd-ao
And this gives me: (#2)
$ sed -n -e '/456/,/ddd/{/ddd/!{s/a/A/g;s/o/Q/g};p}' f2
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
I prefer #2 since it does what I wanted to get as output.
Can someone explain the difference between the two?
And a good source of documentation that explains the difference?
/ddd/{!s/a/A/g;!s/o/Q/g}
When ddd is on the line (working buffer),
execute the block { ... }.
Inside the block, ! negates the address of each s command. That address is empty, which means "every line", so its negation means "no line".
So the substitutions never run and it does nothing.
/ddd/!{s/a/A/g;s/o/Q/g}
When ddd is NOT on the line (working buffer) (here ! negates the address /ddd/),
execute the block { ... }:
substitute (s/a/A/g), ...
So it changes a to A (and o to Q) on lines that do not contain ddd.
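The second form can be written out one command per line, which makes it easier to see which address the ! belongs to. This is only a sketch using the f2 data from the question on stdin; out is an illustrative name:

```shell
out=$(printf '123-foo-456\nabc-xx\nfoo-yy\nddd-ao\nabc\n6778\n123\n' |
  sed -n '
# Work only on the range from the 456 line to the ddd line.
/456/,/ddd/{
  # The ! negates the /ddd/ address: run the block on lines WITHOUT ddd.
  /ddd/!{
    s/a/A/g
    s/o/Q/g
  }
  p
}')
printf '%s\n' "$out"
```

The ddd-ao line is printed unchanged because it is inside the range but excluded from the inner block.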
There is no noteworthy difference between the 2. They are both unintelligible sequences of random characters that became obsolete when awk was invented in 1977 and so should never be used. sed is for simple substitution on individual lines, that is all. If you're using more than s, g, and p (with -n) then you're using the wrong tool. Stop wasting your time on this and just use awk:
$ cat tst.awk
/456/ { f=1 }
f {
    if (/ddd/) {
        f=0
    }
    else {
        gsub(/a/,"A")
        gsub(/o/,"Q")
    }
    print
}
$ awk -f tst.awk file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
Clear, simple, concise, robust, efficient, portable and better in every other way than an equivalent sed solution.
Or if having everything squeezed onto one line is appealing to you:
$ awk '/456/{f=1}f{if(/ddd/)f=0;else{gsub(/a/,"A");gsub(/o/,"Q")}print}' file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
You COULD write the awk script in the same style as the sed script:
$ awk '/456/,/ddd/{if(!/ddd/){gsub(/a/,"A");gsub(/o/,"Q")}print}' file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
but then you get the duplicated conditions (/ddd/ twice) that come with using range expressions which is one reason why they should never be used. Fortunately, unlike sed, awk has variables and so you never need to write range expressions.
I have a file with multiple lines and for line 2 to the end of the file I want to swap fields 8 and 9. The file is comma separated and I'd like to do the swap inline so I can run it on a batch of files using * wildcard. If this can be accomplished similarly with awk then that works for me too.
example:
header1,header2,header3,...,header8,header9,...,headerN
field1.1,...,field1.9,field1.8,...,field1.N
field2.1,...,field2.9,field2.8,...,field2.N
field3.1,...,field3.9,field3.8,...,field3.N
...
I think the command would look similar to sed -r -i '2,$s/^(([^,]*,){8})([^,]*,)([^,]*,)(.*)/\1\3\2\4/' temp*.log,
but \2 is not what I expect, it is the 7th field. I know that \2 will not be the 8th field because I have double parentheses there, but I'm not sure how to fix it. Could somebody please explain what this equation is doing and specifically what [^,] is doing and how the {8} is applied?
Thanks in advance.
In awk, you might use:
awk -F',' 'BEGIN {OFS=","} {t = $8; $8 = $9; $9 = t; print}'
In sed, the command is more convoluted, but it could be done.
sed -e 's/^\(\([^,]*,\)\{7\}\)\([^,]*,\)\([^,]*,\)/\1\4\3/'
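Add an in-place option if your version of sed (e.g. GNU or BSD) supports it: GNU sed takes the backup suffix attached to the option (-i.bak), while BSD sed takes it as a separate argument (-i .bak).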
This uses the universally available sed regexes (it would work on even archaic versions of sed). You could lose most of the backslashes if you used 'extended regular expressions' instead:
sed -r -i 's/^(([^,]*,){7})([^,]*,)([^,]*,)(.*)/\1\4\3\5/'
Note the nested remembered (captured) patterns. The outer set is \1, the inner set would be \2 but that gets repeated 7 times, so you'd have the seventh field as \2. Anyway, that's why the eighth and ninth columns are switched with \4 and \3. \5 are the remaining columns.
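The group numbering can be checked on a made-up ten-field row. This is a sketch only; the f1..f10 field names and the variable names are illustrative, not taken from the question's data:

```shell
line='f1,f2,f3,f4,f5,f6,f7,f8,f9,f10'

# \2 is the repeated inner group, so it keeps only its last repetition:
# the seventh field, including its trailing comma.
seventh=$(printf '%s\n' "$line" |
  sed 's/^\(\([^,]*,\)\{7\}\)\([^,]*,\)\([^,]*,\).*/\2/')
echo "$seventh"

# \1 is the first seven fields, \3 the eighth, \4 the ninth.
swapped=$(printf '%s\n' "$line" |
  sed 's/^\(\([^,]*,\)\{7\}\)\([^,]*,\)\([^,]*,\)/\1\4\3/')
echo "$swapped"
```

Here [^,]* matches a run of non-comma characters (one field), the comma after it matches the separator, and \{7\} repeats that field-plus-comma unit seven times.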
(I note in passing that it would have been helpful to have some sample data in sufficiently the correct format to test with. It was a nuisance having to edit what is shown in the question to be able to test the code.)
If you need to do much CSV work, then either use Perl and its CSV modules (Text::CSV and Text::CSV_XS) or Python and its CSV module, or get CSVfix.
\2 is the second group in the RE.
Groups are numbered by the position of their opening (.
So in
'2,$s/^(([^,]*,){8})([^,]*,)([^,]*,)(.*)/\1\3\2\4/'
you can see (following the alignment):
\1 = (([^,]*,){8})
\2 = ([^,]*,)
\3 = ([^,]*,)
\4 = ([^,]*,)
and finally \5 = (.*)
In this specific case, \2 holds the last of the eight ({8}) matches.
it seems that awk is the right tool:
awk -F',' -v OFS=',' '{t=$8;$8=$9;$9=t}7' file
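The question asks for the swap only from line 2 onward, so a variant that leaves the header row alone might look like this. It is a sketch on made-up data; sample and swapped are illustrative names:

```shell
sample='h1,h2,h3,h4,h5,h6,h7,h8,h9,h10
a1,a2,a3,a4,a5,a6,a7,a8,a9,a10'

# NR > 1 restricts the swap to data rows; the bare { print } emits every row.
swapped=$(printf '%s\n' "$sample" |
  awk -F, -v OFS=, 'NR > 1 { t = $8; $8 = $9; $9 = t } { print }')
printf '%s\n' "$swapped"
```

Plain awk has no portable in-place option, so for a batch of files the usual approach is to write to a temporary file and move it over the original.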
This might work for you (GNU sed):
sed -ri '1!s/(,[^,]*)(,[^,]*)/\2\1/4' file
This swaps the 9th field with the 8th i.e. 8 / 2 = 4, if you wanted the 7th with the 8th:
sed -ri '1!{s/^/,/;s/(,[^,]*)(,[^,]*)/\2\1/4;s/^,//}' file
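To see why the occurrence flag 4 picks out fields 8 and 9, the substitution can be tried on a made-up row. This sketch uses sed -E (the spelling of -r that BSD sed also accepts) and drops -i and the 1! address so it runs on stdin; the names are illustrative:

```shell
line='f1,f2,f3,f4,f5,f6,f7,f8,f9,f10'

# Successive matches are ",f2,f3" ",f4,f5" ",f6,f7" ",f8,f9"; the 4th is swapped.
swap89=$(printf '%s\n' "$line" | sed -E 's/(,[^,]*)(,[^,]*)/\2\1/4')
echo "$swap89"

# A temporary leading comma shifts the pairing by one field,
# so the 4th match becomes ",f7,f8".
swap78=$(printf '%s\n' "$line" |
  sed -E 's/^/,/;s/(,[^,]*)(,[^,]*)/\2\1/4;s/^,//')
echo "$swap78"
```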
I have a tab delimited file and I want the output to have the entire line in my file if values in column 1 are the same as the values in column 3. Having very limited knowledge in perl and linux, this is as close as I came to a solution.
File example
Apple Sugar Apple
Apple Butter Orange
Raisins Flour Orange
Orange Butter Orange
The results would be:
Apple Sugar Apple
Orange Butter Orange
Code:
#!/bin/sh
awk '{
prev=$0; f1=$1; f3=$3;
getline
if ($1 == $3) {
print prev
print
}
}' myfilename
I am sure that there is an easier solution to it. Maybe even a grep or awk on the command line. But that was the only code I could find that seemed to give me my solution.
Thanks!
It's easy with awk:
awk '$1 == $3' myfile
The default action is to print out the record, so if fields 1 and 3 are equal, that's what will happen.
Using awk
awk is the tool for the job:
awk '$1 == $3'
If your fields in the data are strictly tab separated and may contain blanks, then you will need to specify the field separator explicitly:
awk -F'\t' '$1 == $3'
(where the \t represents a tab; you may have to type Tab (or even Control-V Tab) to get it into the string).
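The difference shows up as soon as a field contains a blank. In this sketch the "Red Apple" row is made up for illustration, and data, by_tab and by_blank are illustrative names:

```shell
data=$(printf 'Red Apple\tSugar\tRed Apple\nApple\tButter\tOrange\nOrange\tButter\tOrange\n')

# With -F'\t' the first field is "Red Apple", so the first row matches.
by_tab=$(printf '%s\n' "$data" | awk -F'\t' '$1 == $3')
printf '%s\n' "$by_tab"

# With default splitting the first row becomes Red / Apple / Sugar / ...,
# so $1 and $3 differ and the row is missed.
by_blank=$(printf '%s\n' "$data" | awk '$1 == $3')
printf '%s\n' "$by_blank"
```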
Using grep
You can do it with grep, but you don't want to do it with grep:
grep -E '([A-Za-z]+)\t[A-Za-z]+\t\1'
The key part of the regex is the \1, which means 'the same value as the first captured string'.
You might even go through gyrations like this in bash:
grep -E $'([A-Za-z]+)\t[A-Za-z]+\t\\1'
You could simplify life by noting (assuming) there are no spaces within fields:
grep -E '([A-Za-z]+)[[:space:]]+[A-Za-z]+[[:space:]]+\1'
As noted in one of the comments, I didn't put a $ at the end of the search pattern; it would be feasible (though the data would have to be cleaned up to contain tabs and drop trailing blanks), so that 'Good Noise GoodBad' would not be picked up. There are other ways to do it, and you can make the regex more and more complex to handle more possible situations. But those only go to emphasize that the awk solution is better; awk deals with the details automatically.
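For completeness, an anchored version might be sketched like this. It swaps in a basic regular expression (plain grep, where back-references are standard) instead of grep -E, and builds the tab with printf so the pattern does not depend on how the shell or grep treats \t. The tab and matches names and the sample rows are illustrative:

```shell
tab=$(printf '\t')

# ^ and $ anchor the match, so field 3 must equal field 1 exactly.
matches=$(printf 'Apple\tSugar\tApple\nGood\tNoise\tGoodBad\nOrange\tButter\tOrange\n' |
  grep "^\([^${tab}]*\)${tab}[^${tab}]*${tab}\1\$")
printf '%s\n' "$matches"
```

The Good/Noise/GoodBad row is rejected because of the trailing $; without it, the row would match on the Good prefix.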
Using grep:
grep -P "([^\t]+)\t[^\t]+\t\1" inFile
I've got a file called 'res' that's 29374 characters of http data in a one-line string. Inside it, there are several http links, but I only want to display those that end in '/idNNNNNNNNN' where N is a digit. In fact I'm only interested in the string 'idNNNNNNNNN'.
I've tried with:
cat res | sed -n '0,/.*\(id[0-9]*\).*/s//\1/p'
but I get the whole file.
Do you know a way to do it?
perl -n -E 'say $1 while m!/id(\d{9})!g' input-file
should work. That assumes exactly 9 digits; that's the {9} in the above. You can match 8 or 9 ({8,9}), 8 or more ({8,}), up to 9 ({0,9}), etc.
Example of this working:
$ echo -n 'junk jumk http://foo/id231313 junk lalala http://bar/id23123 asda' | perl -n -E 'say $1 while m!id(\d{0,9})!g'
231313
23123
That's with the 0 to 9 variant, of course.
If you're stuck with a pre-5.10 perl, use -e instead of -E and print "$1\n" instead of say $1.
How it works
First are the two command-line arguments to Perl. -n tells Perl to read input from standard input or files given on the command line, line by line, setting $_ to each line. $_ is perl's default target for a lot of things, including regular expression matches. -E merely tells Perl that the next argument is a Perl one-liner, using the new language features (vs. -e which does not use the 5.10 extensions).
So, looking at the one liner: say means to print out some value, followed by a newline. $1 is the first regular expression capture (captures are made by parentheses in regular expressions). while is a looping construct, which you're probably familiar with. m is the match operator, the ! after it is the regular expression delimiter (normally, you see / here, but since the pattern contains / it's easier to use something else, so you don't have to escape the / as \/). /id(\d{9}) is the regular expression to match. Keep in mind that the delimiter is !, so the / is not special, it just matches a literal /. The parentheses form a capture group, so $1 will be the number. The ! is the delimiter, followed by g which means to match as many times as possible (as opposed to once). This is what makes it pick up all the URLs in the line, not just the first. As long as there is a match, the m operator will return a true value, so the loop will continue (and run that say $1, printing out the match).
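The same match-every-occurrence idea can also be sketched with grep -o, for cases where Perl is not at hand. This is a swapped-in technique rather than the one-liner explained above; -o is not in POSIX but both GNU and BSD grep provide it. The sample URLs with nine-digit ids are made up, and ids is an illustrative name:

```shell
# grep -o prints each match on its own line; cut drops the leading slash.
ids=$(printf 'junk http://foo/id123456789 junk http://bar/id987654321 asda\n' |
  grep -oE '/id[0-9]{9}' | cut -c2-)
printf '%s\n' "$ids"
```

Like the Perl pattern, this would also match the first nine digits of a longer id.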
Two-sed solution
I think this is one way to do this with only sed. Much more complicated!
echo 'junk jumk http://foo/id231313 junk lalala http://bar/id23123 asda' | \
sed 's!http://!\nhttp://!g' | \
sed -n 's!^.*/id\([0-9]*\).*$!\1!p'
cat res | perl -ne 'chomp; print "$1\n" if m/\/(id\d*)/'
The trouble is that sed and grep and awk work on lines, and you've only got one line. So, you probably need to split things up so you have more than one line -- then you can make the normal tools work.
tr ':' '\012' < res |
sed -n 's%.*/\(id[0-9][0-9]*\).*%\1%p'
This takes advantage of URLs containing colons and maps colons to newlines with tr, then uses sed to pick up anything up to a slash, followed by id and one or more digits, followed by anything, and prints out the id and digit string (only). Since these only occur in URLs, they will only appear one per line and relatively near the start of the line too.
Here's a solution using only one invocation of sed:
sed -n 's| |\n|g;/^http/{s|http://[^/]*/id\([0-9]*\)|\1|;P};D' inputfile
Explanation:
s| |\n|g; - Divide and conquer
/^http/{ - If pattern space begins with "http"
s|http://[^/]*/id\([0-9]*\)|\1|; - capture the id
P - Print the string preceding the first newline
}; - end if
D - Delete the string preceding the first newline regardless of whether it contains "http"
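The steps above can be tried on the sample string used earlier on this page. In this sketch the \n in the replacement is written as a backslash followed by a real line break, and a semicolon is added before the closing brace, which should make it acceptable to non-GNU sed as well; ids is an illustrative name:

```shell
# Split on spaces, then loop with P/D over one token at a time.
ids=$(printf 'junk jumk http://foo/id231313 junk lalala http://bar/id23123 asda\n' |
  sed -n 's| |\
|g;/^http/{s|http://[^/]*/id\([0-9]*\)|\1|;P;};D')
printf '%s\n' "$ids"
```

D restarts the script on what is left of the pattern space without reading new input, which is what drives the loop over tokens.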
Edit:
This version uses the same technique but is more selective.
sed -n 's|http://|\n&|g;/^\n*http/{s|\n*http://[^/]*/id\([0-9]*\)|\1\n|;P};D' inputfile