I want to pre-pend a directory name to the last word in a line. The line has the following format:
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0^IoneFile$
where ^I denotes a tab, and $ denotes the end-of-line. This line is generated by git ls-files -s.
I want a sed command to prepend one/ to the filename in this line, like so:
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0^Ione/oneFile$
Some of the lines that I've tried, and their corresponding outputs:
Match the longest string of characters that are not \t followed by $; prepend one/:
$ git ls-files -s | sed 's|[^\t]*$|one/&|'
one/100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFile
Match the longest string of characters that are not \t or ' ' followed by $; pre-pend one/:
$ git ls-files -s | sed 's|[^\t ]*$|one/&|'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e one/0 oneFile
Match the longest string of characters that are not horizontal whitespace, prepend 'one/':
$ git ls-files -s | sed 's|[^[[:blank:]]]*$|one/&|'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFilone/e
I've basically tried a whole bunch of things matching for:
[^\t ]*$|one/&
[^[[:space:]]]*$|one/&
and the ones listed above. The closest I can get is to have oneFilone/e, which was [^[[:blank:]]]*$|one/&|', or to pre-pend to the 0, but I can't seem to quite get what I want.
EDIT
Because a few people have commented / posted answers, none of which work for me, I figured I'd add: I am using Mac OS X 10.7.3. I'm not completely sure which version of sed I have (if anybody knows a way to find out, feel free to add a comment to that effect); the man sed page says it's a BSD sed. I'm not sure how different that is from GNU sed, if at all.
I'm also using zsh, with oh-my-zsh running (pretty much unmodified). I have turned on extended_glob (setopt extended_glob).
I've commented with my results for the answers people have given; I assume they are run on a Linux distribution? I don't have access to a non-OS X system tonight, but I will re-run any answers tomorrow; maybe it's just my [shell|OS|bad karma] that isn't letting them work for me.
EDIT Again:
So I've tested on a Ubuntu system, and the (1) above does work. I'd love a working version for my Mac, though.
Final EDIT:
Thanks to all who answered with working equivalent commands. It turns out that my first one does work, but not with BSD sed. I do, however, have gsed available (thanks for pointing that out!) which makes these all magically work.
This is simpler, but seems to work:
echo -e '100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFile' | sed -e 's/\([a-zA-Z]\+\)$/one\/\0/g'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 one/oneFile
It should work with a tab just before oneFile also.
The following works for me:
git ls-files -s | sed -e 's|\t\(.\+\)$|\tone/\1|g'
100644 345242cb0c4e9bb01a6fef9947f4342ff2f68553 0 one/ExView/resource.h
would have been
100644 345242cb0c4e9bb01a6fef9947f4342ff2f68553 0 ExView/resource.h
I've also had trouble when I got strange new line problems. I doubt this is your problem, but in the past, git ls-files -s | tr -d '\r' | ... has been helpful for me.
I'm unable to test right now, but you're using too many brackets with the character classes.
[^[[:space:]]]*$|one/&
should be
[^[:space:]]*$|one/&
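With the extra brackets, the expression parses as the bracket expression [^[[:space:]] (one character that is neither '[' nor whitespace) followed by ]*, so it matches only the final 'e', which explains why the dir is inserted before the last 'e'.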
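A quick way to check the corrected class is sketched below. The sample line is the one from the question; printf is used only so that the separator is a real tab character, and the variable names are just for illustration.

```shell
# Sample line from the question, with a real tab before the filename
line=$(printf '100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0\toneFile')

# [^[:space:]]*$ matches the trailing run of non-whitespace characters,
# and & re-inserts that match after the new prefix
result=$(printf '%s\n' "$line" | sed 's|[^[:space:]]*$|one/&|')
printf '%s\n' "$result"
```

Since this uses only a POSIX character class and &, it should not depend on whether sed is the GNU or the BSD one.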
This might work for you (or am I missing something?):
echo -e "100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0\toneFile" | sed 's/\t/&one\//'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 one/oneFile
I wanted to do exactly this but did not have gsed around...
The only issue is that OS X sed does not recognise \t as a TAB character, as explained here. You have to use an actual TAB character. Use Option 1 in the original post, but instead of typing \t, press Ctrl+v followed by the Tab key. This is hard to copy and paste here, so you will have to do it yourself. :-)
See also this question.
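If typing a literal tab is awkward, one possible workaround is to let the shell produce the tab and pass it into the sed script through a variable. This is only a sketch: the variable names tab and line are made up for the example, and the sample line is the one from the question.

```shell
# Let printf produce the tab, so sed never has to interpret \t itself
tab=$(printf '\t')
line=$(printf '100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0\toneFile')

# Double quotes let the shell expand ${tab} before sed sees the script
result=$(printf '%s\n' "$line" | sed "s|${tab}|${tab}one/|")
printf '%s\n' "$result"
```

With git, the same script would be used as git ls-files -s | sed "s|${tab}|${tab}one/|".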
I am trying to filter a long HTML page so that only the fingerprints, which have a consistent structure, are left. For example:
DCD0 5B71 EAB9 4199 527F 44AC DB6B 8C1F 96D8 BF60
I know how to do it using standard command-line tools such as grep, cut and head/tail, but is there a more elegant way to do it with sed? The shell command I use is long and does not look very nice.
thank you
grep is the right tool for extracting strings from a file based on regular expression matching:
grep -Eo '([A-F0-9]{4}[[:space:]]){9}[A-F0-9]{4}' file.html
Here is a sed command tested with GNU sed 4.2.2:
sed -nr '/(([[:xdigit:]]){4} ?){10}/p' file
It matches and prints every line that contains 10 groups, each made of 4 hex digits followed by an optional space.
With GNU sed:
sed -E 's/.*(([A-F0-9]{4}[[:space:]]){9}[A-F0-9]{4}).*/\1/' file
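As written, the substitution above also passes through every line that does not contain a fingerprint. A possible variant, sketched here, adds -n and the p flag so that only the extracted fingerprints are printed. The HTML snippet is an invented sample, not taken from the question.

```shell
# Invented sample input: one line without and one line with a fingerprint
html=$(printf '%s\n' '<tr><td>Name</td></tr>' \
  '<td>DCD0 5B71 EAB9 4199 527F 44AC DB6B 8C1F 96D8 BF60</td>')

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, reduced to the captured fingerprint
result=$(printf '%s\n' "$html" |
  sed -nE 's/.*(([A-F0-9]{4} ){9}[A-F0-9]{4}).*/\1/p')
printf '%s\n' "$result"
```

Like the command above, this reports at most one fingerprint per line; grep -o remains the simpler tool when a line may hold several.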
I am trying to write a sed expression that can remove URLs from a file.
Example:
http://samgovephotography.blogspot.com/ updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N https://hollywoodmomblog.com/?p=2442 Thx to HMB Contributor #kdpartak :)
But I can't get it to work:
sed 's/[\w \W \s]*http[s]*:\/\/\([\w \W]\)\+[\w \W \s]*/ /g' posFile
FIXED!!!!!
handles almost all cases, even malformed URLs
sed 's/[\w \W \s]*http[s]*[a-zA-Z0-9 : \. \/ ; % " \W]*/ /g' positiveTweets | grep "http" | more
The following removes http:// or https:// and everything up until the next space:
sed -e 's!http\(s\)\{0,1\}://[^[:space:]]*!!g' posFile
updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N Thx to HMB Contributor #kdpartak :)
Edit:
I should have used:
sed -e 's!http[s]\?://\S*!!g' posFile
"[s]\?" is a far more readable way of writing "an optional s" compared to "\(s\)\{0,1\}"
"\S*" a more readable version of "any non-space characters" than "[^[:space:]]*"
I must have been using the sed that came installed with my Mac at the time I wrote this answer (brew install gnu-sed FTW).
There are better URL regular expressions out there (those that take into account schemes other than HTTP(S), for instance), but this will work for you, given the examples you give. Why complicate things?
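For anyone who wants one command for both the stock Mac sed and GNU sed, a sketch using extended regular expressions might look like the following. It assumes a sed that accepts -E, and the tweet text is a shortened version of the example in the question.

```shell
# Shortened sample tweet based on the question's example
tweet='Melissa Gilbert at B&N https://hollywoodmomblog.com/?p=2442 Thx to HMB'

# https? is "http" plus an optional "s"; [^[:space:]]* is the POSIX way
# to say "everything up to the next whitespace"
result=$(printf '%s\n' "$tweet" | sed -E 's!https?://[^[:space:]]*!!g')
printf '%s\n' "$result"
```

The spaces on both sides of the removed URL are left in place, so the output has a double space where the URL used to be.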
The accepted answer provides the approach that I used to remove URLs, etc. from my files. However it left "blank" lines. Here is a solution.
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' input_file
perl -i -pe 's/^'`echo "\012"`'${2,}//g' input_file
The GNU sed flags, expressions used are:
-i Edit in-place
-e [-e script] --expression=script : basically, add the commands in script
(expression) to the set of commands to be run while processing the input
^ Match start of line
$ Match end of line
\? Match zero or one of the preceding regular expression
{2,} Match 2 or more of preceding regular expression
\S* Zero or more non-space characters; alternative to: [^[:space:]]*
However,
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g'
leaves empty lines behind: each line that held only a URL still ends in its newline (\n). Standard sed-based approaches to strip leading and trailing tabs and spaces, e.g.
sed -i 's/^[ \t]*//; s/[ \t]*$//'
do not help here, because they only trim whitespace inside each line: sed reads input one line at a time, so unless you use a "branch label" to pull newlines into the pattern space, you cannot substitute them.
The solution is to use the following perl expression:
perl -i -pe 's/^'`echo "\012"`'${2,}//g'
which uses a shell substitution,
'`echo "\012"`'
to replace an octal value
\012
(i.e., a newline, \n), that occurs 2 or more times,
{2,}
(otherwise we would unwrap all lines), with something else; here:
//
i.e., nothing.
[The second reference below provides a wonderful table of these values!]
The perl flags used are:
-p Places a printing loop around your command,
so that it acts on each line of standard input
-i Edit in-place
-e Allows you to provide the program as an argument,
rather than in a file
References:
perl flags: Perl flags -pe, -pi, -p, -w, -d, -i, -t?
ASCII control codes: https://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/
remove URLs: sed to remove URLs from a file
branch labels: How can I replace a newline (\n) using sed?
GNU sed manual: https://www.gnu.org/software/sed/manual/sed.html
quick regex guide: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
Example:
$ cat url_test_input.txt
Some text ...
https://stackoverflow.com/questions/4283344/sed-to-remove-urls-from-a-file
https://www.google.ca/search?dcr=0&ei=QCsyWtbYF43YjwPpzKyQAQ&q=python+remove++citations&oq=python+remove++citations&gs_l=psy-ab.3...1806.1806.0.2004.1.1.0.0.0.0.61.61.1.1.0....0...1c.1.64.psy-ab..0.0.0....0.-cxpNc6youY
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://bbengfort.github.io/tutorials/2016/05/19/text-classification-nltk-sckit-learn.html
http://datasynce.org/2017/05/sentiment-analysis-on-python-through-textblob/
https://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
http://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
ftp://ftp.ncbi.nlm.nih.gov/
ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/alignment_indices/20100804.alignment.index
Some more text.
$ sed -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' url_test_input.txt > a
$ cat a
Some text ...
Some more text.
$ perl -i -pe 's/^'`echo "\012"`'${2,}//g' a
$ cat a
Some text ...
Some more text.
$
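An alternative that stays within sed is to delete the emptied lines with a d command, rather than trying to substitute the newline itself. The sketch below combines that with the URL removal; the input is a cut-down, made-up version of the test file above, and the POSIX class [^[:space:]] is used in place of \S.

```shell
# Cut-down, made-up version of url_test_input.txt
input=$(printf '%s\n' 'Some text ...' 'https://example.com/a?x=1' \
  'www.example.com/b' 'ftp://ftp.example.com/' 'Some more text.')

# Remove the URLs, then delete every line that is now empty or blank
result=$(printf '%s\n' "$input" | sed -E \
  -e 's!https?://[^[:space:]]*!!g' \
  -e 's!www\.[^[:space:]]*!!g' \
  -e 's!ftp:[^[:space:]]*!!g' \
  -e '/^[[:space:]]*$/d')
printf '%s\n' "$result"
```

The d command works on whole lines, so no branch label is needed for this particular clean-up.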
I have a patch file containing the output from git diff. I want to get a summary of all the files that, according to the patch file, have been added or modified. What command can I use to achieve this?
patchutils includes a lsdiff utility.
grep '+++' mydiff.patch seems to do the trick.
I can also use git diff --name-only which is probably the better approach.
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
Details:
git diff produces output in the format
+++ b/file
So if you're using grep as Nathan suggested
grep '+++' mydiff.patch
You'll have the list of affected files, prepended by '+++ ' (3 plus signs and a space).
I often need to further process files and find it convenient to have one filename per line without anything else. This can be achieved with the following command, where perl/regex removes these plus signs and the space.
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
For patch files generated with diff -Naur, the mydiff.patch file contains entries with filename and date (<tab> indicates the tab character):
+++ b/file<tab>2013-07-03 13:58:45.000000000 +0200
To extract the filenames for this, use
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ (.*)\t.*/\1/g'
A decent way to do this is to use the --stat flag (or the --summary flag, if you need only new / deleted / renamed files for some reason).
Example:
git apply --stat peer.diff | awk '{ print $1 }' | sed '$d'
1-js/03-code-quality/index.md
CONTR.md
LICENSE.md
README.md
chat-app.readme.md
When you parse patches generated by git format-patch or others containing additional information about number of lines edited, it's crucial to search for ^+++ (at the start of the line) rather than just +++.
For example:
grep '^+++' *.patch | sed -e 's#+++ [ab]/##'
will output paths without a/ or b/ at the beginning.
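The grep and sed steps can also be folded into a single sed call. The sketch below runs against a tiny hand-written patch (not real git output) stored in a variable; with a real file, sed -n 's#^+++ [ab]/##p' mydiff.patch would be the equivalent.

```shell
# Tiny hand-written patch used only for illustration
patch=$(printf '%s\n' 'diff --git a/src/main.c b/src/main.c' \
  '--- a/src/main.c' '+++ b/src/main.c' '@@ -1 +1 @@' '-old' '+new')

# -n plus the p flag prints only the "+++ " header lines, with the
# marker and the a/ or b/ prefix already stripped
result=$(printf '%s\n' "$patch" | sed -n 's#^+++ [ab]/##p')
printf '%s\n' "$result"
```

One caveat: an added content line that itself starts with "++ a/" or "++ b/" would also match, which is rare but possible.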
How can I remove comment lines (such as # bla bla) and empty lines (lines without characters) from a file with one sed command?
THX
lidia
If you're worried about starting two sed processes in a pipeline for performance reasons, you probably shouldn't be, it's still very efficient. But based on your comment that you want to do in-place editing, you can still do that with distinct commands (sed commands rather than invocations of sed itself).
You can either use multiple -e arguments or separate commands with a semicolon, something like (just one of these, not both):
sed -i -e 's/#.*$//' -e '/^$/d' fileName
sed -i 's/#.*$//;/^$/d' fileName
The following transcript shows this in action:
pax> printf 'Line # with a comment\n\n# Line with only a comment\n' >file
pax> cat file
Line # with a comment
# Line with only a comment
pax> cp file filex ; sed -i 's/#.*$//;/^$/d' filex ; cat filex
Line
pax> cp file filex ; sed -i -e 's/#.*$//' -e '/^$/d' filex ; cat filex
Line
Note how the file is modified in-place even with two -e options. You can see that both commands are executed on each line. The line with a comment first has the comment removed then all is removed because it's empty.
In addition, the original empty line is also removed.
@paxdiablo has a good answer but it can be improved.
(1) The '/^$/d' clause only matches 100% blank lines.
If you want to also match lines that are entirely whitespace (spaces, tabs etc.) use this instead:
'/^\s*$/d'
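(2) The 's/#.*$//' clause strips everything from the first # onward, so it only empties a line (and thereby lets the second clause delete it) when the # is in column 0; a comment preceded by whitespace leaves a whitespace-only line behind.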
If you want to also match lines that have only whitespace before the first # use this instead:
'/^\s*#.*$/d'
The above criteria may not be universal (e.g. within a HEREDOC block, or in a Python multi-line string the different approaches could be significant), but in many cases the conventional definition of "blank" lines include whitespace-only, and "comment" lines include whitespace-then-#.
(3) Lastly, on OSX at least, the @paxdiablo solution in which the first clause turns comment lines into blank lines, and the second clause strips blank lines (including what were originally comments) doesn't work. It seems to be more portable to make both clauses /d delete actions as I've done.
The revised command incorporating the above is:
sed -e '/^\s*#.*$/d' -e '/^\s*$/d' inputFile
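As far as I know \s is a GNU extension, so a sed without it may need the POSIX class [[:space:]] instead. Here is a sketch of the same two delete clauses written that way, run on a made-up config snippet:

```shell
# Made-up input: a setting, an empty line, a blank line, two comment
# lines, and a setting with a trailing comment
input=$(printf '%s\n' 'key = value' '' '   ' '# full-line comment' \
  '  # indented comment' 'other = 1 # trailing comment')

# First clause deletes whitespace-then-# lines, second deletes blank lines
result=$(printf '%s\n' "$input" |
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d')
printf '%s\n' "$result"
```

Trailing comments are left alone on purpose; only whole-line comments and blank lines are removed.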
This tiny jewel removes all # comments, no matter where they begin in a line (see caution below):
sed -e 's/\s*#.*$//'
Example:
text="
this is a # test
#this is a test
#this is a #test
this is # another #test
"
$ echo "$text" | sed -e 's/\s*#.*$//'
this is a
this is
Next this removes any resulting blank lines:
$ echo "$text" | sed -e 's/\s*#.*$//' | sed -e '/^\s*$/d'
Caution: Depending on the syntax and/or interpretation of the lines you're processing, this might not be an appropriate solution, as it blindly removes everything from the first '#' to the end of the line, even if the '#' is part of your data or code. However, for use cases where a hash only ever starts an end-of-line comment, it works fine. So just as with all coding, context must be taken into consideration.
Alternative variant, using grep:
cat file.txt | grep -Ev '(#.*$)|(^$)'
You can use awk to drop empty lines and lines whose first non-blank character is #:
awk 'NF && !/^[ \t]*#/' file
The first example (paxdiablo's) is very good, except that it does not change the file; it just outputs the result. If you want to change the file in place:
sudo sed -i 's/#.*$//;/^$/d' inputFile
On (one of) my linux boxes, sed understands extended regular expressions with the -r option, so:
sed -r '/(^\s*#)|(^\s*$)/d' squid.conf.installed
is very useful for showing all non-blank, non comment lines.
The regex matches either start of line followed by zero or more spaces or tabs followed by either a hash or end of line, and deletes those matching lines from the input.
I have been through the sed one-liners but am still having trouble with my goal. I want to substitute matching strings on all but the first occurrence in a line. My exact usage would be:
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | sed <<MAGIC HAPPENS>
cd /Users/joeuser/bump\ bonding/initial\ trials
The line replaced the space in bump bonding with a backslash followed by a space (bump\ bonding) so that I can execute this line (when the spaces aren't escaped I can't cd to it).
Update: I solved this by just using single quotes and outputting
cd 'blah blah/thing/another space/'
and then using source to execute the command. But it didn't answer my question. I'm still curious though... how would you use sed to fix it?
s/ /\\ /2g
The 2 specifies that the second one should apply, and the g specifies that all the rest should apply too. (This probably only works on GNU sed. According to the Open Group Base Specification, "If both g and n are specified, the results are unspecified.")
You can avoid the problem with g and n
Replace all of them, then undo the first one:
sed -e 's/ /\\ /g' -e 's/\\ / /1'
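Applied to the command from the question, this two-step approach can be checked with a sketch like this one (the variable names are only for illustration):

```shell
cmd='cd /Users/joeuser/bump bonding/initial trials'

# Step 1 escapes every space; step 2 turns the first "\ " back into " "
result=$(printf '%s\n' "$cmd" | sed -e 's/ /\\ /g' -e 's/\\ / /')
printf '%s\n' "$result"
```

It relies on the first space being the one after cd, so input that already contains an escaped space before that point would need different handling.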
Here's another method which uses the t branch-if-substituted command:
sed ':a;s/\([^ ]* .*[^\\]\) \(.*\)/\1\\ \2/;ta'
which has the advantage of leaving existing backslash-space sequences in the input intact.
Use awk:
$ echo cd 'blah blah/thing/another space/' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd blah\ blah/thing/another\ space/
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd /Users/joeuser/bump\ bonding/initial\ trials