How would I delete all lines (from a text file) that contain exactly two dots, with arbitrary data between the dots? Some lines have three or more dots, and I need those to remain in the file. I would like to use sed.
Example Dirty File:
.dirty.dig
.please.dont.delete.me
.delete.me
.dont.delete.me.ether
.nnoooo.not.meee
.needto.delete
Desired Output:
.please.dont.delete.me
.dont.delete.me.ether
.nnoooo.not.meee
It would be simpler to use awk here:
$ awk -F. 'NF!=3' ip.txt
.please.dont.delete.me
.dont.delete.me.ether
.nnoooo.not.meee
-F. use . as delimiter
NF!=3 print all lines where number of input fields is not equal to 3
this will retain lines like abc.xyz
to retain only lines with more than 2 dots, use awk -F. 'NF>3' ip.txt
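To see how the field count maps to the dot count: with -F. a line containing N dots is split into N+1 fields, so two dots means three fields. A quick check with a line from the example:
$ echo '.delete.me' | awk -F. '{print NF}'
3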
sed '/^[^.]*\.[^.]*\.[^.]*$/d' file
Output:
.please.dont.delete.me
.dont.delete.me.ether
.nnoooo.not.meee
See: The Stack Overflow Regular Expressions FAQ
sed is for making substitutions; to just Globally search for a Regular Expression and Print the result, there's a whole other tool designed for that purpose and even named after it: grep.
grep -v '^[^.]*\.[^.]*\.[^.]*$' file
or with grep -E for EREs:
$ grep -Ev '^[^.]*(\.[^.]*){2}$' file
.please.dont.delete.me
.dont.delete.me.ether
.nnoooo.not.meee
I don't have a sed handy to double-check, but
/^[^.]*\.[^.]*\.[^.]*$/d
should match and delete all lines that have exactly two dots, with non-dot strings before, between, and after them.
This might work for you (GNU sed):
sed 's/\./&/3;t;s//&/2;T;d' file
If there are 3 or more dots, print. If there are exactly 2 dots, delete. Otherwise print.
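A commented restatement of the same script (comments are mine; GNU sed, since T is a GNU extension):
sed '
# try to replace the 3rd dot with itself: succeeds only if the line has 3+ dots
s/\./&/3
# on success, branch to end of script; the line is auto-printed
t
# the empty regex // reuses \. : succeeds only if the line has 2+ dots
s//&/2
# on failure (fewer than 2 dots), branch to end; the line is printed
T
# reached only when the line has exactly 2 dots: delete it
d' file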
Another way:
sed -r '/^([^.]*\.[^.]*){2}$/d' file
Related
I am trying to create a filter command to reduce the lines in a log file; assume each line contains a partition path made of a date:
/iamthepath01/20200301/file01.txt
/iamthepath02/20200302/file02.txt
....
/iamthepathxx/20210619/filexx.txt
Then, from thousands of lines, I only want to keep the ones with either of two strings in the path:
/202106
/202105
and remove all other lines.
I have tried the following command:
sed -i -e '\(/202105\|/202106\)!d' ~/log.txt
The above command threw:
sed: -e expression #1, char 24: unterminated address regex
You can use
sed -i '/\/20210[56]/!d' ~/log.txt
Or, if you need to use more specific alternatives and further enhance the pattern:
sed -i -E '/\/(202105|202106)/!d' ~/log.txt
Details:
-i - GNU sed option for in-place file editing
-E - option enabling POSIX ERE regex syntax
/\/20210[56]/ - regex that matches /20210 and then either 5 or 6
\/(202105|202106) - the POSIX ERE pattern that matches / and then either 202105 or 202106
!d - removes the lines not matching the pattern.
Demo:
#!/bin/bash
s='/iamthepath01/20200301/file01.txt
/iamthepath02/20200302/file02.txt
/iamthepathxx/20210619/filexx.txt'
sed '/\/20210[56]/!d' <<< "$s"
Output:
/iamthepathxx/20210619/filexx.txt
sed is the wrong tool for this. If you want a script that's as fragile as the sed one, then use grep, as it's the tool that exists solely to do a simple g/re/p (hence the name), like you're doing:
$ grep '/20210[56]' file
/iamthepathxx/20210619/filexx.txt
or if you want a more robust solution that focuses just on the part of the line you want to match and so will avoid false matches, then use awk:
$ awk -F '/' '$3 ~ /^20210[56]/' file
/iamthepathxx/20210619/filexx.txt
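Why $3? With / as the field separator, the leading / makes $1 an empty field, so the date component is the third field:
$ echo '/iamthepathxx/20210619/filexx.txt' | awk -F'/' '{print $3}'
20210619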
This might work for you (GNU sed):
sed -ni '\#/20210[56]#p' file
This uses sed's grep-like -n option to turn off implicit printing and the -i option to edit the file in place.
Normally sed uses /.../ to match, but another delimiter may be used if the first occurrence is escaped, e.g. \#...#.
So the above solution will filter the existing file down to lines that contain either /202105 or /202106.
N.B. grep will almost certainly be faster at finding the above lines; however, the use of the -i option may be the ultimate reason for choosing sed (although the same outcome can be achieved by tacking > tmpFile && mv tmpFile file onto a grep solution).
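For example, the grep-based in-place workaround mentioned above would look like this:
grep '/20210[56]' file > tmpFile && mv tmpFile file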
Say I have a script where I want to replace several lines with a single line.
For example, I have a new function foo() that encapsulates several commands, so I can make the following replacement in my script:
Original
some_code
command1
command2
command3
some_more_code
Edited
some_code
foo()
some_more_code
How would you do that using sed?
sed '/some_code/,/command3/ !b
/some_code/ b
/command3/ a\
foo()
d' YourFile
Be careful about metacharacters (like &\\^$[]{}().) in any of the patterns (except your foo() line).
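Here is the same script with full-line comments (mine) spelling out what each address does:
sed '
# lines outside the some_code..command3 block: branch to end, printing them unchanged
/some_code/,/command3/!b
# the opening some_code line itself: keep it too
/some_code/b
# on the closing command3 line, queue foo() for output
/command3/a\
foo()
# delete every remaining line in the block (the queued foo() still prints)
d' YourFile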
I am answering my own question here.
I couldn't figure out a way to do it in one go, so I split the problem into two parts.
Part 1: replace the first line
sed -e 's/command1/foo()/g' file1 > file2
Part 2: remove the rest of the lines
sed -e '/command2/,+1d' file2 > file3
I'd prefer a more elegant way though, where I can be flexible in the number of lines that I am replacing, possibly matching the last command in the block. Any ideas?
Just use awk:
$ awk -v RS='^$' -v ORS= '{sub(/command1\ncommand2\ncommand3/,"foo()")}1' file
some_code
foo()
some_more_code
The above uses GNU awk for multi-char RS.
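The trick is that RS='^$' is a regex that can never match, so GNU awk reads the whole file as a single record and sub() can match across the original line boundaries. A minimal check (sample input invented):
$ printf 'a\nb\nc\n' | awk -v RS='^$' -v ORS= '{sub(/b\nc/,"X")}1'
a
X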
This might work for you (GNU sed):
sed '/command1/,/command3/c\foo()' file
I have found many ways to do this (AWK, SED, UNIQ), but none of them work on my file.
I want to delete duplicate lines. Here is an example of part of my file:
KTBX
KFSO
KCLK
KTBX
KFSO
KCLK
PAJZ
PAJZ
NOTE: I had to manually add line feeds when I cut and pasted from the file...for some reason it was putting all the variables on one line. Makes me think that my 44,000 line text file actually has only "1" line? Is there a way to modify it so I can delete dups?
You can see all non-printed characters with this command:
od -c oldfile
If all your records are on one line, you can use sed to replace each run of whitespace (space, tab, newline) with a line break:
sed -e 's/\s\+/\n/g' oldfile > oldfile.1
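Note that \s and \n in the replacement are GNU sed features; if your sed lacks them (BSD/macOS, for instance), tr is a portable way to do the same split:
tr -s ' \t' '\n' < oldfile > oldfile.1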
Once you have multiple lines, this awk one-liner:
awk '!x[$0]++' oldfile.1 > newfile
My output file:
KTBX
KFSO
KCLK
PAJZ
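For reference, the !x[$0]++ idiom prints a line only the first time it is seen: x[$0] starts at 0 (false), so the negated test is true exactly once per distinct line, and the post-increment marks the line as seen. A quick check with a repeated record:
$ printf 'KTBX\nKFSO\nKTBX\n' | awk '!x[$0]++'
KTBX
KFSO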
Perl One-Liner:
perl -nle 'unless($hash{$_}++){print $_}' file
How do I remove comment lines (such as # bla bla) and empty lines (lines without characters) from a file with one sed command?
Thanks,
lidia
If you're worried about starting two sed processes in a pipeline for performance reasons, you probably shouldn't be; it's still very efficient. But based on your comment that you want to do in-place editing, you can still do that with distinct commands (sed commands rather than invocations of sed itself).
You can either use multiple -e arguments or separate commands with a semicolon, something like (just one of these, not both):
sed -i -e 's/#.*$//' -e '/^$/d' fileName
sed -i 's/#.*$//;/^$/d' fileName
The following transcript shows this in action:
pax> printf 'Line # with a comment\n\n# Line with only a comment\n' >file
pax> cat file
Line # with a comment
# Line with only a comment
pax> cp file filex ; sed -i 's/#.*$//;/^$/d' filex ; cat filex
Line
pax> cp file filex ; sed -i -e 's/#.*$//' -e '/^$/d' filex ; cat filex
Line
Note how the file is modified in place even with two -e options. You can see that both commands are executed on each line: the line containing only a comment first has its comment removed and is then deleted because it is now empty.
In addition, the original empty line is also removed.
#paxdiablo has a good answer but it can be improved.
(1) The '/^$/d' clause only matches 100% blank lines.
If you want to also match lines that are entirely whitespace (spaces, tabs etc.) use this instead:
'/^\s*$/d'
(2) The 's/#.*$//' clause strips a comment wherever it starts, but it turns whole-line comments into blank lines rather than deleting them directly.
If you want to delete lines that have only whitespace before the first # in a single step, use this instead:
'/^\s*#.*$/d'
The above criteria may not be universal (e.g., within a heredoc block or a Python multi-line string, the difference between the approaches could be significant), but in many cases the conventional definition of "blank" lines includes whitespace-only lines, and "comment" lines include whitespace-then-#.
(3) Lastly, on OSX at least, the #paxdiablo solution, in which the first clause turns comment lines into blank lines and the second clause strips blank lines (including what were originally comments), doesn't work. It seems more portable to make both clauses /d delete actions, as I've done.
The revised command incorporating the above is:
sed -e '/^\s*#.*$/d' -e '/^\s*$/d' inputFile
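If you also need this to run on BSD/macOS sed, which doesn't understand \s, the POSIX character class is a portable equivalent:
sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' inputFile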
This tiny jewel removes all # comments, no matter where they begin in a line (see caution below):
sed -e 's/\s*#.*$//'
Example:
text="
this is a # test
#this is a test
#this is a #test
this is # another #test
"
$ echo "$text" | sed -e 's/\s*#.*$//'
this is a
this is
Next this removes any resulting blank lines:
$ echo "$text" | sed -e 's/\s*#.*$//' | sed -e '/^\s*$/d'
Caution: depending on the syntax and/or interpretation of the lines you're processing, this might not be an appropriate solution, as it blindly removes the ends of lines, even if the '#' is part of your data or code. However, for use cases where you'll never use a hash except as an end-of-line comment, it works fine. So, as with all coding, context must be taken into consideration.
Alternative variant, using grep:
cat file.txt | grep -Ev '(#.*$)|(^$)'
You can use awk:
awk 'NF{gsub(/^[ \t]*#/,"");print}' file
The first example (paxdiablo's) is very good, except that it doesn't change the file; it just outputs the result. If you want to change the file in place:
sudo sed -i 's/#.*$//;/^$/d' inputFile
On (one of) my linux boxes, sed understands extended regular expressions with the -r option, so:
sed -r '/(^\s*#)|(^\s*$)/d' squid.conf.installed
is very useful for showing all non-blank, non-comment lines.
The regex matches the start of a line followed by zero or more spaces or tabs, followed by either a hash or the end of the line; the d command deletes the matching lines from the input.
Can anyone explain how to use sed to delete all characters up to & including the 2nd comma on a line in a CSV file?
The beginning of a typical line might look like
1234567890,ABC/DEF, and the number of digits in the first column varies, i.e. there might be 9, 10, or 11 separate digits in random order; the letters in the second column could also be random. This randomness and varying length make it impossible to search for any explicit pattern.
You could do it with sed like this:
sed -e 's/^\([^,]*,\)\{2\}//'
I'm not 100% sure of the syntax, but I tried it and it seems to work. It deletes zero or more of anything-but-a-comma followed by a comma, with all of that matched twice in succession.
But even easier would be to use cut, like this
cut -d, -f3-
which will use comma as a delimiter, and print fields 3 and up.
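A quick check of both approaches (the fields after the second comma are invented for illustration):
$ echo '1234567890,ABC/DEF,2009-06-19,1.2345' | sed -e 's/^\([^,]*,\)\{2\}//'
2009-06-19,1.2345
$ echo '1234567890,ABC/DEF,2009-06-19,1.2345' | cut -d, -f3-
2009-06-19,1.2345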
EDIT:
Just for the record, both sed and cut can work with a file as a parameter, just append it at the end like so
cut -d, -f3- myfile.txt
or you can pipe the output of your program through them
./myprogram | cut -d, -f3-
sed is not the "right" choice of tool here (although it can be done). Since you have structured data, you can use the fields/delimiter method instead of creating a complicated regex.
You can use cut:
$ cut -f3- -d"," file
or gawk:
$ gawk -F"," '{$1=$2=""}1' file
$ gawk -F"," '{for(i=3;i<NF;i++) printf "%s,",$i; print $NF}' file
Thanks for all the replies. With the help provided, I have written the simple executable script below, which does what I want.
#!/bin/bash
# Drop the first two CSV columns, then reshape what remains:
# delete the first line (presumably a header), turn the "-", " " and ":"
# separators into commas, and strip the ",D" markers.
cut -d, -f3- ~/Documents/forex_convert/input.csv |
sed -e '1d' \
    -e 's/-/,/g' \
    -e 's/ /,/g' \
    -e 's/:/,/g' \
    -e 's/,D//g' > ~/Documents/forex_convert/converted_input
exit