Remove all strings of numbers of length 3 or 4 - SED - sed

I am trying to write an sed command that will mask all instances of length 3/4 numbers with '*' from a text file. I have this:
s/[0-9]\{4\}/\*\*\*\*/g
s/[0-9]\{3\}/\*\*\*/g
which works, but it also masks the first 3 or 4 digits of any longer run of digits.
Is there a way to mask only the runs that are exactly 3 or 4 digits long? The numbers could also be embedded in text.
I am new to sed and have tried to read the documentation but am struggling to just remove the ones I need.
Thanks in advance!

You didn't give any example input. Try this and see if it helps:
sed -r 's/\b[0-9]{3}\b/***/g;s/\b[0-9]{4}\b/****/g' file
example:
kent$ echo "111
1234567
111 1111 12ab 1111a
11 1111 1"|sed -r 's/\b[0-9]{3}\b/***/g;s/\b[0-9]{4}\b/****/g'
***
1234567
*** **** 12ab 1111a
11 **** 1
btw, do you really want to replace 3 numbers with 3 stars, 4 numbers with 4 stars?
EDIT:
sed -r 's/([^0-9]|\b)[0-9]{3}([^0-9]|\b)/\1***\2/g;s/(\b|[^0-9])[0-9]{4}(\b|[^0-9])/\1****\2/g' file
test (with OP's example)
kent$ echo "111
1234567
111 1111 12ab 1111a
11 1111 1
fhdjs777ssaa"|sed -r 's/([^0-9]|\b)[0-9]{3}([^0-9]|\b)/\1***\2/g;s/(\b|[^0-9])[0-9]{4}(\b|[^0-9])/\1****\2/g'
***
1234567
*** **** 12ab ****a
11 **** 1
fhdjs***ssaa
note, this line will replace foo123bar with foo***bar, but will leave foo123456bar the same.
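One thing to watch with the commands above: \b is a GNU extension, and because each match consumes its delimiter, two runs separated by a single letter (as in a111b222c) can leave the second run unmasked. Below is a minimal sketch that tries to sidestep both issues by looping each substitution until nothing more matches. It assumes a sed that accepts -E (recent GNU and BSD versions do); the function name mask_runs is made up for illustration.

```shell
# mask_runs: hypothetical helper, reads stdin and writes stdout.
# Each s command masks one run that is delimited by a non-digit or by the
# start/end of the line; the t command repeats it until no run is left.
# The 4-digit loop runs first, then the 3-digit loop.
mask_runs() {
    sed -E \
        -e ':four' \
        -e 's/(^|[^0-9])[0-9]{4}([^0-9]|$)/\1****\2/' \
        -e 't four' \
        -e ':three' \
        -e 's/(^|[^0-9])[0-9]{3}([^0-9]|$)/\1***\2/' \
        -e 't three'
}

printf '%s\n' 'a111b222c' 'foo123456bar' '111 1111 12ab 1111a' | mask_runs
```

Longer runs such as 123456 are left alone because the digit on either side prevents the delimiter from matching.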

Related

extract some values from file#1 and others from file#2 and print them into file#3

I hope you can help with this question, I have two files, each one has some lines that I need in a third file. But I need to take some entire lines (with values in 5 or 6 columns) from file#1 and others from file#2 and save them in file#3 (keeping the line number). Example:
File 1
1. mike
2. linda
3. matt
4. eric
5. emma
File 2
1. beth
2. shelly
3. michael
4. andy
5. theo
File 3 (output)
1. mike
2. shelly
3. matt
4. andy
5. emma
So, I need to extract the values of line 2 and 4 (from file#2) and print them in a third file while keeping the content of lines 1, 3 and 5 from file#1.
I tried this using sed (easy example):
sed -n -e 1,3p -e 5p file1.txt > file3.txt
This will take lines 1,3 and 5 from my file#1 and print them in file#3, but I don't know how to get the lines from file#2 (2 and 4) and add them into file#3.
Using grep to annotate with file names:
grep -H '.*' in1 in2 | sed '/in1:[24]/d;/in2:[135]/d;s/[^:]*://' | sort
Output:
1. mike
2. shelly
3. matt
4. andy
5. emma
sed probably isn't a very suitable tool for this. How about
paste in1 in2 | awk -F '\t' '{ print $(1+(1+NR)%2) }'
The Awk variable NR is the current input line number and the modulo operator NR%2 flip-flops between 1 and 0. We need to perform a couple of additions to get it to flip-flop between 1 and 2. Then it's easy to print alternating columns from the paste output.
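If the pipeline really has to end in sed, one possible sketch is to interleave the two files line by line and let sed keep the first and fourth line of every group of four. It assumes both files have the same number of lines, that odd lines come from the first file and even lines from the second, and that the files are named as arguments; pick_alternating is a made-up helper name.

```shell
# pick_alternating FILE1 FILE2: hypothetical helper.
# paste -d '\n' emits FILE1 line 1, FILE2 line 1, FILE1 line 2, FILE2 line 2, ...
# so each group of four interleaved lines holds:
#   FILE1 odd, FILE2 odd, FILE1 even, FILE2 even.
# In sed, the first p prints the FILE1 odd line, three n commands skip ahead
# to the FILE2 even line, and the final p prints it.
pick_alternating() {
    paste -d '\n' "$1" "$2" | sed -n 'p;n;n;n;p'
}
```

With the sample files above, pick_alternating in1 in2 > file3.txt should produce the five lines shown as the expected output.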

How to replace every 2nd tab character with a newline character using sed

given the input
123\t456\tabc\tdef
create the output
123\t456\nabc\tdef
which would display like
123 456
abc def
Note that it needs to work across multiple lines, not just two.
EDIT
a better example might help clarify.
input (there is only expected to be 1 line of input)
1\t2\t3\t4\t5\t6\t7\t8
expected output
1 2
3 4
5 6
7 8
...
With GNU sed:
sed 's/\t/\n/2;P;D;' file
Replaces second occurrence of tab character with newline character.
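The P and D commands are what make this work for every pair rather than just the first one. Here is the same command spread out with comments, as a sketch that assumes GNU sed (for \t and \n inside the s command); pair_fields is a made-up function name.

```shell
# pair_fields: hypothetical wrapper, reads stdin and writes stdout.
pair_fields() {
    sed '
        # Turn the 2nd tab in the pattern space into a newline.
        s/\t/\n/2
        # Print the pattern space up to that newline (the first pair).
        P
        # Delete what was printed and rerun the script on the remainder
        # without reading a new input line. When no newline is left, the
        # pattern space is discarded and the next input line is read.
        D
    '
}

printf '1\t2\t3\t4\t5\t6\t7\t8\n' | pair_fields
```

For the single-line input from the question this prints four lines, each holding two tab-separated fields.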
This little trick should work:
sed 's/\(\t[^\t]*\)\t/\1\n/g' < input_file.txt
EDIT:
Below is an example:
$ cat 1.txt
one two three four five six seven
five six seven
$ sed 's/\(\t[^\t]*\)\t/\1\n/g' < 1.txt
one two
three four
five six
seven
five six
seven
$
EDIT2:
For MacOS' standard sed try this:
$ sed $'s/\\(\t[^\t]*\\)\t/\\1\\\n/g' < 1.txt
The $'...' quoting makes bash expand escape sequences such as \t and \n before sed sees the script.
Suppose the following is the Input_file:
cat Input_file
123 456 abc def
Then the following puts the fields into 2 columns (note that xargs joins each pair with a space, not a tab).
xargs -n2 < Input_file
Output will be as follows.
123 456
abc def

Insert filename into text file with sed

I've been learning about sed and finding it very useful, but cannot find an answer to this in any of the many guides and examples ... I'd like to insert the filename of a text file, minus its path and extension, into a specific line within the text itself. Possible?
In such cases, the man pages are the right starting point. sed has no built-in notion of the current filename, but it does support inserting text before or after a line.
So you need to extract the filename separately, store it in a variable, and inject that text before or after the line you want.
Example:
$ a="/home/gv/Desktop/PythonTests/cpu.sh"
$ a="${a##*/}";echo "$a"
cpu.sh
$ a="${a%.*}"; echo "$a"
cpu
$ cat file1
LOCATION 0 X 0
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a $a" file1 # Inject the contents of variable $a after line2
LOCATION 0 X 0
VALUE 1a 2 3
cpu
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2i $a" file1 # Inject the contents of variable $a before line2
LOCATION 0 X 0
cpu
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a George" file1 #Inject a fixed string "George" after line 2
LOCATION 0 X 0
VALUE 1a 2 3
George
VALUE 1b 2 3
VALUE 1c 2 3
Explanation:
a="${a##*/}" : Removes everything from the beginning of the string up to and including the last slash / (longest match)
a="${a%.*}" : Removes everything from the last dot . to the end of the string (shortest match). Use %% instead to remove from the first dot (longest match).
sed "2a $a" : Inserts the contents of variable $a after line 2
sed "2i $a" : Inserts the contents of variable $a before line 2
Optionally you can use sed -i to make changes in-place / in file under process
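Putting the steps together, a minimal sketch might look like the following. The function name insert_basename and its arguments are made up for illustration, and it assumes the filename contains no backslashes or newlines. It uses the POSIX spelling of the i command (backslash, newline, then the text) rather than the GNU one-line form shown above.

```shell
# insert_basename PATH LINE: hypothetical helper.
# Prints PATH with its own name (directory and last extension removed)
# inserted before line number LINE.
insert_basename() {
    path=$1
    line=$2
    name=${path##*/}   # strip the directory part
    name=${name%.*}    # strip the last extension, if any
    # POSIX form of the i command: a backslash, a newline, then the text.
    sed "${line}i\\
$name" "$path"
}
```

For example, insert_basename /home/gv/Desktop/PythonTests/cpu.sh 2 would print that file with a line reading cpu inserted before its second line.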
wrt "I've been learning about sed": you may have been wasting your time, as there isn't a lot TO learn about sed beyond s/old/new. Sure, there's a ton of other language constructs and things you could do with sed, but in practice you should avoid them all and simply use awk instead. If you edit your question to include concise, testable sample input and expected output and add an awk tag, then we can show you how to do whatever you want to do the right way.
Meanwhile, it sounds like all you need is:
$ cat /usr/tmp/file
a
b
c
d
e
$ awk 'NR==3{print gensub(/.*\//,"",1,FILENAME)} 1' /usr/tmp/file
a
b
file
c
d
e
The above inserts the current file name before line 3 of the open file. It uses GNU awk for gensub(), with other awks you'd just use sub() and a variable.

removing just the middle of the file with sed

I want to use sed as part of a pipeline to keep just the first 10 and last 10 lines of its input. It will not be working on physical files, just reading from STDIN and writing to STDOUT. The amount of data in the stream is bigger than the machine's RAM (or its disk space), so it needs to be relatively efficient. It must also work in stream mode, without creating temporary files (no writeable filesystems).
Extra bonus if it could display one line instead of all of the middle it deleted:
for example, if I had input lines containing numbers from 1 to 100000, I would need it to output (line with literal <cut> text would be nice, but is optional):
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
The best I've come up with is to have it output the first 10 lines and the last 1 line with:
yes ' ' | head -n 100000 |nl | \
sed -e '$q;11,$d'
which outputs
1
2
3
4
5
6
7
8
9
10
100000
but I need it to output more context (10 lines instead of just 1) at the end of data too.
Update: length of the input stream is unknown and will vary, 100000 above is just an example.
Update: as noted in the question and the tag, I need it in sed, not awk, perl or other programming languages in which it is more easy to accomplish (that requirement, along with no tmp files, is due to fact it is embedded system with limited commands and resources available)
Update: if the input is shorter than those 10+10 lines, it should ideally just print the whole input
You can try following command:
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
sed has two buffers for holding content: the pattern space and the hold space. The first holds the line currently being processed, and the second can be used as a backup. The approach is to keep the last ten lines processed in the hold space.
H appends every line to the hold space, g copies the hold space back into the pattern space, the substitution then removes the oldest line, and h saves the result to the hold space again. On the last line ($), the script prints the buffer with your magic word in front of it.
The whole command:
yes ' ' | head -n 100000 |nl|\
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
Yields:
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
That said, follow the advice from Ed Morton, because awk is simpler and easier to debug or modify some weeks later.
UPDATE:
Alternatively, append to the hold space only after the first ten lines, and remove the oldest line only when the hold space contains more than 10 lines, treating it as a FIFO queue:
sed -n '1,10 { p; b }; H; g; /\(\n[^\n]\+\)\{11\}/ s/\n[^\n]*//; h; $ { s/^\n//; p }'
Now it's more challenging to know where to add the <cut> string in the edge case of 20 input lines, but I will leave it as an exercise for you.
sed is for simple substitutions on a single line, that is all. For anything else, including this task, you should be using awk:
$ cat tst.awk
BEGIN { beg=(beg?beg:3); end=(end?end:3) }
NR<=beg
{ rec[(NR-1)%end+1] = $0 }
END {
    print "<cut>"
    for (i=1;i<=end;i++) {
        print rec[(NR+i-1)%end+1]
    }
}
$ seq 10 | awk -f tst.awk
1
2
3
<cut>
8
9
10
$ seq 10 | awk -v beg=2 -v end=4 -f tst.awk
1
2
<cut>
7
8
9
10
I see you've added a "it has to be sed" requirement to your question but I'll leave this answer here for future readers looking for a sensible way to perform the task.
This might work for you (GNU sed):
sed '1,10b;:a;$!{N;s/\n/&/10;Ta;D};i\<cut>' file
Print the first 10 lines as normal. Collect the next 11 lines and if it is not the end of file, delete the first of them and repeat always maintaining the last 10 lines. At the end of the file, insert a line containing <cut> and print the remaining 10 lines.
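To see how this behaves on the edge cases raised in the question (unknown length, fewer than 10+10 lines), the one-liner can be wrapped and fed a few inputs of different sizes. This is only a sketch: keep_ends is a made-up name, the command needs GNU sed (for T and the one-line i), and the comments show what the command should print as far as I can trace it.

```shell
# keep_ends: hypothetical wrapper around the GNU sed one-liner above.
keep_ends() {
    sed '1,10b;:a;$!{N;s/\n/&/10;Ta;D};i\<cut>'
}

seq 25 | keep_ends   # long input: 1..10, <cut>, 16..25
seq 12 | keep_ends   # 1..10, <cut>, 11, 12
seq 5  | keep_ends   # short input: printed unchanged, no <cut>
```

Because at most 11 lines are held in the pattern space at any time, memory use stays constant regardless of the length of the stream.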

extract 10 digits from a string

The following command is working as expected and showing me the highlighted results where it finds 10 digit number.
# grep '[0-9]\{10\}' test.csv
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
What I need to do is to "extract" that digit to the beginning of the line. It should look something like this...
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
update:
If no 10 digit number is found then the row should be appended with some dummy data for e.g. 0000000000 (for consistency purpose)
One way using sed:
sed 's/.*\([0-9]\{10\}\).*/\1,&/' input
Gives:
0987654321,0987654321,Raka,Nr Man Informatics...
9702977479,Rajesh Patel,No 9999 Part Road To...
And this one will add 10 0's in case no 10 digit number is found:
sed 's/.*\([0-9]\{10\}\).*/\1,&/;/[0-9]\{10\}/!s/^/0000000000,/' input
Using GNU awk for \> word delimiter:
$ cat file
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
foo,bar
long,num,12345678901234
$ gawk -v OFS="," '{print (match($0,/[[:digit:]]{10}\>/) ? substr($0,RSTART,RLENGTH) : "0000000000"), $0 }' file
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
0000000000,foo,bar
5678901234,long,num,12345678901234
Better use sed:
sed 's/\(.*\([0-9]\{10\}\).*$\)/\2,\1/'
Now tested and working. Note that I have two sets of capturing groups - one around the entire expression (this is the first capturing group and is referred to as \1), and a second (inner) one that wraps around the ten digit number, referred to as \2.
If you want only the last ten digits of a "possibly longer than 10" number, you can do
sed 's/\(.*\([0-9]\{10\}\)[^0-9].*$\)/\2,\1/'
which makes sure that the next thing after the 10 digits is not a digit (and thus finds the last ten).
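One limitation of the last command is that it requires some non-digit character after the ten digits, so a number at the very end of a line is not picked up. A possible sketch that also accepts end of line, and adds the 0000000000 placeholder requested in the update, is shown below. The function name prefix_number is made up; because .* is greedy, a line with several candidates yields the last one.

```shell
# prefix_number: hypothetical helper, reads stdin and writes stdout.
# 1st s: capture ten digits that are followed either by a non-digit plus the
#        rest of the line, or directly by end of line, and put them in front.
# t:     if that substitution succeeded, skip the rest of the script.
# 2nd s: otherwise prefix the dummy value.
prefix_number() {
    sed -e 's/.*\([0-9]\{10\}\)\([^0-9].*\)\{0,1\}$/\1,&/' \
        -e 't' \
        -e 's/^/0000000000,/'
}

printf '%s\n' '0987654321,Raka' 'foo,bar' 'long,num,12345678901234' | prefix_number
```

For the three sample lines this should print 0987654321,0987654321,Raka, then 0000000000,foo,bar, then 5678901234,long,num,12345678901234.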