extract 10 digits from a string - sed

The following command works as expected and shows me the highlighted results wherever it finds a 10-digit number.
# grep '[0-9]\{10\}' test.csv
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
What I need to do is "extract" that number to the beginning of the line. It should look something like this...
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
update:
If no 10-digit number is found, the row should be prefixed with some dummy data, e.g. 0000000000 (for consistency).

One way using sed:
sed 's/.*\([0-9]\{10\}\).*/\1,&/' input
Gives:
0987654321,0987654321,Raka,Nr Man Informatics...
9702977479,Rajesh Patel,No 9999 Part Road To...
And this one will prepend ten zeros in case no 10-digit number is found:
sed 's/.*\([0-9]\{10\}\).*/\1,&/;/[0-9]\{10\}/!s/^/0000000000,/' input
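To see both substitutions at work, here is a small sketch that wraps the command above in a shell function and feeds it one row with a number and one without. The function name `prepend_number` and the shortened sample rows are made up for illustration.

```shell
# Hypothetical helper name; the sed script is the one shown above.
prepend_number() {
  # 1st command: copy a 10-digit run to the front of the line (& is the whole line).
  # 2nd command: if the line still contains no 10-digit run, prefix ten zeros.
  sed 's/.*\([0-9]\{10\}\).*/\1,&/;/[0-9]\{10\}/!s/^/0000000000,/'
}

printf '%s\n' 'Rajesh Patel,Cell-9702977479,Crv' 'foo,bar' | prepend_number
# 9702977479,Rajesh Patel,Cell-9702977479,Crv
# 0000000000,foo,bar
```

Because the leading `.*` is greedy, a run longer than ten digits contributes its last ten digits.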

Using GNU awk for \> word delimiter:
$ cat file
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
foo,bar
long,num,12345678901234
$ gawk -v OFS="," '{print (match($0,/[[:digit:]]{10}\>/) ? substr($0,RSTART,RLENGTH) : "0000000000"), $0 }' file
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
0000000000,foo,bar
5678901234,long,num,12345678901234

Better use sed:
sed 's/\(.*\([0-9]\{10\}\).*$\)/\2,\1/'
Now tested and working. Note that I have two sets of capturing groups - one around the entire expression (this is the first capturing group and is referred to as \1), and a second (inner) one that wraps around the ten digit number, referred to as \2.
If you want only the last ten digits of a number that may be longer than ten digits, you can do
sed 's/\(.*\([0-9]\{10\}\)[^0-9].*$\)/\2,\1/'
which makes sure that the next thing after the 10 digits is not a digit (and thus finds the last ten).
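A quick comparison of the two commands, as a sketch with made-up function names (`last_ten`, `last_ten_strict`) and sample rows. One thing it shows: the second form needs at least one non-digit character after the number, so it leaves a line unchanged when the number sits at the very end, while the first form already picks the last ten digits because `.*` is greedy.

```shell
# Hypothetical wrappers around the two sed commands from this answer.
last_ten()        { sed 's/\(.*\([0-9]\{10\}\).*$\)/\2,\1/'; }
last_ten_strict() { sed 's/\(.*\([0-9]\{10\}\)[^0-9].*$\)/\2,\1/'; }

printf 'long,num,12345678901234\n'   | last_ten
# 5678901234,long,num,12345678901234
printf 'long,num,12345678901234\n'   | last_ten_strict
# long,num,12345678901234   (no match: nothing follows the digits)
printf 'long,num,12345678901234,x\n' | last_ten_strict
# 5678901234,long,num,12345678901234,x
```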

How to match the last part of a line conditionally?

I am very new to Perl. Currently I am using a very simple Perl regex to print the last part of a line after the string "Lecture", reading from a file 1.txt.
cat 1.txt | perl -ne 'print "$1 \n" while /Lecture\s+(\d+\w)/g;'
It works well but I need to add a simple condition to it:
First preference is always to print the characters after the string "Lecture".
If the string "Lecture" is not found in a line, simply print the characters at the very end of the line.
PS: The string "Lecture" might not have a space around it, and I used a word character throughout because the value is not necessarily a plain number; it can be alphanumeric.
Example
cat 1.txt
Some Topic 1 Lecture 001
Some Topic 2 Lecture 002
Topic 3 ( classroom Session ) Lecture2B
Practicals 07A
Submissions 10
Topic5Lecture4
Expected output:
001
002
2B
07A
10
4
I preferably want a solution which I can directly run in the cli/console. ( Just Like my original code - cat 1.txt | perl code ).
I don't want to execute a separate .pl file.
This regex
(?:\w*Lecture)?([^\s]+)$
will capture ((...)) all (+) non-whitespace characters ([^\s]) at the end of the line ($),
optionally (?) preceded by a non-captured ((?:...)) "Lecture", even if there are other word characters before it (\w*).
It gets the desired output:
001
002
2B
07A
10
4
4
For the sample input:
Some Topic 1 Lecture 001
Some Topic 2 Lecture 002
Topic 3 ( classroom Session ) Lecture2B
Practicals 07A
Submissions 10
Topic5 Lecture4
Topic5Lecture4

extract some values from file#1 and others from file#2 and print them into file#3

I have two files, and each one has some lines that I need in a third file. I need to take some entire lines (with values in 5 or 6 columns) from file#1 and others from file#2 and save them in file#3, keeping the line numbers. Example:
File 1
1. mike
2. linda
3. matt
4. eric
5. emma
File 2
1. beth
2. shelly
3. michael
4. andy
5. theo
File 3 (output)
1. mike
2. shelly
3. matt
4. andy
5. emma
So, I need to extract the values of line 2 and 4 (from file#2) and print them in a third file while keeping the content of lines 1, 3 and 5 from file#1.
I tried this using sed (easy example):
sed -n -e 1,3p -e 5p file1.txt > file3.txt
This will take lines 1,3 and 5 from my file#1 and print them in file#3, but I don't know how to get the lines from file#2 (2 and 4) and add them into file#3.
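Using grep -H to prefix every line with its file name, then dropping the unwanted lines (this relies on each line starting with its number, as in the example):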
grep -H '.*' in1 in2 | sed '/in1:[24]/d;/in2:[135]/d;s/[^:]*://' | sort
Output:
1. mike
2. shelly
3. matt
4. andy
5. emma
sed probably isn't a very suitable tool for this. How about
paste in1 in2 | awk -F '\t' '{ print $(1+(1+NR)%2) }'
The Awk variable NR is the current input line number and the modulo operator NR%2 flip-flops between 1 and 0. We need to perform a couple of additions to get it to flip-flop between 1 and 2. Then it's easy to print alternating columns from the paste output.
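The flip-flop can be checked on a small sample. This is only a sketch: the function name `interleave_lines` and the temporary files are made up for illustration.

```shell
# Hypothetical helper; takes the two input files as arguments.
interleave_lines() {
  # NR=1 -> $(1+0), column 1 (first file); NR=2 -> $(1+1), column 2 (second file); ...
  paste "$1" "$2" | awk -F '\t' '{ print $(1+(1+NR)%2) }'
}

tmp=$(mktemp -d)
printf '%s\n' '1. mike' '2. linda'  '3. matt'    > "$tmp/in1"
printf '%s\n' '1. beth' '2. shelly' '3. michael' > "$tmp/in2"
interleave_lines "$tmp/in1" "$tmp/in2"
# 1. mike
# 2. shelly
# 3. matt
rm -r "$tmp"
```

Redirecting the output (`> file3.txt`) gives the third file. Note that this assumes the lines themselves contain no tab characters, since tab is the separator paste inserts.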

sed: replace only in part of string

I have a simple playlist of song files:
1003 James Brown - The Boss Unknown Artist.mp3
1004 James Brown - Slaughters Theme Unknown Artist.mp3
1005 James Brown - Payback(1) Unknown Artist.mp3
...
I would like them in the following format:
1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
1004 James_Brown_-_Slaughters_Theme_Unknown_Artist.mp3
...
Notice that the whitespace behind the number in front is NOT replaced. I have the following simple sed script:
sed "s/ /_/g"
but that also replaces the space after the number. I know how to form capture groups, but that will not help either. How can I convince sed to only apply the replacement to a portion of the input string, rather than the whole string?
You could do
sed 's/ /_/g; s/_/ /'
I.e. first turn all spaces into underscores, then turn the first underscore back into a space.
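Applied to one line of the playlist, as a sketch (the function name `underscore_after_first_space` is made up for illustration):

```shell
# Hypothetical wrapper around the two-step sed script above.
underscore_after_first_space() {
  # Step 1: every space becomes an underscore.
  # Step 2: only the first underscore (the one after the number) becomes a space again.
  sed 's/ /_/g; s/_/ /'
}

printf '%s\n' '1003 James Brown - The Boss Unknown Artist.mp3' | underscore_after_first_space
# 1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
```

This works here because the number at the start contains no underscore of its own, so the first underscore on the line is always the one that replaced the first space.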

How to replace every 2nd tab character with a newline character using sed

given the input
123\t456\tabc\tdef
create the output
123\t456\nabc\tdef
which would display like
123 456
abc def
Note that it needs to work across multiple lines, not just two.
EDIT
a better example might help clarify.
input (there is only expected to be 1 line of input)
1\t2\t3\t4\t5\t6\t7\t8
expected output
1 2
3 4
5 6
7 8
...
With GNU sed:
sed 's/\t/\n/2;P;D;' file
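This replaces the second tab in the pattern space with a newline. P then prints everything up to that newline, and D deletes the printed part and restarts the cycle on the remainder, so the same step repeats for every following pair of fields.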
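A sketch of that command on the eight-field example from the question. The function name `pair_fields` is made up, and GNU sed is assumed for the `\t` and `\n` escapes.

```shell
# Hypothetical wrapper; assumes GNU sed (\t and \n escapes in the script).
pair_fields() {
  # s///2 : turn the 2nd tab into a newline
  # P     : print up to the first newline
  # D     : delete what was printed and restart on the rest without reading input
  sed 's/\t/\n/2;P;D'
}

printf '1\t2\t3\t4\t5\t6\t7\t8\n' | pair_fields
# Prints four lines, each holding two tab-separated fields:
# 1 2 / 3 4 / 5 6 / 7 8
```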
This little trick should work:
sed 's/\(\t[^\t]*\)\t/\1\n/g' < input_file.txt
EDIT:
Below is an example:
$ cat 1.txt
one two three four five six seven
five six seven
$ sed 's/\(\t[^\t]*\)\t/\1\n/g' < 1.txt
one two
three four
five six
seven
five six
seven
$
EDIT2:
For MacOS' standard sed try this:
$ sed $'s/\\(\t[^\t]*\\)\t/\\1\\\n/g' < 1.txt
The $'...' form is bash's ANSI-C quoting: the shell turns escape sequences such as \t and \n into the real characters before sed sees the script.
Let's say the following is the Input_file:
cat Input_file
123 456 abc def
Then the following may help to get them into 2 columns:
xargs -n2 < Input_file
Output will be as follows.
123 456
abc def

Remove all strings of numbers of length 3 or 4 - SED

I am trying to write an sed command that will mask all instances of length 3/4 numbers with '*' from a text file. I have this:
s/[0-9]\{4\}/\*\*\*\*/g
s/[0-9]\{3\}/\*\*\*/g
which will do it, but it also masks the first 3/4 characters of any longer numerical string.
Is there a way to mask only the strings of length 3 and 4? The numbers could also be part of text.
I am new to sed and have tried to read the documentation but am struggling to just remove the ones I need.
Thanks in advance!
You didn't give any input example; try this and see if it helps:
sed -r 's/\b[0-9]{3}\b/***/g;s/\b[0-9]{4}\b/****/g' file
example:
kent$ echo "111
1234567
111 1111 12ab 1111a
11 1111 1"|sed -r 's/\b[0-9]{3}\b/***/g;s/\b[0-9]{4}\b/****/g'
***
1234567
*** **** 12ab 1111a
11 **** 1
btw, do you really want to replace 3 numbers with 3 stars, 4 numbers with 4 stars?
EDIT:
sed -r 's/([^0-9]|\b)[0-9]{3}([^0-9]|\b)/\1***\2/g;s/(\b|[^0-9])[0-9]{4}(\b|[^0-9])/\1****\2/g' file
test (with OP's example)
kent$ echo "111
1234567
111 1111 12ab 1111a
11 1111 1
fhdjs777ssaa"|sed -r 's/([^0-9]|\b)[0-9]{3}([^0-9]|\b)/\1***\2/g;s/(\b|[^0-9])[0-9]{4}(\b|[^0-9])/\1****\2/g'
***
1234567
*** **** 12ab ****a
11 **** 1
fhdjs***ssaa
note, this line will replace foo123bar with foo***bar, but will leave foo123456bar the same.
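The cases from that note can be checked with a short sketch. The function name `mask_3_or_4` and the third sample line are made up; GNU sed is assumed for `-r` and `\b`.

```shell
# Hypothetical wrapper around the sed script from the EDIT above (GNU sed assumed).
mask_3_or_4() {
  sed -r 's/([^0-9]|\b)[0-9]{3}([^0-9]|\b)/\1***\2/g;s/(\b|[^0-9])[0-9]{4}(\b|[^0-9])/\1****\2/g'
}

printf '%s\n' 'foo123bar' 'foo123456bar' 'pin 1234 ok' | mask_3_or_4
# foo***bar
# foo123456bar
# pin **** ok
```

The surrounding groups are put back with \1 and \2, so the non-digit characters next to the number are not lost.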