The following command works as expected and highlights the matches wherever it finds a 10-digit number:
# grep '[0-9]\{10\}' test.csv
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
What I need to do is copy that number to the beginning of the line, so the output looks something like this:
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
Update:
If no 10-digit number is found, the row should be prefixed with dummy data, e.g. 0000000000 (for consistency).
One way using sed:
sed 's/.*\([0-9]\{10\}\).*/\1,&/' input
Gives:
0987654321,0987654321,Raka,Nr Man Informatics...
9702977479,Rajesh Patel,No 9999 Part Road To...
And this version prepends ten 0s when no 10-digit number is found:
sed 's/.*\([0-9]\{10\}\).*/\1,&/;/[0-9]\{10\}/!s/^/0000000000,/' input
Using GNU awk for its \> end-of-word operator:
$ cat file
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
foo,bar
long,num,12345678901234
$ gawk -v OFS="," '{print (match($0,/[[:digit:]]{10}\>/) ? substr($0,RSTART,RLENGTH) : "0000000000"), $0 }' file
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
0000000000,foo,bar
5678901234,long,num,12345678901234
Better to use sed:
sed 's/\(.*\([0-9]\{10\}\).*$\)/\2,\1/'
Now tested and working. Note that I have two sets of capturing groups: one around the entire expression (the first capturing group, referred to as \1), and a second (inner) one that wraps the ten-digit number, referred to as \2.
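If you want only the last ten digits of a number that may be longer than 10 digits, you can do
sed 's/\(.*\([0-9]\{10\}\)[^0-9].*$\)/\2,\1/'
which makes sure that the next thing after the 10 digits is not a digit (and thus finds the last ten). Note, however, that the leading .* in the previous command is greedy, so it already picks the last ten consecutive digits on the line; this variant additionally fails to match when the number is at the very end of the line.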
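To check which ten digits the first sed command actually picks, here is a small sketch that wraps it in a shell function (the name prefix_last_ten is made up for illustration) and runs it on a line with a longer number and on a line where the number ends the line:

```shell
#!/bin/sh
# Hypothetical helper wrapping the first sed command above.
# \1 is the whole line; \2 is the ten-digit run. Because the leading .* is
# greedy, \2 ends up being the LAST ten consecutive digits on the line.
# Lines without ten consecutive digits are printed unchanged.
prefix_last_ten() {
  sed 's/\(.*\([0-9]\{10\}\).*$\)/\2,\1/' "$@"
}

printf '%s\n' 'long,num,12345678901234' 'Cell-9702977479' | prefix_last_ten
# 5678901234,long,num,12345678901234
# 9702977479,Cell-9702977479
```

The second line is the case the [^0-9] variant misses, because nothing follows the digits.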
I am very new to Perl. Currently I am using a very simple Perl regex to print the last part of a line after the string "Lecture", reading from a file 1.txt:
cat 1.txt | perl -ne 'print "$1 \n" while /Lecture\s+(\d+\w)/g;'
It works well but I need to add a simple condition to it:
The first preference is always to print the characters after the string "Lecture".
If the string "Lecture" is not found in a line, simply print the characters at the very end of the line.
PS: The string "Lecture" might not have a space around it, and throughout I used \w (word characters) because the value is not necessarily a plain number; it can be alphanumeric.
Example
cat 1.txt
Some Topic 1 Lecture 001
Some Topic 2 Lecture 002
Topic 3 ( classroom Session ) Lecture2B
Practicals 07A
Submissions 10
Topic5Lecture4
Expected output:
001
002
2B
07A
10
4
I would prefer a solution that I can run directly on the command line, just like my original code (cat 1.txt | perl '...').
I don't want to execute a separate .pl file.
This
(?:\w*Lecture)?([^\s]+)$
Will capture ((...)) all (+) non-whitespace characters ([^\s]) at the end of the line ($),
optionally (?) preceded by a non-captured ((?:...)) "Lecture", even if there are other letters before it (\w*).
It produces the desired output:
001
002
2B
07A
10
4
4
For the sample input:
Some Topic 1 Lecture 001
Some Topic 2 Lecture 002
Topic 3 ( classroom Session ) Lecture2B
Practicals 07A
Submissions 10
Topic5 Lecture4
Topic5Lecture4
I hope you can help with this question. I have two files, and each one has some lines that I need in a third file: I need to take some entire lines (with values in 5 or 6 columns) from file#1 and others from file#2 and save them in file#3, keeping their line numbers. Example:
File 1
1. mike
2. linda
3. matt
4. eric
5. emma
File 2
1. beth
2. shelly
3. michael
4. andy
5. theo
File 3 (output)
1. mike
2. shelly
3. matt
4. andy
5. emma
So, I need to extract lines 2 and 4 (from file#2) and print them in a third file while keeping the content of lines 1, 3 and 5 from file#1.
I tried this using sed (easy example):
sed -n -e 1,3p -e 5p file1.txt > file3.txt
This takes lines 1, 3 and 5 from file#1 and writes them to file#3, but I don't know how to get lines 2 and 4 from file#2 into file#3.
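Using grep to annotate each line with its file name (the sed patterns match the number at the start of each line's content, so this relies on the lines being numbered as in the example):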
grep -H '.*' in1 in2 | sed '/in1:[24]/d;/in2:[135]/d;s/[^:]*://' | sort
Output:
1. mike
2. shelly
3. matt
4. andy
5. emma
sed probably isn't a very suitable tool for this. How about
paste in1 in2 | awk -F '\t' '{ print $(1+(1+NR)%2) }'
The Awk variable NR is the current input line number and the modulo operator NR%2 flip-flops between 1 and 0. We need to perform a couple of additions to get it to flip-flop between 1 and 2. Then it's easy to print alternating columns from the paste output.
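If the lines to take from file#2 are not simply every other line, a common awk idiom is to load file#1 into an array and then choose by line number while reading file#2. A sketch, assuming both files have the same number of lines and file#1 is not empty (the NR == FNR test relies on that); merge_lines is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print lines 2 and 4 from the second file and every
# other line from the first file, keeping each line at its original position.
# Usage: merge_lines file1.txt file2.txt > file3.txt
merge_lines() {
  awk '
    NR == FNR { first[FNR] = $0; next }                  # reading file 1: remember each line
    { print ((FNR == 2 || FNR == 4) ? $0 : first[FNR]) } # reading file 2: choose per line
  ' "$1" "$2"
}
```

Whole lines are copied, so the 5 or 6 columns stay intact; changing the FNR test (for example to FNR % 2 == 0) changes which lines come from file#2.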
I have a simple playlist of song files:
1003 James Brown - The Boss Unknown Artist.mp3
1004 James Brown - Slaughters Theme Unknown Artist.mp3
1005 James Brown - Payback(1) Unknown Artist.mp3
...
I would like them in the following format:
1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
1004 James_Brown_-_Slaughters_Theme_Unknown_Artist.mp3
...
Notice that the space after the leading number is NOT replaced. I have the following simple sed script:
sed "s/ /_/g"
but that also replaces the space after the number. I know how to form capture groups, but that will not help either. How can I make sed apply the replacement to only a portion of the input string, rather than the whole string?
You could do
sed 's/ /_/g; s/_/ /'
I.e. first turn all spaces into underscores, then turn the first underscore back into a space.
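Another way to limit the replacement to part of the line is a sed loop that replaces one space at a time, and only spaces after the first one. This is a sketch of that alternative technique (underscore_after_first is a hypothetical name); the separate -e expressions keep the label and branch portable to BSD/macOS sed:

```shell
#!/bin/sh
# Hypothetical helper: keep the first space, turn every later space into "_".
# Each pass matches "<number> <non-spaces> " at the start of the line and
# turns that trailing space into "_". "ta" branches back to label "a" while
# a substitution succeeded, so the loop stops once only one space is left.
underscore_after_first() {
  sed -e ':a' -e 's/^\([^ ]* [^ ]*\) /\1_/' -e 'ta' "$@"
}

printf '%s\n' '1003 James Brown - The Boss Unknown Artist.mp3' | underscore_after_first
# 1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
```

Unlike s/ /_/g; s/_/ /, the loop does not touch an underscore that was already in the line before the first space.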
Given the input
123\t456\tabc\tdef
create the output
123\t456\nabc\tdef
which would display like
123 456
abc def
Note that it needs to work across multiple lines, not just two.
EDIT
A better example might help clarify.
Input (only one line of input is expected):
1\t2\t3\t4\t5\t6\t7\t8
Expected output:
1 2
3 4
5 6
7 8
...
With GNU sed:
sed 's/\t/\n/2;P;D;' file
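This replaces the second tab on the line with a newline. P then prints the pattern space up to that newline, and D deletes that part and restarts the cycle on the remainder without reading new input, so the substitution repeats until fewer than two tabs remain.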
This little trick should work:
sed 's/\(\t[^\t]*\)\t/\1\n/g' < input_file.txt
EDIT:
Below is an example:
$ cat 1.txt
one two three four five six seven
five six seven
$ sed 's/\(\t[^\t]*\)\t/\1\n/g' < 1.txt
one two
three four
five six
seven
five six
seven
$
EDIT2:
For the standard (BSD) sed on macOS, try this:
$ sed $'s/\\(\t[^\t]*\\)\t/\\1\\\n/g' < 1.txt
The $'...' quoting makes bash expand the escape sequences (\t, \n, \\) into literal characters before sed sees the script.
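Since the sed answers differ between GNU and BSD sed, a POSIX awk version may be easier to carry between systems. A sketch (pair_fields is a hypothetical name) that splits each line on tabs and prints the fields two per line, keeping the tab inside each pair:

```shell
#!/bin/sh
# Hypothetical helper: print the tab-separated fields of each input line two
# per output line, with a tab between the two fields of a pair.
# An odd field count leaves the last field on a line of its own.
pair_fields() {
  awk 'BEGIN { FS = "\t" }
       {
         for (i = 1; i <= NF; i += 2) {
           out = $i
           if (i < NF) out = out "\t" $(i + 1)   # append the partner field
           print out
         }
       }' "$@"
}

printf '1\t2\t3\t4\t5\t6\t7\t8\n' | pair_fields
```

It works line by line, so several input lines are each split independently.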
Let's say the following is Input_file:
cat Input_file
123 456 abc def
Then the following puts the values into 2 columns:
xargs -n2 < Input_file
The output will be as follows (xargs joins each pair with a single space, so the original tabs are not kept):
123 456
abc def
I am trying to write a sed command that will mask all 3- and 4-digit numbers in a text file with '*'. I have this:
s/[0-9]\{4\}/\*\*\*\*/g
s/[0-9]\{3\}/\*\*\*/g
which does that, but also masks the first 3 or 4 digits of any longer number.
Is there a way to mask only numbers of length 3 and 4? The numbers could also be part of text.
I am new to sed and have tried to read the documentation, but am struggling to mask only the ones I need.
Thanks in advance!
You didn't give any example input; try this and see if it helps:
sed -r 's/\b[0-9]{3}\b/***/g;s/\b[0-9]{4}\b/****/g' file
example:
kent$ echo "111
1234567
111 1111 12ab 1111a
11 1111 1"|sed -r 's/\b[0-9]{3}\b/***/g;s/\b[0-9]{4}\b/****/g'
***
1234567
*** **** 12ab 1111a
11 **** 1
By the way, do you really want to replace 3 digits with 3 stars and 4 digits with 4 stars?
EDIT:
sed -r 's/([^0-9]|\b)[0-9]{3}([^0-9]|\b)/\1***\2/g;s/(\b|[^0-9])[0-9]{4}(\b|[^0-9])/\1****\2/g' file
Test (with the OP's example):
kent$ echo "111
1234567
111 1111 12ab 1111a
11 1111 1
fhdjs777ssaa"|sed -r 's/([^0-9]|\b)[0-9]{3}([^0-9]|\b)/\1***\2/g;s/(\b|[^0-9])[0-9]{4}(\b|[^0-9])/\1****\2/g'
***
1234567
*** **** 12ab ****a
11 **** 1
fhdjs***ssaa
Note that this command replaces foo123bar with foo***bar but leaves foo123456bar unchanged.