How to match the last part of a line conditionally? - perl

I am very new to Perl. Currently I am using a very simple Perl regex to print the last part of a line after the string "Lecture", reading from a file 1.txt.
cat 1.txt | perl -ne 'print "$1 \n" while /Lecture\s+(\d+\w)/g;'
It works well but I need to add a simple condition to it:
The first preference is always to print the characters after the string "Lecture".
If the string "Lecture" is not found in a line, simply print the characters at the very end of the line.
PS: The string "Lecture" might not have a space around it. I used a word character throughout because the value is not necessarily a plain number; it can be alphanumeric.
Example
cat 1.txt
Some Topic 1 Lecture 001
Some Topic 2 Lecture 002
Topic 3 ( classroom Session ) Lecture2B
Practicals 07A
Submissions 10
Topic5Lecture4
Expected output:
001
002
2B
07A
10
4
I preferably want a solution which I can directly run in the cli/console. ( Just Like my original code - cat 1.txt | perl code ).
I don't want to execute a separate .pl file.

This
(?:\w*Lecture)?([^\s]+)$
will capture ((...)) all (+) non-whitespace characters ([^\s]) at the end of the line ($),
optionally (?) preceded by a non-captured ((?:...)) "Lecture", even if there are other letters before it (\w*).
It gets the desired output:
001
002
2B
07A
10
4
4
For the sample input:
Some Topic 1 Lecture 001
Some Topic 2 Lecture 002
Topic 3 ( classroom Session ) Lecture2B
Practicals 07A
Submissions 10
Topic5 Lecture4
Topic5Lecture4

Related

Using sed to extract data from a file. I know the string I'm looking but I need to get the whole block of data that this string is in

I'm using sed to extract data from a file. Lots of same style data in there. I want every occurrence of a specific string occurs but the string is part of a block of information and I want to extract the whole block based of that string.
Example data in file:
123
AAA
ABC
ZZZ
123
KJG
HJY
ZZZ
123
LPC
ABC
TRY
ZZZ
In this example 123 is the start of the block of data I want and ZZZ the end. ABC is the string I search for. So from this example my output should be:
123
AAA
ABC
ZZZ
123
LPC
ABC
TRY
ZZZ
sed -n '/ABC/{:a;p;n;/123/b;ba};' testfile.txt > testfile2.txt
the output with this is
ABC
ZZZ
ABC
TRY
ZZZ
so I'm not getting the data before ABC in the block
This might work for you (GNU sed):
sed -n '/123/{:a;N;/ZZZ/!ba;/ABC/p}' file
Gather up lines between 123 and ZZZ and then print them if they contain ABC.
N.B. n replaces the pattern space with the next line (printing the current one first unless -n is in effect), so the earlier lines are lost. N instead appends the next line to the pattern space, separated by a newline, which keeps the whole block current and searchable.
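As a rough illustration, the same script can be split into -e fragments, which should also be accepted by sed implementations that do not allow a label followed by ";" on one line (that portability motive is an assumption; the logic is unchanged). The function name extract_blocks is made up for this sketch.

```shell
# extract_blocks is a hypothetical name; the markers 123, ZZZ and ABC come from the question.
extract_blocks() {
  # /123/    start gathering at a line matching 123
  # :a;N     append the next line to the pattern space
  # /ZZZ/!ba loop until the gathered text contains ZZZ
  # /ABC/p   print the gathered block only if it contains ABC
  sed -n -e '/123/{' -e ':a' -e 'N' -e '/ZZZ/!ba' -e '/ABC/p' -e '}' "$@"
}

# Usage with the sample data from the question:
printf '%s\n' 123 AAA ABC ZZZ 123 KJG HJY ZZZ 123 LPC ABC TRY ZZZ | extract_blocks
# prints the first and third blocks and skips the one without ABC
```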

Generate word from list of characters

I asked this question and realized I had asked it incorrectly, though the answer @Zdim provided is exactly what I asked for. So now I need to change that question a bit.
my $str = 'aaaa';
print $str++, $/ while $str le 'dddd';
So the above code does each combination from aaaa to dddd for instance:
aaaa
aaab
aaac
...
daaa
...
dddd
However, we need to generate all possible combinations of a given set of characters, whether they are numeric, special, or alphabetical. So if I tell the script that the minimum is 2 and the maximum is 4 characters, and I give an input string of:
abcdefG1234%##
it will then generate:
aa
aaa
aaaa
bb
aaab
bbbb
####
abc#
ab#1
...
So it should use each of the characters and create each possible combination from minimum 2 characters to maximum 4 characters.
So even if I give the entire alphanumeric and special characters, it will create each possible word or string within the range of 2 to 4 characters.
If we take this glob example, it is close, but it will only do all the sets of 4, not all combinations of 2, then 3, and then 4:
print "$_\n" while glob '{A,B,C,D,#,#,a,d,e,f}' x 4;
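use feature 'say';   # say is not enabled by default
for my $i (2..4) {
    # repeating the brace group $i times makes glob expand every string of length $i
    say while glob '{A,B,C,D,#,#,a,d,e,f}' x $i;
}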
One way to do this is a small extension of the linked question and answer. To generate, from a given string, the sequence of ASCII codes that will be sampled from:
perl -wE'say for map { ord($_) } split "", q(abcdefG1234%##)'
Now with that list on hand, run the code from the linked page for sequences of length 2 through 4.

How to replace every 2nd tab character with a newline character using sed

given the input
123\t456\tabc\tdef
create the output
123\t456\nabc\tdef
which would display like
123 456
abc def
Note that it needs to work across multiple lines, not just two.
EDIT
a better example might help clarify.
input (there is only expected to be 1 line of input)
1\t2\t3\t4\t5\t6\t7\t8
expected output
1 2
3 4
5 6
7 8
...
With GNU sed:
sed 's/\t/\n/2;P;D;' file
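s/\t/\n/2 replaces the second tab in the pattern space with a newline. P then prints up to that newline, and D deletes the printed part and restarts the cycle on the remainder, so every second tab ends up replaced.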
This little trick should work:
sed 's/\(\t[^\t]*\)\t/\1\n/g' < input_file.txt
EDIT:
Below is an example:
$ cat 1.txt
one two three four five six seven
five six seven
$ sed 's/\(\t[^\t]*\)\t/\1\n/g' < 1.txt
one two
three four
five six
seven
five six
seven
$
EDIT2:
For macOS' standard sed try this:
$ sed $'s/\\(\t[^\t]*\\)\t/\\1\\\n/g' < 1.txt
The $'...' quoting makes bash expand escape sequences such as \t and \n before sed sees them.
Let's say the following is the Input_file:
cat Input_file
123 456 abc def
Then the following may help to get them into 2 columns.
xargs -n2 < Input_file
Output will be as follows.
123 456
abc def

Insert filename into text file with sed

I've been learning about sed and finding it very useful, but cannot find an answer to this in any of the many guides and examples ... I'd like to insert the filename of a text file, minus its path and extension, into a specific line within the text itself. Possible?
In such cases, the correct starting point is the man pages. The sed manual does not describe a feature that lets sed understand "filename", but sed does support inserting text before or after a line.
As a result, you need to isolate the filename separately, store it in a variable, and inject that text after or before the line you wish.
Example:
$ a="/home/gv/Desktop/PythonTests/cpu.sh"
$ a="${a##*/}";echo "$a"
cpu.sh
$ a="${a%.*}"; echo "$a"
cpu
$ cat file1
LOCATION 0 X 0
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a $a" file1 # Inject the contents of variable $a after line2
LOCATION 0 X 0
VALUE 1a 2 3
cpu
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2i $a" file1 # Inject the contents of variable $a before line2
LOCATION 0 X 0
cpu
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a George" file1 #Inject a fixed string "George" after line 2
LOCATION 0 X 0
VALUE 1a 2 3
George
VALUE 1b 2 3
VALUE 1c 2 3
Explanation:
a="${a##*/}" : Removes everything from the beginning of the string up to the last slash / (longest match)
a="${a%.*}" : Removes everything from the last dot . to the end of the string (shortest match). Use %% instead to remove from the first dot (longest match).
sed "2a $a" : Inserts the contents of variable $a after line 2
sed "2i $a" : Inserts the contents of variable $a before line 2
Optionally, you can use sed -i to make the changes in place, in the file being processed.
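The steps above can be wrapped into one small function. This is only a sketch: insert_basename is a made-up name, and it uses the "a\" plus newline spelling of the append command, which should work in sed implementations that reject the one-line "2a text" form. It assumes the file name contains no backslashes or newlines.

```shell
# insert_basename FILE LINE: hypothetical helper, prints FILE with its own
# name (minus path and last extension) inserted after line LINE.
insert_basename() {
  path=$1
  line=$2
  name=${path##*/}   # drop the directory part
  name=${name%.*}    # drop the last extension
  # "a\" followed by a newline introduces the text to append.
  sed "${line}a\\
$name" "$path"
}

# Usage, with the path from the example:
# insert_basename /home/gv/Desktop/PythonTests/cpu.sh 2
```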
Regarding "I've been learning about sed": you may have been wasting your time, as there isn't a lot to learn about sed beyond s/old/new/. Sure, there are a ton of other language constructs and things you could do with sed, but in practice you should avoid them all and simply use awk instead. If you edit your question to include concise, testable sample input and expected output and add an awk tag, then we can show you how to do whatever you want to do the right way.
Meanwhile, it sounds like all you need is:
$ cat /usr/tmp/file
a
b
c
d
e
$ awk 'NR==3{print gensub(/.*\//,"",1,FILENAME)} 1' /usr/tmp/file
a
b
file
c
d
e
The above inserts the current file name before line 3 of the open file. It uses GNU awk for gensub(), with other awks you'd just use sub() and a variable.
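For awks without gensub(), the "sub() and a variable" variant mentioned above might look like the sketch below. It also strips the last extension, since the question asks for the name minus path and extension. insert_name_awk is a made-up name and the line number is passed in as a parameter.

```shell
# insert_name_awk FILE LINE: hypothetical helper, prints FILE with its own
# name (minus path and last extension) inserted before line LINE.
insert_name_awk() {
  awk -v n="$2" '
    NR == n {
      f = FILENAME
      sub(/.*\//, "", f)       # drop the directory part
      sub(/\.[^.]*$/, "", f)   # drop the last extension, if any
      print f
    }
    { print }
  ' "$1"
}

# Usage, mirroring the example above:
# insert_name_awk /usr/tmp/file 3
```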

extract 10 digits from a string

The following command is working as expected and showing me the highlighted results where it finds 10 digit number.
# grep '[0-9]\{10\}' test.csv
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
What I need to do is to "extract" that digit to the beginning of the line. It should look something like this...
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
update:
If no 10-digit number is found, the row should be prefixed with some dummy data, e.g. 0000000000 (for consistency).
One way using sed:
sed 's/.*\([0-9]\{10\}\).*/\1,&/' input
Gives:
0987654321,0987654321,Raka,Nr Man Informatics...
9702977479,Rajesh Patel,No 9999 Part Road To...
And this one will add 10 0's in case no 10 digit number is found:
sed 's/.*\([0-9]\{10\}\).*/\1,&/;/[0-9]\{10\}/!s/^/0000000000,/' input
Using GNU awk for \> word delimiter:
$ cat file
0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
foo,bar
long,num,12345678901234
$ gawk -v OFS="," '{print (match($0,/[[:digit:]]{10}\>/) ? substr($0,RSTART,RLENGTH) : "0000000000"), $0 }' file
0987654321,0987654321,Raka,Nr Man Informatics,Bm ,Bangalore,,26 - 12 - 2010
9702977479,Rajesh Patel,No 9999 Part Road Town Airlines Bangalore Cell-9702977479,Crv,Bangalore,560051,19 - 7 - 2013
0000000000,foo,bar
5678901234,long,num,12345678901234
Better use sed:
sed 's/\(.*\([0-9]\{10\}\).*$\)/\2,\1/'
Now tested and working. Note that I have two sets of capturing groups - one around the entire expression (this is the first capturing group and is referred to as \1), and a second (inner) one that wraps around the ten digit number, referred to as \2.
If you want only the last ten digits of a "possibly longer than 10" number, you can do
sed 's/\(.*\([0-9]\{10\}\)[^0-9].*$\)/\2,\1/'
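which makes sure that the next thing after the 10 digits is not a digit (and thus finds the last ten). Note that this requires some non-digit character after the number, so it will not match a number that sits at the very end of the line.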
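A possible way to combine the pieces above is sketched here: take the last ten digits of a run even when it ends the line, and fall back to the dummy value when nothing matches. The function name prefix_ten_digits is made up, and the optional third group is an addition of this sketch rather than part of the answers above.

```shell
# prefix_ten_digits: hypothetical helper that prepends the 10-digit number
# (or 0000000000 when none is found) to each line.
prefix_ten_digits() {
  # \2 is the ten digits; the optional \3 allows either a non-digit
  # continuation or the end of the line right after them.
  sed -e 's/\(.*\([0-9]\{10\}\)\([^0-9].*\)\{0,1\}\)$/\2,\1/' \
      -e '/[0-9]\{10\}/!s/^/0000000000,/' "$@"
}

# Usage:
printf '%s\n' 'long,num,12345678901234' 'foo,bar' | prefix_ten_digits
# prints:
# 5678901234,long,num,12345678901234
# 0000000000,foo,bar
```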