I have a line which can be a single word or sentence. What is the command line to check whether it is a single word or sentence ?
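Assuming your input is in $line, check it as shown below. Note that chomp returns the number of characters it removed, not the string, so it has to be called before the match rather than inside it.
chomp($line);
if ($line =~ /^\w+$/) {
    # only a word
} else {
    # it contains multiple words (or non-word characters)
}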
Couldn't you just check for spaces in the input line? If it contains a space, it's safe to say it's a sentence. Then add some safety checks so that leading or trailing spaces, as in " word" or "word ", don't count. :)
Call split(" ") and store the result in an array. If the array has more than one element, the line is not a single word.
I want to split a multi-sentence paragraph into its constituent sentences whilst retaining the split characters, i.e. '.', '?' and '!'. The code I'm using is:
my @Sentence = split(/[\.\?\!]/,$Paragraph);
Is there any way that I can save those sentence terminators?
Yes, if you add capturing parentheses around the delimiter, the delimiters will be included in the result list.
my @Sentence = split /([\.\?\!])/, $Paragraph;
E.g. if you have the string foo.bar.baz, before you would get qw(foo bar baz), and with parentheses you would get qw(foo . bar . baz).
In case you want to keep the delimiters attached to the sentence, you could use a lookbehind assertion:
my @Sentence = split /(?<=[\.\?\!])/, $Paragraph;
# result qw(foo. bar. baz)
If you want to strip unnecessary spaces after the match, you could use /(?<=[\.\?\!]) */.
I have a file like this, and I want to delete only the first def line (def_xxx):
abc_xxx
def_xxx
ghi_xxx
abc_yyy
def_yyy
ghi_yyy
The following command deletes both lines, def_xxx and def_yyy:
sed -e '/def/d' myfile.txt
How can I delete only the first one, def_xxx?
sed -e '0,/def/{/def/d;}' myfile.txt
This deletes only the first line that matches the pattern. Note that the 0,addr2 address form is a GNU sed extension.
From the GNU sed manual:
0,addr2
Start out in "matched first address" state, until addr2 is found. This is similar
to 1,addr2, except that if addr2 matches the very first line of input the
0,addr2 form will be at the end of its range, whereas the 1,addr2 form will
still be at the beginning of its range. This works only when addr2 is a
regular expression.
Ref: https://linux.die.net/man/1/sed
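The following sketch runs the command on the sample data and, as an assumption for systems without GNU sed, shows an awk equivalent that does not rely on the 0,addr2 form. The variable names and the seen flag are illustrative only.

```shell
# Sample input from the question, written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' abc_xxx def_xxx ghi_xxx abc_yyy def_yyy ghi_yyy > "$tmpfile"

# GNU sed: the range 0,/def/ ends at the first match, so only that line is deleted.
gnu_result=$(sed -e '0,/def/{/def/d;}' "$tmpfile")

# awk: skip the first matching line, then print every other line unchanged.
awk_result=$(awk '!seen && /def/ { seen = 1; next } { print }' "$tmpfile")

printf '%s\n' "$awk_result"
# abc_xxx
# ghi_xxx
# abc_yyy
# def_yyy
# ghi_yyy
rm -f "$tmpfile"
```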
What does this line do in Perl?
s/\s//g;
I'm looking at a script that is used to search and count certain characters in an input file and I understand everything in the code except for this line. I was wondering what this line did for the script?
s/\s//g;
is short for
$_ =~ s/\s//g;
It is a substitution operator bound to $_. It replaces every match of the regex pattern \s in $_ with nothing. (Without g, it would only replace the first match.)
\s matches a single whitespace character (space, tab, newline, and so on).
I have a file containing the data below, and I want to remove everything that is not inside parentheses.
hello (welcome) to chennai (hai)
hello (how) this is for testing (with)
[is] this (bhuvanesh)
I want the output as below
(welcome) (hai)
(how) (with)
(bhuvanesh)
You can use the following sed command:
sed 's/[^(]*\(([^)]\+)\)[^(]*/\1/g' input.txt
Explanation:
I'm using the substitute command. In its basic form it looks like this:
s/SEARCH/REPLACE/g
The g at the end means global: sed should replace all occurrences of SEARCH, not just the first.
The SEARCH pattern looks like this:
[^(]*\(([^)]\+)\)[^(]*
I'll try to explain it step by step...
[^(]*
[] is a character class, the ^ at the beginning means that the characters listed in the class should not match. We are listing only a single character - the opening parenthesis (. The * means this can occur zero or more times. In one sentence, sed is searching for all characters before the first starting parenthesis (.
\(([^)]\+)\)
(...) is a matching group. In basic sed syntax the group delimiters need to be escaped: \(...\). The first character in the matching group is the opening parenthesis (. A character class [^)] follows. It matches every character except the closing parenthesis ). The quantifier \+ means there must be at least one character between the parentheses in your input text; if you would like to allow empty content, use * as the quantifier instead. Then come the closing parenthesis ) and the end of the matching group \).
Through the usage of the matching group, the matched content is available via \1 now.
The last part of the search pattern is the same as the first part:
[^(]*
It matches everything until the next opening parenthesis.
The REPLACE pattern is simple. It throws away everything except the content of matching group \1.
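Running the command on the sample input shows what the pattern keeps. This sketch assumes GNU sed, since \+ is a GNU extension in basic regular expressions (the POSIX spelling is \{1,\}).

```shell
input='hello (welcome) to chennai (hai)
hello (how) this is for testing (with)
[is] this (bhuvanesh)'

# Keep only the parenthesized groups; everything around them is discarded.
extracted=$(printf '%s\n' "$input" | sed 's/[^(]*\(([^)]\+)\)[^(]*/\1/g')

printf '%s\n' "$extracted"
# (welcome)(hai)
# (how)(with)
# (bhuvanesh)
```

The groups come out without a separating space, because the text between them is matched and dropped. Using "\1 " (with a space) as the replacement would separate them, at the cost of a trailing space on each line.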
This awk would do:
awk -F"[()]" '{for (i=2;i<=NF;i+=2) printf "(%s) ",$i;print ""}' file
(welcome) (hai)
(how) (with)
(bhuvanesh)
Or like this:
awk -F"[()]" '{for (i=2;i<=NF;i+=2) printf "%s ",$i;print ""}' file
welcome hai
how with
bhuvanesh
Try this one. It is tailored to the sample input and is not a general solution:
sed -r 's/\[.*\][^(]*//g ; s/.*(\(.*\)).*(\(.*\))/\1\2/g'
My program contains a string like
$abc= "mojo logo sfdgsdj2123 *** mojo **";
I want to change it to
$abc= "mojo *** mojo **";
How can I do this?
Also, the characters between "logo" and the first "*" can be anything other than "*" (i.e. not necessarily sfdgsdj2123).
So basically the question is how to remove everything after the first "mojo" up to the first occurrence of "*".
Please help...
I'll give you the "here's how you would do it answer" rather than write the substitution code, as you did ask "How can I do this?"
So here's how:
Make the regex for "a sequence of zero or more non-asterisk characters that are preceded by the word mojo"
Substitute the first occurrence of the substring that matches that regex with the empty string.
That's all there is to it. It's a little one-liner in Perl and most languages with sophisticated enough regex engines to support positive lookbehind.
If all that sounded crazy, feel free to walk through the string character by character. Find where "mojo" appears first. Then continue walking through the string, removing all the non-asterisk characters you encounter.