perl - search multi-line string for match of any of a list of strings


I have a list of bug identifiers.
For each bugid in this buglist, I run an external command to get the history of the bug as a multi-line string:
$buginfo = `dumpbug $bugid`;
$buginfo looks something like this (greatly simplified):
04/04/2014 dog created
04/04/2014 cat manager
04/04/2014 moose assigner
04/04/2014 moose engineer
04/05/2014 moose resolved
04/06/2014 rabbit verified
Now I want to see if any of (fox, aardvark, emu, rabbit) has ever had anything to do with this bug.
I would like to stop searching $buginfo at the first match of any user in my list.
I will be searching the buginfo from each of the bugids in my buglist for the same users.
I am also limited to features of Perl 5.8.

print "$1 was involved in bug $bugid.\n" if $buginfo =~ /\b(fox|aardvark|emu|rabbit)\b/;
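If the users live in an array rather than being typed into the pattern, one option is to build the alternation once with quotemeta and qr// (both available in Perl 5.8) and reuse the compiled pattern for every bug. This is a sketch only: a hash of hypothetical sample histories stands in for the dumpbug call so that it runs on its own. Because the regex engine reports the leftmost match, the search of each $buginfo stops at the first mention of any listed user.

```perl
use strict;
use warnings;

# Users we care about (in real use, your own list).
my @users = qw(fox aardvark emu rabbit);

# Build the alternation once, outside the loop. quotemeta protects any regex
# metacharacters in the names; qr// compiles the pattern a single time.
my $alt     = join '|', map { quotemeta($_) } @users;
my $user_re = qr/\b($alt)\b/;

# Hypothetical stand-in for `dumpbug $bugid`: sample histories keyed by bug id.
my %history = (
    101 => "04/04/2014 dog created\n04/04/2014 cat manager\n04/06/2014 rabbit verified\n",
    102 => "04/07/2014 dog created\n04/07/2014 moose resolved\n",
);

my %involved;    # bug id => first listed user found in its history
for my $bugid (sort keys %history) {
    my $buginfo = $history{$bugid};    # real code: my $buginfo = `dumpbug $bugid`;
    # The match succeeds at the leftmost occurrence of any user and stops there.
    if ($buginfo =~ $user_re) {
        $involved{$bugid} = $1;
        print "$1 was involved in bug $bugid.\n";
    }
}
```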

Related

Perl interface with Aspell

I am trying to identify misspelled words with Aspell via Perl. I am working on a Linux server without administrator privileges which means I have access to Perl and Aspell but not, for example, Text::Aspell which is a Perl interface for Aspell.
I want to do the very simple task of passing a list of words to Aspell and having it return the words that are misspelled. If the words I want to check are "dad word lkjlkjlkj" I can do this through the command line with the following commands:
aspell list
dad word lkjlkjlkj
Aspell requires CTRL + D at the end to submit the word list. It would then return "lkjlkjlkj", as this isn't in the dictionary.
In order to do the exact same thing, but submitted via Perl (because I need to do this for thousands of documents) I have tried:
my $list = q(dad word lkjlkjlkj);
my @arguments = ("aspell list", $list, "^D");
my $aspell_out=`@arguments`;
print "Aspell output = $aspell_out\n";
The expected output is "Aspell output = lkjlkjlkj" because this is the output that Aspell gives when you submit these commands via the command line. However, the actual output is just "Aspell output = ". That is, Perl does not capture any output from Aspell. No errors are thrown.
I am not an expert programmer, but I thought this would be a fairly simple task. I've tried various iterations of this code and nothing works. I did some digging and I'm concerned that perhaps because Aspell is interactive, I need to use something like Expect, but I cannot figure out how to use it. Nor am I sure that it is actually the solution to my problem. I also think ^D should be an appropriate replacement for CTRL+D at the end of the commands, but all I know is it doesn't throw an error. I also tried \cd instead. Whatever it is, there is obviously an issue in either submitting the command or capturing the output.

The complication with using aspell from a program is that it is an interactive, command-line driven tool, as you suspect. However, there is a simple way to do what you need.
To use aspell's list command, one needs to pass it words via STDIN, as its man page says. While I find the GNU Aspell manual a little difficult to get going with, passing input to a program via its STDIN is easy enough, and we can rewrite the invocation as
echo dad word lkj | aspell list
We get lkj printed back, as expected. This can be run from a program just as it stands:
my $word_list = q(word lkj good asdf);
my $cmd = qq(echo $word_list | aspell list);
my @aspell_out = qx($cmd);
print for @aspell_out;
This prints lines lkj and asdf.
I assemble the command in a string (as opposed to an array) for specific reasons, explained below. The qx is the operator form of backticks, which I prefer for its far superior readability.
Note that qx returns all output in a single string in scalar context (assigned to a scalar, for example), or as a list of lines in list context. Here I assign to an array, so you get each word as an element (each also comes with a trailing newline, so you may want to do chomp @aspell_out;).
Comment on a list vs string form of a command
In general, I think it's safe to recommend the list form of a command. So we'd say
my @cmd = ('ls', '-l', $dir); # to be run as an external command
instead of
my $cmd = "ls -l $dir"; # to be run as an external command
The list form generally makes it easier to manage the command, and it avoids the shell altogether.
However, this case is a little different:
The qx operator doesn't really behave differently with a list: the array gets concatenated into a string, and that runs. The very fact that we can pass it an array is incidental, and not even documented.
We need to pipe input to aspell's STDIN, and the shell does that for us simply. We can use a shell with a command's LIST form as well, but then we'd need to invoke it explicitly. We can also reach aspell's STDIN by means other than the shell, but that's more complex.
With a command in a list, the command name must be the first element, so "aspell list" from the question is wrong and it should fail (there is no command with that name) ... except that in this case it wouldn't (if the rest were correct), since for qx the array gets collapsed into a string.
Finally, aspell nicely exposes its API in a C library, and the module you mention is built on it. I'd suggest installing it as a user (no privileges needed) and using that.
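For the route that skips the shell and feeds aspell's STDIN directly, here is a minimal sketch using the core module IPC::Open2. It writes the words to the child's STDIN, closes that handle (the programmatic equivalent of Ctrl+D), and then reads the output. The helper name pipe_through is hypothetical, and so that the sketch runs where aspell is not installed, it uses the running perl ($^X) as a stand-in filter; for the real thing you would pass ['aspell', 'list'].

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Hypothetical helper: send @words (one per line) to the command's STDIN and
# return its STDOUT lines. The command is a list, so no shell is involved.
sub pipe_through {
    my ($cmd, @words) = @_;
    my ($child_out, $child_in);
    my $pid = open2($child_out, $child_in, @$cmd);
    print {$child_in} "$_\n" for @words;
    close $child_in;                  # EOF on the child's STDIN, like pressing Ctrl+D
    chomp(my @lines = <$child_out>);
    close $child_out;
    waitpid($pid, 0);                 # reap the child process
    return @lines;
}

# Real use (requires aspell): my @misspelled = pipe_through(['aspell', 'list'], @words);
# Stand-in filter so the sketch runs anywhere: echo back only lines containing "lkj".
my @misspelled = pipe_through([$^X, '-ne', 'print if /lkj/'], qw(dad word lkjlkjlkj));
print "Aspell-style output: @misspelled\n";
```

This writes everything before reading anything, which is fine for modest word lists. With very large inputs both pipe buffers can fill up and deadlock, in which case a module such as IPC::Run or a temporary file is the safer choice.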

You should take a step back and investigate whether you can install Text::Aspell without administrator privileges. In most cases that's perfectly possible.
You can install modules into your home directory. If there is no C compiler available on the server, you can build the module on a compatible machine and copy the resulting files over.

Getting Error of Modification of a read-only value attempted

I am trying to select the below value from database:
Reporting that one of #its many problems had been the recent# extended
sales slump in women's apparel, the seven-store retailer said it would
start a three-month liquidation sale in all of its stores.~(A) its
many problems had been the recent~(B) its many problems has been the
recently~(C) its many problems is the recently~(D) their many problems
is the recent~(E) their many problems had been the recent~
I am selecting this value into the variable $ques and then extracting part of the text as follows:
$ques=~s/^(.*?)\#(.*?)\#(.*?)$/$2/;
Now, when replacing the ~ characters in the string with
$3=~s/~/\n/g;    # this is line 171
and running the script, I am getting one error as:
Modification of a read-only value attempted at main.pl line 171
I want to replace all the ~ characters with newlines ('\n') and print the final value. Please suggest how to do it.
I have researched this online, but I am confused about how to handle these read-only variables.

You've already got a good explanation of the problem from José Castro. But there's another solution if you're using a recent-ish version of Perl (Update: having checked more carefully, I find that means 5.14+). The /r modifier on the substitution operator copies your string, makes the substitution on the copy, and returns the altered value, leaving the original (here the read-only $3) untouched.
So you could write:
my $new_value = $3 =~ s/~/\n/rg;

It sounds like what you really want in this case is split rather than regular expression capture groups:
my @parts = split(/#/, $ques);
$parts[2] =~ s/~/\n/g;
It makes the intent of your code clearer since you are, in fact, splitting on # symbols.

Just like you say, the special variables $1, $2, etc., are read-only, and that means that you can't perform that substitution on them.
Performing the substitution on $ques will do what you need:
$ques =~ s/~/\n/g;
print $ques;
Do note that the earlier substitution you're performing on $ques keeps only the text between the # markers ($2), so all the ~ characters are already gone from $ques by that point.
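A fix that also works on Perl 5.8 (no /r needed) is to copy the captures into ordinary variables right after the match and modify the copies. This sketch uses a shortened, hypothetical version of the question's text; the /s flag is an assumption, included in case the stored value contains line breaks.

```perl
use strict;
use warnings;

my $ques = 'Reporting that one of #its many problems had been the recent# extended'
         . ' sales slump.~(A) its many~(B) their many~';

my ($stem, $answers);
if ($ques =~ /^(.*?)#(.*?)#(.*)$/s) {
    $stem    = $2;    # copy the captures into ordinary variables...
    $answers = $3;    # ...because $1, $2, $3 themselves are read-only
}
$answers =~ s/~/\n/g;    # modifying the copy is allowed
print "$stem\n$answers";
```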

Retrieving String with single quotes from database and storing in Perl

I have a SQL query
select name from Employee
Output :
Sharma's
How can I store this output in a Perl string?
I tried the following:
$sql =qq {select Name from Employee};
$Arr = &DataBaseQuery( $dbHandle, $sql );
$name = $Arr;
But when I print $name I get output as
Sharma::s
How can I keep the single quote in $name?

First of all, in my experience none of the standard DBI/DBD drivers exhibit the behavior you describe.
Without knowing details of what DataBaseQuery() does it's impossible to answer conclusively, but a plausible theory can be formed:
Apostrophe is a valid package separator in Perl, equivalent to "::".
Reference: perldoc perlmod
The old package delimiter was a single quote, but double colon is now the preferred delimiter, in part because it's more readable to humans, and in part because it's more readable to emacs macros. It also makes C++ programmers feel like they know what's going on--as opposed to using the single quote as separator, which was there to make Ada programmers feel like they knew what was going on. Because the old-fashioned syntax is still supported for backwards compatibility, if you try to use a string like "This is $owner's house" , you'll be accessing $owner::s ; that is, the $s variable in package owner , which is probably not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house" .
perl -e 'package A::B; $A::B=1; 1;
package main;
print "double-colon: $A::B\n";
print "apostrophe: $A'"'"'B\n";'
double-colon: 1
apostrophe: 1
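A short sketch of where the apostrophe matters and where it doesn't: only variable names written inside a double-quoted literal in your source are parsed, so a value that merely contains an apostrophe (such as one fetched from the database) is stored unchanged. The variable names here are illustrative.

```perl
use strict;
use warnings;

my $owner = 'Sharma';

# Inside a double-quoted literal, "$owner's house" would be read as $owner::s
# (the old ' package separator). Braces mark where the variable name ends:
my $braced = "This is ${owner}'s house";

# A value that merely contains an apostrophe is data, not code, so it is
# never re-parsed; copying or interpolating it keeps the quote intact.
my $from_db = q(Sharma's);    # stands in for a value fetched from the database
my $shown   = "Name: $from_db";

print "$braced\n$shown\n";
```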
I have a strong suspicion something within your own libraries inside DataBaseQuery() call was written to be "smart" and to convert apostrophes to double-colons because of this.
If you can't figure out root cause, you can always do one of the following:
Write your own DB wrapper
Assuming your text isn't likely to contain "::", run the substitution s#::#'#g; on all results from DataBaseQuery() (likely in a function serving as a wrapper-replacement for DataBaseQuery())

how to get list of POSIX group members in Perl

Is there any way to get a list of all the members of a POSIX group in Perl?
I can't use getgrent() and similar because it returns the list as a space delimited string, and some usernames can have spaces in them.
I have to handle spaces in user and group names, because I'm working in an AD environment that other organizations can create users and groups in, so I'm trying to account for possible edge cases.

I'd say just use getgrent() and don't worry about spaces.
It may be possible to create a user name with one or more spaces in it, perhaps by manually editing /etc/passwd, but it's going to cause other problems as well. For example, ~foo is foo's home directory, but ~foo bar isn't foo bar's home directory.
On Linux, the useradd and adduser commands don't even permit spaces in user names. On Linux Mint 14 (based on Ubuntu 12.10):
$ sudo adduser 'foo bar'
adduser: To avoid problems, the username should consist only of
letters, digits, underscores, periods, at signs and dashes, and not start with
a dash (as defined by IEEE Std 1003.1-2001). For compatibility with Samba
machine accounts $ is also supported at the end of the username
$ sudo useradd !$
sudo useradd 'foo bar'
useradd: invalid user name 'foo bar'
$
Do you actually have user names with spaces on your system?
UPDATE: I've found that it actually is possible to create user names with spaces. useradd and adduser don't allow it (and you should be using one of those commands, or something similar, to create new accounts). But if I manually edit /etc/passwd using sudo vipw, I can create a user named foo bar, and I can do:
su - 'foo bar'
ssh 'foo bar@localhost'
etc. But it's a Really Bad Idea. Perl's getgr*() cannot tell whether a group contains one entry for foo bar or two entries for foo and bar (which is what you're asking about), and I can't use the shell's ~name syntax to refer to the account's home directory. I could use other methods to get both pieces of information, but it's much easier to avoid creating such an account in the first place.
If you're seriously concerned about some admin being foolish enough to create such an account, then you can use some of the alternative methods that have been discussed. But as I said, I don't think it's worth the effort.
(Perl could have avoided this problem by delimiting the list with : characters rather than spaces, since those are actually incompatible with the format of /etc/passwd and /etc/group, which the system depends on. But it's too late to change it now.)
UPDATE 2:
As you say in a comment (which I've edited into your question):
I have to handle spaces in user and group names, because I'm working in an AD environment that other organizations can create users and groups in, so I'm trying to account for possible edge cases.
Your solution from the same comment:
map((getgrgid($_))[0], split(/ /, `id -G $username`))
is probably the best workaround. (id -G prints numeric group IDs, which obviously can't contain spaces.)
It's probably also worth checking whether you actually have user or group names with spaces in them (though of course that doesn't guard against such names being added in the future). I wonder how your POSIX system actually deals with such names. I wouldn't be astonished if it automatically translates them somehow. Even so, your id -G solution will still work.
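A minimal sketch of that id -G workaround, assuming the id command is on the PATH. It runs id in list form (so a user name with spaces needs no shell quoting), splits the numeric IDs, and maps each back to a name with getgrgid, falling back to the number if the ID has no name. The helper name groups_of is hypothetical; called without an argument it reports the current user's groups.

```perl
use strict;
use warnings;

# Hypothetical helper: group names for a user (or the current user if none given).
sub groups_of {
    my ($user) = @_;
    my @cmd = ('id', '-G', defined $user ? ($user) : ());
    # List-form pipe open: no shell, so a user name with spaces needs no quoting.
    open(my $fh, '-|', @cmd) or die "cannot run id: $!";
    my $line = <$fh>;
    close $fh;
    return () unless defined $line;
    # split ' ' splits on runs of whitespace and drops the trailing newline;
    # numeric IDs cannot contain spaces, so the split is unambiguous.
    return map { my $name = getgrgid($_); defined $name ? $name : $_ } split ' ', $line;
}

my @mine = groups_of();
print "$_\n" for @mine;
```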

If I have /etc/group:
...
postgres:x:26:
fsniper:x:481:
clamupdate:x:480:
some spacey group:x:482:saml, some spacey user
I can use the following commands to see this group's members:
% getent group
...
postgres:x:26:
fsniper:x:481:
clamupdate:x:480:
some spacey group:x:482:saml,some spacey user
Or if you know the specific group that you're interested in:
% getent group "some spacey group"
some spacey group:x:482:saml,some spacey user
These could be wrapped inside of a Perl script like this:
#!/usr/bin/perl
use feature qw(say);
chomp (my $getent = `getent group "some spacey group" | sed 's/.*://'`);
my @users = split(/,/, $getent);
foreach my $i (@users) { say $i; }
Running it:
% ./b.pl
saml
some spacey user
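The sed step can also be done in Perl itself. This sketch assumes getent is available; it splits the group entry on ':' and the member field on ',', so member names containing spaces stay intact. The helper names are hypothetical, and the parser is exercised on the sample line from above rather than on a live system.

```perl
use strict;
use warnings;

# Parse one group entry ("name:passwd:gid:member1,member2"). Fields are
# ':'-separated and members ','-separated, so names with spaces survive.
sub parse_group_line {
    my ($line) = @_;
    chomp $line;
    my ($name, undef, $gid, $members) = split /:/, $line, 4;
    my @members = grep { length } split /,/, (defined $members ? $members : '');
    return ($name, $gid, @members);
}

# Fetch a live entry with getent (list-form open, so no shell quoting needed).
sub group_members {
    my ($group) = @_;
    open(my $fh, '-|', 'getent', 'group', $group) or die "cannot run getent: $!";
    my $line = <$fh>;
    close $fh;
    return () unless defined $line;
    my (undef, undef, @members) = parse_group_line($line);
    return @members;
}

my ($name, $gid, @members) = parse_group_line("some spacey group:x:482:saml,some spacey user\n");
print "$_\n" for @members;
```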
Resources
How to list all users in a Linux group?

zsh filename globbing/substitution

I am trying to create my first zsh completion script, in this case for the command netcfg.
Lame as it may sound, I am stuck at the first hurdle. Disclaimer: I know how to do this crudely, but I'm looking for the "ZSH WAY" to do it.
I need to list the files in /etc/network.d, but only the file names, not the directory component, so I do the following.
echo $(ls /etc/network.d/*(.))
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
What I wanted was:
ethernet-dhcp wireless-wpa-config
So I try (excuse my naivety):
echo ${(s/*\/)$(ls /etc/network.d/*(.))}
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
It seems that this doesn't work. I'm sure there must be some clever way of doing this by splitting into an array and getting the last part, but as I say, I'm a complete noob at this.
Any advice gratefully received.

General note: There is no need to use ls to generate the filenames. You might as well use echo some*glob. But if you want to protect against embedded newline characters, even that is a bad idea. The first example below globs directly into an array to protect embedded newlines. The second one uses printf to generate NUL-terminated data to accomplish the same thing without using a variable.
It is easy to do if you are willing to use a variable:
typeset -a entries
entries=(/etc/network.d/*(.)) # generate the list
echo ${entries#/etc/network.d/} # strip the prefix from each one
You can also do it without a variable, but the extra stuff to isolate individual entries is a bit ugly:
# From the inside, to the outside:
# * glob the entries
# * NUL terminate them into a single string
# * split at NUL
# * strip the prefix from each one
echo ${${(0)"$(printf '%s\0' /etc/network.d/*(.))"}#/etc/network.d/}
Or, if you are going to use a subshell anyway (i.e. the command substitution in the previous example), just cd to the directory so it is not part of the glob expansion (plus, you do not have to repeat the directory name):
echo ${(0)"$(cd /etc/network.d && printf '%s\0' *(.))"}

Chris Johnsen's answer is full of useful information about zsh, however it doesn't mention the much simpler solution that works in this particular case:
echo /etc/network.d/*(:t)
This is using the t history modifier as a glob qualifier.

Thanks for your suggestions guys, having done yet more reading of ZSH and coming back to the problem a couple of days later, I think I've got a very terse solution which I would like to share for your benefit.
echo ${$(print /etc/network.d/*(.)):t}

I'm used to seeing basename(1) stripping off directory components; also, you can use echo /etc/network/* to get the file listing without running the external ls program. (Running external programs can slow down completion more than you'd like; I didn't find a zsh-builtin for basename, but that doesn't mean that there isn't one.)
Here's something I hope will help:
haig% for f in /etc/network/* ; do basename $f ; done
if-down.d
if-post-down.d
if-pre-up.d
if-up.d
interfaces
