Substitute only one part of a string using perl - perl

I have an array that have some symbols that I want to remove and even thought I find a solution, I will like to know if this is the right way because I'm afraid if I use it with array will remove the character that I might need on future arrays.
Here is an example item on my array:
$string1='22 | logging monitor informational';
so I try the following:
$string1=~ s/\s{6}\|(?=\s{6})//;
So my output is:
22 logging monitor informational
Is the other way that best match "|". I just want to remove the pipe character.
Thanks in advance

"I want to remove just the pipe character."
OK, then do this:
$string1 =~ s/\|//;
This will remove the first pipe character in the string. (You said in another comment that you don't want to remove any additional pipe characters.) If that's not what you want, then I'd suggest telling us exactly what you do want. We can't read minds, you know.
In the mean time, I'd also strongly recommend reading the Perl regular expressions tutorial.

Related

Regsubbing simple matches

I'm looking for a regsub example that does the following:
123tcl456TCL789 => 123!tcl!456!TCL!789
This is an Tcl example => This is an !Tcl! example
Yes, I could use string first to find a position and mash things but I saw in past a regsub command that does what I want but I can't recall. What would be the regsub command that allows that? I would guess regsub -all -nocase is a start.
I am bad at regsub and regexps. I wonder if there is a site or tool/script that we can supply a string, the final result and then we get the regsub form.
You're looking at the right tool, but there are various options, depending on exactly what the conditions are when faced with other text. Here's one that wraps each occurrence of "Tcl" (any capitalisation) with exclamation marks:
set inputString "123tcl456TCL789"
set replaced [regsub -all -nocase {tcl} $inputString {!&!}]
puts $replaced
That's using a very simple regular expression with the -nocase option, and the replacement means "put ! on either side of the substring matched".
Another (more generally applicable... perhaps) might be to put ! after any letter or number sequence that is followed by a number or letter.
set replaced [regsub -all {[A-Za-z]+(?=[0-9])|[0-9]+(?=[A-Za-z])} $inputString {&!}]
Note that doing things correctly typically requires understanding the real input data fairly well. For example, whether the numbers include floating point numbers in scientific notation, or whether the substrings to delimit are of fixed length.

use a variable with whitespace Perl

I am currently working on a project but I have one big problem. I have some picture with a whitespace in the name and I want to do a montage. The problem is that I can't rename my picture and my code is like that :
$pic1 = qq(picture one.png);
$pic2 = qq(picture two.png);
my $cmd = "C:\...\montage.exe $pic1 $pic2 output.png";
system($cmd);
but because of the whitespace montage.exe doesn't work. How can I execute my code without renaming all my pictures?
Thanks a lot for your answer!
You can properly quote the filenames within the string you pass to system, as #Borodin shows in his answer. Something like: system("montage.exe '$pic1' '$pic2'")
However, A more reliable and safer solution is to pass the arguments to montage.exe as extra parameters in the system call:
system('montage.exe', $pic2, $pic2, 'output.png')
Now you don't have to worry about nesting the correct quotes, or worry about files with unexpected characters. Not only is this simpler code, but it avoids malicious injection issues, should those file names ever come from a tainted source. Someone could enter | rm *, but your system call will not remove all your files for you.
Further, in real life, you probably are not going to have a separate scalar variable for each file name. You'll have them in an array. This makes your system call even easier:
system('montage.exe', #filenames, 'output.png')
Not only is that super easy, but it avoids the pitfall of having a command line too long. If your filenames have nice long paths (maybe 50-100 characters), a Windows command line will exceed the max command length after around 100 files. Passing the arguments through system() instead of in one big string avoids that limitation.
Alternatively, you can pass the arguments to montage.exe as a list (instead of concatenating them all into a string):
use strict;
use warnings;
my $pic1 = qq(picture one.png);
my $pic2 = qq(picture two.png);
my #cmd = ("C:\...\montage.exe", $pic1, $pic2, "output.png");
system(#cmd);
You need to put quotes around the file names that have spaces. You also need to escape the backslashes
my $cmd = qq{C:\\...\\montage.exe "$pic1" "$pic2" output.png};
In unix systems, the best approach is the multi-argument form of system because 1) it avoids invoking a shell, and 2) that's the format accepted by the OS call. Neither of those are true in Windows. The OS call to spawn a program expects a command line, and system's attempt to form this command line is sometimes incorrect. The safest approach is to use Win32::ShellQuote.
use Win32::ShellQuote qw( quote_system );
system quote_system("C:\\...\\montage.exe", $pic1, $pic2, "output.png");

PHP: Using preg_replace to replace an unknown string between two known strings

I have $stringF. Contained within $stringF is the following (the string is all one line, not word-wrapped as below):
http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=
AFQjCNHWQk0M4bZi9xYO4OY4ZiDqYVt2SA&clid=
c3a7d30bb8a4878e06b80cf16b898331&cid=52779892300270&ei=
H4IAW6CbK5WGhQH7s5SQAg&url=https://abcnews.
go.com/Lifestyle/wireStory/latest-royal-wedding-thousands-streets-windsor-55280649
I want to locate that string and make it look like this:
https://abcnews.go.com/Lifestyle/wireStory/latest-royal-
wedding-thousands-streets-windsor-55280649
Basically I need to use preg_replace to find the following string:
http://news.google.com/news/url?sa= ***SOME UNKNOWN CONTENT*** &url=http
and replace it with the following string:
http
I'm a little rusty with my php, and even rustier with regular expressions, so I'm struggling to figure this one out. My code looks like this:
$stringG = preg_replace('http://news.google.com/news/url?sa=*&url=http','http',$stringH);
except I know I can't use wildcards and I know I need to specially deal with the special characters (colon, forward slash, question mark, and sign, etc). Hoping someone can help me out here.
Also of note is that my $stringF contains multiple instances of such strings, so I need the preg_replace to be not greedy - otherwise it will replace a huge chunk of my string unnecessarily.
PHP has tools for that, no need to use a regex. parse_url to get the components of an url (scheme, host, path, anchor, query, ...) and parse_str to get the keys/values of the query part.
$url = 'http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNHWQk0M4bZi9xYO4OY4ZiDqYVt2SA&clid=c3a7d30bb8a4878e06b80cf16b898331&ci=52779892300270&ei=H4IAW6CbK5WGhQH7s5SQAg&url=https://abcnews.go.com/Lifestyle/wireStory/latest-royal-wedding-thousands-streets-windsor-55280649';
parse_str(parse_url($url, PHP_URL_QUERY), $arr);
echo $arr['url'];

Regex to remove data between 2 semicolns perl

A string have data with semicolons now i want to remove all the data within the 2 semicolons and leave the rest as it is. I am using perl regex to remove the unwanted data from the string:
String :
$val="Data;test is here ;&data=1dffvdviofv;&dt&;&data=343";
Now we want to remove all the data between each semicolons ,throughout the string :
$val=~s/(.*)(\;.*\;)(.*)$/$1$3/g;
But this is not working for me. Final out should be like below :
Data &data=1dffvdviofv&data=343
One of the problems is that .* is greedy, that is, it will consume as much as it can. You can make it non-greedy by writing .*?, but that alone won't fix your regex since you've anchored it to the end of the string with $. Personally I don't think there is a need for the capture groups, you can just write
$val =~ s/;.*?;//g;
I'm assuming that the extra space in your expected output (Data &data...) is a typo.
You might also want to consider using a proper parser for whatever data format this is.

perl sequence extraction loop

I have an existing perl one-liner (from the Edwards lab) that works wonderfully to read a text file (named ids.file) that contains one column of IDs and searches a second, specially formatted text file (named fasta.file in this example - in "fasta" format for those who know bioinformatics) and returns sequences that match the ID from the first file. I was hoping to expand this script to do two additional things:
The current perl one-liner only seems to work if the ids.file contains one column of data. I would like it to work on a file that contains two columns (separated by spaces), and act on the second column of data (well, really any column of data, but I assume that it will be obvious enough to adapt it if someone can give an example using a second column)
I would like to append the any results returned from the output of the search to a third column, instead of just to a new file.
If someone is kind enough to offer an example but only has time or inclination to work on one of these, I would prefer that you try to solve #2 - I have come close to solving #1 with a for loop that uses awk to only use the Perl code on the second column - I haven't gotten it yet, but am close, so #2 seems like the harder one to me.
The perl one liner is as follows:
perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if #ARGV' ids.file fasta.file
I appreciate any help you can give!
Not quite sure but will this do?
perl -ne 'chomp; s/^>(\S+).*/$c=$i{$1}/e; print if $c;
$i{(/^\S*\s(\S*)$/)[0]}="$_ " if #ARGV'
ids.file fasta.file