How to capture text within a negate class character using perl [duplicate] - perl

This question already has answers here:
missing last character in perl regex
(3 answers)
Closed 7 years ago.
My first problem is, I need to search for two consecutive sets of parenthesis, for example,
(log dkdkdkd) (log edksks)
This code below solves this first problem:
^\([^)]*\) \([^)]*\)$
Second problem, in addition to using the solution above, I need to capture the text after the log something like this:
^\(log (.*)\) \(log (.*)\)$
But this above solution does not work because it find more than two sets of parenthesis, for example:
(log dkdkdkd) (log edksks) (log riwqoq)
What I really need is to find two sets of consecutive parenthesis while capturing the text after the log text?

You can tell Perl that the text after "log" doesn't contain (:
/^\(log ([^(]*)\)\s*\(log ([^(]*)\)$/
Or, if ( is possible and you only want to exclude (log, check that in two steps:
for my $s ('(log this) (log matches)',
'(log these) (log do) (log not)'
) {
my #matches = $s =~ /^\(log (.*)\)\s*\(log (.*)\)$/;
next if $matches[0] =~ /\(log /; # More than 2 logs, skip.
say "($_)" for #matches;
}

You can use a non-greedy search
/^\(log (.*?)\) \(log (.*?)\)/
Saying .*? instead of .* makes the regex engine try to match the pattern with the minimum number of characters necessary.

Related

Search for a match, after the match is found take the number after the match and add 4 to it, it is posible in perl?

I am a beginer in perl and I need to modify a txt file by keeping all the previous data in it and only modify the file by adding 4 to every number related to a specific tag (< COMPRESSED-SIZE >). The file have many lines and tags and looks like below, I need to find all the < COMPRESSED-SIZE > tags and add 4 to the number specified near the tag:
< SOURCE-START-ADDRESS >01< /SOURCE-START-ADDRESS >
< COMPRESSED-SIZE >132219< /COMPRESSED-SIZE >
< UNCOMPRESSED-SIZE >229376< /UNCOMPRESSED-SIZE >
So I guess I need to do something like: search for the keyword(match) and store the number 132219 in a variable and add the second number (4) to it, replace the result 132219 with 132223, the rest of the file must remain unchanged, only the numbers related to this tag must change. I cannot search for the number instead of the tag because the number could change while the tag will remain always the same. I also need to find all the tags with this name and replace the numbers near them by adding 4 to them. I already have the code for finding something after a keyword, because I needed to search also for another tag, but this script does something else, adds a number in front of a keyword. I think I could use this code for what i need, but I do not know how to make the calculation and keep the rest of the file intact or if it is posible in perl.
while (my $row = <$inputFileHandler>)
{
if(index($row,$Data_Pattern) != -1){
my $extract = substr($row, index($row,$Data_Pattern) + length($Data_Pattern), length($row));
my $counter_insert = sprintf "%08d", $counter;
my $spaces = " " x index($row,$Data_Pattern);
$data_to_send ="what i need to add" . $extract;
print {$outs} $spaces . $Data_Pattern . $data_to_send;
$counter = $counter + 1;
}
else
{
print {$outs} $row;
next;
}
}
Maybe you could help me with a block of code for my needs, $Data_Pattern is the match. Thank you very much!
This is a classic one-liner Perl task. Basically you would do something like
$ perl -i.bak -pe's/^< COMPRESSED-SIZE >\K(\d+)/$1 + 4/e' yourfile.txt
Which will in essence copy and replace your file with a new, edited file. This can be very dangerous, especially if you are a Perl newbie. The -i switch is here used with the .bak extension which saves a backup in yourfile.txt.bak. This does not make this operation safe, however, as running the command twice will overwrite the backup.
It is advisable to make a separate backup of the target file before using this command.
-i.bak edit "in-place", the file is overwritten, a backup of the original is created with extension .bak.
-p argument is treated as a file name, which is read, and printed back.
s/ // the substitution operator, which is applied to all lines of the file.
^ inside the regex looks for beginning of line.
\K keep the match that is to the left.
(\d+) capture () 1 or more digits \d+ and store them in $1
/e treat the right hand side of the substitution operator as an expression and use the result as the replacement string. In this case it will increase your number and return the sum.
The long version of this command is
while (<>) {
s/^< COMPRESSED-SIZE >\K(\d+)/$1 + 4/e
}
Which can be placed in a file and run with the -i switch.

Partial String Replacement using PowerShell

Problem
I am working on a script that has a user provide a specific IP address and I want to mask this IP in some fashion so that it isn't stored in the logs. My problem is, that I can easily do this when I know what the first three values of the IP typically are; however, I want to avoid storing/hard coding those values into the code to if at all possible. I also want to be able to replace the values even if the first three are unknown to me.
Examples:
10.11.12.50 would display as XX.XX.XX.50
10.12.11.23 would also display as XX.XX.XX.23
I have looked up partial string replacements, but none of the questions or problems that I found came close to doing this. I have tried doing things like:
# This ended up replacing all of the numbers
$tempString = $str -replace '[0-9]', 'X'
I know that I am partway there, but I aiming to only replace only the first 3 sets of digits so, basically every digit that is before a '.', but I haven't been able to achieve this.
Question
Is what I'm trying to do possible to achieve with PowerShell? Is there a best practice way of achieving this?
Here's an example of how you can accomplish this:
Get-Content 'File.txt' |
ForEach-Object { $_ = $_ -replace '\d{1,3}\.\d{1,3}\.\d{1,3}','xx.xx.xx' }
This example matches a digit 1-3 times, a literal period, and continues that pattern so it'll capture anything from 0-999.0-999.0-999 and replace with xx.xx.xx
TheIncorrigible1's helpful answer is an exact way of solving the problem (replacement only happens if 3 consecutive .-separated groups of 1-3 digits are matched.)
A looser, but shorter solution that replaces everything but the last .-prefixed digit group:
PS> '10.11.12.50' -replace '.+(?=\.\d+$)', 'XX.XX.XX'
XX.XX.XX.50
(?=\.\d+$) is a (positive) lookahead assertion ((?=...)) that matches the enclosed subexpression (a literal . followed by 1 or more digits (\d) at the end of the string ($)), but doesn't capture it as part of the overall match.
The net effect is that only what .+ captured - everything before the lookahead assertion's match - is replaced with 'XX.XX.XX'.
Applied to the above example input string, 10.11.12.50:
(?=\.\d+$) matches the .-prefixed digit group at the end, .50.
.+ matches everything before .50, which is 10.11.12.
Since the (?=...) part isn't captured, it is therefore not included in what is replaced, so it is only substring 10.11.12 that is replaced, namely with XX.XX.XX, yielding XX.XX.XX.50 as a result.

What is the right regex to match a relative path to an image file?

I have this path ../../Capture.jpg. So far I've figured out this incomplete regex: '[../]+'. I want to check if user puts in the right path like ../../image file name. The file extensions can be jpg, png, ..
your [../]+ is not sufficient or correct for the job at hand, if you REALLY want to match a bunch of ../ at the start of a filename.
It's not completely clear what you want to do exactly, but the following will match one or more ../ at the start of a string:
/^((?:\.\.\/)+)/
basically:
^ to anchor to the start of the string being tested - will not match any ../ INSIDE the string
( and the balancing ) at the end: capture the contents within. All your ../../ will be available in a variable called $1
then I'm using (?: ) to wrap the next content. This groups the bit inside, but does NOT save the value inside a $1, $2, etc. More information soon...
The REAL pattern of interest is
\.\.\/
Since . and / are magic characters, they need 'escaping' with backslash. This tells Perl that the . and / do NOT have a special meaning at this point.
I've used the (?: ) wrapper to group them together, so that the + operates on all 3 characters of interest. The + operator means "one or more repetitions".
So, my pattern will match one or more repetitions of ../ which are anchored to the start of the string. Furthermore, the exact contents matched will be available in $1 if you are interested in doing something with that (eg count how many ../ you have)
Please ask if you have further questions, or I have misunderstood your goals.
EDIT: to suit your new requirements, and add a bit of bonus:
m!^\.\./\.\./(([^/]+)\.([^.]+))$!
Note first that I've used m!pattern! instead of /pattern/. Firstly, if Perl sees /pattern/ it assumes it's m/pattern/ but you can use an alternative character to wrap the patterns. This is useful if you actually want to use / in your pattern without having to go nuts with backslashes.
so:
^ exactly match only from the start
followed by exactly ../../
next I've used ( ) wrappers to capture the bits following. Explanation after...
ignoring the ( and ) now:
[^/]+ one or more repetitions (+) of any character that isn't /
. literally a dot - the one before the extension
[^./]+ one or more repetitions of any character that isn't . or /
Notice how the [^/]+ allows for any character including . but prevents another directory part from sneaking in. Thus, the filename could be foo.bar.jpg and it will be collected properly.
Notice how [^./]+ allows for any character in the extension except a dot - and also excluding / to prevent another directory segment from sneaking in.
Finally, $ is used to ensure we've reached the end of the pattern.
as for the captures:
$1 will contain all of foo.bar.jpg
$2 will contain foo.bar
$3 will contain jpg (not .jpg) but I'll leave it up to you to figure out what to change if you wish to capture the dot as well.
FINALLY - in a typical script, you might do something like:
if($filename =~ m!^\.\./\.\./(([^/]+)\.([^./]+))$!) {
print "You correctly entered ../../$1 giving basename=$2 and extension=$3 - Bravo!\n";
}
else {
print "you've failed to read the instructions properly\n";
}
As a bonus, I even tested that, and found 2 spolling mistaiks you'll never have to see
cheers.
# convert relative file paths to md links ...
# file paths and names with letters , nums - and _ s supported
$str =~ s! (\.\.\/([a-zA-Z0-9_\-\/\\]*)[\/\\]([a-zA-Z0-9_\-]*)\.([a-zA-Z0-9]*)) ! [$3]($1) !gm
If you don't care the path prefix, use:
$path =~ /\.(jpg|png)$/
or
substr($path, -4) ~~ ['.jpg', '.png']
With exactly '../../', use:
$path =~ m!^\.\./\.\./[^/]*\.(jpg|png)$!
With any number of '../'s, use:
$path =~ m!^(\.\./)*[^/]*\.(jpg|png)$!

How do I parse this file and store it in a table?

I have to parse a file and store it in a table. I was asked to use a hash to implement this. Give me simple means to do that, only in Perl.
-----------------------------------------------------------------------
L1234| Archana20 | 2010-02-12 17:41:01 -0700 (Mon, 19 Apr 2010) | 1 line
PD:21534 / lserve<->Progress good
------------------------------------------------------------------------
L1235 | Archana20 | 2010-04-12 12:54:41 -0700 (Fri, 16 Apr 2010) | 1 line
PD:21534 / Module<->Dir,requires completion
------------------------------------------------------------------------
L1236 | Archana20 | 2010-02-12 17:39:43 -0700 (Wed, 14 Apr 2010) | 1 line
PD:21534 / General Page problem fixed
------------------------------------------------------------------------
L1237 | Archana20 | 2010-03-13 07:29:53 -0700 (Tue, 13 Apr 2010) | 1 line
gTr:SLC-163 / immediate fix required
------------------------------------------------------------------------
L1238 | Archana20 | 2010-02-12 13:00:44 -0700 (Mon, 12 Apr 2010) | 1 line
PD:21534 / Loc Information Page
------------------------------------------------------------------------
I want to read this file and I want to perform a split or whatever to extract the following fields in a table:
the id that starts with L should be the first field in a table
Archana20 must be in the second field
timestamp must be in the third field
PD must be in the fourth field
Type (content preceding / must be in the last field)
My questions are:
How to ignore the --------… (separator line) in this file?
How to extract the above?
How to split since the file has two delimiters (|, /)?
How to implement it using a hash and what is the need for this?
Please provide some simple means so that I can understand since I am a beginner to Perl.
My questions are:
How to ignore the --------… (separator line) in this file?
How to extract the above?
How to split since the file has two delimiters (|, /)?
How to implement it using a hash and what is the need for this?
You will probably be working through the file line by line in a loop. Take a look at perldoc -f next. You can use regular expressions or a simpler match in this case, to make sure that you only skip appropriate lines.
You need to split first and then handle each field as needed after, I would guess.
Split on the primary delimiter (which appears to be ' | ' - more on that in a minute), then split the final field on its secondary delimiter afterwards.
I'm not sure if you are asking whether you need a hash or not. If so, you need to pick which item will provide the best set of (unique) keys. We can't do that for you since we don't know your data, but the first field (at a glance) looks about right. As for how to get something like this into a more complex data structure, you will want to look at perldoc perldsc eventually, though it might only confuse you right now.
One other thing, your data above looks like it has a semi-important typo in the first line. In that line only, there is no space between the first field and its delimiter. Everywhere else it's ' | '. I mention this only because it can matter for split. I nearly edited this, but maybe the data itself is irregular, though I doubt it.
I don't know how much of a beginner you are to Perl, but if you are completely new to it, you should think about a book (online tutorials vary widely and many are terribly out of date). A reasonably good introductory book is freely available online: Beginning Perl. Another good option is Learning Perl and Intermediate Perl (they really go together).
When you say This is not a homework...to mean this will be a start to assess me in perl I assume you mean that this is perhaps the first assignment you have at a new job or something, in which case It seems that if we just give you the answer it will actually harm you later since they will assume you know more about Perl than you do.
However, I will point you in the right direction.
A. Don't use split, use regular expressions. You can learn about them by googling "perl regex"
B. Google "perl hash" to learn about perl hashes. The first result is very good.
Now to your questions:
regular expressions will help you ignore lines you don't want
regular expressions with extract items. Look up "capture variables"
Don't split, use regex
See point B above.
If this file is line based then you can do a line by line based read in a while loop. Then skip those lines that aren't formatted how you wish.
After that, you can either use regex as indicated in the other answer. I'd use that to split it up and get an array and build a hash of lists for the record. Either after that (or before) clean up each record by trimming whitespace etc. If you use regex, then use the capture expressions to add to your list in that fashion. Its up to you.
The hash key is the first column, the list contains everything else. If you are just doing a direct insert, you can get away with a list of lists and just put everything in that instead.
The key for the hash would allow you to look at particular records for fast lookup. But if you don't need that, then an array would be fine.
You can try this one,
Points need to know:
read the file line by line
By using regular expression, removing '----' lines.
after that use split function to populate Hashes of array .
#!/usr/bin/perl
use strict;
use warning;
my $test_file = 'test.txt';
open(IN, '<' ,"$test_file") or die $!;
my (%seen, $id, $name, $timestamp, $PD, $type);
while(<IN>){
chomp;
my $line = $_;
if($line =~ m/^-/){ #removing '---' lines
# print "$line:hello\n";
}else{
if ($line =~ /\|/){
($id , $name, $timestamp) = split /\|/, $line, 4;
} else{
($PD, $type) = split /\//, $line , 3;
}
$seen{$id}= [$name, $timestamp, $PD, $type]; //use Hashes of array
}
}
for my $test(sort keys %seen){
my $test1 = $seen{$test};
print "$test:#{$test1}\n";
}
close(IN);

How do I insert a lot of whitespace in Perl?

I need to buff out a line of text with a varying but large number of whitespace. I can figure out a janky way of doing a loop and adding whitespace to $foo, then splicing that into the text, but it is not an elegant solution.
I need a little more info. Are you just appending to some text or do you need to insert it?
Either way, one easy way to get repetition is perl's 'x' operator, eg.
" " x 20000
will give you 20K spaces.
If have an existing string ($s say) and you want to pad it out to 20K, try
$s .= (" " x (20000 - length($s)))
BTW, Perl has an extensive set of operators - well worth studying if you're serious about the language.
UPDATE: The question as originally asked (it has since been edited) asked about 20K spaces, not a "lot of whitespace", hence the 20K in my answer.
If you always want the string to be a certain length you can use sprintf:
For example, to pad out $var with white space so it 20,000 characters long use:
$var = sprintf("%-20000s",$var);
use the 'x' operator:
print ' ' x 20000;