Dollar symbol should be around entities and values - Perl

My code is supposed to remove the dollar signs between the numbers (there can be several values) and insert a single pair of dollar signs around the whole expression, but I cannot get it to work.
For example, 10$x$10$x$10$x$10 should become $10x10x10x10$, for any number of values.
My code:
use strict;
use warnings;
my $tmp = do { local $/; $_ = <DATA>; };
my @allines = split /\n/, $tmp;
for (@allines)
{
my $lines = $_;
my ($pre,$matches,$posts) = "";
$lines=~s/(\d+)(\$*)\\times\$(\d+)/$1$2\\times$3\$/g;
print $lines;
}
Input:
__DATA__
where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A 10$\times$10$\times$10 supercell was first built, based on the unit cell model Sample paragraph testing 10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues.... obtained.
Required Output:
where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A $10\times10\times10$ supercell was first built, based on the unit cell model Sample paragraph testing $10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues.... obtained.

If you simply want to blindly transform 10$x$10$x$10$x$10 into $10x10x10x10$ without taking anything about the surrounding text into account, then this should be enough:
$lines=~s/(\d+)\$/\$$1/g;
If your requirements are more complex than that, you need to update the question with the details.
[UPDATE]
Just looking again at the input and expected output, I see there is a complication -- some of the input looks like this times$10$ with the expected output times$10. That means we have an optional leading $ that needs to be taken into account.
To deal with that we can add \$? to the start of the regex to match the optional $, like this
$lines=~s/\$?(\d+)\$/\$$1/g;
Below is a rewrite of your code that also removes some of the unnecessary splitting
use strict;
use warnings;
while (<DATA>)
{
s/\$?(\d+)\$/\$$1/g;
print ;
}
__DATA__
Sample paragraph testing 10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....
Output:
Sample paragraph testing $10\times$10\times$10 text continues....
Sample paragraph testing $10\times$10\times$10\times$10 text continues....
Sample paragraph testing $10\times$10\times$10\times$10\times$10\times$10 text continues....
[UPDATE 2]
Assuming the actual requirements are:
1. change the first occurrence of, say, 123$ into $123;
2. change the last occurrence of $123 into 123$;
3. for the intermediate digit-dollar sequences, remove the dollars.
use strict;
use warnings;
while (<DATA>)
{
# replace the first occurrence only
s/\$?(\d+)\$/\$$1/;
# remove $ from all but the last digit-dollar
# uses lookahead to prevent matching the last digit-dollar
s/times\$?(\d+)\$?(?=\\t)/times$1/g;
# rework the last occurrence of digit-dollar
s/times\$(\d+)/times$1\$/;
print ;
}
Input:
__DATA__
Sample paragraph testing 10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....
Output:
Sample paragraph testing $10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues....
[UPDATE 3]
New requirement -- there can be multiple digit-dollar sequences in a single line.
This complicates the code a bit, but not much.
use strict;
use warnings;
while (<DATA>)
{
# walk the string looking for strings of the form "10$\times$10$\times$10$\times$10"
while (s/(.*?)((\$?\d+\$?\\times)+\$?\d+\$?)//)
{
# output any data that preceded the digit-dollar sequence
print $1;
my $block = $2;
# Remove all dollars
$block =~ s/\$+//g;
# put back the initial dollar
$block =~ s/^(\d+)/\$$1/;
# and the terminating dollar
$block =~ s/$/\$/;
# output the modified digit-dollar sequence
print $block;
}
# output trailing text
print;
}
Input:
__DATA__
where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A 10$\times$10$\times$10 supercell was first built, based on the unit cell model Sample paragraph testing 10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues.... obtained.
Sample paragraph testing 10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....
Output:
where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A $10\times10\times10$ supercell was first built, based on the unit cell model Sample paragraph testing $10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues.... obtained.
Sample paragraph testing $10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues....
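For comparison, the same idea (find each whole run, strip its dollars, wrap it once) can be written as a single substitution with a callback. The sketch below is in Python rather than Perl, and the names RUN, wrap_run and fix_line are made up for illustration. It assumes the runs always look like numbers joined by \times, as in the sample data.

```python
import re

# A run of numbers joined by \times, each optionally wrapped in dollars,
# e.g. 10$\times$10$\times$10 (assumed shape, taken from the sample input)
RUN = re.compile(r"(?:\$?\d+\$?\\times)+\$?\d+\$?")

def wrap_run(match: re.Match) -> str:
    # Drop every dollar inside the run, then add one on each side.
    return "$" + match.group(0).replace("$", "") + "$"

def fix_line(line: str) -> str:
    # Every run in the line is rewritten; all other text is left alone.
    return RUN.sub(wrap_run, line)

print(fix_line(r"A 10$\times$10$\times$10 supercell"))
# prints: A $10\times10\times10$ supercell
```

Because the pattern requires at least one \times between two numbers, ordinary inline math such as $2{\theta}$ is not touched.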

Related

Sentence vectors for fasttext

I tried to create a sentence vector with fastText for a single line of English text in Python, but all I found were solutions based on the fastText CLI.
Link for fastText sentence vector creation
I want a function or a library that can easily give me the sentence vector for an English input sentence, so that I can use it in my Python script.
This one:
https://pypi.org/project/fasttext/
Then we just need to average or sum output vectors for each token in a sentence.
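A minimal sketch of the averaging step, assuming whitespace tokenisation. The function name sentence_vector and the toy embedding table are made up for illustration, so the snippet runs without a trained model.

```python
from typing import Callable, List, Sequence

def sentence_vector(sentence: str,
                    word_vector: Callable[[str], Sequence[float]]) -> List[float]:
    """Average the per-token vectors of a whitespace-tokenised sentence."""
    tokens = sentence.split()
    if not tokens:
        return []
    vectors = [word_vector(tok) for tok in tokens]
    dim = len(vectors[0])
    # Component-wise mean over all token vectors
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

# Toy 2-dimensional embeddings standing in for a trained model (hypothetical values)
TOY = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}

print(sentence_vector("hello world", TOY.__getitem__))
# prints: [0.5, 0.5]
```

With the real package you would presumably pass model.get_word_vector as the lookup, where model comes from fasttext.load_model on your own model file. As far as I know the model object also has a get_sentence_vector method that does the averaging for you, so check that first.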

Detect a PNG image's length from a stream

Basically my question is like this one, but for PNG instead of JPEG.
More specifically, I have a bunch of PNG images concatenated together and I want to discover their lengths, so I can split the stream into pieces corresponding to individual images.
I do not need to decode or validate the images. I can assume that the input stream is composed of valid PNG images and do not want to verify that. Instead, I want to do this as quickly as possible, so the fewer decoding operations required, the better.
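Here is a Perl script, based on this answer:
#!/usr/bin/perl
use strict;
use warnings;
binmode STDIN;   # read raw bytes
local $/;        # slurp the whole input at once
my $data = <STDIN>;
my $n = 0;
# split before every PNG signature; the lookahead keeps the signature in each piece
for my $png (split /(?=\x{89}PNG\x{0d}\x{0a}\x{1a}\x{0a})/, $data) {
open(my $out, '>:raw', sprintf('temp%04d.png', ++$n)) or die "Cannot open output: $!";
print {$out} $png;
close($out);
}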
Save this as, say, splitpng.pl and run perl splitpng.pl < myfile
This is not 100% foolproof (the rigorous way would be to count chunk sizes, as per Jongware's comment), but the probability of having that signature inside a PNG should be small.
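For reference, the chunk-counting approach mentioned above could look roughly like this. It is a sketch in Python rather than Perl, with made-up function names (png_length, split_pngs). It relies only on the PNG layout: an 8-byte signature followed by chunks, each made of a 4-byte big-endian data length, a 4-byte type, the data and a 4-byte CRC, with IEND as the last chunk. Nothing is decoded and CRCs are not checked.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_length(buf: bytes, start: int = 0) -> int:
    """Return the byte length of the PNG that begins at buf[start]."""
    if buf[start:start + 8] != PNG_SIGNATURE:
        raise ValueError("no PNG signature at offset %d" % start)
    pos = start + 8
    while True:
        if pos + 8 > len(buf):
            raise ValueError("truncated PNG")
        # Chunk header: 4-byte big-endian data length, then 4-byte type
        (length,) = struct.unpack(">I", buf[pos:pos + 4])
        ctype = buf[pos + 4:pos + 8]
        # Skip header (8), data (length) and CRC (4) without reading the data
        pos += 8 + length + 4
        if pos > len(buf):
            raise ValueError("truncated PNG")
        if ctype == b"IEND":
            return pos - start

def split_pngs(buf: bytes) -> list:
    """Split a buffer of back-to-back PNG files into one bytes object per image."""
    images, pos = [], 0
    while pos < len(buf):
        n = png_length(buf, pos)
        images.append(buf[pos:pos + n])
        pos += n
    return images
```

Since only the chunk headers are read, a signature that happens to occur inside image data does not cause a wrong split.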

Exporting cell array with both text and numbers to csv file in Matlab

I have a cell array with nine columns (the first eight text and the ninth numbers) and thousands of rows that I would like to export to a csv file.
I have tried to follow the suggestions provided in similar questions and I take it that the best way to proceed is to use the fprintf function:
fid = fopen(outputfile, 'w');
fprintf(fid, ???, variable{:,:});
fclose(fid);
Nevertheless, I cannot figure out what I am supposed to write in the middle. I have tried several combinations using "%s", "\n", "\t", but it does not seem to work. Ideally, I would like to separate each column by either a ";", "," or a tab, and to make sure that the decimals of the values are not lost.

Reading file After a special character delimiter in matlab

I have a file in which sentences end with ./.
I want to read the file into a cell array, one cell per line. Could you please tell me how to do that using textscan?
Basically, I want to know how to specify ./. as the delimiter.
Well, I am not sure whether this is helpful or not.
In the normal case of a new line for each sentence, you could use
tline = fgetl(fileID);
D=textscan(tline,'%s','delimiter','./.');
But if your file doesn't have a new line for each sentence, just ./. as a separator, there are two cases. In the first, the sentences don't contain any of the characters used in the delimiter, i.e. . or /
In that case you can try something like
C = textscan(fileID,'%s %*1s','delimiter','/','MultipleDelimsAsOne',1);
In the other case, where your sentences do contain these characters, I think you can't use them as delimiters, but I might be wrong.

Displaying information from MATLAB without a line feed

Is there any way to output/display information from a MATLAB program without an ending line feed?
My MATLAB program outputs a number every now and then. Between outputs the program does a lot of other stuff. This is a construct mainly to indicate some kind of progress and it would be nice not to have a line feed each time, just to make it more readable for the user. This is approximately what I'm looking for:
Current random seed:
4 7 1 1
The next output from the program would be on the same row if it is still doing the same thing as before.
I've read the doc on disp, sprintf, and format but haven't found what I'm looking for. This doesn't mean it isn't there. ;)
The fprintf function does not add a line feed unless you explicitly tell it to. Omit the fid argument to have it print to the Command Window.
fprintf('Doing stuff... ');
for i = 1:5
fprintf('%d ', i);
% do some work on that pass...
end
fprintf(' done.\n'); % That \n explicitly adds the linefeed
Using sprintf won't quite work: it creates a string without a line feed, but then if you use disp() or omit the semicolon, disp's own display logic will add a line feed.