matlab regexprep - matlab

How to use matlab regexprep , for multiple expression and replacements?
file='http:xxx/sys/tags/Rel/total';
I want to replace 'sys' with sys1 and 'total' with 'total1'. For a single expression a replacement it works like this:
strrep(file,'sys', 'sys1')
and want to have like
strrep(file,'sys','sys1','total','total1') .
I know this doesn't work for strrep

Why not just issue the command twice?
file = 'http:xxx/sys/tags/Rel/total';
file = strrep(file,'sys','sys1')
strrep(file,'total','total1')

To solve it you need substitute functionality with regex, try to find in matlab's regexes something similar to this in php:
$string = 'http:xxx/sys/tags/Rel/total';
preg_replace('/http:(.*?)\//', 'http:${1}1/', $string);
${1} means 1st match group, that is what in parenthesis, (.*?).
http:(.*?)\/ - match pattern
http:${1}1/ - replace pattern with second 1 as you wish to add (first 1 is a group number)
http:xxx/sys/tags/Rel/total - input string
The secret is that whatever is matched by (.*?) (whether xxx or yyyy or 1234) will be inserted instead of ${1} in replace pattern, and then replace instead of old stuff into the input string. Welcome to see more examples on substitute functionality in php.

As documented in the help page for regexprep, you can specify pairs of patterns and replacements like this:
file='http:xxx/sys/tags/Rel/total';
regexprep(file, {'sys' 'total'}, {'sys1' 'total1'})
ans =
http:xxx/sys1/tags/Rel/total1
It is even possible to use tokens, should you be able to define a match pattern for everything you want to replace:
regexprep(file, '/([st][yo][^/$]*)', '/$11')
ans =
http:xxx/sys1/tags/Rel/total1
However, care must be taken with the first approach under certain circumstances, because MATLAB replaces the pairs one after another. That is to say if, say, the first pattern matches a string and replaces it with something that is subsequently matched by a later pattern, then that will also be replaced by the later replacement, even though it might not have matched the later pattern in the original string.
Example:
regexprep('This\is{not}LaTeX.', {'\\' '([{}])'}, {'\\textbackslash{}' '\\$1'})
ans =
This\textbackslash\{\}is\{not\}LaTeX.
=> This\{}is{not}LaTeX.
and
regexprep('This\is{not}LaTeX.', {'([{}])' '\\'}, {'\\$1' '\\textbackslash{}'})
ans =
This\textbackslash{}is\textbackslash{}{not\textbackslash{}}LaTeX.
=> This\is\not\LaTeX.
Both results are unintended, and there seems to be no way around this with consecutive replacements instead of simultaneous ones.

Related

Replace every non letter or number character in a string with another

Context
I am designing a code that runs a bunch of calculations, and outputs figures. At the end of the code, I want to save everything in a nice way, so my take on this is to go to a user specified Output directory, create a new folder and then run the save process.
Question(s)
My question is twofold:
I want my folder name to be unique. I was thinking about getting the current date and time and creating a unique name from this and the input filename. This works but it generates folder names that are a bit cryptic. Is there some good practice / convention I have not heard of to do that?
When I get the datetime string (tn = datestr(now);), it looks like that:
tn =
'07-Jul-2022 09:28:54'
To convert it to a nice filename, i replace the '-',' ' and ':' characters by underscores and append it to a shorter version of the input filename chosen by the user. I do that using strrep:
tn = strrep(tn,'-','_');
tn = strrep(tn,' ','_');
tn = strrep(tn,':','_');
This is fine but it bugs me to have to use 3 lines of code to do so. Is there a nice one liner to do that? More generally, is there a way to look for every non letter or number character in a string and replace it with a given character? I bet that's what regexp is there for but frankly I can't quite get a hold on how regexps work.
Your point (1) is opinion based so you might get a variety of answers, but I think a common convention is to at least start the name with a reverse-order date string so that sorting alphabetically is the same as sorting chronologically (i.e. yymmddHHMMSS).
To answer your main question directly, you can use the built-in makeValidName utility which is designed for making valid variable names, but works for making similarly "plain" file names.
str = '07-Jul-2022 09:28:54';
str = matlab.lang.makeValidName(str)
% str = 'x07_Jul_202209_28_54'
Because a valid variable can't start with a number, it prefixes an x - you could avoid this by manually prefixing something more descriptive first.
This option is a bit more simple than working out the regex, although that would be another option which isn't too nasty here using regexprep and replacing non-alphanumeric chars with an underscore:
str = regexprep( str, '\W', '_' ); % \W (capital W) matches all non-alphanumeric chars
% str = '07_Jul_2022_09_28_54'
To answer indirectly with a different approach, a nice trick with datestr which gets around this issue and addresses point (1) in one hit is to use the following syntax:
str = datestr( now(), 30 );
% str = '20220707T094214'
The 30 input (from the docs) gives you an ISO standardised string to the nearest second in reverse-order:
'yyyymmddTHHMMSS' (ISO 8601)
(note the T in the middle isn't a placeholder for some time measurement, it remains a literal letter T to split the date and time parts).
I normally use your folder naming approach with a meaningful prefix, replacing ':' by something else:
folder_name = ['results_' strrep(datestr(now), ':', '.')];
As for your second question, you can use isstrprop:
folder_name(~isstrprop(folder_name, 'alphanum')) = '_';
Or if you want more control on the allowed characters you can use good old ismember:
folder_name(~ismember(folder_name, ['0':'9' 'a':'z' 'A':'Z'])) = '_';

Matlab - Help in listing files using a name-pattern

I'm trying to create a function that lists the content of a folder based on a pattern, however the listing includes more files than needed. I'll explain by an example: Consider a folder containing the files
file.dat
file.dat._
file.dat.000
file.dat.001
...
file.dat.999
I am interested only in the files that are .000, .001 and so on. The files file.dat and file.dat._ are to be excluded.
The later numbering can also be .0000,.0001 and so on, so number of digits is not necessarily 3.
I tried using the Dir command with the pattern file.dat.* - this included file.dat for some reason (Why the last comma treated differently?) and file.dat._, which was expected.
The "obvious" set of solutions is to add an additional regular expression or length check - however I would like to avoid that, if possible.
This needs to work both under UNIX and Windows (and preferably MacOS).
Any elegant solutions?
Get all filenames with dir and filter them using with the regex '^file\.dat\.\d+$'. This matches:
start of the string (^)
followed by the string file.dat. (file\.dat\.)
followed by one or more digits (\d+)
and then the string must end ($)
Since the output of dir is a cell array of char vectors, regex returns a cell array with the matching indices of each char vector. The matching indices can only be 1 or [], so any is applied to each cell's content to reduce it to true or false The resulting logical index tells which filenames should be kept.
f = dir('path/to/folder');
names = {f.name};
ind = cellfun(#any, regexp(names, '^file\.dat\.\d+$'));
names = names(ind);

Partial String Replacement using PowerShell

Problem
I am working on a script that has a user provide a specific IP address and I want to mask this IP in some fashion so that it isn't stored in the logs. My problem is, that I can easily do this when I know what the first three values of the IP typically are; however, I want to avoid storing/hard coding those values into the code to if at all possible. I also want to be able to replace the values even if the first three are unknown to me.
Examples:
10.11.12.50 would display as XX.XX.XX.50
10.12.11.23 would also display as XX.XX.XX.23
I have looked up partial string replacements, but none of the questions or problems that I found came close to doing this. I have tried doing things like:
# This ended up replacing all of the numbers
$tempString = $str -replace '[0-9]', 'X'
I know that I am partway there, but I aiming to only replace only the first 3 sets of digits so, basically every digit that is before a '.', but I haven't been able to achieve this.
Question
Is what I'm trying to do possible to achieve with PowerShell? Is there a best practice way of achieving this?
Here's an example of how you can accomplish this:
Get-Content 'File.txt' |
ForEach-Object { $_ = $_ -replace '\d{1,3}\.\d{1,3}\.\d{1,3}','xx.xx.xx' }
This example matches a digit 1-3 times, a literal period, and continues that pattern so it'll capture anything from 0-999.0-999.0-999 and replace with xx.xx.xx
TheIncorrigible1's helpful answer is an exact way of solving the problem (replacement only happens if 3 consecutive .-separated groups of 1-3 digits are matched.)
A looser, but shorter solution that replaces everything but the last .-prefixed digit group:
PS> '10.11.12.50' -replace '.+(?=\.\d+$)', 'XX.XX.XX'
XX.XX.XX.50
(?=\.\d+$) is a (positive) lookahead assertion ((?=...)) that matches the enclosed subexpression (a literal . followed by 1 or more digits (\d) at the end of the string ($)), but doesn't capture it as part of the overall match.
The net effect is that only what .+ captured - everything before the lookahead assertion's match - is replaced with 'XX.XX.XX'.
Applied to the above example input string, 10.11.12.50:
(?=\.\d+$) matches the .-prefixed digit group at the end, .50.
.+ matches everything before .50, which is 10.11.12.
Since the (?=...) part isn't captured, it is therefore not included in what is replaced, so it is only substring 10.11.12 that is replaced, namely with XX.XX.XX, yielding XX.XX.XX.50 as a result.

finding a comma in string

[23567,0,0,0,0,0] and other value is [452221,0,0,0,0,0] and the value should be contineously displaying about 100 values and then i want to display only the sensor value like in first sample 23567 and in second sample 452221 , only the these values have to display . For that I have written a code
value = str2double(str(2:7));see here my attempt
so I want to find the comma in the output and only display the value before first comma
As proposed in a comment by excaza, MATLAB has dedicated functions, such as sscanf for such purposes.
sscanf(str,'[%d')
which matches but ignores the first [, and returns the next (i.e. the first) number as a double variable, and not as a string.
Still, I like the idea of using regular expressions to match the numbers. Instead of matching all zeros and commas, and replacing them by '' as proposed by Sardar_Usama, I would suggest directly matching the numbers using regexp.
You can return all numbers in str (still as string!) with
nums = regexp(str,'\d*','match')
and convert the first number to a double variable with
str2double(nums{1})
To match only the first number in str, we can use the regexp
nums = regexp(str,'[(\d*),','tokens')
which finds a [, then takes an arbitrary number of decimals (0-9), and stops when it finds a ,. By enclosing the \d* in brackets, only the parts in brackets are returned, i.e. only the numbers without [ and ,.
Final Note: if you continue working with strings, you could/should consider the regexp solution. If you convert it to a double anyways, using sscanf is probably faster and easier.
You can use regexprep as follows:
str='[23567,0,0,0,0,0]' ;
required=regexprep(str(2:end-1),',0','')
%Taking str(2:end-1) to exclude brackets, and then removing all ,0
If there can be values other than 0 after , , you can use the following more general approach instead:
required=regexprep(str(2:end-1),',[-+]?\d*\.?\d*','')

Checking the format of a string in Matlab

So I'm reading multiple text files in Matlab that have, in their first columns, a column of "times". These times are either in the format 'MM:SS.milliseconds' (sorry if that's not the proper way to express it) where for example the string '29:59.9' would be (29*60)+(59)+(.9) = 1799.9 seconds, or in the format of straight seconds.milliseconds, where '29.9' would mean 29.9 seconds. The format is the same for a single file, but varies across different files. Since I would like the times to be in the second format, I would like to check if the format of the strings match the first format. If it doesn't match, then convert it, otherwise, continue. The code below is my code to convert, so my question is how do I approach checking the format of the string? In otherwords, I need some condition for an if statement to check if the format is wrong.
%% Modify the textdata to convert time to seconds
timearray = textdata(2:end, 1);
if (timearray(1, 1) %{has format 'MM.SS.millisecond}%)
datev = datevec(timearray);
newtime = (datev(:, 5)*60) + (datev(:, 6));
elseif(timearray(1, 1) %{has format 'SS.millisecond}%)
newtime = timearray;
You can use regular expressions to help you out. Regular expressions are methods of specifying how to search for particular patterns in strings. As such, you want to find if a string follows the formats of either:
xx:xx.x
or:
xx.x
The regular expression syntax for each of these is defined as the following:
^[0-9]+:[0-9]+\.[0-9]+
^[0-9]+\.[0-9]+
Let's step through how each of these work.
For the first one, the ^[0-9]+ means that the string should start with any number (^[0-9]) and the + means that there should be at least one number. As such, 1, 2, ... 10, ... 20, ... etc. is valid syntax for this beginning. After the number should be separated by a :, followed by another sequence of numbers of at least one or more. After, there is a . that separates them, then this is followed by another sequence of numbers. Notice how I used \. to specify the . character. Using . by itself means that the character is a wildcard. This is obviously not what you want, so if you want to specify the actual . character, you need to prepend a \ to the ..
For the second one, it's almost the same as the first one. However, there is no : delimiter, and we only have the . to work with.
To invoke regular expressions, use the regexp command in MATLAB. It is done using:
ind = regexp(str, expression);
str represents the string you want to check, and expression is a regular expression that we talked about above. You need to make sure you encapsulate your expression using single quotes. The regular expression is taken in as a string. ind would this return the starting index of your string of where the match was found. As such, when we search for a particular format, ind should either be 1 indicating that we found this search at the beginning of the string, or it returns empty ([]) if it didn't find a match. Here's a reproducible example for you:
B = {'29:59.9', '29.9', '45:56.8', '24.5'};
for k = 1 : numel(B)
if (regexp(B{k}, '^[0-9]+:[0-9]+\.[0-9]+') == 1)
disp('I''m the first case!');
elseif (regexp(B{k}, '^[0-9]+\.[0-9]+') == 1)
disp('I''m the second case!');
end
end
As such, the code should print out I'm the first case! if it follows the format of the first case, and it should print I'm the second case! if it follows the format of the second case. As such, by running this code, we get:
I'm the first case!
I'm the second case!
I'm the first case!
I'm the second case!
Without knowing how your strings are formatted, I can't do the rest of it for you, but this should be a good start for you.