How can I get a substring of a string in Emacs Lisp? - emacs

When I have a string like "Test.m", how can I get just the substring "Test" from that via elisp? I'm trying to use this in my .emacs file.

One way is to use substring (or substring-no-properties):
(substring "Test.m" 0 -2) => "Test"
(substring STRING FROM &optional TO )
Return a new string whose contents are a substring of STRING. The
returned string consists of the characters between index FROM
(inclusive) and index TO (exclusive) of STRING. FROM and TO are
zero-indexed: 0 means the first character of STRING. Negative values
are counted from the end of STRING. If TO is nil, the substring runs
to the end of STRING.

Stefan's answer is idiomatic when you just need a filename without its extension. However, if you manipulate files and file paths heavily in your code, I recommend installing Johan Andersson's f.el file and directory API, because it provides many functions absent in Emacs with a consistent API. Check out functions f-base and f-no-ext:
(f-base "~/doc/index.org") ; => "index"
(f-no-ext "~/doc/index.org") ; => "~/doc/index"
If, instead, you work with strings often, install Magnar Sveen's s.el for the same reasons. You might be interested in s-chop-suffix:
(s-chop-suffix ".org" "~/doc/index.org") ; => "~/doc/index"
For generic substring retrieval use dkim's answer.

In your particular case, you might like to use file-name-sans-extension.

Probably the most flexible option (although it's not clear if you need flexibility) would be to use replace-regexp-in-string:
See C-h f replace-regexp-in-string RET
e.g.:
(replace-regexp-in-string "\\..*" "" "Test.m")
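The regexp above deletes everything from the first dot onward, while file-name-sans-extension drops only the last extension, so the two differ on names with several dots. A rough sketch of that difference, written in Python purely for illustration (the function names and the archive.tar.gz input are made up for this example):

```python
import os.path
import re


def strip_from_first_dot(name):
    # Same idea as the regexp "\\..*": delete the first dot and everything after it.
    return re.sub(r"\..*", "", name)


def strip_last_extension(name):
    # Closer to file-name-sans-extension: drop only the final extension.
    return os.path.splitext(name)[0]


print(strip_from_first_dot("Test.m"))          # Test
print(strip_last_extension("Test.m"))          # Test
print(strip_from_first_dot("archive.tar.gz"))  # archive
print(strip_last_extension("archive.tar.gz"))  # archive.tar
```

For "Test.m" both give the same answer, so the choice only matters once file names can contain more than one dot.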

Related

Regsubbing simple matches

I'm looking for a regsub example that does the following:
123tcl456TCL789 => 123!tcl!456!TCL!789
This is an Tcl example => This is an !Tcl! example
Yes, I could use string first to find a position and mash things together, but I have seen a regsub command in the past that does what I want and I can't recall it. What would be the regsub command that allows that? I would guess regsub -all -nocase is a start.
I am bad at regsub and regexps. I wonder if there is a site or tool/script that we can supply a string, the final result and then we get the regsub form.
You're looking at the right tool, but there are various options, depending on exactly what the conditions are when faced with other text. Here's one that wraps each occurrence of "Tcl" (any capitalisation) with exclamation marks:
set inputString "123tcl456TCL789"
set replaced [regsub -all -nocase {tcl} $inputString {!&!}]
puts $replaced
That's using a very simple regular expression with the -nocase option, and the replacement means "put ! on either side of the substring matched".
Another option, perhaps more generally applicable, is to put ! after any run of letters that is followed by a digit, or any run of digits that is followed by a letter.
set replaced [regsub -all {[A-Za-z]+(?=[0-9])|[0-9]+(?=[A-Za-z])} $inputString {&!}]
Note that doing things correctly typically requires understanding the real input data fairly well. For example, whether the numbers include floating point numbers in scientific notation, or whether the substrings to delimit are of fixed length.
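For comparison, here is a small sketch of the same two substitutions in Python rather than Tcl. It assumes Python's re module, where \g<0> plays the role that & plays in the Tcl replacement string; the function names are invented for this illustration.

```python
import re


def wrap_tcl(text):
    # \g<0> is the whole match, like & in the Tcl replacement string.
    return re.sub(r"tcl", r"!\g<0>!", text, flags=re.IGNORECASE)


def mark_boundaries(text):
    # Put ! after a letter run followed by a digit,
    # or after a digit run followed by a letter.
    pattern = r"[A-Za-z]+(?=[0-9])|[0-9]+(?=[A-Za-z])"
    return re.sub(pattern, r"\g<0>!", text)


print(wrap_tcl("123tcl456TCL789"))         # 123!tcl!456!TCL!789
print(wrap_tcl("This is an Tcl example"))  # This is an !Tcl! example
print(mark_boundaries("123tcl456TCL789"))  # 123!tcl!456!TCL!789
```

The lookahead version does not consume the character after the run, which is why the trailing 789 gets no exclamation mark.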

Replace every non letter or number character in a string with another

Context
I am designing a code that runs a bunch of calculations, and outputs figures. At the end of the code, I want to save everything in a nice way, so my take on this is to go to a user specified Output directory, create a new folder and then run the save process.
Question(s)
My question is twofold:
I want my folder name to be unique. I was thinking about getting the current date and time and creating a unique name from this and the input filename. This works but it generates folder names that are a bit cryptic. Is there some good practice / convention I have not heard of to do that?
When I get the datetime string (tn = datestr(now);), it looks like that:
tn =
'07-Jul-2022 09:28:54'
To convert it to a nice filename, I replace the '-', ' ' and ':' characters with underscores and append it to a shorter version of the input filename chosen by the user. I do that using strrep:
tn = strrep(tn,'-','_');
tn = strrep(tn,' ','_');
tn = strrep(tn,':','_');
This is fine but it bugs me to have to use 3 lines of code to do so. Is there a nice one liner to do that? More generally, is there a way to look for every non letter or number character in a string and replace it with a given character? I bet that's what regexp is there for but frankly I can't quite get a hold on how regexps work.
Your point (1) is opinion based so you might get a variety of answers, but I think a common convention is to at least start the name with a reverse-order date string so that sorting alphabetically is the same as sorting chronologically (i.e. yymmddHHMMSS).
To answer your main question directly, you can use the built-in makeValidName utility which is designed for making valid variable names, but works for making similarly "plain" file names.
str = '07-Jul-2022 09:28:54';
str = matlab.lang.makeValidName(str)
% str = 'x07_Jul_202209_28_54'
Because a valid variable can't start with a number, it prefixes an x - you could avoid this by manually prefixing something more descriptive first.
This option is a bit more simple than working out the regex, although that would be another option which isn't too nasty here using regexprep and replacing non-alphanumeric chars with an underscore:
str = regexprep( str, '\W', '_' ); % \W (capital W) matches any char that is not a letter, digit or underscore
% str = '07_Jul_2022_09_28_54'
To answer indirectly with a different approach, a nice trick with datestr which gets around this issue and addresses point (1) in one hit is to use the following syntax:
str = datestr( now(), 30 );
% str = '20220707T094214'
The 30 input (from the docs) gives you an ISO standardised string to the nearest second in reverse-order:
'yyyymmddTHHMMSS' (ISO 8601)
(note the T in the middle isn't a placeholder for some time measurement, it remains a literal letter T to split the date and time parts).
I normally use your folder naming approach with a meaningful prefix, replacing ':' by something else:
folder_name = ['results_' strrep(datestr(now), ':', '.')];
As for your second question, you can use isstrprop:
folder_name(~isstrprop(folder_name, 'alphanum')) = '_';
Or if you want more control on the allowed characters you can use good old ismember:
folder_name(~ismember(folder_name, ['0':'9' 'a':'z' 'A':'Z'])) = '_';
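The two ideas from this thread (replace everything that is not a letter or digit, and prefer a sortable yyyymmddTHHMMSS stamp) can be sketched outside MATLAB as well. The following is an illustrative Python version, not MATLAB code; sanitize and sortable_stamp are hypothetical helper names, and the timestamp is fixed so the output is reproducible.

```python
import re
from datetime import datetime


def sanitize(name, fill="_"):
    # Replace every character that is not an ASCII letter or digit.
    return re.sub(r"[^0-9A-Za-z]", fill, name)


def sortable_stamp(moment):
    # Same layout as datestr(now, 30): yyyymmddTHHMMSS.
    return moment.strftime("%Y%m%dT%H%M%S")


moment = datetime(2022, 7, 7, 9, 28, 54)    # fixed example time
print(sanitize("07-Jul-2022 09:28:54"))     # 07_Jul_2022_09_28_54
print("results_" + sortable_stamp(moment))  # results_20220707T092854
```

With the second form, sorting folder names alphabetically also sorts them chronologically.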

How to sort unicode strings alphabetically in Common Lisp?

This:
(sort (list "Aaa" "Ééé" "Zzz") #'string-lessp)
;; ("Aaa" "Zzz" "Ééé")
is not satisfying, because "Ééé" should come before "Zzz".
How can we sort unicode strings alphabetically?
My current approach has been to create a copy of the strings, replace accented letters by their unaccented counterparts (with cl-slug:asciify, which calls ppcre:regexp-replace-all), sort the copies, and then display the original strings in that order.
Thanks.
If you use SBCL, you have integrated support for Unicode (see "String operations" in the SBCL manual).
Try to sort with sb-unicode:unicode< instead of string-lessp.
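The workaround described in the question (strip the accents, sort on the stripped copy, show the original strings) can be sketched as a sort key. This is a Python illustration rather than Common Lisp, and it is only an approximation: it is not a full Unicode collation, and accent_insensitive_key is a name made up for the example.

```python
import unicodedata


def accent_insensitive_key(text):
    # Decompose accented letters (NFD), drop the combining marks, then fold case.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["Aaa", "Zzz", "Ééé"]
print(sorted(words, key=accent_insensitive_key))  # ['Aaa', 'Ééé', 'Zzz']
```

Because the key is computed per element and the originals are what get returned, no separate copy of the list has to be kept in sync.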

Warning Control Character '\S' is not valid when concatenating two strings

I have two variables such as:
path='data\voc11\SegmentationClassExt\%s.png'
name='123'
I want to concatenate two strings into one like so:
data\voc11\SegmentationClassExt\123.png
I used the code below:
sprintf(path, name)
However I receive the following error:
Warning: Control Character '\S' is not valid. See 'doc sprintf' for control characters valid in the format string.
ans =
dataoc11
I am using MATLAB on Windows. Could you suggest a solution? If I change the format to path='data\\voc11\\SegmentationClassExt\\%s.png', the code above works. However, the data I actually have is
path='data\voc11\SegmentationClassExt\%s.png';
Use the MATLAB function fullfile. Pass it the folder alone, without the %s.png part:
folder = 'data\voc11\SegmentationClassExt';
filename = fullfile ( folder, [name '.png'] );
or
filename = fullfile ( folder, sprintf ( '%s.png', name ) );
Note: you should avoid using path as a variable name, as it is already a MATLAB function.
Before we start, it's highly advised that you do not use path as a variable name. path is a built-in function that MATLAB uses to view or change its search path, which is how it finds functions, including those from toolboxes. Assigning your own string to path shadows that function and can break code that relies on it. Use a different variable name.
Now, to resolve your problem, you can either use fullfile as matlabgui has suggested or, if you don't care about OS compatibility and only work on Windows, double the backslashes in the format string. You can do that by editing the string by hand, as you did, or by using a string replacement function so that every backslash is followed by an additional backslash.
Either one of these two methods will work:
Method 1 - Using regular expressions
pat = 'data\voc11\SegmentationClassExt\%s.png';
pat_new = regexprep(pat, '\\', '\\\\');
The function regexprep performs a string replacement using regular expressions. We search for all single backslashes and replace them with double backslashes. Note that the backslash \ is a special character in regular expressions, so if you explicitly want to look for backslashes, you must place an additional backslash beside it.
Method 2 - Using strrep
pat = 'data\voc11\SegmentationClassExt\%s.png';
pat_new = strrep(pat, '\', '\\');
strrep stands for String Replace. It works much like the regular expression replacement discussed above. However, what's nice is that you don't have to add an extra backslash when looking for the actual character.
Once you do this, you can use sprintf as normal:
pat_new = sprintf(pat_new, name);
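The contrast between the two methods (a regex replacement needs the backslash escaped, a plain string replacement does not) is not specific to MATLAB. Here is a hedged Python sketch of the same doubling step. Note that Python's own % formatting does not interpret backslashes, so this only illustrates the two replacement styles and is not a fix you would need in Python.

```python
import re

# Raw string: the backslashes are literal characters, as in the MATLAB char array.
pat = r"data\voc11\SegmentationClassExt\%s.png"

# Regex route: the backslash is escaped in both the pattern and the replacement.
via_regex = re.sub(r"\\", r"\\\\", pat)

# Plain string route: literal text in, literal text out.
via_replace = pat.replace("\\", "\\\\")

print(via_regex == via_replace)  # True
print(via_replace)
```

Both routes produce the same doubled-backslash string; the plain replacement is simply easier to read.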

matlab regexprep

How to use matlab regexprep , for multiple expression and replacements?
file='http:xxx/sys/tags/Rel/total';
I want to replace 'sys' with 'sys1' and 'total' with 'total1'. For a single expression and replacement it works like this:
strrep(file,'sys', 'sys1')
and want to have like
strrep(file,'sys','sys1','total','total1') .
I know this doesn't work for strrep
Why not just issue the command twice?
file = 'http:xxx/sys/tags/Rel/total';
file = strrep(file,'sys','sys1');
file = strrep(file,'total','total1')
To solve it you need regex substitution. Look in MATLAB's regex functions for something similar to this PHP:
$string = 'http:xxx/sys/tags/Rel/total';
preg_replace('/http:(.*?)\//', 'http:${1}1/', $string);
${1} means 1st match group, that is what in parenthesis, (.*?).
http:(.*?)\/ - match pattern
http:${1}1/ - replace pattern with second 1 as you wish to add (first 1 is a group number)
http:xxx/sys/tags/Rel/total - input string
The key is that whatever is matched by (.*?) (whether xxx or yyyy or 1234) is inserted in place of ${1} in the replace pattern, and the result then replaces the matched text in the input string. See the PHP documentation for more examples of substitution.
As documented in the help page for regexprep, you can specify pairs of patterns and replacements like this:
file='http:xxx/sys/tags/Rel/total';
regexprep(file, {'sys' 'total'}, {'sys1' 'total1'})
ans =
http:xxx/sys1/tags/Rel/total1
It is even possible to use tokens, should you be able to define a match pattern for everything you want to replace:
regexprep(file, '/([st][yo][^/$]*)', '/$11')
ans =
http:xxx/sys1/tags/Rel/total1
However, care must be taken with the first approach under certain circumstances, because MATLAB replaces the pairs one after another. That is to say if, say, the first pattern matches a string and replaces it with something that is subsequently matched by a later pattern, then that will also be replaced by the later replacement, even though it might not have matched the later pattern in the original string.
Example:
regexprep('This\is{not}LaTeX.', {'\\' '([{}])'}, {'\\textbackslash{}' '\\$1'})
ans =
This\textbackslash\{\}is\{not\}LaTeX.
=> This\{}is{not}LaTeX.
and
regexprep('This\is{not}LaTeX.', {'([{}])' '\\'}, {'\\$1' '\\textbackslash{}'})
ans =
This\textbackslash{}is\textbackslash{}{not\textbackslash{}}LaTeX.
=> This\is\not\LaTeX.
Both results are unintended, and there seems to be no way around this with consecutive replacements instead of simultaneous ones.
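To make the contrast with a simultaneous replacement concrete, here is a minimal sketch in Python (not MATLAB) that handles the same LaTeX example in a single pass. The LATEX_ESCAPES table and the escape_latex name are hypothetical, chosen for this illustration; whether regexprep can be made to behave the same way is not addressed here.

```python
import re

# Hypothetical replacement table for this illustration.
LATEX_ESCAPES = {"\\": r"\textbackslash{}", "{": r"\{", "}": r"\}"}


def escape_latex(text):
    # One character class, one pass: text produced by a replacement
    # is never scanned again, so the rules cannot interfere.
    return re.sub(r"[\\{}]", lambda m: LATEX_ESCAPES[m.group(0)], text)


print(escape_latex(r"This\is{not}LaTeX."))  # This\textbackslash{}is\{not\}LaTeX.
```

Because each input character is matched at most once, the braces inserted by \textbackslash{} are left alone, which is exactly what the two sequential orderings fail to do.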