Sas9 Replace special characters with underscores - special-characters

In Sas9, how can I replace all the , \ / or spaces, and other special characters of my choosing with underscores? A solution either in a datastep or in macro functions would do the trick, I'm just looking for a method to do it.
Thanks

You can use the Perl regular expression functionality built into SAS.
data tmp;
set tmp;
var1 = prxchange('s/[,\/\\]/_/', -1, var);
run;
or something similar.

The translate function might be what you're looking for
field2 = translate(trim(field_name),'_______',' ,.\/()')
Make sure to have as many underscores as you have special characters. Also, because you're translating spaces, you have to use the trim function or else you'll get a bunch of underscores after the name.

Related

Replace every non letter or number character in a string with another

Context
I am designing a code that runs a bunch of calculations, and outputs figures. At the end of the code, I want to save everything in a nice way, so my take on this is to go to a user specified Output directory, create a new folder and then run the save process.
Question(s)
My question is twofold:
I want my folder name to be unique. I was thinking about getting the current date and time and creating a unique name from this and the input filename. This works but it generates folder names that are a bit cryptic. Is there some good practice / convention I have not heard of to do that?
When I get the datetime string (tn = datestr(now);), it looks like that:
tn =
'07-Jul-2022 09:28:54'
To convert it to a nice filename, i replace the '-',' ' and ':' characters by underscores and append it to a shorter version of the input filename chosen by the user. I do that using strrep:
tn = strrep(tn,'-','_');
tn = strrep(tn,' ','_');
tn = strrep(tn,':','_');
This is fine but it bugs me to have to use 3 lines of code to do so. Is there a nice one liner to do that? More generally, is there a way to look for every non letter or number character in a string and replace it with a given character? I bet that's what regexp is there for but frankly I can't quite get a hold on how regexps work.
Your point (1) is opinion based so you might get a variety of answers, but I think a common convention is to at least start the name with a reverse-order date string so that sorting alphabetically is the same as sorting chronologically (i.e. yymmddHHMMSS).
To answer your main question directly, you can use the built-in makeValidName utility which is designed for making valid variable names, but works for making similarly "plain" file names.
str = '07-Jul-2022 09:28:54';
str = matlab.lang.makeValidName(str)
% str = 'x07_Jul_202209_28_54'
Because a valid variable can't start with a number, it prefixes an x - you could avoid this by manually prefixing something more descriptive first.
This option is a bit more simple than working out the regex, although that would be another option which isn't too nasty here using regexprep and replacing non-alphanumeric chars with an underscore:
str = regexprep( str, '\W', '_' ); % \W (capital W) matches all non-alphanumeric chars
% str = '07_Jul_2022_09_28_54'
To answer indirectly with a different approach, a nice trick with datestr which gets around this issue and addresses point (1) in one hit is to use the following syntax:
str = datestr( now(), 30 );
% str = '20220707T094214'
The 30 input (from the docs) gives you an ISO standardised string to the nearest second in reverse-order:
'yyyymmddTHHMMSS' (ISO 8601)
(note the T in the middle isn't a placeholder for some time measurement, it remains a literal letter T to split the date and time parts).
I normally use your folder naming approach with a meaningful prefix, replacing ':' by something else:
folder_name = ['results_' strrep(datestr(now), ':', '.')];
As for your second question, you can use isstrprop:
folder_name(~isstrprop(folder_name, 'alphanum')) = '_';
Or if you want more control on the allowed characters you can use good old ismember:
folder_name(~ismember(folder_name, ['0':'9' 'a':'z' 'A':'Z'])) = '_';

Strip out the characters which is non numeric, dashes and pipes

I am trying to find a solution but somehow i am getting wrong output (referred some online solutions and confusing myself. please advise where i am going wrong.
I need to Strip out any characters that is non-numeric,dash "-" or pipe "|" using plsql.
As an example:
if I need to filter the string 0094-78556232_imk*.ext|4444; the output should be 0094-78556232|4444
Use REGEXP_REPLACE:
SELECT
col,
REGEXP_REPLACE (col, '[^0-9|-]', '') AS col_updated
FROM yourTable;
Demo
Don't use regexp_replace, especially if performance is important.
Instead use the standard string function TRANSLATE. Like so:
select col,
translate(col, '0123456789|-' || col, '01234567890|-') as col_updated
from yourTable;
This translates each character in the col value, according to the following scheme: 0 is translated to itself, ...., - is translated to itself. Any other character in col, which is not in this list already, is "translated" to nothing, since there is nothing for it to be translated to in the third argument to the function. So those characters that are NOT on the list are simply removed from the string.

How to replace content within quotes via a file

Why I cannot use str_replace() to replace content between the ""? While I replace links within a file they get skipped since they are within quotes.
Example.
href="/path/to/file/is/here"
should be
href="/New/Path/To/File/Goes/Here"
If the paths/urls were not in quotes, str_replace() would work.
I'm assuming this is PHP. So, from the examples here:
http://php.net/manual/en/function.str-replace.php
You can see that you should not intercalate the same type of quotes.
So try changing the quotes in your code to single quotes or, change, the double quotes in your html to single quotes.
If that's not it, I hope at least that doc reference helps you.
This might help I usually code in java but php is pretty similar. Next time input part of your code so that the community can see your logic.
In your if statement on line 67 the 3rd variable $stringToSearch should be regex not the string your assigning it to. The purpose of regex as you know is to replace characters you don't want in your code as you already know
What you had that was not working:
// replacing string from files
//$stringToSearch = str_replace('"', "!!", $stringToSearch);
$stringToSearch = str_replace($toBeReplaced, $toBeReplacedWith, $stringToSearch);
//$stringToSearch = str_replace("!!", '"', $stringToSearch);
What I am thinking it should be:
$stringToRegex = str_replace('"', "!!", $stringToSearch);
$stringToSearch = str_replace($toBeReplaced, $toBeReplacedWith, $stringToRegex );
If anyone else has any suggestion it would be appreciated as i don't code in php.

Warning Control Character '\S' is not valid when concatinating two strings

I have two variables such as:
path='data\voc11\SegmentationClassExt\%s.png'
name='123'
I want to concatenate two strings into one like so:
data\voc11\SegmentationClassExt\123.png
I used the code below:
sprintf(path, name)
However I receive the following error:
Warning: Control Character '\S' is not valid. See 'doc sprintf' for control characters valid in the format string.
ans =
dataoc11
I am using MATLAB on Windows. Could you give me any solution for that. I tried to change path='data\\voc11\\SegmentationClassExt\\%s.png' and when I did that, the above code will work. However, the current data is
path='data\voc11\SegmentationClassExt\%s.png';
use the matlab function fullfile
filename = fullfile ( path, [name '.png'] );
or
filename = fullfile ( path, sprintf ( '%s.png', name ) );
Note: you should avoid using path as a variable as it is already a Matlab function
Before we start, it's highly advised that you do not use path as a local variable. path is a global variable that MATLAB uses to resolve function scope, especially if you are going to use any functions from toolboxes. Overwriting path with your own string will actually make MATLAB not function properly. Use a different variable name.
Now to resolve your problem, you can use either fullfile as what #matlabgui has suggested, or if you don't care about OS compatibility and are only working in Windows, you can either manually change the path as you have placed so that you can introduce two back slashes and it will indeed work on Windows OS, or you can perhaps use a string replace function so that all back slashes will be accompanied with an additional back slash.
Either one of these two methods will work:
Method 1 - Using regular expressions
pat = 'data\voc11\SegmentationClassExt\%s.png';
pat_new = regexprep(pat, '\\', '\\\\');
The function regexprep performs a string replacement by regular expressions. We search for all single backslashes and replace them with double backslashes. Note that the single back slash \ is a special character in regular expressions so if you explicitly what to look for back slashes, you must place an additional back slash beside it.
Method 2 - Using strrep
pat = 'data\voc11\SegmentationClassExt\%s.png';
pat_new = strrep(pat, '\', '\\');
strrep stands for String Replace. It works very similar to regular expressions as we have discussed above. However, what's nice is that you don't have to append an additional back slash when looking for the actual character.
Once you do this, you can use sprintf as normal:
pat_new = sprintf(pat_new, name);

T-SQL syntax issue with "LTRIM(RTRIM())" not working correctly

What is wrong with this statement that it is still giving me spaces after the field. This makes me think that the syntax combining the WHEN statements is off. My boss wants them combined in one statement. What am I doing wrong?
Case WHEN LTRIM(RTRIM(cSHortName))= '' Then NULL
WHEN cShortname is NOT NULL THEN
REPLACE (cShortName,SUBSTRING,(cShortName,PATINDEX('%A-Za-z0-9""},1,) ''_
end AS SHORT_NAME
Judging from the code, it seems that you may be trying to strip spaces and non-alphanumeric characters from the beginning and ending of the string.
If so, would this work for you?
I think it provides the substring from the first alphanumeric occurrence to the last.
SELECT
SUBSTRING(
cShortName,
PATINDEX('%A-Za-z0-9',cShortName),
( LEN(cShortName)
-PATINDEX('%A-Za-z0-9',REVERSE(cShortName))
-PATINDEX('%A-Za-z0-9',cShortName)
)
) AS SHORTNAME
Replace TRIM with LTRIM.
You can also test LEN(cShortName) = 0
Ummm there seems to be some problems in this script, but try this.
Case
WHEN LTRIM(RTRIM(cSHortName))= '' Then NULL
WHEN cShortname is NOT NULL THEN REPLACE(cShortName, SUBSTRING(cShortName, PATINDEX('%A-Za-z0-9', 1) , ''), '')
end AS SHORT_NAME
Why do you think it is supposed not to give you spaces after the field?
Edit:
As far as I understand you are trying to remove any characters from the string that do not match this regular expression range [a-zA-Z0-9] (add any other characters that you want to preserve).
I see no clean way to do that in Microsoft SQL Server (you are using Microsoft SQL Server it seems) using the built-in functions. There are some examples on the web that use a temporary table and a while loop, but this is ugly. I would either return the strings as is and process them on the caller side, or write a function that does that using the CLR and invoke it from the select statement.
I hope this helps.