Find whether a string is a substring of another string in SML/NJ

In SML/NJ, I want to find out whether a string is a substring of another string and, if so, find its index. Can anyone help me with this?

The Substring.position function is the only one I can find in the basis library that seems to do string search. Unfortunately, the Substring module is kind of hard to use, so I wrote the following function to use it. Just pass two strings, and it will return an option: NONE if not found, or SOME of the index if it is found:
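fun index (str, substr) = let
  (* suff is the longest suffix of str that starts with substr;
     it is empty (and sits at the end of str) when there is no match *)
  val (_, suff) = Substring.position substr (Substring.full str)
  (* i is the offset of suff within the underlying string *)
  val (_, i, _) = Substring.base suff
in
  if i = size str then
    NONE
  else
    SOME i
end;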

Well, you have all the substring functions; however, if you also want to know the position, the easiest way is to do it yourself with a linear scan.
Basically, you want to explode both strings and then compare the first character of the substring you want to find with each character of the source string, incrementing a position counter each time the comparison fails. When you find a match, you move to the next character in the substring as well, without moving the position counter. If the substring is "empty" (modeled as being left with the empty list), you have matched all of it and can return the position index. However, if the matching suddenly fails, you have to go back to where you had the first match, skip a letter (incrementing the position counter), and start all over again.
Hope this helps you get started on doing this yourself.
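The linear scan described above can be sketched as follows. This is only an illustration in Python rather than SML, and the name index_of is made up for the example. It returns None where the SML version would return NONE.

```python
from typing import Optional


def index_of(text: str, pattern: str) -> Optional[int]:
    """Return the first position of pattern in text, or None if absent."""
    # Try every start position at which the pattern could still fit.
    for start in range(len(text) - len(pattern) + 1):
        matched = 0
        # Advance through the pattern while the characters keep matching;
        # the start position (the "position counter") does not move here.
        while matched < len(pattern) and text[start + matched] == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            return start  # the whole pattern was consumed: match found
        # Otherwise go back to the first match, skip one character, and retry.
    return None
```

This takes time proportional to the product of the two lengths in the worst case, which is usually fine for short strings.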

Converting numbers into timestamps (inserting colons at specific places)

I'm using AutoHotkey for this as the code is the most understandable to me. So I have a document with numbers and text, for example like this
120344 text text text
234000 text text
and the desired output is
12:03:44 text text text
23:40:00 text text
I'm sure StrReplace can be used to insert the colons, but I'm not sure how to specify the position of the colons or ask AHK to 'find' specific strings of 6-digit numbers. Before, I would have highlighted the text I wanted to apply StrReplace to and then pressed a hotkey, but I was wondering if there is a more efficient way to do this that doesn't need my interaction. Even just pointing to the relevant functions I would need to look into would be helpful! Thanks so much, I'm still very new to programming.
hfontanez's answer was very helpful in figuring out that for this problem, I had to use a loop and substring function. I'm sure there are much less messy ways to write this code, but this is the final version of what worked for my purposes:
Loop, read, C:\[location of input file]
{
    ; Ignore the blank lines in the file
    If (A_LoopReadLine = "")
        Continue
    one := A_LoopReadLine
    x := SubStr(one, 1, 2) ; first pair of digits
    y := SubStr(one, 3, 2) ; second pair of digits
    z := SubStr(one, 5) ; third pair of digits plus the rest of the line
    two := x . ":" . y . ":" . z
    FileAppend, %two%`r`n, C:\[location of output file]
}
return
Assuming that the "timestamp" component is always 6 characters long and always at the beginning of the string, this solution should work just fine.
String test = "012345 test test test";
test = test.substring(0, 2) + ":" + test.substring(2, 4) + ":" + test.substring(4, test.length());
This outputs 01:23:45 test test test
Why? Because you are temporarily creating a String object that is two characters long, and then you insert the colon before taking the next pair. Lastly, you append the rest of the String and assign the result to whichever String variable you want. Remember, the substring method doesn't modify the String object you call it on; it returns a "new" String object. Therefore, the variable test is unmodified until the assignment operation kicks in at the end.
Alternatively, you can use a StringBuilder and append each component like this:
StringBuilder sbuff = new StringBuilder();
sbuff.append(test.substring(0,2));
sbuff.append(":");
sbuff.append(test.substring(2,4));
sbuff.append(":");
sbuff.append(test.substring(4,test.length()));
test = sbuff.toString();
You could also use a "fancy" loop to do this, but I think for something this simple, looping is just overkill. Oh, I almost forgot, this should work with both of your test strings because after the last colon insert, the code takes the substring from index position 4 all the way to the end of the string indiscriminately.
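If the six-digit number has to be "found" rather than assumed, a regular expression can do both the check and the insertion. The following is a rough sketch in Python, not AutoHotkey. The names TIMESTAMP and add_colons are made up for the example, and it assumes the number sits at the very start of each line.

```python
import re

# Exactly six digits at the start of a line, captured as three pairs.
# The trailing \b stops a longer run of digits from being treated as a timestamp.
TIMESTAMP = re.compile(r"^(\d{2})(\d{2})(\d{2})\b")


def add_colons(line: str) -> str:
    """Insert colons into a leading six-digit number; leave other lines alone."""
    return TIMESTAMP.sub(r"\1:\2:\3", line)
```

Lines that are blank or do not start with six digits pass through unchanged, so no separate blank-line check is needed.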

Scala: Transforming List of Strings containing long descriptions to list of strings containing only last sentences

I have a List[String], for example:
val test=List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.",
..)
I want the result to be like:
List("I want this sentence", "But I want this sentence"..)
I tried few approaches but didn't work
test.map(x=>x.split(".").reverse.head)
test.map(x=>x.split(".").last)
Try using this
test.reverse.head.split("\\.").last
To handle any Exception
Try(List[String]().reverse.head.split("\\.").last).getOrElse("YOUR_DEFAULT_STRING")
You can map over your List, split each String, and then take the last element. Try the code below.
val list = List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.")
list.map(_.split("\\.").last.trim)
It will give you
List(I want this sentence, But I want this sentence)
test.map (_.split("\\.").last)
split takes a regular expression, and in a regular expression the dot matches any character, so you have to escape it.
Maybe you want to include question marks and bangs:
test.map (_.split("[!?.]").last)
and trim surrounding whitespace:
test.map (_.split("[!?.]").last.trim)
The reverse.head would have been a good idea if last didn't already exist:
scala> test.map (_.split("[!?.]").reverse.head.trim)
res138: List[String] = List(I want this sentence, But I want this sentence)
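For comparison, here is the same idea sketched in Python, with last_sentence as a made-up name. One difference worth knowing: Scala's split (which delegates to Java) drops trailing empty strings, while Python's re.split keeps them, so this sketch filters them out explicitly.

```python
import re


def last_sentence(text: str) -> str:
    """Return the last non-empty piece after splitting on '.', '!' or '?'."""
    # Inside a character class the dot is literal, so no escaping is needed.
    parts = [piece.strip() for piece in re.split(r"[!?.]", text)]
    # re.split keeps the empty string after the final terminator; drop empties.
    parts = [piece for piece in parts if piece]
    return parts[-1] if parts else ""
```

Returning an empty string for input without any sentence is an arbitrary choice for the sketch.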
You can do this a number of ways:
For each string in your original list: split by ., reverse the list, take the first value
test.map(_.split('.').reverse.headOption)
// List(Some( I want this sentence), Some( But I want this sentence))
.headOption results in Some("string") or None, and you can do something like a .getOrElse("no valid string found") on it. You can trim the unwanted whitespace if you want.
Regex match
test.map { sentence =>
val regex = ".*\\.\\s*([^.]*)\\.$".r
val regex(value) = sentence
value
}
This will fetch the text at the end of a longer string that is preceded by a full stop (plus optional whitespace) and followed by a final full stop. You can modify the regex to change the exact matching rules, and I recommend playing around with regex101.com if you fancy learning more regex. It's very good.
This solution is better for more complicated examples and requirements, but it's worth keeping in mind. If you are worried that the regex might not match, you can do something like checking if the regex matches before extracting it:
test.map { sentence =>
val regexString = ".*\\.\\s*([^.]*)\\.$"
val regex = regexString.r
if(sentence.matches(regexString)) {
val regex(value) = sentence
value
} else ""
}
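The guarded regex extraction above can be sketched in Python as well. The names LAST_SENTENCE and extract_last are made up for the example, and None is returned where the Scala version returns an empty string.

```python
import re
from typing import Optional

# Same pattern as above; fullmatch anchors it at both ends, so "$" is not needed.
LAST_SENTENCE = re.compile(r".*\.\s*([^.]*)\.")


def extract_last(sentence: str) -> Optional[str]:
    """Return the text between the last two full stops, or None if no match."""
    match = LAST_SENTENCE.fullmatch(sentence)
    return match.group(1) if match else None
```

Note that the pattern needs at least two full stops, so a string containing a single sentence does not match.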
Take the last after splitting the string by .
test.map(_.split('.').map(_.trim).lastOption)

How to strip everything except digits from a string in Scala (quick one liners)

This is driving me nuts... there must be a way to strip out all non-digit characters (or perform other simple filtering) in a String.
Example: I want to turn a phone number ("+72 (93) 2342-7772" or "+1 310-777-2341") into a simple numeric String (not an Int), such as "729323427772" or "13107772341".
I tried "[\\d]+".r.findAllIn(phoneNumber), which returns an iterator, and then I would have to recombine the matches into a String somehow... seems horribly wasteful.
I also came up with: phoneNumber.filter("0123456789".contains(_)) but that becomes tedious for other situations. For instance, removing all punctuation... I'm really after something that works with a regular expression so it has wider application than just filtering out digits.
Anyone have a fancy Scala one-liner for this that is more direct?
You can use filter, treating the string as a character sequence and testing the character with isDigit:
"+72 (93) 2342-7772".filter(_.isDigit) // res0: String = 729323427772
You can use replaceAll and Regex.
"+72 (93) 2342-7772".replaceAll("[^0-9]", "") // res1: String = 729323427772
Another approach, define the collection of valid characters, in this case
val d = '0' to '9'
and so for val a = "+72 (93) 2342-7772", filter on collection inclusion for instance with either of these,
for (c <- a if d.contains(c)) yield c
a.filter(d.contains)
a.collect{ case c if d.contains(c) => c }
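For readers who want the regex-based variants side by side, here is a small sketch in Python. The names digits_only and keep_matching are made up for the example. The second function shows how the "find all matches and rejoin them" idea generalizes to other character classes.

```python
import re


def digits_only(text: str) -> str:
    """Remove every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)


def keep_matching(text: str, pattern: str) -> str:
    """Keep only the pieces of text that match pattern, joined together."""
    # findall returns the matched substrings when the pattern has no groups.
    return "".join(re.findall(pattern, text))
```

For example, keep_matching(text, r"[A-Za-z ]") would strip punctuation and digits while keeping letters and spaces.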

Capitalizing only the first letters without changing any numbers or punctuation

I would like to modify a string so that the first letter of each word is capitalized, all other letters are lower-cased, and everything else is left unchanged.
I tried this:
function new_string=switchCase(str1)
%str1 represents the given string containing word or phrase
str1Lower=lower(str1);
spaces=str1Lower==' ';
caps1=[true spaces];
%we want the first letter and the letters after space to be capital.
strNew1=str1Lower;
strNew1(caps1)=strNew1(caps1)-32;
end
This function works nicely if there is nothing other than a letter after space. If we have anything else for example:
str1='WOW ! my ~Code~ Works !!'
Then it gives
new_string =
'Wow My ^code~ Works !'
However, it has to give (according to the requirement),
new_string =
'Wow! My ~code~ Works !'
I found some code for a similar problem, but it is ambiguous. Here I can ask questions if I don't understand.
Any help will be appreciated! Thanks.
Interesting question +1.
I think the following should fulfil your requirements. I've written it as an example sub-routine and broken down each step so it is obvious what I'm doing. It should be straightforward to condense it into a function from here.
Note, there is probably also a clever way to do this with a single regular expression, but I'm not very good with regular expressions :-) I doubt a regular expression based solution will run much faster than what I've provided (but am happy to be proven wrong).
%# Your example string
Str1 ='WOW ! my ~Code~ Works !!';
%# Convert case to lower
Str1 = lower(Str1);
%# Convert to ascii
Str1 = double(Str1);
%# Find an index of all locations after spaces
I1 = logical([0, (Str1(1:end-1) == 32)]);
%# Eliminate locations that don't contain lower-case characters
I1 = logical(I1 .* ((Str1 >= 97) & (Str1 <= 122)));
%# Check manually if the first location contains a lower-case character
if Str1(1) >= 97 && Str1(1) <= 122; I1(1) = true; end;
%# Adjust all appropriate characters in ascii form
Str1(I1) = Str1(I1) - 32;
%# Convert result back to a string
Str1 = char(Str1);
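The same steps can be sketched in Python for comparison. The name capitalize_words is made up for the example. Like the code above, it only touches ASCII lower-case letters that are at the start of the string or directly after a space.

```python
def capitalize_words(text: str) -> str:
    """Lower-case everything, then capitalize letters that start a word."""
    lowered = text.lower()
    chars = []
    for i, ch in enumerate(lowered):
        starts_word = i == 0 or lowered[i - 1] == " "
        # Only ASCII lower-case letters change, mirroring the 97..122 range check.
        if starts_word and "a" <= ch <= "z":
            chars.append(ch.upper())
        else:
            chars.append(ch)
    return "".join(chars)
```

Punctuation after a space (such as "!" or "~") is left alone because it fails the letter check.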

ignore spaces and cases MATLAB

diary_file = tempname();
diary(diary_file);
myFun();
diary('off');
output = fileread(diary_file);
I would like to search a string from output, but also to ignore spaces and upper/lower cases. Here is an example for what's in output:
the test : passed
number : 4
found = 'thetest:passed'
a = strfind(output,found )
How could I ignore spaces and cases from output?
Assuming you are not too worried about accidentally matching something like: 'thetEst:passed' here is what you can do:
Remove all spaces and only compare lower case
found = 'With spaces'
found = lower(found(found ~= ' '))
This will return
found =
withspaces
Of course you would also need to do this with each line of output.
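The "normalize both sides, then search" idea can be sketched like this in Python. The names normalize and find_ignoring_space_and_case are made up for the example. The returned index refers to the normalized text, not to the original output.

```python
def normalize(text: str) -> str:
    """Remove all whitespace and convert to lower case."""
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, so multi-line output is handled in one pass.
    return "".join(text.split()).lower()


def find_ignoring_space_and_case(haystack: str, needle: str) -> int:
    """Return the index of needle in the normalized haystack, or -1 if absent."""
    return normalize(haystack).find(normalize(needle))
```

Because line breaks are removed too, a match could in principle span two lines; normalize each line separately if that matters.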
Another way:
regexpi(output(~isspace(output)), found, 'match')
if output is a single string, or
regexpi(regexprep(output,'\s',''), found, 'match')
for the more general case (either class(output) == 'cell' or 'char').
Advantages:
fast
robust (ALL whitespace (not just spaces) is removed)
more flexible (you can return starting/ending indices of the match, tokenize, etc.)
will return original case of the match in output
Disadvantages:
more typing
less obvious (more documentation required)
will return original case of the match in output (yes, there's two sides to that coin)
That last point in both lists is easily forced to lower or uppercase using lower() or upper(), but if you want same-case, it's a bit more involved:
C = regexpi(output(~isspace(output)), found, 'match');
if ~isempty(C)
C = found; end
for single string, or
C = regexpi(regexprep(output, '\s', ''), found, 'match')
C(~cellfun('isempty', C)) = {found}
for the more general case.
You can use lower to convert everything to lowercase to solve your case problem. However ignoring whitespace like you want is a little trickier. It looks like you want to keep some spaces but not all, in which case you should split the string by whitespace and compare substrings piecemeal.
I'd advise using regex, e.g. like this:
a = regexpi(output, 'the\s*test\s*:\s*passed');
If you don't care about the position where the match occurs but only whether there's a match at all, removing all spaces would be a brute-force, and somewhat nasty, possibility:
a = strfind(strrep(output, ' ',''), found);