Adding new line to NSCharacterSet - swift

I want to strip a string of all new lines and commas (and place it into an array), so I created this:
let results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: ",\n"))
However, the newlines are still present in my array (the commas are removed). What's the correct way to add a newline to the NSCharacterSet? Or, how do I add a comma to NSCharacterSet.newlineCharacterSet?
Thanks.
Here is a janky solution, but I'm still looking for a more elegant one.
var results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: ","))
text = results.joinWithSeparator(" ")
results = text.componentsSeparatedByCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
(one-line) SOLUTION:
var results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: " ,\u{000A}\u{000B}\u{000C}\u{000D}\u{0085}"))
Explanation is below.

You can unite two NSCharacterSets by starting with an NSMutableCharacterSet, for example:
let charset = NSMutableCharacterSet(charactersInString: ",")
charset.formUnionWithCharacterSet(NSCharacterSet.newlineCharacterSet())
let results = text.componentsSeparatedByCharactersInSet(charset)

So MartinR brought to my attention that there are more newline characters than just "\n".
I looked at the values used in NSCharacterSet.newlineCharacterSet and added them all, giving me:
var results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: " ,\u{000A}\u{000B}\u{000C}\u{000D}\u{0085}"))
This got rid of all the whitespace, commas, and new lines. Interestingly - when I used all the newline values separately to see if I could figure out which newline was being used in my case, none of them worked. But when used all together, it strips my new lines.
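One possible explanation for that last observation, offered as an assumption because the input text isn't shown: if the text uses Windows-style \r\n line endings, splitting on \n alone leaves a \r on some pieces, and that still looks like a line break when printed. A minimal sketch in Python (used only as a neutral illustration; the input string is made up):

```python
import re

# Hypothetical input: the original text isn't shown, so CRLF endings are assumed.
text = "a,b\r\nc,d\r\n"

# Splitting on comma and "\n" only leaves a stray "\r" on some fields.
only_lf = [s for s in re.split(r"[,\n]", text) if s]

# Splitting on comma plus both CR and LF removes every line-ending character.
cr_and_lf = [s for s in re.split(r"[,\r\n]", text) if s]

print(only_lf)    # ['a', 'b\r', 'c', 'd\r']
print(cr_and_lf)  # ['a', 'b', 'c', 'd']
```

If that is what happened, it would also explain why trying each newline character separately never fully worked while the combined set did.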

Related

Is it possible to build a list from a carriage return separated string?

Background
I have the following string:
var MyString = 'Test1⏎Test2⏎Test3⏎Test4'
⏎ = line feed = \n
What I'm trying to do
I want to create a List which is a list of lines. Basically every item that is followed by a \n would become an entry in the list.
I want the base string MyString to become shortened to reflect what pieces of the string have been moved to the List
The reason I want to leave a residual MyString is that new data might come in later that might be considered part of the same line, so I do not want to commit the data to the List until there is a carriage return seen
What the result of all this would be
So in my above example, only Test1 Test2 Test3 are followed by \n but not Test4
Output List would be: [Test1, Test2, Test3]
MyString would become: Test4
What I've tried and failed with
I tried using LineSplitter but it seems to want to take Test4 as a separate entry as well
final lines = const LineSplitter().convert(MyString);
for (final daLine in lines) {
  MyList.add(daLine);
}
And it creates [Test1, Test2, Test3, Test4]
A solution would be to just .removeLast() on the list that you split.
String text = 'Test1\nTest2\nTest3\nTest4';
List<String> list = text.split('\n');
text = list.removeLast();
print(list); // [Test1, Test2, Test3]
print(text); // Test4
To me you are combining two questions. Every language I know has built-in ways to split a string on a char, including newline chars. The distinct thing you want is a split function that doesn't include the last entry.
You may be combining your answers as well :) Is there some resource constraint or streamed input that prevents you from just building the list, then popping off the final entry?
If yes:
I think you have to build your own split. Look at the implementation code for LineSplitter(), and make something similar, except that it leaves the final, unterminated entry in the string.
If no:
simply call
MyString = MyList.removeLast();
after your for-loop.
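For the streamed-input case, the same "split, then pop the last piece" idea can be wrapped in a small helper that carries the unfinished tail forward. This is a rough sketch in Python rather than Dart, and split_complete_lines is a hypothetical name, not part of any library:

```python
def split_complete_lines(buffer):
    """Return (complete_lines, remainder) for a newline-delimited buffer."""
    parts = buffer.split("\n")
    remainder = parts.pop()  # text after the last "\n" (may be empty)
    return parts, remainder

lines, rest = split_complete_lines("Test1\nTest2\nTest3\nTest4")
print(lines)  # ['Test1', 'Test2', 'Test3']
print(rest)   # Test4

# Later data is appended to the remainder before splitting again.
lines2, rest2 = split_complete_lines(rest + "more\nTest5")
print(lines2)  # ['Test4more']
print(rest2)   # Test5
```

The remainder is only committed to the list once a later chunk supplies its terminating newline.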

How to get non-escaped apostrophe from .components(separatedBy: CharacterSet)

How can I get components(separatedBy: CharacterSet) to return the substrings so that they do not contain escaped apostrophes or single quotes?
When I print the resulting array, I want it to not include the backslash character.
I am using a playground to manipulate text and produce output in the terminal that I can copy and use outside of Xcode, so I want to strip the escape character from the string representation produced in the terminal output.
var str = "can't,,, won't, , good-bye, Santa Claus"
var delimiters = CharacterSet.letters.inverted.subtracting(.whitespaces)
delimiters = delimiters.subtracting(CharacterSet(charactersIn: "-"))
delimiters = delimiters.subtracting(CharacterSet(charactersIn: "'"))
var result = str.components(separatedBy: delimiters)
    .map({ $0.trimmingCharacters(in: .whitespaces) })
    .filter({ !$0.isEmpty })
print(result) // ["can\'t", "won\'t", "good-bye", "Santa Claus"]
What you are asking for is a metaphysical impossibility. You cannot want anything about how print prints. It's only a representation in the log.
Your strings do not actually contain any backslashes, so what's the problem? How the print command output notates them is irrelevant. You might as well "want" the print command to translate your strings into French. No, that's not what it does. It just prints, and the way it prints is the way it prints.
Another way to look at it: An array doesn't contain square brackets at both ends. And a string doesn't contain double-quotes at both ends. Those are things you might write in order to express those things as literals, but they are not real as part of the actual object. Well, I don't see you objecting to those!
Basically, if you want to control the output of something, you write an output routine. If you're going to rely on print, just accept the funny old way it writes stuff and move on.
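The same distinction between a value and its printed representation shows up in other languages too. As a loose analogy in Python (not Swift; format_words is a hypothetical helper), printing a list shows each element's debug form, while a small output routine emits only the string contents:

```python
words = ["can't", "won't", "good-bye", "Santa Claus"]

# print() on a list shows each element's debug representation (quotes included).
print(words)

# A small output routine (hypothetical name) emits only the string contents.
def format_words(items):
    return ", ".join(items)

print(format_words(words))  # can't, won't, good-bye, Santa Claus
```

In Swift the equivalent would be to join or iterate over the array yourself instead of printing the array directly.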

ignore spaces and cases MATLAB

diary_file = tempname();
diary(diary_file);
myFun();
diary('off');
output = fileread(diary_file);
I would like to search for a string in output, but ignore spaces and upper/lower case. Here is an example of what's in output:
the test : passed
number : 4
found = 'thetest:passed'
a = strfind(output,found )
How could I ignore spaces and cases from output?
Assuming you are not too worried about accidentally matching something like: 'thetEst:passed' here is what you can do:
Remove all spaces and only compare lower case
found = 'With spaces'
found = lower(found(found ~= ' '))
This will return
found =
withspaces
Of course you would also need to do this with each line of output.
Another way:
regexpi(output(~isspace(output)), found, 'match')
if output is a single string, or
regexpi(regexprep(output,'\s',''), found, 'match')
for the more general case (either class(output) == 'cell' or 'char').
Advantages:
Fast.
robust (ALL whitespace (not just spaces) is removed)
more flexible (you can return starting/ending indices of the match, tokenize, etc.)
will return original case of the match in output
Disadvantages:
more typing
less obvious (more documentation required)
will return original case of the match in output (yes, there's two sides to that coin)
That last point in both lists is easily forced to lower or uppercase using lower() or upper(), but if you want same-case, it's a bit more involved:
C = regexpi(output(~isspace(output)), found, 'match');
if ~isempty(C)
    C = found;
end
for single string, or
C = regexpi(regexprep(output, '\s', ''), found, 'match')
C(~cellfun('isempty', C)) = {found}
for the more general case.
You can use lower to convert everything to lowercase to solve your case problem. However ignoring whitespace like you want is a little trickier. It looks like you want to keep some spaces but not all, in which case you should split the string by whitespace and compare substrings piecemeal.
I'd recommend using regex, e.g. like this:
a = regexpi(output, 'the\s*test\s*:\s*passed');
If you don't care about the position where the match occurs but only whether there's a match at all, removing all spaces would be a brute-force, and somewhat nasty, possibility:
a = strfind(strrep(output, ' ',''), found);
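The two strategies discussed in these answers (normalize first, or let the pattern tolerate whitespace) can be compared side by side. This is only an illustrative sketch in Python, not MATLAB, and the sample output string is made up to resemble the question's:

```python
import re

# Hypothetical captured output, modelled on the example in the question.
output = "the test   : passed\nnumber : 4\n"
found = "thetest:passed"

# Approach 1: strip all whitespace, lowercase both sides, then substring search.
normalized = re.sub(r"\s+", "", output).lower()
print(found.lower() in normalized)  # True

# Approach 2: allow optional whitespace between every character of the target.
pattern = r"\s*".join(re.escape(ch) for ch in found)
match = re.search(pattern, output, flags=re.IGNORECASE)
print(match.group(0) if match else None)  # the test   : passed
```

Approach 1 only says whether a match exists; approach 2 also recovers the matched text in its original spacing and case, which mirrors the trade-off listed above.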

Powershell - Capture text in a var from a specific character

I want to grab the first character of a string variable and the first character that follows a specific character.
Example:
$var1 = "Jean-Martin"
I want a way to grab the first letter "J" then I want to take the first char following the "-" (dash) which is "M".
Something like this?
$initial1 = $var1[0]
$initial2 = $var1.Split('-')[1][0]
Strings in PowerShell use the System.String class from the .NET framework. As such, they are indexable to retrieve individual characters and have many methods available such as the Split method used above.
See the documentation here.
$var1 = "Jean-Martin"
To get the first character:
$var1[0]
To get the first character after the dash:
$characterToSeek = '-'
$var1[$var1.IndexOf($characterToSeek)+1]
Another option using regex:
PS> $var1 -replace '^(.)[^-]+-(.).+$','$1$2'
JM
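The split-and-index idea generalizes to names with any number of hyphenated parts. A minimal sketch in Python (not PowerShell), assuming every non-empty part should contribute its first character:

```python
name = "Jean-Martin"

# Take the first character of each non-empty, hyphen-separated part.
initials = "".join(part[0] for part in name.split("-") if part)

print(initials)  # JM
```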

Regexp to find a matching condition in a string

Hi, I need help using regexp for condition matching.
For example, my file has the following content:
{hello.program='function'`;
bye.program='script'; }
I am trying to use regexp to match the string that has .program='function' in them:
pattern = '[.program]+\=(function)'
also tried pattern='[^\n]*(.hello=function)[^\n]*';
pattern_match = regexp(myfilename,pattern , 'match')
but this returns pattern_match={} while I expect the result to be hello.program='function'`;
If 'function' comes with string-markers, you need to include these in the match. Also, you need to escape the dot (otherwise, it's considered "any character"). [.program]+ looks for one or several letters contained in the square brackets - but you can just look for program instead. Also, you don't need to escape the =-sign (which is probably what messed up the match).
cst = {'hello.program=''function''';'bye.program=''script'''; };
pat = 'hello\.program=''function''';
out = regexp(cst,pat,'match');
out{1}{1} %# first string from list, first match
hello.program='function'
EDIT
In response to the comment
my file contains
m2 = S.Parameter;
m2.Value = matlabstandard;
m2.Volatility = 'Tunable';
m2.Configurability = 'None';
m2.ReferenceInterfaceFile ='';
m2.DataType = 'auto';
my objective is to find all the lines that match, .DataType='auto'
Here's how you find the matching lines with regexp
%# read the file with textscan into a variable txt
fid = fopen('myFile.m');
txt = textscan(fid,'%s','Delimiter','\n'); %# one cell per line, not per whitespace-separated token
fclose(fid);
txt = txt{1};
%# find the match. Allow spaces and equal signs between DataType and 'auto'
match = regexp(txt,'\.DataType[ =]+''auto''','match')
%# match is not empty only if a match was found. Identify the non-empty match
matchedLine = find(~cellfun(@isempty,match));
Try this as it matches .program='function' exactly:
(\.)program='function'
I think this did not work:
'[.program]+\=(function)'
because of how the []'s work. Here is a link explaining why I say that: http://www.regular-expressions.info/charclass.html
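To make the character-class point concrete, here is a small sketch in Python's re module (not MATLAB's regexp, though square brackets behave the same way in both for this case). The sample line is taken from the question, without the stray backtick:

```python
import re

line = "hello.program='function';"

# [.program]+ is a character class: it matches any run of the characters
# '.', 'p', 'r', 'o', 'g', 'a', 'm' in any order, not the literal word.
scrambled = re.fullmatch(r"[.program]+", "gram.pro")
print(scrambled is not None)  # True

# The original pattern fails because the quote before 'function' is not allowed for.
original = re.search(r"[.program]+\=(function)", line)
print(original)  # None

# Escaping the dot and including the quotes matches the intended text.
fixed = re.search(r"\.program='function'", line)
print(fixed.group(0))  # .program='function'
```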