Getting substring from Series of list Python - substring

I have a data like the following:
a=[["S(8)"],[],["S(5)",T(3)]]
and I want to obtain integer numbers between parenthesis while protecting the shape:
b=[[8],[],[5,3]]
I tried to do it with loop but it didn't work out.
numbers=[]
sublist=[]
for y in a:
for x in y:
if len(y)==0:
numbers.append([])
pass
elif len(y)>0:
sublist.append([x[x.find("(")+1:x.find(")")]])
numbers.append(toplama)
pass
And I tried this one also
numbers=[x[x.find("(")+1:x.find(")")] for x in y for y in a]
Can you help me with it, please? Thank you.

Try the following:
import re
# Finds a number (at least 1 digit) in brackets
x = re.compile(r"\(([0-9]+)\)")
# Your input
a = [["S(8)"],[],["S(5)","T(3)"]]
# The output
b = []
# Loop over list
for i in a:
# Append each element list containing converted numbers
b.append([int(x.findall(j)[0]) for j in i])
print(b)
# Output: [[8], [], [5, 3]]

Related

convert number string into float with specific precision (without getting rounding errors)

I have a vector of cells (say, size of 50x1, called tokens) , each of which is a struct with properties x,f1,f2 which are strings representing numbers. for example, tokens{15} gives:
x: "-1.4343429"
f1: "15.7947111"
f2: "-5.8196158"
and I am trying to put those numbers into 3 vectors (each is also 50x1) whose type is float. So I create 3 vectors:
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
and that works fine (why wouldn't it?). But then when I try to populate those vectors: (L is a for loop index)
x(L)=tokens{L}.x;
.. also for the other 2
I get :
The following error occurred converting from string to single:
Conversion to single from string is not possible.
Which I can understand; implicit conversion doesn't work for single. It does work if x, f1 and f2 are of type 50x1 double.
The reason I am doing it with floats is because the data I get is from a C program which writes the some floats into a file to be read by matlab. If I try to convert the values into doubles in the C program I get rounding errors...
So, (after what I hope is a good question,) how might I be able to get the numbers in those strings, at the right precision? (all the strings have the same number of decimal places: 7).
The MCVE:
filedata = fopen('fname1.txt','rt');
%fname1.txt is created by a C program. I am quite sure that the problem isn't there.
scanned = textscan(filedata,'%s','Delimiter','\n');
raw = scanned{1};
stringValues = strings(50,1);
for K=1:length(raw)
stringValues(K)=raw{K};
end
clear K %purely for convenience
regex = 'x=(?<x>[\-\.0-9]*),f1=(?<f1>[\-\.0-9]*),f2=(?<f2>[\-\.0-9]*)';
tokens = regexp(stringValues,regex,'names');
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
for L=1:length(tokens)
x(L)=tokens{L}.x;
f1(L)=tokens{L}.f1;
f2(L)=tokens{L}.f2;
end
Use function str2double before assigning into yours arrays (and then cast it to single if you want). Strings (char arrays) must be explicitely converted to numbers before using them as numbers.

passing varargin to subfunction if string

I want to modify function rand and define my own function
function num = rand(varargin)
Most of the time, i just wrap the invocation
num = builtin("rand", [varargin{:}]);
and this works well except in case there is a string argument.
For rand(2,3,"double") I obtain
warning: implicit conversion from numeric to char
warning: called from rand at line 83 column 11
error: rand: unrecognized string argument
error: called from rand at line 83 column 11
and for rand("seed",2) the same.
ON the other hand, rand("seed") seems to work fine.
Can anyone offer an explanation and a solution?
The syntax:
num = builtin('rand', [varargin{:}]);
Will only work for you in cases where the input arguments can be represented as either a comma-separated list or a vector, such as when you specify a size for rand:
num = rand(2, 3, 4);
% Or ...
num = rand([2 3 4]);
It will not work for inputs that must be entered separately, like so:
num = rand(2, 3, 'double'); % Works
num = rand([2 3 'double']); % Throws an error
In general, you should just pass the contents of varargin as a comma-separated list (without collecting the contents into a vector/matrix) since builtin is designed to handle that just fine:
num = builtin('rand', varargin{:});
Also, be mindful of the difference between "strings" like 'rand' (a character array) and "rand" (a string). They can have different behavior in certain cases.

Matlab: regexp usage

I am going to start illustration using a code:
A = 'G1(General G1Airlines american G1Fungus )';
Using regexp (or any other function) in Matlab I want to distinctively locate: G1, G1A and G1F.
Currently if I try to do something as:
B = regexp( A, 'G1')
It is not able to distinguish G1 with the G1A and G1F i.e. I need to force the comparison to find me only case with G1 and ignore G1A and G1F.
However, when I am searching for G1A then it should still find me the location of G1A.
Can someone please help ?
Edit: Another case for A is:
A = 'R1George Service SmalR1Al C&I)';
And the expression this time I need to find is R1 and R1A instead.
Edit:
I have a giant array containing A's and another big vector containing G1, R1, etc I need to search for.
If you want to find 'G1' but not 'G1A' or 'G1F' you can use
>> B = regexp(A, 'G1[^AF]')
B =
1
This will find 'G1' and the ^ is used to specify that it should not match any characters contained with []. Then you could use
>> B = regexp(A, 'G1[AF]')
B =
12 32
to find both 'G1A' and 'G1F'.

Convert matlab symbol to array of products

Can I convert a symbol that is a product of products into an array of products?
I tried to do something like this:
syms A B C D;
D = A*B*C;
factor(D);
but it doesn't factor it out (mostly because that isn't what factor is designed to do).
ans =
A*B*C
I need it to work if A B or C is replaced with any arbitrarily complicated parenthesized function, and it would be nice to do it without knowing what variables are in the function.
For example (all variables are symbolic):
D = x*(x-1)*(cos(z) + n);
factoring_function(D);
should be:
[x, x-1, (cos(z) + n)]
It seems like a string parsing problem, but I'm not confident that I can convert back to symbolic variables afterwards (also, string parsing in matlab sounds really tedious).
Thank you!
Use regexp on the string to split based on *:
>> str = 'x*(x-1)*(cos(z) + n)';
>> factors_str = regexp(str, '\*', 'split')
factors_str =
'x' '(x-1)' '(cos(z) + n)'
The result factor_str is a cell array of strings. To convert to a cell array of sym objects, use
N = numel(factors_str);
factors = cell(1,N); %// each cell will hold a sym factor
for n = 1:N
factors{n} = sym(factors_str{n});
end
I ended up writing the code to do this in python using sympy. I think I'm going to port the matlab code over to python because it is a more preferred language for me. I'm not claiming this is fast, but it serves my purposes.
# Factors a sum of products function that is first order with respect to all symbolic variables
# into a reduced form using products of sums whenever possible.
# #params orig_exp A symbolic expression to be simplified
# #params depth Used to control indenting for printing
# #params verbose Whether to print or not
def factored(orig_exp, depth = 0, verbose = False):
# Prevents sympy from doing any additional factoring
exp = expand(orig_exp)
if verbose: tabs = '\t'*depth
terms = []
# Break up the added terms
while(exp != 0):
my_atoms = symvar(exp)
if verbose:
print tabs,"The expression is",exp
print tabs,my_atoms, len(my_atoms)
# There is nothing to sort out, only one term left
if len(my_atoms) <= 1:
terms.append((exp, 1))
break
(c,v) = collect_terms(exp, my_atoms[0])
# Makes sure it doesn't factor anything extra out
exp = expand(c[1])
if verbose:
print tabs, "Collecting", my_atoms[0], "terms."
print tabs,'Seperated terms with ',v[0], ', (',c[0],')'
# Factor the leftovers and recombine
c[0] = factored(c[0], depth + 1)
terms.append((v[0], c[0]))
# Combines trivial terms whenever possible
i=0
def termParser(thing): return str(thing[1])
terms = sorted(terms, key = termParser)
while i<len(terms)-1:
if equals(terms[i][1], terms[i+1][1]):
terms[i] = (terms[i][0]+terms[i+1][0], terms[i][1])
del terms[i+1]
else:
i += 1
recombine = sum([terms[i][0]*terms[i][1] for i in range(len(terms))])
return simplify(recombine, ratio = 1)

How can I filter my array of numbers in Matlab/Octave?

I have a very trivial example where I'm trying to filter by matching a String:
A = [0:1:999];
B = A(int2str(A) == '999');
This
A(A > 990);
works
This
int2str(5) == '5'
also works
I just can't figure out why I cannot put the two together. I get an error about nonconformant arguments.
int2str(A) produces a very long char array (of size 1 x 4996) containing the string representations of all those numbers (including spacing) appended together end to end.
int2str(A) == '999'
So, in the statement above, you're trying to compare a matrix of size 1 x 4996 with another of size 1 x 3. This, of course, fails as the two either need to be of the same size, or at least one needs to be a scalar, in which case scalar expansion rules apply.
A(A > 990);
The above works because of logical indexing rules, the result will be the elements from the indices of A for which that condition holds true.
int2str(5) == '5'
This only works because the result of the int2str call is a 1 x 1 matrix ('5') and you're comparing it to another matrix of the same size. Try int2str(555) == '55' and it'll fail with the same error as above.
I'm not sure what result you expected from the original statements, but maybe you're looking for this:
A = [0:1:999];
B = int2str(A(A == 999)) % outputs '999'
I am not sure that the int2str() conversion is what you are looking for. (Also, why do you need to convert numbers to strings and then carry out a char comparison?)
Suppose you have a simpler case:
A = 1:3;
strA = int2str(A)
strA =
1 2 3
Note that this is a 1x7 char array. Thus, comparing it against a scalar char:
strA == '2'
ans =
0 0 0 1 0 0 0
Now, you might wanna transpose A and carry out the comparison:
int2str(A')=='2'
ans =
0
1
0
however, this approach will not work if the number of digits of each number is not the same because lower numbers will be padded with spaces (try creating A = 1:10 and comparing against '2').
Then, create a cell array of string without whitespaces and use strcmp():
csA = arrayfun(#int2str,A','un',0)
csA =
'1'
'2'
'3'
strcmp('2',csA)
Should be much faster, and correct to turn the string into a number, than the other way around. Try
B = A(A == str2double ('999'));