Specify string format for numeric during conversion to pl.Utf8 - python-polars

Is there any way to specify a format specifier if, for example, casting a pl.Float32, without resorting to complex searches for the period character? As in something like:
s = pl.Series([1.2345, 2.3456, 3.4567])
s.cast(pl.Utf8, fmt="%0.2f") # fmt obviously isn't an argument
My current method is the following:
n = 2 # number of decimals desired
expr = pl.concat_str((
c.floor().cast(pl.Int32).cast(pl.Utf8),
pl.lit('.'),
((c%1)*(10**n)).round(0).cast(pl.Int32).cast(pl.Utf8)
)).str.ljust(width)
i.e., separate the pre-decimal and post-decimal parts, format each as a string, and concatenate them. Is there an easier way to do this?
Expected output:
shape: (3,)
Series: '' [str]
[
"1.23"
"2.34"
"3.45"
]

I'm not aware of a direct way to specify a format when casting, but here are two easy ways to obtain a specific number of decimal places.
Use write_csv
We can write a DataFrame as a csv file (to a StringIO buffer), which allows us to set a float_precision parameter. We can then use read_csv to parse the StringIO buffer to obtain our result. (This is much faster than you might think.) Note: we must use infer_schema_length=0 in the read_csv to prevent parsing the string back to a float.
from io import StringIO

import polars as pl
s = pl.Series([1.2345, 2.3456, 3.4567])
n = 2
(
pl.read_csv(
StringIO(
pl.select(s)
.write_csv(float_precision=n)
),
infer_schema_length=0
)
.to_series()
)
shape: (3,)
Series: '1.23' [str]
[
"1.23"
"2.35"
"3.46"
]
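Note that the output above is rounded (2.35, 3.46), whereas the expected output in the question is truncated (2.34, 3.45). The difference can be reproduced in plain Python without Polars. This is only a sketch of the two behaviours, not a description of what Polars does internally; the helper names format_rounded and format_truncated are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

def format_rounded(x: float, n: int) -> str:
    # Standard fixed-point formatting rounds to n decimals.
    return f"{x:.{n}f}"

def format_truncated(x: float, n: int) -> str:
    # Quantize with ROUND_DOWN to cut off extra digits instead of rounding.
    # Going through str(x) uses the float's shortest repr, e.g. "2.3456".
    step = Decimal(1).scaleb(-n)  # n=2 gives Decimal("0.01")
    return str(Decimal(str(x)).quantize(step, rounding=ROUND_DOWN))

values = [1.2345, 2.3456, 3.4567]
rounded = [format_rounded(v, 2) for v in values]
truncated = [format_truncated(v, 2) for v in values]
print(rounded)
print(truncated)
```

If truncation is what you need, the regex approach is the one that matches the expected output.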
Pad with zeros and then use a single regex
Another approach is to cast to a string and then append zeroes. We can then use a single regular expression to extract the result.
n = 2
zfill = '0' * n
regex = r"^([^\.]*\..{" + str(n) + r"})"
(
pl.select(s)
.with_columns(
pl.concat_str([
pl.col(pl.Float64).cast(pl.Utf8),
pl.lit(zfill)
])
.str.extract(regex)
)
.to_series()
)
shape: (3,)
Series: '' [str]
[
"1.23"
"2.34"
"3.45"
]
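The same pad-then-extract idea can be tried on single values with Python's re module, which makes it easy to check the pattern on edge cases. This is a sketch under one assumption worth stating: Python's str() of a float is used here as a stand-in for the Polars cast, and the two may format some values differently. The function name truncate_via_regex is hypothetical.

```python
import re

def truncate_via_regex(x: float, n: int) -> str:
    # Pad with n zeros so there are always at least n characters after the point.
    padded = str(x) + "0" * n
    # Same pattern as above: everything up to the point, the point, then n characters.
    pattern = r"^([^\.]*\..{" + str(n) + r"})"
    match = re.match(pattern, padded)
    if match is None:
        # Happens when the text has no decimal point, e.g. "1e-07".
        raise ValueError(f"no decimal point found in {padded!r}")
    return match.group(1)

results = [truncate_via_regex(v, 2) for v in [1.2345, 2.3456, 3.4567]]
print(results)
print(truncate_via_regex(5.0, 2))
```

Values that print in scientific notation contain no decimal point and will not match, so they need separate handling.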


Regex expression in q to match specific integer range following string

Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr matches the foo-strings which end in the numbers 8,9,10,11,12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.
As the other comments and answers mentioned, this can't be done using a single regex. Another alternative method is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
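For readers less familiar with q, the same idea (build every acceptable string, then test membership) looks roughly like this in Python. It is a sketch of the technique, not a translation of the q internals; match_by_candidates is a hypothetical name.

```python
def match_by_candidates(strings, prefix, lo, hi):
    # Build the full set of acceptable strings: prefix + each integer in [lo, hi].
    candidates = {f"{prefix}{i}" for i in range(lo, hi + 1)}
    # One boolean per input string, like the q boolean vector.
    return [s in candidates for s in strings]

strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13"]
flags = match_by_candidates(strs, "foo", 8, 12)
# Filtering on the flags keeps only the matching entries.
selected = [s for s, keep in zip(strs, flags) if keep]
print(flags)
print(selected)
```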
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how unary derived functions x~/: and in[;string range y]' are paired by #' to the split strings, then min used to AND the result:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.
As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known, as in the example above, then the following can be used to check multiple patterns:
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function to implement this logic. The function range below checks that each string in a list starts with a given prefix and ends with a number within the specified range.
range:{
/ checking for strings starting with string y
s:((c:count y)#'x)like y;
/ convert remainder of string to long, check if within range
d:("J"$c _'x)within z;
/ find strings satisfying both conditions
s&d
}
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
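The efficiency point can be illustrated with a short Python sketch in which the numeric check only runs for strings that already passed the prefix check, thanks to short-circuit evaluation. This is not the q implementation; in_range is a hypothetical name, and non-numeric tails such as "3x" are rejected explicitly rather than parsed to a null as "J"$ does.

```python
def in_range(strings, prefix, lo, hi):
    flags = []
    for s in strings:
        tail = s[len(prefix):]
        ok = (
            s.startswith(prefix)                    # cheap prefix test first
            and tail.isascii() and tail.isdigit()   # tail must be plain digits
            and lo <= int(tail) <= hi               # only parsed if the above hold
        )
        flags.append(ok)
    return flags

strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13", "foo3x", "foo123"]
flags = in_range(strs, "foo", 8, 12)
print(flags)
```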
For your example you can pad, fill with a char, and then simple regex works fine:
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"

How to connect words/phrases in matlab

How can I connect these two parts?
In Excel if you say 'state'&2 you will get a combined phrase state2.
I want to join 'state' and 'i' where i is a number between e.g. 1,2,3...
Then I can end up with state1 or state5 for example depending on what i is equal to.
How can I do this?
You can:
1. Use num2str to convert 2 to '2', and then concatenation to build your char array.
2. Use sprintf to create a char array with a specified placeholder format.
3. Use strings.
Importantly here I've made a distinction between strings ("double quotes") and character arrays ('single quotes') - read here for more details about their differences.
Corresponding code would look like
% 1. Use num2str and concatenation
str = ['state', num2str(2)]; % -> 'state2' (char)
% 2. Use sprintf
str = sprintf( 'state%d', 2 ); % -> 'state2' (char)
% 3. Use strings
str = "state" + 2 % -> "state2" (string)
I would opt for number 2, since I think it's cleaner than 1 and more flexible, and I have used MATLAB since before strings existed so I'm predisposed to dislike them!
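For comparison only, and not as part of the MATLAB answer: the same three styles exist in Python, which may help if you move between the two languages. The variable names are illustrative.

```python
i = 2

# 1. Convert the number to text, then concatenate (like num2str + []).
by_concat = "state" + str(i)

# 2. printf-style placeholder (like sprintf).
by_format = "state%d" % i

# 3. String interpolation (closest to MATLAB's "state" + 2 on strings).
by_fstring = f"state{i}"

# Building state1 .. state5 in one go.
names = [f"state{k}" for k in range(1, 6)]
print(by_concat, by_format, by_fstring, names)
```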

Adding Decimal to an Integer

I need to add decimal to an integer.
Eg:
Amount = 12345
The output should be
Amount = 123.45
Could someone help me achieve this using PowerShell?
Use a comma if you're looking to format a long number as a string; adding a decimal point implies the number has a fractional component.
(12345).ToString("N0")
12,345
N0 is a standard numeric format string, which here gives the comma-separated string.
If you're looking to fix badly stored decimal numbers, or your question really is what you're looking for, dividing by 100 will work for your needs.
12345 / 100
123.45
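One thing to watch with plain division is that trailing zeros are lost once the value becomes an ordinary number. The sketch below shows the same scaling in Python using the decimal module, which shifts the decimal point without going through binary floating point and keeps both fractional digits. It is an illustration of the idea, not PowerShell code; insert_decimal_point is a hypothetical name.

```python
from decimal import Decimal

def insert_decimal_point(amount: int, places: int = 2) -> str:
    # scaleb shifts the exponent by -places, so the digits themselves are untouched.
    return str(Decimal(amount).scaleb(-places))

print(insert_decimal_point(12345))
print(insert_decimal_point(12300))  # keeps the trailing zeros
print(12300 / 100)                  # plain division gives 123.0 in Python
```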
If you need a more code-based solution that handles trailing zeroes, you could use this:
$num = 12345
$numstr = "$num"
$splitAt = $numstr.Length - 2
$before = $numstr.Substring(0,$splitAt)
$after = $numstr.Substring($splitAt)
"$($before).$($after)"
or this
"12345" -replace '(\d*)(\d{2})','$1.$2'

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formatting is more complex, you may want to use a regular expression. There is a similar question: matlab - extracting numbers from (odd) string. Modifying the code found there, you can do the following:
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str
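The "find all digit runs, take the second" technique is the same in most regex engines. Here is a Python sketch that can be run against the three example filenames from the question; second_number is a hypothetical helper name.

```python
import re

def second_number(filename: str) -> str:
    # Collect every run of consecutive digits, in order of appearance.
    numbers = re.findall(r"\d+", filename)
    if len(numbers) < 2:
        raise ValueError(f"fewer than two numbers in {filename!r}")
    # Return the text so that leading zeros such as "02" are preserved.
    return numbers[1]

examples = ["001a02.jpg", "001A43a", "002A12"]
found = [second_number(name) for name in examples]
print(found)
print(int(found[0]))  # convert only when a numeric value is needed
```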

Creating an array from an str concatenation Matlab

Hello I have these two vectors
Q = [1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4]
and
Year = [2000,2000,2000,2000,2001,2001,2001,2001,2002.....]
and I would like to concatenate them into one single array Time
Time = [20001,20002,20003,20004,20010....]
Or
Time= {'2000Q1', '2000Q2', '2000Q3', '2000Q4', '2001Q1'....}
So far I tried with this code
m = zeros(136,1)
for i=1:136
m(i,1)= strcat(Q(i),Year(i));
end
And MATLAB output this:
Subscripted assignment dimension mismatch.
Help pls ?
If your vectors Year and Q have the same number of elements, you do not need a loop. Just transpose them (or otherwise make sure they are column vectors), then concatenate with the [] operator:
Time = [ num2str(Year.') num2str(Q.') ] ;
will give you:
20001
20002
20003
20004
20011
...
And if you want the 'Q' character, insert it in the expression:
Time = [ num2str(Year.') repmat('Q',length(Q),1) num2str(Q.') ]
Will give you:
2000Q1
2000Q2
2000Q3
2000Q4
2001Q1
...
This will be a char array, if you want a cell array, use cellstr on the same expression:
time = cellstr( [num2str(Year.') repmat('Q',length(Q),1) num2str(Q.')] ) ;
To obtain strings:
strtrim(mat2cell(num2str([Year(:) Q(:) ],'%i%i'), ones(1,numel(Q))));
Explanation:
Concat both numeric vectors as two columns (using [...])
Convert to char array, where each row is the concatenation of two numbers (using num2str with sprintf-like format specifiers). It is assumed that all numbers are integers (if not, change the format specifiers). This may introduce unwanted spaces if not all the concatenated numbers have the same number of digits.
Convert to a cell array, putting each row in a different cell (using mat2cell).
Remove whitespaces in each cell (using strtrim)
To obtain numbers: apply str2double to the above:
str2double(strtrim(mat2cell(num2str([Year(:) Q(:) ],'%i%i'), ones(1,numel(Q)))));
Or compute directly
10.^(floor(log10(max(Q)))+1)*Year + Q;
You can use arrayfun
If you want your output in string format (with a 'Q' in the middle) then use sprintf to format the string
Time = arrayfun( @(y,q) sprintf('%dQ%d', y, q ), Year, Q, 'uni', 0 );
Resulting in a cell array
Time =
'2000Q1' '2000Q2' '2000Q3' '2000Q4' '2001Q1' '2001Q2' '2001Q3'...
Alternatively, if you skip the 'Q' you can save each number in an array
Time = arrayfun( @(y,q) y*10+q, Year, Q )
Resulting in a regular array
Time =
20001 20002 20003 20004 20011 20012 20013 ...
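The element-wise pairing that arrayfun performs corresponds to zip in Python. The sketch below builds both the labelled strings and the numeric codes for a shortened version of the data (two years only, which is an assumption made to keep the example small). The numeric version shifts the year by the number of digits in the largest quarter value rather than hard-coding 10.

```python
years = [2000] * 4 + [2001] * 4
quarters = [1, 2, 3, 4] * 2

# String form, e.g. "2000Q1".
labels = [f"{y}Q{q}" for y, q in zip(years, quarters)]

# Numeric form: shift the year left by as many digits as the widest quarter needs.
width = len(str(max(quarters)))
codes = [y * 10**width + q for y, q in zip(years, quarters)]

print(labels)
print(codes)
```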
That's because you are initializing m to zeros(136,1) and then trying to store a whole string in a single element, and a double cannot hold a string.
I give you 2 options, but I favor the first one.
1.- You can use a cell array and build each string with sprintf (strcat does not convert numbers to their text representation), so your code becomes:
m = cell(136,1);
for ii=1:136
m{ii} = sprintf('%dQ%d', Year(ii), Q(ii));
end
and then m{1} will be '2000Q1'.
2.- Or, if you know that your strings will ALWAYS be the same size (in your case it looks like they are always 6 characters), you can:
strsize = 6;
m = zeros(136,strsize);
for ii=1:136
m(ii,:) = sprintf('%dQ%d', Year(ii), Q(ii));
end
and then m(1,:) will be [ 50 48 48 48 81 49 ], which read as ASCII codes is 2000Q1 (char(m) converts the whole matrix back to text).