Loading an array from a file with a different number format

I am trying to load a file in which the numbers are in the following form:
0.0000000D+00 -0.1145210D-16 0.1262408D-16
0.1000000D+00 -0.4697286D-06 0.1055963D-06
0.2000000D+00 -0.1877806D-05 0.4220493D-06
0.3000000D+00 -0.4220824D-05 0.9482985D-06
I am trying the numpy.loadtxt function, but apparently this number format isn't recognized, as I'm getting the following error:
ValueError: could not convert string to float: b'0.0000000D+00'
Any idea?
Thanks

You can use a converter with numpy.loadtxt that turns each value into a parsable float. In this case we simply replace D with E:
import numpy as np

# Decode the bytes, then swap the Fortran-style 'D' exponent for 'E'
numconv = lambda x: float(x.decode('utf-8').replace('D', 'E'))
np.loadtxt('test.txt', converters={0: numconv, 1: numconv, 2: numconv}, dtype='double')
# array([[ 0.00000000e+00, -1.14521000e-17,  1.26240800e-17],
#        [ 1.00000000e-01, -4.69728600e-07,  1.05596300e-07],
#        [ 2.00000000e-01, -1.87780600e-06,  4.22049300e-07],
#        [ 3.00000000e-01, -4.22082400e-06,  9.48298500e-07]])
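If the file has more columns, the converter dict can also be built programmatically rather than spelled out per column. A minimal sketch, assuming the same 'test.txt' and a known column count (recent numpy versions also accept a single callable applied to all columns):
import numpy as np

numconv = lambda x: float(x.decode('utf-8').replace('D', 'E'))

ncols = 3  # assumed column count for this file
data = np.loadtxt('test.txt', converters={i: numconv for i in range(ncols)}, dtype='double')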

Related

Specify string format for numeric during conversion to pl.Utf8

Is there any way to specify a format specifier if, for example, casting a pl.Float32, without resorting to complex searches for the period character? As in something like:
s = pl.Series([1.2345, 2.3456, 3.4567])
s.cast(pl.Utf8, fmt="%0.2f") # fmt obviously isn't an argument
My current method is the following:
n = 2  # number of decimals desired
expr = pl.concat_str((
    c.floor().cast(pl.Int32).cast(pl.Utf8),
    pl.lit('.'),
    ((c % 1) * (10**n)).round(0).cast(pl.Int32).cast(pl.Utf8),
)).str.ljust(width)
i.e. separate the pre-decimal and post-decimal parts, format each as a string, and concatenate them. Is there an easier way to do this?
Expected output:
shape: (3,)
Series: '' [str]
[
"1.23"
"2.34"
"3.45"
]
I'm not aware of a direct way to specify a format when casting, but here are two easy ways to obtain a specific number of decimal places.
Use write_csv
We can write the DataFrame as CSV (to a string), which allows us to set the float_precision parameter. We can then wrap the result in a StringIO buffer and parse it back with read_csv. (This is much faster than you might think.) Note: we must use infer_schema_length=0 in the read_csv so the strings are not parsed back into floats.
from io import StringIO

import polars as pl

s = pl.Series([1.2345, 2.3456, 3.4567])
n = 2
(
    pl.read_csv(
        StringIO(
            pl.select(s).write_csv(float_precision=n)
        ),
        infer_schema_length=0,
    )
    .to_series()
)
shape: (3,)
Series: '1.23' [str]
[
    "1.23"
    "2.35"
    "3.46"
]
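If this is needed in more than one place, the round-trip wraps naturally in a small helper. A minimal sketch, assuming the polars version used above (where write_csv without a path returns the CSV as a string); the function name is ours:
from io import StringIO

import polars as pl

def float_series_to_str(s: pl.Series, n: int) -> pl.Series:
    # Format a float Series as strings with n decimal places via a CSV round-trip.
    csv_text = pl.select(s).write_csv(float_precision=n)
    return pl.read_csv(StringIO(csv_text), infer_schema_length=0).to_series()

float_series_to_str(pl.Series([1.2345, 2.3456, 3.4567]), 2)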
Pad with zeros and then use a single regex
Another approach is to cast to a string and then append zeroes. From this, a single regular expression can extract our result.
n = 2
zfill = '0' * n
regex = r"^([^\.]*\..{" + str(n) + r"})"
(
    pl.select(s)
    .with_column(
        pl.concat_str([
            pl.col(pl.Float64).cast(pl.Utf8),
            pl.lit(zfill),
        ])
        .str.extract(regex)
    )
    .to_series()
)
shape: (3,)
Series: '' [str]
[
    "1.23"
    "2.34"
    "3.45"
]
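Note that this second approach truncates rather than rounds, which is what the expected output in the question shows. If rounding is wanted and performance is not critical, a Python-level fallback is to format each element directly (assuming the Series.apply API of the polars version used here; newer releases renamed it to map_elements):
s.apply(lambda x: f"{x:.2f}")  # rounds to 2 decimals, unlike the regex truncation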

OCTAVE data import from PCE-VDL data logger device and conversion of decimal comma to decimal point

I have a PCE-VDL measurement device, which gives me measurements in the CSV format below, which I need to import into OCTAVE for further investigation.
In particular, I need to import the last 3 columns with the x, y, z acceleration data.
The file is in CSV format with a semicolon (";") delimiter.
I have tried:
A_1 = importdata ("file.csv", ";", 3);
but received
error: missing_idx(10): out of bound 9
The CSV file looks like this:
#PCE-VDL X - TableView series
#2020.16.11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020.28.10;16:16:32:0000;00:000;;;;0,0195;-0,0547;1,0039;
2020.28.10;16:16:32:0052;00:005;;;;0,0898;-0,0273;0,8789;
2020.28.10;16:16:32:0104;00:010;;;;0,0977;-0,0313;0,9336;
2020.28.10;16:16:32:0157;00:015;;;;0,1016;-0,0273;0,9297;
The numbers in the last 3 columns also use a decimal comma rather than a decimal point, so some conversion is probably needed as well.
Thank you very much for any help.
Regards
EDIT: 18.11.2020
Thanks for the help. I have now tried the following:
A_1_str = fileread ("file.csv");
A_1_str_m = strrep (A_1_str, ".", "-");
A_1_str_m = strrep (A_1_str_m, ",", ".");
save "A_1_str_m.csv" A_1_str_m;
A_1 = importdata ("A_1_str_m.csv", ";", 8);
and still receive error: file_content(140): out of bound 139
There is probably some problem with the time format in the first columns, which I do not want to read anyway. I just need the last three columns.
After my conversion, the file looks like this:
# Created by Octave 5.1.0, Wed Nov 18 21:40:52 2020 CET <zdenek@ASUS-F5V>
# name: A_1_str_m
# type: sq_string
# elements: 1
# length: 7849
#PCE-VDL X - TableView series
#2020-16-11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020-28-10;16:16:32:0000;00:000;;;;0.0195;-0.0547;1.0039;
2020-28-10;16:16:32:0052;00:005;;;;0.0898;-0.0273;0.8789;
2020-28-10;16:16:32:0104;00:010;;;;0.0977;-0.0313;0.9336;
Thanks for support!
You can first read the data with fileread, which returns the file contents as a string. Then you can manipulate the string like this:
new_string = strrep(string, ",", ".");
strrep replaces all occurrences of a pattern within a string. Afterwards, save this data as a separate file or overwrite the existing file with the manipulated data. Once that is done, proceed as you tried before.
EDIT: 19.11.2020
To avoid the additional heading lines in the new file, you can save it like this:
fid = fopen("A_1_str_m.csv", "w");
fputs(fid, A_1_str_m);
fclose(fid);
fputs will just write the string to the file.
Then you can read the new file with dlmread.
A1_buf = dlmread("A_1_str_m.csv", ";");
A1_buf = real(A1_buf); # keep only the real part (some fields get parsed as complex)
A1_buf(1:3, :) = []; # remove the three header lines
A1 = A1_buf(:, end-3:end-1); # keep only the 3 acceleration columns (the trailing ';' adds an empty last column)
This will give you the three columns you're looking for, but the date and time data will be ignored.
EDIT 20.11.2020
Replaced abs with real, so the sign of the value will be kept.
Use csv2cell from the io package.

PySpark - ValueError: Cannot convert column into bool

So I've seen this solution:
ValueError: Cannot convert column into bool
which has the solution I think. But I'm trying to make it work with my dataframe and can't figure out how to implement it.
My original code:
if df2['DayOfWeek'] >= 6:
    df2['WeekendOrHol'] = 1
this gives me the error:
Cannot convert column into bool: please use '&' for 'and', '|' for
'or', '~' for 'not' when building DataFrame boolean expressions.
So based on the above link I tried:
from pyspark.sql.functions import when
when((df2['DayOfWeek']>=6),df2['WeekendOrHol'] = 1)
when(df2['DayOfWeek']>=6,df2['WeekendOrHol'] = 1)
but this is incorrect as it gives me an error too.
To update a column based on a condition, you need to use when like this:
from pyspark.sql import functions as F

# Update `WeekendOrHol`: when `DayOfWeek` >= 6, set it to 1; otherwise keep
# the value it has now (or do something else). If no otherwise is provided,
# the non-matching rows are set to null.
df2 = df2.withColumn(
    'WeekendOrHol',
    F.when(F.col('DayOfWeek') >= 6, F.lit(1))
     .otherwise(F.col('WeekendOrHol'))
)
Hope this helps, good luck!
Best answer, as provided by pault:
df2 = df2.withColumn("WeekendOrHol", (df2["DayOfWeek"] >= 6).cast("int"))
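For a quick sanity check, here is a minimal, self-contained run of that one-liner (the sample data and the existing SparkSession named spark are assumptions for illustration):
from pyspark.sql import functions as F

df2 = spark.createDataFrame([(1,), (6,), (7,)], ['DayOfWeek'])
df2 = df2.withColumn('WeekendOrHol', (F.col('DayOfWeek') >= 6).cast('int'))
df2.show()
# +---------+------------+
# |DayOfWeek|WeekendOrHol|
# +---------+------------+
# |        1|           0|
# |        6|           1|
# |        7|           1|
# +---------+------------+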
This is a duplicate of this question.

Formatting small decimal numbers in MATLAB

How can I format the number 0.00935349 in a string using fprintf() so that it displays in scientific notation with an x10 multiplier instead of e notation (e.g. 9.4x10-03)?
The %e format specifier gets you close:
>> fprintf('%.1e\n', 0.00935349)
9.4e-03
If you want the e to appear as x10, you can use sprintf to generate the number string, replace the e using strrep on the result, then pass that to fprintf:
>> fprintf(strrep(sprintf('%.1e\n', 0.00935349), 'e', 'x10'))
9.4x10-03

String conversion in MATLAB doesn't work with int values

I'm parsing long strings in MATLAB, and whenever I use str2num on an integer it doesn't work: it outputs a weird Chinese or Greek symbol instead.
satrec.satnum = str2num(longstr1(3:7));
I checked by outputting it as a string and it works properly, but I won't be able to use it in my calculations later on if I don't manage to change it to an int. Characters 3 to 7 of my string form an integer (e.g. 8188). As it appears to work when my strings are doubles, I tried this:
satrec.satnum = longstr1(3:7);
satrec.satnum = strcat(satrec.satnum,'.0');
satrec.satnum = str2num(satrec.satnum);
fprintf('satellite number : %s\n',satrec.satnum);
But it outputs the same weird symbol. Does anyone know what I can do ?
This looks like NORAD 2-line element data. In that case the file encoding is US-ASCII or effectively UTF-8 since no non-ASCII characters should be present.
Your problem appears to be in this line:
fprintf('satellite number : %s\n',satrec.satnum);
satrec.satnum is an integer, but you are printing it with a %s conversion in the format string, so MATLAB interprets the number as a character code. Replace this with
fprintf('satellite number : %d\n',satrec.satnum);
and you get the correct result.
Edited to add
MATLAB has in fact converted the string to an int correctly!
I tried running the code you provided along with your example, and am unable to reproduce the problem you described:
longstr1='1 28895U 05043F 14195.24580016 .00000503 00000-0 10925-3 0 8188';
satrec.satnum = str2num(longstr1(3:7))
satrec =
    satnum: 28895
In any case, I'd suggest using something like textscan or dlmread:
Data = textscan(longstr1,'%u8 %u16 %c %u16 %c %f %f %u16-%u8 %u16-%u8 %u8 %u16', 'delimiter', '')
Data =
  Columns 1 through 9
    [1]    [28895]    'U'    [5043]    'F'    [1.4195e+04]    [5.0300e-06]    [0]    [0]
  Columns 10 through 13
    [10925]    [3]    [0]    [8188]
In the above example I guessed some of the data types, so you should update them for your use.
As you can see, this code works on a string. If, however, you provide it with a file ID, it will read all the lines in the file using this template (see the documentation for textscan).
On a side note: I noticed that char(28895) outputs a Chinese character.