I import a sheet from Excel into matlab using the command "readtable":
TABLE = readtable(Excel.FN, 'sheet', Excel.Sheet);
The table contains both, numeric values and strings.
If I try to access the numeric values, I can't get them as double.
TABLE{j,i} = '0.00069807'
is still a cell.
cell2num(TABLE{j,i}) = NaN
cell2mat(TABLE{j,i}) = 0.00069807,
but this is a char. So I use
str2num(cell2mat(TABLE{j,i}))
to obtain the numeric value. There must be a simpler way. Could you please tell me the command.
If you don't insist on readtable, the xlsread would be better for you. Loaded data are more "matlab-friendly" with this function.
I am not sure whether there is a simpler solution with readtable. I think that's just the price you need to pay for not working with the "rawer" data such as CSV or simple text files.
Related
I've got a dataflow with a csv file as source. The column NewPositive is a string and it contains numbers formatted in European style with a dot as thousand seperator e.g 1.019 meaning 1019
If I use the function toInteger to convert my NewPositive column to an int via toInteger(NewPositive,'#.###','de'), I only get the thousand cipher e.g 1 for 1.019 and not the rest. Why? For testing I tried creating a constant column: toInteger('1.019','#.###','de') and it gives 1019 as expected. So why does the function not work for my column? The column is trimmed and if I compare the first value with equality function: equals('1.019',NewPositive) returns true.
Please note: I know it's very easy to create a workaround by toInteger(replace(NewPositive,'.','')), but I want to learn how to use the toInteger function with the locale and format parameters.
Here is sample data:
Dato;NewPositive
2021-08-20;1.234
2021-08-21;1.789
I was able to repro this and probably looks to be a bug to me . I have reported this to the ADF team , will let you know once I hear back from them . You already have a work around please go ahead that to unblock yourself .
I am working on an ingestion feature that will take a strongly formatted .xlsx file and import the records to a temp storage table and then process the rows to create db records.
One of the columns is strictly formatted as "Text" but it seems like the Open XML API handles the columns cells differently on a row-by-row basis. Some of the values while appearing to be numeric values are truly not (which is why we format the column as Text) -
some examples are "211377", "211727.01", "209395.388", "209395.435"
what these values represent is not important but what happens is that some values (using the Open XML API v2.5 library) will be read in properly as text whether retrieved from the Shared Strings collection or simply from InnerXML property while others get sucked in as numbers with what appears to be appended rounding or precision.
For example the "211377", "211727.01" and "209395.435" all come in exactly as they are in the spreadsheet but the "209395.388" value is being pulled in as "209395.38800000001" (there are others that this happens to as well).
There seems to be no rhyme or reason to which values get messed up and which ones which import fine. What is really frustrating is that if I use the native Import feature in SQL Server Management Studio and ingest the same spreadsheet to a temp table this does not happen - so how is that the SSMS import can handle these values as purely text for all rows but the Open XML API cannot.
To begin the answer you main problem seems to be values,
"209395.388" value is being pulled in as "209395.38800000001"
Yes in .xlsx file value is stored as 209395.38800000001 instead of 209395.388. And it's the correct format to store floating point numbers; nothing wrong in it. You van simply confirm it by following code snippet
string val = "209395.38800000001"; // <= What we extract from Open Xml
Console.WriteLine(double.Parse(val)); // < = Simply pass it to double and print
The output is :
209395.388 // <= yes the expected value
So there's nothing wrong in the value you extract from .xlsx using Open Xml SDK.
Now to cells, yes cell can have verity of formats. Numbers, text, boleans or shared string text. And you can styles to a cell which would format your string to a desired output in Excel. (Ex - Date Time format, Forced strings etc.). And this the way Excel handle the vast verity of data. It need this kind of formatting and .xlsx file format had to be little complex to support all.
My advice is to use a proper parse method set at extracted values to identify what format it represent (For example to determine whether its a number or a text) and apply what type of parse.
ex : -
string val = "209395.38800000001";
Console.WriteLine(float.Parse(val)); // <= Float parse will be deduce a different value ; 209395.4
Update :
Here's how value is saved in internal XML
Try for yourself ;
Make an .xlsx file with value 209395.388 -> Change extention to .zip -> Unzip it -> goto worksheet folder -> open Sheet1
You will notice that value is stored as 209395.38800000001 as scene in attached image.. So nothing wrong on API for extracting stored number. It's your duty to decide what format to apply.
But if you make the whole column Text before adding data, you will see that .xlsx hold data as it is; simply said as string.
I have a matlab code which is for printing a cell array to excel. The size of matrix is 50x13.
The row 1 is the column names.
Column 1 is dates and rest columns are numbers.
The dateformat being defined in the code is:
dFormat = struct;
dFormat.Style = struct( 'NumberFormat', '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)' );
dFormat.Font = struct( 'Size', 8 );
Can someone please explain me what the dFormat.Style code means ?
Thanks
The first line creates an empty struct (struct with no fields) called dFormat. A structure can contain pretty much anything in one of its fields, including another structure. The second line adds a field called 'Style' to the dFormat struct and sets it equal to another struct with a field called 'NumberFormat'. The 'NumberFormat' field is set equal to that long string of characters. You now have a structure of structures. The third line is similar to the second.
Note that the first line isn't really necessary unless dFormat already exists and it needs to be "zeroed out" as dFormat.Style with create it implicitly. However, using the struct function can make code more readable in some cases as objects use a similar notation for access methods and properties. In other words, all of your code could be replaced with:
dFormat.Style.NumberFormat = '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)';
dFormat.Font.Size = 8;
See this video from the MathWorks for more details and this list of helpful structure functions and examples.
#horchler already elaborated on structs, but I imagine you may actually be more interested in the content of this structs Style field.
In case you are solely interested in _(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_), that does not really look like something MATLAB related to me.
My best guess is that this code is used to later feed some other program, for examle to build an excel file.
I want to create a structure with a variable name in a matlab script. The idea is to extract a part of an input string filled by the user and to create a structure with this name. For example:
CompleteCaseName = input('s');
USER WRITES '2013-06-12_test001_blabla';
CompleteCaseName = '2013-06-12_test001_blabla'
casename(12:18) = struct('x','y','z');
In this example, casename(12:18) gives me the result test001.
I would like to do this to allow me to compare easily two cases by importing the results of each case successively. So I could write, for instance :
plot(test001.x,test001.y,test002.x,test002.y);
The problem is that the line casename(12:18) = struct('x','y','z'); is invalid for Matlab because it makes me change a string to a struct. All the examples I find with struct are based on a definition like
S = struct('x','y','z');
And I can't find a way to make a dynamical name for S based on a string.
I hope someone understood what I write :) I checked on the FAQ and with Google but I wasn't able to find the same problem.
Use a structure with a dynamic field name.
For example,
mydata.(casename(12:18)) = struct;
will give you a struct mydata with a field test001.
You can then later add your x, y, z fields to this.
You can use the fields later either by mydata.test001.x, or by mydata.(casename(12:18)).x.
If at all possible, try to stay away from using eval, as another answer suggests. It makes things very difficult to debug, and the example given there, which directly evals user input:
eval('%s = struct(''x'',''y'',''z'');',casename(12:18));
is even a security risk - what happens if the user types in a string where the selected characters are system(''rm -r /''); a? Something bad, that's what.
As I already commented, the best case scenario is when all your x and y vectors have same length. In this case you can store all data from the different files into 2 matrices and call plot(x,y) to plot each column as a series.
Alternatively, you can use a cell array such that:
c = cell(2,nufiles);
for ii = 1:numfiles
c{1,ii} = import x data from file ii
c{2,ii} = import y data from file ii
end
plot(c{:})
A structure, on the other hand
s.('test001').x = ...
s.('test001').y = ...
Use eval:
eval(sprintf('%s = struct(''x'',''y'',''z'');',casename(12:18)));
Edit: apologies, forgot the sprintf.
I'm using Excel::Template to generate a series of Excel files via perl. However, I need to do a SUM function on the current Column. I know I can do
=SUM(3:15)
but that gives the sum of ALL columns in rows 3-15. Is there an easier way to do what I'm trying to do?
=sum(indirect(concatenate(address(<row_start>,column()),":")&address(<row_end>,column())))
gives me exactly what I need. Not exactly sure how it works, but found on MrExcel.com
For column C,
=SUM(C3:C15)
Since =SUM(...) is just a string, you may have to parametrize the column if you don't know it before runtime. For instance
$str = "=SUM(" . col_char . "3:" . col_char . "15)";