Is it possible to read a vector of parameters from a file?
I'm trying to create a vector of objects, such as shown here: enter link description here starting on page 49. However, I would like to pull the specific resistance and capacitance values from a text file. (I'm actually just using this as an example for how to read it in).
So, the example fills in data like this:
A.Basic.Resistor R[N + 1](R = vector([Re/2; fill(Re,N-1); Re/2]) );
A.Basic.Capacitor C[N](each C = c*L/N);
But, instead I have a text file that contains something like, where the first column is the index, the second is the R values and the third is the C values:
#1
double test1(4,3) #First set of data (row then col)
1.0 1.0 10.0
2.0 2.0 30.0
3.0 5.0 50.0
4.0 7.0 100.0
I know that I can read this data in using a CombiTable1D or CombiTable2D. But, is there a way to essentially convert each column of data to a vector so that I can do something analogous to:
ReadInTableFromDisk
A.Basic.Resistor R[N + 1](R = FirstDataColumnOfDataOnDisk );
A.Basic.Capacitor C[N](each C = SecondDataColumnOfDataOnDisk);
I would recommend the ExternData library if you want to load external data files into your modelica tool.
Modelica library for data I/O of INI, JSON, XML, MATLAB MAT and Excel XLS/XLSX files
There is the vector() function that converts arrays to vectors.
Related
I'm trying to load an external CSV file using MATLAB.
I managed to download it using webread, but I only need a subset of the columns.
I tried
Tb = webread('https://datahub.io/machine-learning/iris/r/iris.csv');
X = [sepallength sepalwidth petallength petalwidth];
But I cannot form X this way because the names are not recognized. How can I create X correctly?
The line
Tb = webread('https://datahub.io/machine-learning/iris/r/iris.csv');
Produces a table object with column names you later try to access as if they were workspace variables - which they aren't. Instead, you should modify your code to use:
X = [Tb.sepallength Tb.sepalwidth Tb.petallength Tb.petalwidth];
I have the following code which takes in a mix of categorical, numeric features, string indexes the categorical features, then one hot encodes the categorical features, then assembles both one hot encoded categorical features and numeric features, runs them trough a random forest and prints the resultant tree. I want the tree nodes to have the original features names (i.e Frame_Size etc). How can I do that? In general how can I decode one hot encoded and assembled features?
# categorical features : start singindexing and one hot encoding
column_vec_in = ['Commodity','Frame_Size' , 'Frame_Shape', 'Frame_Color','Frame_Color_Family','Lens_Color','Frame_Material','Frame_Material_Summary','Build', 'Gender_Global', 'Gender_LC'] # frame Article_Desc not slected because the cardinality is too high
column_vec_out = ['Commodity_catVec', 'Frame_Size_catVec', 'Frame_Shape_catVec', 'Frame_Color_catVec','Frame_Color_Family_catVec','Lens_Color_catVec','Frame_Material_catVec','Frame_Material_Summary_catVec','Build_catVec', 'Gender_Global_catVec', 'Gender_LC_catVec']
indexers = [StringIndexer(inputCol=x, outputCol=x+'_tmp') for x in column_vec_in ]
encoders = [OneHotEncoder(dropLast=False, inputCol=x+"_tmp", outputCol=y) for x,y in zip(column_vec_in, column_vec_out)]
tmp = [[i,j] for i,j in zip(indexers, encoders)]
tmp = [i for sublist in tmp for i in sublist]
#categorical and numeric features
cols_now = ['SODC_Regular_Rate','Commodity_catVec', 'Frame_Size_catVec', 'Frame_Shape_catVec', 'Frame_Color_catVec','Frame_Color_Family_catVec','Lens_Color_catVec','Frame_Material_catVec','Frame_Material_Summary_catVec','Build_catVec', 'Gender_Global_catVec', 'Gender_LC_catVec']
assembler_features = VectorAssembler(inputCols=cols_now, outputCol='features')
labelIndexer = StringIndexer(inputCol='Lens_Article_Description_reduced', outputCol="label")
tmp += [assembler_features, labelIndexer]
# converter = IndexToString(inputCol="featur", outputCol="originalCategory")
# converted = converter.transform(indexed)
pipeline = Pipeline(stages=tmp)
all_data = pipeline.fit(df_random_forest_P_limited).transform(df_random_forest_P_limited)
all_data.cache()
trainingData, testData = all_data.randomSplit([0.8,0.2], seed=0)
rf = RF(labelCol='label', featuresCol='features',numTrees=10,maxBins=800)
model = rf.fit(trainingData)
print(model.toDebugString)
After I run the spark machine learning pipeline I want to print out the random forest as a tree.Currently it looks like below.
What I actually want to see is the original categorical feature names instead of feature 1, feature 2 etc. The fact that the categorical features are one hot encoded and vector assembled makes it hard for me to unindex/decode the feature names. How can I unidex/decode onehot encoded and assembled feature vectors in pyspark? I have a vague idea that I have to use " IndexToString()" but I am not exactly sure because there is a mix of numeric, categorical features and they are one hot encoded and assembled.
Export the Apache Spark ML pipeline to a PMML document using the JPMML-SparkML library. A PMML document can be inspected and interpreted by humans (eg. using Notepad), or processed programmatically (eg. using other Java PMML API libraries).
The "model schema" is represented by the /PMML/MiningModel/MiningSchema element. Each "active feature" is represented by a MiningField element; you can retrieve their "type definitions" by looking up the corresponding /PMML/DataDictionary/DataField element.
Edit: Since you were asking about PySpark, you might consider using the JPMML-SparkML-Package package for export.
I am working on an ingestion feature that will take a strongly formatted .xlsx file and import the records to a temp storage table and then process the rows to create db records.
One of the columns is strictly formatted as "Text" but it seems like the Open XML API handles the columns cells differently on a row-by-row basis. Some of the values while appearing to be numeric values are truly not (which is why we format the column as Text) -
some examples are "211377", "211727.01", "209395.388", "209395.435"
what these values represent is not important but what happens is that some values (using the Open XML API v2.5 library) will be read in properly as text whether retrieved from the Shared Strings collection or simply from InnerXML property while others get sucked in as numbers with what appears to be appended rounding or precision.
For example the "211377", "211727.01" and "209395.435" all come in exactly as they are in the spreadsheet but the "209395.388" value is being pulled in as "209395.38800000001" (there are others that this happens to as well).
There seems to be no rhyme or reason to which values get messed up and which ones which import fine. What is really frustrating is that if I use the native Import feature in SQL Server Management Studio and ingest the same spreadsheet to a temp table this does not happen - so how is that the SSMS import can handle these values as purely text for all rows but the Open XML API cannot.
To begin the answer you main problem seems to be values,
"209395.388" value is being pulled in as "209395.38800000001"
Yes in .xlsx file value is stored as 209395.38800000001 instead of 209395.388. And it's the correct format to store floating point numbers; nothing wrong in it. You van simply confirm it by following code snippet
string val = "209395.38800000001"; // <= What we extract from Open Xml
Console.WriteLine(double.Parse(val)); // < = Simply pass it to double and print
The output is :
209395.388 // <= yes the expected value
So there's nothing wrong in the value you extract from .xlsx using Open Xml SDK.
Now to cells, yes cell can have verity of formats. Numbers, text, boleans or shared string text. And you can styles to a cell which would format your string to a desired output in Excel. (Ex - Date Time format, Forced strings etc.). And this the way Excel handle the vast verity of data. It need this kind of formatting and .xlsx file format had to be little complex to support all.
My advice is to use a proper parse method set at extracted values to identify what format it represent (For example to determine whether its a number or a text) and apply what type of parse.
ex : -
string val = "209395.38800000001";
Console.WriteLine(float.Parse(val)); // <= Float parse will be deduce a different value ; 209395.4
Update :
Here's how value is saved in internal XML
Try for yourself ;
Make an .xlsx file with value 209395.388 -> Change extention to .zip -> Unzip it -> goto worksheet folder -> open Sheet1
You will notice that value is stored as 209395.38800000001 as scene in attached image.. So nothing wrong on API for extracting stored number. It's your duty to decide what format to apply.
But if you make the whole column Text before adding data, you will see that .xlsx hold data as it is; simply said as string.
I have a Modis image with hdf format.
fileinfo = hdfinfo('MOD09GA.A2011288.hdf');
I'm trying to create a matrix but I only need three bands that are stored on the attributes (I know it because I've checked on Erdas). I've checked the structure of the attributes and there are 12 bands (fileinfo.Attributes= <1x12 struct>). How can I extract and create a matrix with three bands?
sds_info = fileinfo.SDS(2);
What I'm trying to do is the following...
data1 = hdfread(sds_info.Attributes)
But I get the following error:
??? Error using ==>
hdfread>dataSetInfo at 418
HINFO must be a structure
describing a specific data set
in the file.
Checking the help I know I have to use that structure. How can I know the content of the attributes? How can I select and create a matrix with that information?
data1 = hdfread(s.Vdata(1), 'Fields', {'Idx', 'Temp', 'Dewpt'})
PS) I'm using the hdftool importing every band. There another way to do it?
At the end, this is what I've done (I don't erase the post just in case could help someone):
sur_refl_b01_1 = hdfread('MOD09GA.A2011288.h17v05.005.2011293000105.hdf', '/MODIS_Grid_500m_2D/Data Fields/sur_refl_b01_1', 'Index', {[1 1],[1 1],[2400 2400]});
I am reading a dicom file in matlab and modifying some data of it and trying to save it into another file, but while doing so, the private dicom data are either not written at all (when 'WritePrivate' is set to 0) or it's written as a UINT8 array which become incomprehensible and useless. I even tried to copy the data that I get in from the original dicom file to a new structure and write to a new dicom file but even though the private data remains fine in new structure it doesn't remain so in the new dicom file. Is there any way to keep this private data intact while copying in to a new dicom file without changing the matlab dicom dictionary?
I have provided the following code to show what I'm trying to do.
X=dicomread('Bad011_4CH_01.dcm');
metadata = dicominfo('Bad011_4CH_01.dcm');
metadata.PatientName.FamilyName='LastName';
metadata.PatientName.GivenName='FirstName';
birthday=metadata.PatientBirthDate;
year=birthday(1,1:4);
newyear=strcat(year,'0101');
metadata.PatientBirthDate=newyear;
names=fieldnames(metadata);
h=metadata;
dicomwrite(X,'example.dcm',h,'CreateMode','copy');
newh=dicominfo('example.dcm');
Here the data in newh contains none of the private data. If I change the code to the following
dicomwrite(X,'example.dcm',h,'CreateMode','copy','WritePrivate',1);
In this case the private data gets totally changed to some UIN8 array and useless. The ideal solution for my task would be to enable keeping the private data in the newly created dicom file without changing the matlab dicom dictionary.
Have you tried something like:
dicomwrite(uint16(image), fileName, 'ObjectType', 'MR Image Storage', ...
'WritePrivate', true, header);
where "header" is a struct composed of name-value pairs using the same format as header data that you would get from MATLAB's dicominfo function? My general approach to image creation in MATLAB is to avoid using CreateMode 'copy' and instead build my own DICOM header by explicitly copying the attributes that it makes sense to copy and generating my own values for attributes that should have new values.
To write private tags, you would do something like:
header.Private_0045_10xx_Creator = 'MY_PRIVATE_BLOCK';
header.Private_0045_1001 = int32(65535);
If you then write this out using dicomwrite and read it back in using hdr = dicominfo('mynewimg');, you can see that it really did write the value as a 32-bit integer even though, unfortunately, if is always going to read the data in as a vector of uint8 values.
>> hdr.Private_0045_1001
ans =
255
255
0
0
As long as you know what type to expect, you should be able to typecast the data back to the desired type after you've read the header. For example:
>> typecast(hdr.Private_0045_1001, 'int32')
ans =
65535
I know I'm about 8 years late, but have you tried
dicomwrite(..., 'VR', 'explicit')
?
It solves the "reading as uint8" problem for me.
Edit:
Actually, it looks like you need to specify a dicom dictionary with the VR of that tag. If you combine this with 'VR', 'explicit', then the program reading the dicom won't need to dictionary file.