Attributes do not match in RapidMiner - neural-network

I successfully applied the Neural Net operator in RapidMiner to a data set that has three attribute columns and a fourth column holding the label:
column1|column2|column3|column4(labelled)
data |data |data |data
Now I have test data, and I want to predict the value of the label column from column1, column2 and column3. The test data looks like this:
column1|column2|column3
data |data |data
Question: is this correct?
Using this approach, I created a model so that the process can predict the value of the unlabelled column.
Then, following the "Split data solution" in the reference below, I created another model using Split Data. For this I combined my training and test data sets, so the combined data has label values for some rows and none for the rows that come from the test data.
But I am still getting this error.

From what I can see, the problem is that you don't apply the Nominal to Numerical operator to your test set.
With its default settings, this operator creates a dummy encoding for each nominal value found in the specified attribute. In your case you will get a column/attribute named "Course1=A" that contains a 1 for each example where the original column was "A", and so on.
What you need to do is apply the same encoding to your test data as to your training data.
As you can see, the Nominal to Numerical operator has an additional output port called pre (short for preprocessing model). This can be used to apply the same preprocessing steps (like normalization or encoding) to multiple data sets.
For convenience, you can also group several models into one by using the Group Models operator.
See the process XML below for an example (just copy and paste it into the process view of RapidMiner).
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="179" y="34">
<list key="comparison_groups"/>
<description align="center" color="purple" colored="true" width="126">Transform the nominal attributes into a dummy encoding with 0/1 for each expression.<br/>This encoding is then also delivered via &quot;pre&quot; output port.</description>
</operator>
<operator activated="true" class="neural_net" compatibility="8.2.000" expanded="true" height="82" name="Neural Net" width="90" x="447" y="34">
<list key="hidden_layers"/>
</operator>
<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Golf-Testset" width="90" x="45" y="340">
<parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="447" y="340">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="648" y="340">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve Golf" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Neural Net" to_port="training set"/>
<connect from_op="Nominal to Numerical" from_port="preprocessing model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Neural Net" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="center" color="green" colored="true" height="103" resized="true" width="315" x="433" y="433">First apply the &quot;preprocessing&quot; model so the test data have the same structure<br/><br/>Then apply the trained neural net</description>
</process>
</operator>
</process>
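The model grouping mentioned in the answer could be wired roughly as follows, replacing the two chained Apply Model operators with a single one. This is only a sketch and has not been run: the operator class `group_models` and the port names `models in 1`, `models in 2` and `model out` are written from memory and are assumptions that should be checked against your RapidMiner version.

```xml
<!-- Sketch only: class and port names of Group Models are assumptions, verify them in your version -->
<operator activated="true" class="group_models" compatibility="8.2.000" expanded="true" name="Group Models"/>
<!-- Order matters: the preprocessing model goes in first, the neural net second -->
<connect from_op="Nominal to Numerical" from_port="preprocessing model" to_op="Group Models" to_port="models in 1"/>
<connect from_op="Neural Net" from_port="model" to_op="Group Models" to_port="models in 2"/>
<!-- One Apply Model then runs both steps on the raw test data -->
<connect from_op="Group Models" from_port="model out" to_op="Apply Model" to_port="model"/>
<connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
```

The grouped model is applied in the order the models were connected, so the test set gets the dummy encoding before the neural net sees it.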
Also feel free to ask further questions, or re-post this one, in the RapidMiner community forum.

Related

How to create sub table of contents in a specific section of band report in jasper?

I need to design a report using JasperReports that has a main table of contents for the whole report and a sub table of contents at the start of a specific section. The problem is that nested books are not valid. Is there another way to do that?
I have 3 parts in the report. The first and the third parts are tables of contents. When the second part contains a bookmarkLevel, neither table of contents works properly. Whenever I remove the bookmarkLevel, both tables of contents work properly.
<group name="dummy">
<groupExpression><![CDATA[1]]></groupExpression>
<groupHeader>
<part evaluationTime="Report" uuid="1fadcc2f-31c1-49be-bd52-f8b69e38cd83">
<property name="net.sf.jasperreports.bookmarks.data.source.parameter" value="REPORT_DATA_SOURCE"/>
<partNameExpression><![CDATA["Table of Contents"]]></partNameExpression>
<p:subreportPart xmlns:p="http://jasperreports.sourceforge.net/jasperreports/parts" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports/parts http://jasperreports.sourceforge.net/xsd/parts.xsd" usingCache="true">
<subreportExpression><![CDATA["TOCPart.jasper"]]></subreportExpression>
</p:subreportPart>
</part>
<part uuid="3f63c482-39b2-43f1-a623-15fb046605a5">
<partNameExpression><![CDATA["Overview"]]></partNameExpression>
<p:subreportPart xmlns:p="http://jasperreports.sourceforge.net/jasperreports/parts" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports/parts http://jasperreports.sourceforge.net/xsd/parts.xsd">
<subreportParameter name="REPORT_CONNECTION">
<subreportParameterExpression><![CDATA[$P{REPORT_CONNECTION}]]></subreportParameterExpression>
</subreportParameter>
<subreportExpression><![CDATA["OrdersReport.jasper"]]></subreportExpression>
</p:subreportPart>
</part>
<part evaluationTime="Report" uuid="1fadcc2f-31c1-49be-bd52-f8b69e38cd84">
<property name="net.sf.jasperreports.bookmarks.data.source.parameter" value="REPORT_DATA_SOURCE"/>
<partNameExpression><![CDATA["Table of Contents2"]]></partNameExpression>
<p:subreportPart xmlns:p="http://jasperreports.sourceforge.net/jasperreports/parts" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports/parts http://jasperreports.sourceforge.net/xsd/parts.xsd">
<subreportExpression><![CDATA["TOCPart2.jasper"]]></subreportExpression>
</p:subreportPart>
</part>
</groupHeader>
</group>

Jasper CVC Component with Sub-Dataset

I'm working with the CVC component in Jasper Studio. It works fine with the "ReportMainDataset", but when I use a "Sub Dataset" it does not plot anything.
I would like to know whether the CVC component works with a "Sub Dataset" at all. If it does, how can I achieve this?
The tag that specifies the dataset is <cvc:cvData>
Example
subdataset
<subDataset name="Dataset1" uuid="03d50d7f-1b96-486a-ac64-7d2c6e440433">
<queryString>
<![CDATA[select count(*) t, shipcountry, shipcity from orders group by shipcountry, shipcity order by shipcountry, shipcity]]>
</queryString>
<field name="SHIPCOUNTRY" class="java.lang.String"/>
<field name="T" class="java.lang.Long"/>
<field name="SHIPCITY" class="java.lang.String"/>
</subDataset>
component
<cvc:customvisualization xmlns:cvc="http://www.jaspersoft.com/cvcomponent" xsi:schemaLocation="http://www.jaspersoft.com/cvcomponent http://www.jaspersoft.com/cvcomponent/component.xsd" evaluationTime="Report" onErrorType="Icon">
<cvc:itemProperty name="script" value="d3_zoomable_circle_packing.min.js"/>
<cvc:itemProperty name="css" value="d3_zoomable_circle_packing.css"/>
<cvc:cvData>
<dataset>
<datasetRun subDataset="Dataset1" uuid="bd23d50f-2149-4985-a0ac-883505172688">
<parametersMapExpression><![CDATA[$P{REPORT_PARAMETERS_MAP}]]></parametersMapExpression>
<connectionExpression><![CDATA[$P{REPORT_CONNECTION}]]></connectionExpression>
</datasetRun>
</dataset>
<cvc:item>
<cvc:itemProperty name="category">
<valueExpression><![CDATA[$F{SHIPCOUNTRY}]]></valueExpression>
</cvc:itemProperty>
<cvc:itemProperty name="subcategory">
<valueExpression><![CDATA[$F{SHIPCITY}]]></valueExpression>
</cvc:itemProperty>
<cvc:itemProperty name="value">
<valueExpression><![CDATA[$F{T}]]></valueExpression>
</cvc:itemProperty>
</cvc:item>
</cvc:cvData>
</cvc:customvisualization>
EDIT: As @dada67 commented, it does not seem to work properly. I have also tested it with the sample d3_zoomable_cricle_packing.jrxml without success; this is the bug issue.
The workaround: create a subreport!
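A minimal sketch of that workaround, assuming the custom visualization is moved into a separate report whose main dataset runs the query from Dataset1, so that the component reads from the main dataset (the case reported as working above). The file name CvcChart.jasper and the element geometry are hypothetical.

```xml
<!-- In the master report: embed the report that holds the CVC component -->
<subreport>
  <reportElement x="0" y="0" width="555" height="300"/>
  <!-- Pass the master connection so the subreport's main query can run -->
  <connectionExpression><![CDATA[$P{REPORT_CONNECTION}]]></connectionExpression>
  <!-- Hypothetical file name for the compiled chart report -->
  <subreportExpression><![CDATA["CvcChart.jasper"]]></subreportExpression>
</subreport>
```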

How to create user defined formula for user defined field

I have created a JasperReports report in iReport 4.5.0.
In it I have created an expression as follows:
<textField pattern="###0.0;-###0.0">
<reportElement x="248" y="3" width="46" height="20"/>
<textElement/>
<textFieldExpression><![CDATA[( $F{salesdetails_LessWeight} == 0 ? $F{salesdetails_Weight} - ($F{salesdetails_LessWeight} * $F{salesdetails_Quantity}) : $F{salesdetails_WithPlasticWeight} - ($F{salesdetails_LessWeight} * $F{salesdetails_Quantity}) )]]></textFieldExpression>
</textField>
I need the sum (total) of the text field created above.
How can I do that?
In this case you would simply create a variable with the same expression, but with a calculation of "Sum". You can change the reset type to a group, or leave it at report level to total the whole report.
You can then use that variable in a Text Field element, with an evaluation time of Report/Group/Now depending on your needs.
<variable name="SOSUM" class="java.lang.Double" calculation="Sum">
<variableExpression><![CDATA[$F{salesdetails_LessWeight} == 0 ? $F{salesdetails_Weight} - ($F{salesdetails_LessWeight} * $F{salesdetails_Quantity}) : $F{salesdetails_WithPlasticWeight} - ($F{salesdetails_LessWeight} * $F{salesdetails_Quantity}) ]]></variableExpression>
</variable>
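To display the total, a text field that references the variable could look like the sketch below. It assumes the field sits in the summary band (or a group footer) and reuses the pattern and geometry of the original text field; `evaluationTime="Report"` delays rendering until the whole report has been processed, so the sum is complete.

```xml
<!-- Shows the accumulated value of the SOSUM variable defined above -->
<textField evaluationTime="Report" pattern="###0.0;-###0.0">
  <reportElement x="248" y="3" width="46" height="20"/>
  <textElement/>
  <textFieldExpression><![CDATA[$V{SOSUM}]]></textFieldExpression>
</textField>
```

For a per-group total, the variable's reset type and the text field's evaluation time would both be set to the group instead.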

Passing main parameter to sub-datasets in JasperStudio

I have created a report with JasperStudio 5.5 that has many parameters defined in the main report and many sub-datasets (used by tables) that require these parameters.
The situation:
/main/Parameters: myPar
/main/mySubDataSet1/
/main/mySubDataSet2/
...
/main/mySubDataSetN/
The sub-datasets need to use this parameter in their query: select * from Tab t where t.attr = $P!{myPar}
My problem is that the sub-datasets can't access this main parameter. Every time I try to compile, the program reports "Parameter not found : myPar".
How can I use myPar in the sub-datasets?
P.S.: I read the thread "Pass main dataset parameter to subdataset query" (based on iReport), but without success...
Well, you need to fill the subDataset parameters with values at the place where you actually use them. In this case the sub-dataset has to declare the parameters it needs, and the table that lists items from the sub-dataset has to pass the values of the report-level parameters to them in its datasetRun.
In JRXML it looks like this:
<jr:table xmlns:jr="http://jasperreports.sourceforge.net/jasperreports/components" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports/components http://jasperreports.sourceforge.net/xsd/components.xsd">
<datasetRun subDataset="mySubDataSet1" uuid="bbe7937c-a8f1-4838-811a-3f11ec1f8e35">
<datasetParameter name="myPar">
<datasetParameterExpression><![CDATA[$P{myPar}]]></datasetParameterExpression>
</datasetParameter>
<connectionExpression><![CDATA[$P{REPORT_CONNECTION}]]></connectionExpression>
</datasetRun>
...
</jr:table>
In more detail:
<subDataset name="dsLines" uuid="a47307ff-90a8-476f-afd1-0fd8aa0517d0">
<parameter name="formalId" class="java.lang.String"/>
<queryString language="SQL">
<![CDATA[
SELECT s.formalid, sl.*
FROM salesorder s INNER JOIN salesorderline sl
ON (s.id = sl.salesorder_id)
WHERE s.formalid = $P{formalId}
]]>
</queryString>
<field name="qty" class="java.math.BigDecimal"/>
...
<jr:table xmlns:jr="http://jasperreports.sourceforge.net/jasperreports/components" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports/components http://jasperreports.sourceforge.net/xsd/components.xsd">
<datasetRun subDataset="dsLines" uuid="3ef5ec78-ab18-4f44-88e6-f99f3eafac07">
<datasetParameter name="formalId">
<datasetParameterExpression><![CDATA[$F{formalid}]]></datasetParameterExpression>
</datasetParameter>
</datasetRun>
<jr:column width="29" uuid="f675a273-7ea6-4bd4-8a55-c7522dfea2a8">
...

OLAP/MDX - define calculated member, sum all time to date data

I would like to define "all time to date" calculated member in OLAP cube. I'm able to calculate YTD by using the following:
SUM(YTD([Time].[Month].CurrentMember), [Measures].[Suits])
How can I include all dates since the beginning of my data? My time dimension looks like:
<Dimension type="TimeDimension" visible="true" foreignKey="granularity" highCardinality="false" name="Time">
<Hierarchy name="Time" visible="true" hasAll="true" primaryKey="eom_date">
<Table name="v_months" schema="bizdata">
</Table>
<Level name="Year" visible="true" column="year_number" type="String" uniqueMembers="false" levelType="TimeYears" hideMemberIf="Never">
</Level>
<Level name="Quarter" visible="true" column="quarter_number" type="String" uniqueMembers="false" levelType="TimeQuarters" hideMemberIf="Never">
</Level>
<Level name="Month" visible="true" column="month_number" type="String" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
Not sure if relevant: I'm using the Mondrian OLAP server (running on Tomcat), Saiku as the frontend and PostgreSQL as the database.
I've tried a lot of combinations, but I can't figure it out.
Update: I've tried to use syntax suggested by Gonsalu:
<CalculatedMember name="YTD Suits" formatString="" formula="SUM(YTD([Time].[Month].CurrentMember), [Measures].[Suits])" dimension="Measures" visible="true">
</CalculatedMember>
<CalculatedMember name="PTD Suits" formatString="" formula="Sum({NULL:[Time].[Month].CurrentMember },[Measures].[Suits])" dimension="Measures" visible="true">
</CalculatedMember>
Using this I get the following error message when starting mondrian (note that YTD function works well without the second calculated member):
Caused by: mondrian.olap.MondrianException: Mondrian Error:Failed to parse query
'WITH
MEMBER [Measures].[Measures].[YTD Suits]
AS 'SUM(YTD([Time].[Month].CurrentMember), [Measures].[Suits])',
[$member_scope] = 'CUBE',
MEMBER_ORDINAL = 6
MEMBER [Measures].[Measures].[PTD Suits]
AS 'Sum({NULL:[Time].[Month].CurrentMember },[Measures].[Suits])',
[$member_scope] = 'CUBE',
MEMBER_ORDINAL = 7
SELECT FROM [Project Performance]'
Thank you for any ideas.
I haven't used Mondrian, but in SQL Server Analysis Services (SSAS), using the NULL member causes the range to go from one end of the level to the specified member.
In your case, the calculated member you're looking for might be something like this:
Sum( { NULL : [Time].[Month].CurrentMember }
, [Measures].[Suits]
)
You could also define a "to the end of time" calculated member by using the NULL member on the other end, like so:
{ [Time].[Month].CurrentMember : NULL }
You can use the PeriodsToDate function along with the All member.
In your case it would be:
PeriodsToDate([Time.Time].[all_Time_member_name],[Time.Time].CurrentMember)
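Wrapped in a calculated member for the schema from the question, the PeriodsToDate approach might look like the sketch below. It is untested against the asker's cube and rests on two assumptions: that `[Time].[(All)]` resolves to the All level of the Time hierarchy (PeriodsToDate takes a level as its first argument in standard MDX), and that the member is evaluated with a Time member in context.

```xml
<!-- Sketch only: [Time].[(All)] as the All level is an assumption for this schema -->
<CalculatedMember name="PTD Suits" dimension="Measures" visible="true"
    formula="Sum(PeriodsToDate([Time].[(All)], [Time].CurrentMember), [Measures].[Suits])">
</CalculatedMember>
```

With the All level as the boundary, the set runs from the first member on the current member's level up to the current member, which is what an "all time to date" total needs.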