How can I process Persian text using RapidMiner? - classification

I am working on a Persian text classification project. Persian text is very similar to Arabic text. When I use Tokenize, no words appear on the word list page, and the Example Set page shows the image below:
I need to classify Persian texts into categories, but I don't know how.
I followed these steps:
1- Read the dataset with the Read Excel operator. It has 2 columns => col1: Persian text, col2: category
2- Use the Set Role operator to mark the label column
3- Use the Process Documents from Data operator, containing Tokenize (changing the mode makes no difference) and Filter Tokens (min: 5, max: 25)
4- Use the Cross Validation operator to train an SVM or Bayesian model and measure performance in the testing part.
The process runs without errors and the performance is not terrible (e.g. accuracy is 50%), but I think I am doing something wrong.
Any help would be appreciated.

First, make sure your text data is UTF-8 encoded.
If you use Filter Tokens (by Length), a minimum of 5 is too high; try 2, or 3 at most.
I also recommend the Filter Stopwords (Dictionary) operator; the dictionary file should contain one Persian stopword per line.
Hope this helps.
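
To see why the minimum token length matters, here is a rough Python sketch of the same three ideas (UTF-8 input, length filter, stopword dictionary). It is not RapidMiner code; the function names and the tiny stopword set are made up for illustration, and a real stopword file would be much longer.

```python
import re

# Hypothetical, deliberately tiny stopword set. A real dictionary file would
# hold one Persian stopword per line, as suggested above.
PERSIAN_STOPWORDS = {"و", "در", "به", "از", "را", "با"}


def read_utf8(path):
    """Read a text file, decoding it explicitly as UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def tokenize(text, min_len=2, max_len=25, stopwords=PERSIAN_STOPWORDS):
    """Split text into runs of letters, then filter by length and stopwords."""
    # [^\W\d_] matches any Unicode letter in Python 3, including Persian ones.
    tokens = re.findall(r"[^\W\d_]+", text)
    return [
        token
        for token in tokens
        if min_len <= len(token) <= max_len and token not in stopwords
    ]


sample = "من کتاب را در خانه خواندم"
print(tokenize(sample))             # minimum length 2
print(tokenize(sample, min_len=5))  # minimum length 5 drops most words
```

In the sample sentence most content words have four letters or fewer, so a minimum length of 5 keeps only a single token, while a minimum of 2 keeps every word that is not a stopword.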

PyQGIS: change expression for datadefined symbology

I have a QGIS project that displays calculation results on a map. There are several vector layers, each with more than 100 calculated fields. The data should be visualised in a very similar way for all these layers and fields. I am trying to write a script that duplicates a template layer and changes the symbology expressions according to the selected field name.
Below is a screenshot of the properties I am trying to access (as I would change them in the UI).
How do I access/change the expressions of the line width and line offset of a graduated symbology in PyQGIS?
In case someone else runs into this issue:
I solved it with a workaround. I saved the style of the source layer to a QML file. For each duplicated layer I create a temporary copy of the QML file, do a search-and-replace on it, and apply it to the new layer using
newLayer.loadNamedStyle(pathToTheTempQmlFile)
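
A minimal sketch of that workaround, assuming the field name appears as plain text in the QML file and is not a substring of some other name. The helper name make_style_for_field and its parameters are made up for illustration; only loadNamedStyle comes from the answer above.

```python
import os
import tempfile


def make_style_for_field(template_qml_path, template_field, new_field):
    """Copy a QML style to a temp file, swapping one field name for another."""
    with open(template_qml_path, encoding="utf-8") as src:
        qml = src.read()
    # Plain textual search-and-replace. This assumes template_field occurs
    # only where the field is referenced and is not part of a longer name.
    qml = qml.replace(template_field, new_field)
    fd, tmp_path = tempfile.mkstemp(suffix=".qml")
    with os.fdopen(fd, "w", encoding="utf-8") as dst:
        dst.write(qml)
    return tmp_path


# Inside the QGIS Python console (not runnable outside QGIS), with
# hypothetical paths and field names:
# path = make_style_for_field("/path/to/template.qml", "field_a", "field_b")
# newLayer.loadNamedStyle(path)
# newLayer.triggerRepaint()
```

The temporary file is not deleted automatically, so remove it with os.remove once the style has been loaded.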
Hope this helps:
rule=layer.renderer().rootRule().children()[0]
rule.setFilterExpression('whatever')
see:
QgsRuleBasedRenderer.Rule

Adding preview option in cq:dialog?

Is it possible to add a section to a cq:dialog that renders whatever data is entered in the fields and previews it in real time? In the simplest scenario, I need to add two numbers, and once I have entered both, the cq:dialog should preview their sum (for example, 4). There should be two sections in the cq:dialog, like two columns: the left one for entering values into the fields and the right one for displaying the rendered output. How can I achieve this? Is it possible at all?
You can make use of "event handlers". The Adobe docs have a simple example using jQuery that you can customize for your requirement.
https://helpx.adobe.com/experience-manager/using/creating-touchui-events.html

Stacking Filters Weka Explorer

Hi, I'm new to Weka and am using the Explorer to try to do some text classification.
I have a training set which I have tested using the "StringToWordVector" filter and an "AttributeSelection" filter. However, I want to be able to test the classifier on unseen data, so I have tried the "Supplied test set" option. After reading around I realise that the StringToWordVector filter has to be applied to both sets at the same time, so I have used the "FilteredClassifier" option to do this. However, I cannot seem to apply the AttributeSelection filter as well.
If I am going about this the wrong way, please let me know. Or if there is an option to apply or stack multiple filters when classifying, that'd be great. Cheers
You have to chain the filters (StringToWordVector and AttributeSelection) using MultiFilter. They then behave as a single filter that you can put into a FilteredClassifier. See the detailed tutorial at Text Mining in WEKA Revisited: Selecting Attributes by Chaining Filters.

Set xlsx to recalculate formulae on open

I am generating xlsx files and would like to not have to compute the values of all formulae during this process.
That is, I would like to set <v> to 0 (or omit it) for cells with an <f>, and have Excel fill in the values when it is opened.
One suggestion was to have a macro run Calculate on startup, but I have been unable to find a complete guide on how to do this with signed macros to avoid prompting the user. A flag that can be set somewhere within the xlsx would be far better.
Edit: I'm not looking for answers that involve using Office programs to make changes. I am looking for file format details.
The Python module XlsxWriter sets the formula <v> value to 0 (unless the actual value is known) and the <calcPr> fullCalcOnLoad attribute to true in the xl/workbook.xml file:
<calcPr fullCalcOnLoad="1"/>
This works for all Excel and OpenOffice, LibreOffice, Google Docs and Gnumeric versions that I have tested.
The place it won't work is for non-spreadsheet applications that cannot re-calculate the formula value such as file viewers.
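
For files generated without XlsxWriter, the same attribute can be set directly on xl/workbook.xml. Below is a rough sketch using only the Python standard library. The function name is made up, and the placement of a new calcPr element (after sheets, functionGroups, externalReferences or definedNames) reflects my reading of the workbook schema order, so treat that part as an assumption.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("", NS)
ET.register_namespace("r", REL_NS)


def force_full_calc_on_load(workbook_xml: bytes) -> bytes:
    """Return workbook.xml content with <calcPr fullCalcOnLoad="1"/> set."""
    root = ET.fromstring(workbook_xml)
    calc_pr = root.find(f"{{{NS}}}calcPr")
    if calc_pr is None:
        calc_pr = ET.Element(f"{{{NS}}}calcPr")
        # Assumption: calcPr belongs after these elements in the schema
        # sequence, so insert it after the last one that is present.
        before = {
            f"{{{NS}}}{name}"
            for name in ("sheets", "functionGroups",
                         "externalReferences", "definedNames")
        }
        index = 0
        for position, child in enumerate(root):
            if child.tag in before:
                index = position + 1
        root.insert(index, calc_pr)
    calc_pr.set("fullCalcOnLoad", "1")
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The function works on the bytes of workbook.xml only; reading that entry from the xlsx archive and writing it back (for example with zipfile) is left out here.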
If calculation mode is set to automatic, Excel always (re)calculates workbooks on open.
So, just generate your files with calculation mode set to "Automatic".
In xl/workbook.xml, add the following node to the workbook node:
<calcPr calcMode="auto"/>
Also check Description of how Excel determines the current mode of calculation.
You can use macros as suggested; however, you will create a less secure and less compatible workbook, and you still will not avoid user interaction to force calculation.
If you opt for VBA, you can call Application.Calculate in the Workbook_Open event.
In your XML contents, simply omit the <v> element in each cell that has a formula. This forces MS Excel to recalculate the formula whatever the Excel options are.
Instead of:
<c r="B2" s="1">
<f>SUM(A1:C1)</f>
<v>6</v>
</c>
Have:
<c r="B2" s="1">
<f>SUM(A1:C1)</f>
</c>
If you have to do this for existing XML contents, you can easily write a small parser that searches for each <c> element. If the <c> element has an <f> child, delete its <v> child.
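
Such a parser could look roughly like this in Python, using only the standard library. The function name is made up, and it expects the bytes of a single worksheet part (for example xl/worksheets/sheet1.xml) rather than the whole xlsx archive.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Serialize with the default namespace instead of generated ns0: prefixes.
ET.register_namespace("", NS)


def drop_cached_formula_values(sheet_xml: bytes) -> bytes:
    """Remove the <v> child from every <c> cell that also has an <f> child."""
    root = ET.fromstring(sheet_xml)
    # Collect the cells first so the tree is not modified while iterating.
    for cell in list(root.iter(f"{{{NS}}}c")):
        if cell.find(f"{{{NS}}}f") is not None:
            for value in cell.findall(f"{{{NS}}}v"):
                cell.remove(value)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Cells without a formula keep their <v> element, so constant values are untouched.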
I faced the same problem when exporting xlsx files via OpenXML (using the fastest approach: SAX plus a template file, without zip stream rewinds).
Despite the calculation option being set to Automatic, nothing was recalculated when the file was opened.
Furthermore, the Calculate Now and Calculate Sheet buttons did not recalculate anything either.
Recalculation happened only after selecting the cell and pressing Enter ;(
Original formula: SUM(A3:A999)
Solution:
Create an internal hidden sheet
Place the end row number (999 in my case) into any cell in the hidden sheet (P1 in my case)
Reference that row number in the formula via the INDIRECT function
Final formula: SUM(A3:INDIRECT("A"&Internal!P1))
P.S.
Theoretically, in P1 you can implement a dynamic row number calculation via something like =LOOKUP(2;1/(Sheet1!A:A<>"");ROW(Sheet1!A:A)), but my customers were satisfied with the hardcoded row number solution.

Convert this HTML text to plain text on iPhone

<strong>Occupant Safety</strong>
All ILX models come standard with dual front, front side and full-length side curtain airbags in addition to traction and stability control systems and electronic brakeforce distribution.
<strong>Key competitors</strong>
As the new entry-level offering from Acura, the ILX will compete with the Buick Verano and the Audi A3. Buyers can also consider cross-shopping it against Acura's own TSX, which is a size larger yet only marginally more expensive.
Please tell me how to convert text with these HTML tags to plain text on iPhone.
Try this:
https://github.com/mwaterfall/MWFeedParser
Import the Category folder into your project.
You only need this import:
#import "NSString+HTML.h"
Then write something like this:
simpletxt.text = [YourHTMLString stringByConvertingHTMLToPlainText];
You need to escape the HTML tags in the string. There are some NSString categories that do this; see
Convert formatted HTML text string to NSString parts
The accepted answer there shows how to escape HTML tags in a string.