Issue while applying COUNT condition in UIMA RUTA

I used the COUNT condition to find the number of punctuation marks in an annotation, but I didn't receive the expected output.
DECLARE Sentence(INT pmcount);
"Conflicts of interest"->Sentence;
DECLARE SentenceLastToken;
Sentence{-PARTOF(SentenceLastToken)->MARKLAST(SentenceLastToken)};
INT Pmcount=0;
Sentence->{ANY+?{->SHIFT(Sentence,1,1,true)} SentenceLastToken{PARTOF(PM)};};
Sentence{COUNT(PM,Pmcount)->Sentence.pmcount=Pmcount};
Sample Input:
Conflicts of interest.
Expected Output:
Conflicts of interest
pmcount:0
Received Output:
Conflicts of interest
pmcount:1
I face this problem only when a PM directly follows the annotated text.

How to remove the first 2 rows using zipwithindex using spark scala

I have two header rows in the file and have to remove them. I tried zipWithIndex, which assigns indices from zero onwards, but it shows an error when I apply a filter condition to the result.
val data=spark.sparkContext.textFile(filename)
val s=data.zipWithIndex().filter(row=>row[0]>1) --> throwing error here
Any help here please.
Sample data:
===============
sno,empno,name --> need to remove
c1,c2,c3 ==> need to remove
1,123,ramakrishna
2,234,deepthi
Error: identifier expected but integer literal found
value row of type (String,Long) does not take type parameters.
not found: type <error>
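If you have to remove rows, you can assign a row number to each row and then filter on it, keeping only rows with row_num > 2. With zipWithIndex, each element is a (line, index) tuple and the index starts at 0, so the condition becomes row._2 > 1. Scala tuples are accessed with ._1 and ._2; row[0] is read as a type parameter, which is what the error message reports.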
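The index-and-filter idea can be sketched in plain Python, without Spark. The helper `zip_with_index` below is a hypothetical stand-in that only mimics the (element, index) pair order of Spark's zipWithIndex; it is an illustration, not the Spark API. The sample rows are taken from the question.

```python
def zip_with_index(lines):
    """Hypothetical stand-in for Spark's zipWithIndex: returns (element, index) pairs."""
    return [(line, idx) for idx, line in enumerate(lines)]

lines = [
    "sno,empno,name",      # header row 1
    "c1,c2,c3",            # header row 2
    "1,123,ramakrishna",
    "2,234,deepthi",
]

# The index is the second element of each pair and starts at 0,
# so dropping two header rows means keeping indices greater than 1.
rows = [line for line, idx in zip_with_index(lines) if idx > 1]
print(rows)
```

The final comprehension also drops the index again, so only the data lines remain.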

Find category of MATLAB mlint warning ID

I'm using the checkcode function in MATLAB to get a struct of all messages for a supplied file, along with the McCabe complexity and the ID associated with each message, i.e.:
info = checkcode(fileName, '-cyc','-id');
In MATLAB's preferences, there is a list of all possible errors, broken down into categories such as "Aesthetics and Readability", "Syntax Errors", "Discouraged Function Usage", etc.
Is there a way to access these categories using the error ID gained from the above line of code?
I tossed around different ideas in my head for this question and was finally able to come up with a mostly elegant solution for how to handle this.
The Solution
The critical component of this solution is the undocumented -allmsg flag of checkcode (or mlint). If you supply this argument, a full list of mlint IDs, severity codes, and descriptions is printed. More importantly, the categories are also printed in this list, and each mlint ID is listed underneath its respective category.
The Execution
Now we can't simply call checkcode (or mlint) with only the -allmsg flag because that would be too easy. Instead, it requires an actual file to parse and check for errors. You can pass any valid m-file, but I have opted to pass the built-in sum.m because the file itself only contains help information (as its real implementation is likely C++), so mlint is able to parse it very rapidly with no warnings.
checkcode('sum.m', '-allmsg');
An excerpt of the output printed to the command window is:
INTER ========== Internal Message Fragments ==========
MSHHH 7 this is used for %#ok and should never be seen!
BAIL 7 done with run due to error
INTRN ========== Serious Internal Errors and Assertions ==========
NOLHS 3 Left side of an assignment is empty.
TMMSG 3 More than 50,000 Code Analyzer messages were generated, leading to some being deleted.
MXASET 4 Expression is too complex for code analysis to complete.
LIN2L 3 A source file line is too long for Code Analyzer.
QUIT 4 Earlier syntax errors confused Code Analyzer (or a possible Code Analyzer bug).
FILER ========== File Errors ==========
NOSPC 4 File <FILE> is too large or complex to analyze.
MBIG 4 File <FILE> is too big for Code Analyzer to handle.
NOFIL 4 File <FILE> cannot be opened for reading.
MDOTM 4 Filename <FILE> must be a valid MATLAB code file.
BDFIL 4 Filename <FILE> is not formed from a valid MATLAB identifier.
RDERR 4 Unable to read file <FILE>.
MCDIR 2 Class name <name> and @directory name do not agree: <FILE>.
MCFIL 2 Class name <name> and file name do not agree: <file>.
CFERR 1 Cannot open or read the Code Analyzer settings from file <FILE>. Using default settings instead.
...
MCLL 1 MCC does not allow C++ files to be read directly using LOADLIBRARY.
MCWBF 1 MCC requires that the first argument of WEBFIGURE not come from FIGURE(n).
MCWFL 1 MCC requires that the first argument of WEBFIGURE not come from FIGURE(n) (line <line #>).
NITS ========== Aesthetics and Readability ==========
DSPS 1 DISP(SPRINTF(...)) can usually be replaced by FPRINTF(...).
SEPEX 0 For better readability, use newline, semicolon, or comma before this statement.
NBRAK 0 Use of brackets [] is unnecessary. Use parentheses to group, if needed.
...
The first column is clearly the mlint ID, the second column is actually a severity number (0 = mostly harmless, 1 = warning, 2 = error, 4-7 = more serious internal issues), and the third column is the message that is displayed.
As you can see, all categories also have an identifier but no severity, and their message format is ===== Category Name =====.
So now we can just parse this information and create some data structure that allows us to easily look up the severity and category for a given mlint ID.
Again, though, it can't always be so easy. Unfortunately, checkcode (or mlint) simply prints this information out to the command window and doesn't assign it to any of our output variables. Because of this, it is necessary to use evalc (shudder) to capture the output and store it as a string. We can then easily parse this string to get the category and severity associated with each mlint ID.
An Example Parser
I have put all of the pieces I discussed previously together into a little function which will generate a struct where all of the fields are the mlint IDs. Within each field you will receive the following information:
warnings = mlintCatalog();
warnings.DWVRD
id: 'DWVRD'
severity: 2
message: 'WAVREAD has been removed. Use AUDIOREAD instead.'
category: 'Discouraged Function Usage'
category_id: 17
And here's the little function if you're interested.
function [warnings, categories] = mlintCatalog()
% Get a list of all categories, mlint IDs, and severity rankings
output = evalc('checkcode sum.m -allmsg');
% Break each line into its components
lines = regexp(output, '\n', 'split').';
pattern = '^\s*(?<id>[^\s]*)\s*(?<severity>\d*)\s*(?<message>.*?\s*$)';
warnings = regexp(lines, pattern, 'names');
warnings = cat(1, warnings{:});
% Determine which ones are category names
isCategory = cellfun(@isempty, {warnings.severity});
categories = warnings(isCategory);
% Fix up the category names
pattern = '(^\s*=*\s*|\s*=*\s*$)';
messages = {categories.message};
categoryNames = cellfun(@(x)regexprep(x, pattern, ''), messages, 'uni', 0);
[categories.message] = categoryNames{:};
% Now pair each mlint ID with its category
comp = bsxfun(@gt, 1:numel(warnings), find(isCategory).');
[category_id, ~] = find(diff(comp, [], 1) == -1);
category_id(end+1:numel(warnings)) = numel(categories);
% Assign a category field to each mlint ID
[warnings.category] = categoryNames{category_id};
category_id = num2cell(category_id);
[warnings.category_id] = category_id{:};
% Remove the categories from the warnings list
warnings = warnings(~isCategory);
% Convert warning severity to a number
severity = num2cell(str2double({warnings.severity}));
[warnings.severity] = severity{:};
% Save just the categories
categories = rmfield(categories, 'severity');
% Convert array of structs to a struct where the MLINT ID is the field
warnings = orderfields(cell2struct(num2cell(warnings), {warnings.id}));
end
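For readers who want to see the parsing logic outside MATLAB, here is a minimal Python sketch of the same idea. It works on a few lines copied from the excerpt above and assumes the output has already been captured as a string. The names `parse_catalog` and `sample` are made up for this example, and the regular expressions are an assumption based only on the layout of the excerpt.

```python
import re

# A few lines copied from the excerpt of the -allmsg output.
sample = """\
FILER ========== File Errors ==========
NOSPC 4 File <FILE> is too large or complex to analyze.
NOFIL 4 File <FILE> cannot be opened for reading.
NITS ========== Aesthetics and Readability ==========
DSPS 1 DISP(SPRINTF(...)) can usually be replaced by FPRINTF(...).
NBRAK 0 Use of brackets [] is unnecessary. Use parentheses to group, if needed.
"""

# Category lines: an ID followed by "===== Category Name =====".
CATEGORY = re.compile(r"^\s*(\S+)\s+=+\s*(.*?)\s*=+\s*$")
# Message lines: an ID, a numeric severity, then the message text.
MESSAGE = re.compile(r"^\s*(\S+)\s+(\d+)\s+(.*?)\s*$")

def parse_catalog(text):
    """Map each message ID to its severity, message, and enclosing category."""
    catalog = {}
    category = None
    for line in text.splitlines():
        header = CATEGORY.match(line)
        if header:
            # Remember the current category for the entries that follow.
            category = header.group(2)
            continue
        entry = MESSAGE.match(line)
        if entry:
            ident, severity, message = entry.groups()
            catalog[ident] = {
                "severity": int(severity),
                "message": message,
                "category": category,
            }
    return catalog

catalog = parse_catalog(sample)
print(catalog["DSPS"]["category"])
```

Tracking the most recent category header while walking the lines plays the same role as the bsxfun/diff step in the MATLAB function.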
Summary
This is a completely undocumented but fairly robust way of getting the category and severity associated with a given mlint ID. This functionality existed in 2010 and maybe even before that, so it should work with any version of MATLAB that you have to deal with. This approach is also a lot more flexible than simply noting what categories a given mlint ID is in because the category (and severity) will change from release to release as new functions are added and old functions are deprecated.
Thanks for asking this challenging question, and I hope that this answer provides a little help and insight!
Just to close this issue off: I've managed to extract the data from a few different places and piece it together. I now have an Excel spreadsheet of all of MATLAB's warnings and errors, with columns for their corresponding ID codes, category, and severity (i.e., whether it is a warning or an error). I can now read this file in, look up the ID codes I get from the 'checkcode' function, and draw out any information required. This can be used to create analysis tools that look at the quality of written scripts, classes, etc.
If anyone would like a copy of this file then drop me a message and I'll be happy to provide it.
Darren.

Having issues with a substring function in Tableau

I am trying to break a string up into substrings based on the position of a repeating set of characters.
The source string [UPDATES] looks like this; the number of characters between the repeating portions varies wildly.
"04/24/15 15:12:54 (PZPJ3F): Task update. 04/24/15 15:12:54 (PZPJ3F): Task update. 04/22/15 15:17:13 (SZGQ3T): updated due date prior to global problem 04/22/15 12:28:09 (PZPJ3F): Task updates."
I am trying to break them up into separate substrings so that I can display them side by side as separate columns as below
Column1 = |04/24/15 15:12:54 (PZPJ3F): Task update.|
Column2 = |04/24/15 15:12:54 (PZPJ3F): Task update.|
Column3 = |04/22/15 15:17:13 (SZGQ3T): updated due date prior to global problem|
I got the first portion to work with
LEFT([UPDATES],FIND([UPDATES],"): ",28)-27)
But my attempts at using FIND to locate the next occurrence of "): " and use it to begin a MID are not working, specifically where I try to end them using a FIND function.
IF [Mark1]>0 THEN MID([UPDATES],[Mark1]-25,[Mark2])
ELSE ""
END
Where Mark1 is
FLOAT(FIND([UPDATES],"): ",(FIND([UPDATES],"): ")+1)))
and Mark2 is
FLOAT(FIND([UPDATES],") ",[Mark1]+1))
I really went down the rabbit hole at the end of my attempt.
I am using Tableau 8.2, so Tableau 9 functions aren't an option (looking forward to FIND Nth!).
Thanks in advance.
The key is that FIND() takes an optional third argument giving the start position of the search.
So in Tableau 8.2, I would write a simple calc to find the position of the first separator. Then reference that calculated field twice in your final calculated fields to yield the length of the first substring and the starting point of the second one.
Separating out substrings is painful prior to Tableau 9. Extracting the first in a list isn't bad, getting the second is clumsy, and after that it gets pretty ugly.
The best approach is to upgrade to version 9 or do some preprocessing to pull out the substrings.
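To make the start-position idea concrete, here is a minimal sketch in Python rather than Tableau, of the kind that could be used in a preprocessing step. Python's `str.find` also accepts a start position, though it is 0-based and returns -1 when nothing is found, whereas Tableau's FIND is 1-based and returns 0. The sketch assumes, as in the sample string, that every "): " marker is preceded by a fixed-width 25-character prefix of the form "MM/DD/YY HH:MM:SS (XXXXXX". The names `SEP`, `PREFIX_LEN`, and `split_updates` are made up for this example.

```python
SEP = "): "
PREFIX_LEN = 25  # assumed fixed width of "MM/DD/YY HH:MM:SS (XXXXXX"

def split_updates(text):
    """Split text into entries, each starting PREFIX_LEN characters before a SEP."""
    # Collect the position of every separator, restarting each search
    # just past the previous hit.
    marks = []
    pos = text.find(SEP)
    while pos != -1:
        marks.append(pos)
        pos = text.find(SEP, pos + 1)

    # Each entry starts PREFIX_LEN characters before its separator and
    # runs up to the start of the next entry (or the end of the string).
    starts = [m - PREFIX_LEN for m in marks]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]

updates = (
    "04/24/15 15:12:54 (PZPJ3F): Task update. "
    "04/24/15 15:12:54 (PZPJ3F): Task update. "
    "04/22/15 15:17:13 (SZGQ3T): updated due date prior to global problem "
    "04/22/15 12:28:09 (PZPJ3F): Task updates."
)

for column, entry in enumerate(split_updates(updates), start=1):
    print(f"Column{column} = {entry}")
```

The loop over separator positions is the part that has to be unrolled by hand in Tableau 8.2, one calculated field per occurrence, which is why it becomes unwieldy after the second substring.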

search a name in dataset error :Undefined function 'eq' for input arguments of type 'cell'

I load a file which has several columns of data. The first line contains ,CITY,YEAR2000.
The first column has the names of the cities. The other columns contain numeric data.
I am trying to search for a specific city using:
data(data.CITY=='Athens',3:end)
where
data = dataset('File','cities.txt','Delimiter',',')
but I receive an error
Undefined function 'eq' for input arguments of type 'cell'.
--------UPDATE-----------------------------
OK, use:
data(find(strncmp(data.CITY,'Athens',length('Athens'))),3:end)
Have you tried using strncmp combined with find?
I would use it this way
find(strncmp(data.CITY,'ATHENS',length('ATHENS')))
EDIT
Another option to explore is strfind:
strfind(data.CITY,'ATHENS')
EDIT 2
You could also try:
data(ismember(data.CITY,'ATHENS'),3:end)
This should lead you to the results you expect (at least I guess so).
EDIT 3
Given your last request I would go for this solution:
inp = input('Name of the CITY: ','s')
Name of the CITY: ATHENS
data(find(strncmp(data.CITY,inp,length(inp))),3:end)

How to limit the number of times a rule can repeat in XTEXT?

My language has certain keywords which only accept values within a certain length range (say, between 5 and 10 decimal digits). This is correct:
KeyWord = 01234
This is incorrect:
KeyWord = 1234
I have a rule:
KeyWord:
'KeyWord' '=' INT+;
How can I limit the number of times INT can repeat? This would be so much easier with a more regexp-like syntax.
I would implement this as a validation check instead of trying to fit this in the grammar itself. See http://www.eclipse.org/Xtext/documentation/2_1_0/050-validation.php
This will result in better error recovery and better error messages. It even allows quick-fixes.
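The idea of accepting any number of digits in the grammar and checking the length afterwards can be sketched independently of Xtext. The Python function below is only an illustration of such a check, not Xtext validator code. The name `check_keyword_value` and the 5 to 10 digit bounds (taken from the question's example) are assumptions.

```python
MIN_DIGITS, MAX_DIGITS = 5, 10  # bounds assumed from the question's example

def check_keyword_value(value):
    """Return an error message for an invalid value, or None if it is acceptable."""
    if not (value.isascii() and value.isdigit()):
        return f"'{value}' must contain only decimal digits"
    if not MIN_DIGITS <= len(value) <= MAX_DIGITS:
        return (f"'{value}' has {len(value)} digits; "
                f"expected between {MIN_DIGITS} and {MAX_DIGITS}")
    return None

print(check_keyword_value("01234"))  # valid value, so no error message
print(check_keyword_value("1234"))   # too short, so an error message
```

Because the check runs after parsing, the message can say exactly what is wrong with the value, instead of the parser rejecting the input with a generic syntax error.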