Mark Diagnosis with a certain ICD-10 Code as main diagnosis - uima

I would like to mark all diagnoses whose conceptId starts with "G35." as main diagnosis. How can I implement this?
d:Diagnosis{d.conceptId.startswith("G35.") -> MainDiagnosis};
d:Diagnosis{d.conceptId[0:3] == "G35." -> MainDiagnosis};
All the best
Philipp

You could use REGEXP condition to match a pattern on the value of a feature (i.e. Diagnosis.conceptId).
A solution in your case would be something like this:
d:Diagnosis{REGEXP(d.conceptId, "^G35.*") -> MainDiagnosis};
For more information on REGEXP Condition, feel free to consult the documentation
Another option would be to use StringFunctions; similar to what you tried to do in the first rule.
d:Diagnosis{startsWith(d.conceptId, "G35") -> MainDiagnosis};
However this requires you to activate an optional extension org.apache.uima.ruta.string.bool.BooleanOperationsExtension in your Ruta Analysis Engine by setting its parameter PARAM_ADDITIONAL_EXTENSIONS

Related

Simple `to_tsvector` configuration - postgres

How can I change the to_tsvector configuration to use a simple tokenization rule like:
lowercase
split by spaces only
Executing the following query:
SELECT to_tsvector('english', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345')
I get these lexemes:
'-345':9 '19770531':2 '44':6 'aaa':8 'age':5 'birthday':1 'code':7 'john':4 'name':3
The kind of searching I'm looking for is like:
(!birthday | birthday=19770531) & (code=AAA-345)
It means, get me all records that has a text "birthday=19770531" or doesn't have "birthday" at all, and a text equals to "code=AAA-345"). The way lexemes are being created it is not possible. I was expecting to have something like this:
'birthday=19770531':1 'age=44':2 'code=aaa-345':4 'name=john-oliver':3
You would have to code a custom parser. This can only be done in C.
But you might be able to use the existing testing parser test_parser, it seems to do what you want. If not, it would at least be a good starting point.
The problem may be that this is in src/test/modules/, and I don't think it ships with most installation packaging. So it might take some effort to get it to install. It would depend on your OS, version, and package manager.

Match part of a string with regex

I have two arrays of strings and I want to check if a string of array a matches a string from array b. Those strings are phone numbers that might come in different formats. For example:
Array a might have a phone number with prefix like so +44123123123 or 0044123123123
Array b have a standard format without prefixes like so 123123123
So I'm looking for a regex that can match a part of a string like +44123123123 with 123123123
Btw I'm using Swift but I don't think there's a native way to do it (at least a more straightforward solution)
EDIT
I decided to reactivate the question after experimenting with the library #Larme mentioned because of inconsistent results.
I'd prefer a simper solution as I've stated earlier.
SOLUTION
Thanks guys for the responses. I saw many comments saying that Regex is not the right solution for this problem. And this is partly true. It could be true (or false) depending on my current setup/architecture ( which thinking about it now I realise that I should've explained better).
So I ended up using the native solution (hasSuffix/contains) but to do that I had to do some refactoring on the way the entire flow was structured. In the end I think it was the least complicated solution and more performant of the two. I'll give the bounty to #Alexey Inkin for being the first to mention the native solution and the right answer to #Ωmega for providing a more complete solution.
I believe regex is not the right approach for this task.
Instead, you should do something like this:
var c : [String] = b.filter ({ (short : String) -> Bool in
var result = false
for full in a {
result = result || full.hasSuffix(short)
}
return result
})
Check this demo.
...or similar solution like this:
var c : [String] = b.filter ({ (short : String) -> Bool in
for full in a {
if full.hasSuffix(short) { return true }
}
return false
})
Check this demo.
As you do not mention requirements to prefixes, the simplest solution is to check if string in a ends with a string in b. For this, take a look at https://developer.apple.com/documentation/swift/string/1541149-hassuffix
Then, if you have to check if the prefix belongs to a country, you may replace ^00 with + and then run a whitelist check against known prefixes. And the prefix itself can be obtained as a substring by cutting b's length of characters. Not really a regex's job.
I agree with Alexey Inkin that this can also nicely be solved without regex. If you really want a regex, you can try something like the following:
(?:(\+|00)(93|355|213|1684|376))?(\d+)
^^^^^^^^^^^^^^^^^^^^^ Add here all your expected country prefixes (see below)
^^^ ^^ Match a country prefix if it exists but don't give it a group number
^^^^^^^ Match the "prefix-prefix" (+ or 00)
^^^^ Match the local phone number
Unfortunatly with this regex, you have to provide all the expected country prefixes. But you can surely get this list online, e.g. here: https://www.countrycode.org
With this regex above you will get the local phone number in matching group 3 (and the "prefix-prefix" in group 1 and the country code in group 2).

Can we set tolerance level on regex annotator in Ruta?

I am annotating Borrower Name
"Borrower Name" -> BorrowerNameKeyword ( "label" = "Borrower Name");
But I get this text post OCR analysis. At times I might get Borrower Name as B0rr0wer Nane. Is this possible to set tolerance limit so that this text gets annotated as BorrowerNameKeyword?
Is their any other approach which could help here?
I could think of dictionary correction but that wont help as it could auto correct right words.
You could achieve that with regular expressions in UIMA Ruta. For you particular example the following rule should work:
"B.rr.wer\\sNa.e" -> BorrowerName;
Likewise, you can create more variants of regular expressions to cover the OCR errors.

Where are `_stdlib_getTypeName()` & `_stdlib_getDemangledTypeName()` declared? -- Swift

I'm toying with some introspection in Swift and it seems like if you want to get the class of an object in a printable version, these are the best options. (introduced in beta 6.0).
_stdlib_getTypeName(someClass)
_stdlib_getDemangledTypeName(someClass) // A slightly cleaner version
I was hoping to find other introspection methods, but unfortunately, command clicking the methods take me to the Swift header and they're not declared there.
My other option would be to type _stdlib and wait for autocomplete or control space to see my options. Unfortunately, none of these methods autocomplete.
Is there a file where these and other stdlib functions are declared, or is there documentation for these methods anywhere?
I found the answer to my question via a tips and tricks blog post from realm here -- notably, the post by JP Simard.
The best way to see other methods along these lines is to go to your terminal and type:
cd `xcode-select -p`/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift/macosx
And then enter the following:
nm -a libswiftCore.dylib | grep "T _swift_stdlib"
This will give you a readout of all available functions that looks something like this:
00000000001a43c0 T _swift_stdlib_NSObject_isEqual
00000000001a4490 T _swift_stdlib_NSStringHasPrefixNFD
00000000001a44f0 T _swift_stdlib_NSStringHasSuffixNFD
00000000001a4450 T _swift_stdlib_NSStringNFDHashValue
00000000001a2650 T _swift_stdlib_atomicCompareExchangeStrongPtr
00000000001a2670 T _swift_stdlib_atomicCompareExchangeStrongUInt32
00000000001a2690 T _swift_stdlib_atomicCompareExchangeStrongUInt64
00000000001a2700 T _swift_stdlib_atomicFetchAddUInt32
00000000001a2710 T _swift_stdlib_atomicFetchAddUInt64
00000000001a26f0 T _swift_stdlib_atomicLoadPtr
00000000001a26d0 T _swift_stdlib_atomicLoadUInt32
00000000001a26e0 T _swift_stdlib_atomicLoadUInt64
00000000001a26b0 T _swift_stdlib_atomicStoreUInt32
00000000001a26c0 T _swift_stdlib_atomicStoreUInt64
00000000001a4410 T _swift_stdlib_compareNSStringDeterministicUnicodeCollation
000000000017c560 T _swift_stdlib_conformsToProtocol
00000000001a5a80 T _swift_stdlib_demangleName
000000000017c8e0 T _swift_stdlib_dynamicCastToExistential1
000000000017c6f0 T _swift_stdlib_dynamicCastToExistential1Unconditional
00000000001a5910 T _swift_stdlib_getTypeName
I haven't found any documentation, but a lot of these function names are fairly explanatory and one can always discover a lot through trying them out!
All answers are good, but the result of second step that we can not use. We dont even know is this function usable or correct...
I've been trapped in these result about 1 day.
Finally I dump all funcitons & symbols for stdlib from libswiftcore.dylib, i found this..
command:
nm libswiftcore.dylib | grep "_stdlib_"
We can find one line from result:
00000000000b2ca0 T __TFSs19_stdlib_getTypeNameU__FQ_SS
Remove first underscore "_" then we get this:
_TFSs19_stdlib_getTypeNameU__FQ_SS
Maybe we can view this website to understand the meaning of "_TFSs19_stdlib_getTypeNameU__FQ_SS",
But I think we can get the correct function description faster!!
So, we demangle like this below in xcode lldb window:
(lldb) p _stdlib_demangleName("_TFSs19_stdlib_getTypeNameU__FQ_SS")
(String) $R0 = "Swift._stdlib_getTypeName <A>(A) -> Swift.String"
Finally we can expose more undocumented functions in swift that we never seen before, we can try another one that we never heard like this:
(lldb) p _stdlib_demangleName("_TFSs24_stdlib_atomicLoadARCRefFT6objectGVSs20UnsafeMutablePointerGSqPSs9AnyObject____GSqPS0___")
(String) $R1 = "Swift._stdlib_atomicLoadARCRef (object : Swift.UnsafeMutablePointer<Swift.Optional<Swift.AnyObject>>) -> Swift.Optional<Swift.AnyObject>"
All clear~ Thank god!!
Share this to you, wish it can help~
:D

UIMA RUTA: How to check if String variable is in StringList?

I am looking for something like this:
WORDLIST lemmas = 'lemmas.txt';
DECLARE Test;
BLOCK(AnnotateTests) Token{} {
STRING lemma;
Token{->GETFEATURE("lemma", lemma)};
INLIST(lemma, lemmas) -> MARK(Action); // <- How to do this?
}
I know this is broken code, but I would like to know how I can supply a list of terms by a text file and annotate all instances of, say, Token, who have a certain feature (Lemma in the example) value among the ones in the list. I know String equality is possible, but list membership I was not able to find in the documentation or figure out myself.
Thanks!
UIMA Ruta 2.1.0: Unfortunately, the INLIST condition does not accept additional arguments, but only checks on the covered text of the matched annotation. So you cannot use that. The CONTAINS condition accepts an additional argument, but not word lists. You can also not apply the wordlist with MARKFAST since the dictionary check is token-based.
The best solution for this problem is to ask the developers to add the functionality, or adding an external condition that provides the functionality.
In UIMA Ruta 2.1.0, you could use StringListExpressions instead of word lists:
STRINGLIST LemmaSL = {"cat", "dog"}; // the content of the wordlist
Token{CONTAINS(LemmaSL, Token.lemma) -> MARK(Action)};
In UIMA Ruta 2.2.0, the INLIST condition is able to process an additional argument that replaces the covered text of the matched annotation, which should solve your problem:
WORDLIST LemmaList = 'lemmas.txt';
Token{INLIST(LemmaList, Token.lemma) -> MARK(Action)};
DISCLAIMER: I am a developer of Apache UIMA Ruta.