Frameworks for semantic annotation for user defined domain model - annotations

I have some documents and an ontology for some concepts. Are there any frameworks that automatically extracts those concepts from the given documents and creates triples? The ontology must contain special properties?
I found UIMA, but as far as I understood with UIMA I can do only something like this:
create some dictionaries which keep associations with the ontology
use this dictionary with ConceptMapper
write a CAS consumer that creates the triples and persists them -
I don't like this approach because I have to keep in sync the concepts from the ontology and the dictionary.
Can be UIMA used differently, or are there any advanced frameworks that can use directly my ontology with lets say some custom properties as input and based on it annotate the documents?
I want to use ontologies as domain model because I want to create further a knowledge base and ontologies seem more flexible than for example relational model.
Thanks.

After spending more time searching on Google I found GATE and more specifically OntoRoot Gazetter and Large KB Gazetteer.
OntoRoot Gazetteer is a type of a dynamically created gazetteer that is, in combination with few other generic GATE resources, capable of producing ontology-based annotations over the given content with regards to the given ontology. This gazetteer is a part of ‘Gazetteer_Ontology_Based’ plugin that has been developed as a part of the TAO project.
I didn't test them but these ones seem good solution candidates for my problem.

Related

How to expose read model from shared module

I am working on developing a set of assemblies that encapsulate parts of our domain that will be shared by many applications. Using the example of an order management system, one such assembly will contain all of the core operations an application can perform to/with an order. We are applying a simple version of CQS/CQRS so that all operations that change the state of the "system" are represented as public commands, such as CancelOrderCommand, ShipOrderCommand and CreateORderCommand. The command handlers are internal to the assembly.
The question I am struggling to answer is how to best expose the read model to consuming code?
The read model will be used by consuming code to perform queries. I don't know how all of the ways the read model will be used so the interface needs to be flexible to allow any query.
What complicates it for me is that I not only need to expose my aggregate root but there are also several "lookup" lists of related data that client applications may use. For example, each order has an associated OrderType which is data-driven (i.e., not an enum) and contains several properties that will drive some of our business rules that control what operations can/cannot be performed, etc. It is easy inside my module to manage this relationship; however, a client application that allows order creation will most likely need to display the list of possible OrderTypes to the user. As a result, I need to not only expose the list of Order aggregates but the supporting list of OrderTypes (and other lookup lists) from my read model.
How is this typically done?
I'm not sure what else to explain that will help trigger a solution, so please ask away...
I have never seen a CQRS based implementation expose a full dataset for ad-hoc querying so this is an interesting situation! In a typical CQRS scenario you would expose very specific queries because you may want to raise events when they are called (for caching for example - see this post for more details on that).
However since this is your design, let's not worry about "typical" or "correct" CQRS, I guess you just need a solution! One of the best new mechanisms for exposing data for flexible querying I have seen is the Open Data Protocol (OData). It will allow consumers to implement their own filtering, sorting and paging over a data source you expose.
Most implementations of this seem to deal with relational data. If you are dealing with a relational data source then OData might be a nice way to go. I suspect by your comment of "expose my aggregate root" that you might be using a document database? If so, there is one example I have seen of OData services on top of MongoDB: http://bloggingabout.net/blogs/vagif/archive/2012/10/11/mongodb-odata-provider-now-supports-arrays-and-nested-collections.aspx.
I hope that helps, OData is definitely worth looking into. It seems to be growing really quickly and is getting good support on both server and client technology platforms.

mongodb: how to express "schema"?

Once many teams work with the same mongodb database there needs to be some way to express what each document may contain. Otherwise the document will end be having "email", "mail", "email_addr" fields added by each team. What's the best way to represent this for the purpose of communication across teams?
Obviously, the best way is what the team is most comfortable with. It can be UML, whiteboard drawings, XML mappings, model code files, maybe even haiku poems :)
I personally prefer using an ODM (mongoid). It encourages you to specify all fields in the model class. Then you just need one glance at it to understand the schema.
What you can do is create your Objects first in a set of commons that all team members import into their projects. If you change schema design, you update Commons project and all team members import latest.
It's more about process and project management and less about technology given Mongo's schema-less design. One thing we find helpful is design your Tests first and lately, SoapUI and LoadUI have been excellent tools. Once you define your tests, it can stub the returns for you and produces HTML documentation you can distribute to team.
Check out: http://www.soapui.org/REST-Testing/working-with-rest-services.html
When you create collection, just add to it some first "reference" object that would have all the fields/sub-objects that object of this collection can possibly have and use it as "schema". You can even write validator that would check that new objects conform to this reference object.

Semi-automatic annotation tool - How to find RDF Triplets

I'm developing a semi-automatic annotation tool for medical texts and I am completely lost in finding the RDF triplets for annotation.
I am currently trying to use an NLP based approach. I have already looked into Stanford NER and OpenNLP and they both do not have models for extracting disease names.
My question is:
* How can I create a new NER model for extracting disease names? and can I get any help from the OpenNLP or Standford NERs?
* Is there another approach all-together - other than NLP - to extracting the RDF triplets from a text?
Any help would be appreciated! Thanks.
I have done something similar to what you need both with OpenNLP and LingPipe.
I found the exact dictionary-based chunking of LingPipe good enough for my use case and used that. Documentation available here: http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html
You can find a small demo here:
https://github.com/castagna/nerdf
If a gazetteer/dictionary approach isn't good enough for you, you can try creating your own model, OpenNLP has API for training models as well. Documentation is here: http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.training
Extracting RDF triples from natural language is a different problem than identify named entities. NER is a related and perhaps necessary step, but not enough. To extract an RDF statement from natural language not only you need to identify entities such as the subject and the object of a statement. But you also need to identify the verb and/or relationship of those entities and also you need to map those to URIs.

Neo4j (or any other graph database) modeling

I'm starting to work with graph databases, and in my team we've started modeling a graph for our software. The problem comes when we try to "document" the model, to see the structure of our database. With SQL databases you only have to look at the SQL schema.
We've spent some time reading neo4j blogs and documentation, but we've seen that the usual way to show how a graph works is with a minimal graph showing some sample data (Random samples: sample1, sample2, etc). That's great for educational purposes, but we'd love to be able to do it in a little more formal way. We'd like to set what kind of node can relate with another one, and with what kind of relationship, that kind of stuff.
Using Spring you can wrap the graph with classes, but it's very specific to Java and OO model, and we're working with Erlang. We're looking for some kind of formal language (SQL Schema equivalent), or a E-R model equivalent, or something like that.
One way to do this is to put the "meta-model" of your graph (a type network) in the graph as well and then connect the instances (nodes) to their meta-model-type. So you can visualize the meta-model using the graph visualization and at the same time use the meta-model to enforce additional constraints (by storing constraint information in the meta-model and using that when the actual model is updated) and also use the type-nodes of the meta-model to quickly access all "instance"-nodes of this type.
What is the domain you want to model?
A quick idea - could you use a subset of UML? Graph modeling seems to be closer to the domain, so maybe that's reasonable.
What we do is a generalization of the "example data" approach, where we include cardinality on each side of a relationship, as well as type and direction. I also often include a node "type" in the diagram (or some other specification of it's role/relation to domain models) instead of example data, and of course note the expected properties, their types, and whether they are optional. It's less than formal, but has served well so far.

Entity Framework and Encapsulation

I would like to experimentally apply an aspect of encapsulation that I read about once, where an entity object includes domains for its attributes, e.g. for its CostCentre property, it contains the list of valid cost centres. This way, when I open an edit form for an Extension, I only need pass the form one Extension object, where I normally access a CostCentre object when initialising the form.
This also applies where I have a list of Extensions bound to a grid (telerik RadGrid), and I handle an edit command on the grid. I want to create an edit form and pass it an Extension object, where now I pass the edit form an ExtensionID and create my object in the form.
What I'm actually asking here is for pointers to guidance on doing this this way, or the 'proper' way of achieving something similar to what I have described here.
It would depend on your data source. If you are retrieving the list of Cost Centers from a database, that would be one approach. If it's a short list of predetermined values (like Yes/No/Maybe So) then property attributes might do the trick. If it needs to be more configurable per-environment, then IoC or the Provider pattern would be the best choice.
I think your problem is similar to a custom ad-hoc search page we did on a previous project. We decorated our entity classes and properties with attributes that contained some predetermined 'pointers' to the lookup value methods, and their relationships. Then we created a single custom UI control (like your edit page described in your post) which used these attributes to generate the drop down and auto-completion text box lists by dynamically generating a LINQ expression, then executing it at run-time based on whatever the user was doing.
This was accomplished with basically three moving parts: A) the attributes on the data access objects B) the 'attribute facade' methods at the middle-tier compiling and generation dynamic LINQ expressions and C) the custom UI control that called our middle-tier service methods.
Sometimes plans like these backfire, but in our case it worked great. Decorating our objects with attributes, then creating a single path of logic gave us just enough power to do what we needed to do while minimizing the amount of code required, and completely eliminated any boilerplate. However, this approach was not very configurable. By compiling these attributes into the code, we tightly coupled our application to the datasource. On this particular project it wasn't a big deal because it was a clients internal system and it fit the project timeline. However, on a "real product" implementing the logic with the Provider pattern or using something like the Castle Projects IoC would have allowed us the same power with a great deal more configurability. The downside of this is there is more to manage, and more that can go wrong with deployments, etc.