How do you access document metadata such as the Created and Modified dates or the Title using UIMA and JCas?
By default, such information is not available in UIMA.
Some collection reader implementations may store such information in subtypes of DocumentAnnotation or in other dedicated annotations. For example, DKPro Core defines a DocumentMetaData type which derives from DocumentAnnotation; DKPro Core reader components store the original path/URI of the document there and may also store a title if one is available.
Other component collections or collection reader implementations may do similar things.
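For example, with DKPro Core the stored values can be read back from the JCas roughly like this (a minimal sketch; note that created/modified dates are still not covered by this type, so you would have to extend the type system yourself for those):

```java
import org.apache.uima.jcas.JCas;

import de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData;

public class MetaDataExample {

    // Sketch: reads the DocumentMetaData annotation that DKPro Core readers attach to the CAS.
    public static void printMetaData(JCas jcas) {
        DocumentMetaData meta = DocumentMetaData.get(jcas);
        System.out.println("Document ID:    " + meta.getDocumentId());
        System.out.println("Document URI:   " + meta.getDocumentUri());
        System.out.println("Document title: " + meta.getDocumentTitle());
        System.out.println("Collection ID:  " + meta.getCollectionId());
    }
}
```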
Disclosure: I am working on DKPro Core and Apache UIMA
I was able to define a custom programming language in Enterprise Architect with custom data types by navigating to Project > Settings > Code Engineering Datatypes.... When I create a MDG file, I have the option to include the programming language definition, and as far as I can tell, this is working - at least, in a new project that uses the MDG file, I can see the programming language.
Now I would like the same behavior for DBMS and database datatypes defined through Project > Settings > Database Datatypes.... From my tests, I get the impression that these types are not automatically included in the MDG file, and I haven't found a trivial way to include them. Is there a way to add the database datatypes to the MDG file as well? If not, is there a way to achieve the same result through the automation interface, e.g. by writing an add-in that creates the DBMS and the associated datatypes?
Going the MDG Technology way, the answer appears to be no. It's possible to trick EA (11) into exporting DB types in an MDG Technology, but even if they're in there, they will be ignored in projects that use the MDG Technology.
DB types and code engineering (or, sometimes, "programming language") datatypes are both stored in EA's t_datatypes table. The same product name can be used for both a programming language and a DB engine.
It looks like the MDG Technology Wizard scans this table looking for rows with "Code" in the Type column during the setup (Code Modules wizard page), but when the time comes to write the actual datatypes into the output file, it retrieves all rows with the specified ProductName.
This means that if you create a DB product and populate it with a set of datatypes, and then create a programming language product with the same name but just a single dummy datatype, your DB types will be included in the MDG Technology XML file along with the dummy type.
However, it appears that while the regular properties dialog (for classes, etc.) checks the loaded MDG Technologies in addition to the t_datatypes table in order to populate the Language drop-down list, the specialized properties dialog for database tables does not check the MDG Technologies when populating the corresponding Database drop-down. So even though the datatypes are in the file, you can't use them.
Going the Add-In route, the answer is yes.
Have your Add-In respond to the EA_FileOpen event and check the Repository.Datatypes collection to see if your DB types are installed; if not, add them.
You don't actually need to write an Add-In if you don't want to; you can write an in-EA script instead. The only thing an Add-In can do that a script can't is respond to events (which is why those are listed in the Add-In Model section of the help file), so with a script you would have to trigger the function manually.
There is also an API to manage a project's reference data, of which code/DB datatypes are one category, but it only gives you control over some of the categories (e.g. requirement types and constraint types), and the datatype category is not one of them.
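To illustrate the Repository.Datatypes check mentioned above, here is a rough sketch using EA's bundled Java automation API (org.sparx) rather than a COM Add-In. The Datatype getter/setter names (GetProduct/SetProduct) and the "DDL" type value are assumptions based on the t_datatypes columns, so verify them against the EA automation reference:

```java
import org.sparx.Collection;
import org.sparx.Datatype;
import org.sparx.Repository;

public class DbDatatypeInstaller {

    // Sketch: make sure DB datatypes for a hypothetical "MyDBMS" product exist in the open project.
    // In a real COM Add-In, this logic would run from the EA_FileOpen event handler.
    public static void ensureDbTypes(Repository repository) {
        Collection<Datatype> datatypes = repository.GetDatatypes(); // assumed Java binding of Repository.Datatypes
        boolean found = false;
        for (Datatype dt : datatypes) {
            if ("MyDBMS".equals(dt.GetProduct())) {  // "Product" column of t_datatypes (assumed getter name)
                found = true;
                break;
            }
        }
        if (!found) {
            Datatype dt = datatypes.AddNew("VARCHAR2", "DDL"); // "DDL" assumed to mark a database (not code) datatype
            dt.SetProduct("MyDBMS");                           // assumed setter name
            dt.Update();
            datatypes.Refresh();
        }
    }
}
```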
I just want to know how Entity Framework internally reveals properties and their types at runtime, particularly in the Code-First approach, where there is no system-generated code. Can somebody give me a heads-up? I don't think System.Reflection is being used implicitly, is it?
Code first was first presented to developers as part of the EF Feature CTP1 in June 2009 with the name "code only." The basic premise behind this variation of using the EF was that developers simply want to define their domain classes and not bother with a physical model. However, the EF runtime depends on that model's XML to coerce queries against the model into database queries and then the query results from the database back into objects that are described by the model. Without that metadata, the EF can't do its job. But the metadata does not need to be in a physical file. The EF reads those XML files once during the application process, creates strongly typed metadata objects based on that XML, and then does all of that interaction with the in-memory XML.

Code first creates in-memory metadata objects, too. But instead of creating it by reading XML files, it infers the metadata from the domain classes (see Figure 1). It uses convention to do this and then provides a means by which you can add additional configurations to further refine the model.

ModelBuilder will now take this additional information into account as it's creating the in-memory model and working out the database schema.
By Julie Lerman
How do I store only metadata for an object of type sys:base or cm:object? I have a type which is a sub-type of sys:base, and I need to store its metadata.
My questions are:
1. Where should the data be stored?
2. How do I retrieve the data? Content and folders appear as icons in Company Home or User Homes etc.; how does the metadata appear in a folder (is it shown like an icon of a cm:object)?
With reference to the question posted earlier in the Alfresco forum (http://forums.alfresco.com/forum/developer-discussions/content-modeling/can-i-store-only-metadata-alfresco-11292012-0437): should the object be of type cm:content to store the metadata (except that it does not have a content property)? Is there any way to create metadata other than by being a subtype of cm:content?
Thank you
You can store metadata on instances of any type as long as the content model includes properties that meet your needs. Those might be out-of-the-box types or they might be custom types.
If you need to create objects that have a content stream, use cm:content. If you don't need to set a content stream you might consider using one of the two types you mentioned. But it won't necessarily hurt anything if you elect to simply create instances of cm:content that don't have a content stream.
If you want to use sys:base or cm:object, simply create an instance of either of those types. Alternatively, define your own sub-type of those types then instantiate that custom type and set your metadata.
If you don't know how to define custom types or you are unsure about how to set, update, or query metadata, you could start by reading this tutorial.
Also note that if you create instances of any type other than cm:content, cm:folder, or a custom type that inherits from one of those two types, you will be unable to do so using CMIS. That isn't the end of the world; just be aware that it is currently a limitation. Someday Alfresco will support the "item" type which is new in CMIS 1.1, but until then, CMIS in Alfresco can only work with documents, folders, and your custom types that inherit from those types.
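For illustration, creating a "metadata only" instance of a custom cm:content sub-type through CMIS (OpenCMIS) might look roughly like this. The type and property names (my:record, my:department) are made up for the example; substitute your own model:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.chemistry.opencmis.client.api.Document;
import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.commons.PropertyIds;
import org.apache.chemistry.opencmis.commons.enums.VersioningState;

public class MetadataOnlyExample {

    // Sketch: create a document of a hypothetical custom type my:record (extends cm:content)
    // without a content stream, carrying only metadata.
    public static Document createRecord(Folder parent) {
        Map<String, Object> props = new HashMap<>();
        props.put(PropertyIds.OBJECT_TYPE_ID, "D:my:record");  // hypothetical custom type
        props.put(PropertyIds.NAME, "record-001");
        props.put("my:department", "Accounting");              // hypothetical custom property

        // Passing null as the content stream creates the node without any content.
        return parent.createDocument(props, null, VersioningState.MAJOR);
    }
}
```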
When many teams work with the same MongoDB database, there needs to be some way to express what each document may contain. Otherwise the document will end up having "email", "mail", and "email_addr" fields added by different teams. What's the best way to represent this for the purpose of communication across teams?
Obviously, the best way is what the team is most comfortable with. It can be UML, whiteboard drawings, XML mappings, model code files, maybe even haiku poems :)
I personally prefer using an ODM (mongoid). It encourages you to specify all fields in the model class. Then you just need one glance at it to understand the schema.
What you can do is create your objects first in a set of commons that all team members import into their projects. If you change the schema design, you update the commons project and all team members import the latest version.
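For example, the commons could just be a small shared library of plain model classes that documents the agreed field names (a sketch; the class and field names are made up):

```java
// Shared "commons" module: the one place where the agreed document structure lives.
// Teams map their documents to/from this class instead of inventing their own field names.
public class UserDocument {

    public static final String COLLECTION = "users";

    // Agreed field names: "email", never "mail" or "email_addr".
    public String email;
    public String firstName;
    public String lastName;
    public Address address;   // embedded sub-document

    public static class Address {
        public String street;
        public String city;
        public String zip;
    }
}
```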
It's more about process and project management and less about technology, given Mongo's schema-less design. One thing we find helpful is to design your tests first; lately, SoapUI and LoadUI have been excellent tools for this. Once you define your tests, SoapUI can stub the returns for you and produce HTML documentation you can distribute to the team.
Check out: http://www.soapui.org/REST-Testing/working-with-rest-services.html
When you create a collection, just add a first "reference" object to it that has all the fields/sub-objects that an object of this collection can possibly have, and use it as the "schema". You can even write a validator that checks that new objects conform to this reference object.
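As a sketch of the validator idea using the MongoDB Java driver (this relies on server-side collection validators, which only newer MongoDB versions support; the database, collection, and field names are made up):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ValidationOptions;

public class ReferenceSchemaExample {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("crm");

            // Server-side validator derived from the "reference" object:
            // every new document must carry an "email" field (not "mail" or "email_addr").
            ValidationOptions validation = new ValidationOptions()
                    .validator(Filters.exists("email"));

            db.createCollection("users",
                    new CreateCollectionOptions().validationOptions(validation));
        }
    }
}
```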
I have some documents and an ontology for some concepts. Are there any frameworks that automatically extract those concepts from the given documents and create triples? Must the ontology contain special properties?
I found UIMA, but as far as I understand, with UIMA I can only do something like this:
create some dictionaries which keep associations with the ontology
use this dictionary with ConceptMapper
write a CAS consumer that creates the triples and persists them
I don't like this approach because I have to keep the concepts from the ontology and the dictionary in sync.
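Roughly, step 3 would look something like this (just a sketch; ConceptAnnotation and getConceptUri() are placeholders for whatever annotation type and feature my ConceptMapper dictionary actually produces, and the predicate URI is made up):

```java
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;

// Rough sketch of step 3, written as a plain analysis component (uimaFIT's JCasUtil is used
// for the iteration). ConceptAnnotation is a placeholder for the JCas type produced by the
// dictionary/ConceptMapper setup, and getConceptUri() is a made-up feature getter.
public class TripleWriter extends JCasAnnotator_ImplBase {

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        for (ConceptAnnotation concept : JCasUtil.select(jcas, ConceptAnnotation.class)) {
            String triple = String.format("<%s> <http://example.org/mentionedIn> \"%s\" .",
                    concept.getConceptUri(),                          // URI copied from the dictionary entry
                    concept.getCoveredText().replace("\"", "\\\""));
            System.out.println(triple);  // a real consumer would persist these to a triple store
        }
    }
}
```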
Can UIMA be used differently, or are there any more advanced frameworks that can use my ontology directly (with, let's say, some custom properties) as input and annotate the documents based on it?
I want to use ontologies as the domain model because I want to build a knowledge base later on, and ontologies seem more flexible than, for example, a relational model.
Thanks.
After spending more time searching on Google, I found GATE and, more specifically, the OntoRoot Gazetteer and the Large KB Gazetteer.
OntoRoot Gazetteer is a type of dynamically created gazetteer that is, in combination with a few other generic GATE resources, capable of producing ontology-based annotations over the given content with regard to the given ontology. This gazetteer is part of the 'Gazetteer_Ontology_Based' plugin that has been developed as part of the TAO project.
I haven't tested them yet, but they seem like good candidate solutions for my problem.