TreeTagger can't find charsetName when used in a UIMA pipeline

I would like to use the TreeTagger for chunking German text inside a UIMA pipeline. The chunking works fine when I start the tagger from cmd, but it causes the following error when used in the pipeline:
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:150)
at de.fraunhofer.fkie.re_analysis.RA_pipeline.main(RA_pipeline.java:107)
Caused by: java.lang.NullPointerException: charsetName
at java.io.InputStreamReader.<init>(InputStreamReader.java:99)
at org.annolab.tt4j.TreeTaggerWrapper$Reader.<init>(TreeTaggerWrapper.java:946)
at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:598)
at de.tudarmstadt.ukp.dkpro.core.treetagger.TreeTaggerChunker.process(TreeTaggerChunker.java:293)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
... 8 more
I suppose I should specify the parameter "Chunk_Mapping_Location", but I don't know which file it should point to. The chunker is initialised as follows:
AnalysisEngineDescription chunker =
    AnalysisEngineFactory.createEngineDescription(
        TreeTaggerChunker.class,
        TreeTaggerChunker.PARAM_EXECUTABLE_PATH, "C:/TreeTagger/bin/tree-tagger.exe",
        TreeTaggerChunker.PARAM_MODEL_LOCATION, "C:/TreeTagger/lib/german-chunker-utf8.par",
        TreeTaggerChunker.PARAM_PERFORMANCE_MODE, true,
        TreeTaggerChunker.PARAM_PRINT_TAGSET, true,
        TreeTaggerChunker.PARAM_LANGUAGE, "de");

Looks like TreeTaggerChunker is missing PARAM_MODEL_ENCODING, which prevents it from being used with directly specified models. I have opened an issue.
You can get around this by packaging the TreeTagger models as JARs using the build.xml Ant script included with DKPro Core. The process is described in the DKPro Core developer documentation.
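With the packaged model JARs on the classpath, PARAM_MODEL_LOCATION can be dropped entirely and the model (including its encoding metadata) is resolved from the document language. A minimal sketch, assuming the German chunker model artifact has been built with that Ant script and added to the classpath:
AnalysisEngineDescription chunker =
    AnalysisEngineFactory.createEngineDescription(
        TreeTaggerChunker.class,
        // no PARAM_MODEL_LOCATION: the model and its encoding are picked
        // up from the packaged model JAR based on the language
        TreeTaggerChunker.PARAM_EXECUTABLE_PATH, "C:/TreeTagger/bin/tree-tagger.exe",
        TreeTaggerChunker.PARAM_LANGUAGE, "de");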
Disclosure: I am one of the DKPro Core developers.

JPAM Configuration for Apache Drill

I'm trying to configure PLAIN authentication based on JPAM 1.1 and am going crazy, since it doesn't work after checking my syntax and settings countless times. When I start Drill with cluster-id and zk-connect only, it works, but with both options for PLAIN authentication it fails. Since I started with pam4j and tried JPAM later on, I kept JPAM for this post. In general I don't have any preference; I just want to get it done. I'm running Drill on CentOS in embedded mode.
I've done everything required according to the official documentation:
I downloaded JPAM 1.1, uncompressed it, and put libjpam.so into a specific folder (/opt/pamfile/).
I've edited drill-env.sh with:
export DRILLBIT_JAVA_OPTS="-Djava.library.path=/opt/pamfile/"
I edited drill-override.conf with:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "local",
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  },
  security: {
    auth.mechanisms: ["PLAIN"],
  },
  security.user.auth: {
    enabled: true,
    packages += "org.apache.drill.exec.rpc.user.security",
    impl: "pam",
    pam_profiles: [ "sudo", "login" ]
  }
}
It throws the following error:
Error: Failure in starting embedded Drillbit: org.apache.drill.exec.exception.DrillbitStartupException: Problem in finding the native library of JPAM (Pluggable Authenticator Module API). Make sure to set Drillbit JVM option 'java.library.path' to point to the directory where the native JPAM exists.:no jpam in java.library.path (state=,code=0)
I've run that *.sh file by hand to make sure the necessary path is exported, since I don't know whether Drill expects that. The path to libjpam should be known by now. I've started SQLLine with sudo et cetera. No luck. The documentation doesn't help and, in my opinion, is incomplete. Sadly there is zero explanation of how to troubleshoot or configure basic user authentication in detail.
Or do I have to do something that isn't documented but is expected? Are there any prerequisites for PLAIN authentication which aren't mentioned by Apache Drill itself?
Try changing:
export DRILLBIT_JAVA_OPTS="-Djava.library.path=/opt/pamfile/"
to:
export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Djava.library.path=/opt/pamfile/"
It works for me.
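If it still fails, it is worth confirming that the option actually reaches the JVM. A quick sketch (the class name is hypothetical) that prints the native library search path:
public class LibraryPathCheck {
    public static void main(String[] args) {
        // /opt/pamfile/ should appear here once DRILL_JAVA_OPTS is picked up
        System.out.println(System.getProperty("java.library.path"));
    }
}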

Documentation Generation is disabled

I did everything specified in the tutorial for the Doxygen plugin.
Here are the doxygen entries from the sonarqube-4.5.1/conf/sonar.properties file:
# Doxygen
sonar.doxygen.generateDocumentation=enable
sonar.doxygen.deploymentPath=D:\Downloads\sonarqube-4.5.1\web
sonar.doxygen.deploymentUrl=http://localhost:9000/sonar/documentation
The output of the sonarqube runner:
16:07:16.265 INFO - ANALYSIS SUCCESSFUL
16:07:16.266 DEBUG - Post-jobs : org.sonar.plugins.doxygen.DoxygenPostJob#28bda649
16:07:16.266 INFO - Executing post-job class org.sonar.plugins.doxygen.DoxygenPostJob
16:07:16.271 INFO - === SUPPRESS PREVIOUS GENERATION ===
16:07:16.395 INFO - === DOXYGEN EXECUTION ===
16:07:16.396 INFO - ### Generating configuration ###
16:07:16.427 INFO - ### Generating documentation ###
Also, in the specified \web folder there is a documentation folder which seems to contain the correct doxygen documentation output.
Yet I keep getting this "Documentation Generation is disabled." message in the SonarQube web interface:
UPDATE
This is what my sonar-project.properties file contains now, using a unix-style path:
#Doxygen
sonar.doxygen.generateDocumentation=enable
sonar.doxygen.deploymentPath=/Downloads/sonarqube-4.5.1/web
sonar.doxygen.deploymentUrl=http://localhost:9000/sonar/documentation
The output remains the same, same issue.
What do I need to do in order to see the documentation in the web server interface?
This seems to be a server linkage problem, because the documentation is being generated correctly at this location: /Downloads/sonarqube-4.5.1/web/documentation.
I have also found this content:
core,true,sonar-core-plugin-4.5.1.jar|9289fc1067c31372c0b020aa01163087
emailnotifications,true,sonar-email-notifications-plugin-4.5.1.jar|bb35818e4a655a3ba2cff2afc65a296b
findbugs,false,sonar-findbugs-plugin-2.4.jar|bb0bf263ef1e0d56f569878732060cc9
java,false,sonar-java-plugin-2.4.jar|a105d018165ddeb2c0f5074100768660
cpd,true,sonar-cpd-plugin-4.5.1.jar|e11ff5066c9e2308036838510d87a6fe
dbcleaner,true,sonar-dbcleaner-plugin-4.5.1.jar|a444b3b4571791e1cde146ffa5132ee4
design,true,sonar-design-plugin-4.5.1.jar|0c6476994a44904307cfa8b8a08bbddd
doxygen,false,sonar-doxygen-plugin-0.1.jar|d86e1ab81c3ac34e6b31aa1da28d7f72
l10nen,true,sonar-l10n-en-plugin-4.5.1.jar|c21d53f67901cf6df3da1b4dd48a441b
in sonarqube-4.5.1\web\deploy\plugins\index.txt.
It looks like doxygen has a false associated with it. If I edit it (to true) and restart the server, nothing changes: the file gets overwritten by the sonar-runner.
sonar.doxygen.generateDocumentation is a project property, not a server property. You have to set it in your "sonar-project.properties" file if you run your analysis with the SonarQube Runner or in your pom.xml file if you run the analysis with Maven.
Here is how I solved this:
Stopped the SonarQube server.
Replaced the old sonar-doxygen-plugin-0.1.jar in /Downloads/sonarqube-4.5.1/extensions/plugins with the updated doxygen plugin from https://github.com/SonarCommunity/sonar-doxygen/releases/download/1.0/sonar-doxygen-plugin-1.0-SNAPSHOT.jar.
Commented out the old doxygen configuration entries in the project's sonar-project.properties file and replaced them with the following entries:
sonar.doxygen.url=http://localhost:8000/
sonar.doxygen.enable=true
Used a simple python script to serve the documentation HTML on that server (http://localhost:8000/); see the sketch after this list.
Restarted the SonarQube server.
Ran sonar-runner.bat again.
The documentation is in its place now.
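For the serving step, any static file server on port 8000 will do; a minimal sketch, assuming Python 2 (current at the time):
cd /Downloads/sonarqube-4.5.1/web/documentation
python -m SimpleHTTPServer 8000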

Spring Data Cassandra - Environment must not be null Error

I am following the basic tutorial in the Spring Data Cassandra reference (http://docs.spring.io/spring-data/cassandra/docs/1.1.0.RC1/reference/html/) and I am running into the following exception:
java.lang.IllegalArgumentException: Environment must not be null!
at org.springframework.util.Assert.notNull(Assert.java:112)
at org.springframework.data.repository.config.RepositoryConfigurationSourceSupport.<init>(RepositoryConfigurationSourceSupport.java:50)
at org.springframework.data.repository.config.AnnotationRepositoryConfigurationSource.<init>(AnnotationRepositoryConfigurationSource.java:74)
at org.springframework.data.repository.config.RepositoryBeanDefinitionRegistrarSupport.registerBeanDefinitions(RepositoryBeanDefinitionRegistrarSupport.java:74)
at org.springframework.context.annotation.ConfigurationClassParser.processImport(ConfigurationClassParser.java:394)
at org.springframework.context.annotation.ConfigurationClassParser.doProcessConfigurationClass(ConfigurationClassParser.java:204)
at org.springframework.context.annotation.ConfigurationClassParser.processConfigurationClass(ConfigurationClassParser.java:163)
at org.springframework.context.annotation.ConfigurationClassParser.parse(ConfigurationClassParser.java:138)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.processConfigBeanDefinitions(ConfigurationClassPostProcessor.java:284)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.postProcessBeanDefinitionRegistry(ConfigurationClassPostProcessor.java:225)
at org.springframework.context.support.AbstractApplicationContext.invokeBeanFactoryPostProcessors(AbstractApplicationContext.java:630)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:461)
at org.springframework.context.annotation.AnnotationConfigApplicationContext.<init>(AnnotationConfigApplicationContext.java:73)
at com.strides.platform.domain.UserRepositoryDaoTest.<init>(UserRepositoryDaoTest.java:28)
I have completed the steps mentioned in the document:
1) Use Cassandra Properties
2) Create Java configuration
3) Create domain and repository classes
I have autowired the Environment variable in my test classes. I checked a couple of sample projects and am not sure what more needs to be done.
I've encountered this error message and found that the problem only occurred when I used Spring Framework version 3.2.8.RELEASE.
My solution was to upgrade to version 3.2.9.RELEASE.
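If the project is built with Maven, pinning the Spring version is enough; a hedged sketch of the relevant pom.xml fragment (the exact artifact list depends on your project):
<properties>
    <spring.version>3.2.9.RELEASE</spring.version>
</properties>
...
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-context</artifactId>
    <version>${spring.version}</version>
</dependency>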
See also java.lang.IllegalArgumentException: Environment must not be null

Error running hadoop application in Eclipse on Windows

I'm trying to set up an Eclipse environment for developing and debugging Hadoop. I'm following Tom White's Hadoop: The Definitive Guide, 3rd ed. What I would like to do is get the MaxTemperature app working locally on my Windows machine within Eclipse before moving it to my Hortonworks sandbox VM. The comment on page 158 about using the local job runner seems to be what I want. I don't want to set up a full Hadoop installation on Windows; I'm hoping that with the right config params I can convince it to run as a Java application inside Eclipse.
Windows: 7
Eclipse: Luna
Hadoop: 2.4.0
JDK: 7
When I set the Run configuration for MaxTemperatureDriver (source code on page 157) to
inputfile outputdir foo (a deliberately bogus 3rd parameter)
I get the usage message, so I know I'm running my program with those params.
If I remove the bogus third param I get
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at mark.MaxTemperatureDriver.run(MaxTemperatureDriver.java:52)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at mark.MaxTemperatureDriver.main(MaxTemperatureDriver.java:56)
I've tried inserting -conf, but it seems to be ignored: there is no error message if I specify a nonexistent path.
I've tried inserting -fs file:/// -jt local, but it makes no difference.
I've tried inserting -D mapreduce.framework.name=local.
I've tried specifying the input and output with the file: scheme.
Note: I'm not asking how to configure Eclipse to connect to a remote Hadoop installation. I want the application to run within Eclipse.
Is this possible? Any ideas?
Additional info:
I turned on debugging and saw:
582 [main] DEBUG org.apache.hadoop.mapreduce.Cluster - Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
583 [main] DEBUG org.apache.hadoop.mapreduce.Cluster - Cannot pick org.apache.hadoop.mapred.YarnClientProtocolProvider as the ClientProtocolProvider - returned null protocol
I'm wondering not why YarnClientProtocolProvider failed, but why it didn't try LocalClientProtocolProvider.
New info:
It seems that this is an issue with Hadoop 2.4.0. I recreated my environment with Hadoop 1.2.1, followed the instructions in
http://gerrymcnicol.com/index.php/2014/01/02/hadoop-and-cassandra-part-4-writing-your-first-mapreduce-job/
added the Windows hack from
http://bigdatanerd.wordpress.com/2013/11/14/mapreduce-running-mapreduce-in-windows-file-system-debug-mapreduce-in-eclipse
and it all started working.
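For anyone staying on 2.x, the local job runner can also be forced programmatically in the driver; a hedged sketch using standard Hadoop 2.x configuration keys (untested against the 2.4.0 issue above):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// run the job in-process instead of submitting to a cluster
conf.set("mapreduce.framework.name", "local");
// read and write the local file system rather than HDFS
conf.set("fs.defaultFS", "file:///");
Job job = Job.getInstance(conf, "Max temperature");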
The following blog post will be useful:
Running mapreduce in Windows filesystem

Seam 2.2GA + JBoss AS 5.1GA + Postgres 8.4

Sorry for the big wall of text, but it's mostly logs.
I've been trying to get help from the Seam forums, but in vain.
I'm trying the setup mentioned in the title, but without success.
I have it all installed correctly, and the problems start with seam-gen.
This is my build.properties:
#Generated by seam setup
#Sat Aug 29 19:12:18 BRT 2009
hibernate.connection.password=abc123
workspace.home=/home/rgoytacaz/workspace
hibernate.connection.dataSource_class=org.postgresql.ds.PGConnectionPoolDataSource
model.package=com.atom.Commerce.model
hibernate.default_catalog=PostgreSQL
driver.jar=/home/rgoytacaz/postgresql-8.4-701.jdbc4.jar
action.package=com.atom.Commerce.action
test.package=com.atom.Commerce.test
database.type=postgres
richfaces.skin=glassX
glassfish.domain=domain1
hibernate.default_schema=Core
database.drop=n
project.name=Commerce
hibernate.connection.username=postgres
glassfish.home=C\:/Program Files/glassfish-v2.1
hibernate.connection.driver_class=org.postgresql.Driver
hibernate.cache.provider_class=org.hibernate.cache.HashtableCacheProvider
jboss.domain=default
project.type=ear
icefaces.home=
database.exists=y
jboss.home=/srv/jboss-5.1.0.GA
driver.license.jar=
hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect
hibernate.connection.url=jdbc\:postgresql\:Atom
icefaces=n
./seam create-project works okay, but when I try generate-entities, I get the following:
generate-model:
[echo] Reverse engineering database using JDBC driver /home/rgoytacaz/postgresql-8.4-701.jdbc4.jar
[echo] project=/home/rgoytacaz/workspace/Commerce
[echo] model=com.atom.Commerce.model
[hibernate] Executing Hibernate Tool with a JDBC Configuration (for reverse engineering)
[hibernate] 1. task: hbm2java (Generates a set of .java files)
[hibernate] log4j:WARN No appenders could be found for logger (org.hibernate.cfg.Environment).
[hibernate] log4j:WARN Please initialize the log4j system properly.
[javaformatter] Java formatting of 4 files completed. Skipped 0 file(s).
This is problem no. 1. How do I fix this? What is this? I had to do this in Eclipse; there it worked.
Then I imported the seam-gen-created project into Eclipse and deployed it to JBoss 5.1. While the server started, I noticed the following:
03:18:56,405 ERROR [SchemaUpdate] Unsuccessful: alter table PostgreSQL.atom.productsculturedetail add constraint FKBD5D849BC0A26E19 foreign key (culture_Id) references PostgreSQL.atom.cultures
03:18:56,406 ERROR [SchemaUpdate] ERROR: cross-database references are not implemented: "postgresql.atom.productsculturedetail"
03:18:56,407 ERROR [SchemaUpdate] Unsuccessful: alter table PostgreSQL.atom.productsculturedetail add constraint FKBD5D849BFFFC9417 foreign key (product_Id) references PostgreSQL.atom.products
03:18:56,406 ERROR [SchemaUpdate] ERROR: cross-database references are not implemented: "postgresql.atom.productsculturedetail"
03:18:56,408 INFO [SchemaUpdate] schema update complete
Problem no. 2: what are these cross-database references?
And what about this:
03:18:55,089 INFO [SettingsFactory] JDBC driver: PostgreSQL Native Driver, version: PostgreSQL 8.4 JDBC3 (build 701)
Problem no. 3: I said in build.properties to use the JDBC4 driver; I don't know why Seam insists on using the JDBC3 driver. Where do I change this?
When I go to http://localhost:5443/Commerce and try to browse the auto-generated CRUD UI,
I get this error: Error reading 'resultList' on type com.atom.Commerce.action.ProductsList_$$_javassist_seam_2
And this is what shows up in my server logs:
03:34:00,828 INFO [STDOUT] Hibernate:
select
products0_.product_Id as product1_0_,
products0_.active as active0_
from
PostgreSQL.atom.products products0_ limit ?
03:34:00,848 WARN [JDBCExceptionReporter] SQL Error: 0, SQLState: 0A000
03:34:00,849 ERROR [JDBCExceptionReporter] ERROR: cross-database references are not implemented: "postgresql.atom.products"
Position: 81
03:34:00,871 SEVERE [viewhandler] Error Rendering View[/ProductsList.xhtml]
javax.el.ELException: /ProductsList.xhtml: Error reading 'resultList' on type com.atom.Commerce.action.ProductsList_$$_javassist_seam_2
Caused by: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not execute query
Problem no. 4: what is going on here? Cross-database references again?
Thanks for any help with any of my problems.
You did receive a few answers on the Seam forums (here and here), but you didn't follow up. Anyway, all of these are actually caused by one problem:
As Stuart Douglas told you, you shouldn't use a catalog when connecting to PostgreSQL. To fix this, replace the property "hibernate.default_catalog=PostgreSQL" in your properties file with the property "hibernate.default_catalog.null=", so that your file looks like this:
...
model.package=com.atom.Commerce.model
hibernate.default_catalog.null= # <-- This is the replaced property
driver.jar=/home/rgoytacaz/postgresql-8.4-701.jdbc4.jar
...
You should be able to use seam generate-entities fine afterwards (assuming the rest of your configuration is correct). I'd recommend doing the generation into a clean folder.
Cross-database references are when a query tries to access two or more different databases. PostgreSQL does not support this, and thus complains when there is more than one period in the table name, so in PostgreSQL.atom.productsculturedetail, the leading PostgreSQL. part should be removed. Hibernate adds this prefix when you tell it to use a default catalog, which we already fixed in step 1 above (by telling it not to use a catalog), so this problem should be fixed after you regenerate your entities.
(Note that this is effectively the same as what Stuart Douglas told you, that you should remove the catalog="PostgreSQL" attribute in the annotations on your entity classes.)
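In entity terms, the fix amounts to dropping the catalog attribute from the generated classes; a hedged before/after sketch (entity name taken from the log above):
import javax.persistence.Entity;
import javax.persistence.Table;

// before: @Table(name = "products", catalog = "PostgreSQL", schema = "atom")
// Hibernate then emits PostgreSQL.atom.products, which PostgreSQL rejects.

// after: schema-qualified only, which PostgreSQL accepts
@Entity
@Table(name = "products", schema = "atom")
public class Products { /* generated fields omitted */ }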
When you specified the postgresql-8.4-701.jdbc4.jar file in the properties file, this didn't mean that the driver supports JDBC4. Although the name of the file would suggest so, the driver's website clearly states that "The driver provides a reasonably complete implementation of the JDBC 3 specification". This shouldn't be a problem for you, as you're not using the driver directly (or at least you're not supposed to). The driver is sufficient for Hibernate to fulfill its requirements and provide the required functionality.
This issue is caused by the same problem as above: Hibernate is unable to read data from the database because of the incorrect, catalog-prefixed query. Fixing the catalog problem should fix this issue as well.