Apache nutch inject urls - mongodb

I am new to Apache Nutch(2.3.1) and mongodb(3.4.7). After instalation steps I want to inject urls and crawl wikipedia website. when I run "./nutch inject urls" in terminal I faced to this error.
~/apache-nutch-2.3.1/runtime/local/bin$ ./nutch inject urls
InjectorJob: starting at 2017-11-26 19:07:35
InjectorJob: Injecting urlDir: urls
InjectorJob: org.apache.gora.util.GoraException: java.lang.NullPointerException
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964)
at org.apache.gora.mongodb.store.MongoStore.getDB(MongoStore.java:192)
at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:122)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 7 more

Actually I had set wrong Mongo database'name in $NUTCH_HOME/conf/gora.properties file. After fix it , Apache nutch work correctly.

Related

java.lang.RuntimeException: Failed to resolve Oracle database version

I am using debezium oracle connector in kafka connect.While starting connector I am getting below error,
java.lang.RuntimeException: Failed to resolve Oracle database version
at io.debezium.connector.oracle.OracleConnection.resolveOracleDatabaseVersion(OracleConnection.java:159)
at io.debezium.connector.oracle.OracleConnection.<init>(OracleConnection.java:71)
at io.debezium.connector.oracle.OracleConnector.validateConnection(OracleConnector.java:74)
at io.debezium.connector.common.RelationalBaseSourceConnector.validate(RelationalBaseSourceConnector.java:52)
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:400)
at org.apache.kafka.connect.runtime.AbstractHerder.lambda$validateConnectorConfig$2(AbstractHerder.java:351)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:#**.**.*.**:1521/CDB
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at io.debezium.jdbc.JdbcConnection.lambda$patternBasedFactory$0(JdbcConnection.java:184)
at io.debezium.jdbc.JdbcConnection$ConnectionFactoryDecorator.connect(JdbcConnection.java:121)
at io.debezium.jdbc.JdbcConnection.connection(JdbcConnection.java:890)
at io.debezium.jdbc.JdbcConnection.connection(JdbcConnection.java:885)
at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:643)
at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:517)
at io.debezium.connector.oracle.OracleConnection.resolveOracleDatabaseVersion(OracleConnection.java:129)
... 10 more
I am refering to the link for oracle setup and connector configuration,
**https://debezium.io/documentation/reference/connectors/oracle.html#setting-up-oracle**
connector-configuration.properties
name=debeziumoraclesource
connector.class=io.debezium.connector.oracle.OracleConnector
database.hostname=**.*.**.**
database.port=1521
database.user=username
database.password=password
database.dbname=CDBNAME
database.server.name=**.*.**.**
tasks.max=1
database.pdb.name=PDBNAME
database.history.kafka.bootstrap.servers=kafka:9092
database.history.kafka.topic=history.ENTITY_GROUP_PARAMETER_VALUES
database.connection.adaptor=logminer
snapshot.mode=initial
table.include.list=schema.ENTITY_GROUP_PARAMETER_VALUES
Also I have download ojdbc8.jar and placed inside kafka/libs folder.I have tried using different version of jars like ojdbc10 and different versions of ojdbc8.Nothing helped me.Also to the point of note I am using oracle19c.Please help me in resolving this issue.Thanks in advance.
Using OJDBC6.jar with all dependencies helped me to resolve the issue. And most importantly i placed the jars in connectors lib folder.
Using OJDBC8.jar helped me solve the problem, the problem appeared as I didn't place the jar on all the servers that were running kafka connect.

Getting following error in console while solr connection to Mongo . No error in solr log

Exception in thread "Thread-14" java.lang.NoSuchMethodError: com.mongodb.ReadPreference.secondaryPreferred()Lcom/mongodb/ReadPreference;
at org.apache.solr.handler.dataimport.MongoDataSource.init(MongoDataSource.java:51)
at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:397)
at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:100)
at org.apache.solr.handler.dataimport.MongoEntityProcessor.init(MongoEntityProcessor.java:33)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:434)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.lang.Thread.run(Thread.java:745)
There are jar files interdependent (though I took all latest was not compatible). Finally the combination worked is listed below.
solr-dataimporthandler-8.3.0.jar
solr-mongo-importer-1.0.0.jar
mongo-java-driver-2.13.2.jar
It was frustrating.

WSO2 Integration Studio v6.5.0's built-in Kafka template throws NoClassDefFoundError

I've installed WSO2 Integration Studio version 6.5.0 in my Windows workstation and created a project using the Kafka Consumer and Producer built-in template.
Then I configured the project with my own Kafka server settings (topic name "myTopic").
I then right-clicked the composite application and chose Export Project Artifacts and Run.
The Console window displayed at the very top the following messages:
[2019-06-25 09:23:45,499] [micro-integrator] INFO - LibraryArtifactDeployer Synapse Library named '{org.wso2.carbon.connector}kafkaTransport' has been deployed from file : C:\IntegrationStudio\runtime\microesb\tmp\carbonapps\-1234\1561465425230TestCompositeApplication_1.0.0.car\kafkaTransport-connector_2.0.6\kafkaTransport-connector-2.0.6.zip
[2019-06-25 09:23:45,517] [micro-integrator] INFO - SynapseImportFactory Successfully created Synapse Import: kafkaTransport
[2019-06-25 09:23:45,533] [micro-integrator] ERROR - ClassMediatorFactory
Error in instantiating class :
org.wso2.carbon.connector.KafkaProduceConnector
java.lang.NoClassDefFoundError: org/apache/kafka/common/header/Headers
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
[snipped rest for clarity]
I've tried uninstalling Integrator Studio and running it with elevated right to no avail.
I expected the project to be deployed normally.
EDIT: after copying:
kafka_2.11-2.2.1.jar
metrics-core-2.2.0.jar
zkclient-0.11.jar
kafka-clients-2.2.1.jar
scala-library-2.11.12.jar
zookeeper-3.4.13.jar
to the EI_HOME/lib directory, the exception changed to:
org.apache.axis2.deployment.DeploymentException: kafka/consumer/ConsumerTimeoutException
at org.apache.synapse.deployers.AbstractSynapseArtifactDeployer.deploy(AbstractSynapseArtifactDeployer.java:219)
at org.wso2.carbon.application.deployer.synapse.SynapseAppDeployer.deployArtifactType(SynapseAppDeployer.java:1099)
at org.wso2.carbon.application.deployer.synapse.SynapseAppDeployer.deployArtifacts(SynapseAppDeployer.java:114)
at org.wso2.carbon.application.deployer.internal.ApplicationManager.deployCarbonApp(ApplicationManager.java:272)
at org.wso2.carbon.application.deployer.CappAxis2Deployer.deploy(CappAxis2Deployer.java:72)
at org.apache.axis2.deployment.repository.util.DeploymentFileData.deploy(DeploymentFileData.java:136)
at org.apache.axis2.deployment.DeploymentEngine.doDeploy(DeploymentEngine.java:807)
at org.apache.axis2.deployment.repository.util.WSInfoList.update(WSInfoList.java:144)
[snipped for clarity]
Caused by: org.apache.axis2.deployment.DeploymentException: kafka/consumer/ConsumerTimeoutException
at org.apache.synapse.deployers.AbstractSynapseArtifactDeployer.deploy(AbstractSynapseArtifactDeployer.java:207)
... 87 more
Caused by: java.lang.NoClassDefFoundError: kafka/consumer/ConsumerTimeoutException
at org.wso2.carbon.inbound.endpoint.protocol.kafka.KAFKAPollingConsumer.startsMessageListener(KAFKAPollingConsumer.java:90)
at org.wso2.carbon.inbound.endpoint.protocol.kafka.KAFKAProcessor.init(KAFKAProcessor.java:96)
at org.apache.synapse.inbound.InboundEndpoint.init(InboundEndpoint.java:79)
at org.apache.synapse.deployers.InboundEndpointDeployer.deploySynapseArtifact(InboundEndpointDeployer.java:57)
at org.apache.synapse.deployers.AbstractSynapseArtifactDeployer.deploy(AbstractSynapseArtifactDeployer.java:197)
... 87 more
Caused by: java.lang.ClassNotFoundException: kafka.consumer.ConsumerTimeoutException cannot be found by synapse-core_2.1.7.wso2v111
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:475)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 92 more
Have you copied the required jars from kafka_home/libs folder to EI_home/lib, if yes then share your code to get the issue detail
According to this documentation https://docs.wso2.com/display/EI650/Kafka+Inbound+Protocol, the recommended versions for Kafka is kafka_2.9.2-0.8.1.1. You can download it in the below link. http://kafka.apache.org/downloads.html. Please use those jars and copy them to the EI_HOME/lib. There is an github issue for this as well. https://github.com/wso2/product-ei/issues/2239
I may be a bit too late but we have been using Custom inbound endpoint for Kafka. We also faced exactly same issue and that was the only way to fix it.
You could use https://github.com/wso2-extensions/esb-inbound-kafka/blob/master/docs/config.md to configure it.

Trouble saving session when mixing spring-security-gemfire and spring-security-oauth2

Background: I have a web app that utilizes AngularJS, spring-mvc, and spring-rest for delivering the UI. I have a requirement to load balance using an Elastic LB and it is not using sticky sessions; requests are round robin. I implemented session replication using spring-session with gemfire for session storage. This works well.
I need to integrate with an OAuth2 auth server (and eventually multiple OAuth2 servers) purely for authentication and the passing of userInfo. I attempted to use the spring cloud oauth2 #EnableOAuth2Sso on the web-app and hit some session serialization issues. The mere addition of the oauth2ClientContext to the session seemed to cause ClassCastException problems during session saving.
I attempted to pull down the following samples and they worked well out of the box, Particularly the UI and the Authserver.
https://github.com/spring-guides/tut-spring-security-and-angular-js
However, when I added spring session into the mix, trying to serialize to a gemfire server, I encountered the exact same issue.
Here is the stacktrace highlight:
java.lang.ClassCastException: cannot assign instance of org.springframework.beans.factory.support.StaticListableBeanFactory to field org.springframework.aop.scope.DefaultScopedObject.beanFactory of type org.springframework.beans.factory.config.ConfigurableBeanFactory in instance of org.springframework.aop.scope.DefaultScopedObject
Below is abbreviated stacktrace:
ERROR o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception
org.springframework.dao.DataAccessResourceFailureException: remote server on machine(gemfire:21800:loner):57660:9d1f3438:gemfire: : While performing a remote put; nested exception is com.gemstone.gemfire.cache.client.ServerOperationException: remote server on machine(gemfire:21800:loner):57660:9d1f3438:gemfire: : While performing a remote put
at org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:238) ~[spring-data-gemfire-1.7.4.RELEASE.jar:1.7.4.RELEASE]
at org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:91) ~[spring-data-gemfire-1.7.4.RELEASE.jar:1.7.4.RELEASE]
at org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:190) ~[spring-data-gemfire-1.7.4.RELEASE.jar:1.7.4.RELEASE]
at org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:147) ~[spring-session-1.2.1.RELEASE.jar:na]
at org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:35) ~[spring-session-1.2.1.RELEASE.jar:na]
at org.springframework.session.web.http.SessionRepositoryFilter$SessionRepositoryRequestWrapper.commitSession(SessionRepositoryFilter.java:244) ~[spring-session-1.2.1.RELEASE.jar:na]
at org.springframework.session.web.http.SessionRepositoryFilter$SessionRepositoryRequestWrapper.access$100(SessionRepositoryFilter.java:214) ~[spring-session-1.2.1.RELEASE.jar:na]
at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:167) ~[spring-session-1.2.1.RELEASE.jar:na]
at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:80) ~[spring-session-1.2.1.RELEASE.jar:na]
... tomcat filter chain and spring filter stuff
Caused by: com.gemstone.gemfire.cache.client.ServerOperationException: remote server on machine(gemfire:21800:loner):57660:9d1f3438:gemfire: : While performing a remote put
... gemfire internal stuff
at org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:187) ~[spring-data-gemfire-1.7.4.RELEASE.jar:1.7.4.RELEASE]
... 31 common frames omitted
Caused by: java.lang.ClassCastException: cannot assign instance of org.springframework.beans.factory.support.StaticListableBeanFactory to field org.springframework.aop.scope.DefaultScopedObject.beanFactory of type org.springframework.beans.factory.config.ConfigurableBeanFactory in instance of org.springframework.aop.scope.DefaultScopedObject
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) ~[na:1.7.0_80]
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) ~[na:1.7.0_80]
... java.io stuff
at org.springframework.aop.framework.AdvisedSupport.readObject(AdvisedSupport.java:557) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
at sun.reflect.GeneratedMethodAccessor224.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.7.0_80]
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) ~[na:1.7.0_80]
... java.io stuff
at com.gemstone.gemfire.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2966) ~[gemfire-8.1.0.jar:na]
at com.gemstone.gemfire.DataSerializer.readObject(DataSerializer.java:3210) ~[gemfire-8.1.0.jar:na]
at org.springframework.session.data.gemfire.AbstractGemFireOperationsSessionRepository$GemFireSessionAttributes.readObject(AbstractGemFireOperationsSessionRepository.java:800) ~[spring-session-1.2.1.RELEASE.jar:na]
at org.springframework.session.data.gemfire.AbstractGemFireOperationsSessionRepository$GemFireSessionAttributes.fromDelta(AbstractGemFireOperationsSessionRepository.java:834) ~[spring-session-1.2.1.RELEASE.jar:na]
at org.springframework.session.data.gemfire.AbstractGemFireOperationsSessionRepository$GemFireSession.fromDelta(AbstractGemFireOperationsSessionRepository.java:589) ~[spring-session-1.2.1.RELEASE.jar:na]
at com.gemstone.gemfire.internal.cache.EntryEventImpl.processDeltaBytes(EntryEventImpl.java:1345) ~[gemfire-8.1.0.jar:na]
... gemfire internal stuff
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.7.0_80]
at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl$1$1.run(AcceptorImpl.java:577) ~[gemfire-8.1.0.jar:na]
... 1 common frames omitted
I found the following, https://jira.spring.io/browse/SPR-14117, which encouraged me to update some of the jars to the newest versions, hoping the spring boot versions were simply behind, however it didn't seem to help.
Version info:
spring-cloud-starter-parent: Brixton.SR4
spring-cloud-security: 1.1.2.RELEASE
spring-core: 4.3.2.RELEASE
spring-security-oauth2: 2.0.10.RELEASE
spring-session: 1.2.1.RELEASE
I've considered a few options: rewiring the OAuth2 framework to no longer use ScopedProxyMode.INTERFACES (seems daunting), use Redis vs. Gemfire, write the entire client from scratch (I've done it before... wasn't fun).
FWIW I've already added the RequestContextFilter as recommended here: OAuth2ClientContext (spring-security-oauth2) not persisted in Redis when using spring-session and spring-cloud-security
Does anyone have any guidance?
I don't know if this speaks to your problem directly but I had/have a similar problem and I think I have all the same versions as you. Seems that there are so many Spring projects and they all try to keep up with each other so sometimes there seem to be compatibility issues. I found the steps outlined here by Rob Winch fixed my issue -https://github.com/spring-projects/spring-session/issues/395

(Eclipse) JasperStudio JasperReports Server Access Configuration: AxisFault: NullPointerException

Got this error on Repository Explorer->Create JasperReports Server Connection
Error Details:
----------------------------------------------------------------------------------------------
AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
faultSubcode:
faultString: java.lang.NullPointerException
faultActor:
faultNode:
faultDetail:
{http://xml.apache.org/axis/}stackTrace:java.lang.NullPointerException
at org.apache.axis.transport.http.CommonsHTTPSender.invoke(CommonsHTTPSender.java:193)
...
Caused by: java.lang.NullPointerException
at org.apache.axis.transport.http.CommonsHTTPSender.invoke(CommonsHTTPSender.java:193)
although the calling the server URL, e.g. http://somehost:8080/jasperserver/services/repository works in the browser stating:
repository
Hi there, this is an AXIS service!
Perhaps there will be a form for invoking the service here...
Using JasperServer 5.2.0 and JasperStudio 5.5.0.
This may likely be due to some local HTTP proxy setup problem.
To solve this you can try the following:
Window->Preferences->General->Network Connections->
Provider: Manual (where Eclipse proxy settings are applied)
(adjust proxy rules as desired) (try including/excluding rules matching your jasper server host)
restart Eclipse (important!)
I found this solution a little hidden here: Jasper forum question 3143 comment 16.