Apache Spark: I always get org.apache.axis.AxisFault: (404)Not Found when using spark-google-adwords - scala

I'm still a newbie in Apache Spark development.
I'm using Apache Spark to query data from Google AdWords using spark-google-adwords, but I always get this org.apache.axis.AxisFault: (404)Not Found.
I'm using Scala 2.11 and the latest stable Apache Spark. I've tried to look for a solution to this problem, but I still couldn't find the cause.
Regards,

This issue was resolved by adding a copy of axis2.xml to the classpath and overriding a few connection manager parameters as follows:
import org.apache.axis2.context.ConfigurationContext;
import org.apache.axis2.context.ConfigurationContextFactory;
import org.apache.axis2.transport.http.HTTPConstants;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

HttpConnectionManagerParams params = new HttpConnectionManagerParams();
params.setDefaultMaxConnectionsPerHost(20); // set value based on your requirements/load testing etc.
MultiThreadedHttpConnectionManager multiThreadedHttpConnectionManager = new MultiThreadedHttpConnectionManager();
multiThreadedHttpConnectionManager.setParams(params);
HttpClient httpClient = new HttpClient(multiThreadedHttpConnectionManager);
ConfigurationContext configurationContext = ConfigurationContextFactory.createConfigurationContextFromFileSystem("**PATH TO COPY OF AXIS2.XML**");
configurationContext.setProperty(HTTPConstants.CACHED_HTTP_CLIENT, httpClient);
credit : https://issues.apache.org/jira/browse/AXIS2-4807
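Since the question itself is in Scala, the same workaround can be sketched in Scala as well (a sketch, assuming the same Axis2 and Commons HttpClient 3.x classes are on the classpath; the axis2.xml path placeholder is left as in the original):

import org.apache.axis2.context.ConfigurationContextFactory
import org.apache.axis2.transport.http.HTTPConstants
import org.apache.commons.httpclient.params.HttpConnectionManagerParams
import org.apache.commons.httpclient.{HttpClient, MultiThreadedHttpConnectionManager}

// Pool settings; tune the per-host limit to your own load.
val params = new HttpConnectionManagerParams()
params.setDefaultMaxConnectionsPerHost(20)

val connectionManager = new MultiThreadedHttpConnectionManager()
connectionManager.setParams(params)

// Load the copied axis2.xml and cache the pooled client so Axis2 reuses it.
val configurationContext =
  ConfigurationContextFactory.createConfigurationContextFromFileSystem("**PATH TO COPY OF AXIS2.XML**")
configurationContext.setProperty(HTTPConstants.CACHED_HTTP_CLIENT, new HttpClient(connectionManager))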

Related

Could not instantiate EventHubSourceProvider for Azure Databricks

Using the steps documented for Structured Streaming with PySpark, I'm unable to create a DataFrame in PySpark from the Azure Event Hub I have set up in order to read the stream data.
Error message is:
java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.eventhubs.EventHubsSourceProvider could not be instantiated
I have installed the following Maven libraries (com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.12 is unavailable), but none appear to work:
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.6
I have also tried ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString), but the error message returned is:
java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
The connection string is correct as it is also used in a console application that writes to the Azure Event Hub and that works.
Can someone point me in the right direction, please? The code in use is as follows:
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Event Hub Namespace Name
NAMESPACE_NAME = "*myEventHub*"
KEY_NAME = "*MyPolicyName*"
KEY_VALUE = "*MySharedAccessKey*"
# The connection string to your Event Hubs Namespace
connectionString = "Endpoint=sb://{0}.servicebus.windows.net/;SharedAccessKeyName={1};SharedAccessKey={2};EntityPath=ingestion".format(NAMESPACE_NAME, KEY_NAME, KEY_VALUE)
ehConf = {}
ehConf['eventhubs.connectionString'] = connectionString
# For 2.3.15 version and above, the configuration dictionary requires that connection string be encrypted.
# ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
df = spark \
  .readStream \
  .format("eventhubs") \
  .options(**ehConf) \
  .load()
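For reference, the same stream read can be sketched with the connector's Scala API (a sketch, assuming the azure-eventhubs-spark library matching your cluster's Scala version is attached and connectionString is defined as above):

import org.apache.spark.eventhubs.{EventHubsConf, EventPosition}

// Build the connector configuration and start reading from the end of the stream.
val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromEndOfStream)

val df = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()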
To resolve the issue, I did the following:
Uninstall the existing Azure Event Hubs library versions
Install the com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15 library version from Maven Central
Restart the cluster
Validate by re-running the code provided in the question
I received this same error when installing libraries with the version number com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.* on a Spark cluster running Spark 3.0 with Scala 2.12.
For anyone else finding this via Google: check whether you have the correct Scala library version. In my case, my cluster is Spark v3 with Scala 2.12.
Changing the "2.11" in the library version from the tutorial I was using to "2.12", so that it matches my cluster runtime version, fixed the issue.
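If you are unsure which Scala version your cluster actually runs, a quick check from a notebook cell or spark-shell (this is plain Scala, not a Spark-specific API):

// Prints the Scala runtime version, e.g. "2.12.10"; the library suffix (_2.11 / _2.12) must match it.
println(scala.util.Properties.versionNumberString)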
I had to take this a step further: in the format method I had to use the fully qualified provider class directly:
.format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider")
Check the cluster Scala version and the library version.
Uninstall the older libraries and install:
com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17
in the shared workspace (right-click and install the library) and also on the cluster.

Authenticate with ECE Elasticsearch Sink from Apache Flink (Scala code)

Compiler error when using the example provided in the Flink documentation. The Flink documentation provides sample Scala code to set the REST client factory parameters when talking to Elasticsearch: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
When trying out this code I get a compiler error in IntelliJ which says "Cannot resolve symbol restClientBuilder".
I found the following SO question, which is EXACTLY my problem except that it is in Java and I am doing this in Scala:
Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)
I tried copy-pasting the solution code provided in the above SO question into IntelliJ; the auto-converted code also has compiler errors.
// provide a RestClientFactory for custom configuration on the internally created REST client
// i only show the setMaxRetryTimeoutMillis for illustration purposes, the actual code will use HTTP custom callback
esSinkBuilder.setRestClientFactory(
restClientBuilder -> {
restClientBuilder.setMaxRetryTimeoutMillis(10)
}
)
Then I tried the following (auto-generated Java-to-Scala code by IntelliJ):
import org.apache.http.auth.AuthScope
import org.apache.http.auth.UsernamePasswordCredentials
import org.apache.http.client.CredentialsProvider
import org.apache.http.impl.client.BasicCredentialsProvider
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
import org.elasticsearch.client.RestClientBuilder
// provide a RestClientFactory for custom configuration on the internally created REST client
esSinkBuilder.setRestClientFactory((restClientBuilder) => {
def foo(restClientBuilder) = restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
override def customizeHttpClient(httpClientBuilder: HttpAsyncClientBuilder): HttpAsyncClientBuilder = { // elasticsearch username and password
val credentialsProvider = new BasicCredentialsProvider
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(es_user, es_password))
httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
}
})
foo(restClientBuilder)
})
The original code snippet produces the error "cannot resolve RestClientFactory", and the Java-to-Scala conversion shows several other errors.
So basically I need to find a Scala version of the solution described in Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4).
Update 1: I was able to make some progress with some help from IntelliJ. The following code compiles and runs, but there is another problem.
esSinkBuilder.setRestClientFactory(
  new RestClientFactory {
    override def configureRestClientBuilder(restClientBuilder: RestClientBuilder): Unit = {
      restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
        override def customizeHttpClient(httpClientBuilder: HttpAsyncClientBuilder): HttpAsyncClientBuilder = {
          // elasticsearch username and password
          val credentialsProvider = new BasicCredentialsProvider
          credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(es_user, es_password))
          httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
          httpClientBuilder.setSSLContext(trustfulSslContext)
        }
      })
    }
  }
)
The problem is that I am not sure whether I should be creating the RestClientFactory with new. The application connects to the Elasticsearch cluster, but then discovers that the SSL cert is not valid, so I had to add the trustfulSslContext (as described here: https://gist.github.com/iRevive/4a3c7cb96374da5da80d4538f3da17cb). That got me past the SSL issue, but now the ES REST client does a ping test, the ping fails, it throws an exception and the app shuts down. I suspect that the ping fails because of the SSL error and that maybe it is not using the trustfulSslContext I set up as part of new RestClientFactory. This makes me suspect that I should not have used new, and that there should be a simple way to update the existing RestClientFactory object; basically this is all happening because of my lack of Scala knowledge.
Happy to report that this is resolved. The code I posted in Update 1 is correct. The ping to ECE was not working for two reasons:
The certificate needs to include the complete chain: the root CA, the intermediate CA and the cert for the ECE. This helped get rid of the whole trustfulSslContext workaround.
The ECE was sitting behind an HAProxy, and the proxy mapped the hostname in the HTTP request to the actual deployment cluster name in ECE. This mapping logic did not take into account that the Java REST high-level client uses the org.apache.http.HttpHost class, which renders the host as hostname:port_number even when the port number is 443. Because the mapping lookup did not account for the :443, ECE returned a 404 error instead of 200 OK (the only way to find this was to look at unencrypted packets at the HAProxy). Once the mapping logic in the HAProxy was fixed, the mapping was found and the pings are now successful.
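To illustrate the second point, here is a small sketch of how the Apache HttpHost class renders a host (the hostname below is made up; it only shows why the proxy saw hostname:443 rather than a bare hostname):

import org.apache.http.HttpHost

// toHostString keeps the explicit port even for the default HTTPS port 443.
val host = new HttpHost("my-ece-deployment.example.com", 443, "https")
println(host.toHostString) // my-ece-deployment.example.com:443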

Starting KsqlRestApplication from Scala and getting NoSuchMethodError org.apache.kafka.streams.StreamsConfig.getConsumerConfigs

I am trying to write a program that enables me to run predefined KSQL operations on Kafka topics in Scala, but I don't want to open the KSQL CLI every time. Therefore I want to start the KSQL "server" from within my Scala program. If I understand the KSQL source code correctly, I have to build and start a KsqlRestApplication:
def restServer = KsqlRestApplication.buildApplication(
  new KsqlRestConfig(defaultServerProperties),
  true,
  new VersionCheckerAgent {
    override def start(ksqlModuleType: KsqlModuleType, properties: Properties): Unit = ???
  }
)
But when I try doing that, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.kafka.streams.StreamsConfig.getConsumerConfigs(Ljava/lang/String;Ljava/lang/String;)Ljava/util/Map;
at io.confluent.ksql.rest.server.BrokerCompatibilityCheck.create(BrokerCompatibilityCheck.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:241)
I looked into the function call in BrokerCompatibilityCheck: in the create function it calls StreamsConfig.getConsumerConfigs() with two Strings as parameters, instead of the parameters defined in
https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(StreamThread,%20java.lang.String,%20java.lang.String).
Are my KSQL and Kafka version simply not compatible or am I doing something wrong?
I am using KSQL version 4.1.0-SNAPSHOT and Kafka version 1.0.0.
Yes, NoSuchMethodError typically indicates a version incompatibility between libraries.
The link you posted is to the javadoc for Kafka 0.10.2. The method hasn't changed in 1.0, but in the upcoming 1.1 it indeed takes only two Strings:
https://kafka.apache.org/11/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(java.lang.String,%20java.lang.String)
That suggests the version of KSQL you're using (4.1.0-SNAPSHOT) depends on version 1.1 of Kafka Streams, which is currently in the release candidate phase and should, I believe, be out soon:
https://lists.apache.org/thread.html/780c4458b16590e99261b69d7b41b9ec374a3226d72c8d38885a008a#%3Cusers.kafka.apache.org%3E
As per that email you can find the latest (1.1.0-rc2) artifacts in the apache staging repo:
https://repository.apache.org/content/groups/staging/
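If you want your build to resolve those release-candidate artifacts directly, a minimal sbt sketch (assuming the staging repository still hosts them and that kafka-streams 1.1.0 is the version the KSQL snapshot needs; adjust as required):

// build.sbt
resolvers += "Apache Staging" at "https://repository.apache.org/content/groups/staging/"

// Pull the Kafka Streams client that matches what the KSQL 4.1.0-SNAPSHOT expects.
libraryDependencies += "org.apache.kafka" % "kafka-streams" % "1.1.0"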

Grails rest spring security plugin does not store generated token using GORM in database

I am using the GORM option to store the generated token in the database for my Grails 3.x application, using the Grails Spring Security REST plugin.
The application generates the token, but it does not get stored in the database. Do we need to override the tokenStorage method and provide our own implementation to store the token in the database?
The plugin properties configured in application.groovy are listed below
grails.plugin.springsecurity.rest.token.validation.useBearerToken = false
grails.plugin.springsecurity.rest.login.endpointUrl = '/api/login'
grails.plugin.springsecurity.rest.token.validation.headerName = 'X-Auth-Token'
grails.plugin.springsecurity.rest.token.storage.useJwt = false
grails.plugin.springsecurity.rest.token.storage.useGorm=true
grails.plugin.springsecurity.rest.token.storage.gorm.tokenDomainClassName='com.auth.AuthenticationToken'
grails.plugin.springsecurity.rest.token.storage.gorm.tokenValuePropertyName='token'
grails.plugin.springsecurity.rest.token.storage.gorm.usernamePropertyName='username'
grails.plugin.springsecurity.rest.login.passwordPropertyName = 'password'
grails.plugin.springsecurity.rest.login.useJsonCredentials = true
grails.plugin.springsecurity.rest.login.useRequestParamsCredentials = false
grails.plugin.springsecurity.rest.token.rendering.authoritiesPropertyName = 'permissions'
Make sure you have added the following to your build.gradle:
compile 'org.grails.plugins:spring-security-rest:2.0.0.M2'
compile 'org.grails.plugins:spring-security-rest-gorm:2.0.0.M2'
And you have defined the following in application.groovy or application.yml
grails.plugin.springsecurity.rest.token.storage.useGorm=true
grails.plugin.springsecurity.rest.token.storage.gorm.tokenDomainClassName = 'com.yourdomain.AuthenticationToken'
grails.plugin.springsecurity.rest.token.storage.gorm.tokenValuePropertyName = 'tokenValue'
grails.plugin.springsecurity.rest.token.storage.gorm.usernamePropertyName = 'username'
There is almost no information to help you: no build configuration, no logs, no idea how the requests are made...
But from the description of your problem, my guess is that you are missing the GORM module from your classpath. It's clearly stated in the documentation.
Also be sure to read the "what's new in 2.0" chapter.
I had the same problem, token not stored and no error messages seen.
After installing the GORM plugin:
compile "org.grails.plugins:spring-security-rest-gorm:2.0.0.M2"
I could login and a token was saved into the table.

How to check if ElasticSearch is running properly

I am new to Elasticsearch and I am facing issues while connecting to it. Please find the details below:
The HQ plugin and the head plugin are showing different results:
Output of HQ Plugin:
Output of Head Plugin:
When I try to connect from my scala code, I get following error:
org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: []
at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:305)
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:200)
at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:102)
at org.elasticsearch.client.transport.TransportClient.index(TransportClient.java:340)
at com.sksamuel.elastic4s.IndexDsl$IndexDefinitionExecutable$$anonfun$apply$1.apply(IndexDsl.scala:23)
at com.sksamuel.elastic4s.IndexDsl$IndexDefinitionExecutable$$anonfun$apply$1.apply(IndexDsl.scala:23)
at com.sksamuel.elastic4s.Executable$class.injectFuture(Executable.scala:21)
at com.sksamuel.elastic4s.IndexDsl$IndexDefinitionExecutable$.injectFuture(IndexDsl.scala:20)
at com.sksamuel.elastic4s.IndexDsl$IndexDefinitionExecutable$.apply(IndexDsl.scala:23)
at com.sksamuel.elastic4s.IndexDsl$IndexDefinitionExecutable$.apply(IndexDsl.scala:20)
at com.sksamuel.elastic4s.ElasticClient.execute(ElasticClient.scala:28)
Here is the code which I use for the connection:
val settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "elasticsearch")
.build()
val client = ElasticClient.remote(settings, ElasticsearchClientUri("elasticsearch://10.50.xxx.xxx:9300"))
I also checked my connection, and I am able to successfully telnet to 10.50.xxx.xxx on both ports 9200 and 9300.
I read somewhere that the problem might be with http.cors, So I added following lines to /etc/elasticsearch/elasticsearch.yml file on the server:
http.cors.allow-origin: "/.*/"
http.cors.enabled: true
Please suggest what I am doing wrong.
-- Update --
Thanks @Evaldas Buinauskas, it was a version problem: I had installed Elasticsearch version 2.0 and was using libraries and plugins of version 1.7. I downgraded my Elasticsearch to version 1.7 and everything worked!
The issue comes from different Elasticsearch, head plugin and Scala client versions.
Pre-2.0 Elasticsearch still supported the deprecated _status endpoint (deprecated in 1.2.0).
Version 2.0 dropped it completely and replaced it with _recovery.
Neither the head plugin nor the Scala client was upgraded, so both tried to call the dropped API.
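Since the root cause was a version mismatch, a quick way to confirm which version the server actually runs is to query the HTTP root endpoint on port 9200 (a minimal sketch; substitute your own host):

import scala.io.Source

// The root endpoint returns a small JSON document with the cluster name
// and the exact Elasticsearch version number.
println(Source.fromURL("http://10.50.xxx.xxx:9200/").mkString)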