While trying to run the SAM topology using HDF 3.0.0 sandbox, I am getting the below exception. I have only 2 components in the canvas.
1) Get input from Kafka Topic
2) Write the contents from the topic to HDFS Sink.
java.lang.InstantiationException: org.apache.storm.kafka.bolt.selector.DefaultTopicSelector
The engine behind the scenes is Storm. The error above occurs when I try to execute the flow. I have searched for more information on this specific error message, but could not find much help online for Hortonworks Stream Analytics Manager.
Screenshot will make the issue clear. Upon execution of the flow, the exception occurs.
Do you see any errors in the streamline.log ? Can you paste the stack trace?
It may be due to some missing classes. You might want to delete and recreate the App and if that doesn't help, raise an issue here - https://github.com/hortonworks/streamline/issues with the relevant information and someone will take a look.
I am totally new to Kubernetes, and I am reading the book Getting Started with Kubernetes by Jonathan Baier. After completing Google's billing process I was able to set up my project both in GCP and on my system, but then the book says that I need to execute:
kubernetes/cluster/kube-up.sh
The first time, it reached the point from the following picture:
I had to cancel it because it took too much time. The second time it got past that message, but then 3 error messages appeared:
I saw a similar issue in another post, where someone said that gcloud needed a downgrade to version 167. But I am not sure whether that also applies to this issue.
Regards
Well, all I had to do was open the link the error message gave me, enable the Compute Engine API, wait a few minutes, and then execute the kube-up script again.
Hope it can help someone later
I've built a Kafka Streams application. It's my first one, so I'm moving out of a proof-of-concept mindset into a "how can I productionalize this?" mindset.
The tl;dr version: I'm looking for kafka streams deployment recommendations and tips, specifically related to updating your application code.
I've been able to find lots of documentation about how Kafka and the Streams API work, but I couldn't find anything on actually deploying a Streams app.
The initial deployment seems to be fairly easy - there is good documentation for configuring your Kafka cluster, then you must create topics for your application, and then you're pretty much fine to start it up and publish data for it to process.
But what if you want to upgrade your application later? Specifically, if the update contains a change to the topology. My application does a decent amount of data enrichment and aggregation into windows, so it's likely that the processing will need to be tweaked in the future.
My understanding is that changing the order of processing or inserting additional steps into the topology will cause the internal ids for each processing step to shift, meaning at best new state stores will be created with the previous state being lost, and at worst, processing steps reading from an incorrect state store topic when starting up. This implies that you either have to reset the application, or give the new version a new application id. But there are some problems with that:
If you reset the application or give a new id, processing will start from the beginning of source and intermediate topics. I really don't want to publish the output to the output topics twice.
Currently "in-flight" data would be lost when you stop your application for an upgrade (since that application would never start again to resume processing).
The only way I can think to mitigate this is to:
Stop data from being published to source topics. Let the application process all messages, then shut it off.
Truncate all source and intermediate topics.
Start new version of application with a new app id.
Start publishers.
This is "okay" for now since my application is the only one reading from the source topics, and intermediate topics are not currently used beyond feeding to the next processor in the same application. But, I can see this getting pretty messy.
Is there a better way to handle application updates? Or are my steps generally along the lines of what most developers do?
I think you have a full picture of the problem here and your solution seems to be what most people do in this case.
During the latest Kafka Summit, this question was asked after the talk by Gwen Shapira and Matthias J. Sax about Kubernetes deployment. The response was the same: if your upgrade contains topology modifications, rolling upgrades can't be done.
It looks like there is no KIP about this for now.
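To illustrate the id shift described in the question, here is a toy model in Python. It is not Kafka Streams code: `assign_names` and the step labels are hypothetical, and the only assumption borrowed from Kafka Streams is that auto-generated names carry a zero-padded counter based on the order in which steps are declared.

```python
def assign_names(steps):
    """Toy model: name each step by its position in declaration order.

    A step given as a (kind, explicit_name) tuple keeps its explicit name,
    so it does not depend on where it sits in the topology.
    """
    names = []
    for index, step in enumerate(steps):
        if isinstance(step, tuple):
            _kind, explicit_name = step
            names.append(explicit_name)
        else:
            names.append(f"{step}-{index:010d}")
    return names


# Version 1 of the (hypothetical) topology.
old = assign_names(["SOURCE", "AGGREGATE", "SINK"])

# Version 2 inserts a FILTER step before the aggregation.
new = assign_names(["SOURCE", "FILTER", "AGGREGATE", "SINK"])

# The aggregation's generated name changes, so state stored under the
# old name would no longer be found by the new version.
print(old[1])  # AGGREGATE-0000000001
print(new[2])  # AGGREGATE-0000000002

# With an explicit name, the same step keeps its name in both versions.
pinned_old = assign_names(["SOURCE", ("AGGREGATE", "counts-store"), "SINK"])
pinned_new = assign_names(["SOURCE", "FILTER", ("AGGREGATE", "counts-store"), "SINK"])
print(pinned_old[1] == pinned_new[2])  # True
```

If your Kafka Streams version lets you name state stores explicitly, the pinned variant suggests why doing so can keep existing state reachable across some topology changes. It does not remove the need for a reset when the processing logic itself becomes incompatible with the old state.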
I followed this great tutorial, and everything worked great except for one thing.
In step 11, all the emotion scores in the table are 0!
It seems that Tone Analyzer is not connected.
I am sure that I entered the correct credentials (username and password).
After a lot of searching, I found that one month ago IBM changed the Tone Analyzer plan from experimental to Beta.
I don't know what I should change in the code to make Tone Analyzer work with the new plan in this example.
I recently updated the tutorial to deal with API changes in Tone Analyzer which transitioned from experimental to Beta. Are you using the latest version of the tutorial?
There are several reasons why you might not be getting any tweets, for example wrong Twitter or Tone Analyzer credentials. Please double-check these according to the tutorial instructions. To better diagnose errors, I've also added a StreamingListener in the latest tutorial version that should give you more information. You should see messages like the following:
Twitter stream started
Tweets are collected real-time and analyzed
To stop the streaming and start interacting with the data use: StreamingTwitter.stopTwitterStreaming
Receiver Started: TwitterReceiver-0
Batch started with 139 records
Batch completed with 139 records
Batch started with 270 records
Stopping Twitter stream. Please wait this may take a while
Receiver Stopped: TwitterReceiver-0
Reason: : Stopped by driver
Batch completed with 270 records
Twitter stream stopped
You can now create a sqlContext and DataFrame with 38 Tweets created. Sample usage:
val (sqlContext, df) = com.ibm.cds.spark.samples.StreamingTwitter.createTwitterDataFrames(sc)
df.printSchema
sqlContext.sql("select author, text from tweets").show
Finally, if you are using the pre-built jar file I posted on GitHub, make sure that you are using Spark 1.6 and not an older version.
I've converted a console app into a scheduled WebJob. All is working well, but I'm having a little trouble figuring out how to accomplish the error logging/emailing I'd like to have.
1.) I am using Console.WriteLine and Console.Error.WriteLine to create log messages. I see these displayed in the portal when I go to WebJob Run Details. Is there any way to have these logs saved to files somewhere? I added my storage account connection string as AzureWebJobsDashboard and AzureWebJobsStorage. But this appears to have just created an "azure-webjobs-dashboard" blob container that only has a "version" file in it.
2.) Is there a way to get line numbers to show up for exceptions in the WebJob log?
3.) What is the best way to send emails from within the WebJob console app? For example, if a certain condition occurs, I may want to have it send me and/or someone else (depending on what the condition is) an email along with logging the condition using Console.WriteLine or Console.Error.WriteLine. I've seen info on triggering emails via a queue or triggering emails on job failure, but what is the best way to just send an email directly in your console app code when it's running as a WebJob?
How is your job being scheduled? It sounds like you're using the WebJobs SDK - are you using the TimerTrigger for scheduling (from the Extensions library)? That extensions library also contains a new SendGrid binding that you can use to send emails from your job functions. We plan on expanding on that to also facilitate failure notifications like you describe, but it's not there yet. Nothing stops you from building something yourself however, using the new JobHostConfiguration.Tracing.Trace to plug in your own TraceWriter that you can use to catch errors/warnings and act as you see fit. All of this is in the beta1 pre-release.
Using that approach of plugging in a custom TraceWriter, I've been thinking of writing one that lets you specify an error threshold and sliding window, so that an email or other notification is sent if the error rate exceeds the threshold. All the pieces are there for this, I just haven't done it yet :)
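A minimal sketch of that threshold/sliding-window idea, written in Python rather than C# to keep it short and self-contained. It is not part of the WebJobs SDK: `ErrorRateMonitor`, `record_error`, and the `notify` callback are hypothetical names, and in a real job the callback would be whatever sends your email.

```python
import time
from collections import deque


class ErrorRateMonitor:
    """Calls notify() when more than `threshold` errors fall inside the window."""

    def __init__(self, threshold, window_seconds, notify, clock=time.monotonic):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.notify = notify          # hypothetical callback, e.g. an email sender
        self.clock = clock            # injectable so the logic can be tested
        self._timestamps = deque()    # times of errors still inside the window

    def record_error(self, message):
        now = self.clock()
        self._timestamps.append(now)
        # Drop errors that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) > self.threshold:
            count = len(self._timestamps)
            self.notify(
                f"{count} errors in the last {self.window_seconds}s; latest: {message}"
            )
            # Start counting afresh so one burst produces one notification.
            self._timestamps.clear()


# Usage with a fake clock and a list standing in for the email sender.
sent = []
fake_now = [0.0]
monitor = ErrorRateMonitor(2, 60, sent.append, clock=lambda: fake_now[0])

monitor.record_error("first")
monitor.record_error("second")   # still at the threshold, nothing sent
monitor.record_error("third")    # exceeds the threshold, one notification
fake_now[0] = 1000.0
monitor.record_error("much later")  # earlier burst was cleared, nothing sent
```

Clearing the window after a notification is a design choice to avoid sending one email per error during a burst; a cooldown timer would be a reasonable alternative.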
Regarding logging, the job logs (including your Console.WriteLines) are actually written to disk in your Web App (details here). You should be able to see them if you browse your site log directory. However, if you're using the SDK and Dashboard, you can also use the TextWriter/TraceWriter bindings for logging. These logs will be written to your storage account and will show up in the Dashboard Functions page per invocation. Here's an example.
Logs to files: You can use a custom TraceWriter: https://gist.github.com/aaronhoffman/3e319cf519eb8bf76c8f3e4fa6f1b4ae
Exception stack trace line numbers: You will need to make sure your project is built with debug info set to "full" (more info: http://aaron-hoffman.blogspot.com/2016/07/get-line-numbers-in-exception-stack.html)
Sending emails: You can call an email service such as SendGrid or Amazon Simple Email Service (SES) directly from your code.
I have an assignment where, before I get a message from a server and tweet it, I have to check whether an error occurs. If it does, the assignment says that I have to "show with a human task an error message specifying a number and the error received. After that, the process ends".
In another part of the workflow I do check for errors, but I'm not required to show anything, and frankly I do not understand how that would work. I believe my mistake is that I might be thinking too literally, or too much in terms of code displaying errors.
Any help or place to look for information?
The answer to this question will vary depending on the edition of Bonita BPM that you are using.
With Community edition:
Note that error management will impact process design.
You can implement the following scenario:
retrieve the error (this can be done by using a custom connector output).
store the error details in a process variable.
have an exclusive gateway with a condition that branches to an optional human task that shows the error in a form.
With Performance edition:
There is a built-in error management feature in Bonita BPM Portal. As an administrator, you may review stack traces associated with connector execution failures, edit some settings, and replay the connectors.
All of this is done without impacting the process design.