I am following the O'Reilly book "Advanced Analytics with Spark". It seems they expect you to use the command shell (via PuTTY) to follow the examples in the book. But I don't want to use that; I'd prefer to use Zeppelin. I'd like to create notebooks and put my own comments into the code, etc.
So, using an Azure subscription, I spin up a Spark cluster and go into Zeppelin. I am able to follow the guide fine for the most part, but there is one bit that trips me up, and it's probably pretty basic.
You are asked to create a Scala file called "StatsWithMissing.scala" with code in it. I do that. I upload it to blob storage at: //user/Zeppelin
(this is where I expect the Zeppelin user directory to be)
Then it asks you to run the following:
":load StatsWithMissing.scala"
At this point it gives the error:
:1: error: illegal start of definition
My first question is: where exactly is this Scala file supposed to be on blob storage for Zeppelin to see it? How do I determine that? Is where I am putting it correct?
And second, what does this message mean? Does it not like the :load statement?
I believe the interpreter set at the top of the page is Livy, and that covers Scala.
Any help would be great.
Regards
Conor
I am using cov-capture and cov-analyze to get reports in my VM. Can anyone help with the command to run cov-analyze so that it reports only specific errors? For example: various XML files are created and the analysis takes time to run, so to save time I'd like a single report for a single issue type, like URL Manipulation or Encryption Error.
Note: the tool used is from Synopsys, with REST API code in Python and Flask.
To run the analysis with only a single checker enabled, use the --disable-default and --enable options like this:
$ cov-analyze --disable-default --enable CHECKER_NAME ...
CHECKER_NAME is the all-caps, identifier-like name of the checker that reports issues of a certain type. For URL Manipulation, the checker is called PATH_MANIPULATION. The Checker Reference lists all of the checker names.
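For example, a run that reports only URL/path manipulation issues might look like this (the intermediate directory name idir here is a hypothetical placeholder; substitute the directory your cov-capture run produced):
$ cov-analyze --dir idir --disable-default --enable PATH_MANIPULATION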
However, be aware that doing this repeatedly for each checker will take significantly longer than running all desired checkers at once, because there is substantial overhead just in reading the program into memory for analysis.
If your goal is faster analysis turnaround for changes you are making during development before check-in or push, you may want to look into using the cov-run-desktop command, which is meant for that use case.
In a green screen session, calling a program MYLIB/TESTPRG works when my library list is set to QGPL, QTEMP, VENDRLIB1, VENDRLIB2, VENDRLIB3. I can execute CALL MYLIB/TESTPRG on a green screen command line.
I want to be able to run this command from my Windows client. I created an external stored procedure MYLIB/TESTPROC with external name MYLIB/TESTPRG as I have seen in various articles. My original question stated that I could execute this procedure successfully in STRSQL in a green screen session with my library list as above, but that is false. It does not work. It simply says 'Trigger program or external routine detected an error.' Sorry for the wrong information.
When MYLIB/TESTPROC is called from the client (CALL MYLIB/TESTPROC), it fails with CPF9810 (Library &1 not found). I connected to the database via i Navigator -> Run SQL Scripts. In Connection -> JDBC Settings I had Default SQL schema = 'Use library list of server job' and set Schema list=QGPL,QTEMP,VENDRLIB1,VENDRLIB2,VENDRLIB3. I then executed CALL MYLIB/TESTPROC and got the message as stated above.
What works is when I run the program, i.e. CALL MYLIB/TESTPRG on a green screen command line.
TESTPRG is a C program that takes no arguments. The stored procedure was defined like this:
CREATE PROCEDURE MYLIB/TESTPROC
LANGUAGE C
SPECIFIC MYLIB/TESTPROC
NOT DETERMINISTIC
NO SQL
CALLED ON NULL INPUT
EXTERNAL NAME 'MYLIB/TESTPRG'
PARAMETER STYLE GENERAL ;
CPF9810 - Library &1 not found means that something is trying to access library &1 (whatever that is, you didn't tell us) and the library as typed is not on the system anywhere. &1 is not the name of the library; it is a substitution variable which will display the library name in the job log. Look at the real library spelling in the job log. Check your spelling. Check the connection to make sure all the libraries are specified correctly. The message will tell you exactly which library is causing the problem.
If indeed the program works in green screen when the library list is set properly, then I would expect the problem to be in your connection where it is trying to set a library list. You cannot add a non-existent library to the library list. That is why it works in green screen: your library is necessarily typed correctly there, or it wouldn't be in the library list. You would get a similar error (same text, different error code) if you tried to add a library with a spelling error to the library list in green screen.
Figure out the full text of the message (look in the job log), and you will see just what is throwing the error, and what the library is. Hint: it is not likely SQL throwing the error, as those errors all look like SQL#### or SQ#####. More likely it is a CL command or its processing program, called by an IBM server, that is sending a CPF message.
As you have discovered, you can directly call simple programs without defining an external SQL procedure based on this documentation from IBM:
https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/db2/rbafzcallsta.htm
I believe the recommendation to create your own external procedure definition for simple programs is primarily to reduce ambiguity. If you have programs and procedures that happen to have matching names, you need to know the resolution rules to figure out which one is being called, for instance.
Also, the rules for external functions are different from those for external stored procedures, and those get confused as well.
Per my comment, I usually make my procedure calls with the library within the call command.
In a terminal session using CALL PGM(MYLIB/TESTPROC). Or in a SQL session using CALL MYLIB.TESTPROC.
This can prevent someone from inadvertently putting your procedure in a personal library or the like. I usually do not specify a session library list on my SQL clients, accepting the system library list.
I had promised to accept Douglas Korinke's comment as an answer. However, I was experimenting a lot and I am no longer sure of what I knew and when I knew it. My problem had something to do with parameter passing to the C program. If I can reproduce it with a simple case I will ask another question.
In a Java program it is possible to set the libraries by using the following method:
ds.setLibraries("list of libraries");
Example:
ds.setServerName("server1");
ds.setPortNumber(1780);
ds.setDatabaseName("DBTEST");
ds.setLibraries("*LIBL,DAT452BS,DAT452BP");
The graph options in Zeppelin are pretty basic, so I am looking for an example of how to do something simple, like a bar chart, with d3.js. From what I can tell, that would be the best graphing library for creating stunning graphs.
Anyway, my question is how to pass data to the JavaScript code. With regular Zeppelin charts you write Scala or other code and save the result in a DataFrame. Then on the next line you use the %sql option, write a SQL command, and buttons appear to let you graph the data.
But from what I have found looking on the internet, there is no indication of how data created in the Scala code section can be passed to the Angular section where you put the d3.js code.
Some examples I found are like this one, where all the HTML and JavaScript is put in one giant print statement in the Scala code: https://rawkintrevo.org/2016/09/20/gelly-on-apache-flink/
And then there is an example like this one, Using d3.js with Apache Zeppelin, where the Zeppelin line is all JavaScript, but the data is just a locally created array.
So I need (1) an example and (2) some understanding of how RDDs and DataFrames can be passed into the JavaScript code, which of course is on a different line than the Scala code. How do you bring objects in the Scala section of the notebook into scope for the JavaScript section?
You can refer to the Zeppelin docs for a good getting-started guide to creating custom visualizations. Also, you might want to check out the code of some of the built-in visualizations.
Regarding how data from DataFrames is passed to JS, I'm pretty sure z.show or %sql triggers dataFrame.take(${zeppelin.spark.maxResult}), which collects the RDD[T] as a Seq[T] object on the driver, whose elements are then used to render graphs.
Alternatively, if you have a JavaScript graph defined in another paragraph, you can also use z.angularBind("values", rdd.take(maxResult)) to send the data to the Angular view. There's a really nice answer here on the subject which might help.
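As a minimal sketch of that second approach (the names df, labels, and values are hypothetical placeholders; I'm assuming a small DataFrame with a string column and a numeric column already built in a Scala paragraph):
// Scala paragraph: collect a small sample to the driver and bind it for Angular.
// Never bind a full RDD/DataFrame; take only what the chart needs.
val sample = df.limit(20).collect()
z.angularBind("labels", sample.map(_.getString(0)))
z.angularBind("values", sample.map(_.getLong(1)))
Then, in a separate %angular paragraph, the bound variables are available on the Angular scope of the note, and your d3.js code can read them from there to draw the chart.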
Hope you find this helpful.
I am trying to do a word count lab in Spark with Scala. I am able to successfully load the text file into a variable (RDD), but when I do the .flatMap, .map, and .reduceByKey, I receive the attached error message. I am new to this, so any type of help would be greatly appreciated. Please let me know.
Your program is failing because it was not able to find the file on HDFS.
You need to specify the file in the following format:
sc.textFile("hdfs://namenodedetails:8020/input.txt")
You need to give the fully qualified path of the file. Since Spark builds a dependency graph and evaluates lazily only when an action is called, you hit the error at the point where you call an action.
It is better to debug right after reading the file from HDFS, using the .first or .take(n) methods.
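A minimal sketch of that flow (the HDFS host, port, and file path below are hypothetical placeholders; substitute your own):
// Load with a fully qualified HDFS path, then fail fast if the path is wrong.
val lines = sc.textFile("hdfs://namenodehost:8020/user/me/input.txt")
lines.take(5).foreach(println) // action: a bad path will surface here
// The usual word count pipeline.
val counts = lines.flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.take(10).foreach(println)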
I'm trying to invoke some C++ code when a trigger fires inside my DB2 database.
To do so, I thought of compiling the C++ code to an executable and running it via a system call from DB2.
PS: I'm new to databases in general.
Thanks in advance!
I think you want to use a DB2 system call:
http://www.ibm.com/developerworks/data/library/techarticle/0303stolze/0303stolze.html
EDIT:
Specifically, it appears that you could just re-use the system call solution referenced in the "Making system calls" section to call an arbitrary command from your trigger:
http://www.ibm.com/developerworks/data/library/techarticle/0303stolze/0303stolze.html#section5
Generally, from the docs I gather that you will need to call an external UDF (user-defined function) from your trigger. The UDF itself defines the call to your external program and needs to be created and configured in such a way that DB2 will recognize it.
Here's a PDF that covers UDFs in some detail. The "External UDFs" section on page 453 might be useful.
http://www.redbooks.ibm.com/redbooks/pdfs/sg246503.pdf
This article may also be helpful. It shows a solution for integrating a Java function as a UDF to be called from a trigger.
http://www.ibm.com/developerworks/data/library/techarticle/0205bhogal/0205bhogal.html#download