Editing Spark Module in Spark-kernel - scala

We are currently editing a specific module in Spark. We use spark-kernel (https://github.com/ibm-et/spark-kernel) to run all our Spark jobs. What we did is recompile the code we edited, which produces a jar file. However, we do not know how to point the kernel to that jar file.
It looks like it is still referencing the old build and not the newly edited, newly compiled one. Do you have any idea how to modify some Spark packages/modules and have the changes reflected in spark-kernel? If we're not going to use spark-kernel, is there a way we can edit a particular module in Spark, for example the ALS module: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala. Thanks!

You likely edited a Scala or Java file and recompiled (even though you call them scripts, they are not scripts in the strict sense because they are not interpreted). Assuming that's what you did...
You probably don't have a clean replacement of the resulting JAR file in the deployment you are testing. Odds are your newly compiled JAR file is somewhere, just not in the place the deployment is looking. To get it there properly, you will have to build more than the JAR file: you will have to repackage your installable and reinstall.
Other techniques exist: if you can identify the unpacked item in an installation, you can sometimes copy it into place; however, such a technique is inherently unmaintainable, so I recommend it only for throw-away verification of a change and not on any system that will actually be used.
Keep in mind that with Spark, the worker nodes are sometimes deployed dynamically. If that is so, you might have to locate the installable used by the dynamic deployment system and make sure you have the right packaging there too.
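For the repackaging step, here is a rough sketch of what that can look like when building from a Spark source checkout (treat it as an outline, not a recipe: module names and flags vary by Spark version):

    # rebuild just the module you edited (mllib contains ALS.scala)
    ./build/mvn -pl mllib -DskipTests package

    # rebuild a full distribution so the kernel and any dynamically
    # deployed workers all pick up the new jar
    ./dev/make-distribution.sh --tgz -DskipTests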

Related

Deliver temporary build-time assets with NuGet

What is the proper way of delivering temporary build-time assets using NuGet?
I am making a NuGet package with a single file, which dependent projects require during the build phase. I would like the content of the file to be copied to the obj\$(Configuration) folder inside a dependent project before proceeding with the rest of the build. Of course, the obj folder is temporary, so I would like my file to be copied there again as part of the next build if obj gets cleared out.
I tried the contentFiles approach described here. It takes care of packaging my file inside the nupkg file, but I was unable to set it up so that my file gets delivered (and re-delivered) to obj\$(Configuration).
You're looking for NuGet's MSBuild extensibility. Unfortunately it means you'll need to learn a bit about MSBuild if you don't already know it. I recommend running msbuild -bl or dotnet build -bl, which will create an msbuild.binlog file that you can view with the MSBuild structured log viewer.
One option is to have a target that creates the file in the intermediate output directory at an appropriate time (you will probably need to use BeforeTargets). You can use the Inputs and Outputs attributes so MSBuild performs incremental-build checks and skips the copy when it doesn't need to, possibly making the build a little faster.
However, unless the file has dynamic content, copying it at all is a waste: it's just going to be included as an item in another part of the build process. So, if it's static content, you could simply create the relevant item in your targets file pointing at your package's extracted directory, and then it's just as good as if it had been copied to the intermediate output directory, without the wasted time and duplicated disk space.
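As an illustration of that first option, here is a minimal .targets sketch (the file and target names are made up; the .targets file would ship in your package's build folder):

    <Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
      <!-- MyAsset.dat is a hypothetical asset shipped next to this .targets file -->
      <Target Name="CopyMyAssetToIntermediate"
              BeforeTargets="BeforeBuild"
              Inputs="$(MSBuildThisFileDirectory)MyAsset.dat"
              Outputs="$(IntermediateOutputPath)MyAsset.dat">
        <!-- Inputs/Outputs give you the incremental skip; the copy re-runs
             whenever obj gets cleared out -->
        <Copy SourceFiles="$(MSBuildThisFileDirectory)MyAsset.dat"
              DestinationFolder="$(IntermediateOutputPath)" />
      </Target>
    </Project>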

Babel transpiler global from CLI

I'm trying to get to grips with the Babel transpiler. Its docs start by telling you how to install it globally, then shortly thereafter tell you that you should never do this, and never explain how to run it that way. Well, I believe I wish to run it that way (because the presence of the node_modules directory, or possibly the .babelrc file, cripples Brackets, which is the editor I'm currently needing to use).
I can run Babel from the global installation easily enough, but it doesn't do anything. The only way I've succeeded in getting it to do any actual translation has been using the local invocation with the .babelrc file, which of course kills my editor (and yes, I actually do have to use that editor, and I'm not creating a node-based project in any other way, just plain ES6).
Is there some way to use the command line to provide the information that the .babelrc file specifies (and thereby get something other than simple file copying)? Or some other way to get Babel to do what I need without a physical presence in my source tree?
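For what it's worth, Babel's CLI does accept preset flags directly, so something along these lines may work without any .babelrc in the source tree (a hedged sketch; it assumes the preset is resolvable from the global install, which in practice is often the sticking point):

    # both babel and the preset installed globally (hypothetical setup)
    npm install -g @babel/core @babel/cli @babel/preset-env
    babel --no-babelrc --presets @babel/preset-env src --out-dir lib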

"Not A Valid Jar" When trying to run Map Reduce Job

I am trying to run my MapReduce job by building a jar from Eclipse, but while trying to execute the job, I am getting a "Not a valid Jar" error.
I have tried to follow the link Not a valid Jar but that didn't help.
Can anyone please give me instructions on how to build the jar from Eclipse so that it runs on Hadoop?
I am aware of the process of building a jar file from Eclipse; however, I am not sure whether I have to take any special care when building the jar so that it runs on Hadoop.
When you submit the command, make certain the command line includes the following:
When you indicate the jar, make certain you are pointing to it properly. It may be easiest to be certain by using the absolute path: navigate to where the jar is, then run 'readlink -f' on it to get the absolute path. So for you, not just hist.jar, but maybe /home/akash_user/jars/hist.jar or wherever it is on your system. If you are using Eclipse, it may be saving the jar somewhere unexpected, so make sure that is not the problem. The jar cannot be run from HDFS storage; it must run from local storage.
When you name your main class, in your example Histogram, you must use the fully qualified name of the class, including the package. So, usually, if the program/project is named Histogram, and there is a HistogramDriver, HistogramMapper, and HistogramReducer, and your main() is in HistogramDriver, you need to type Histogram.HistogramDriver to get the program running. (Unless you made your jar runnable, which requires extra setup at the beginning, such as a manifest entry.)
Make sure that the jar you are submitting (hist.jar) is in the current directory from which you are submitting the 'hadoop jar' command.
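Putting those checks together, a submission would look something like this (the HDFS input and output paths are placeholders):

    # print the absolute path of the jar, then submit with it
    readlink -f hist.jar
    hadoop jar /home/akash_user/jars/hist.jar Histogram.HistogramDriver /user/akash_user/input /user/akash_user/output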
If the issue still persists, please tell us which Java, Hadoop, and Linux versions you are using.
You should not keep the jar file in HDFS when executing the MapReduce job. Make sure the jar is available on the local path. The input path and output directory, on the other hand, should be HDFS paths.

Is it possible to save settings and load resources when compiling to just one standalone exe?

If I compile a script for distribution as a standalone exe, is there any way I can store settings within the exe itself, to save having to write to an external file? The main incentive is to avoid having to develop an installation process. I only need to store a few bytes.
Also, can resources such as images be compiled into the exe?
Using alternate data streams opens up a can of worms, so I wouldn't go that way. Writing config data back into the exe itself won't work, as the file is locked for write access during execution.
What I usually do is store config data under %A_AppData%\%A_ScriptName%\%A_ScriptName%.ini
When the script starts, I use IniRead, which also provides a default value if the key isn't found, which is the case when the script executes for the first time.
The complementing IniWrite calls in an OnExit subroutine/function will create the ini file if necessary.
This way no installation is needed and the config is stored in the proper, familiar place.
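A minimal sketch of that pattern (AutoHotkey v1 syntax; the section and key names are made up):

    ; build the ini path under %A_AppData%
    SettingsFile := A_AppData . "\" . A_ScriptName . "\" . A_ScriptName . ".ini"

    ; the default (here 100x100) is used when the key or file doesn't
    ; exist yet, i.e. on the very first run -- no installer needed
    IniRead, WindowPos, %SettingsFile%, Main, WindowPos, 100x100

    OnExit, SaveSettings
    return

    SaveSettings:
        FileCreateDir, %A_AppData%\%A_ScriptName%   ; creates the folder if necessary
        IniWrite, %WindowPos%, %SettingsFile%, Main, WindowPos
    ExitApp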
The AutoHotkey forum has dealt with this question before. In that case, the user didn't want extra files, period. The method was to use the file system to save alternate data. Unfortunately I can't find the post.
A simpler method is to use the FileInstall command. When the script is compiled, the external file is stored within the exe. When the compiled exe executes the same command, the file is copied to the same directory as the running script. It is a simple yet effective 'install'.
With a little testing for the config file, the FileInstall command can be skipped. Skipping it allows changes made to the configuration after 'installation' to be preserved.
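A hedged sketch of that combination (config.ini is a hypothetical file name; note that FileInstall's source must be a literal path known at compile time):

    ; only extract the bundled default config if none exists yet, so
    ; changes made after 'installation' are preserved
    if !FileExist(A_ScriptDir . "\config.ini")
        FileInstall, config.ini, %A_ScriptDir%\config.ini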
I have not tried saving settings within the compiled exe file, but I have included resources. I'm not sure which version of AHK you're using or how you are compiling, but I can right-click my scripts to compile, and there's an option to compile with options, where you can include resources in your compiled exe.

Keeping SSIS packages under source control

I store all SSIS packages in a Subversion repository, along with their configuration files. The configuration file is almost always stored in the same folder as the package.
The problem is that SSIS seems to always store the path to the configuration file (the one saved in the package itself) as an absolute path.
When someone else checks out the folder with the package to a location different from the one on my development PC, the configuration file is not detected (because my absolute path is stored, and it doesn't exist on the other developer's PC). So the other developer has to remove this configuration and add it again from wherever it now sits on his local hard drive. The changed package is then saved, which causes a new version to be committed. When I get that version from SVN, it will no longer match the local path on my PC.
On a related note: another developer may want to change values in the configuration file as well. If I later get the latest version of everything from SVN, the package will no longer work on my PC.
How do you work around these inconveniences?
Another solution is to save your configuration in a database, with an environment variable as the first configuration to tell the package which database to look in; that's what we do. We keep scripts in source control to populate ssisconfig for each server, but the package uses the actual table data from the database named in the environment variable.
Anyone who has heard my SQL Saturday presentations knows I don't much care for XML, and this is one of the reasons. A trick to using XML configuration with varying locations is to use an environment variable (indirect configuration) to tell SSIS where it can look for that resource. The big, big downside to this approach is that you'd generally need to create an environment variable for each set of configuration files, or have one massive, honking .dtsconfig file, which becomes painful to version.
The option I prefer, if XML configuration is a must, is to remove the "variableness": developers and admins get together and everyone agrees "there will be a folder everywhere SSIS is done to hold configuration files, and that location is X", and then it's just a matter of solving for X. At a previous job, we used D:\ssisdata\configs
@HLGEM's approach of a table for configurations is hands down my favorite approach to SSIS configuration (until you get to 2012 and its project deployment model, where configuration is an entirely different animal).
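For reference, a sketch of the table-based setup; this mirrors the default shape the SSIS package configuration wizard generates, if memory serves, and the row values are hypothetical:

    CREATE TABLE dbo.[SSIS Configurations]
    (
        ConfigurationFilter NVARCHAR(255) NOT NULL,
        ConfiguredValue     NVARCHAR(255) NULL,
        PackagePath         NVARCHAR(255) NOT NULL,
        ConfiguredValueType NVARCHAR(20)  NOT NULL
    );

    -- each environment's database carries its own values; the environment
    -- variable only tells the package which database to read them from
    INSERT INTO dbo.[SSIS Configurations]
    VALUES (N'SalesETL',
            N'Data Source=DEVSQL01;Initial Catalog=Sales;Integrated Security=SSPI;',
            N'\Package.Connections[SalesDB].Properties[ConnectionString]',
            N'String');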
I add a folder called "config" under my project's folder, add it to source control, and maintain the config file in this folder. You can also add it to the SSIS project if you like.
I think it's a good solution because everybody can have this folder and download the config file.
When the package is deployed, it will read the config file from the location you specify in the deployment manifest, so this solution won't impact your development.