Accessing a file that was passed via --files to spark-submit - pyspark

I am submitting a script with spark-submit and passing it a file using the --files option. Later on I need to read that file in a worker.
I don't understand which API I should use to do that. I figured I'd try just:
with open('myfile'):
but this did not work.
I am able to pass the file using the addFile mechanism, but it may not be good enough for me.
This may seem like a very simple question, but I did not find any comprehensive documentation on spark-submit. The docs sure don't cover it.

Well, this is embarrassing. I forgot to look inside spark-submit --help.
And this is what it says:
--files FILES Comma-separated list of files to be placed in the working
directory of each executor. File paths of these files
in executors can be accessed via SparkFiles.get(fileName).
Sometimes it's right under one's own nose.
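As a minimal sketch of what the help text describes, the helper below resolves a file shipped with --files through SparkFiles.get and reads it. The names read_shipped_file and resolve are hypothetical and used only for illustration; the file name 'myfile' is taken from the question. The resolve parameter is an assumption added so the helper can be tried without a running Spark job.

```python
def read_shipped_file(file_name, resolve=None):
    """Return the text of a file that was distributed with --files.

    `resolve` maps a bare file name to a local path. It defaults to
    SparkFiles.get, which is what the spark-submit help text points to.
    """
    if resolve is None:
        # Imported lazily so the helper can be defined without Spark installed.
        from pyspark import SparkFiles
        resolve = SparkFiles.get
    with open(resolve(file_name)) as handle:
        return handle.read()


# Hypothetical use inside a task that runs on an executor:
#   rdd.map(lambda row: (row, read_shipped_file("myfile")))
```

Calling the helper inside the function passed to map (rather than on the driver) means the path is resolved on the executor where the task runs.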

Related

How can I prevent user from displaying content of the shiro.ini file in Zeppelin

I followed the tutorial from here on how to install and use Zeppelin. I also created users with passwords by specifying them in the conf/shiro.ini file.
The problem however is that a user can write this simple script to see the contents of the shiro file.
%python
import os
os.system("cat <path_to_zeppelin_folder>conf/shiro.ini")
My question is: how can I prevent the user from seeing this file? Zeppelin itself needs to read it, so I can't just make it unreadable by removing read permissions.
What I am doing now is to remove read/write permissions of the shiro.ini file after starting Zeppelin, but there should be a more elegant way of preventing such a thing.

Hide passwords in Buildbot shell commands from logs

I need to be able to pass a password to a shell command but don't want it to be available in the logs. I tried finding a solution and found something called an Obfuscated class in Buildbot, but it seems I'm missing something: it's not working and I couldn't find any examples.
Is there some other way? Or, if this is the only way, could someone provide an example?
Secrets are supported in Buildbot since 0.9.7: http://docs.buildbot.net/latest/manual/secretsmanagement.html
Using this API to access your secrets will automatically redact them from the build logs.
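A master.cfg fragment along these lines is one way to wire this up. It is only a sketch: the secrets directory, the secret name deploy_password, and the deploy-tool command are all hypothetical placeholders, and it assumes a file-based secrets provider is acceptable for your setup.

```python
# Fragment of a Buildbot master.cfg; not a standalone script.
from buildbot.plugins import secrets, steps, util

c = BuildmasterConfig = {}

# Assumption: each secret is stored as a file named after the secret
# inside this (hypothetical) directory.
c['secretsProviders'] = [secrets.SecretInAFile(dirname="/path/to/secrets")]

factory = util.BuildFactory()
factory.addStep(steps.ShellCommand(
    # The %(secret:...)s interpolation fetches the value at build time.
    command=["deploy-tool", "--password",
             util.Interpolate("%(secret:deploy_password)s")],
))
```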

Celery remote worker accessing file

I have a function which should take an executable file as argument, execute it and return the result. This function should be run asynchronously so I'm using celery. I want to use multiple computers as workers so each worker should be able to access the executable file. However since the executable files are uploaded by the moderators it's not an option to put a version of each file in each worker by hand. So what would be the best way to handle this?
The only option I could think of was storing the files in the database. The function would retrieve the file from the DB, store it temporarily, execute it, remove the file, and return the result.
Is this a good approach? Are there any better ways to handle this?

Meteor App Environment Variables

I'm using Meteor and am new to it. After some research, I stumbled upon this post, which is exactly what I need at the moment. I need to connect to an external MongoDB instance somewhere on the server.
Now the question is: where can I find the Meteor config file (if that is even what I'm looking for) containing all environment variables (for example, MONGO_URL)? If there is no such config file, how can I make this possible?
This is done by setting the environment variable(s) directly on the command line or in a startup script. For a production use case, have a look at my answer to this question.
When you are developing your app, on your localhost you can also have a script wrapper around your call to meteor like:
#!/bin/bash
MONGO_URL="..." meteor
exit 0
or alternatively you can just add a line to your ~/.bashrc to export the MONGO_URL variable to all console sessions.
There is no fully documented list of environment variables for meteor that I'm aware of, but an answer to this question lists a number of them. For nearly all circumstances, however, you need only a few and they appear in the meteor docs or somewhere in the wiki.

starting warden after zookeeper of MapR

I am installing MapR and got stuck at starting warden after starting zookeeper on a single node.
# service mapr-warden start
Error: warden can not be started. See /opt/mapr/logs/warden.log for details
There is no detail in this file. Does anybody have a hint? Thanks =)
If you aren't getting anything in warden.log, then it's likely that the warden JVM is never even being started by the mapr-warden init script.
In some MapR versions, the mapr-warden init script will log some details into /opt/mapr/logs/wardeninit.log. You can try checking there.
However, I will also caution that currently the logging done by the init script is sparse and not necessarily user friendly to read. If you can't discern the cause from the contents of the wardeninit.log you can post them here and maybe I can help.
Another thing you can do is edit /etc/init.d/mapr-warden and add "set -x" towards the top of the file, right before the "BASEMAPR=" line, then try starting warden again and you'll get a bunch of shell debugging output on your screen. If you copy and paste that output here that should be enough to tell the root cause of the problem.
One more thing to mention, you may be better off using the http://answers.mapr.com forum as that is MapR specific and I think there may be more users there that could help.
Was configure.sh (/opt/mapr/server/configure.sh -C nodeA -Z nodeA) run on the node? Did zookeeper come up successfully?
service mapr-zookeeper status
Even when using MapR on a single node, configure.sh is still required. In fact, without configure.sh, warden, zookeeper, cldb and other MapR components will lack their configuration and in many cases will fail to start.
You must run configure.sh after installing the software packages (deb or rpm).