Running a process as sudo in Scala - scala

I am trying to run a process as sudo in Scala. I have written this code:
val l: Seq[String] = Seq("echo", "SecretXYZ!", "|", "sudo", "-S", "-u", "web",
  "spark-submit", "--class", "com.abhi.Foo", "--master", "yarn-cluster",
  "Foo-assembly-1.0.jar", DateTimeFormat.forPattern(pattern).print(date),
  ">", "fn_output.txt", "2>", "fn_error.txt")
l.!
println("completed...")
But when I run this, it doesn't run the process; it just prints:
SecretXYZ! | sudo -S -u web spark-submit --class com.abhi.Foo --master yarn-cluster Foo-assembly-1.0.jar 2015-03-19 > fn_output.txt 2> fn_error.txt
completed...

As Łukasz has pointed out, nothing here interprets |, >, or 2>: there is no shell involved, so they are passed to echo as literal arguments, which is why the whole command line is simply printed. The "right" answer is to build the pipeline yourself with sys.process.
The lazy answer is to explicitly wrap everything in a call to bash -c ...:
val miniScript: Seq[String] = Seq(
  "echo", "SecretXYZ!",
  "|", "sudo", "-S", "-u", "web",
  "spark-submit", "--class", "com.abhi.Foo", "--master", "yarn-cluster",
  "Foo-assembly-1.0.jar", DateTimeFormat.forPattern(pattern).print(date),
  ">", "fn_output.txt", "2>", "fn_error.txt")
val cmd: Seq[String] = Seq("bash", "-c", miniScript.mkString(" "))
cmd.!
Be careful with escaping, though: your password in this version would need single quotes around it, for example. If you want this code to be robust, you should do it with sys.process so you know exactly what's happening.
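For reference, here is a minimal sketch of the sys.process version. It assumes the same joda-time DateTimeFormat call and file names as the question; the pipe and the stdout redirection are built with the library's #| and #> operators, and stderr is captured with a ProcessLogger instead of 2>:

import scala.sys.process._
import java.io.{File, PrintWriter}

// pattern, date and DateTimeFormat (joda-time) are as in the question
val submit = Seq("sudo", "-S", "-u", "web",
  "spark-submit", "--class", "com.abhi.Foo", "--master", "yarn-cluster",
  "Foo-assembly-1.0.jar", DateTimeFormat.forPattern(pattern).print(date))

val errLog = new PrintWriter(new File("fn_error.txt"))
// echo feeds the password to sudo -S through the pipe; #> plays the role of >
val exitCode = (Seq("echo", "SecretXYZ!") #| submit #> new File("fn_output.txt"))
  .!(ProcessLogger(_ => (), line => errLog.println(line)))
errLog.close()
println(s"completed with exit code $exitCode")

No quoting is needed anywhere, because no shell ever sees the command.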

Related

How to pass a database connection into Airflow KubernetesPodOperator

I'm confused about Airflow's KubernetesPodOperator, and I'm wondering how to run my load_users_into_table() function, which takes a conn_id parameter stored in an Airflow connection, inside the pod.
The official docs propose putting the conn_id in a Secret, but I don't understand how I can then pass it to my load_users_into_table() function.
https://airflow.apache.org/docs/stable/kubernetes.html
the function (task) to be executed in the pod:
def load_users_into_table(postgres_hook, schema, path):
    gdf = read_csv(path)
    gdf.to_sql('users', con=postgres_hook.get_sqlalchemy_engine(), schema=schema)
the dag:
_pg_hook = PostgresHook(postgres_conn_id=_conn_id)

with dag:
    test = KubernetesPodOperator(
        namespace=namespace,
        image=image_name,
        cmds=["python", "-c"],
        arguments=[load_users_into_table],
        labels={"dag-id": dag.dag_id},
        name="airflow-test-pod",
        task_id="task-1",
        is_delete_operator_pod=True,
        in_cluster=in_cluster,
        get_logs=True,
        config_file=config_file,
        executor_config={
            "KubernetesExecutor": {
                "request_memory": "512Mi",
                "limit_memory": "1024Mi",
                "request_cpu": "1",
                "limit_cpu": "2",
            }
        },
    )
Assuming you want to run with KubernetesPodOperator, you can use argparse and add arguments to the docker cmd. Something along these lines should do the job:
import argparse

def f(arg):
    print(arg)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--foo', help='foo help')
    args = parser.parse_args()
    f(args.foo)
Dockerfile:
FROM python:3
COPY main.py main.py
CMD ["python", "main.py", "--foo", "somebar"]
There are other ways to solve this such as using secrets, configMaps or even Airflow Variables, but this should get you moving forward.

Executing Curl command in scala

I am trying to execute the curl command through Scala for ScalaTest.
curl:
curl -H "Content-Type: application/json" --data @/home/examples/demo/demo.json http://localhost:9090/job
This works as expected. Then I tried doing it with Scala, like:
import scala.sys.process._
val json = getClass.getClassLoader.getResource(arg0).getPath
val cmd = Seq("curl", "-H", "'Content-Type: application/json'", "-d", s"@$json", "http://localhost:9090/job")
cmd.!
and it produces the following error:
Expected 'application/json'
You're quoting too much. The single quotes around the header value are not shell syntax here; they are passed through to curl as literal characters:
Seq("curl", "-H", "'Content-Type: application/json'", "-d", s"@$json", "http://localhost:9090/job")
It should be:
Seq("curl", "-H", "Content-Type: application/json", "-d", s"@$json", "http://localhost:9090/job")
The reason you need to quote the content-type in the shell is because the shell will break it up if it isn't quoted. Curl doesn't know how to deal with quotes, because quotes aren't its business. Scala won't do any word-splitting, either, because that's your job.
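Putting it together, a minimal self-contained sketch of the corrected call (using a literal "demo.json" in place of the question's arg0):

import scala.sys.process._

// Each Seq element becomes exactly one argv entry for curl: no shell
// is involved, so there is no word-splitting and nothing to quote.
val json = getClass.getClassLoader.getResource("demo.json").getPath
val cmd = Seq("curl", "-H", "Content-Type: application/json",
  "-d", s"@$json", "http://localhost:9090/job")
val exitCode = cmd.!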

grunt, grunt-shell - command inheritance

I need something like command inheritance, as in:
shell:
  virtualenvActivate:
    command: [
      '. `command -v virtualenvwrapper.sh`'
      'workon <%= pkg.name %>'
    ].join '&&'
  pelican:
    command: [
      shell:virtualenvActivate # <-- THIS LINE
      'pelican src/content/ -o dist/ -s publishconf.py'
    ].join '&&'
Is it ever possible?
Unfortunately, as far as I know, CoffeeScript does not currently implement YAML's anchor/reference features.
I don't know Grunt, but generally speaking, for now, you would probably have to declare the common data outside of your data structure:
_my_command = [
  '. `command -v virtualenvwrapper.sh`'
  'workon <%= pkg.name %>'
].join '&&'

shell:
  virtualenvActivate:
    command: _my_command
  pelican:
    command: [
      _my_command
      'pelican src/content/ -o dist/ -s publishconf.py'
    ].join '&&'
For more advanced use cases, you can probably go as far as (ab)using an object constructor to achieve the desired result. Something like this, maybe:
shell: new ->
  @virtualenvActivate =
    command: [
      '. `command -v virtualenvwrapper.sh`'
      'workon <%= pkg.name %>'
    ].join '&&'
  @pelican =
    command: [
      @virtualenvActivate.command
      'pelican src/content/ -o dist/ -s publishconf.py'
    ].join '&&'
  @
That being said, I don't know if this would be the recommended way.

Why does Groovy execute() hang?

I have the following Groovy script:
#!/opt/groovy-1.8.6/bin/groovy
final env = null // []
final command = ["./setter-for-catan.scala"]
final process = command.execute(env, null)

println(['echo', '********************** 0'].execute(env, null).text)

final stdout = process.inputStream
BufferedReader reader = new BufferedReader(new InputStreamReader(stdout))
while ((line = reader.readLine()) != null) {
    System.out.println("Stdout: " + line)
}
and the following Scala script:
#!/bin/bash
export SCALA_HOME=/opt/scala-2.10.1
echo '********************* 1' "$0" "$@"
${SCALA_HOME}/bin/scala -version 2>&1
exec ${SCALA_HOME}/bin/scala "$0" "$@" 2>&1
!#
println("******************* 2")
Calling the Groovy script outputs:
********************** 0
Stdout: ********************* 1 ./setter-for-catan.scala
Stdout: Scala code runner version 2.10.1 -- Copyright 2002-2013, LAMP/EPFL
Stdout: ******************* 2
If env is defined as [], the Groovy script hangs with the following output:
********************** 0
Stdout: ********************* 1 ./setter-for-catan.scala
Stdout: Scala code runner version 2.10.1 -- Copyright 2002-2013, LAMP/EPFL
What's going on, and what needs to be done so that execute() doesn't hang when env is an empty list?
Passing env = null lets the child inherit the parent's environment, while [] gives it an empty one, so JAVA_HOME isn't inherited by the Scala script and needs to be defined.
One way to do it would be in the Scala script:
#!/bin/bash
export JAVA_HOME=/Library/Java/Home
export SCALA_HOME=/opt/scala-2.10.1
echo '********************* 1' "$0" "$@"
${SCALA_HOME}/bin/scala -version 2>&1
exec ${SCALA_HOME}/bin/scala "$0" "$@" 2>&1
!#
Another way would be to do it in the Groovy script:
final env = ['JAVA_HOME=/Library/Java/Home']
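If you were launching the same script from Scala's sys.process instead, note that its extraEnv parameter is additive: the entries are layered on top of the inherited environment rather than replacing it, so this particular hang is harder to trigger. A minimal sketch (the JAVA_HOME path is illustrative):

import scala.sys.process._

// extraEnv adds to the inherited environment instead of replacing it,
// so the child still sees PATH, HOME, and friends.
val exitCode = Process(
  Seq("./setter-for-catan.scala"),
  None,
  "JAVA_HOME" -> "/Library/Java/Home"
).!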

How do I have sbt's Jetty use a local domain name?

I'd like the version of Jetty launched by sbt> ~jetty to listen on my.name.local, which I've set to 127.0.0.1 in /etc/hosts. It seems to be possible to change Jetty's settings from within sbt.
Here's what I have for my project:
import sbt._

class LiftProject(info: ProjectInfo) extends DefaultWebProject(info) {
  // ...
  val jetty = "org.eclipse.jetty" % "jetty-webapp" % "7.3.0.v20110203" % "test"

  override lazy val jettyInstance = new JettyRunner(customJettyConfiguration)

  def customJettyConfiguration = {
    val myLog = log
    val myJettyClasspath = jettyClasspath
    val myScanDirectories = scanDirectories
    val myScanInterval = scanInterval
    new CustomJettyConfiguration {
      def classpath = jettyRunClasspath
      def jettyClasspath = myJettyClasspath
      def war = jettyWebappPath
      def contextPath = jettyContextPath
      def classpathName = "test"
      def parentLoader = buildScalaInstance.loader
      def scanDirectories = Path.getFiles(myScanDirectories).toSeq
      def scanInterval = myScanInterval
      def port = jettyPort
      def log = myLog
      override def jettyConfigurationXML =
        <Configure class="org.eclipse.jetty.webapp.WebAppContext">
          <Set name="virtualHosts">
            <Array type="java.lang.String">
              <Item>my.name.local</Item>
            </Array>
          </Set>
        </Configure>
    }
  }
}
While it seems to launch without complaints, visiting my.name.local doesn't hit Jetty as far as I can tell.
Rather than running sbt as root (dangerous), I personally prefer rerouting port 80 to 8080 using iptables on Linux:
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 8080
This works only until the next reboot. To make the setting persistent on Ubuntu 10.04, I use:
sudo bash -c "iptables-save > /etc/iptables.rules"
echo "#!/bin/sh
iptables-restore < /etc/iptables.rules
exit 0
" > /etc/network/if-pre-up.d/iptablesload
echo "#!/bin/sh
iptables-save -c > /etc/iptables.rules
if [ -f /etc/iptables.downrules ]; then
iptables-restore < /etc/iptables.downrules
fi
exit 0
" > /etc/network/if-post-down.d/iptablessave
chmod +x /etc/network/if-post-down.d/iptablessave
chmod +x /etc/network/if-pre-up.d/iptablesload
(see this Ubuntu iptables wiki)
I posted too soon. All I need to do is override jettyPort:
override def jettyPort = 80
And run sbt via sudo.