FFmpeg transcoding on Lambda results in unusable (static) audio - encoding

I'd like to move towards serverless for audio transcoding routines in AWS. I've been trying to set up a Lambda function to do just that: execute a static FFmpeg binary and re-upload the resulting audio file. The static binary I'm using is here.
The Lambda function I'm using in Python looks like this:
import boto3
s3client = boto3.client('s3')
s3resource = boto3.client('s3')
import json
import subprocess
from io import BytesIO
import os
os.system("cp -ra ./bin/ffmpeg /tmp/")
os.system("chmod -R 775 /tmp")
def lambda_handler(event, context):
    bucketname = event["Records"][0]["s3"]["bucket"]["name"]
    filename = event["Records"][0]["s3"]["object"]["key"]
    audioData = grabFromS3(bucketname, filename)
    with open('/tmp/' + filename, 'wb') as f:
        f.write(audioData.read())
    os.chdir('/tmp/')
    try:
        process = subprocess.check_output(['./ffmpeg -i /tmp/joe_and_bill.wav /tmp/joe_and_bill.aac'], shell=True, stderr=subprocess.STDOUT)
        pushToS3(bucketname, filename)
        return process.decode('utf-8')
    except subprocess.CalledProcessError as e:
        return e.output.decode('utf-8'), os.listdir()
def grabFromS3(bucket, file):
    obj = s3client.get_object(Bucket=bucket, Key=file)
    data = BytesIO(obj['Body'].read())
    return(data)
def pushToS3(bucket, file):
    s3client.upload_file('/tmp/' + file[:-4] + '.aac', bucket, file[:-4] + '.aac')
    return
You can listen to the output of this here. WARNING: Turn your volume down or your ears will bleed.
The original file can be heard here.
Does anyone have any idea what might be causing the encoding errors? It doesn't seem to be an issue with the file upload, since the md5 on the Lambda fs matches the MD5 of the uploaded file.
I've also tried building the static binary on an Amazon Linux instance in EC2, then zipping and porting it into the Lambda project, but the same issue persists.
I'm stumped! :(

Alright this is a fun one.
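So it turns out the Python subprocess inherits stdin from some Lambda processes running in the background. FFmpeg reads from stdin by default (for its interactive commands), so whatever arrives on that inherited stream can presumably interfere with the transcode. I was watching an AWS re:Invent keynote in which the speaker described problems they had run into with exactly this.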
I added stdin=subprocess.DEVNULL to the subprocess call and the audio is now fixed.
Very interesting bug if you ask me.
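For anyone who wants to see the fix in isolation, here is a minimal sketch. It uses a small Python child process as a stand-in for FFmpeg (an assumption, so the snippet runs anywhere), and run_detached_from_stdin is a hypothetical helper name, not part of the original function. The child simply reports what it can read from stdin; with stdin=subprocess.DEVNULL it sees an empty stream.

```python
import subprocess
import sys


def run_detached_from_stdin(cmd):
    """Run cmd (a list of arguments) with stdin wired to the null device.

    Hypothetical helper: the child can no longer inherit the parent's stdin.
    """
    return subprocess.check_output(
        cmd,
        stdin=subprocess.DEVNULL,   # the one-line fix from the answer
        stderr=subprocess.STDOUT,   # same stderr handling as the question
    )


# Stand-in child process (assumption): prints whatever it can read from stdin.
child = [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
output = run_detached_from_stdin(child).decode("utf-8").strip()
print(output)
```

In the Lambda function from the question, the same call would take something like ['./ffmpeg', '-i', '/tmp/joe_and_bill.wav', '/tmp/joe_and_bill.aac'] as cmd. Passing a list also removes the need for shell=True.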


Reading a Python file from Scala

I'm trying to work with a file
But when I try to access this file, I get an error: No such file or directory
Can you tell me how to access files in hdfs correctly?
Update:
The author of the answer pointed me in the right direction.
As a result, this is how I execute the Python script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
#import pandas as pd
import sys
for line in sys.stdin:
    print('Hello, ' + line)
# this is hello.py
And Scala application:
spark.sparkContext.addFile(getClass.getResource("hello.py").getPath, true)
val test = spark.sparkContext.parallelize(List("Body!")).repartition(1)
val piped = test.pipe(SparkFiles.get("./hello.py"))
val c = piped.collect()
c.foreach(println)
Output: Hello, Body!
Now I have to think about whether, as a cluster user, I can install pandas on workers.
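One detail worth noting about the script side: pipe writes each RDD element to the script's standard input as one line, and every line the script prints comes back as one element. Lines read from sys.stdin keep their trailing newline, so print('Hello, ' + line) emits an extra blank line, which would likely show up as an extra empty element. A minimal sketch of a variant that strips it, with stdin simulated by io.StringIO and greet_lines as a hypothetical helper name:

```python
import io


def greet_lines(stream):
    """Yield one greeting per input line, without the trailing newline."""
    for line in stream:
        yield 'Hello, ' + line.rstrip('\n')


# Simulate the single line that pipe() would write to the script's stdin.
greetings = list(greet_lines(io.StringIO('Body!\n')))
print(greetings)
```

In hello.py itself this would mean iterating over sys.stdin instead of the simulated stream and printing each greeting.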
I think you should try referencing the external file directly, rather than downloading it to your Spark driver just to upload it again:
spark.sparkContext.addFile(s"hdfs://$srcPy")

How to take screenshots in AWS Device farm for a run using appium python?

Even after a successful execution of my tests in DeviceFarm, I get an empty screenshots report. I have kept my code as simple as below -
from appium import webdriver
import time
import unittest
import os
class MyAndroidTest(unittest.TestCase):
    def setUp(self):
        caps = {}
        self.driver = webdriver.Remote("http://127.0.0.1:4723/wd/hub", caps)
    def test1(self):
        self.driver.get('http://docs.aws.amazon.com/devicefarm/latest/developerguide/welcome.html')
        time.sleep(5)
        screenshot_folder = os.getenv('SCREENSHOT_PATH', '/tmp')
        self.driver.save_screenshot(screenshot_folder + 'screen1.png')
        time.sleep(5)
    def tearDown(self):
        self.driver.quit()
if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(MyAndroidTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
I tested on a single device pool -
How can I make this work?
TIA.
A slash (/) is missing before the filename (i.e., screen1.png), so the file is written next to the screenshot folder rather than inside it. The save_screenshot line should be as below:
self.driver.save_screenshot(screenshot_folder + '/screen1.png')
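A way to avoid this class of mistake altogether is to let os.path.join insert the separator. A minimal sketch, where screenshot_path is a hypothetical helper name and the /tmp fallback is taken from the question:

```python
import os


def screenshot_path(filename, default_folder='/tmp'):
    """Build the full path for a screenshot inside the Device Farm folder."""
    folder = os.getenv('SCREENSHOT_PATH', default_folder)
    # os.path.join adds the separator, so a missing slash cannot happen.
    return os.path.join(folder, filename)


print(screenshot_path('screen1.png'))
```

The test would then call self.driver.save_screenshot(screenshot_path('screen1.png')).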
Though I'm not sure exactly how to write this to a file in Device Farm, here are the Appium docs for the screenshot endpoint and a Python example.
https://github.com/appium/appium/blob/master/docs/en/commands/session/screenshot.md
The endpoint returns a base64-encoded string, which we would then just need to save somewhere, such as the Appium screenshot directory the other answers mentioned. Otherwise, we could save it in the /tmp directory and then export it using the custom artifacts feature.
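Saving such a string comes down to decoding it and writing the bytes. A rough sketch, assuming the string comes from something like driver.get_screenshot_as_base64() in the Python client; save_base64_screenshot is a hypothetical helper name, and the sample data is just the PNG file signature rather than a real screenshot:

```python
import base64
import os
import tempfile


def save_base64_screenshot(encoded, path):
    """Decode a base64 string and write the raw bytes to path."""
    with open(path, 'wb') as f:
        f.write(base64.b64decode(encoded))
    return path


# Stand-in for the string returned by the screenshot endpoint (assumption).
png_signature = b'\x89PNG\r\n\x1a\n'
sample = base64.b64encode(png_signature).decode('ascii')

target = os.path.join(tempfile.gettempdir(), 'screen1.png')
save_base64_screenshot(sample, target)
```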
Let me know if that link helps.
James

Moving local data to a Google Cloud bucket using the Python API

I can copy local data into a Google Cloud Storage bucket using the following:
gsutil cp afile.txt gs://my-bucket
How can I do the same using the Python API library?
from google.cloud import storage
storage_client = storage.Client()
# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)
I can't find anything more than the above.
There is an API Client Library code sample here. My code typically looks like the following, which is a slight variant on the code they provide:
from google.cloud import storage
client = storage.Client(project='<myprojectname>')
mybucket = storage.bucket.Bucket(client=client, name='mybucket')
mydatapath = r'C:\whatever\something' + '\\'  # raw string, so \w and \s are not treated as escapes
blob = mybucket.blob('afile.txt')
blob.upload_from_filename(mydatapath + 'afile.txt')
In case it is of interest, another method is to run the gsutil command line from your original post via the subprocess module, e.g.:
import subprocess
subprocess.call("gsutil cp afile.txt gs://mybucket/", shell=True)
In my view, there are pros and cons to both methods, depending on what you are trying to achieve: the latter method allows multi-threading if you have many files to upload, whereas the former perhaps allows better control, specification of metadata for each file, etc.
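If you go the gsutil route, building the command as a list avoids shell=True and makes the multi-threaded variant easy to switch on. A small sketch, where gsutil_cp_command is a hypothetical helper name and -m is gsutil's top-level option for parallel operations:

```python
import subprocess


def gsutil_cp_command(local_path, bucket_name, parallel=False):
    """Build a gsutil cp command as an argument list (no shell needed)."""
    cmd = ['gsutil']
    if parallel:
        cmd.append('-m')  # top-level option, so it goes before 'cp'
    cmd += ['cp', local_path, 'gs://' + bucket_name + '/']
    return cmd


command = gsutil_cp_command('afile.txt', 'mybucket', parallel=True)
print(command)
# To actually run it (requires gsutil on the PATH):
# subprocess.call(command)
```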

Docopt and Classes

I've gotten quite comfortable over the last year coding in Python on and off, but I have stayed away from classes (as in structuring my code in them) because I have not understood them.
I am now trying to get my head around what I need to change in my coding practices to take advantage of using classes in all their glory.
I have been trying to take an example script I wrote and port it to a class-based version. Safe to say I am struggling badly and can't get my simple script to work. I'm sure there are a myriad of things I'm most likely doing incorrectly. I would really appreciate someone pointing them out to me.
I don't mind finger pointing and belly laughs too ^_^
Code After (not working)
"""
Description:
    This script is used to walk a directory and print out each filename and directory including the full path.

Author: Name

Usage:
    DirLister.py (-d <directory>)
    DirLister.py -h | --help
    DirLister.py --version

Options:
    -d <directory>  The top level directory you want to list files and directories from.
    -h --help       Show this screen.
    --version       Show version.
"""
import os
from docopt import docopt
class walking:
    def __init__(self, directory):
        self.directory = arguments['-d']
    def walk(self, directory):
        for root, dirs, files in os.walk(self.directory):
            for filename in files:
                print os.path.join(root, filename)
if __name__ == '__main__':
    arguments = docopt(__doc__, version= '1.0.0')
    print arguments
    if arguments['-d'] is None:
        print __doc__
        exit(0)
    else:
        walking.walk(directory)
Original Non-Class Based Code (working)
"""
Description:
    This script is used to walk a directory and print out each filename and directory including the full path.

Author: Name

Usage:
    DirLister.py (-d <directory>)
    DirLister.py -h | --help
    DirLister.py --version

Options:
    -d <directory>  The top level directory you want to list files and directories from.
    -h --help       Show this screen.
    --version       Show version.
"""
import os
from docopt import docopt
arguments = docopt(__doc__, version= '1.0.0')
def walk(dir):
    for root, dirs, files in os.walk(dir):
        for filename in files:
            print os.path.join(root, filename)
if __name__ == '__main__':
    if arguments['-d'] is None:
        print __doc__
        exit(0)
    else:
        walk(arguments['-d'])
You've forgotten to post the error you get (since you say it's not working).
But indeed there are several issues. First, I'd call the class Walking.
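Then in your __init__ function, you read arguments directly instead of using the directory parameter. That only resolves when the script is run as the main program, because arguments is then a module-level name; the class should not depend on it. You wanted to write:
def __init__(self, directory):
    self.directory = directory
But you also need to create an instance of your class in your main block: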
walking = Walking(arguments['-d'])
That assumes that the name of the class is Walking instead of walking. I advise you to look at PEP8 for the naming conventions.
The general idea is that the class is the type of an object, but not the object itself*, so the class Walking: block is basically defining a new kind of object. You can then create objects that are instances of this class. It's the same when you create a list: mylist = list() (though there are also other ways for lists, like mylist = [1, 2]).
*It happens that most things in Python are objects, including classes, but they obviously have other methods and another base class.
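Putting those pieces together, a minimal sketch of the class-based version could look like this. It is written for Python 3, leaves docopt out so it runs on its own, and returns the paths instead of printing them (both are my choices for the illustration, not part of the original script); a temporary directory stands in for arguments['-d'].

```python
import os
import tempfile


class Walking:
    """Walks a directory tree and collects the full path of every file."""

    def __init__(self, directory):
        # Store the constructor argument on the instance.
        self.directory = directory

    def walk(self):
        # No extra parameter needed: the directory lives on self.
        paths = []
        for root, dirs, files in os.walk(self.directory):
            for filename in files:
                paths.append(os.path.join(root, filename))
        return paths


# Stand-in for arguments['-d'] (assumption): a temporary directory with one file.
with tempfile.TemporaryDirectory() as top:
    open(os.path.join(top, 'a.txt'), 'w').close()
    walker = Walking(top)   # create an instance of the class
    found = walker.walk()   # call the method on the instance, not on the class
    print(found)
```

In the original script the last lines would become walker = Walking(arguments['-d']) followed by a loop printing walker.walk().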

How do you write a Scala script that will react to file changes

I would like to change the following batch script to Scala (just for fun), however, the script must keep running and listen for changes to the *.mkd files. If any file is changed, then the script should re-generate the affected doc. File IO has always been my Achilles heel...
#!/bin/sh
for file in *.mkd
do
    pandoc --number-sections "$file" -o "${file%%.*}.pdf"
done
Any ideas around a good approach to this will be appreciated.
The following code, taken from my answer to "Watch for project files", can also watch a directory and execute a specific command:
#!/usr/bin/env scala
import java.nio.file._
import scala.collection.JavaConversions._
import scala.sys.process._
val file = Paths.get(args(0))
val cmd = args(1)
val watcher = FileSystems.getDefault.newWatchService
file.register(
  watcher,
  StandardWatchEventKinds.ENTRY_CREATE,
  StandardWatchEventKinds.ENTRY_MODIFY,
  StandardWatchEventKinds.ENTRY_DELETE
)
def exec = cmd run true
@scala.annotation.tailrec
def watch(proc: Process): Unit = {
  val key = watcher.take
  val events = key.pollEvents
  val newProc =
    if (!events.isEmpty) {
      proc.destroy()
      exec
    } else proc
  if (key.reset) watch(newProc)
  else println("aborted")
}
watch(exec)
Usage:
watchr.scala markdownFolder/ "echo \"Something changed!\""
Extensions have to be made to the script to inject file names into the command. As of now this snippet should just be regarded as a building block for the actual answer.
Modifying the script to incorporate the *.mkd wildcards would be non-trivial as you'd have to manually search for the files and register a watch on all of them. Re-using the script above and placing all files in a directory has the added advantage of picking up new files when they are created.
As you can see, it gets big and messy pretty quickly when relying only on the Scala and Java APIs; you would be better off relying on alternative libraries, or just sticking to bash while using inotify.
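The missing piece, mapping a changed file to its pandoc command, is independent of the watching mechanism. A rough sketch of that logic in Python, using simple modification-time polling instead of a watch service (a swapped-in technique, chosen only to keep the example short); pandoc_command and changed_mkd_files are hypothetical helper names:

```python
import os
import tempfile


def pandoc_command(filename):
    """Build the pandoc call from the question's shell loop for one file."""
    stem = filename.split('.', 1)[0]  # mirrors ${file%%.*}
    return ['pandoc', '--number-sections', filename, '-o', stem + '.pdf']


def changed_mkd_files(directory, seen_mtimes):
    """Return .mkd files that are new or modified, updating seen_mtimes."""
    changed = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.mkd'):
            continue
        mtime = os.path.getmtime(os.path.join(directory, name))
        if seen_mtimes.get(name) != mtime:
            seen_mtimes[name] = mtime
            changed.append(name)
    return changed


with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, 'notes.mkd'), 'w').close()
    seen = {}
    first_pass = changed_mkd_files(folder, seen)    # the new file is reported
    second_pass = changed_mkd_files(folder, seen)   # nothing changed since
    commands = [pandoc_command(name) for name in first_pass]
    print(commands)
```

A long-running script would call changed_mkd_files in a loop with a short sleep and run each returned command. Because the directory is re-listed on every pass, newly created files are picked up as well.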