How to transform monads in a conduit pipeline? - mongodb

I am trying to copy a file from disk to a File in MongoDB GridFS with the Database.MongoDB packages.
main :: IO ()
main = do
  pipe <- MDB.connect (host "127.0.0.1")
  _ <- access pipe master "baseball" run
  close pipe

run :: MDB.Action IO GFS.File
run = do
  uploadImage "sandbox/bat.jpg"

uploadImage :: Text -> MDB.Action IO GFS.File
uploadImage src = do
  bucket <- GFS.openDefaultBucket
  runConduitRes $ sourceFileBS (unpack src) .| (hole $ GFS.sinkFile bucket src)
This does not work because sourceFileBS expects a ResourceT in the base monad (it runs under MonadResource), while GFS.sinkFile wants a MongoDB Action (a specialized Reader).
What is an elegant way to connect these pieces of a conduit together?

Without all of the types and functions available, it's a bit hard to tell you the best way to do it. However, one way that should work looks something like this:
withBinaryFile (unpack src) ReadMode $ \h -> runMongo $ runConduit $
  sourceHandle h .| GFS.sinkFile bucket src

Related

How to run gsutil from Scala without the .cmd suffix?

I am trying to run gsutil in Scala, but it doesn't work unless I explicitly put .cmd in the code. I don't like this approach, since others I work with use Unix systems. How do I let Scala understand that gsutil == gsutil.cmd? I could just write a custom shell script and add that to path, but I'd like a solution that doesn't include scripting.
I have already tried various environment variables (using IntelliJ, don't know if it's relevant). I have tried adding both /bin and /platform/gsutil to PATH, but neither works (without .cmd, at least). I have also tried giving the full path to see if it made a difference; it didn't.
Here is the method that uses gsutil:
def readFilesInBucket(ss: SparkSession, bucket: String): DataFrame = {
  import ss.implicits._
  import scala.sys.process._ // needed for the !! shell-out
  ss.sparkContext.parallelize((s"gsutil ls -l $bucket" !!).split("\n")
    .map(r => r.trim.split(" ")).filter(r => r.length == 3)
    .map(r => (r(0), r(1), r(2)))).toDF(Array("Size", "Date", "File"): _*)
}
This is my first ever question on SO; I apologize for any formatting errors there may be.
EDIT:
Found out that even when I write a script like this:
exec gsutil.cmd "$#"
named just gsutil and placed in the same folder, it spits out the same error message as before: java.io.IOException: Cannot run program "gsutil": CreateProcess error=2, The system cannot find the file specified.
It works if I type gsutil in Git Bash, which otherwise didn't work without the script.
Maybe just use a different command name depending on whether you're on a Windows or *nix system?
Create some helper:
object SystemDetector {
  lazy val isWindows = System.getProperty("os.name").startsWith("Windows")
}
And then just use it like:
def readFilesInBucket(ss: SparkSession, bucket: String): DataFrame = {
  import ss.implicits._
  import scala.sys.process._
  val gsutil = if (SystemDetector.isWindows) "gsutil.cmd" else "gsutil"
  ss.sparkContext.parallelize((s"$gsutil ls -l $bucket" !!).split("\n")
    .map(r => r.trim.split(" ")).filter(r => r.length == 3)
    .map(r => (r(0), r(1), r(2)))).toDF(Array("Size", "Date", "File"): _*)
}
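As a small aside, if you'd rather not maintain your own detector, the standard library already ships a similar check in scala.util.Properties. A minimal sketch of the same idea, assuming a reasonably recent Scala version:

import scala.util.Properties

// Same decision as SystemDetector, but using the stdlib's OS check.
val gsutil = if (Properties.isWin) "gsutil.cmd" else "gsutil"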

FFmpeg transcoding on Lambda results in unusable (static) audio

I'd like to move towards serverless for audio transcoding routines in AWS. I've been trying to setup a Lambda function to do just that; execute a static FFmpeg binary and re-upload the resulting audio file. The static binary I'm using is here.
The Lambda function I'm using in Python looks like this:
import boto3
s3client = boto3.client('s3')
s3resource = boto3.client('s3')
import json
import subprocess
from io import BytesIO
import os
os.system("cp -ra ./bin/ffmpeg /tmp/")
os.system("chmod -R 775 /tmp")
def lambda_handler(event, context):
    bucketname = event["Records"][0]["s3"]["bucket"]["name"]
    filename = event["Records"][0]["s3"]["object"]["key"]
    audioData = grabFromS3(bucketname, filename)
    with open('/tmp/' + filename, 'wb') as f:
        f.write(audioData.read())
    os.chdir('/tmp/')
    try:
        process = subprocess.check_output(['./ffmpeg -i /tmp/joe_and_bill.wav /tmp/joe_and_bill.aac'], shell=True, stderr=subprocess.STDOUT)
        pushToS3(bucketname, filename)
        return process.decode('utf-8')
    except subprocess.CalledProcessError as e:
        return e.output.decode('utf-8'), os.listdir()

def grabFromS3(bucket, file):
    obj = s3client.get_object(Bucket=bucket, Key=file)
    data = BytesIO(obj['Body'].read())
    return(data)

def pushToS3(bucket, file):
    s3client.upload_file('/tmp/' + file[:-4] + '.aac', bucket, file[:-4] + '.aac')
    return
You can listen to the output of this here. WARNING: Turn your volume down or your ears will bleed.
The original file can be heard here.
Does anyone have any idea what might be causing the encoding errors? It doesn't seem to be an issue with the file upload, since the MD5 on the Lambda filesystem matches the MD5 of the uploaded file.
I've also tried building the static binary on an Amazon Linux instance in EC2, then zipping and porting it into the Lambda project, but the same issue persists.
I'm stumped! :(
Alright this is a fun one.
So it turns out the Python subprocess inherits stdin from some Lambda processes going on in the background. I was watching an AWS re:Invent keynote where the speaker described issues they were having with exactly this.
I added stdin=subprocess.DEVNULL to the subprocess.check_output call and the audio is now fixed.
Very interesting bug if you ask me.

How to use continuation monad transformer with spawn/exec in purescript?

I want to execute a command using spawn from Node.ChildProcess, but I have no clue how to hook up the function that spawns the command with the rest of the application. I have a vague idea that I need ContT to hook up the error and success callbacks, but I am not able to figure out how to express the data pipeline as a single program.
This is the program I am trying to write -
Wait for a request (let's say as an HTTP server)
On request, write something to a file
Fire a terminal command
Collect output from terminal command
Send response
I would go for the easier solution and use Aff. I have an example of how I'm doing spawned child processes here: https://github.com/justinwoo/vidtracker/blob/b5756099a4f683d262bc030d33b68343a47f14d7/src/GetIcons.purs#L44-L54
curl :: forall e. String -> String -> Aff _ Unit
curl url path = do
  cp <- liftEff $ spawn "curl" [url, "-o", path] defaultSpawnOptions
  makeAff \e s -> do
    onError cp (e <<< Exc.error <<< unsafeStringify)
    onClose cp (s <<< const unit)
I have another example here where I'm collecting the output: https://github.com/justinwoo/simple-rpc-telegram-bot/blob/09e894c81493f913fe316f6689cb94ce1f5056e6/src/Main.purs#L149

Create a zip with no root directory with universal sbt-native-packager

I use sbt-native-packager's Universal packaging to generate a zip file distribution.
sbt universal:packageBin
The generated zip file, once extracted, contains everything inside a main directory named after my zip file:
unzip my-project-0.0.1.zip
my-project-0.0.1/bin
my-project-0.0.1/lib
my-project-0.0.1/conf
...
How can I create a zip with no root folder, so that when extracted it has a structure like this?
bin
lib
conf
Thanks
I'm not confident enough with sbt and Scala to submit a pull request. Bash scripting has to be excluded right now, so my current (and ugly) solution is this one:
packageBin in Universal := {
  val originalFileName = (packageBin in Universal).value
  val (base, ext) = originalFileName.baseAndExt
  val newFileName = file(originalFileName.getParent) / (base + "_dist." + ext)
  val extractedFiles = IO.unzip(originalFileName, file(originalFileName.getParent))
  val mappings: Set[(File, String)] = extractedFiles.map(f => (f, f.getAbsolutePath.substring(originalFileName.getParent.size + base.size + 2)))
  val binFiles = mappings.filter { case (file, path) => path.startsWith("bin/") }
  for (f <- binFiles) f._1.setExecutable(true)
  ZipHelper.zip(mappings, newFileName)
  IO.move(newFileName, originalFileName)
  IO.delete(file(originalFileName.getParent + "/" + originalFileName.base))
  originalFileName
}
The solution proposed on GitHub seems to be way nicer than mine, even though it doesn't work for me:
https://github.com/sbt/sbt-native-packager/issues/276
Unfortunately it looks like the name of that top-level directory is fixed to be the same as the name of the distributable zip (check out line 24 of the ZipHelper source on GitHub).
So unless you feel like making it configurable and submitting a pull request, it might just be easier to modify the resulting zip on the command line (assuming some flavour of UNIX):
unzip my-project-0.0.1.zip && cd my-project-0.0.1 && zip -r ../new.zip * && cd -
This will create new.zip alongside the existing zipfile - you could then mv it over the top if you like.
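For later readers: newer releases of sbt-native-packager expose a topLevelDirectory setting for the Universal packaging, which should make the re-zipping dance unnecessary. This is only a sketch under the assumption that your plugin version already provides that setting; check your version's docs before relying on it:

// build.sbt -- assumes a sbt-native-packager version that provides topLevelDirectory
enablePlugins(JavaAppPackaging)

// None drops the my-project-0.0.1/ prefix, so bin/, lib/ and conf/ land at the zip root.
topLevelDirectory := None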

How do you write a Scala script that will react to file changes

I would like to change the following shell script to Scala (just for fun); however, the script must keep running and listen for changes to the *.mkd files. If any file is changed, the script should re-generate the affected doc. File IO has always been my Achilles heel...
#!/bin/sh
for file in *.mkd
do
pandoc --number-sections $file -o "${file%%.*}.pdf"
done
Any ideas around a good approach to this will be appreciated.
The following code, taken from my answer to "Watch for project files also", can watch a directory and execute a specific command:
#!/usr/bin/env scala
import java.nio.file._
import scala.collection.JavaConversions._
import scala.sys.process._

val file = Paths.get(args(0))
val cmd = args(1)
val watcher = FileSystems.getDefault.newWatchService

file.register(
  watcher,
  StandardWatchEventKinds.ENTRY_CREATE,
  StandardWatchEventKinds.ENTRY_MODIFY,
  StandardWatchEventKinds.ENTRY_DELETE
)

def exec = cmd run true

@scala.annotation.tailrec
def watch(proc: Process): Unit = {
  val key = watcher.take
  val events = key.pollEvents
  val newProc =
    if (!events.isEmpty) {
      proc.destroy()
      exec
    } else proc
  if (key.reset) watch(newProc)
  else println("aborted")
}

watch(exec)
Usage:
watchr.scala markdownFolder/ "echo \"Something changed!\""
Extensions have to be made to the script to inject file names into the command. As of now this snippet should just be regarded as a building block for the actual answer.
Modifying the script to incorporate the *.mkd wildcards would be non-trivial as you'd have to manually search for the files and register a watch on all of them. Re-using the script above and placing all files in a directory has the added advantage of picking up new files when they are created.
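To make that extension a bit more concrete, here is a rough, untested sketch in the same style: it watches a single directory and re-runs pandoc (with the flags from the question) only for the *.mkd file that actually changed. The directory argument and the rebuild helper are just illustrative names:

#!/usr/bin/env scala
import java.nio.file._
import scala.collection.JavaConversions._
import scala.sys.process._

// Directory to watch; defaults to the current directory.
val dir = Paths.get(if (args.nonEmpty) args(0) else ".")
val watcher = FileSystems.getDefault.newWatchService
dir.register(
  watcher,
  StandardWatchEventKinds.ENTRY_CREATE,
  StandardWatchEventKinds.ENTRY_MODIFY
)

// Rebuild one markdown file, mirroring the pandoc call from the shell loop.
def rebuild(mkd: Path): Unit = {
  val name = mkd.getFileName.toString
  val pdf = name.replaceAll("\\.mkd$", ".pdf")
  Seq("pandoc", "--number-sections", dir.resolve(name).toString, "-o", dir.resolve(pdf).toString).!
}

while (true) {
  val key = watcher.take
  key.pollEvents.foreach { event =>
    event.context match {
      case p: Path if p.toString.endsWith(".mkd") => rebuild(p)
      case _ => () // ignore everything that isn't a markdown change
    }
  }
  key.reset
}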
As you can see, it gets pretty big and messy pretty quickly just relying on the Scala & Java APIs; you would be better off relying on alternative libraries, or just sticking to bash while using inotify.