pyaudio save multiple .WAV file with nonblocking - callback

Updates:
I have since found that I can put code in the callback function, which raised more questions:
When is the callback function called, and when does it stop? When the stream is opened and closed?
The callback function can return the stream data (audio_data in the code below). Since I never call the function myself, I assume PyAudio calls it internally. How do I get the returned stream data out of the callback?
import pyaudio
import wave
import numpy as np
import time

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100

audio = pyaudio.PyAudio()
channel_1_frames = []
channel_2_frames = []

def callback(in_data, frame_count, time_info, flag):
    # in_data holds interleaved 16-bit samples: ch1, ch2, ch1, ch2, ...
    audio_data = np.frombuffer(in_data, dtype=np.int16)
    channel_1 = audio_data[0::CHANNELS]
    channel_2 = audio_data[1::CHANNELS]
    data1 = channel_1.tobytes()
    data2 = channel_2.tobytes()
    channel_1_frames.append(data1)
    channel_2_frames.append(data2)
    # both files are rewritten from scratch on every callback
    wf1 = wave.open('Channel_1.wav', 'wb')
    wf2 = wave.open('Channel_2.wav', 'wb')
    wf1.setnchannels(1)
    wf2.setnchannels(1)
    wf1.setsampwidth(audio.get_sample_size(FORMAT))
    wf2.setsampwidth(audio.get_sample_size(FORMAT))
    wf1.setframerate(RATE)
    wf2.setframerate(RATE)
    wf1.writeframes(b''.join(channel_1_frames))
    wf2.writeframes(b''.join(channel_2_frames))
    wf1.close()
    wf2.close()
    return (in_data, pyaudio.paContinue)

stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    #frames_per_buffer=CHUNK,
                    stream_callback=callback)
stream.start_stream()
while stream.is_active():
    time.sleep(10)
stream.stop_stream()
stream.close()
audio.terminate()
=============================================
I am trying to record multiple channels into multiple .WAV files.
I can do that with stream.read(), using a NumPy array to split the channels into separate arrays and saving each one to a .WAV file:
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
print("* recording")
channel_1_frames = []
channel_2_frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    # convert bytes to numpy array
    data_array = np.frombuffer(data, dtype='int16')
    # select channel
    channel_1 = data_array[0::CHANNELS]
    channel_2 = data_array[1::CHANNELS]
    # convert numpy array back to bytes
    data1 = channel_1.tobytes()
    data2 = channel_2.tobytes()
    channel_1_frames.append(data1)
    channel_2_frames.append(data2)
stream.stop_stream()
stream.close()
audio.terminate()
However, the module documentation (https://people.csail.mit.edu/hubert/pyaudio/docs/#class-stream) says that stream.read() and stream.write() should not be used in non-blocking mode.
I found a good non-blocking PyAudio example on GitHub: https://gist.github.com/sloria/5693955
It does not use stream.read().
I am not sure whether I can read the stream and turn it into a NumPy array without stream.read().
So is it still possible to export the stream to separate .WAV files, and to make it non-blocking?
Thanks

As I learned more about coding, I found the answers.
A1: The callback function starts and stops with the stream.
# open the stream without starting it automatically
audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16,
                    channels=2,
                    rate=44100,
                    input=True,
                    frames_per_buffer=44100,
                    stream_callback=self.get_callback(),
                    start=False)
# start and stop the stream
stream.start_stream()
stream.stop_stream()
stream.close()
audio.terminate()
A2: To capture data in real time, we can use a queue:
self.recorded_frames = queue.Queue()

def get_callback(self):
    def callback(in_data, frame_count, time_info, status):
        self.recorded_frames.put(np.frombuffer(in_data, dtype=np.int16))
        return in_data, pyaudio.paContinue
    return callback
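A minimal sketch of how the queued frames could be turned into one WAV file per channel once recording has stopped. It assumes 16-bit interleaved samples as in the question, queues raw bytes rather than NumPy arrays, and uses the standard-library array module so that it runs without PyAudio or audio hardware. The helper names split_channels and drain_to_wavs are hypothetical, and the callback is simulated by putting bytes on the queue by hand.

```python
import array
import io
import queue
import wave

CHANNELS = 2
RATE = 44100
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio


def split_channels(in_data, channels=CHANNELS):
    """Split interleaved 16-bit PCM bytes into one bytes object per channel."""
    samples = array.array('h')
    samples.frombytes(in_data)
    return [samples[ch::channels].tobytes() for ch in range(channels)]


def drain_to_wavs(frames_queue, targets, channels=CHANNELS, rate=RATE):
    """Empty the queue and write each channel to its own mono WAV target.

    targets may be file names or writable binary file objects.
    """
    per_channel = [[] for _ in range(channels)]
    while True:
        try:
            chunk = frames_queue.get_nowait()
        except queue.Empty:
            break
        for ch, data in enumerate(split_channels(chunk, channels)):
            per_channel[ch].append(data)
    for target, parts in zip(targets, per_channel):
        with wave.open(target, 'wb') as wf:
            wf.setnchannels(1)
            wf.setsampwidth(SAMPLE_WIDTH)
            wf.setframerate(rate)
            wf.writeframes(b''.join(parts))


# Simulated callback output: two chunks of interleaved (left, right) samples.
recorded_frames = queue.Queue()
recorded_frames.put(array.array('h', [1, -1, 2, -2]).tobytes())
recorded_frames.put(array.array('h', [3, -3]).tobytes())

left, right = io.BytesIO(), io.BytesIO()
drain_to_wavs(recorded_frames, [left, right])
```

In a real recorder the callback would only call recorded_frames.put(in_data), and drain_to_wavs would run in the main thread with file names such as 'Channel_1.wav' and 'Channel_2.wav', which keeps file I/O out of the audio callback.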

Related

In Flutter, how can I combine data into a string very quickly?

I am gathering accelerometer data from my phone using the sensors package, adding that data to a List<AccelerometerEvent>, and then combining that data into a (csv) String so I can use file.writeAsString() to save this data as a csv file. The problem I am having is that it takes too long to combine the data into a string.
For example:
List length : 28645
Milliseconds to combine into csv string: 113580
Code:
for (AccelerometerEvent event in history) {
  dataString = dataString + '${event.timestamp},${event.x},${event.y},${event.z}\n';
}
What would be a more efficient way to do this?
Should I even combine the data into a string, or is there a better way to save this data to a file?
Thanks
Create a file object.
Write the first line with the column names; after that, each row (terminated by \n) is one event.
See: FileMode.append
It adds new strings without replacing the existing contents of the file.
File file = File('events.csv');
file.writeAsStringSync('TIMESTAMP, X, Y, Z\n', mode: FileMode.append);
for (AccelerometerEvent event in history) {
  final x = event.x;
  final y = event.y;
  final z = event.z;
  final timestamp = event.timestamp;
  String data = '$timestamp, $x, $y, $z';
  file.writeAsStringSync('$data\n', mode: FileMode.append);
}

How do I implement a locustfile where each locust takes a unique value from a CSV file for its task?

from locust import HttpLocust, TaskSet, task

class ExampleTask(TaskSet):
    csvfile = open('failed.csv', 'r')
    data = csvfile.readlines()
    bakdata = list(data)

    @task
    def fun(self):
        try:
            value = self.data.pop().split(',')
            print('------This is the value {}'.format(value[0]))
        except IndexError:
            self.data = list(self.bakdata)

class ExampleUser(HttpLocust):
    host = 'https://www.google.com'
    task_set = ExampleTask
Here is my CSV file:
516,True,success
517,True,success
518,True,success
519,True,success
520,True,success
521,True,success
522,True,success
523,True,success
524,True,success
525,True,success
526,True,success
527,True,success
528,True,success
529,True,success
530,True,success
531,True,success
532,True,success
533,True,success
534,True,success
535,True,success
536,True,success
537,True,success
538,True,success
539,True,success
540,True,success
541,True,success
542,True,success
543,True,success
544,True,success
545,True,success
546,True,success
547,True,success
548,True,success
549,True,success
550,True,success
551,True,success
552,True,success
553,True,success
554,True,success
555,True,success
556,True,success
557,True,success
558,True,success
559,True,success
Once the end of the CSV file is reached, Locust no longer takes unique values; every simulated user gets the same value.
I'm not 100% sure, but I think your problem is this line:
self.data = list(self.bakdata)
This will give each User instance a different copy of the list.
It should work if you change it to:
ExampleTask.data = list(self.bakdata)
Or you can use locust-plugins's CSVReader, see the example here:
https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/csvreader_ex.py
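The difference between the two assignments can be shown in plain Python, without Locust. This is only an illustrative sketch: the class SharedPool and its method names are hypothetical stand-ins for the TaskSet in the question.

```python
class SharedPool:
    # Class attributes: one list object shared by every instance.
    data = ["516", "517", "518"]
    backup = list(data)

    def refill_on_instance(self):
        # Assigning through self creates an instance attribute that shadows
        # the class attribute, so this instance stops sharing the pool.
        self.data = list(self.backup)

    def refill_on_class(self):
        # Assigning through the class rebinds the attribute that all
        # instances without their own copy still look up.
        type(self).data = list(self.backup)


a, b = SharedPool(), SharedPool()
a.refill_on_instance()
a.data.pop()
print(len(a.data), len(b.data))  # 2 3: a has a private list, b sees the shared one

c, d = SharedPool(), SharedPool()
c.refill_on_class()
c.data.pop()
print(len(c.data), len(d.data))  # 2 2: c and d still share one list
```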

MicroPython web server stops working after adding code to read data from a DHT11

I downloaded the web server code from https://docs.micropython.org/en/v1.8/esp8266/esp8266/tutorial/network_tcp.html, and it worked well. But after I added code to read DHT11 values, the web server stopped responding. What's wrong with my code?
import machine
import dht
import socket
import network

sta_if = network.WLAN(network.STA_IF)
sta_if.connect(SSID, PASS)
addr = socket.getaddrinfo('0.0.0.0', 80)[0][-1]
d = machine.Pin(5, machine.Pin.IN, machine.Pin.PULL_UP)

def measure():
    d.measure()
    temp = d.temperature()
    hum = d.humidity()
    return temp, hum

s = socket.socket()
s.bind(addr)
s.listen(1)
print('listening on', addr)
while True:
    cl, addr = s.accept()
    print('client connected from', addr)
    cl_file = cl.makefile('rwb', 0)
    while True:
        line = cl_file.readline()
        if not line or line == b'\r\n':
            break
    response = measure()
    cl.send(response)
    cl.close()
I see two problems with your code:
First, to read the DHT11 sensor you need to use a DHT object. Try replacing
d = machine.Pin(5, machine.Pin.IN, machine.Pin.PULL_UP)
with
d = dht.DHT11(machine.Pin(5))
Second, the output of your measure() function is a numeric tuple and you're passing that directly to cl.send(), but that method needs a bytes object. You need to encode the two values into a string then convert that into bytes first. Instead of
cl.send(response)
you probably want something like
message = 'Temperature {} Humidity {}'.format(response[0], response[1])
cl.send(bytes(message, 'utf-8'))
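As a hedged sketch of the second point, the conversion from the numeric tuple to bytes could be wrapped in a small helper. The function name build_response is hypothetical, and the HTTP status line and Content-Type header are an assumption that goes beyond the fix described above; they are included only because a browser normally expects them before the body.

```python
def build_response(temp, hum):
    """Return a complete HTTP response, as bytes, reporting one reading."""
    # Assumed header; not part of the original answer.
    header = 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n'
    body = 'Temperature {} Humidity {}\r\n'.format(temp, hum)
    # Sockets send bytes, so the text is encoded before it is returned.
    return (header + body).encode('utf-8')


reply = build_response(23, 41)
```

In the server loop this would be used as cl.send(build_response(*measure())).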

Deploying Keras model to Google Cloud ML for serving predictions

I need to understand how to deploy models on Google Cloud ML. My first task is to deploy a very simple text classifier on the service. I do it in the following steps (could perhaps be shortened to fewer steps, if so, feel free to let me know):
1. Define the model using Keras and export to YAML
2. Load up YAML and export as a TensorFlow SavedModel
3. Upload model to Google Cloud Storage
4. Deploy model from storage to Google Cloud ML
5. Set the uploaded model version as default on the models website
6. Run model with a sample input
I've finally made steps 1-5 work, but now I get the strange error shown below when running the model. Can anyone help? Details of the steps are below. Hopefully, they can also help others who are stuck on one of the previous steps. My model works fine locally.
I've seen Deploying Keras Models via Google Cloud ML and Export a basic Tensorflow model to Google Cloud ML, but they seem to be stuck on other steps of the process.
Error
Prediction failed: Exception during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="In[0] is not a matrix
[[Node: MatMul = MatMul[T=DT_FLOAT, _output_shapes=[[-1,64]], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/cpu:0"](Mean, softmax_W/read)]]")
Step 1
# import necessary classes from Keras..
model_input = Input(shape=(maxlen,), dtype='int32')
embed = Embedding(input_dim=nb_tokens,
                  output_dim=256,
                  mask_zero=False,
                  input_length=maxlen,
                  name='embedding')
x = embed(model_input)
x = GlobalAveragePooling1D()(x)
outputs = [Dense(nb_classes, activation='softmax', name='softmax')(x)]
model = Model(input=[model_input], output=outputs, name="fasttext")
# export to YAML..
# export to YAML..
Step 2
from __future__ import print_function
import sys
import os
import tensorflow as tf
from tensorflow.contrib.session_bundle import exporter
import keras
from keras import backend as K
from keras.models import model_from_config, model_from_yaml
from optparse import OptionParser

EXPORT_VERSION = 1 # for us to keep track of different model versions (integer)

def export_model(model_def, model_weights, export_path):
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        K.set_learning_phase(0) # all new operations will be in test mode from now on
        yaml_file = open(model_def, 'r')
        yaml_string = yaml_file.read()
        yaml_file.close()
        model = model_from_yaml(yaml_string)
        # force initialization
        model.compile(loss='categorical_crossentropy',
                      optimizer='adam')
        Wsave = model.get_weights()
        model.set_weights(Wsave)
        # weights are not loaded as I'm just testing, not really deploying
        # model.load_weights(model_weights)
        print(model.input)
        print(model.output)
        pred_node_names = output_node_names = 'Softmax:0'
        num_output = 1
        export_path_base = export_path
        export_path = os.path.join(
            tf.compat.as_bytes(export_path_base),
            tf.compat.as_bytes('initial'))
        builder = tf.saved_model.builder.SavedModelBuilder(export_path)
        # Build the signature_def_map.
        x = model.input
        y = model.output
        values, indices = tf.nn.top_k(y, 5)
        table = tf.contrib.lookup.index_to_string_table_from_tensor(tf.constant([str(i) for i in xrange(5)]))
        prediction_classes = table.lookup(tf.to_int64(indices))
        classification_inputs = tf.saved_model.utils.build_tensor_info(model.input)
        classification_outputs_classes = tf.saved_model.utils.build_tensor_info(prediction_classes)
        classification_outputs_scores = tf.saved_model.utils.build_tensor_info(values)
        classification_signature = (
            tf.saved_model.signature_def_utils.build_signature_def(
                inputs={tf.saved_model.signature_constants.CLASSIFY_INPUTS: classification_inputs},
                outputs={tf.saved_model.signature_constants.CLASSIFY_OUTPUT_CLASSES: classification_outputs_classes,
                         tf.saved_model.signature_constants.CLASSIFY_OUTPUT_SCORES: classification_outputs_scores},
                method_name=tf.saved_model.signature_constants.CLASSIFY_METHOD_NAME))
        tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
        tensor_info_y = tf.saved_model.utils.build_tensor_info(y)
        prediction_signature = (tf.saved_model.signature_def_utils.build_signature_def(
            inputs={'images': tensor_info_x},
            outputs={'scores': tensor_info_y},
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
        legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING],
            signature_def_map={'predict_images': prediction_signature,
                               tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: classification_signature,},
            legacy_init_op=legacy_init_op)
        builder.save()
        print('Done exporting!')
        raise SystemExit

if __name__ == '__main__':
    usage = "usage: %prog [options] arg"
    parser = OptionParser(usage)
    (options, args) = parser.parse_args()
    if len(args) < 3:
        raise ValueError("Too few arguments!")
    model_def = args[0]
    model_weights = args[1]
    export_path = args[2]
    export_model(model_def, model_weights, export_path)
Step 3
gsutil cp -r fasttext_cloud/ gs://quiet-notch-xyz.appspot.com
Step 4
from __future__ import print_function
from oauth2client.client import GoogleCredentials
from googleapiclient import discovery
from googleapiclient import errors
import time

projectID = 'projects/{}'.format('quiet-notch-xyz')
modelName = 'fasttext'
modelID = '{}/models/{}'.format(projectID, modelName)
versionName = 'Initial'
versionDescription = 'Initial release.'
trainedModelLocation = 'gs://quiet-notch-xyz.appspot.com/fasttext/'
credentials = GoogleCredentials.get_application_default()
ml = discovery.build('ml', 'v1', credentials=credentials)
# Create a dictionary with the fields from the request body.
requestDict = {'name': modelName, 'description': 'Online predictions.'}
# Create a request to call projects.models.create.
request = ml.projects().models().create(parent=projectID, body=requestDict)
# Make the call.
try:
    response = request.execute()
except errors.HttpError as err:
    # Something went wrong, print out some information.
    print('There was an error creating the model.' +
          ' Check the details:')
    print(err._get_reason())
    # Clear the response for next time.
    response = None
    raise
time.sleep(10)
requestDict = {'name': versionName,
               'description': versionDescription,
               'deploymentUri': trainedModelLocation}
# Create a request to call projects.models.versions.create
request = ml.projects().models().versions().create(parent=modelID,
                                                   body=requestDict)
# Make the call.
try:
    print("Creating model setup..", end=' ')
    response = request.execute()
    # Get the operation name.
    operationID = response['name']
    print('Done.')
except errors.HttpError as err:
    # Something went wrong, print out some information.
    print('There was an error creating the version.' +
          ' Check the details:')
    print(err._get_reason())
    raise
done = False
request = ml.projects().operations().get(name=operationID)
print("Adding model from storage..", end=' ')
while (not done):
    response = None
    # Wait for 10000 milliseconds.
    time.sleep(10)
    # Make the next call.
    try:
        response = request.execute()
        # Check for finish.
        done = True # response.get('done', False)
    except errors.HttpError as err:
        # Something went wrong, print out some information.
        print('There was an error getting the operation.' +
              'Check the details:')
        print(err._get_reason())
        done = True
        raise
print("Done.")
Step 5
Use website.
Step 6
import googleapiclient.discovery

def predict_json(instances, project='quiet-notch-xyz', model='fasttext', version=None):
    """Send json data to a deployed model for prediction.
    Args:
        project (str): project where the Cloud ML Engine Model is deployed.
        model (str): model name.
        instances ([Mapping[str: Any]]): Keys should be the names of Tensors
            your deployed model expects as inputs. Values should be datatypes
            convertible to Tensors, or (potentially nested) lists of datatypes
            convertible to tensors.
        version: str, version of the model to target.
    Returns:
        Mapping[str: any]: dictionary of prediction results defined by the
            model.
    """
    # Create the ML Engine service object.
    # To authenticate set the environment variable
    # GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_file>
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)
    if version is not None:
        name += '/versions/{}'.format(version)
    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']
Then I run the function with a test input: predict_json({'inputs':[[18, 87, 13, 589, 0]]})
There is now a sample demonstrating the use of Keras on CloudML engine, including prediction. You can find the sample here:
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census/keras
I would suggest comparing your code to that code.
Some additional suggestions that will still be relevant:
CloudML Engine currently only supports using a single signature (the default signature). Looking at your code, I think prediction_signature is more likely to lead to success, but you haven't made that the default signature. I suggest the following:
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature,},
    legacy_init_op=legacy_init_op)
If you are deploying to the service, then you would invoke prediction like so:
predict_json({'images':[[18, 87, 13, 589, 0]]})
If you are testing locally using gcloud ml-engine local predict --json-instances the input data is slightly different (matches that of the batch prediction service). Each newline-separated line looks like this (showing a file with two lines):
{"images": [[18, 87, 13, 589, 0]]}
{"images": [[21, 85, 13, 100, 1]]}
I don't actually know enough about the shape of model.x to ensure the data being sent is correct for your model.
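A small sketch of how such a newline-delimited instances file could be produced from Python, so that each line is valid JSON. The helper name to_json_lines is hypothetical, the 'images' key is taken from prediction_signature, and the values and their nesting are simply the sample inputs from above, with the same caveat about the shape the model actually expects.

```python
import json


def to_json_lines(instances):
    """Serialize each instance dict as one line of JSON text."""
    return "\n".join(json.dumps(instance) for instance in instances) + "\n"


instances = [
    {"images": [[18, 87, 13, 589, 0]]},
    {"images": [[21, 85, 13, 100, 1]]},
]
text = to_json_lines(instances)
print(text, end="")
```

json.dumps always emits double-quoted keys, which avoids the parse errors that Python-style single quotes would cause in a JSON file.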
By way of explanation, it may be insightful to consider the difference between the Classification and Prediction methods in SavedModel. One difference is that, when using tensorflow_serving, which is based on gRPC, which is strongly typed, Classification provides a strongly-typed signature that most classifiers can use. Then you can reuse the same client on any classifier.
That's not overly useful when using JSON since JSON isn't strongly typed.
One other difference is that, when using tensorflow_serving, Prediction accepts column-based inputs (a map from feature name to every value for that feature in the whole batch) whereas Classification accepts row based inputs (each input instance/example is a row).
CloudML abstracts that away a bit and always requires row-based inputs (a list of instances). Even though we only officially support Prediction, Classification should work as well.

Spark: run an external process in parallel

Is it possible with Spark to "wrap" and run an external process managing its input and output?
The process is an ordinary C/C++ application that is usually run from the command line. It accepts a plain text file as input and generates another plain text file as output. As I need to integrate this application into a larger flow (still in Spark), I was wondering if there is a way to do this.
The process can easily be run in parallel (at the moment I use GNU Parallel) by splitting its input into, for example, 10 part files, running 10 instances of it in memory, and joining the 10 resulting part files back into one output file.
The simplest thing you can do is to write a simple wrapper which takes data from standard input, writes it to a file, executes the external program, and writes the results to standard output. After that, all you have to do is use the pipe method:
rdd.pipe("your_wrapper")
The only serious consideration is I/O performance. If possible, it would be better to adapt the program you want to call so that it can read and write data directly, without going through disk.
Alternatively, you can use mapPartitions combined with process and standard I/O tools to write to a local file, call your program, and read the output.
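The wrapper described above might look roughly like the following Python sketch. It assumes the external program takes an input path and an output path as its last two arguments, which may not match the real command line; run_via_files, main, and the example invocation are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile


def run_via_files(lines, command):
    """Write lines to a temp file, run `command <in> <out>`, return output lines."""
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "input.txt")
        out_path = os.path.join(workdir, "output.txt")
        with open(in_path, "w") as f:
            # Normalize line endings so every record sits on its own line.
            f.writelines(line.rstrip("\n") + "\n" for line in lines)
        # check=True raises if the external program exits with an error.
        subprocess.run(list(command) + [in_path, out_path], check=True)
        with open(out_path) as f:
            return f.read().splitlines()


def main():
    # Hypothetical use from Spark: rdd.pipe("python wrapper.py /path/to/program")
    # Each partition arrives on stdin; results go back to Spark on stdout.
    for line in run_via_files(sys.stdin, sys.argv[1:]):
        print(line)
```

The script would call main() at the bottom when used as the piped command. Every partition pays for two temporary files, which is the I/O cost mentioned above.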
If you end up here based on the question title from a Google search, but you don't have the OP restriction that the external program needs to read from a file--i.e., if your external program can read from stdin--here is a solution. For my use case, I needed to call an external decryption program for each input file.
import org.apache.commons.io.IOUtils
import sys.process._
import scala.collection.mutable.ArrayBuffer

val showSampleRows = true
val bfRdd = sc.binaryFiles("/some/files/*,/more/files/*")
val rdd = bfRdd.flatMap{ case(file, pds) => { // pds is a PortableDataStream
  val rows = new ArrayBuffer[Array[String]]()
  var errors = List[String]()
  val io = new ProcessIO (
    in => { // "in" is an OutputStream; write the encrypted contents of the
            // input file (pds) to this stream
      IOUtils.copy(pds.open(), in) // open() returns a DataInputStream
      in.close
    },
    out => { // "out" is an InputStream; read the decrypted data off this stream.
             // Even though this runs in another thread, we can write to rows, since it
             // is part of the closure for this function
      for(line <- scala.io.Source.fromInputStream(out).getLines) {
        // ...decode line here... for my data, it was pipe-delimited
        rows += line.split('|')
      }
      out.close
    },
    err => { // "err" is an InputStream; read any errors off this stream
             // errors is part of the closure for this function
      errors = scala.io.Source.fromInputStream(err).getLines.toList
      err.close
    }
  )
  val cmd = List("/my/decryption/program", "--decrypt")
  val exitValue = cmd.run(io).exitValue // blocks until subprocess finishes
  println(s"-- Results for file $file:")
  if (exitValue != 0) {
    // TBD write to string accumulator instead, so driver can output errors
    // string accumulator from @zero323: https://stackoverflow.com/a/31496694/215945
    println(s"exit code: $exitValue")
    errors.foreach(println)
  } else {
    // TBD, you'll probably want to move this code to the driver, otherwise
    // unless you're using the shell, you won't see this output
    // because it will be sent to stdout of the executor
    println(s"row count: ${rows.size}")
    if (showSampleRows) {
      println("6 sample rows:")
      rows.slice(0,6).foreach(row => println(" " + row.mkString("|")))
    }
  }
  rows
}}
scala> :paste "test.scala"
Loading test.scala...
...
rdd: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[62] at flatMap at <console>:294
scala> rdd.count // action, causes Spark code to actually run
-- Results for file hdfs://path/to/encrypted/file1: // this file had errors
exit code: 255
ERROR: Error decrypting
my_decryption_program: Bad header data[0]
-- Results for file hdfs://path/to/encrypted/file2:
row count: 416638
sample rows:
<...first row shown here ...>
...
<...sixth row shown here ...>
...
res43: Long = 843039
References:
https://www.scala-lang.org/api/current/scala/sys/process/ProcessIO.html
https://alvinalexander.com/scala/how-to-use-closures-in-scala-fp-examples#using-closures-with-other-data-types