GStreamer Editing Services for Rust / How to specify encoder settings - encoding

I'm new to gstreamer-rs. I'm trying to render a video using GStreamer Editing Services. At the moment I am using the example code for encoding profiles as follows:
let p = gstreamer_editing_services::Pipeline::new();
let t = gstreamer_editing_services::Timeline::new_audio_video();
p.set_timeline(&t);
let l = t.append_layer();
// Every audio stream piped into the encodebin should be encoded using Opus.
let audio_profile = gstreamer_pbutils::EncodingAudioProfileBuilder::new()
.format(&gstreamer::Caps::new_simple("audio/x-opus", &[]))
.presence(0)
.build()?;
// Every video stream piped into the encodebin should be encoded using H.264.
let video_profile = gstreamer_pbutils::EncodingVideoProfileBuilder::new()
.format(&gstreamer::Caps::new_simple("video/x-h264", &[]))
.presence(0)
.build()?;
// All streams are then finally combined into a matroska container.
let container_profile = gstreamer_pbutils::EncodingContainerProfileBuilder::new()
.name("container")
.format(&gstreamer::Caps::new_simple("video/x-matroska", &[]))
.add_profile(&(video_profile))
.add_profile(&(audio_profile))
.build()?;
p.set_render_settings("file:///tmp/my.mp4", &container_profile);
The code works, but I don't have any clue how to enforce specific settings. Maybe someone can help.
How do you set things like bitrate, framerate and quality options for the audio and video encoders?
How do you define a specific encoder (like nvenc/nvh264enc for H.264 video)?
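For illustration, here is a rough, untested sketch of the direction that is usually suggested: constrain the raw stream with restriction caps (framerate, resolution) and pin a specific encoder via the profile's preset name. The nvh264enc factory name and the 30/1 framerate are assumptions, the builder method names may differ between gstreamer-rs versions, and bitrate/quality settings would typically come from a GstPreset saved for that encoder and referenced with .preset(...).

// Rough sketch (assumptions: nvh264enc is installed, 30 fps is the desired rate).
let restriction = gstreamer::Caps::new_simple(
    "video/x-raw",
    &[("framerate", &gstreamer::Fraction::new(30, 1))],
);

let video_profile = gstreamer_pbutils::EncodingVideoProfileBuilder::new()
    .format(&gstreamer::Caps::new_simple("video/x-h264", &[]))
    // Restriction caps constrain the raw video fed to the encoder
    // (framerate, width, height, pixel format, ...).
    .restriction(&restriction)
    // Pin a specific encoder element by its factory name.
    .preset_name("nvh264enc")
    // Encoder properties such as bitrate would come from a saved GstPreset,
    // referenced by name via `.preset("my-nvh264-preset")`.
    .presence(0)
    .build()?;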

Related

How to configure the audio session and give the output port an output value?

Does someone know how to configure the audio session in Swift?
I've tried configuring the audio session using some help from here: https://stackoverflow.com/a/52608869
Unfortunately, I don't have enough reputation to comment on that answer, but I'm getting this:
// Configuring audio session
AVAudio Session out options: <AVAudioSessionRouteDescription: 0x283054bf0,
inputs = (
"<AVAudioSessionPortDescription: 0x283054c50, type = MicrophoneBuiltIn; name = iPhone Microphone; UID = Built-In Microphone; selectedDataSource = Bottom>"
);
outputs = (
"<AVAudioSessionPortDescription: 0x283054fa0, type = Receiver; name = Receiver; UID = Built-In Receiver; selectedDataSource = (null)>"
)>
For the output it says selectedDataSource = (null), and as per the Apple documentation:
if this property returns nil, the port doesn’t support selecting between multiple data sources.
My question is how to assign selectedDataSource a value or how to make it support selecting between multiple data sources.
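For what it's worth, here is a rough, untested Swift sketch of how output data-source selection is normally attempted; on the built-in Receiver the list will be nil, which matches the documentation quoted above.

import AVFoundation

// Rough sketch: selecting an output data source only works when the active
// route actually advertises data sources; the built-in Receiver does not.
func selectFirstOutputDataSourceIfAvailable() {
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playAndRecord, mode: .voiceChat, options: [])
        try session.setActive(true)
        if let sources = session.outputDataSources, let first = sources.first {
            // Pick one of the advertised data sources for the current output route.
            try session.setOutputDataSource(first)
        } else {
            // nil or empty matches the documentation quoted above:
            // this port does not support choosing between data sources.
            print("Current output route has no selectable data sources")
        }
    } catch {
        print("Audio session configuration failed: \(error)")
    }
}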

Pjsip/Pjsua video problem: frame buffer too small

I'm trying to make a SIP video call using Pjsip/Pjsua on my Raspberry Pi 3.
Before coding, I'm using the main sample app to test different options. Everything seems to work (registering, audio calling, ...), but when I try to start a video call, the program stops with the following message:
pjsua-armv7l-unknown-linux-gnueabihf: ../src/pjmedia-videodev/v4l2_dev.c:737: vid4lin_stream_get_frame_mmap: Assertion `!"frame buffer is too small for v4l2"' failed.
I've searched a lot, including the source code:
/* get frame from mmap */
static pj_status_t vid4lin_stream_get_frame_mmap(vid4lin_stream *stream, pjmedia_frame *frame)
{
    struct v4l2_buffer buf;
    pj_time_val time;
    pj_status_t status = PJ_SUCCESS;

    pj_bzero(&buf, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    status = xioctl(stream->fd, VIDIOC_DQBUF, &buf);
    if (status != PJ_SUCCESS)
        return status;

    if (frame->size < buf.bytesused) {
        /* supplied buffer is too small */
        pj_assert(!"frame buffer is too small for v4l2");
        status = PJ_ETOOSMALL;
        goto on_return;
    }
So I understand that the pjmedia_frame has a "size" smaller than the v4l2 buffer, which causes the failure.
My question is simple: how can I change this setting?
I tried everything in the sample app: changing resolution, bitrate, fps, ...
I found some resources saying to change the H264 profile level... OK, but where do I set it? Is it within the v4l2 manager, or directly in the app? How can I do it?
I played with different options in v4l2 to reduce the bitrate/resolution in order to get a smaller buffer, but I still get the same error.
At this point I'm completely clueless.
For info, I compiled PJSIP using openh264 (no libx264), as suggested by PJSIP.
Thanks for your help/ideas ;)
Regarding your question about the profile level, you can try this:
const pj_str_t codec_id = {"H264", 4};
pjmedia_vid_codec_param param;
pj_status_t status;
status = pjsua_vid_codec_get_param(&codec_id, &param);
param.dec_fmtp.param[0].name = pj_str("profile-level-id");
param.dec_fmtp.param[0].val = pj_str("42e01f");
status = pjsua_vid_codec_set_param(&codec_id, &param);
Do this anywhere after pjsua_start(). The last two characters in the val property are the profile level. A description of the levels can be found here (link); more information about the H264 profile is here (link).
I'm not an expert on v4l2, but I have a little experience with encoding video on the RPi 3, and I suggest you use FFmpeg instead of pure openh264 because of its hardware acceleration support (link).
Good luck!
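If it helps, here is a rough, untested sketch of the same snippet wrapped in a helper with error checking; the helper name is just for illustration, and it is meant to be called only after pjsua_start() has succeeded.

#include <pjsua-lib/pjsua.h>

/* Rough sketch: force the H264 profile-level-id after pjsua has started. */
static pj_status_t set_h264_profile_level(void)
{
    const pj_str_t codec_id = {"H264", 4};
    pjmedia_vid_codec_param param;
    pj_status_t status;

    /* Read the current H264 codec parameters. */
    status = pjsua_vid_codec_get_param(&codec_id, &param);
    if (status != PJ_SUCCESS)
        return status;

    /* "42e01f" = Constrained Baseline profile, level 3.1 (see the links above). */
    param.dec_fmtp.param[0].name = pj_str("profile-level-id");
    param.dec_fmtp.param[0].val = pj_str("42e01f");

    /* Apply the modified parameters. */
    return pjsua_vid_codec_set_param(&codec_id, &param);
}

/* Call site, somewhere after pjsua_start() has returned PJ_SUCCESS:
 *     set_h264_profile_level();
 */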

How to store an image in a mongodb document using python and pillow?

I have the following code that creates a thumbnail from a request to a url:
r = requests.get(image_url, stream=True, headers=headers)
size = 500, 500
img = Image.open(r.raw)
thumb = ImageOps.fit(img, size, Image.ANTIALIAS)
At this point I would like to store the image inside a mongo document like so:
photo = {
'thumbnail': img,
'source': source,
'tags': tags,
'creationDate': datetime.now(),
}
Obviously that won't work, so what kind of transformation do I need to apply before I can do this?
Okay, here are my thoughts on this (I am not certain it will work, though; some ideas are adopted from here).
I think you can achieve what you need using the Binary BSON type from the pymongo/bson library. Try loading the image as raw bytes, say using Pillow (PIL.Image), or simply:
image_file = open('1.bmp', 'rb').read()
or as
image_file = StringIO(open("test.jpg", 'rb').read()).getvalue()
and then wrap those bytes in pymongo's Binary type:
binary_image_file = Binary(image_file)  # from the pymongo/bson library
Then do a normal insert in mongo.
To read, do a normal find(). Then load the value for that key and convert the stored data back into an image:
image_data = StringIO.StringIO(stored_image_bytes)
image = Image.open(image_data)
I hope that helps a little. (You could also go with Aydin's base64 proposition.)
All the best.
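Putting the pieces above together, here is a rough, untested sketch of the whole round trip. It uses io.BytesIO instead of StringIO because the thumbnail only exists in memory, the database/collection names are placeholders, and thumb, source and tags come from the question's code.

from datetime import datetime
from io import BytesIO

from bson.binary import Binary
from PIL import Image
from pymongo import MongoClient

# Serialize the in-memory thumbnail to JPEG bytes and wrap them in BSON Binary.
buffer = BytesIO()
thumb.save(buffer, format="JPEG")

photo = {
    'thumbnail': Binary(buffer.getvalue()),
    'source': source,
    'tags': tags,
    'creationDate': datetime.now(),
}

collection = MongoClient().mydb.photos  # placeholder database/collection names
photo_id = collection.insert_one(photo).inserted_id

# Reading it back: wrap the stored bytes in BytesIO and reopen with Pillow.
stored = collection.find_one({'_id': photo_id})
image = Image.open(BytesIO(stored['thumbnail']))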

Why is my Sphinx4 Recognition poor?

I am learning how to use Sphinx4 using the Maven plug-in for Eclipse.
I took the transcribe demo found on GitHub and altered it to process a file of my own. The audio file is 16-bit, mono, 16 kHz. It is approximately 13 seconds long. I noticed that it sounds like it is in slow motion.
The words spoken in the file are, "also make sure it's easy for you to access the recording files so you could upload it if asked".
I am attempting to transcribe the file and my results are horrendous. My attempts at finding forum posts or links that thoroughly explain how to improve the results, or what I am not doing correctly, have led me nowhere.
I am looking to strengthen the accuracy of the transcription, but would like to avoid having to train a model myself due to the variance in the type of data that my current project will have to deal with. Is this not possible, and is the code I am using off?
CODE
(NOTE: Audio file available at https://instaud.io/8qv)
public class App {
public static void main(String[] args) throws Exception {
System.out.println("Loading models...");
Configuration configuration = new Configuration();
// Load model from the jar
configuration
.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// You can also load model from folder
// configuration.setAcousticModelPath("file:en-us");
configuration
.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
configuration
.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(
configuration);
FileInputStream stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/vocaroo_test_revised.wav"));
// stream.skip(44); I commented this out due to the short length of my file
// Simple recognition with generic model
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null) {
// I added the following print statements to get more information
System.out.println("\ngetWords() before loop: " + result.getWords());
System.out.format("Hypothesis: %s\n", result.getHypothesis());
System.out.print("\nThe getResult(): " + result.getResult()
+ "\nThe getLattice(): " + result.getLattice());
System.out.println("List of recognized words and their times:");
for (WordResult r : result.getWords()) {
System.out.println(r);
}
System.out.println("Best 3 hypothesis:");
for (String s : result.getNbest(3))
System.out.println(s);
}
recognizer.stopRecognition();
// Live adaptation to speaker with speaker profiles
stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
// stream.skip(44); I commented this out due to the short length of my file
// Stats class is used to collect speaker-specific data
Stats stats = recognizer.createStats(1);
recognizer.startRecognition(stream);
while ((result = recognizer.getResult()) != null) {
stats.collect(result);
}
recognizer.stopRecognition();
// Transform represents the speech profile
Transform transform = stats.createTransform();
recognizer.setTransform(transform);
// Decode again with updated transform
stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
// stream.skip(44); I commented this out due to the short length of my file
recognizer.startRecognition(stream);
while ((result = recognizer.getResult()) != null) {
System.out.format("Hypothesis: %s\n", result.getHypothesis());
}
recognizer.stopRecognition();
System.out.println("...Printing is done..");
}
}
Here is the output (a photo album I took): http://imgur.com/a/Ou9oH
As Nikolay says, the audio sounds odd, probably because you haven't resampled it in the right way.
To downsample the audio from the original 22050 Hz to the desired 16 kHz, you can run the following command:
sox Vocaroo.wav -r 16000 Vocaroo16.wav
Vocaroo16.wav will sound much better and it will (probably) give you better ASR results.
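If you want to catch this from the Java side before decoding, a small (untested) check with javax.sound.sampled can confirm that the WAV really is 16 kHz, 16-bit mono before handing it to Sphinx4; the class name is just for illustration.

import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Rough sketch: verify the WAV header matches what the default
// en-us acoustic model expects (16 kHz, 16-bit, mono).
public class CheckWavFormat {
    public static void main(String[] args) throws Exception {
        File wav = new File(args[0]);
        AudioFormat fmt = AudioSystem.getAudioFileFormat(wav).getFormat();
        System.out.println(fmt);
        if (fmt.getSampleRate() != 16000f
                || fmt.getSampleSizeInBits() != 16
                || fmt.getChannels() != 1) {
            System.err.println("Resample first, e.g.: sox in.wav -r 16000 -c 1 out.wav");
        }
    }
}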

How to get the uncompressed file size of an MP3 file using CoreAudio API

Using CoreAudio, I am able to get the sampleRate (frames per second) and the file size, but in order to get the "total" time of the song, I need to know the real (uncompressed) size of that compressed MP3.
AudioStreamBasicDescription asbd;
UInt32 asbdSize = sizeof(asbd);
// get the stream format.
err = AudioFileStreamGetProperty(inAudioFileStream, kAudioFileStreamProperty_DataFormat, &asbdSize, &asbd);
if (err)
{
[self failWithErrorCode:AS_FILE_STREAM_GET_PROPERTY_FAILED];
return;
}
sampleRate = asbd.mSampleRate;
Is there any way I can know the real size of the song using Objective-C?
Thanks in advance.
See the answer to this question
There's a property you can ask for with AudioFileGetProperty, called kAudioFilePropertyEstimatedDuration, that should do the trick.
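For reference, a minimal, untested sketch of that lookup; it assumes a local file opened with the AudioFile API rather than AudioFileStream, and the function name is just for illustration.

#include <AudioToolbox/AudioToolbox.h>

/* Rough sketch: ask AudioFile for the estimated duration of the MP3 in seconds.
 * Minimal error handling; returns 0 on failure. */
static Float64 EstimatedDurationSeconds(CFURLRef fileURL)
{
    AudioFileID audioFile = NULL;
    OSStatus err = AudioFileOpenURL(fileURL, kAudioFileReadPermission, 0, &audioFile);
    if (err != noErr)
        return 0;

    Float64 duration = 0;
    UInt32 size = sizeof(duration);
    err = AudioFileGetProperty(audioFile, kAudioFilePropertyEstimatedDuration,
                               &size, &duration);
    AudioFileClose(audioFile);
    return (err == noErr) ? duration : 0;
}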