Speaker Mode Video in Azure Media Services - azure-media-services

I have two hi-res local video recording files from a podcast interview.
I would like to merge them into one output file with the speaker showing at all times.
So we'd need to analyse the audio tracks to see who is speaking (guest has priority) and then create an array of timestamps for the active speaker.
Volume analysis example using ffmpeg similar to what I'm describing
Then I'd like to use AMS to merge the video files based on the timestamps (e.g. host.mp4 as the source for 20 seconds, then guest.mp4 for 30 seconds, etc.)
How would I go about this?

This sounds like the speaker enumeration feature in Azure Video Indexer https://learn.microsoft.com/en-us/azure/azure-video-indexer/video-indexer-overview#videoaudio-ai-features.
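If you'd rather stay with the ffmpeg volume-analysis approach from the question, here is a minimal sketch. It assumes ffmpeg is on PATH and that you scan the guest's track with the silencedetect filter; the file names, thresholds, and the `cut_list` helper are all made up for illustration:

```python
import re
import subprocess

def guest_speech(path, noise_db=-30, min_silence=0.5):
    """Find where the guest track is NOT silent, using ffmpeg's
    silencedetect filter. Thresholds are guesses -- tune them for
    your recordings. Assumes the track ends in silence."""
    log = subprocess.run(
        ["ffmpeg", "-i", path, "-af",
         f"silencedetect=noise={noise_db}dB:d={min_silence}",
         "-f", "null", "-"],
        capture_output=True, text=True,
    ).stderr
    # Speech runs from the end of one silence to the start of the next.
    starts = [0.0] + [float(x) for x in re.findall(r"silence_end: ([\d.]+)", log)]
    ends = [float(x) for x in re.findall(r"silence_start: ([\d.]+)", log)]
    return [(s, e) for s, e in zip(starts, ends) if e > s]

def cut_list(guest_intervals, duration):
    """Turn guest speech intervals into the edit list from the question:
    guest.mp4 while the guest talks, host.mp4 otherwise (guest priority)."""
    cuts, t = [], 0.0
    for start, end in guest_intervals:
        if start > t:
            cuts.append(("host.mp4", t, start))
        cuts.append(("guest.mp4", start, end))
        t = end
    if t < duration:
        cuts.append(("host.mp4", t, duration))
    return cuts
```

The resulting list of (source, start, end) triples is what you would then feed into the merge step, whichever tool performs it.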

Related

How to play video while it is downloading using AVPro video in unity3D?

I want to play the video simultaneously while it is downloading via UnityWebRequest. Will AVPro Video support this? If so, please provide me some guidance, as I am new to Unity and AVPro Video. I am able to play a fully downloaded video through FullscreenVideo.prefab in the AVPro demo. Any help will be much appreciated.
There are two main options you could use for displaying the video while it is still downloading.
Through livestream
You can stream a video to AVPro Video using the "absolute path or URL" option on the media player component, then linking this to a stream in RTSP, MPEG-DASH, HLS, or HTTP progressive streaming format. Depending on what platforms you are targeting, some of these options will work better than others.
A table of which file format supports which platform can be found in the AVPro Video user manual that is included with AVPro Video, from page 12 onwards.
If you want to use streaming you also need to set the "internet access" option to "required" in the player settings, as a video cannot stream without internet access.
A video that is being streamed will automatically start/resume playing when enough video is buffered.
This does however require a constant internet connection which may not be ideal if you're targeting mobile devices, or unnecessary if you're planning to play videos in a loop.
HLS m3u8
HTTP Live Streaming (HLS) works by cutting the overall stream into shorter, manageable chunks of data. These chunks are then downloaded in sequence regardless of how long the stream is. m3u8 is a playlist file format that keeps information on the location of multiple media files instead of an entire video; this can then be fed into an HLS player that plays the small media files in sequence as dictated by the m3u8 file.
Using this method is useful if you're planning to play smaller videos on repeat, as the user only has to download each chunk of the video once, and you can store the chunks for later use.
You can also make these chunks of video as long or short as you want, and set a buffer of how many chunks you want pre-loaded. If, for example, you set the chunk size to 5 seconds with a buffer of 5 chunks, the only loading time you'll have is for the first 25 seconds of the video. Once those first 5 chunks are loaded, playback starts and the rest of the chunks load in the background without interrupting the video (provided your internet speed can handle it).
A downside is that you have to convert all your videos to m3u8 yourself, although a tool such as FFmpeg can help with this.
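As a rough sketch of that conversion step, the ffmpeg invocation can be scripted like this (file names, segment length, and output layout are placeholders; ffmpeg must be installed separately):

```python
import subprocess

def hls_cmd(src, out_dir, segment_seconds=5):
    """Build an ffmpeg command that splits `src` into HLS chunks
    plus an index.m3u8 playlist referencing them."""
    return [
        "ffmpeg", "-i", src,
        "-codec", "copy",        # keep the existing encoding; drop to transcode
        "-hls_time", str(segment_seconds),
        "-hls_list_size", "0",   # keep every chunk in the playlist (VOD style)
        f"{out_dir}/index.m3u8",
    ]

# subprocess.run(hls_cmd("myvideo.mp4", "stream"), check=True)
```

The resulting `index.m3u8` is what you would point the AVPro media player at.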
References
HLS
m3u8
AVPro documentation

get access to soundcloud tracks

I'd like to build an application for analysis and classification of tracks (everyday sound recordings rather than speech or music) recorded and/or streamed via SoundCloud.
The idea is to use the existing SoundCloud infrastructure (database, record, share, comment...) and just add an analysis layer in between.
Is it possible, through the API, to access the track binary files? We'd like to process some of them.
Is there also a way to access the audio stream during recording? It's for a live classification task.
Thanks
Boris
Some users allow their tracks to be downloaded, others don't. For a downloadable track, the track information will include a download_url, and you can download and process the file however you like. As for accessing the stream during recording: at that point in time, the file doesn't exist on SoundCloud yet (it's only uploaded once the recording is complete). You could write your own Flash recorder or use the Web Audio API to get audio information during recording.
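A rough sketch of the download path, assuming the classic SoundCloud API convention of authorising requests by appending your app's client_id as a query parameter (check the current API docs before relying on this):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def authorized_url(download_url, client_id):
    """`download_url` comes from the track's JSON metadata; `client_id`
    is your registered app key (parameter name per the classic API)."""
    sep = "&" if "?" in download_url else "?"
    return download_url + sep + urlencode({"client_id": client_id})

# Only works on tracks whose owners enabled downloads:
# audio_bytes = urlopen(authorized_url(track["download_url"], CLIENT_ID)).read()
```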

Correct approach to play a multiple files simultaneously using Core Audio

I've developed a model, which plays up to 10 tracks with a number of clips on each using AVFoundation.
Still, I'm not happy with the performance, and there is audible sound corruption.
I read documentation of Core Audio and tried out some samples.
Some play only one file using an AU Generator (AudioFilePlayer subtype).
The samples where two files are playing use a MultichannelMixer and custom buffers to render audio data.
Could I use a MultichannelMixer and connect multiple generators (AudioFilePlayer) to its nodes?
Or is the best way to render the data myself?
Thanks in advance!

iPhone SDK Audio Mixer

What I need to do is be able to mix 4 channels of audio (not from a live source, just prerecorded audio files in the app bundle), and change their volumes individually, in real time, preferably with MP3s. What's the best/correct road for me to take, regarding all the various sound APIs for the iPhone?
Thanks!
Storm Sim does this with AVAudioPlayer, which is certainly the simplest method. You can call prepareToPlay on each of the player objects, then kick them off with play later so there won't be any delay. I also use a blank 1-second audio player on eternal loop to keep deviceTime ticking, so you can use playAtTime: to give a specific deviceTime in the future and make all the samples play in sync or offset relative to each other (deviceTime only ticks if there is some sort of audio playing). The AVAudioPlayerDelegate has interrupted/resumed events and finishedPlaying, so you can get notification of what is happening.
However, there is only one hardware MP3/AAC decoder, so the other three players will use up CPU (and thus battery) doing the decoding. If you want to maximize battery life, use CAF files in IMA4 at 44100 Hz. That is about 1/4 the size of the raw WAV files, so it isn't as compact as MP3, but the performance is much better, especially if you are using a lot of small audio tracks. If you are using voice you can get away with much less fidelity and compress the files even more. afconvert in Terminal can help you get your source files into the CAF format (you should use CAF files no matter what the encoding).
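The afconvert step can be scripted for a batch of source files; a sketch, assuming the `-f caff -d ima4` flag spelling (run `afconvert -h` on your machine to confirm, since the tool ships with the macOS/Xcode toolchain):

```python
import subprocess

def ima4_caf_cmd(src, dst):
    """Build the afconvert invocation: CAF container ('caff'), IMA4 codec."""
    return ["afconvert", "-f", "caff", "-d", "ima4", src, dst]

# subprocess.run(ima4_caf_cmd("track.wav", "track.caf"), check=True)
```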

Create Audio file on iPhone/iPad from many other audio files (mixer)

I am trying to create something similar to the Piano app on the iPhone. When people tap a key, it plays a piano note. Basically, there will be only 7 notes (C) at the moment. Each note is a .caf file and its length is 5 seconds.
I do not know if there is any way to save the song the user played and export it to mp3/caf format. AVAudioRecorder seems to only record from the microphone input.
Many thanks
For such an app you probably don't want to record into an audio file; instead, record the note presses and timings, which is a much more compact format, and then play them back as if the user were pressing the notes at the recorded times.
If you do want to be able to export an audio file format then you can write a simple mixer which adds together the individual samples from your source samples with the correct offsets and puts the results in your output audio buffer. You should probably also write a very simple compressor so that you keep the sample volume without any distortion caused by 'clipping'. This can be done by dividing down any summed sample above 95% of the maximum sample value. There may also be a way to use OpenAL to do this mixing for you and play back into an audio buffer.
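The add-and-limit idea above can be sketched in a few lines. The 16-bit sample values and the 95% threshold follow the description; everything else (function names, the list-of-ints buffer representation) is made up for illustration:

```python
FULL_SCALE = 32767              # 16-bit PCM maximum
LIMIT = int(FULL_SCALE * 0.95)  # the 95% ceiling described above

def mix(tracks):
    """tracks: list of (offset_in_samples, samples) pairs.
    Sum every track into one output buffer, then pull any summed
    sample that exceeds the ceiling back down to avoid clipping."""
    length = max(off + len(s) for off, s in tracks)
    out = [0] * length
    for off, samples in tracks:
        for i, v in enumerate(samples):
            out[off + i] += v
    return [max(-LIMIT, min(LIMIT, v)) for v in out]
```

A real implementation would divide the whole buffer down smoothly rather than hard-clamping each sample, but the shape of the mixer is the same.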