Google Cloud Text-to-Speech: silence at the beginning and end of the generated MP3 - google-text-to-speech

I need to quickly play several generated audio files from the Google Cloud Text-to-Speech service.
Here is what I get:
https://yadi.sk/i/jbkGpd23bprmyw
As you can see, there is about 0.15-0.3 s of silence at the beginning and at the end of the MP3 data.
Is there a way to tell API not to include these silent parts?

You can use ffmpeg to extract the portion of the audio clip you wish to keep.
For example, if you want the 0.5 seconds in the middle of a 0.8-second clip that has 0.15 s of silence at the beginning and end, use -ss 00:00:00.150 to set where to start and -t 00:00:00.500 to set the length of audio to keep.
The full command looks like this:
ffmpeg -ss 00:00:00.150 -i ttsclip.mp3 -t 00:00:00.500 -acodec copy ttsclip-cut.mp3
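If the amount of silence varies from clip to clip, an alternative is ffmpeg's silenceremove filter, which trims leading and trailing silence automatically instead of relying on hard-coded offsets (a sketch; the -50dB threshold is an assumption you may need to tune for your clips):
ffmpeg -i ttsclip.mp3 -af "silenceremove=start_periods=1:start_threshold=-50dB,areverse,silenceremove=start_periods=1:start_threshold=-50dB,areverse" ttsclip-trimmed.mp3
The areverse pair flips the audio so the same leading-silence trim also removes the trailing silence; note that filtering re-encodes the audio, so -acodec copy cannot be used here.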

Related

Is there a way to include a transition video when using ffmpeg's select filter?

I wanted to take some segments of an input video and make a highlight reel automatically using ffmpeg. I was originally experimenting with trimming and concatenating, but it was difficult because the audio kept falling out of sync, and I have a variable number of highlights.
I discovered from the top comment here that the 'select' filter is very powerful. I have a Python program that inserts all the parts I want selected into the command; I run it in the terminal and it works perfectly. The only issue is that I want a quick transition video to play in between each of these highlights. Is this not possible with select? Do I have to return to using trim and concat? Thank you!
Edit: for reference, here is an example ffmpeg command I am running
ffmpeg -y -i video.mp4 -i audio.mp4 -vf "select='between(t,56,69)+between(t,60,135)+between(t,73,132)+between(t,152,163)+between(t,251,278)+between(t,600,700)+between(t,774,872)', setpts=N/FRAME_RATE/TB " -af "aselect='between(t,56,69)+between(t,60,135)+between(t,73,132)+between(t,152,163)+between(t,251,278)+between(t,600,700)+between(t,774,872)',asetpts=N/SR/TB" output.mp4

ffmpeg audio conversion in flutter

I would like to get data from an audio file based on microphone input (on both Android and iOS). Currently I'm using audioplayers and recordMp3 to record the microphone input. This results in an MP3 file with a local file path. In order to use the audio data, I want an uncompressed format like WAV. Would ffmpeg help with this conversion? I want to eventually use this data for visualization.
MP3 to WAV
ffmpeg -i input.mp3 output.wav
Note that any encoding artifacts in the MP3 will be included in the WAV.
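If the code that consumes the WAV expects a particular sample rate, channel count, or bit depth, it can help to set them explicitly rather than rely on the defaults (a sketch; 44.1 kHz, mono, 16-bit are just example values, adjust to what your visualizer expects):
ffmpeg -i input.mp3 -ar 44100 -ac 1 -c:a pcm_s16le output.wav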
Piping from ffmpeg to your visualizer
I'm assuming you need WAV/PCM because your visualizer only accepts that format and does not accept MP3. You can create a WAV file as shown in the example above, but if your visualizer accepts a pipe as input you can avoid creating a temporary file:
ffmpeg -i input.mp3 -f wav - | yourvisualizer …
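As a quick sanity check of the pipe itself, ffplay (which ships with ffmpeg) can stand in for the visualizer, since it also reads from standard input (a sketch):
ffmpeg -i input.mp3 -f wav - | ffplay -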
Using ffmpeg for visualization
See examples at How do I turn audio into video (that is, show the waveforms in a video)?
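For instance, the showwaves filter renders the audio as a waveform video (a sketch; the size and mode values here are arbitrary choices):
ffmpeg -i input.mp3 -filter_complex "[0:a]showwaves=s=1280x720:mode=line[v]" -map "[v]" -map 0:a output.mp4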

How to programmatically output fragmented mp4 file using Bento4

I want to record a video conference. I can receive RTP media from the video conferencing server, and I want to output fragmented MP4 for live streaming. So, how do I write a fragmented MP4 file programmatically using Bento4?
MP4Box supports DASH. Here is a simple example:
MP4Box -dash 4000 -frag 4000 -rap -segment-name test_ input.mp4
'-dash 4000' segments the input MP4 file into 4000 ms chunks.
'-frag 4000': since the fragment duration equals the segment duration, segments are not fragmented further.
'-rap' forces each segment to start at a random access point, i.e. at a keyframe. In that case the segment duration may differ from 4000 ms, depending on the distribution of keyframes.
'-segment-name' specifies the pattern for segment names. In this case the segments will be named test_1.m4s, test_2.m4s, ...
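Since the question asks specifically about Bento4, note that Bento4 also ships command-line tools for this workflow: mp4fragment converts a regular MP4 into a fragmented MP4, and mp4dash packages fragmented input for DASH (a sketch; check the exact option names against your Bento4 version):
mp4fragment --fragment-duration 4000 input.mp4 fragmented.mp4
mp4dash fragmented.mp4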

How to record mono with arecord?

I'm using the SpeechRecognition package for speech to text. Its WAV input, however, has to be mono. When I record with arecord -D plughw:0 --duration=5 -f cd -vv ~/test.wav and play it back using aplay test.wav, I get:
Playing WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
How do I get arecord to record in mono? I have also tried adding --channels=1, but when it starts recording and displays information about its setup (Plug PCM: Route conversion PCM), it always shows:
...
channels : 2
...
My USB PnP Sound Device's setup shows that channels is 1, though.
Even though I've set it to 1, it still plays back as stereo. What's wrong?
Your issue is odd, but I usually use sox for recording and conversion.
You can use its rec command to record mono directly:
rec -r 16000 -c 1 -d 5 ~/test.wav
In this case, see also this question: https://raspberrypi.stackexchange.com/questions/4715/sox-alsa-sound-recording-issue
Or you can convert your WAV file from stereo to mono: sox ~/test.wav -c 1 ~/test_mono.wav
Documentation & examples: http://linux.die.net/man/1/sox
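As for arecord itself: -f cd is shorthand for 16-bit little-endian, 44100 Hz, stereo, so depending on argument order it can override --channels=1. Spelling the format out instead of using the cd shortcut may be enough to record mono directly (a sketch, untested on your device):
arecord -D plughw:0 -f S16_LE -r 44100 -c 1 --duration=5 ~/test.wav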

How to use Media Segmenter for split video?

I have read many documents but am still very confused about HTTP Live Streaming, and I'm still looking for a solution. I have converted my video to .ts format with ffmpeg.
Now I know that I have to split my video and create a playlist using the media segmenter, but I don't know where the media segmenter is or how to use it to split the video.
I am very new to this, so sorry for the silly question.
Any help would be appreciated! Thanks in advance!
You can get it here: 35703_streamingtools_beta.dmg, or go to http://connect.apple.com/ and search for "HTTP Live Streaming", or download it from https://developer.apple.com/streaming/. Usage:
mediafilesegmenter -t 10 myvideo-iphone.ts
This will generate one .ts file for every 10 seconds of the video, plus a .m3u8 playlist pointing to all of them.
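If Apple's tool is not available to you, newer ffmpeg builds include an HLS muxer that produces the .ts segments and the .m3u8 playlist in one step (a sketch; confirm your ffmpeg build has the hls muxer):
ffmpeg -i myvideo-iphone.ts -c copy -f hls -hls_time 10 -hls_list_size 0 playlist.m3u8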
If you use FFmpeg, it's very easy to split files with it.
Don't use the Media Segmenter.
Simply write something like this:
ffmpeg.exe -i YourFile.mp4 -ss 00:10:00 -t 00:05:00 OutFile.mp4
where -ss 00:10:00 is the start offset and -t 00:05:00 is the duration of OutFile.mp4.
This will create OutFile.mp4, which contains 5 minutes of video (-t 00:05:00) from YourFile.mp4
(from 00:10:00 to 00:15:00 of YourFile.mp4).
You can also create an .ASX playlist, which can serve streams and is very simple.