I'm using the SpeechRecognition package for speech to text. Its WAV file input, however, has to be mono. When I record with arecord -D plughw:0 --duration=5 -f cd -vv ~/test.wav and play it back using aplay test.wav I get:
Playing WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
How do I get arecord to record in mono? I have also tried adding --channels=1, but when it starts recording and displays information about its setup (Plug PCM: Route conversion PCM), it always shows:
...
channels : 2
...
My USB PnP Sound Device's setup shows the channel count as 1, though.
Even though I've set it to 1, the recording plays back as stereo. What's wrong?
Your issue is odd, but I usually use sox for recording or conversion.
You can use the rec command (part of sox) to record directly; the trim effect stops it after 5 seconds:
rec -r 16000 -c 1 ~/test.wav trim 0 5
In this case, see also this question: https://raspberrypi.stackexchange.com/questions/4715/sox-alsa-sound-recording-issue
Or you can convert your existing WAV file from stereo to mono: sox ~/test.wav -c 1 ~/test_mono.wav
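To double-check the result, soxi (installed alongside sox) prints the channel count, sample rate and duration, e.g.:
soxi ~/test_mono.wav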
Documentation & examples: http://linux.die.net/man/1/sox
Hi, I am currently trying to retrieve 3-second clips of an audio file while it is recording in Flutter. I am using the recording module flutter_sound and flutter_ffmpeg.
I record the audio file with the default codec (.aac). The file is saved to the cache directory (getTemporaryDirectory()).
I then copy a clip out of the file using this flutter_ffmpeg code:
List<String> arguments = ["-ss", start.toString(), "-i", inPath, "-to", end.toString(), "-c", "copy", outPath];
await flutterFFmpeg.executeWithArguments(arguments);
Here start is the start time (e.g. 0) and end is the end time (e.g. 3).
It then returns this error
FFmpeg exited with rc: 1 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x748964ea00] moov atom not found
Helpful information:
A moov atom is metadata about the file (e.g. timescale, duration).
I know the inPath exists because I check that before executing the ffmpeg command.
The outPath is also in .aac format.
This ffmpeg command is run while the recording is still in progress.
An example inPath URI looks like this: /data/user/0/com.my.app/cache/output.aac
I have no problems when running on iOS, only on Android.
I would be grateful for help; I have spent many days trying to fix this problem. If you need any more info, please leave a comment. Thanks.
The default codec is not guaranteed to be AAC/ADTS.
It will depend on the Android version of your device.
You can do several things to understand the problem better:
Run ffprobe on your file to see what has actually been recorded by Flutter Sound (see the sketch after this list).
Use a specific codec instead of the default: aac/adts is a good choice because it can be streamed (you want to process the audio data during the recording, not after closing the file).
Verify that your file contains something and that the data is not still sitting in internal buffers.
Record to a Dart PCM stream instead of a file. Working with a file and using FFmpeg to seek into it is complicated and perhaps does not fit your needs.
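A minimal sketch for the first suggestion, using the example cache path from the question; ffprobe will show whether the file really is ADTS/AAC or an MP4-style container that needs a moov atom:
ffprobe -hide_banner -show_format -show_streams /data/user/0/com.my.app/cache/output.aac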
I need to quickly play several generated audio files from the Google Cloud Text-to-Speech service.
Here is what I get:
https://yadi.sk/i/jbkGpd23bprmyw
As you can see, it has about 0.15-0.3 s of silence at the beginning and at the end of the MP3 data.
Is there a way to tell API not to include these silent parts?
You can use ffmpeg to extract the portion of the audio clip you wish to keep.
For example, if you want the 0.5 seconds in the middle of a 0.8-second clip that has 0.15 s of silence at the beginning and end, set -t 00:00:00.500 (the length of audio to keep) and use the -ss 00:00:00.150 parameter at the beginning to set where to start.
The full command will look like this:
ffmpeg -ss 00:00:00.150 -i ttsclip.mp3 -t 00:00:00.500 -acodec copy ttsclip-cut.mp3
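If the amount of silence varies from clip to clip, an alternative worth trying is ffmpeg's silenceremove filter, which trims leading and trailing silence automatically; the threshold below is an assumption you would tune to your clips, and note that filtering re-encodes, so -acodec copy cannot be used:
ffmpeg -i ttsclip.mp3 -af silenceremove=start_periods=1:start_threshold=-50dB:stop_periods=1:stop_threshold=-50dB ttsclip-trimmed.mp3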
I would like to get data from an audio file based on microphone input (both Android and iOS). Currently I'm using audioplayers and recordMp3 to record the microphone input. This results in an MP3 file with a local file path. In order to use the audio data, I want an uncompressed format like WAV. Would ffmpeg help with this conversion? I want to eventually use this data for visualization.
MP3 to WAV
ffmpeg -i input.mp3 output.wav
Note that any encoding artifacts in the MP3 will be included in the WAV.
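If your visualizer expects a particular sample rate, channel count or sample format, you can be explicit; the values here are only examples:
ffmpeg -i input.mp3 -ar 44100 -ac 1 -c:a pcm_s16le output.wav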
Piping from ffmpeg to your visualizer
I'm assuming you need WAV/PCM because your visualizer only accepts that format and does not accept MP3. You can create a WAV file as shown in the example above, but if your visualizer accepts a pipe as input you can avoid creating a temporary file:
ffmpeg -i input.mp3 -f wav - | yourvisualizer …
Using ffmpeg for visualization
See examples at How do I turn audio into video (that is, show the waveforms in a video)?
I'm working on a networking component where the server provides a Texture, sends it to FFmpeg to be encoded (h264_qsv), and sends it over the network. The client receives the stream (presumably MP4), decodes it using FFmpeg again, and displays it on a Texture.
Currently this works very slowly, since I am saving the texture to disk before encoding it to an mp4 file (also saved to disk), and on the client side I am saving the decoded .png texture to disk so that I can use it in Unity.
The server-side FFmpeg process is currently started with process.StartInfo.Arguments = @" -y -i testimg.png -c:v h264_qsv -q 5 -look_ahead 0 -preset:v faster -crf 0 test.qsv.mp4"; and the client side with process.StartInfo.Arguments = @" -y -i test.qsv.mp4 output.png";
Since this needs to be very fast (at least 30 fps) and real-time, I need to pipe the Texture directly to the FFmpeg process. On the client side, I likewise need to pipe the decoded data directly to the displayed Texture (as opposed to saving it and then reading it back from disk).
A few days of research showed me that FFmpeg supports various piping options, including data formats such as bmp_pipe (piped bmp sequence), bin (binary text), data (raw data) and image2pipe (piped image2 sequence); however, documentation and examples on how to use these options are very scarce.
Please help me: which format should I use (and how should it be used)?
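For illustration only, a piped server-side invocation could look something like this sketch; the pixel format, resolution, frame rate and MPEG-TS/TCP output are assumptions, not details from the project above:
ffmpeg -f rawvideo -pix_fmt rgba -s 1920x1080 -r 30 -i - -c:v h264_qsv -preset:v faster -f mpegts tcp://127.0.0.1:9000
The idea is that the raw RGBA frames of the Texture would be written straight to the process's stdin instead of going through a PNG on disk.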
=== BACKGROUND ===
Some time ago I ripped a lot of music from an internet radio station. Unfortunately something seems to have gone wrong, since the length of most files is displayed as several hours, but playback starts at the correct position.
Example: If a file is really 3 minutes long but is displayed as 3 hours, playback starts at 2 hours and 57 minutes.
Before I upgraded my system, gstreamer was an older version and behaved as described above, so I didn't pay too much attention. Now I have a new version of gstreamer which cannot handle these files correctly: it "plays" the whole initial offset.
=== /BACKGROUND ===
So here is my question: How is it possible to modify an OGG/Vorbis file in order to get rid of these useless initial offsets? Although I tried several tag-editing programs, none of them would let me edit these values. (Interestingly enough, easytag displays both times, but writes the wrong one...)
I finally found a solution! Although it wasn't quite what I expected...
After trying several other options I ended up with the following code:
#!/bin/sh
cd "${1}"
OUTDIR="../`basename "${1}"`.new"
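# Make newline the only field separator so paths containing spaces survive the read loop below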
IFS="
"
find . -name '*.ogg' | while read -r filepath;
do
# Create destination directory
mkdir -p "${OUTDIR}/`dirname "${filepath}"`"
# Convert OGG to OGG
avconv -i "${filepath}" -f ogg -acodec libvorbis -vn "${OUTDIR}/${filepath}"
# Copy tags
vorbiscomment -el "${filepath}" | vorbiscomment -ew "${OUTDIR}/${filepath}"
done
This code recursively reencodes all OGG files and then copies all vorbis comments. It's not a very efficient solution, but it works nevertheless...
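Invocation looks like this (the script name is just whatever you saved it as, and the argument is the directory to process):
./reencode-ogg.sh ~/Music/radio-rips
The reencoded tree ends up next to the original directory with a .new suffix.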
As for what the problem was: I guess it has something to do with this output from ogginfo:
...
New logical stream (#1, serial: 74a4ca90): type vorbis
WARNING: Vorbis stream 1 does not have headers correctly framed. Terminal header page contains additional packets or has non-zero granulepos
Vorbis headers parsed for stream 1, information follows...
Version: 0
Vendor: Xiph.Org libVorbis I 20101101 (Schaufenugget)
...
This warning disappears after reencoding the file...
At the rate at which I'm currently encoding, it will probably take several hours until my whole media library is completely reencoded... but at least I verified with several samples that it works :)