Has someone been able to make
http://www.surina.net/soundtouch/
work for iPhone?
A simple Xcode demo would be helpful.
I'd just like to play a sound effect with some pitch manipulation.
thx
chris
The easiest way to implement audio effects with SoundTouch is to also use SoundStretch.
You can download the source code of both from here http://www.surina.net/soundtouch/sourcecode.html
SoundStretch is a command-line program that performs SoundTouch
library effects on WAV audio files. The program is a source code
example of how the SoundTouch library routines can be used to process
sound in other programs, but it can be used as a stand-alone audio
processing tool as well.
SoundStretch features:
Reads & writes .wav audio files
Allows very broad parameter adjustment ranges:
Tempo & Playback Rate adjustable in range -95% .. +5000%
The sound Pitch (key) adjustable in range -60 .. +60 semitones (+- 5 octaves).
Beats-Per-Minute (BPM) detection that can adjust tempo to match a desired BPM rate.
Full source codes available
Command-line interface allows using the SoundStretch utility for processing .wav audio files in batch mode
Supports processing .wav audio streams through standard input/output pipes
SoundStretch uses the SoundTouch library routines for the audio processing.
Example of use:
NSArray *effects = [NSArray arrayWithObjects:@"-rate=-22", nil];
NSURL *audio = [self base:input output:output effects:effects];
Where base:output:effects: is defined as:
- (NSURL *)base:(NSURL *)input output:(NSURL *)output effects:(NSArray *)effects {
    int _argc = 3 + (int)[effects count];
    // Pre-fill the argument vector with empty strings; the effect options overwrite slots 3..n
    const char *_argv[] = {"createWavWithEffect", [[input path] UTF8String], [[output path] UTF8String],
                           "", "", "", "", "", "", "", "", ""};
    for (int i = 0; i < [effects count]; i++) {
        _argv[i + 3] = [effects[i] UTF8String];
    }
    createWavWithEffect(_argc, _argv);
    // IMPORTANT: check the resulting file size; you may need to fix it up yourself
    return output;
}
If you don't want to compile SoundTouch yourself, I have shared a GitHub repository with the library compiled for armv7, armv7s, arm64, i386 and x86_64:
https://github.com/enrimr/soundtouch-ios-library
And if you want to use SoundTouch directly, without going through SoundStretch, you have to add the SoundTouch directory (which includes libSoundTouch.a and the directory with headers) to your Xcode project.
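If you go that route, you call the C++ SoundTouch class directly from an Objective-C++ (.mm) file. Here is a minimal sketch, assuming 44.1 kHz stereo float samples; the function and buffer names are just placeholders, not part of the library:
// PitchShift.mm -- minimal sketch of driving the SoundTouch class directly (no SoundStretch)
#include "SoundTouch.h"
using namespace soundtouch;

// Pitch-shift `numFrames` stereo float frames from inBuffer into outBuffer.
// Returns the number of frames actually produced.
static int pitchShiftFrames(const float *inBuffer, float *outBuffer, int numFrames, int maxOutFrames)
{
    SoundTouch st;
    st.setSampleRate(44100);
    st.setChannels(2);
    st.setPitchSemiTones(3.0f);   // +3 semitones; setTempoChange()/setRateChange() cover the other effects

    st.putSamples(inBuffer, numFrames);
    st.flush();                   // push the tail of the processing pipeline out

    int received = 0;
    while (received < maxOutFrames) {
        int n = st.receiveSamples(outBuffer + received * 2, maxOutFrames - received);
        if (n == 0) break;
        received += n;
    }
    return received;
}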
For Swift projects:
In Swift you can't import a .h directly, so you will need to create a header file named <Your-Project-Name>-Bridging-Header.h.
Then reference it in your project's build settings (under "Swift Compiler" look for "Objective-C Bridging Header") as:
$(SRCROOT)/<Your-Project-Name>-Bridging-Header.h
And now you should be able to use the SoundTouch class.
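Note that SoundTouch.h is a C++ header and Swift can't import C++ directly, so in practice the bridging header usually points at a thin Objective-C wrapper around the library rather than at SoundTouch.h itself. A hypothetical wrapper header (the class and method names here are only placeholders) could look like:
// STPitchProcessor.h -- hypothetical Objective-C facade over the C++ SoundTouch class
#import <Foundation/Foundation.h>

@interface STPitchProcessor : NSObject
- (void)setPitchSemiTones:(float)semitones;
- (BOOL)processWavAtURL:(NSURL *)input toURL:(NSURL *)output;
@end
The wrapper's implementation file would be Objective-C++ (.mm) and include SoundTouch.h; the bridging header then just contains #import "STPitchProcessor.h".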
For Objective-C projects:
Include the following line
#include "SoundTouch.h"
in your controller file.
Implementation of createWavWithEffect:
int createWavWithEffect(const int nParams, const char * const paramStr[])
{
    WavInFile *inFile;
    WavOutFile *outFile;
    RunParameters *params;
    SoundTouch soundTouch;

    fprintf(stderr, _helloText, SoundTouch::getVersionString());

    try
    {
        // Parse command line parameters
        params = new RunParameters(nParams, paramStr);

        // Open input & output files
        openFiles(&inFile, &outFile, params);

        if (params->detectBPM == TRUE)
        {
            // detect sound BPM (and adjust processing parameters
            // accordingly if necessary)
            detectBPM(inFile, params);
        }

        // Setup the 'SoundTouch' object for processing the sound
        setup(&soundTouch, inFile, params);
        // clock_t cs = clock(); // for benchmarking processing duration

        // Process the sound
        process(&soundTouch, inFile, outFile);
        // clock_t ce = clock(); // for benchmarking processing duration
        // printf("duration: %lf\n", (double)(ce-cs)/CLOCKS_PER_SEC);

        // Close WAV file handles & dispose of the objects
        delete inFile;
        delete outFile;
        delete params;

        fprintf(stderr, "Done!\n");
    }
    catch (const runtime_error &e)
    {
        // An exception occurred during processing, display an error message
        fprintf(stderr, "%s\n", e.what());
        return -1;
    }

    return 0;
}
Related
I am learning how to use Sphinx4 using the Maven plug-in for Eclipse.
I took the transcribe demo found on GitHub and altered it to process a file of my own. The audio file is 16-bit, mono, 16 kHz, and approximately 13 seconds long. I noticed that it sounds like it is in slow motion.
The words spoken in the file are, "also make sure it's easy for you to access the recording files so you could upload it if asked".
I am attempting to transcribe the file and my results are horrendous. My attempts at finding forum posts or links that thoroughly explain how to improve the results, or what I am not doing correctly, have led me nowhere.
I am looking to strengthen the accuracy of the transcription, but would like to avoid having to train a model myself due to the variance in the type of data that my current project will have to deal with. Is this not possible, and is the code I am using off?
CODE
(NOTE: Audio file available at https://instaud.io/8qv)
public class App {
    public static void main(String[] args) throws Exception {
        System.out.println("Loading models...");

        Configuration configuration = new Configuration();
        // Load model from the jar
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // You can also load model from folder
        // configuration.setAcousticModelPath("file:en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        FileInputStream stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/vocaroo_test_revised.wav"));
        // stream.skip(44); I commented this out due to the short length of my file

        // Simple recognition with generic model
        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            // I added the following print statements to get more information
            System.out.println("\ngetWords() before loop: " + result.getWords());
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
            System.out.print("\nThe getResult(): " + result.getResult()
                    + "\nThe getLattice(): " + result.getLattice());

            System.out.println("List of recognized words and their times:");
            for (WordResult r : result.getWords()) {
                System.out.println(r);
            }

            System.out.println("Best 3 hypothesis:");
            for (String s : result.getNbest(3))
                System.out.println(s);
        }
        recognizer.stopRecognition();

        // Live adaptation to speaker with speaker profiles
        stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
        // stream.skip(44); I commented this out due to the short length of my file

        // Stats class is used to collect speaker-specific data
        Stats stats = recognizer.createStats(1);
        recognizer.startRecognition(stream);
        while ((result = recognizer.getResult()) != null) {
            stats.collect(result);
        }
        recognizer.stopRecognition();

        // Transform represents the speech profile
        Transform transform = stats.createTransform();
        recognizer.setTransform(transform);

        // Decode again with updated transform
        stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
        // stream.skip(44); I commented this out due to the short length of my file
        recognizer.startRecognition(stream);
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();

        System.out.println("...Printing is done..");
    }
}
Here is the output (a photo album I took): http://imgur.com/a/Ou9oH
As Nikolay says, the audio sounds odd, probably because you haven't resampled it in the right way.
To downsample the audio from the original 22050 Hz to the desired 16kHz, you can run the following command:
sox Vocaroo.wav -r 16000 Vocaroo16.wav
The Vocaroo16.wav will sound much better and it will (probably) give you better ASR results.
When moving a file from one place to another, or when replacing a file, I always use the moveItemAtURL:toURL: or replaceItemAtURL:withItemAtURL: methods of NSFileManager.
When calling these methods, I want to determine how much time is needed, so that I can use an NSProgressIndicator to tell users how long it's going to take, just like OS X does when you move a file.
I have looked at the Apple docs but couldn't find any information regarding this.
Wondering if this can be implemented; please advise.
You can't know in advance how long it's going to take. What you can do is compute the "percent complete" while you are copying the file. But to do that you need to use lower-level APIs. You can use NSFileManager's attributesOfItemAtPath:error: to get the file size and NSStreams for doing the copying (there are many ways to do this). Percent complete is bytesWritten / totalBytesInFile.
--- Edit: added sample code as a category on NSURL, with a callback block passing the total number of bytes written, percent complete, and estimated time left in seconds.
#import <mach/mach_time.h>

@interface NSURL (CopyWithProgress)
- (void)copyFileURLToURL:(NSURL *)destURL withProgressBlock:(void (^)(double, double, double))block;
@end

@implementation NSURL (CopyWithProgress)
- (void)copyFileURLToURL:(NSURL *)destURL
       withProgressBlock:(void (^)(double, double, double))block
{
    ///
    // NOTE: error handling has been left out in favor of simplicity
    // real production code should obviously handle errors.
    NSUInteger fileSize = [[NSFileManager defaultManager] attributesOfItemAtPath:self.path error:nil].fileSize;
    NSInputStream  *fileInput  = [NSInputStream inputStreamWithURL:self];
    NSOutputStream *copyOutput = [NSOutputStream outputStreamWithURL:destURL append:NO];

    static size_t bufferSize = 4096;
    uint8_t *buffer = malloc(bufferSize);
    size_t bytesToWrite;
    size_t bytesWritten;
    size_t copySize = 0;
    size_t counter  = 0;

    [fileInput open];
    [copyOutput open];

    // NOTE: dividing mach_absolute_time() deltas by NSEC_PER_SEC assumes a 1:1 timebase;
    // strictly correct code would scale the delta with mach_timebase_info() first.
    uint64_t time0 = mach_absolute_time();

    while (fileInput.hasBytesAvailable) {
        do {
            bytesToWrite  = [fileInput read:buffer maxLength:bufferSize];
            bytesWritten  = [copyOutput write:buffer maxLength:bytesToWrite];
            bytesToWrite -= bytesWritten;
            copySize     += bytesWritten;
            if (bytesToWrite > 0)
                memmove(buffer, buffer + bytesWritten, bytesToWrite);
        }
        while (bytesToWrite > 0);

        if (block != nil && ++counter % 10 == 0) {
            double percent = (double)copySize / fileSize;
            uint64_t time1 = mach_absolute_time();
            double elapsed = (double)(time1 - time0) / NSEC_PER_SEC;
            double estTimeLeft = ((1 - percent) / percent) * elapsed;
            block(copySize, percent, estTimeLeft);
        }
    }
    if (block != nil)
        block(copySize, 1, 0);
    free(buffer);
}
@end

int main(int argc, const char *argv[])
{
    @autoreleasepool {
        NSURL *fileURL = [NSURL URLWithString:@"file:///Users/eric/bin/data/english-words.txt"];
        NSURL *destURL = [NSURL URLWithString:@"file:///Users/eric/Desktop/english-words.txt"];
        [fileURL copyFileURLToURL:destURL withProgressBlock:^(double bytes, double pct, double estSecs) {
            NSLog(@"Bytes=%f, Pct=%f, time left:%f s", bytes, pct, estSecs);
        }];
    }
    return 0;
}
Sample Output:
Bytes=40960.000000, Pct=0.183890, time left:0.000753 s
Bytes=81920.000000, Pct=0.367780, time left:0.004336 s
Bytes=122880.000000, Pct=0.551670, time left:0.002672 s
Bytes=163840.000000, Pct=0.735560, time left:0.001396 s
Bytes=204800.000000, Pct=0.919449, time left:0.000391 s
Bytes=222742.000000, Pct=1.000000, time left:0.000000 s
I mostly concur with CRD. I just want to note that under certain common circumstances, both -moveItemAtURL:toURL: and -replaceItemAtURL:withItemAtURL:... are very fast. When the source and destination are on the same volume, no data has to be copied or moved, only metadata. When the volume is local (as opposed to network-mounted), this typically takes negligible time. That said, it is appropriate to plan for the possibility that they could take significant time.
Also, he mentioned the copyfile() routine for moving files. A copy followed by deleting the original is the necessary approach when moving a file between volumes, but the rename() system call will perform a move within a volume without needing to copy anything. So, a reasonable approach would be to try rename() first and, if it fails with EXDEV, fall back to copyfile().
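As a rough sketch of that approach (error handling kept to a minimum, and the helper name is made up, not a system call):
#include <errno.h>
#include <unistd.h>
#include <copyfile.h>

// Move src -> dst: rename() within a volume, copy-then-delete across volumes.
// Returns 0 on success, -1 on failure.
static int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;                  // same volume: metadata-only, effectively instant

    if (errno != EXDEV)            // anything other than "cross-device link" is a real error
        return -1;

    // different volume: copy data + metadata, then remove the original
    if (copyfile(src, dst, NULL, COPYFILE_ALL) != 0)
        return -1;

    return unlink(src);
}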
Finally, the exchangedata() system call can be used as part of a reimplementation of -replaceItemAtURL:withItemAtURL:....
I don't recommend the approach suggested by aLevelOfIndirection because there are a lot of fiddly details about copying files. It's much better to rely on system libraries than trying to roll your own. His example completely ignores file metadata (file dates, extended attributes, etc.), for example.
The methods moveItemAtURL:toURL: and replaceItemAtURL:withItemAtURL: are high-level operations. While they provide the semantics you want for the move/replace, as you've found out, they don't provide the kind of feedback you want during those operations.
Apple is in the process of changing the lower-level file handling routines; many are now marked as deprecated in 10.8, so you'll want to pick carefully what you choose to use. However, at the lowest levels, system calls (manual section 2) and library functions (manual section 3), there are functions you can use that are not being deprecated.
One option, among others, is the function copyfile (manual section 3), which will copy a file or folder hierarchy and provides for a progress callback. That should give you most of the semantics of moveItemAtURL:toURL: along with progress, but you'll need to do more work for replaceItemAtURL:withItemAtURL: to preserve safety (no data loss in case of error).
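For the progress part, a minimal sketch of copyfile()'s status callback (names other than the copyfile API are placeholders):
#include <stdio.h>
#include <copyfile.h>

// Called by copyfile() as the data copy progresses.
static int copy_status(int what, int stage, copyfile_state_t state,
                       const char *src, const char *dst, void *ctx)
{
    if (what == COPYFILE_COPY_DATA && stage == COPYFILE_PROGRESS) {
        off_t copied = 0;
        copyfile_state_get(state, COPYFILE_STATE_COPIED, &copied);
        printf("copied %lld bytes so far\n", (long long)copied);
    }
    return COPYFILE_CONTINUE;
}

static int copy_with_progress(const char *src, const char *dst)
{
    copyfile_state_t state = copyfile_state_alloc();
    copyfile_state_set(state, COPYFILE_STATE_STATUS_CB, (void *)copy_status);
    copyfile_state_set(state, COPYFILE_STATE_STATUS_CTX, NULL);
    int result = copyfile(src, dst, state, COPYFILE_ALL);
    copyfile_state_free(state);
    return result;
}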
If that doesn't meet all your needs you can also look at the low-level stat and friends to find out file sizes etc.
HTH
I'm trying to write a password encryption function into my app, following this article.
I wrote a function that runs the CCCalibratePBKDF function and outputs the number of rounds.
const uint32_t oneSecond = 1000;
uint rounds = CCCalibratePBKDF(kCCPBKDF2,
predictedPasswordLength,
predictedSaltLength,
kCCPRFHmacAlgSHA256,
kCCKeySizeAES128,
oneSecond);
This works perfectly, but when I try to implement the next part it all goes wrong.
I can start writing the CCKeyDerivationPBKDF function call and it auto-completes the function and all the parameters. As I go through filling it in all the parameters are also auto-completed.
- (NSData *)authenticationDataForPassword:(NSString *)password salt:(NSData *)salt rounds:(uint)rounds
{
    const NSString *plainData = @"Fuzzy Aliens";
    uint8_t key[kCCKeySizeAES128] = {0};
    int keyDerivationResult = CCKeyDerivationPBKDF(kCCPBKDF2,
                                                   [password UTF8String],
                                                   [password lengthOfBytesUsingEncoding:NSUTF8StringEncoding],
                                                   [salt bytes],
                                                   [salt length],
                                                   kCCPRFHmacAlgSHA256,
                                                   rounds,
                                                   key,
                                                   kCCKeySizeAES128);
    if (keyDerivationResult == kCCParamError) {
        // you shouldn't get here with the parameters as above
        return nil;
    }
    uint8_t hmac[CC_SHA256_DIGEST_LENGTH] = {0};
    CCHmac(kCCHmacAlgSHA256,
           key,
           kCCKeySizeAES128,
           [plainData UTF8String],
           [plainData lengthOfBytesUsingEncoding:NSUTF8StringEncoding],
           hmac);
    NSData *hmacData = [NSData dataWithBytes:hmac length:CC_SHA256_DIGEST_LENGTH];
    return hmacData;
}
But as soon as I hit ; it marks an error saying "No matching function for call to 'CCKeyDerivationPBKDF'" and it won't build or anything.
I've imported CommonCrypto/CommonKeyDerivation.h and CommonCrypto/CommonCryptor.h as both of these were necessary for the enum names.
First, make sure that you haven't done anything funny with your include path (in particular, I do not recommend @HachiEthan's solution, which just confuses things). In general, leave this alone, and specifically don't add things like /usr/include to it. Make sure you've added Security.framework to your link step. This is the usual cause of problems.
The biggest thing you want to be sure of is that you're getting the iOS 5 Security.framework (rather than some other version like the OS X 10.6 or iOS 4 versions). But my suspicion is that you have a problem with your build settings.
If you want to see a framework that does all of this for reference, take a look at RNCryptor.
Right, I've found the problem (and solution).
Because I was using ZXing I had to rename the .m file to .mm so it could run the C++ stuff in the ZXing library.
I don't know why but renaming the file this way broke the CCKeyDerivationPBKDF function.
I've now moved the crypto code into its own class and left it as .m, and all I need now is to include the two imports as I did in the original post.
I didn't have to include any frameworks or anything.
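For what it's worth, a likely explanation for the .mm breakage: CCKeyDerivationPBKDF takes its salt as a const uint8_t *, and [salt bytes] returns a const void *. C accepts that implicit conversion, but C++ does not, which is exactly the kind of mismatch clang reports as "No matching function for call". If you ever do need to call it from an Objective-C++ file, an explicit cast should be enough:
// In a .mm file the void* -> uint8_t* conversion has to be spelled out:
int keyDerivationResult = CCKeyDerivationPBKDF(kCCPBKDF2,
                                               [password UTF8String],
                                               [password lengthOfBytesUsingEncoding:NSUTF8StringEncoding],
                                               (const uint8_t *)[salt bytes],
                                               [salt length],
                                               kCCPRFHmacAlgSHA256,
                                               rounds,
                                               key,
                                               kCCKeySizeAES128);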
I'm working with the MusicPlayer API. I understand that when you load in a .mid as a sequence, the API creates a default AUGraph for you that includes an AUSampler. This AUSampler uses a simple sine-wave based instrument to synthesize the notes in the .mid
My question is, how does one change the default instrument in the AUSampler? I understand that you can use SoundFont2 files (.sf2) and add them using the AudioUnitSetProperty method. But, how does one access this default AUGraph? Do you have to open the graph before you can edit the AudioUnit or is opening a graph only for editing connections between nodes?
Thanks :)
I've written a tutorial on this here, but here's an outline of the process:
Function to load a Sound Font file (taken from the Apple documentation):
- (OSStatus)loadFromDLSOrSoundFont:(NSURL *)bankURL withPatch:(int)presetNumber
{
    OSStatus result = noErr;

    // fill out a bank preset data structure
    AUSamplerBankPresetData bpdata;
    bpdata.bankURL  = (__bridge CFURLRef)bankURL;
    bpdata.bankMSB  = kAUSampler_DefaultMelodicBankMSB;
    bpdata.bankLSB  = kAUSampler_DefaultBankLSB;
    bpdata.presetID = (UInt8)presetNumber;

    // set the kAUSamplerProperty_LoadPresetFromBank property
    result = AudioUnitSetProperty([pointer to your AUSampler node here],
                                  kAUSamplerProperty_LoadPresetFromBank,
                                  kAudioUnitScope_Global,
                                  0,
                                  &bpdata,
                                  sizeof(bpdata));

    // check for errors
    NSCAssert(result == noErr,
              @"Unable to set the preset property on the Sampler. Error code:%d '%.4s'",
              (int)result,
              (const char *)&result);

    return result;
}
Then you need to load the Sound Font from your Resources folder:
NSURL *presetURL = [[NSURL alloc] initFileURLWithPath:[[NSBundle mainBundle] pathForResource:@"Name of sound font" ofType:@"sf2"]];
// Initialise the sound font
[self loadFromDLSOrSoundFont:presetURL withPatch:10];
Hope this helps!
You might take a look at the Audiograph example. It doesn't use soundFonts but should give you an idea of how to set up a graph.
When I use the MusicPlayer I always generate the MIDI note data from code/GUI and create the AUGraph (with a mixer) from scratch. There are ways to derive/extract the default generated AUGraph & AUSampler resulting from loading a midi file (example code below), but I never had success setting a new soundFont this way. On the other hand, creating the AUGraph from scratch and then loading an .sf2 file works great (a rough sketch of that approach follows the extraction snippet below).
// assuming `sequence` is the MusicSequence the midi file was loaded into
OSStatus result;

AUGraph graph;
result = MusicSequenceGetAUGraph(sequence, &graph);

MusicTrack firstTrack;
result = MusicSequenceGetIndTrack(sequence, 0, &firstTrack);

AUNode myNode;
result = MusicTrackGetDestNode(firstTrack, &myNode);

AudioUnit mySamplerUnit;
result = AUGraphNodeInfo(graph, myNode, 0, &mySamplerUnit);
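For the from-scratch route, here is a rough sketch (error checking omitted; sequence and firstTrack are assumed to be the MusicSequence and MusicTrack from the snippet above) of building an AUSampler -> RemoteIO graph and pointing the sequence at it:
AUGraph graph;
NewAUGraph(&graph);

// Describe the sampler and output units
AudioComponentDescription samplerDesc = {0};
samplerDesc.componentType         = kAudioUnitType_MusicDevice;
samplerDesc.componentSubType      = kAudioUnitSubType_Sampler;
samplerDesc.componentManufacturer = kAudioUnitManufacturer_Apple;

AudioComponentDescription outputDesc = {0};
outputDesc.componentType         = kAudioUnitType_Output;
outputDesc.componentSubType      = kAudioUnitSubType_RemoteIO;
outputDesc.componentManufacturer = kAudioUnitManufacturer_Apple;

AUNode samplerNode, outputNode;
AUGraphAddNode(graph, &samplerDesc, &samplerNode);
AUGraphAddNode(graph, &outputDesc, &outputNode);

AUGraphOpen(graph);
AUGraphConnectNodeInput(graph, samplerNode, 0, outputNode, 0);

AudioUnit samplerUnit;
AUGraphNodeInfo(graph, samplerNode, NULL, &samplerUnit);

AUGraphInitialize(graph);
AUGraphStart(graph);

// Route the sequence through this graph, send its track to the sampler,
// then load the .sf2 preset into samplerUnit as shown above.
MusicSequenceSetAUGraph(sequence, graph);
MusicTrackSetDestNode(firstTrack, samplerNode);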
Context: iOS5 AUSampler AudioUnit
I've been digging around trying to determine if there is a programmatic way to determine the number of presets in a DLS or sf2 file. I was hoping it would be available either through 'AudioUnitGetProperty' or 'AudioUnitGetParameter' for an AUSampler. Then of course I want to be able to switch presets on the fly. The docs don't indicate whether this is possible.
I'm using the standard code for loading DLS/sf2 per TechNote TN2283. The problem is that with lots of sf2 files it is a trial and error process to find out what the presets are.
- (OSStatus)loadFromDLSOrSoundFont:(NSURL *)bankURL withPatch:(int)presetNumber
{
    OSStatus result = noErr;

    // fill out a bank preset data structure
    AUSamplerBankPresetData bpdata;
    bpdata.bankURL  = (CFURLRef)bankURL;
    bpdata.bankMSB  = kAUSampler_DefaultMelodicBankMSB;
    bpdata.bankLSB  = kAUSampler_DefaultBankLSB;
    bpdata.presetID = (UInt8)presetNumber;

    // set the kAUSamplerProperty_LoadPresetFromBank property
    result = AudioUnitSetProperty(self.mySamplerUnit,
                                  kAUSamplerProperty_LoadPresetFromBank,
                                  kAudioUnitScope_Global,
                                  0,
                                  &bpdata,
                                  sizeof(bpdata));

    // check for errors
    NSCAssert(result == noErr,
              @"Unable to set the preset property on the Sampler. Error code:%d '%.4s'",
              (int)result,
              (const char *)&result);

    return result;
}
OK - had an answer from an Apple Core Audio engineer:
"There is no API to retrieve the number of presets. The Sampler AU only loads a single instrument at a time from any SF2 or DLS bank, so it does not "digest" the entire bank file (and so has no knowledge of its complete contents)."