Most efficient way to locate the presence of a substring in a dictionary (NLP) - swift

I'm currently working on a speech recognition feature, where a user can say a command and have that command trigger an event.
The way I have it structured now does work, but mostly because the recognition dictionary is small. It likely will never be millions of commands, but that's no reason to be sloppy.
Here is how it works now:
@ObservedObject var speechText
private var matchables: [String: Int] = ["start the action": 0, // formal
                                         "start action": 0, // informal
                                         "star traction": 0, // common misfire by NLU
                                         "stop the action": 1] // different action
// Call processText with lowercase speechText
// when the observed string value changes.
// Assume text is conversational, such as "Jenny, I like chicken, also device why
// don't you start the action"
func processText(text: String) {
    for (key, value) in matchables {
        if text.contains(key) {
            executeActionByID(value)
        }
    }
}
This will loop through the matchables collection and search for the contents of each key inside of the text value. This works fine on a small dictionary, but becomes cumbersome at scale.
I could theoretically break text into N-Grams and then access the dictionary directly by key, but this is long running recognition, and text might contain a substantial number of words (hundreds?) which may exceed the maximum practical size of the dictionary.
Is there a third, better way to analyze long running streams of text and quickly pick out commands that match a small substring?

Here is my back-of-the-envelope thinking about this problem:
Searching for keys in a Dictionary is really fast (almost constant time). Searching strings for substrings using String.contains(_:) is slow (at least O(n), where n is the length of the string).
As your string length and your number of keys go up, your time to completion is going to go up by O(n*x) (n = string length, x = number of keys).
That's likely to get slow for longer search strings, and total time will grow with the product of the two if both your number of keys and your string length increase.
I'd suggest breaking your string into discrete units to search for (the obvious way is to divide it at spaces and other separators like punctuation). If you do that, you can check whether each unit appears in your dictionary keys. Because your keys are multi-word phrases, the units to look up are runs of consecutive words, no longer than your longest key, rather than single words. That should get you roughly O(n) time performance, since each search for a key in a dictionary runs in nearly constant time.
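The word-splitting idea above can be sketched roughly as follows. This is a minimal illustration in Python rather than Swift, and the names find_commands, MATCHABLES and MAX_WORDS are made up for the example. It assumes commands only need to match on whole-word boundaries, which is a slightly different behaviour from contains.

```python
import re

# Phrase table copied from the question; the integer is the action ID.
MATCHABLES = {
    "start the action": 0,   # formal
    "start action": 0,       # informal
    "star traction": 0,      # common misfire by NLU
    "stop the action": 1,    # different action
}

# Longest key measured in words; no longer run of words can ever match.
MAX_WORDS = max(len(key.split()) for key in MATCHABLES)


def find_commands(text, matchables=MATCHABLES, max_words=MAX_WORDS):
    """Return the action IDs of all key phrases found in text, in order of appearance."""
    # Lowercase and split on anything that is not a letter, digit or apostrophe.
    words = re.findall(r"[a-z0-9']+", text.lower())
    hits = []
    for start in range(len(words)):
        for size in range(1, max_words + 1):
            if start + size > len(words):
                break
            phrase = " ".join(words[start:start + size])
            action = matchables.get(phrase)  # hash lookup, O(1) on average
            if action is not None:
                hits.append(action)
    return hits
```

The number of lookups is about (word count) x (longest key in words), so it no longer depends on how many keys the dictionary holds. For long-running recognition you would probably only re-scan the last few words each time the text changes.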

Related

RedPark Cable readBytesAvailable read twice every time

I have not been able to find this information anywhere. How long can a string be when sent with the TTL version of the Redpark cable?
The following delegate method is called twice when I print something through serial from my Arduino. An example of a string is this: 144;480,42;532,40;20e
- (void) readBytesAvailable:(UInt32)length{
When I use the new method for retrieving available data, [getStringFromBytesAvailable], I only get 144;480,42;532,40; and then the whole function is called again and the string now contains the rest: 20e
The following method works for appending the two strings, but only if the rate of data transmission is 'slow' (once a second; I would prefer at least 10 times a second).
- (void) readBytesAvailable:(UInt32)length{
    if(string && [string rangeOfString:@"e"].location == NSNotFound){
        string = [string stringByAppendingString:[rscMgr getStringFromBytesAvailable]];
        NSLog(@"%@", string);
        finishedReading = YES;
    }
    else{
        string = [rscMgr getStringFromBytesAvailable];
    }
    if (finishedReading == YES)
    {
        //do stuff
    }
    finishedReading = NO;
    string = nil;
}
But can you tell me why the method is called twice if I write a "long" string, and how to avoid this issue?
Since your program fragment runs faster than the time it takes to send a string, you need to capture the bytes and append them to a string.
If the serial data is terminated with a carriage return you can test for it to know when you have received the entire string.
Then you can allow your Arduino to send 10 times a second.
That is just how serial ports work. You can't and don't need to avoid those issues. There is no attempt at any level of the SW/HW to keep your serial data stream intact, so making any assumptions about that in your code is just wrong. Serial data is just a stream of bytes, with no concept of packetization. So you have to deal with the fact that you might have to read partial data and read the rest later.
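To illustrate reading partial data and finishing later, here is a minimal sketch in Python (not Objective-C). The class name MessageAssembler is made up, and it assumes every message ends with the character "e", as in the example string from the question. Each call to feed stands in for one readBytesAvailable callback.

```python
class MessageAssembler:
    """Collects chunks of any size and returns complete terminator-delimited messages."""

    def __init__(self, terminator="e"):
        self.terminator = terminator  # assumed end-of-message marker
        self.buffer = ""              # bytes received but not yet part of a full message

    def feed(self, chunk):
        """Append chunk and return a list of complete messages (terminator included)."""
        self.buffer += chunk
        messages = []
        while True:
            end = self.buffer.find(self.terminator)
            if end == -1:
                break  # no full message yet; keep the remainder for the next call
            cut = end + len(self.terminator)
            messages.append(self.buffer[:cut])
            self.buffer = self.buffer[cut:]
        return messages
```

Because leftover bytes stay in the buffer between calls, it does not matter how the stream is split or how fast the sender is; a single callback may produce zero, one or several messages.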
The serialPortConfig within the redparkSerial header file provided by RedPark does, in fact, give you more configuration control than you may realize. The readBytesAvailable:length method is abstracted, and is only called when one of two conditions is met: rxForwardingTimeout value is exceeded with data in the primary buffer (default set to 100 ms) or rxForwardCount is reached (default set to 16 characters).
So, in your case it looks like you've still got data in your buffer after your initial read, which means that the readBytesAvailable:length method will be called again (from the main run loop) to retrieve the remaining data. I would propose playing around with the rxForwardingTimeout and rxForwardCount until it performs as you'd expect.
As already mentioned, though, I'd recommend adding a flag (doesn't have to be carriage return) to at least the end of your packet, for identification.
Also, some good advice here: How do you design a serial command protocol for an embedded system?
Good luck!

NSString parsing of continuous data

Good morning,
I am retrieving a stream of bytes from a serial device that connects to the iPad. Once connected the supplied SDK will call a delegate method with the bytes that have been forwarded.
The readings forwarded by the serial device via the SDK are in the following format:
!X1:000.0;
Once connected to the serial device the delegated methods will start receiving data immediately - this could be in various states of completion i.e.
:000.00;
What I need to do is establish a concrete way of splitting the readings returned from the serial device so that I can manipulate the data.
Some of the tried options are:
Simply concatenate the received strings for a fixed period and then split the NSString on the ";" character. This is a little inefficient though and does not allow me to manipulate the data dynamically
-(void)receivingDelegateMethod:(NSString *)aString {
    if(counter < 60){
        // stringByAppendingString: returns a new string, so keep the result
        self.PropertyString = [self.PropertyString stringByAppendingString:aString];
    }else{
        NSArray *readings = [self.PropertyString componentsSeparatedByString:@";"];
    }
}
Determine a starting point by looking for the "!" character and then appending the resulting substring to a NSString property. All previous calls to the delegated method will append to this property and then remove the first 10 characters.
I know there are further options such as NSScanner and regular expressions, but I wanted to get the opinion of the community before spending more time on different methods.
Thanks
Make a BOOL flag that indicates that the stream has been initialized, and set it to false. When you receive the next chunk of data, check the flag first. If it is not set, skip all characters until you see an exclamation point '!'. Once you see it, discard everything in front of it, and copy the rest of the string into the buffer. If the "is initialized" flag is set, append the entire string to the buffer without skipping characters.
Once you finish the append, scan the buffer for ! and ; delimited sections. For each occurrence of that pattern, call a designated method with a complete portion of the pattern. You can get fancy, and define your own "secondary" delegate for processing pre-validated strings.
You may need to detect disconnections, and set the "is initialized" flag back to NO.
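The flag-and-buffer approach described above might look something like this. It is only a sketch in Python, with a made-up class name FrameExtractor, and it assumes readings always start with "!" and end with ";" as in the question.

```python
class FrameExtractor:
    """Extracts '!'...';' readings from a stream that may start mid-reading."""

    def __init__(self, start="!", end=";"):
        self.start = start
        self.end = end
        self.buffer = ""
        self.initialized = False  # the "is initialized" flag

    def reset(self):
        """Call on disconnect so the next chunk is synchronized again."""
        self.buffer = ""
        self.initialized = False

    def feed(self, chunk):
        """Append chunk and return every complete reading now in the buffer."""
        if not self.initialized:
            pos = chunk.find(self.start)
            if pos == -1:
                return []          # still inside a partial reading; skip it
            chunk = chunk[pos:]    # discard everything in front of the first '!'
            self.initialized = True
        self.buffer += chunk
        frames = []
        while True:
            begin = self.buffer.find(self.start)
            if begin == -1:
                self.buffer = ""   # nothing usable left
                break
            finish = self.buffer.find(self.end, begin)
            if finish == -1:
                break              # reading not complete yet
            cut = finish + len(self.end)
            frames.append(self.buffer[begin:cut])
            self.buffer = self.buffer[cut:]
        return frames
```

Each returned string is a complete reading such as !X1:000.0; and could be handed to a secondary delegate or callback for parsing.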

insertion sort on a singly linked list

Am I right in thinking that it is not possible to perform insertion sort on a singly linked list?
My reasoning: insertion sort by definition means that, as we move to the right in the outer loop, we move to the left in the inner loop, shift values up (to the right) as required, and insert our current value when done with the inner loop. As a result, an SLL cannot accommodate such an algorithm. Correct?
Well, I'll sound like Captain Obvious, but the answer mostly depends on whether you're OK with keeping all iterations directed the same way as the elements are linked while still implementing a proper sorting algorithm as per your definition. I don't really want to mess with your definition of insertion sort, so I'm afraid you'll really have to think for yourself. At least for a while. It's homework anyway... ;)
OK, here's what I got just before closing the page. You may iterate over an SLL in the reverse direction, but this takes on the order of n*n/2 node visits to reach all n elements, because each step back means walking forward again from the head. So in theory any traversal direction is available for your sorting loops. I guess that pretty much answers your question.
It is doable and is an interesting problem to explore.
The core of the insertion sort algorithm is to create a sorted sequence from the first element and extend it by adding one new element at a time, keeping the sequence sorted, until it contains all the input data.
A singly linked list cannot be traversed backwards, but you can always start from its head to search for the position of the new element.
The tricky part is that when inserting node i before node j, you must handle their neighbor relationships correctly (both node i's and node j's neighbors need to be taken care of).
Here is my code. I hope it is useful for you.
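/* Sorts the list in ascending order in place.
   Node is assumed to have `data` and `next` members. Always returns 0. */
int insertSort(Node **pHead)
{
    if (NULL == pHead || NULL == *pHead)
        return 0; /* empty list: nothing to sort */
    Node *current1 = (*pHead)->next; /* node being inserted */
    Node *pre1 = *pHead;             /* node just before current1 */
    Node *current2 = *pHead;         /* scan position in the sorted part */
    Node *pre2 = *pHead;             /* node just before current2 */
    while (NULL != current1)
    {
        pre2 = *pHead;
        current2 = *pHead;
        /* Scan from the head; this stops at current1 at the latest. */
        while (current2->data < current1->data)
        {
            pre2 = current2;
            current2 = current2->next;
        }
        if (current2 != current1)
        {
            /* Unlink current1 and re-insert it before current2. */
            pre1->next = current1->next;
            if (current2 == *pHead)
            {
                current1->next = *pHead;
                *pHead = current1;
            }
            else
            {
                pre2->next = current1;
                current1->next = current2;
            }
            current1 = pre1->next;
        }
        else
        {
            /* current1 is already in place; advance. */
            pre1 = pre1->next;
            current1 = current1->next;
        }
    }
    return 0;
}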

How to store text as paragraphs in SQLite database in a iPhone app?

In my iPhone app, I have a requirement to store a huge amount of text. I have paragraphs of text to be stored in my database along with the newline characters.
What should I do to store the text as paragraphs in SQLite database?
For example, I want to store paragraphs like the ones below in:
(the mother of the faithful believers) The commencement of the Divine Inspiration to Allah's Apostle was in the form of good dreams which came true like bright day light, and then the love of seclusion was bestowed upon him. He used to go in seclusion in the cave of Hira where he used to worship (Allah alone) continuously for many days before his desire to see his family. He used to take with him the journey food for the stay and then come back to (his wife) Khadija to take his food like-wise again till suddenly the Truth descended upon him while he was in the cave of Hira. The angel came to him and asked him to read. The Prophet replied, "I do not know how to read.
The Prophet added, "The angel caught me (forcefully) and pressed me so hard that I could not bear it any more. He then released me and again asked me to read and I replied, 'I do not know how to read.'
Basically I want to save the paragraphs in database in the same format with carriage returns.
It depends on what you mean by huge and how you're planning on showing the data. The SQLite TEXT field, by default, can store 1 billion bytes.
You could in theory store all of it in a TEXT field in SQLite, then render it in a UIScrollView (or whatever it is you're using to render) and check the performance, memory usage, etc.
If the performance is unacceptable, you can try "chunking" the text into multiple rows and displaying only the records of the text required for the UI.
See the SQLite Limits document:
Maximum length of a string or BLOB
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000). You can raise or lower this value at compile-time using a command-line option like this:
-DSQLITE_MAX_LENGTH=123456789
On the face of it, SQLite doesn't treat newlines any differently than other characters; you can just store the text as-is.
The issue, though, is why you are storing large volumes of raw text in SQLite. If you want to search it or organize it somehow, neither SQLite nor Core Data is likely to be the best choice without first massaging the text into some other form. Alternatively, you'd want to store the raw text on disk and keep some kind of searchable index in the database.
My suggestion would be: if you want to display your text in a web view, add HTML tags to your text. That way you can add paragraphs, new lines and many other effects to your text.
Thanks
So do you want to split the text into paragraphs and store each in its own row, like:
(paragraph_number, text_of_paragraph)
That would be:
create table paragraphs (paragraph_number, text_of_paragraph);
Then, in whatever language you use, split the text into a list of (pn, tp) pairs named l and do something like:
executemany("insert into paragraphs values (?, ?)", l)
or:
for p in l:
    execute("insert into paragraphs values (?, ?)", p)
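For what it's worth, here is a small self-contained version of that idea using Python's built-in sqlite3 module. It is only a sketch: the sample text, the in-memory database and the choice to split paragraphs on blank lines are assumptions made for the example. It also shows that newlines inside a stored value come back unchanged.

```python
import sqlite3

# Sample text: two paragraphs separated by a blank line (an assumption for
# this sketch); the first paragraph also contains an internal newline.
text = (
    "First paragraph, line one.\nFirst paragraph, line two.\n\n"
    "Second paragraph."
)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE paragraphs ("
    "paragraph_number INTEGER PRIMARY KEY, text_of_paragraph TEXT)"
)

# Build (paragraph_number, text_of_paragraph) pairs and insert them in one call.
rows = list(enumerate(text.split("\n\n"), start=1))
conn.executemany("INSERT INTO paragraphs VALUES (?, ?)", rows)
conn.commit()

# Read the paragraphs back in order and rebuild the original text.
stored = [
    row[0]
    for row in conn.execute(
        "SELECT text_of_paragraph FROM paragraphs ORDER BY paragraph_number"
    )
]
restored = "\n\n".join(stored)
```

The same statements work from any SQLite binding, including the C API on iOS; only the calling code changes.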
I would use HTML to represent my paragraphs, e.g.:
Saving the Text
<div>
<p>(the mother of the faithful believers) The commencement of the Divine Inspiration to Allah's Apostle was in the form of good dreams which came true like bright day light, and then the love of seclusion was bestowed upon him. He used to go in seclusion in the cave of Hira where he used to worship (Allah alone) continuously for many days before his desire to see his family. He used to take with him the journey food for the stay and then come back to (his wife) Khadija to take his food like-wise again till suddenly the Truth descended upon him while he was in the cave of Hira. The angel came to him and asked him to read. The Prophet replied, "I do not know how to read.</p>
<p>The Prophet added, "The angel caught me (forcefully) and pressed me so hard that I could not bear it any more. He then released me and again asked me to read and I replied, 'I do not know how to read.</p>
</div>
Loading the Paragraphs
I would load them inside a UIWebView as HTML. You can save the HTML into a file in the app sandbox, say Paragraph1.HTML, and load it as follows:
// this is a user-defined method
-(void)loadDocument:(NSString*)documentName inView:(UIWebView*)webView
{
    // sFilePath is assumed to hold the full path of the HTML file named documentName
    NSURL *url = [NSURL fileURLWithPath:sFilePath];
    NSURLRequest *request = [NSURLRequest requestWithURL:url];
    [webView loadRequest:request];
}
Dispose of the file after loading it; this will save you time and space.
Good luck.

unable to get simple program logic

I'm making an iPhone app in which a sound is played when the same person calls you more than 4 times and you don't pick up the phone. When a call comes in I store its caller ID in a string (or whatever), but my problem is that I can't find the logic to check whether the same user has called four times or more.
Use an NSDictionary (a form of hash table; you need the mutable variant, NSMutableDictionary, to update counts). If the current caller's name isn't there as a key, add it and set the value to a count of 1. If the caller's name already exists as a key in the dictionary, increment the count value by 1. After that, read the count value and do whatever you want depending on the comparison against 4.
But getting the caller's name may require some sort of non-stock OS on an iPhone.
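The counting logic itself is only a few lines. Here is a rough sketch in Python (not Objective-C) using a dictionary-like Counter. The function name register_missed_call and the choice of "4 or more" as the trigger are assumptions for the example, and it says nothing about how the caller ID is obtained.

```python
from collections import Counter

THRESHOLD = 4  # assumed cutoff: trigger on the 4th missed call from one caller

missed_calls = Counter()  # caller ID -> number of missed calls so far


def register_missed_call(caller_id, counts=missed_calls, threshold=THRESHOLD):
    """Record one missed call; return True if the sound should be played."""
    counts[caller_id] += 1  # a missing key starts at 0, so the first call gives 1
    return counts[caller_id] >= threshold
```

Each lookup and increment takes constant time on average, so nothing has to loop over the call history.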
Hm.
Loop through an array of received calls.
Instead of storing the callerId in a string, store it in an array called receivedCalls.
During each incoming call, loop through the array (foreach loop?), looking for the callerId of the current caller.
$count = 0;
foreach ($receivedCalls as $value) {
    if ($value == $callerId) {
        $count++;
    }
}
// check once, after counting, so the sound is played at most one time
if ($count >= 4) {
    // play sound
}
Probably flawed logic but meh. Again, I haven't worked with iPhone apps before so I don't know what kind of language it uses.
No API for that, Sorry.