Recommendations for a Light Summary Algorithm on Ionic - ionic-framework

I'm trying to upload large texts to my database. In this case they can't be stored as text files, because they are generated dynamically on each use of the app, and that method could create hundreds of files.
The main problem is that MySQL limits the number of characters a field can hold.
I saw that some people recommend MD5 or SHA to reduce a big text to a short string, but that only works one way: the original text can't be recovered from the hash.
Is there another way to do this, or another light summary algorithm that can be reversed?

Related

Word database for iOS custom keyboard

I am writing a custom keyboard for an Asian language, and I have a word database with over half a million words. For now I use Realm and query it to give word suggestions: when the user types the first few letters, the keyboard searches the database and suggests words based on the priority value assigned to each word. But this seems inefficient compared to other keyboards in the App Store, and I can't find any concrete approach to this issue. Can anyone point me in the right direction for making word search faster in a custom iOS keyboard?
I haven't tried Core Data, but Realm is generally considered faster than Core Data.
First, the type of storage: plist files or plain .txt files wouldn't be a bad choice, and storing the words sorted in ASCII (character-code) order would help a lot.
Second, you need an algorithm that can quickly narrow the list down to the group of words matching what the user has typed. With the words sorted by character code, you can do this with a binary search.
Here is an example of a binary search algorithm:
It's also worth searching around for other algorithms.
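A minimal C sketch of the binary-search idea, assuming the words are held in memory as an array of strings sorted with strcmp (byte order, which is ASCII order for ASCII text; byte-sorted UTF-8 also works for prefix matching). The function names are hypothetical and not part of any iOS API. The search finds the first word that is not less than the typed prefix; because the list is sorted, every word sharing that prefix sits in one contiguous run starting there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first word in `words` (sorted by strcmp) that
 * compares >= `prefix`, or `count` if there is none. O(log n) comparisons. */
size_t lower_bound_prefix(const char *const *words, size_t count, const char *prefix)
{
    size_t lo = 0, hi = count;

    assert(words != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (strcmp(words[mid], prefix) < 0)
            lo = mid + 1;                  /* answer is strictly right of mid */
        else
            hi = mid;                      /* mid may be the answer */
    }
    return lo;
}

/* Finds the run of words that start with `prefix`. Stores the index of the
 * first match in *first and returns the number of matches (0 if none). */
size_t prefix_range(const char *const *words, size_t count, const char *prefix,
                    size_t *first)
{
    size_t plen = strlen(prefix);
    size_t start = lower_bound_prefix(words, count, prefix);
    size_t end = start;

    /* All words with this prefix are adjacent in sorted order, beginning at
     * the lower bound, so a forward scan collects exactly the matches. */
    while (end < count && strncmp(words[end], prefix, plen) == 0)
        end++;
    *first = start;
    return end - start;
}
```

Ranking the matches by the per-word priority values mentioned in the question would be a separate step over the returned range.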

How do you generate a CAD geometry of randomly oriented objects?

How can one generate CAD geometries of randomly oriented, randomly sized 3D objects? I need to model thousands to millions of randomly sized, randomly oriented rectangles.
I have not yet come across any CAD tool that lets you enter =rand() functions as dimensions. Could one approach be to have a CAD program import a CSV file of randomly generated parameter values?
In SolidWorks, you can have model parameters (dimension lengths/angles, constraints, etc.) stored in an Excel spreadsheet called a Design Table. Each row in the spreadsheet represents a different configuration of your model, and each column a different parameter. You can use Excel's built-in capabilities or an export-capable tool of your choosing to generate the configurations according to your desired distribution. I don't recall off the top of my head the easiest way to get a large number of instances with different configurations into the same assembly, but you haven't really told us what you're trying to accomplish, so I couldn't give you specific recommendations anyway.
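As an illustration of generating the parameter table outside Excel, here is a small C sketch that writes one CSV row per configuration. The column names, value ranges, and uniform distributions are placeholders: you would rename the columns to match the dimension names in your own model, and this sketch does not reproduce the exact layout SolidWorks expects for a Design Table.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a pseudo-random double in [lo, hi). */
static double uniform(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / ((double)RAND_MAX + 1.0));
}

/* Writes a header row plus `rows` rows of random rectangle parameters.
 * Seeding inside the function makes the table reproducible for a given seed.
 * Returns the number of data rows written, or -1 on a write error. */
int write_random_configs(FILE *out, int rows, unsigned seed)
{
    assert(out != NULL && rows >= 0);
    srand(seed);
    /* Hypothetical column names; adapt them to your model's parameters. */
    if (fprintf(out, "config,length_mm,width_mm,rot_x_deg,rot_y_deg,rot_z_deg\n") < 0)
        return -1;
    for (int i = 0; i < rows; i++) {
        double length = uniform(1.0, 50.0), width = uniform(1.0, 50.0);
        double rx = uniform(0.0, 360.0), ry = uniform(0.0, 360.0), rz = uniform(0.0, 360.0);
        if (fprintf(out, "cfg%d,%.3f,%.3f,%.2f,%.2f,%.2f\n",
                    i + 1, length, width, rx, ry, rz) < 0)
            return -1;
    }
    return rows;
}
```

Picking three rotation angles uniformly does not give uniformly distributed orientations; if that matters for your model, sample a random rotation (for example a uniform random unit quaternion) and convert it to the angles your tool expects.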
If you have a specific CAD tool, you can often find documentation on its internal file format. With a little experimentation you can sometimes write a small external program that generates the header of the CAD file, then loops thousands or millions of times generating each individual object, and finally writes the lines needed to complete the file. That can sometimes be easier than trying to force a tool to do something its designers never expected, and it might also let you use the software of your choice to generate the file.
I would suggest starting small. Use the CAD tool to create a file with two or three of your rectangles. Save and inspect the contents of the file to see that it matches your understanding of the needed format. Then try externally creating what should be the same file and verify your version is correctly accepted.
Keep in mind that some tool designers never expected anyone to want thousands or millions of anything, so I would suggest sneaking up on the problem. Try doubling the number of items, check that it works as expected, and repeat this again and again until you either reach millions or find that the CAD tool can't handle it.
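To make the header/loop/footer approach concrete, the C sketch below targets OpenSCAD's plain-text .scad language. OpenSCAD is swapped in here only because it is a text-based CAD format that is easy to generate; it is not a tool mentioned in the answer, and a native SolidWorks or other format will look quite different. Each rectangle is modelled as a thin plate (a cube 0.1 units thick) with a random position, rotation, and size; the ranges and the thickness are arbitrary assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a pseudo-random double in [lo, hi). */
static double frand(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / ((double)RAND_MAX + 1.0));
}

/* Writes an OpenSCAD script: a header line, one statement per randomly
 * placed, rotated, and sized thin plate inside a cube of side `space`,
 * then a closing footer line. Returns 0 on success, -1 on a write error. */
int write_random_plates_scad(FILE *out, int count, double space)
{
    assert(out != NULL && count >= 0 && space > 0.0);
    if (fprintf(out, "union() {\n") < 0)                 /* header */
        return -1;
    for (int i = 0; i < count; i++) {                    /* one object per pass */
        double x = frand(0.0, space), y = frand(0.0, space), z = frand(0.0, space);
        double ax = frand(0.0, 360.0), ay = frand(0.0, 360.0), az = frand(0.0, 360.0);
        double w = frand(0.5, 5.0), h = frand(0.5, 5.0);
        if (fprintf(out,
                    "  translate([%.3f, %.3f, %.3f]) rotate([%.2f, %.2f, %.2f]) "
                    "cube([%.3f, %.3f, 0.1], center = true);\n",
                    x, y, z, ax, ay, az, w, h) < 0)
            return -1;
    }
    return fprintf(out, "}\n") < 0 ? -1 : 0;             /* footer */
}
```

Seed the generator with srand() once before calling it. Because the object count is a parameter, doubling it between test runs is a one-argument change.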

Teaching OCR to understand NSA and FISC redactions

I'm maintaining an archive of the heavily redacted documents coming out of the Foreign Intelligence Surveillance Court.
They come with big sections of text that look like this:
When the OCR engine processes these sections, you get text like:
production of this data on a daily basis for a period of 90 days. The sole purpose of this
production is to obtain foreign intelligence information in support of
individual authorized investigations to protect against international terrorism and
So in the OCR output, the blacked-out spots simply become missing words. Sometimes the remaining words still form a grammatically correct sentence with a different or strange meaning (like the one above); other times the resulting sentences make no sense. Either way it's a problem. It would be much better if the OCR engine could return X's or Unicode blocks like ▮▮▮▮ for these spots instead.
The result I'd like is something like:
production of this data on a daily basis for a period of 90 days. The sole purpose of this
production is to obtain foreign intelligence information in support of XXXXXXXXXXX
individual authorized investigations to protect against international terrorism and
My question is how to go about getting these X's. Is there a way to analyze the images to identify the black spots? Is there a way to replace them with X's or some better Unicode character? I'm open to any ideas that make this look right, but neither image editing nor hacking deep inside the OCR engine is a strong suit of mine.
You may want to train Tesseract to recognize those long blobs. Depending on the length of a blob, you would assign a different number of 'X' characters. See TrainingTesseract3 for the training process.
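Training Tesseract is the approach suggested above; as a rough complement, the blob-length-to-X mapping can also be illustrated outside the OCR engine by finding long dark runs yourself. The C sketch below is not part of Tesseract and rests on several assumptions: it scans a single row of an 8-bit grayscale image (0 = black, 255 = white) for runs of dark pixels at least `min_run` wide, then converts a run's width into a number of X's using an assumed average character width. A real page would need 2D blob detection (checking that the dark region is roughly one text line tall) and a threshold tuned to your scans.

```c
#include <assert.h>
#include <stddef.h>

/* Scans one image row for runs of at least `min_run` consecutive pixels
 * darker than `threshold`. Stores up to `max_runs` run starts/lengths and
 * returns the total number of qualifying runs (which may exceed max_runs). */
size_t find_dark_runs(const unsigned char *row, size_t width, unsigned char threshold,
                      size_t min_run, size_t *starts, size_t *lengths, size_t max_runs)
{
    size_t found = 0, i = 0;

    assert(row != NULL || width == 0);
    while (i < width) {
        if (row[i] >= threshold) {         /* light pixel: not part of a blob */
            i++;
            continue;
        }
        size_t start = i;
        while (i < width && row[i] < threshold)
            i++;                           /* extend the dark run */
        if (i - start >= min_run) {        /* ignore short runs such as letter strokes */
            if (found < max_runs) {
                starts[found] = start;
                lengths[found] = i - start;
            }
            found++;
        }
    }
    return found;
}

/* Converts a redaction width in pixels to a count of placeholder 'X'
 * characters, rounded to the nearest whole character, minimum one. */
size_t x_count_for_blob(size_t blob_width_px, size_t avg_char_width_px)
{
    assert(avg_char_width_px > 0);
    size_t n = (blob_width_px + avg_char_width_px / 2) / avg_char_width_px;
    return n > 0 ? n : 1;
}
```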

Huge dictionary with random selections: iPhone Dev

I am making an app that will have a very large dictionary of words I choose (so that the words aren't too complicated), and I want it to pick words at random. I don't have a problem with the random selection, but what would be the best way to store all these words, and how? I feel like using an NSMutableArray would take up too much memory, since it means creating thousands of objects, so what else can I use? Thanks for your help.
Core Data is your best option, or managing your own SQLite database. Check out a Core Data tutorial or an SQLite on iOS tutorial.
If all the application needs to do is access words at random (so no key-based queries or updates), an alternative to Core Data and SQLite is to fseek() to a random location in a flat text file of newline-delimited words and then read out the next complete word, possibly with fscanf(dict,"%63s\n%63s\n",partial_word,full_word) (the 63-character widths assume 64-byte buffers and keep an unusually long line from overflowing them).
Deal with EOF by retrying with a different random number, or limit the fseek() range so that it never lands on the last word in the file.
One issue with the above outline is that words won't be selected uniformly: there is a bias toward words that follow long words. If that is a concern, discarding strlen(partial_word) words (or a larger random number of words) before keeping one might help the distribution.
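The fseek()/fscanf() approach above can be sketched in C as follows. The function name random_word, the 64-byte word limit, and the retry count are illustrative assumptions. The sketch keeps the bias described above, and because the word under the random offset is always discarded, the first word in the file can never be picked (a dummy first line avoids that).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Buffers hold up to 63 characters plus the terminating '\0'; the %63s
 * widths in the fscanf() format below must stay in sync with this. */
#define MAX_WORD 64

/* Picks a word from a newline-delimited word file by seeking to a random
 * byte offset, discarding the (possibly partial) word found there, and
 * returning the next complete word in `out`.
 * Returns 0 on success, -1 if the file is empty or every try hit EOF. */
int random_word(FILE *dict, char out[MAX_WORD], int max_tries)
{
    char partial_word[MAX_WORD];
    long size;

    assert(dict != NULL && out != NULL);
    if (fseek(dict, 0L, SEEK_END) != 0 || (size = ftell(dict)) <= 0)
        return -1;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        /* RAND_MAX may be as small as 32767, so combine two rand() calls
         * to reach offsets in larger files. */
        unsigned long long r = (unsigned long long)rand() * ((unsigned long long)RAND_MAX + 1u)
                               + (unsigned long long)rand();
        long offset = (long)(r % (unsigned long long)size);

        if (fseek(dict, offset, SEEK_SET) != 0)   /* also clears the EOF flag */
            return -1;
        /* First %63s: rest of the word under the offset (thrown away).
         * Second %63s: the next complete word. Fewer than two conversions
         * means we ran into EOF, so retry with a new offset. */
        if (fscanf(dict, "%63s %63s", partial_word, out) == 2)
            return 0;
    }
    return -1;
}
```

Call srand() once at startup; rand() is adequate for picking dictionary words but not for anything security-sensitive.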

How can I create a web page that shows aggregate data from Sawtooth surveys?

I'm guessing this won't apply to 99.99% of the people who see this. I've been doing some Sawtooth survey programming at work, and I need to create a web page that shows aggregate data from the completed surveys. I was wondering if anyone else has done this using the flat files that Sawtooth generates, and how you went about it. I only know very basic Perl, and the server I use doesn't have PHP, so I'm somewhat at a loss for solutions. Anything you've got would be helpful.
Edit: The problem with offering example files is that it's more complicated than that. It's not a single file, and the data occasionally gets moved to a different file with a different format. Those added complexities are why I'm asking this question.
Doesn't Sawtooth export into CSV format? There are many Perl parsers for CSV files. Just about every language has a CSV parser or two (or twelve), and MS Excel can open them directly, and they're still plaintext so you can look at them in any text editor.
I know our version of Sawtooth at work (which is admittedly very old) exports Sawtooth data into SPSS format, which can then be exported into various spreadsheet formats including CSV, if all else fails.
If you have a flat (fixed-width field) file, you can easily parse it in Perl using regular expressions or just taking substrings of each line one at a time, assuming you know the width of the fields. Your question is too general to give much better advice, sorry.
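For the fixed-width case, here is a sketch of the substring approach in C (the same idea maps directly to Perl's substr or unpack). The column positions, the numeric answer codes, and the function names are hypothetical; you would take the real start positions and widths from the layout Sawtooth gives you for your study. tally_column produces simple counts per answer code, the kind of aggregate a results page might show.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies the field at [start, start + width) of a fixed-width record into
 * `out` (out_size bytes), trimming trailing blanks. A short line that ends
 * before the field yields an empty string. */
void fixed_field(const char *line, size_t start, size_t width, char *out, size_t out_size)
{
    size_t len, n = 0;

    assert(line != NULL && out != NULL && out_size > 0);
    len = strlen(line);
    if (start < len) {
        n = len - start < width ? len - start : width;
        if (n > out_size - 1)
            n = out_size - 1;
        memcpy(out, line + start, n);
    }
    /* Trim trailing spaces and any line ending that fell inside the field. */
    while (n > 0 && (out[n - 1] == ' ' || out[n - 1] == '\n' || out[n - 1] == '\r'))
        n--;
    out[n] = '\0';
}

/* Counts answer codes 0..max_code-1 in one fixed-width column over `nlines`
 * records. `tally` must have max_code + 1 zeroed slots; blank or out-of-range
 * answers are counted in tally[max_code]. */
void tally_column(const char *const *lines, size_t nlines, size_t start, size_t width,
                  long *tally, int max_code)
{
    char buf[32];

    assert(tally != NULL && max_code > 0);
    for (size_t i = 0; i < nlines; i++) {
        char *end;
        long code;

        fixed_field(lines[i], start, width, buf, sizeof buf);
        code = strtol(buf, &end, 10);      /* skips leading spaces */
        if (end == buf || code < 0 || code >= max_code)
            tally[max_code]++;             /* blank or unexpected value */
        else
            tally[code]++;
    }
}
```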
Matching the values up from a plaintext file with meta-data (variable names and labels, value labels etc.) is more complicated unless you already have the meta-data in some script-readable format. Making all of that stuff available on a web page is more complicated still. I've done it and it can be a bit of a lengthy project to roll your own. There are packages you can buy, like SDA, which will help you build a website where people can browse and download your survey data and view your codebooks.
Honestly though the easiest thing to do if you're posting statistical data on a website is get the data into SPSS or SAS or another statistics package format and post those files for download directly. Then you don't have to worry about it.