Full text search in iOS application bundle files - iphone

How would I go about doing a full text search across a bunch of html files in an iOS application bundle? I need to have a lot of web content available offline and I need to be able to do a full text search across all of it. I feel like storing that content in a database will get a little bulky and slow things down significantly.
Thoughts?
--
Thanks,
Brandon

Do keep in mind that, even though it's a mobile device, the iPhone does have a fairly fast processor. All you're doing [said tongue-in-cheek] is searching text. These devices are built for video, music, and images.
In order of increasing amounts of coding on your part, you can:
Use a 3rd-party library (the only one I found is Locayta)
Port a non-Objective-C search library (such as Lucene)
Store your HTML files in a database, and do a SQL search for any search text. Without a bunch of extra coding, this will really only work easily with exact matching on a single search term.
Manually load each file into the app and search the text using NSString's rangeOfString. Add features at will.
Build your own search engine, built off (at least) an index of terms. Start by trimming stop words, stem if you feel the need, decide if you want document relevance or not and possibly pursue vector space algorithms. Adding in autocorrect is probably beyond your app's focus, but nonetheless building your own text search engine would be a fast (run-time) way of providing multi-word search and a big bullet on your resume.
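To make the last option a bit more concrete, here is a minimal sketch of an index of terms with stop-word trimming and multi-word (AND) search. It is written in Python purely for illustration, not in Objective-C, and the names (STOP_WORDS, tokenize, build_index, search) are my own assumptions, not part of any library mentioned above. Stemming and relevance ranking are left out.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption); a real one would be longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags.

    Note: text inside <script> and <style> is collected too; filter it out
    if your files contain any.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tokenize(html):
    """Strip tags, lowercase, split into words, and drop stop words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(documents):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, html in documents.items():
        for term in tokenize(html):
            index[term].add(name)
    return index


def search(index, query):
    """Return the documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The index would be built once (for example at build time, then shipped in the bundle), so a query only has to intersect a few sets instead of scanning every file.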

Eclipse BIRT and Accessibility

I work for a large financial institution and all our web sites need to be accessible for people with disabilities. We are using Eclipse BIRT to generate some reports, and I would like to hear from anyone who has experience making such reports accessible. The main problem is that the report contains a lot of data, and some of it cannot be understood from the value alone: for instance, a string like "123444" may be an account number, a check number or a transaction ID. In a pure HTML page we would either use a dl/dt/dd construct to make the purpose of the data clear, or use ARIA attributes like aria-labelledby.
Another area of concern is the creation of accessible PDF files.
Any help or report on experience will be greatly appreciated.
Given your description I presume that you are focusing on blind users. One of the most popular screen readers for English language use is JAWS by Freedom Scientific. There is a free trial version which you can download for testing, and/or your organization can purchase a copy.
You can read your report with JAWS and find what issues need to be addressed. Proper labeling of buttons and similar controls is probably what developers overlook most often. (e.g. Button123 with Image1A is the submit button, but JAWS can't read the picture of words in the image, soo...)
Speaking from experience (I am closely associated with a blind computer user) stay away from PDF if you want it to be blind friendly. Web pages and text documents are much more blind friendly.
PDF works to create a version of a document that is static for visual appeal. In the process it chops up the text. When JAWS tries to read it, it will read half of one item, then half of another, then maybe jump back to finish the last 1/3 of the first, leaving the middle 1/6 for last. It is painful. Of course a PDF that does not have a text layer (i.e. a picture of a Word document) is not readable by any screen reader.

Is there a way to convert epub format to images?

I need a tool to programmatically convert epub files to a series of images. The output should look like screenshots taken on a canonical device (for this application, an iPad). I haven't been able to find any tools that do something like this.
So what I'd really like (1) is a tool that does that. But assuming that I'm correct that no such tool exists, is there (2) a library (preferably a Perl module, but I'm not that picky) that will read and render ePub?
Obviously, rolling my own I could combine tools for unzipping, reading html, reading xml, putting everything in the right order, and rendering html within certain constraints. Though I'd rather not do that, and if that's the only option I'll have to go on to look for a tool that will do the last part of that or I'll have to create that too.
Any leads on (1), or failing that (2)?
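For reference, the "unzip, read the XML, put everything in the right order" part of the roll-your-own route is fairly small. The sketch below is in Python rather than Perl and only relies on the standard EPUB layout: META-INF/container.xml names the package (OPF) file, whose manifest maps ids to files and whose spine gives the reading order. The function name spine_documents is made up for this example, and rendering the resulting XHTML to images is not covered.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub):
    """Return the archive paths of the content documents in reading order.

    `epub` is a path or a binary file-like object, as accepted by
    zipfile.ZipFile.
    """
    with zipfile.ZipFile(epub) as archive:
        # META-INF/container.xml points at the package (OPF) file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")

        package = ET.fromstring(archive.read(opf_path))
        # The manifest maps item ids to hrefs relative to the OPF file.
        hrefs = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }
        base = posixpath.dirname(opf_path)
        # The spine lists item ids in reading order.
        return [
            posixpath.normpath(posixpath.join(base, hrefs[ref.get("idref")]))
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

Each returned path can be read from the same archive and handed to whatever HTML renderer ends up producing the page images. Hrefs are used as written in the OPF file; percent-encoded names would need decoding first.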
Apologies if what I'm about to type is just crazy-talk on my part--in fact, I'm pretty sure it is--but perhaps something like this might work and I'm kind of interested in knowing how well it might work for you:
Use Frank (https://github.com/moredip/Frank) to control the iOS Simulator on a Mac. Program it to open up the EPUB docs you need.
All you need then is something to automate the taking of the screen shots. Obviously, these will look like the EPUBs are being rendered in an iPad (or an iPhone if you wish--the iOS Simulator does both, of course).
Automating the screenshots can probably be done with AppleScript, although the hard part might be getting it to talk to Frank. Worst case, you can tell Frank to pause for 5 seconds after it loads each page and tell AppleScript to take a screen shot every five seconds. That sucks, but if you're desperate, it will get it done. It's also possible Frank can somehow make the screenshots happen--I haven't used it enough to know.
Pandoc can convert from EPub to LaTeX (and therefore to PDF) or to any number of other formats. Conceptually this should be a type (1) solution.
It depends on your definition of "look like": do you want the user chrome, or just the EPUB rendering for a given screen size?
I would check out the various EPUB readers for your platform of choice, size the window to your preferred dimensions, and then just "print" the EPUB to a virtual printer that outputs to image files. On Windows I use imageprint.
You could easily make a "frame" from an iPad product shot and place your screenshots within that. The only thing missing would be, as I said, the user chrome.

How can I edit PDF files in an iOS application?

In my iPhone / iPad application, I show a person's medical reports in the form of a PDF. I have saved the reports in the documents directory and am reading them from there.
I want the user to be able to add or edit comments on these PDFs, as well as be able to highlight certain sections in the PDF. After editing, the application should be able to save the PDF back into the documents directory.
Is this possible within an iOS application? If so, how? Is this a task for Core Graphics?
Editing a PDF directly on the iPad/iPhone is a rather big job, because the standard API only supports displaying it (and little more). If you want to do anything beyond that, you need to invest a huge amount of time in implementing generic PDF-handling code.
There are open-source libraries that handle this, e.g. this one. I don't know if it fits your needs, though.
A better idea, in my opinion, is to create a native UI that shows the data contained in the PDF file using the standard Cocoa Touch UIKit, and to generate the PDF once the user is done so that they can export it back. That way, you don't have to write complicated PDF-handling code.
In any case, it's not a good idea to show a generic PDF on the iPhone, because the screen is so small. (The iPad is a different question, especially if you expect the user to be familiar with the particular format of your PDF.) A dedicated UI would be much better.

Product idea/approach : Folder based disk organization

Sweet.. I bought myself a 1TB portable hard drive this week. Don't you just love how much data you can store on one of these disks? The fact that I can store my Blu-ray rips on my portable hard disk and that my LG LCD TV can play HD rips right from the drive - that's amazing practicality right there! However, life, it seems, is never so simple. I have 100s of movies unorganized in one huge folder, which is exactly what I needed to annoy myself while browsing it on my TV to play a single movie. That got me thinking...
What if I had an automated way to organize movies into folders such that my folder-browsing-on-a-lcd-tv-or-a-comp would make my life a little easier?
I started thinking about this... I browsed a little in this context and I realized that if only I could "tag my movies somehow and create folders on-the-fly based on tags using hardlinks", I would have addressed my problem. I googled a bit to find software that works in the above fashion, only to find none.
A few more days of serious thought (as you know by now.. I think a lot.. and I guess this question is starting to sound like a blog rant/post of sorts...), in the interest of humanity, I thought I should come up with a generic way to address this: What if someone wanted to organize photos... organize music.. organize software?!
Turned my grey cells off for a while and here is an approach I came up with to solving my what-if scenario.
Tag / Group tag individual files (rely on a slick GUI to do it fast and do it good) - Adobe Flex/Eclipse RCP to do this?
Create hardlinks to each of the tagged files.
The first point is self-explanatory. The second (coz I am talking windows here), refers to making use of mklink.exe.
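As a rough sketch of the second point, the snippet below creates one hard link per tag under an "all-tags" folder. It uses Python's os.link instead of shelling out to mklink.exe, only builds the first folder level, and the function name link_into_tag_folders is invented for this example.

```python
import os


def link_into_tag_folders(file_path, tags, root):
    """Hard-link file_path into root/all-tags/<tag>/ for every tag.

    Returns the list of link paths. Hard links only work when root is on
    the same volume as the original file.
    """
    links = []
    name = os.path.basename(file_path)
    for tag in tags:
        folder = os.path.join(root, "all-tags", tag)
        os.makedirs(folder, exist_ok=True)
        link = os.path.join(folder, name)
        if not os.path.exists(link):
            # Same effect as `mklink /H <link> <file_path>` on Windows.
            os.link(file_path, link)
        links.append(link)
    return links
```

Calling it with "Transformers.avi" and the tags "english" and "bluray" would give [root]/all-tags/english/Transformers.avi and [root]/all-tags/bluray/Transformers.avi, both pointing at the same data on disk.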
Consider a scenario where I have 2 movie files: "Transformers.avi", tagged as "english, action, bluray, sci-fi, imdb-top-50, must-watch-with-kids", and "The Specialist.avi", tagged as "english, bluray, thriller, adult". Here are a few of the locations where I want Transformers to be found:
[root directory]->all-tags->english
[root directory]->all-tags->bluray
[root directory]->all-tags->english->all-tags->bluray
[root directory]->all-tags->bluray->all-tags->action
[root directory]->all-tags->english->all-tags->action->bluray->all-tags->imdb-top-50
Given that Windows has a limit of 1024 hardlinks to a single file, I probably would be allowed 7 unique tags per file. Each sub-folder will have an "all-tags" folder. Naming it "all-tags" makes it easy to find when the listing is ordered by name.
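How many tags actually fit under that limit depends on how the nested folders are counted. As a back-of-the-envelope check, here is a small Python sketch under two assumed readings of the layout: either every ordering of tags gets its own path (english->bluray and bluray->english are different folders), or every combination of tags gets exactly one folder. The function names and both readings are my assumptions, and the 1024 default is the figure quoted above.

```python
from math import comb, perm


def links_if_order_matters(n_tags):
    """Links per file when every ordered chain of distinct tags is a path."""
    return sum(perm(n_tags, k) for k in range(1, n_tags + 1))


def links_if_order_ignored(n_tags):
    """Links per file when each combination of tags gets one folder."""
    return sum(comb(n_tags, k) for k in range(1, n_tags + 1))  # 2**n - 1


def max_tags(count_links, limit=1024):
    """Largest number of tags whose link count stays within the limit."""
    n = 0
    while count_links(n + 1) <= limit:
        n += 1
    return n
```

With these definitions, 5 tags need 325 links and 6 tags need 1956 when order matters, so max_tags(links_if_order_matters) is 5. When order is ignored, 10 tags need 1023 links, so max_tags(links_if_order_ignored) is 10. A different folder layout would give a different ceiling.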
I believe this approach, once automated so that you only configure the tags you want and the hardlinks are created for you, would help you organize stuff effectively.
I don't know if there are better things out there. I would like your inputs on this approach and other possible ideas. I would like to gather inputs here and release something to sourceforge for everyone to use in a couple of weeks. I am sure, I can count on your positive response as always.
I believe hardlinks are not a good approach. Reason? A standalone player won't play them, and I wouldn't like a program that's made for tagging to tell me to stop making so many tags because of a Windows limitation on hardlinks (remember that each tag will increase the number of links exponentially).
Plus, "help" is not a good tag.
And I've had an idea once that I'm still planning to make some day to sort my own files - put the files in a big storage each below a GUID foldername (filename untouched) and store metadata in a sqlite database to be used by a smart file browser.
I was considering doing something similar to this with music, for detecting duplicate songs and auto-organize functionality.
For your application, I wouldn't recommend using any shell programs through Java. Exception handling becomes difficult, and your application becomes bound by the shell interface and implementation (i.e. windows versions or installations affect your application behavior).
I would use a database with a few tables: Files, Tags, and an association table.
The Files table would list the physical location of each file, the filename, and a unique identifier. This way, you can maintain information about each file without having to modify it for every tag association.
The Tags table would list each tag, and any metadata you want to store for each tag.
A third table, maybe 'FileTags', would store the association between tags and files. When adding tags to the stack, you would add a condition to the WHERE clause, and the list of files with all of the tags would be returned. This structure would also open your codebase up to other designs, such as include/exclude (autocomplete with X buttons), or possibly search.
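Here is a rough sketch of those three tables and the "files with all of these tags" query. It uses Python with SQLite for brevity rather than Java; the column names and the helpers add_file and files_with_all_tags are assumptions for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Files (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL,          -- physical location
    name TEXT NOT NULL           -- filename
);
CREATE TABLE Tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE FileTags (
    file_id INTEGER NOT NULL REFERENCES Files(id),
    tag_id  INTEGER NOT NULL REFERENCES Tags(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def add_file(db, path, name, tags):
    """Insert a file and associate it with each tag, creating tags as needed."""
    file_id = db.execute(
        "INSERT INTO Files (path, name) VALUES (?, ?)", (path, name)
    ).lastrowid
    for tag in set(tags):
        db.execute("INSERT OR IGNORE INTO Tags (name) VALUES (?)", (tag,))
        tag_id = db.execute(
            "SELECT id FROM Tags WHERE name = ?", (tag,)
        ).fetchone()[0]
        db.execute(
            "INSERT INTO FileTags (file_id, tag_id) VALUES (?, ?)",
            (file_id, tag_id),
        )
    return file_id


def files_with_all_tags(db, tags):
    """Return the names of files that carry every one of the given tags."""
    tags = list(set(tags))
    if not tags:
        return []
    placeholders = ", ".join("?" for _ in tags)
    rows = db.execute(
        f"""
        SELECT f.name
        FROM Files f
        JOIN FileTags ft ON ft.file_id = f.id
        JOIN Tags t ON t.id = ft.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY f.id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY f.name
        """,
        (*tags, len(tags)),
    )
    return [row[0] for row in rows]
```

To try it, open a connection with sqlite3.connect(":memory:") and run db.executescript(SCHEMA) first. Each tag added to the stack just lengthens the IN list, and the HAVING clause keeps only the files that match all of them.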
If implemented in Java, your app would be platform independent, and would allow a very large number of tags and files. You can then use the system default application for opening the media file, and the user can make the selection in their native OS.
Reiser4?
...
(I mean nevermind Hans, but the tech...)
[disclaimer: Not a hacker. I know nothing of programming/coding, never mind filesystems & databases. I can barely code decent HTML even, if at all. Hey y'all! :D]
[footnote: does plain HTML5 work here? Too lazy to close my tags hehe :p]

Is there an alternative to open-xml sdk to generate word documents

I'm trying to generate Word documents using the Open XML SDK. When the documents are small this is no problem (and rather easy). When the documents become larger (500+ pages) I notice that performance (duration, memory usage, ...) degrades significantly.
Googling this problem, I came across some posts that point out the same problem. For Excel there is a solution with SpreadsheetGear.
I would like to know if there is a Word alternative to this, or if there are other solutions to generate Word documents?
Thanks,
Jelle
I've written a blog post series on generating Open XML WordprocessingML documents. The approach that I take is that you create a template Word document, insert content controls, and then write XPath expressions in those content controls to specify the XML to pull from a source XML data file. I've also explored another approach where you write C# code in Open XML content controls. That approach also works.
http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/
-Eric
You might look at http://docx.codeplex.com/
On Java, you could use docx4j. If you were brave, you could create DLLs for it via IKVM...
I decided to go with Aspose Words. It is really fast and not very demanding on resources (CPU, memory). Its disadvantage is that it is quite expensive. I also investigated SoftArtisans OfficeWriter. The possibilities are the same, but because the company I'm currently working for already used other Aspose components, we decided to go with Aspose Words.