How to store text as paragraphs in an SQLite database in an iPhone app? - iphone

In my iPhone app, I have a requirement to store a huge amount of text. I have paragraphs of text to be stored in my database along with the newline characters.
What should I do to store the text as paragraphs in SQLite database?
For example, I want to store paragraphs like the ones below:
(the mother of the faithful believers) The commencement of the Divine Inspiration to Allah's Apostle was in the form of good dreams which came true like bright day light, and then the love of seclusion was bestowed upon him. He used to go in seclusion in the cave of Hira where he used to worship (Allah alone) continuously for many days before his desire to see his family. He used to take with him the journey food for the stay and then come back to (his wife) Khadija to take his food like-wise again till suddenly the Truth descended upon him while he was in the cave of Hira. The angel came to him and asked him to read. The Prophet replied, "I do not know how to read.
The Prophet added, "The angel caught me (forcefully) and pressed me so hard that I could not bear it any more. He then released me and again asked me to read and I replied, 'I do not know how to read.'
Basically, I want to save the paragraphs in the database in the same format, with carriage returns.

It depends on what you mean by huge and how you're planning on showing the data. The SQLite TEXT field, by default, can store 1 billion bytes.
You could in theory store all of it in a TEXT field in SQLite, then render it in a UIScrollView (or whatever it is you're using to render) and check the performance, memory usage, etc.
If the performance is unacceptable, you can try "chunking" the text into multiple rows and displaying only the records of the text required for the UI.
See the SQLite Limits document:
Maximum length of a string or BLOB
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000). You can raise or lower this value at compile-time using a command-line option like this:
-DSQLITE_MAX_LENGTH=123456789

On the face of it, SQLite doesn't treat newlines any differently than other characters; you can just store the text as-is.
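As a quick check of the claim that newlines survive unchanged, here is a minimal sketch using Python's built-in sqlite3 module rather than the iPhone SQLite C API. The table name passages and the helper round_trip are made up for illustration; the point is only that a bound TEXT parameter comes back byte-for-byte, including \n and \r\n.

```python
import sqlite3


def round_trip(text):
    """Store text in an in-memory SQLite database and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one TEXT column holding the whole passage.
        conn.execute("CREATE TABLE passages (id INTEGER PRIMARY KEY, body TEXT)")
        # Binding the value as a parameter avoids any quoting or escaping issues.
        conn.execute("INSERT INTO passages (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM passages WHERE id = 1").fetchone()
        return stored
    finally:
        conn.close()


sample = "First paragraph, line one.\nStill the first paragraph.\r\n\r\nSecond paragraph."
print(round_trip(sample) == sample)
```

The same idea applies from Objective-C: bind the string as a parameter instead of building the SQL text by hand, and the line breaks need no special treatment.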
The issue, though, is why you are storing large volumes of raw text in SQLite. If you want to search it or organize it somehow, neither SQLite nor Core Data is probably the best choice without first massaging the text into some other form. Alternatively, you could store the raw text on disk and keep some kind of searchable index in the database.

My suggestion: if you want to display your text in a web view, add HTML tags to your text. That way you can add paragraphs, new lines, and many other effects to your text.
Thanks

So do you want to split the text into paragraphs and store each in its own row, like:
(paragraph_number, text_of_paragraph)
That would be:
create table paragraphs (paragraph_number, text_of_paragraph);
Then, in whatever language you use, split the text into a list of (pn, tp) tuples named l and do:
executemany("insert into paragraphs values (?, ?)", l)
or:
for p in l:
    execute("insert into paragraphs values (?, ?)", p)
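The row-per-paragraph idea above can be sketched end to end with Python's sqlite3 module. This is only an illustration under some assumptions: each non-empty line of the input is treated as one paragraph (as in the question's example), and the helper names store_paragraphs and load_paragraphs are invented here.

```python
import sqlite3


def store_paragraphs(conn, text):
    """Split text into paragraphs and insert one row per paragraph."""
    # Assumption: every non-empty line is one paragraph.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs ("
        "paragraph_number INTEGER PRIMARY KEY, text_of_paragraph TEXT)"
    )
    rows = list(enumerate(paragraphs, start=1))  # [(1, "..."), (2, "..."), ...]
    conn.executemany("INSERT INTO paragraphs VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def load_paragraphs(conn, first, last):
    """Fetch only the paragraphs numbered first..last, in order."""
    cursor = conn.execute(
        "SELECT text_of_paragraph FROM paragraphs "
        "WHERE paragraph_number BETWEEN ? AND ? ORDER BY paragraph_number",
        (first, last),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
count = store_paragraphs(conn, "First paragraph.\n\nSecond paragraph.\nThird paragraph.")
print(count, load_paragraphs(conn, 2, 3))
```

Fetching a range by paragraph_number is what makes this layout useful for large texts: the UI can load only the rows it is about to display.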

I would use HTML to represent the paragraphs, e.g.:
Saving the Text
<div>
<p>(the mother of the faithful believers) The commencement of the Divine Inspiration to Allah's Apostle was in the form of good dreams which came true like bright day light, and then the love of seclusion was bestowed upon him. He used to go in seclusion in the cave of Hira where he used to worship (Allah alone) continuously for many days before his desire to see his family. He used to take with him the journey food for the stay and then come back to (his wife) Khadija to take his food like-wise again till suddenly the Truth descended upon him while he was in the cave of Hira. The angel came to him and asked him to read. The Prophet replied, "I do not know how to read.</p>
<p>The Prophet added, "The angel caught me (forcefully) and pressed me so hard that I could not bear it any more. He then released me and again asked me to read and I replied, 'I do not know how to read.</p>
</div>
Loading the Paragraphs
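I would load them inside a UIWebView as HTML. You can save the HTML into a file in the app sandbox, let's say Paragraph1.html, and load it as follows:
// this is a user-defined method
- (void)loadDocument:(NSString *)documentName inView:(UIWebView *)webView
{
    // Assumption: the HTML file was saved in the app's Documents directory
    NSString *documentsDir = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) objectAtIndex:0];
    NSString *filePath = [documentsDir stringByAppendingPathComponent:documentName];
    NSURL *url = [NSURL fileURLWithPath:filePath]; // Path of the HTML file
    NSURLRequest *request = [NSURLRequest requestWithURL:url];
    [webView loadRequest:request];
}
Dispose of the file after loading it; this will save you time and space.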
Good luck.

Related

Is it possible to reorder and recombine fragments in fmp4?

I have a fragmented MP4 which I want to send to users through HLS. It works if I just send it as is, but I need to be able to reorder the fragments in this video.
For example, the initial video looks like this:
[image: original video format]
I want to reorganize the fragments and get this:
[image: expected video format]
I tried this locally, and it works in the VLC player (HLS). For this, I modified the sequence number of the fragments in moof (mfhd). But when I try to play it remotely (HLS), it does not work. I think that some (JS) players expect some additional information from each fragment, for example a time offset. But I cannot find which atom (box) contains this information. I spent a lot of time searching and I'm still at the very beginning of the problem.
I tried to modify the fragment sequence number, but it doesn't work.
The "Track Fragment Media Decode Time Box" (tfdt) stores the baseMediaDecodeTime, which is the cumulative decode time.
Consider the following...
baseMediaDecodeTime must increase monotonically from chunk to chunk.
This means you must update (replace) the tfdt entry of each chunk with the expected next tfdt entry.
When you naively reorder the chunks, the baseMediaDecodeTime values will be invalid.
The "Track Fragment Media Decode Time Box" (tfdt) is located inside each moof header at:
moof --> traf --> tfdt
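To make the moof --> traf --> tfdt path concrete, here is a rough Python sketch that walks the box structure of a fragment and overwrites baseMediaDecodeTime. It assumes one traf per fragment and that the caller already knows each fragment's duration in the track's timescale units (reading durations from trun/tfhd is not shown). All function names are made up for this example.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:                      # 64-bit "largesize" follows the type
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:                    # box runs to the end of the data
            size = end - pos
        if size < header or pos + size > end:
            break                          # malformed box: stop rather than guess
        yield box_type, pos + header, pos + size
        pos += size


def find_tfdt(fragment):
    """Return the payload offset of the first moof/traf/tfdt box, or None."""
    for t1, s1, e1 in iter_boxes(fragment):
        if t1 != b"moof":
            continue
        for t2, s2, e2 in iter_boxes(fragment, s1, e1):
            if t2 != b"traf":
                continue
            for t3, s3, _ in iter_boxes(fragment, s2, e2):
                if t3 == b"tfdt":
                    return s3
    return None


def _tfdt_layout(fragment):
    offset = find_tfdt(fragment)
    if offset is None:
        raise ValueError("no moof/traf/tfdt box found")
    # tfdt is a full box: 1 byte version, 3 bytes flags, then the time value,
    # which is 64-bit when version == 1 and 32-bit when version == 0.
    fmt = ">Q" if fragment[offset] == 1 else ">I"
    return fmt, offset + 4


def get_decode_time(fragment):
    fmt, offset = _tfdt_layout(fragment)
    return struct.unpack_from(fmt, fragment, offset)[0]


def set_decode_time(fragment, value):
    """Overwrite baseMediaDecodeTime in place (fragment must be a bytearray)."""
    fmt, offset = _tfdt_layout(fragment)
    struct.pack_into(fmt, fragment, offset, value)


def retime(fragments, durations):
    """Make each fragment start where the previous one (in the new order) ends."""
    start = 0
    for fragment, duration in zip(fragments, durations):
        set_decode_time(fragment, start)
        start += duration
```

After reordering, calling retime on the list of fragments in their new order, together with their durations, gives every tfdt a monotonically increasing baseMediaDecodeTime.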

MessageSummaryItems.PreviewText Clarification

We're making use of the newly added MessageSummaryItems.PreviewText feature. Thank you!!
One issue is: sometimes the PreviewText contains HTML links. From reading through the source, I see this in ImapFolderFetch.cs:
var body = message.TextBody ?? message.HtmlBody;
So this is saying: use the Plaintext version, if it exists, then use the HTML version?
Therefore if I see links in the preview, I can assume no Plaintext version is available?
Our problem with this is:
If our message only has an HTML version, we could strip the links from the message in our code, but there are only 256 characters of it. In many cases, there will be nothing left to display.
As per your TODO: using the CONVERT extension would be a better approach but, as far as I can tell, it's not supported by Gmail?
A fallback would be:
If we could set the preview length for HTML and plaintext individually, then we could say: if you only have an HTML version, give me 1K of it and I'll strip out the links on the client.
Thoughts?
Very few IMAP servers support the CONVERT extension which is the main reason I didn't implement it.
The PreviewText feature is an attempt at adding a convenience feature to fetch the first 256 bytes of each message body in batched requests in order to minimize latency, but no matter what I do, it's not guaranteed to be useful (since there could be a ton of markup before any real text is included in HTML).
If I were to split text and HTML messages into 2 different batches so that I could request different sizes for each, then it would be less efficient and might take significantly longer to fetch, so I'm not sure if it's really worth it. The less I'm able to batch at a time, the less useful the feature becomes compared to implementing your own loop over the list of messages and downloading your own specified chunk size, one message at a time.
My suggestion would be to use the PreviewText feature and for the rare messages where the 256 bytes isn't enough, perform a folder.GetStream() on them.
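For the client-side stripping discussed in this thread, a language-neutral sketch in Python might look like the following. It is not part of MailKit; preview_text is a hypothetical helper that takes a possibly truncated HTML fragment, drops a tag cut off by the byte limit, removes the remaining tags with a deliberately naive regular expression, and collapses whitespace.

```python
import html
import re

TAG = re.compile(r"<[^>]*>")        # a complete tag such as <a href="..."> or </p>
DANGLING = re.compile(r"<[^>]*$")   # a tag that was cut off by the size limit


def preview_text(fragment, limit=256):
    """Reduce a (possibly truncated) HTML fragment to plain preview text."""
    without_tail = DANGLING.sub("", fragment)
    text = TAG.sub(" ", without_tail)   # naive: does not skip <style> or <script> bodies
    text = html.unescape(text)          # turn &amp; and friends into characters
    return " ".join(text.split())[:limit]


print(preview_text('<p>Hello <a href="http://x.example">world</a></p><a hre'))
```

As the answer points out, markup-heavy messages can leave little or nothing after stripping, which is the case where fetching a larger chunk of that one message becomes necessary.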

MaryTTS HMM voice quality changes with text length

I am using MaryTTS as a text to speech engine inside a Grails Application.
During app testing I found out that the speech quality changes drastically (for the worse) with increasing text length when using an HMM voice.
So naturally I tested via the MARY Web Client while tweaking all HMM-relevant parameters (F0Add, F0Scale and Rate), as well as removing them or leaving the default values, but without success.
The voice I am using is bits1-hsmm:5.2 (German Female)
gradle dependency:
compile "de.dfki.mary:voice-bits1-hsmm:5.2"
The code is as simple as:
def marytts = new LocalMaryInterface()
marytts.locale = Locale.GERMAN
marytts.generateAudio text
Everything works fine up to the point where the text to convert goes over 120 characters (not only in the code but also via the MARY Web Client).
Here is the text I used for the last tests:
Baumaßnahmen im Mai und Oktober Notwendige Instandhaltungsarbeiten an der Münchner S-Bahn-Stammstrecke sollen von nun an gebündelt stattfinden. Die Bahn möchte dadurch die baubedingten Fahrplaneinschränkungen durch gesperrte Gleise geringer halten.
To see the difference in quality use a part of the text (first couple words) vs the whole.
Another important point: this does not occur when using a Unit Selection voice.
Am I missing something like a configuration or specific parameter set or is this the standard behaviour of HMM voices inside MaryTTS?
It would be great to be able to use this voice with decent quality, since Unit Selection voices are not available as standalone dependencies, and having to split the text into smaller parts and play them sequentially is not really something I would consider.
Any input is appreciated.
Update
Further trial and error showed that the robotic background sound is added when the text contains punctuation marks such as . , : ; [ ] { }, independent of text length! I'm not really sure what the root cause is, but at least with some text manipulation before the conversion the voice is usable.
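The text manipulation mentioned in the update could look roughly like this. It is a sketch in Python rather than Groovy, strip_punctuation is a made-up helper name, and it removes exactly the characters listed above; whether dropping them hurts the prosody of the generated speech is not something this sketch addresses.

```python
import re

# The punctuation marks reported to trigger the artifact: . , : ; [ ] { }
PUNCTUATION = re.compile(r"[.,:;\[\]{}]")


def strip_punctuation(text):
    """Replace the listed punctuation marks with spaces and collapse whitespace."""
    return " ".join(PUNCTUATION.sub(" ", text).split())


print(strip_punctuation("Die Bahn möchte, dass [alle] Gleise frei sind."))
```

Hyphens are left alone, so compound words such as S-Bahn-Stammstrecke pass through unchanged.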

Remove PdfImageObject from a PDF

I have thousands of PDFs generated from emails containing .png images (I do not own the generator). For some reason, those PDFs are very slow to render with the imaging system I am using (I am not the developer of that system and cannot change it).
If I use iTextSharp and implement a IRenderListener to count the Images to be rendered, there are thousands per page (99% being 1 or 2 pixels only). But if I count the Images in the resources of the PDF, there are only a few (~tens).
I am counting the images in the resources, per page, with the following code:
var dict = pdfReader.GetPageN(currentPage);
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(dict.Get(PdfName.RESOURCES));
PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
if (xobj != null)
{
    foreach (PdfName name in xobj.Keys)
    {
        PdfObject obj = xobj.Get(name);
        if (obj.IsIndirect())
        {
            PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
            PdfName subtype = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
            if (PdfName.IMAGE.Equals(subtype))
            {
                Count++; // one more Image XObject in this page's resources
            }
        }
    }
}
And my IRenderListener looks like this:
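class ImageRenderListener : IRenderListener
{
    public int Count;

    // Text callbacks are required by the interface but not needed here
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo) { }

    public void RenderImage(iTextSharp.text.pdf.parser.ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image == null) return;
        var refObj = renderInfo.GetRef();
        if (refObj == null)
            Count++; // but why no ref ??
        else
            Count++;
    }
}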
I just started to learn about the PDF specification and iTextSharp this evening, to analyze my PDF and understand what could be wrong. If I am correct, I see that many of the images to be rendered do not reference a resource (refObj == null) and that they are .png (image.streamContentType.FileExtension = "png"). So I think those are the images making the rendering so slow.
For testing purposes, I would like to delete those images from the PDF, but I can't find out how to proceed.
I only found code samples that remove images that are in the resources, but the images I want to delete are not. :/
Is there a code sample somewhere to help me? I googled "iTextSharp remove object" and the like, but found nothing similar to my case. :(
Let me start with the blunt observation that you have a shitty PDF.
The image you see when opening the PDF in a PDF viewer seems to be composed of several small 1- or 2-pixel images. The drawing operations to show these pixels one by one is suboptimal, no matter which imaging system you use: you are faced with a bad PDF.
In your first snippet, I see that you loop over all of the indirect objects stored in the XObject resources of each page in search of images. You count these images, resulting in a number of Image XObjects stored in the PDF. If you add up all the Count values for all the pages, this number can be higher than the actual number of Image XObjects stored in the PDF, as you don't take into account that some images can be reused on different pages.
You do not count the inline images that are stored in the content streams. I'm biased. In the ISO committees for PDF, I'm on the side of the group of people saying that "inline images are evil" and "inline images should die". For now, we didn't succeed in getting rid of inline images, but we introduced some substantial limitations that should reduce the (ab)use of inline images in PDF that conform to ISO-32000-2 (the PDF 2.0 spec that is due in 2016).
You've already discovered that your PDF has inline images. Those are the images where refObj == null. They are not stored as indirect objects; they are stored inline, in the content stream of the page. As you can imagine based on my feelings towards inline images, I consider your PDF to be a bad PDF for this reason (although it does conform to ISO-32000-1).
The presence of inline images is a first explanation why you have a different image count: when you loop over the indirect objects you only find part of the images. When you parse the document for images, you also find the inline images.
A second explanation could be the fact that the Image XObject are used more than once. That's the whole point of not using inline images. For instance: if you have an image that represents a logo that needs to be repeated on every page, one could use inline images. That would be a bad idea: the same image bytes would be present in the PDF as many times as there are pages. One should use an Image XObject. In this case, the image bytes of the logo are stored only once in an indirect object. There's a reference to this object from every page, so that the image bytes are stored in the document only once. In a 10-page document, you can see 10 identical images on 10 pages, but when looking inside the document, you'll find only one image that is referenced from every page.
If you remove Image XObjects by removing the indirect objects containing the image stream objects, you have to be very careful: are you sure you're not corrupting your document? There's a reference to the Image XObject in the content stream of your page. This reference points to an entry in the /XObject entry of the page's /Resources. This /XObject entry refers to the stream object with the image bytes. If you remove that indirect object without removing the references (e.g. from the content stream), you break your PDF. Some viewers will ignore those errors, but at some point in time some tool (or somebody) is going to complain that your PDF is corrupt.
If you want to remove inline images, you have to parse all the content streams in your PDF: page content streams as well as Form XObject content streams. You have to rewrite all these streams and make sure all inline images are removed. That is: all objects that start with the BI operator (Begin Image) and end with the EI operator (End Image).
That's a task for a PDF specialist who knows both iTextSharp and ISO-32000-1 inside-out. The solution to your problem probably doesn't fit into an answering window on StackOverflow.
I'm the original author of iText. From a certain point of view, iText is like a sharp knife. A sharp knife is a very good tool that can be used for many good things. However, you can also seriously cut your fingers when you're not using the knife in a correct way. I hope you'll be careful and that you're not going to create a whole series of damaged PDF files.
For instance: you assume that some of the images in the PDF are PNGs because iText suggests storing them as PNGs. However, PNG is not supported by ISO-32000-1, so your assumption that your PDF contains PNGs is wrong. I honestly worry when I see questions like yours.

Loading text from a file

I am making an iPhone drinking card game app.
All the cards mean something different, and I want the user to be able to press an info button and then see a new screen with information about the current card. How can I make a document to load text from instead of using a bunch of long strings?
Thanks
You could look into plist files - they can be loaded quite easily into the various collection objects and edited with the plist editor in Xcode.
For instance, if you organize your data as a dictionary, the convenience constructor
+ (id)dictionaryWithContentsOfURL:(NSURL *)aURL
from NSDictionary would provide you with as many easily accessible strings as you need.
This method is useful if you consider your strings primarily data as opposed to UI elements.
Update:
As @Alex Nichol suggested, here is how you can do it in practice:
To create a plist file:
In your Xcode project, for instance in the Supporting Files group, select New File > Resource > Property List
You can save the file in en.lproj, to aid in localization
In the Property list editing pane, select Add Row (or just hit return)
Enter a key name (for instance user1) and a value (for instance "Joe")
To read the contents:
NSURL *plistURL = [[NSBundle mainBundle] URLForResource:@"Property List" withExtension:@"plist"];
NSLog(@"URL: %@", plistURL);
NSDictionary *strings = [NSDictionary dictionaryWithContentsOfURL:plistURL];
NSString *user1 = [strings objectForKey:@"user1"];
NSLog(@"User 1: %@", user1);
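If the card texts are maintained outside Xcode, the plist can also be generated by a small script and then added to the project. The sketch below uses Python's standard plistlib module; the file name Cards.plist, the helper names, and the card texts are placeholders invented for illustration.

```python
import plistlib


def write_card_plist(path, cards):
    """Write a {card name: info text} dictionary as an XML property list."""
    with open(path, "wb") as handle:
        plistlib.dump(cards, handle)


def read_card_plist(path):
    """Load the property list back into a dictionary."""
    with open(path, "rb") as handle:
        return plistlib.load(handle)


if __name__ == "__main__":
    # Placeholder card texts; line breaks inside a value are preserved.
    cards = {"ace": "Info text for the ace.\nSecond line.", "king": "Info text for the king."}
    write_card_plist("Cards.plist", cards)
    print(read_card_plist("Cards.plist")["ace"])
```

A file written this way has a dictionary at its top level, which is the shape dictionaryWithContentsOfURL: expects.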
A plist, a JSON string, and an SQLite database walked into a bar ...
Oops!! I mean those are the three most obvious alternatives. The JSON string is probably the easiest to create and "transport", though it's most practical to load the entire thing into an NSDictionary and/or NSArray, vs read from the file as each string is accessed.
The SQLite DB is the most general, and most speed/storage efficient for a very large number (thousands) of strings, but it takes some effort to set it up.
In my other answer, I suggest the use of a dictionary if your texts are mostly to be considered as data. However, if your strings are UI elements (alert texts, window titles, etc.) you might want to look into strings files and NSBundle's support for them.
Strings files are ideally suited for localization, the format is explained here.
To read them into your app, use something like this:
NSString *text1 = NSLocalizedStringFromTable(@"TEXT1", @"myStringsFile", @"Comment");
If you call your file Localizable.strings, you can even use a simpler form:
NSString *str1 = NSLocalizedString(@"String1", @"Comment on String1");
A useful discussion here - a bit old, but still useful.