How best to store HTML alongside strings in Cloud Storage - google-cloud-storage

I have a collection data of, and in each case there is chunk of HTML and a few strings, for example
html: <div>html...</div>, name string: html chunk 1, date string: 01-01-1999, location string: London, UK. I would like to store this information together as a single cloud storage object. Specifically, I am using Google Cloud Storage. There are two ways I can think of doing this. One is to store the strings as custom metadata, and the HTML as the actual file contents. The other is to store all the information as JSON file, with the HTML as a base64 encoded string.
I want to avoid a situation where after having stored a lot of data, I find there is some limitation to the approach I am using. What is the proper way to do this - is either of these approaches bad practice? Assuming there is no problem with either, I would go with the JSON approach because it is easier to pass around all the data together as a file.

There isn't a specific right way to do what you're talking about, there are potential pitfalls and performance criteria but they depend on what you're doing with the data and why. Do you ever need access to the metadata for queries? You won't be able to efficiently do that if you pack everything into one variable as a JSON object. What are you parsing the data with later? does it have built in support for JSON? Does it support something else? Is speed a consideration? Is cloud storage space a consideration? Does a user have the ability to input the html and could they potentially perform some sort of attack? How do you use the data when you retrieve it? How stable is the format of the data? You could use JSON, ProtocolBuffers, packed binary blobs in a length | value based format, base64 with a delimiter, zip files turned into binary blobs, do what suits your application and allows a clean structured design that you can test and maintain.

Related

Using the CBOR format instead of JSON in elasticsearch ingest plugin

In the documentation of Ingest Attachment Processor Plugin in Elasticsearch, it is mentioned, "If you do not want to incur the overhead of converting back and forth between base64, you can use the CBOR format instead of JSON and specify the field as a bytes array instead of a string representation. The processor will skip the base64 decoding then." Could anyone please throw some light on this or maybe share an example of how to achieve this? I need to index a very large number of documents having a significant size. So I need to minimise the latency.

how to store image in postgres using Flask model

I am trying to store an image using flask model. I don't know how to store the image in postgres, so I have encoded the image to base64 and I am trying to store that resulting text in postgres. It is working but is there any recommended way to store that encoded text or the image in postgres using flask model
class User_tbl(db.Model):
id = db.Column(db.Integer,primary_key=True)
mobile=db.Column(db.String(13),unique=True)
country=db.Column(db.String(30))
image=db.Column(db.String(256))
def __init__(self,mobile,country,image):
self.mobile=mobile
self.country=country
self.image = image
I know that maybe it's too late to answer this question, but in this days I was trying to solve something similar and none of the solutions proposed seem to shed light on the main problem.
Of course any best practice rests on your needs. In general terms however, you will find that embed a file in the database is not a good practice. Well, it depends.
Reading the "Storing Binary files in the Database" produced by postgresql wiki, I discovered that there are some circumtances in which this practice is instead higly recommended, for instance when the files must be ACID.
In those cases, at least in Postgres, bytea datatype is to be preferred over text or BLOB binary, sometimes at the cost of some higher memory requirements for the server.
In this case:
1) you don't need special sqlalchemy dialects. LargeBinary datatype will suffice, since it will be translated as a "large and/or unlengthed binary type for the target platform".
2) You don't need any encode/decode functions in PostgreSQL, of course in this specific case.
3) As I told before, it is not always a good strategy to save the files into the filesystem. In any case do not use text data type with base64 encoding. Your data will inflated more or less of the 33%, thus resulting in a huge storage impact, whereas bytea has not the same drawback
Thus, I propose these changes to your model:
class User_tbl(db.Model):
id = db.Column(db.Integer,primary_key=True)
mobile=db.Column(db.String(13),unique=True)
country=db.Column(db.String(30))
image=db.Column(db.LargeBinary)
Then you can save files into Postgres simply by passing your FileStorage parameter as a binary:
image = request.files['fileimg'].read()
It would be far easier to avoid all of this encoding and decoding and simply save it as a binary blob. In which case, use a sqlalchemy.dialects.postgresql.BYTEA column.
I know of the encode and decode functions in PostgreSQL for dealing with base64 data, see:
https://www.postgresql.org/docs/current/static/functions-string.html
(encode/decode)
Thanks,
The recomended way to store an image in postgres via flask is to store the image in your static folder(where you store Javascript & CSS files) and serve it via a web server i.e. nginx. It will be able to do it more efficiently than flask.You should only store the path to your image on postgres and then store the actual image on the File system.

Json or CSV to web service with large amount of data

I have a list of objects that I send to web service.
In csv it has 5kb and in JSon it has 15kb and this can be larger based on amount of data.
Because this is the first time that I send large amount of data to web service I need advice should I use JSon or CSV to send to ws?
What is the best practice?
I am most worried about performance.
Advantages:
JSON - easily interpreted on client side, compact notation, Hierarchical Data
CSV - Opens in Excel(?)
Disadvantages:
JSON - If used improperly can pose a security hole (don't use eval), Not all languages have libraries to interpret it.
CSV - Does not support hierarchical data, you'd be the only one doing it, it's actually much harder than most devs think to parse valid csv files (CSV values can contain new lines as long as they are between quotes, etc).
For MoreDetail See this link.
THis is the Link

Load and perform search on large amount of data

I need a suggest how to operate with large amount of data on iPhone. Let say I have xml file with ~120k text records. I need to perform search on this data. The solution i have tried is to use Core Data to store information in sorted order in caches. And then use binary search which works fast. But the problem is to build this caches. On first launch application takes about 15-25 seconds to build this caches. Maybe I need to use different approach to search the data?
Thanks in advance.
If you're using an XML file with the requirement that you can't cache, then you're not going to succeed unless you somehow carefully format your XML file to have useful data traversal properties -- but then you may as well use a binary file that's more useful unless you have some very esoteric requirements.
Really what you want is one of the typical indexing algorithms (on disk hash, B-tree, etc) from the get-go.
However...
If you have to read in and parse your XML text file, then you can skirt using a typical big and slow generic XML parser and write a fast hackish version since most of the data records you'll need to recognize are probably formatted the same way over and over. Nothing special, just find where the relevant data fields start, grab the data until it ends, move on to the next data field.
Honestly, 120k of text isn't very much-- sounds like whatever XML parser you're using is just slow. (I use this trick all the time for autogenerated XML data that just represents things like tables or simple data records -- my own parser is faster than any generic XML parser.)
This is probably the solution you actually want since you sound fairly attached to the XML file format. It won't be as error-proof as a generic XML parser if you're not careful, however it will eat that 120KB file up like nobody's business. And it's entry level CS work -- read in a file with certain specific formatting and grab the data values from it. Regexps are your friend if you have access to them.
Try storing and doing your searches in the cloud. (using a database stored on a server somewhere)
Unless you specifically need ALL of the information on the device..

Store XML data in Core Data

is there any easy way of store XML data into core data?
Currently, my app just pulls the values from the XML file directly, however, this isn't efficient for XML files which holds over 100 entries, thus storing the data in Core Data would be the best option. XML file is called/downloaded/parsed ever time the app opens.
With the Core Data, the XML data would be downloaded ever 3600 seconds or so, and refresh the current data in the core data, to reduce the loading time when opening the app.
Any ideas on how I can do this?
Having reviewed the developer documentation, it doesn't look very tasty.
I take that you mean you have to down load an xml file, parse it and then save the data encoded in the file? You have several options for saving such data.
If the data is relatively simple and static e.g. a repeating list of items, then you might just want to use a NSArray, NSSet or NSDictionary (or some nested combination) and then just write the resulting collection to disk as a plist using the collection classes writeToFile: methods. Then when the data is needed you just use one of the initWithFile: methods. The disadvantage of this system is that you have to read the entire file back into memory to use it. This system doesn't scale for very large data sets.
If the data is complex e.g. a bunch of separate but highly interrelated chunks of data, and moderately large, then Core Data would be better.
Of course, you always have the option of writing the downloaded file straight to disk as a string if you want.