Advice on choosing different binary XML tools - xml-serialization

My requirement is to compress an XML file into a binary format, transmit it and decompress it (lightning fast) before I start parsing it.
There are quite a few binary XML protocols and tools available. I found EXI (Efficient XML Interchange) better than the others. I tried its open-source implementation, EXIficient, and found it good.
I have heard about Google's Protocol Buffers and Facebook's Thrift; can anyone tell me whether these two can do the job I am looking for?
Or just let me know if there is anything better than EXI that I should look at.
Also, there is a good XML parser, VTD-XML (I haven't tried it myself, just googled it and read some articles), that achieves better parsing performance than DOM, SAX and StAX.
I want the best of both worlds, best compression + best parsing performance. Any suggestions?
One more thing regarding EXI: how can EXI claim to be fast at parsing a decoded XML file, if the decoded file is still parsed by DOM, SAX or StAX? I would have believed this if there were a separate binary parser for reading the decoded version. Correct me if I am wrong.
Also, is there any good open-source C++ implementation of the EXI format? A Java version is available (EXIficient), but I have not been able to find an open-source C++ implementation.
There is one by AgileDelta, but that's commercial.

You mention protocol buffers (protobuf); this is a binary format, but it has no direct relationship to XML. In particular, no member names (element names / attribute names / namespaces) are encoded; it is just the data (with numeric markers for identifiers).
As such, you cannot reconstruct arbitrary XML from a protobuf stream unless you already know how to map "field 3" etc.
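To make this concrete, here is a minimal Python sketch of the protobuf wire encoding for a message with one integer field and one string field. It hand-rolls the varint and key encoding instead of using the official protobuf library, and the schema (field 1 = id, field 2 = name) is a hypothetical example. Only field numbers and wire types reach the wire; the names id and name never do, which is why the schema is needed to turn the bytes back into anything XML-like.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_field(field_number: int, value) -> bytes:
    """Encode one field as key (field_number << 3 | wire_type) followed by the payload.

    Only int (wire type 0, varint) and str (wire type 2, length-delimited) are handled.
    """
    if isinstance(value, int):
        return encode_varint(field_number << 3 | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint(field_number << 3 | 2) + encode_varint(len(data)) + data


# Hypothetical schema: field 1 = "id", field 2 = "name".
message = encode_field(1, 150) + encode_field(2, "testing")
print(message.hex(" "))  # 08 96 01 12 07 74 65 73 74 69 6e 67
```

A reader without the schema can only recover "field 1 = 150, field 2 = 7 bytes"; mapping those back to element names is exactly the knowledge the .proto file supplies.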
However, if you have an object model that works with both XML and protobuf, the transform is trivial: deserialize with either, serialize with either. How well this works depends on the implementation; for example, it is trivial with protobuf-net, and is actually how I do the codegen (load the binary; write it as XML; run the XML through an XSLT layer to emit code).
If you actually just want to transfer object data (and XML is just a proposed implementation detail), then I thoroughly recommend protobuf; platform independent, a wide range of implementations, version-tolerant, very small output, and very fast processing at both read and write.

Nadeem,
These are very good questions. You might be new to the domain, but the same questions are frequently asked by XML veterans. I'll try to address each of them.
I heard about Google Protocol Buffers and Facebook's Thrift; can anyone tell me if these two can do the job I am looking for?
As mentioned by Marc, Protocol Buffers and Thrift are binary data formats, but they are not XML formats designed to transport XML data. E.g., they have no support for XML concepts like namespaces, attributes, etc., so mapping between XML and these binary formats would require a fair bit of work on your part.
Or just let me know if there is anything better than EXI I should look for.
EXI is likely your best bet. The W3C completed a pretty thorough analysis of XML format implementations and found the EXI implementation (Efficient XML) consistently achieved the best compactness and was one of the fastest. They also found it consistently achieved better compactness than GZIP compression and even packed binary formats like ASN.1 PER (see W3C EXI Evaluation). None of the other XML formats were able to do that. In the tests I've seen comparing EXI with Protocol Buffers, EXI was at least 2-4 times smaller.
I want the best of both worlds, best compression + best parsing performance. Any suggestions?
If it is an option, you might want to consider the commercial products. The W3C EXI tests mentioned above used Efficient XML, which is much faster than EXIficient (sometimes >10 times faster parsing and >20 times faster serializing). Your mileage may vary, so you should test it yourself if it is an option.
One more thing regarding EXI, how can EXI claim to be fast at parsing a decoded XML file?
EXI can be smaller and faster to parse than XML because EXI can be streamed directly to/from memory via the standard XML APIs without ever producing the data in an intermediate XML format. So, instead of serializing your data as XML via a standard API, compressing the XML, sending the compressed XML, decompressing the XML on the other end, then parsing it through one of the XML APIs, you can serialize your data directly as EXI via a standard XML API, send the EXI, then parse the EXI directly through one of the XML APIs on the other side. This is a fundamental difference between compression and EXI. EXI is not compression per se; it is a more efficient XML format that can be streamed directly to/from your application.
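For comparison, the compress-then-parse pipeline described above can be sketched in Python with the standard gzip and xml.sax modules. The payload is made up; the point is that the receiver has to rebuild the complete XML text before the parser sees a single event, which is the intermediate step EXI is designed to skip.

```python
import gzip
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts start-element events; stands in for real application logic."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


# 1. Sender serialises its data as XML text (made-up payload).
xml_text = "<readings>" + "".join(f"<r v='{i}'/>" for i in range(1000)) + "</readings>"
# 2. Sender compresses the text for transport.
wire = gzip.compress(xml_text.encode("utf-8"))
# 3. Receiver must first rebuild the complete XML text...
restored = gzip.decompress(wire)
# 4. ...and only then can a standard XML API parse it.
handler = CountingHandler()
xml.sax.parseString(restored, handler)
print(len(xml_text), len(wire), handler.elements)
```

Steps 3 and 4 are two full passes over the data; an EXI decoder replaces both with a single pass that emits parser events directly.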
Hope this helps!

Compression is unified with the grammar system in the EXI format. Decoder APIs generally give you a sequence of events, such as SAX events, when you let decoders process EXI streams; however, decoders do not internally convert EXI back into XML text to feed into another parser. Instead, the decoder does all of the convoluted decompression/scanning work itself to yield an API event sequence such as SAX. Because EXI and XML are compatible at the event level, it is fairly straightforward to write out XML text given an event sequence.
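Python's standard library has no EXI codec, so in this sketch a hand-written list of events stands in for what an EXI decoder would produce. Replaying those events into xml.sax.saxutils.XMLGenerator, an ordinary SAX handler, shows how little work it takes to write XML text from an event sequence; the event tuples and element names are illustrative assumptions.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical event sequence, standing in for the output of an EXI decoder.
events = [
    ("start", "order", {"id": "42"}),
    ("start", "item", {}),
    ("chars", "widget"),
    ("end", "item"),
    ("end", "order"),
]


def replay(events, handler):
    """Feed (kind, ...) tuples to any SAX ContentHandler."""
    handler.startDocument()
    for ev in events:
        kind = ev[0]
        if kind == "start":
            handler.startElement(ev[1], AttributesImpl(ev[2]))
        elif kind == "chars":
            handler.characters(ev[1])
        elif kind == "end":
            handler.endElement(ev[1])
    handler.endDocument()


buf = io.StringIO()
replay(events, XMLGenerator(buf, encoding="utf-8"))
print(buf.getvalue())
```

The same replay function could just as well drive an application's own ContentHandler, which is the case where no XML text is ever produced.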


Save array of variable size to file in Flutter

I am building an app to read results from color measurement devices, and for this purpose I need to know how to store an array of results in a local file on an Android smartphone/tablet and read it back from that file so that it's once again an array I can work with.
The results will be result objects, because I also need to tell when the measurement was taken, and what measurement mode was used (such as B/W-measurement or measurement of a light source).
I know how to get strings in and out, but as far as I know, transforming that to an array is impossible without bodgy and inelegant code.
So where do I even get started here?
Should I use plain .txt?
Or should I try to use .xml or .json files?
You should use standard formats for storing your data. JSON is a good format for structured data because many tools support displaying or even parsing it.
For instance, you may store it like this
[
  [
    "result1",
    "23:00",
    "B/W"
  ],
  [
    "result2",
    "18:14",
    "Color"
  ]
]
If you store it to e.g. test.json and drag and drop it onto Firefox, you can see that it recognizes the format and displays it nicely. So using standard formats for structured data is a good idea; programming libraries and even programming languages like Python support them with special classes or functionality. It's also easy to code the data dump or the parser yourself.
XML is also a good format. JSON is more modern, while XML was there first. Which to choose depends on which programming world you are in: some ecosystems favor one, some the other. For your purposes, it doesn't matter which one you use. I've seen JSON used more in the Android world, but probably just because it is more modern.
But remember, the format is just the framework, which data content you put into it is up to you.
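A minimal sketch of storing result objects (rather than bare arrays) as JSON and reading them back, written in Python since the answer mentions it; the Result fields, file name and sample values are illustrative assumptions. In Flutter the same shape would be built with Dart's own JSON support (dart:convert) instead.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Result:
    value: str        # hypothetical measurement payload
    measured_at: str  # timestamp stored as an ISO 8601 string
    mode: str         # measurement mode, e.g. "B/W" or "Color"


def save_results(path, results):
    """Write a list of Result objects as a JSON array of objects."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in results], f, indent=2)


def load_results(path):
    """Read the JSON array back into Result objects."""
    with open(path, encoding="utf-8") as f:
        return [Result(**item) for item in json.load(f)]


results = [
    Result("result1", "2024-05-01T23:00:00", "B/W"),
    Result("result2", "2024-05-01T18:14:00", "Color"),
]
path = os.path.join(tempfile.gettempdir(), "results.json")
save_results(path, results)
loaded = load_results(path)
print(loaded == results)  # True
```

Storing each result as an object with named keys, instead of a positional inner array, keeps the file readable and lets new fields be added later without breaking old files.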

Reading & writing text in Scala, getting the encoding right?

I'm reading and writing some text files in Scala. As a complete beginner in the language, I wanted to make sure to find the right way to do it, e.g. get the encoding right.
So most of what I found (also on SO) recommends using io.Source.fromFile. However, after trying it out like so, reading a UTF-8 file:
import scala.io.Source
val user_list = Source.fromFile("usernames.txt").getLines.toList  // implicit Codec: platform default charset unless one is in scope
val user_list = Source.fromFile("usernames.txt", enc="UTF8").getLines.toList  // or: name the encoding explicitly
I looked at the docs but was left with some questions.
Get the encoding right:
The docs show that I can set an encoding in Source.fromFile, as I tried above. Looking at the docs for Codec and the types listed there, I was wondering if those are all my codec options: is there e.g. no UTF-16, big-endian vs little-endian, etc.?
I am slightly obsessed with this since it used to trip me up in Python a lot. Is this less of concern with Scala for some reason?
Get the reading in right:
All the examples I looked at used the getLines method and post-processed it with mkString or toList, etc. Is there any advantage to that over just reading in the entire file (my files are small) in one go?
Get the writing out right:
Every source I could find tells me that Scala has no file writing function and to use the Java FileWriter. I was surprised by this - is this still accurate?
Looking at it I feel the question might be a little broad for SO, so I'd be happy to take it back if it does not meet the requirements. At this point, I'm not struggling with specific examples but rather trying to set things up in a way I don't get in trouble later.
Thanks!
Scala only has a basic IO API in the standard library. For the most part you just use the Java APIs. The fact that a decent Java API exists is probably why the Scala team is not prioritizing a robust, fully featured IO API.
There are also third-party Scala libraries you could use, however. I've never used Better Files, but I've heard good things about it as a Scala file API; there is also fs2, which provides functional, streaming IO. I'm sure there are others out there as well.
For encoding, there are many possible encodings available. It's just that only a couple of the most common ones are available as static fields; the rest you typically access through Codec("Encoding Name"). Most APIs will also let you pass a String directly instead of needing to get a Codec instance first. The codec is really just a wrapper over java.nio.charset.Charset. You can run java.nio.charset.Charset.availableCharsets() to see all of the encodings available on your system.
As far as reading, if the files are small you can load them fully into memory if you prefer that. The only reason not to do so is if you want to avoid the extra memory use of loading the entire file at once if reading through line by line is enough. You may want to use Vector instead of List for efficiency reasons (Vector is better in many cases and should probably be preferred as a default collection, but tradition and old habits die hard and most people/guides seem to default to List, but this is a whole other topic)

POJO marshaller/demarshaller: JAX-RS JSON matched with GWT client JSON

I am using Resteasy and GWT. For certain reasons, as many others have similar motivations, I am not using GWT-RPC for some of the functionality of the software I am working on.
I need to pass POJOs between GWT client and server by marshalling/demarshalling the POJOs into/from JSON.
OK, easier said than done because I need the POJO-JSON converters on both sides to match.
Q1. Is there a standard POJO notation in JSON? Is there an IETF RFC or ISO or ECMA standard that specifies the format of POJO notation in JSON? Or is it a free-for-all, libertarian anarchy?
Q2. Do Jettison and Jackson (when used with JAXB) and Autobeans produce the same JSON for POJOs?
Q3. This is the most crucial question. You can ignore the other questions above but you MUST answer this. Give me a combination pair of server-side and GWT client side JSONizer/deJSONizer that works together. For example, can I use Autobeans on client-side and use JAXB-jettison on server side and expect the JSONized POJO notation to be the same?
Q4. Is it possible to use JAXB-Jettison or JAXB-Jackson on GWT client-side by including the java source code for JAXB, Jettison/Jackson in the whatever.gwt.xml file? Are there parts of JAXB, Jettison/Jackson source code that might e.g., depend on reflection, or non-serializable, etc that would not make it possible to use JAXB + Jettison/Jackson in GWT client code? If possible, please explain how?
~
I should clarify concerning Q1:
I am not asking about RFC for JSON. I am asking about JSON POJO format. When a POJO is converted to JSON, everybody does it their own way - so, I am thinking that there should be an RFC to standardise the way and format a POJO is converted to JSON. Is there a standard or not? !!I hope your answers should not quote me the RFC for JSON!!
~
What about BadgerFish on the GWT client, and GWT client-server matched JSON-RPC? Someone needs to tell me about these.
There is no standard for the mapping, but I would claim there is an obvious, simple mapping, given the simplicity of the JSON format and the de facto standard of Java Beans (i.e. the mapping of set/get methods to logical property names). One of the few exceptions is Jettison.
Jettison is not so much a JSON/POJO library as a JSON<->XML library: it converts JSON to XML API calls (and vice versa), to allow the use of XML tools such as JAXB for XML data binding on JSON. But the cost here is that the JSON it produces and consumes has extra complexity that is only needed to work with XML APIs. And this is what makes it non-standard compared to the usual straightforward bindings used by Jackson, GSON, Flex-json and other "native" JSON libs.
I would recommend not using Jettison unless you really, really must for some reason. Not even if you produce both XML and JSON -- usually you are better off mapping JSON to/from POJOs using JSON tools, and XML separate to/from POJOs (using JAXB etc).
Jettison was intended to bridge the gap between (then) more mature XML tools and newish JSON format. But there isn't much benefit nowadays when there are dozens of mature JSON libraries available.
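To see why the two styles do not interoperate, here is a Python sketch that renders the same customer both ways: a bean-style mapping (property name to value, the kind of mapping Jackson and GSON use for simple beans) and a simplified BadgerFish-style mapping ("@" for attributes, "$" for text), one of the XML-oriented conventions Jettison supports. The Customer type is hypothetical, and the converter ignores namespaces and repeated child elements.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Customer:  # hypothetical POJO-like type
    id: int
    name: str


def natural_json(c: Customer) -> str:
    """Bean-style mapping: property name -> value; numbers stay numbers."""
    return json.dumps(asdict(c))


def badgerfish_json(elem: ET.Element) -> str:
    """Simplified BadgerFish: attributes as "@name", element text under "$".

    Repeated child elements overwrite each other here; a real converter
    would turn them into arrays.
    """
    def convert(e):
        d = {"@" + k: v for k, v in e.attrib.items()}
        if e.text and e.text.strip():
            d["$"] = e.text
        for child in e:
            d[child.tag] = convert(child)
        return d

    return json.dumps({elem.tag: convert(elem)})


print(natural_json(Customer(7, "Ann")))
# {"id": 7, "name": "Ann"}
doc = ET.fromstring('<customer id="7"><name>Ann</name></customer>')
print(badgerfish_json(doc))
# {"customer": {"@id": "7", "name": {"$": "Ann"}}}
```

Same data, two incompatible JSON shapes (and the id even changes type), which is why client and server have to agree on a convention rather than just on "JSON".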
JSON is just a subset of JavaScript; it was "invented" by Douglas Crockford. Here is the RFC for application/json: http://www.ietf.org/rfc/rfc4627.txt?number=4627. So any of your server-side solutions should create the same result.
We are using RestyGWT (http://restygwt.fusesource.org/) on the client side and it works like a charm. Its JSON encoding style is compatible with Jackson's default data binding, so it should work with Jackson as well.

JSON or SOAP (XML)?

I'm developing a new application for the company.
The application has to exchange data to and from the iPhone.
Company server side uses .NET framework.
For example: the class "Customer" (Name, Address etc..) for a specific CustomerNumber should be first downloaded from server to iphone, stored locally and then uploaded back to apply changes (and make them available to other people). Concurrency should not be a problem (at least at this time...)
In any case I have to develop both the server side (webservice or whatever) and the iPhone app.
I'm free to identify the best way to do that (this is the application "number ONE" so it will become the "standard" for the future).
So, what do you suggest?
Use SOAP web services (XML parsing etc.) or use JSON? (It seems lighter...)
It is clear to me how to "upload" data using SOAP (coding the XML SOAP envelope takes a long time, and I would like to avoid it), but how can I do the same using JSON?
The application needs to use date values (for example: last_visit_date etc.). What about dates in JSON?
JSON has several advantages over XML. It's a lot smaller and less bloated, so you will be passing much less data over the network, which in the case of a mobile device will make a considerable difference.
It's also easier to use in JavaScript code, as the data packet maps directly onto JavaScript objects and arrays without the extracting and converting that XML needs, so it is much less CPU intensive too.
To code with it, instead of an XML library, you will want a JSON library. Dates are handled as you would with XML: encode them in a standard format, then let the library recognise them. (e.g. here's a library with a sample with dates in it)
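JSON has no date type, so the usual approach is to send dates as strings in a standard format such as ISO 8601 and convert them on each side. A minimal Python sketch of that round trip; the customer fields and values are hypothetical, and the iPhone side would need the equivalent conversion in whatever JSON library it uses.

```python
import json
from datetime import datetime, timezone

customer = {
    "customer_number": 1001,  # hypothetical fields
    "name": "Example Ltd",
    "last_visit_date": datetime(2011, 3, 14, 9, 30, tzinfo=timezone.utc),
}


def encode_dates(obj):
    """json.dumps calls this for values it cannot serialise natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()  # e.g. "2011-03-14T09:30:00+00:00"
    raise TypeError(f"not JSON serialisable: {type(obj).__name__}")


payload = json.dumps(customer, default=encode_dates)
print(payload)

# The receiver knows which fields are dates and parses them back explicitly.
decoded = json.loads(payload)
decoded["last_visit_date"] = datetime.fromisoformat(decoded["last_visit_date"])
print(decoded == customer)  # True
```

Including the UTC offset in the string avoids ambiguity when the phone and the server are in different time zones.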
Here's a primer.
Ah, the big question: JSON or XML?
In general, I would prefer XML only when I need to pass around a lot of text, since XML excels at wrapping and marking up text.
When passing around small data objects, where the only strings are small (ids, dates, etc.), I would tend to use JSON, as it is smaller, easier to parse, and more readable.
Also, note that even if you choose XML, that does not by any means mean you need to use SOAP. SOAP is a very heavy-weight protocol, designed for interoperability between partners. As you control both the client and server here, it doesn't necessarily make sense.
Consider how you'd be consuming the results on the iPhone. What mechanism would you use to read the web service response? NSXMLParser?
How you consume the data would have the biggest impact on how you serve it.
Are JSON and SOAP your only options? What about RESTful services?
Take a look at some big players on the web that have public APIs that are accessible by iPhone clients:
Twitter API
FriendFeed API
Also, review the following related articles:
How to parse nested JSON on iPhone
RESTful WCF service that can still use SOAP
Performance of REST vs SOAP
JSON has following advantages:
it can encode boolean and numeric values ... in XML everything is a string
it has much clearer semantics ... in JSON you have {"key":"someValue"}, in XML you can have <data><key>someValue</key></data> or <data key="someValue" /> ... any XML node must have a name ... this does not always make sense ... and children may either represent properties of an object, or children, which when occurring multiple times actually represent an array ... to really understand the object structure of an XML message, you need its corresponding schema ... in JSON, you need the JSON only ...
smaller and thus uses less bandwidth and memory during parsing/generation ...
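A small Python sketch of the first two points: JSON keeps numbers and booleans typed, while the same data has at least two equally valid XML shapes, and both hand back strings. The field names are made up.

```python
import json
import xml.etree.ElementTree as ET

# JSON: numbers and booleans keep their types after parsing.
as_json = json.loads('{"count": 3, "active": true}')

# XML: two equally valid shapes for the same data...
child_form = ET.fromstring("<data><count>3</count><active>true</active></data>")
attr_form = ET.fromstring('<data count="3" active="true"/>')

print(type(as_json["count"]).__name__, type(as_json["active"]).__name__)  # int bool
# ...and both hand back plain strings, which the reader must convert itself.
print(repr(child_form.find("count").text), repr(attr_form.get("count")))  # '3' '3'
```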
apart from that, I see NO difference between XML and JSON ... I mean, they are so interchangeable ... you can use JSON to capture the semantics of SOAP, if you want to ...
it's just that SOAP is so bloated ... if you do want to use SOAP, use a library and generators for that ... it's neither fun nor interesting to build it all by hand ...
using XML RPC or JSON RPC should work faster ... it is more lightweight, and you use JSON or XML at will ... but when creating client<->server apps, a very important thing in my eyes, is to abstract the transport layer on both sides ... your whole business logic etc. should in no way depend on more than a tiny interface, when it comes to communication, and then you can plug in protocols into your app, as needed ...
There are more options than just SOAP vs JSON. You can build a REST-based protocol (Representational State Transfer) using XML. I think it's easier to use than SOAP, and you get a much nicer XSD (one that you design). It's rather easy for almost any client to access such services.
On the other hand, JSON parsers are available for almost any language and make it really easy to call from JavaScript if you'll use them via AJAX.
However, SOAP can be rather powerful with tons of standardized extensions that support enterprise features.
You could also use Hessian using HessianKit on the iPhone side, and HessianC# on the server side.
The big bonuses are:
1. Hessian is a binary serialization protocol, so data payloads are smaller, which is good for 3G and GSM.
2. You do not need to worry about format in either end, transport is automated with proxies.
So on the server side you just define a C# interface, such as:
public interface IFruitService {
int FruitCount();
string GetFruit(int index);
}
Then you just subclass CHessianHandler and implement the IFruitService, and your web service is done.
On the iPhone just write the corresponding Objective-C protocol:
@protocol IFruitService
-(int)FruitCount;
-(NSString*)GetFruit:(int)index;
@end
That can then be accessed through a proxy with a single line of code:
id<IFruitService> fruitService = [CWHessianConnection proxyWithURL:serviceURL
                                                           protocol:@protocol(IFruitService)];
Links:
HessianKit : hessiankit
I would certainly go with JSON, as others already noted - it's faster and data size is smaller. You can also use a data modelling framework like JSONModel to validate the JSON structure, and to autoconvert JSON objects to Obj-C objects.
JSONModel also includes classes for networking and working with APIs, including JSON-RPC methods.
Have a look at these links:
http://www.jsonmodel.com - the JSONModel framework
http://json-rpc.org - specification for JSON APIs implementation
http://www.charlesproxy.com - the best tool to debug JSON APIs
http://json-schema.org - tool to define validation schemas for JSON, useful along the way
Short example of using JSONModel:
http://www.touch-code-magazine.com/how-to-make-a-youtube-app-using-mgbox-and-jsonmodel/
Hope these are useful

testing strategies: generating a XML file

I'm writing a couple of classes that generate xml file. (Details probably not important at the moment).
I'm wondering what the best testing strategy is.
I don't want to re-write the xml generation code just to compare the output, when I could write the file to disk and compare it at certain milestones (the xml spec won't change often, like once or twice every couple of years)
I'm more interested in testing the behaviour of the architecture instead of the getters & setters
Options that come to mind:
rebuilding the xml file in the testing environment and comparing the string representations
manually checking the result (writing to file, etc)
rebuilding the xml file in memory in the testing environment and comparing the in-memory elements.
Virtual Bonus if you know any libraries for C++ and/or Google Test.
Ideas?
Have you considered using XSD's and validating your XML to the XSD? You didn't mention if it was content or structure you were testing for (probably both).
If it validates, that confirms the XML conforms to the required structure.
In the past I've approached this two ways:
Compare the xml file against the result stored as a string in the test file. This is easy to implement, and unless you are wanting to generate variations of the xml file for testing purposes, the string comparison method works fine.
In the case where you have a xml file writer and reader, you can compare the original with the round trip result.
I agree with you that you shouldn't replicate the logic to generate the file in the test function just for the purpose of testing. Also, I would try to avoid the need to write to the file system: this is an unnecessary dependence on the file system, and would probably result in slower-running tests.
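A sketch of the string-comparison approach in Python, with a hypothetical generate_order_xml standing in for the class under test. Comparing raw strings is brittle (attribute order and indentation may legitimately differ), so this version compares canonical forms produced by xml.etree.ElementTree.canonicalize (Python 3.8+). In C++ with Google Test the same idea would need a canonicalizing XML library; that part is not shown here.

```python
import xml.etree.ElementTree as ET


def generate_order_xml(order_id: int, items: list) -> str:
    """Hypothetical stand-in for the XML-generating class under test."""
    root = ET.Element("order", {"id": str(order_id), "status": "new"})
    for name in items:
        ET.SubElement(root, "item").text = name
    return ET.tostring(root, encoding="unicode")


# Expected output kept as a string in the test file. Attribute order and
# indentation deliberately differ from what the generator emits.
EXPECTED = (
    '<order status="new" id="7">\n'
    "    <item>widget</item>\n"
    "    <item>gadget</item>\n"
    "</order>"
)


def same_xml(actual: str, expected: str) -> bool:
    """Compare two documents by their C14N 2.0 canonical form.

    Canonicalization sorts attributes; strip_text=True drops the
    whitespace-only text used for indentation.
    """
    return (ET.canonicalize(actual, strip_text=True)
            == ET.canonicalize(expected, strip_text=True))


print(same_xml(generate_order_xml(7, ["widget", "gadget"]), EXPECTED))  # True
```

Because the expected document lives in the test as a string, nothing touches the file system, and the stored string only needs updating when the XML spec itself changes.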
You might consider using XMLUnit: http://xmlunit.sourceforge.net.
It provides JUnit extension classes that can be used to assert equality of XML files.
You might consider an XML diff tool. There is a free one available on MSDN: XML Diff and Patch Tool.
I see you are looking for C++ tools. In that case, libxmldiff might be more suitable.