Avoiding crash when parsing '#' key in GTM on iPhone - iphone

I'm providing JSON objects that are to be parsed on different platforms. Based on several sources (for example 1, 2, 3) I decided to use keys starting with # for metadata, which has attribute counterparts in XML.
I've now been told by an iPhone developer that the current structure needs to be changed, since the # sign causes GTM to crash. He claims that it's simply impossible to parse structures like this: {"#name": "value"}
My dilemma is that I'm serving other clients and languages as well, and I want consistency between the XML and JSON representations of the data, beyond the wish to encode in a way that corresponds to the logical model. I would hope to avoid redesigning the structures because of the needs of a single language, especially since placing # signs in keys doesn't violate the rules for proper JSON.
I find it strange if tools such as GTM can't handle JSON structures that not only abide by the standards, but also follow common recommendations and guidelines. Is the criticism from this developer justified, or is there a way to solve his problem on the client side?

Related

Do i need to put path params in an URI if i don't use them [REST APIs]

I'm having a debate with a senior of mine at work and i want to know if what he says is true. Imagine I have a path /users/bucket-list that gets the currently logged in user bucket list. Now my question is, since i get the ID of the logged in user from the context do i still need to name my path like this /users/:user_id/bucket-list. I don't use the path param but my senior thinks that it should still be there and I think that since i don't use it i need to omit it. I want to hear your thoughts about this.
TL; DR
You are "doing it wrong"
Most of the time, you'll get away with it
Getting away with it is the wrong goal
Any information that can be named can be a resource -- Fielding, 2000
In most cases, I find that the easiest way to reason about "resources" is to substitute "documents", and then once the basic ideas are in place to then generalize if necessary.
One of the design problems that we face in creating our API is figuring out our resources; should "Alice's bucket-list" be presented separately from "Bob's bucket-list", or do they belong together? Do we have one resource for the entire list, or one resource for each entry in the list, and so on.
A related problem we need to consider in our design is how many representations a resource should support. This might include choosing to support multiple file formats (csv vs plain-text vs json, etc), and different languages (EN vs FR), and so on.
Your senior's proposed design is analogous to having two different resources. And having done that, everything will Just Work[tm]. There's no confusion about which resource is being identified, authorization is completely separate from identification, and so on.
Your design, however, is analogous to having a single resource with multiple representations, where a representation is chosen based on who is looking at it. And that's kind of a mess -- certainly your server can interpret the HTTP request, but general purpose components are not going to know that your resource has different identification semantics than every other resource on the internet.
We normally discriminate different representations using the Vary header; but the Authorization field is sort of out of bounds there, see RFC 7231.
In practice, you are likely to get away with your design because we have special rules about how shared-caches interact with authenticated requests, see RFC 7234.
But "likely to get away with it" is pretty weak. The point of having common standards is to get interop. If you are going to risk interop, you had better be getting something very valuable back in exchange. Nothing you have presented here suggests a compensating advantage.

Where should data be transformed for the database?

Where should data be transformed for the database? I believe this is called data normalization/sanitization.
In my database I have user created shops- like an Etsy. Let's say a user inputs a price for an item in their shop as "1,000.00". But my database stores the prices as an integer/pennies- "100000". Where should "1,000.00" be converted to "100000"?
These are the two ways I thought of.
In the frontend: The input data is converted from "1,000.00" to "100000" in the frontend before the HTTP request. In this case, the backend would validate that the price it is an integer.
In the backend: "1,000.00" is sent to the backend as is, then the backend validates that it is a price format, then the backend converts the price to an integer "100000" before being stored in the database.
It seems either would work but is one way better than the other or is there a standard way? I would think the second way is best to reduce code duplication since there is likely to be multiple frontends - mobile, web, etc- and one backend. But the first way also seems cleaner- just send what the api needs.
I'm working with a MERN application if that makes any difference, but I would think this would be language agnostic.
I would go with a mix between the two (which is a best practice, AFAIK).
The frontend has to do some kind of validation anyway, because it you don't want to wait for the backend to get the validation response. I would add the actual conversion here as well.
The validation code will be adapted to each frontend I guess, because each one has a different framework / language that it uses, so I don't necessarily see the code duplication here (the logic yes, but not the actual code). As a last resort, you can create a common validation library.
The backend should validate again the converted value sent by the frontends, just as a double check, and then store it in the database. You never know if the backend will be integrated with other components in the future and input sanitization is always a best practice.
Short version
Both ways would work and I think there is no standard way. I prefer formatting in the frontend.
Long version
What I do in such cases is to look at my expected business requirements and make a list of pros and cons. Here are just a few thoughts about this.
In case, you decide for doing every transformation of the price (formatting, normalization, sanitization) in the frontend, your backend will stay smaller and you will have less endpoints or endpoints with less options. Depending on the frontend, you can choose the perfect fit for you end user. The amount of code which is delivered will stay smaller, because the application can be cached and makes all the formatting stuff.
If you implement everything in the backend, you have full control about which format is delivered to your users. Especially when dealing with a lot of different languages, it could be helpful to get the correct display value directly from the server.
Furthermore, it can be helpful to take a look at some different APIs of well-known providers and how these handle prices.
The Paypal API uses an amount object to transfer prices as decimals together with a currency code.
"amount": {
"value": "9.87",
"currency": "USD"
}
It's up to you how to handle it in the frontend. Here is a link with an example request from the docs:
https://developer.paypal.com/docs/api/payments.payouts-batch/v1#payouts_post
Stripe uses a slightly different model.
{
unit_amount: 1600,
currency: 'usd',
}
It has integer values in the base unit of the currency as the amount and a currency code to describe prices. Here are two examples to make it more clear:
https://stripe.com/docs/api/prices/create?lang=node
https://stripe.com/docs/checkout/integration-builder
In both cases, the normalization and sanitization has to be done before making requests. The response will also need formatting before showing it to the user. Of course, most of these requests are done by backend code. But if you look at the prebuilt checkout pages from Stripe or Paypal, these are also using normalized and sanitized values for their frontend integrations: https://developer.paypal.com/docs/business/checkout/configure-payments/single-page-app
Conclusion/My opinion
I would always prefer keeping the backend as simple as possible for security reasons. Less code (aka endpoints) means a smaller attack surface. More configuration possibilities means a lot more effort to make the application secure. Furthermore, you could write another (micro)service which overtakes some transformation tasks, if you have a business requirement to deliver everything perfectly formatted from the backend. Example use cases may be if you have a preference for backend code over frontend code (think about your/your team's skills), if you want to deploy a lot of different frontends and want to make sure that they all use a single source of truth for their display values or maybe if you need to fulfill regulatory requirements to know exactly what is delivered to your user.
Thank you for your question and I hope I have given you some guidance for your own decision. In the end, you will always wish you had chosen a different approach anyway... ;)
For sanitisation, it has to be on the back end for security reasons. Sanitisation is concerned with ensuring only certain field's values from your web form are even entertained. It's like a guest list at an exclusive club. It's concerned not just with the field (e.g. username) but also the value (e.g. bobbycat, or ); DROP TABLE users;). So it's about ensuring security of your database.
Normalisation, on the other hand, is concerned with transforming the data before storing them in the database. It's the example you brought up: 1,000 to 1000 because you are storing it as integers without decimals in the database.
Where does this code belong? I think there's no clear winner because it depends on your use case.
If it's a simple matter like making sure the value is an integer and not a string, you should offload that to the web form (I.e. the front end), since forms already have a "type" attribute to enforce these.
But imagine a more complicated scenario. Let's say you're building an app that allows users to construct a Facebook ads campaign (your app being the third party developer app, like Smartly.io). That means there will be something like 30 form fields that must be filled out before the user hits "create campaign". And the value in some form fields affect the validity of other parts of the form.
In such a situation, it might make sense to put at least some of the validation in the back end because there is a series of operations your back end needs to run (like create the Post, then create the Ad) in order to ensure validity. It wouldn't make sense for you to code those validations and normalisations in the front end.
So in short, it's a balance you'll need to strike. Offload the easy stuff to the front end, leveraging web APIs and form validations. Leave the more complex normalisation steps to the back end.
On a related note, there's a broader concept of ETL (extract, transform, load) that you'd use if you were trying to consume data from another service and then transforming it to fit the way you store info in your own database. In that case, it's usually a good idea to keep it as a repository on its own - using something like Apache Airflow to manage and visualise the cron jobs.

REST API - string or numerical identifier in URL

We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical ID's in the URL, so that some of the URLs would become (say with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front end UI, so that if you type these URLs in the browser, you will get the relevant webpage in the response instead of a JSON file. Much in the same way you see relevant names of persons and organisations in sites like Facebook or Github.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like Github actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs for URL definitions, are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflict with reserved names, such as count, so later on, you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter one seems to become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did Github develop their string-based approach after their platform matured, or did they know from the start that it would imply some limitations (like the one I mentioned earlier, it seems that they did not implement such functionality)?
It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored but used to make the url more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the rembrandt, but that should be ok. That will leave room for editing the names without messing up url:s allready out there. Also, names doesn't have to be unique if you don't really need that.
Reserved HTTP characters: such as “:”, “/”, “?”, “#”, “[“, “]” and “#” – These characters and others are “reserved” in the HTTP protocol to have “special” meaning in the implementation syntax so that they are distinguishable to other data in the URL. If a variable value within the path contains one or more of these reserved characters then it will break the path and generate a malformed request. You can workaround reserved characters in query string parameters by URL encoding them or sometimes by double escaping them, but you cannot in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service
Numerical consecutive IDs are not recommended anymore because it is very easy to guess records in your database and some might use that to obtain info they do not have access to.
Numerical IDs are used because the in the database it is a fixed length storage which makes indexing easy for the database. For example INT has 4 bytes in MySQL and BIGINT is 8 bytes so the number have the same length in memory (100 in INT has the same length as 200) so it is very easy to index and search for records.
If you have a lot of entries in the database then using a VARCHAR field to index is a bad idea. You should use a fixed width field like CHAR(32) and fill the difference with spaces but you have to add logic in your program to treat the differences when searching the database.
Another idea would be to use slugs but here you should take into consideration the fact that some records might have the same slug, depends on what are you using to form that slug. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs since they have the same length and resolve this issue easily.

Better way to load content from web, JSON or XML?

I have an app which will load content from a website.
There will be around 100 articles during every loading.
I would like to know which way is better to load content from web if we look at:
speed
compatibility (will there be any problems with encoding if we use special characters etc.)
your experience
JSON is better if your data is huge
read more here
http://www.json.org/xml.html
Strongly recommend JSON for better performance and less bandwidth consumption.
JSON all the way. The Saad's link is an excellent resource for comparing the two (+1 to the Saad), but here is my take from experience and based on your post:
speed
JSON is likely to be faster in many ways. Firstly the syntax is much simpler, so it'll be quicker to parse and to construct. Secondly, it is much less verbose. This means it will be quicker to transfer over the wire.
compatiblity
In theory, there are no issues with either JSON or XML here. In terms of character encodings, I think JSON wins because you must use Unicode. XML allows you to use any character encoding you like, but I've seen parsers choke because the line at the top specifies one encoding and the actual data is in a different one.
experience
I find XML to be far more difficult to hand craft. You can write JSON in any text editor but XML really needs a special XML editor in order to get it right.
XML is more difficult to manipulate in a program. Parsers have to deal with more complexity: name spaces, attributes, entities, CDATA etc. So if you are using a stream based parser you need to track attributes, element content, namespace maps etc. DOM based parsers tend to produce complex graphs of custom objects (because they have to in order to model the complexity). I have to admit, I've never used a stream based JSON parser, but parsers producing object graphs can use the natural Objective-C collections.
On the iPhone, there is no built in XML DOM parser in Cocoa (you can use the C based parser - libxml2) but there is a simple to use JSON parser as of iOS 5.
In summary, if I have control of both ends of the link, I'll use JSON every time. On OS X, if I need a structured human readable document format, I'll use JSON.
You say you are loading "articles". If you mean documents containing rich text (stuff like italic and bold), then it's not clear that JSON is an option - JSON doesn't really do mixed content.
If it's pure simple structured data, and if you don't have to handle complexities like the need for the software at both ends of the communication to evolve separately rather than remaining in lock sync, then JSON is simpler and cheaper: you don't need the extra power or complexity of XML.

REST Media type explosion

In my attempt to redesign an existing application using REST architectural style, I came across a problem which I would like to term as "Mediatype Explosion". However, I am not sure if this is really a problem or an inherent benefit of REST. To explain what I mean, take the following example
One tiny part of our application looks like:
collection-of-collections->collections-of-items->items
i.e the top level is a collection of collections and each of these collection is again a collection of items.
Also, each item has 8 attributes which can be read and written individually. Trying to expose the above hierarchy as RESTful resources leaves me with the following media types:
application/vnd.mycompany.collection-of-collections+xml
application/vnd.mycompany.collection-of-items+xml
application/vnd.mycompany.item+xml
Further more, since each item has 8 attributes which can be read and written to individually, it will result in another 8 media types. e.g. one such media type for "value" attribute of an item would be:
application/vnd.mycompany.item_value+xml
As I mentioned earlier, this is just a tiny part of our application and I expect several different collections and items that needs to be exposed in this way.
My questions are:
Am I doing something wrong by having these huge number of media types?
What is the alternative design method to avoid this explosion of media types?
I am also aware that the design above is highly granular, especially exposing individual attributes of the item and having separate media types for each them. However, making it coarse means I will end up transferring unnecessary data over the wire when in reality the client only needs to read or write a single attribute of an item. How would you approach such a design issue?
One approach that would reduce the number of media types required is to use a media type defined to hold lists of other media-types. This could be used for all of your collections. Generally lists tend to have a consistent set of behavior.
You could roll your own vnd.mycompany.resourcelist or you could reuse something like an Atom collection.
With regards to the specific resource representations like vnd.mycompany.item, what you can do depends a whole lot on the characteristics of your client. Is it in a browser? can you do code-download? Is your client a rich UI, or is it a data processing client?
If the client is going to do specific data processing then you pretty much need to stick with the precise media types and you may end up with a large number of them. But look on the bright side, you will have less media-types than you would have namespaces if you were using SOAP!
Remember, the media-type is your contract, if your application needs to define lots of contracts with the client, then so be it.
However, I would not go as far as defining contracts to exchange single attribute values. If you feel the need to do that, then you are doing something else wrong in your design. Distributed interface design needs to have chunky conversations, not chatty ones.
I think I finally got the clarification I sought for the above question from Ian Robinson's presentation and thought I should share it here.
Recently, I came across the statement "media type for helping tune the hypermedia engine, schema for structure" in a blog entry by Jim Webber. I then found this presentation by Ian Robinson of Thoughtworks. This presentation is one of the best that I have come across that provides a very clear understanding of the roles and responsibilities of media types and schema languages (the entire presentation is a treat and I highly recommend for all). Especially lookout for the slides titled "You've Chosen application/xml, you bstrd." and "Custom media types". Ian clearly explains the different roles of the schemas and the media types. In short, this is my take away from Ian's presentation:
A media type description includes the processing model that identifies hypermedia controls and defines what methods are applicable for the resources of that type. Identifying hypermedia controls means "How do we identify links?" in XHTML, links are identified based on tag and RDF has different semantics for the same. The next thing that media types help identify is what methods are applicable for resources of a given media type? A good example is ATOM (application/atom+xml) specification which gives a very rich description of hyper media controls; they tell us how the link element is defined? and what we can expect to be able to do when we dereference a URI so it actually tells something about the methods we can expect to be able to apply to the resource. The structural information of a resource represenation is NOT part of or NOT contained within the media type description but is provided as part of appropriate schema of the actual representation i.e the media type specification won’t necessarily dictate anything about the structure of the representation.
So what does this mean to us? simply that we dont need a separate media type for describing each resource as described above in my original question. We just need one media type for the entire application. This could be a totally new custom media type or a custom media type which reuses existing standard media types or better still, simply a standard media type that can be reused without change in our application.
Hope this helps.
In my opinion, this is the weak link of the REST concept. As an architectural and interface style, REST is outstanding and the work done by Roy F. and others has advanced the state of the art considerably. But there is an upper limit to what can be communicated (not just represented) by standard media types.
For people to understand and use your REST-ish API, they need to understand the meaning of the data. There are APIs where the media types tell most of the story; e.g. if you have a text-to-speech API, the input media type is text/plain and the output media type is audio / mp4, then someone familiar with the subject matter could probably make do. Text in, audio out, probably enough to go on in this case.
But many APIs can't communicate much of their meaning with just media type. Let's say you have an API that handles airline ticketing. The inputs and outputs will mostly be data. The media types on input and output of every API could be application/json or application/xml, so the media type doesn't transmit a lot of information. So then you would look at the individual fields in the inputs & outputs. Maybe there's a field called "price". Is that in dollars or pennies? USD or some other currency? I don't know how a user would answer those questions without either (a) very descriptive names, like "price_pennies_in_usd", or (b) documentation. Not to mention format conventions. Is an account number provided with or without dashes, must letters be all-caps and so on. There is no standard media type that defines these issues.
It's one thing when we're in situations where the client doesn't need a semantic understanding of the data. That works well. The fact that browsers can visually render any compliant document, and interact with any compliant resource, is really great. That's basically the "media" use case.
But it's entirely different when the client (or actually, the developer/user behind the client) needs to understand the semantics of the data. DATA IS NOT MEDIA. There is no way to explain data in all its real-world meaning and subtlety other than documenting it. This is the "data" use case.
The overly-academic definition of REST works in the media use case. It doesn't work, and needs to be supplemented with non-pure but useful things like documentation, for other use cases.
You're using the media type to convey details of your data that should be stored in the representation itself. So you could have just one media type, say "application/xml", and then your XML representations would look like:
<collection-of-collections>
<collection-of-items>
<item>
</item>
<item>
</item>
</collection-of-items>
<collection-of-items>
<item>
</item>
<item>
</item>
</collection-of-items>
</collection-of-collections>
If you're concerned about sending too much data, substitute JSON for XML. Another way to save on bytes written and read is to use gzip encoding, which cuts things down about 60-70%. Unless you have ultra-high performance needs, one of these approaches ought to work well for you. (For better performance, you could use very terse hand-crafted strings, or even drop down to a custom binary TCP/IP protocol.)
Edit One of your concerns is that:
making [the representation] coarse means I will end up transferring unnecessary data over the wire when in reality the client only needs to read or write a single attribute of an item
In any web service there is quite a lot of overhead in sending messages (each HTTP request might cost several hundred bytes for the start line and request headers and ditto for each HTTP response as in this example). So in general you want to have less granular representations. So you would write your client to ask for these bigger representations and then cache them in some convenient in-memory data structure where your program could read data from them many times (but be sure to honor the HTTP expiration date your server sets). When writing data to the server, you would normally combine a set of changes to your in-memory data structure, and then send the updates as a single HTTP PUT request to the server.
You should grab a copy of Richardson and Ruby's RESTful Web Services, which is a truly excellent book on how to design REST web services and explains things much more clearly than I could. If you're working in Java I highly recommend the RESTlet framework, which very faithfully models the REST concepts. Roy Fielding's USC dissertation defining the REST principles may also be helpful.
A media type should be seldomly created and time should be invested in making sure the format can survive change.
As you're relying on xml, there is no particular reason why you couldn't create one media type, provided that media type is described in one source.
Choosing ATOM over having one host media type that supports multiple root elements doesn't necessarily bring you anything: you'll still need to start reading the message within the context of a specific operation before deciding if enough information is present to process the request.
So i would suggest that you could happily have one media type, represented by one root element, and use a schema language to specify which of the elements can be contained.
In other words, a language like xsd can let you type your media type to support one of multiple root elements. There is nothing inherently wrong with application/vnd.acme.humanresources+xml describing an xml document that can take either or as a root element.
So to answer your question, create as few media types as you can possibly afford, by questioning if what you put in the documentation of the media type will be understandable and implementeable by a developer.
Unless you intend on registering these media types you should pick one of the existing mime types instead of trying to make up your own formats. As Jim mentions application/xml or text/xml or application/json works for most of what gets transmitted in a REST design.
In reply to Darrel here is Roy's full post. Aren't you trying to define typed resources by creating your own mime types?
Suresh, why isn't HTTP+POX Restful?