Can we measure complexity of web site? - code-metrics

I am familiar with using cyclomatic complexity to measure software. However, in terms of web site, do we have a kind of metrics to measure complexity of website?

If you count HTML tags in the displayed HTML pages, as "Operators", you can compute a Halstead number for each web page.
If you inspect the source code that produces the web pages, you can compute complexity measures (Halstead, McCabe, SLOC, ...) of those. To do that, you need tools that can compute such metrics from the web page sources.
Our SD Source Code Search Engine (SCSE) is normally used to search across large code bases (e.g., the web site code) even if the code base is set of mixed languages (HTML, PHP, ASP.net, ...). As a side effect, the SCSE just so happens to compute Halstead, McCabe, SLOC, comment counts, and a variety of other basic measurements, for each file it can search (has indexed).
Those metrics are exported as an XML file; see the web link above for an example.
This would give you a rough but immediate ability to compute web site complexity metrics.

Though the question was asked 6 moths ago...
If your website is 100% static site with no javascript at all, then it needs to be powered by a back-end programmed in a programming language. So indirectly, the complexity measures that afflict the back-end programming will also affect the complexity of maintaining the site.
Typicall, I've observed a corellation between the maintainability and quality (or lack thereof) of the web pages themselves to the quality (or lack thereof) exhibited, through software metrics, in the back-end programming. Don't quote me on that, and take it with a grain of salt. It is purely an observation I've made in the gigs I've worked on.
If your site - dynamic content or not - also has JavaScript in it, then this is also source code that demonstrate measurable attributes in terms of software complexity metrics. And since JavaScript is typically used for rendering HTML content, it stands as a possibility (but not as a certainty) that attrocious, hard to maintain JavaScript will render similarly attrocious, hard to maintain HTML (or be embedded in attrocious, hard to maintain markup.)
For a completely static site, you could still devise some type of metrics, though I'm not aware of any that are publisized.
Regarless, a good web site should have uniform linking.
It should provide uniform navigation.
Also, html pages shouldn't be replicated or duplicated, little to no dead links.
Links within the site should be relative (either to their current location or to the logical root '/') and not absolute. That is, don't hard-code the domain name.
URI naming patterns should be uniform (preferably lower case.) URLs are case-insensitive, so it makes absolutely no sense to have links that mix both cases. Additionally, links might be mapping to stuff in actual filesystems that might be case-sensitive, which leads to the next.
URIs that represent phrases should be uniform (either use - or _ to separate words, but not both, and certainly no spaces). Avoid camel case (see previous comment on lower casing.)
I'm not aware of any published or advocated software-like metrics for web sites, but I would imagine that if there are, they might try to measure some of the attributes I mentioned above.

I suppose you could consider "hub scores" to be a complexity metric since it considers how many external sites are referenced. "Authoritative sources in a hyperlinked environment" by Jon Kleinberg discusses it.

This is a great paper on the topic.

Related

Is GWT still an option for a large business application [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
My company is planning on developing a brand new web front-end application.
Some background:
It must "sizzle" i.e. a nice marketable look and feel.
Our development team has no Java experience, with limited experience in Silverlight, Javascript, JQuery or CSS.
Time to market is a factor.
We need to stream large amounts of data from an Oracle database.
It must support 500 - 1000 concurrent users
It will be hosted internally behind a firewall.
We need mapping (geo-spatial) capabilities.
Someone has recommended using GWT instead of Silverlight or Traditional technologies(Javascript, jquery, CSS etc.).
I am not sure if this is the right way to go? A lot of the GWT news is from 2007/2008. It makes me think that this technology is old and maybe dying.
If you had a choice would you choose GWT?
unfortunately two of your statements are mutually exclusive in this context:
Our development team has no Java experience
Time to market is a factor
I'm a Java programmer who has picked up GWT over the last year or so. It's immensely effective being able to write direct to the browser using a compiled language & mature development tools. I can fly through web-development faster than ever before (using ASP, JSP, ExtJS ...).
But, as the other commenters have said: if you've no Java experience you're going to find it a real challenge picking up both technologies (Java & GWT) in a short time. If you do manage to make it to market in a reasonable time I could only imagine the code base would be in very poor condition (since you'd be learning as you go) - which would be a very poor foundation for your organisation's shiny new venture.
There again, you don't have a 'lot' of skills in the other related skills you listed either.
I suspect there's a more effective solution. As some wise old goat project manager said:
I have three variables to delivering your project: time, cost and quality. Pick any two
In your situation, if the organisation wants a quality product in a short time, it's the cost factor that must compensate - your organisation should buy in some interim GWT expertise to give you a sound software architecture and to mentor your team for the next few months. After that you'll be ready to take the reigns, inheriting a quality codebase by 'standing on the shoulders of giants'.
As others have said, GWT definitely is not a dying project. Quite the contrary actually as there are now more than 20 regular contributors from within Google (versus a semi-dozen back in 2008). Wave (despite being discontinued as a Google service, it's still alive as an Apache Foundation project), Orkut, AdWords, Google Moderator and the new (still beta) Google Groups are made with GWT; and parts of Google Buzz and a few other projects at Google are built with it too.
Now as to your choice:
Silverlight is a dying technology. Microsoft made it clear that it now invests in "HTML5": http://www.zdnet.com/blog/microsoft/microsoft-our-strategy-with-silverlight-has-shifted/7834
GWT is mostly a client-side toolkit, but it comes with "high productivity" tools for client-server communications (GWT-RPC and RequestFactory for end-to-end protocols, AutoBeans for easy JSON serialization). With UiBinder, you can easily put to use your web designer skills.
if you're comfortable with JS, then go for it, but then you'd have to choose the "right toolkit" (jQuery? Google Closure?). Otherwise (which seems to be the case), it really depends how much "ajaxy" you need/want to be. I'm a strong believer in "one-page apps", but YMMV, or you can have specific constraints that rule it out. In any case, you'd have to choose a server-side technology.
So, depending on your needs/wants and skills, I'd choose GWT or "some JS toolkit". In any case, you'll have full control over the look and feel (unless you choose one of the bloated players: ExtJS/ExtGWT, SmartGWT or similar; you'll probably have a shorter time-to-market with these, but you'll pay it later, in terms of performance, integration with other toolkits, and look-and-feel).
In the light of what you're saying about your skills, I would definitely recommend GWT (despite your lack of experience with Java); because lack of experience with JavaScript is far worse than lack of experience with Java (you're talking about a "large application", so it's really important to start building things right and/or have tools to help refactoring, which you'll have with Java).
#ianmayo replied while I was writing the above, and I can only second what he said!
GWT is definitely not old or dying! A lot of Google's own applications are developed using GWT. You can download the GBST case study and learn how the global financial company uses GWT to improve productivity and create a rich user experience. You have to know that when you use GWT you automatically use javascript, html, etc. You create a your gwt application in java, but when you compile it gwt creates a folder with html files, javascript code, css, etc...
I definitely recommend it!
In order not to mislead readers with above seemingly unanimous answers, keep objective view in respected stackoverflow, following review expressed exact experiences I had with using GWT. Whether GWT is dying depends on how many new apps will adopt it,Google trend can tell (gwt trend).
Excerpt from https://softwareengineering.stackexchange.com/questions/38441/when-not-to-use-google-web-toolkit
>
I am both good and bad to answer this question - good, in that I've actually used it before, and bad, in that I was quite experienced with HTML/CSS/JavaScript prior to working with GWT. This left me maddened by using GWT in a way that other Java developers who don't really know DHTML may not have been.
GWT does what it says - it abstracts JavaScript and to some degree HTML into Java. To many developers, this sounds brilliant. However, we know, as Jeff Atwood puts it, all abstractions are failed abstractions (worth a read if considering GWT). With GWT, this specifically introduces the following problems:
Using HTML in GWT sucks.
As I said it, to some degree, even abstracts away HTML. It sounds good to a Java developer. But it's not. HTML is a document markup format. If you wanted to create Java objects to define a document, you would not use document markup elements. It is maddeningly verbose. It is also not controlled enough. In HTML there is essentially one way to write
<p>Hello how are <b>you</b>?</p>
In GWT, you have 3 child nodes (text, B, text) attached to a P node. You can either create the P first, or create the child nodes first. One of the child nodes might be the return result of a function. After a few months of development with many developers, trying to decipher what your HTML document looks like by tracing your GWT code is a headache-inducing process.
In the end, the team decided that maybe using HTMLPanel for all HTML was the right way to go. Now, you've lost many of GWT's advantages of having elements readily available to Java code to bind easily for data.
Using CSS in GWT sucks.
By attachment to HTML abstraction, this means that the way you have to use CSS is also different. It might have improved since I last used GWT (about 9 months ago), but at the time, CSS support was a mess. Because of the way GWT makes you create HTML, you often have levels of nodes that you didn't know were injected (any CSS dev knows how this can dramatically affect rendering). There were too many ways to embed or link CSS, resulting in a confusing mess of namespaces. On top of that you had the sprite support, which again sounds nice, but actually mutated your CSS and we had problems with it writing properties which we then had to explicitly overwrite later, or in some cases, thwarted our attempts to match our hand-coded CSS and having to just redesign it in ways that GWT didn't screw it up.
Union of problems, intersection of benefits
Any languages is going to have it's own set of problems and benefits. Whether you use it is a weighted formula based on those. When you have an abstraction, what you get is a union of all the problems, and an intersection of the benefits. JavaScript has it's problems, and is commonly derided among server-side engineers, but it also has quite a few features that are helpful for rapid web development. Think closures, syntax shorthand, ad-hoc objects, all of the stuff done by Jquery (like DOM querying by CSS selector). Now forget about using it in GWT!
Separation of concerns
We all know that as the size of a project grows, having good separation of concerns is critical. One of the most important is the separation between display and processing. GWT made this really hard. Probably not impossible, but the team I was on never came up with a good solution, and even when we thought we had, we always had one leaking into the other.
Desktop != Web
As #Berin Loritsch posted in the comments, the model or mindset GWT is built for is living applications, where a program has a living display tightly coupled with a processing engine. This sounds good because that's what so many feel the web is lacking. But there are two problems: A) The web is built on HTTP and this is inherently different. As I mentioned above, the technologies built on HTTP - HTML, CSS, even resource-loading and caching (images, etc.), have been built for that platform. B) Java developers who have been working on the web do not easily switch to this desktop-application mindset. Architecture in this world is an entirely different discipline. Flex developers would probably be more suited to GWT than Java web developers.
In conclusion...
GWT is capable of producing quick-and-dirty AJAX applications quite easily using just Java. If quick-and-dirty doesn't sound like what you want, don't use it. The company I was working for was a company that cared a lot about the end product, and it's sense of polish, both visual and interactive, to the user. For us front-end developers, this meant that we needed to control HTML, CSS, and JavaScript in ways that made using GWT like trying to play the piano with boxing gloves on
First of all , GWT is not dying technology, its usage increases, and its latest version is 2.2. I am using GWT for 2 years, since version 1.6. Its improvements since them is quite amazing.
Since GWT is client side technology, it does have only positive effects of your application scaliblity feature. Because server side web technologies such as jsf, struts, wicket are server resource consumers, but gwt does not need any server resource to render user interface..
But there is problem for your team. Because your team has no java experience, it would be quite difficult to adapt yourself two new technologies java and gwt.. If you have time to learn , I would strongly suggest GWT.
It takes approx 1 year to become proficient in GWT. Using GWT pays off if you develop an application as sophisticated as MicrosoftOffice or PhotoShop. It makes no sense to use GWT for small and relatively simple apps, IMHO. GWT is a time killing framework indeed, and you have to have very strong reasons to use it. I think that 99% of web apps don't need GWT.
GWT is not dying framework, but time killing framework. It has security issue. You can do easily CSRF(Cross site request forgery) request to the GWT applications. Also Java and Javascript are totally different languages, you can't translate easily. For your productivity avoid GWT.

Enterprise Web Application Architecture Question

I am wondering if it would be possible to develop an enterprise-level web application without the use of a standard MVC structure and application server by carrying the business/flow logic and session data to the client-size Javascript and make it talk to REST data services directly...maybe we could make use of an authorization/authentication layer and a second validation layer sitting on top of the data services. All these services operate on standard HTTP methods, support configurable logging&monitoring, and content or query parameters are all contained in the HTTP request/response body. Static HTML and Javascript are served to the browser and the rest is carried out by Javascript functions talking to the HTTP-based authorization/authentication, validation and then data services. Do you think this kind of an architecture could satisfy enterprise-level web application requirements?
It's possible but unlikely; what are the drivers that suggest this architecture to you? Is it just to be different or are there some specific aspects that this best addresses?
by carrying the business/flow logic
and session data to the client-side
Javascript and make it talk to REST
data services directly
In theory you'd still be able to have an appropriately layered solution (Business Logic (BL) script vs UI focused script) but practically speaking it'd be messy, and you'd lose the ability to physically separate it into different tiers. This could "bite" you at any number of places in the life of the system.
"Enterprise" grade systems are seldom small, I hate to think how much logic you'd be having to send over the wire to support a given action/process.
Putting all the BL into a scripting language ties you to that platform, and platforms change over time. The bad thing about scripts is that whilst they are stable to a degree I'd suggest they are more exposed to change than server-based platforms like Java or .Net. In an enterprise scenario, the servers will have very tight change control and upgrade paths mapped out for them - whereas browsers are much more open to regular change.
There's the issue of compatibility - unless you're tied to a specific browser (to the version level) guaranteeing consistent behavior is going to be harder, and will likely require more development effort. Let's say you deliver the solution successfully; what do you do when the business wants to take advantage of mobile computing - say iPads? Your only option is going to be a browser - you won't be able to take advantage of any of the native advantages of the platform. "The web and browsers" might seem like they'll be around forever - but then I'm guessing that what MainFrame folks said at the time. A server-centered solution is going to give you more life for less expense.
Staffing will be an issue - you'll need very strong JavaScript and server-side developers.
Security: having your core BL out on the client where it's much more exposed sounds very dangerous.
EDIT:
Web Apps can be sow for many reasons - not many of which are reason enough to put all your BL in JavaScript on the client. Building apps for performance is a whole field of endeavor on its own - I suggest you get more familiar with Architecting and implementing for performance before you write off n-tier web apps altogether :)
Regarding keeping your layers separated: there are different ways of doing this but it boils down to abstraction - and more correctly to keeping good design principles in mind; if you haven't heard of SOLID that would be a good place to start. In terms of implementation start reading up on Dependency Inversion (FYI - self-promotion, the articles mine and is .Net focused, but you should have no problem tracking down Java-based ones too).

What criteria should I take into consideration when allowing users to enter rich text on my website?

I've worked with several different types of "user-generated content" sites: wikis, a message board, blogs... These systems can differ greatly: a blog post editor allows more control over presentation than that for comments on the blog post, a wiki topic editor encourages wiki links over raw URLs, etc.
However, one key design decision is common to each: should I provide the user with a simplified markup language such as Wikitext, Markdown or BBCode, forcing users to learn that, or should I give them a WYSIWYG editor like CKEdit or TinyMCE and filter or transform the resulting HTML behind the scenes?
There was a time when I thought this was a simple matter of identifying my intended audience: tech-minded users get markup, non-technical get WYSIWYG. In practice, this hasn't worked out all that well, occasional users struggling with markup and the WYSIWYG editors providing at best a leaky abstraction for the underlying HTML.
So with my initial confidence throughly crushed, I come looking for advice:
What factors should I be taking into account when making this decision?
Have simple markup systems become commonplace enough that I can rely on most users having at least some familiarity with them?
...Or should I abandon them as merely a relic of the past, and work on finding ways to make WYSIWYG work more effectively...?
I'm not looking to go back and tear apart what I've already done. For better or worse, these systems are working, their few users comfortable or at least competent by now. But it would be nice to have some better guidelines when putting together future designs.
One approach that seems to work fairly well is the use of Markdown as done here on SO. Stupid and/or lazy people (with apologies to all who are) can simply throw text in the box; it comes out looking as messy but it's mostly there and readable. People who care about how their text looks can do some simple things that are for the most part almost intuitive (like leaving blank lines between paragraphs, putting asterisks or numbers before list items) and it Just Works™
This is Good Enough™ for a lot of applications and people. Some of the glitzier sites, such as Google Blogs, give you your choice (changeable at the click of a button) of editing raw HTML or using a WYSIWYG editor (that fails just often enough that I usually opt for raw HTML). In theory, you could even give your users 3 alternatives, such as Markdown, HTML and WYSIWYG; but at some point you'll be wondering why you even bothered. Some users will always struggle with some aspect of the interface and they'll blame you. I believe in finding a happy medium and not bothering to make everybody happy.
From my point of view, the most important considerations are those of security. If you allow raw HTML, your users can insert spam and malware and basically hijack your site for their purposes; so you have to carefully limit what's allowed. Another consideration is that if you allow, e.g. H1 headers, people can take up a lot of space and attention with posts that should really be subordinate. If you allow CSS (including style attributes in HTML tags) then again there are ways to deface your "real" content. Another big problem stems from unclosed or unmatched tags. These are the really serious problems, and you want to err on the side of strictness to avoid more serious problems.
"What factors should I be taking into account when making this decision?" What do your customers want? Can you not have a 'fall back' kind of system where the 'simplified' WYSIWYG can be used until they need the added features of raw markup? What kinds of things do the most users use most often? What features are used less often but, when they are needed, you customers cannot live without?
"Have simple markup systems become commonplace enough that I can rely on most users having at least some familiarity with them?" For people using wikis and blogs, I think the answer is yes. Even commenters on blogs to some extent for the simplest things but again, I think you should let them do markup in-line if they are able (or some common sub set of markup) and have the option of more power if they need it.
"..Or should I abandon them as merely a relic of the past, and work on finding ways to make WYSIWYG work more effectively...?" I would not take this all on at once. Work from a solid kernel of functionality and work outward to a complete system.

Common Libraries at a Company

I've noticed in pretty much every company I've worked that they have a common library that is generally shared across a number of projects. More often than not this has been a single companyx-commons project that ends up as a dumping ground for common programs including:
Command Line Parsers
File Utilities
Framework Helpers
etc...
Some of these are well thought out and some duplicate functionality found in Apache commons-lang, commons-io etc..
What are the things you have in your common library and more importantly how do you structure the common libraries to make them easy to improve and incorporate across other projects?
In my experience, the single biggest factor in the success of a common library is user buy-in; users in this case being other developers; and culture of your workplace/team(s) will be a big factor.
Separate libraries (projects/assemblies if you're in .Net) for different application tiers is essential (e.g: there's obviously no point putting UI and data access code together).
Keep things as simple as possible; what you don't put in a common library is often at least as important as what you do. Users of the library won't want to have to think, so usage needs to be super easy.
The golden rule we stuck to was keeping individual functions focused on a single task - do one thing and do it well (or very very well); don't try and provide something that tries to take every possibility into account, the more reusable you think you're making it - the less likely it is to be used. Code Complete (the book) has some excellent content on common libraries.
A good approach to setting/improving a library up is to do regular code reviews and retrospectives; find good candidates that you've already come up with and consider re-factoring them into a library for future projects; a good candidate will be something that more than one developer has had to do on more that one project (for example).
Set-up some sort of simple and clear governance of the libraries - someone who can 'own' a specific library and ensure it's overal quality (such as a senior dev or team lead).
I have so far written most of the common libraries we use at our office.
We have certain button classes that are just slightly more useful to us than the standard buttons
A database management class that does some internal caching and can connect to ODBC, OLEDB, SQL, and Access databases without even the flip of a parameter
Some grid and list controls that are multi threaded so we can add large amounts of data to them without the program slowing and without having to write all the multithreading code every time there is a performance issue with a list box/combo box.
These classes make it easier for all of us to work on each other's code and know how exactly they work since we all use the exact same interfaces throughout our products.
As far as organization goes, all of the DLL's are stored along with their source code on a shared development drive in the office that we all have access to. (We're a pretty small shop)
We split our libraries by function.
Commmon.Ui.dll has base classes for ui elements.
Common.Data.Dll is sort of a wrapper around Enterprise library Data access classes.
Common.Business is a dumping ground for other common classes that don't fit into one of those.
We create other specialized dlls as needs arise.

Developing an asset/node based CMS [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I'd like to develop a CMS for fun/personal using asset-based architecture rather than page-based (why, is the purpose of this question), but I can't find much information on the subject. All I've found barely scrapes the surface (there's a good chance I'm searching with the wrong terms).
An asset-based CMS stores information
as blocks of text called assets. These
individual assets are then related to
each other to automatically build
pages.
What are the (dis/)advantages of such a system?
What are the primary principles of asset-based architecture?
What should and shouldn't be an 'asset'? Where can I read more?
Decided to try to answer this after leaving my comment :)
If your definition of "asset" is along the lines of a "node" (such as in Drupal), or a document (such as the JSON-style documents in MongoDB or CouchDB), then here is some info:
I'll use the term "node" for this post. I think it's closest to "asset" and more popularly used. This also might be a very abstract answer, but hopefully it will at least get you thinking and pointed in the right direction.
Node-based architecture, could be described as a cross between neural networking patterns and object-oriented programming. The key is that "nodes" are points of data, and nodes can be connected to each other in some way.
Some architectures will treat nodes much like object-oriented classes, where you have different classes of nodes that can inherit various characteristics of parent nodes - every type of node inherits the basic properties of its parent - an "Essay" node might inherit the properties of a "Text-Document" node, which in turn inherits the properties of the base node. Drupal implements this inheritance model well, although it does not emphasize the connections between nodes in the way that something like Facebook's GraphAPI/Open Graph Protocol does.
This pattern of node-based architecture can be implemented at any level too, and exists in nature - think of social circles within society or ecosystems ;) On a software engineering level, it can take the form of a database, such as how MongoDB simply has nodes of data (which are called documents in that case). These documents can reference other documents, although, like Drupal, Mongo does not emphasize connectedness. Ironically, relational databases like MySQL that are the opposite of document-based databases actually emphasize connectedness more, but that's a discussion for another day. Facebook's GraphAPI that I mentioned above is implemented on a Web-API level. The Open Graph Protocol shapes it. And again, something like Drupal is implemented at the front-end level (although its back-end implements the node pattern on a lower level, of course).
Lastly, node-based architecture is much more flexible than traditional document/page based CMS architecture, but that also means there is a lot more programming and configuring to be done on the side of the developers. A node-based system will end up being far more inter-connected and its components will be integrated with one-another a deeper level, but it can also be more susceptible to breaking because of this deep level of connection - it is less than separated into individual modules. Personally, I see a huge trend where people are moving to become more "node-based" and less "content-based" as people begin to interact with websites more like applications than as electronic magazines as they did in the 90's. Plus, the node-pattern fits well with the increasing emphasis on user-contribution and social browsing because adding people and their accounts/profiles to a web site dramatically increases the complexity.
I know you said "asset," so I'll also say that asset emphasizes the data side of the node pattern more, whereas "node" emphasizes the connections between the pieces of data more.
But for further reading, I'd recommend reading up on the architecture of the software I mentioned. You could also check out node.js, JSON, and document-based databases, and GraphAPI's as they seem to fit well with this idea of asset/node-based architecture. I'm sure Wikipedia has some good stuff on these patterns as well.
You could very quickly scale this up using the CakePHP framework. It uses an MVC pattern and it provides classes called elements that may be inserted into layouts and can load whatever content you want based on the page, user, moon phase, etc.
<page>
<element calls methodX>
<element calls methodY>
<Default Content relies on Controller Action(view/edit/add/custom)>
<element calls methodZ>
</page>
I think you might be describing a CMS backed up by a content repository.
The repository itself is implemented by Apache Jackrabbit based on JSR 170:
The API should be a standard, implementation independent, way to access content bi-directionally on a granular level within a content repository. A Content Repository is a high-level information management system that is a superset of traditional data repositories. A content repository implements "content services" such as: author based versioning, full textual searching, fine grained access control, content categorization and content event monitoring. It is these "content services" that differentiate a Content Repository from a Data Repository.
For a CMS working on top a content repository, look at Nuxeo.