What NoSQL database is good for developing a Wiki? - mongodb

What NoSQL database do you recommend for developing a Wiki-like application?
I need documents to have many sub-section texts, each of which can be version-controlled and yet normalized.
Think of a Wikipedia page. It has many sections, and being a Wiki, it has version control for the document. However, I do not want a new document to be created (or the document to be entirely duplicated) every time a paragraph is changed. I only want that particular paragraph (or section) to get a new version, so that storage space is not wasted.
Any recommendation on the database or the design strategy?

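Currently no NoSQL database provides what you want here (as far as I know). The closest is CouchDB, which assigns a new revision to a document on every update. Note, though, that the bodies of older revisions are discarded on compaction, so it is not designed to be a permanent history store. Disk space is cheap, so keeping full copies is generally not a problem.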
But if versioning is key to your business and one of the business requirements, you should choose a tool built specifically to solve this problem: Git. Git does exactly what you want and does a lot of the heavy lifting for your wiki app (version diffs, blame to see who changed what, hooks, etc.).
A great example is GitHub wiki pages. Their wiki engine, Gollum, is built on Git and is open source.
To conclude, here are your options:
Use Git.
Use CouchDB, which does revision tracking for you but, as far as I know, saves a copy of the document.
Implement the revision logic in your app; any NoSQL db would fit nicely.
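The third option, revision logic in the app, could be sketched roughly as follows. This is only an illustration under assumptions not in the answer: the `WikiStore` class, its two dictionaries (standing in for two document collections), and the idea of keying sections by a content hash are all hypothetical choices. The point it shows is that a page revision holds only references to sections, so editing one section stores one new section text rather than a full copy of the page.

```python
import hashlib


class WikiStore:
    """Hypothetical in-memory stand-in for two document collections."""

    def __init__(self):
        self.sections = {}  # content hash -> section text, stored once
        self.pages = {}     # page title -> list of revisions (lists of hashes)

    def _put_section(self, text):
        # Identical text always maps to the same key, so it is never duplicated.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.sections.setdefault(digest, text)
        return digest

    def save(self, title, section_texts):
        """Record a new page revision and return its index."""
        revision = [self._put_section(text) for text in section_texts]
        self.pages.setdefault(title, []).append(revision)
        return len(self.pages[title]) - 1

    def load(self, title, revision=-1):
        """Rebuild the section texts of a revision (latest by default)."""
        return [self.sections[digest] for digest in self.pages[title][revision]]


store = WikiStore()
store.save("Git", ["Intro text", "History text"])
store.save("Git", ["Intro text", "History text, revised"])
```

After the two saves, the store holds three section texts rather than four, because the unchanged intro is shared by both revisions. In a real document database the two dictionaries would become two collections, and each page revision would be a small document of section references.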

Related

Branch and Merge abilities in a Document Database?

When I think of a document database, I think of a bunch of JSON files. (I imagine it is more complex than that, but that is how I envision it.)
In an upcoming project, we need the ability to deal with multiple different versions of the data. As I got to looking at the needs, they are very similar to the needs that drive branching and merging of code. (Versions of the data moving through a process, emergency updates to the existing data in prod even though there are active versions being worked on, etc.)
This has me wondering, do any of the popular document databases have features that are similar to branching and merging of documents? (I tried searching around, but I could not get any relevant results.)
RavenDB has great Revisions and Patching features.
With Revisions you can keep track of your documents' history
https://ravendb.net/docs/article-page/4.2/Csharp/server/extensions/revisions
https://ravendb.net/learn/inside-ravendb-book/reader/4.0/4-deep-dive-into-the-ravendb-client-api#document-revisions
With Patching you can update existing data in production
https://ravendb.net/docs/article-page/4.2/Csharp/client-api/operations/patching/single-document
https://ravendb.net/learn/inside-ravendb-book/reader/4.0/2-zero-to-ravendb#patching-documents

What is the most useful way to represent a coding standard?

We currently keep our coding standard in an MS Word document under SVN.
As our standards grow / change, it's becoming an increasingly clunky beast to maintain.
Most entries currently consist of:
A succinct explanation of the guideline.
Reasoning behind the guideline.
Any extra notes.
Examples of what you should do.
Examples of what you should not do.
At the moment we use track changes within the document to keep track of pending suggestions / corrections, which are periodically reviewed and then accepted / rejected.
Is there a de facto good way of maintaining a document like this?
A repository at GitHub would serve well. See example: https://github.com/airbnb/javascript - you can have discussions, track changes, accept/reject pull requests, etc.
Also it would help if you use auto-formatting tools plugged into your build process like https://golang.org/cmd/gofmt/ or https://github.com/thoughtbot/hound
I suggest you use a plain text file (or HTML / some other markup file if you need some fancy formatting) under some version control system. We used Word's features for versioning, and I like what Git offers much, much more.
GITHUB: If your organization maintains a private GitHub repository (not open source, but using GitHub's strengths for hosting and for distributed access by individuals within the organization), you could keep the coding standard there as a Markdown document, with reviews, pull requests, etc., as mentioned by Alex above.
REVIEWBOARD: If your organization does not have a private GitHub repository but already performs code reviews through Review Board, I suggest this option. Review Board lets peers review code and keeps a record of each review: whether comments were addressed, whether the version is approved to ship, etc. It can also review PDF documents, so you could use it to review the coding standards document. With this option, I guess, you keep the coding standards document in a repository and review it as a PDF, with the reviews tracked by Review Board.
Hope this helps. I guess there are many other ways in which companies handle this.

do any source-control systems use a document database for storage?

One of those questions that's difficult to google.
We were running into issues the other day with the speed of our svn repository. The standard solution to this seems to be "more RAM! more CPU!" etc. Which got me wondering: are there any source-control systems that use a document/NoSQL database (MongoDB, CouchDB, etc.) for storage? It seems like it might be a natural fit, but I'm no expert on source-control database theory. Perhaps there's a way to configure a more recent source-control system to use a document db as storage?
None that I know of do, and they wouldn't want to. Given the difference in degrees of testing, it would likely hurt robustness (a really bad thing for a source code repository). It would probably also end up hurting performance, because of the inability to do delta storage.
Note that Subversion has two very different storage mechanisms, one backed by the embedded Berkeley DB, and the other (FSFS) backed by simple files. One or the other of these might be better suited to your usage.
Also, since you posed your question pretty broadly, I'll comment on Git and TFS.
Git uses very efficiently packed files in the filesystem to store the repository. Frequently, the entire history is smaller than a checkout. For one very old project that my lab has, the entire history is 57MiB, and a working tree (not counting history) is 56MiB.
TFS stores a lot (possibly all) of its data in a SQL database.
Git uses memory-mapped files just like MongoDB :)
Though Git doesn't actually use MongoDB and I don't think it would want to. If you look at Git, it doesn't really need a NoSQL DB, it basically is a DB.
As far as I know, none of the VCSs use NoSQL/document-based databases. The idea of using CouchDB etc. is not new, but no one has implemented such a thing so far.

How do you CM an application with managed content

We have a web application which contains a bunch of content that the system operator can change (e.g. news and events). Occasionally we publish new versions of the software. The software is being tagged and stored in subversion. However, I'm a bit torn on how to best version control the content that may be changed independently. What are some mechanisms that people use to make sure that content is stored and versioned in a way that the site can be recreated or at the very least version controlled?
When you identify two sets of files that each have their own life cycle (software files on one side, "news and events" on the other), you know that:
you cannot version them together at the same time
you should not apply the same label to both
You need to save the "news and events" files separately (either in the VCS, in a DB as Ian Jacobs suggests, or in a CMS, i.e. a content management system), and find a way to link the two together (an id, a timestamp, a meta-label, ...).
Do not forget that you are not only talking about two different sets of files in terms of life cycle, but also about sets of files that differ in their very nature:
Consider the terminology introduced in this SO question "Is asset management a superset of source control" by S.Lott
software files: Infrastructure information, that is "representing the processing of the enterprise information asset". Your code is part of that asset and is managed by a VCS (Version Control System), as part of the Configuration management discipline.
"news and events": Enterprise Information, that is data (not processing); this is often split between Content Managers and Relational Databases.
So not everything should end up in Subversion.
Keep everything in the DB, and give every transaction to the DB a timestamp. That way you can keep standard DB backups and load the site content as of whatever date you want if the worst happens.
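A minimal sketch of the timestamp idea, assuming an append-only history table. The table name `content_history`, its columns, the helper functions, and the sample news item and dates are all made up for illustration; SQLite is used only because it ships with Python, and any database would work along the same lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_history ("
    " item_id TEXT NOT NULL,"
    " body TEXT NOT NULL,"
    " changed_at TEXT NOT NULL)"  # ISO 8601 strings sort chronologically
)


def record_change(item_id, body, changed_at):
    """Append a row instead of updating in place, so history is preserved."""
    conn.execute(
        "INSERT INTO content_history (item_id, body, changed_at) VALUES (?, ?, ?)",
        (item_id, body, changed_at),
    )


def content_as_of(item_id, when):
    """Return the latest body written at or before `when`, or None."""
    row = conn.execute(
        "SELECT body FROM content_history"
        " WHERE item_id = ? AND changed_at <= ?"
        " ORDER BY changed_at DESC LIMIT 1",
        (item_id, when),
    ).fetchone()
    return row[0] if row else None


# Made-up sample data.
record_change("news-1", "Grand opening on Friday", "2024-01-10T09:00:00")
record_change("news-1", "Grand opening moved to Saturday", "2024-01-12T15:30:00")
```

Calling `content_as_of("news-1", "2024-01-11T00:00:00")` returns the Friday text, while a later date returns the Saturday text. Storing the timestamp of the software release alongside its Subversion tag would then give one way to pair a code version with the content that was live at that time.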
I suppose part of the answer depends on what CMS you're using, and how your web app is designed, but in general, I'd regard data such as news items or events as "content". In other words, it's not part of your application - it's the data which your application processes.
Of course, there will be versioning issues between your CMS code and your application code. You could manage this by defining the interface between the two. Personally, I'd publish the data to the web app as XML, which gives you the possibility of using XML schema to define exactly what the CMS is required to produce, and what the web app should expect to process.
This ought to mean that most changes in the web app can be made without a corresponding alteration in the rendering of the data. When functionality changes require this, you can create a new version of the schema and continue to make progress. In this scenario, I'd check the schema in with the web app code, but YMMV.
It isn't easy, and it gets more complicated again if you need additional data fields in your CMS. Expect to plan for a fairly complex release process (also depending on how complex your Dev-Test-Acceptance-Production scenario is.)
If you aren't using a CMS, then you should consider it. (Of course, if the operation is very small, it may still fall into the category where doing it by hand is acceptable.) Simply putting raw data into a versioning system doesn't solve the problem - you need to be able to control the format in which your data is published to the web app. Almost certainly this format should be something intended for consumption by software, and therefore not usually suitable for hand-editing by the kind of people who write news items or events.

Version control for version control?

I was overseeing branching and merging throughout the last release at my company, and a number of times had to modify our Subversion pre-commit hooks to enforce different requirements on check-in comments and such. I was a bit nervous every time I was editing those files, because (a) they're part of a live production system, albeit only used internally (and we're not a huge organization), and (b) they're not under version control themselves.
I'm curious what sort of fail-safes people have in place on their version control infrastructure. Daily backups? "Meta" version control? I suppose the former is in place here as part of the backup of the whole repository. And the latter would be useful as the complexity of check-in requirements grows...
Natch: the version-control scripts and any other infrastructure code should also be under version control, but I would use a project separate from any development project.
I prefer a searchable wiki or similar knowledge-base repository to clogging up your bug-tracking system with things like VCS config.
Most importantly, make sure that the documentation is kept up to date. In my experience, people are vastly better at keeping code docs up to date than admin docs, though this may have been down to the individuals concerned. One thing that is often overlooked: if systems are configured according to standard Unix practices or a similar philosophy, that implies a body of knowledge about locations that may not be familiar to an OS X or Windows programmer who is suddenly faced with fixing a broken script. Without being condescending, make sure basic assumptions about location and interdependency are documented.
You should document all "setup" configuration for all your tools, and these documents should be checked into version control. For tools with text file configurations that allow comments, you could just check in the config file. But for tools that require using the interface, you should have a full document with images of the dialog boxes showing which options are chosen.
Most importantly though, these documents should say WHY you have set the values chosen (when not taking the default).
Second, as backup, the same documents should be included in your bug tracking software under a "How do I setup the version control software?" bug. (The bug tracking database is located on a different physical server, right?)
Third, all of this should be backed up off-site. I'm sure there are questions on SO about backup strategies.
What's wrong with using the same version control repository for the commit hooks and other configuration files? That's how I've handled it in the past when I've been responsible for a project's configuration management.
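As a rough illustration of a hook that is easy to keep under version control, here is a sketch of a pre-commit check on the log message. The `MIN_LENGTH` policy and the function names are hypothetical, not taken from the question. What it does rely on is that Subversion invokes `pre-commit` with the repository path and the transaction name, and that `svnlook log -t` prints the pending log message.

```python
import subprocess
import sys

MIN_LENGTH = 10  # hypothetical policy, chosen only for illustration


def check_log_message(message):
    """Return an error string if the message breaks the policy, else None."""
    if len(message.strip()) < MIN_LENGTH:
        return "Log message must be at least %d characters." % MIN_LENGTH
    return None


def main(argv):
    # Subversion passes the repository path and the transaction name.
    repos, txn = argv[1], argv[2]
    message = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    ).stdout
    error = check_log_message(message)
    if error:
        sys.stderr.write(error + "\n")  # stderr is relayed to the committer
        return 1  # a non-zero exit status rejects the commit
    return 0

# A deployed hook script would end with: sys.exit(main(sys.argv))
```

Keeping the policy in a small pure function like `check_log_message` means a change to the requirements can be reviewed and tested from a working copy before the hook is copied into the live repository's `hooks` directory.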
You should also back up your svn repository. That way if the repository itself becomes corrupted or the server catches fire or something, you can recover both your project and the svn control files.
If you have build scripts that do this setup (such as NAnt scripts), then you could check those in too.