Tips for building a site statistics backend? - visitor-statistic

I'd like to build site statistics that are a bit more substantial than just google analytics.
(lots more user specific logging, associated with their login, recording of ajax tasks)
Are relational databases ok?
From previous fiddlings I've found that databases seem to get pretty large pretty quickly when trying to record lots of site statistics, but that may have been due to my bad handling of the database.
Any languages/platforms that should be avoided?
Any good comprehensive site statistics software that should be looked at?
(Tried to look at Piwik, but their site seems to be down.)
Is this a bad idea? :)
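One common way to keep a relational statistics table from growing without bound is to log raw events and periodically roll them up into per-day counts. The sketch below is only an illustration under assumptions: it uses SQLite from the Python standard library as a stand-in for whatever database you pick, and the table names, column names, and function names (`events`, `daily_counts`, `log_event`, `roll_up_day`) are all hypothetical. Whether you delete the raw rows after the roll-up or archive them elsewhere depends on how much per-user detail you need to keep.

```python
import sqlite3


def open_stats_db(path=":memory:"):
    """Create the (hypothetical) raw-event and roll-up tables."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS events (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER,           -- logged-in user, NULL for anonymous
            kind    TEXT NOT NULL,     -- e.g. 'pageview' or 'ajax'
            path    TEXT NOT NULL,
            day     TEXT NOT NULL      -- ISO date, e.g. '2024-01-01'
        );
        CREATE INDEX IF NOT EXISTS events_day ON events (day);
        CREATE TABLE IF NOT EXISTS daily_counts (
            day  TEXT NOT NULL,
            kind TEXT NOT NULL,
            path TEXT NOT NULL,
            hits INTEGER NOT NULL,
            PRIMARY KEY (day, kind, path)
        );
    """)
    return conn


def log_event(conn, user_id, kind, path, day):
    """Append one raw event row."""
    conn.execute(
        "INSERT INTO events (user_id, kind, path, day) VALUES (?, ?, ?, ?)",
        (user_id, kind, path, day),
    )


def roll_up_day(conn, day):
    """Summarise one finished day into daily_counts, then drop its raw rows.

    Assumes no more events will be logged for that day afterwards.
    """
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT OR REPLACE INTO daily_counts (day, kind, path, hits) "
            "SELECT day, kind, path, COUNT(*) FROM events "
            "WHERE day = ? GROUP BY day, kind, path",
            (day,),
        )
        conn.execute("DELETE FROM events WHERE day = ?", (day,))
```

Reports then read from the small `daily_counts` table, while the `events` table only ever holds the days that have not been rolled up yet.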

Related

how is adoption of the activity stream standard (activitystrea.ms)

I am new to the activity stream standard. Searching on Google, I quickly found http://activitystrea.ms/, whose front page says: The Activity Streams format has already been adopted by BBC, Gnip, Google Buzz, Gowalla, IBM, MySpace, Opera, Socialcast, Superfeedr, TypePad, Windows Live, YIID, and many others.
I am not sure whether it is still alive. Is there any other activity standard that is much more popular in industry?
macf
Over at Fashiolista we've opensourced our approach to building feed systems.
https://github.com/tschellenbach/Feedly
We also use the activity stream standard and we're quite happy with it. As far as I know there are no other standards which have become mainstream. I do think that most companies slightly deviate from the standards.
In addition, have a look at this High Scalability post where we explain some of the design decisions involved:
http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html
This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.
To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:
Yahoo Research Paper
Twitter 2013 Redis based, with fallback
Cassandra at Instagram
Etsy feed scaling
Facebook history
Django project, with good naming conventions. (But database only)
http://activitystrea.ms/specs/atom/1.0/ (actor, verb, object, target)
Quora post on best practises
Quora scaling a social network feed
Redis ruby example
FriendFeed approach
Thoonk setup
Twitter's Approach
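
To make the (actor, verb, object, target) shape and the idea of a per-user feed concrete, here is a minimal in-memory sketch. It is not Feedly's API and does not use Redis; the `Activity` and `FeedStore` names, the `max_length` parameter, and the example identifiers are assumptions made for illustration. It pushes each new activity into the feed of every follower of the actor (often called fan-out on write) and caps each feed at a fixed length.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activity:
    """One activity in the (actor, verb, object, target) shape."""
    actor: str
    verb: str
    object: str
    target: Optional[str] = None


class FeedStore:
    """In-memory stand-in for a feed backend (hypothetical, for illustration)."""

    def __init__(self, max_length=100):
        self.followers = defaultdict(set)  # followee -> set of followers
        # Each feed keeps only the newest max_length activities.
        self.feeds = defaultdict(lambda: deque(maxlen=max_length))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def publish(self, activity):
        """Fan out on write: copy the activity into every follower's feed."""
        for follower in self.followers[activity.actor]:
            self.feeds[follower].appendleft(activity)

    def feed(self, user):
        """Return the user's feed, newest first."""
        return list(self.feeds[user])
```

For example, after `store.follow("bob", "alice")`, calling `store.publish(Activity("alice", "pin", "item:1", "board:shoes"))` makes that activity the first entry of `store.feed("bob")`.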

Intranet site Content Management

I'm currently designing my very first website for a small business intranet (5 pages). Can anyone recommend the best way to manage content for the Company News section? I don't really want to get involved in day-to-day content updates, so I need something simple enough for the Marketing guy to create and upload a simple news article, perhaps created in MS Word. Let's assume the author has no HTML skills.
I've read about Content Management systems but,
A. I won't get any funding for purchase and
B. Think it's a bit overkill for a small 5 page internal website.
It's been an unexpected hurdle in my plans. For something that I'd assumed would be fairly common functionality, I can't seem to find any definitive articles to suit my needs.
I'm open to suggestions (even if it's confirmation that a CMS is the only way to go).
Your requirements are: a small site, no budget, and the need for it to be easy for the marketing guy to upload a news item.
My recommendation would be to go with an all-in-one CMS, e.g. WordPress, which has the kind of functionality you're talking about out of the box.
My guess is this organisation is just getting into "intranets", so the key is something quick and simple that can be used to justify expenditure if value is returned. Perhaps a plugin that automatically emails a summary of the blog posts to all employees once a week would be useful?
There are many options and you can use any one of these:
Joomla
SilverStripe CMS
ModX
Cushy CMS
Frog CMS
Drupal
In addition to what Mr. Mckinnon said, keep in mind that if you don't want to get involved in the daily updates made by the people who are going to use the platform, you should consider the following:
What kind of data you want to be displayed
Who can view/modify that data
Who can create/remove data
How you will be organizing all that data
Your intranet should not be limited to displaying or creating data. Eventually all that data can turn into a Knowledge Base (KB) for your company, where your coworkers can share their solutions to the common and rare problems the company runs into. A KB like this saves time, so it is worth starting as soon as possible; newcomers to your company can then see the most common issues and get productive sooner (we all know time is a luxury in every company, regardless of size).
Keep in mind, too, that all that knowledge and data is valuable to you and your coworkers, so you should also consider requiring login credentials that your company's system administrator can manage, along with auditing for unauthorized access (if applicable).
I hope this helps from the administrative point of view.

When to use multiple DBMS

When is it a good idea to use more than one DBMS? What are the possible repercussions, and how do you decide when to do so?
I'm currently building an application which runs an analysis on our users' websites and stores it. This allows me to analyze all the data and give them analytics.
Since the data collected from each site is static and varies greatly from site to site, CouchDB seemed like a great fit. But in order to create this system, I'd need to build a user account system, which CouchDB is quite horrible at (reserving names, emails, etc. has all sorts of problems).
My first thought was to use MySQL to handle the user accounts and CouchDB for the massive amounts of data. Essentially, trying to use a hammer for a nail and a screwdriver for a screw.
Is this a time when more than one DBMS is a good idea?
I don't see anything wrong with using MySQL for user accounts and CouchDB for crawled information.
For the users, you might even consider something simpler, like GDBM.
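As a rough illustration of the split, the sketch below keeps accounts in a relational table, where `UNIQUE` constraints do the name and email reservation, and keys the free-form analysis documents by the relational user id. It is only a sketch under assumptions: SQLite stands in for MySQL, a plain dictionary stands in for CouchDB, and every name in it (`register`, `store_analysis`, `analysis_docs`) is hypothetical.

```python
import sqlite3


def create_account_db():
    """Relational side: the database enforces unique names and emails."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " email TEXT NOT NULL UNIQUE)"
    )
    return conn


def register(conn, name, email):
    """Return the new user id, or None if the name or email is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


# Document side: a dict standing in for the document store.
analysis_docs = {}


def store_analysis(user_id, site, doc):
    """Store a schemaless analysis document, linked by the relational user id."""
    analysis_docs[(user_id, site)] = doc
```

The only thing the two stores share is the user id, so the main repercussion to plan for is keeping that link consistent, for example when an account is deleted.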

Drupal 7 is unacceptably slow

Top post update: "unacceptably slow" means between 2 and 10 seconds to load the front page on a site with 12 beta testers, only one logged in, and no more than 20 articles posted, after applying the most popular "speed it up" fixes.
I am a newcomer to Drupal (although I have been a professional software developer for 30 years).
I am just setting up my first site, so am not committed yet and could switch.
Like many others on this forum and elsewhere, I find Drupal 7 to be unacceptably slow (which is a pity, because of the great features, but I guess that's what causes the slow load time).
I have done the research, googled around, read blogs and forums and have tried all of the commonly suggested solutions, but to no avail.
I am currently polling my dozen or so beta testers on whether the site is acceptable or just too darn slow, but that is just a formality.
So, can you please help? If I can't use Drupal 7 then what can I use?
The obvious answer might be Drupal 6 but sooner or later that will no longer be developed or supported.
Is there another CMS for my needs? I want to have a community site. That means, at a minimum, Forums, Polls, Groups, hopefully a wiki, individual blogs would be nice, maybe photo galleries, though that is less important, chat rooms would be good.
Just your general "bunch of folk with similar interests, although some of them have sub-interests & cliques" site.
I tried CMS matrix, but - surprisingly - didn't find anything. I am googling, but would prefer some feedback from someone with personal experience.
Again, I do not mean to slight Drupal 7, just to say that it's not for me … don't taze down-vote me, bro :-)
Please tell us what "unacceptably slow" means to you. For many of my applications, this is a few tens of ms. For others, it's a few seconds. You probably need to apply the standard set of website speed tuning tricks to make anything go quickly.
Use yslow (http://developer.yahoo.com/yslow/) and related developer tools to help you troubleshoot why the site is loading slowly. Usually these types of problems are not the backend's fault, but are related to issues like over-large images, too many individual elements on a page, incorrect caching, etc.
Make sure your database is fast.
Don't use shared hosting.
Make sure you are not serving oversized images.
Verify that caching is turned on and works the way you expect.
Use a cookie-less subdomain for the static media files.
Compress and combine static files like CSS, javascript, etc.
If you're trying to host a busy, complex site on $5/mo shared hosting, I'm sorry, but that just isn't going to fly.
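As an illustration of the "compress and combine static files" item above, here is a minimal sketch that concatenates several CSS files into one bundle (fewer requests) and writes a gzip-compressed copy next to it (fewer bytes). The function name and file names are hypothetical, and in practice the CMS or the web server may already do this for you, so treat it as a picture of the idea rather than something to deploy as is.

```python
import gzip
from pathlib import Path


def combine_and_compress(sources, bundle_path):
    """Concatenate text files into one bundle and write a .gz copy beside it.

    Returns the path of the compressed file.
    """
    bundle_path = Path(bundle_path)
    # Join the files in the order given; order matters for CSS and JavaScript.
    combined = "\n".join(Path(src).read_text(encoding="utf-8") for src in sources)
    bundle_path.write_text(combined, encoding="utf-8")
    # e.g. site.css -> site.css.gz
    gz_path = bundle_path.with_name(bundle_path.name + ".gz")
    gz_path.write_bytes(gzip.compress(combined.encode("utf-8")))
    return gz_path
```

A call such as `combine_and_compress(["reset.css", "layout.css"], "site.css")` would produce `site.css` and `site.css.gz`; the page then references the single bundle instead of the individual files.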

Anyone have a link to a technical discussion of anything akin to the Facebook news feed system?

I'm looking for a presentation, PDF, blog post, or whitepaper discussing the technical details of how to filter down and display massive amounts of information for individual users in an intelligent (possibly machine learning) kind of way. I've had coworkers hear presentations on the Facebook news feed but I can't find anything published anywhere that goes into the dirty details. Searches seem to just turn up the controversy of the system. Maybe I'm not searching for the right keywords...
@AlexCuse I'm trying to build something similar to Facebook's system. I have large amounts of data and I need to filter it down to something manageable to present to the user. I cannot use another website due to the scale of what I've got to work at. Also I just want a technical discussion of how to implement it, not examples of people who have an implementation.
Are you looking for something along the lines of distributed pub/sub with content-based filtering? If so, you may want to look into Siena and some of the associated papers, such as "Design and Evaluation of a Wide-Area Event Notification Service".
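To show what content-based filtering means in the pub/sub sense, here is a minimal single-process sketch. It is not Siena's API: the `ContentBroker` class and its methods are hypothetical, and it only supports equality constraints on event attributes, whereas real systems support richer predicates and distribute the matching across many brokers.

```python
class ContentBroker:
    """Toy content-based pub/sub: subscribers state constraints on event fields."""

    def __init__(self):
        self.subscriptions = []  # list of (subscriber, constraints dict)

    def subscribe(self, subscriber, **constraints):
        """Register interest in events whose attributes equal all constraints."""
        self.subscriptions.append((subscriber, constraints))

    def publish(self, event):
        """Return the subscribers whose every constraint matches the event."""
        matched = []
        for subscriber, constraints in self.subscriptions:
            if all(event.get(key) == value for key, value in constraints.items()):
                if subscriber not in matched:
                    matched.append(subscriber)
        return matched
```

With `broker.subscribe("bob", topic="sports", region="eu")`, an event such as `{"topic": "sports", "region": "us"}` is filtered out for bob because the `region` constraint does not match, which is the difference from plain topic-based pub/sub.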