I was reading the Google Bigtable paper and came across the term "chunk server", which is used in the Google File System. Is there any difference between a chunk server and an ordinary server?
Thanks in advance!
The Wikipedia article about the Google File System explains the role of the Chunk Server in detail. The short answer is, no, there's not really a difference.
Chunk servers are (or were) just like any other servers; they simply take on the specific role of storing and serving fixed-size chunks, or blobs, of data. The servers were provisioned, configured and optimized accordingly:
https://en.wikipedia.org/wiki/Google_File_System
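To make "fixed-size chunks" concrete, here is a minimal sketch of how a byte offset in a file maps to a chunk. The 64 MB chunk size is the one described in the GFS paper; the function names are made up for this illustration and are not part of any Google API.

```python
# Illustrative only: the function names are hypothetical, not a GFS API.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper


def chunk_index(byte_offset, chunk_size=CHUNK_SIZE):
    """Return the index of the chunk that contains the given byte offset."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return byte_offset // chunk_size


def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
    """Yield (index, start, end) for each chunk of a file; end is exclusive."""
    for index, start in enumerate(range(0, file_size, chunk_size)):
        yield index, start, min(start + chunk_size, file_size)
```

A client would ask the master which chunk servers hold chunk `chunk_index(offset)` and then read the bytes from one of those servers directly.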
We want to take advantage of NoSQL databases in our applications, and we found out about Couchbase.
I've read in another question that you can configure Couchbase to work as Memcached only (so it keeps data only in memory, not on disk as well).
However, I haven't found anything about this in the documentation.
Is it possible to set up Couchbase Server to store data in RAM only?
Or do you specify on the client side where the data should be saved (disk or memory)?
Yes, just use memcached buckets. That's all.
Check out: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-introduction-basics.html
Couchbase 2.0 documentation explicitly states that it's an in-memory database. From my experience the buckets all exist in RAM. You can set the size of every bucket to partition your RAM appropriately.
I need my client to download 30 MB worth of files.
Here is the setup:
The data consists of 3000 small files.
They are downloaded through a TCP (BSD) socket.
They are stored in the client's DB as they are downloaded.
The server can hold all the necessary files in memory (no file access on the server side).
I haven't seen many cases where a client downloads such a large number of files, which I suspect is because of file access on the server side.
I'm also worried that the multiplexer (select/epoll) will be overwhelmed by handling so many network requests. (Do I need to worry about this?)
With the above suspicions, I zipped the 3000 files into 30 archives.
(The overall size doesn't change much because the files are already compressed (PNG).)
Testing shows that
downloading 3000 files is 25% faster than downloading 30 archives and unzipping them.
I suspect it's because the client device is unable to download while it is unzipping and inserting into the DB. I'm testing on handheld devices (iPhone).
(I've moved the unzipping and DB operations to a thread separate from the networking code, but the DB operations seem to take over the whole system. I profiled a bit, and unzipping doesn't take long; DB insertion does. On the server side, the files are zipped and placed in memory beforehand.)
I'm contemplating switching back to downloading 3000 files because it's faster for clients.
I wonder what experienced network programmers would say about the two strategies:
1. many small files
2. a small number of large archives, plus unzipping.
EDIT
For experienced iPhone developers: I'm moving the DB operations to a background thread using NSOperationQueue.
Does NSOperationQueue actually run operations on separate threads well?
I'm very suspicious of its performance.
-- I tried POSIX threads as well; no significant difference.
I'm answering my own question.
It turned out that inserting many images into an SQLite DB at once on the client takes a long time. As a result, the client does not read incoming network data fast enough.
http://www.sqlite.org/faq.html#q19
After I adopted the suggestion in the FAQ for speeding up many inserts (wrapping them in a single transaction), the archive approach actually outperforms the "download many files individually" strategy.
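For reference, the FAQ entry linked above recommends grouping many INSERT statements into one transaction instead of letting each INSERT commit on its own. Below is a minimal sketch of that idea using Python's built-in sqlite3 module rather than the C API an iPhone app would use; the `images` table, its columns and the sample rows are made up for the illustration.

```python
import sqlite3


def insert_images(conn, images):
    """Insert (name, data) pairs inside a single transaction.

    The "with conn" block commits once at the end (or rolls back on error),
    so the database does not pay the per-commit cost for every row.
    """
    with conn:
        conn.executemany(
            "INSERT INTO images (name, data) VALUES (?, ?)", images
        )


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT PRIMARY KEY, data BLOB)")
rows = [(f"img_{i}.png", b"\x89PNG" + bytes([i % 256])) for i in range(3000)]
insert_images(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
```

With the C API the equivalent is roughly to execute BEGIN, run the prepared INSERT for every image, and then execute COMMIT.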
After reading a little about Cassandra, I wanted to learn about Membase, another NoSQL database. I downloaded it but I couldn't find a way to configure it. Can anyone help me?
In the current version (beta 2) there isn't much configuration to do. With the exception of a per-server memory quota, everything should be set up by default.
Once you install, you should be able to immediately talk "memcached" over port 11211 to the Membase server. Both ASCII and binary protocols are supported.
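As a rough illustration of what talking "memcached" in ASCII looks like on the wire, the sketch below builds the bytes for a `set` and a `get` command as defined by the memcached text protocol. The helper names and the `localhost` host are assumptions for this example; only port 11211 comes from the answer above, and in practice you would normally use an existing memcached client library.

```python
import socket


def build_set(key, value, flags=0, exptime=0):
    """Build a memcached ASCII 'set' command: header line, data block, CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key):
    """Build a memcached ASCII 'get' command for a single key."""
    return f"get {key}\r\n".encode("ascii")


def send_command(payload, host="localhost", port=11211):
    """Send one command and return the raw reply (hypothetical helper).

    Reads a single recv() worth of data, which is enough for short replies
    such as STORED but not a general-purpose response parser.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
        return sock.recv(4096)
```

Against a running server, `send_command(build_set("greeting", b"hello"))` should come back with a `STORED` line if the write was accepted.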
Beta 3 will introduce multiple buckets, the ability to set memory and disk quotas per bucket as well as the replica count per bucket.
Let me know if you have any other questions. You can also visit forums.northscale.com.
Thanks, take care.
Perry
I want to get a new streaming server for my website, which mostly hosts video and audio files. But how do we maintain a backup of the streaming server when the storage size is increasing day by day?
With a database server such as SQL Server, a backup can be taken and restored very easily, as it does not occupy much space for a medium-sized application.
A streaming server is different: how can we back it up? If the server fails, there should be an alternative server or solution that reduces the downtime.
How is the back-end architecture of YouTube built to handle this?
The backend architecture of YouTube probably uses Google's BigTable, which stores objects redundantly over several different servers. If you are using a single-server solution, your only real options are backing up to an attached disk, backing up to another server, or using an offsite storage system like Amazon S3 (which you could then use with their CDN to do basic HTTP streaming of content in the case of a failure).
I'm looking for ways to gather files from clients. These clients have our software and we are currently using FTP to gather files from them. The files are collected from the client's database, encrypted and uploaded via FTP to our FTP server. The process is fraught with frustration and obstacles. The software is frequently blocked by common firewalls and often runs into difficulties with VPNs and NAT (switching to passive instead of active mode usually helps).
My question is: what other ideas do people have for getting files programmatically from clients in a reliable manner? Most of the files they are submitting are < 1 MB in size. However, one of them ranges up to 25 MB in size.
I'd considered HTTP POST; however, I'm concerned that a 25 MB file would often fail over a POST (the web server timing out before the file could be completely uploaded).
Thoughts?
AndrewG
EDIT: We can use any common web technology. We're using a shared host, which may make central configuration changes difficult to make. I'm familiar with PHP from a common usage perspective... but not from a setup perspective (written lots of code, but not gotten into anything too heavy duty). Ruby on Rails is also possible... but I would be starting from scratch. Ideally... I'm looking for a "web" way of doing it as I'd like to eventually be ready to transition from installed code.
Research scp and rsync.
One option is to have something running in the browser which will break the upload into chunks which would hopefully make it more reliable. A control which does this would also give some feedback to the user as the upload progressed which you wouldn't get with a simple HTTP POST.
A quick Google search found this free Java applet which does just that. There will be lots of other free and paid options that do the same thing.
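The chunking idea itself is independent of applets or browsers. As a minimal sketch, the function below sends a payload piece by piece and retries only the piece that failed. `send_chunk` is a hypothetical callable standing in for whatever transport is used (for example one HTTP request per chunk); the chunk size and retry count are arbitrary assumptions.

```python
def upload_in_chunks(data, send_chunk, chunk_size=256 * 1024, max_retries=3):
    """Send data in fixed-size pieces and return the number of chunks sent.

    send_chunk(offset, chunk) is a hypothetical transport function that
    raises OSError when a transfer fails. Only the failing chunk is
    retried, so a dropped connection never costs more than one chunk.
    """
    sent = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        for attempt in range(1, max_retries + 1):
            try:
                send_chunk(offset, chunk)
                break
            except OSError:
                if attempt == max_retries:
                    raise  # give up after the last allowed attempt
        sent += 1
    return sent
```

The server side has to reassemble the pieces by offset, and the loop is also a natural place to report progress back to the user.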
You probably mean an HTTP PUT. That should work like a charm if you have a decent web server, but as far as I know it is not restartable.
FTP is the right choice (use passive mode to get through the firewalls). Use an FTP server that supports restartable transfers if you often run into VPN connection breakdowns (hotel networks are soooo crappy :-) ).
The FTP command that must be supported is REST.
From http://www.nsftools.com/tips/RawFTP.htm:
Syntax: REST position
Sets the point at which a file transfer should start; useful for resuming interrupted transfers. For nonstructured files, this is simply a decimal number. This command must immediately precede a data transfer command (RETR or STOR only); i.e. it must come after any PORT or PASV command.
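On the client side, a resumed upload could look roughly like the sketch below, which uses Python's standard ftplib. Its `storbinary` method takes a `rest` argument and issues the REST command before STOR. The function name and the way the resume offset is chosen (asking the server for the size of the partial file) are assumptions for this example, and it only works if the server supports SIZE and honours REST for uploads.

```python
import ftplib
import os


def resume_upload(ftp, local_path, remote_name):
    """Upload local_path, continuing after the bytes the server already has.

    Returns the offset the transfer was resumed from (0 for a fresh upload).
    """
    try:
        ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
        offset = ftp.size(remote_name) or 0
    except ftplib.error_perm:
        offset = 0  # assume the remote file does not exist yet
    if offset >= os.path.getsize(local_path):
        return offset  # nothing left to send
    with open(local_path, "rb") as fp:
        fp.seek(offset)  # skip the part that was already transferred
        ftp.storbinary(f"STOR {remote_name}", fp, rest=offset)
    return offset
```

Typical use would be to create the connection with `ftplib.FTP(host)`, call `login(...)`, and then call `resume_upload` again whenever a transfer is interrupted; ftplib uses passive mode by default.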