Two questions about OpenStack Swift ring function - openstack-swift

I'm new to Swift and I'm trying to learn how it works. I have two questions about the ring and the consistent hashing algorithm.
When we want to store an object, we take its path (for example ".../v1/account_name/container_name/object_name.ext"), feed it to the MD5 hash function, and obtain a hash value. From this hash value we take the first n bits, where n is the part power, and use those bits as the partition number. If we then look up that partition number in the ring, we can find out which node holds the partition and store the object there.
First question: what if that partition is full?
Second question: assuming Swift stores the object on the correct node, how does Swift decide where to store the replicas?
Thank you all!

How does Swift decide where to store the replicas?
When you create a ring, you tell it about all the nodes and disks in your cluster, and it automatically determines where each copy should go and which handoff nodes to use in case of a failure. So when you ask the ring where to find or store an object with the hash ABC123DEF..., it will answer with something like:
Look here:
SERVER1/DISK2/PATH/TO/FILE
SERVER2/DISK4/PATH/TO/FILE
SERVER4/DISK1/PATH/TO/FILE
And if you don't find it there, look here:
Handoff: SERVER2/DISK2/PATH/TO/FILE
Handoff: SERVER8/DISK7/PATH/TO/FILE
Handoff: SERVER3/DISK1/PATH/TO/FILE
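The hashing step from the question and the lookup described in this answer can be sketched in a few lines of Python. This is a toy illustration, not Swift's implementation: the device list, PART_POWER, REPLICAS and the round-robin assignment are all assumptions made for the example. The real ring is built by the ring builder, takes zones and device weights into account, and mixes a cluster-specific salt into the hash.

```python
import hashlib
import struct

PART_POWER = 4   # assumption: 2**4 = 16 partitions
REPLICAS = 3     # assumption: three copies of every object

# Hypothetical devices, named after the example output above.
DEVICES = ["SERVER1/DISK2", "SERVER2/DISK4", "SERVER4/DISK1",
           "SERVER2/DISK2", "SERVER8/DISK7", "SERVER3/DISK1"]


def partition_for(path, part_power=PART_POWER):
    """Use the top `part_power` bits of the MD5 digest as the partition."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    top32 = struct.unpack_from(">I", digest)[0]  # first 4 bytes, big-endian
    return top32 >> (32 - part_power)


def build_table(devices, part_power=PART_POWER, replicas=REPLICAS):
    """Precompute the replica devices for every partition (toy round-robin)."""
    return [[devices[(part + r) % len(devices)] for r in range(replicas)]
            for part in range(2 ** part_power)]


TABLE = build_table(DEVICES)


def lookup(path):
    """Return the partition, its primary devices, and the handoff devices."""
    part = partition_for(path)
    primaries = TABLE[part]
    handoffs = [d for d in DEVICES if d not in primaries]
    return part, primaries, handoffs
```

The point of the sketch is that the replica locations are fixed when the table is built, so a lookup is just a hash followed by an index into the table.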

Related

Determine Remaining Bytes

I'm working on a project where I need to send a value between two pieces of hardware using CoDeSys. The comms system in use is CAN, which can only transmit bytes, so the maximum value per byte is 255.
I need to send a value higher than 255. I can split it over more than one byte and reconstruct it on the receiving machine to get the original value.
I'm thinking I can divide the REAL value by 255 and, if the result is over 1, break the value into one byte holding the remainder and one byte holding the number of 255s in the whole number.
For example, 355 would amount to one byte of 100 and another of 1.
Whilst I can describe this, I'm having a really hard time figuring out how to actually write it in logic.
Can anyone help here?
If I understand you correctly, this is all handled for you in CoDeSys.
1. CAN: yes, it transmits bytes, but I take it you are not using CANopen and are instead using the low-level FB that asks you to send a CAN frame as an 8-byte array?
If these are your own two custom controllers (you are programming both of them in CoDeSys), just use netvariables. Netvariables allow you to transfer any type of variable: you can take the variable list from one controller and import it into another controller, and all the data will show up. You don't have to do any variable manipulation; it's handled under the hood for you. But I don't know the specifics of your system or what you are trying to do.
If you are trying to break variables down and rebuild them from one size to another, that is easy and I can share that code with you.
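For reference, the split-and-rebuild logic can be sketched as follows. The sketch is in Python rather than CoDeSys Structured Text, and the function names are made up for the illustration. It assumes an unsigned integer value that fits in 16 bits, and it uses 256 rather than 255 as the base, because a byte has 256 possible values (0 to 255). With that base, 355 splits into a high byte of 1 and a low byte of 99.

```python
def split_u16(value):
    """Split an unsigned 16-bit value into (high_byte, low_byte)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in two bytes")
    high = (value >> 8) & 0xFF   # same as value // 256
    low = value & 0xFF           # same as value % 256
    return high, low


def join_u16(high, low):
    """Rebuild the original value on the receiving side."""
    return (high << 8) | low     # same as high * 256 + low


high, low = split_u16(355)
restored = join_u16(high, low)
```

A REAL would first have to be scaled and converted to an integer (or sent as its raw bytes) before a split like this applies.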

consistent hashing on Multiple machines

I've read the article at http://n00tc0d3r.blogspot.com/ about the idea of consistent hashing, but I'm confused about how the method works on multiple machines.
The basic process is:
Insert
Hash an input long URL into a single integer;
Locate a server on the ring and store the key--longUrl pair on that server;
Compute the shortened URL using base conversion (from base 10 to base 62) and return it to the user. (How does this step work? On a single machine, there is an auto-incremented id from which to calculate the shortened URL, but which value is used to calculate it on multiple machines? There is no auto-incremented id.)
Retrieve
Convert the shortened URL back to the key using base conversion (from base 62 to base 10);
Locate the server containing that key and return the longUrl. (And how can we locate the server containing the key?)
I don't see any clear answer on that page about how the author intended it to work. I think this is basically an exercise for the reader. Here are some ideas:
Implement it as described, with hash-table-style collision resolution. That is, when creating the URL, if the key already matches something, deal with that in some way. Rehashing or an arithmetic transformation (e.g., adding 1) are both possibilities. Naively, this means a theoretical worst case of having to hit a server n times while trying to find an available key.
There are many ways to take that basic idea and make it smarter, e.g., search for another available key on the same server by rehashing iteratively until you find one that maps to that server.
Allow servers to talk to each other and coordinate on the auto-increment id.
This is probably not a great solution, but it might work well in some situations: give each server (or set of servers) a separate namespace, e.g., the first 16 bits select a server. On creation, randomly choose one. Then you just need to figure out how you want that namespace to map. The namespaces only really matter for who is allowed to create which ids, so if you want to add nodes or rebalance later, it is no big deal.
Let me know if you want more elaboration. I think there are many ways this one could go. It is annoying that the author didn't elaborate on this point; my experience with these sorts of algorithms is that collision resolution and similar problems tend to be at the very heart of a practical implementation of a distributed system.
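To make the base conversion and the namespace idea concrete, here is a small Python sketch. It is only one possible reading of the suggestions above, not the article author's method: the alphabet order, the LOCAL_BITS split and the function names are assumptions chosen for the example.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

LOCAL_BITS = 32  # assumption: bits reserved for each server's local counter


def encode62(n):
    """Convert a non-negative integer to a base-62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def decode62(s):
    """Convert a base-62 string back to the integer key."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def make_key(server_id, local_counter):
    """Put the server id in the high bits so each server owns a namespace."""
    return (server_id << LOCAL_BITS) | local_counter


def server_of(key):
    """Recover the owning server from a key without any lookup table."""
    return key >> LOCAL_BITS


short = encode62(make_key(server_id=7, local_counter=12345))
owner = server_of(decode62(short))
```

With this layout each server can keep its own auto-increment counter, and retrieval needs only the decoded key to find the right server.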

How do I model a queue on top of a key-value store efficiently?

Suppose I have a key-value database and I need to build a queue on top of it. How could I achieve this without getting bad performance?
One idea might be to store the queue inside an array, and simply store the array under a fixed key. This is quite a simple implementation, but it is very slow, as the complete array must be loaded or saved for every read or write access.
I could also implement a linked list with random keys, plus one fixed key that acts as the starting point to element 1. Depending on whether I prefer fast read or fast write access, I could let the fixed element point to the first or the last entry in the queue (so I have to traverse it forward or backward).
Or, to take that further, I could also have two fixed pointers: one for the first item, one for the last.
Any other suggestions on how to do this effectively?
Conceptually, a key-value store is very similar to raw memory, where the physical address plays the role of the key. So any data structure can be modeled on top of key-value storage, including a linked list.
A linked list is a list of nodes, each carrying a reference to the previous or following node. Each node can itself be viewed as a small key-value structure. By adding a prefix to the key, the fields of a node can be stored separately in a flat table of key-value pairs.
Going further, a numeric suffix on the key can make the pointer information redundant altogether. Such a list might look something like this:
pilot-last-index: 5
pilot-0: Rei Ayanami
pilot-1: Shinji Ikari
pilot-2: Soryu Asuka Langley
pilot-3: Touji Suzuhara
pilot-5: Makinami Mari
The corresponding algorithm is easy to imagine, I think. If you can have a daemon thread that maintains these keys, pilot-5 could be renamed to pilot-4 in the above example. Even if an additional thread is not allowed in your situation, the contents of the queue are not affected; there is just some overhead from the gap in the sequence.
Which of the two approaches to use is a trade-off between storage space and CPU time.
Thread safety is a real problem, but an old one. Just as the classes implementing the ConcurrentMap interface in the JDK provide atomic operations on key-value data, some key-value middleware such as memcached offers similar methods that let you update a key or value atomically and thread-safely. However, this is a matter of the algorithm you implement rather than of the key-value structure itself.
I think it depends on the kind of queue you want to implement, and no solution will be perfect because a key-value store is not the right data structure for this kind of task. There will always be some kind of hack involved.
For a simple first-in, first-out queue you could use a few key-value pairs like the following:
{
oldestIndex:5,
newestIndex:10
}
In this example there would be 6 items in the queue (5, 6, 7, 8, 9, 10). Items 0 to 4 are already done, whereas there is no item 11 yet. The producer would increment newestIndex and save its item under the key 11. The consumer takes the item under the key 5 and increments oldestIndex.
Note that this approach can lead to problems if you have multiple consumers or producers, and if the queue is never empty you cannot reset the indexes.
But the multithreading problem also applies to linked lists etc.
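A minimal sketch of the two-index queue, using a plain Python dict to stand in for the key-value store. The class name and the "item-N" key format are assumptions made for the example. As noted above, this version is not safe with multiple producers or consumers; it only shows that each operation touches a constant number of keys.

```python
class KVQueue:
    """FIFO queue over a key-value store, tracked by two index keys."""

    def __init__(self, store):
        self.store = store
        # An empty queue is represented by newestIndex < oldestIndex.
        store.setdefault("oldestIndex", 0)
        store.setdefault("newestIndex", -1)

    def push(self, item):
        idx = self.store["newestIndex"] + 1
        self.store["item-%d" % idx] = item   # write the item first
        self.store["newestIndex"] = idx      # then publish the new index

    def pop(self):
        idx = self.store["oldestIndex"]
        if idx > self.store["newestIndex"]:
            raise IndexError("queue is empty")
        item = self.store.pop("item-%d" % idx)  # read and delete the item
        self.store["oldestIndex"] = idx + 1
        return item

    def __len__(self):
        return self.store["newestIndex"] - self.store["oldestIndex"] + 1


store = {}
queue = KVQueue(store)
for name in ("a", "b", "c"):
    queue.push(name)
first = queue.pop()
```

Each push or pop reads and writes only the index keys and a single item key, so the cost does not grow with the length of the queue.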

Using the HiLoIdGenerator in NoRM for MongoDB to create a unique identifier

I have been struggling a little with the HiLoIdGenerator that comes with NoRM (http://normproject.org/). I want to use it to generate a unique identifier that I can use as a slug for my blog posts. At present I use the ObjectId to uniquely identify a document within MongoDB, but as this is GUID-like and doesn't look very good in a URL, I would prefer to have something like www.myblog.com/posts/1243, which is why I have decided to use the HiLoIdGenerator.
I would like to generate my HiLo ids on the client side. I read on Stuart Harris's blog http://red-badger.com/Blog/post/A-simple-IRepository3cT3e-implementation-for-MongoDB-and-NoRM.aspx that NoRM's new HiLo id generator also allows this by allocating a range of integers to the client session that can be used with impunity (other clients will be using a different range). But when I opened the HiLoIdGenerator source, it said: "Class that generates a new identity value using the HILO algorithm. Only one instance of this class should be used in your project."
I really have three questions:
1) If I had multiple instances of the HiLoIdGenerator in my application (say I had an instance in my service class that called GenerateId for every new document), could I actually guarantee that all of my ids would be unique, given that the code for the HiLoIdGenerator class says there should only be a single instance of this class in an application?
2) The HiLoIdGenerator constructor takes a capacity argument, and I would like to know what it does. I passed 0 and all of the generated ids were the same. I then passed in 1, new HiLoIdGenerator(1), and the ids began at 1 and were incremented by 1. I presume it has something to do with the range of values that the generator can produce, but I am not sure, and I would like to be. Could someone please explain this argument?
3) I think I understand the aim of the HiLo algorithm as explained here: What's the Hi/Lo algorithm? What I don't understand is this: if I have two instances of MongoDB, with two different applications each looking at a different instance but both containing the same collection types, are the generated ids globally unique, i.e., could I use them the way I would a GUID? Or are they only unique within a given instance of MongoDB, which would preclude merging both collections into a single instance of MongoDB at a later date?
Thanks
See here for how to produce monotonically increasing ids:
http://www.mongodb.org/display/DOCS/Atomic+Operations#AtomicOperations-%22InsertifNotPresent%22
1) Yes, they would be unique. Each client (HiLoGenerator) would request its own range of lo values to allocate, but the ids are only unique if all clients use the same capacity.
2) Capacity is the number of ids that the client can assign with impunity. Again, if clients use different capacities, you can end up creating non-unique values. If you are using monotonically increasing ids, you only ever assign a single sequential value, so you do not need the HiLo algorithm; you just need a single place that holds a value you can increment and assign to a new entity. See dm's answer for an implementation of this.
3) Yes, as long as both clients use the same collection that holds the hi value, and as long as both clients use the same capacity for generating the lo values.
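The role of the capacity can be illustrated with a generic HiLo sketch in Python. This is not NoRM's code: HiSource is a hypothetical in-memory stand-in for the shared collection that holds the hi value, and the formula id = hi * capacity + lo is one common way to combine the two parts.

```python
import threading


class HiSource:
    """Hypothetical stand-in for the shared store that hands out hi values."""

    def __init__(self):
        self._next_hi = 0
        self._lock = threading.Lock()

    def next_hi(self):
        with self._lock:
            hi = self._next_hi
            self._next_hi += 1
            return hi


class HiLoGenerator:
    """Hands out ids from a block of `capacity` values per hi."""

    def __init__(self, source, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.source = source
        self.capacity = capacity
        self._hi = None
        self._lo = capacity  # forces a hi fetch on first use

    def generate_id(self):
        if self._lo >= self.capacity:
            # Block exhausted: one round trip to the shared store.
            self._hi = self.source.next_hi()
            self._lo = 0
        new_id = self._hi * self.capacity + self._lo
        self._lo += 1
        return new_id


source = HiSource()
a = HiLoGenerator(source, capacity=10)
b = HiLoGenerator(source, capacity=10)
ids = [gen.generate_id() for _ in range(25) for gen in (a, b)]
```

With equal capacities the blocks never overlap, so ids from both generators are unique. If one generator used capacity 10 and another capacity 5, the block for hi = 1 of the second (5 to 9) would fall inside the block for hi = 0 of the first (0 to 9), which is the collision the answer warns about.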

How can I implement incr/decr on top of a key/value store?

How can I implement incr/decr on top of a key/value store?
I'm using a key-value store that doesn't support incr and decr, which is why I want to create this. I have used incr and decr in Redis and Memcached, which some of the answers mention, and that is exactly how I want incr and decr to behave, so thanks to those who mentioned it.
The point of having an incr() function is that it's all internal to the store. You don't have to pull data out and push it back in.
It sounds like you want to put some logic in your code that pulls the data out, increments it and pushes it back in. While that's not very hard (I think I've just described how you'd do it), it does defeat the point somewhat.
To get the benefit you'd need to change the source of your key store. That might be easy.
But a lot of caches already have this. If you really need this for speed, perhaps you should find an alternative store, like memcached, that does support it.
Memcache has this functionality built in.
Edit: it looks like you're not going to get an atomic update without modifying the source, as there doesn't appear to be a lock function. If there is one (and this is not pretty), you can lock the value, get it, increment it in your application, put it back, and unlock it. Suboptimal, though.
It seems that without a compareAndSet you are out of luck. But it helps to consider the problem from another angle. For example, if you were implementing an atomic counter that shows the number of upvotes for a question, one way would be to have a "table" per question and to put in a +1 for each upvote and a -1 for each downvote. Then, to "get", you would sum the "table". For this to work I assume "tables" are inexpensive and you don't care how long "get" takes to compute; you only mentioned incr/decr.
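If the store does offer a compare-and-set primitive, incr and decr can be built as a retry loop on the client. The sketch below uses a hypothetical ToyStore class with get and compare_and_set methods to stand in for such a store; a real store's API will differ.

```python
import threading


class ToyStore:
    """Hypothetical in-memory store exposing get and compare_and_set."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key, expected, new):
        """Write `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def incr(store, key, delta=1):
    """Optimistic increment: retry until no other writer interfered."""
    while True:
        current = store.get(key)          # None means the key is missing
        new = (current or 0) + delta
        if store.compare_and_set(key, current, new):
            return new


def decr(store, key, delta=1):
    return incr(store, key, -delta)


store = ToyStore()
workers = [threading.Thread(target=lambda: [incr(store, "votes") for _ in range(250)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Unlike a server-side incr, each attempt costs a read and a conditional write, and attempts are repeated under contention.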
If you wish to atomically increment or decrement an int value associated with a key of, e.g., type string, and you know all of the keys before you have to perform atomic operations on any of them, use a Dictionary<string, int[]> and pre-populate the dictionary with a single-item array for each key. It will then be possible to perform atomic operations (e.g. increment) on items via code like Threading.Interlocked.Increment(ref MyDict[keyString][0]);. If you need to deal with keys that are not known in advance, you may need to use a ConcurrentDictionary instead of a Dictionary, but you need to be careful if two threads try to create dictionary entries for the same key simultaneously.
Since increment and decrement are simple addition and subtraction operations that are "commutative", what you need to implement is a PN-Counter. It is a CRDT (conflict-free replicated data type, also described as a commutative replicated data type). Various examples of how to implement this on Riak are available around the web and on GitHub.
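A PN-Counter keeps two grow-only maps, one for increments and one for decrements, with one entry per replica. The Python sketch below is a generic state-based illustration, not the Riak implementation; the class and method names are chosen for the example.

```python
class PNCounter:
    """State-based PN-Counter: per-replica increment and decrement totals."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.p = {}  # replica id -> total increments made on that replica
        self.n = {}  # replica id -> total decrements made on that replica

    def incr(self, amount=1):
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + amount

    def decr(self, amount=1):
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        """Take the per-replica maximum; safe to repeat and to reorder."""
        for rid, count in other.p.items():
            self.p[rid] = max(self.p.get(rid, 0), count)
        for rid, count in other.n.items():
            self.n[rid] = max(self.n.get(rid, 0), count)


a = PNCounter("a")
b = PNCounter("b")
a.incr(3)
b.incr(2)
b.decr(1)
a.merge(b)
b.merge(a)
```

Each replica only ever writes its own entries, so no compare-and-set is needed; the stored value is the pair of maps, and the counter is computed on read.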