Related
The JobRunr dashboard by default shows UUIDs in place of server names.
Is it possible to make this more human readable and/or customizable? For example, ip address, hostname etc.
Currently this is not possible, but feel free to create a feature request or (even better) a PR to add this functionality.
Update on 17/01/2023: this is now solved in JobRunr v6. Not only do the background job servers show their names, but the name also appears inside the processing block of a job.
I am working on a desktop application that requires synchronization between several clients. Basically, a group of people (let's say between 2 and 10) all run the same application. One of them hosts a server and the other clients connect to that server. The client that hosts the server also connects to his own server.
The applications should stay synchronized between all clients, meaning all clients see the same data in the application. Specifically, the data in question I can define in two separate forms:
A simple property with a certain value (this value must stay synchronized)
A list of properties (the items in the list and their values must stay synchronized)
Simple examples of (1) could be: which item in a list does the client currently have selected, and what's the current location of the client's mouse pointer within the application window. These properties keep changing continuously but the number of these properties is constant and does not grow (e.g. defined during design time).
An example of (2) could be a list of chat messages. These lists will grow during runtime with no way to predict how many items there will be.
Here is an example code in C# for the state, client and chat messages:
public class State
{
    // A single value shared between all clients
    public int SimpleInteger { get; set; }

    // List of connected clients and their individual states
    public List<Client> Clients { get; set; }

    // List of chat messages
    public List<ChatMessage> Messages { get; set; }
}

public class Client
{
    public string ClientId { get; set; }
    public string Username { get; set; }
    public ClientState ClientState { get; set; }
}

public class ClientState
{
    public string ClientId { get; set; }
    public int SelectedIndex { get; set; }
    public int MouseX { get; set; }
    public int MouseY { get; set; }
}

public class ChatMessage
{
    public string ClientId { get; set; }
    public string Message { get; set; }
}
I've been working on this on and off for a long time but whatever kind of state synchronization I came up with, it never worked well.
When I search for solutions, I only ever find solutions for games, but those are not very helpful because my requirements are different:
I cannot deal with "dropped updates": I cannot predict (interpolate or extrapolate) what the other clients are doing, so every client needs to receive every update to stay in sync.
On the other hand, I don't care about lag (within reason). It is fine if I see the updates of other clients with about a second of delay.
When a new client connects (or reconnects), a large portion of the state must be transferred (for example: the list of chat messages from example 2). Each client is required to know about the entire history of the chat so this must be downloaded when a client connects.
My current solution can be summarized as follows:
The server keeps track of the state, e.g. the source of truth.
The state contains the properties that require synchronizing.
The state also contains a list of connected users (and their usernames etc).
Clients also each keep a local copy of the state, which they can act upon immediately. For example, they update their mouse position in their local state continuously.
Whenever a client updates its local state, this update is sent to the server.
Potential exceptions here are things that change too fast, such as the mouse position; those I will only send at regular intervals.
The server also updates the common "source of truth" state.
Finally, the server updates all other clients with the new updated state.
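To make the flow above concrete, this is roughly the kind of update message I have in mind (all type names here are illustrative, nothing exists yet):

// Hypothetical sketch of the messages exchanged between client and server.
public abstract class StateUpdate
{
    public string ClientId { get; set; }   // who produced the update
}

public class ClientStateUpdate : StateUpdate
{
    public ClientState NewClientState { get; set; }   // e.g. selection / mouse position
}

public class ChatMessageAdded : StateUpdate
{
    public ChatMessage Message { get; set; }
}

// Client side: apply the update locally, then send it to the server.
// Server side: apply it to the source-of-truth State, then forward it to every other client.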
The last two steps are where I'm struggling. I can think of two methods to synchronize the state, one is easy but probably not efficient and the other is efficient but prone to errors.
The server simply sends the entire state to all clients.
As soon as the server receives an update from a client, the update is applied to the state and the new state is broadcast. Every other client replaces their local state.
I feel this will probably work, but the state can grow in size quickly due to the "list" items (for example chat messages). In my previous attempts this quickly became a problem, and sending the state back became much too slow.
The server re-sends the same update (that it received) to all other clients.
Each client then only applies the new update to their state locally to sync back with the server.
This is probably much more efficient and sending the entire state is only necessary when a client connects.
However, in the past I frequently ran into desync issues where clients were no longer in sync. I don't really know what caused it; probably conflicts between messages (for example, the server telling a client to update a value in the state just as that client updated its local value — which takes precedence?). Once this happens, everything goes wrong, because subsequent updates are applied to two different states and produce different outcomes.
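To illustrate method 2, here is roughly how I picture the client-side apply step; the sequence-number check is just an idea for detecting missed or reordered updates, and every name here is hypothetical (StateUpdate is the sketch from above):

// Hypothetical: the server stamps each forwarded update with an increasing sequence number.
public class SequencedUpdate
{
    public long Sequence { get; set; }      // assigned by the server, strictly increasing
    public StateUpdate Update { get; set; } // the original client update
}

// Client side: only apply updates in order; otherwise ask the server for a full snapshot.
public class ClientStateStore
{
    private long _lastAppliedSequence = 0;
    public State LocalState { get; } = new State();

    public bool TryApply(SequencedUpdate sequenced)
    {
        if (sequenced.Sequence != _lastAppliedSequence + 1)
            return false; // gap detected -> request a full state snapshot from the server

        Apply(LocalState, sequenced.Update);
        _lastAppliedSequence = sequenced.Sequence;
        return true;
    }

    private void Apply(State state, StateUpdate update)
    {
        // mutate the local state depending on the concrete update type
    }
}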
I'm looking for some guidance on general concepts on how to achieve this. I'm using several messaging libraries to achieve the actual communication between client and server and that part is not an issue I think. I can make sure in these libraries that every message is received for example (though I'm not sure if the order is guaranteed). Like I said before, lag is not an issue, but I must guarantee every state update is received both by the server and by every other client.
Any help would be great! Thanks.
This is a hard problem and there are enough tricky areas that I wouldn't want to build this myself. Authentication, conflicting updates, API management, network outages, single point of failure, and local persistence come to mind.
If you're up for using a cloud-based solution, Google Cloud Firestore takes care of those tricky areas and does what you need:
Clients save data to the database, by creating, updating, or deleting records. Example code.
Whenever a record is created, updated, or deleted, all clients get realtime notifications. Example code.
(After you follow the links above, make sure you click C# above the code boxes to see the C# code).
This is a complicated issue, with many moving parts, as you seem to understand. As I've been researching this, I've read a couple of comments on questions like this one on a variety of Q&A sites, stating this kind of thing is a project all on its own.
Disclaimer: I haven't done this myself, so I don't know how well this would work, but maybe you can take my suggestions and work with them, if you haven't already done so. I've worked on projects where this was implemented, but I wasn't part of that implementation directly.
Connection
Since you haven't said which library you are using for the connection, I'm going to assume you are using websockets or something similar. If not, I suggest you move to something like websockets. It allows a (near) constant connection between client and server so that data can be pushed in both directions, which avoids the client having to poll and pull the data. The link below has a decent walk-through on how to do it, so I won't repeat it here. Because links die, here's the first example code they give, which seems pretty simple.
using System.Net.Sockets;
using System.Net;
using System;

class Server
{
    public static void Main()
    {
        TcpListener server = new TcpListener(IPAddress.Parse("127.0.0.1"), 80);

        server.Start();
        Console.WriteLine("Server has started on 127.0.0.1:80.{0}Waiting for a connection...", Environment.NewLine);

        TcpClient client = server.AcceptTcpClient();
        Console.WriteLine("A client connected.");
    }
}
https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_server
Client start up
Once you have a stable connection between server and client, you need to make sure the data is in sync. When the user starts the app, you can get the timestamp of the latest change in each table and compare that to the server. If they are exactly the same, you have a somewhat reasonable expectation that the table hasn't changed. I'm assuming each table has a column containing the timestamp for the last edit made to the row.
For the tables that have changed, you can have the server send the new and updated rows to the client based on the client's "last changed timestamp".
Since the internet isn't 100% guaranteed to be connected, you will also need to keep track of the times the client has been connected vs. when they've been on the app (unless the app just won't work without being connected to the server). This information also needs to be sent to the server to compare to data changed during intervals where the client hasn't been connected.
Once timestamp matching has been done, you need to compare the row counts. If they match, you can more reasonably assume the tables are the same. If they aren't, you can see about matching ID/primary keys. There's a variety of different ways to do this, including 1:1 matching (which is slowest but most reliable), or you can do some math with the IDs (assuming numerical IDs) and try to see what's different in batches of 100 rows (for example). Idea: If adding the sorted, auto-increment integer IDs for the first 100 rows gives the same sum on the client and the server, all those rows exist on both sides, but if it doesn't match, you can try the 1:1 match to see what's missing. Because this can be lengthy for large databases, you may want to track this type of sync in another table, so it doesn't need to be done all the time.
Instead, you may want a table to track all the data not sent to a client. This would require a confirmation that the data sent was correctly inserted into the client DB. This could also work on the client side to track what hasn't been sent to the server. Of course, this kind of thing can get cumbersome quickly, even if you're just tracking keys, table names, and timestamps. You can rack up millions of rows quickly, if you don't remove old data periodically. This is why I suggest tracking unsent data, so that anything that becomes "sent" is no longer tracked by this table and removed.
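As a rough sketch of the "send me everything newer than my last timestamp" step, the server-side query could look like the following (the table name "Messages" and column "LastModified" are assumptions for illustration only):

using System;
using System.Data;
using System.Data.SqlClient;

public static class ChangeFetcher
{
    // Returns all rows that changed after the client's last-known timestamp,
    // which the server can then push down the websocket connection.
    public static DataTable GetRowsChangedSince(string connectionString, DateTime clientLastChange)
    {
        const string sql =
            "SELECT * FROM Messages WHERE LastModified > @since ORDER BY LastModified";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@since", clientLastChange);
            connection.Open();

            var result = new DataTable();
            result.Load(command.ExecuteReader());
            return result;
        }
    }
}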
If you don't want to code and manage all that, you can try a library that does it. There are a variety out there. Even Microsoft has one, but its extended support only runs until 1/1/2021. What happens after that, I doubt even Microsoft knows, but it gives you about 1.25 years to come up with a different solution.
Creating Synchronization Providers With The Sync Framework
The Sync Framework can be used to build apps that synchronize data from any data store using any protocol over a network. We'll show you how it works and get you started building a custom sync provider.
https://learn.microsoft.com/en-us/previous-versions/sql/synchronization/mt490616(v=msdn.10)
https://support.microsoft.com/en-us/lifecycle/search?alpha=Microsoft%20Sync%20Framework%202.1
Normal runtime
Once you have your data synced on startup (or in the background after startup), you can simply send the data to the server normally, as in when the user makes changes. Since you'll have a websocket type connection, any changes the server gets from other clients will be able to be pushed to all the other clients.
As far as changing the data in real time in your app, you may have to be constantly polling your local/client DB for timestamp changes so the UI can be appropriately updated. There may be something within C# that does this for you or another library you can find.
Conclusion
At this point, I'm out of ideas. It seems reasonable to me this would work, even though it's a lot of work. Hopefully you can take what I have and use it as a foundation to your own ideas on how to accomplish your task. It seems there's a lot of work ahead of you, so good luck!
Footnote
As I'm currently the only answer after several days of it being unanswered, I'm going to assume no one else has anything better to suggest. If they do, I'd encourage them to make their own answer instead of complaining about mine. People tweaking this answer is expected, but please remember community standards when making comments.
I'm only answering this because I haven't seen anyone else do it on this or other sites. It's only been bits and disconnected pieces here & there, with people still not being able to make sense of it as a whole.
This and similar questions have been asked before on this site and closed as "too broad". If you feel this same way as a reader, please vote so on the Question not this answer.
There are several solutions to your problem.
You could use a BizTalk server out of the box. This may not be what you have in mind.
If you want something more home-brewed, you could use WCF (Windows Communication Foundation) with MSMQ (Microsoft Message Queue). This would give you guaranteed message delivery, and durable messages (if you want). You would not have to worry about lost connections, and other errors occurring during message transmission.
You can go down another level and use direct TCP and UDP protocols to transmit messages. But now, you have to take care of more error cases.
Any SQL DBMS implements one important part of your problem statement: it maintains shared state. Consider what ACID promises:
Consistency. At any one instant, all clients reading from the database are guaranteed to see the same information.
Atomicity. The client updating the database can use as many steps as needed. When the transaction is committed, the data are changed entirely or not at all.
Isolation. The server gives each client the illusion of interacting with it alone. It handles concurrent updates, and updates the database as though the updates arrived serially.
You may not care about durability for this application.
The mediation among the clients is, for my money, the most useful feature of the DBMS for your application. That will save you work, and headaches. Another, non-obvious, benefit is that it can enforce consistency rules for the state information; that can be remarkably useful to prevent an obsolete/corrupt client from munging the shared state.
The second part of your problem statement is notifying 2-10 clients of changed state. There are any number of ways to do that.
Some DBMSs can access OS services from triggers. You could have an update trigger issue a notification. Alternatively, the updating client could do that.
The actual notification mechanism could be quite simple. Clients could connect to a server (that you write) and block on read(2). The server itself listens on a port for update notifications. On receipt of one, it repeats it to all connected clients. When the client's read request returns, it's time to query the database for the updated state, and post a new read.
To prevent a kind of "thundering herd" problem when several updates arrive back-to-back, when a client reads the update message, it could keep reading updates until EWOULDBLOCK, and only then query the DBMS. OTOH, if it's important to see the intermediate states (to see every update, not just the current state), the DBMS is perfectly capable of storing and providing all versions and distinguishing them with a timestamp or serial number.
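A minimal sketch of that notify-and-repeat server (plain TCP, no framing or error handling; everything here is illustrative, and the notification could come from the updating client or a DBMS trigger):

using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

// Sketch: accepts clients, and when any connected peer sends a byte,
// repeats that byte to all the others as a "state changed" signal.
class NotifyServer
{
    static readonly List<TcpClient> Clients = new List<TcpClient>();

    static void Main()
    {
        var listener = new TcpListener(IPAddress.Any, 9000);
        listener.Start();

        while (true)
        {
            var client = listener.AcceptTcpClient();
            lock (Clients) Clients.Add(client);

            var thread = new System.Threading.Thread(() => Pump(client));
            thread.IsBackground = true;
            thread.Start();
        }
    }

    static void Pump(TcpClient sender)
    {
        var buffer = new byte[1];
        while (sender.GetStream().Read(buffer, 0, 1) > 0)
        {
            lock (Clients)
                foreach (var c in Clients)
                    if (c != sender && c.Connected)
                        c.GetStream().Write(buffer, 0, 1);   // wake the client: go query the DB
        }
    }
}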
If you don't want to use TCP sockets directly, you might prefer ZeroMQ.
In this design, each client has three connections: the DBMS, the read-notify socket, and (maybe) the server-notify socket. The server has N+1 connections, for N clients and one listening socket. You have no locks to implement, very little tracking of participation, no problems re-synchronizing, and short windows of inconsistency among clients as each one acts on its notification.
I have a website in which users would upload various files and later access them.
The files are stored in a specific path in the server at this point. Now if I need to have multiple servers for the website, what is the best way to make the user uploaded files accessible across multiple servers. Amazon s3 is one option that has crossed my mind. What other options do I have?
First, you can try using a CDN (http://en.wikipedia.org/wiki/Content_delivery_network).
Also, you can build it in-house, by having specialized servers set up for static content. You will need a lookup server to know, for each file, on which server it can be found. It will also contain the logic to determine the best server to use when saving the file. This is more complicated, as you will have to handle load balancing and take care of the geographic location of users.
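A very small sketch of what that lookup logic could look like (the class and the "least-loaded server" rule are illustrative assumptions, not an existing component):

using System.Collections.Generic;
using System.Linq;

// Sketch: maps each uploaded file to the storage server that holds it,
// and picks the least-loaded server for new uploads.
public class FileLocationLookup
{
    private readonly Dictionary<string, string> _fileToServer = new Dictionary<string, string>();
    private readonly Dictionary<string, int> _filesPerServer = new Dictionary<string, int>();

    public FileLocationLookup(IEnumerable<string> storageServers)
    {
        foreach (var server in storageServers)
            _filesPerServer[server] = 0;
    }

    // Decide where to store a new file and remember the decision.
    public string AssignServer(string fileId)
    {
        var server = _filesPerServer.OrderBy(kv => kv.Value).First().Key;
        _fileToServer[fileId] = server;
        _filesPerServer[server]++;
        return server;
    }

    // Later: find out which server a file lives on.
    public string FindServer(string fileId) => _fileToServer[fileId];
}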
I have been working on a method to sync core data stored in an iPhone application between multiple devices, such as an iPad or a Mac. There are not many (if any at all) sync frameworks for use with Core Data on iOS. However, I have been thinking about the following concept:
A change is made to the local core data store, and the change is saved. (a) If the device is online, it tries to send the changeset to the server, including the device ID of the device which sent the changeset. (b) If the changeset does not reach the server, or if the device is not online, the app will add the change set to a queue to send when it does come online.
The server, sitting in the cloud, merges the specific change sets it receives with its master database.
After a change set (or a queue of change sets) is merged on the cloud server, the server pushes all of those change sets to the other devices registered with the server using some sort of polling system. (I thought about using Apple's Push services, but according to the comments this is not a workable system.)
Is there anything fancy that I need to be thinking about? I have looked at REST frameworks such as ObjectiveResource, Core Resource, and RestfulCoreData. Of course, these are all working with Ruby on Rails, which I am not tied to, but it's a place to start. The main requirements I have for my solution are:
Any changes should be sent in the background without pausing the main thread.
It should use as little bandwidth as possible.
I have thought about a number of the challenges:
Making sure that the object IDs for the different data stores on different devices are matched up on the server. That is to say, I will have a table of object IDs and device IDs, which are tied via a reference to the object stored in the database. I will have a record (DatabaseId [unique to this table], ObjectId [unique to the item in the whole database], Datafield1, Datafield2), and the ObjectId field will reference another table, AllObjects: (ObjectId, DeviceId, DeviceObjectId). Then, when the device pushes up a change set, it will pass along the device ID and the objectId from the Core Data object in the local data store. Then my cloud server will check against the objectId and device ID in the AllObjects table, and find the record to change in the initial table.
All changes should be timestamped, so that they can be merged.
The device will have to poll the server, without using up too much battery.
The local devices will also need to update anything held in memory if/when changes are received from the server.
Is there anything else I am missing here? What kinds of frameworks should I look at to make this possible?
I've done something similar to what you're trying to do. Let me tell you what I've learned and how I did it.
I assume you have a one-to-one relationship between your Core Data object and the model (or db schema) on the server. You simply want to keep the server contents in sync with the clients, but clients can also modify and add data. If I got that right, then keep reading.
I added four fields to assist with synchronization:
sync_status - Add this field to your core data model only. It's used by the app to determine if you have a pending change on the item. I use the following codes: 0 means no changes, 1 means it's queued to be synchronized to the server, and 2 means it's a temporary object and can be purged.
is_deleted - Add this to the server and core data model. A delete event shouldn't actually delete a row from the database or from your client model, because that leaves you with nothing to synchronize back. By having this simple boolean flag, you can set is_deleted to 1, synchronize it, and everyone will be happy. You must also modify the code on the server and client to query non-deleted items with "is_deleted=0".
last_modified - Add this to the server and core data model. This field should automatically be updated with the current date and time by the server whenever anything changes on that record. It should never be modified by the client.
guid - Add a globally unique id (see http://en.wikipedia.org/wiki/Globally_unique_identifier) field to the server and core data model. This field becomes the primary key and becomes important when creating new records on the client. Normally your primary key is an incrementing integer on the server, but we have to keep in mind that content could be created offline and synchronized later. The GUID allows us to create a key while being offline.
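For illustration only, here are the same four fields expressed as a plain record (sketched in C# rather than as a Core Data entity; the field names follow the list above, and the same columns would exist on the server table):

// Sketch of a synchronizable record; mirror these fields on both the
// server table and the client model.
public class SyncableRecord
{
    public string Guid { get; set; }            // primary key, generated on the client
    public bool IsDeleted { get; set; }         // soft delete, never physically remove rows
    public DateTime LastModified { get; set; }  // set by the server on every change
    public int SyncStatus { get; set; }         // client-only: 0 = synced, 1 = queued, 2 = temporary
}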
On the client, add code to set sync_status to 1 on your model object whenever something changes and needs to be synchronized to the server. New model objects must generate a GUID.
Synchronization is a single request. The request contains:
The MAX last_modified time stamp of your model objects. This tells the server you only want changes after this time stamp.
A JSON array containing all items with sync_status=1.
The server gets the request and does this:
It takes the contents from the JSON array and modifies or adds the records it contains. The last_modified field is automatically updated.
The server returns a JSON array containing all objects with a last_modified time stamp greater than the time stamp sent in the request. This will include the objects it just received, which serves as an acknowledgment that the record was successfully synchronized to the server.
The app receives the response and does this:
It takes the contents from the JSON array and modifies or adds the records it contains. Each record gets its sync_status set to 0.
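A sketch of the request/response payloads described above (the DTO names are mine, not from any framework, and they reuse the SyncableRecord sketch from earlier):

// What the client sends in the single sync request.
public class SyncRequest
{
    public DateTime MaxLastModified { get; set; }             // "give me everything newer than this"
    public List<SyncableRecord> PendingChanges { get; set; }  // all local items with sync_status = 1
}

// What the server sends back.
public class SyncResponse
{
    // Every record with last_modified > MaxLastModified, including the ones
    // just received, which doubles as the acknowledgment.
    public List<SyncableRecord> ChangedRecords { get; set; }
}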
I used the word record and model interchangeably, but I think you get the idea.
I suggest carefully reading and implementing the sync strategy discussed by Dan Grover at iPhone 2009 conference, available here as a pdf document.
This is a viable solution and is not that difficult to implement (Dan implemented it in several of his applications), and it overlaps with the solution described by Chris. For an in-depth, theoretical discussion of syncing, see the paper from Russ Cox (MIT) and William Josephson (Princeton):
File Synchronization with Vector Time Pairs
which applies equally well to core data with some obvious modifications. This provides an overall much more robust and reliable sync strategy, but requires more effort to be implemented correctly.
EDIT:
It seems that Grover's PDF file is no longer available (broken link, March 2015). UPDATE: the link is available through the Wayback Machine here
The Objective-C framework called ZSync and developed by Marcus Zarra has been deprecated, given that iCloud finally seems to support correct core data synchronization.
If you are still looking for a way to go, look into the Couchbase mobile. This basically does all you want. (http://www.couchbase.com/nosql-databases/couchbase-mobile)
Similar to @Chris, I've implemented a class for synchronization between client and server and solved all known problems so far (send/receive data to/from the server, merge conflicts based on timestamps, remove duplicate entries under unreliable network conditions, synchronize nested data and files, etc.).
You just tell the class which entity and which columns it should sync, and where your server is.
M3Synchronization *syncEntity = [[M3Synchronization alloc] initForClass: @"Car"
                                  andContext: context
                                  andServerUrl: kWebsiteUrl
                                  andServerReceiverScriptName: kServerReceiverScript
                                  andServerFetcherScriptName: kServerFetcherScript
                                  ansSyncedTableFields: @[@"licenceNumber", @"manufacturer", @"model"]
                                  andUniqueTableFields: @[@"licenceNumber"]];

syncEntity.delegate = self; // delegate should implement onComplete and onError methods
syncEntity.additionalPostParamsDictionary = ... // add some POST params to authenticate the current user
[syncEntity sync];
You can find source, working example and more instructions here: github.com/knagode/M3Synchronization.
Notify the user to update data via push notification.
Use a background thread in the app to compare the local data with the data on the cloud server; when a change happens on the server, update the local data, and vice versa.
So I think the most difficult part is determining which side's data is out of date.
Hope this can help you.
I have just posted the first version of my new Core Data Cloud Syncing API, known as SynCloud.
SynCloud differs from iCloud in that it provides a multi-user sync interface. It is also different from other syncing APIs because it allows for multi-table, relational data.
Please find out more at http://www.syncloudapi.com
Built with the iOS 6 SDK, it is very up to date as of 9/27/2012.
I think a good solution to the GUID issue is "distributed ID system". I'm not sure what the correct term is, but I think that's what MS SQL server docs used to call it (SQL uses/used this method for distributed/sync'ed databases). It's pretty simple:
The server assigns all IDs. Each time a sync is done, the first thing checked is "How many IDs do I have left on this client?" If the client is running low, it asks the server for a new block of IDs. The client then uses IDs in that range for new records. This works great for most needs, if you can assign a block large enough that it should "never" run out before the next sync, but not so large that the server runs out over time. If the client ever does run out, the handling can be pretty simple: just tell the user "sorry, you cannot add more items until you sync"... if they are adding that many items, shouldn't they sync anyway to avoid stale-data issues?
I think this is superior to using random GUIDs because random GUIDs are not 100% safe, and usually need to be much longer than a standard ID (128-bits vs 32-bits). You usually have indexes by ID and often keep ID numbers in memory, so it is important to keep them small.
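A minimal sketch of the client-side part of such a block allocator (the block size and the server call are placeholders, not an existing API):

using System;

// Sketch: hands out IDs from a server-assigned block and asks for a new
// block when the current one runs out.
public class IdAllocator
{
    private const int BlockSize = 10000;          // should "never" run out between syncs
    private long _nextId;
    private long _blockEnd;                       // last usable ID in the current block
    private readonly Func<int, long> _requestBlockFromServer; // returns the first ID of a fresh block

    public IdAllocator(Func<int, long> requestBlockFromServer)
    {
        _requestBlockFromServer = requestBlockFromServer;
        _nextId = 1;
        _blockEnd = 0; // forces a block request on first use
    }

    public long NextId()
    {
        if (_nextId > _blockEnd)
        {
            _nextId = _requestBlockFromServer(BlockSize);
            _blockEnd = _nextId + BlockSize - 1;
        }
        return _nextId++;
    }
}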
Didn't really want to post as answer, but I don't know that anyone would see as a comment, and I think it's important to this topic and not included in other answers.
First you should think about how much data, how many tables, and what relations you will have. In my solution I've implemented syncing through Dropbox files. I observe changes in the main MOC and save these data to files (each row is saved as gzipped JSON). If there is a working internet connection, I check whether there are any changes on Dropbox (Dropbox gives me delta changes), download them and merge them (latest wins), and finally upload the changed files. Before syncing I put a lock file on Dropbox to prevent other clients from syncing incomplete data. When downloading changes it's safe if only partial data is downloaded (e.g. a lost internet connection). When downloading is finished (fully or partially) it starts to load the files into Core Data. When there are unresolved relations (not all files are downloaded) it stops loading files and tries to finish downloading later. Relations are stored only as GUIDs, so I can easily check which files to load to have full data integrity.
Syncing starts after changes to Core Data are made. If there are no changes, it checks for changes on Dropbox every few minutes and on app startup. Additionally, when changes are sent to the server I send a broadcast to other devices to inform them about the changes, so they can sync faster.
Each synced entity has a GUID property (the GUID is also used as the filename for exchange files). I also have a Sync database where I store the Dropbox revision of each file (I can compare it when the Dropbox delta resets its state). Files also contain the entity name, state (deleted/not deleted), GUID (same as the filename), database revision (to detect data migrations or to avoid syncing with newer app versions) and of course the data (if the row is not deleted).
This solution is working for thousands of files and about 30 entities. Instead of Dropbox I could use a key/value store exposed as a REST web service, which I want to do later, but have no time for it :) For now, in my opinion, my solution is more reliable than iCloud and, which is very important, I have full control over how it works (mainly because it's my own code).
Another solution is to save MOC changes as transactions - there will be far fewer files exchanged with the server, but it's harder to do the initial load into an empty Core Data store in the proper order. iCloud works this way, and other syncing solutions take a similar approach, e.g. TICoreDataSync.
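For illustration, one per-row exchange file as described above could carry something like the following (sketched in C#; the LastModified field is my assumption for implementing "latest wins", the rest follows the field list above):

// Sketch of one gzipped-JSON exchange file, one per changed row.
public class ExchangeFileRecord
{
    public string Guid { get; set; }             // also used as the file name
    public string EntityName { get; set; }
    public bool IsDeleted { get; set; }
    public int DatabaseRevision { get; set; }    // detect migrations / newer app versions
    public DateTime LastModified { get; set; }   // assumed: needed for the "latest wins" rule
    public Dictionary<string, object> Data { get; set; }  // row contents when not deleted

    // "Latest wins" merge rule.
    public static ExchangeFileRecord Merge(ExchangeFileRecord local, ExchangeFileRecord remote)
        => remote.LastModified >= local.LastModified ? remote : local;
}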
--
UPDATE
After a while, I migrated to Ensembles - I recommend this solution over reinventing the wheel.
I'm looking into using MemCached for a web application I am developing and after researching MemCached over the past few days, I have come across a question I could not find the answer to.
How do you link Memcached server together or how do you replicate data between MemCached server?
Additionally: Is this functionality controlled by the servers or the clients and how?
When you configure several servers, the client libraries hash each key to pick the server where that key/data pair is stored. That means there's no replication, and also that every client has to use the same set of servers.
Pros:
Almost zero overhead; storage and bandwidth grow linearly.
Server code is kept simple and reliable.
Cons:
Any change in the set of servers (one goes down, or you add a new one) suddenly invalidates (almost) the whole cache.
You have to be sure to use the same algorithm on every client.
If you have control of the client code, you can simply store each key/data pair twice, on two servers. Just be sure to look in the same places when reading from a different client.
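A tiny sketch of both ideas — client-side hashing to pick a server, plus the optional double write (the ICacheServer interface is a stand-in for whatever memcached client library you use, not a real API):

// Sketch: picks a server by hashing the key; optionally writes the value to a
// second server for crude redundancy.
public interface ICacheServer
{
    void Set(string key, byte[] value);
    byte[] Get(string key);
}

public class ShardedCache
{
    private readonly ICacheServer[] _servers;

    public ShardedCache(ICacheServer[] servers) { _servers = servers; }

    private static int StableHash(string key)
    {
        // Simple deterministic hash so every client computes the same server index
        // (string.GetHashCode is not stable across processes/machines).
        unchecked
        {
            int h = 23;
            foreach (char c in key) h = h * 31 + c;
            return h & 0x7FFFFFFF;
        }
    }

    private int IndexFor(string key) => StableHash(key) % _servers.Length;

    public void Set(string key, byte[] value, bool redundant = false)
    {
        int i = IndexFor(key);
        _servers[i].Set(key, value);
        if (redundant)
            _servers[(i + 1) % _servers.Length].Set(key, value); // second copy on the "next" server
    }

    public byte[] Get(string key)
    {
        int i = IndexFor(key);
        return _servers[i].Get(key)
               ?? _servers[(i + 1) % _servers.Length].Get(key); // fall back to the copy
    }
}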
I've used BeITMemcached and in that you create an instance of MemcacheClient and set the servers you want to use, just as strings.
At that point the client itself determines which of the servers it has available to put different items into. You never know which server an item will be in.
Check here to see how the servers handle failover.
The easiest thing is to have a repopulate mechanism. In my case, I store several hundred objects in memcache which come out of a database. I can just call repopulate and put them all back in there. Whenever I add, update or delete them in the database, I make those same calls to memcache.
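What that looks like in rough outline (the cache and repository delegates here are placeholders, not the BeITMemcached API; it reuses the ICacheServer stand-in from the earlier sketch):

using System;
using System.Collections.Generic;

// Sketch: every database write is mirrored into the cache, and Repopulate()
// can rebuild the whole cache from the database after a restart.
public class CachedObjectStore<T>
{
    private readonly ICacheServer _cache;   // stand-in for your memcached client
    private readonly Func<IEnumerable<KeyValuePair<string, T>>> _loadAllFromDb;
    private readonly Action<string, T> _saveToDb;

    public CachedObjectStore(ICacheServer cache,
                             Func<IEnumerable<KeyValuePair<string, T>>> loadAllFromDb,
                             Action<string, T> saveToDb)
    {
        _cache = cache;
        _loadAllFromDb = loadAllFromDb;
        _saveToDb = saveToDb;
    }

    public void Save(string key, T value)
    {
        _saveToDb(key, value);              // the database stays the source of truth
        _cache.Set(key, Serialize(value));  // keep the cache in step with every write
    }

    public void Repopulate()
    {
        foreach (var pair in _loadAllFromDb())
            _cache.Set(pair.Key, Serialize(pair.Value));
    }

    private static byte[] Serialize(T value)
        => System.Text.Encoding.UTF8.GetBytes(value.ToString()); // placeholder serialization
}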
http://repcached.lab.klab.org/
Also, the PHP PECL memcache client can replicate data to multiple servers, see memcache.redundancy.
It sounds like you wish to have caches that can cope with machines rebooting, etc. If so…
In a lot of cases (assuming you are not writing Facebook) an RDBMS is fast enough for caching. Just create a table that has a key and a blob column. If the RDBMS server has enough RAM, all the data will be in RAM and just saved to disk so as to allow recovery.
Remember this could be a separate server(s) from your main database server.
If you wish to get more fancy and are using a high-end RDBMS, you may be able to set up change notifications on the queries that are used to build the "cached data" that delete out-of-date rows from the cache.
Alternatively, you can set up triggers to clear invalid rows from the cache; however, this can become very complex very quickly.
Memcached does not provide replication. To get data onto a particular server, you need to add that server to the memcached client's server list and then hit the DB so the data gets stored on that server.
You should seriously consider CouchBase. It uses the memcached protocol, provides nearly the same speed, and delivers the automatic replication you're looking for. It also persists to disk so your cache will never be cold.