Performing Get Copy All Operation With Microsoft Sync Framework - frameworks

I'm testing out Microsoft Sync Framework to try and see if it'll be suitable for a task that I'm working on. One of the things I'd like to be able to do is to have the option to not just send changed files, but instead to send all of the files (for example, if I'm syncing to a client machine for the first time, and so want to send all files).
I can't seem to find an example of this in the documentation, so any advice would be welcome.

if you're synching for the first time, then there is nothing special to configure as it will sync everything.
if you've already synched and want to re-send all files regardless of whether they've changed or not, just delete the metadata file and that should remove all knowledge of what has been synched.

Related

Synchronize Directory of Files Between Server and iOS Application

I am building an internal iOS application (so - it won't ever be in the app store), and I need to keep a directory of content synchronized between a server and each of the instances of the iOS application. This would be easy enough if I just wanted to delete and re-download this content each time, but I would rather use something similar to rsync to only download the elements that have changed.
I haven't found any good way to utilize rsync. I considered looking at Objective-Git as a possibility here, but at a quick glance it looked like there is still a lot of the support for remote repositories that isn't supported yet.
As a final note, while this won't be in the app store, I will not be jailbreaking these devices and I would prefer to not rely on any private API's (although if there was an elegant solution that utilized private API's I might consider it).
Thoughts?
ADDITIONAL NOTE: This needs to be an isolated solution. I won't be relying on outside services (like Dropbox, Box.net, etc...). This needs to work solely between the device and the server (which is on a local network with the device).
Use HTTP to list the contents of each folder on the server.
Compare last modification time of each file with those on the device, and identify added/removed files.
Get added and modified files, remove deleted files.
It sounds like you're maybe asking for a library that already does this, but if you don't find one it's obviously moderately easy to write this from the ground up using stat(2) on the server and the same or a higher-level equivalent on the iOS devices. Have the iPhone send a tree of files with their modification date to the server and get back a list of insert/delete/update operations to do with the url (or whatever) for each one so you can do them incrementally on a background thread. Have the information from the server for new/updated files include the mod date that the server has so you can set it to be the same on the iOS device and send that when asking the server for the status of each file (kind of hack using the file system to store that, but it works).
Why not just set up a RESTful interface and do it across HTTP; that way you could query the modification times easily enough to determine whether client or server files need to be updated. You might also want to keep track of what files on the client have been synced, so you can easily know which files to add or delete. This can be done with a simple .sync file or using a plist / sqlite / etc.
If you'll consider FTP, there are some pretty advanced client libraries available.
For example, the iOS Chilkat bundle includes an FTP client library that supports synchronization in both directions. It's not free, but it's pretty cheap -- and you get a ton of other stuff that will likely prove useful someday. Here's an example of iOS pulling down all additions and changes (mode 2):
http://www.example-code.com/ios/ftp_syncLocalTree.asp
One caveat -- judging solely from the example, it doesn't appear to synchronize deletions. If this is a requirement, you could do it yourself without too much effort immediately following a sync.
acrosync (see https://acrosync.com/library.html) seems like a good fit given the initial question, however I haven't used it myself yet.

importing updated files into a database

I have files that are updated every 2 hours. I have to detect the files automatically and insert the extracted information from them into a database.
Our DBMS is Postgresql and programming language is Python. How would you suggest I do that?
I want to make use of DAL (Database Abstraction Layer) to make connection between the files and database and use postgresql LISTEN/NOTIFY techniques to detect the new files. If you agree with me please tell me how I can use LISTEN/NOTIFY functions to detect the files.
Thank you
What you need is to write a script that stays running as a dæmon, using a file system notify API to run a callback function when the files change. When the script is notified that the files change it should connect to PostgreSQL and do the required work, then go back to sleep waiting for the next change.
The only truly cross platform way to watch a directory for changes is to use a delay loop to poll os.listdir and os.stat to check for new files and updated modification times. This is a waste of power and disk I/O; it also gets slow for big sets of files. If your OS reliably changes the directory modification time whenever files within the directory change you can just os.stat the directory in a delay-loop, which helps.
It's much better to use an operating system specific notification API. Were you using Java I'd tell you to use the NIO2 watch service, which handles all the platform specifics for you. It looks like Watchdog may offer something similar for Python, but I haven't needed to do directory change notification in my Python coding so I haven't tested it. If it doesn't work out you can use platform-specific techniques like inotify/dnotify for Linux, and the various watcher APIs for Windows.
See also:
How do I watch a file for changes?
Python daemon to watch a folder and update a database
You can't use LISTEN/NOTIFY because that can only send messages from within the database and your files obviously aren't in there.
You'll want to have your python script scan the directory the files are in and check their modification time (mtime). If they are updated, you'll need to read in the files, parse the data and insert it to the db. Without knowing the format of the files, there's no way to be more specific.

Guaranteeing consistency while accessing files on a web server

I'm in the process of building a simple update server for an application. The parts of the application being updated are configuration files; the most up-to-date copies of these files exist on the update server and these files can be edited by the individual managing the application (the "application manager") at any time. However, I don't want the application to be able to download one of these files while the file is being edited by the application manager; this would obviously cause consistency issues. How can I prevent these files from being accessed in an inconsistent state? Alternatively, would a solution be to provide a checksum along with the file that the application could use to determine if the file was received in a consistent state?
EDIT: I've seen this post concerning access restrictions using .htcaccess and think it could be of use. However, I want the application manager to do as little thinking as possible; having them forget to re-allow connections might be problematic. That being said, they're going to have to do some work at some point; maybe this is the way I should go?

What's the best way to update code remotely?

For example, I have a website with various types of information. If that goes down I have a copy of the same website the users use on a local webserver, like Apache or IIS on the client. They use this local version until the Internet version returns. They can have no downtime, in other words.
The problem is that over time the Internet version will change while the client versions will remain the same unless I touch each client's machine to make the updates. I don't want to do that.
Is there a good way to keep my client up to date so that when I make a change on the server the client gets a copy so they can run it locally if needs be?
Thank you.
EDIT: do you think maybe using SVN and timely running of the update by the clients would work?
EDIT: they'll never ever submit anything. It's just so I don't have to update the client by hand, manually going to the machine. they're webpages that run in case the main server is down.
I will go for Git over SVN because of its distributed nature. Gives you multiple copies of code; use it along with this comment's solution:
Making git auto-commit
to autocommit.
Why not use something like HTTrack to make local copies of your actual internet site on each machine, rather then trying to do a separate deployment. That way you'll automatically stay in sync.
This has the advantage that if, at some point, part of your website is updated dynamically from a database, the user will still be able to have a static copy of the resulting site that is up-to-date.
There are tools like rsync which you can use periodically to sync the changes.

Detect a file in transit?

I'm writing an application that monitors a directory for new input files by polling the directory every few seconds. New files may often be several megabytes, and so take some time to fully arrive in the input directory (eg: on copy from a remote share).
Is there a simple way to detect whether a file is currently in the process of being copied? Ideally any method would be platform and filesystem agnostic, but failing that specific strategies might be required for different platforms.
I've already considered taking two directory listings separaetd by a few seconds and comparing file sizes, but this introduces a time/reliability trade-off that my superiors aren't happy with unless there is no alternative.
For background, the application is being written as a set of Matlab M-files, so no JRE/CLR tricks I'm afraid...
Edit: files are arriving in the input directly by straight move/copy operation, either from a network drive or from another location on a local filesystem. This copy operation will probably be initiated by a human user rather than another application.
As a result, it's pretty difficult to place any responsibility on the file provider to add control files or use an intermediate staging area...
Conclusion: it seems like there's no easy way to do this, so I've settled for a belt-and-braces approach - a file is ready for processing if:
its size doesn't change in a certain period of time, and
it's possible to open the file in read-only mode (some copying processes place a lock on the file).
Thanks to everyone for their responses!
The safest method is to have the application(s) that put files in the directory first put them in a different, temporary directory, and then move them to the real one (which should be an atomic operation even when using FTP or file shares). You could also use naming conventions to achieve the same result within one directory.
Edit:
It really depends on the filesystem, on whether its copy functionality even has the concept of a "completed file". I don't know the SMB protocol well, but if it has that concept, you could write an app that exposes an SMB interface (or patch Samba) and an API to get notified for completed file copies. Probably a lot of work though.
This is a middleware problem as old as the hills, and the short answer is: no.
The two 'solutions' put the onus on the file-uploader: (1) upload the file in a staging directory and then move it into the destination directory (2) upload the file, and then create/upload a 'ready' file that indicates the state of the content file.
The 1st one is the better, but both are inelegant. The truth is that better communication media exist than the filesystem. Consider using some IPC that involves only a push or a pull (and not both, as does the filesystem) such as an HTTP POST, a JMS or MSMQ queue, etc. Furthermore, this can also be synchronous, allowing the process receiving the file to acknowledge the content, even check it for worthiness, and hand the client a receipt - this is the righteous road to non-repudiation. Follow this, and you will never suffer arguments over whether a file was or was not delivered to your server for processing.
M.
One simple possibility would be to poll at a fairly large interval (2 to 5 minutes) and only acknowledge the new file the second time you see it.
I don't know of a way in any OS to determine whether a file is still being copied, other than maybe checking if the file is locked.
How are the files getting there? Can you set an attribute on them as they are written and then change the attribute when write is complete? This would need to be done by the thing doing the writing ... which sounds like it isn't an option.
Otherwise, caching the listing and treating a file as new if it has the same file size for two consecutive listings is the best way I can think of.
Alternatively, you could use the modified time on the file - the file has to be new and have a modified time that is at least x in the past. But I think this will be about equivalent to caching the listing.
It you are polling the folder every few seconds, its not much of a time penalty is it? And its platform agnostic.
Also, linux only: http://www.linux.com/feature/144666
Like cron but for files. Not sure how it deals with your specific problem - but may be of use?
What is your OS. In unix you can use the "lsof" utility to determine if a user has the file open for write. Apparently somewhere in the MS Windows Process Explorer there is the same functionality.
Alternativly you could just try an exclusive open on the file and bail out of this fails. But this can be a little unreliable and its easy to tread on your own toes.