Can commits/pushes to GitHub be automated?

I've moved a site to a Jekyll / GitHub Pages setup and have an iOS-based Markdown editor that syncs to Dropbox. I'm currently investigating ways to bridge the gap and have files created on the go automatically committed and pushed to the GitHub repo, but I'm unsure where to start. Is anything like this possible?
(I'm not experienced with Automator on OS X, but it seems like it might be an option, though I can't guarantee that a machine will be awake all the time.)

Using cron should do the trick. Note that you'll need to have key-based authentication set up for Git so you're not prompted for a password on push.
(Note that I've used these tools in Linux, but they should work in OS X as well.)
Create your script somewhere:
#!/bin/sh
cd /path/to/git/repo || exit 1
git add -A                                  # stage new, modified and deleted files
git commit -m "Automated commit message."
git push
Make the script executable:
chmod +x script.sh
Run crontab -e to edit your cron file, and add 0 * * * * /path/to/script.sh to execute the script once per hour.
This also assumes that this will be the only committer. If anyone else pushes to the repo from elsewhere, you'll have to merge those changes to this clone before this script will push successfully again.
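If occasional pushes from elsewhere are expected, the script can pull before pushing. A minimal sketch, assuming a single branch and that the rebase won't hit conflicts:
#!/bin/sh
cd /path/to/git/repo || exit 1
git add -A                                  # stage new, modified and deleted files
git commit -m "Automated commit message."   # exits non-zero when there is nothing to commit
git pull --rebase                           # replay the local commit on top of any remote changes
git push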
You could also check out Flashbake!

Related

How to skip big files for GitHub in Machine Learning Repo?

I'm new to GitHub and machine learning.
I've been using Conda and Jupyter Notebook for my ML experiments.
It has all been fine.
But I know it's better to use VS Code (easier to code?) and GitHub (to promote and share my code?). I don't really care much about version control because I'm only taking my first steps.
Nevertheless, I created a GitHub account and am trying to create a repo and push my existing folders of Python files. These folders also contain the raw and modified data used in the code: .csv and .xlsx files, some of them 100 MB+.
I use a Mac M1 and I've created a .gitignore_global file (and it works: when I git add . from the Terminal, files listed in .gitignore_global aren't pushed/uploaded).
I've also created a .gitignore file in my working directory.
I use find ./* -size +100M | cat >> .gitignore to append these files to the .gitignore (and it does add them).
But when I run git init -b main, git add ., git commit -m "First commit", git remote add origin <REMOTE_URL> and git push -u origin main, it still tries to upload the 100 MB+ files.
I've tried deleting the whole .git subfolder and the repo on the site; it doesn't help.
What should I do so that these files are not uploaded (pushed)?
How do you use GitHub for data science / machine learning with these limitations?
It's really impossible not to use all the data files...
I've tried several approaches; please see above.
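One thing worth checking here: the paths produced by find ./* start with ./, and .gitignore patterns don't match that prefix. A minimal sketch of writing and verifying the ignore rules, assuming the large files were never committed (data/huge_dataset.csv is a hypothetical file name):
find . -type f -size +100M | sed 's|^\./||' >> .gitignore   # strip the leading "./" from each path
git check-ignore -v data/huge_dataset.csv                   # should print the .gitignore rule that matches
git status --short                                          # the large files should no longer be listed
git rm --cached data/huge_dataset.csv                       # only needed if a large file was already committed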

How Do I Update a Live Server Correctly Using Mercurial?

I'm new to Mercurial and version control, and although I'm only working on personal PHP application projects (until I hopefully get a job soon) I'm well overdue learning how it all works.
I've been reading about Mercurial all day, but I'm still confused on a few elements...
Firstly, I understand Mercurial CAN push my files straight to my live server, but I don't see many tutorials or examples explaining how this is done, which leads me to think it's not used often. I currently use FTP to upload my files, and it's error-prone keeping track of which files have been modified, so I'd obviously like to eliminate that.
I also see services like BitBucket being mentioned a lot, but if I'm pushing to BitBucket how do I then get my files to my live server? Can I get only the changed files to upload via FTP, or do I need to install Mercurial on my server too or something?
Apologies if this is a basic question; I'm just a little lost as to how companies would/should use this service, and how files and uploads are handled elegantly. How should I go about version control on a personal project?
There are many ways to do that, but I'll try to narrow it down to the basic steps involved in a scenario using BitBucket:
1) Install Mercurial on both your dev machine and your live server.
2) Create a repository in BitBucket.
3) Clone the repository to your dev machine using the URL that appears in BitBucket, e.g.:
hg clone https://your_user@bitbucket.org/your_account/your_repos
4) Clone the repository to your live server in the same way.
5) Do your dev and commit your code to the local repository on your dev machine (using hg commit). Then push the changesets to BitBucket using hg push.
6) Once you're ready to deploy the changes to your live server, log in to your live server and run hg pull -u.
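A sketch of the day-to-day cycle those steps describe (the commit message is a placeholder):
# on the dev machine
hg commit -m "Describe the change"   # record the change in the local repository
hg push                              # send the new changesets to BitBucket

# on the live server
hg pull -u                           # fetch the changesets and update the working copy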
I just use rsync to upload everything. If you're working by yourself, it's simple and works fine.
I set up SSH key authentication, and then made a bash shortcut ,p (taking the target directory as an argument):
,p() { rsync -avz --delete ./ "user@server.com:/var/www/html/$@/"; }
Then, on my local host I can type ,p images and the current directory will be uploaded to mysite/images.
If you're always uploading to the same place, you can make a shortcut with no argument:
alias ,pm='rsync -avz --delete ./ user@server.com:/var/www/html/'
Finally, if you just want to type the command:
rsync -avz --delete ./ user@server.com:/var/www/html/

How do I execute a post-receive hook on msysgit on Windows hosted by Apache?

I'm setting up a Git server on a Windows host. I have installed the latest Apache, and have it working with msysGit. I'm not using SSH at all; I can push and pull through HTTP.
Now I want to add a post-receive hook to notify my build server, but I can't figure out how to do that. I see the sample hook scripts in the repository on the server, but I'm confused about what to do there. Do I put a Windows batch file there, named post-receive.bat, or do something else?
I'm a bit fuzzy on details of what this is all doing, but Apache is executing c:\Program Files\git\libexec\git-core\git-http-backend.exe when it sees a Git URL. Is git-http-backend.exe going to trigger the post-receive hook?
Update
I'm getting closer. Here's my hook, in hooks/post-receive in my repo:
#!/c/Program Files/Git/bin/sh
curl http://mybuildserver:8080/job/Whazzup/build
I changed the shebang from #!/bin/sh because on Windows I don't have that. Now in the Apache error log I get the message error: cannot spawn hooks/post-receive: No such file or directory
Incidentally, Git Bash's chmod does not seem to work, but I was able to get the permissions on post-receive to rwxr-xr-x by renaming the sample file.
Update
I changed the shebang line back to #!/bin/sh, but I still get the same error in the Apache error log: error: cannot spawn hooks/post-receive: No such file or directory. As a test I opened a Git bash prompt in the hooks folder, and executed the hook with ./post-receive, and it worked.
Update
Now I'm wondering if I have a different problem. Taking VonC's advice, I tried running Apache httpd from the command line under my own account, instead of as a service under LocalSystem. Still the same thing. Pushing and pulling work fine, but the hook doesn't execute. Then I tried getting Apache out of the equation. From a Git bash prompt on the same computer as the repo, I did a clone (via filesystem), modify, commit, and push. But the hook still didn't execute.
Update
OK, I had a silly problem in my hook script, but now at least it executes when I push to the repo from the same computer (via filesystem). But not when I push through Apache. Apache is now running under a regular account, and the Apache account has full control of the repository. The push works fine, but the post-receive hook doesn't execute.
Apache is executing c:\Program Files\git\libexec\git-core\git-http-backend.exe when it sees a Git URL. Is git-http-backend.exe going to trigger the post-receive hook?
No, it will pass the command (clone, push, pull) to git itself.
The post-receive hook will be executed after the push has been completed, and it is a bash (Unix shell) script, as illustrated in "post-receive hook on Windows - GIT_WORK_DIR: no such file or directory".
See also "git windows post pull" to see where you can create that post-receive script (in the .git/hooks of your repo): it has nothing to do with the Apache HTTP service in front of the repos.
Regarding the error message "cannot spawn hooks/post-receive: No such file or directory", you can refer to "msysgit error with hooks: "git error: cannot spawn .git/hooks/post-commit: No such file or directory"":
The shebang must be #!/bin/sh
Apache must run as a regular user instead of Local System, in order to benefit from the environment variables defined for said regular user.
<path_to_git>\bin must be in the PATH
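Put together, a minimal hook that respects those three points could look like this (using the build-trigger URL from the question):
#!/bin/sh
# hooks/post-receive -- run by git after a successful push.
# Assumes curl and <path_to_git>\bin are on the PATH of the account Apache runs under.
curl http://mybuildserver:8080/job/Whazzup/build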

Jenkins: FTP / SSH deployment, including deletion and moving of files

I was wondering how to get my web-projects deployed using ftp and/or ssh.
We currently have a self-made deployment system which is able to handle this, but I want to switch to Jenkins.
I know there are publishing plugins and they work well when it comes to uploading build artifacts. But they can't delete or move files.
Do you have any hints, tips or ideas regarding my problem?
The Publish Over SSH plugin enables you to send commands to the remote server over SSH. This works very well; we also move and delete some files before deploying the new version, and have had no problems with this approach.
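For example, the plugin lets you run ordinary shell commands on the target host around the transfer (via its 'Exec command' setting); a sketch with hypothetical paths:
# hypothetical layout -- adjust paths to your own server
rm -rf /var/www/myapp/previous                        # drop the oldest kept release
mv /var/www/myapp/current /var/www/myapp/previous     # keep the last release around
mv /var/www/myapp/incoming /var/www/myapp/current     # activate the freshly uploaded build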
The easiest way to handle deleting and moving items is to delete everything on the server before you deploy a new release using one of the 'Publish over' extensions. I'd say that really is the only way to know the deployed version is the one you want. If you want more version-control-style behaviour, you either need to use a version control system on the server, or perhaps rsync, which will cover part of it.
If your demands are very specific you could develop your own convention to mark deletions and have them be performed by a separate script (like you would for database changes using Liquibase or something like that).
By the way: I would recommend not automatically updating your live sites after every build using the 'Publish over ...' extension. When we really want a live site updated automatically, we rely on the Promoted Builds Plugin to keep it nearly fully automated while adding a little safety.
I came up with a simple solution that removes deleted files and uploads changes to a remote FTP server as a Jenkins build action, using an lftp mirror script (see the lftp manual page).
In short, you create a ~/.netrc config file in the Jenkins user's home directory and populate it with your FTP credentials:
machine ftp.remote-host.com
login mySuperSweetUsername
password mySuperSweetPassword
Create an lftp script, deploy.lftp, and drop it in the root of your Git repo:
set ftp:list-options -a
set cmd:fail-exit true
open ftp.remote-host.com
mirror --reverse --verbose --delete --exclude .git/ --exclude deploy.lftp --ignore-time --recursion=always
Then add an "Exec Shell" build action to execute lftp on the script.
lftp -f deploy.lftp
The lftp script will:
mirror: copy all changed files.
reverse: push local files to the remote host (a regular mirror pulls from the remote host to local).
verbose: write notes about which files were copied where to the build log.
delete: remove remote files that are no longer present in the Git repo.
exclude: don't publish the .git directory or the deploy.lftp script itself.
ignore-time: don't decide what to publish based on file timestamps. Without this, in my case, every file got published, because a fresh clone of the Git repo resets the file timestamps. It still works quite well: even files modified by adding a single space were identified as different and uploaded.
recursion=always: analyse every file rather than relying on folder timestamps to decide whether anything inside might have changed. This isn't strictly necessary since we're already ignoring timestamps, but I have it in here anyway.
I wrote an article explaining how I keep FTP in sync with Git for a WordPress site I could only access via FTP. The article explains how to sync from FTP to Git, and then how to use Jenkins to build and deploy back to FTP. This approach isn't perfect, but it works: it only uploads changed files, and it deletes files from the host that have been removed from the Git repo (and vice versa).

What is the cleverest use of source repository that you have ever seen?

This actually stems from my earlier question, where one of the answers made me wonder how people are using the SCM/repository in different ways for development.
Pre-tested commits
Before (TeamCity, build manager):
The concept is simple: the build system stands as a roadblock between your commit and trunk, and only after the build system determines that your commit doesn't break anything does it allow the commit to be introduced into version control, where other developers will sync and integrate that change into their local working copies.
After (using a DVCS like Git as the source repository):
My workflow with Hudson for pre-tested commits involves three separate Git repositories:
my local repo (local),
the canonical/central repo (origin)
and my "world-readable" (inside the firewall) repo (public).
For pre-tested commits, I utilize a constantly changing branch called "pu" (potential updates) on the world-readable repo.
Inside of Hudson I created a job that polls the world-readable repo (public) for changes in the "pu" branch and will kick off builds when updates are pushed.
My workflow for taking a change from inception to origin is:
* hack, hack, hack
* commit to local/topic
* git pup public
* Hudson polls public/pu
* Hudson runs potential-updates job
* Tests fail?
o Yes: Rework commit, try again
o No: Continue
* Rebase onto local/master
* Push to origin/master
Using this pre-tested commit workflow I can offload the majority of my testing requirements to the build system's cluster of machines instead of running them locally, meaning I can spend the majority of my time writing code instead of waiting for tests to complete on my own machine in between coding iterations.
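In plain Git commands, the publish-and-promote part of that workflow might look roughly like this (my-topic is a hypothetical topic branch; public and origin are the remotes named above):
git push public my-topic:pu      # publish the candidate to the pu branch that Hudson polls
# ...Hudson runs the potential-updates job; once it is green:
git rebase master my-topic       # rebase the topic onto local/master
git checkout master
git merge --ff-only my-topic     # fast-forward master to the tested commit
git push origin master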
(Variation) Private Build (David Gageot, Algodeal)
Same principle as above, but the build is done on the same workstation as the one used to develop, on a cloned repo:
How not to use a CI server in the long term and not suffer the increasing time lost staring at the builds locally?
With git, it’s a piece of cake.
First, we ‘git clone’ the working directory to another folder. Git does the copy very quickly.
The next time, we don't need to clone. We just tell Git to get the deltas. Net result: instant cloning. Impressive.
What about the consistency?
Doing a simple 'git pull' from the working directory will realize, using the deltas' digests, that the changes were already pushed to the shared repository.
Nothing to do. Impressive again.
Of course, while the build is running in the second directory, we can keep on working on the code. No need to wait.
We now have a private build with no maintenance, no additional installation, not dependent on the IDE, run with a single command line. No more broken builds in the shared repository. We can recycle our CI server.
Yes. You’ve heard well. We’ve just built a serverless CI. Every additional feature of a real CI server is noise to me.
#!/bin/bash
# Private build: pull, build in a cloned repo, and push only if the build passes.

# Work out where to push: use the push remote if one is defined.
if [ 0 -eq `git remote -v | grep -c push` ]; then
    REMOTE_REPO=`git remote -v | sed 's/origin//'`
else
    REMOTE_REPO=`git remote -v | grep "(push)" | sed 's/origin//' | sed 's/(push)//'`
fi

# If a commit message was passed in, commit all local changes with it.
if [ ! -z "$1" ]; then
    git add .
    git commit -a -m "$1"
fi

git pull

# Clone the working directory into .privatebuild the first time; afterwards just pull the deltas.
if [ ! -d ".privatebuild" ]; then
    git clone . .privatebuild
fi

cd .privatebuild
git clean -df
git pull

# Build, and push to the remote only if the build succeeds.
if [ -e "pom.xml" ]; then
    mvn clean install
    if [ $? -eq 0 ]; then
        echo "Publishing to: $REMOTE_REPO"
        git push $REMOTE_REPO master
    else
        echo "Unable to build"
        exit 1
    fi
fi
Dmitry Tashkinov, who has an interesting question on DVCS and CI, asks:
I don't understand how "We’ve just built a serverless CI" coheres with Martin Fowler's statement:
"Once I have made my own build of a properly synchronized working copy I can then finally commit my changes into the mainline, which then updates the repository. However my commit doesn't finish my work. At this point we build again, but this time on an integration machine based on the mainline code. Only when this build succeeds can we say that my changes are done. There is always a chance that I missed something on my machine and the repository wasn't properly updated."
Do you ignore or bend it?
@Dmitry: I neither ignore nor bend the process described by Martin Fowler in his ContinuousIntegration entry.
But you have to realize that DVCS adds publication as an orthogonal dimension to branching.
The serverless CI described by David is just an implementation of the general CI process detailed by Martin: instead of having a CI server, you push to a local copy where a local CI runs, then you push "valid" code to a central repo.
@VonC, but the idea was to run CI NOT locally, precisely in order not to miss something in the transition between machines.
When you use the so-called local CI, it may pass all the tests just because it is local, but break down later on another machine.
So is it integration? I'm not criticizing here at all; the question is difficult for me and I'm trying to understand.
#Dmitry: "So is it integeration"?
It is one level of integration, which can help get rid of all the basic checks (format issues, code style, basic static analysis findings, ...).
Since you have that publication mechanism, you can chain that kind of CI to another CI server if you want. That server, in turn, can automatically push (if this is still fast-forward) to the "central" repo.
David Gageot didn't need that extra level, being already at target in terms of deployment architecture (PC->PC), and needed only that basic level of CI.
That doesn't prevent him from setting up a more complete system integration server for more complete testing.
My favorite? An unreleased tool which used Bazaar (a DSCM with very well-thought-out explicit rename handling) to track tree-structured data by representing the datastore as a directory structure.
This allowed an XML document to be branched and merged, with all the goodness (conflict detection and resolution, review workflow, and of course change logging and the like) made easy by modern distributed source control. Splitting components of the document and its metadata into their own files prevented proximity from creating false conflicts, and otherwise allowed all the work the Bazaar team put into versioning filesystem trees to apply to tree-structured data of other kinds.
Definitely Polarion Track & Wiki...
The entire bug tracking and wiki database is stored in Subversion in order to keep a complete revision history.
http://www.polarion.com/products/trackwiki/features.php