I'm new to GitHub and Machine Learning.
I've been using Conda and Jupyter Notebook for my tests in ML.
It all was fine.
But I know that it's better to use VS Code (easier to code?) and GitHub (to promote and share my code?). I don't really care about version control because I'm only taking my first steps.
Nevertheless, I created a GitHub account and I am trying to create a repo and push my existing folders with Python files. These folders also contain the raw and modified data used in the code: .csv and .xlsx files. Some of them are over 100 MB.
I use a Mac M1 and I've created a .gitignore_global file, and it works: when I run git add . from the Terminal, the files listed in .gitignore_global are not pushed (uploaded).
I've also created a .gitignore file in my working directory.
And I use find ./* -size +100M | cat >> .gitignore to add these files to the .gitignore (and it does add them).
But when I run git init -b main, git add ., git commit -m "First commit", git remote add origin <REMOTE_URL> and git push -u origin main, it still tries to upload the 100 MB+ files.
I've tried deleting the whole git subfolder and the repo on the site, but it doesn't help.
What should I do in order not to upload (push) these files?
How do you use GitHub for Data Science / Machine Learning with these limitations?
It's really impossible not to use all the data files...
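One likely cause, offered as an assumption rather than a confirmed diagnosis: find ./* prints paths with a leading ./ (for example ./data/big.csv), and Git does not match a .gitignore pattern written that way against the path data/big.csv. A minimal sketch that strips the prefix before appending is below. The function name ignore_large_files and its arguments are hypothetical and not part of the original commands.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): append every file larger
# than a given size to .gitignore, as paths anchored at the repository root.
# Assumes file names contain no newlines; names containing gitignore
# metacharacters such as * ? [ would need escaping.
ignore_large_files() {
    repo_dir=${1:-.}
    min_size=${2:-100M}    # "find -size" syntax; 100M as in the question
    (
        cd "$repo_dir" || exit 1
        # find prints "./data/big.csv"; rewrite it to "/data/big.csv" so the
        # pattern is anchored at the directory that holds .gitignore.
        find . -path ./.git -prune -o -type f -size +"$min_size" -print |
            sed 's|^\./|/|' >> .gitignore
    )
}
```

Run it as ignore_large_files . 100M before git add . so that the large files are never staged. A file that was already committed stays in the history, so it has to be ignored before the first commit.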
I need to download all releases (in fact, all tags) of a project from a GitHub repository.
So I need commands or scripts that provide all the tag names of a given project and download them automatically.
Clone the entire Git repository to a local directory on your machine with git clone --no-single-branch.
You now have the full history, including all tags, downloaded in Git format.
Use the git tag command to list all the tags.
Use the git archive command to create an archive (tar or zip) of each tagged release. (Hint: use shell or batch scripting to loop over the output of git tag.)
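The loop hinted at above could look roughly like this. It is a sketch, not a definitive script: the function name archive_all_tags, the tar.gz format and the output layout are my own choices, and it assumes tag names contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: write one tar.gz archive per tag of an existing clone.
# Assumes tag names contain no whitespace.
archive_all_tags() {
    repo_dir=$1
    out_dir=$2    # use an absolute path, since git runs inside repo_dir
    mkdir -p "$out_dir" || return 1
    git -C "$repo_dir" tag | while read -r tag; do
        # Tags such as "release/1.0" contain "/", which cannot be part of a file name.
        safe_name=$(printf '%s' "$tag" | tr '/' '_')
        git -C "$repo_dir" archive --format=tar.gz --prefix="$safe_name/" \
            -o "$out_dir/$safe_name.tar.gz" "$tag"
    done
}
```

After cloning into a directory named project, for example, archive_all_tags project "$PWD/releases" would leave one archive per tag in releases/.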
I am new to Sourcetree and source control in general. I am working on an Android project with a few other people, and we use Bitbucket as the repository. I have learned the basics but don't want to track certain files in my local copy, specifically a lot of the Gradle and .iml files. But I think Stop tracking will remove those from the repo. Is there a way to have Sourcetree ignore any changes I make to certain files locally without deleting them from the repo?
Thank you in advance
You can create a file named .gitignore in the root of the project and list in it every directory that Git should exclude, for example:
The above would be excluded from Git's tracked files.
If you are already tracking the files, this command removes them from the index (the files themselves stay on disk):
git rm -r --cached <folder>
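Put together, the two steps above might be wrapped like this. It is only a sketch: the function name stop_tracking and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add a path to .gitignore and drop it from the index,
# leaving the working copy untouched.
stop_tracking() {
    repo_dir=$1
    path=$2    # e.g. "app/app.iml" (example value only)
    # The leading "/" anchors the pattern at the directory holding .gitignore.
    printf '%s\n' "/$path" >> "$repo_dir/.gitignore"
    git -C "$repo_dir" rm -r -q --cached -- "$path"
}
```

Note that once this change is committed and pushed, the path is deleted from the repository for everyone else as well, which is what the question wanted to avoid. If the file has to stay in the repo while local edits are hidden, git update-index --skip-worktree <file> may be worth a look instead.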
I am very new to bazaar and I am exploring the features of it (and of version control system)
I have a Bazaar repo, let's call it 'foo'. Under the foo repo I have a directory, let's call it 'projects'.
I want to create a separate Bazaar repo containing only the projects directory, and I want to retain the log too. That is, every log entry related to the projects folder should be available in the new repo.
I tried export command, but I just got the directory without any log.
Any pointers where I should look ?
You can do this using the fastimport plugin:
bzr fast-export /path/to/orig/project | \
bzr fast-import-filter -i project1/ | \
bzr fast-import - /path/to/new/project1
(I broke the line for readability)
The first command dumps the revisions of the branch at the specified path to standard output
The second command filters the revisions, selecting only the ones that affect the project1/ directory. The trailing / is important.
The third command imports the revisions from the standard input to the specified branch. If the branch does not exist, bzr will create a shared repository with a branch named trunk in it.
For more details, see the help pages:
bzr help fast-export
bzr help fast-import-filter
bzr help fast-import
The fastimport plugin is included in the default installation on Windows and Mac OS X. If you have a more exotic setup, I recommend installing it with pip. I don't remember 100% the package name, maybe bzr-fastimport. You will also need the fastimport python library.
I am new to Hudson / Jenkins and was wondering if there is a way to check in Hudson's configuration files to source control.
Ideally I want to be able to click some button in the UI that says 'save configuration' and have the Hudson configuration files checked in to source control.
Most helpful Answer
There is a plugin called SCM Sync configuration plugin.
Original Answer
Have a look at my answer to a similar question. The basic idea is to use the filesystem-scm-plugin to detect changes to the xml-files. Your second part would be committing the changes to SVN.
EDIT: If you find a way to determine the user for a change, let us know.
EDIT 2011-01-10 Meanwhile there is a new plugin: SCM Sync configuration plugin. Currently it only works with Subversion and Git, but support for more repositories is planned. I have been using it since version 0.0.3 and it has worked well so far.
Note that Vogella has a recent (January 2014, compared to the OP's question January 2010) and different take on this.
Consider that the SCM Sync configuration plugin can generate a lot of commits.
So, instead of relying on a plugin and an automated process, he manages the same feature manually:
Storing the Job information of Jenkins in Git
I found the amount of commits a bit overwhelming, so I decided to control the commits manually and to save only the Job information and not the Jenkins configuration.
For this switch into your Jenkins jobs directory (Ubuntu: /var/lib/jenkins/jobs) and perform the “git init” command.
I created the following .gitignore file to store only the Git jobs information:
Now you can add and commit changes at your own will.
And if you add another remote to your Git repository you can push your configuration to another server.
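The .gitignore file mentioned above is not reproduced here, so the following sketch swaps in a different technique: it stages each job's config.xml explicitly instead of relying on ignore rules. The function name snapshot_job_configs is hypothetical, and the layout <jobs dir>/<job name>/config.xml is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: put only the per-job config.xml files under Git.
# Assumes the layout <jobs_dir>/<job name>/config.xml and a configured Git identity.
snapshot_job_configs() {
    jobs_dir=$1    # e.g. /var/lib/jenkins/jobs on Ubuntu
    message=${2:-"Snapshot Jenkins job configuration"}
    (
        cd "$jobs_dir" || exit 1
        [ -d .git ] || git init -q
        # Stage each job's config.xml and nothing else (no builds, no workspaces).
        find . -mindepth 2 -maxdepth 2 -name config.xml -exec git add -- {} +
        git commit -q -m "$message"
    )
}
```

Calling snapshot_job_configs /var/lib/jenkins/jobs whenever you want a checkpoint keeps the manual control over commits that the quoted text describes.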
Alberto actually recommends also adding (in $JENKINS_HOME):
Jenkins' own config (config.xml),
the Jenkins plugin configs (hudson*.xml) and
the user configs (users/*/config.xml)
To manually manage your configuration with Git, the following .gitignore file may be helpful.
# Miscellaneous Hudson litter
# Generated Hudson state
# Tools that Hudson manages
# Extracted plugins
# Job state
See this GitHub Gist and this blog post for more details.
There is a new SCM Sync Configuration plug-in which does exactly what you are looking for.
SCM Sync Configuration Hudson plugin is aimed at 2 main features:
Keep your config.xml (and other resources) Hudson files in sync with an SCM repository
Track changes (and author) made on every file with commit messages
I haven't actually tried this yet, but it looks promising.
You can find configuration files in Jenkins home folder (e.g. /var/lib/jenkins).
To keep them in VCS, first log in as the jenkins user (sudo su - jenkins) and set up its Git identity:
git config --global user.name "Jenkins"
git config --global user.email "jenkins@example.com"
Then initialize, add and commit the basic files such as:
git init
git add config.xml jobs/ .gitconfig
git commit -m'Adds Jenkins config files' -a
Also consider creating a .gitignore with the following files to ignore (customize as needed):
# Git untracked files to ignore.
# Cache.
# Fingerprint records.
# Working directories.
# Secret files.
# Plugins.
# State files.
# Job state files.
# Updates.
# Hidden files.
# Except git config files.
# User content.
# Log files.
# Miscellaneous litter
Then add it: git add .gitignore.
When done, you can add job config files, e.g.
shopt -s globstar
git add **/config.xml
git commit -m'Added job config files' -a
Finally add and commit any other files if needed, then push it to the remote repository where you want to keep the config files.
When Jenkins files are updated, you need to reload them (Reload Configuration from Disk) or run reload-configuration from Jenkins CLI.
A more accurate .gitignore, inspired by the reply from nepa:
It ignores everything except for .xml config files and .gitignore itself.
(the difference to nepa's .gitignore is that it doesn't "unignore" all top-level directories (!*/) like logs/, cache/, etc.)
The way I prefer is to exclude everything in the Jenkins home folder except the configuration files you really want to be in your VCS. Here is the .gitignore file I use:
This ignores everything (*) except (!) .gitignore itself, the jobs/projects, the plugin and other important and user configuration files.
It's also worth considering including the plugins folder. Annoyingly, updated plugins should be included too...
Basically, this solution makes future Jenkins/Hudson updates easier because new files aren't automatically in scope. You only get on the screen what you really want.
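The .gitignore file itself is not reproduced above, so here is a sketch of what a whitelist of this kind could look like. The patterns are illustrative assumptions, not the file from the answer, and write_whitelist_gitignore is a hypothetical name. It keeps only the top-level XML files and each job's config.xml.

```shell
#!/bin/sh
# Hypothetical helper: write a whitelist-style .gitignore into a Jenkins home
# directory. The patterns are illustrative, not the answer's original file.
write_whitelist_gitignore() {
    jenkins_home=$1
    cat > "$jenkins_home/.gitignore" <<'EOF'
# Ignore everything at the top level of the Jenkins home directory...
/*
# ...except this file and the top-level XML configuration files.
!/.gitignore
!/*.xml
# Look inside jobs/, but keep only each job's config.xml.
!/jobs
/jobs/*/*
!/jobs/*/config.xml
EOF
}
```

Because Git does not descend into an ignored directory, jobs has to be un-ignored before the patterns for its contents can take effect. Anything not explicitly whitelisted, such as logs/ or newly introduced files, stays out of scope.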
The answer from Mark (https://stackoverflow.com/a/4066654/142207) should work for SVN and Git (although the Git configuration did not work for me).
But if you need it to work with a Mercurial repo, create a job with the following script:
hg remove -A || true
hg add ../../config.xml
hg add ../../*/config.xml
# Commit and push only when hg reports added, deleted, modified or removed files
if [ -n "$(hg status -admrn)" ]; then
    hg commit -m "Scheduled commit" -u fill_in_the@blank.com
    hg push
fi
I've written a plugin that lets you check your Jenkins instructions into source control. Just add a .jenkins.yml file with the contents:
- make
- make test
and Jenkins will run them.
I checked in Hudson entirely; you could use this as a starting point: https://github.com/morkeleb/continuous-delivery-with-hudson
There are benefits to keeping all of Hudson in Git. All config changes are logged, and you can test the setup quite easily on one machine and then update the other machine(s) using git pull.
We used this as a boilerplate for our Hudson continuous delivery setup at work.