Amazon Sagemaker Failed: Please check if you have a directory that has same name as the git repo? - github

I was working in Sagemaker, and noticed that my notebook instance was behind my github repo as I had just pushed to it outside of working in Sagemaker. I couldn't seem to pull, so I deleted the directory within Jupyter and git cloned my updated repo. It worked fine, but once I was done working I haven't since been able to reinitialize the notebook. Sagemaker simply says:
Failure reason
Please check if you have a directory that has same name as the git repo.
I cloned from the same repo, so I don't imagine that the directory name changed. Maybe the directory is in the wrong place? Either way, how do I go in and change things if I can't open the notebook? Not sure what to do about this.

Do you have a lifecycle configuration (LCC) script that clones the repository? I can't think of another reason why the notebook would fail to start (I assume that is what you meant by 'reinitialize'). If you do, remove the LCC script to start your notebook. Or you could add a condition that checks for the folder and clones only if it does not exist, something like:
# FOLDER and REPOSITORY_URL are assumed to be set earlier in the script
if [ ! -d "$FOLDER" ]; then
    git -C /home/sagemaker-user clone "$REPOSITORY_URL"
# if you want to pull latest when restarting, uncomment lines below
# else
#     cd "$FOLDER"
#     git pull "$REPOSITORY_URL"
fi
I work at AWS and my opinions are my own.

Related

How to skip big files for GitHub in Machine Learning Repo?

I'm new to GitHub and Machine Learning.
I've been using Conda and Jupyter Notebook for my tests in ML.
It all was fine.
But I know that it's better to use VS Code (easier to code?) and GitHub (to promote and share my code?). I don't really care about version control because I'm only taking my first steps.
But nevertheless I did create a GitHub account, and I am trying to create a repo and push my existing folders of Python files. These folders also contain the raw and modified data used by the code: .csv and .xlsx files, some of them 100 MB+.
I use a Mac M1 and I've tried creating a .gitignore_global file, and it works: when I run git add . from the Terminal, the files listed in .gitignore_global aren't pushed (uploaded).
I've also created a .gitignore file in my working directory.
And I use find ./* -size +100M | cat >> .gitignore to add these files in the .gitignore (and it adds).
But when I run git init -b main, git add ., git commit -m "First commit", git remote add origin <REMOTE_URL> and git push -u origin main, it still tries to upload the 100 MB+ files.
I've tried to delete the whole git subfolder and Repo on the site... it doesn't work.
What should I do in order not to upload (push) these files?
How do you use GitHub for DataScience / Machine Learning with these limitations?
It's really impossible not to use all the data files...
Please see above. I've tried several ways
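A possible way out, offered as a sketch and not a confirmed diagnosis: find ./* prints paths with a leading "./", and as far as I know .gitignore patterns written that way do not match anything. On top of that, a file that was already staged or committed stays tracked no matter what .gitignore says. The two helper functions below are hypothetical names made up for this example; they write the entries in a form Git can match and then rebuild the index.

```shell
#!/usr/bin/env bash
# Sketch only. ignore_big_files and untrack_ignored are hypothetical helper
# names for this example, not Git commands. Run both from the repository root.

# Append every file larger than the given size to .gitignore.
ignore_big_files() {
    local limit="${1:-100M}"   # threshold in find(1) -size syntax
    # Skip .git itself, then turn "./data/big.csv" into "/data/big.csv":
    # no "./" prefix, and a leading "/" anchors the entry to the repo root.
    find . -path ./.git -prune -o -type f -size +"$limit" -print \
        | sed 's|^\./|/|' >> .gitignore
}

# Rebuild the index so files that are now ignored stop being tracked.
# The files stay on disk; only Git's staging area changes.
untrack_ignored() {
    git rm -r --cached --quiet --ignore-unmatch .
    git add .
}
```

Typical use would be ignore_big_files 100M, then untrack_ignored, then git commit. If a large file is already part of an earlier commit, it stays in the history and the push will still fail; with only a single "First commit", running git commit --amend after untrack_ignored (or deleting .git and starting over with the corrected .gitignore) should avoid that. Paths containing characters that are special in .gitignore, such as * or [, would need escaping, which this sketch does not do.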

Switching Remote Urls from HTTPS to SSH

I tried to switch from an HTTPS to an SSH repo using git. Below are the first commands I used.
Then, when I tried to add a branch to the staging area, I got the following messages:
I am not able to push anything or add any commits to git from my command line either. I get an error saying "could not read remote repository". Could someone please help me? What should I do now? I am new to git and I don't want to dig myself in a deeper hole!
Check for a .git/ subfolder in:
your current working directory (where you switch to SSH)
your parent folders
If you see one in any parent folder, that would make your current working directory a nested Git repository.
Ideally, there should not be any parent Git repository above your own: see if you can remove those parent .git folders (or move them elsewhere).
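The check described above can be sketched as a small shell function that walks from a starting directory up to the filesystem root and prints every directory holding a .git entry. find_git_roots is a hypothetical name chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: print each ancestor directory (including the starting one) that
# contains a .git entry. find_git_roots is a hypothetical helper name.
find_git_roots() {
    local dir
    dir="$(cd "${1:-.}" && pwd -P)" || return 1
    while :; do
        # -e rather than -d: in worktrees and submodules .git is a file
        [ -e "$dir/.git" ] && printf '%s\n' "$dir"
        [ "$dir" = "/" ] && break
        dir="$(dirname "$dir")"
    done
}
```

If find_git_roots prints more than one line when run from your working directory, the repository is nested inside another one. git rev-parse --show-toplevel shows which of them Git is actually using.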

Jupyter Notebook file is taking forever to upload on Github

I was trying to upload one of my Jupyter Notebook files to GitHub, but it's taking forever to upload.
The file is not that big, about 17 KB. I only get this problem with this one notebook.
Here's the screen shot.
Any kind of help or suggestions are highly appreciated.
Try using Git Bash to push your code and make changes instead of uploading files directly on GitHub. It is less prone to errors and usually takes less time as well. To do so, you can follow the steps below:
Download and install the latest version of Git Bash from here - https://git-scm.com/
Right-click on any desired location on your system.
Click “Git Bash Here”.
git config --global user.name "your name"
git config --global user.email "your email"
Go back to your GitHub account, open your project, click "clone", and copy the HTTPS link.
git clone <PASTE HTTPS LINK>
Clone of your GitHub project will be created on your computer location.
Open the folder and paste your content.
Make sure content is not empty
Right-click inside the cloned folder where you have pasted your content.
Click “Git Bash Here” again.
You will find (master) appearing after your location address.
git add .
Try git status to check if all your changes are marked in green.
git commit -m "Some message"
git push origin master
Hope this helps! Good luck!
You could try and:
clone the repository, add the file locally, commit and push
check on github.com if your remote repository has a .gitattributes file with lfs directives in it.
Maybe that repository, managed by LFS, has reached some upload limit which would prevent any new upload.
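To check the second point from a local clone, a minimal sketch: patterns managed by Git LFS carry the attribute filter=lfs in .gitattributes, so a grep for that string is enough. uses_lfs is a hypothetical helper name for this example.

```shell
#!/usr/bin/env bash
# Sketch: report whether a .gitattributes file routes any paths through Git LFS.
# uses_lfs is a hypothetical helper name, not a Git command.
uses_lfs() {
    local attrs="${1:-.gitattributes}"
    # Prints the matching lines with line numbers; returns non-zero when the
    # file is missing or contains no LFS rule.
    [ -f "$attrs" ] && grep -n 'filter=lfs' "$attrs"
}
```

Run uses_lfs from the root of the clone. If it prints a rule that matches *.ipynb, the notebook would go through LFS on push.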

Creating new git repository, can't add directory

I am about to throw my laptop through a wall, and am hoping for help before reaching that point. For reference, I am following these instructions exactly - https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/.
I have a directory ".../path/thisdir". Inside of thisdir are (1) a file called Demo.R and (2) a directory called sportVU. sportVU is a directory with ~15 files in it.
When I follow the instructions in that link, my github repo looks like this:
https://github.com/NicholasCanova/packageSportVU
Notice that the sportVU directory link cannot be clicked in github, and when I download the repo, sportVU is an empty folder. Why is this happening? This shouldn't be so tough.
EDIT: this is what the repo looks like in my local machine, I'm 100% sure it's not empty:
It could be that you have two .git folders in your directory. View hidden folders to check.
Similar questions:
What does a grey icon in remote GitHub mean
Why can I not open my folder in GitHub?
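Instead of browsing hidden folders by hand, a short sketch that lists any .git entry below the top level of the project. nested_git_dirs is a hypothetical helper name, and the commented fix assumes the history inside the nested repository is not needed.

```shell
#!/usr/bin/env bash
# Sketch: list .git entries that sit below the top level of a working tree.
# nested_git_dirs is a hypothetical helper name for this example.
nested_git_dirs() {
    local root="${1:-.}"
    # -mindepth 2 skips the repository's own top-level .git
    find "$root" -mindepth 2 -name .git -prune -print
}

# If this prints ./sportVU/.git, one common fix (assumption: the nested
# repository's own history can be thrown away) is:
#   rm -rf sportVU/.git
#   git rm --cached sportVU     # drop the gitlink entry, keep the files
#   git add sportVU             # add the files themselves
#   git commit -m "Track sportVU files directly"
```

A folder that GitHub shows with a grey, non-clickable icon is usually recorded as a link to another repository, which is what a nested .git folder produces.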
Since Git doesn't store empty folders the steps you should follow exactly are:
Delete the old repo and start again following exactly my steps.
git clone <repo url>
Inside the folder of the cloned repo, create a directory manually and name it as you want, e.g. sportVU.
Drag and drop all the files you want into the sportVU directory.
cd to Myrepo/sportVU and type git add *
type git commit -m "added some files"
git push -u origin master.
and you should be all set

Rstudio: Changing origin for git version control of project

I originally set up git in Rstudio while enrolled in the Data Scientist's Toolbox course at Coursera. Unfortunately, I did this in my phd project. The repository no longer exists on github. I am now attempting to write my thesis in rmarkdown using knitr and bookdown. I would like to use version control, both to learn proper git workflow and to have a structured back up of everything I have done in my thesis. However, I have been unable to change the version control repository in Rstudio.
I am unable to change this in the Tools > Version control > Project setup > Git/SVN menu. The Origin: textbox is unchangeable.
I tried creating a new project using the old phd project's working directory. This also cloned the version control settings.
How do I change the origin to accomplish what is described above?
Git, GitHub and RStudio are different things. You can use git as a local version control tool. You can connect your local repo to a GitHub account, which is built on git, through push/pull. RStudio just provides a user interface for git and lets you push the repo to a git-based remote server for version control (not only GitHub, but also GitLab).
So for your issue: if you do not want to pay GitHub for a private repo, all of your code would be public, and I don't think that is a good idea before you have finished your thesis. But version control can be done locally with git alone. Just use the git shell to manage versions.
However, as a student, you can get private repos from GitHub. Just register and find the student package. Then remove the URL of the old remote repo: after you cd to your working directory on the command line, use the following command to find your remote URL (most likely you will find origin):
git remote -v
Then use this to remove them:
git remote rm origin
Now you could use version control locally. If you want to connect this repo to your remote github private repo, use this:
git remote add origin https://github.com/[YourUsername]/[YourRepoName].git
RStudio will pick up this git information and support your subsequent operations. A project in RStudio is not the same thing as a git repository, although projects support git as a version control tool. So you need git on the command line or in a shell to solve your problem.
This can be done by opening /your.project/.git/config
and editing the remote origin line(s), e.g. changing from git to https.
Restart Rstudio & you'll be prompted for your github username & password.
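As far as I know, the same change can be made without opening .git/config by hand, using git remote set-url. A minimal sketch follows; point_origin_at is a hypothetical helper name, and any URL you pass to it is your own.

```shell
#!/usr/bin/env bash
# Sketch: repoint the existing "origin" remote, or add it if it is missing.
# point_origin_at is a hypothetical helper name for this example.
point_origin_at() {
    local url="$1"
    if git remote get-url origin >/dev/null 2>&1; then
        git remote set-url origin "$url"   # rewrites the url line in .git/config
    else
        git remote add origin "$url"
    fi
    git remote -v                          # show the result
}
```

Run it from the project directory, for example point_origin_at https://github.com/[YourUsername]/[YourRepoName].git, then restart RStudio so it rereads the remote.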
This is what worked for me for migrating from github to Azure
Go to the top right Git window in RStudio and click on the gear. Now click Shell (to open the terminal there).
#remove origin
git remote rm origin
#add new origin like Azure for me via HTTPS
git remote add origin https://USER@dev.azure.com/USER/PROJECT/_git/REPONAME
#push your local repo
git push -u origin --all
#in my case put in the PAT password if you needed to generate one.
After some testing, I found a clue.
RStudio is not very smart about this setting.
It first looks for a git repository in the project folder, where your Rproject file is located;
if it does not find one, it goes up to the folder that contains your project folder.
However, for version control you only need the code files, while an RStudio project may also contain big files such as .RData, pictures, etc.
I haven't found a way to override this logic manually. The only thing you can do is delete the current git repository settings (the .git folder and the 2 other git settings files); RStudio may then ask whether you want to init a new one.