Count number of empty repositories on GitHub - github

I was just wondering if it's possible to count the total number of empty repositories on GitHub.
If not for all users, can it be done for yourself?
Edit
I have tried the size:0 search, but it seems to return a lot of repositories which do contain data. Taking something like size:0..1 didn't help either.
I also tried searching for the keyword empty, but that doesn't cover all cases.
Update
I got a response from Brian Levine (GitHub)
That would be an interesting statistic. We don't have a simple way to do that right now. However, you might be able to use the GitHub API to get close. You could look through public repositories and compare "pushed_at" and "created_at" dates to see if there has been any activity. Additionally, you could find repositories with a "size" of 0. There's more information on how to find this information, and much more, right here:
http://developer.github.com/v3/repos/
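As a rough illustration of the suggestion above, the following sketch compares "created_at" and "pushed_at" on a repository object that has already been fetched from the API. The function names, the 5-second slack, and the treatment of a missing "pushed_at" as "no activity" are assumptions made for this example, not anything GitHub documents as a definition of "empty".

```python
from datetime import datetime, timedelta


def parse_github_time(value: str) -> datetime:
    # GitHub API timestamps look like 2013-05-01T10:00:00Z.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def no_activity_since_creation(repo: dict, slack_seconds: int = 5) -> bool:
    # Assumption: a missing/null "pushed_at" is treated as "never pushed".
    pushed_at = repo.get("pushed_at")
    if pushed_at is None:
        return True
    gap = parse_github_time(pushed_at) - parse_github_time(repo["created_at"])
    # Allow a few seconds of slack between creation and the initial push.
    return gap <= timedelta(seconds=slack_seconds)


def is_empty_candidate(repo: dict) -> bool:
    # Combine both hints from the response: size of 0 and no later push.
    return repo.get("size") == 0 and no_activity_since_creation(repo)
```

This only flags candidates. A repository can report a non-zero size while being empty, so the result is a heuristic, not a definitive answer.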

You could:
list all public repos through the API, and
for each repo, check whether its size equals 0.
(The size seems to be in KB)
GET /repos/:owner/:repo
Note that an "empty" repo could still have at least one commit, when created with the default README.md description file.
Actually, as the OP Aniket comments:
I explained the meaning of empty as: 0-1 commits, max 3 files:
.gitignore
README.md
LICENSE
(Note: README is different from README.md)
Another way is, for each repo, to look at the number of commits.
0 or 1 commit means probably an empty repo.
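The definition of "empty" given in the comment (0-1 commits, at most three boilerplate files) can be written down as a small predicate. This is only a sketch: it assumes the commit count and the list of top-level file names have already been obtained somehow, and the names `BOILERPLATE_FILES` and `is_effectively_empty` are made up for the example.

```python
# The three files the comment allows in an "empty" repository.
BOILERPLATE_FILES = {".gitignore", "README.md", "LICENSE"}


def is_effectively_empty(commit_count: int, file_names) -> bool:
    files = set(file_names)
    # Exact name match: "README" is deliberately not treated as "README.md".
    only_boilerplate = files <= BOILERPLATE_FILES
    return commit_count <= 1 and len(files) <= 3 and only_boilerplate
```

For instance, `is_effectively_empty(1, ["README.md"])` is true, while a repository holding a plain `README` or having two commits is not counted as empty under this definition.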
Update: GitHub confirms there is no current way to determine if a repo is "empty".
The closest way to do that would be:
You could look through public repositories and compare "pushed_at" and "created_at" dates to see if there has been any activity

To check if a repository is empty, look to see if it has any commits.
https://api.github.com/repos/:owner/:repo/commits?per_page=1
An empty repository will have a non-successful HTTP status and the content...
{
"message": "Git Repository is empty.",
"documentation_url": "https://developer.github.com/v3"
}
If it doesn't exist, you'll get a 404 and...
{
"message": "Not Found",
"documentation_url": "https://developer.github.com/v3"
}
If it does exist, you'll get an HTTP 200 and one commit.
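The three cases above could be told apart with something like the following sketch. It uses only the Python standard library; the function names and the returned labels are invented for this example. The exact non-successful status code for an empty repository is not relied on here, only the "message" text quoted above.

```python
import json
import urllib.error
import urllib.request

EMPTY_MESSAGE = "Git Repository is empty."


def classify_commits_response(status: int, body: str) -> str:
    # Map an HTTP status and response body to one of the cases described above.
    if status == 200:
        return "has_commits"
    if status == 404:
        return "not_found"
    try:
        message = json.loads(body).get("message", "")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        message = ""
    return "empty" if message == EMPTY_MESSAGE else "unknown"


def check_repo(owner: str, repo: str) -> str:
    # Performs a real network request; unauthenticated calls are rate limited.
    url = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page=1"
    try:
        with urllib.request.urlopen(url) as response:
            return classify_commits_response(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        return classify_commits_response(error.code, error.read().decode("utf-8"))
```

Keeping the classification separate from the request makes it easy to check the logic without touching the network.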

The "size" attribute from the API will not help, as other posters here have mentioned.
An example is this repository:
https://api.github.com/repos/errfree/test
Note that it reports a size of 48 despite being empty.
Disclaimer: This approach is a hack. It is neither efficient nor officially supported by GitHub, but it works well enough for me.
Basically, I download the zip version of the repository. When the repository is empty, the request does not return a zip file but an HTML page saying "This repository is empty.".
After downloading, I check whether the file is smaller than 30 KB and, if so, look inside it for the string "This repository is empty." to confirm that the repository is empty.
Here is a practical example of a direct zip download that in this case displays the empty-repository page:
https://github.com/errfree/test/zipball/master/
My pseudo-code in Java:
// we might have reached an empty repository
if (fileZip.length() < 30000) {
    // read the contents
    final String content = utils.files.readAsString(fileZip);
    // is this an HTML file with the repository empty message?
    if (content.contains("This repository is empty.")) {
        return null;
    }
}
Hope this helps.

Related

github search code in repo gives empty result

I have a strange issue in my repo, when I search for i.e. "DEM"
Steps:
Open https://github.com/limex/schotterosm-cartocss-style/search?q=DEM
Expected Result:
4 Search Results. i.e. for the readme.md, as you can clearly see on the page https://github.com/limex/schotterosm-cartocss-style
Observed Result:
0 Results
I don't get any results for code searches within my repo.
Funfact:
My repo is a fork and the same search works on the original repo:
https://github.com/cyclosm/cyclosm-cartocss-style/search?q=DEM
Any idea why?
BR Günther
I think your repository is still being indexed. I also get that notification with the new search experience.

How can I get the open graph image for a GitHub repository?

I have been trying to get the open graph images from my repositories through the GraphQL API that GitHub exposes, but I always get my avatar back. I have tried querying the repositories node, the search, and the user node to no avail.
For example, for the query:
query {
repository(name: "rust-algorithms", owner: "alexfertel") {
openGraphImageUrl
nameWithOwner
}
}
I get:
"repository": {
"openGraphImageUrl": "https://avatars.githubusercontent.com/u/22298999?s=400&v=4",
"nameWithOwner": "alexfertel/rust-algorithms"
},
Which you can tell gives an avatar and not the open graph image generated by GitHub for the repository.
Is there a way to get this image that doesn't involve scraping GitHub?
The general format is:
https://opengraph.githubassets.com/<any_hash_number>/<owner>/<repo>
Credit here goes to user #msfjarvis on Reddit
any_hash_number
We can use any number or string here. It tells the service which version of the image is being requested. It's better to use a hash, because a changing value always returns the updated image. Any string such as 1, a, or 1a also works, but if we always use the same value, it will not return the updated image.
You can also get it for issue and pull request
Issue
https://opengraph.githubassets.com/<any_hash_number>/<owner>/<repo>/issue/<issue_number>
PR
https://opengraph.githubassets.com/<any_hash_number>/<owner>/<repo>/pull/<pr_number>
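A small helper for building these URLs might look like the sketch below. The function names are made up for the example, the path segments ("issue", "pull") are taken as-is from the formats listed above, and deriving the version string from a SHA-256 of some changing seed is just one possible way to follow the "use a hash" advice.

```python
import hashlib
from typing import Optional

BASE_URL = "https://opengraph.githubassets.com"


def fresh_version(seed: str) -> str:
    # Any string works as the version; hashing a changing seed
    # (for example a timestamp) yields a new value each time.
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()


def opengraph_url(owner: str, repo: str, version: str,
                  kind: Optional[str] = None, number: Optional[int] = None) -> str:
    url = f"{BASE_URL}/{version}/{owner}/{repo}"
    if kind is not None:
        if kind not in ("issue", "pull"):
            raise ValueError("kind must be 'issue' or 'pull'")
        if number is None:
            raise ValueError("an issue or pull request number is required")
        url += f"/{kind}/{number}"
    return url
```

For the repository from the question, `opengraph_url("alexfertel", "rust-algorithms", fresh_version("2024-01-01"))` produces a URL in the general format shown above.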

How to reset a file to a particular commit with JGit?

My local repository contains more than one file. When I check out a particular commit of one file, the other files in the repository get deleted.
I am using the following API (git is an instance of the Git repository):
git.checkout().setName(commitId).call()
Is this the correct way to check out a particular commit of a particular file?
The JavaDoc of setName() says
When only checking out paths and not switching branches, use setStartPoint() to specify from which branch or commit to check out files.
And for addPath() it states:
If this option is set, neither the setCreateBranch() nor setName() option is considered. In other words, these options are exclusive.
Therefore I think you should use
git.checkout().addPath( ... ).setStartPoint( ... ).call();
Your call resets the index (and can remove files that are not present in the commit you check out).
You can look for a more precise example in jgit/porcelain/RevertChanges.java
// revert the changes
git.checkout().addPath(fileName).call();
In your case:
git.checkout().setStartPoint(commitId).addPath(fileName).call();

How to search all GitHub repositories for SHA?

Is there a way to find all repositories on github that contain a given commit (given its SHA)?
I can easily check whether a given commit exists in a known repository by checking the existence of
https://github.com/${USER}/${REPO}/commit/${SHA}
...but what if I don't know the repo-slug?
Doing a simple search on SHAs (via the web interface) doesn't return anything.
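For reference, the known-repository check described in the question can be sketched as follows. It does not solve the unknown-repository case. The function names are hypothetical, and accepting abbreviated SHAs of 7 to 40 hex characters is an assumption made for this example.

```python
import re
import urllib.error
import urllib.request

# Assumption: accept abbreviated or full SHA-1 hashes (7 to 40 hex digits).
SHA_PATTERN = re.compile(r"[0-9a-fA-F]{7,40}")


def commit_url(user: str, repo: str, sha: str) -> str:
    if not SHA_PATTERN.fullmatch(sha):
        raise ValueError(f"not a plausible commit SHA: {sha!r}")
    return f"https://github.com/{user}/{repo}/commit/{sha}"


def commit_exists(user: str, repo: str, sha: str) -> bool:
    # Performs a real HEAD request against github.com.
    request = urllib.request.Request(commit_url(user, repo, sha), method="HEAD")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise
```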

List all the files checked-in in a single cvs commit

Generally, our fixes/patches for any bug involve changes in multiple files, and we commit all these files in a single shot.
In SVN, each commit (which may involve multiple files) increments the revision number of the whole repository by one. So we can easily link all the files that went in with a single commit.
The difficulty with the same case in CVS is that it increments the revision numbers of all the files individually. Let's say a commit involves the following files:
file1.c //revision assigned as part of this commit..1.5.10.2
file2.c //revision assigned as part of this commit..1.41.10.1
and the comment given for this commit is "First Bug Fix".
Now, the only way to get all files checked in as part of this commit is to search through all the cvs logs for the comment "First Bug Fix" and hope that it returns only the two file revisions mentioned above.
Please share your views on whether there is a better way in CVS to keep track of all files checked in with a single commit, instead of relying on the comment given as part of the commit.
I think CVSps might do what you are looking for.
"CVSps is a program for generating 'patchset' information from a CVS repository. A patchset in this case is defined as a set of changes made to a collection of files, and all committed at the same time (using a single 'cvs commit' command). This information is valuable to seeing the big picture of the evolution of a cvs project. While cvs tracks revision information, it is often difficult to see what changes were committed 'atomically' to the repository."
cvsps relies on the cvs client. Make sure you have a version of cvs that supports the rlog command (1.1.1).
CVS does not have inherent support for "transactions".
You need some additional glue to do this. Fortunately, this has all been done for you and is available in a very nice extension called "cvszilla".
The home page is here:
http://www.nyetwork.org/wiki/CVSZilla
This also ties in to CVSweb, which is a great way to browse through your CVS modules via a web-based GUI.
Perhaps the ANT CvsChangeLog Task is another choice. See http://ant.apache.org/manual/Tasks/changelog.html . It provides date and time for a checkin message. You can produce nice reports with XSLT - try the example at the bottom of the ANT manual page.
I know it's late for an answer, but perhaps other users come across this like I did (searching) and appreciate the ANT integration.
OK, I just installed cvsps and ran it from the top level. Here's a sample of the output... this is one of the few hundred patch sets on my module. Note that indeed this does work across different directory trees.
---------------------
PatchSet 221
Date: 2009/04/22 22:09:37
Author: jlove-ext
Branch: HEAD
Tag: LCA_v1_0_0_0_v6
Log:
Bug: 45562
Check the length of strings in messages. Namely:
* Logical server IDs cannot be more than 18 characters (forcing a
TCSE protocol requirement).
* Overall 'sid' (filter) search string length cannot be more than
500 (this is actually more than the technical maximum messages are
allowed, but is close).
Alarm messages and are now not going to crash either as the alarm text
is shortened if necessary by the LCA.
Members:
catalogue/extractCmnAlarms.pl:1.2->1.3
programs/ldapControlAgent/LcaCommon.h:1.18->1.19
programs/ldapControlAgent/LcaUtils.cc:1.20->1.21
programs/ldapControlAgent/LcaUtils.h:1.6->1.7
programs/ldapControlAgent/LdapSession.cc:1.61->1.62
tests/cts-45562.txt:INITIAL->1.1
So, this may indeed do what you want. Nice one, Joakim. However, as mentioned, CVSzilla does much more than this:
Web-browsable CVS repositories (via CVSweb).
Web-browsable transactions.
Supports transactions across modules.
Generates CVS commands (using 'cvs -j') to merge patchsets onto other branches.
Integration with bugzilla (transactions are automatically registered against bugs).
If all you want is just the patchset info, go with cvsps. If you're looking to use CVS on large projects over a long period of time and are thinking about using bugzilla for your bug-tracking, then I would suggest looking into CVSzilla.
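If only the list of files per commit is needed, the "Members:" section of cvsps output like the sample above can be extracted with a short script. This is a minimal sketch that assumes the layout shown in that sample ("PatchSet N" header lines and "path:old->new" member lines); the function and pattern names are made up for the example, and a log message that itself contains such lines would confuse it.

```python
import re

PATCHSET_PATTERN = re.compile(r"^PatchSet (\d+)$")
MEMBER_PATTERN = re.compile(r"^\s*(?P<path>.+):(?P<old>[^:\s]+)->(?P<new>[^:\s]+)\s*$")


def parse_patchsets(text: str) -> dict:
    # Returns {patchset_number: [(path, old_revision, new_revision), ...]}.
    patchsets = {}
    current = None
    in_members = False
    for line in text.splitlines():
        header = PATCHSET_PATTERN.match(line.strip())
        if header:
            current = int(header.group(1))
            patchsets[current] = []
            in_members = False
            continue
        if line.strip() == "Members:":
            in_members = True
            continue
        if in_members and current is not None:
            member = MEMBER_PATTERN.match(line)
            if member:
                patchsets[current].append(
                    (member["path"], member["old"], member["new"])
                )
    return patchsets
```

Feeding it the sample patch set would give one entry keyed by 221, listing each file with its old and new revision, including `tests/cts-45562.txt` going from `INITIAL` to `1.1`.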
This also could be useful:
http://code.google.com/a/eclipselabs.org/p/changelog/