Advanced search on GitHub excluding a specific repository

I'm trying to figure out whether any of the fields on GitHub's advanced search form can be used to exclude hits from a specific repository. In other words, I want to run a code search for all hits that land outside a given repository: an inverse repository search, if you will.
I may be able to tune the size field with an inequality, but I'm hoping I'm overlooking something designed for this kind of search. My specific use case: there's a major monorepo on our remote, plus a small constellation of support repositories that reuse some bits of the main repo and need to be refactored. I'm trying to identify the source hits in the smaller repos that need to be upgraded.
https://github.com/search/advanced?q=test&type=Repositories

Use -repo in the normal search box. Prepending a hyphen (-) to a qualifier excludes matching results, so each -repo:owner/name term excludes one repository.
foo_library -repo:owner1/repoX -repo:owner2/repo
See also docs.github.com or github.community.
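If you need to build such a query for several repositories at once, here is a minimal sketch in Python. The helper name build_search_url and the type=code parameter are my own assumptions for illustration; only the -repo:owner/name syntax comes from the answer above.

```python
from urllib.parse import urlencode


def build_search_url(term, excluded_repos):
    """Build a GitHub search URL that excludes the given repositories.

    build_search_url is a hypothetical helper, not part of any GitHub library.
    """
    # One -repo:owner/name qualifier per repository to exclude.
    exclusions = " ".join(f"-repo:{repo}" for repo in excluded_repos)
    query = f"{term} {exclusions}".strip()
    # Assumption: type=code selects code results on the search page.
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


url = build_search_url("foo_library", ["owner1/repoX", "owner2/repo"])
print(url)
```

The query part is URL-encoded, so pasting the printed URL into a browser should reproduce the search shown above.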

Related

How to specify "issues count" when searching repositories in GitHub Search

When searching for repositories, I can use the "help-wanted-issues" and "good-first-issues" search qualifiers to narrow down the search results like so.
react in:topics,readme language:javascript help-wanted-issues:>0
But I want to specify an issue count when searching repositories regardless of labels, because not all issues carry the labels mentioned above. For example:
react in:topics,description language:javascript issues:>0
I know there are separate qualifiers for searching issues, but I don't want to search for any specific issue. I want to search for repositories matching my query, filtered by a number of issues that I can specify. Is there any workaround?
I read the docs here and expected to find some search qualifier (other than "good-first-issues" and "help-wanted-issues") to specify number of issues in a repository.

Perform aggregate computation on GitHub data

I have a GitHub (Enterprise) organization with multiple repositories. Each repository contains one or more .properties files.
Some of these .properties files have "i18n" in their path or filename.
These .properties files would be relevant for translation processes.
As a basic step: I would need to get the average/min/max frequency of commits that involve translation-relevant files (as defined above) for each one of the repositories.
As an ideal scenario: I would also need to determine how many key-values were changed/added/deleted by each commit on average, to better determine the resulting workload for the translation process.
What I tried so far:
GitHub GraphQL API v4: it seems to me that the API is well suited for searching, but less so for computing aggregations.
GitHub REST API v3: specific commits can be searched, but not by file extension. File extension is a query criterion for files themselves, but not for commits.
Any hint on how to achieve this?
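For what it's worth, the aggregation step itself is small once the commit data is available locally. Below is a minimal sketch, assuming each commit has already been exported as an ISO timestamp plus its list of changed paths (for example from git log --name-only on a clone). The function names and the input shape are assumptions for illustration, and "frequency" is interpreted here as the gap in days between consecutive relevant commits.

```python
from datetime import datetime
from statistics import mean


def is_translation_relevant(path):
    # Definition from the question: a .properties file with "i18n" in its path or name.
    return path.endswith(".properties") and "i18n" in path


def translation_commit_stats(commits):
    """commits: iterable of (iso_timestamp, [changed_paths]) tuples (assumed shape)."""
    times = sorted(
        datetime.fromisoformat(ts)
        for ts, paths in commits
        if any(is_translation_relevant(p) for p in paths)
    )
    # Gaps in days between consecutive translation-relevant commits.
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
    if not gaps:
        return None  # fewer than two relevant commits, nothing to aggregate
    return {"min_days": min(gaps), "max_days": max(gaps), "mean_days": mean(gaps)}


commits = [
    ("2023-01-01T00:00:00+00:00", ["src/i18n/messages.properties"]),
    ("2023-01-02T00:00:00+00:00", ["README.md"]),
    ("2023-01-03T00:00:00+00:00", ["src/i18n/messages_de.properties"]),
    ("2023-01-04T00:00:00+00:00", ["conf/i18n/app.properties"]),
]
print(translation_commit_stats(commits))
```

Running this once per repository gives the per-repository figures. Counting changed key-values per commit would need the diffs as well, which this sketch does not cover.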

What file name should I use to identify GitHub's userName/repoName structure for analysis on my local filesystem without nested folders?

I am analyzing GitHub repositories.
To keep track of the same resource throughout multiple calculations I want to use the userName and repoName in combination as an identifier.
First I thought a simple
userName_repoName
would do, but apparently repoNames can now contain underscores as well (userNames can't, so splitting at the first underscore would work, I guess).
So my question is:
Do you have any advice for me on how to create a cross-platform failsafe identifier from a GitHub userName and repoName?
I want to avoid nested folders as it is easier to keep track of same-level folders than multiple nested ones (one user with multiple repos)
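The first-underscore idea from the question can be sketched as follows. This is only an illustration of that reasoning: it relies on the observation above that userNames cannot contain underscores, and the helper names make_repo_id and split_repo_id are made up for this example.

```python
def make_repo_id(user_name, repo_name):
    """Join userName and repoName into a flat identifier (hypothetical helper)."""
    # Guard for the assumption the scheme depends on: no underscore in userName.
    if "_" in user_name:
        raise ValueError("userName must not contain an underscore")
    return f"{user_name}_{repo_name}"


def split_repo_id(repo_id):
    """Recover (userName, repoName) by splitting at the first underscore only."""
    user_name, repo_name = repo_id.split("_", 1)
    return user_name, repo_name


repo_id = make_repo_id("octocat", "hello_world")
print(repo_id)
print(split_repo_id(repo_id))
```

Because the split is limited to the first underscore, any further underscores stay part of the repoName.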

Looking for a file in a repo with certain properties

In GitHub, is it possible to search for repos that have more than 100 stars and contain a file called "foo.txt"?
I am trying to use their search interface.
Not exactly, given GitHub's code search limitations (or "Considerations"):
Only the default branch is considered. In most cases, this will be the master branch.
Only files smaller than 384 KB are searchable.
Only repositories with fewer than 500,000 files are searchable.
You must always include at least one search term when searching source code.
For example, searching for language:go is not valid, while amazing language:go is.
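The last rule is easy to check before sending a query. Here is a rough sketch, assuming a qualifier always has the shape name:value (optionally prefixed with a hyphen) and everything else counts as a search term. The regular expression and the function name has_search_term are my own simplification, not GitHub's actual parser.

```python
import re

# Assumption: a qualifier looks like name:value, optionally negated with "-".
QUALIFIER = re.compile(r"^-?[A-Za-z][A-Za-z-]*:\S+$")


def has_search_term(query):
    """Return True if at least one whitespace-separated token is not a qualifier."""
    return any(not QUALIFIER.match(token) for token in query.split())


print(has_search_term("language:go"))
print(has_search_term("amazing language:go"))
```

With this heuristic the first query is rejected and the second is accepted, matching the example above.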

Split a Mercurial repo into different baby repos

The situation: I once placed some conceptually related code into one package, hoping to interweave the pieces gradually later, but they eventually became independent of each other and can be safely separated. I have therefore decided it's time to split them into different packages, but I'm not sure how to do that while keeping the version control history of each sub-package. Any ideas?
The Convert extension, included with the standard distribution, is used for this purpose. Specifically, check out the --filemap option, which can include, exclude, and rename files and directories when converting from one repository to another.
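As a rough illustration, the sketch below prepares a filemap that keeps a single subdirectory and moves it to the root of the new repository, plus the matching hg convert command line. The directory and repository names are placeholders, build_split_plan is a hypothetical helper, and nothing is executed here. It also assumes the convert extension is enabled in your hgrc.

```python
def build_split_plan(source_repo, subdir, dest_repo, filemap_path="filemap.txt"):
    """Return (filemap_text, command) for extracting one subdirectory.

    All names are placeholders; write filemap_text to filemap_path and run
    the command yourself, e.g. with subprocess.run(command, check=True).
    """
    # "include" keeps only this directory; "rename ... ." moves it to the root.
    filemap_text = f"include {subdir}\nrename {subdir} .\n"
    command = ["hg", "convert", "--filemap", filemap_path, source_repo, dest_repo]
    return filemap_text, command


filemap_text, command = build_split_plan("big-repo", "pkg_a", "pkg_a-repo")
print(filemap_text)
print(" ".join(command))
```

Repeating this once per sub-package, each with its own filemap and destination, should give one new repository per package with only the history that touched its files.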