looking for file in repo with certain properties - github

In GitHub, is it possible to search for repos that have more than 100 stars and contain a file called "foo.txt"?
I am trying to use their search interface.

Not exactly, given GitHub's code search limitations (listed in the docs as "Considerations"):
Only the default branch is considered. In most cases, this will be the master branch.
Only files smaller than 384 KB are searchable.
Only repositories with fewer than 500,000 files are searchable.
You must always include at least one search term when searching source code.
For example, searching for language:go is not valid, while amazing language:go is.

Related

How to specify "issues count" when searching repositories in Github Search

When searching for repositories, I can use the "help-wanted-issues" and "good-first-issues" search qualifiers to narrow down the search results like so.
react in:topics,readme language:javascript help-wanted-issues:>0
But I want to specify the issue count when searching repositories regardless of labels, because not all issues carry the labels mentioned above, e.g.,
react in:topics,description language:javascript issues:>0
I know that there are separate qualifiers to search for issues but I don't want to search for any specific issue, I want to search for repositories according to my search query which contains number of issues that I can specify. Is there any workaround?
I read the docs here and expected to find some search qualifier (other than "good-first-issues" and "help-wanted-issues") to specify number of issues in a repository.

Advanced search on github excluding a specific repository

I'm trying to figure out if there's any way to exercise the various fields defined for the github advanced search form that would allow me to effectively exclude hits from a specific repo. In other words I want to do a code search for all hits landing outside a given repository, an inverse repository search if you will.
I may be able to tune the size field with an inequality, but I'm hoping there's something I may be overlooking that has this sort of search in mind. My specific use case is that there's a major monorepo on our remote but there's a small constellation of support repositories which reuse some bits of the main repo that need to be refactored. I'm trying to identify those source hits in the smaller repos that need to be upgraded.
https://github.com/search/advanced?q=test&type=Repositories
Use -repo in the normal search. You can exclude a repository by prepending a hyphen (-).
foo_library -repo:owner1/repoX -repo:owner2/repo
See also docs.github.com or github.community.
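If the list of repositories to exclude is long, the query string can be assembled programmatically. The following is a minimal Python sketch; the helper name build_exclusion_query is made up for illustration, and the repository names are the placeholders from the example above.

```python
def build_exclusion_query(term, excluded_repos):
    """Return a search query that skips every repository in excluded_repos.

    Each entry is an "owner/name" string; prefixing the repo: qualifier
    with a hyphen excludes that repository from the results.
    """
    parts = [term]
    parts.extend(f"-repo:{repo}" for repo in excluded_repos)
    return " ".join(parts)


# Placeholder names taken from the answer above.
query = build_exclusion_query("foo_library", ["owner1/repoX", "owner2/repo"])
print(query)
```

The resulting string can be pasted into the normal search box as-is.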

perform aggregate computation on Github data

I have a Github (Enterprise) organization with multiple repositories. Each repository contains one or more .properties files.
Some of these .properties files will be contained in folders that contain "i18n" in their path or filename.
These .properties files would be relevant for translation processes.
As a basic step: I would need to get the average/min/max frequency of commits that involve translation-relevant files (as defined above) for each one of the repositories.
As an ideal scenario: I would also need to determine how many key-values were changed/added/deleted by each commit on average, to better determine the resulting workload for the translation process.
What I tried so far:
GitHub GraphQL API v4: it seems to me that the API is very well suited for searching, but not as much for computing aggregations.
GitHub REST API v3: specific commits can be searched, but not based on file extension. While the file extension is a query criterion for files themselves, it is not one for commits.
Any hint on how to achieve this?

github search limit results

I need to do a very large search on Github for a statistic in my thesis.
For example, I need to explore a large number of Android projects on GitHub, but the site limits the search results to 1000 (e.g. https://github.com/search?l=java&q=onCreate&ref=searchresults&type=Code&utf8=%E2%9C%93). I also tried the Java GitHub API library, using org.eclipse.egit.github.core.client.GitHubClient and the method GitHubClient.searchRepositories(), but even there the number of results is limited.
Does anyone know how to get all results?
The Search API will return up to 1000 results per query (including pagination), as documented here:
https://developer.github.com/v3/search/#about-the-search-api
However, there's a neat trick you could use to fetch more than 1000 results when executing a repository search. You could split up your search into segments, by the date when the repositories were created. For example, you could first search for repositories that were created in the first week of October 2013, then second week, then September, and so on.
Because you would be restricting the search to a narrow period, you will probably get fewer than 1000 results, and would therefore be able to get all of them. If you notice that more than 1000 results are returned for a period, you would have to narrow the period even more, so that you can collect all results.
https://help.github.com/articles/searching-repositories/#search-based-on-when-a-repository-was-created-or-last-updated
You should be able to automate this via the API.
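As a rough illustration of the date-slicing idea, here is a Python sketch that only generates the query strings, one per time window, using the created: range qualifier. The function names and the base query "android language:java" are assumptions made for this example; sending the queries to the Search API is left out.

```python
from datetime import date, timedelta


def created_windows(start, end, days=7):
    """Yield (first_day, last_day) pairs covering start..end inclusive."""
    current = start
    while current <= end:
        # Each window spans `days` days, clipped to the overall end date.
        last = min(current + timedelta(days=days - 1), end)
        yield current, last
        current = last + timedelta(days=1)


def sliced_queries(base_query, start, end, days=7):
    """Build one repository search query per creation-date window."""
    return [
        f"{base_query} created:{first.isoformat()}..{last.isoformat()}"
        for first, last in created_windows(start, end, days)
    ]


# Hypothetical base query; the dates echo the October 2013 example above.
queries = sliced_queries("android language:java", date(2013, 10, 1), date(2013, 10, 21))
for q in queries:
    print(q)
```

If the total reported for one window is still above 1000, the same function can be called again for just that window with a smaller days value.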
If you are searching for all files on GitHub with filename:your-file-name, you could also slice the search with the size qualifier.
For example, if you are looking for all files named test.rb on GitHub, the GitHub API may report more than 11M results, but you can only get 1000 of them because the GitHub Search API provides up to 1,000 results for each search. A URL like https://api.github.com/search/code?q=filename:test.rb+size:1000..1500 lets you slice the search by changing the size range.
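The size slicing can be scripted in the same way. Below is a small Python sketch that builds one code-search URL per size range; the function name and the step width are assumptions for illustration, and the ranges are made non-overlapping on the assumption that both bounds of a size range are inclusive.

```python
from urllib.parse import quote_plus

SEARCH_CODE_URL = "https://api.github.com/search/code?q="


def size_sliced_urls(filename, start, stop, step):
    """Return code-search URLs covering file sizes start..stop-1 in bytes."""
    urls = []
    for low in range(start, stop, step):
        # Stop one byte short of the next slice so no file is matched twice.
        high = min(low + step, stop) - 1
        query = f"filename:{filename} size:{low}..{high}"
        # Keep ":" readable; spaces become "+" as in the URL shown above.
        urls.append(SEARCH_CODE_URL + quote_plus(query, safe=":"))
    return urls


urls = size_sliced_urls("test.rb", 1000, 2000, 500)
for url in urls:
    print(url)
```

Slices that still report more than 1000 results would need a smaller step.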

Merge two LLBLGEN 2 source files

I have two LLBLGen Pro 2.6 source files that I have to merge in my git repo (2 different branches). Due to the "professional" work of previous programmers on this project, the two projects have changes (the fork is 1 year old) that are not tracked in documents.
What would be the least painful way to finalize my merge?
Thanks.
In my experience, it's easier to simply ignore the merge conflicts in the LLBL generated code and just re-sync the project to the database and then regenerate the code completely post-merge.
Where this becomes a problem is when there are a lot of (or even a few) customizations made to the LLBL project file (e.g., renaming fields, creating typed lists). There isn't much you can do about these outside of tracking them down one by one. The good news is that the compiler will complain if something is missing or renamed.