GitHub: Search for String Inside of a File in a Repo

TL;DR: I want to use the GitHub search API to find all repos whose top-level Jenkinsfile contains "search-string". Does GitHub allow that?
I've read numerous SO posts and GitHub search/API docs including:
How to search for code in GitHub with GitHub API?
How to search for code in github, with github API?
https://developer.github.com/v3/search/#search-code
I still can't find the answer to my issue.
I'm first trying to use GitHub code search to find a specific substring (just two words joined by a dash: "search-string") in a specific file, but I can't figure out how to do it. I've tried numerous combinations of simple and advanced searches, but I usually get zero results, e.g.:
1 result (obvious): repo:repo/redacted
0 results: repo:repo/redacted search-string
0 results: search-string repo:repo/redacted filename:Jenkinsfile
0 results: search-string repo:repo/redacted in:Jenkinsfile
And quite a few more combos.
Once I get it working on the GitHub website, I'll convert it to an API call, which shouldn't be an issue.
Thanks in advance!
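Once a query works on the website, converting it mostly means URL-encoding it into the q parameter of the code search endpoint from the v3 docs linked above (https://api.github.com/search/code). The sketch below assumes that endpoint, a personal access token in a GITHUB_TOKEN environment variable (an assumption; as far as I know, code search requests need authentication), and Node 18+ for the global fetch. The helper names are hypothetical. Only the URL is built at top level; the request function is shown but not called.

```javascript
// Build a GitHub REST code search URL from website-style search terms and qualifiers.
// Assumes the v3 endpoint https://api.github.com/search/code.
function buildCodeSearchUrl(terms, qualifiers) {
  const parts = [...terms];
  // Qualifiers such as repo:, filename:, path: are appended as key:value pairs.
  for (const [key, value] of Object.entries(qualifiers)) {
    parts.push(`${key}:${value}`);
  }
  const url = new URL("https://api.github.com/search/code");
  // URLSearchParams takes care of percent-encoding (spaces become "+").
  url.searchParams.set("q", parts.join(" "));
  return url.toString();
}

// Hypothetical request helper; GITHUB_TOKEN is an assumed environment variable.
async function searchCode(url) {
  const res = await fetch(url, {
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
    },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  return res.json(); // { total_count, items: [...] }
}

// Mirrors the search attempted above (repo name left redacted as in the question).
const url = buildCodeSearchUrl(['"search-string"'], {
  repo: "repo/redacted",
  filename: "Jenkinsfile",
});
console.log(url);
```

This does not make the search itself match any better than the website does; it only saves re-encoding the query by hand once one of the website variants returns results.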

I was having a similar problem where I was trying to use the GitHub search web interface to find instances of a particular filename in my code, which had a name including underscore characters and a number, like my_image_asset_2.svg.
Searching on that string within my repository (or organization) unexpectedly returned zero results (in the "Code" results type), using a search term like:
repo:repo/redacted my_image_asset_2.svg
Even trimming out the number and extension from my search term still unexpectedly returned zero results:
repo:repo/redacted my_image_asset
A workaround I finally stumbled on that got GitHub to return the code I was looking for was to (1) drop all punctuation characters from my filename and (2) enclose the filename in quotes:
repo:repo/redacted "my image asset 2 svg"
This might not be a perfect solution in all cases; I imagine it might also match filenames like my-image-asset-2.svg. But depending on your use case, it might be "good enough"?
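If you need to apply this workaround to many filenames, the transformation can be automated. A minimal sketch, assuming "punctuation" means any character other than an ASCII letter or digit; toPhraseQuery is a hypothetical helper name.

```javascript
// Turn a filename into the quoted, punctuation-free phrase described above,
// e.g. "my_image_asset_2.svg" -> "my image asset 2 svg".
function toPhraseQuery(filename, repo) {
  const words = filename
    .split(/[^A-Za-z0-9]+/) // any run of non-alphanumerics becomes a word break
    .filter((w) => w.length > 0);
  const phrase = `"${words.join(" ")}"`;
  // Optionally scope the search to one repository.
  return repo ? `repo:${repo} ${phrase}` : phrase;
}

console.log(toPhraseQuery("my_image_asset_2.svg", "repo/redacted"));
// repo:repo/redacted "my image asset 2 svg"
```

Because every punctuation run collapses to a space, my-image-asset-2.svg produces exactly the same phrase, which is the over-matching caveat mentioned above.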

Related

Search github repo for term with special characters using sourcegraph?

The top answer to "How to search on GitHub to get exact string matches, including special characters" shows a way to search GitHub for terms that include special characters using a tool called Sourcegraph.
I got that working:
https://sourcegraph.com/search?q=context:global+.where%28&patternType=literal
but I'd like to narrow the search to a specific repo (not all of GitHub) - how can I do that?
Example
Here's the exact search I tried on GitHub:
https://github.com/sharetribe/sharetribe/search?q=.where%28
It searches for where instead of the literal string .where(
Here's the search on sourcegraph:
https://sourcegraph.com/search?q=context:global+.where%28&patternType=literal
It returns results for all of GitHub rather than the specific repo sharetribe/sharetribe.
How can I limit this search to one repo?
You can limit this search by using the repo filter:
https://sourcegraph.com/search?q=context:global+repo:sharetribe/sharetribe+.where%28&patternType=literal
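The same URL can be generated for other repos or patterns. A minimal sketch that mirrors the link above (context:global, an optional repo: filter, and patternType=literal); sourcegraphUrl is a hypothetical helper. As far as I know, Sourcegraph treats the repo: value as a regular expression, so an anchored form such as repo:^github\.com/sharetribe/sharetribe$ would avoid also matching similarly named repos.

```javascript
// Build a Sourcegraph literal-search URL, optionally restricted to a single repo.
function sourcegraphUrl(pattern, repo) {
  const terms = ["context:global"];
  if (repo) terms.push(`repo:${repo}`); // repo filter goes before the search pattern
  terms.push(pattern);
  const url = new URL("https://sourcegraph.com/search");
  url.searchParams.set("q", terms.join(" "));
  // literal: special characters such as "(" are matched as-is, not as regex syntax
  url.searchParams.set("patternType", "literal");
  return url.toString();
}

console.log(sourcegraphUrl(".where(", "sharetribe/sharetribe"));
```

The generated link percent-encodes a few more characters (":" and "/") than the hand-written one, but it decodes to the same query.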

Which names does GitHub recognize as READMEs?

I know there are several names that GitHub recognizes as READMEs, e.g. README and README.md.
What is an exhaustive list of such names?
I found this matching rule:
const PATTERN = /^readme\.(?:markdown|mdown|mkdn|md|textile|rdoc|org|creole|mediawiki|wiki|rst|asciidoc|adoc|asc|pod|txt)/i
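To see what the rule above actually accepts, it can be tested against a few filenames. This sketch uses the pattern exactly as quoted; isReadme is a hypothetical wrapper.

```javascript
// The pattern as quoted above. Note: no "$" anchor, and it requires an extension.
const PATTERN = /^readme\.(?:markdown|mdown|mkdn|md|textile|rdoc|org|creole|mediawiki|wiki|rst|asciidoc|adoc|asc|pod|txt)/i;

function isReadme(filename) {
  // No "g" flag, so test() keeps no lastIndex state between calls.
  return PATTERN.test(filename);
}

for (const name of ["README.md", "readme.RST", "README.adoc", "README", "CONTRIBUTING.md"]) {
  console.log(name, isReadme(name));
}
```

Two things stand out: a bare README (which the question says GitHub recognizes) does not match, so this rule alone is not exhaustive, and because nothing anchors the end, a name like README.md.orig also matches.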

How can I search by filename and filter by stars on GitHub?

According to the GitHub Advanced Search docs, you can query for values greater or less than another value using syntax like:
cats stars:>1000
This returns repositories with the word "cats" that have more than 1000 stars.
Also, according to the GitHub Advanced Search guide for searching by filename,
cats filename:package.json
This will return matches for the word "cats" inside the package.json file in repositories.
Given these two features (and possibly others I haven't mentioned here), I would like to search for specific files in repositories that have greater than a given number of stars.
plugin-syntax-dynamic-import filename:babel.config.js stars:>1000
This doesn't return the results I'm looking for: it returns matches for plugin-syntax-dynamic-import in babel.config.js, but it also returns items from repositories with fewer than 1000 stars.
How can I search by filename and filter by stars on GitHub?
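The behavior described suggests that code search does not apply the stars: qualifier. One possible workaround, sketched under assumptions: run the code search, look up each result's repository star count separately (for example with the REST endpoint GET /repos/{owner}/{repo}, whose response includes stargazers_count), then filter locally. The item shape ({ repository: { full_name } }) follows the REST code search response as I understand it; the star lookup is passed in as a plain object so the filtering logic can be shown without network calls. filterByStars and the sample data are hypothetical.

```javascript
// Keep only code-search hits whose repository has more than minStars stars.
// starsByRepo maps "owner/name" -> stargazers_count, fetched separately.
function filterByStars(items, starsByRepo, minStars) {
  return items.filter((item) => {
    const stars = starsByRepo[item.repository.full_name];
    // Repos with no known star count are dropped rather than guessed.
    return typeof stars === "number" && stars > minStars;
  });
}

// Illustrative data only, not real search results.
const items = [
  { path: "babel.config.js", repository: { full_name: "example/popular" } },
  { path: "babel.config.js", repository: { full_name: "example/small" } },
];
const stars = { "example/popular": 5000, "example/small": 12 };
console.log(filterByStars(items, stars, 1000).map((i) => i.repository.full_name));
```

The cost of this approach is one extra request per distinct repository, so it is best kept to narrow searches.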

Postgres full text search ignore url

I am trying to use PostgreSQL to implement a full-text search system.
I've run into a strange (or perhaps intended) behavior with it.
When indexing or searching a column that contains file names with extensions (e.g. myimage.jpg), the parser treats them as URLs and does not tokenize them properly.
I checked the documentation and, using ts_debug, saw that the file name is parsed as a host.
Could someone tell me how to make PostgreSQL full-text search treat all input as normal words?
Second, how can one do contains, startswith, and endswith searches with it?
Update
I have since tried create text search configuration..., copying from pg_catalog.english, removing host, url, and url_path, and then passing that configuration to ts_debug. Still no luck: myimage.jpg is still identified as a host.
Version
I am using version 9.4.
tl;dr Look at pre-parsing your input and removing punctuation if you really only want words (and not emails, urls, hosts, etc).
After trying to figure this out myself, the issue seems to be that you can't easily customise the parser. As I understand it, the parser runs first and generates tokens; those tokens are then matched against dictionaries.
By removing host, url, and url_path from the configuration, all you are doing is making sure these tokens don't get looked up in a dictionary, so they produce no lexemes. That essentially means they don't exist as far as search is concerned, which is not what you want.
Ideally you would customise the parser so that it doesn't generate those tokens in the first place, or so that it also generates overlapping tokens (similar to how hyphenated words generate a token for the entire word as well as for the individual components). This doesn't seem to be possible at the moment without writing a custom parser.
The only solution, then, is to pre-parse the text to remove the full stop. Note that if you rely on other token types like version (e.g. 8.3.0) or email (e.g. name@domain.com), this will break those, so you may need to be a bit clever about how you remove characters.
select ts_debug('english', replace('this-is-a-file.jpg', '.', ' '));
"(asciihword,"Hyphenated word, all ASCII",this-is-a-file,{english_stem},english_stem,{this-is-a-fil})"
"(hword_asciipart,"Hyphenated word part, all ASCII",this,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",is,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",a,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",file,{english_stem},english_stem,{file})"
"(blank,"Space symbols"," ",{},,)"
"(asciiword,"Word, all ASCII",jpg,{english_stem},english_stem,{jpg})"
As for your second question: are you talking about partial word matches? You get a little of this from stemming when using a config like english, so running becomes run, which will match whether you search for run or running. If you're talking about fuzzy matching, it gets a little more complicated. I suggest reading this article: http://rachbelaid.com/postgres-full-text-search-is-good-enough/
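For the startswith case specifically, PostgreSQL's tsquery syntax supports prefix matching by appending :* to a lexeme (for example to_tsquery('english', 'run:*')). A small sketch, assuming ASCII input that should be split on anything other than a letter or digit, that turns user input into such a query string to pass as a bound parameter; toPrefixTsquery is a hypothetical helper. Endswith and contains are not covered by this operator.

```javascript
// Build a tsquery string where every word is a prefix match: "Run fa" -> "run:* & fa:*".
// Splitting on non-alphanumerics also strips tsquery operators such as & | ! ( ) : '.
function toPrefixTsquery(input) {
  const words = input
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  return words.map((w) => `${w}:*`).join(" & ");
}

// Intended for a parameterized query such as:
//   SELECT ... WHERE tsv @@ to_tsquery('english', $1)
console.log(toPrefixTsquery("Run fa"));
```

An empty string comes back for input with no words, which the caller should handle before querying.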

How to search for a Gist by name

Is there a way to find a Gist from the name (or description)?
I was watching a YouTube video discussion and one of the participants brought up a Gist. It was too small to read on the video, but the name at the top was clear (dhh/test_induced_design_damage.rb); however, I wasn't able to use that name to find the Gist. (Eventually I found a raw link on a Twitter feed, with a 20-digit hex number. The Gist is public.) I later tried several different searches to see if there was a way to find it by name, and I looked in GitHub's Help, but I couldn't find a way. Did I miss something, or is there just no way to do this?
If you know the username, you can go to https://gist.github.com/username/ and search through their Gists, but that only works if the Gist wasn't posted anonymously or privately. If you don't know who posted it, there's no easy way to get to a Gist unless you have the link.
In your case, the Gist is available as the first one at the moment under https://gist.github.com/dhh.
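A user's public Gists can also be listed through the REST API at https://api.github.com/users/{username}/gists, where each Gist has a files object keyed by filename. A minimal sketch, assuming that response shape and Node 18+ for the global fetch; findGistsByFilename and listUserGists are hypothetical helpers, and pagination (the API returns one page of Gists per request) is not handled.

```javascript
// Return the html_url of each gist that has a file whose name contains `needle`.
function findGistsByFilename(gists, needle) {
  const lower = needle.toLowerCase();
  return gists
    .filter((gist) => Object.keys(gist.files).some((name) => name.toLowerCase().includes(lower)))
    .map((gist) => gist.html_url);
}

// Hypothetical helper that fetches the first page of a user's public gists.
async function listUserGists(username) {
  const res = await fetch(`https://api.github.com/users/${encodeURIComponent(username)}/gists`);
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  return res.json();
}

// Usage (not run here, needs network access):
// listUserGists("dhh").then((g) => console.log(findGistsByFilename(g, "design_damage")));
```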
Search qualifiers are available too, e.g. filename:*design_damage.rb
Top result: dhh / test_induced_design_damage.rb
filename:.bashrc Find all gists with a ".bashrc" file.
cat language:html Find all cat gists with HTML files.
join extension:coffee Find all instances of join in gists with a coffee extension.
system size:>1000 Find all instances of system in gists containing a file larger than 1000kbs.
cat stars:>100 Find cat gists with greater than 100 stars.
user:defunkt Get all gists from the user defunkt.
cat anon:true Include anonymous gists in your search for cat-related gists.
NOT cat Excludes all results containing cat.
cat fork:only Search all forked gists for results containing cat.
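The qualifiers above can be combined in a single query, e.g. user:dhh filename:*design_damage.rb, and turned into a shareable link. A minimal sketch, assuming the Gist search page lives at https://gist.github.com/search and takes the query in a q parameter, as the website's search box appears to; gistSearchUrl is a hypothetical helper.

```javascript
// Build a Gist search URL from a query string that uses the qualifiers listed above.
function gistSearchUrl(query) {
  const url = new URL("https://gist.github.com/search");
  url.searchParams.set("q", query); // percent-encodes ":", "*" stays as-is
  return url.toString();
}

console.log(gistSearchUrl("filename:*design_damage.rb"));
console.log(gistSearchUrl("user:dhh filename:*design_damage.rb"));
```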