Get file content on Github with GraphQL API and Python - github

Given a list of repositories on GitHub with 'repoName' and 'userName', I need to find all the '.java' files, and get the content of the java files. Tried to use RestAPI first, but ran into the rate limit very easily, now I'm switching to GraphQL API, but don't know how to achieve that.

Here is how I would do it:
Algo Identify_java_files
Entry: path the folder
Out: java files in the folder
Identify all files in the folder of the repository
For each file
if the type of the file is "blob":
if ".java" is the end of the name of the file
get its content
else if the type of the file is "tree":
Identify_java_files(path of the tree file)
You can easily implement this pseudo code using Python. My pseudo code makes the assumption to use recursion, but it can be done otherwise, it's just for the example. You will need the requests and json libraries.
Here are the queries you might need, and you can use the explorer to test them.
{
repository(name: "checkout", owner: "actions") {
defaultBranchRef {
name
}
}
}
This query allows you to check the name of the default branch of the repository. You will need it for the following queries, or you can use a specific branch but you will have to know its name. I don't know (and don't believe) if you can get all the branches names for a repository.
{
repository(name: "checkout", owner: "actions") {
object(expression: "main:") {
... on Tree {
entries {
path
type
}
}
}
}
}
This query gets the content of the root folder for a specific repository. The expression: "main:" refers to the branch of the repository along with the path. Here the branch is main and the path is empty (it comes after the ":"), meaning we are looking at the root folder. Some repositories are using "master" as default branch, so be sure of which branch to use.
To check the content of a file, you can use this accepted answer.
I updated the example in order for you to be able to try it.
{
repository(name: "checkout", owner: "actions") {
object(expression: "main:.github/workflows/test.yml") {
... on Blob {
text
}
}
}
}
You send your requests to the API using requests or alike, and store the responses as JSON or alike for treatment.
As a side note, I do not think it is possible to achieve this without issuing multiple queries. I recently had to do something similar, and this is my first SO answer, so let me know if anything is unclear.
Edit:
You can use this answer to list all files in a repository.

Related

Best practice for using variables to configure and create new Github repository instance in Terraform instead of updating-in-place

I am trying to set up a standard Github repository template for my organization that uses Terraform to spin up new repos with the configured settings.
Every time I try to update the configuration file to create a new instance of the repository with a new name, instead it will try to update-in-place any repo that was already created using that file.
My question is what is the best practice for making my configuration file reusable with input variables like repo name? Should I make a module or is there some way of reusing that file otherwise?
Thanks for the help.
Terraform is a desired-state-configuration system, which means that your configuration should represent the full set of objects that should exist rather than an instruction to create a single object.
Therefore the typical way to add a new repository is to add a new resource block declaring that new repository, and leave the existing ones unchanged. Terraform will then see that there's a new resource not currently tracked in the state and will propose to create it.
If your repositories are configured in some systematic way that you can describe using a mechanical rule rather than manual configuration then you can potentially use the for_each meta-argument to declare multiple resource instances from the same resource block, using Terraform language expressions to describe the systematic rule.
For example, you could create a local value with a higher-level data structure that describes what should be different between your repositories and then use that data structure with for_each on a single resource block:
locals {
repositories = tomap({
example_1 = {
description = "First example repository"
}
example_2 = {
description = "Second example repository"
}
})
}
resource "github_repository" "all" {
for_each = local.repositories
name = each.key
description = each.value.description
private = true
}
For simplicity in this example I've only made the name and description variable between the instances, but you can add whatever extra attributes you need for each of the elements of local.repositories and then access them via each.value inside the resource block.
The private argument above illustrates how this approach can avoid the need to re-state argument values that will be the same for each declared repository, and have your local.repositories data structure focus only on the minimum attributes needed to describe the variations you need for your local policies around GitHub repositories.
A resource block with for_each set appears as a map of objects when used in expressions elsewhere, using the same keys as in the map given in for_each. Therefore if you need to access the repository ids, or any other attribute of the systematically-declared objects, you can write Terraform expressions that work with maps. For example, if you want to output all of the repository ids as a map of strings:
output "repository_ids" {
value = tomap({
for k, r in github_repository.all : k => r.repo_id
})
}

Commit a whole folder with modified files in it with SharpSvn.

I am using SharpSvn in my C# project. I am having a folder with some text files in it and another folders with subfolders in it. I am adding the folder under version control and commit it. So far, so good. This is my code:
using (SvnClient client = new SvnClient())
{
SvnCommitArgs ca = new SvnCommitArgs();
SvnStatusArgs sa = new SvnStatusArgs();
sa.Depth = SvnDepth.Empty;
Collection<SvnStatusEventArgs> statuses;
client.GetStatus(pathsConfigurator.FranchisePath, sa, out statuses);
if (statuses.Count == 1)
{
if (!statuses[0].Versioned)
{
client.Add(pathsConfigurator.FranchisePath);
ca.LogMessage = "Added";
client.Commit(pathsConfigurator.FranchisePath, ca);
}
else if (statuses[0].Modified)
{
ca.LogMessage = "Modified";
client.Commit(pathsConfigurator.FranchisePath, ca);
}
}
}
I make some changes to one of the text files and then run the code againg. The modification is not committed. This condition is false:
if (statuses.Count == 1)
and all the logic in the if block does not execute.
I have not written this logic and cannot quite get this lines of code:
client.GetStatus(pathsConfigurator.FranchisePath, sa, out statuses);
if (statuses.Count == 1) {}
I went on the oficial site of the API but couldn`t find documentation about this.
Can someone that is more familiar with this API tell what these two lines do ?
What changes need to be done to this code so if some of the contents of the pathsConfigurator.FranchisePath folder are changed, the whole folder with the changes to be commited. Any suggestions with working example will be greatly appreciated.
Committing one directory with everything inside is pretty easy: Just call commit on that directory.
The default Depth of commit is SvnDepth.Infinity so that would work directly. You can set additional options by providing a SvnCommitArgs object to SvnClient.Commit()

Extend multiple sources / indexes

I have many web pages that are clones of each other. They have the exact same database
structure, just different data in different databases (each clone is for a different country so everything is
separated).
I would like to clean up my sphinx config file so that I don't duplicate the same queries
for every site.
I'd like to define a main source (with db auth info) for every clone, a common source for
every table I'd like to search, and then sources&indexes for every table and every clone.
But I'm not sure how exactly I should go about doing that.
I was thinking something among this lines:
index common_index
{
# charset_type, stopwords, etc
}
source common_clone1
{
# sql_host, sql_user, ...
}
source common_clone2
{
# sql_host, sql_user, ...
}
# ...
source table1
{
# sql_query, sql_attr_*, ...
}
source clone1_table1 : ???
{
# ???
}
# ...
index clone1_table1 : common_index
{
source: clone1_table1
#path, ...
}
# ...
So you can see where I'm confused :)
I though I could do something like this:
source clone1_table1 : table1, common_clone1 {}
but it's not working obviously.
Basically what I'm asking is; is there any way to extend two sources/indexes?
If this isn't possible I'll be "forced" to write a script that will generate my sphinx config file to ease maintenance.
Apparently this isn't possible (don't know if it's in the pipeline for the future). I'll have to resort to generating the config file with some sort of script.
I've created such a script, you can find it on GitHub: sphinx generate config php

How two people, concurrently editing the same file is handled?

I believe the title says it. I'm new to source control thingy.
So, let's say I have two developers working on the same project and they started editing the same file(s) at the same time then everyone of them send the new version at a slightly different time. From what I understand the one who sends the changes last will have his changes kept, the other one's code will be in the archive only!!!
Is that correct?
Please clarify. Thanks.
No, that's not quite correct. It depends somewhat on which version control software you're using, but I like Git so I'll talk about that.
Suppose we have a file Foo.java:
class Foo {
public void printAWittyMessage() {
// TODO: Be witty
}
}
Alice and Bob both modify the file. Alice does this:
class Foo {
public void printAWittyMessage() {
System.out.println("Alice is the coolest");
}
}
and Bob does this:
class Foo {
public void printAWittyMessage() {
System.out.println("Alice is teh suk");
}
}
Alice checks her version in first. When Bob attempts to check his in, Git will warn him that there is a conflict and won't allow the commit to be pushed into the main repository. Bob has to update his local repository and fix the conflict. He'll get something like this:
class Foo {
public void printAWittyMessage() {
<<<<< HEAD:<some git nonsense>
System.out.println("Alice is the coolest");
=====
System.out.println("Alice is teh suk");
>>>>> blahdeblahdeblah:<some more git nonsense>
}
}
The <<<<<, ===== and >>>>> markers show which lines were changed simultaneously. Bob must resolve the conflict in some sensible way, remove the markers, and commit the result.
So what eventually lives in the repository is:
Original version -> Alice's version -> Bob's conflict-fixed version.
To summarise: the first to commit gets in without any problems, the second to commit must resolve the conflict before getting into the repository. You should never end up with someone's changes being clobbered automatically. Obviously Bob can resolve the conflict incorrectly but the beauty of version control is that you can roll back the incorrect fix and repair it.
Much depends on the system you're using.
However in the common case is: who commits his changes second would have to perform a "merge" operation. Meaning s/he would need to compare the two files and come up with a merged version. However (!) many popular system (including IDE) comes with smart tools to assist you doing that.
Here are some tools like that compared:
http://en.wikipedia.org/wiki/Comparison_of_file_comparison_tools

How can I Diff a Svn Repository using SharpSvn

My question is quite simple and with the SharpSvn Api, it should be easy as well. Here what I did:
path = "c:\project";
using (SvnLookClient client = new SvnLookClient())
{
SvnLookOrigin o = new SvnLookOrigin(path);
Collection<SvnChangedEventArgs> changeList;
client.GetChanged(o, out changeList); // <-- Exception
}
and when I call the GetChanged, I get an exception:
Can't open file 'c:\project\format': The system cannot find the file specified.
So, Maybe there is something I'm missing? Or maybe it's not the right way to do find out the list of files and folders that were modified in the local repository?
Thanks in advance.
The SvnLookClient class in SharpSvn is the equivalent to the 'svnlook' console application. It is a low level tool that enables repository hooks to look into specific transactions of a repository using direct file access.
You probably want to use the SvnClient class to look at a WorkingCopy and most likely its Status() or in some cases simpler GetStatus() function to see what changed.
The path that the SvnLookOrigin constructor wants is actually:
path = "c:\project\.svn\";
That is, it wants that special ".svn" directory not just the root of where the source is checked out to.
Although you probably do want to listen to Bert and do something like:
path = "c:\project";
using (SvnLookClient client = new SvnLookClient())
{
SvnLookOrigin o = new SvnLookOrigin(path);
Collection<SvnChangedEventArgs> changeList;
client.GetStatus(o, out changeList); // Should now return the differences between this working copy and the remote status.
}