I'm working on a project that has a large number of json files that are never reviewed in pull requests but occasionally need to be changed. Recently we had to make minor changes to them, and github isn't allowing me to create a pull request with those changes. Instead it gives me:
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
I checked the diff locally and the actual code changes are pretty minor (maybe 200 lines changed), but there are millions of changed lines in these json files. Is there any way to tell Github to ignore them? Right now I am unable to make a PR so the changes can't go through our normal company review process.
I've tried using the .gitattributes file with *.json linguist-generated=true unfortunately that had no effect.
Edit: As suggested in the accepted answer, I contacted github support about this case. Their suggestion was to create a new branch with a small commit, create the PR, and then merge the actual branch that I want to deploy into it. This will update the PR, and while the diff still won't display, it will let me create a PR.
I have just been able to solve this problem via the gh command line tool!
gh pr create
Maybe this option did not exist when the original question was posted, but I'll definitely be doing this in future.
When you compare two branches on GitHub, GitHub has to go compute the diff for those changes using Git and then render it for you. In your case, you have millions of lines of changes, and Git doesn't perform very well in this case because the algorithm that's used to compute diffs is O((N + M)D). Thus, if you have a number of differences proportional to the number of lines, the algorithm is essentially O(N²). Having a large N makes that even worse.
GitHub has a limit on long a request can take, so your large number of changes are just not going to render in the interface. It may be possible to choose the branches you want even though the diff won't render and still open the pull request. If not, you may need to resort to using the API, which won't generate the diff for rendering and therefore is likely to work a little better.
I would encourage you to let GitHub Support know about this, if you haven't been able to find a way to do it through the UI, since they can notify someone to make sure the interface is usable to create a PR even if the diff can't render. You probably aren't the first person to encounter this.
You may also want to store these files outside of Git on some sort of artifact server and pull them down to your repository based on hash, in which case you wouldn't have this pathological case.
Related
I made sweeping changes to 2 projects (call it Project A and B) in my team's repository in this PR *. After getting through testing, review, and getting comments resolved, I found out that a different team is working on conflicting changes to project A on a different PR and those changes need to go to staging first. Now I can't merge my PR because it has changes to A, but my changes to Project B need to be merged ASAP. Now the obvious answer is probably to abandon the PR, do a soft reset and start again with two branches, but my team wants to keep the comment threads on both A and B because they showcase a month of work and important design decisions made by multiple teams during the reviewing process. Is there a way to split my changes to Projects A and B into two separate PR's without losing all the comments?
My current solution is branch a new branch off the current one, manually undo my changes to Project A from the branch, push, and leave the comments to A dangling (I think there will be some comment view that the file reference is deleted), merge. Then create a new PR of the newest branch, copy and paste each of the 50+ dangling comments over to their respective files, resolve all, and merge. This is obviously not great because I lose some context and it will take hours. Other solutions I've seen on here will delete all existing comments altogether and I can't have that.
*Yes, I definitely shouldn't have done that.
There is no out of the box solution for this.
I would suggest to use Azure DevOps APIs, and even that isn’t a walk in the park.
Modus operandi
Choose your language, PowerShell, js, c#, etc and start exploring. Below a suggestion where to start:
Start with getting the original PR using the getbyid api
Since you mention the comments are the most import part, the next thing is to retrieving list with thread comments API.
Optional: Per comment iterate through the thread with the next get api, because the docs state: Represents a comment which is one of potentially many in a comment thread.
From this point, retrieve more info you need or start building the copy of the PR with, create PR and adding comments and other part you need to reuse.
Again, this is not easy and maybe retrieving and saving this in another format already helps you.
I did tried to find a solution for you online, but did find it (yet). Found other handy stuff like this quick and dirty js script to search through comments.
When creating a GitHub Pull Request, it is often that a file (script, lib, etc.) may be completely replaced (or introduced) with one from another repo. Sometimes, the file requires small changes. I'm trying to establish a standard for my team for how to communicate where the file came from and what changed. In the same way that you can craft a URL to highlight a specific change in a single repo, I'd like to be able to highlight a change across repos.
The reality may very well be that GitHub does not offer this. (I do a lot of research before asking questions. Consequently, the answer is often, "you couldn't find an answer because it is impossible.") In which case an alternative will be needed. One possibility might be to generate a diff in markdown and add it as a comment. (Notice I improved that answer back in 2016.)
One possibility might be to generate a diff in markdown and add it as a comment.
Good idea.
One alternative which would not depend on a PR comment would be to use git notes. They are not supported/displayed by GitHub since 2014 and they are criticised, but they would remain in your case possible way to leave... well a note describing where some of the PR files are coming from.
(Note: Github usage appears in bounds GitHub issues asked on Stack Overflow)
PRs that move code around, even within the same function, appear very onerous on Github, even when they don't do anything else. I have created a very basic PR to demonstrate this: https://github.com/tommyjcarpenter/github_test/pull/1/commits/2afb07ec5c6b56724bd10c6b56386299493bbb43. All the repo does is define two functions, and PRs a change to move the first below the second. The diff on github shows 20 lines removed and 21 added. One would assume the diff could be shown as a trivial "code move".
Now imagine this with a lot more functions and a lot more trivial code moves.
It seems git itself IS able to detect such changes: Using Git diff to detect code movement + How to use diff options
So, is there a way to swap out the diffing algorithm such that PRs like this don't look so onerous? Does github use it's own internal algorithm or does it use your default diffing algorithm?
(EDIT: this also appears to make account-level contributions on Github a bit misleading: someone that just moves code around may be shown to make a huge number of additions and deletions to a repository, thus giving the impression that they are a large contributor, when in fact they didn't contribute any functionality)
I suspect if this sort of feature existed, it would be a hidden feature like hiding-of-whitespaces in diffs.
The challenge I see for Github to implement different diffing algos is the implications to contribution metrics. Contribution metrics (i.e. contributers, PR size) would now have to be footnoted with the algorithm used in order to properly audit and review changes.
As a work around, you could separate formatting commits from functional change commits to at least be able to distinguish between the two via commit history.
I am using Mercurial, but I imagine that any merge tool that is aware of the version control system below it could do several things that a merge tool which is not aware of the version control system and only sees two "files" in two different folders, could never do.
I have been using KDIFF3, and recently tried BeyondCompare, and neither of them will do this, at least not that I could figure out.
What I want to do is best shown in this picture, an annotation column and perhaps even ability to open other windows from those annotation columns so I could browse specific versions of specific files to see context when trying to do a merge.
In the image here, I am showing a two way merge, but the same applies for a three way merge. To the right or to the left of the actual file content being shown, I would like a gutter or a right side annotation column showing some kind of annotation of where this change came from. Since Mercurial hex ids are relatively unfriendly and unhelpful, and since repository-local-revision-numbers are repository local, I think that a short text description based on commit comments would be most helpful. Of course, with Mercurial, 99% of these commit comments are going to say "Merge", and nothing else. (Groan.) But lets pretend for a minute that we weren't using tools and workflows that left us that crippled at merge time, and instead, that we could have a useful commit comment show up each time:
Right now the workflow for complex merges looks like this for me:
Using my distributed version control tool (mercurial), pull changes from another repository which is in effect a branch. Merge. The merge window for TortoiseHg is usually where I start all this from. This in turn lets me configure a merge tool (beyond compare or Kdiff3).
However, it does not appear that there is any merge tool (that I have seen) that can be told, "hey you're not just merging two way or three way with different versions of a file in the two completely different folders, with the names I told you, but those files are also files that have a complete edit history available to you to show your human the actual context, the commits that those line changes came from with their commit comments, often having a bug number as part of the commit which will give the person doing the merge the ability to see What in the Heck is Really Going on.
I would change from Mercurial to Git, for example, even, for a real merge experience that didn't force me to do manually what I think my tools could be doing for me automatically. I'm using Mercurial, TortoiseHG, and KDIFF3, and if I could just change from KDIFF3 to some other tool, or do ANYTHING at all to get annotations and merges together on one screen, I would like to do so.
I'm curious to get people's thoughts on how to manage version control for unrelated functions in Matlab.
I keep a reasonably large set of general purpose scripts, each of which is more or less independent of the others. I've been keeping them all in a single directory, containing a single repository in Mercurial. I'm starting to collaborate much more, and I'd like my collaborators to be able modify the files, commit, branch, and merge.
The problem is that the files are independent of one another. Essentially, they're like many separate little projects. But Mercurial treats the repository as a single entity. So if a collaborator modifies file A and B, and I only want to merge in the changes from file A, things get complicated. I know that I could merge from the collaborator, then revert file B, but I'm wondering if there's a simpler way to handle this setup.
I could set up many tiny repositories to manage each file separately, but that also gets complicated.
I'm open to changing version control systems (although I like Mercurial a lot). Any suggestions?
It is considered a best practice to check in code after each bug fix/feature addition/or what not. Given your files are really independent "projects" it seems unlikely a bug or feature would span multiple files. Probably the best you can do is encourage your colleagues in best practices to commit changes only for a single file at once. Explain that better discipline about checking in leads to more manageable source control later. Hopefully you can get most to follow the practice and the few obstinate ones just stop taking their commits for.
It really depends on your typical reasons for merging one change but not the other. If you're using it to create a software configuration, i.e. sometimes you want to use version 1 of file A and version 2 of file B and sometimes it's the other way around, then you probably want to use subrepos to hold each file. If it's because you never want to accept part of a collaborator's change, then they need to be instructed how to make their changes more cohesive and submit them separately. That can sometimes be a difficult concept for people who either haven't used source control before, or who are accustomed to source control like svn that has little or no intrinsic concept of a changeset.
It depends whether you want to maintain a single 'master' version of the files, merging in changes that you like and ignoring others. If collaborators want to develop other branches, then they should perhaps clone the repository, and you can then accept the changesets that you want in the master.
If you want to veto changes by other collaborators, then the changes either need to be kept separate (via a cloned repository or branch) or you need a review process before changes are pushed back to the trunk.
I always use incoming repositories for collaborators. They match what the other person has made, but it avoids messing with my own repository. When you do this, you can then cherrypick their new changesets into your own repository with the transplant extension.