How do I reassign users in an experiment bucket that has been closed in Intuit’s Wasabi? - ab-testing

Initially I had two buckets, “control group” and “treatment”. After running the experiment for a while (users got assigned to both buckets), I closed the “control group” bucket and set the “treatment” bucket’s user allocation to 100%. However, I need to reassign all users who are in the “control group” bucket to the “treatment” bucket. How do I do that?

There is currently no way to do this easily.
The "just do it for all" method:
To assign all users to the same bucket, you can use this script on your assignments.csv (which can be downloaded via the UI):
for i in $( tail -n+2 assignments.csv | awk -F'\t' '{print $2}' ); do
  curl -X PUT \
       -H "Content-Type: application/json" \
       -d '{"assignment": "NEW_BUCKET_NAME", "overwrite": true}' \
       http://HOST:PORT/api/v1/assignments/applications/APPNAME/experiments/EXPERIMENTLABEL/users/$i
done
If you don't want the output to scroll past, you can append > output.log to store it in a log file, or > /dev/null to discard it entirely.
Note: there is no simple way to undo this, and depending on the number of assignments it will take a while. I will check if there are better methods, but I am not aware of any yet.
The "just do it when needed" method:
If you are able to change your production code, you can instead issue the PUT assignment for each user individually when they return to your website. Just send a PUT to:
http://HOST:PORT/api/v1/assignments/applications/APPLICATION/experiments/EXPERIMENT/users/USER
and make sure to send the header Content-Type: application/json and the body {"assignment": "NEW_BUCKET_NAME", "overwrite": true}.
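Put together as a single curl call (HOST, PORT, APPLICATION, EXPERIMENT, USER, and NEW_BUCKET_NAME are placeholders, exactly as above):
curl -X PUT \
     -H "Content-Type: application/json" \
     -d '{"assignment": "NEW_BUCKET_NAME", "overwrite": true}' \
     http://HOST:PORT/api/v1/assignments/applications/APPLICATION/experiments/EXPERIMENT/users/USER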
(Since I am also one of the developers: I suggest you create an issue for this use case on the Wasabi GitHub repository, so we can easily track it and you can help us develop a solution for this problem.)

Related

Using a public access token for GitHub with Neo4j

I'm trying to use LOAD CSV with a CSV file stored in GitHub. It works fine with the 10-minute temporary token you get when viewing the raw file, but I want something more persistent, as I need to be able to deploy this to multiple environments. Ten minutes just won't cut it.
I figured a personal access token would be the way forward, but (once again) GitHub's spectacularly poor quality documentation made this much harder than it should be.
I set up a personal access token with the repo and read:org scopes, and with this I can get at my files using curl, e.g.
curl -s https://<my_token>@raw.githubusercontent.com/<my repo>/<path>/<my file>.csv
This works fine and I see the contents of my test file.
But if I try to navigate to that URL I just get a 404 error, and if I use it in Neo4j with a LOAD CSV statement, I get the error “couldn't load the external resource at:”.
I'm basically doing this:
LOAD CSV WITH HEADERS FROM '<URL that worked in CURL>' AS row
...and it fails miserably.
Whereas:
LOAD CSV WITH HEADERS FROM '<URL for raw file from GitHub with 10 minute token>' AS row
works fine, so I know I can access external files, i.e. files not in the import directory.
Is this just a failing with GitHub, or am I doing something wrong?
Although I hate answering my own questions, I left this kicking around a while and nobody came back with anything that helped.
I now know a whole lot more about personal access tokens than I ever wanted to, but it was all worthwhile, as it helped me get around this issue.
There's an apoc.load.jsonParams procedure that accepts request headers, including bearer tokens. From there it didn't take too much work to get this working the same way that LOAD CSV had done.
There was one last gotcha, though: I soon discovered that the URLs for the repository can't include spaces or other non-alphanumeric characters. But that's a small price to pay for success.
So this doesn't work:
LOAD CSV WITH HEADERS FROM 'https://<my_token>@raw.githubusercontent.com/<my repo>/<path>/<my file>.csv' AS row...
Instead I had to use:
CALL apoc.load.jsonParams("https://raw.githubusercontent.com/<my repo>/<path>/<my file>.json", {Authorization: "Bearer <token>"}, null) YIELD value WITH value AS row...
There's an equivalent apoc.load.csvParams procedure, but I never tested this.
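If you want to sanity-check the token and header combination outside Neo4j first, the same request is easy to reproduce with curl (my addition, not from the original answer):
# fetch the raw file with the same bearer token APOC will send
curl -s -H "Authorization: Bearer <token>" https://raw.githubusercontent.com/<my repo>/<path>/<my file>.csv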

GitHub API - how to compare 2 commits

Is it possible to get the list of changed files between two commits?
Something like the comparison between two commits in the web version, but using the GitHub API.
The official commit comparison API is Compare two commits:
GET /repos/:owner/:repo/compare/:base...:head
Both :base and :head can be either branch names in :repo or branch names in other repositories in the same network as :repo. For the latter case, use the format user:branch:
GET /repos/:owner/:repo/compare/user1:branchname...user2:branchname
Note that you can use tags or commit SHAs as well.
For instance:
https://api.github.com/repos/git/git/compare/v2.2.0-rc1...v2.2.0-rc2
Note the '...', not '..', between the two tags.
And you need to put the older tag first, then the newer one.
That gives a status:
"status": "behind",
"ahead_by": 1,
"behind_by": 2,
"total_commits": 1,
And for each commit, information about the files:
"files": [
{
"sha": "bbcd538c8e72b8c175046e27cc8f907076331401",
"filename": "file1.txt",
"status": "added",
"additions": 103,
"deletions": 21,
"changes": 124,
"blob_url": "https://github.com/octocat/Hello-World/blob/6dcb09b5b57875f334f61aebed695e2e4193db5e/file1.txt",
"raw_url": "https://github.com/octocat/Hello-World/raw/6dcb09b5b57875f334f61aebed695e2e4193db5e/file1.txt",
"contents_url": "https://api.github.com/repos/octocat/Hello-World/contents/file1.txt?ref=6dcb09b5b57875f334f61aebed695e2e4193db5e",
"patch": "## -132,7 +132,7 ## module Test ## -1000,7 +1000,7 ## module Test"
}
]
BUT:
The response will include a comparison of up to 250 commits. If you are working with a larger commit range, you can use the Commit List API to enumerate all commits in the range.
For comparisons with extremely large diffs, you may receive an error response indicating that the diff took too long to generate. You can typically resolve this error by using a smaller commit range.
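If you do hit that limit, here is a hedged sketch of falling back to the Commit List API (the endpoint and the sha/per_page/page parameters are standard v3 API; the values are placeholders):
# enumerate commits reachable from <head>, 100 per page; keep incrementing
# page= and stop once you reach <base> to cover the whole range
curl -s "https://api.github.com/repos/<owner>/<repo>/commits?sha=<head>&per_page=100&page=1"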
Notes:
"same network" means: two repositories hosted by the same Git repository hosting services (two repositories on github.com for example, or on the same on-premise GHE -- GitHub Enterprise -- instance)
You can therefore compare two branches between a repo and its fork.
Example:
https://api.github.com/repos/030/learn-go-with-tests/compare/master...quii:master
compare link
diff link
(this example compares a fork to its original repo, not the original repo to the fork: that is because the fork, in this case, is behind the original repo)
As noted by Tom Carver in the comments:
this suggested API silently maxes out at 300 files shown;
I haven't yet found an API that avoids this limitation
Beyond the answers using the official API, one can find a barely mentioned way to get diffs from GitHub. Try this:
wget --header='Accept: application/vnd.github.v3.diff' \
http://github.com/github/linguist/compare/96d29b76...a20631af.diff
wget --header='Accept: application/vnd.github.v3.diff' \
http://github.com/github/linguist/compare/a20631af...96d29b76.diff
This is the link you provided as an example, with .diff appended. And the reverse diff of the same.
The given header makes sure the request is handled by GitHub's v3 API. That's currently the default, but it might change in the future. See Media Types.
Why two downloads?
GitHub serves linear diffs only from older to newer versions. If the requested diff is indeed linear and from an older to a newer version, the second download will be empty.
If the requested diff is linear but from a newer to an older version, the first download is empty. Instead, the whole diff is in the second download. Depending on what one wants to achieve, one can apply it normally to the newer version or reverse-apply (patch -R) it to the older version.
If there is no linear relationship between the pair of requested commits, both downloads come back with non-zero content: one from the common ancestor to the first commit, and another, reversed one from this common ancestor to the other commit. Applying one diff normally and the other one reversed gives the same result as applying the output of git diff 96d29b76..a20631af.
As far as I can tell, these raw diffs aren't subject to GitHub's API limitations. Requests for 540 commits with 1002 file changes went flawlessly.
Note: one can also append .patch instead of .diff. One still gets one big file for each request, but containing a set of individual patches, one per commit.
Traumflug's answer isn't correct if you are using the API to access private repos. Actually, I think that answer doesn't even require the header, since it works without it on a public repo anyway.
You should not put .diff at the end of the URL, and you should use the api subdomain. If you want the diff specifically, you only need to put the appropriate media type header in the request (and the token for authentication).
So for example:
wget --header='Accept: application/vnd.github.v3.diff' \
"https://api.github.com/repos/github/linguist/compare/96d29b76...a20631af?access_token=123"
GitHub's documentation is super confusing, since it says this only works for branch names, but it also accepts commit SHAs. Also, the returned JSON includes a diff_url that is just a direct link to the diff, but that link does not work if the repo is private, which isn't very helpful.
Here's another, actually executable example, using the HEAD and HEAD~1 references on my public repo DataApp--ParamCompare, which should help illuminate the :owner and :repo notation once substituted with concrete parameters.
curl -X GET https://api.github.com/repos/jxramos/DataApp--ParamCompare/compare/HEAD~1...HEAD
As a sanity check the equivalent browser representation can be seen at https://github.com/jxramos/DataApp--ParamCompare/compare/HEAD~1...HEAD
In general, the form goes as follows, giving an alternate parameter syntax for the API route:
https://api.github.com/repos/<owner_name>/<repo_name>/compare/HEAD~1...HEAD
One can also invoke a url such as
https://api.github.com/repos/jxramos/DataApp--ParamCompare/compare/80f0bb42606888ce7fc66b4402fcc90a1709c9e8...255fe089543f5569f90af54168af904e88fc150f
There should be an equivalent GraphQL means to pare down the results under the files list and select just the filename values, giving something like a git diff --name-only output straight from the remote. I'll update this answer if I figure it out.
My take on this is that the GraphQL API doesn't perform operations, which is what a diff is, but rather lets you query primitive types, properties, and the like of the repo itself. You can see the sort of entities you're dealing with by looking at the schema itself: https://developer.github.com/v4/public_schema/
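In the meantime, if all you need is the list of file names, plain curl piped through jq gets close (jq is my addition, not part of the GitHub API):
# print only the changed file names, similar to git diff --name-only
curl -s https://api.github.com/repos/jxramos/DataApp--ParamCompare/compare/HEAD~1...HEAD | jq -r '.files[].filename'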

Unable to accept songs submitted to moderated group via SoundCloud API

I'm getting a 404 when hitting a URL like this (of course with the variables changed to proper values):
PUT https://api.soundcloud.com/groups/<group_id>/pending_tracks/<track_id>
Calling DELETE on that same URL works as expected; it rejects the submission from the group.
Requesting a simple GET .../pending_tracks (no track ID at the end) also works fine for me.
The tools I have used so far to test this are:
the official PHP library (by mptre),
a manually constructed cURL request,
the cURL binary on Windows
I couldn't find any info in the SoundCloud API docs (or anywhere on the internet) about how this API method should or could be used. Any chance someone could help me with how it is supposed to be accessed properly? These are the questions:
what is the correct URL
what, if anything, is expected as the query data
whether there is a request body, and if so, what its format is.
More details:
Calling PUT /groups/44/pending_tracks/99119291 returns 404, so I've figured
the track ID must be supplied some other way.
By digging through the PHP wrapper and gathering pieces of info scattered
around the internet, I've found that some PUT requests are complemented
with CURLOPT_POSTFIELDS and others have XML in their body. So far I went with
the postfields approach.
My curl binary config looks like this:
--url https://api.soundcloud.com/groups/44/pending_tracks
--cacert cacert.pem
--user-agent PHP-SoundCloud
--header "Accept: application/json"
--header "Authorization: OAuth XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
--request PUT
--data <!--read on please-->
The data section was tested with the following strings, each time supplying the
track ID as the value, like this: track[id]=99119291:
track
track[]
track[id]
track-id
track_id
trackid
approve
approved
approve[]
approve[tracks][]
approved[tracks][]
tracks[approve][]
tracks[approved][]
approve[tracks][][id]
approved[tracks][][id]
tracks[approve][][id]
tracks[approved][][id]
tracks[]
tracks[][id]
tracks[][track][id]
tracks[][track][][id]
group[][id]
group[approve][]
group[approve][id]
group[approve][][id]
group[approved][]
group[approved][id]
group[approved][][id]
group[track][approve]
group[track][approve][]
group[track][approved][]
group[track][approve][id]
group[track][approve][][id]
group[track][approved][][id]
group[track][id]
group[tracks][id]
group[track][][id]
group[tracks][][id]
group[tracks][]
groups[][id]
groups[approve][id]
groups[approve][][id]
groups[approved][id]
groups[approved][][id]
groups[track][approve]
groups[track][approve][]
groups[track][approved][]
groups[track][approve][id]
groups[track][approve][][id]
groups[track][approved][][id]
groups[track][id]
groups[tracks][id]
groups[track][][id]
groups[tracks][][id]
Needless to say, none of those worked; each time the result was the same as if I were accessing the API endpoint with a simple GET request.
I'm really tired of blindly poking the SoundCloud API.
I'm sorry for your pain with the API; I fully agree it deserves a lot more and better documentation.
Anyway, while it's unclear to me why it got implemented like this, you should be able to approve tracks by sending a PUT request to api.soundcloud.com/groups/:group_id/tracks/:track_id. I hope that helps.
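For the record, an untested curl sketch of that request, reusing the OAuth header format and the example IDs from the question:
# approve pending track 99119291 in group 44
curl -X PUT \
     -H "Authorization: OAuth XXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
     https://api.soundcloud.com/groups/44/tracks/99119291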

Restful way for deleting a bunch of items

In the Wikipedia article on REST,
it is indicated that if you use DELETE on http://example.com/resources, you are deleting the entire collection.
If you use DELETE on http://example.com/resources/7HOU57Y, you are deleting that element.
I am doing a WEBSITE, note NOT a WEB SERVICE.
I have a list with one checkbox for each item. Once I select multiple items for deletion, I let the user press a button called DELETE SELECTION. If the user presses the button, a JS dialog box pops up asking the user to confirm the deletion. If the user confirms, all the items are deleted.
So how should I cater for deleting multiple items in a RESTful way?
NOTE: currently for a DELETE in a webpage, I use a FORM tag with POST as the action but include a _method field with the value DELETE, since this is what others on SO indicated as the way to do a RESTful delete from a webpage.
One option is to create a delete "transaction". You POST a new resource to something like http://example.com/resources/deletes, consisting of a list of resources to be deleted. Then in your application you just do the delete. When you do the POST, you should return the location of the created transaction, e.g., http://example.com/resources/deletes/DF4XY7. A GET on this could return the status of the transaction (complete or in progress) and/or a list of the resources to be deleted.
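As a curl sketch (the endpoint shape and body format here are illustrative, not a standard; 8JKL12Z is a made-up second ID):
# create a delete "transaction" listing the resources to remove
curl -X POST \
     -H "Content-Type: application/json" \
     -d '{"resources": ["7HOU57Y", "8JKL12Z"]}' \
     http://example.com/resources/deletes
# a successful response would include something like:
# Location: http://example.com/resources/deletes/DF4XY7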
I think rojoca's answer is the best so far. A slight variation might be to do away with the javascript confirm on the same page and instead create the selection and redirect to it, showing a confirm message on that page. In other words:
From:
http://example.com/resources/
do a
POST with a selection of the IDs to:
http://example.com/resources/selections
which, if successful, should respond with:
HTTP/1.1 201 Created, with a Location header pointing to:
http://example.com/resources/selections/DF4XY7
On this page you will then see a (javascript) confirm box which, if you confirm, will issue the request:
DELETE http://example.com/resources/selections/DF4XY7
which, if successful, should respond with:
HTTP/1.1 200 OK (or whatever is appropriate for a successful delete)
Here's what Amazon did with their S3 REST API.
Individual delete request:
DELETE /ObjectName HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: date
Content-Length: length
Authorization: authorization string (see Authenticating Requests (AWS Signature Version 4))
Multi-Object Delete request:
POST /?delete HTTP/1.1
Host: bucketname.s3.amazonaws.com
Authorization: authorization string
Content-Length: Size
Content-MD5: MD5
<?xml version="1.0" encoding="UTF-8"?>
<Delete>
  <Quiet>true</Quiet>
  <Object>
    <Key>Key</Key>
    <VersionId>VersionId</VersionId>
  </Object>
  <Object>
    <Key>Key</Key>
  </Object>
  ...
</Delete>
But Facebook Graph API, Parse Server REST API and Google Drive REST API go even further by enabling you to "batch" individual operations in one request.
Here's an example from Parse Server.
Individual delete request:
curl -X DELETE \
-H "X-Parse-Application-Id: ${APPLICATION_ID}" \
-H "X-Parse-REST-API-Key: ${REST_API_KEY}" \
https://api.parse.com/1/classes/GameScore/Ed1nuqPvcm
Batch request:
curl -X POST \
  -H "X-Parse-Application-Id: ${APPLICATION_ID}" \
  -H "X-Parse-REST-API-Key: ${REST_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
        "requests": [
          {
            "method": "POST",
            "path": "/1/classes/GameScore",
            "body": {
              "score": 1337,
              "playerName": "Sean Plott"
            }
          },
          {
            "method": "POST",
            "path": "/1/classes/GameScore",
            "body": {
              "score": 1338,
              "playerName": "ZeroCool"
            }
          }
        ]
      }' \
  https://api.parse.com/1/batch
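Presumably (I haven't verified this against the Parse docs) deletes batch the same way, with DELETE entries in the requests array; the second object ID below is made up:
# batch two deletes in a single request
curl -X POST \
  -H "X-Parse-Application-Id: ${APPLICATION_ID}" \
  -H "X-Parse-REST-API-Key: ${REST_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
        "requests": [
          { "method": "DELETE", "path": "/1/classes/GameScore/Ed1nuqPvcm" },
          { "method": "DELETE", "path": "/1/classes/GameScore/A1b2C3d4e5" }
        ]
      }' \
  https://api.parse.com/1/batch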
Interestingly, I think the same method applies to PATCHing multiple entities, and it requires thinking about what we mean by our URL, parameters, and REST method.
return all 'foo' elements:
[GET] api/foo
return 'foo' elements with filtering for specific ids:
[GET] api/foo?ids=3,5,9
The sense is that the URL and the filter determine "which elements are we dealing with?", and the REST method (in this case "GET") says "what to do with those elements?"
Hence, to PATCH multiple records to mark them as read:
[PATCH] api/foo?ids=3,5,9
...with the data foo[read]=1
Finally, to delete multiple records, this endpoint is the most logical:
[DELETE] api/foo?ids=3,5,9
Please understand I don't believe there are any "rules" on this - to me it just "makes sense".
I would say DELETE http://example.com/resources/id1,id2,id3,id4 or DELETE http://example.com/resources/id1+id2+id3+id4. As "REST is an architecture (...) [not a] protocol", to quote that Wikipedia article, there is, I believe, no single right way of doing this.
I am aware that the above is not possible in HTML without JS, but I get the feeling that REST was:
Created without thinking about minor details like transactions. Who would need to operate on more than a single item? This is somewhat justified by the HTTP protocol, as it was not intended to serve anything other than static webpages.
Not necessarily well adjusted to current models - even of pure HTML.
As Decent Dabbler's answer and rojoca's answer say, the most canonical approach is using virtual resources to delete a selection of resources, but I think that is incorrect from a REST perspective, because executing a DELETE http://example.com/resources/selections/DF4XY7 should remove the selection resource itself, not the selected resources.
Taking the Maciej Piechotka answer or the fezfox answer, I have only one objection: there's a more canonical way to pass an array of IDs, and that is using the array operator:
DELETE /api/resources?ids[]=1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d&ids[]=7e8f9a0b-1c2d-3e4f-5a6b-7c8d9e0f1a2b
This way you are hitting the Delete Collection endpoint, but filtering the deletion with a querystring in the right way.
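As a curl call this would be (note -g, which turns off curl's URL globbing so the [] brackets are sent literally):
curl -g -X DELETE "https://example.com/api/resources?ids[]=1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d&ids[]=7e8f9a0b-1c2d-3e4f-5a6b-7c8d9e0f1a2b"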
I had the same situation, needing to delete multiple items. This is what I ended up doing: I used the DELETE operation, and the IDs of the items to be deleted were passed in an HTTP header.
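The answer doesn't say which header was used, so the X-Delete-Ids name below is my invention; a sketch:
# the ids travel in a custom request header rather than the URL or body
curl -X DELETE \
     -H "X-Delete-Ids: 7HOU57Y,8JKL12Z" \
     http://example.com/resources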
As there is no 'proper' way to do this, what I have done in the past is:
send a DELETE to http://example.com/something with XML- or JSON-encoded data in the body.
when you receive the request, check for DELETE; if true, then read the body for the items to be deleted.
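A sketch of that with a JSON body (one caveat worth knowing: the HTTP spec gives a body on DELETE no defined meaning, and some servers and proxies drop it, which is part of why this is non-standard):
# DELETE with the victims listed in the request body
curl -X DELETE \
     -H "Content-Type: application/json" \
     -d '{"ids": ["7HOU57Y", "8JKL12Z"]}' \
     http://example.com/something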

What is the recommended/effective request payload for a REST PUT method?

I see two types of examples in various places. One uses form fields like
curl -X PUT -d "phone=123.456.7890" "http://127.0.0.1/services/rest/user/123"
and the other uses an XML content like (some variation of) this
echo "<user><id>123</id><phone>123.456.7890</phone></user>" | curl -X PUT -d #- "http://127.0.0.1/services/rest/user/"
It seems like using the form fields has the advantage of brevity and clearly identifying the client's intent by targeting just the modified fields, but makes it awkward to address "deeper" metadata.
Using the XML content has an advantage of being more complete, but the disadvantage of the overhead of figuring out which field the client is actually modifying (assuming that they send back the entire resource with small modifications).
Is there a best practice, or even a more-common practice?
It could be something like JSON (I'm not sure about the exact syntax):
$ echo '{"user": {"id": 123, "phone": "123.456.7890"}}' |\
> curl -X PUT -d @- 'http://127.0.0.1/services/rest/user/'
Or
$ echo '{"phone": "123.456.7890"}' |\
> curl -X PUT -d @- 'http://127.0.0.1/services/rest/user/123.json'
In the second example in the question, the URL does not refer to a specific resource, so IMHO it's not RESTful.
If you fix that, the choice comes down to form vs. XML encoding.
If you need structured and extensible data, then XML might be useful:
<phone type="work, mobile"><num>555-555</num><ext>123</ext></phone>
but it isn't necessary:
phone=555-555&phone-ext=123&phone-type=work&phone-type=mobile
Lots of API users may get XML encoding wrong or have trouble grasping namespace indirection, so form encoding might be better for a wide audience.
Good question! I don't know of a specific best practice or common practice. But I do want to point out that the question isn't really about form fields or XML, it's about partial representations vs. full representations. You've succinctly described the practical differences between them. One aspect of the question is who has the responsibility to determine what has changed: the client or the server.
A hybrid option would be some kind of format wherein a client could specify what exactly has changed, using some syntax to point to "deeper" metadata, such as XPath or JSONpath, along with the new value.
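One concrete, later-standardized realization of that idea is JSON Patch (RFC 6902); this example is mine rather than the asker's, and it names the changed path and the new value explicitly:
# replace just the phone field of user 123
curl -X PATCH \
     -H "Content-Type: application/json-patch+json" \
     -d '[{"op": "replace", "path": "/phone", "value": "123.456.7891"}]' \
     http://127.0.0.1/services/rest/user/123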