I'm trying to use LOAD CSV with a CSV file stored on GitHub. It works fine with the temporary, 10-minute token you get when viewing the raw file, but I want something more persistent, as I need to be able to deploy this to multiple environments. Ten minutes just won't cut it.
I figured a personal access token would be the way forward, but (once again) GitHub's spectacularly poor-quality documentation made this much harder than it should be.
I set up a personal access token with the repo and read:org permissions, and with this I can get at my files using curl, e.g.:
curl -s https://<my_token>@raw.githubusercontent.com/<my repo>/<path>/<my file>.csv
This works fine and I see the contents of my test file.
But if I try to navigate to that URL in a browser I just get a 404 error, and if I use it in Neo4j with a LOAD CSV statement, I get an error: couldn't load the external resource at: ...
I'm basically doing this:
LOAD CSV WITH HEADERS FROM '<URL that worked in CURL>' AS row
...and it fails miserably.
Whereas:
LOAD CSV WITH HEADERS FROM '<URL for raw file from GitHub with 10 minute token>' AS row
works fine, so I know I can access external files, i.e. files not in the import directory.
Is this just a failing with GitHub, or am I doing something wrong?
Although I hate answering my own questions, I left this kicking around for a while and nobody came back with anything that helped.
I now know a whole lot more about personal access tokens than I ever wanted to, but it was all worthwhile, as it helped me get around this issue.
There's an apoc.load.jsonParams procedure that accepts bearer tokens. From there it didn't take too much work to get this working the same way that LOAD CSV had done.
There was one last gotcha, though: I soon discovered that the URLs for the repository can't include spaces or other non-alphanumeric characters, but that's a small price to pay for success.
So this doesn't work:
LOAD CSV WITH HEADERS FROM 'https://<my_token>@raw.githubusercontent.com/<my repo>/<path>/<my file>.csv' AS row...
Instead I had to use:
CALL apoc.load.jsonParams("https://raw.githubusercontent.com/<my repo>/<path>/<my file>.json", {Authorization: "Bearer <token>"}, null) YIELD value WITH value AS row...
There's an equivalent apoc.load.csvParams procedure, but I never tested this.
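If anyone wants to try it, my guess (untested; the parameter order and YIELD column are assumptions based on apoc.load.jsonParams and apoc.load.csv) is that it would look something like this:
CALL apoc.load.csvParams(
  "https://raw.githubusercontent.com/<my repo>/<path>/<my file>.csv",
  {Authorization: "Bearer <token>"},
  null,
  {header: true}
) YIELD map AS row
RETURN row LIMIT 5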
I want to find out the changed files between two given commits/branches/tags using the Bitbucket Rest API.
I tried to use the diff command from here
curl -u USER:PASSWORD https://REPO-URL/rest/api/latest/projects/PROJECT/repos/REPO/compare/diff?from=COMMITHASH1&to=COMMITHASH2
where the CAPITAL words are placeholders for actual values I cannot post here.
The result of the request is always something like
The command "to" is spelled wrong or cannot be found
(the original message is in German, so the wording above is my rough translation).
However, if I switch the query parameters around, i.e. .../diff?to=...&from=..., it says that the command from is unknown. I also tried other similar diff queries like .../compare/changes?from=...&to=... or .../diff?since=...&until=..., but the result was much the same as above. Giving branch names instead of commit hashes didn't help either.
Therefore, my assumption is that the second query parameter cannot be handled correctly by the API.
Other basic queries on the API like .../branches work fine, so authentication and the like aren't the problem.
What am I doing wrong? Do I need to wrap the commit hashes in "" or something like that?
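One thing I'm also unsure about is quoting the URL itself: & is a command separator in most shells, so maybe everything from &to=... onwards is being cut off and run as a separate command (which could explain why the error message looks like it comes from the shell rather than from the API). For example:
curl -u USER:PASSWORD "https://REPO-URL/rest/api/latest/projects/PROJECT/repos/REPO/compare/diff?from=COMMITHASH1&to=COMMITHASH2"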
Thank you very much!
PS: As the repository is commercially used, I cannot give you the actual URL, user, or password to try for yourself.
I'm trying to get a history report for a repository connection over the ManifoldCF REST API. According to the documentation:
https://manifoldcf.apache.org/release/release-2.11/en_US/programmatic-operation.html#History+query+parameters
It should be possible with the following URL (connection name: myConnection):
http://localhost:8345/mcf-api-service/json/repositoryconnectionhistory/myConnection
I have also tried to use some of the history query parameters:
http://localhost:8345/mcf-api-service/json/repositoryconnectionhistory/myConnection?report=simple
But I am not sure if I am using them correctly or how they should be attached to the URL, because it is not mentioned in the documentation.
The problem is also that I don't receive any error, but an empty object, so it is difficult to debug. The API returns an empty object even for a non-existing connection.
However, it works for resources which don't have any attributes, e.g.:
http://localhost:8345/mcf-api-service/json/repositoryconnectionjobs/myConnection
or
http://localhost:8345/mcf-api-service/json/repositoryconnections/myConnection
Thanks in advance for any help.
I also wrote a message to the ManifoldCF team and they gave me an answer, so I've summed it up for you below.
Query parameters go after the fixed "path" part of the URL and are of the form ?parameter=value&parameter2=value2...
So they work the same way as in any other URL.
The problem was that I didn't supply the activity (or activities) that I wanted to match. Possible activities are e.g. fetch and process. My example:
http://localhost:8345/mcf-api-service/json/repositoryconnectionhistory/myConnection?activity=process&activity=fetch
Finally, the reason I didn't get an error when I used a bogus connection name is that the underlying implementation merely does a dumb query and doesn't check the legality/existence of the connection name.
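For reference, a complete request from the command line would then look something like this (the URL needs to be quoted so the shell doesn't split it at the &):
curl "http://localhost:8345/mcf-api-service/json/repositoryconnectionhistory/myConnection?activity=fetch&activity=process"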
I'm trying to search for all projects (or at least several thousand) via the GitHub search API. I've gotten everything else to work except the filters on filename.
For example, sending the following request to the search API only returns 1 result:
https://api.github.com/search/code?q=django+in:requirements.txt+filename:requirements.txt+language:python+org:openmicroscopy
Likewise, sending the following
https://api.github.com/search/repositories?q=filename:Makefile&per_page=100
only returns 1 result as well. I'm willing to bet that there is more than one repo on GitHub with a Makefile or a dependency on Django. I must be doing something wrong, but I can't seem to figure out what it is.
According to this post on GitHub's developer site, to support the expected volume of requests they have added restrictions to code queries, which require us to specify a set of users, organizations, or repositories with the query. Read about the considerations for code search at this link.
Now, about your search API requests: in the first one, the in qualifier is given the file name requirements.txt, which is wrong.
The documentation states that in should be given file to restrict the search to the file contents, path to restrict it to the file path, or both.
Like this: in:file, in:path, in:file,path
So, if you want to search in file contents, the correct API call should be:
https://api.github.com/search/code?q=django+in:file+filename:requirements.txt+org:openmicroscopy
I removed the language qualifier, since you are searching in a .txt file, and doing this improved the results.
Check out this URL; it produces the same results on the website:
https://github.com/search?utf8=%E2%9C%93&q=org%3Aopenmicroscopy+django+in%3Afile+filename%3Arequirements.txt&type=Code
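If you're hitting the API directly with curl, the same query would look roughly like this (the Authorization header is my assumption; code search generally requires an authenticated request, and a personal access token works for that):
curl -H "Authorization: token <your_token>" "https://api.github.com/search/code?q=django+in:file+filename:requirements.txt+org:openmicroscopy"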
Your second query is a repository search; it cannot take a filename qualifier. See this link for the available qualifiers.
I have seen several posts addressing this issue, or ones similar to it, for requests or GETs. I am not having this problem getting data from the server; it's solely on the POST.
The errors I get are:
The JSON request was too large to be deserialized.
or
Error during serialization or deserialization using the JSON JavaScriptSerializer. The length of the string exceeds the value set on the maxJsonLength property. Parameter name: input
I haven't been able to consistently determine which actions result in which error, but it is predominantly the latter one.
In an effort to get the value of MaxJsonLength, in the Index method of the controller I grab this value and dump it into a ViewBag to write to the console on the client side. Every time it comes back as 102400 (100k).
If I reduce the data package size, and still serialize as previously, I get no errors.
In fiddler I can inspect the package and all the JSON is deserializable in fiddler, so I don't see an issue in my JSON. Additionally if I console.log(data) chrome sees no problems with it either.
The view model in the controller is the same for both POST and GET, with the exception that there is more data with the POST than the GET. To test this I got a huge data set from the server:
GeoJSON data for all 50 states. The following was the result.
GET: Content-Length 3229309, returned 200
POST: Content-Length 2975244, returned 500
The POST failed in this scenario and returned the second error listed previously.
I only changed the data minimally (one string) and don't know why it's smaller when sent back, but the JSON for both the GET and the POST is virtually identical.
I've tried changing the web.config file:
<system.web.extensions>
<scripting>
<webServices>
<jsonSerialization maxJsonLength="2147483644"/>
</webServices>
</scripting>
</system.web.extensions>
I added this to the end of my config file, just prior to the closing </configuration> tag.
I've also added a parameter in Settings.config
<add key="aspnet:MaxJsonDeserializerMembers" value="2147483644" />
I have also verified that this param loads as part of the application settings in IIS.
Is there something else I can try to change to allow these large data sets to be sent in a POST?
As a last resort, I was going to pull all of the GeoJSON data out of the POST. However when a user navigates back and they haven't changed what they were mapping, we'd have to find all the GeoJSON data again, causing undue work on the server etc. I thought if I only had to fetch it once that would be best from an efficiency perspective.
I struggled with this too; nothing I changed in web.config helped, despite several SO answers looking relevant. They helped with returning large JSON data, but the large JSON POST kept failing. In the end I found this:
increase maxJsonLength for JSON POST and used the solution there, and it worked for me.
Quoting from there:
The MVC JSON serializer does not look at the web.config to get the max length (that's for ASP.NET web services). You need to use your own serializer: you override ExecuteResult and supply your own JSON serializer. To override the input, create a new JsonValueProviderFactory, then override ValueProvider in the controller to return your new JSON factory when it's a JSON request.
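For what it's worth, here's a minimal sketch of the "override ExecuteResult" half of that advice; the class name is mine, and the POST (input) half needs a custom JsonValueProviderFactory along the same lines, which is longer, so see the linked answer for that part:
using System.Web.Mvc;
using System.Web.Script.Serialization;

// Hypothetical helper: a JsonResult whose serializer ignores the web.config maxJsonLength limit
public class LargeJsonResult : JsonResult
{
    public override void ExecuteResult(ControllerContext context)
    {
        var response = context.HttpContext.Response;
        response.ContentType = string.IsNullOrEmpty(ContentType) ? "application/json" : ContentType;
        if (ContentEncoding != null)
            response.ContentEncoding = ContentEncoding;

        if (Data != null)
        {
            // Configure the serializer directly, so the 102400-character default never applies
            var serializer = new JavaScriptSerializer { MaxJsonLength = int.MaxValue };
            response.Write(serializer.Serialize(Data));
        }
    }
}
Usage in an action method, instead of return Json(model):
return new LargeJsonResult { Data = model, JsonRequestBehavior = JsonRequestBehavior.AllowGet };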
OK, here's a goal I've been chasing for a while.
As is well known, most advertising and analytics companies use a so-called "pixel" code in order to track website views, transactions, conversions, etc.
I have a general idea of how it works; the problem is how to implement it. A tracking code consists of a few parts.
The tracking code itself.
This is the code that the user inserts in the <head> section of his webpage. The main goal of this code is to set some customer-specific variables and to call the *.js file.
The *.js file.
This file holds all the magic: CRUD (create/read/update/delete) operations on cookies, and tracking the user's events and interaction with the webpage.
The pixel code.
This is an <img> tag with the src attribute pointing to a *.gif image (for example) that takes all the parameters collected on the page and stores them in the database.
Example:
WordPress pixel code: <img id="wpstats" src="http://stats.wordpress.com/g.gif?host=www.hostname.com&list_of_cookies_value_pairs;" alt="">
Google Analytics:
http://www.google-analytics.com/__utm.gif?utmwv=4&utmn=769876874&etc
Now, it's obvious that the *.gif request has to reach a server-side scripting language in order to read the parameter data and store it in a db.
Does anyone have an idea how to implement this in Zend?
UPDATE
Another thing I'm interested in is: how do I prevent the user's browser from loading a cached *.gif? Will a random parameter value do the trick? Example: src="pixel.gif?nocache=random_number", where the nocache parameter value is different on every request.
As Zend is built using PHP, it might be worth reading the following question and answer: Developing a tracking pixel.
In addition to that answer, since you're looking for a way to avoid caching of the tracking image, the easiest approach is to append a unique/random string to its URL, generated at runtime.
For example, on the server side, when generating each image tag, you might append a random id:
<?php
// Generate a random 8-character id to bust the cache
$rand_id = substr(md5(mt_rand()), 0, 8);
// Echo the image and append a random string
echo "<img src='pixel.php?a=".$vara."&b=".$varb."&rand=".$rand_id."'>";
?>
Just adding my 2 cents to this thread because I think an important, and frequently used, option is missing: you don't necessarily need a scripting language to capture the request. A more efficient approach is to use the web server access log (the Apache access log, for instance) to record the request and then handle that log with whatever tools you see fit, such as the ELK stack.
This makes serving the requests much lighter, because no scripting language is loaded to prepare the response, just a native Apache response, which is typically much more efficient.
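For example, assuming Apache with mod_setenvif and mod_log_config, you could give the pixel its own log file (the paths and names below are just illustrative); %r contains the full request line including the query string, so all the tracked parameters end up in the log:
# Mark requests for the tracking pixel
SetEnvIf Request_URI "^/pixel\.gif" pixel_request
# Log only those requests to a dedicated file
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" pixel
CustomLog /var/log/apache2/pixel.log pixel env=pixel_request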
First of all, the *.gif doesn't need to be that file type; the only thing that matters is the Content-Type HTTP header. Set that to image/gif (or any other appropriate type) at the start, execute your code, and render some sort of image to the response body.
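A minimal sketch of that idea in PHP (the base64 string is the usual 1x1 transparent GIF):
<?php
// Tell the browser this response is an image, whatever the script's own extension is
header('Content-Type: image/gif');

// ... read $_GET / $_COOKIE here and store whatever you want to track ...

// Render a 1x1 transparent GIF as the response body
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
?>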
All of the above is correct and good, but to be concrete: the answer above mentions "g.gif".
You can just add some simple PHP code to write to SQL, or fwrite("file.txt", $opened),
where the variable $opened serves as a counter that gets incremented whenever someone opens your mail... then save the script as "g.gif" (see the sketch after the .htaccess snippet below).
To do all of this, just add these lines:
<Files "/thisdirectory">
AddType application/x-httpd-php .gif
</Files>
to your ".htaccess" file but be sure to make a new directory for that g.gif or whatever.gif where the directory only contains g.gif and .htaccess