Is it possible to use the kdb+ HTTP client to access pages protected by a login? I am using https://github.com/KxSystems/cookbook/blob/master/yahoo.q as an example of basic GET/POST. Does anyone have an example of how to extract a cookie and use it in the following requests?
It is probably a bit crude, but the following will extract the headers from an HTTP response, then the cookies, and parse and return them as a dictionary:
x:"HTTP/1.0 200 OK\r\nContent-type: text/html\r\nSet-Cookie: theme=light\r\nSet-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT\r\n\r\n";
left:{(first y ss x)#y};
vs1:{{(y#x;(count[z]+y)_x)}[y;;x](first y ss x)};
headers:{{(`$x[0];x[1])} flip vs1[": "] each 1_"\r\n" vs left["\r\n\r\n"]x};
cookies:{(!). {(`$x[0];x[1])} flip vs1["="] each {x[1]#where x[0]=`$"Set-Cookie"} x};
cookies headers[x]
Whilst you might be able to extract various bits and bobs from an HTTP response, the fact that you won't be able to manipulate HTTP methods means that q can't be your tool to do this - well, not without some vigorous effort.
I would use something like Beautiful Soup in conjunction with q. Soup has some great tools for handling this kind of thing (e.g. cookies etc). There are various other similar projects too.
Make a system call to a script that uses Beautiful Soup, makes the relevant GET/POST/PUT calls, and downloads the required data:
system"/path/to/code.py"
Where the code dumps the result somewhere or puts it into kdb directly. Then do whatever you like with it.
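As a rough illustration of the cookie-handling part of such a script, here is a minimal Python sketch that pulls the Set-Cookie values out of a raw response (the same sample response as in the q snippet) and builds the Cookie header for a follow-up request. The function names extract_cookies and cookie_header are made up for this example, and cookie attributes such as Expires are simply dropped rather than honoured.

```python
def extract_cookies(raw_response: str) -> dict:
    """Return {name: value} for every Set-Cookie header in a raw HTTP response."""
    head = raw_response.split("\r\n\r\n", 1)[0]      # headers end at the blank line
    cookies = {}
    for line in head.split("\r\n")[1:]:              # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "set-cookie":
            pair = value.strip().split(";", 1)[0]    # drop attributes (Expires, Path, ...)
            key, _, val = pair.partition("=")
            cookies[key.strip()] = val.strip()
    return cookies


def cookie_header(cookies: dict) -> str:
    """Build the value of the Cookie header to send on the next request."""
    return "; ".join(f"{key}={val}" for key, val in cookies.items())


sample = (
    "HTTP/1.0 200 OK\r\nContent-type: text/html\r\n"
    "Set-Cookie: theme=light\r\n"
    "Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT\r\n\r\n"
)
session_cookies = extract_cookies(sample)
```

The string returned by cookie_header(session_cookies) would then be sent as "Cookie: ..." on each later request to the protected pages.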
First off thanks for reading!
Second off YES I have tried to find the answer! :) Perhaps I haven't found it because I'm not using the right words to describe my problem, but it's been about 4 hours that I've been trying to figure it out now and I'm getting a little loopy trying to piece it together on my own.
I am very new to programming. Python is my first language. I am on my third Python course. I have an assignment to use the socket library (not urllib library - I know how to do that) to make a socket and use GET to receive information. The problem is that the program needs to take raw input for the URL in question.
I have everything else the way I want it, but I need to know the syntax that I'm supposed to be using INSIDE my "GET" request in order for the HTTP message to include the requested document path.
I have tried (obviously not all together lol):
mysock.send('GET (url) HTTP/1.0\n\n')
mysock.send( ('GET (url) HTTP:/1.0\n\n'))
mysock.send(('GET (url) HTTP:/1.0\n\n'))
mysock.send("GET (url) HTTP/1.0\n\n")
mysock.send( ("'GET' (url) HTTP:/1.0\n\n"))
mysock.send(("'GET' (url) 'HTTP:/1.0\n\n'"))
and:
basically every other configuration of the above (, ((, ( (, ', '' combinations listed above.
I have also tried:
-Creating a string using the 'url' variable first, and then including it inside mysock.send(string)
-Again with the "string-first" theory, but this time I used %r to refer to my user input (so 'GET %r HTTP/1.0\n\n' % url basically)
I've read questions here, other programming websites, the whole chapter in the book and the whole lectures/notes online, I've read articles on the socket library and the .send(), and of course articles on GET requests... but I'm clearly missing something. It seems most don't use socket library when they can use urllib and I don't blame them!!
Thank you again...
Someone from the university posted back to me that the url variable can be concatenated with the GET syntax and assigned to a string variable, which can then be sent with .send(concatenatedvariable). I had mentioned trying that, but had missed that GET requires a space after the word 'GET', so of course concatenating didn't include a space and that blew it. In case anyone else wants to know :)
FYI: A fully qualified URL is only allowed in HTTP/1.1 requests. It is not the norm, though, as HTTP/1.1 requires setting the Host header. The relevant piece of reading would've been RFC 7230, sec. 3.1.1 and possibly RFC 3986. The syntax of the parameters is largely borrowed from the CGI format. It is in no way enforced, however. In a nutshell, everything put together would look like this on the wire:
GET /path?param1=value1&param2=value2 HTTP/1.1
Host: example.com
As a final note: The line delimiter in HTTP is CRLF (\r\n). For robustness, a simple linefeed is acceptable as well but not recommended.
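To tie the pieces together (space after GET, path plus query string in the request line, Host header, CRLF line endings), here is a minimal Python sketch that builds the request bytes from a user-supplied URL. The helper name build_get_request is made up for this example, and the "Connection: close" header is an added assumption so the server closes the socket once the response is complete.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build a raw HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"                 # an empty path must be sent as "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",              # note the spaces around the path
        f"Host: {parts.netloc}",             # mandatory in HTTP/1.1
        "Connection: close",                 # assumption: one request per socket
        "",                                  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("http://example.com/path?param1=value1&param2=value2")
```

With a connected socket this would be sent as mysock.sendall(request); on Python 3 the socket expects bytes, which is why the string is encoded at the end.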
I need to access the Date: header when I handle the request, but this seems to be "swallowed" by the framework; any other header (even made up FooBar ones) show up and I can get them, but this gives me None (I'm using Postman to send a simple GET request - everything else works just fine):
println("Date: " + request.headers.get("Date").getOrElse("no date!"))
returns "no date!" no matter how I try to send something sensible.
I'm wondering whether this gets processed before the request object reaches my Action.
I need the actual string value sent, as this should be part of the request's signature - so an equivalent Date object representing the same value would not be of much use (as it needs to be part of the hash, to avoid replay attacks).
Just as a test, I replaced the Date header with a Date-Auth one, and this one shows up just fine:
ArrayBuffer((Date-Auth, ArrayBuffer(Wed, 15 Nov 2014 06:25:24 GMT))
Any ideas or suggestions greatly appreciated!
Are you sure there is a Date Header in your request (tested with tools like firebug or wireshark)?
Browsers do not need to send a Date header.
RFC 2616 (HTTP 1.1) from the Date section (14.18)
Clients SHOULD only send a Date header field in messages that include an entity-body, as in the case of the PUT and POST requests, and even then it is optional. A client without a clock MUST NOT send a Date header field in a request.
I stand corrected - it turns out that Chrome blocks a whole bunch of headers:
http://www.getpostman.com/docs/requests
I wrote a Python Flask test server and, in fact, the Date header is not there.
That page has also a fix, which works just fine with Postman Version 0.10.4.3 and Interceptor(1).
sorry for wasting everyone's time!
(1) Incidentally, IMO Postman is the best REST client and now also has some awesome looks, beyond incredible functionality. If you're working with REST APIs, I highly recommend it.
I would like to represent dynamic images in an email. For example with the given url
<img src="http://myserver.com/index.php/user_key/thispagestate.jpg" />
I would like to serve a different image based on logic within my server. There will only be between 2 to 4 static images used to represent the result of any given request.
The 2 options I had in mind were:
to serve the images directly using perhaps
imagecreatefromjpeg
Or generate 302 redirects
Seeing as each request will result in one of a limited number of images, I thought a redirect might save resources on our end and make use of caching on the user's end too. The result for each request will change depending on the user and time. Might using redirects have some consequence for SEO or spam filtering?
Your opinions on the best method will be appreciated
The 2 options I had in mind were: to serve the images directly using perhaps imagecreatefromjpeg, or generate 302 redirects
I'd go with #1 in this case, though since it's a static image you can simply use:
header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT"); // Date in the past
header('Content-Type: image/jpeg'); // or image/png, etc.
echo file_get_contents($image_path); // where $image_path is the path to the image
exit;
instead. You'd only need to use the GD functions if you were trying to do something like adding text on top of the static image.
Note that in this case I'm setting the cache to expire, since the URL will be the same but the content might change. This could otherwise confuse caching systems.
Seeing as each request will result in one of a limited number of images I thought a redirect might save resources on our end and make use of caching on the user's end too.
The reverse actually, since the same file will now have different content. You'll want them to revalidate the content each time to make sure the proper image shows.
I'm using Jon Crosby's open source Objective-C OAuth library http://code.google.com/p/oauthconsumer/ for some basic http authentication that does not deal with tokens, only consumer key and consumer secret. My code works great for GET, GET with parameters in the URL, and POST. When I issue a POST request that has parameters in the URL, though, the request fails authorization. I'm trying to figure out why.
The server is using Apache Commons OAuth, so I'd like to compare my base string with that library. Here's a contrived example and the base string and signature produced by my library. Can anyone see what the problem is?
consumer key: abcdef
consumer secret: ghijkl
POST request: http://emptyrandomhost.com/a/uriwith/params?interesting=foo&prolific=bar
my base string: POST&http%3A%2F%2Femptyrandomhost.com%2Fa%2Furiwith%2Fparams&interesting%3Dfoo%26oauth_consumer_key%3Dabcdef%26oauth_nonce%3D1%26oauth_signature_method%3DHMAC-SHA1%26oauth_timestamp%3D2%26oauth_version%3D1.0%26prolific%3Dbar
This data produces the following OAuth header authorization:
Authorization: OAuth oauth_consumer_key="abcdef",
oauth_version="1.0",
oauth_signature_method="HMAC-SHA1",
oauth_timestamp="2",
oauth_nonce="1",
oauth_signature="Z0PVIz5Lo4eB7aZFT8FE3%2FFlbz0%3D"
And apparently my signature is wrong. The problem has to either be in the construction of the base string, in the way that the HMAC-SHA1 function is implemented (using Apple's CCHmac from CommonHMAC.h, so hopefully this isn't it), or with my Base64Transcoder, which is open source c. 2003 by Jonathan Wight/Toxic Software. I primarily suspect the base string, since the requests work for GET and POST and only fail with POST with URL parameters as above.
Can someone with lots of OAuth experience spot the problem above? Something else that would be very useful is the base string that is produced by Apache Commons OAuth in their authentication. Thanks.
As per RFC 5849 section 3.4.1.2, the OAuth base string URI does not include the query string or fragment. If either the client or the server does not remove the query parameters from the base string URI and add them to the normalized OAuth parameter list, the signatures won't match. Unfortunately, it's hard to tell which side is making this mistake. But it's easy to determine this is the problem: If it always works without query parameters but always fails with query parameters, you can be pretty sure that one side or the other is generating the wrong base string. (Be sure that it always happens though... intermittent errors would be something else. Similarly, if it never works with or without a query string, that would also be something else.) The other possibility is that normalization was done incorrectly — the parameter list must be sorted and percent encoded sequences must be upper-cased. If it's not normalized correctly on both sides, that will also cause a base string mismatch, and thus a signature mismatch.
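To make the normalization rules concrete, here is a minimal Python sketch of building the signature base string and the HMAC-SHA1 signature. The function names pct, base_string and sign are made up for this example; it is a simplification that assumes no token secret, no form-encoded body parameters, and no default-port stripping, so it is not a full RFC 5849 implementation.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qsl, quote, urlsplit


def pct(value: str) -> str:
    """Percent-encode everything except unreserved characters (upper-case hex)."""
    return quote(value, safe="-._~")


def base_string(method: str, url: str, oauth_params: dict) -> str:
    parts = urlsplit(url)
    # Base string URI: scheme, host and path only, without query or fragment.
    base_uri = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"
    # Query parameters join the OAuth parameters in one list.
    params = parse_qsl(parts.query, keep_blank_values=True) + list(oauth_params.items())
    encoded = sorted((pct(k), pct(v)) for k, v in params)   # encode first, then sort
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(base_uri), pct(normalized)])


def sign(base: str, consumer_secret: str, token_secret: str = "") -> str:
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode("ascii")
    digest = hmac.new(key, base.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


oauth = {
    "oauth_consumer_key": "abcdef",
    "oauth_nonce": "1",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "2",
    "oauth_version": "1.0",
}
url = "http://emptyrandomhost.com/a/uriwith/params?interesting=foo&prolific=bar"
base = base_string("POST", url, oauth)
signature = sign(base, "ghijkl")
```

For the example values in the question, base comes out identical to the base string the asker posted, so comparing the two sides' base strings character by character is a quick way to find which one normalizes differently.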
You can build and visually check your request at this URL:
http://hueniverse.com/2008/10/beginners-guide-to-oauth-part-iv-signing-requests/
Open the boxes denoted by [+] signs and fill in your values; that way you may be able to see whether the problem is in your code or on the provider side.
Am I breaking any laws in the REST bible by returning application/octet-stream for my responses ? The REST endpoint receives 5 image urls.
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif" }
and it will download these and return them as application/octet-stream.
CLARIFICATION: The client that invokes this REST interface is a mobile app. Every additional network connection made will reduce battery life by a few milliamps. I am forced to use REST because it is a company standard. Otherwise, I would design my own binary protocol.
It is not so good, as the client will not know what to do with such binary data except store those bytes somewhere or send them on to some other process (if this is all you need to do with your data, then it is fine).
You may take a look at multipart content types. IMO, a multipart message containing several image/gif parts would be a better alternative.
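As a rough sketch of what that could look like, the Python standard library's email package can pack several image/gif parts into one multipart/mixed payload and unpack them again. The function names pack_images and unpack_images and the dummy image bytes are made up for this example; in a real HTTP response the Content-Type line with the boundary would go into the response headers rather than into the body.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart


def pack_images(images: dict) -> bytes:
    """Pack {filename: gif_bytes} into a single multipart/mixed message."""
    container = MIMEMultipart("mixed")
    for name, data in images.items():
        part = MIMEImage(data, _subtype="gif")        # each part is typed image/gif
        part.add_header("Content-Disposition", "attachment", filename=name)
        container.attach(part)
    return container.as_bytes()


def unpack_images(payload: bytes) -> dict:
    """Client side: recover {filename: gif_bytes} from the multipart payload."""
    message = message_from_bytes(payload)
    return {
        part.get_filename(): part.get_payload(decode=True)
        for part in message.get_payload()
    }


# Placeholder bytes standing in for the downloaded GIFs.
images = {"1.gif": b"GIF89a-first", "2.gif": b"GIF89a-second"}
payload = pack_images(images)
```

Unlike a bare application/octet-stream body, each part carries its own content type and filename, so the client does not need out-of-band knowledge of the layout.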
From the sounds of this, this sounds much more like an RPC call. Specifically, "here's a list of URLs, send me back an archive".
That process is not particularly RESTful, as REST is not an RPC based system.
What you need to do is treat the archives as resources, and provide a way to create and then serve them up.
For example you could:
POST /archives
Content-Type: application/json
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif" }
As a result, you would get
HTTP/1.1 201 Created
Location: http://example.com/archives/1234
Content-Type: application/json
Then, you could make a request to http://example.com:
GET /archives/1234
Accept: multipart/mixed
Here, you will get the actual archive in a single request (like you want), only it's a multipart formatted result. (multipart/x-zip would work too, that's a zip file)
If you did:
GET /archives/1234
Accept: application/json
You would get back the JSON you sent originally (so you could, perhaps, edit and update the archive without sending up the binary images, which is something you may not want to support).
To change it you would simply PUT back the update:
PUT /archives/1234
Content-Type: application/json
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif",
"image3": "http://www.foo2.foo/4.gif" }
The resource is /archives/1234, that's its name.
It has two representations in this case: the JSON version, and the actual, binary archive. Your service distinguishes between the two using the content type specified in the Accept header. That header is the client telling you what it wants.
When you're done with the archive, simply DELETE it:
DELETE /archives/1234
Or you can have the server expire the resource at some later time.
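The lifecycle above can be modelled in a few lines. This is a minimal in-memory Python sketch, not a web service: the class name ArchiveStore and its methods are made up for this example, each method stands in for one HTTP verb, and the binary representation is replaced by a placeholder string instead of a real multipart body.

```python
import itertools
import json


class ArchiveStore:
    """In-memory stand-in for the /archives resource collection."""

    def __init__(self):
        self._archives = {}
        self._ids = itertools.count(1234)       # arbitrary starting id

    def create(self, urls: dict) -> str:
        """POST /archives: store the URL list, return the Location path."""
        archive_id = next(self._ids)
        self._archives[archive_id] = dict(urls)
        return f"/archives/{archive_id}"

    def get(self, archive_id: int, accept: str):
        """GET /archives/<id>: pick the representation from the Accept header."""
        urls = self._archives[archive_id]
        if accept == "application/json":
            return "application/json", json.dumps(urls)
        if accept == "multipart/mixed":
            # Placeholder: a real service would return the downloaded images here.
            return "multipart/mixed", f"<{len(urls)} image parts>"
        raise ValueError("406 Not Acceptable")

    def replace(self, archive_id: int, urls: dict) -> None:
        """PUT /archives/<id>: replace the URL list."""
        self._archives[archive_id] = dict(urls)

    def delete(self, archive_id: int) -> None:
        """DELETE /archives/<id>."""
        del self._archives[archive_id]


store = ArchiveStore()
location = store.create({"image1": "http://ww.o.com/1.gif"})
```

The point of the sketch is that one name (/archives/1234) maps to one stored resource, and the Accept value only selects how that resource is rendered.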
Why not have five separate REST calls?
Seems cleaner and divides more logically. It will also run the downloads in parallel, 2 or more at a time depending on the browser you are using.
They are called REST principles not laws, but no you are not "breaking" them, IMO. REST is about resources being addressable by a URL, and (where appropriate) available in multiple formats. It doesn't say what the format should be. There's a simple description of what REST means in this article.
However, as @Andrey says, there are nicer ways to handle sending multiple data objects than inventing your own ad hoc format. The multipart MIME type / format is one alternative, and another is to send the objects packed up as a tar, zip or a similar archive file format.
IMO, the real problem with using "application/octet-stream" is that it doesn't tell anyone anything about how the data is actually formatted. Rather, your client has to "know" how it is formatted, and interpret it accordingly. And the problems with inventing your own format are interoperability and (possibly) having to design, implement and maintain libraries to support it, possibly many times over.