I'm trying to get a grasp on how to reverse engineer the URLs used to download streams.
I know there are already open source tools that do this, but copying them doesn't teach me the process behind it.
As an example: I'm trying to get a downloader for Soundcloud to work. I'm guessing the download URL should be something like api.soundcloud.com/track/... . Somewhere in it there must be the track_id and client_id, which can be extracted from the source of the page.
But I can't seem to get any further than that right now.
Before I answer my own post, I want to state that downloading streams from Soundcloud is illegal and hurts the artists. Playing the stream outside of Soundcloud is also only allowed under their terms, so please check those first.
To grab the stream link, I first looked into the Soundcloud Python library. There I found that I can just query the API with api.soundcloud.com/resolve?url=<URL of the desired Song Page>&client_id=<client_id>.
The client_id has to be sent with each API request. Searching through the code, it's really easy to find a client_id. It seems to be static, at least for unregistered users, and further searching suggested that it has been the same for at least a year.
After you call the resolve URL above, you get an XML document with the properties of the song/stream, including the stream URL. You can just make a normal HTTP request for that stream URL. (Don't forget to append the client_id.)
If for some reason the link doesn't work properly, try disabling automatic following of HTTP 302 redirects in your client.
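To make the two URL shapes described above concrete, here is a minimal Python sketch that only builds the request URLs and sends nothing. The `https` scheme and the helper names `build_resolve_url` and `with_client_id` are my own assumptions for illustration; the host, the `resolve` path and the `url`/`client_id` parameters come from the post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Host taken from the post; the https scheme is an assumption.
API_BASE = "https://api.soundcloud.com"


def build_resolve_url(page_url: str, client_id: str) -> str:
    """Build the resolve request for a song page URL (hypothetical helper)."""
    # urlencode percent-encodes the page URL so it survives as a query value.
    query = urlencode({"url": page_url, "client_id": client_id})
    return f"{API_BASE}/resolve?{query}"


def with_client_id(stream_url: str, client_id: str) -> str:
    """Append client_id to a URL, keeping any query parameters it already has."""
    parts = urlsplit(stream_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("client_id", client_id))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

The stream URL passed to `with_client_id` would be whatever the resolve response contains; its exact shape is not given in the post, so the sketch treats it as opaque.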
I need to scrape a webpage.
I specifically need to extract the section "USDT Funding Market" on the page.
$WebExtract = Invoke-WebRequest -Uri "https://www.kucoin.com/margin/lend/USDT"
However, neither $WebExtract.content nor $WebExtract.AllElements contains the "USDT Funding Market" section.
The reason for this is that your initial web request only returns the HTML of the page. Many modern web applications like this one use JavaScript under the hood to call their APIs and fill the page with data after the initial HTML has loaded.
The good news is that with public websites like this you can usually hit the underlying APIs without worrying about authentication, which actually makes parsing the data easier, as it's typically in JSON format.
Try opening the debug console in your browser (usually F12), open the network tab and refresh the page. You can typically find these API requests/endpoints listed there.
Simply copy that URL out and invoke the web request against the API instead, for example:
$request = Invoke-WebRequest -Uri 'https://www.kucoin.com/_api/currency/prices?base=USD&targets='
$json = $request.Content | ConvertFrom-Json
$json.data
This might not be the specific API endpoint/data you're after, but hopefully this points you in the right direction. If you poke around in the console you should be able to find the one you're looking for.
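For anyone not on PowerShell, the same idea can be sketched in Python with only the standard library. The `fetch_json` and `extract_data` names are hypothetical, and the browser-like `User-Agent` header is an assumption on my part (some sites reject requests without one), not something the site is known to require.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL found in the browser's network tab and decode its JSON body."""
    # Assumption: a browser-like User-Agent avoids being rejected outright.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0",
                                "Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_data(payload: dict):
    """Return the 'data' member, mirroring $json.data in the PowerShell snippet."""
    return payload.get("data")
```

Usage would be `extract_data(fetch_json(url))` with whichever endpoint URL you copied from the network tab.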
EDIT:
Apologies for the misleading information here, I didn't notice that these numbers were updating in real time on the site. The above is still pretty common for a lot of websites so I'll leave it, however on sites that are this responsive/live another common technology used is Web Sockets.
If you have a look at the last lines of the console, you should be able to see a request ending with 'transport=websocket' to a URL like this:
wss://ws-web.kucoin.com/socket.io/?token=...%3D%3D&format=json&acceptUserMessage=false&connectId=connect_welcome&EIO=3&transport=websocket
If you select this line and head over to the Messages tab in your browser console/debugger, you'll be able to see the web socket messages being returned.
This looks like the data you're after, but I'm not hugely familiar with querying Web Sockets through PowerShell.
Simply invoking a web request won't work here, as it's using the web sockets protocol to handle the communication. You would also need to find a way to get a valid token for opening a web sockets connection (likely just a web request for that).
Perhaps these posts will help:
http://wragg.io/powershell-slack-bot-using-the-real-time-messaging-api/
How to use a Websocket client to open a long-lived connection to a URL, using PowerShell V2?
I'm building a REST API, and when a resource is created I normally return HTTP 201 Created along with a Location header to specify where that resource is located. But for some reason the HTTP client is not redirecting.
I'm using Postman for this. Does anyone have idea on this problem?
In short, a Location header is not sufficient to trigger a client redirect. It must be used in conjunction with a 3xx HTTP status code.
References:
https://en.m.wikipedia.org/wiki/HTTP_location
Redirecting with a 201 created
This is one of those things where the expectation does not meet what actually happens, and the first thing people think is "well that doesn't work properly", as has been suggested in other comments.
On a non-3xx response, Location is just an informational header, and clients such as Postman or curl would need to be explicitly instructed to act on it. Most won't do this by default, as that would be an unreasonable default.
YouTube, for example, returns a body for some responses and a Location header too. One example is video uploads: you send the original metadata for the video with a POST, and they respond with a Location URL, which is the endpoint to upload the video to. If clients just randomly redirected to that, you'd be having a bad time.
You can use Paw to make a "sequence", which I believe will let you take values from headers to reuse. This is also possible with Runscope Ghostinspector.
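If you'd rather script this than use a GUI tool, the distinction can be sketched in a few lines of Python. This is only an illustration of the rule described above, not how Postman or curl are implemented; the function names are made up, and the set of redirect status codes is the usual one (301, 302, 303, 307, 308).

```python
from typing import Mapping, Optional

# Status codes on which clients conventionally follow the Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def location_of(headers: Mapping[str, str]) -> Optional[str]:
    """Look up the Location header; header names are case-insensitive."""
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    return None


def redirect_target(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """URL a client should automatically follow, or None if it should not."""
    if status in REDIRECT_STATUSES:
        return location_of(headers)
    return None


def created_resource(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """URL of the newly created resource on a 201, for an explicit follow-up GET."""
    if status == 201:
        return location_of(headers)
    return None
```

With a 201 response, `redirect_target` returns nothing, and it is up to your script to take the value from `created_resource` and issue the next request itself.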
I want to capture how parameters are being sent. Usually what I do is to make a request and check on Firebug's params tab what are the parameters sent. However, when I try to do this on the following site (http://www.infraero.gov.br/voos/index_2.aspx), it doesn't work - I can't see what are the parameters in order to repeat this request using curl. How can I get it? I'm not sure but I think that cookies are being used.
EDIT
I was able to get the request content, but couldn't understand it. It seems it uses javascript to generate the proper request. How can I reproduce this request via cURL?
Did you see the previous question "cURL post data to asp.net page"? That might answer the question right there (all I did was search "ASP.NET cURL"). Another one, "Unable to load ASP.NET page using Python urllib2", talks about Python, but it approaches the problem in a way that should translate to cURL.
But for my $0.02, I wouldn't bother trying to untangle ASP.NET's __VIEWSTATE and JavaScript. Is it an absolute requirement that you use cURL?
I think you would be better off using a client that works more like a real browser and understands javascript. That's a bit of work, but it isn't as bad as it sounds. I've done this before with http://watirwebdriver.com/ and a short Ruby script. Here's how to do it with Python and Mechanize (this is probably a bit more lightweight).
http://phantomjs.org/ is another option that you script using javascript. If you Google "Scraping ASP.NET" you will see that this is a common problem.
You didn't say how you want it done, but you can send the request with curl simply with curl -d "name1=contents1&name2=contents2" [TARGETURL] etc. (the quotes stop the shell from treating the & as a command separator).
Note that you probably first need to fetch the main page, extract the "__VIEWSTATE" form field and submit that (VERY large) content back to get your submission accepted.
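The fetch-then-resubmit step can be sketched in Python with the standard library. This assumes the page carries its state in ordinary hidden `<input>` fields, as ASP.NET WebForms pages typically do; the class and function names are hypothetical, and the resulting string is what you would pass to `curl -d` or send as the POST body.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_form_body(page_html: str, extra_fields: dict) -> str:
    """Merge the page's hidden fields (e.g. __VIEWSTATE) with your own values
    and return a URL-encoded form body."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields.update(extra_fields)
    return urlencode(fields)
```

If the page builds its request in JavaScript rather than from form fields, this will not be enough, which is where the browser-driving tools mentioned above come in.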
I'm using the ASIHTTPRequest lib in my iOS app for making REST requests to a web app. I'm doing my best to use the correct verbs (GET, POST, PUT, DELETE) when making the various requests, but when making a POST request, I'm not sure I understand why it matters whether I include the parameters in the POST body or in the URL. It works both ways, so why should I include the parameters in the POST body instead of just including them in the URL? As I understand it, the only reason for including the parameters in a POST request is to keep them from being visible in the URL in case someone is looking over your shoulder, or something like that. But if I'm making a POST request from my iOS app and there's no browser involved, then does it really matter which way I do it?
Thanks so much for your wisdom, I'm still learning!
When using a POST request, it is good practice to put the parameters in the request body instead of the URL. In your case it works to put them in the URL, but this isn't always true: some scripts expect the parameters to be in a specific place and won't find them anywhere else. As for what POST is good for, it allows you to send more data. The HTTP specification sets no maximum URL length, but browsers and servers impose practical limits (often a few thousand characters), so you need to use the body if you want to send more than that. The body of a POST request also isn't restricted to URL-safe encoding; it can carry JSON, XML, or binary data as declared by its Content-Type.
As I understand it, the only reason for including the parameters in a POST request is to keep them from being visible in the URL in case someone is looking over your shoulder, or something like that.
You misunderstand it.
There are other issues. If your site makes changes to data based off a GET request, it's possible that spambots, search engines, browser prefetchers, and other automated tools will trigger potentially destructive data changes.
If the endpoint isn't under your control, it's entirely possible that it won't even accept the parameters as GET parameters. Most APIs require proper usage of the GET vs. POST verbs.
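To show the difference on the wire, here is a small Python sketch that builds (but does not send) the same POST both ways. The question is about ASIHTTPRequest on iOS, so this is only a language-neutral illustration; the function names and any URL you pass in are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def post_with_body(url: str, params: dict) -> Request:
    """POST with the parameters form-encoded in the request body."""
    return Request(
        url,
        data=urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def post_with_query(url: str, params: dict) -> Request:
    """POST with the parameters in the query string and an empty body."""
    return Request(f"{url}?{urlencode(params)}", method="POST")
```

A server-side handler that only reads the body will see nothing from the second request, and one that only reads the query string will see nothing from the first, which is why it matters where the endpoint expects them.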
I am posting (HTTP POST) various values to the posterous api. I am successfully able to upload the title, body, and ONE media file, but when I try to add in a second media file I get a server 500.
They do allow media and media[] as parameters.
How do I upload multiple files with the iPhone SDK?
The 500 you're getting is probably caused by one of two things:
An incorrect request
An error on the server
Now, if it's an incorrect request, the HTTP server would be more helpful responding with a 4xx status such as 415 (Unsupported Media Type). A 500 claims that something went wrong on the server rather than in your request.
You'll have to dig into the server API or code (if you wrote it), or read the docs, and figure out what's wrong with your second request. It seems like maybe you're not setting the appropriate media type?
EDIT: OK, so I looked at the API. It appears you're posting XML, so your request content type should be
Content-Type: application/xml
The API doc didn't specifically say, but that would be the correct type.
EDIT: Actually on second glance, are you just POSTing w/URI params? Their API doc isn't clear (I'm also looking rather quickly)
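If it turns out the API takes a regular form upload, sending several files usually means repeating the same field name in a multipart/form-data body. Below is a rough sketch in Python rather than Objective-C, just to show the shape of the body. That the API accepts multipart at all is an assumption on my part; only the `media[]` parameter name comes from the question, and `encode_multipart` is a made-up helper.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body.

    fields: list of (name, text_value)
    files:  list of (name, filename, content_type, data_bytes)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields:
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    # Repeating the same name (e.g. "media[]") yields one part per file.
    for name, filename, content_type, data in files:
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            data,
        ]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
```

Comparing a body like this against what your iPhone code actually sends (for example with a proxy) is a quick way to see whether the second file is going out under a different or duplicate field name.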