How to export a Confluence page to PDF from a script (wget)

I'd like to automatically export some Confluence pages to PDF.
A page can be downloaded with this URL:
http://<confluence server>/confluence/spaces/flyingpdf/pdfpageexport.action?pageId=<pageID>
When I type this URL into a browser, it works perfectly.
But when I try to download it with wget, an HTML page is downloaded instead (asking for login and password). I tried providing the login and password with the --user and --password wget options, but that does not work.
Do you have an idea how to provide Confluence credentials to the wget command? Or another way to download the PDF page?

If you are using a Confluence Server before Confluence 5.5 you are in luck! Confluence has an API to handle this, see their documentation.
Update: If you are using Confluence Server 5.5 or later, the API for this is not enabled by default. See Confluence Administration > Further Configuration to enable the XML-RPC and SOAP APIs. (Thanks #fatpanther for pointing this out.)
The new REST API does not support this, see the REST API documentation.
You may be able to use the Confluence Command Line Interface to export to PDF.

First request the resource:
curl -D- -u user:pwd -X GET -H "Content-Type: application/json" "https://your-url/confluence/spaces/flyingpdf/pdfpageexport.action?pageId=12345678"
Extract the "Location" value from the response headers (the server answers with a 302 redirect; e.g. grep | cut), then repeat the query with the adjusted URL and MIME type:
curl -D- -u user:pwd -X GET -H "Content-Type: text/html;charset=UTF-8" "https://your-url/$LOCATION_JUST_EXTRACTED" --output file.pdf
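The extraction step between the two curl calls could be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names `extract_location` and `absolute_pdf_url` are made up for this example, and the input is assumed to be the header dump that `curl -D-` prints for the first request.

```python
from typing import Optional
from urllib.parse import urljoin


def extract_location(raw_headers: str) -> Optional[str]:
    """Return the value of the Location header from a raw header dump, if any."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive; the status line has no colon
        # and is therefore skipped.
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None


def absolute_pdf_url(export_url: str, raw_headers: str) -> str:
    """Resolve the (possibly relative) Location against the export URL."""
    location = extract_location(raw_headers)
    if location is None:
        raise ValueError("no Location header in the response")
    return urljoin(export_url, location)
```

The resulting URL is what the second curl call would download with `--output file.pdf`.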

Narcolessico's answer worked for me, but it took me some time to completely understand the approach. I will add some detail to that answer.
NOTE: I am using Java (Apache HttpClient) to perform the HTTP GET requests to the Confluence server.
I used Chrome to navigate to the Confluence page I wanted to export to PDF. I expanded the tools menu, right-clicked on 'Export to PDF', and then clicked on 'Inspect'. This will reveal the underlying HTML element for this menu option containing the link used to launch the PDF export operation.
[Screenshot: inspecting the 'Export to PDF' element to find the URL]
The element inspection revealed the relative link to the PDF export action as follows.
[Screenshot: HTML source of the element]
From Java, if you perform an HTTP GET to https://your-confluence-server-hostname/the-relative-link-from-step-2, you will need to disable redirect handling. This is where Narcolessico's answer confused me, as I was getting different responses from cURL and from Java. Once I realized that the cURL operation was returning a 302 response and that the Apache HttpClient was handling it automatically, I found a way to disable the automatic redirect handling so that I could capture the Location header.
The code to disable the auto redirect handling is as follows.
import org.apache.http.HttpEntity;
import org.apache.http.HttpHeaders;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;

// sslContext and authorizationHeaderValue are assumed to be defined elsewhere.
final HttpClient client = HttpClientBuilder
        .create()
        .setSSLContext(sslContext)
        .disableRedirectHandling() // disable the auto handling here
        .build();
final String urlToGetLocation = "https://<your-confluence-server-hostname><the-relative-link-from-step-2>";
final HttpGet request = new HttpGet(urlToGetLocation);
// You'll need to provide Basic Auth credentials: "Basic " followed by the
// base-64 encoded username:password string. Otherwise the Location header
// returned will be a redirect to the login page.
request.setHeader(HttpHeaders.AUTHORIZATION, authorizationHeaderValue);
request.setHeader(HttpHeaders.CONTENT_TYPE, "application/json");
final HttpResponse response = client.execute(request);
final HttpEntity payload = response.getEntity();
NOTE: I am also overriding the SSL context to do nothing. That is another issue you may need to contend with if Confluence is using HTTPS.
On a side note, if you perform a cURL GET for the URL above, you get a response like the following.
[Screenshot: redacted cURL output]
The GET request above and its 302 response reveal the location of the PDF document, which you can then download. The Location header of the 302 response can be read as follows.
final Header[] headers = response.getHeaders(HttpHeaders.LOCATION);
final String location = headers[0].getValue();
This is a url in the form of the following.
/download/temp/pdfexport-20190924-240919-0526-189/a-filename-for-pdf.pdf?contentType=application/pdf
The Location header above contains the URL of the exported/generated PDF. You can then make a subsequent HTTP GET to that URL to download the generated PDF document.
If you're using the Apache HttpClient, you'll need to use automatic redirect handling for this subsequent GET request.
All credit to Narcolessico for this answer. I simply wanted to add the details I had to sort out to get it to work from Java.
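For comparison, the same "do not follow the redirect, read Location instead" idea could be sketched with Python's standard library. This is an illustration only, not the Java code from this answer: `NoRedirect`, `build_no_redirect_opener` and `fetch_location` are hypothetical names, and the `authorization` argument is assumed to be a ready-made "Basic ..." header value.

```python
import urllib.error
import urllib.request
from typing import Optional


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to build a follow-up request,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def build_no_redirect_opener() -> urllib.request.OpenerDirector:
    # Passing a subclass replaces the default redirect handler.
    return urllib.request.build_opener(NoRedirect)


def fetch_location(url: str, authorization: str) -> Optional[str]:
    """GET `url` without following redirects and return its Location header."""
    request = urllib.request.Request(url, headers={"Authorization": authorization})
    try:
        with build_no_redirect_opener().open(request) as response:
            return response.headers.get("Location")
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        raise
```

As in the Java version, the follow-up download of the generated PDF would use a normal client with redirect handling left on.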

Related

Why does the GitHub Actions REST API download artifacts by creating a temporary URL?

I am following the docs here https://docs.github.com/en/rest/actions/artifacts#download-an-artifact to use the GitHub Actions REST API to download artifacts. Given an ARTIFACT_ID, and an access token if the repo is private, one can call the API via cURL or the GitHub CLI to get a response from GitHub. The response headers contain Location: ..., which provides a temporary URL, valid for 1 minute, from which the artifact can be downloaded. The artifact can then be downloaded via a second call to cURL.
I would like to know the reason for this design decision on the part of GitHub. In particular, why not just return the artifact in response to the first call to cURL? Additionally, given that the first call to cURL is intended to return a temporary URL from which the artifact can be retrieved, why not return this temporary URL directly in the response body rather than only in a header? Other information, such as whether the credentials are bad or the object has been moved, is returned as JSON when this cURL command is run, so why can't the temporary URL be included there as well?
To help clarify my question, here is some relevant code:
# The initial cURL command looks something like this:
curl -v \
-H "Accept: application/vnd.github+json" \
-H "Authorization: token <TOKEN>" \
https://api.github.com/repos/OWNER/REPO/actions/artifacts/ARTIFACT_ID/ARCHIVE_FORMAT
# The temporary URL, which can be curled to retrieve the artifact, looks something like this:
curl "https://pipelines.actions.githubusercontent.com/serviceHosts/<HEXSTRING>/_apis/pipelines/1/runs/16/\
signedartifactscontent?artifactName=<artName>&urlExpires=<date>&urlSigningMethod=HMACV2&urlSignature=<SIGNATURE>"
Additionally, I am currently capturing the standard error of the cURL command and then running a regex on it to extract the temporary URL. Is there a better way to do this? For example, is there a flag I could pass to cURL that would give me the value of Location directly?
Finally, the docs state that the archive_format must be zip. Given that this is the case, what is the benefit of having this parameter? Is it not redundant? If so, what is the benefit of this redundancy?
This is a consequence of a 2011 design decision described in https://github.blog/2011-08-02-nodeload2-downloads-reloaded/:
When implementing a proxy of any kind, you have to deal with clients that can’t read content as fast as you can send it.
When an HTTP server response stream can’t send any more data to you, write() returns false.
Then, you can pause the proxied HTTP request stream, until the server response emits a drain event.
The drain event means it’s ready to send more data, and that you can now resume the proxied HTTP request stream.
To avoid DDoS, it is better to manage that stream from a temporary URL rather than a fixed one.
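You can use -D to dump the response headers, but you would still need to post-process that output to get the redirection URL. Alternatively, curl -s -o /dev/null -w '%{redirect_url}' <URL> prints the URL the redirect points to without following it.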

Understanding the GET/POST/DELETE requests on a basic level?

I'm currently learning to use REST API (from WooCommerce in this case) and got some basic questions:
How to see complete request string in Postman software?
I'm testing a simple GET request which works great with for example:
<host>/wp-json/wc/v3/products
to receive the product list. In this case I use the authorization tab to enter my user/pass as Basic Auth.
I also tested curl.exe using another simple Windows command prompt. This also returned product list:
curl.exe <host>/wp-json/wc/v3/products -u mykey:mysecret
What is the difference between them? The last example is a simple GET, I assume, although it's not stated. How about POST or DELETE etc.? This is what I don't understand: an HTTPS request can only have an address and optional parameters. Where and how does "GET" come into the picture?!
If possible, I would like to see the complete URL request (as one string) from the working Postman example.
My last question is about testing the same method on another server/service which is not WooCommerce. As far as I know, this service was created with something called Swagger:
curl "<host>/orderapi/item" -H "accept: application/json" -H "X-Customer: <customer>" -H "X-ApiKey: <mykey>" -H "X-ApiSecret: <mysecret>" -H "Content-Type: application/json"
This also returns a list, in this case of orders instead of products. All good.
But for this example I haven't figured out how to achieve the same request in Postman. What auth method should I use?
And again, I don't understand the GET/POST/DELETE thing. And I also would like to see the complete request as one-string.
1) How to see complete request string in Postman software? I would like the see the complete URL request (as one string) from the working Postman example
On version 9.x.x, the request view shows the chosen method (marked in yellow in the original screenshot) and the button for the code window (red arrow), where you get the actual curl code.
2) What is the difference between them? The last example is a simple GET, i assume, although it's not stated. How about POST or DELETE etc? Where and how does "GET" come into the picture?
From the curl documentation:
-X, --request
(HTTP) Specifies a custom request method to use when communicating
with the HTTP server. The specified request method will be used
instead of the method otherwise used (which defaults to GET). Read the
HTTP 1.1 specification for details and explanations. Common additional
HTTP requests include PUT and DELETE, but related technologies like
WebDAV offers PROPFIND, COPY, MOVE and more.
GET is the default method for curl, which means:
curl.exe <host>/wp-json/wc/v3/products -u mykey:mysecret
is the same as:
curl.exe <host>/wp-json/wc/v3/products -u mykey:mysecret -X "GET"
So, for POST, DELETE, etc., change the -X parameter, for example:
curl.exe <host>/wp-json/wc/v3/products -u mykey:mysecret -X "POST" [...otherOptions]
(Assuming that you can receive a POST on the url above)
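To make the "where does GET come from" point concrete: the method is not part of the URL. It is the first word of the HTTP request line (for example `GET /wp-json/wc/v3/products HTTP/1.1`), and clients fill it in for you. A small Python sketch using only the standard library shows the same defaults; the host `example.com` is a placeholder, and nothing is actually sent over the network here.

```python
import urllib.request

url = "https://example.com/wp-json/wc/v3/products"  # placeholder host

# No method given and no body: the request line will start with GET.
default_request = urllib.request.Request(url)

# Supplying a body switches the default to POST, as curl does with -d.
post_request = urllib.request.Request(url, data=b'{"name": "demo"}')

# An explicit method, the counterpart of curl's -X option.
delete_request = urllib.request.Request(url, method="DELETE")

for request in (default_request, post_request, delete_request):
    print(request.get_method(), request.full_url)
```

The URL is identical in all three cases; only the method differs.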
3) [On another server/service] I haven't figured out how to achieve the same request in Postman. What auth method should I use?
Each -H option specifies a header you are passing. You have these in your example:
accept: application/json
X-Customer:
X-ApiKey:
X-ApiSecret:
Content-Type: application/json
You need to add those on the Headers tab in Postman. In this case you don't need to specify an auth method, since you're already sending the API key in a header. Alternatively, you can set the authorization type to "API Key" and put X-ApiKey as the key and your API key as the value. It will generate the same request as adding the header by hand.
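The same header-only authentication can be sketched outside Postman as well. The snippet below builds (but does not send) the request with Python's standard library; `build_order_request` is a hypothetical helper name, and the host and credential values passed to it are placeholders.

```python
import urllib.request


def build_order_request(host: str, customer: str, api_key: str,
                        api_secret: str) -> urllib.request.Request:
    """Build the GET request from the curl example, one dict entry per -H."""
    headers = {
        "accept": "application/json",
        "X-Customer": customer,
        "X-ApiKey": api_key,
        "X-ApiSecret": api_secret,
        "Content-Type": "application/json",
    }
    # No body and no explicit method, so this is a GET, like the curl call.
    return urllib.request.Request(f"{host}/orderapi/item", headers=headers)
```

Each entry in `headers` corresponds to one row on Postman's Headers tab.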
curl uses the GET method by default. If you want to change the HTTP method of your request, there's the -X option, for example:
$ curl -X DELETE https://example.com
Postman has something called the Postman Console, which you can open by pressing Alt + Ctrl + C and where you can see more details about requests and responses.
Postman also lets you import curl commands, so you don't need to prepare the request manually; you can simply paste the curl command into Postman.
There are many resources online on the specifics, e.g. how to import a curl command.

Replicating a POST request from a website form using Postman returns 500 Internal Server Error

I'm trying to replicate, via Postman, a POST request normally made by a website form, but the server returns a 500 error.
The URL of the form I'm dealing with is here.
What I have done so far is inspect the network request using the Chrome or Safari dev tools, copy the request as cURL, import the cURL command into Postman, and send the request.
What are the possible reasons for the failure, and what are alternative ways to achieve the same result?
[Screenshot: Postman headers]
Most probably you used an invalid request body. The browser shows the parsed JSON body, so you might have copied an incomplete request body.
To get full body click view source and copy the full content.

How to get the request body and headers in Curl format in latest Postman version v5.5.0?

I have a working POST API call in Postman. I am planning to call it from bash using a curl script. How can I convert my current request into a curl script?
You can use the Postman code feature to generate snippets in various different languages.
This is a quick example showing the generated cURL snippet using the headers, request body etc.
You can also do the reverse with a cURL request: it can be imported into Postman from a file, or pasted straight into the application as raw text.
These options can be found in the Import section.
Once imported, Postman will populate the different areas with the raw request data.

How to post credentials using POSTMAN client to create a cookie based session

I am using the Postman client to make REST calls to the JIRA API. The documentation says "POST your credentials to http://jira.example.com:8090/jira/rest/auth/1/session" to get a session. I tried posting with form-data, application/x-www-form-urlencoded, raw, etc. Nothing worked. What is the right way to do that?
Here is the tutorial I am following: https://developer.atlassian.com/jiradev/jira-apis/jira-rest-apis/jira-rest-api-tutorials/jira-rest-api-example-cookie-based-authentication
Since you're using postman, I'm assuming you're in a dev environment. In this case, it might be simpler to get going with the auth header, which is a base-64 encoded username/password. From the documentation here:
Supplying Basic Auth headers
If you need to you may construct and send basic auth headers yourself. To do this you need to perform the following steps:
Build a string of the form username:password
Base64 encode the string
Supply an "Authorization" header with content "Basic " followed by the encoded string. For example, the string "fred:fred" encodes to "ZnJlZDpmcmVk" in base64, so you would make the request as follows.
curl -D- -X GET -H "Authorization: Basic ZnJlZDpmcmVk" -H "Content-Type: application/json" "http://kelpie9:8081/rest/api/2/issue/QA-31"
In the Headers section of Postman, add Authorization with Basic <base64-encoded-username:password>
Don't forget to also add the header Content-Type as application/json
(You can use base64encode.org to quickly encode your username/password).
Don't forget to put the string in as username-colon-password (username:password)
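If you would rather build the header value in code than use an online encoder, the steps quoted above can be sketched like this. `basic_auth_header` is a hypothetical helper name; the `fred:fred` credentials are the ones from the documentation excerpt.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic auth."""
    # Step 1: username:password; step 2: Base64; step 3: prefix with "Basic ".
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("fred", "fred"))  # Basic ZnJlZDpmcmVk
```

The returned string is what goes into the Authorization row in Postman's Headers section.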
If you are on the same Postman UI as I am, click Authorization, select an auth type (I used Basic Auth successfully), and then enter your credentials. Next, click over to the Body tab, select raw, choose JSON (application/json) in the drop-down menu on the right, and supply the body as normal.
That is the first hurdle. The next hurdle which may be hit (and the one I am stuck on) is that once your basic-auth gets accepted, JIRA will deny access as part of Cross Site Request Forgery checks (XSRF) with a code 403. I have a ticket open right now seeing if there is a possible workaround to post and put from postman, because using postman and newman would be much much simpler than building an entire plugin which I have to jump through a bunch of hoops to access.
With Postman you can simply add withCredentials:true to your request header section.