How to determine file extension of downloaded content - perl

I am downloading multiple files which may be of different types (e.g. PDF or TIFF). I would like to save each file with the correct file extension. I am able to look at the Content-Type header using:
$type = $mech->response->headers->header( 'Content-Type' );
From there I could make up my own file extensions based on the Content-Type found, but is there a Perl module that already does this? How else can it be done?

Related

How to read actual file contents by name in CSV Data Set config - Jmeter

I am writing JMeter tests for a REST API.
It's a POST request, and we need to send a large XML document in the request body.
So I am using a CSV Data Set Config to parameterize the XML content in the body.
I have created a CSV Data Set Config for the HTTP Request sampler.
In the CSV file I write the whole XML content, one row per request. This works fine.
But I find it a bit complex, as we have to maintain long lines of XML in the CSV file.
Is there any way to write only the XML file names or full paths in the CSV file, so that the CSV Data Set Config takes the name, reads the contents of that file and puts them in the request body?
file-abc.xml
file-def.xml
I think this would be easy to maintain as we can have dedicated files for XML content.
Is there any way to do this using the CSV Data Set Config?
Or is there any other way to achieve the same in JMeter tests?
I found this question: How to hold Xml file names in CSV Data set Config (Jmeter)
I followed its answer, but I am not able to pass the XML content in the request body.
It only passes the XML file names written in the CSV file.
According to the answer, though, it should read the file at the given path/name and pass its content in the parameter.
You can keep the file names or paths in the CSV file and read each file's content with the __FileToString() function directly in the HTTP Request sampler body, for example ${__FileToString(${xmlFile},,)}, where xmlFile is a placeholder for whatever variable name your CSV Data Set Config defines.
If you're keeping the XML files in a separate folder, you might find the Directory Listing Config plugin easier to use, because you can add, remove or rename files without having to maintain the CSV mapping.
The Directory Listing Config plugin can be installed using the JMeter Plugins Manager.

TYPO3: File download in Backend

I would like to integrate a CSV download in the Backend. The CSV file doesn't have to be saved on the server, so just a simple Array-to-CSV for download.
I know using FAL is quite tedious in TYPO3, so I would like to know if there is a simple solution for my issue, like calling a "download" action and returning a "CSV string" to download.
I used this solution for the download action, but I am looking for a solution without FAL and without keeping the file on the server.
No need for FAL or saving a file on the server. You can add a custom action in your controller that sets the content-type and disposition headers to treat your request like a download:
public function exportAction()
{
    // Just an example of how you could access the downloadable data.
    $records = $GLOBALS['TYPO3_DB']->exec_SELECTgetRows('*', 'tx_domain_model_table');
    // Convert the result to a CSV-encoded string, JSON or whatever you want it to be.
    // myConvert() is a placeholder for your own conversion function.
    $data = myConvert($records, 'csv');
    // These headers make the browser treat the response as a file download.
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="download.csv"');
    header('Pragma: no-cache');
    return $data;
}
Here $data is, for example, a CSV-encoded array.
The more interesting question is what kind of data you want to make downloadable. Setting the headers with header() and returning any simple data type should work.

How do I know what to name a file downloaded using HTTP?

I am creating an HTTP client downloader in Python. I am able to correctly download a file such as http://www.google.com/images/srpr/logo11w.png just fine. However, I'm not sure what to actually name the thing.
There is of course the filename at the end of the URL, but is this always reliable?
If I recall correctly, wget uses the following heuristic:
1. If a Content-Disposition header exists, get the filename from there.
2. If the filename component of the URL exists (e.g. http://myserver/filename), use that.
3. If there is no filename component (e.g. http://www.google.com), derive the filename from the Content-Type header (such as index.html for text/html).
In all cases, if this filename is already present in the directory use a numerical suffix, such as index (1).html, or overwrite, depending on configuration.
There are plenty of other flags that control other heuristics, such as creating .html for ASP/DHTML content-types.
In short, it really depends how far you want to go. For most people, doing the first two + basic Content-Type->name mapping should be enough.
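The heuristic described above can be sketched in Python roughly as follows. This is a minimal illustration, not wget's actual implementation: the function names, the small `KNOWN_TYPES` table and the fallback stem "index" are assumptions made for the example, and `headers` is assumed to be a plain dict of response headers.

```python
import mimetypes
import os
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit

# Illustrative table for common types; anything else falls back to the
# standard library's mimetypes database.
KNOWN_TYPES = {"text/html": ".html", "application/pdf": ".pdf", "image/png": ".png"}


def pick_filename(url, headers):
    """Choose a local filename: Content-Disposition, then URL, then Content-Type."""
    # 1. Prefer the filename parameter of Content-Disposition, if present.
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the last component so the server cannot choose a directory.
            name = posixpath.basename(name.replace("\\", "/"))
            if name:
                return name

    # 2. Fall back to the last path component of the URL.
    name = posixpath.basename(unquote(urlsplit(url).path))
    if name:
        return name

    # 3. No filename in the URL: derive an extension from Content-Type.
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    extension = KNOWN_TYPES.get(content_type) or mimetypes.guess_extension(content_type) or ""
    return "index" + extension


def add_suffix_if_taken(name, exists=os.path.exists):
    """Append " (1)", " (2)", ... before the extension until the name is free."""
    stem, ext = os.path.splitext(name)
    candidate, counter = name, 1
    while exists(candidate):
        candidate = f"{stem} ({counter}){ext}"
        counter += 1
    return candidate
```

A caller would typically run `add_suffix_if_taken(pick_filename(url, headers))` just before opening the output file; whether to add a suffix or overwrite is a policy choice, as the answer notes.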

Get the name of an uploaded file

I'm uploading a file using curl:
curl -X POST --data-binary @/home/me/my_file.jpb localhost:9001/upload
And here is how I store it:
def upload = Action(parse.temporaryFile) {
  request =>
    import java.io.File
    val f = new File("tmp/someName") // how do I get the name of the file being uploaded?
    request.body.moveTo(f, true)
    Ok("File uploaded\n")
}
Note that files can be in any format. I want to get the name of the actually uploaded file. I tried request.body.file.getName but it returns gibberish.
How do I do that?
I am fairly certain you cannot get the file name from the binary stream you are uploading via curl. You need to explicitly provide the file name separately.
The options I can think of are these:
If your Content-Type header is instead multipart/form-data, then the process is quite simple as described here
Upload JSON with a String for the file name and a binary portion for the file.
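To illustrate why the first option works, here is a small Python sketch that parses a hand-built multipart/form-data body with the standard library's email parser. The boundary string, field name and file name are made up for the example. Each part carries its own Content-Disposition header with a filename parameter, which a raw --data-binary upload does not have.

```python
from email import message_from_bytes

# Hypothetical boundary and body, roughly the shape of what
# `curl -F "file=@my_file.jpg" ...` would send.
boundary = "example-boundary"
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="file"; filename="my_file.jpg"\r\n'
    "Content-Type: application/octet-stream\r\n"
    "\r\n"
    "fake image bytes\r\n"
    f"--{boundary}--\r\n"
).encode("ascii")

# Prepend the request's Content-Type header so the parser knows the boundary.
raw = f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n".encode("ascii") + body
message = message_from_bytes(raw)

# Every part of a multipart message exposes its own filename, if it has one.
uploaded_names = [part.get_filename() for part in message.get_payload()]
print(uploaded_names)
```

With a raw binary body there is no such per-part header, so the name has to travel some other way, for example in a query parameter or custom header.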

How to link files directly from Github (raw.github.com)

Are we allowed to link files directly from GitHub?
<link rel="stylesheet" href="https://raw.github.com/username/project/master/style.css"/>
<script src="https://raw.github.com/username/project/master/script.js"></script>
I know this is allowed on Google Code. This way I don't have to worry about updating a local file.
The great service RawGit was already mentioned, but I'll throw another into the ring: GitCDN.link
Benefits:
Lets you link to specific commits, as well as auto-get the latest (aka master)
Incurs no damage from high traffic volumes; RawGit asks that its dev.rawgit.com links be used only during development, whereas GitCDN gives you access to the latest version without the danger of the servers exploding
Gives you the option of auto-minifying your HTML, CSS and JavaScript, or serving it as written (https://min.gitcdn.link).
Adds compression (GZip)
Adds all the correct headers (Content-Type, cache-control, e-tag, etc)
Full disclosure, I'm a project maintainer at GitCDN.link
You can use the external server rawgithub.com. Just remove the dot between 'raw' and 'github' (https://raw.github.com/.. => https://rawgithub.com/) and use it. You can find more info in this question.
However, according to the rawgithub website, it will be shutting down at the end of October 2019.
You can link directly to raw files, but it's best not to, since raw files are always sent with a text/plain Content-Type header, which can cause loading problems.
You need to carry out the following steps:
Get the raw URL of the file from GitHub, which is something like https://raw.githubusercontent.com/username/folder/example.css
Visit http://rawgit.com/ and paste the raw URL above into the input box. It will generate two URLs, one for development and one for production.
Copy either one of them and you are done.
The file will then be served as if from a CDN. You can also use gist URLs.
GitHub Pages: https://yourusername.github.io/script.js
GitHub repo raw files: https://github.com/yourusername/yourusername.github.io/blob/master/script.js
Use GitHub Pages, DO NOT use raw files.
Reason:
GitHub Pages is served through a CDN; raw files are not. Accessing raw files hits the GitHub servers directly and increases server load.
Add a branch named "gh-pages" to your project and, shortly after branching, you'll be able to use a direct URL such as https://username.github.io/project/master/style.css (substituting your own URL, and assuming "style.css" is a file in the "master" folder in the root of your "project" repository, and that your GitHub account is "username").
For those who ended up on this post and just want the raw link to an image on GitHub:
For an image, you can just add '?raw=true' at the end of the link to the file.
E.g.
Original link:
https://github.com/githubusername/repo_name/blob/master/20160309_212617-1.png
Raw link:
https://github.com/githubusername/repo_name/blob/master/20160309_212617-1.png?raw=true
Use jsdelivr.com
Copied directly from https://www.jsdelivr.com/?docs=gh:
load any GitHub release, commit, or branch
note: we recommend using npm for projects that support it
https://cdn.jsdelivr.net/gh/user/repo@version/file
load jQuery v3.2.1
https://cdn.jsdelivr.net/gh/jquery/jquery@3.2.1/dist/jquery.min.js
use a version range instead of a specific version
https://cdn.jsdelivr.net/gh/jquery/jquery@3.2/dist/jquery.min.js
https://cdn.jsdelivr.net/gh/jquery/jquery@3/dist/jquery.min.js
omit the version completely to get the latest one
you should NOT use this in production
https://cdn.jsdelivr.net/gh/jquery/jquery/dist/jquery.min.js
add ".min" to any JS/CSS file to get a minified version
if one doesn't exist, we'll generate it for you
https://cdn.jsdelivr.net/gh/jquery/jquery@3.2.1/src/core.min.js
add / at the end to get a directory listing
https://cdn.jsdelivr.net/gh/jquery/jquery/
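As a small illustration of the URL scheme quoted above, the following Python sketch assembles a jsDelivr GitHub URL from its parts, using the `@version` form that jsDelivr documents. The function name and parameters are hypothetical helpers for this example, not part of any jsDelivr client library.

```python
from urllib.parse import quote


def jsdelivr_url(user, repo, path, version=None):
    """Build a cdn.jsdelivr.net/gh URL; omit `version` to get the latest one."""
    repo_part = f"{user}/{repo}"
    if version:
        # A tag, commit hash, branch, or a range such as "3.2" or "3".
        repo_part += f"@{version}"
    # quote() leaves "/" intact, so nested paths such as dist/file.js survive.
    return f"https://cdn.jsdelivr.net/gh/{repo_part}/{quote(path)}"


print(jsdelivr_url("jquery", "jquery", "dist/jquery.min.js", "3.2.1"))
```

Pinning `version` is the safer default, since the docs quoted above advise against using the unversioned form in production.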
After searching for this same functionality, I ended up writing my own PHP script to act as a proxy. The trouble I kept running into was that even when you get the raw link from GitHub and link to it in your own page, the Content-Type header sent is 'text/plain', and Chrome would not execute my JavaScript file from GitHub. I also didn't like the other answers' suggestions to use third-party services, because of the obvious security and tampering risks.
With this script, I can pass in the raw link from GitHub, have the script set the correct headers, and then output the file as if it were coming from my own server. The script can also be used with a secure application to pull in non-secure scripts without triggering SSL warnings about "Non-secure links used".
Linking:
<script src="proxy.php?link=https://raw.githubusercontent.com/UserName/repo/master/my_script.js"></script>
proxy.php
<?php
###################################################################################################################
#
# This script can take two URL variables
#
# "type"
# OPTIONAL
# STRING
# Sets the type of file that is output
#
# "link"
# REQUIRED
# STRING
# The link to grab and output through this proxy script
#
###################################################################################################################
# Refuse requests without an http(s) link, so the script cannot be used to read local files
if(!isset($_GET['link']) || !preg_match('~^https?://~i', $_GET['link'])){
    http_response_code(400);
    exit('Missing or invalid "link" parameter');
}
# First we need to set the headers for the output file
# So check to see if the type is specified first and if so, then set according to what is being requested
if(isset($_GET['type']) && $_GET['type'] != ''){
    switch($_GET['type']){
        case 'css':
            header('Content-Type: text/css');
            break;
        case 'js':
            header('Content-Type: text/javascript');
            break;
        case 'json':
            header('Content-Type: application/json');
            break;
        case 'rss':
            header('Content-Type: application/rss+xml; charset=ISO-8859-1');
            break;
        case 'xml':
            header('Content-Type: text/xml');
            break;
        default:
            header('Content-Type: text/plain');
            break;
    }
# Otherwise, try and determine what file type should be output by the file extension from the link
}else{
    # See if we can find a file type in the link specified and set the headers accordingly
    # If css file extension is found, then set the headers to css format
    if(strpos($_GET['link'], '.css') !== false){
        header('Content-Type: text/css');
    # If json file extension is found, then set the headers to json format
    # (this must be tested before '.js', because every '.json' link also contains '.js')
    }elseif(strpos($_GET['link'], '.json') !== false){
        header('Content-Type: application/json');
    # If javascript file extension is found, then set the headers to javascript format
    }elseif(strpos($_GET['link'], '.js') !== false){
        header('Content-Type: text/javascript');
    # If rss file extension is found, then set the headers to rss format
    }elseif(strpos($_GET['link'], '.rss') !== false){
        header('Content-Type: application/rss+xml; charset=ISO-8859-1');
    # If xml file extension is found, then set the headers to xml format
    }elseif(strpos($_GET['link'], '.xml') !== false){
        header('Content-Type: text/xml');
    # If we still haven't found a suitable file extension, then just set the headers to plain text format
    }else{
        header('Content-Type: text/plain');
    }
}
# Now get the contents of our page we're wanting
$contents = file_get_contents($_GET['link']);
# Stop with an error status if the remote file could not be fetched
if($contents === false){
    http_response_code(502);
    exit('Could not fetch the requested link');
}
# And finally, spit everything out
echo $contents;
?>
If your web server has allow_url_include enabled, GitHub serving the raw files as text/plain is not a problem, since you can include the file in a PHP script first and set its headers to the proper MIME type.
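The header-selection logic of such a proxy can be sketched in Python as below. This is only an illustration under assumptions: the function name and the lookup table are invented for the example, and the table mirrors the types used in the PHP script above. Looking up the actual extension of the URL path, rather than searching the whole link for a substring, avoids a `.json` link being mistaken for `.js`.

```python
import posixpath
from urllib.parse import urlsplit

# Same mapping as the PHP proxy above; extend as needed.
CONTENT_TYPES = {
    ".css": "text/css",
    ".js": "text/javascript",
    ".json": "application/json",
    ".rss": "application/rss+xml; charset=ISO-8859-1",
    ".xml": "text/xml",
}


def content_type_for(link, requested_type=""):
    """Return the Content-Type to send for `link`, honouring an explicit type."""
    if requested_type:
        # An explicit "type" parameter wins over the link's extension.
        return CONTENT_TYPES.get("." + requested_type.lower(), "text/plain")
    # Use only the path, so query strings and fragments do not confuse the lookup.
    extension = posixpath.splitext(urlsplit(link).path)[1].lower()
    return CONTENT_TYPES.get(extension, "text/plain")


print(content_type_for("https://raw.githubusercontent.com/UserName/repo/master/my_script.js"))
```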