New COSMIC file download service
8 May 2018
As part of our ongoing efforts to improve and modernise COSMIC infrastructure, we have taken the hard decision to retire our SFTP server and move towards web-based endpoints for file downloads. There will be no change to the files that are available, only to the means of downloading them.
As of release v85, all of our files are available to download from "cancer.sanger.ac.uk/cosmic/file_download". On our downloads page you'll find buttons marked "Scripted download", which will show you a quick overview of how you can get the required file. There's a bit more information on our help pages about using a command-line tool or simple scripts, all of which we've included below.
Release v85 will be the last COSMIC dataset to be published to the SFTP server. As of release v86 (August 2018) we will no longer create SFTP accounts for new users and we will not upload the new release files to the server. At the time of release v87, around November 2018, we will shut down the SFTP server entirely.
If you use the SFTP server as part of a pipeline or scripted download process, please have a look at the documentation for the new service and give it a go as soon as possible. While moving away from SFTP is a necessary step in our ongoing process of modernisation, we realise that it may be an integral part of our users' workflows. We're keen to hear from you if you have problems using the new download system, or any suggestions for making it more usable.
COSMIC provides a simple interface for downloading data files. Downloading is a three-stage process:
- generate an HTTP Basic Auth credential string
- make an authenticated request to obtain a download link
- make a request to retrieve the data file from the returned link
1. Generate an authentication string
When you make the first request you must supply the email address that you used to register for COSMIC, along with your COSMIC password. The email address and password must be supplied in the Authorization header of the request. We use the HTTP Basic Auth protocol, which encodes the string using Base64 encoding, in order to avoid problems with non-word characters in passwords.
To generate your HTTP Basic Auth string, join the email address and password with a colon (:) and Base64-encode the resulting string. For example, using standard Unix command line tools (printf is used rather than echo so that a trailing newline is not encoded along with the password):
printf '%s' 'email@example.com:mycosmicpassword' | base64
ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZA==
Using that authentication string you can now make a request to obtain the download URL for the file.
You can use the same authentication string for all of your downloads. You only need to re-generate the string if you change your COSMIC password.
2. Make an authenticated request to get the download URL
The next request must supply the authentication string that you just generated, as a header on the request:
Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZA==
The path to the required file may be specified as part of the URL or using the "data" parameter. Using the command line tool cURL, you could make the request like this:
curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZA==" https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v84/classification.csv
Or, URL-encoding the file path and supplying it using the "data" parameter:
curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZA==" "https://cancer.sanger.ac.uk/cosmic/file_download?data=GRCh38%2Fcosmic%2Fv84%2Fclassification.csv"
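If you would rather assemble the request in code than on the command line, the following is a minimal Python sketch of the same two pieces: the Authorization header value and the two equivalent request URLs. The function names are our own, chosen for illustration, and are not part of the COSMIC service. The credentials are encoded exactly as email:password, with no trailing newline.

```python
import base64
from urllib.parse import quote

# Endpoint taken from the examples in this post.
BASE_URL = "https://cancer.sanger.ac.uk/cosmic/file_download"


def basic_auth_header(email, password):
    """Return the value of the Authorization header for HTTP Basic Auth."""
    raw = f"{email}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def path_style_url(filepath):
    """Request URL with the file path appended to the endpoint."""
    return f"{BASE_URL}/{filepath}"


def data_param_url(filepath):
    """Request URL with the file path URL-encoded in the 'data' parameter."""
    # safe="" makes quote() encode the slashes as %2F as well
    return f"{BASE_URL}?data={quote(filepath, safe='')}"


filepath = "GRCh38/cosmic/v84/classification.csv"
print(basic_auth_header("email@example.com", "mycosmicpassword"))
print(path_style_url(filepath))
print(data_param_url(filepath))
```

Either URL can then be requested with the header attached, using whichever HTTP client you prefer.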
You can find the path for a file on our download page.
If you have supplied valid COSMIC credentials, the server will return a small snippet of JSON containing a URL from which you can download your requested file:
{
"url" : "https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v84/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D"
}
If your credentials were not valid, you will receive a response with status code 401 Unauthorized and a JSON snippet with an error message:
{
"error" : "not authorised"
}
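In a script it helps to handle both kinds of response in one place. The sketch below is one possible way to do that, assuming only the two response shapes shown above (a "url" key on success, an "error" key with status 401 on failure). The function and exception names are hypothetical.

```python
import json


class CosmicAuthError(Exception):
    """Raised when the server rejects the supplied credentials."""


def extract_download_url(status_code, body):
    """Return the download URL from a response body, or raise on failure.

    status_code is the HTTP status of the response and body is its text.
    """
    payload = json.loads(body)
    if status_code == 401:
        # e.g. {"error": "not authorised"}
        raise CosmicAuthError(payload.get("error", "not authorised"))
    if "url" not in payload:
        raise ValueError(f"unexpected response: {payload!r}")
    return payload["url"]
```

With the requests library, for example, you would pass r.status_code and r.text from the authenticated request.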
3. Download the data file
You can now extract the URL from the JSON snippet and make a request to that URL to download the data file:
curl -o classification.csv 'https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v84/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D'
You do not need to provide the HTTP Basic Auth header for this request. The download URL is valid for one hour.
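Because the link expires, a long-running pipeline may want to check how much time is left before starting a large download. The sketch below reads the "Expires" query parameter that appears in the example URL above. It assumes that this value is a Unix timestamp in seconds, which is our reading of the example rather than something the service documents, and the function name is hypothetical.

```python
import time
from urllib.parse import parse_qs, urlsplit


def seconds_until_expiry(download_url, now=None):
    """Return the number of seconds before the download link expires.

    Assumes the 'Expires' query parameter is a Unix timestamp in seconds.
    A negative result means the link has already expired.
    """
    query = parse_qs(urlsplit(download_url).query)
    expires = int(query["Expires"][0])
    if now is None:
        now = time.time()
    return expires - now
```

If the result is small or negative, repeat the authenticated request from step 2 to obtain a fresh link.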
Python/Perl Downloads
You can achieve the same thing using any programming language; here are examples for both Python and Perl.
#!python3
import requests
email = "email@example.com"
password = "mycosmicpassword"
url = "https://cancer.sanger.ac.uk/cosmic/file_download/"
filepath = "GRCh38/cosmic/v85/classification.csv"
filename = "classification.csv"
# get the download URL (requests builds the HTTP Basic Auth header from the tuple)
r = requests.get(url+filepath, auth=(email, password))
# stop here on 401 Unauthorized or any other HTTP error
r.raise_for_status()
download_url = r.json()["url"]
# get the file itself
r = requests.get(download_url)
r.raise_for_status()
# write the file to disk
with open(filename, "wb") as f:
    f.write(r.content)
#!perl
use strict;
use warnings;
use LWP::UserAgent;
use MIME::Base64;
use URI;
use JSON;
my $ua = LWP::UserAgent->new;
my $email = 'email@example.com';
my $password = 'mycosmicpassword';
my $url = 'https://cancer.sanger.ac.uk/cosmic/file_download/';
my $filepath = 'GRCh38/cosmic/v85/classification.csv';
my $filename = 'classification.csv';
# build the URL for the file to be downloaded
my $uri = URI->new("$url$filepath");
# get the download URL; the empty second argument stops encode_base64
# from appending a newline to the encoded string
my $r = $ua->get( $uri, Authorization => 'Basic ' . encode_base64("$email:$password", '') );
die 'request failed: ' . $r->status_line unless $r->is_success;
# decode the JSON string in the response and extract the download URL
my $json = decode_json $r->content;
my $download_url = $json->{url};
# get the file itself and save it to disk
$r = $ua->get($download_url, ':content_file' => $filename);
die 'download failed: ' . $r->status_line unless $r->is_success;
File Manifest
To use the download service you need to know the path of the file that you want to download. Details of the files in the current release can be found on the main download page, but the new download service makes available the four most recent COSMIC releases. To explore the older releases, the file download endpoint can return lists of files from any of the available releases. The lists can be viewed in a browser or requested programmatically.
Files are stored in a hierarchical system, starting with the genome build, then whether they are from cosmic or cell lines, and finally the release version. You can explore it using "curl" like this:
curl https://cancer.sanger.ac.uk/cosmic/file_download
Which will elicit the response:
[ "GRCh37", "GRCh38"]
showing that you can see files from either the GRCh37 or GRCh38 datasets. With a filter applied, the response will give the next layer of options:
curl https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38
[ "GRCh38/cell_lines", "GRCh38/cosmic" ]
The final level is the COSMIC release version. You can keep walking down the directory-like structure until a list of available files is returned:
curl https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v85
[
"GRCh38/cosmic/v85/All_COSMIC_Genes.fasta.gz",
"GRCh38/cosmic/v85/COSMIC_ORACLE_EXPORT.dmp.gz.tar",
"GRCh38/cosmic/v85/CosmicBreakpointsExport.tsv.gz",
"GRCh38/cosmic/v85/CosmicCompleteCNA.tsv.gz",
...
You can then use these file paths to download the individual files as explained above.
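The same walk can be scripted. The following is a rough Python sketch that descends the hierarchy and collects every file path. It assumes the three levels described above (genome build, project, release version) and that each level returns full paths, as in the example responses. The function names are hypothetical, and the HTTP call is passed in as a parameter so that the walking logic can be tried without network access.

```python
def list_release_files(fetch, prefix="", depth=3):
    """Walk the listing hierarchy and return every file path found.

    fetch(path) must return the decoded JSON list for that path;
    depth is the number of path components (build/project/version)
    at which a listing contains files rather than sub-levels.
    """
    entries = fetch(prefix)
    if prefix and len(prefix.split("/")) >= depth:
        # we are at a release version, so the entries are file paths
        return list(entries)
    files = []
    for entry in entries:
        files.extend(list_release_files(fetch, entry, depth))
    return files


def http_fetch(path):
    """One possible fetch function, using the requests library."""
    import requests  # imported here so the walker itself has no dependencies

    base = "https://cancer.sanger.ac.uk/cosmic/file_download"
    url = f"{base}/{path}" if path else base
    response = requests.get(url)
    response.raise_for_status()
    return response.json()
```

Calling list_release_files(http_fetch, "GRCh38/cosmic") would then return the file paths for every available GRCh38 COSMIC release.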
Download The Latest Release
The file download endpoint also accepts 'latest' as the version number. Passing this guarantees that you get download links for the very latest release of COSMIC. For example:
curl https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/latest
[
"GRCh38/cosmic/v85/All_COSMIC_Genes.fasta.gz",
"GRCh38/cosmic/v85/COSMIC_ORACLE_EXPORT.dmp.gz.tar",
"GRCh38/cosmic/v85/CosmicBreakpointsExport.tsv.gz",
"GRCh38/cosmic/v85/CosmicCompleteCNA.tsv.gz",
...
Here the most recent release is v85, so when passed `latest`, the response will give the links for all cosmic, GRCh38, v85 files. This also works for the actual file download. So if you know that you just want a specific file from the latest release, you can call:
https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/latest/classification.csv
and always be guaranteed the newest available version.
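When a pipeline requests 'latest', it is often useful to record which release was actually delivered. As the listing above shows, the returned paths carry the real version number, so it can be read back from them. The sketch below assumes paths of the form build/project/vNN/file, as in the examples, and the function name is hypothetical.

```python
import re

# build/project/version/..., e.g. GRCh38/cosmic/v85/classification.csv
VERSION_PATTERN = re.compile(r"^[^/]+/[^/]+/(v\d+)/")


def release_versions(paths):
    """Return the set of release versions that appear in a list of file paths."""
    versions = set()
    for path in paths:
        match = VERSION_PATTERN.match(path)
        if match:
            versions.add(match.group(1))
    return versions
```

For a 'latest' listing the set should contain a single version, which you can write to your log alongside the downloaded files.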
Multiple File Downloads
By combining the methods for listing available files and for downloading files, it becomes easy to script multiple file downloads. For example, in Python you could do something like this:
#!python3
import requests
email = "email@example.com"
password = "mycosmicpassword"
url = "https://cancer.sanger.ac.uk/cosmic/file_download/"
files = requests.get("https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v85")
# get the download URLs
for filepath in files.json():
    r = requests.get(url+filepath, auth=(email, password))
    download_url = r.json()["url"]
    # get each file itself
    r = requests.get(download_url)
    ...
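You will rarely want every file in a release. As a sketch of one way to narrow the list before downloading, the helper below filters the file paths by a shell-style pattern and works out a local file name for each. The function name and the pattern are our own illustration and are not part of the download service.

```python
import fnmatch
import posixpath


def plan_downloads(paths, pattern="*"):
    """Map each remote path whose file name matches pattern to a local name."""
    plan = {}
    for path in paths:
        # the local name is the last component of the remote path
        name = posixpath.basename(path)
        if fnmatch.fnmatchcase(name, pattern):
            plan[path] = name
    return plan


listing = [
    "GRCh38/cosmic/v85/All_COSMIC_Genes.fasta.gz",
    "GRCh38/cosmic/v85/CosmicBreakpointsExport.tsv.gz",
    "GRCh38/cosmic/v85/CosmicCompleteCNA.tsv.gz",
]
for remote, local in plan_downloads(listing, "*.tsv.gz").items():
    print(remote, "->", local)
```

Each remote path in the plan can then be passed through the authenticated request and download steps, writing the content to the matching local name.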
Right now the file download service is still in beta. If you have any issues or suggestions for improvements, please let us know. If you have any questions about the new file download service, or the deprecation of SFTP, please get in touch with us at cosmic@sanger.ac.uk or on Twitter @cosmic_sanger.