
How I accidentally added proxied download to onedrive-vercel-index🃏

12/29/2021 · spencer woo

🧐 TL;DR - I added proxied download purely by accident, just because I thought it was a necessary feature for the embedded PDF reader that I was actually refactoring.

My thought process

So, here's the story -

I am the author of onedrive-vercel-index - a OneDrive directory listing web app with features like previewing files (PDFs) and downloading files through OneDrive links extracted from its API. The project started out as onedrive-cf-index, a similar web app deployed on Cloudflare Workers that was poorly written and less maintainable. I rewrote the entire project with Next.js and deployed it on Vercel, taking advantage of its seamless integration with serverless functions, which is what onedrive-vercel-index is today.

Feature refactoring

It's safe to say that onedrive-vercel-index is better looking, more performant, and easier to deploy than its predecessor, and I was also trying to reach feature parity with the old project during the refactor. Two of those features star in today's story:

  • PDF preview in the browser.
  • Proxied download by relaying the download stream on the serverless instance.

The old project was implemented with no modern web framework whatsoever, manipulating DOM elements like it's nobody's business, and used a library called PDFObject to render the PDF preview. In the new project, as we were using React and Next.js, it was only reasonable to leverage a modern library for the rendering, which I initially did - with react-pdf.

The problem with react-pdf

It would be unfair to say that react-pdf is not feature-complete. It's just that all the features, including scrolling and updating the page number, pinch and zoom, the document outline, etc., have to be implemented by ourselves. I did implement some - but only basic ones like navigating the pages with buttons and jumping to specific pages. The UI was also quite basic, as react-pdf's killer feature is rendering individual PDF pages inside the webpage as if they were images.

The original react-pdf implementation of the PDF viewer

Onwards - new PDF viewer

Embedding the PDF directly

Anyway, I had to decide: either ditch react-pdf altogether and look for a more feature-complete React component for viewing PDFs, or just leverage the default PDF viewer that comes with most modern desktop browsers nowadays. I eventually settled on the latter, embedding the PDF into the website with <object> and <iframe> tags, which are totally viable options for embedding PDFs without using client-side JavaScript.

<object data="..." type="application/pdf" width="100%" height="100%">
  <iframe src="..." width="100%" height="100%" style="border: none;">
    This browser does not support PDFs.
  </iframe>
</object>

But - and this is a HUGE but - the download link of PDF files returned by OneDrive API looks like:

https://public.bn.files.1drv.com/y4ma94vzjJyqvNZ23de78mvzNm3jFwwKyRiQgn-v9ZFs8twkO_favEvQjeUy-4OhNn9VtF5uD2IM_XDIqRlHvE8QZi-G4fXWYX6jyuyUWCA4jz_F8L-QvAlkUA6PpcgoW7mXpPpJtT-rX4tkI7w-3dk12LjTaBaXXO1__IVTA_xxryDGulFH25pJlUo5us4wthSfkG937hF6syLsEjI6465Xa4u_EhdONV8UCt-xUpFALkcLea80VN-qd6ml_2NLdv4

This, on its own, looks quite normal and standard. But when I tried to embed the PDF with this URL (providing it to data and src in the JSX above), the page didn't load a native viewer - instead, it downloaded the file directly.

What went wrong?

After some Googling and examining URLs that actually bring up the native PDF viewer, such as ...

https://arxiv.org/pdf/1709.00440.pdf

... I found the reason: a header called content-disposition whose value starts with attachment. Let's take a closer look at the responses of these two URLs.

Both URLs return PDF files, and two of the headers that can come with the response are:

  • content-type - representing the type of the file, which is application/pdf in our case. And ...
  • content-disposition - the main culprit today.

The response from the former URL (which triggers a direct download instead of an in-browser preview) contains this header with the content:

content-disposition: attachment; filename="ECAM22.110.SB.pdf"

While the response from the latter doesn't contain this header at all.

This is the deciding factor between downloading the file and previewing it inside the browser. The attachment keyword indicates that the file is to be downloaded directly, regardless of whether it's opened in a browser window or embedded inside another website. If we want the file to be previewed in the browser, the attachment keyword should be replaced with inline.
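To make the distinction concrete, here is a minimal sketch of how the disposition type could be read from a header value. The name dispositionType is made up for illustration and is not part of onedrive-vercel-index. It assumes the usual convention that a missing header means inline, and that any type other than inline is handled like attachment.

```typescript
// Hypothetical helper (not from the project): reports how a browser is asked
// to handle a response, based only on the content-disposition header value.
type Disposition = 'inline' | 'attachment'

function dispositionType(header?: string): Disposition {
  // No header at all: the content is displayed inline by default.
  if (!header) return 'inline'
  // The disposition type is the token before the first ';', case-insensitive.
  const type = (header.split(';')[0] ?? '').trim().toLowerCase()
  // Anything that is not explicitly inline is treated like attachment.
  return type === 'inline' ? 'inline' : 'attachment'
}

console.log(dispositionType('attachment; filename="ECAM22.110.SB.pdf"')) // attachment
console.log(dispositionType(undefined)) // inline
```

The first call mirrors the OneDrive response above, the second one the arXiv response that carries no such header.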

Mitigation?

Since the direct download URL is returned by OneDrive API, we cannot manipulate its headers. If we want to change the header information so that our document gets embedded and previewed by the browser, we need to relay the response through our server, and add the adjustments to the response sent by us. It’s the only way to adjust response header information.

So, relaying the response through our servers ... that sounds suspiciously familiar - yep, it’s our proxied download feature!

Proxied download with custom headers

Without thinking too much, I set out to implement the proxied download feature. Inside Next.js, I created a new serverless API route /api/proxy to handle this request. It would accept an encoded url as a parameter, and pipe the download stream through our server before sending it to the client. Basically, the code looked something like:

import axios from 'axios'
import type { NextApiRequest, NextApiResponse } from 'next'

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // ... getting the decodedUrl

  const { headers, data: stream } = await axios.get(decodedUrl, {
    responseType: 'stream',
  })

  // ... some other functions

  // Send data stream as response
  res.writeHead(200, headers)
  stream.pipe(res)
}

We can see that the headers are exposed here in the relay, which means we can manipulate them before sending the response.

// Check if the requested file is a PDF based on content-type, and whether
// an inline preview was requested
const disposition = headers['content-disposition']
if (headers['content-type']?.startsWith('application/pdf') && inline && disposition) {
  // Swap the leading disposition type (attachment) for inline, and keep the
  // filename parameters exactly as OneDrive sent them. This works for both
  // filename="..." and filename*=UTF-8''... forms.
  headers['content-disposition'] = disposition.replace(/^\s*attachment/i, 'inline')
}

So, by implementing this new API route, we can effectively change the header of the PDF direct download link, so that the browser embeds the document instead of downloading it. We are only doing this to PDFs and will only change the header if an additional inline=true is present in the request URL. And ... we have implemented proxied download along the way - which was not my original intention.
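For illustration, a request to this route could be built roughly like this. This is only a sketch: buildProxyUrl is a made-up helper, and the query parameter names url and inline are assumptions based on the description above, not necessarily what the project uses.

```typescript
// Hypothetical client-side helper: builds a request URL for the /api/proxy route.
// Assumption: the route reads the target from `url` and the preview flag from `inline`.
function buildProxyUrl(downloadUrl: string, inline = false): string {
  // Encode the whole download URL so its own query string survives intact.
  const encoded = encodeURIComponent(downloadUrl)
  return `/api/proxy?url=${encoded}${inline ? '&inline=true' : ''}`
}

// Plain proxied download
const downloadLink = buildProxyUrl('https://example.com/files/report.pdf')
// Proxied response with the header rewritten for in-browser preview
const previewLink = buildProxyUrl('https://example.com/files/report.pdf', true)
console.log(downloadLink, previewLink)
```

Encoding the target with encodeURIComponent keeps any & or ? inside the OneDrive link from being mistaken for parameters of the proxy route itself.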

I had originally decided to pass on this particular feature of relaying the download stream through Vercel, as it may not contribute to faster download speeds considering Vercel is still blocked in some parts of mainland China - and faster downloads were the original purpose of proxied download on Cloudflare Workers. Anyway, what’s done is done. 🥶

Why was it useless after all?

Right after that, I came across Mozilla’s online demo of pdf.js - what I think is one of the most feature-complete PDF viewer implementations on earth. The online demo basically provides a pdf.js-powered preview page that renders any publicly available PDF online. You can simply provide it with an encoded PDF URL, and it will render the viewer along with complete document controls and more.

https://mozilla.github.io/pdf.js/web/viewer.html?file=<ENCODED_URL>
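Building that viewer URL is a one-liner. Here is a minimal sketch, where pdfViewerUrl and PDFJS_VIEWER are names I made up for this example and the viewer address is the one shown above:

```typescript
// Hypothetical helper: wraps a PDF URL in the address of Mozilla's hosted pdf.js demo viewer.
const PDFJS_VIEWER = 'https://mozilla.github.io/pdf.js/web/viewer.html'

function pdfViewerUrl(pdfUrl: string): string {
  // The PDF address goes into the `file` parameter, percent-encoded.
  return `${PDFJS_VIEWER}?file=${encodeURIComponent(pdfUrl)}`
}

console.log(pdfViewerUrl('https://arxiv.org/pdf/1709.00440.pdf'))
// https://mozilla.github.io/pdf.js/web/viewer.html?file=https%3A%2F%2Farxiv.org%2Fpdf%2F1709.00440.pdf
```

The resulting string is what would go into the src of the <iframe>.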

The barebones implementation from the previous sections requires a native PDF viewer, which is not available in most mobile browsers. Leveraging Mozilla’s pdf.js demo server, we can basically:

  • mitigate all compatibility issues on different platforms ...
  • use the URL provided by OneDrive API directly without hacky header adjustments ...
  • and have native controls and dark mode support. Awesome!

The new embedded PDF viewer based on Mozilla's pdf.js

So, at the end of the day, the PDF preview is built by embedding Mozilla’s pdf.js viewer inside our website with an <iframe>, and the new relay API was not used for its original purpose. Regardless, I provided buttons to download files proxied through Vercel if anyone ever wants to. (The inline=true parameter was also completely useless...)

References

https://stackoverflow.com/questions/6293893/how-do-i-force-files-to-open-in-the-browser-instead-of-downloading-pdf
https://pdfobject.com/static/

Cheers, and hope you enjoy onedrive-vercel-index. 🥳


Attribution, non-commercial, and sharealike.
