🏛️
warc-embed-netlify
Experimental proxy and wrapper for safely embedding Web Archives (.warc.gz, .wacz) into web pages.
This particular implementation uses Netlify and its Edge Functions as its backbone.
See also: warc-embed (Self-hosted + NGINX version)
Summary
Concept
"It's a wrapper"
warc-embed-netlify serves an HTML document containing a pre-configured instance of <replay-web-page>, Webrecorder's front-end archive playback system, pointing at a proxied version of the requested archive.
For security reasons, playback only starts when that document is embedded in a cross-origin <iframe> (XSS prevention: an <iframe> that needs both allow-scripts and allow-same-origin is only effectively sandboxed when it does not share an origin with its parent).
See details for the /embed route.
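As a rough illustration of the cross-origin requirement, the sketch below shows one way a page could check that it is framed by a parent on a different origin before starting playback. The helper name `isCrossOriginEmbed` is hypothetical, and this is not necessarily the check the project performs.

```javascript
// Hypothetical helper: returns true only when `win` is framed by a parent
// whose origin differs from its own. Not taken from the project's source.
function isCrossOriginEmbed(win) {
  // Not framed at all: a top-level window is its own parent.
  if (win.parent === win.self) return false;
  try {
    // A same-origin parent exposes its location; a cross-origin one throws.
    void win.parent.location.href;
    return false;
  } catch (err) {
    return true;
  }
}
```

In a browser this would be called as `isCrossOriginEmbed(window)`, with playback started only when it returns `true`.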
"It's a proxy"
warc-embed-netlify pulls the requested archive file and adds the HTTP headers <replay-web-page> requires in order to download and interpret the file, such as access-control-allow-origin and content-type.
It also offers a very basic polyfill for range requests, which are required for playing back .wacz files, for cases where the server hosting the archive file does not support them.
See details for the /archive.warc.gz and /archive.wacz routes.
Example
<!-- On https://*.domain.ext: -->
<iframe
src="https://warcembed.domain.ext/embed/?archive-url=https://otherdomain.ext/archive.warc.gz&original-url=https://what-was-archived.ext/path"
sandbox="allow-scripts allow-modals allow-forms allow-same-origin"
>
</iframe>
Deployment
Allowlist
The proxy will only pull archive files from hosts listed in allowlist.js.
Edit this file to determine which domains a specific instance of the proxy can pull files from.
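The check this implies could look roughly like the sketch below. The shape of the allowlist (an array of bare hostnames) and the names `ALLOWLIST` and `isAllowedArchiveUrl` are assumptions made for illustration; the actual contents and format of allowlist.js may differ.

```javascript
// Assumed allowlist shape: an array of exact hostnames (hypothetical).
const ALLOWLIST = ["otherdomain.ext"];

// Hypothetical helper: true when `archiveUrl` is an http(s) URL whose
// hostname exactly matches an allowlist entry.
function isAllowedArchiveUrl(archiveUrl, allowlist = ALLOWLIST) {
  let parsed;
  try {
    parsed = new URL(archiveUrl);
  } catch {
    return false; // Not a valid absolute URL.
  }
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") return false;
  // Compare the parsed hostname, not a substring of the raw URL, so that
  // "otherdomain.ext.evil.ext" or a path containing the name cannot pass.
  return allowlist.includes(parsed.hostname);
}
```

Matching on the parsed hostname rather than on the raw string is a deliberate choice: substring checks are easy to bypass with look-alike hosts or crafted paths.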
Updating <replay-web-page>
This project hosts its own copy of replayweb.page.
You may update it to the latest version by running ./update-replay-web-page.sh and pushing changes.
Deploy on Netlify
At the time of writing this README, Netlify's free plan grants 3M Netlify Edge Function hits per month, per account.
See Netlify's pricing.
Attaching a subdomain to this deployment:
See Netlify's documentation on domains management.
Routes
/embed
Role
Serves an HTML document containing an instance of <replay-web-page>, pointing at a proxied archive file.
Must be embedded in a cross-origin <iframe>, preferably on the same parent domain to avoid third-party cookie limitations:
warcembed.domain.ext: Hosts warc-embed-netlify
www.domain.ext: Has iframes pointing to warcembed.domain.ext/embed
Methods
GET, HEAD
Source
Query parameters
Name | Required? | Description |
---|---|---|
archive-url | Yes | Full URL to the .warc.gz or .wacz file to embed. Must point to a host listed in the allowlist. |
original-url | Yes | URL of the page that was archived. |
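Because both parameters are themselves URLs, it can help to build the /embed address programmatically so that each value is percent-encoded. The sketch below is one way to do that; `buildEmbedUrl` is a hypothetical helper and the hostnames are the placeholder ones used throughout this README.

```javascript
// Hypothetical helper: builds an /embed URL with both query parameters
// percent-encoded, so a "?" or "&" inside either value is preserved.
function buildEmbedUrl(proxyOrigin, archiveUrl, originalUrl) {
  const url = new URL("/embed/", proxyOrigin);
  url.searchParams.set("archive-url", archiveUrl);
  url.searchParams.set("original-url", originalUrl);
  return url.toString();
}

// Example with an archived page that has its own query string.
const embedSrc = buildEmbedUrl(
  "https://warcembed.domain.ext",
  "https://otherdomain.ext/archive.warc.gz",
  "https://what-was-archived.ext/path?page=2&lang=en"
);
```

Without encoding, the `&lang=en` part of the archived page's URL would be read as a separate parameter of /embed rather than as part of original-url.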
Example
<!-- On https://*.domain.ext: -->
<iframe
src="https://warcembed.domain.ext/embed/?archive-url=https://otherdomain.ext/archive.warc.gz&original-url=https://what-was-archived.ext/path"
sandbox="allow-scripts allow-modals allow-forms allow-same-origin"
>
</iframe>
/archive.[wacz|warc.gz]
Role
Pulls a given .wacz or .warc.gz file from the URL given by ?archive-url and serves it with the headers needed for playback, including:
access-control-allow-origin
accept-ranges
content-type
content-disposition
The <replay-web-page> instance in the document generated by /embed points to this route.
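To make the header list above more concrete, here is a minimal sketch of how such headers might be assembled. The helper name `playbackHeaders` and every header value (the `*` origin, the MIME types, the file names) are assumptions chosen for illustration, not values taken from the project.

```javascript
// Hypothetical helper: assembles the four headers named above for a given
// archive URL. All values are illustrative assumptions.
function playbackHeaders(archiveUrl) {
  const isWacz = new URL(archiveUrl).pathname.toLowerCase().endsWith(".wacz");
  const filename = isWacz ? "archive.wacz" : "archive.warc.gz";
  return {
    // Lets the page hosting <replay-web-page> read the response.
    "access-control-allow-origin": "*",
    // Advertises support for partial (range) requests.
    "accept-ranges": "bytes",
    // A .wacz file is a ZIP container; a .warc.gz file is gzip-compressed.
    "content-type": isWacz ? "application/zip" : "application/gzip",
    "content-disposition": `attachment; filename="${filename}"`,
  };
}
```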
Files should preferably be hosted on a server that supports range requests: archive.js tries to detect support for range requests and falls back to a basic polyfill when the server lacks it.
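A range-request polyfill of this kind can be sketched as follows, assuming the whole upstream file has already been downloaded into memory as a `Uint8Array`. The function names `parseRange` and `respondWithRange` are hypothetical, only single `bytes=` ranges are handled, and this is not necessarily how archive.js implements it.

```javascript
// Parses a single "bytes=start-end" range against a body of `size` bytes.
// Returns { start, end } (inclusive) or null when it cannot be satisfied.
function parseRange(rangeHeader, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader || "");
  if (!match || (match[1] === "" && match[2] === "")) return null;
  let start;
  let end;
  if (match[1] === "") {
    // Suffix form "bytes=-N": the last N bytes of the file.
    const suffix = Number(match[2]);
    if (suffix === 0) return null;
    start = Math.max(size - suffix, 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    end = match[2] === "" ? size - 1 : Math.min(Number(match[2]), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start, end };
}

// Hypothetical helper: describes the response to send for `bytes`, given
// the client's Range header (or none).
function respondWithRange(bytes, rangeHeader) {
  const size = bytes.length;
  if (!rangeHeader) {
    return { status: 200, headers: { "content-length": String(size) }, body: bytes };
  }
  const range = parseRange(rangeHeader, size);
  if (!range) {
    // 416 Range Not Satisfiable, with the total size for the client.
    return { status: 416, headers: { "content-range": `bytes */${size}` }, body: new Uint8Array(0) };
  }
  return {
    status: 206,
    headers: {
      "content-range": `bytes ${range.start}-${range.end}/${size}`,
      "content-length": String(range.end - range.start + 1),
    },
    body: bytes.subarray(range.start, range.end + 1),
  };
}
```

The trade-off of this approach is that the proxy has to fetch the full file to answer each partial request, which is why hosting archives on a server with native range support remains preferable.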
Methods
GET, HEAD
Source
Query parameters
Name | Required? | Description |
---|---|---|
archive-url | Yes | Full URL to the .wacz or .warc.gz file to embed. Must point to a host listed in the allowlist. |
Local development
This project can be run locally using the Netlify CLI. No account is needed.
In your terminal:
# Install netlify-cli globally
npm install netlify-cli -g
# Start the development server (should run on port 8888 by default)
netlify dev