warc-embed-netlify 🏛️

Experimental proxy and wrapper for safely embedding Web Archives (.warc.gz, .wacz) into web pages.

This particular implementation uses Netlify and its Edge Functions as its backbone.

See also: warc-embed (Self-hosted + NGINX version)


Concept

"It's a wrapper"

warc-embed-netlify serves an HTML document containing a pre-configured instance of <replay-web-page>, webrecorder's front-end archive playback system, pointing at a proxied version of the requested archive.

For security reasons, playback only starts when this document is embedded in a cross-origin <iframe>. This prevents XSS in a context where the <iframe> needs both allow-scripts and allow-same-origin.
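
The cross-origin requirement can be checked from inside the served document. The following is a minimal sketch, not the project's actual code: `isCrossOriginEmbedded` is a hypothetical helper that takes a window-like object and relies on browsers throwing when a framed document reads the `location.href` of a cross-origin parent.

```javascript
// Hypothetical helper (not taken from embed.js): returns true only when
// `win` is framed by a document served from a different origin.
function isCrossOriginEmbedded(win) {
  // Not inside a frame at all: a top-level window is its own parent.
  if (win.parent === win) {
    return false;
  }

  try {
    // Reading a cross-origin parent's location.href throws a SecurityError.
    // If the read succeeds, the parent is same-origin.
    void win.parent.location.href;
    return false;
  } catch (err) {
    return true;
  }
}

// In a browser, playback would only be started when the check passes:
// if (isCrossOriginEmbedded(window)) { /* mount <replay-web-page> */ }
```

Passing the window in as a parameter keeps the check easy to exercise outside a browser.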

See details for the /embed route.

"It's a proxy"

warc-embed-netlify pulls the requested archive file and adds the HTTP headers <replay-web-page> requires in order to download and interpret the file, such as access-control-allow-origin and content-type.

It also offers a very basic polyfill for range requests, required for playing back .wacz files, if the server hosting the archive file does not support this feature.

See details for the /archive.[wacz|warc.gz] route.

Example

<!-- On https://*.domain.ext: -->
<iframe
  src="https://warcembed.domain.ext/embed/?archive-url=https://otherdomain.ext/archive.warc.gz&original-url=https://what-was-archived.ext/path"
  sandbox="allow-scripts allow-modals allow-forms allow-same-origin"
>
</iframe>


Deployment

Allowlist

The proxy will only pull archive files from hosts listed in allowlist.js.

Edit this file to determine which domains a specific instance of the proxy can pull files from.
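
As an illustration of how such a host check could work, here is a minimal sketch. The shape of the list (`ALLOWLIST`, an array of host names) and the helper `isAllowedArchiveUrl` are assumptions made for this example; the real allowlist.js may be structured differently.

```javascript
// Hypothetical allowlist shape: an array of exact host names.
const ALLOWLIST = ["otherdomain.ext", "archives.example.org"];

// Returns true when `archiveUrl` is an absolute http(s) URL whose host
// is listed in `allowlist`.
function isAllowedArchiveUrl(archiveUrl, allowlist = ALLOWLIST) {
  let parsed;
  try {
    parsed = new URL(archiveUrl);
  } catch (err) {
    return false; // Not a parseable absolute URL.
  }

  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    return false;
  }

  // Exact match on the parsed host name, so look-alike hosts such as
  // "otherdomain.ext.evil.test" are rejected.
  return allowlist.includes(parsed.hostname);
}
```

Comparing the parsed host name rather than searching the raw string avoids accepting URLs that merely contain an allowed domain somewhere in their path or query.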

Updating <replay-web-page>

This project hosts its own copy of replayweb.page.

You may update it to the latest version by running ./update-replay-web-page.sh and pushing changes.

Deploy on Netlify

At the time of writing this README, Netlify's free plan includes 3 million Edge Function invocations per month, per account.

See Netlify's pricing.

To attach a subdomain to this deployment, see Netlify's documentation on domain management.


Routes

/embed

Role

Serves an HTML document containing an instance of <replay-web-page>, pointing at a proxied archive file.

Must be embedded in a cross-origin <iframe>, preferably on the same parent domain to avoid third-party cookie limitations:

warcembed.domain.ext: Hosts warc-embed-netlify
www.domain.ext: Has iframes pointing to warcembed.domain.ext/embed

Methods

GET, HEAD

Source

embed.js

Query parameters

| Name | Required? | Description |
| --- | --- | --- |
| archive-url | Yes | Full URL to the .warc.gz or .wacz file to embed. Must point to a host listed in allowlist.js. |
| original-url | Yes | URL of the page that was archived. |

Example

<!-- On https://*.domain.ext: -->
<iframe
  src="https://warcembed.domain.ext/embed/?archive-url=https://otherdomain.ext/archive.warc.gz&original-url=https://what-was-archived.ext/path"
  sandbox="allow-scripts allow-modals allow-forms allow-same-origin"
>
</iframe>
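
When either URL carries its own query string, both parameters should be percent-encoded so they are not mistaken for parameters of /embed. The sketch below shows one way the embedding page could build the `src` value; `buildEmbedSrc` is a hypothetical helper, and the host names are the placeholders used in the example above.

```javascript
// Hypothetical helper for the embedding page: builds the <iframe> src
// with both query parameters percent-encoded by URLSearchParams.
function buildEmbedSrc(embedHost, archiveUrl, originalUrl) {
  const src = new URL("/embed/", `https://${embedHost}`);
  src.searchParams.set("archive-url", archiveUrl);
  src.searchParams.set("original-url", originalUrl);
  return src.toString();
}

// The archived page's own "?page=2&lang=en" stays inside original-url.
const exampleSrc = buildEmbedSrc(
  "warcembed.domain.ext",
  "https://otherdomain.ext/archive.warc.gz",
  "https://what-was-archived.ext/path?page=2&lang=en"
);
```

The resulting string can be assigned to `iframe.src` as-is.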

/archive.[wacz|warc.gz]

Role

Pulls a given .wacz or .warc.gz file from the URL given by ?archive-url and serves it with the headers needed for playback, including:

  • access-control-allow-origin
  • accept-ranges
  • content-type
  • content-disposition

The <replay-web-page> instance in the document generated by /embed points to this route.
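
A rough sketch of how these headers could be attached to a proxied response, using the standard `Response` and `Headers` objects available in edge runtimes. This is not copied from archive.js: `withPlaybackHeaders` is a hypothetical helper, and the header values (the wildcard origin and the media types) are assumptions chosen for illustration.

```javascript
// Hypothetical helper: wraps an upstream response in a new Response
// carrying only the four headers listed above.
function withPlaybackHeaders(upstream, filename) {
  // Assumed media types: a .wacz file is a ZIP archive, a .warc.gz file
  // is gzip-compressed. archive.js may use different values.
  const contentType = filename.endsWith(".wacz")
    ? "application/zip"
    : "application/gzip";

  const headers = new Headers();
  headers.set("access-control-allow-origin", "*"); // assumed value
  headers.set("accept-ranges", "bytes");
  headers.set("content-type", contentType);
  headers.set("content-disposition", `attachment; filename="${filename}"`);

  // Stream the upstream body through unchanged, keeping its status code.
  return new Response(upstream.body, { status: upstream.status, headers });
}
```

Starting from an empty `Headers` object, rather than copying the upstream headers, avoids forwarding values that may no longer match the proxied body.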

Files should ideally be hosted on a server that supports range requests. archive.js tries to detect that support and falls back to a basic polyfill when the server lacks it.
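
To give an idea of what a range-request polyfill involves, here is a simplified sketch that answers a single-range `Range` header from a file already held in memory. `parseRange` and `respondWithRange` are hypothetical names, and the logic is an assumption about the general technique rather than a description of archive.js.

```javascript
// Parses a single-range header such as "bytes=0-99", "bytes=100-" or
// "bytes=-50" against a body of `size` bytes. Returns inclusive
// { start, end } offsets, or null if unparsable or unsatisfiable.
function parseRange(rangeHeader, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader || "");
  if (!match || (match[1] === "" && match[2] === "")) return null;

  let start;
  let end;
  if (match[1] === "") {
    // Suffix form: the last N bytes of the file.
    const suffixLength = Number(match[2]);
    if (suffixLength === 0) return null;
    start = Math.max(size - suffixLength, 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    end = match[2] === "" ? size - 1 : Math.min(Number(match[2]), size - 1);
  }

  if (start > end || start >= size) return null;
  return { start, end };
}

// Builds a 200, 206 or 416 response for `bytes` (a Uint8Array).
// For simplicity, unparsable Range headers are also answered with 416.
function respondWithRange(bytes, rangeHeader) {
  if (!rangeHeader) {
    return new Response(bytes, { status: 200, headers: { "accept-ranges": "bytes" } });
  }

  const range = parseRange(rangeHeader, bytes.length);
  if (!range) {
    return new Response(null, {
      status: 416,
      headers: { "content-range": `bytes */${bytes.length}` },
    });
  }

  const body = bytes.slice(range.start, range.end + 1);
  return new Response(body, {
    status: 206,
    headers: {
      "accept-ranges": "bytes",
      "content-range": `bytes ${range.start}-${range.end}/${bytes.length}`,
      "content-length": String(body.length),
    },
  });
}
```

A polyfill of this kind has to fetch the whole file before slicing it, so it is noticeably more expensive than a host that supports range requests natively.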

Methods

GET, HEAD

Source

archive.js

Query parameters

| Name | Required? | Description |
| --- | --- | --- |
| archive-url | Yes | Full URL to the .wacz or .warc.gz file to embed. Must point to a host listed in allowlist.js. |


Local development

This project can be run locally using the Netlify CLI. No account is needed.

In your terminal:

# Install netlify-cli globally 
npm install netlify-cli -g

# Start the development server (should run on port 8888 by default)
netlify dev

Owner
Harvard Library Innovation Laboratory