A highly efficient browser driver built on top of puppeteer, ready for production scenarios.

Overview

browserless


browserless is an efficient driver for controlling headless browsers, built on top of puppeteer and developed for scenarios where performance matters.

Highlights

Installation

You can install it via npm:

$ npm install browserless puppeteer --save

browserless is backed by puppeteer, so you need to install it as well.

You can use it with puppeteer, puppeteer-core or puppeteer-firefox interchangeably.

Usage

This is a full example showcasing the main browserless capabilities:

const createBrowserless = require('browserless')
const termImg = require('term-img')

// First, create a browserless factory
// that will keep a singleton browser process running
const browserlessFactory = createBrowserless()

// After that, you can create as many browser contexts
// as you need. The browser contexts won't share cookies/cache
// with other browser contexts.
const browserless = await browserlessFactory.createContext()

// Perform the action you want, e.g., taking a screenshot
const buffer = await browserless.screenshot('http://example.com', {
  device: 'iPhone 6'
})

console.log(termImg(buffer))

// After your task is done, destroy your browser context
await browserless.destroyContext()

// At the end, gracefully shut down the browser process
await browserlessFactory.close()

As you can see, browserless uses a single browser process and creates/destroys isolated browser contexts on demand.

You can read more about that in the technical details section.
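The single-process, many-contexts design can be sketched as a small factory. This is a hypothetical illustration, not the browserless source: `createFactory` and its `launch` parameter are assumed names, and `launch` stands in for something like `() => puppeteer.launch()`. It shows the key property: the browser starts lazily once, and every context is opened on that same process.

```javascript
// Hypothetical sketch (not the browserless source): one lazily started
// browser process shared by many isolated contexts.
// `launch` stands in for puppeteer.launch and is supplied by the caller.
function createFactory (launch) {
  let browserPromise = null

  // Start the browser on first use and reuse the same promise afterwards,
  // so concurrent callers never spawn a second process.
  const browser = () => {
    if (!browserPromise) browserPromise = launch()
    return browserPromise
  }

  // Every call opens a fresh incognito context on the shared browser.
  const createContext = async () => {
    const instance = await browser()
    return instance.createIncognitoBrowserContext()
  }

  // Close the single browser process, if it was ever started.
  const close = async () => {
    if (!browserPromise) return
    const instance = await browserPromise
    browserPromise = null
    await instance.close()
  }

  return { browser, createContext, close }
}
```

Caching the promise rather than the resolved browser is the design choice that keeps two simultaneous `createContext` calls from launching two processes. Recent puppeteer releases renamed `createIncognitoBrowserContext` to `createBrowserContext`, so adjust the call to the version you use.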

If you're already using puppeteer, you can switch to browserless with almost no effort.

Additionally, you can use specific browserless packages on their own in your codebase, interacting with them from puppeteer.

Initialization

All methods follow the same interface:

  • <url>: The target URL. It's required.
  • [options]: Specific settings for the method. It's optional.

All methods follow an async interface, returning a Promise.

.constructor(options)

It initializes a singleton browserless process, returning a factory that will be used for creating browser contexts:

const createBrowserless = require('browserless')

const { createContext } = createBrowserless({
  timeout: 25000,
  lossyDeviceName: true,
  ignoreHTTPSErrors: true
})

// Now every time you call `createContext`
// it will create a new browser context.
const browserless = await createContext({ retry: 2 })

Some of these options are specific to browserless; the rest will be passed to puppeteer.launch.

options

See puppeteer.launch#options.

Additionally, you can set up:

defaultDevice

type: string
default: 'Macbook Pro 13'

Sets a consistent device viewport for each page.

lossyDeviceName

type: boolean
default: false

It enables lossy (fuzzy) matching of the device descriptor name.

const browserless = require('browserless')({ lossyDeviceName: true })

browserless.getDevice({ device: 'macbook pro 13' })
browserless.getDevice({ device: 'MACBOOK PRO 13' })
browserless.getDevice({ device: 'macbook pro' })
browserless.getDevice({ device: 'macboo pro' })

This setting is intended to find the device even when the given name doesn't exactly match the device descriptor name.
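One way to picture lossy matching is normalization plus an edit-distance search. The sketch below is hypothetical: `findDevice`, the four-entry `DEVICES` list and the `maxDistance` threshold are illustrative assumptions, and browserless may use a different fuzzy-search technique. It normalizes case and whitespace, returns an exact match immediately, and otherwise picks the closest name by Levenshtein distance.

```javascript
// Hypothetical sketch of lossy device-name matching; not the browserless
// algorithm. Device list and distance threshold are illustrative.
const DEVICES = ['Macbook Pro 13', 'Macbook Pro 15', 'iPhone 6', 'iPad']

// Lowercase, trim and collapse runs of whitespace.
const normalize = (name) => name.toLowerCase().trim().replace(/\s+/g, ' ')

// Classic Levenshtein distance using a single rolling row.
function levenshtein (a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0] // value of the previous row at column j - 1
    row[0] = i
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]
      row[j] = Math.min(
        above + 1, // deletion
        row[j - 1] + 1, // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      )
      diagonal = above
    }
  }
  return row[b.length]
}

// Exact normalized match wins; otherwise the first closest name within
// `maxDistance` edits, or undefined when nothing is close enough.
function findDevice (input, devices = DEVICES, maxDistance = 5) {
  const query = normalize(input)
  let best
  let bestDistance = Infinity
  for (const name of devices) {
    const distance = levenshtein(query, normalize(name))
    if (distance === 0) return name
    if (distance < bestDistance) {
      best = name
      bestDistance = distance
    }
  }
  return bestDistance <= maxDistance ? best : undefined
}
```

With this sketch, `findDevice('macboo pro')` resolves to `'Macbook Pro 13'`, since it is the first entry at the minimum distance.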

mode

type: string
default: launch
values: 'launch' | 'connect'

It defines whether the browser should be spawned using puppeteer.launch or attached using puppeteer.connect.

timeout

type: number
default: 30000

This setting changes the default maximum navigation time, in milliseconds.

puppeteer

type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox

It's automatically detected from your installed dependencies; the supported packages are puppeteer, puppeteer-core and puppeteer-firefox.
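Dependency-based detection can be sketched as trying to resolve each supported package in turn. This is a hypothetical illustration: `detectPuppeteer` is an assumed name, and the preference order (puppeteer, then puppeteer-core, then puppeteer-firefox) is an assumption rather than documented behavior. The resolver is injectable so the logic can be exercised without installing anything.

```javascript
// Hypothetical sketch: return the first supported puppeteer package that can
// be resolved. The preference order below is an assumption.
const CANDIDATES = ['puppeteer', 'puppeteer-core', 'puppeteer-firefox']

function detectPuppeteer (resolve = require.resolve) {
  for (const name of CANDIDATES) {
    try {
      resolve(name) // throws when the package is not installed
      return name
    } catch (_) {
      // Not installed; try the next candidate.
    }
  }
  return null
}
```

In a real project you would then `require()` the returned name; passing the `puppeteer` option explicitly sidesteps detection altogether.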

.createContext(options)

Once your browserless factory is instantiated, you can create browser contexts on demand:

const browserless = await browserlessFactory.createContext({
  retry: 2
})

Every browser context is isolated: it won't share cookies/cache with other browser contexts. Each one can also receive its own options.

options

retry

type: number
default: 2

The number of retries that can be performed before considering a navigation as failed.
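The retry semantics can be illustrated with a generic helper. This is a hypothetical sketch: `withRetry` is an assumed name, and it assumes `retry: 2` means one initial attempt plus up to two retries, which the section does not state explicitly. If every attempt fails, the last error is rethrown.

```javascript
// Hypothetical helper illustrating `retry`: one initial attempt plus up to
// `retries` extra attempts before the operation is considered failed.
async function withRetry (fn, retries = 2) {
  let lastError
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt)
    } catch (error) {
      lastError = error // remember the failure and try again
    }
  }
  throw lastError
}
```

For example, `withRetry(() => page.goto(url), 2)` would attempt the navigation at most three times under this assumption.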

.browser

It returns the Browser instance associated with your browserless factory.

const browser = await browserlessFactory.browser()
console.log('My browser PID is', browser.process().pid)

.respawn

It will respawn the singleton browser associated with your browserless factory.

const getPID = async promise => (await promise).process().pid

console.log('Process PID:', await getPID(browserlessFactory.browser()))

await browserlessFactory.respawn()

console.log('Process PID:', await getPID(browserlessFactory.browser()))

This method is an implementation detail; normally you don't need to call it.

.close

It will close the singleton browser associated with your browserless factory.

const onExit = require('signal-exit')

onExit(async (code, signal) => {
  console.log('shutting down all the things')
  await browserlessFactory.close()
  console.log(`exit with code ${code} (${signal})`)
})

It should be used to gracefully shut down your resources.

Methods

.html(url, options)

It serializes the content from the target url into HTML.

const html = await browserless.html('https://example.com')
console.log(html)

options

See browserless.goto for all supported options and values.

.text(url, options)

It serializes the content from the target url into plain text.

const text = await browserless.text('https://example.com')
console.log(text)

options

See browserless.goto for all supported options and values.

.pdf(url, options)

It generates a PDF version of the website at the target url.

const buffer = await browserless.pdf('https://example.com')
console.log(`PDF generated in ${buffer.byteLength} bytes`)

options

This method uses the following options by default:

{
  margin: '0.35cm',
  printBackground: true,
  scale: 0.65
}

See browserless.goto for all supported options and values.

Also, any page.pdf option is supported.

Additionally, you can set up:

margin

type: string | object
default: '0.35cm'

It sets the paper margins. Supported units are:

  • px for pixel.
  • in for inches.
  • cm for centimeters.
  • mm for millimeters.

You can pass an object specifying each side of the paper:

const buffer = await browserless.pdf(url.toString(), {
  margin: {
    top: '0.35cm',
    bottom: '0.35cm',
    left: '0.35cm',
    right: '0.35cm'
  }
})

Alternatively, if you pass a string, it will be used for all sides:

const buffer = await browserless.pdf(url.toString(), {
  margin: '0.35cm'
})
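Both accepted shapes can be reduced to the per-side object that puppeteer's page.pdf expects (`{ top, right, bottom, left }`). The sketch below is hypothetical: `normalizeMargin` is an assumed name, and it validates only the four units listed above, which is stricter than puppeteer itself (puppeteer also accepts numbers).

```javascript
// Hypothetical sketch: expand the `margin` option into page.pdf's per-side
// shape and reject units other than px, in, cm and mm.
const UNIT_RE = /^\d+(\.\d+)?(px|in|cm|mm)$/

function normalizeMargin (margin = '0.35cm') {
  const sides = typeof margin === 'string'
    ? { top: margin, right: margin, bottom: margin, left: margin }
    : { ...margin }

  for (const [side, value] of Object.entries(sides)) {
    if (!UNIT_RE.test(value)) {
      throw new TypeError(`Invalid margin for ${side}: ${value}`)
    }
  }
  return sides
}
```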

.screenshot(url, options)

It takes a screenshot from the target url.

const buffer = await browserless.screenshot('https://example.com')
console.log(`Screenshot taken in ${buffer.byteLength} bytes`)

options

This method uses the following options by default:

{
  device: 'macbook pro 13'
}

See browserless.goto for all supported options and values.

Also, any page.screenshot option is supported.

Additionally, you can set up:

codeScheme

type: string
default: 'atom-dark'

When this value is present and the response 'Content-Type' header is JSON, the content is beautified into HTML markup using Prism.

The syntax highlighting theme can be customized by passing either:

  • A prism-themes identifier (e.g., 'dracula').
  • A remote URL (e.g., 'https://unpkg.com/prism-theme-night-owl').

element

type: string

Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible.

overlay

type: object

After the screenshot has been taken, this option allows you to place it into a fancy overlay.

You can configure the overlay specifying:

  • browser: It sets the browser image overlay to use; supported values are light and dark.
  • background: It sets the background to use. It can be:
    • A hexadecimal/rgb/rgba color code, e.g., #c1c1c1.
    • A CSS gradient, e.g., linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
    • An image url, e.g., https://source.unsplash.com/random/1920x1080.

const buffer = await browserless.screenshot(url.toString(), {
  hide: ['.crisp-client', '#cookies-policy'],
  overlay: {
    browser: 'dark',
    background:
      'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
  }
})

.destroyContext

It destroys the current browser context.

const browserless = await browserlessFactory.createContext({ retry: 0 })

const content = await browserless.html('https://example.com')

await browserless.destroyContext()

.getDevice(options)

Given a device descriptor name, this method returns the device settings for it.

browserless.getDevice({ device: 'Macbook Pro 15' })
// {
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 2,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }

It extends puppeteer.devices, adding some devices missing there.

options

device

type: string

The device descriptor name. It's used to find the rest of the presets associated with it.

When lossyDeviceName is enabled, a fuzzy search rather than a strict search will be performed in order to maximize getting a result back.

viewport

type: object

Extra viewport settings that will be merged with the device presets.

browserless.getDevice({
  device: 'iPad',
  viewport: {
    isLandscape: true
  }
})

headers

type: object

Extra headers that will be merged with the device presets.

browserless.getDevice({ 
  device: 'iPad', 
  headers: {
    'user-agent': 'googlebot'
  } 
})
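The merge described for `viewport` and `headers` can be pictured as a shallow layering of overrides on top of the preset. This is a hypothetical sketch: `mergeDevice` is an assumed name, the preset mirrors the getDevice output shown earlier, and how browserless actually combines the extra headers with the preset is an assumption.

```javascript
// Hypothetical sketch: layer per-call overrides on top of a device preset
// without mutating the preset. Header merging behavior is assumed.
function mergeDevice (preset, { viewport = {}, headers = {} } = {}) {
  return {
    ...preset,
    viewport: { ...preset.viewport, ...viewport },
    headers: { ...(preset.headers || {}), ...headers }
  }
}

// Viewport values taken from the 'Macbook Pro 15' output above.
const macbookPro15 = {
  viewport: {
    width: 1440,
    height: 900,
    deviceScaleFactor: 2,
    isMobile: false,
    hasTouch: false,
    isLandscape: false
  }
}
```

Because only the listed keys are overridden, `mergeDevice(macbookPro15, { viewport: { isLandscape: true } })` keeps the 1440x900 size and only flips the orientation flag.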

.evaluate(fn, gotoOpts)

It exposes an interface for creating your own evaluate function, passing you the page and response.

The fn will receive page and response as arguments:

const ping = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

await ping('https://example.com')
// {
//   "statusCode": 200,
//   "url": "https://example.com/",
//   "redirectUrls": []
// }

You don't need to close the page; it will be closed automatically.

Internally, the method performs a browserless.goto, so you can pass goto options as the second parameter:

const serialize = browserless.evaluate(
  page => page.content(),
  { waitUntil: 'domcontentloaded' }
)

await serialize('https://example.com')
// '<!DOCTYPE html><html><div>…'
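The "page is closed automatically" behavior follows a common pattern: open a page, navigate, run the user function, and close the page in a `finally` block. The sketch below is hypothetical: `createEvaluate`, `newPage` and `goto` are assumed stand-ins supplied by the caller, not the browserless internals.

```javascript
// Hypothetical sketch of the pattern behind an evaluate helper: the page is
// always closed, whether the user function resolves or throws.
// `newPage` and `goto` are caller-supplied stand-ins.
const createEvaluate = ({ newPage, goto }) => (fn, gotoOpts = {}) =>
  async (url) => {
    const page = await newPage()
    try {
      const { response } = await goto(page, { ...gotoOpts, url })
      return await fn(page, response)
    } finally {
      await page.close()
    }
  }
```

Awaiting `fn` inside the `try` matters: without the `await`, a rejected promise would escape after the page had already been closed.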

.goto(page, options)

It performs a page.goto with many extra capabilities:

const createBrowserless = require('browserless')

const browserlessFactory = createBrowserless()
const browserless = await browserlessFactory.createContext()

const page = await browserless.page()
const { response, device } = await browserless.goto(page, { url: 'http://example.com' })

options

Any option passed here will be forwarded to page.goto.

Additionally, you can set up:

abortTypes

type: array
default: []

It aborts any request whose resource type is included in the given list.
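Resource-type blocking maps naturally onto puppeteer's request interception. This is a hypothetical sketch: `abortResourceTypes` is an assumed name and browserless's own implementation may differ (for instance, it also has to coexist with the adblocker's interception). It uses the real puppeteer calls `page.setRequestInterception`, `request.resourceType()`, `request.abort()` and `request.continue()`.

```javascript
// Hypothetical sketch: abort requests whose resource type is listed,
// let everything else through. Not the browserless implementation.
async function abortResourceTypes (page, abortTypes = []) {
  if (abortTypes.length === 0) return // nothing to block, skip interception
  await page.setRequestInterception(true)
  page.on('request', (request) => {
    if (abortTypes.includes(request.resourceType())) {
      request.abort()
    } else {
      request.continue()
    }
  })
}
```

For example, `abortResourceTypes(page, ['image', 'font', 'media'])` would skip heavy assets when you only need the HTML.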

adblock

type: boolean
default: true

It enables the built-in adblocker by Cliqz, which aborts unnecessary third-party requests associated with ad services.

animations

type: boolean
default: false

When false, it disables CSS animations and transitions and sets prefers-reduced-motion accordingly.

click

type: stringstring[]

Click the DOM element matching the given CSS selector.

device

type: string
default: 'macbook pro 13'

It specifies the device descriptor to use in order to retrieve userAgent and viewport.

evasions

type: string[]
default: require('@browserless/goto').evasions

It makes your headless browser harder to detect, preventing it from being blocked.

Antibot systems check whether you are a real browser and block any kind of automated access. The evasion techniques implemented to counter those checks are:

Evasion               Description
chromeRuntime         Ensure window.chrome is defined.
stackTraces           Prevent detecting Puppeteer via variable names in stack traces.
mediaCodecs           Ensure media codecs are defined.
navigatorPermissions  Mock Notification.permissions.
navigatorPlugins      Ensure your browser has NavigatorPlugins defined.
navigatorWebdriver    Ensure Navigator.webdriver exists.
randomizeUserAgent    Use a different User-Agent every time.
webglVendor           Ensure WebGLRenderingContext & WebGL2RenderingContext are defined.

The evasion techniques are enabled by default. You can disable specific techniques by filtering them out:

const createBrowserless = require('browserless')

const evasions = require('@browserless/goto').evasions.filter(
  (evasion) => evasion !== 'randomizeUserAgent'
)

const browserlessFactory = createBrowserless({ evasions })

headers

type: object

An object containing additional HTTP headers to be sent with every request.

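const createBrowserless = require('browserless')

const browserlessFactory = createBrowserless()
const browserless = await browserlessFactory.createContext()

const page = await browserless.page()
await browserless.goto(page, {
  url: 'http://example.com',
  headers: {
    'user-agent': 'googlebot',
    cookie: 'foo=bar; hello=world'
  }
})

hide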

type: stringstring[]

Hide DOM elements matching the given CSS selectors.

const buffer = await browserless.screenshot(url.toString(), {
  hide: ['.crisp-client', '#cookies-policy']
})

This sets visibility: hidden on the matched elements.

html

type: string

If you provide HTML markup, it is loaded with page.setContent instead of fetching the content from the target URL.

javascript

type: boolean
default: true

When it's false, it disables JavaScript on the current page.

mediaType

type: string
default: 'screen'

Changes the CSS media type of the page using page.emulateMediaType.

modules

type: stringstring[]

Injects <script type="module"> into the browser page.

It can accept:

  • Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
  • Local file (e.g., 'local-file.js').
  • Inline code (e.g., "document.body.style.backgroundColor = 'red'").

const buffer = await browserless.screenshot(url.toString(), {
  modules: [
    'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js',
    'local-file.js',
    "document.body.style.backgroundColor = 'red'"
  ]
})
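Telling the three accepted input kinds apart can be sketched with a simple heuristic and then mapped onto puppeteer's `page.addScriptTag` options (`url`, `path` or `content`, plus an optional `type` such as `'module'`). This is hypothetical: `classifyInjection` and `toScriptTagOptions` are assumed names, and the rules (http(s) prefix for URLs, a single whitespace-free token ending in .js/.mjs/.css for files) are assumptions, not how browserless necessarily decides.

```javascript
// Hypothetical heuristic (not the browserless source) for telling apart the
// three accepted inputs: absolute URL, local file path, or inline code.
function classifyInjection (value) {
  if (/^https?:\/\//i.test(value)) return 'url'
  // A single token ending in .js/.mjs/.css is treated as a local file.
  if (!/\s/.test(value) && /\.(m?js|css)$/i.test(value)) return 'file'
  return 'inline'
}

// Map an input onto the option object accepted by puppeteer's
// page.addScriptTag ({ url } | { path } | { content }, plus optional type).
function toScriptTagOptions (value, type) {
  const kind = classifyInjection(value)
  const key = kind === 'url' ? 'url' : kind === 'file' ? 'path' : 'content'
  return type ? { [key]: value, type } : { [key]: value }
}
```

The same `{ url | path | content }` shape is accepted by `page.addStyleTag`, so the classifier could also serve the `styles` option.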

remove

type: stringstring[]

Remove DOM elements matching the given CSS selectors.

const buffer = await browserless.screenshot(url.toString(), {
  remove: ['.crisp-client', '#cookies-policy']
})

This sets display: none on the matched elements, so it could potentially break the website layout.
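The difference between `hide` and `remove` comes down to the CSS rule applied to the matched elements. The sketch below is hypothetical: `buildHideRemoveCss` is an assumed name and the use of `!important` is an assumption. The resulting string could be injected with puppeteer's `page.addStyleTag({ content })`.

```javascript
// Hypothetical sketch: build the CSS described for `hide` (keeps layout
// space) and `remove` (collapses it). Both accept string | string[].
function buildHideRemoveCss ({ hide = [], remove = [] } = {}) {
  const toList = (value) => [].concat(value)
  const rules = []
  if (toList(hide).length) {
    rules.push(`${toList(hide).join(', ')} { visibility: hidden !important; }`)
  }
  if (toList(remove).length) {
    rules.push(`${toList(remove).join(', ')} { display: none !important; }`)
  }
  return rules.join('\n')
}
```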

colorScheme

type: string
default: 'no-preference'

Sets prefers-color-scheme CSS media feature, used to detect if the user has requested the system use a 'light' or 'dark' color theme.

scripts

type: stringstring[]

Injects <script> into the browser page.

It can accept:

  • Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
  • Local file (e.g., 'local-file.js').
  • Inline code (e.g., "document.body.style.backgroundColor = 'red'").

const buffer = await browserless.screenshot(url.toString(), {
  scripts: [
    'https://cdn.jsdelivr.net/npm/[email protected]/dist/jquery.min.js',
    'local-file.js',
    "document.body.style.backgroundColor = 'red'"
  ]
})

Prefer to use modules whenever possible.

scroll

type: string

Scroll to the DOM element matching the given CSS selector.

styles

type: stringstring[]

Injects <style> into the browser page.

It can accept:

  • Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css').
  • Local file (e.g., 'local-file.css').
  • Inline code (e.g., "body { background: red; }").

const buffer = await browserless.screenshot(url.toString(), {
  styles: [
    'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css',
    'local-file.css',
    'body { background: red; }'
  ]
})

timezone

type: string

It changes the timezone of the page.

url

type: string

The target URL.

viewport

It sets up a custom viewport using the page.setViewport method.

waitForSelector

type: string

Wait for the element matching the given CSS selector to appear, using page.waitForSelector.

waitForTimeout

type: number

Wait for the given amount of time, in milliseconds, using page.waitForTimeout.

waitUntil

type: string | string[]
default: 'auto'
values: 'auto' | 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2'

When to consider navigation succeeded.

If you provide an array of event strings, navigation is considered to be successful after all events have been fired.

Events can be either:

  • 'auto': A smart combination of 'load' and 'networkidle2' that waits the minimum time necessary.
  • 'load': Consider navigation to be finished when the load event is fired.
  • 'domcontentloaded': Consider navigation to be finished when the DOMContentLoaded event is fired.
  • 'networkidle0': Consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
  • 'networkidle2': Consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
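The two networkidle rules can be modeled as a pure function over a timeline of in-flight connection counts. This is an illustrative model, not puppeteer's implementation: `networkIdleAt` is an assumed name, and `events` is a hypothetical list of `{ time, inflight }` changes sorted by time. Navigation is considered finished once the count stays at or below `maxInflight` (0 or 2) for `idleMs` (500 ms).

```javascript
// Illustrative model (not puppeteer's code) of the networkidle rule:
// navigation finishes once in-flight connections stay <= maxInflight
// for `idleMs` milliseconds. `events` lists count changes sorted by time.
function networkIdleAt (events, maxInflight, idleMs = 500) {
  let idleSince = null
  for (const { time, inflight } of events) {
    // The idle window completed before this change happened.
    if (idleSince !== null && time - idleSince >= idleMs) return idleSince + idleMs
    if (inflight <= maxInflight) {
      if (idleSince === null) idleSince = time // idle window starts
    } else {
      idleSince = null // too many connections, window resets
    }
  }
  return idleSince === null ? null : idleSince + idleMs
}
```

On the same timeline, networkidle2 resolves earlier than networkidle0 because it tolerates two lingering connections, which is why 'networkidle2' is the cheaper half of the 'auto' strategy.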

.context

It returns the BrowserContext associated with your instance.

const browserContext = await browserless.context()

console.log({ isIncognito: browserContext.isIncognito() })
// => { isIncognito: true }

.page

It returns a standalone Page associated with the current browser context.

const page = await browserless.page()
await page.content()
// => '<html><head></head><body></body></html>'

Command Line Interface

You can perform any browserless action from your terminal.

You just need to install @browserless/cli globally:

npm install @browserless/cli --global

Alternatively, you can run it on demand using npx:

npx @browserless/cli --help

That's the preferred way to interact with the CLI in CI/CD scenarios.

Lighthouse

browserless has a Lighthouse integration that connects to a Puppeteer instance in a simple way.

const lighthouse = require('@browserless/lighthouse')
const { writeFile } = require('fs/promises')

const report = await lighthouse('https://example.com')

await writeFile('report.json', JSON.stringify(report, null, 2))

The report will be generated for the given url, extending the lighthouse:default settings, which are the same settings used by the Audits reports in Chrome Developer Tools.

options

The second argument can contain Lighthouse-specific settings. The following options are used by default:

{
  logLevel: 'error',
  output: 'json',
  device: 'desktop',
  onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo']
}

See Lighthouse configuration to know all the options and values supported.

Additionally, you can set up:

getBrowserless

type: function
default: require('browserless')

The browserless instance to use for getting the browser.

logLevel

type: string
default: 'error'
values: 'silent' | 'error' | 'info' | 'verbose'

The level of logging to enable.

output

type: string | string[]
default: 'json'
values: 'json' | 'csv' | 'html'

The type(s) of report output to be produced.

device

type: string
default: 'desktop'
values: 'desktop' | 'mobile' | 'none'

How emulation (useragent, device screen metrics, touch) should be applied. 'none' indicates Lighthouse should leave the host browser as-is.

onlyCategories

type: string[] | null
default: ['performance', 'best-practices', 'accessibility', 'seo']
values: 'performance' | 'best-practices' | 'accessibility' | 'pwa' | 'seo'

Includes only the specified categories in the final report.
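Layering your settings over the defaults listed above can be sketched as a shallow merge. This is hypothetical: `resolveLighthouseOptions` is an assumed name and the real @browserless/lighthouse merge logic may differ (for example, it might merge nested config deeply). With a shallow merge, passing `onlyCategories: null` replaces the default list entirely.

```javascript
// Hypothetical sketch: shallow-merge user settings over the documented
// Lighthouse defaults. The actual package may merge differently.
const LIGHTHOUSE_DEFAULTS = {
  logLevel: 'error',
  output: 'json',
  device: 'desktop',
  onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo']
}

const resolveLighthouseOptions = (opts = {}) => ({ ...LIGHTHOUSE_DEFAULTS, ...opts })
```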

Packages

browserless is internally divided into multiple packages, so you only use the minimum amount of code necessary for your use case:

  • browserless
  • @browserless/benchmark
  • @browserless/cli
  • @browserless/devices
  • @browserless/examples
  • @browserless/errors
  • @browserless/function
  • @browserless/goto
  • @browserless/pdf
  • @browserless/screenshot
  • @browserless/lighthouse

FAQ

Q: Why use browserless over puppeteer?

browserless doesn't replace puppeteer; it complements it. It's a syntactic sugar layer over the official Headless Chrome, oriented to production scenarios.

Q: Why do you block ads scripts by default?

Headless navigation is expensive compared with just fetching the content from a website.

In order to speed up the process, we block ads scripts by default because they add a lot of bloat.

Q: My output is different from the expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode with the DEBUG=browserless environment variable in order to see what is happening behind the scenes.

Consider opening an issue with the debug trace.

Q: I want to use browserless with my AWS Lambda like project

Yes. Check chrome-aws-lambda to set up AWS Lambda with a compatible browser binary.

License

browserless © Microlink, Released under the MIT License.
Authored and maintained by Microlink with help from contributors.

The logo has been designed by xinh studio.

microlink.io · GitHub @MicrolinkHQ · Twitter @microlinkhq

Comments
  • Evasion techniques

    Libraries

    • https://github.com/paulirish/headless-cat-n-mouse
    • https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth

    URLs to test

    • [x] https://www.stuff.co.nz/national/politics/108051628/poll-shows-labour-overtake-national
    • [x] https://www.zomato.com/bangalore/bold-marathahalli
    • [x] https://www.reddit.com/r/news/comments/8vebjp/lebron_james_takes_154_million_4year_deal_with/?st=JJ41AUKB&sh=be6868cb
    • [x] https://www.reddit.com/r/funny/comments/8ltsck/this_sea_dog_is_the_ultimate_prankster/
    • [ ] https://www.zillow.com/homedetails/4611-Cardinal-Ridge-Way-Flowery-Branch-GA-30542/83350172_zpid
    • [ ] https://www.kmart.com.au/product/portable-charger-15000mah/2168305
    • [ ] https://www.bloomberg.com/news/articles/2019-01-15/here-are-five-volatility-charts-keeping-wall-street-up-at-night?srnd=premium-canada
    • [ ] https://www.scmp.com/week-asia/opinion/article/2112486/inconvenient-truths-murder-journalism-india
    • [ ] https://startse.com/noticia/netflix-do-esporte-planeja-chegada-ao-brasil-ma-noticia-para-globo
    • [ ] https://www.coches.net/segunda-mano/
    • [ ] https://www.ouest-france.fr/
    • [ ] https://www.washingtonpost.com/nation/2020/06/25/coronavirus-live-updates-us/

    Related

    • https://timvanscherpenzeel.github.io/detect-gpu/
    opened by Kikobeats 11
  • Lighthouse: images for desktop reports returning mobile interface

    Bug Report

    Current Behavior When I use the following MQL API, the report returns the result.data.insights.lighthouse.audits['final-screenshot'] is returned as a base64 encoded image. However, this image is of the mobile view and not of the desktop view of the website.

    const url = 'https://anywebsitehere.com';
    const payload = {
      meta: false,
      insights: {
        lighthouse: {
          device: 'desktop',
          onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo'],
        },
        technologies: false,
      },
    };
    
    const result = await mql(url, payload);
    

    Expected behavior/code

    I'd expect the above to return the desktop variation of the image and not a mobile version.

    Additional context/Screenshots

    Can be provided upon request.

    enhancement 
    opened by dustinsgoodman 6
  • chore: update adblocker and use pre-built engine from CDN

    Closes https://github.com/microlinkhq/browserless/pull/133

    • Fix fetch abstraction on top of 'got'
    • Update 'got' to latest to get 'text', 'json', and 'buffer' helpers
    • Make use of prebuilt engine from CDN whenever possible
    opened by remusao 6
  • build: only use compatible rules

    After using https://github.com/StevenBlack/hosts on production scenarios, I noted it makes the execution slow.

    I'm not sure if this is happening based on the number of rules or because I'm doing an adaptation from the original rule definition.

    opened by Kikobeats 5
  • update adblocker + make use of puppeteer helpers

    Hi there,

    We've published a new release of @cliqz/adblocker. Among some small bug fixes, it contains a new blocker helper to ease the use of the library in the context of Puppeteer projects. I took the liberty of updating browserless with this change; let me know if this is acceptable or if you'd like me to make extra modifications. Alternatively, feel free to cherry-pick the commit; as I was not sure what the best way to make the PR was.

    Here are a few improvements this PR would bring:

    • Enabling adblocking for Puppeteer takes only one line now and there is no need to deal with requests explicitly (the use of tldts for parsing has also been internalized): await engine.enableBlockingInPage(page);
    • More ads are now blocked (you get the same full-blown experience as with a WebExtension in the browser). This is the results of a few extra capabilities added in the context of Puppeteer:
      • requests can be redirected to data URLs (e.g.: google analytics would not be blocked, but instead a fake response would be injected so that the site does not break, but there is no tracking and no performance cost since the real request did not happen)
      • full cosmetic filtering is applied; which means that more ads will be blocked (in fact some of them might now be hidden whereas it was not possible before with only network filtering). Check Google ads for example (well in fact you should not see them anymore...). This feature should also reduce the likelihood of breakage as well as defuse "paywalls" (where a site asks you to disable the adblocker to proceed = no more!).
    • I changed postinstall.js from goto package so that the full adblocker is created and dumped on disk in its binary form at install time. Initializing PuppeteerBlocker from this binary blob is extremely fast (i.e.: at least 3 orders of magnitude faster than parsing the lists from scratch, we're talking about less than 10ms on cold start). Since postinstall.js is done once, but goto(...) can be called lots of times, this means that warm-up time will be drastically reduced when using goto.
    • Removed this particular list 'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/recipes/recipes_en.txt' which is not supported. There should be no visible difference in terms of blocking.

    Caveat: the debug logs for adblocker have been removed as there is currently no way to know which requests were blocked (everything happens internally in PuppeteerBlocker). If that is a desired capability, we could easily add a hook/callback to get some statistics about blocked requests.

    Best, Rémi

    opened by remusao 5
  • feat: page numbers

    Why

    Puppeteer (as every HTML to PDF tool out there) still lacks support for generated content. This usually comes as a surprise for most people that require some "advanced" features like creating a table of contents (TOC) or being able to refer to content by its page number.

    An alternative to puppeteer is wkhtmltopdf, which does support TOC generation. However, it fails to supply an HTML template API, so it's not customisable. It also does not support page number references within a PDF document. On top of that, the browser it uses is dated, which makes it difficult to use modern charting libraries and other advanced CSS and JavaScript features.

    I needed those features for some client work, so ended up implementing it and port to this library.

    How

    It's simple to use:

    const browserless = require('browserless')
    
    ;(async () => {
      const url = 'https://example.com'
      const buffer = await browserless.pdf(url, { page_numbers: true })
      console.log(`PDF generated!`)
    })()
    

    On the HTML part use elements <span class="pageNumber"> or <span class="pageNumber" rel="someElementId">. The resulting PDF will have both elements replaced by current page number or page number that corresponds to the referred someElementId.

    Full HTML Example

    <!DOCTYPE html>
    <html lang="en">
        <head>
            <meta charset="utf-8">
            <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no">
            <!-- <link rel="shortcut icon" href="" type="image/x-icon"> -->
            <title>Example</title>
    
            <style>
             section {page-break-after: always;}
            </style>        
        </head>
        <body>
            <section id="toc">
                <h1>Table of Contents:</h1>
                <ul>
                    <li>Section 1 -- Page <span class="pageNumber" rel="section1">X</span></li>
                    <li>Section 2 -- Page <a href="#section2" class="pageNumber" rel="section2">Y</a> with navigation</li>
                    <li>Section 3 -- Page <span class="pageNumber" rel="section3"></span></li>
                    <li>Section 4 -- Page <span class="pageNumber" rel="section4">Z</span></li>                
                </ul>
            </section>
    
            <section id="section1">
                <h1>Section 1</h1>
                <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
                <p>This should be page: <span class="pageNumber"></span></p>
            </section>
    
            <section id="section2">
                <h1>Section 2</h1>
                <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
                <p>This should be page: <span class="pageNumber"></span></p>
            </section>
    
            <section id="section3">
                <h1>Section 3</h1>
                <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
                <p>This should be page: <span class="pageNumber"></span></p>
                <p>And <b>Section 4</b> should be on page: <span class="pageNumber" rel="section4">PLACEHOLDER</span></p>
            </section>        
    
            <section id="section4">
                <h1>Section 4</h1>
                <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
                <p>This should be page: <span class="pageNumber"></span></p>
                <p>And <b>Section 2</b> should be on page: <span class="pageNumber" rel="section2"></span></p>            
            </section>
            
        </body>
    </html>
    

    Implementation Details

    I added some code to browserless to make extension features easier to implement. It could be more polished, or it could use some kind of dependency injection to make features pluggable.

    The page numbers implementation uses pdf-extract, which has some dependencies that must be installed on the OS beforehand. OCR support is not required.

    This implementation requires generating an extra PDF, so PDF processing and generation will be slower whenever the page_numbers option is used. This might have an impact on processing-intensive production environments.
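The two-pass approach described above can be sketched as follows: render once, find the page on which each referenced section starts, then write those numbers into the matching `<span class="pageNumber" rel="…">` elements before the final render. This is a hypothetical sketch, not code from the PR. It assumes the text of each page of the first-pass PDF has already been extracted (for example with pdf-extract) into an array of strings, and that every section was rendered with a unique marker such as `[[section4]]`; `findSectionPages`, `fillPageReferences` and the marker format are illustrative names. A marker is used rather than the heading text because, in the example above, the words "Section 4" already appear inside Section 3.

```javascript
// Hypothetical sketch of the two-pass page-number lookup (not the PR's code).

// Return { sectionId: pageNumber } for each id by searching the per-page text
// of a first-pass PDF. Assumes every section was rendered with a unique
// marker string (default "[[<id>]]") at its start.
function findSectionPages (pageTexts, sectionIds, marker = (id) => `[[${id}]]`) {
  const pages = {}
  for (const id of sectionIds) {
    const index = pageTexts.findIndex((text) => text.includes(marker(id)))
    // Page numbers are 1-based; null means the marker was never found
    pages[id] = index === -1 ? null : index + 1
  }
  return pages
}

// Write the page numbers into <span class="pageNumber" rel="<id>"> elements,
// leaving spans whose section was not found untouched.
function fillPageReferences (html, pages) {
  return html.replace(
    /(<span class="pageNumber" rel="([^"]+)">)[^<]*(<\/span>)/g,
    (match, open, id, close) => (pages[id] == null ? match : `${open}${pages[id]}${close}`)
  )
}
```

The sketch also assumes that writing the numbers in does not push content onto different pages; if it can, the lookup would need to be repeated on the re-rendered PDF.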

    Thanks!

    opened by josemf 5
  • Cannot find module 'puppeteer'

    Cannot find module 'puppeteer'

    Saw this linked on echo.js, decided to give a couple of the examples a shot.

    Copy/pasted the screenshot example from the docs into a js file, added a package.json, installed and saved browserless, then ran node on the js file. I'm assuming that would be a standard use-case for the lib.

    Here's the error. I will spend some time chasing it down when I get home from work later.

    throw err;
    ^

    Error: Cannot find module 'puppeteer'
        at Function.Module._resolveFilename (module.js:557:15)
        at Function.Module._load (module.js:484:25)
        at Module.require (module.js:606:17)
        at require (internal/module.js:11:18)
        at Object.<anonymous> (/home/mike/dev/test/node_modules/browserless/index.js:6:19)
        at Module._compile (module.js:662:30)
        at Object.Module._extensions..js (module.js:673:10)
        at Module.load (module.js:575:32)
        at tryModuleLoad (module.js:515:12)
        at Function.Module._load (module.js:507:3)
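The trace shows `require('puppeteer')` failing inside `node_modules/browserless/index.js`: browserless is backed by puppeteer, which has to be installed next to it (`npm install browserless puppeteer --save`). A minimal pre-flight check for a CommonJS script could look like this; `hasModule` is a hypothetical helper, not part of browserless.

```javascript
// Hypothetical pre-flight check: can `name` be resolved from this file?
function hasModule (name) {
  try {
    require.resolve(name)
    return true
  } catch (error) {
    // Only treat "not found" as a missing module; rethrow anything unexpected
    if (error.code === 'MODULE_NOT_FOUND') return false
    throw error
  }
}
```

Calling `hasModule('puppeteer')` before `require('browserless')` lets a script print an install hint instead of this stack trace. Note that `require.resolve` searches from the calling file, so with unusual `node_modules` layouts the result can differ from what browserless itself sees.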
    question 
    opened by maximumdata 5
  • build(deps-dev): bump p-all from 3.0.0 to 4.0.0

    build(deps-dev): bump p-all from 3.0.0 to 4.0.0

    Bumps p-all from 3.0.0 to 4.0.0.

    Release notes

    Sourced from p-all's releases.

    v4.0.0

    Breaking

    • Require Node.js 12.20 d2abd1e
    • This package is now pure ESM. Please read this.

    Improvements

    • Improve TypeScript types by using variadic tuple instead of overloads (#9) ea9c277
      • This means the strongly-typed return type is no longer limited to 10 elements.

    https://github.com/sindresorhus/p-all/compare/v3.0.0...v4.0.0

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies javascript 
    opened by dependabot[bot] 4
  • build(deps): bump meow from 9.0.0 to 10.1.3

    build(deps): bump meow from 9.0.0 to 10.1.3

    Bumps meow from 9.0.0 to 10.1.3.

    Release notes

    Sourced from meow's releases.

    v10.1.3

    • Fix return type for .showHelp() (#213) db55316

    https://github.com/sindresorhus/meow/compare/v10.1.2...v10.1.3

    v10.1.2

    • Fix engines field (#203) 1368ae0

    https://github.com/sindresorhus/meow/compare/v10.1.1...v10.1.2

    v10.1.1

    • Fix failure with isMultiple when isRequired function returns false (#194) e1f0e24

    https://github.com/sindresorhus/meow/compare/v10.1.0...v10.1.1

    v10.1.0

    • Upgrade dependencies 829aab0
    • Allow default property of Flag types to accept arrays (#190) ae73466

    https://github.com/sindresorhus/meow/compare/v10.0.1...v10.1.0

    v10.0.1

    • Upgrade dependencies (#185) a0daf20

    https://github.com/sindresorhus/meow/compare/v10.0.0...v10.0.1

    v10.0.0

    Breaking

    • Require Node.js 12 (#181) 05320ac
    • This package is now pure ESM. Please read this.
    • You must now pass in the importMeta option so meow can find your package.json:
     const cli = meow(…, {
    +	importMeta: import.meta
     });
    

    Previously, meow used some tricks to infer the location of your package.json, but this no longer works in ESM.

    https://github.com/sindresorhus/meow/compare/v9.0.0...v10.0.0

    dependencies javascript 
    opened by dependabot[bot] 4
  • Add types

    Add types

    Prerequisites

    • [x] I'm using the latest version.
    • [x] My Node version is the same as the one declared in package.json.

    Subject of the issue

    When I import browserless in a TypeScript-based project, the compiler reports that it has no types. I also don't see them in the repo or in @types.

    Steps to reproduce

    Create a new TypeScript project and import browserless:

    import createBrowserless from 'browserless';
    

    Expected behaviour

    browserless should ship type definitions so that it can be used easily in TypeScript and give the user IntelliSense support.

    Actual behaviour

    It has no built-in types, and there is no information about installing them separately.

    enhancement 
    opened by spa5k 4
  • build(deps): bump @cliqz/adblocker-puppeteer from 1.4.24 to 1.5.0

    build(deps): bump @cliqz/adblocker-puppeteer from 1.4.24 to 1.5.0

    Bumps @cliqz/adblocker-puppeteer from 1.4.24 to 1.5.0.

    Release notes

    Sourced from @cliqz/adblocker-puppeteer's releases.

    v1.5.0

    :nail_care: Polish

    • adblocker
      • #414 Implement retry mechanism while fetching resources (@remusao)
    • adblocker-webextension
      • #413 webextension: handler for runtime messages now returns a promise (@remusao)

    :house: Internal

    • adblocker-benchmarks, adblocker-circumvention, adblocker-content, adblocker-electron-example, adblocker-electron, adblocker-puppeteer-example, adblocker-puppeteer, adblocker-webextension-cosmetics, adblocker-webextension-example, adblocker-webextension, adblocker

    Committers: 1

    Changelog

    Sourced from @cliqz/adblocker-puppeteer's changelog.

    v1.5.0 (2020-01-16)

    :nail_care: Polish

    • adblocker
      • #414 Implement retry mechanism while fetching resources (@remusao)
    • adblocker-webextension
      • #413 webextension: handler for runtime messages now returns a promise (@remusao)

    :house: Internal

    • adblocker-benchmarks, adblocker-circumvention, adblocker-content, adblocker-electron-example, adblocker-electron, adblocker-puppeteer-example, adblocker-puppeteer, adblocker-webextension-cosmetics, adblocker-webextension-example, adblocker-webextension, adblocker

    Committers: 1

    v1.4.20 (2020-01-15)

    :house: Internal

    • #412 Migrate local GitHub actions to TypeScript (@remusao)

    Committers: 1

    v1.4.19 (2020-01-15)

    :house: Internal

    Committers: 1

    v1.4.12 (2020-01-15)

    :house: Internal

    Committers: 1

    v1.4.2 (2020-01-15)

    :memo: Documentation

    :house: Internal

    • #407 Add GitHub actions for releasing on GitHub (@remusao)
    ... (truncated)
    dependencies 
    opened by dependabot-preview[bot] 4
  • Add `@browserless/screencast`

    Add `@browserless/screencast`

    It could be really cool to do something similar to page.screenshot, but oriented toward video content.

    Since this feature is not fully implemented as part of the puppeteer API, we can ship a standalone package under the browserless umbrella.

    The main difference between screenshot and screencast is how to specify the actions to be performed as part of the video content (like scroll, click, etc.).

    The browserless way could be something like:

    const { createScreencast } = require('@browserless/screencast')
    
    /* let's assume you have `page` as precondition */
    const screencast = await createScreencast(page, { path: '/my/video/path.mp4' })
    
    /* actions that will be recorded */
    await screencast.goto('http://example.org')
    await screencast.scrollTo({ selector: '#footer', duration: 1000 }) // fancy smooth animation
    await screencast.click('a')
    await screencast.waitForTimeout(3000)
    
    /* serialize actions into video */
    await screencast.stop()
    

    Another approach could be to specify the actions in a configuration file:

    const screencast = require('@browserless/screencast')
    /* let's assume you have `page` as precondition */
    await screencast(page, {
      path: '/my/video/path.mp4',
      actions: [
        ['goto', 'http://example.org'],
        ['scrollTo', { selector: '#footer', duration: 1000 }],
        ['click', 'a'],
        ['waitForTimeout', 3000]
      ]
    })
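The configuration-file variant implies an interpreter that replays each `[method, ...args]` entry in order. A minimal sketch, assuming the target object (a hypothetical screencast recorder, or a puppeteer `page`) exposes methods with those names; `runActions` is a hypothetical helper, not an existing browserless API.

```javascript
// Hypothetical interpreter for the configuration-file variant (not a browserless API):
// replays [methodName, ...args] entries against `target`, strictly in order.
async function runActions (target, actions) {
  const results = []
  for (const [name, ...args] of actions) {
    if (typeof target[name] !== 'function') {
      // Fail fast on typos instead of silently skipping a step
      throw new TypeError(`Unknown action: ${name}`)
    }
    // Await each step so actions never overlap in the recording
    results.push(await target[name](...args))
  }
  return results
}
```

Calling the method through `target[name](...args)` keeps `this` bound to the target, so prototype methods such as those on a puppeteer `page` keep working.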
    

    Related

    • https://github.com/puppeteer/puppeteer/issues/478

    Inspiration

    • https://github.com/prasanaworld/puppeteer-screen-recorder ⭐️⭐️ – pretty near to the goal.
    • https://github.com/qawolf/playwright-video ⭐ – A solution created before Playwright had an official API.
    • https://playwright.dev/docs/videos ⭐ – The Playwright official API.
    • https://github.com/browserless/chrome/blob/master/src/apis/screencast.ts ⭐ – A solution using canvas for recording.
    • https://github.com/Flam3rboy/puppeteer-stream ⭐ – An implementation using MediaRecorder API.
    • https://github.com/clipisode/puppeteer-recorder ⭐️ – frame-by-frame solution using ffmpeg.
    • https://github.com/muralikg/puppetcam
    • https://github.com/tungs/timesnap
    • https://github.com/anishkny/webgif
    • https://gist.github.com/muralikg/23cfed0b099b3df812bb2b27ba1be6a4
    • https://github.com/transitive-bullshit/puppeteer-lottie
    • https://github.com/tungs/timecut
    • https://developer.chrome.com/docs/devtools/recorder/#open
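A frame-by-frame approach like several of those listed above can be built on the Chrome DevTools Protocol: `Page.startScreencast` makes Chrome emit `Page.screencastFrame` events, and each frame is acknowledged with `Page.screencastFrameAck`. The sketch below only collects frames; it assumes `client` is a CDP session (for example from `await page.target().createCDPSession()` in puppeteer) and leaves encoding the frames into a video (e.g. with ffmpeg) out of scope. `startFrameCapture` is a hypothetical name, not an existing browserless API.

```javascript
// Sketch (not a browserless API): capture frames with the Chrome DevTools
// Protocol screencast domain. `client` is assumed to be a CDP session.
async function startFrameCapture (client, { format = 'jpeg', everyNthFrame = 1 } = {}) {
  const frames = []

  const onFrame = ({ data, metadata, sessionId }) => {
    // Each frame arrives as a base64-encoded image plus metadata
    frames.push({ buffer: Buffer.from(data, 'base64'), timestamp: metadata.timestamp })
    // Chrome throttles further frames until each one is acknowledged;
    // ignore ack failures that can happen while the session is closing
    client.send('Page.screencastFrameAck', { sessionId }).catch(() => {})
  }

  client.on('Page.screencastFrame', onFrame)
  await client.send('Page.startScreencast', { format, everyNthFrame })

  // Stop capturing and hand back the collected frames, e.g. to feed an encoder
  return async function stop () {
    await client.send('Page.stopScreencast')
    client.off('Page.screencastFrame', onFrame)
    return frames
  }
}
```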
    enhancement 
    opened by Kikobeats 0
  • [screenshot] mobile overlay

    [screenshot] mobile overlay

    Similar to

    • https://github.com/sindresorhus/capture-website/pull/27/files?short_path=f1d7f01#diff-f1d7f01715e29ea2a7cbaf4f2f8117cc

    Related

    • https://github.com/microlinkhq/browserless/tree/master/packages/screenshot/media
    • https://browserframe.com/
    opened by Kikobeats 0
Releases (v9.8.0)
Owner
microlink.io
Browser as API
microlink.io
Group project for the courses 'Javascript med Ramverk' (JavaScript with Frameworks) and 'Agil Utveckling' (Agile Development)

JavaScript-med-Ramverk-Laboration-3 The group project for the courses Javascript med Ramverk and Agil Utveckling. Development guide For information on how

Svante Jonsson IT-Högskolan 3 May 18, 2022
Website for people in Sweden who are able and willing to offer housing to refugees

Getting Started with Create React App This project was bootstrapped with Create React App. Available Scripts In the project directory, you can run: np

null 4 May 3, 2022
Course repo for the course 'Webbserver och Databaser' (Web Servers and Databases)

Webbserver och databaser This repository is meant for CME students to access exercises and codealongs that happen throughout the course. I hope you wi

null 14 Jan 3, 2023
Prototype of real-time comments and a proposal of how to make it "production-ready".

Real-time comments prototype Simple demonstration of real-time commenting. Installation After forking it, run npm install, then you need two environme

Tiger Abrodi 3 Jan 16, 2022
A highly opinionated and complete starter for Next.js projects, ready for production

The aim for this starter is to give you a starting point with everything ready to work and launch to production. Web Vitals with 100% by default. Folder structure ready. Tooling ready. SEO ready. SSR ready.

Fukuro Studio 28 Nov 27, 2022
A production-ready ECPay AIO SDK for Node.js

node-ecpay-aio A production-ready ECPay All-In-One (AIO) payment SDK for Node.js with TypeScript Support Documentation For detailed usage instructions for this module, see the User Guide Overview

simen 21 Nov 1, 2022
MUI Core is a collection of React UI libraries for shipping new features faster. Start with Material UI, our fully-loaded component library, or bring your own design system to our production-ready components.

MUI Core MUI Core contains foundational React UI component libraries for shipping new features faster. Material UI is a comprehensive library of compo

MUI 83.6k Dec 30, 2022
A secure MERN Stack boilerplate ready for Production that uses Docker & Nginx.

A production ready & secure boilerplate for the MERN Stack that uses Docker & Nginx. Focus on the product and not the setup. You can directly start wo

Karan Jagtiani 34 Dec 23, 2022
Automagically bypass hcaptcha challenges with http api, with puppeteer, selenium, playwright browser automation scripts to bypass hCaptcha programmatically

Automagically bypass hcaptcha challenges with http api, with puppeteer, selenium, playwright browser automation scripts to bypass hCaptcha programmatically. For help, you can send a message on the Discord server using the link below. You can also create an issue.

Shimul 199 Jan 2, 2023
Manage Voximplant Platform `applications`, `rules` and `scenarios` from your own environment

VOXENGINE-CI Manage Voximplant Platform applications, rules, and scenarios from your own environment using @voximplant/apiclient-nodejs under the hood

Voximplant 21 May 6, 2022
The purpose of this repository is to provide a scaffold with some scenarios where you can challenge your front-end knowledge.

Frontend Kata / Interview Hello developer! The purpose of this repository is to provide a scaffold with some scenarios where you can challenge your fr

Adrián Ferrera González 2 Nov 11, 2022
Framework agnostic CLI tool for routes parsing and generation of a type-safe helper for safe route usage. 🗺️ Remix driver included. 🤟

About routes-gen is a framework agnostic CLI tool for routes parsing and generation of a type-safe helper for safe route usage. Think of it as Prisma,

Stratulat Alexandru 192 Jan 2, 2023
Portuguese version of the Cassandra driver javascript node.js workshop

Portuguese version of the Cassandra driver JavaScript Node.js workshop. Hello and welcome! This is the companion repository for the hands-on presentation of the

DataStax Developers 2 Mar 17, 2022
🚀 Macaca Playwright driver

macaca-playwright Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API. Macaca P

Macaca 13 Nov 8, 2022
A Fetch API-compatible PlanetScale database driver

PlanetScale Serverless Driver for JavaScript A Fetch API-compatible PlanetScale database driver for serverless and edge compute platforms that require

PlanetScale 255 Dec 27, 2022
A lightweight, performant, and simple-to-use wrapper component to stick section headers to the top when scrolling brings them to top

A lightweight, performant, and simple-to-use wrapper component to stick section headers to the top when scrolling brings them to top

Mayank 7 Jun 27, 2022
A peculiar little website that uses Eleventy + Netlify + Puppeteer to create generative poster designs

Garden — Generative Jamstack Posters "Garden" is an experiment in building creative, joyful online experiences using core web technologies. Buildin

George Francis 13 Jun 13, 2022
AWS CDK stack for taking website screenshots (powered by Puppeteer)

CDK Screenshot (powered by Puppeteer) Made possible by the excellent Puppeteer. Install export AWS_PROFILE=myprofile export AWS_DEFAULT_REGION=us-east

Alexei Boronine 6 Oct 23, 2022