Bundling puppeteer with its extra via Pkg to standalone executable


You may run into a situation that requires you to build a standalone executable with pkg or nexe. After debugging both, I decided to use Pkg because I feel nexe exposes the source code more easily. However, packaging puppeteer together with its extras, such as puppeteer-extra-plugin-stealth, isn't easy. In this post, I'll show you how to patch a project that uses puppeteer-extra so that it compiles cleanly with Pkg.

Use local chromium binary instead

The first thing to consider is downloading the Chromium binary on demand. There are two main reasons: a) it reduces the size of the package you deliver, and b) it keeps cross-platform support intact. I wrote some simple code that downloads a Chromium binary to the local environment:

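import { existsSync, mkdirSync } from 'node:fs';
import path from 'node:path';
import puppeteer from 'puppeteer';
import { addExtra } from 'puppeteer-extra';
import puppeteerStealth from 'puppeteer-extra-plugin-stealth';

// The statements below use await, so they are assumed to run inside an async function.
const localChromiumDirectory = path.join(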
	process.cwd(),
	'.local-chromium',
);

// https://github.com/puppeteer/puppeteer/releases/tag/v19.2.0
const localChromiumRevision = '1056772';

if (!existsSync(localChromiumDirectory)) {
	mkdirSync(localChromiumDirectory);
}

const fetcher = puppeteer.createBrowserFetcher({
	path: localChromiumDirectory,
});

const available = fetcher.localRevisions();

if (!available.includes(localChromiumRevision)) {
	await fetcher.download(
		localChromiumRevision,
		(progress, total) => {
			console.log(`DOWNLOAD: ${progress}/${total}`);
		},
	);
}

const revision = fetcher.revisionInfo(localChromiumRevision);

const platform = addExtra(puppeteer);
const evasion = puppeteerStealth();

platform.use(evasion);

const browser = await platform.launch({
	executablePath: revision.executablePath,
});

The key point of the code above is that I pinned the revision of the Chromium binary to download. You can find the Chromium revision that your installed puppeteer package uses in its release notes. The Chromium revision changes when the second component (the minor version) of puppeteer's version number changes.
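The download callback in the snippet above logs raw byte counts, which get noisy for a binary of this size. Here is a minimal sketch of a formatter you could pass the same two numbers to; formatProgress is a hypothetical helper of mine, not part of puppeteer.

```javascript
// Hypothetical helper: turn the raw byte counts passed to the download
// callback into a short human-readable line.
function formatProgress(downloadedBytes, totalBytes) {
	const toMiB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);
	// Guard against a zero total so we never divide by zero.
	const percent = totalBytes > 0
		? Math.floor((downloadedBytes / totalBytes) * 100)
		: 0;
	return `DOWNLOAD: ${percent}% (${toMiB(downloadedBytes)}/${toMiB(totalBytes)} MiB)`;
}
```

Inside the download callback you would call console.log(formatProgress(progress, total)) instead of logging the two numbers directly.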

Patching dependencies to avoid using lazy-cache or dynamic require statements

This is especially important if your project uses ES Modules. Standalone binary compilers such as pkg and nexe work by changing the entry point of the Node.js executable, and in that environment ESM won't work as expected because your code is no longer executed as an ordinary file. You therefore need to convert your module system to CJS. That means bundling, and bundling means you have to care about how each dependency is required. JavaScript bundlers only know what they can analyze statically, so they do not pick up dynamic require statements.
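To get a feeling for which require calls a bundler can and cannot follow, here is a rough sketch that scans source text for require calls whose argument is not a single string literal. findDynamicRequires is a hypothetical helper, and because it relies on a regular expression rather than a real parser, treat its output as a hint only.

```javascript
// Heuristic scan: list every require(...) call whose argument is not a
// single string literal. A regex cannot parse JavaScript, so this can
// miss cases or report false positives.
function findDynamicRequires(source) {
	const calls = source.matchAll(/\brequire\s*\(([^)]*)\)/g);
	const isStringLiteral = /^\s*(['"])[^'"]*\1\s*$/;
	const dynamic = [];
	for (const [whole, argument] of calls) {
		if (!isStringLiteral.test(argument)) {
			dynamic.push(whole);
		}
	}
	return dynamic;
}

// Made-up sample: the first call is static, the other two are not.
const sample = [
	"const fs = require('fs');",
	"const plugin = require(name);",
	"const evasion = require('./evasions/' + id);",
].join('\n');

console.log(findDynamicRequires(sample));
```

The two-argument calls used with lazy-cache, such as require('kind-of', 'typeOf'), are reported as well, since their argument list is more than one string literal.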

The lazy-cache package is where things go wrong. lazy-cache isn't bad in itself, but it blocks our goal because it encourages dynamic requires. As far as I know, two packages in the puppeteer-extra dependency tree use lazy-cache: a) clone-deep@0.2.4 and b) shallow-clone@0.1.2.
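To see why this pattern is hard for a bundler, here is a simplified imitation of the lazy loading idea. It is not lazy-cache's real code: lazy and fakeLoad are names I made up, and fakeLoad stands in for Node's require so the sketch runs without any packages installed.

```javascript
// Simplified imitation of the lazy loading pattern (not lazy-cache's real code).
// `load` stands in for Node's require.
function lazy(load) {
	const cache = {};
	function register(moduleName, alias) {
		const key = alias || moduleName;
		// The module is loaded only when the property is first read.
		Object.defineProperty(register, key, {
			configurable: true,
			enumerable: true,
			get() {
				if (!(key in cache)) {
					// The name arrives through a variable, not a literal.
					cache[key] = load(moduleName);
				}
				return cache[key];
			},
		});
	}
	return register;
}

// Stand-in loader that records what was actually loaded.
const loaded = [];
const fakeLoad = (name) => {
	loaded.push(name);
	return { name };
};

const utils = lazy(fakeLoad);
utils('kind-of', 'typeOf'); // registers the alias, loads nothing yet
```

The actual load happens inside the getter with a variable as its argument, which is exactly the kind of call a bundler cannot resolve ahead of time.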
I'm going to use pnpm to patch the packages here; you can apply the following patches I made. The first one is for clone-deep and the second one is for shallow-clone. Please stick to the exact versions I specified, or at least patch a version whose major version is zero.

diff --git a/package.json b/package.json
index 481d58e5a009ea7d63687957d45f0d01edb27500..f14a84c03351582de5835a542b7314b03e005c13 100644
--- a/package.json
+++ b/package.json
@@ -24,7 +24,6 @@
     "for-own": "^0.1.3",
     "is-plain-object": "^2.0.1",
     "kind-of": "^3.0.2",
-    "lazy-cache": "^1.0.3",
     "shallow-clone": "^0.1.2"
   },
   "devDependencies": {
diff --git a/utils.js b/utils.js
index d2a7570d3585c42b9c88fb925e21dafafa1431e6..83b360072b385124ffec03c077c40f1aafd145ee 100644
--- a/utils.js
+++ b/utils.js
@@ -1,21 +1,12 @@
 'use strict';
 
-/**
- * Lazily required module dependencies
- */
-
-var utils = require('lazy-cache')(require);
-var fn = require;
-
-require = utils;
-require('is-plain-object', 'isObject');
-require('shallow-clone', 'clone');
-require('kind-of', 'typeOf');
-require('for-own');
-require = fn;
-
 /**
  * Expose `utils`
  */
 
-module.exports = utils;
+module.exports = {
+isObject: require('is-plain-object'),
+clone: require('shallow-clone'),
+typeOf: require('kind-of'),
+forOwn: require('for-own'),
+};
diff --git a/package.json b/package.json
index a088fd98158d5922eb66b57138dd6d87cb10ea4e..58542571f5a2b0bf9b3328db4c4a137cf5b53c0f 100644
--- a/package.json
+++ b/package.json
@@ -23,7 +23,6 @@
   "dependencies": {
     "is-extendable": "^0.1.1",
     "kind-of": "^2.0.1",
-    "lazy-cache": "^0.2.3",
     "mixin-object": "^2.0.1"
   },
   "devDependencies": {
diff --git a/utils.js b/utils.js
index f6fb96765085a4566690fba625872f44368967a1..f4ca138d6d088f4c05a17c4bfd6eb40d645f515c 100644
--- a/utils.js
+++ b/utils.js
@@ -1,10 +1,7 @@
 'use strict';
 
-var utils = require('lazy-cache')(require);
-var fn = require;
-require = utils;
-require('is-extendable', 'isObject');
-require('mixin-object', 'mixin');
-require('kind-of', 'typeOf');
-require = fn;
-module.exports = utils;
+module.exports = {
+  isObject: require('is-extendable'),
+  mixin: require('mixin-object'),
+  typeOf: require('kind-of')
+};

Place the two files under a patches directory, naming each file with the following schema: package-name@version.patch. Finally, add the following to package.json to let pnpm know the packages are patched.

{
  "private": true,
  "pnpm": {
    "patchedDependencies": {
      "clone-deep@0.2.4": "patches/clone-deep@0.2.4.patch",
      "shallow-clone@0.1.2": "patches/shallow-clone@0.1.2.patch"
    }
  }
}

Apply the patches by reinstalling node_modules or by running the pnpm update command.

Using esbuild to bundle

Most bundlers with a stable release won't complain, but I'll use esbuild here. Make sure you're compiling to CJS.

esbuild --bundle --platform=node --outfile=./out/index.cjs --define:process.env.NODE_ENV=\"production\" --format=cjs <entry>
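If you prefer a build script over a long command line, the same flags can be expressed through esbuild's JavaScript API. This is a minimal sketch: buildOptions and runBuild are names I chose, and the entry path is whatever your project uses.

```javascript
// Options mirroring the CLI flags above, for esbuild's JavaScript API.
function buildOptions(entry) {
	return {
		entryPoints: [entry],
		bundle: true,
		platform: 'node',
		format: 'cjs',
		outfile: './out/index.cjs',
		// Values in `define` are source text, so the string needs its own quotes.
		define: { 'process.env.NODE_ENV': '"production"' },
	};
}

// Runs the build. esbuild is required lazily so that buildOptions can be
// inspected even where esbuild is not installed.
async function runBuild(entry) {
	const esbuild = require('esbuild');
	return esbuild.build(buildOptions(entry));
}
```

Call runBuild with your entry file from a small script. Note that the define value carries its own inner quotes, because esbuild substitutes it as source text rather than as a string value.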

Adding missing dependencies and covering dynamic scripts

We still need a few more steps. The first is to include, as scripts, all the files that pkg might not find and would therefore skip, because puppeteer-extra-plugin-stealth and the rest of the extra family load scripts dynamically. The additional packages, user-preferences and user-data-dir, are dependencies that I found some of the stealth evasion scripts require, so install them too. Then save the following somewhere as a JSON file or in package.json. I named the file delivery-config.json.

{
  "pkg": {
    "scripts": [
      "node_modules/puppeteer/lib/*.js",
      "node_modules/puppeteer-extra-plugin-stealth/**/*.js",
      "node_modules/puppeteer-extra-plugin-user-preferences/**/*.js",
      "node_modules/puppeteer-extra-plugin-user-data-dir/**/*.js"
    ],
    "outputPath": "dist"
  }
}
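If you end up adding more puppeteer-extra plugins, the scripts list grows with the same pattern each time. Here is a small sketch that generates the configuration above from a list of package names; pkgConfig is a hypothetical helper, and you would still need to write its result to your config file yourself.

```javascript
// Hypothetical helper: build the pkg configuration from a list of
// puppeteer-extra plugin package names.
function pkgConfig(pluginPackages, outputPath = 'dist') {
	return {
		pkg: {
			scripts: [
				'node_modules/puppeteer/lib/*.js',
				// One recursive glob per plugin package.
				...pluginPackages.map((name) => `node_modules/${name}/**/*.js`),
			],
			outputPath,
		},
	};
}

const config = pkgConfig([
	'puppeteer-extra-plugin-stealth',
	'puppeteer-extra-plugin-user-preferences',
	'puppeteer-extra-plugin-user-data-dir',
]);

console.log(JSON.stringify(config, null, 2));
```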

Respecting stringifying strategy of puppeteer

The long-awaited last step is here. Append --public so that the compatibility layer puppeteer implements between the browser context and the Node.js context works correctly. This matters especially for the evaluate function on puppeteer Pages. When the code is converted to the Node.js compile cache, the way evaluate sends a function to the browser breaks and leads to a crash. Don't worry about exposing your code: there is no way to deliver JavaScript completely safely anyway. Adding the --minify flag to esbuild will be enough.

pkg ./out/index.cjs --public -t node16-win-x64 -c ./delivery-config.json
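To make the stringifying strategy concrete, here is a simplified imitation of how a function can be shipped to another JavaScript context as text. It is not puppeteer's actual implementation: toExpression is a name I made up, and an indirect eval stands in for the browser.

```javascript
// Simplified imitation of sending a function to another JavaScript
// context as text (not puppeteer's actual implementation).
function toExpression(fn, ...args) {
	// Function.prototype.toString must return real source text here.
	const source = fn.toString();
	const serializedArgs = args.map((arg) => JSON.stringify(arg)).join(', ');
	return `(${source})(${serializedArgs})`;
}

const expression = toExpression((a, b) => a + b, 2, 3);

// Indirect eval stands in for the browser: it only ever sees the string.
const result = (0, eval)(expression);
console.log(expression, '=>', result);
```

Everything depends on fn.toString() returning the original source. Once the source text of a function is no longer available, there is nothing meaningful to send across, which is why the sources have to stay readable in the packaged executable.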

Congrats! You now have a setup to deliver your application containing puppeteer to every major platform.
