Bundling puppeteer and puppeteer-extra into a standalone executable with pkg
You may run into a situation that requires you to build a standalone executable using pkg or nexe. After debugging both, I decided to use pkg because I feel nexe exposes our source code more easily. However, packaging puppeteer together with its extras, such as puppeteer-extra-plugin-stealth, isn't easy. In this post, I show how to patch a project that uses puppeteer-extra so that it compiles easily with pkg.
Use local chromium binary instead
The first thing to consider is downloading the Chromium binary on demand. There are two main reasons: a) it reduces the size of the delivered package, and b) it preserves cross-platform support. I wrote some simple code that downloads a Chromium binary into the local environment:
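import { existsSync, mkdirSync } from 'node:fs';
import path from 'node:path';
import puppeteer from 'puppeteer';
import { addExtra } from 'puppeteer-extra';
import puppeteerStealth from 'puppeteer-extra-plugin-stealth';

const localChromiumDirectory = path.join(
  process.cwd(),
  '.local-chromium',
);
// https://github.com/puppeteer/puppeteer/releases/tag/v19.2.0
const localChromiumRevision = '1056772';

// Wrapped in an async function because the bundle targets CJS,
// where top-level await is not available.
async function launchBrowser() {
  if (!existsSync(localChromiumDirectory)) {
    mkdirSync(localChromiumDirectory, { recursive: true });
  }
  const fetcher = puppeteer.createBrowserFetcher({
    path: localChromiumDirectory,
  });
  // Download the pinned revision only when it is not available locally yet.
  const available = await fetcher.localRevisions();
  if (!available.includes(localChromiumRevision)) {
    await fetcher.download(
      localChromiumRevision,
      (progress, total) => {
        console.log(`DOWNLOAD: ${progress}/${total}`);
      },
    );
  }
  const revision = fetcher.revisionInfo(localChromiumRevision);
  // Wrap puppeteer with puppeteer-extra and enable the stealth plugin.
  const platform = addExtra(puppeteer);
  const evasion = puppeteerStealth();
  platform.use(evasion);
  const browser = await platform.launch({
    executablePath: revision.executablePath,
  });
  return browser;
}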
The point of the code above is that I pinned the revision of the Chromium binary to download. You can look up the Chromium revision that your local puppeteer package uses in its release notes. The Chromium revision changes when the second (minor) component of puppeteer's version number changes.
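The snippet above stores the binary under process.cwd(). Once the program ships as a single executable, the working directory depends on where the user starts it, so one option is to keep the download next to the executable instead. Here is a minimal sketch, assuming pkg defines process.pkg inside a packaged binary; the helper name resolveChromiumDirectory and its env parameter are made up for illustration:

```javascript
const path = require('node:path');

// Hypothetical helper: decide where the downloaded Chromium should live.
// `env` is injectable so the logic can be exercised without building a binary.
function resolveChromiumDirectory(env = {
  isPackaged: Boolean(process.pkg), // assumption: pkg defines process.pkg at runtime
  execPath: process.execPath,
  cwd: process.cwd(),
}) {
  // Packaged: keep the download beside the executable, wherever it is started from.
  // Development: keep the original behaviour and use the working directory.
  const base = env.isPackaged ? path.dirname(env.execPath) : env.cwd;
  return path.join(base, '.local-chromium');
}

console.log(resolveChromiumDirectory());
```

The returned path can be passed as the path option of createBrowserFetcher in place of localChromiumDirectory.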
Patching dependencies to avoid using lazy-cache or dynamic require statements
This is especially important if your project uses ES modules. Standalone binary compilers such as pkg and nexe work by changing the entry point of the Node.js executable. In that kind of environment, you will find that ESM does not work as expected, because your code is no longer executed as an ordinary file. Therefore, you need to switch your module system to CJS. That means you need to bundle, which in turn means you have to care about how each dependency is required. JavaScript bundlers only know static information, so they do not find dynamic require statements.
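To see why a dynamic require is invisible at build time, consider a toy scanner that collects only require calls with a string-literal argument. Real bundlers such as esbuild use a full parser rather than a regular expression, so this only illustrates the principle; findStaticRequires is a made-up name:

```javascript
// Toy illustration: collect module names from require('literal') calls.
// This is NOT how esbuild works internally; it only models "static information".
function findStaticRequires(source) {
  const pattern = /\brequire\(\s*(['"])([^'"]+)\1\s*\)/g;
  return [...source.matchAll(pattern)].map((match) => match[2]);
}

const sample = `
  const clone = require('shallow-clone');   // static: the name is visible in the source
  const name = 'kind-' + 'of';
  const typeOf = require(name);             // dynamic: the name exists only at runtime
`;

console.log(findStaticRequires(sample)); // [ 'shallow-clone' ]
```

The second require never shows up in the result, so a bundle built from this information alone would ship without kind-of.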
Using the lazy-cache package will get you into trouble here. lazy-cache is not bad in itself, but it blocks the road to our goal because it leads developers to use dynamic requires. As far as I know, two packages use lazy-cache when you use puppeteer-extra: a) clone-deep@0.2.4 and b) shallow-clone@0.1.2.
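For context, the pattern looks roughly like the following. This is a simplified imitation written for illustration, not lazy-cache's actual source: modules are registered by name and loaded only on first access, so the real require call always receives a variable.

```javascript
// Simplified imitation of the lazy-loading pattern (not the real lazy-cache code).
function lazy(requireFn) {
  // Calling register('module', 'alias') only records the name; nothing is loaded yet.
  function register(moduleName, alias = moduleName) {
    let cached;
    Object.defineProperty(register, alias, {
      configurable: true,
      enumerable: true,
      get() {
        // The module name is a variable here, so a bundler cannot resolve it.
        if (cached === undefined) cached = requireFn(moduleName);
        return cached;
      },
    });
  }
  return register;
}

const loaded = [];
const utils = lazy((name) => {
  loaded.push(name); // record when a module is really loaded
  return require(name);
});
utils('node:path', 'path');

console.log(loaded.length);                   // 0
console.log(utils.path.basename('/a/b.js'));  // b.js
console.log(loaded);                          // [ 'node:path' ]
```

Replacing this indirection with plain require('...') calls, as the patches below do, makes every dependency visible to the bundler again.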
I am going to use pnpm to patch the packages here, and you can apply the following patches I made as they are. The first one is for clone-deep and the second one is for shallow-clone. Please stick to the exact versions I specified, or otherwise patch a version whose major version is zero.
diff --git a/package.json b/package.json
index 481d58e5a009ea7d63687957d45f0d01edb27500..f14a84c03351582de5835a542b7314b03e005c13 100644
--- a/package.json
+++ b/package.json
@@ -24,7 +24,6 @@
     "for-own": "^0.1.3",
     "is-plain-object": "^2.0.1",
     "kind-of": "^3.0.2",
-    "lazy-cache": "^1.0.3",
     "shallow-clone": "^0.1.2"
   },
   "devDependencies": {
diff --git a/utils.js b/utils.js
index d2a7570d3585c42b9c88fb925e21dafafa1431e6..83b360072b385124ffec03c077c40f1aafd145ee 100644
--- a/utils.js
+++ b/utils.js
@@ -1,21 +1,12 @@
 'use strict';
 
-/**
- * Lazily required module dependencies
- */
-
-var utils = require('lazy-cache')(require);
-var fn = require;
-
-require = utils;
-require('is-plain-object', 'isObject');
-require('shallow-clone', 'clone');
-require('kind-of', 'typeOf');
-require('for-own');
-require = fn;
-
 /**
  * Expose `utils`
  */
 
-module.exports = utils;
+module.exports = {
+  isObject: require('is-plain-object'),
+  clone: require('shallow-clone'),
+  typeOf: require('kind-of'),
+  forOwn: require('for-own'),
+};
diff --git a/package.json b/package.json
index a088fd98158d5922eb66b57138dd6d87cb10ea4e..58542571f5a2b0bf9b3328db4c4a137cf5b53c0f 100644
--- a/package.json
+++ b/package.json
@@ -23,7 +23,6 @@
   "dependencies": {
     "is-extendable": "^0.1.1",
     "kind-of": "^2.0.1",
-    "lazy-cache": "^0.2.3",
     "mixin-object": "^2.0.1"
   },
   "devDependencies": {
diff --git a/utils.js b/utils.js
index f6fb96765085a4566690fba625872f44368967a1..f4ca138d6d088f4c05a17c4bfd6eb40d645f515c 100644
--- a/utils.js
+++ b/utils.js
@@ -1,10 +1,7 @@
 'use strict';
 
-var utils = require('lazy-cache')(require);
-var fn = require;
-require = utils;
-require('is-extendable', 'isObject');
-require('mixin-object', 'mixin');
-require('kind-of', 'typeOf');
-require = fn;
-module.exports = utils;
+module.exports = {
+  isObject: require('is-extendable'),
+  mixin: require('mixin-object'),
+  typeOf: require('kind-of')
+};
Place the two files under a patches directory, naming each file with the following schema: package-name@version.patch. Finally, register them in package.json so that pnpm knows about the patches.
{
  "private": true,
  "pnpm": {
    "patchedDependencies": {
      "clone-deep@0.2.4": "patches/clone-deep@0.2.4.patch",
      "shallow-clone@0.1.2": "patches/shallow-clone@0.1.2.patch"
    }
  }
}
Apply the patches by reinstalling node_modules or by running the pnpm update command.
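If the list of patches grows, the patchedDependencies block can be derived from the file names, since they follow the package-name@version.patch schema. Here is a small sketch, assuming unscoped package names only; patchedDependenciesFrom is a hypothetical helper and not part of pnpm:

```javascript
const path = require('node:path');

// Hypothetical helper: build the "patchedDependencies" map from patch file names.
// Assumes names like "clone-deep@0.2.4.patch" (unscoped packages only).
function patchedDependenciesFrom(fileNames, directory = 'patches') {
  const result = {};
  for (const fileName of fileNames) {
    const match = /^([^@]+@[^@]+)\.patch$/.exec(fileName);
    if (!match) continue; // ignore anything that does not follow the schema
    // Use forward slashes so the generated paths look the same on every platform.
    result[match[1]] = path.posix.join(directory, fileName);
  }
  return result;
}

console.log(patchedDependenciesFrom([
  'clone-deep@0.2.4.patch',
  'shallow-clone@0.1.2.patch',
  'README.md',
]));
```

For the two patches in this post, the result matches the patchedDependencies object shown above.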
Using esbuild to bundle
Most bundlers with a stable release won't complain, but I'll use esbuild here. Make sure you're compiling to CJS.
esbuild --bundle --platform=node --outfile=./out/index.cjs --define:process.env.NODE_ENV=\"production\" --format=cjs <entry>
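The same build can be expressed with esbuild's JavaScript API, which avoids having to quote the define value for the shell. This is a sketch, assuming esbuild is installed as a dev dependency; the file name build.cjs and the entry path ./src/index.js are placeholders, since the post leaves <entry> open:

```javascript
// build.cjs: mirrors the CLI flags above using esbuild's build() API.
const buildOptions = {
  entryPoints: ['./src/index.js'], // placeholder: replace with your entry file
  bundle: true,
  platform: 'node',
  format: 'cjs',
  outfile: './out/index.cjs',
  // Each define value must be a JavaScript expression, hence the nested quotes.
  define: { 'process.env.NODE_ENV': '"production"' },
};

async function build() {
  // Required lazily so the options can be inspected without esbuild installed.
  const esbuild = require('esbuild');
  await esbuild.build(buildOptions);
}

module.exports = { buildOptions, build };
```

Running node -e "require('./build.cjs').build()" from the project root then writes ./out/index.cjs.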
Adding missing dependencies and covering dynamic scripts
We still need a few more steps. The first is to list as scripts all the files that pkg might not find and would therefore skip, because puppeteer-extra-plugin-stealth and the other puppeteer-extra plugins load scripts dynamically. The additional packages puppeteer-extra-plugin-user-preferences and puppeteer-extra-plugin-user-data-dir are dependencies that I found some of the stealth evasion scripts require, so install them too. Then save the following configuration somewhere, either as a separate JSON file or in package.json. I named my file delivery-config.json.
{
  "pkg": {
    "scripts": [
      "node_modules/puppeteer/lib/*.js",
      "node_modules/puppeteer-extra-plugin-stealth/**/*.js",
      "node_modules/puppeteer-extra-plugin-user-preferences/**/*.js",
      "node_modules/puppeteer-extra-plugin-user-data-dir/**/*.js"
    ],
    "outputPath": "dist"
  }
}
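Before packaging, it can help to confirm that every package referenced in scripts is actually installed. Here is a small sketch; packagesInScripts and findMissingPackages are hypothetical helpers, and the check only looks at the node_modules/<name> prefix of each pattern (unscoped names):

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: extract "<name>" from patterns like "node_modules/<name>/...".
function packagesInScripts(scripts) {
  const names = new Set();
  for (const pattern of scripts) {
    const match = /^node_modules\/([^/]+)\//.exec(pattern);
    if (match) names.add(match[1]);
  }
  return [...names];
}

// Report packages whose directory is absent under <root>/node_modules.
// `exists` is injectable so the check can be tested without a real install.
function findMissingPackages(scripts, root = process.cwd(), exists = fs.existsSync) {
  return packagesInScripts(scripts).filter(
    (name) => !exists(path.join(root, 'node_modules', name)),
  );
}

const scripts = [
  'node_modules/puppeteer/lib/*.js',
  'node_modules/puppeteer-extra-plugin-stealth/**/*.js',
  'node_modules/puppeteer-extra-plugin-user-preferences/**/*.js',
  'node_modules/puppeteer-extra-plugin-user-data-dir/**/*.js',
];

// Prints the names that are not installed in the current project.
console.log(findMissingPackages(scripts));
```

An empty array means all four packages are present and the glob patterns have something to match.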
Respecting stringifying strategy of puppeteer
The long-awaited last step is here. Append --public so that the compatibility layer puppeteer implements between the browser context and the Node.js context works correctly. This matters especially for the evaluate function on puppeteer Pages. When the code is converted to a Node.js compile cache, the way evaluate sends its function to the browser breaks, which leads to a crash. Don't worry about exposing the code: there is no way to deliver JavaScript completely safely anyway. Simply adding the --minify flag to esbuild will be enough.
pkg ./out/index.cjs --public -t node16-win-x64 -c ./delivery-config.json
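Why the source text matters can be shown without a browser. evaluate hands a function to the page as text; the sketch below imitates that idea with Node's vm module standing in for the page. It is a simplified model, not puppeteer's actual implementation, and evaluateInOtherContext is a made-up name:

```javascript
const vm = require('node:vm');

// Simplified model of "evaluate": serialize a function to text, then run that text
// in a separate context (node:vm stands in for the browser page here).
function evaluateInOtherContext(fn, ...args) {
  const source = fn.toString(); // relies on the original source text being available
  const call = `(${source})(...${JSON.stringify(args)})`;
  return vm.runInNewContext(call);
}

console.log(evaluateInOtherContext((a, b) => a + b, 2, 3)); // 5
```

If the function's source text is not available at runtime, toString() has nothing useful to return and this hand-off cannot work, which is the situation --public is meant to avoid.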
Congrats! You now have a setup for delivering an application that contains puppeteer on every major platform.