Daily CyberSecurity

Web Scraping with JavaScript: A Beginner’s Guide

Do Son December 20, 2021 6 minutes read

Developers use web scrapers for many kinds of data fetching. Let us learn how to scrape the web using JavaScript.

Node.js, the runtime for JavaScript, has made the language one of the most popular and widely used in the world. JavaScript now offers the necessary tools for both web and mobile applications. This article demonstrates how the robust Node.js ecosystem lets you scrape the web effectively for most of your needs.

What is Web Scraping?

Web scraping is the use of bots to extract a website’s content and data. Unlike screen scraping, which copies only what is displayed on screen, web scraping collects the underlying HTML code and, with it, the data stored in the site’s database. As a result, the website’s content can be copied to another location. Web scraping is used by many digital businesses that depend on data collection.

Many companies are using web scraping and big data to revolutionize business intelligence. Web scraping is commonly used for the following purposes.

Search engines use bots to crawl a website, assess its content, and rank it in the results. Price comparison sites use bots to automatically retrieve prices and product descriptions from affiliated vendor websites. Market research firms use web scrapers to collect information from forums and social media.

PHP and Python are commonly used for web scraping, but you can also scrape the web with JavaScript.

Prerequisites

Here is what you will need before you start web scraping with JavaScript (Node.js).

  • Web browser
  • Web page (From which you will extract data)
  • Code editor
  • Node.js
  • Axios
  • Cheerio
  • Puppeteer

Installation Process

Now that we know what is required, let’s get started with the installation process.

Node.js

Node.js makes it easy to automate the time-consuming collection of data from websites. To install it on your computer, download the installer from the official Node.js website and follow the installation instructions. npm (the Node Package Manager) is installed along with Node.js.

npm is the default package manager for Node.js, and it makes installing and using packages quick and straightforward. You will rely on npm packages to facilitate web scraping. Run the initialization command (npm init) from inside your project’s root directory to generate a package.json file containing the project’s metadata.

Axios

Axios is a promise-based HTTP client that works in both Node.js and the browser. If you want to make HTTP requests from Node.js using promises, this npm package will help you. Axios can also handle multiple concurrent requests and automatically parses JSON responses.

Run the installation command (npm install axios) from the command line in your project’s directory. npm will create a node_modules folder in your project directory and install Axios there.
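
As a minimal sketch of the concurrent requests mentioned above, the helper below starts several GET requests at once and waits for all of them. The name fetchPages and its client parameter are made up for illustration. The parameter defaults to Axios, which assumes npm install axios has already been run.

```javascript
// Hypothetical helper (not from the article): fetch several pages concurrently.
// `client` is any object with an axios-style get(url) method. It defaults to
// axios itself, which assumes `npm install axios` has been run.
async function fetchPages(urls, client = require("axios")) {
  // Start every request at once, then wait until all of them have resolved.
  const responses = await Promise.all(urls.map((url) => client.get(url)));
  // Each response exposes the HTTP status and the body (in `data`).
  return responses.map((response, i) => ({
    url: urls[i],
    status: response.status,
    body: response.data,
  }));
}
```

It could be called as fetchPages(["https://example.com/"]).then(console.log).catch(console.log), where the URL is only a placeholder. Note that Promise.all rejects as soon as any single request fails, so one bad URL fails the whole batch in this sketch.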

Cheerio

Cheerio is a fast, lightweight module that offers a jQuery-like syntax for working with web page content. It substantially simplifies selecting, editing, and viewing DOM elements on a web page.

Cheerio is an excellent tool for fast parsing and manipulating the DOM. However, it does not behave like a web browser. For example, no JavaScript is executed, no external resources are loaded, and no CSS style is applied.

You can install it by running the installation command (npm install cheerio) on the command line in your project’s directory. As with Axios, npm will install Cheerio in the node_modules folder inside your project’s directory.
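
To make the jQuery-like syntax concrete, here is a small sketch that parses a hard-coded HTML string. The markup and the name parseSample are invented for illustration, and the snippet assumes npm install cheerio has been run.

```javascript
// Illustrative only: the HTML below and the helper name are made up.
const sampleHtml = `
  <ul id="fruits">
    <li class="apple">Apple</li>
    <li class="pear">Pear</li>
  </ul>
  <p id="status">loading</p>
  <script>document.getElementById("status").textContent = "ready";</script>
`;

function parseSample(html) {
  // Assumes `npm install cheerio` has been run.
  const cheerio = require("cheerio");
  const $ = cheerio.load(html);
  return {
    // Read the text of every <li> inside the element with id "fruits".
    fruits: $("#fruits li").toArray().map((el) => $(el).text().trim()),
    // Cheerio never executes the <script>, so this stays "loading".
    status: $("#status").text(),
  };
}
```

Calling parseSample(sampleHtml) should return the two fruit names, while status keeps the value from the static markup because the inline script is parsed but never executed.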

Puppeteer

Puppeteer is a Node.js library that controls a headless Chrome browser, which lets you drive web pages and retrieve data from them.

HTTP-based tools like Axios may not return the desired results on JavaScript-driven websites, because they fetch only the initial HTML and do not run the page’s scripts. With Puppeteer, you can execute JavaScript as a browser does, scrape dynamically generated content, and replicate the browser experience.

Open your project’s directory in the terminal, then type the installation command (npm install puppeteer) to install it.
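
The following is a minimal sketch of that idea: it opens a page in headless Chrome, waits for network activity to settle, and returns the HTML as it stands after the page’s scripts have run. The name renderedHtml and its second parameter are hypothetical. The parameter defaults to Puppeteer itself, which assumes npm install puppeteer has been run.

```javascript
// Hypothetical helper (not from the article): return a page's HTML after its
// scripts have run. `puppeteer` defaults to the real library, which assumes
// `npm install puppeteer` has been run.
async function renderedHtml(url, puppeteer = require("puppeteer")) {
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly stopped before reading the DOM.
    await page.goto(url, { waitUntil: "networkidle2" });
    // Serialize the current DOM, including content added by JavaScript.
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}
```

The returned string can then be passed to cheerio.load in the same way as an Axios response body.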

Now that the installation process is done, let us jump right into web scraping!

Scraping

Let us learn how to use JavaScript to scrape data from a website.

First, use the web browser’s inspector to locate the specific HTML elements that contain the data we are looking for.

In this example, the number of comments is contained in an <a> element nested inside a <span> tag with the class comment-bubble. We will use this information to select these elements on the page with Cheerio.

The procedures for developing the scraping logic are as follows:

  1. Begin by creating the index.js file, which will contain the logic for getting data from the web page.
  2. Then, use Node.js’s built-in ‘require’ function to include the modules that will be used in the project.
  3. Now, use Axios to send an HTTP GET request to the target web page. The server replies with a response, and the Axios response object has several properties, one of which is data, the payload returned by the server. For a GET request to a web page, that payload is the page’s HTML.
  4. Next, load the response data into a Cheerio instance. This creates a Cheerio object that parses the HTML of the target page and lets us locate the DOM elements containing the data we are looking for, just as we would with jQuery.
  5. Then, use Cheerio’s selector syntax to find the elements that contain the data we are looking for, and extract their contents as text using the ‘text()’ function.
  6. Finally, log any errors that occur during the scraping process.

When you run the code with the ‘node index.js’ command, it prints the data you requested from the target web page.

Code:

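const axios = require("axios");
const cheerio = require("cheerio");

axios
  // Replace the placeholder with the page you want to scrape,
  // for example "https://en.wikipedia.org/wiki/Web_scraping".
  .get("your website url")
  .then((response) => {
    // response.data holds the HTML returned by the server.
    const html = response.data;
    // Load the HTML into Cheerio so it can be queried with jQuery-like selectors.
    const $ = cheerio.load(html);
    // Select every <a> inside an element with the class comment-bubble.
    const scrapedata = $("a", ".comment-bubble").text();
    console.log(scrapedata);
  })
  .catch((error) => {
    // Log any error raised while requesting or parsing the page.
    console.log(error);
  });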
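
One detail worth knowing: when a selector matches several elements, text() returns all of their text joined into a single string. The sketch below is a hypothetical async/await variant of the code above that returns one entry per matched element instead. The name scrapeCommentCounts and the client parameter are assumptions made for illustration, and both Axios and Cheerio are assumed to be installed.

```javascript
// Hypothetical async/await variant of the code above. `client` defaults to
// axios; both axios and cheerio are assumed to be installed via npm.
async function scrapeCommentCounts(url, client = require("axios")) {
  const cheerio = require("cheerio");
  const response = await client.get(url);
  const $ = cheerio.load(response.data);
  // text() on the whole selection would join every match into one string,
  // so read each matched <a> separately instead.
  return $("a", ".comment-bubble")
    .toArray()
    .map((el) => $(el).text().trim());
}
```

It could be used as scrapeCommentCounts("your website url").then(console.log).catch(console.log), which keeps the same error logging as the promise-chain version.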

 

Conclusion

That is how you can use JavaScript and Node.js for web scraping. With these skills, you will be able to extract valuable data from websites and incorporate it into your application.

If you are looking to develop anything more complex, the documentation for Axios, Cheerio, and Puppeteer may help you get started fast.


