A web crawler that crawls website links and fetches SEO data


Overview 📝

Spido is a module that crawls websites and extracts basic information from any web page. This information is of interest to site owners in general and SEO specialists in particular, who can use it to analyze how efficiently their site performs in search engines.
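The README does not list exactly which fields spido extracts. As a rough illustration of the kind of "basic information" an SEO tool pulls from a page, here is a minimal, dependency-free sketch. `extractSeoInfo` and the fields it returns (title, meta description, canonical URL, number of `<h1>` headings) are assumptions chosen for illustration, not spido's actual output. Regular expressions keep it short; a real tool would use an HTML parser.

```javascript
// Hypothetical helper, NOT part of spido's API: pulls a few common SEO fields
// out of a raw HTML string using regular expressions (no dependencies).
function extractSeoInfo(html) {
  // Return the first capture group of a match, trimmed, or null if absent
  const first = (re) => {
    const m = html.match(re);
    return m ? m[1].trim() : null;
  };

  return {
    // Text inside <title>...</title>
    title: first(/<title[^>]*>([\s\S]*?)<\/title>/i),
    // <meta name="description" content="..."> (assumes name appears before content)
    description: first(/<meta\s+[^>]*name=["']description["'][^>]*content=["']([^"']*)["']/i),
    // <link rel="canonical" href="..."> (assumes rel appears before href)
    canonical: first(/<link\s+[^>]*rel=["']canonical["'][^>]*href=["']([^"']*)["']/i),
    // Number of <h1> opening tags on the page
    h1Count: (html.match(/<h1[\s>]/gi) || []).length,
  };
}

module.exports = { extractSeoInfo };
```

Pages with attributes in a different order (for example `content` before `name`) would be missed by these patterns, which is one reason to prefer a proper parser.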

Features 🥁

  • Crawling: crawls all internal links of a site and extracts SEO information from each page.
  • Fetching: extracts SEO information from a single web page.
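Crawling a site's internal links can be pictured as a breadth-first walk that only follows links on the same origin as the start URL. The sketch below is a hypothetical illustration, not spido's implementation: `crawlInternalLinks`, the injected `fetchHtml(url)` function (assumed to resolve to the page's HTML) and the `maxPages` cap are all assumptions.

```javascript
// Hypothetical sketch, NOT spido's implementation: breadth-first crawl that
// follows only internal (same-origin) links, stopping after maxPages pages.
// fetchHtml(url) is an assumed async function resolving to the page's HTML.
async function crawlInternalLinks(startUrl, fetchHtml, maxPages = 50) {
  const start = new URL(startUrl);
  const origin = start.origin;
  const queue = [start.href];
  const seen = new Set(queue); // URLs already queued, to avoid revisiting
  const pages = new Map(); // url -> html

  while (queue.length > 0 && pages.size < maxPages) {
    const url = queue.shift();
    let html;
    try {
      html = await fetchHtml(url);
    } catch (err) {
      continue; // skip pages that fail to load
    }
    pages.set(url, html);

    // Collect href values from <a> tags; the pattern stops at '#',
    // so fragments are dropped and fragment-only links are ignored
    for (const match of html.matchAll(/<a\s+[^>]*href=["']([^"'#]+)/gi)) {
      let next;
      try {
        next = new URL(match[1], url); // resolve relative links against this page
      } catch {
        continue; // ignore malformed URLs
      }
      if (next.origin === origin && !seen.has(next.href)) {
        seen.add(next.href);
        queue.push(next.href);
      }
    }
  }
  return pages;
}

module.exports = { crawlInternalLinks };
```

On Node.js 18+, `fetchHtml` could be `async (u) => (await fetch(u)).text()`. Links to other origins (and schemes such as `mailto:`) are skipped because their origin differs from the start URL's.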

Installation 📦

$ npm install spido --save

or

$ yarn add spido

Usage ⌨️

To make spido suitable for all uses, it is being developed iteratively so that it can be used either as a module in API projects or independently from the command line (CLI).

API 📡

Spido can be used as a Node.js module, which can return the SEO information in JSON format.

  • fetch: Fetches the SEO information from a single web page.
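const spido = require('spido');
const url = 'https://www.example.com';

// The callback receives an error or the page's SEO data
spido.fetch(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  // Returning data from a callback discards it, so handle it here
  console.log(data);
});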
  • crawl: Crawls all internal links of a site and extracts SEO information from each page.
const spido = require('spido');
const url = 'https://www.example.com';

// Crawls the site's internal links; the callback receives an error or the SEO data
spido.crawl(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  console.log(data);
});
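If you prefer async/await over callbacks, Node's built-in `util.promisify` can wrap both methods. This sketch assumes `spido.fetch` and `spido.crawl` take `(url, callback)` with an `(err, data)` callback, as in the examples above; `promisifySpido` is a hypothetical helper name, not part of spido.

```javascript
const util = require('util');

// Sketch assuming spido.fetch and spido.crawl follow the Node.js
// (url, (err, data) => ...) callback convention.
// bind() keeps `this` pointing at the module in case the methods rely on it.
function promisifySpido(spido) {
  return {
    fetch: util.promisify(spido.fetch.bind(spido)),
    crawl: util.promisify(spido.crawl.bind(spido)),
  };
}

module.exports = { promisifySpido };
```

Usage: `const { fetch: fetchSeo } = promisifySpido(require('spido'));` and then `const data = await fetchSeo('https://www.example.com');` inside an async function. Renaming to `fetchSeo` avoids shadowing the global `fetch` in Node.js 18+.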

CLI 💻

Spido can also be used as a command line tool that prints the SEO information to the console.

  • fetch: spido fetch <url>
$ spido fetch https://www.example.com
  • crawl: spido crawl <url>
$ spido crawl https://www.example.com

Bug Fixes 🐛

v1.1.3:

  • Fixed: the spido crawl command line tool.

  • Fixed: error handling when defining the crawler options.

TODO 🛠

  • Extract the information and save it to a JSON file
  • Limit the number of links that can be crawled on a website
  • Support running spido as a Docker image
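Until saving to a JSON file is built in, the data passed to the callback can be written to disk with Node's `fs` module. A minimal sketch, assuming `data` is a JSON-serializable object; `saveSeoData` is a hypothetical helper, not part of spido.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, NOT part of spido: writes the data object received
// in a spido callback to disk as pretty-printed JSON.
function saveSeoData(data, filePath) {
  // Create the target folder (and any parents) if it does not exist yet
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2) + '\n', 'utf8');
  return filePath;
}

module.exports = { saveSeoData };
```

For example, inside the `spido.crawl` callback: `saveSeoData(data, 'reports/seo.json');`.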

Links 🔗

Comments

  • import package from main lib/core class

    Importing the package in projects does not work in v1.3. As a workaround, import Spido directly from the main lib/core crawler class, for example (adjust the path to where the package is located):

    import {Spido} from '/home/yazan/dev/sos-spido/src/lib/core/crawler'

    Labels: bug, fixed. Opened by yazan-zoghbi.

  • XML sitemap generator

    A very useful feature for website owners and developers, to be worked on later: enabling this option will generate an XML sitemap file that contains all the links on your website.

    For now, it is planned for the CLI only; in the future it may be extended to cover more cases.

    Notice: this feature adds a new dependency to Spido, xml-formatter, which, as its name suggests, automatically formats the generated XML sitemap.

    Label: feature. Opened by yazan-zoghbi.
Releases: v1.1.3

Owner: Syrian Open Source, an open source platform that contains everything that was launched by Syrian developers and everything we think is interesting to publish.