A web crawler that crawls website links and fetches SEO data

Last update: Dec 15, 2022

Overview 📝

Spido is a module that crawls sites and extracts basic information from any web page. The information is aimed at site owners in general and SEO specialists in particular, who can use it to analyze how efficiently their site performs in search engines.

Features 🥁

Crawling: Crawls all internal links of a site and extracts SEO information from each page.
Fetching: Extracts SEO information from a single web page.
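
To give a feel for what "SEO information" can mean for a single page, here is a minimal sketch that pulls a few common fields out of an HTML string. It is illustrative only: the function name extractBasicSeo and the returned field names are assumptions made for this example, not spido's actual output format, and the regex approach is deliberately naive (it expects the attribute order shown in the sample).

```javascript
// Illustrative only: a naive, regex-based extractor for a few common SEO fields.
// `extractBasicSeo` and its field names are assumptions for this sketch,
// not spido's real API or output shape.
function extractBasicSeo(html) {
  // Return the first capture group of a pattern, trimmed, or null if absent.
  const pick = (pattern) => {
    const match = html.match(pattern);
    return match ? match[1].trim() : null;
  };

  return {
    title: pick(/<title[^>]*>([\s\S]*?)<\/title>/i),
    // Assumes name="..." comes before content="..." inside the tag.
    description: pick(/<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i),
    // Assumes rel="..." comes before href="..." inside the tag.
    canonical: pick(/<link\s+rel=["']canonical["']\s+href=["']([^"']*)["']/i),
    h1: pick(/<h1[^>]*>([\s\S]*?)<\/h1>/i),
  };
}

const sample = `
<html><head>
  <title>Example Domain</title>
  <meta name="description" content="An example page.">
  <link rel="canonical" href="https://www.example.com/">
</head><body><h1>Hello</h1></body></html>`;

console.log(extractBasicSeo(sample));
```

A real crawler would normally use a proper HTML parser instead of regular expressions; the sketch only shows the kind of fields involved.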

Installation 📦

npm install spido --save

or

yarn add spido

Usage ⌨️

To make Spido suitable for all uses, it is being developed iteratively so that it can be used either as an API inside other projects or on its own through the CLI.

API 📡

Spido can be used as a Node.js module, which can return the SEO information in JSON format.

fetch: Fetches the SEO information from a single web page.

const spido = require('spido');
const url = 'https://www.google.com';

// The callback receives an error (if any) and the extracted SEO data.
spido.fetch(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  console.log(data);
});
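
If you prefer promises, the callback-style call above can be wrapped with Node's util.promisify. This is a sketch under one assumption: that fetch follows the Node-style (err, data) callback convention shown in the example. The helper name fetchSeo and the stand-in fakeCrawler object are hypothetical and exist only so the snippet runs without network access.

```javascript
const { promisify } = require('util');

// `crawler` is any object exposing a Node-style fetch(url, callback) method,
// for example the spido module (assumption: it calls back with (err, data)).
// `fetchSeo` is a hypothetical helper, not part of spido.
function fetchSeo(crawler, url) {
  // .call keeps `this` bound to the crawler in case fetch relies on it.
  return promisify(crawler.fetch).call(crawler, url);
}

// Stand-in object used only to demonstrate the helper offline.
const fakeCrawler = {
  fetch(url, callback) {
    callback(null, { url, title: 'Example Domain' });
  },
};

fetchSeo(fakeCrawler, 'https://www.example.com')
  .then((data) => console.log(data))
  .catch((err) => console.error(err));
```

In a real project you would pass require('spido') instead of fakeCrawler.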

crawl: Crawls all internal links of a site and extracts SEO information from each page.

const spido = require('spido');
const url = 'https://www.google.com';

// The callback receives an error (if any) and the SEO data for the crawled pages.
spido.crawl(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  console.log(data);
});
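
For context on what "internal links" means when crawling, the sketch below keeps only the links that share the page's origin, resolves relative paths, and removes duplicates. It is not taken from spido's source: filterInternalLinks is a hypothetical helper, and treating "same origin" as the definition of internal is an assumption of this example.

```javascript
// Illustrative only: decide which links found on a page count as "internal".
// `filterInternalLinks` is a hypothetical helper, not part of spido's API.
function filterInternalLinks(pageUrl, hrefs) {
  const base = new URL(pageUrl);
  const seen = new Set();

  for (const href of hrefs) {
    let resolved;
    try {
      resolved = new URL(href, base); // resolves relative links against the page
    } catch {
      continue; // skip malformed hrefs
    }
    // Different origin means external; this also drops mailto: and tel: links.
    if (resolved.origin !== base.origin) continue;
    resolved.hash = ''; // treat #fragments as the same page
    seen.add(resolved.href);
  }
  return [...seen];
}

const links = filterInternalLinks('https://www.example.com/blog/', [
  '/about',
  'post-1',
  'https://www.example.com/about#team',
  'https://other.example.org/',
  'mailto:hi@example.com',
]);
console.log(links);
```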

CLI 💻

Spido can also be used as a command-line tool that prints the SEO information to the console.

fetch: spido fetch <url>

$ spido fetch https://www.example.com

crawl: spido crawl <url>

$ spido crawl https://www.example.com

Bug Fixes 🐛

v1.1.3:

Fixed: spido crawl command line tool.
Fixed: handle errors when defining the crawler options.

TODO 🛠

Extract the information and save it to a JSON file
Limit the number of links that can be crawled in a website
Ability to run as a Docker image
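
Until the first two items land, they can be approximated in user code. The sketch below caps a list of links and writes the result to a JSON file. Both helpers (limitLinks and saveAsJson), the file name, and the sample data are hypothetical and only illustrate the idea; they say nothing about how spido will implement these features.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helpers sketching two TODO items; they are not part of spido.

// Keep at most `maxLinks` entries from a list of links.
function limitLinks(links, maxLinks) {
  return links.slice(0, maxLinks);
}

// Write any JSON-serializable value to disk, pretty-printed.
function saveAsJson(filePath, data) {
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2) + '\n', 'utf8');
}

// Sample data standing in for links discovered by a crawl.
const discovered = [
  'https://www.example.com/',
  'https://www.example.com/about',
  'https://www.example.com/contact',
];

const outFile = path.join(os.tmpdir(), 'spido-report.json');
saveAsJson(outFile, { links: limitLinks(discovered, 2) });
console.log(`Saved report to ${outFile}`);
```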

Links 🔗

npm
yarn

Comments

Import package from main lib/core class

Importing the package into projects does not work in v1.3. As a manual workaround, import from the main lib/core crawler class.

Example:

import {Spido} from '/home/yazan/dev/sos-spido/src/lib/core/crawler'

Labels: bug, fixed

Opened by yazan-zoghbi

XML sitemap generator

A very useful feature for website owners and developers that I plan to work on: enabling this option will generate an XML sitemap file containing all the links on your website.

For now, I plan to implement it in the CLI only; in the near future I may extend it to cover more cases.

Notice: this feature adds a new dependency to Spido, xml-formatter. As its name suggests, it will format the generated XML sitemap automatically.

Label: feature

Opened by yazan-zoghbi
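
As a rough idea of what the sitemap output could look like, here is a minimal sketch that turns a list of URLs into a sitemap document following the sitemaps.org 0.9 schema. It is an assumption-laden illustration, not the planned implementation: buildSitemap and escapeXml are hypothetical names, and the indentation is done by hand here rather than with xml-formatter.

```javascript
// Escape the five characters that must be entity-encoded in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: build a sitemap string from a list of absolute URLs.
function buildSitemap(urls) {
  const entries = urls.map(
    (url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n') + '\n';
}

const sitemap = buildSitemap([
  'https://www.example.com/',
  'https://www.example.com/search?q=seo&page=2',
]);
console.log(sitemap);
```

Escaping matters because query strings often contain "&", which is not valid unescaped inside an XML element.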

Releases (v1.1.3)

v1.1.3 (Apr 22, 2022)

Improvements

[ADD]: CLI unit testing added!

Fixed bugs

[FIX]: spido command-line tool.

[FIX]: handle errors when defining the crawler options.

v1.1.2 (Apr 19, 2022)

v1.1.1 (Apr 19, 2022)

v1.1.0 (Apr 19, 2022)