A crawler that follows a site's internal links and collects the information an SEO specialist needs to analyze the site.

Yazan Zoghbi

Last update: Apr 22, 2022

Overview 📝

spido is a module that crawls websites and extracts basic information from their pages. This information is of interest to site owners in general and SEO specialists in particular, who can use it to analyze how well their site performs in search engines.

Features 🥁

Crawling: Follows all internal links of a site and extracts SEO information from each page.
Fetching: Extracts SEO information from a single web page.
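To illustrate what "internal links" means when crawling, here is a minimal sketch built on Node's standard URL class. It is not spido's actual implementation: the helper names (resolveLink, isInternalLink, internalLinks) are hypothetical, and it assumes a link is internal when it resolves to the same hostname as the page it was found on.

```javascript
// Hypothetical helpers for illustration; these are NOT part of spido's API.

// Resolve an href against the page URL. Returns null for unparsable
// links and for non-HTTP schemes such as mailto: or javascript:.
function resolveLink(href, pageUrl) {
  try {
    const resolved = new URL(href, pageUrl);
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') {
      return null;
    }
    resolved.hash = ''; // a fragment points into the same document
    return resolved;
  } catch {
    return null;
  }
}

// Assumption: a link is internal when its hostname matches the page's.
function isInternalLink(href, pageUrl) {
  const resolved = resolveLink(href, pageUrl);
  return resolved !== null && resolved.hostname === new URL(pageUrl).hostname;
}

// Collect unique internal links as absolute URLs, in first-seen order.
function internalLinks(hrefs, pageUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    if (isInternalLink(href, pageUrl)) {
      seen.add(resolveLink(href, pageUrl).href);
    }
  }
  return [...seen];
}

const links = internalLinks(
  [
    '/about',                        // relative to the site root
    'https://www.example.com/about', // absolute duplicate of the above
    'contact#team',                  // relative to the current path
    'https://other.example.org/',    // external
    'mailto:hi@example.com',         // not an HTTP link
  ],
  'https://www.example.com/blog/'
);
console.log(links);
```

Relative and absolute forms of the same address collapse into one entry because every link is resolved to an absolute URL before it is added to the set.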

Installation 📦

npm install spido --save or yarn add spido

Usage ⌨️

To make spido suitable for as many uses as possible, it is being developed iteratively so that it can be used both as a module inside API projects and as a standalone CLI tool.

API 📡

spido can be used as a Node.js module that returns the SEO information in JSON format.

fetch: Fetches the SEO information from a single web page.

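const spido = require('spido');
const url = 'https://www.google.com';

// Fetch SEO information for a single page (error-first callback).
spido.fetch(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  // A value returned from the callback is discarded, so use the data here.
  console.log(data);
});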
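If you prefer async/await over callbacks, the call can be wrapped with Node's util.promisify. This is a sketch that assumes fetch takes an error-first callback, as in the example above. To keep it runnable without network access it uses a stand-in object (spidoStub, hypothetical) whose returned fields are made up; swap in require('spido') in a real project.

```javascript
const { promisify } = require('node:util');

// Stand-in with the same (url, callback) shape as the spido.fetch call above.
// In a real project, replace it with: const spido = require('spido');
const spidoStub = {
  fetch(url, callback) {
    // The fields below are invented for illustration only.
    setImmediate(() => callback(null, { url, title: 'Example title' }));
  },
};

// promisify converts an error-first callback function into one that
// returns a Promise. bind() keeps `this` intact in case the method needs it.
const fetchSeo = promisify(spidoStub.fetch.bind(spidoStub));

async function main() {
  try {
    const data = await fetchSeo('https://www.example.com');
    console.log(JSON.stringify(data, null, 2));
    return data;
  } catch (err) {
    console.error('fetch failed:', err);
    throw err;
  }
}

const result = main();
```

The same wrapper should work for crawl, provided it follows the same callback convention.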

crawl: Crawls all internal links of a site and extracts SEO information from each page.

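const spido = require('spido');
const url = 'https://www.google.com';

// Crawl the site's internal links (error-first callback).
spido.crawl(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  // A value returned from the callback is discarded, so use the data here.
  console.log(data);
});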

CLI 💻

spido can also be used as a command-line tool that prints the SEO information to the console.

fetch: spido -u <url> -f

$ spido -u https://www.example.com -f

crawl: spido -u <url> -c

$ spido -u https://www.example.com -c

Bug Fixes 🐛

Nothing yet

TODO 🛠

Save the extracted information to a JSON file
Limit the number of links that can be crawled on a website
Full CLI integration
Ability to run as a Docker image
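Until the first item lands, the data returned by the API can be saved by the caller. The sketch below is not a spido feature: saveJson is a hypothetical helper, the sample record and its field names are placeholders, and the output goes to the system temp directory only to keep the example self-contained.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write any JSON-serializable value to a file,
// pretty-printed with two-space indentation.
function saveJson(filePath, data) {
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2) + '\n', 'utf8');
  return filePath;
}

// Placeholder for data returned by spido; the field names are made up.
const sample = [{ url: 'https://www.example.com/', title: 'Example' }];

const outFile = saveJson(path.join(os.tmpdir(), 'spido-report.json'), sample);
console.log('saved to', outFile);
```

In practice, saveJson would be called inside the fetch or crawl callback with the data argument in place of sample.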

Links 🔗

npm
yarn

Releases (v1.1.2)

v1.1.2 (Apr 17, 2022)

Features Added

New options for controlling the crawling process, such as sitemap and internal links

Improvements

Added functions to get the base URL, path, and hostname

Added the Jest framework for testing

Fixed bugs

Fixed detection of internal links given as absolute or relative URLs

Fixed removal of duplicated links from the queue

v1.1.1 (Apr 8, 2022)

Fixed bugs

Deleted duplicated files

v1.1.0 (Apr 8, 2022)

Features Added

CLI command integration

New argument -u for the URL

New argument -f to fetch

New argument -c to crawl

Fixed bugs

crawl internal links bug

fetch bug
