A crawler that follows a site's internal links and collects the information an SEO specialist needs to analyze the site.

Overview 📝

spido is a module that crawls sites and extracts basic information from any web page. It is aimed at site owners in general and SEO specialists in particular, who can use that information to analyze how well their site performs in search engines.

Features 🥁

  • Crawling: Crawls all internal links of a site and extracts SEO information from each page.
  • Fetching: Extracts SEO information from a single web page.
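
As a rough illustration of what "internal link" can mean for a crawler like this, the sketch below resolves each href against the page URL with Node's built-in URL class and keeps only http(s) links on the same hostname. The helper name isInternalLink is hypothetical and not part of spido's API, and spido's own rules (for example around subdomains) may differ.

```javascript
// Hypothetical helper, not part of spido: decides whether a link found on a
// page points to the same site as the page itself.
function isInternalLink(href, pageUrl) {
  let resolved;
  let base;
  try {
    base = new URL(pageUrl);
    // Relative hrefs such as "/about" are resolved against the page URL.
    resolved = new URL(href, base);
  } catch (err) {
    return false; // either value is not a parseable URL
  }
  // Skip non-web schemes such as mailto: or javascript:
  if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') {
    return false;
  }
  // Assumption for this sketch: "internal" means an identical hostname.
  return resolved.hostname === base.hostname;
}

const page = 'https://www.example.com/blog/';
console.log(isInternalLink('/about', page));
console.log(isInternalLink('https://www.example.com/contact', page));
console.log(isInternalLink('https://other.example.org/', page));
console.log(isInternalLink('mailto:hi@example.com', page));
```

Comparing hostnames after resolution treats absolute and relative links the same way, which is why the check does not need separate branches for the two forms.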

Installation 📦

$ npm install spido --save
or
$ yarn add spido

Usage ⌨️

To keep spido suitable for all uses, it is developed iteratively so that it can be used either as an API inside other projects or on its own from the CLI.

API 📡

spido can be used as a Node.js module, which can return the SEO information in JSON format.

  • fetch: Fetches the SEO information from a single web page.
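const spido = require('spido');
const url = 'https://www.google.com';

// Fetch SEO information for a single page.
spido.fetch(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  // A value returned from the callback is discarded, so use `data` here.
  console.log(data);
});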
  • crawl: Crawls all internal links of a site and extracts SEO information from each page.
const spido = require('spido');
const url = 'https://www.google.com';

// Crawl the site starting from `url`.
spido.crawl(url, (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  // A value returned from the callback is discarded, so use `data` here.
  console.log(data);
});
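
Because the result is described as JSON, it can be written to disk from inside either callback until spido offers this itself. The following is a minimal sketch using only Node's fs module; saveReport is a hypothetical helper, and the sample object's fields are placeholders rather than spido's actual output shape.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper, not part of spido: writes any JSON-serializable
// result to a file and returns the path it wrote to.
function saveReport(data, filePath) {
  // Pretty-print with 2-space indentation so the file is easy to read.
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2) + '\n', 'utf8');
  return filePath;
}

// Placeholder object; in real use this would be the callback's `data`.
const sampleData = { url: 'https://www.example.com', title: 'Example Domain' };

const outFile = path.join(os.tmpdir(), 'spido-report.json');
saveReport(sampleData, outFile);
console.log(`Saved report to ${outFile}`);
```

In a real project, the call would sit in the success branch of the callback, for example saveReport(data, 'report.json').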

CLI 💻

spido can be used as a command line tool, which can return the SEO information and print it on the console.

  • fetch: spido -u <url> -f
$ spido -u https://www.example.com -f
  • crawl: spido -u <url> -c
$ spido -u https://www.example.com -c
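
For readers curious how flags like these can be handled, here is a small sketch built on Node's built-in util.parseArgs (available in Node 18.3 and later). Only the short flags -u, -f and -c come from the usage above; the long names, the parseCliArgs function, and the rule that exactly one mode must be chosen are assumptions for illustration, and spido's real CLI may parse its arguments differently.

```javascript
const { parseArgs } = require('util');

// Hypothetical parser mirroring the documented short flags.
// The long names (--url, --fetch, --crawl) are assumptions for this sketch.
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: 'string', short: 'u' },
      fetch: { type: 'boolean', short: 'f' },
      crawl: { type: 'boolean', short: 'c' },
    },
  });
  if (!values.url) {
    throw new Error('Missing required -u <url>');
  }
  // Assumption: exactly one of the two modes must be selected.
  if (Boolean(values.fetch) === Boolean(values.crawl)) {
    throw new Error('Pass exactly one of -f (fetch) or -c (crawl)');
  }
  return { url: values.url, mode: values.fetch ? 'fetch' : 'crawl' };
}

console.log(parseCliArgs(['-u', 'https://www.example.com', '-f']));
```

Returning a single mode field keeps the calling code to one branch that picks between the fetch and crawl functions.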

Bug Fixes 🐛

  • Nothing yet

TODO 🛠

  • Extract the information and save it to a JSON file
  • Limit the number of links that can be crawled in a website
  • Full integration with the CLI
  • Ability to run as a Docker image
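
One possible approach to the link-limit item is a breadth-first queue combined with a visited set and a page cap. The sketch below is not spido's implementation: crawlWithLimit, getLinks and maxPages are hypothetical names, and link discovery is injected as a function so the example runs without any network access.

```javascript
// Hypothetical sketch, not spido's implementation: breadth-first traversal
// that never visits a URL twice and stops after `maxPages` pages.
function crawlWithLimit(startUrl, getLinks, maxPages) {
  const visited = new Set();
  const queue = [startUrl];
  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue; // drop duplicates still in the queue
    visited.add(url);
    for (const link of getLinks(url)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return [...visited];
}

// In-memory stand-in for a site; a real crawler would fetch each page.
const fakeSite = {
  '/': ['/a', '/b'],
  '/a': ['/', '/b', '/c'],
  '/b': ['/c'],
  '/c': [],
};
const linksOf = (url) => fakeSite[url] || [];

console.log(crawlWithLimit('/', linksOf, 3));  // stops after three pages
console.log(crawlWithLimit('/', linksOf, 10)); // visits every reachable page
```

Checking the visited set when a URL is dequeued, not only when it is enqueued, is what keeps a page that was queued twice from being processed twice.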

Releases(v1.1.2)
  • v1.1.2(Apr 17, 2022)

    Version 1.1.2

    Features Added

    • new options to control the crawling process, such as sitemap and internal links

    Improvements

    • add new functions to get the base URL, path, and hostname
    • add the Jest framework for testing

    Fixed bugs

    • fix a bug in detecting internal links given as absolute or relative URLs
    • fix a bug in removing duplicate links from the queue
  • v1.1.1(Apr 8, 2022)

  • v1.1.0(Apr 8, 2022)

    Version 1.1.0

    Features Added

    • cli commands integration
    • new argument -u for url
    • new argument -f to fetch
    • new argument -c to crawl

    Fixed bugs

    • crawl internal links bug
    • fetch bug
Owner
Yazan Zoghbi