A web scraping / data mining script for extracting beginner-friendly GitHub repos from Y Combinator's company database

Overview

ycombinator_githubs

A web scraping / data mining script for extracting beginner-friendly GitHub repos from Y Combinator's company database: https://www.ycombinator.com/companies/

Watch this YouTube video for a detailed explanation, and more!

This scraper is designed with Microverse students and graduates in mind. It extracts the most recent and active Y Combinator GitHub repos written in JavaScript and Ruby, the two languages Micronauts use the most!

The resulting list for YC batches S21, W21, S22 and W22, as of July 2022, is in 05_final_repos.csv. Check it out to find beginner-friendly repos written in JS and Ruby.

Installation

  1. Clone the repository.
  2. $ cd ycombinator_githubs
  3. $ npm install

Usage

  1. Generate your GitHub Personal Access Token as detailed in .env.sample
  2. Rename .env.sample to .env
  3. $ ./scrape.sh

For details on how the scraping works, or what the .csv files contain, read scrape.sh
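
As one illustration, the final step quoted in the first comment below pipes 05_final_repos.csv through jsonize.sh and then through `tac | sed '2s/,//' | tac`, which appears to drop the comma after the last JSON element. The sketch below reproduces only that trick on made-up input. The sample lines and both function names are hypothetical, and the real output format of jsonize.sh is an assumption. It relies on `tac` from GNU coreutils.

```shell
# Hypothetical sample: stands in for whatever jsonize.sh really prints.
sample_json_lines() {
    printf '%s\n' '[' '{"repo":"a/b"},' '{"repo":"c/d"},' ']'
}

# Reverse the lines so the last element becomes line 2,
# remove the first comma on that line, then restore the order.
strip_last_comma() {
    tac | sed '2s/,//' | tac
}

sample_json_lines | strip_last_comma
```

Note that `s/,//` removes the first comma on the line, not necessarily the trailing one. If an element could contain several commas, anchoring the pattern as `2s/,$//` would target only the comma at the end of the line.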

Comments
  • Got an error after executing the scrape.sh file.

    After running scrape.sh, everything works fine until it reaches jsonize.sh, where I got this error: "./jsonize.sh: 3: Syntax error: redirection unexpected". I didn't get this error when I manually executed the same command:

    cat 05_final_repos.csv | \
        ./jsonize.sh | \
        tac | \
        sed '2s/,//' | \
        tac > 06_repos.json
    

    The OS I used is Ubuntu on WSL.

    opened by shadmanhere 2
  • Non-YCombinator repositories filtered

    Repositories like github.com/github and /axios are present in the final list and need to be manually removed. This happens because some company sites call libraries hosted on GitHub. I can't think of any way to filter them out other than manually at the moment.

    opened by voscarmv 1
  • Updates list of repos and adds topics in the final json file.

    Hi @voscarmv, I have updated the companies list and made changes to include topics in the final JSON file. This change will let us filter by framework in the issue-finder app.

    Please let me know if you have any other ideas to achieve the same or want something changed.

    opened by shadmanhere 0
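
For the second issue above (library repos such as github.com/github and /axios ending up in the final list), one possible approach is to keep the manual list but apply it automatically as a denylist of repo owners. This is only a sketch and not the author's method: `DENYLIST`, `filter_repos` and the `examplecorp/app` URL are hypothetical names invented for illustration, and the denylist itself would still have to be maintained by hand.

```shell
# Hypothetical denylist: one GitHub owner per line (libraries and hosts, not YC companies).
DENYLIST='github
axios'

# Read repo URLs on stdin and print only those whose owner is not on the denylist.
filter_repos() {
    while IFS= read -r url; do
        owner=${url#*github.com/}   # drop everything up to and including "github.com/"
        owner=${owner%%/*}          # keep only the owner segment
        # -x: match the whole line, -F: fixed string, -i: ignore case, -q: quiet
        if ! printf '%s\n' "$DENYLIST" | grep -qixF -- "$owner"; then
            printf '%s\n' "$url"
        fi
    done
}

# Usage example with made-up input; only the last URL should be printed.
printf '%s\n' \
    'https://github.com/github/docs' \
    'https://github.com/axios/axios' \
    'https://github.com/examplecorp/app' | filter_repos
```

Matching on the owner segment rather than the whole URL means a single entry covers every repo under that owner.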
Owner
Oscar Mier
Aspiring Full-Stack Web Developer. BSc in Electronic Engineering. Currently developing RoR/ES6/React apps. 5 years experience w/VisualBasic.NET and MySQL.