Ali Abdaal Search Engine
This is a personalized search engine for my favorite YouTuber, Ali Abdaal. I used Selenium to scrape the names and URLs of all his videos, youtube-dl to download them as audio files, and Google Speech Recognition to transcribe the audio files.
I then used that data to populate a Postgres database hosted on Supabase and built a frontend with React where users can look up phrases and find out how many times, and in which videos, they were said.
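To make the phrase lookup concrete, here is a minimal Python sketch of counting how often a phrase appears in each transcript. It is not the code the frontend runs: the `count_phrase` function, the `transcripts` dictionary, and the sample text are all hypothetical and exist only for illustration.

```python
import re


def count_phrase(transcripts, phrase):
    """Return {video title: hit count} for videos that mention the phrase.

    Matching is case-insensitive and only counts whole-word matches,
    so searching for "habit" does not count "habits".
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = {}
    for title, text in transcripts.items():
        n = len(pattern.findall(text))
        if n:
            hits[title] = n
    return hits


# Hypothetical sample data: video title -> transcript text.
transcripts = {
    "Video A": "Atomic habits changed my life. I reread Atomic Habits every year.",
    "Video B": "Today we talk about productivity.",
}

result = count_phrase(transcripts, "atomic habits")
total = sum(result.values())
print(result, total)
```

The per-video counts answer "in which videos", and their sum answers "how many times".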
Technologies Used
- Python (all scripting)
- Selenium (web scraping)
- SpeechRecognition (Google Speech Recognition)
- youtube-dl (downloading videos)
- React & Chakra UI (Frontend)
- Firebase (Hosting & Analytics)
- Supabase (Postgres Database)
Usage
- Run scraping_vid_info.py to get a JSON file with all the video names and URLs of a channel
- Run download_yt_vids to download a WAV audio file from each video URL and save it locally
- Run transcribe_audio.py to transcribe all the audio files and save them in a JSON file
- Create a database and import the JSON file into it. Then connect it to the frontend and voila!
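The last step can be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's built-in `sqlite3` as a local stand-in for the Supabase Postgres database, and the JSON shape (`title`, `url`, `transcript` keys), the `videos` table, and the `load_transcripts` and `search` helpers are all hypothetical rather than taken from this repo.

```python
import json
import sqlite3

# Hypothetical JSON shape; the real output of transcribe_audio.py may differ.
SAMPLE_JSON = """
[
  {"title": "How I Study", "url": "https://example.com/watch?v=1",
   "transcript": "Active recall and the Feynman Technique are my favorites."},
  {"title": "Desk Setup", "url": "https://example.com/watch?v=2",
   "transcript": "This keyboard is great for typing fast."}
]
"""


def load_transcripts(conn, records):
    """Create the (assumed) videos table and insert one row per video."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos "
        "(url TEXT PRIMARY KEY, title TEXT, transcript TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO videos (url, title, transcript) "
        "VALUES (:url, :title, :transcript)",
        records,
    )
    conn.commit()


def search(conn, phrase):
    """Return (title, url) for every video whose transcript contains the phrase.

    SQLite's LIKE is case-insensitive for ASCII; note that % and _ in the
    phrase would act as wildcards in this simple version.
    """
    cur = conn.execute(
        "SELECT title, url FROM videos WHERE transcript LIKE ? ORDER BY title",
        (f"%{phrase}%",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in, not Supabase
load_transcripts(conn, json.loads(SAMPLE_JSON))
matches = search(conn, "feynman technique")
print(matches)
```

With a real Postgres database the same idea applies, but the connection and the SQL dialect details would differ.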
Progress is being tracked with GitHub Issues and a Kanban board in the Projects tab of this repo.
Motivation
This project was inspired by Kalle Hallden's Joe Rogan project. The idea behind it is to have a search engine for my favorite YouTuber so that I can look up certain phrases or words and find videos where he mentions them.
This could also be useful for students, who could download their professor's lectures and create a searchable database from them to quickly look up where certain concepts were mentioned. Another application is for conferences to take all their talks, transcribe them, and make them searchable. I plan to develop this into a boilerplate anyone can use to create their own search engines starting from video, audio, or text files.