Scrape tweets from Twitter search results based on keywords and date range using Playwright. Save scraped tweets in a CSV file for easy analysis.

Overview

Tweet Harvest (Twitter Crawler)

Tweet Harvest is a command-line tool that uses Playwright to scrape tweets from Twitter search results based on specified keywords and date range. The scraped tweets are saved in a CSV file.
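
As a rough illustration of the CSV output, here is a minimal sketch of how one row could be serialized. It assumes semicolon-delimited fields with quoted text, which is what the sample row quoted in the comments below suggests. The function names and the field order are hypothetical and are not taken from the tool's source.

```javascript
// Assumption: fields are separated by semicolons, as in the sample row
// quoted in the comments. The real tool's columns are not documented here.
const DELIMITER = ";";

// Quote a field when it contains a semicolon, a double quote, or a line
// break, doubling any embedded double quotes.
function escapeField(value) {
  const text = String(value ?? "");
  if (/[";\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Join already-ordered field values into one CSV line (hypothetical helper).
function toCsvRow(fields) {
  return fields.map(escapeField).join(DELIMITER);
}

console.log(
  toCsvRow(["Thu Jun 29 17:36:34 +0000 2023", "1674471938570489856", "some; text", 0])
);
```

Quoting only when needed keeps numeric columns such as counts unquoted, while tweet text containing semicolons still round-trips safely.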

Note: This script is for educational purposes only. Twitter does not allow unauthenticated users to use search or advanced search, so you need a valid Twitter account and an access token for it. To get the token, log in to Twitter in your browser and copy the value of the auth_token cookie.

How to Use

To use Tweet Harvest, follow these simple steps:

  1. Install Node.js (LTS) on your computer.
  2. Open your terminal or command prompt.
  3. Type npx tweet-harvest@latest and press Enter.
  4. Follow the prompts to provide the data you want to search for on Twitter, such as keywords, dates, and other parameters.

That’s it! Tweet Harvest will open a Chromium browser instance and navigate to Twitter's search page. It will then enter your search parameters and scrape the resulting tweets. The tweets will be saved in a CSV file in a directory named tweets-data in the current working directory.
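
The search parameters can be pictured as a Twitter search query that combines the keywords with the `since:` and `until:` search operators. The sketch below is an illustration only: the helper names are hypothetical, and the DD-MM-YYYY input format is an assumption based on the command quoted in the comments (`-f 01-01-2023 -t 06-05-2023`). It does not show how Tweet Harvest itself builds its query.

```javascript
// Convert "DD-MM-YYYY" (assumed input format) to the "YYYY-MM-DD" form
// used by Twitter's since:/until: search operators.
function toIsoDate(ddmmyyyy) {
  const match = /^(\d{2})-(\d{2})-(\d{4})$/.exec(ddmmyyyy);
  if (!match) throw new Error(`Expected DD-MM-YYYY, got "${ddmmyyyy}"`);
  const [, day, month, year] = match;
  return `${year}-${month}-${day}`;
}

// Return the calendar day after an ISO date, computed in UTC so the local
// time zone cannot shift the result.
function nextDay(isoDate) {
  const date = new Date(`${isoDate}T00:00:00Z`);
  date.setUTCDate(date.getUTCDate() + 1);
  return date.toISOString().slice(0, 10);
}

// Keyword plus a date range (hypothetical helper).
function buildSearchQuery(keyword, from, to) {
  return `${keyword} since:${toIsoDate(from)} until:${toIsoDate(to)}`;
}

// Keyword restricted to a single day: since that day, until the next one.
function buildSingleDayQuery(keyword, day) {
  const iso = toIsoDate(day);
  return `${keyword} since:${iso} until:${nextDay(iso)}`;
}

console.log(buildSearchQuery('"Prabowo Subianto"', "01-01-2023", "06-05-2023"));
console.log(buildSingleDayQuery("pemilu", "29-06-2023"));
```

Because `until:` is generally treated as exclusive, a single-day search uses the following day as the upper bound.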

Note: You will need a Twitter auth token to use this tool. When prompted, enter your Twitter auth token to authenticate your search.
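
For context, one way a Playwright script can authenticate with such a token is to add it to the browser context as a cookie. The sketch below only builds the cookie object. The `buildAuthCookie` helper is hypothetical, and the domain and flag values are assumptions rather than details confirmed by the tool's source.

```javascript
// Build a cookie object in the shape accepted by Playwright's
// browserContext.addCookies(). The domain, httpOnly and secure values
// below are assumptions for illustration.
function buildAuthCookie(authToken) {
  if (typeof authToken !== "string" || authToken.trim() === "") {
    throw new Error("auth_token must be a non-empty string");
  }
  return {
    name: "auth_token",
    value: authToken.trim(),
    domain: ".twitter.com", // assumed cookie domain
    path: "/",
    httpOnly: true,
    secure: true,
  };
}

// Placeholder value; never commit a real token to source control.
console.log(buildAuthCookie("  your-auth-token-here  "));
```

In a Playwright script, the result could then be passed as `await context.addCookies([buildAuthCookie(token)])` before opening the search page.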

Comments
  • Timeout error

    node:internal/process/promises:288
                triggerUncaughtException(err, true /* fromPromise */);
                ^
    
    page.waitForResponse: Timeout 30000ms exceeded while waiting for event "response"
        at C:\Users\dwiya\AppData\Local\npm-cache\_npx\b456e97c96423ae5\node_modules\tweet-harvest\dist\crawl.js:130:54
        at step (C:\Users\dwiya\AppData\Local\npm-cache\_npx\b456e97c96423ae5\node_modules\tweet-harvest\dist\crawl.js:56:23)
        at Object.next (C:\Users\dwiya\AppData\Local\npm-cache\_npx\b456e97c96423ae5\node_modules\tweet-harvest\dist\crawl.js:37:53)
        at step (C:\Users\dwiya\AppData\Local\npm-cache\_npx\b456e97c96423ae5\node_modules\tweet-harvest\dist\crawl.js:41:139)
        at Object.next (C:\Users\dwiya\AppData\Local\npm-cache\_npx\b456e97c96423ae5\node_modules\tweet-harvest\dist\crawl.js:37:53)
        at fulfilled (C:\Users\dwiya\AppData\Local\npm-cache\_npx\b456e97c96423ae5\node_modules\tweet-harvest\dist\crawl.js:28:58) {
      name: 'TimeoutError'
    }
    
    Node.js v18.16.0
    

    This happens partway through the harvest; it also happens when running the command manually: npx [email protected] -s "Prabowo Subianto" -f 01-01-2023 -t 06-05-2023

    opened by DWISAx13 3
  • How to scrape tweets on a specific date

    Hello Mas Helmi, thanks for your work. This tool is helping me scrape tweets for my thesis. You are a lifesaver, because snscrape stopped working a few days ago. So this tool helps me a lot.

    But I am wondering, can I get tweets from a specific date?

    That's all from me, thanks.

    opened by kokohandoko00 1
  • input[name="allOfTheseWords"] Not Found

    I tried this with 0.0.20 and 0.0.29: Chromium appears, opens the Twitter website, then closes again, and then this shows up:

    node:internal/process/promises:288
                triggerUncaughtException(err, true /* fromPromise */);
                ^

    page.click: Timeout 30000ms exceeded.
    =========================== logs ===========================
    waiting for locator('input[name="allOfTheseWords"]')
    locator resolved to <input value="" dir="auto" type="text" autocorrect="on"…/>
    attempting click action
    waiting for element to be visible, enabled and stable
        at C:\Users\Dibya\AppData\Local\npm-cache\_npx\3c06dd1aee42c42e\node_modules\tweet-harvest\dist\crawl.js:381:47
        at step (C:\Users\Dibya\AppData\Local\npm-cache\_npx\3c06dd1aee42c42e\node_modules\tweet-harvest\dist\crawl.js:56:23)
        at Object.next (C:\Users\Dibya\AppData\Local\npm-cache\_npx\3c06dd1aee42c42e\node_modules\tweet-harvest\dist\crawl.js:37:53)
        at fulfilled (C:\Users\Dibya\AppData\Local\npm-cache\_npx\3c06dd1aee42c42e\node_modules\tweet-harvest\dist\crawl.js:28:58) {
      name: 'TimeoutError'
    }

    Node.js v18.16.0

    opened by helmisatria 1
  • TypeError: Cannot read property 'user_results' of undefined

    Error-08-07-2023_15-18-15

    Error details:

    TypeError: Cannot read property 'user_results' of undefined
        at C:\Users\WahyuDP\AppData\Roaming\npm-cache\_npx\10776\node_modules\tweet-harvest\dist\crawl.js:171:87
        at Array.map (<anonymous>)
        at C:\Users\WahyuDP\AppData\Roaming\npm-cache\_npx\10776\node_modules\tweet-harvest\dist\crawl.js:162:58
        at step (C:\Users\WahyuDP\AppData\Roaming\npm-cache\_npx\10776\node_modules\tweet-harvest\dist\crawl.js:56:23)
        at Object.next (C:\Users\WahyuDP\AppData\Roaming\npm-cache\_npx\10776\node_modules\tweet-harvest\dist\crawl.js:37:53)       
        at step (C:\Users\WahyuDP\AppData\Roaming\npm-cache\_npx\10776\node_modules\tweet-harvest\dist\crawl.js:41:139)
        at Object.next (C:\Users\WahyuDP\AppData\Roaming\npm-cache\_npx\10776\node_modules\tweet-harvest\dist\crawl.js:37:53)       
        at fulfilled (C:\Users\WahyuDP\AppData\Roaming\npm-cache\_npx\10776\node_modules\tweet-harvest\dist\crawl.js:28:58)
        at runMicrotasks (<anonymous>)
        at runNextTicks (internal/process/task_queues.js:58:5)
    Twitter Harvest v 2.0.10
    

    Last data that could be scraped (the first row in the image above): Thu Jun 29 17:36:34 +0000 2023;1674471938570489856;"pengen sih nonton pam di prj tp bosen ah songlist nya itu2 mulu. pengen w komen “lu gak bosen pam bawain lagu2 itu mulu? w aja yg jarang nonton lu bosen” tp ngeri baper bocahnya cuaks";0;0;0;0;in;1250305661037973507;1674471938570489856;curlybr0wn;https://twitter.com/curlybr0wn/status/1674471938570489856

    opened by wdprsto 0
  • No additional tweet, scrolling more...

    I tried tweet-harvest@v1, following the instructions from a YouTube channel. However, the program only displays the tweets and does not save them to a CSV file; it just keeps printing "No additional tweets, scrolling more...". Meanwhile, when I use tweet-harvest@latest I get an error asking me to install Playwright again. Is there a newer version, or a version that works well?

    opened by bagasandriann 0
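
The `Cannot read property 'user_results' of undefined` report above is the kind of failure that occurs when one entry in a response lacks a nested object the mapping code expects. A minimal defensive sketch, assuming a hypothetical entry shape (the real response structure is not shown in this document), uses optional chaining to skip such entries instead of crashing:

```javascript
// Hypothetical nested shape, for illustration only; the real response
// structure used by the tool may differ.
function extractUsername(entry) {
  return (
    entry?.content?.itemContent?.tweet_results?.result?.core?.user_results
      ?.result?.legacy?.screen_name ?? null
  );
}

// Map entries to usernames and drop the ones with missing data.
function extractUsernames(entries) {
  return entries.map(extractUsername).filter((name) => name !== null);
}

const completeEntry = {
  content: {
    itemContent: {
      tweet_results: {
        result: {
          core: {
            user_results: {
              result: { legacy: { screen_name: "curlybr0wn" } },
            },
          },
        },
      },
    },
  },
};

// Entry whose "core" object is missing, similar to the reported failure.
const incompleteEntry = {
  content: { itemContent: { tweet_results: { result: {} } } },
};

console.log(extractUsernames([completeEntry, incompleteEntry, undefined]));
```

Optional chaining requires Node.js 14 or later. On older runtimes the same guard has to be written with explicit checks at each level.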
Releases (2.1.0)
Owner
Helmi Satria
Full stack javascript developer who actively shares things about web development and other stuff on socials
It uses JavaScript and a web browser (for example, Firefox) to scrape tweets.

Twitter JS Scraper Introduction There are many tools available for collecting tweets. Some of these tools make use of the official Twitter API, which

vahid baghi 16 Nov 25, 2022
A lightweight (~2kB) library to create range sliders that can capture a value or a range of values with one or two drag handles

range-slider-input A lightweight (~2kB) library to create range sliders that can capture a value or a range of values with one or two drag handles. Ex

Utkarsh Verma 42 Dec 24, 2022
Generate release notes from git commit history either commit range or tag range.

Would you like to support me? Release Notes Generate release notes from git commit history either commit range or tag range. App Store Template Change

Numan 6 Oct 8, 2022
Nepali Multi Date Picker for jQuery. Supports both single date selections and multiple date selection.

Nepali Multi Date Picker A simple yet powerful date picker based in Nepali calendar. Supports both single date selections and multiple date selection.

Sanil Shakya 4 May 23, 2022
Search for coding resources by relevant keywords

Search for coding resources by relevant keywords. This API serves educational content for a wide variety of computer science topics, languages and technologies relevant to web development.

null 22 Nov 4, 2022
A JavaScript component that is a date & time range picker, no need to build, no dependencies except Moment.js, that is based on Dan Grossman's bootstrap-daterangepicker.

vanilla-datetimerange-picker Overview. A JavaScript component that is a date & time range picker, no need to build, no dependencies except Moment.js,

null 22 Dec 6, 2022
A Twitter filtered search to only get the live broadcasts hosted on Twitter itself, Built using Vanilla JS and Node.js

Twitter Broadcasts Search A Twitter filtered search to only get the live broadcasts hosted on Twitter itself, Built using Vanilla JS and Node.js. Live

Mohammad Mousad 2 Oct 6, 2022
A self-hosted Thumbnail generator/finder which creates thumbnails based on folder names and google search results.

Thumba A self hosted Thumbnail generator/finder which creates thumbnails based on folder names and google search results. Description This project use

Norbert Takács 20 Dec 15, 2022
LinkOff - Cleans the LinkedIn feed based on keywords and filters

LinkOff - LinkedIn Filter and Customizer LinkOff cleans and customizes Linked

Noah Jelich 120 Dec 19, 2022
Group project for the courses 'Javascript med Ramverk' and 'Agil Utveckling'

JavaScript-med-Ramverk-Laboration-3 The group project for the courses Javascript med Ramverk and Agil Utveckling. Development guide For information on how

Svante Jonsson IT-Högskolan 3 May 18, 2022
Website for people in Sweden who are able and willing to offer housing to refugees

Getting Started with Create React App This project was bootstrapped with Create React App. Available Scripts In the project directory, you can run: np

null 4 May 3, 2022
Course repo for the course Webbserver och Databaser

Webbserver och databaser This repository is meant for CME students to access exercises and codealongs that happen throughout the course. I hope you wi

null 14 Jan 3, 2023
View maps, graphs, and tables of your save and compete in a casual, evergreen leaderboard of EU4 achievement speed runs. Upload and share your save with the world.

PDX Tools PDX Tools is a modern EU4 save file analyzer that allow users to view maps, graphs, and data tables of their save all within the browser. If

PDX Tools 24 Dec 27, 2022
On this page, you can save and load all the awesome books you have and save the name and the author into the local storage. this project uses Javascript to interact with the pages

Awesome Books: refactor to use JavaScript classes In this project, We add the links to the applications into the final project Getting Started if you

Cesar Valencia 8 Nov 29, 2022
A simple To Do List application that allows users to save, edit, mark completed, and delete their to-dos, and save their list when application is closed. Build with JavaScript.

To Do List A simple To Do List online application that allows users to save, and manipulate their to-dos, and save their list when application is clos

Mahmoud Rizk 10 Dec 20, 2022
A Twitter bot that reads the tweets of a given username and analyzes the user's personality using AI.

Twitter Chatgpt Analysor Create a bot that reads the tweets of a given username and analyzes the user's personality using artificial intelligence.. In

Sabber Soltani 8 May 9, 2023
"Jira Search Helper" is a project to search more detail view and support highlight than original jira search

Jira Search Helper What is Jira Search Helper? "Jira Search Helper" is a project to search more detail view and support highlight than original jira s

null 41 Dec 23, 2022
⚡ Archive of all Zotero Translators co-created by participants of the Information Analysis course in 2018 to date.

awesome-translators 1. awesome-translators maintenance team 1.1 Translators update process 1.2 Zotero installation process 1.3 Zotero advanced resources 2. Translators 2.1 Translators overview table 2.2 Translator

开智学堂 99 Dec 30, 2022
Google-reviews-crawler - A simple Playwright crawler that stores Google Maps Place/Business reviews to a JSON file.

google-reviews-crawler A simple Playwright crawler that stores Google Maps Place/Business reviews to a JSON file. Usage Clone the repo, install the de

Alex Domakidis 6 Oct 26, 2022