Twarc comes with a set of utilities that will help you explore the data you've collected. To begin, first download the utils folder from GitHub.
Next, install a few Python modules that some of the utilities require. See the installation steps below for details.
Then, choose the guide for the utility you want to use:
Go to the DocNow twarc GitHub page: https://github.com/DocNow/twarc
Find the green Clone or download button on the right side of the page and click it. (On the current GitHub layout, this button is labeled Code.)
Click Download ZIP from the drop-down options to download the compressed folder. (If you are familiar with git, you can instead run git clone https://github.com/DocNow/twarc)
Navigate to the location on your computer where you want to save the twarc folder and click Save. Saving it to your desktop makes it easy to find.
Open the folder location and unzip the compressed folder to access its contents.
Inside you will find a folder labeled utils. Copy or move this entire folder into the folder where you've saved your Twitter JSON files.
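If you would rather script the download than click through the browser, the steps above can be sketched in Python using only the standard library. This is a minimal sketch, not part of twarc: the archive URL assumes the repository's default branch is named main (use the actual Download ZIP link if it differs), and download_zip and copy_utils are hypothetical helper names made up for this example.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Assumption: the repository's default branch is "main". If the Download ZIP
# link on the GitHub page points elsewhere, use that URL instead.
ZIP_URL = "https://github.com/DocNow/twarc/archive/refs/heads/main.zip"


def download_zip(url: str, dest: Path) -> Path:
    """Hypothetical helper: save the repository archive at url to dest."""
    urllib.request.urlretrieve(url, dest)
    return dest


def copy_utils(zip_path: Path, data_dir: Path) -> Path:
    """Hypothetical helper: unzip zip_path and copy its utils folder to data_dir.

    GitHub archives unpack into one top-level folder (e.g. twarc-main), so the
    utils folder is looked up one level below the extraction directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        candidates = sorted(p for p in Path(tmp).glob("*/utils") if p.is_dir())
        if not candidates:
            raise FileNotFoundError(f"no utils folder found in {zip_path}")
        target = data_dir / "utils"
        # Copy before the temporary directory is deleted; merge into an
        # existing utils folder instead of failing (Python 3.8+).
        shutil.copytree(candidates[0], target, dirs_exist_ok=True)
    return target
```

For example, copy_utils(download_zip(ZIP_URL, Path("twarc.zip")), Path("tweets")) would save the archive in the current directory and copy utils into a hypothetical tweets folder. Because copytree is called with dirs_exist_ok=True, running it a second time overwrites the files in an existing utils folder rather than raising an error.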
The utils folder contains several Python (.py) files. You don't need to open these files directly unless you're familiar with Python and want to explore them further.
For the purposes of this tutorial, all you need to do is make sure the JSONL file of collected tweet data you want to use is in the same folder as the utils folder (but not inside the utils folder itself; see the screenshot above for an example).
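To double-check that arrangement before running a utility, a short Python check can inspect the folder. This is a hypothetical helper (check_layout is not part of twarc), and it assumes your tweet files use the .jsonl extension.

```python
from pathlib import Path
from typing import List


def check_layout(folder: Path) -> List[str]:
    """Hypothetical helper: list problems with the data-folder layout.

    Expects folder/utils/ to exist and at least one .jsonl file sitting
    directly in folder, next to utils rather than inside it.
    """
    problems = []
    utils = folder / "utils"
    if not utils.is_dir():
        problems.append(f"missing utils folder in {folder}")
    # glob("*.jsonl") only looks at entries directly inside folder.
    if not any(p.is_file() for p in folder.glob("*.jsonl")):
        problems.append(f"no .jsonl file directly in {folder}")
    if utils.is_dir() and any(utils.glob("*.jsonl")):
        problems.append("found a .jsonl file inside utils; move it up one level")
    return problems
```

Calling check_layout(Path(".")) from the folder that holds your data returns an empty list when the layout matches; otherwise each string describes one problem to fix.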
Use pip to install the packages required by each utility. You'll only need to do this step once. If you run a twarc utility and receive an error saying it requires a module (for example, ModuleNotFoundError: No module named '<something>'), follow the pattern below: enter the command pip install <something>, and/or search for the package on https://pypi.org/
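The same pattern can be automated: the sketch below tries to locate each module a utility needs and prints the matching pip install command for any that are missing. It is an illustration, not part of twarc; the REQUIRED mapping is an assumption covering only the two packages installed in the next step, and missing_modules is a hypothetical helper. It relies on importlib.util.find_spec from the standard library, which returns None when a top-level module cannot be found.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping of module name -> pip package name. For emoji and networkx
# the names match, but they can differ (e.g. module "yaml" comes from the
# package "PyYAML"), so check https://pypi.org/ when in doubt.
REQUIRED: Dict[str, str] = {"emoji": "emoji", "networkx": "networkx"}


def missing_modules(required: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return pip commands for modules that can't be found."""
    commands = []
    for module, package in required.items():
        if importlib.util.find_spec(module) is None:
            commands.append(f"pip install {package}")
    return commands


# Print one command per missing module; prints nothing if all are installed.
for command in missing_modules(REQUIRED):
    print(command)
```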
Open the Terminal application (macOS users) or Command Prompt (Windows users), and enter the following commands one at a time:
pip install emoji
Installs the emoji package
pip install networkx
Installs the networkx package
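If the pip on your PATH belongs to a different Python installation than the one you use to run the utilities, the packages may install but still fail to import. One way to avoid that, sketched below, is to invoke pip through the running interpreter; the shell equivalent is python -m pip install emoji networkx. The helper names pip_command and install are hypothetical, chosen for this example.

```python
import subprocess
import sys
from typing import List

# Packages this tutorial asks you to install.
PACKAGES = ["emoji", "networkx"]


def pip_command(packages: List[str]) -> List[str]:
    """Hypothetical helper: build a pip install command for this interpreter.

    Running pip as "<this python> -m pip" installs into the same Python
    that executes this script, rather than whichever pip is first on PATH.
    """
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages: List[str]) -> None:
    """Hypothetical helper: run pip; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_command(packages))
```

Calling install(PACKAGES) installs both packages; subprocess.check_call raises subprocess.CalledProcessError if pip exits with an error, so a failed install is not silently ignored.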