👤 Twitter User Tweets Scraping
Download Twitter API Tweets Data to Excel & CSV Files
User ID
ID of the Twitter User to fetch Tweets for. If you only know the username, use the User Details Endpoint to get the User ID.
This ID is sometimes referred to as the author_id from other Twitter endpoints, such as the Tweet Search Endpoints.
How to Download All Tweets from a User
Let’s say you want to analyze a major brand’s Twitter page, your own Tweets, or those of a friend, and get back the text and engagement metrics from ALL Tweets for a given user. This is extremely easy to do using the official Twitter API. Exactly which Twitter API endpoint you use will depend on how many Tweets the user has Tweeted and whether or not you have access to the Academic Research product track.
This article will show you the best way to scrape the Tweets and get the downloaded data into a CSV or Excel file for easy analysis, so you can upload it into countless 3rd party tools and databases.
Under 3,200 Tweets
If the user you want to collect the Tweets for has 3,200 or fewer Tweets, then you can use the User Tweet Timeline Endpoint, which will return up to the 3,200 most recent Tweets while using pagination.
Over 3,200 Tweets
However, if the user is very active on Twitter (Tweeting over 3,200 Tweets), and you must collect ALL the Tweets, then you will need to use the “Archive Search” version of the Search Tweets Endpoint and ask Twitter for special permission to use it for academic research.
If you are not part of an academic institution, but are willing to pay Twitter for access to their historical archive, then you can use the older Premium Search Full Archive API after you have set up your Twitter account to work with the premium archive product.
Collecting With Stevesie Data
The rest of this article will discuss how to use the Stevesie Data Platform to collect Tweets from the three endpoints listed above, which will return the results in a single CSV file you can easily analyze and use in other programs without writing any code. Stevesie Data is a paid platform, and the rest of this article assumes you have the Plus plan for running workflows.
If you do not want to use a paid platform, then you can refer to the links above and directly access the Twitter API at your own time, expense and effort. Disclaimer: I, the author of this article, happen to own the Stevesie Data Platform.
I - Timeline Scraping (Under 3,200 Tweets)
If your target user has under 3,200 Tweets (or you’re happy just scraping the most recent 3,200), then the easiest way to collect them is through the User Timeline Tweets Integration.
1. Get the Twitter User ID
You will need to know the User ID in order to continue, which you can obtain through the User Details Integration by providing the Twitter username, and you’ll get back the ID.
Once you execute the endpoint, you’ll get back the User ID, or 993999718987509760 in this example:
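If you prefer to hit the Twitter API directly rather than going through the integration, the same lookup is a single request to the User Details by Username (V2) endpoint. A minimal sketch in Python, assuming your Bearer Token is stored in a TWITTER_BEARER_TOKEN environment variable:

```python
# Minimal sketch: look up a User ID from a username via the v2
# "user by username" endpoint. Assumes TWITTER_BEARER_TOKEN is set.
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]

def get_user_id(username: str) -> str:
    """Return the numeric User ID (as a string) for a Twitter handle."""
    resp = requests.get(
        f"https://api.twitter.com/2/users/by/username/{username}",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["id"]

print(get_user_id("TwitterDev"))  # prints that account's User ID
```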
2. Scrape the Tweets
Once we have the User ID, we can head to the User Timeline Tweets Integration and test out the endpoint for getting the most recent Tweets. We’ll use another Twitter ID in this example (529797130), which has a lot of Tweets, so we can demonstrate pagination and how to get a large amount of Tweets back.
To get some initial results back, just provide the User ID and your API token, leaving everything else blank:
In the response, you’ll see the 100 most recent Tweets (including retweets and replies) with just the Tweet text as a table of values you can download. It’s only 100 because Twitter returns up to 100 per page - we’ll discuss how to get up to 3,200 soon.
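For reference, here is roughly what that first request looks like against the v2 User Tweet Timeline endpoint itself (a sketch, again assuming a Bearer Token environment variable and the example User ID above):

```python
# Rough sketch: fetch the most recent page of Tweets for a user.
# With no other parameters, Twitter returns up to 100 Tweets per page,
# each with just its ID and text.
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
USER_ID = "529797130"  # the example account used in this article

resp = requests.get(
    f"https://api.twitter.com/2/users/{USER_ID}/tweets",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params={"max_results": 100},
)
resp.raise_for_status()
for tweet in resp.json().get("data", []):
    print(tweet["id"], tweet["text"][:80])
```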
Exclude Retweets & Replies
If you want to exclude any Tweets that are retweets or replies (so you only have the original Tweets from the author), then you want to set the exclude field to retweets,replies:
Get More Fields Back
By default, the Twitter API will only return the Tweet text, in an effort to embrace minimalism, or something of that sort. This is great if you only need the text, but not if you’re interested in other data points like links to websites, geo-tagged locations, mentions, tagged entities, etc.
To get more data back about the Tweets you’re interested in, you’ll want to pass in Expansions. The ones supported by this endpoint are attachments.poll_ids, attachments.media_keys, author_id, entities.mentions.username, geo.place_id, in_reply_to_user_id, referenced_tweets.id and referenced_tweets.id.author_id; with these set, the results return more data back.
Specifically, we can see that when a Tweet mentions another user using the @ tag, the expanded results now include a reference to the mentioned user and their ID, in this case 5739642, as well as a separate includes > users collection with more details about those referenced users (e.g. the user’s full name, or Oregonian Business in this case):
Referenced Fields & Included Objects
Now if we want even more data back about the mentioned users, we can provide values for the user.fields query parameter, to tell Twitter to expand back more data in the includes > users collection. E.g. if we provide created_at, description, entities, id, location, name, pinned_tweet_id, profile_image_url, protected, public_metrics, url, username, verified, withheld, then we’ll see these fields returned in the includes > users collection:
You can follow these steps for other “attached” or “referenced” objects to Tweets, such as media, places, polls and other Tweets. Just see the “Fields” section of the inputs and add whatever you need for your use case.
Additional Tweet Fields
At the minimum, you’ll probably want to set the Tweet Fields so you get more data back about the Tweets you’re collecting, such as the timestamp, which is not returned by default. Just enter in a list of fields like attachments, author_id, context_annotations, conversation_id, created_at, entities, geo, id, in_reply_to_user_id, lang, public_metrics, possibly_sensitive, referenced_tweets, reply_settings, source, text, withheld into the tweet.fields input like this:
And your output data collection will now contain the additional fields you requested:
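If you are calling the API yourself, the exclude, expansions, user.fields and tweet.fields inputs described above all map one-to-one onto query parameters of the same timeline request. A sketch using the example lists from this section:

```python
# Sketch: request the timeline with exclude, expansions, tweet.fields and
# user.fields set to the example values discussed above.
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
USER_ID = "529797130"

params = {
    "max_results": 100,
    "exclude": "retweets,replies",
    "expansions": (
        "attachments.poll_ids,attachments.media_keys,author_id,"
        "entities.mentions.username,geo.place_id,in_reply_to_user_id,"
        "referenced_tweets.id,referenced_tweets.id.author_id"
    ),
    "tweet.fields": (
        "attachments,author_id,context_annotations,conversation_id,"
        "created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,"
        "possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld"
    ),
    "user.fields": (
        "created_at,description,entities,id,location,name,pinned_tweet_id,"
        "profile_image_url,protected,public_metrics,url,username,verified,withheld"
    ),
}

resp = requests.get(
    f"https://api.twitter.com/2/users/{USER_ID}/tweets",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params=params,
)
resp.raise_for_status()
body = resp.json()
print(len(body.get("data", [])), "tweets;",
      len(body.get("includes", {}).get("users", [])), "referenced users")
```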
3. Use Workflows for Pagination
Once you’re happy with the type of data you’re getting back, you’ll likely want to get more than 100 results back per request. To do this, we need to use “pagination” and pass the meta.next_token value in the collection from a response into the pagination_token input of the next request to get the next 100 results, and so on.
Fortunately, we don’t have to do all of this manually and can use a “workflow” version of this endpoint to automatically paginate and combine the results for us into bulk CSV files.
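For reference, the loop the workflow automates looks roughly like this when scripted directly against the API (a sketch, with tweet.fields trimmed down for brevity):

```python
# Sketch of the pagination loop: keep passing meta.next_token back in as
# pagination_token until it disappears (or the ~3,200-Tweet timeline cap
# is reached).
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
USER_ID = "529797130"

tweets = []
params = {"max_results": 100, "tweet.fields": "created_at,public_metrics"}
while True:
    resp = requests.get(
        f"https://api.twitter.com/2/users/{USER_ID}/tweets",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params=params,
    )
    resp.raise_for_status()
    body = resp.json()
    tweets.extend(body.get("data", []))
    next_token = body.get("meta", {}).get("next_token")
    if not next_token:
        break
    params["pagination_token"] = next_token

print(f"Collected {len(tweets)} Tweets")
```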
3.1 Import the Workflow Formula
Check out the Twitter User Timeline Tweets - Pagination workflow formula page and click “Import” to add the workflow to your account.
3.2 Specify Workflow Inputs
Enter the inputs like you did with the endpoint version. You can enter in a single User ID or multiple IDs (one per line). Only provide one Bearer Token (just one line).
You may also want to specify other optional parameters like exclude, and set some expansions or fields to get more results back:
3.3 Execute the Workflow
Once you’ve reviewed everything, you can optionally set a name for the workflow (to help reference it later), and then run the workflow. Typically you want to just leave all the defaults alone under the “Execute” section:
3.4 Download Results
Once your workflow finishes running, you’ll see the results in the green section below (which will combine the responses from all pages together for you):
You’ll notice though that we only collected 850 Tweets, and NOT 3,200. This is because we excluded retweets & replies, and the account has a lot of retweets and replies, which ate into the 3,200 limit that Twitter provides us. So if this happens to you, then you’ll want to refer to the 2 other options below so you can search using the Twitter Archive Search.
3.5 Add Extractors if Needed
If you find yourself wanting more data about referenced objects (like mentioned users) in other collections, such as the includes > users collection we saw earlier, you can capture these from the workflow.
First, go back to the User Timeline Endpoint and execute the endpoint with the fields and expansions you want to capture in the workflow version (e.g. use the example provided earlier to capture mentioned users).
Next, find the collections you want to capture and select “New Extractor…” from the dropdown, then follow the next screen and just use the defaults to create a new extractor:
Lastly, go back to your workflow, and on the right side you’ll see a list of extractors. Use the dropdown and select your new extractor to capture this data in your next workflow run:
II - Archive Search Scraping (Over 3,200 Tweets + Academic)
If your target user has over 3,200 Tweets and you have access to the Twitter API Academic Track, then we’ll need to use the Search Tweets Integration, which will let us search all Tweets from 2006 onward.
1. Scrape the Tweets
Let’s start with a quick test to get more Tweets back from @612BrewReview per our previous example (since we were capped by the 3,200 limit).
We simply go to the Search Tweets Integration and provide our query as from:612BrewReview to get the Tweets from the account:
The default results will look as follows, returning only the text of the Tweet. If you do not see results, try setting the “Recent or All” input to all, as by default this endpoint will only return Tweets from the past 7 days:
Exclude Retweets, Quotes & Replies
You probably want to exclude retweets, quotes and replies (like in our previous example). To do this, simply add -is:retweet -is:reply -is:quote to your query, so your final query will be as follows:
from:612BrewReview -is:retweet -is:reply -is:quote
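If you’re calling the API yourself, the equivalent request goes to the full-archive Search Tweets endpoint, /2/tweets/search/all (which requires the Academic Research track; /2/tweets/search/recent only covers the last 7 days). A sketch:

```python
# Sketch: run the archive search query built above and print the first
# page of matching Tweets.
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
QUERY = "from:612BrewReview -is:retweet -is:reply -is:quote"

resp = requests.get(
    "https://api.twitter.com/2/tweets/search/all",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params={"query": QUERY, "max_results": 100},
)
resp.raise_for_status()
for tweet in resp.json().get("data", []):
    print(tweet["id"], tweet["text"][:80])
```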
Get More Fields Back
Just like we talked about in the last section, the Twitter API likes to return the bare minimum back by default in the response (e.g. just the Tweet text without any further information). If you need more data back, simply copy and paste the “Example” values for the Expansions & Fields inputs (see the previous section’s Get More Fields Back for a more detailed walkthrough):
With the above filled in, we’ll now get A LOT of data back, probably much more than we need:
Once you’re happy with the results, we now want to set the “Recent or All” flag to all so we get all the Tweets back from the entire archive (this will error out if you’re not on the Academic product track):
2. Use Workflows for Pagination
Once you’re happy with the type of data you’re getting back, you’ll likely want to get more than 100 results back per request. To do this, we need to use “pagination” and pass the next_token value in the collection from a response into the pagination_token input of the next request to get the next 100 results, and so on.
Fortunately, we don’t have to do all of this manually and can use a “workflow” version of this endpoint to automatically paginate and combine the results for us into bulk CSV files.
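The search version of that pagination loop is nearly identical to the timeline one; the main differences are the endpoint and the fact that the page token is passed back as next_token. A sketch that also writes the combined results out to a CSV file (with tweet.fields trimmed down for brevity):

```python
# Sketch: paginate /2/tweets/search/all and write the results to Tweets.csv.
# Assumes Academic Research access and a Bearer Token environment variable.
import csv
import os
import time

import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
QUERY = "from:612BrewReview -is:retweet -is:reply -is:quote"

params = {"query": QUERY, "max_results": 100,
          "tweet.fields": "created_at,public_metrics"}
tweets = []
while True:
    resp = requests.get(
        "https://api.twitter.com/2/tweets/search/all",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params=params,
    )
    resp.raise_for_status()
    body = resp.json()
    tweets.extend(body.get("data", []))
    next_token = body.get("meta", {}).get("next_token")
    if not next_token:
        break
    params["next_token"] = next_token
    time.sleep(1)  # pause between pages to stay within the endpoint's rate limits

with open("Tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text", "retweet_count", "like_count"])
    for t in tweets:
        metrics = t.get("public_metrics", {})
        writer.writerow([t["id"], t.get("created_at", ""), t["text"],
                         metrics.get("retweet_count", ""),
                         metrics.get("like_count", "")])

print(f"Wrote {len(tweets)} Tweets to Tweets.csv")
```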
2.1 Import the Workflow Formula
Check out the Twitter Tweets & Archive Search - Pagination workflow formula page and click “Import” to add the workflow to your account.
2.2 Specify Workflow Inputs
Enter the inputs like you did with the endpoint version. If you want to scrape the Tweets for multiple accounts, just provide multiple queries, one per line like so (but just use one line for everything else):
from:612BrewReview -is:retweet -is:reply -is:quote
from:stevesiedata -is:retweet -is:reply -is:quote
Unless you just need the Tweet text, you probably also want to add some inputs for the fields and/or expansions you need back, e.g. at the minimum you probably want to set the Tweet Fields:
And lastly, be sure to set the “Recent or All” flag so you get back Tweets beyond the previous 7 days:
2.3 Execute the Workflow
Once you’ve reviewed everything, you can optionally set a name for the workflow (to help reference it later), and then run the workflow. Typically you want to just leave all the defaults alone under the “Execute” section:
2.4 Download Results
Once your workflow finishes running, you’ll see the results in the green section below (which will combine the responses from all pages together for you):
You’ll probably just want the “Tweets.csv” file, but you can also download other collections like Tweet Mentions, URLs, etc., which are provided as separate collections since one Tweet can contain multiple mentions, URLs and so on.
III - Legacy Archive Scraping (Over 3,200 Tweets + Premium)
At the time of this writing, Twitter only allows you to use the V2 Archive Search with academic access; however, we expect them to add premium support in the future. So please be sure to check the Twitter API to see if they have updated it before you perform the following, which will be outdated one day.
1. Scrape the Tweets
Here we can follow the steps from above, but we want to use the V1 version of the Tweet Search Integration, and we’ll enter our query just as we did before as from:612BrewReview (you can try adding filters such as -filter:retweets -filter:replies -filter:quotes, but they may not work depending on your level of access).
Also be sure to set up and include an “Environment Label” (please check the directions there, as there is some additional setup you need to do in providing access to a Twitter Dev Environment).
You’ll also want to set a start and end date, otherwise the API may behave strangely. Your inputs should look something like this:
When you get it working, you’ll see results like this, where we can confirm we went back in time to the account’s first Tweet:
You’ll also see other collections included, like URLs to images posted with Tweets, mentions, etc.
Now to get the next page of results, we need to look for a next field in the collection and pass this on to the Pagination input to get the next page of results:
However, this is tedious work to do manually, so the next section will walk through how to use the workflow version of the endpoint to automatically paginate and get all the results back.
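If you would rather script this against the V1.1 Premium Full Archive endpoint yourself, a rough sketch of that loop follows. The environment label, query and dates below are placeholders; the premium API expects dates in YYYYMMDDhhmm format and returns the page token in a top-level next field:

```python
# Rough sketch of V1.1 Premium Full Archive Search pagination: pass the
# "next" token from each response back in until it is absent.
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
ENV_LABEL = "dev"  # placeholder: your own premium environment label

params = {
    "query": "from:612BrewReview",
    "fromDate": "200603210000",  # placeholder start date (YYYYMMDDhhmm)
    "toDate": "202201010000",    # placeholder end date
    "maxResults": 100,
}
results = []
while True:
    resp = requests.get(
        f"https://api.twitter.com/1.1/tweets/search/fullarchive/{ENV_LABEL}.json",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params=params,
    )
    resp.raise_for_status()
    body = resp.json()
    results.extend(body.get("results", []))
    if "next" not in body:
        break
    params["next"] = body["next"]

print(f"Collected {len(results)} Tweets")
```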
2. Use Workflows for Pagination
To use workflows, simply visit the Twitter Tweet Search Full Archive - Pagination formula and click “Import” to add the workflow to your account.
Then just set the inputs like before:
And give the workflow a run:
Wait for the workflow to run, as it will query the Twitter API and combine all results together into CSV files. When finished, you’ll be able to download the results in the green section as CSV files:
💡 Related
👥 Scrape Twitter Followers
🔊 Scrape Twitter Spaces
📊 Scrape Twitter Tweet Counts
👤 Scrape Twitter Users
⚡️ Endpoints
User Following (V2)
/2/users/{{user_id}}/following
Tweets & Archive Search (V2)
/2/tweets/search/{{recent_or_all}}
User Details by Username (V2)
/2/users/by/username/{{username}}
User Followers (V2)
/2/users/{{user_id}}/followers
Tweet Retweeters (V2)
/2/tweets/{{tweet_id}}/retweeted_by
User Details by ID (V2)
/2/users/{{user_id}}
User Timeline & Mentions (V2)
/2/users/{{user_id}}/{{tweets_or_mentions}}
Tweet Search Full Archive (V1)
/1.1/tweets/search/fullarchive/{{environment_label}}.json
User Details (V1)
/1.1/users/lookup.json
User Liked Tweets (V2)
/2/users/{{user_id}}/liked_tweets
Tweet Counts Timeline (V2)
/2/tweets/counts/{{recent_or_all}}
User Following (V1)
/1.1/friends/list.json
Retweets (V1)
/1.1/statuses/retweets/{{tweet_id}}.json
Tweet Likers (V2)
/2/tweets/{{tweet_id}}/liking_users
List Members (V1)
/1.1/lists/members.json
Place Search (V1)
/1.1/geo/search.json
Spaces Details (V2)
/2/spaces/{{space_id}}
Spaces Search (V2)
/2/spaces/search
Trending Places Search (V1)
/1.1/trends/closest.json
Trending Topics (V1)
/1.1/trends/place.json
Tweet Details (V2)
/2/tweets/{{tweet_id}}
Tweet Search (V1)
/1.1/search/tweets.json
User Followers (V1)
/1.1/followers/list.json
User List Memberships (V1)
/1.1/lists/memberships.json
User Lists (V1)
/1.1/lists/list.json
User Owned Lists (V1)
/1.1/lists/ownerships.json
User Tweets (V1)
/1.1/statuses/user_timeline.json