Twitter User Tweets Scraper | Download to Excel & CSV


User ID

The ID of the Twitter user to fetch Tweets for. If you only know the username, use the User Details endpoint to get the User ID. This ID is sometimes referred to as the author_id in other Twitter endpoints, such as the Tweet Search endpoints.

How to Download All Tweets from a User

Let's say you want to analyze a major brand's Twitter page, your own Tweets, or those of a friend, and get back the text and engagement metrics from ALL Tweets for a given user. This is extremely easy to do using the official Twitter API. Exactly which Twitter API endpoint you use will depend on how many Tweets the user has posted and whether or not you have access to the Academic Research product track.

This article will show you the best way to scrape the Tweets and download them into a CSV or Excel file for easy analysis, so you can upload the data into countless third-party tools and databases.

Under 3,200 Tweets

If the user you want to collect Tweets for has 3,200 or fewer Tweets, you can use the User Tweet Timeline endpoint, which will return up to the 3,200 most recent Tweets using pagination.

Over 3,200 Tweets

However, if the user is very active on Twitter (having Tweeted more than 3,200 times) and you must collect ALL the Tweets, then you will need to use the "Archive Search" version of the Search Tweets endpoint and ask Twitter for special permission to use it for academic research.

If you are not part of an academic institution but are willing to pay Twitter for access to their historical archive, you can use the older Premium Search Full Archive API after you have set up your Twitter account to work with the premium archive product.
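For readers who want to see the underlying API call, here is a minimal sketch (standard-library Python, assuming you already have a v2 bearer token) of resolving a username to the User ID via the User Details by Username endpoint:

```python
import json
import urllib.request

def user_lookup_url(username: str) -> str:
    # v2 "User Details by Username" endpoint: resolves a handle to a User ID
    return f"https://api.twitter.com/2/users/by/username/{username}"

def get_user_id(username: str, bearer_token: str) -> str:
    # The numeric ID comes back under data.id; this is the same value
    # other endpoints refer to as author_id
    req = urllib.request.Request(
        user_lookup_url(username),
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["id"]
```

This is a sketch of the direct-API route, not the no-code workflow described below.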
Collecting with Stevesie Data

The rest of this article will discuss how to use the Stevesie Data platform to collect Tweets from the three endpoints listed above, returning the results in a single CSV file you can easily analyze and use in other programs without writing any code. Stevesie Data is a paid platform, and the rest of this article assumes you have the Plus plan for running workflows.

If you do not want to use a paid platform, you can refer to the links above and access the Twitter API directly at your own time, expense, and effort. Disclaimer: I, the author of this article, happen to own the Stevesie Data platform.

I - Timeline Scraping (Under 3,200 Tweets)

If your target user has under 3,200 Tweets (or you're happy just scraping the most recent 3,200), then the easiest way to collect them is through the User Timeline Tweets integration.

1. Get the Twitter User ID

You will need to know the User ID in order to continue, which you can obtain through the User Details integration: provide the Twitter username and you'll get back the ID.

Once you execute the endpoint, you'll get back the User ID, or 993999718987509760 in this example:

2. Scrape the Tweets

Once we have the User ID, we can head to the User Timeline Tweets integration and test out the endpoint for getting the most recent Tweets. We'll use another Twitter ID in this example (529797130), which has a lot of Tweets, so we can demonstrate pagination and how to get a large number of Tweets back.

To get some initial results back, just provide the User ID and your API token, leaving everything else blank:

In the response, you'll see the 100 most recent Tweets (including retweets and replies) with just the Tweet text as a table of values you can download. It's only 100 because Twitter returns up to 100 per page - we'll discuss how to get up to 3,200 soon.
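If you would rather hit the API directly, fetching a single page from the User Tweet Timeline endpoint can be sketched like this (standard-library Python; the bearer token is yours to supply):

```python
import json
import urllib.request
from urllib.parse import urlencode

def timeline_url(user_id: str, max_results: int = 100) -> str:
    # v2 User Tweet Timeline endpoint; Twitter returns at most 100 Tweets
    # per page, up to roughly the 3,200 most recent overall
    qs = urlencode({"max_results": max_results})
    return f"https://api.twitter.com/2/users/{user_id}/tweets?{qs}"

def fetch_page(user_id: str, bearer_token: str) -> dict:
    req = urllib.request.Request(
        timeline_url(user_id),
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        # response contains "data" (the Tweets) and "meta" (paging info)
        return json.load(resp)
```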
Exclude Retweets & Replies

If you want to exclude any Tweets that are retweets or replies (so you only have the original Tweets from the author), then you want to set the exclude field to retweets,replies:

Get More Fields Back

By default, the Twitter API will only return the Tweet text, in an effort to embrace minimalism or something of that sort. This is great if you only need the text, but not if you're interested in other data points like links to websites, geo-tagged locations, mentions, tagged entities, etc.

To get more data back about the Tweets you're interested in, you'll want to pass in Expansions. The ones supported by this endpoint are: attachments.poll_ids, attachments.media_keys, author_id, entities.mentions.username, geo.place_id, in_reply_to_user_id, referenced_tweets.id, referenced_tweets.id.author_id - and the results will return more data back.

Specifically, we can see that when a Tweet mentions another user using the @ tag, the expanded results now include a reference to the mentioned user and their ID (in this case 5739642), as well as a separate includes > users collection with more details about those referenced users (e.g. the user's full name, or Oregonian Business in this case):

Referenced Fields & Included Objects

Now if we want even more data back about the mentioned users, we can provide values for the user.fields query parameter to tell Twitter to expand back more data in the includes > users collection. E.g. if we provide created_at, description, entities, id, location, name, pinned_tweet_id, profile_image_url, protected, public_metrics, url, username, verified, withheld, then we'll see these fields returned in the includes > users collection:

You can follow these steps for other "attached" or "referenced" objects on Tweets, such as media, places, polls, and other Tweets. Just see the "Fields" section of the inputs and add whatever you need for your use case.
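To give a concrete feel for how these inputs translate into query parameters, here is a small hypothetical helper that joins them in the comma-separated form the v2 API expects:

```python
def build_params(exclude=None, expansions=None, user_fields=None, tweet_fields=None):
    # Each input is a list of names; the API wants them comma-joined,
    # under dotted parameter names like user.fields and tweet.fields
    params = {}
    if exclude:
        params["exclude"] = ",".join(exclude)
    if expansions:
        params["expansions"] = ",".join(expansions)
    if user_fields:
        params["user.fields"] = ",".join(user_fields)
    if tweet_fields:
        params["tweet.fields"] = ",".join(tweet_fields)
    return params

params = build_params(
    exclude=["retweets", "replies"],
    expansions=["entities.mentions.username", "referenced_tweets.id"],
    user_fields=["created_at", "description", "public_metrics"],
)
# params["exclude"] is "retweets,replies", matching the input shown above
```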
Additional Tweet Fields

At a minimum, you'll probably want to set the Tweet Fields so you get more data back about the Tweets you're collecting, such as the timestamp, which is not returned by default. Just enter a list of fields like attachments, author_id, context_annotations, conversation_id, created_at, entities, geo, id, in_reply_to_user_id, lang, public_metrics, possibly_sensitive, referenced_tweets, reply_settings, source, text, withheld into the tweet.fields input like this:

And your output data collection will now contain the additional fields you requested:

3. Use Workflows for Pagination

Once you're happy with the type of data you're getting back, you'll likely want to get more than 100 results back per request. To do this, we need to use "pagination" and pass the meta.next_token value in the collection from one response into the pagination_token input of the next request to get the next 100 results, and so on.

Fortunately, we don't have to do all of this manually and can use a "workflow" version of this endpoint to automatically paginate and combine the results for us into bulk CSV files.

3.1 Import the Workflow Formula

Check out the Twitter User Timeline Tweets - Pagination workflow formula page and click "Import" to add the workflow to your account.

3.2 Specify Workflow Inputs

Enter the inputs like you did with the endpoint version. You can enter a single User ID or multiple IDs (one per line). Only provide one Bearer Token (just one line).
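The token-passing loop that the workflow automates (feeding each response's meta.next_token back in as the pagination_token) can be sketched like this, where fetch_page stands in for whatever function performs one API request:

```python
def paginate(fetch_page, max_pages=32):
    # 32 pages x 100 Tweets per page approaches the 3,200-Tweet cap
    tweets, token = [], None
    for _ in range(max_pages):
        page = fetch_page(token)
        tweets.extend(page.get("data", []))
        token = page.get("meta", {}).get("next_token")
        if not token:
            break  # no next_token means we've reached the last page
    return tweets

# Canned pages standing in for real API responses:
pages = iter([
    {"data": [{"id": "1"}], "meta": {"next_token": "abc"}},
    {"data": [{"id": "2"}], "meta": {}},
])
collected = paginate(lambda token: next(pages))
```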
You may also want to specify other optional parameters like exclude and set some expansions or fields to get more results back:

3.3 Execute the Workflow

Once you've reviewed everything, you can optionally set a name for the workflow (to help reference it later) and then run the workflow. Typically you want to just leave all the defaults alone under the "Execute" section:

3.4 Download Results

Once your workflow finishes running, you'll see the results in the green section below (which will combine the responses from all pages together for you):

You'll notice though that we only collected 850 Tweets, and NOT 3,200. This is because we excluded retweets & replies, and the account has a lot of retweets and replies, which ate into the 3,200 limit that Twitter provides us. So if this happens to you, you'll want to refer to the two other options below so you can search using the Twitter archive search.

3.5 Add Extractors if Needed

If you find yourself wanting more data about referenced objects (like mentioned users) in other collections, such as the includes > users collection we saw earlier, we can capture these from the workflow.

First, go back to the User Timeline endpoint and execute the endpoint with the fields and expansions you want to capture in the workflow version (e.g. use the example provided earlier to capture mentioned users).

Next, find the collections you want to capture and select "New Extractor…" from the dropdown, then follow the next screen and just use the defaults to create a new extractor:

Lastly, go back to your workflow, and on the right side you'll see a list of extractors. Use the dropdown and select your new extractor to capture this data in your next workflow run:

II - Archive Search Scraping (Over 3,200 Tweets + Academic)

If your target user has over 3,200 Tweets and you have access to the Twitter API Academic track, then we'll need to use the Search Tweets integration, which will let us search all Tweets from 2006 onward.

1. Scrape the Tweets

Let's start with a quick test to get more Tweets back from @612BrewReview per our previous example (since we were capped by the 3,200 limit).
We simply go to the Search Tweets integration and provide our query as from:612BrewReview to get the Tweets from the account:

The default results will look as follows, returning only the text of the Tweet. If you do not see results, try setting the "Recent or All" input to all, as by default this endpoint will only return Tweets from the past 7 days:

Exclude Retweets, Quotes & Replies

You probably want to exclude retweets, quotes, and replies (like in our previous example). To do this, simply add -is:retweet -is:reply -is:quote to your query, so your final query will be as follows:

from:612BrewReview -is:retweet -is:reply -is:quote

Get More Fields Back

Just like in the last section, where we talked about how the Twitter API likes to return the bare minimum by default (e.g. just the Tweet text without any further information) - if you need more data back, simply copy and paste the "Example" values for the Expansions & Fields inputs (see the previous section's Get More Fields Back for a more detailed walkthrough):

With the above filled in, we'll now get A LOT of data back, probably much more than we need:

Once you're happy with the results, we now want to set the "Recent or All" flag to all so we get all the Tweets back from the entire archive (this will error out if you're not on the Academic product track):

2. Use Workflows for Pagination

Once you're happy with the type of data you're getting back, you'll likely want to get more than 100 results back per request. To do this, we need to use "pagination" and pass the next_token value in the collection from one response into the pagination_token input of the next request to get the next 100 results, and so on.

Fortunately, we don't have to do all of this manually and can use a "workflow" version of this endpoint to automatically paginate and combine the results for us into bulk CSV files.

2.1 Import the Workflow Formula

Check out the Twitter Tweets & Archive Search - Pagination workflow formula page and click "Import" to add the workflow to your account.
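If you are generating these search queries for many accounts, a tiny hypothetical helper keeps the -is: operators consistent:

```python
def user_tweets_query(username, include_retweets=False, include_replies=False, include_quotes=False):
    # Builds a v2 search query like the one above:
    # "from:612BrewReview -is:retweet -is:reply -is:quote"
    parts = [f"from:{username}"]
    if not include_retweets:
        parts.append("-is:retweet")
    if not include_replies:
        parts.append("-is:reply")
    if not include_quotes:
        parts.append("-is:quote")
    return " ".join(parts)
```

One query per account, one per line, is exactly the shape the multi-query workflow inputs expect.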
2.2 Specify Workflow Inputs

Enter the inputs like you did with the endpoint version. If you want to scrape the Tweets for multiple accounts, just provide multiple queries, one per line, like so (but just use one line for everything else):

from:612BrewReview -is:retweet -is:reply -is:quote
from:stevesiedata -is:retweet -is:reply -is:quote

Unless you just need the Tweet text, you probably also want to add some inputs for the fields and/or expansions you need back; at a minimum you probably want to set the Tweet Fields:

And lastly, be sure to set the "Recent or All" flag so you get back Tweets beyond the previous 7 days:

2.3 Execute the Workflow

Once you've reviewed everything, you can optionally set a name for the workflow (to help reference it later) and then run the workflow. Typically you want to just leave all the defaults alone under the "Execute" section:

2.4 Download Results

Once your workflow finishes running, you'll see the results in the green section below (which will combine the responses from all pages together for you):

You'll probably just want the "Tweets.csv" file, but you can also download other collections like Tweet Mentions, URLs, etc., which are provided as separate collections since one Tweet can contain multiple mentions.

III - Legacy Archive Scraping (Over 3,200 Tweets + Premium)

At the time of this writing, Twitter only allows you to use the V2 archive search with academic access; however, we expect them to add premium support in the future. So please be sure to check the Twitter API to see if they have updated it before you perform the following, which will be outdated one day.

1. Scrape the Tweets

Here we can follow the steps from above, but we want to use the V1 version of the Tweet Search integration, and we'll enter our query just as we did before as from:612BrewReview (you can try adding filters such as -filter:retweets -filter:replies -filter:quotes, but they may not work depending on your level of access).

Also be sure to set up and include an "Environment Label" (please check the directions there, as there is some additional setup you need to do in providing access to a Twitter Dev Environment).
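For orientation, the request the V1 full-archive endpoint expects might be assembled like this (a sketch; the fromDate/toDate parameter names and YYYYMMDDhhmm timestamp format are how the premium full-archive search takes date ranges, and env_label is the Dev Environment label mentioned above):

```python
from datetime import datetime

V1_FMT = "%Y%m%d%H%M"  # premium search timestamps: YYYYMMDDhhmm (UTC)

def v1_archive_request(query: str, start: datetime, end: datetime, env_label: str) -> dict:
    # Builds the URL and query parameters for one full-archive search call
    return {
        "url": f"https://api.twitter.com/1.1/tweets/search/fullarchive/{env_label}.json",
        "params": {
            "query": query,
            "fromDate": start.strftime(V1_FMT),
            "toDate": end.strftime(V1_FMT),
        },
    }

req = v1_archive_request(
    "from:612BrewReview", datetime(2010, 1, 1), datetime(2022, 1, 1), "dev"
)
```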
You'll also want to set a start and end date, otherwise the API may behave strangely. Your inputs should look something like this:

When you get it working, you'll see results like this, where we can confirm we went back in time to the account's first Tweet:

You'll also see other collections included, like URLs to images posted with Tweets, mentions, etc.

Now to get the next page of results, we need to look for a next field in the collection and pass this on to the Pagination input to get the next page of results:

However, this is tedious work to do manually, so the next section will walk through how to use the workflow version of the endpoint to automatically paginate and get all the results back.

2. Use Workflows for Pagination

To use workflows, simply visit the Twitter Tweet Search Full Archive - Pagination formula and click "Import" to add the workflow to your account.

Then just set the inputs like before:

And give the workflow a run:

Wait for the workflow to run, as it will query the Twitter API and combine all results together into CSV files. When finished, you'll be able to download the results in the green section as CSV files:

💡 Related

👥 Scrape Twitter Followers
🔊 Scrape Twitter Spaces
📊 Scrape Twitter Tweet Counts
👤 Scrape Twitter Users

⚡️ Endpoints

User Following (V2): /2/users/{{user_id}}/following
Tweets & Archive Search (V2): /2/tweets/search/{{recent_or_all}}
User Details by Username (V2): /2/users/by/username/{{username}}
User Followers (V2): /2/users/{{user_id}}/followers
Tweet Retweeters (V2): /2/tweets/{{tweet_id}}/retweeted_by
User Details by ID (V2): /2/users/{{user_id}}
User Timeline & Mentions (V2): /2/users/{{user_id}}/{{tweets_or_mentions}}
Tweet Search Full Archive (V1): /1.1/tweets/search/fullarchive/{{environment_label}}.json
User Details (V1): /1.1/users/lookup.json
User Liked Tweets (V2): /2/users/{{user_id}}/liked_tweets
Tweet Counts Timeline (V2): /2/tweets/counts/{{recent_or_all}}
User Following (V1): /1.1/friends/list.json
Retweets (V1): /1.1/statuses/retweets/{{tweet_id}}.json
Tweet Likers (V2): /2/tweets/{{tweet_id}}/liking_users
List Members (V1): /1.1/lists/members.json
Place Search (V1): /1.1/geo/search.json
Spaces Details (V2): /2/spaces/{{space_id}}
Spaces Search (V2): /2/spaces/search
Trending Places Search (V1): /1.1/trends/closest.json
Trending Topics (V1): /1.1/trends/place.json
Tweet Details (V2): /2/tweets/{{tweet_id}}
Tweet Search (V1): /1.1/search/tweets.json
User Followers (V1): /1.1/followers/list.json
User List Memberships (V1): /1.1/lists/memberships.json
User Lists (V1): /1.1/lists/list.json
User Owned Lists (V1): /1.1/lists/ownerships.json
User Tweets (V1): /1.1/statuses/user_timeline.json


