tweepyclean package

Submodules

tweepyclean.tweepyclean module

tweepyclean.tweepyclean.clean_tweets(tweets_df, handle='', text_only=True, word_count=True, emojis=True, hashtags=True, sentiment=True, flesch_readability=True, proportion_of_avg_retweets=True, proportion_of_avg_favorites=True)

Adds new columns based on the data in the pandas dataframe returned by raw_df().

Parameters
  • tweets_df (pandas.core.frame.DataFrame) – Dataframe generated by raw_df() to which the new columns are added

  • handle (str, optional) – Twitter handle to store in an additional column (default is an empty string, which adds no column)

  • text_only (bool, optional) – If True, adds a column of the tweet text with emojis, links, hashtags, and mentions removed (default is True)

  • word_count (bool, optional) – If True, adds a column containing the word count of each tweet (default is True)

  • emojis (bool, optional) – If True, adds a column of the emojis extracted from the tweet text (default is True)

  • hashtags (bool, optional) – If True, adds a column of the hashtags extracted from the tweet text (default is True)

  • sentiment (bool, optional) – If True, adds a column containing the nltk.sentiment.vader SentimentIntensityAnalyzer sentiment score for each tweet (default is True)

  • flesch_readability (bool, optional) – If True, adds a column containing the textstat Flesch readability score (default is True)

  • proportion_of_avg_retweets (bool, optional) – If True, adds a column containing the number of retweets a tweet received as a proportion of the account average (default is True)

  • proportion_of_avg_favorites (bool, optional) – If True, adds a column containing the number of favorites a tweet received as a proportion of the account average (default is True)

Returns

tweets_df_extra – Pandas dataframe containing the additional columns specified by the user.

Return type

pandas.core.frame.DataFrame

Examples

#>>> clean_tweets(tweets_df)
#>>> clean_tweets(tweets_df, flesch_readability = False)
#>>> clean_tweets(tweets_df, emojis = False, hashtags = False)
#>>> clean_tweets(tweets_df, sentiment = False)
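The derived columns described above can be illustrated without the package itself. The following is a minimal standard-library sketch, not the package's implementation: the regular expressions and the helper names extract_hashtags, strip_to_text, and proportion_of_average are assumptions made for illustration, and emoji handling is left out.

```python
import re
from statistics import mean

# Hypothetical patterns; the package's real cleaning rules may differ.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
LINK = re.compile(r"https?://\S+")


def extract_hashtags(text):
    """Return the hashtags found in a tweet, in order of appearance."""
    return HASHTAG.findall(text)


def strip_to_text(text):
    """Remove links, hashtags, and mentions, then collapse whitespace."""
    # Links go first so that a '#' inside a URL is not treated as a hashtag.
    for pattern in (LINK, HASHTAG, MENTION):
        text = pattern.sub("", text)
    return " ".join(text.split())


def proportion_of_average(counts):
    """Divide each count by the mean of all counts (1.0 means average)."""
    average = mean(counts)  # assumes the mean is non-zero
    return [count / average for count in counts]


tweet = "Loving the new release @devteam #python #opensource https://example.com"
hashtags = extract_hashtags(tweet)
plain = strip_to_text(tweet)
proportions = proportion_of_average([10, 20, 30])
```

Under this reading, a proportion above 1.0 marks a tweet that received more retweets or favorites than the account average, and a value below 1.0 marks one that received fewer.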

tweepyclean.tweepyclean.engagement_by_hour(tweets_df)

Creates a line chart of the average number of likes and retweets received, grouped by the hour in which the tweet was posted.

Parameters

tweets_df (pandas.DataFrame) – A processed dataframe containing a user’s tweet history and associated information

Returns

An Altair graph object (line chart) of average engagement received by hour of tweet posted

Examples

#>>> engagement_by_hour(tweets_df)
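The aggregation behind such a chart can be sketched with the standard library alone. This is an illustrative assumption about the grouping step, not the package's code: the function name average_engagement_by_hour and the (created_at, favorites, retweets) tuple layout are hypothetical stand-ins for rows of the processed dataframe, and the charting step is omitted.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_engagement_by_hour(tweets):
    """Map each posting hour (0-23) to its mean favorites and retweets.

    `tweets` is an iterable of (created_at, favorites, retweets) tuples,
    a hypothetical stand-in for rows of the processed dataframe.
    """
    by_hour = defaultdict(lambda: {"favorites": [], "retweets": []})
    for created_at, favorites, retweets in tweets:
        by_hour[created_at.hour]["favorites"].append(favorites)
        by_hour[created_at.hour]["retweets"].append(retweets)
    # Sort by hour so the result reads left to right like the chart's x-axis.
    return {
        hour: {name: mean(values) for name, values in metrics.items()}
        for hour, metrics in sorted(by_hour.items())
    }


sample = [
    (datetime(2021, 3, 1, 9, 15), 10, 2),
    (datetime(2021, 3, 2, 9, 40), 20, 4),
    (datetime(2021, 3, 2, 18, 5), 6, 1),
]
hourly = average_engagement_by_hour(sample)
```

Each hour then yields one point per metric, which is the shape a two-series line chart would need.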

tweepyclean.tweepyclean.raw_df(tweepy_items)

Creates a dataframe with labeled columns from a tweepy.cursor.ItemIterator object.

Parameters

tweepy_items (tweepy.cursor.ItemIterator) – Input iterator object generated using the tweepy package

Returns

pd.DataFrame(tweet_search_results) (pandas.core.frame.DataFrame) – Dataframe with up to 31 labeled columns based on the ItemIterator.

Examples

#>>> raw_df(tweets)

tweepyclean.tweepyclean.sentiment_total(tweets, drop_sentiment=False)

Takes an input of single English words and outputs the number of words associated with each of eight emotions and with positive/negative sentiment. This is based on the crowd-sourced NRC Emotion Lexicon, which associates words with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). For more information on NRC: http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

Note that the relationship between words and emotions is 0:n: a word can be associated with no emotion, one emotion, or many.

Parameters
  • tweets (pandas.DataFrame or np.array) – A list or single-column dataframe of single words

  • drop_sentiment (bool, optional) – If True, drops emotion/sentiment rows that have no words associated with them (default is False)

Returns

pandas.DataFrame

Examples

#>>> sentiment_total(df, drop_sentiment = True)

3 x 5
sentiment  word_count  total_words
<chr>      <int>       <dbl>
anger      1           4
disgust    2           4
fear       1           4
negative   2           4
sadness    1           4
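The counting rule described for this function can be sketched with a toy lexicon. Everything below is hypothetical: TOY_LEXICON holds illustrative associations rather than entries copied from the NRC Emotion Lexicon, and count_sentiments is an assumed helper name, not part of the package.

```python
from collections import Counter

# Illustrative associations only; the real NRC Emotion Lexicon is far larger.
TOY_LEXICON = {
    "abandon": {"fear", "negative", "sadness"},
    "cheer": {"joy", "positive"},
    "table": set(),  # a word associated with no emotion
}
CATEGORIES = [
    "anger", "anticipation", "disgust", "fear", "joy",
    "negative", "positive", "sadness", "surprise", "trust",
]


def count_sentiments(words, lexicon, drop_sentiment=False):
    """Return (category, word_count, total_words) rows for a list of words."""
    counts = Counter()
    for word in words:
        # A word adds one to every category it is associated with (0:n).
        counts.update(lexicon.get(word.lower(), ()))
    rows = [(category, counts[category], len(words)) for category in CATEGORIES]
    if drop_sentiment:
        rows = [row for row in rows if row[1] > 0]
    return rows


words = ["abandon", "cheer", "table", "abandon"]
rows = count_sentiments(words, TOY_LEXICON, drop_sentiment=True)
all_rows = count_sentiments(words, TOY_LEXICON)
```

Because one word can map to several categories, the word_count values may sum to more than total_words, which counts each input word once.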

tweepyclean.tweepyclean.tweet_words(clean_dataframe, top_n=1)

Returns the most common words and counts from a list of tweets.

The output is sorted descending by the count of words and in reverse alphabetical order for any word ties.

Parameters
  • clean_dataframe (pandas.DataFrame) – A processed dataframe containing a user’s tweet history and associated information

  • top_n (int) – The number of most common words to display (default is 1)

Returns

pandas.DataFrame – A dataframe with one column containing individual words and a second column with the count of each word

Examples

#>>> tweet_words(dataframe, 3)

pd.DataFrame(data = {'words': ['best', 'apple', 'news'], 'count': [102, 52, 24]})
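The ordering rule stated for this function (descending count, reverse alphabetical for ties) can be sketched with collections.Counter. The helper top_words and its whitespace tokenization are assumptions for illustration; the package may tokenize and filter words differently.

```python
from collections import Counter


def top_words(texts, top_n=1):
    """Return the top_n (word, count) pairs across a list of cleaned texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sorting (count, word) keys in reverse gives descending counts and
    # reverse alphabetical order among words with equal counts.
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)
    return ranked[:top_n]


texts = ["best news today", "best apple news", "apple best"]
top = top_words(texts, top_n=3)
```

Here "news" and "apple" both occur twice, so the tie is broken in reverse alphabetical order and "news" is listed first.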

Module contents