tweepyclean package
Submodules
tweepyclean.tweepyclean module
-
tweepyclean.tweepyclean.clean_tweets(tweets_df, handle='', text_only=True, word_count=True, emojis=True, hashtags=True, sentiment=True, flesch_readability=True, proportion_of_avg_retweets=True, proportion_of_avg_favorites=True)
Adds new columns based on the data in the pandas.DataFrame returned by raw_df().
- Parameters
tweets_df (pandas.core.frame.DataFrame) – Dataframe generated by raw_df() to which the new columns are added
handle (str, optional) – Twitter handle to add as a column (default is '', which adds no column)
text_only (bool, optional) – Whether to add a column of the tweet text with emojis, links, hashtags, and mentions removed (default is True)
emojis (bool, optional) – Whether to add a column of the emojis extracted from the tweet text (default is True)
hashtags (bool, optional) – Whether to add a column of the hashtags extracted from the tweet text (default is True)
sentiment (bool, optional) – Whether to add a column containing the nltk.sentiment.vader SentimentIntensityAnalyzer sentiment score for each tweet (default is True)
flesch_readability (bool, optional) – Whether to add a column containing the textstat Flesch readability score (default is True)
proportion_of_avg_retweets (bool, optional) – Whether to add a column giving the number of retweets each tweet received as a proportion of the account average (default is True)
proportion_of_avg_favorites (bool, optional) – Whether to add a column giving the number of favorites each tweet received as a proportion of the account average (default is True)
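For reference, the standard Flesch Reading Ease formula for English text is given below; higher scores indicate easier text. This assumes the flesch_readability column comes from textstat's flesch_reading_ease, which the docstring does not state explicitly, and textstat's own syllable counting and language settings may make its values differ slightly from a hand calculation.

```latex
\text{FRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
```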
- Returns
tweets_df_extra – Pandas dataframe containing the additional columns specified by the user.
- Return type
pandas.core.frame.DataFrame
Examples
>>> clean_tweets(tweets_df)
>>> clean_tweets(tweets_df, flesch_readability=False)
>>> clean_tweets(tweets_df, emojis=False, hashtags=False)
>>> clean_tweets(tweets_df, sentiment=False)
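As a rough illustration of the kind of columns clean_tweets adds, the sketch below derives a hashtag column, a text-only column and the two proportion-of-average columns with plain pandas. It is not the package's implementation: the input column names text, retweet_count and favorite_count, the output column names, the regular expressions and the helper name add_example_columns are all assumptions, and emoji removal is left out for brevity.

```python
import re

import pandas as pd

# Hypothetical patterns; the package's own cleaning rules may differ.
LINK_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def add_example_columns(tweets_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of tweets_df with a few illustrative extra columns.

    Assumes the input has 'text', 'retweet_count' and 'favorite_count' columns.
    """
    out = tweets_df.copy()

    # Extract hashtags (without the leading '#') into a list per tweet.
    out["hashtags"] = out["text"].apply(
        lambda t: [tag[1:] for tag in HASHTAG_RE.findall(t)]
    )

    # Strip links, mentions and hashtags, then collapse leftover whitespace.
    def strip_text(t: str) -> str:
        for pattern in (LINK_RE, MENTION_RE, HASHTAG_RE):
            t = pattern.sub("", t)
        return " ".join(t.split())

    out["text_only"] = out["text"].apply(strip_text)

    # Each tweet's engagement relative to the account average (1.0 == average).
    out["prop_avg_retweets"] = out["retweet_count"] / out["retweet_count"].mean()
    out["prop_avg_favorites"] = out["favorite_count"] / out["favorite_count"].mean()
    return out
```

Dividing by the column mean means a value of 1.5 reads as "50% more than this account's average tweet", which is one plausible interpretation of "proportion of the account average".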
-
tweepyclean.tweepyclean.engagement_by_hour(tweets_df)
Creates a line chart of the average number of likes and retweets received, grouped by the hour at which each tweet was posted.
- Parameters
tweets_df (pandas.DataFrame) – A processed dataframe containing a user’s tweet history and associated information
- Returns
An Altair chart object (line chart) of the average engagement received by hour of tweet posted
Examples
>>> engagement_by_hour(tweets_df)
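The aggregation behind such a chart can be sketched with pandas: group tweets by the hour they were posted and average the like and retweet counts. This is an illustrative sketch rather than the package's code; the column names created_at, favorite_count and retweet_count and the function name hourly_engagement are assumptions.

```python
import pandas as pd


def hourly_engagement(tweets_df: pd.DataFrame) -> pd.DataFrame:
    """Average likes and retweets per posting hour, in long (tidy) form.

    Assumes columns 'created_at' (datetime-like), 'favorite_count'
    and 'retweet_count'.
    """
    hours = pd.to_datetime(tweets_df["created_at"]).dt.hour
    wide = (
        tweets_df.groupby(hours)[["favorite_count", "retweet_count"]]
        .mean()
        .rename_axis("hour")
        .reset_index()
    )
    # Melt to one row per (hour, metric) so each metric becomes its own line.
    return wide.melt(id_vars="hour", var_name="metric", value_name="average")
```

The long-form result is the shape Altair expects for a multi-series line chart, for example alt.Chart(hourly_engagement(tweets_df)).mark_line().encode(x="hour", y="average", color="metric").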
-
tweepyclean.tweepyclean.raw_df(tweepy_items)
Creates a dataframe with labeled columns from a tweepy.cursor.ItemIterator object.
- Parameters
tweepy_items (tweepy.cursor.ItemIterator) – Input iterator object generated using the tweepy package
- Returns
pandas.core.frame.DataFrame – Dataframe with up to 31 labeled columns based on the ItemIterator
Examples
>>> raw_df(tweets)
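One way to build such a dataframe is to collect the raw JSON payload of each status and let pandas flatten it. This is an illustration under assumptions, not raw_df's implementation: it relies on tweepy 3.x model objects exposing their original API payload as a _json attribute, uses SimpleNamespace stand-ins instead of real statuses so it runs without network access, and the function name statuses_to_df is hypothetical. pd.json_normalize turns nested fields such as user.screen_name into dotted column names.

```python
from types import SimpleNamespace

import pandas as pd


def statuses_to_df(statuses) -> pd.DataFrame:
    """Flatten an iterable of tweepy-style status objects into a DataFrame.

    Assumes each object carries its raw API payload in a `_json` dict,
    as tweepy 3.x Status models do.
    """
    records = [status._json for status in statuses]
    # Nested dicts become dotted column names, e.g. 'user.screen_name'.
    return pd.json_normalize(records)


# Stand-ins for real statuses (e.g. items from tweepy.Cursor(...).items()).
fake_statuses = [
    SimpleNamespace(_json={"id": 1, "text": "hello", "user": {"screen_name": "a"}}),
    SimpleNamespace(_json={"id": 2, "text": "world", "user": {"screen_name": "a"}}),
]
```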
-
tweepyclean.tweepyclean.sentiment_total(tweets, drop_sentiment=False)
Takes an input of single English words and outputs the number of words associated with each of eight emotions and with positive/negative sentiment. This is based on the crowd-sourced NRC Emotion Lexicon, which associates words with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). For more information on NRC: http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
Note that a word can be associated with zero, one, or many emotions.
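- Parameters
tweets (pandas.DataFrame or np.array) – A list or single-column dataframe of single words
drop_sentiment (bool, optional) – Whether to drop emotion/sentiment rows that have no associated words (default is False)
- Returns
pandas.DataFrame – A dataframe with one row per emotion/sentiment, giving the number of associated words (word_count) and the total number of input words (total_words)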
Examples
>>> sentiment_total(df, drop_sentiment=True)
sentiment  word_count  total_words
anger      1           4
disgust    2           4
fear       1           4
negative   2           4
sadness    1           4
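The counting that sentiment_total describes can be sketched as follows: for each emotion or sentiment, count how many input words the lexicon associates with it, and report that alongside the total number of input words, mirroring the word_count and total_words columns in the example output. The three-entry lexicon is a made-up placeholder, not NRC data (a real run would load the NRC Emotion Lexicon file), and the function name count_emotions is hypothetical.

```python
import pandas as pd

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy",
            "negative", "positive", "sadness", "surprise", "trust"]

# Placeholder lexicon for illustration only (NOT taken from NRC).
TOY_LEXICON = {
    "word_a": {"anger", "negative"},
    "word_b": {"joy", "positive"},
    "word_c": set(),  # a word may map to no emotions at all
}


def count_emotions(words, lexicon, drop_sentiment=False) -> pd.DataFrame:
    """Count words associated with each emotion/sentiment in `lexicon`."""
    words = list(words)
    counts = {emotion: 0 for emotion in EMOTIONS}
    for word in words:
        # Unknown words and words with no associations contribute nothing;
        # a word linked to several emotions is counted once for each.
        for emotion in lexicon.get(word.lower(), set()):
            counts[emotion] += 1
    result = pd.DataFrame({
        "sentiment": list(counts),
        "word_count": list(counts.values()),
        "total_words": len(words),
    })
    if drop_sentiment:
        result = result[result["word_count"] > 0].reset_index(drop=True)
    return result
```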
-
tweepyclean.tweepyclean.tweet_words(clean_dataframe, top_n=1)
Returns the most common words and their counts from a dataframe of tweets.
The output is sorted descending by the count of words and in reverse alphabetical order for any word ties.
- Parameters
clean_dataframe (pandas.DataFrame) – A processed dataframe containing a user’s tweet history and associated information
top_n (int) – An integer representing the number of most common words to display
- Returns
pandas.DataFrame – A dataframe with one column containing individual words and a second column with the count of each word
Examples
>>> tweet_words(dataframe, 3)
pd.DataFrame(data={'words': ['best', 'apple', 'news'], 'count': [102, 52, 24]})
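A minimal sketch of the counting and ordering rules described above, using collections.Counter. It assumes the cleaned tweet text lives in a column named text_only (an assumption, matching the column clean_tweets can add) and simply lowercases and splits on whitespace; the package may tokenize differently or remove stop words. The function name top_words is hypothetical.

```python
from collections import Counter

import pandas as pd


def top_words(clean_dataframe: pd.DataFrame, top_n: int = 1,
              column: str = "text_only") -> pd.DataFrame:
    """Return the top_n most common words with their counts."""
    counter = Counter(
        word
        for text in clean_dataframe[column].dropna()
        for word in str(text).lower().split()
    )
    words = pd.DataFrame(list(counter.items()), columns=["words", "count"])
    # Highest count first; ties broken in reverse alphabetical order.
    words = words.sort_values(["count", "words"], ascending=[False, False])
    return words.head(top_n).reset_index(drop=True)
```

For tweets "apple news best", "best apple" and "best zebra", top_words(df, 3) would rank best (3) and apple (2) first, then zebra ahead of news because the two tie at 1 and ties are ordered reverse alphabetically.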