{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Trump's tone to Congress\n",
"\n",
"We're going to reproduce [Trump Sounds a Different Tone in First Address to Congress](https://www.nytimes.com/interactive/2017/02/28/upshot/trump-sounds-different-tone-in-first-address-to-congress.html) from the Upshot.\n",
"\n",
"**Datasource 1:** The [NRC Emotional Lexicon](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm), a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing. \n",
"\n",
"**Datasource 2:** A database of [Trump speeches](https://github.com/PedramNavid/trump_speeches), one speech per file. There are a lot of GitHub repositories of Trump speeches, but this one is better than the vast majority.\n",
"\n",
"**Datasource 3:** State of the Union addresses taken from [this repo's data directory](https://github.com/m-aleem/SOTU-Analyzer). I also cheated and pasted Trump's SOTU-y address in."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading in the EmoLex\n",
"\n",
"I'm just copying this from the other notebook! It's the one at the very bottom that does a lot of reshaping. I think it's the easiest to work with."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" emotion | \n",
" word | \n",
" anger | \n",
" anticipation | \n",
" disgust | \n",
" fear | \n",
" joy | \n",
" negative | \n",
" positive | \n",
" sadness | \n",
" surprise | \n",
" trust | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" aback | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" abacus | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" abandon | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" abandoned | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" abandonment | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"emotion word anger anticipation disgust fear joy negative \\\n",
"0 aback 0 0 0 0 0 0 \n",
"1 abacus 0 0 0 0 0 0 \n",
"2 abandon 0 0 0 1 0 1 \n",
"3 abandoned 1 0 0 1 0 1 \n",
"4 abandonment 1 0 0 1 0 1 \n",
"\n",
"emotion positive sadness surprise trust \n",
"0 0 0 0 0 \n",
"1 0 0 0 1 \n",
"2 0 1 0 0 \n",
"3 0 1 0 0 \n",
"4 0 1 1 0 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filepath = \"NRC-Emotion-Lexicon-v0.92/NRC-emotion-lexicon-wordlevel-alphabetized-v0.92.txt\"\n",
"emolex_df = pd.read_csv(filepath, names=[\"word\", \"emotion\", \"association\"], skiprows=45, sep='\\t')\n",
"emolex_df = emolex_df.pivot(index='word', columns='emotion', values='association').reset_index()\n",
"emolex_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading in Trump's speeches\n",
"\n",
"### Get a list of all of the files"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['trump_speeches-master/data/speech_0.txt',\n",
" 'trump_speeches-master/data/speech_1.txt',\n",
" 'trump_speeches-master/data/speech_10.txt',\n",
" 'trump_speeches-master/data/speech_11.txt',\n",
" 'trump_speeches-master/data/speech_12.txt']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import glob\n",
"\n",
"filenames = glob.glob(\"trump_speeches-master/data/speech*\")\n",
"filenames[:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read them all in individually"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"56"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"speeches = [open(filename).read() for filename in filenames]\n",
"len(speeches)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a dataframe out of the results\n",
"\n",
"Instead of passing a list of dictionaries to `pd.DataFrame`, we pass a dictionary that says \"here are all of the filenames\" and \"here are all of the texts\" and it puts each list into a column."
]
},
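{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see that both ways of building a dataframe land in the same place, here's a small sketch. The `names` and `texts` lists are made-up stand-ins for the real filenames and speeches, not data from the repo.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-ins for the real filenames and speech texts\n",
"names = ['speech_0.txt', 'speech_1.txt']\n",
"texts = ['First speech', 'Second speech']\n",
"\n",
"# Option 1: one dictionary per row\n",
"rows = [{'filename': n, 'text': t} for n, t in zip(names, texts)]\n",
"from_rows = pd.DataFrame(rows)\n",
"\n",
"# Option 2: one list per column\n",
"from_columns = pd.DataFrame({'filename': names, 'text': texts})\n",
"\n",
"# Both approaches should produce the same table\n",
"print(from_rows.equals(from_columns))\n",
"```\n",
"\n",
"The dictionary-of-lists version is handy here because we already have one list per column."
]
},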
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" filename | \n",
" text | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" trump_speeches-master/data/speech_0.txt | \n",
" Remarks Announcing Candidacy for President in ... | \n",
"
\n",
" \n",
" 1 | \n",
" trump_speeches-master/data/speech_1.txt | \n",
" Remarks at the AIPAC Policy Conference in Wash... | \n",
"
\n",
" \n",
" 2 | \n",
" trump_speeches-master/data/speech_10.txt | \n",
" Remarks at the Washington County Fair Park in ... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" filename \\\n",
"0 trump_speeches-master/data/speech_0.txt \n",
"1 trump_speeches-master/data/speech_1.txt \n",
"2 trump_speeches-master/data/speech_10.txt \n",
"\n",
" text \n",
"0 Remarks Announcing Candidacy for President in ... \n",
"1 Remarks at the AIPAC Policy Conference in Wash... \n",
"2 Remarks at the Washington County Fair Park in ... "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"speeches_df = pd.DataFrame({\n",
" 'text': speeches,\n",
" 'filename': filenames\n",
"})\n",
"speeches_df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Splitting out the title and content of the speech\n",
"\n",
"The \"text\" column is formatted with first the title of the speech, then the text. Like this:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Remarks Announcing Candidacy for President in New York City\\nTrump: Wow. Whoa. That is some group of people. Thousands.So nice, thank you very much. That's really nice. Thank you. It's great to be at T\""
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"speeches_df.loc[0]['text'][:200]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're going to split those out into multiple columns, then delete the original column so we don't get mixed up later."
]
},
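{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below takes the first and second lines of each file. If a speech ever ran over more than two lines (an assumption, I haven't checked every file), anything after the second line would be dropped. Here's a hedged alternative using pandas' `.str.split` with `n=1`, shown on a made-up `demo_df` and not on the real speeches:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical two-row stand-in for speeches_df\n",
"demo_df = pd.DataFrame({\n",
"    'text': ['Title one\\nFirst line.\\nSecond line.', 'Title two\\nOnly line.']\n",
"})\n",
"\n",
"# n=1 splits on the first newline only, so the body keeps any later newlines\n",
"parts = demo_df['text'].str.split('\\n', n=1, expand=True)\n",
"demo_df['name'] = parts[0]\n",
"demo_df['content'] = parts[1]\n",
"print(demo_df[['name', 'content']])\n",
"```\n",
"\n",
"With `expand=True` the two pieces come back as separate columns, so there's no need to split each row twice."
]
},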
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" filename | \n",
" name | \n",
" content | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" trump_speeches-master/data/speech_0.txt | \n",
" Remarks Announcing Candidacy for President in ... | \n",
" Trump: Wow. Whoa. That is some group of people... | \n",
"
\n",
" \n",
" 1 | \n",
" trump_speeches-master/data/speech_1.txt | \n",
" Remarks at the AIPAC Policy Conference in Wash... | \n",
" Good evening. Thank you very much. I speak to... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" filename \\\n",
"0 trump_speeches-master/data/speech_0.txt \n",
"1 trump_speeches-master/data/speech_1.txt \n",
"\n",
" name \\\n",
"0 Remarks Announcing Candidacy for President in ... \n",
"1 Remarks at the AIPAC Policy Conference in Wash... \n",
"\n",
" content \n",
"0 Trump: Wow. Whoa. That is some group of people... \n",
"1 Good evening. Thank you very much. I speak to... "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"speeches_df['name'] = speeches_df['text'].apply(lambda value: value.split(\"\\n\")[0])\n",
"speeches_df['content'] = speeches_df['text'].apply(lambda value: value.split(\"\\n\")[1])\n",
"del speeches_df['text']\n",
"speeches_df.head(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How does Trump sound?\n",
"\n",
"Let's analyze by counting words.\n",
"\n",
"We would use the code below to count all of his words. **Do we really want all of them?**\n",
"\n",
"```python\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"vec = CountVectorizer()\n",
"matrix = vec.fit_transform(speeches_df['content'])\n",
"vocab = vec.get_feature_names()\n",
"wordcount_df = pd.DataFrame(matrix.toarray(), columns=vocab)\n",
"wordcount_df.head()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 aback\n",
"1 abacus\n",
"2 abandon\n",
"Name: word, dtype: object"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_df['word'].head(3)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" aback | \n",
" abacus | \n",
" abandon | \n",
" abandoned | \n",
" abandonment | \n",
" abate | \n",
" abatement | \n",
" abba | \n",
" abbot | \n",
" abbreviate | \n",
" ... | \n",
" zephyr | \n",
" zeppelin | \n",
" zest | \n",
" zip | \n",
" zodiac | \n",
" zone | \n",
" zoo | \n",
" zoological | \n",
" zoology | \n",
" zoom | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000677 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.001321 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 14182 columns
\n",
"
"
],
"text/plain": [
" aback abacus abandon abandoned abandonment abate abatement abba \\\n",
"0 0.0 0.0 0.0 0.000677 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.001321 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 \n",
"\n",
" abbot abbreviate ... zephyr zeppelin zest zip zodiac zone zoo \\\n",
"0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"\n",
" zoological zoology zoom \n",
"0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 \n",
"\n",
"[5 rows x 14182 columns]"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"# I only want you to look for words in the emotional lexicon\n",
"# because we don't know what's up with the other words\n",
"vec = TfidfVectorizer(vocabulary=emolex_df.word,\n",
" use_idf=False, \n",
" norm='l1') # ELL - ONE\n",
"matrix = vec.fit_transform(speeches_df['content'])\n",
"vocab = vec.get_feature_names()\n",
"wordcount_df = pd.DataFrame(matrix.toarray(), columns=vocab)\n",
"wordcount_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"# wordcount_df.sort_values(by='america', ascending=False).head(5)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"# wordcount_df[['murder', 'america', 'great', 'prison', 'immigrant']].head(2)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 0.007442\n",
"1 0.010121\n",
"2 0.007926\n",
"3 0.003871\n",
"4 0.001709\n",
"5 0.000000\n",
"6 0.005051\n",
"7 0.000000\n",
"8 0.000000\n",
"9 0.001506\n",
"10 0.001391\n",
"11 0.013943\n",
"12 0.002304\n",
"13 0.000000\n",
"14 0.004525\n",
"15 0.000000\n",
"16 0.000000\n",
"17 0.002066\n",
"18 0.004785\n",
"19 0.001166\n",
"20 0.000000\n",
"21 0.000000\n",
"22 0.004367\n",
"23 0.007979\n",
"24 0.000000\n",
"25 0.001626\n",
"26 0.001873\n",
"27 0.000000\n",
"28 0.002008\n",
"29 0.000000\n",
"30 0.000000\n",
"31 0.006631\n",
"32 0.000000\n",
"33 0.002976\n",
"34 0.002535\n",
"35 0.001511\n",
"36 0.003810\n",
"37 0.000000\n",
"38 0.000000\n",
"39 0.001658\n",
"40 0.001802\n",
"41 0.000000\n",
"42 0.003704\n",
"43 0.001812\n",
"44 0.002165\n",
"45 0.002976\n",
"46 0.002247\n",
"47 0.003937\n",
"48 0.001592\n",
"49 0.000000\n",
"50 0.003401\n",
"51 0.002950\n",
"52 0.002430\n",
"53 0.005650\n",
"54 0.014153\n",
"55 0.009298\n",
"dtype: float64"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# bad bad bad = 100% negative\n",
"# bad bad evil evil = 50% bad + 50% evil = 100% negative\n",
"# bad fish evil fish = 25% bad + 25% evil = 50% negative \n",
"# awful % + hate % + bad % + worse % + evil % = negative %\n",
"\n",
"wordcount_df[['awful', 'hate', 'bad', 'worse', 'evil']].sum(axis=1)"
]
},
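{
"cell_type": "markdown",
"metadata": {},
"source": [
"One thing to keep in mind: because we pass `vocabulary=`, the vectorizer only counts words on that list, so the `l1` percentages are shares of the *lexicon words* in a speech, not of every word. A toy sketch (the two- and three-word vocabularies are invented for illustration):\n",
"\n",
"```python\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"docs = ['bad fish evil fish']\n",
"\n",
"# 'fish' is in the vocabulary, so it counts toward the total\n",
"with_fish = TfidfVectorizer(vocabulary=['bad', 'evil', 'fish'],\n",
"                            use_idf=False, norm='l1')\n",
"shares_with_fish = with_fish.fit_transform(docs).toarray()\n",
"print(shares_with_fish)\n",
"\n",
"# 'fish' is NOT in the vocabulary, so 'bad' and 'evil' split the whole row\n",
"without_fish = TfidfVectorizer(vocabulary=['bad', 'evil'],\n",
"                               use_idf=False, norm='l1')\n",
"shares_without_fish = without_fish.fit_transform(docs).toarray()\n",
"print(shares_without_fish)\n",
"```\n",
"\n",
"So the `bad fish evil fish` arithmetic above works out to 25% + 25% only if `fish` is also in the vocabulary."
]
},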
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" filename | \n",
" name | \n",
" content | \n",
" negative | \n",
" policy | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" trump_speeches-master/data/speech_0.txt | \n",
" Remarks Announcing Candidacy for President in ... | \n",
" Trump: Wow. Whoa. That is some group of people... | \n",
" 0.007442 | \n",
" 0.00128 | \n",
"
\n",
" \n",
" 1 | \n",
" trump_speeches-master/data/speech_1.txt | \n",
" Remarks at the AIPAC Policy Conference in Wash... | \n",
" Good evening. Thank you very much. I speak to... | \n",
" 0.010121 | \n",
" 0.00000 | \n",
"
\n",
" \n",
" 2 | \n",
" trump_speeches-master/data/speech_10.txt | \n",
" Remarks at the Washington County Fair Park in ... | \n",
" It's so great to be here tonight. I am honored... | \n",
" 0.007926 | \n",
" 0.00709 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" filename \\\n",
"0 trump_speeches-master/data/speech_0.txt \n",
"1 trump_speeches-master/data/speech_1.txt \n",
"2 trump_speeches-master/data/speech_10.txt \n",
"\n",
" name \\\n",
"0 Remarks Announcing Candidacy for President in ... \n",
"1 Remarks at the AIPAC Policy Conference in Wash... \n",
"2 Remarks at the Washington County Fair Park in ... \n",
"\n",
" content negative policy \n",
"0 Trump: Wow. Whoa. That is some group of people... 0.007442 0.00128 \n",
"1 Good evening. Thank you very much. I speak to... 0.010121 0.00000 \n",
"2 It's so great to be here tonight. I am honored... 0.007926 0.00709 "
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# speeches_df['negative'] = wordcount_df[['awful', 'hate', 'bad', 'worse', 'evil']].sum(axis=1)\n",
"# speeches_df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"# speeches_df['policy'] = wordcount_df[['crime', 'discrimination', 'poverty', 'border']].sum(axis=1)\n",
"# speeches_df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"# speeches_df.plot(x='negative', \n",
"# y='policy', \n",
"# kind='scatter',\n",
"# ylim=(0,0.01),\n",
"# xlim=(0,0.005))"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" emotion | \n",
" word | \n",
" anger | \n",
" anticipation | \n",
" disgust | \n",
" fear | \n",
" joy | \n",
" negative | \n",
" positive | \n",
" sadness | \n",
" surprise | \n",
" trust | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" aback | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" abacus | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" abandon | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" abandoned | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" abandonment | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"emotion word anger anticipation disgust fear joy negative \\\n",
"0 aback 0 0 0 0 0 0 \n",
"1 abacus 0 0 0 0 0 0 \n",
"2 abandon 0 0 0 1 0 1 \n",
"3 abandoned 1 0 0 1 0 1 \n",
"4 abandonment 1 0 0 1 0 1 \n",
"\n",
"emotion positive sadness surprise trust \n",
"0 0 0 0 0 \n",
"1 0 0 0 1 \n",
"2 0 1 0 0 \n",
"3 0 1 0 0 \n",
"4 0 1 1 0 "
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" emotion | \n",
" word | \n",
" anger | \n",
" anticipation | \n",
" disgust | \n",
" fear | \n",
" joy | \n",
" negative | \n",
" positive | \n",
" sadness | \n",
" surprise | \n",
" trust | \n",
"
\n",
" \n",
" \n",
" \n",
" 3 | \n",
" abandoned | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" abandonment | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 17 | \n",
" abhor | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 18 | \n",
" abhorrent | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 27 | \n",
" abolish | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"emotion word anger anticipation disgust fear joy negative \\\n",
"3 abandoned 1 0 0 1 0 1 \n",
"4 abandonment 1 0 0 1 0 1 \n",
"17 abhor 1 0 1 1 0 1 \n",
"18 abhorrent 1 0 1 1 0 1 \n",
"27 abolish 1 0 0 0 0 1 \n",
"\n",
"emotion positive sadness surprise trust \n",
"3 0 1 0 0 \n",
"4 0 1 1 0 \n",
"17 0 0 0 0 \n",
"18 0 0 0 0 \n",
"27 0 0 0 0 "
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_df[emolex_df.anger == 1].head()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3 abandoned\n",
"4 abandonment\n",
"17 abhor\n",
"18 abhorrent\n",
"27 abolish\n",
"Name: word, dtype: object"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get your list of angry words\n",
"angry_words = emolex_df[emolex_df.anger == 1]['word']\n",
"angry_words.head()"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" abandoned | \n",
" abandonment | \n",
" abhor | \n",
" abhorrent | \n",
" abolish | \n",
" abomination | \n",
" abuse | \n",
" accursed | \n",
" accusation | \n",
" accused | \n",
" ... | \n",
" wreck | \n",
" wrecked | \n",
" wretch | \n",
" wring | \n",
" wrongdoing | \n",
" wrongful | \n",
" wrongly | \n",
" yell | \n",
" yelp | \n",
" youth | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.000677 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000677 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 0.001321 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 1247 columns
\n",
"
"
],
"text/plain": [
" abandoned abandonment abhor abhorrent abolish abomination abuse \\\n",
"0 0.000677 0.0 0.0 0.0 0.0 0.0 0.000677 \n",
"1 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 \n",
"2 0.001321 0.0 0.0 0.0 0.0 0.0 0.000000 \n",
"3 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 \n",
"4 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 \n",
"\n",
" accursed accusation accused ... wreck wrecked wretch wring \\\n",
"0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 \n",
"\n",
" wrongdoing wrongful wrongly yell yelp youth \n",
"0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"\n",
"[5 rows x 1247 columns]"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wordcount_df[angry_words].head()"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" filename | \n",
" name | \n",
" content | \n",
" negative | \n",
" policy | \n",
" anger | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" trump_speeches-master/data/speech_0.txt | \n",
" Remarks Announcing Candidacy for President in ... | \n",
" Trump: Wow. Whoa. That is some group of people... | \n",
" 0.007442 | \n",
" 0.00128 | \n",
" 0.055480 | \n",
"
\n",
" \n",
" 1 | \n",
" trump_speeches-master/data/speech_1.txt | \n",
" Remarks at the AIPAC Policy Conference in Wash... | \n",
" Good evening. Thank you very much. I speak to... | \n",
" 0.010121 | \n",
" 0.00000 | \n",
" 0.099190 | \n",
"
\n",
" \n",
" 2 | \n",
" trump_speeches-master/data/speech_10.txt | \n",
" Remarks at the Washington County Fair Park in ... | \n",
" It's so great to be here tonight. I am honored... | \n",
" 0.007926 | \n",
" 0.00709 | \n",
" 0.133421 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" filename \\\n",
"0 trump_speeches-master/data/speech_0.txt \n",
"1 trump_speeches-master/data/speech_1.txt \n",
"2 trump_speeches-master/data/speech_10.txt \n",
"\n",
" name \\\n",
"0 Remarks Announcing Candidacy for President in ... \n",
"1 Remarks at the AIPAC Policy Conference in Wash... \n",
"2 Remarks at the Washington County Fair Park in ... \n",
"\n",
" content negative policy \\\n",
"0 Trump: Wow. Whoa. That is some group of people... 0.007442 0.00128 \n",
"1 Good evening. Thank you very much. I speak to... 0.010121 0.00000 \n",
"2 It's so great to be here tonight. I am honored... 0.007926 0.00709 \n",
"\n",
" anger \n",
"0 0.055480 \n",
"1 0.099190 \n",
"2 0.133421 "
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Only give me the columns of angry words\n",
"speeches_df['anger'] = wordcount_df[angry_words].sum(axis=1)\n",
"speeches_df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" filename | \n",
" name | \n",
" content | \n",
" negative | \n",
" policy | \n",
" anger | \n",
" positivity | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" trump_speeches-master/data/speech_0.txt | \n",
" Remarks Announcing Candidacy for President in ... | \n",
" Trump: Wow. Whoa. That is some group of people... | \n",
" 0.007442 | \n",
" 0.00128 | \n",
" 0.055480 | \n",
" 0.261164 | \n",
"
\n",
" \n",
" 1 | \n",
" trump_speeches-master/data/speech_1.txt | \n",
" Remarks at the AIPAC Policy Conference in Wash... | \n",
" Good evening. Thank you very much. I speak to... | \n",
" 0.010121 | \n",
" 0.00000 | \n",
" 0.099190 | \n",
" 0.315789 | \n",
"
\n",
" \n",
" 2 | \n",
" trump_speeches-master/data/speech_10.txt | \n",
" Remarks at the Washington County Fair Park in ... | \n",
" It's so great to be here tonight. I am honored... | \n",
" 0.007926 | \n",
" 0.00709 | \n",
" 0.133421 | \n",
" 0.214003 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" filename \\\n",
"0 trump_speeches-master/data/speech_0.txt \n",
"1 trump_speeches-master/data/speech_1.txt \n",
"2 trump_speeches-master/data/speech_10.txt \n",
"\n",
" name \\\n",
"0 Remarks Announcing Candidacy for President in ... \n",
"1 Remarks at the AIPAC Policy Conference in Wash... \n",
"2 Remarks at the Washington County Fair Park in ... \n",
"\n",
" content negative policy \\\n",
"0 Trump: Wow. Whoa. That is some group of people... 0.007442 0.00128 \n",
"1 Good evening. Thank you very much. I speak to... 0.010121 0.00000 \n",
"2 It's so great to be here tonight. I am honored... 0.007926 0.00709 \n",
"\n",
" anger positivity \n",
"0 0.055480 0.261164 \n",
"1 0.099190 0.315789 \n",
"2 0.133421 0.214003 "
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get your list of positive words\n",
"positive_words = emolex_df[emolex_df.positive == 1]['word']\n",
"\n",
"# Only give me the columns of angry words\n",
"speeches_df['positivity'] = wordcount_df[positive_words].sum(axis=1)\n",
"speeches_df.head(3)"
]
},
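{
"cell_type": "markdown",
"metadata": {},
"source": [
"The anger and positivity cells follow the same pattern, so a loop could cover every EmoLex column. A minimal sketch with made-up miniature tables (`toy_emolex` and `toy_counts` are hypothetical, and the word labels in them are invented, not taken from the lexicon):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical miniature versions of emolex_df and wordcount_df\n",
"toy_emolex = pd.DataFrame({\n",
"    'word': ['abandon', 'cheer', 'fight'],\n",
"    'anger': [0, 0, 1],\n",
"    'positive': [0, 1, 0],\n",
"})\n",
"toy_counts = pd.DataFrame({\n",
"    'abandon': [0.5, 0.0],\n",
"    'cheer': [0.25, 1.0],\n",
"    'fight': [0.25, 0.0],\n",
"})\n",
"\n",
"scores = pd.DataFrame(index=toy_counts.index)\n",
"for emotion in ['anger', 'positive']:\n",
"    # Words flagged with this emotion in the lexicon\n",
"    words = toy_emolex.loc[toy_emolex[emotion] == 1, 'word']\n",
"    # Add up the shares of those words for each document\n",
"    scores[emotion] = toy_counts[words].sum(axis=1)\n",
"print(scores)\n",
"```\n",
"\n",
"With the real data, the same loop would run over `emolex_df` and `wordcount_df` and could list all ten emotion and sentiment columns."
]
},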
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZkAAAEPCAYAAACQmrmQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHupJREFUeJzt3X+QHGd95/H3R7ZWXv9YySpv7gob7drYRuaHQQIZ5y7A\nCizQ3QH2mYClFMGcdTl0jgvwQcrm7nKWD1L8SOUCV5wiwymWL4l+EOCM4wrYFtY6ocDsIskWIMmS\nDbu2sMlOiC2cyypSxPf+mF5ptJrdnZ3pnu6Z+byqptTT0z39PDOr/szTT/fTigjMzMyyMCfvApiZ\nWftyyJiZWWYcMmZmlhmHjJmZZcYhY2ZmmXHImJlZZnIPGUkrJe2XdEDSbVVev1XSjyQ9JukhSS+t\neO3GZL0nJL2/uSU3M7OZKM/rZCTNAQ4AbwWeBYaBVRGxv2KZNwPfi4gjktYCAxGxStL5wPeBpYCA\nncDSiDjc7HqYmVl1ebdkrgIORsRoRBwDtgLXVi4QEY9ExJHk6aPAhcn024EHI+JwRLwAPAisbFK5\nzcysBnmHzIXAMxXPD3EyRKpZA3xjinV/OsO6ZmbWZGfmvH1VmVf1+J2k9wGvA94823XNzCwfeYfM\nIWBRxfOLKPfNnELSNcDHgTclh9Um1h2YtO6OahuR5PAxM6tDRFT7QV+zvA+XDQOXSuqT1AWsAu6r\nXEDSEmAD8K6I+HnFSw8AKyTNT04CWJHMqyoi2vZxxx135F4G1831c/3a75GGXFsyEXFc0i2UO+3n\nABsjYp+kO4HhiLgf+CxwDvDnkgSMRsR1EfG8pE9QPsMsgDujfAKAmZkVRN6Hy4iIbwIvnzTvjorp\nFdOsuwnYlFXZzMysMXkfLrMUDAwM5F2EzLRz3cD1a3XtXr805HoxZrNIik6op5lZmiQRLd7xb2Zm\nbcwhY2ZmmXHImJlZZhwyZmaWGYeMmZllxiFjZmaZcciYmVlmHDJmZpYZh4yZmWXGIWNmZplxyJiZ\nWWYcMmZmlhmHjJmZZcYh0yFKpRLDw8OUSqW8i2JmHcQh0wG2bNlGX99iVqxYS1/fYrZs2ZZ3kcys\nQ/h+Mm2uVCrR17eY8fEdwJXAHrq7lzM6up/e3t68i2dmBeb7ydiMRkZG6OrqpxwwAFcyd24fIyMj\n+RXKzDqGQ6bN9ff3c/ToCLAnmbOHY8dG6e/vz69QZtYxHDJtrre3l40b19PdvZyenqV0dy9n48b1\nPlRmZk3hPpkOUSqVGBkZob+/3wFjZjVJo0/GIWNmZlW549/MzArNIWNmZplxyJiZWWYcMmZmlhmH\njJmZZcYhY2ZmmXHImJlZZhwyZmaWGYeMmZllxiFjZmaZcciYmVlmHDJmZpYZh4yZmWUm95CRtFLS\nfkkHJN1W5fU3Stop6Zik6ye9dlzSLkm7Jd3bvFKbmVktzsxz45LmAF8A3go8CwxL+npE7K9YbBS4\nEfhYlbf4fxGxNPuSmplZPXINGeAq4GBEjAJI2gpcC5wImYh4Onmt2g1hGrrPgZmZZSvvw2UXAs9U\nPD+UzKvVPElDkr4j6dp0i2ZmZo3KuyVTrSUym1tYLoqIn0m6GHhY0p6I+Em1BdetW3diemBggIGB\ngdmU08ys7Q0ODjI4OJjqe+Z6+2VJVwPrImJl8vx2ICLiM1WWvRv4i4j42hTvNeXrvv2ymdnstcPt\nl4eBSyX1SeoCVgH3TbP8icpKWpCsg6QLgH8B7M2ysGZmNju5hkxEHAduAR4EfgRsjYh9ku6U9A4A\nSa+X9Azw68AGST9IVr8C+L6k3cC3gE9NOivNWkipVGJ4eJhSqZR3UcwsRbkeLmsWHy4rti1btrFm\nzc10dfVz9OgIGzeuZ/XqG/IullnHS+NwmUPGclUqlejrW8z4+A7gSmAP3d3LGR3dT29vb97FM+to\n7dAnYx1uZGSErq5+ygEDcCVz5/YxMjKSX6HM
LDUOGctVf3/5EBnsSebs4dixUfr7+/MrlJmlxiHT\nIYrasd7b28vGjevp7l5OT89SuruXs3Hjeh8qM2sT7pPpAK3QsV4qlRgZGaG/v98BY1YQ7vivUSeH\njDvWzaxe7vi3Gblj3czy5JBpc9N1rBe1n8bM2odDpkny2qFP1bG+ffvD9PUtZsWKtfT1LWbLlm1N\nLVezOEjN8uU+mSYoQsd7Zcc60BH9NEX43M1amTv+a5RnyBSx4314eJgVK9Zy+PDOE/N6epayfftd\nLFu2LJcypa2In7tZq3HHfwsoYsd7J1wAWcTP3awTOWQyVsQdeidcAFnEz92sE/lwWRNM9A3MndvH\nsWOjhekbaPcLIIv6uZu1CvfJ1CjvkIH236EXlT93s/o5ZGpUhJAxM2s17vg3M7NCc8iYmVlmHDJt\nwFe1m1lROWRa3JYt21IZHsZBZWZZcMd/C0vrqnYPv2Jm1bjjv8OlcVV7qVRizZqbGR/fweHDOxkf\n38GaNTc3vUXjlpRZe3LItLA0rmovwvAraR3yM7Picci0sDSGh8l7+JWitKTMLBtn5l0Aa8zq1Tdw\nzTVvqfuq9omgWrNm+SnDrzTr6viJltT4+OktKV+hb9b63PFvQO3Dr6Q9TEsRh+T3UDRmZe74t4ZU\ndrb39vaybNmyaXeqWZwuXbQRod0/ZJayiGj7R7maVmnz5q3R3b0w5s9fGt3dC2Pz5q3TLj82Nhbd\n3QsDHg+IgMeju3thjI2NpbLdsbGxGBoamvX7pSmtOpq1i2Tf2dD+1y2ZDlRPZ3vWp0vX0pLKWhHO\ntDNrNw6ZDlTPzrRdTpeeTt5n2pm1I4dMB6pnZzpT30ktF1P29/czPv7UKds9cuTHhdmJ59E/5ItQ\nre01erytFR4UtE8mz36Iib6Rnp4lNfXJTKhW5lr7d8bGxmLu3HMDzg9YEnB+zJ17buH6PJr1vcy2\nX8ys2UihTyb3AGjGo4ghU4QdTBo709l0lg8NDcX8+UsDxgKGAsaip2dJDA0NNVKNQpw0MFs+ycBa\nQRoh48NlOSjKVe5pdLbPpp/l5GG654BlwHMN93m06inHRe+fMkuLQyYH7bSDmU3/Ttp9HkUJ63r4\nJAPrFLmHjKSVkvZLOiDptiqvv1HSTknHJF0/6bUbk/WekPT+5pW6Ma26g6nWST3b4Fi9+gZGR/ez\nfftdjI7ub+iWAq0c1kW7CNUsM40eb2vkQTnkngT6gLnAY8DiScssAl4FbAKur5h/PvAUMB9YMDE9\nxXbSOkSZmno73ptlcj/HTH1IefSLtEO/Riv2J1nnoNU7/oGrgW9UPL8duG2KZe+eFDKrgD+qeP5H\nwA1TrNv4p52BtHYwae+oJgfKhg1fLOzOvOhhbdbK0giZvEdhvhB4puL5IeCqOtf9aTKvZfT29jZ8\neCTtu1pW9nOUR0bew4c//Ea6ul5GtcNSeR/eaXQU6nbgAT2tyPLuk6k2umetwyU3sm5byKLju3o/\nxyKOHv0J1fqQinAxYRGGpMlLq55dZ50j75bMIcp9LhMuAp6dxboDk9bdMdXC69atOzE9MDDAwMDA\nVIu2jCzuxXLqSQnllszx48/y+c9/lltvPfWeM9u3P5xqK2oy/0KfWqlUYvfu3dx001qOHHnkRKtz\nzZrlXHPNW/x5WV0GBwcZHBxM900bPd7WyAM4g5Md/12UO/6vmGLZu4F3Vzyv7PifmF4wxbrpHKAs\nmKw6vqfq56js+8m6070IF6sW1cRnc845rwk4O2Br8h1EKhe3mk2g1Tv+y3VgJfAEcBC4PZl3J/CO\nZPr1lPteXgRKwA8q1v1Ast4B4P3TbCOlj7wxaV1hX+2sr2od33v37o1NmzbF3r1769rOAw88EA88\n8MAMV+9H6ju4WgKsU8/KqvbZlIfpGSvUCRnWHtoiZJrxKELIpPHLfPJ7fOITv3eiVTF5h3vLLR8O\n6A64PKA7
brnlQ6mWt96WTC3hMFOAdXIrp9pnAy+Lc865vOM+C8ueQ6ZFQiaNQ0t79+6NefMWTPoF\ne3acddaC03Yse/fuTQKmctnumls0tZZ3tqcPz2Ygzam23w7XxjRiqvpP1eI0a4RDpkVCptFDS5s3\nb41583qSVklUPJYE/NlpO9lNmzZVWfay2LRp04zbGhsbi02bNsV55726pvLWethqtuEwVYBleZiu\nVfjaIGsWh0yLhEwjv75PrrsjYPKx+IVRbSTjqVoy27Ztm3abEzuv885bkqz/mdRaC/WEQ7UA6/SW\nzIRO7ZOy5nLItEjIRNT/6/PUnfPWJFguDZifPK++k73llg8lQXFZQHdI86Y9TFW9Q7k7zj33Van8\nWk4zHDZs+GLMm7cgzjvvtf4lb5Yhh0wLhUxEfb8+T98574gzzzwn5s3rmTGwfv/3/yC5SdjFyRlI\nU4dStZbGeee9NjZt2pT6acm1BO1Un9XJ1tarY968ntiw4YuplM3MTteUkKF8Zf1LG91Qno+ihEy9\nqu2cZwqs6i2T6ofXplo+i8NQtQTtVCcI+FCZWXM1rSUD7Gx0Q3k+Wj1kImbfCqp+qmv1EwUmpNmh\nXG+fwXRB4k5/s+ZKI2RqHbvsUUnLalzWMjDb8bmq3bMGnuCss357yvuWpHWvl0bG05ruHjGteh8e\ns45WSxIBe4HjlIdu2QP8ANjTaMI160EbtGTqMbllMnHxZpYaPaQ10/o+fdeseUihJaPy+0xPUt8U\nATWaUtZlSlLUUs921OxBJoeHh1mxYi2HD+88Ma+nZynbt9/FsmW1NYYnbl9QORhnZavKA2eaNYck\nIqLaiPe1v0etO19JvwZcFhF3S+oFzo2InzSy8Wbp5JBptlKpRF/fYsbHdzAxinN393JGR/fPKhAc\nJGb5a1rISLqD8kCVL4+IyyW9BPjziPiXjWy8WRwyzTVTS6QRDh+z5mlmyDwGLAF2RcSSZN6eiLhy\n+jWLwSHTfFmEQbW7gHb6XTHNstTMkBmKiKsk7YqIpZLOAb7rkCmudvvFXyqVWLToco4c+RRwCdBF\nV9d1zJkj5s27JJObppl1ujRCptZTmL8s6S5ggaTfArYDX2pkw5addrwl7113fYkjR8aBjwK/DVzH\n0aNnceTI/0rt1tNmlr7ZdPyvAN5GeQSAByLioSwLlqZOasmk1fFeJCdbMQIGmagXXA3sBK4Aaj+L\nrd1aeWZZaWZLhoh4KCJ+JyI+1koB02mmu5ixVe3evRvoBS6msl7wEmB38ry2CzPbsZVnVmS19sm8\nCExe8DDwfeCjEfHjDMqWmlZqyTT6K7vdWjJbtmzjppvWcuTIPwJnUdmSmTv3jZxxxhy6ui6u6Sy2\ndvtszLLWzJbM/wB+B7gQuAj4GOU+ma3AHzdSADspjV/Zvb29bNy4nu7u5fT0LKW7e/mUw8gUXalU\nYs2amzly5BHgbuAo8KvApXR1vYl77vkiTz99oOZhcNqxlWdWdLW2ZL4XEW+YNO/RiLha0uMR8ZrM\nSpiCVmjJpP0rux36HU4fPaDE2Wdfzac//RFWrVo163q5JWM2O81syfxS0nslzUke7614rdh77xaR\n9q/s2Q6oWUSnD4j5HBEv1BUw0F6tPLNWUWtL5hLg85SPVQTwKHAr8FPgdRHx7SwL2ahObMm0iyxG\nD2iHVp5ZMzR17LJW1gohA9kOx9LKHApm+WjmFf+9wG8B/cCZE/Mj4qZGNt4srRIy4B2qmRVHM0Pm\nO8BfU77y7fjE/Ij4aiMbb5ZWChmrj8PZLH1NHSAzIl7byIby5JBpb9UGzvRhRqvkHyH1aWbIfBL4\nTkT8ZSMby4tDpn35hAmbiX+E1K+ZpzB/GLhf0rikX0h6UdIvGtmwdaZSqcTw8HBqA1n6AkubzsQF\nvePjOzyQak5qCpmIOA+4ABgA3gm8I/nXrGb1jGgwUyidfi1NbWOYWWfwj5
ACiIgZH8C/B34APA/s\nAMaBb9WybhEe5WpansbGxqK7e2HA4wER8Hh0dy+MsbGxKdfZvHlrdHcvjPnzl0Z398LYvHnrtMv1\n9CyZdjnrPPX83dlJyb6zsf1vTQuVA+Ys4LHk+WJgW6Mbb9bDIZO/oaGhmD9/afIfvfzo6VkSQ0ND\nVZef7c5hbGwshoaGvPOw0/hHSP3SCJkzqc2RiDgiCUnzImK/pJen1pyytnfqYa1yB/10h7UmDnOM\nj59+mKNah35vb687+q2q1atv8G26c1RryByStAC4F3hI0vPAs9kVy9rNxLhha9YsP2VEg6n+w882\nlMym4x8h+Zn1sDKS3gzMB74ZEUczKVXKfApzcczmeoV6htnx9RBm6fHYZTVyyLSuekLJ10OYpcMh\nUyOHTPvzRZlm6WvmxZiZkbRS0n5JByTdVuX1LklbJR2U9F1Ji5L5fZL+QdKu5LG++aW3ovD1EGbF\nVGvHfyYkzQG+ALyV8okEw5K+HhH7KxZbA/xdRFwm6Qbgs8Cq5LUnI2JpUwttheQTBcyKKe+WzFXA\nwYgYjYhjwFbg2knLXAvck0x/hXIgTWioGWftw3e9NCumXFsywIXAMxXPD1EOnqrLRMRxSS9IWpi8\n1i9pJ/AL4Hej4HfotGz5egiz4sk7ZKq1RCb30E9eRskyzwGLIuJ5SUuBeyW9IiL+PoNyWovw9RBm\nxZJ3yBwCFlU8v4jTL/J8Bngp8KykM4CeiHg+ee0oQETskvQUcDmwq9qG1q1bd2J6YGCAgYGBFIpv\nZtY+BgcHGRwcTPU9cz2FOQmNJyj3szwHDAGrI2JfxTI3A6+KiJslrQKui4hVki6gfELALyVdAjwC\nvDoiXqiyHZ/CbGY2S2mcwpxrSybpY7kFeJDySQgbI2KfpDuB4Yi4H9gI/Imkg8DPOXlm2ZuA/y7p\nGOVbQn+wWsCYmVl+fDGmmZlV1RYXY5qZWftyyJiZWWYcMmZmlhmHjFlOSqUSw8PDlEqlvItilhmH\njFkOtmzZRl/fYlasWEtf32K2bNmWd5HMMuGzy8yazLclsFbhs8vMWpBvS2CdxCFj1mSn3pYAfFsC\na2cOGbMm820JrJO4T8YsJ6VSybclsEJLo0/GIWNmZlW549/MzArNIWNmZplxyJiZWWYcMmZmlhmH\njJmZZcYhY2ZmmXHImJlZZhwyZtY2fPuE4nHImFlb8O0TislX/JtZy/PtE7LhK/7NzEj/9gk+7JYe\nh4yZtbw0b5/gw27p8uEyM2sLW7ZsY82am5k7t49jx0bZuHE9q1ffMKv38GG3U6VxuOzMtApjZpan\n1atv4Jpr3tLQ7RMmDruNj59+2K0TQyYNDhkzaxu9vb0NhcGph93KLRnftbQx7pMxM0v4rqXpc5+M\nmdkkvmtpme+MWSOHjJnZ7Pk6GTMzKzSHjJmZZcYhY2ZmmXHImJlZZhwyZmaWGYeMmZllxiFjZmaZ\nyT1kJK2UtF/SAUm3VXm9S9JWSQclfVfSoorXPp7M3yfpbc0tuZmZzSTXkJE0B/gC8HbglcBqSYsn\nLbYG+LuIuAz4HPDZZN1XAO8FrgD+FbBeUkMXDZmZWbrybslcBRyMiNGIOAZsBa6dtMy1wD3J9FeA\ntyTT7wK2RsQ/RcQIcDB5PzMzK4i8Q+ZC4JmK54eSeVWXiYjjwGFJC6us+9Mq65qZWY7yHuq/2uGt\nyYOMTbVMLeuesG7duhPTAwMDDAwMzFw6M7MOMjg4yODgYKrvmesAmZKuBtZFxMrk+e1ARMRnKpb5\nRrLM9ySdATwXEb8yeVlJ3wTuiIjvVdmOB8g0M5uldhggcxi4VFKfpC5gFXDfpGX+ArgxmX4P8HAy\nfR+wKjn77GLgUmCoCWU2M7Ma5Xq4LCKOS7oFeJBy4G2MiH2S7gSGI+J+YCPwJ5IOAj+nHERExF5J\nXwb2AseAm91cMTMrFt9PxszMqmqHw2
VmZtbGHDJmZpYZh4yZmWXGIWNmZplxyJiZWWYcMmZmlhmH\njJmZZcYhY2ZmmXHImJlZZhwyZmaWGYeMmZllxiFjZmaZcciYmVlmHDJmZpYZh4yZmWXGIWNmZplx\nyJiZWWYcMmZmlhmHjJmZZcYhY2ZmmXHImJlZZhwyZmaWGYeMmZllxiFjZmaZcciYmVlmHDJmZpYZ\nh4yZmWXGIWNmZplxyJiZWWYcMmZmlhmHjJmZZcYhY2ZmmXHImJlZZhwyZmaWGYeMmZllJreQkXS+\npAclPSHpAUnzp1juRkkHkuXeXzF/h6T9knZL2iXpguaV3szMapFnS+Z2YHtEvBx4GPj45AUknQ/8\nN2AZ8AbgjklhtDoilkTE0oj422YUuogGBwfzLkJm2rlu4Pq1unavXxryDJlrgXuS6XuA66os83bg\nwYg4HBEvAA8CKyte9+E+2vsPvZ3rBq5fq2v3+qUhz530r0TE3wBExM+A3irLXAg8U/H8p8m8CX+c\nHCr7r9kV08zM6nVmlm8u6SHgn1XOAgKoNRRUZV4k//5GRDwn6Rzga5LeFxF/Wn9pzcwsbYqImZfK\nYsPSPmAgIv5G0j8HdkTEFZOWWZUsszZ5viFZbtuk5W4EXhcRH5piW/lU0sysxUVEtR/7Ncu0JTOD\n+4APAJ8BbgS+XmWZB4DfSzr75wArgNslnQEsiIifS5oLvAN4aKoNNfohmZlZffJsySwEvgy8FHga\neE9EvCDpdcAHI+I/JMt9APgvlA+TfTIi/o+ks4G/ohySZwDbgf8UeVXGzMyqyi1kzMys/bX0KcCS\nViYXZB6QdFuV198oaaekY5Kun/Ta8eTMtN2S7m1eqWtXQ/1ulfQjSY9JekjSSyteq3oRa5E0WL92\n+P4+KGlPUoe/krS44rWPSzooaZ+ktzW35DOrt26S+iT9Q/Ld7ZK0vvmln9lM9atY7tcl/VLS0op5\nhf7uoP761fX9RURLPigH5JNAHzAXeAxYPGmZRcCrgE3A9ZNe+0XedUihfm8Gzkqm1wJbk+nzgaeA\n+cCCiem865RW/dro+zu3YvqdwDeS6VcAuykfDu5P3kd51ymluvUBe/KuQ6P1m6gj8AjwHWBpMu+K\nIn93KdRv1t9fK7dkrgIORsRoRBwDtlK+wPOEiHg6In7IydOeKxX9ZIBa6vdIRBxJnj7KyWuIZrqI\ntQgaqR+0x/f39xVPzwV+mUy/i3Kg/lNEjAAHk/crikbqBm3w3SU+QfnEpX+smHctxf7uoLH6wSy/\nv1YOmckXah7i1J3QTOZJGpL0HUnVPuC8zbZ+a4BvTLHu5ItYi6CR+kGbfH+Sbpb0JPBp4ENTrFu0\n76+RugH0J4exd0j6tWyLWpcZ6yfptcBFEfGXM6xbtO8OGqsfzPL7y/MU5kZNd6FmLRZFxM8kXQw8\nLGlPRPwkpbKloeb6SXof8DrKh5dmtW6OGqkftMn3FxHrgfXJNWG/S/m0/qJ/f43U7TnK393zyXH+\neyW9YlLLJ2/T1k+SgD+kfOnFrNYtiHrqN7HOrL+/Vm7JHKLc5zLhIuDZWleO8lA2JDumQWBJmoVL\nQU31k3QN5cFF35k0fWteN2eN1K9tvr8K2zg5ft8hyqf217pus9Vdt4g4GhHPJ9O7KPcXXp5ROes1\nU/3OA14JDEr6CXA1cF+y022H/3vV6vd1SUvr+v7y7oRqoPPqDE52XnVR7ry6Yopl7wbeXfF8AdCV\nTF8APEGVjq+i14/yjvVJ4GWT5ld2/E9ML8i7TinWr12+v0srpt8JDCXTEx3/XcDFFKzzuMG6XQDM\nSaYvoXzYpuX+NictvwNY0grfXQr1m/X3l3uFG/ywViY7mIPA7cm8O4F3JNOvTz6EF4ES8INk/q8C\ne5I/hseBD+Rdlzrr9xDl5uuupC73Vqz7gWS9A8D7865LmvVro+/vc8APk/p9q/I/OuXW25PAPuBt\ned
clrboB1yfzdwPfB/513nWpp36Tln2Y5OyrVvjuGqlfPd+fL8Y0M7PMtHKfjJmZFZxDxszMMuOQ\nMTOzzDhkzMwsMw4ZMzPLjEPGzMwy45Axa5Jk+Pv3JdM3qnzb8YnXvlg51P8U6387+bdP0upsS2uW\nDl8nY5YDSTuAj0XEzjrWHQA+GhHvTL1gZilzS8asBknrYZ+kTZIel/RlSWdJemty86bHJf1vSXOT\n5T9dccO1zybz7pD0UUnvpjwaxZ8m656VjGi7VNJaSZ+p2O6Nkj6fTL+YzP4U8GvJuh9Jbgp2ZcU6\n35b0qmZ9NmbTcciY1e7lwIaIeA3wC+CjlMfFe08yby7wHyWdD1wXEa+MiNcCn6x4j4iIr1IekuM3\nImJpnLxnDsBXKA/dMeEGyvf7gJMj5d4O/HWy7ueALwH/DkDSZZTHdfthetU2q59Dxqx2T0fEo8n0\nnwFvBX4cEU8l8+4B3kQ5gMYlfUnSvwXGp3i/04Zcj4i/BZ6SdJWkhcDlEfHdGcr1FeDfSDoDuIny\nnWDNCsEhY5ayiDhO+e6DX6U8xP03Z/kWX6bcgnk38H9r2N445cFErwPeA2ye5fbMMuOQMavdIklv\nSKZXU96x90u6JJn3m8Ajks6mPPz5N4FbgddUea8XgZ4ptvM1yoGxivK9WCZMtHxepHzPj0obgf9J\neUj9F2qvklm2HDJmtdsH3Cjpccr36flDyn0hX0nmHQc2UA6P+5N5O4CPVHmvTcCGiY5/Ku5MmITE\nXsp3IPx+xToTy+wBjkvaLenDyTq7KB+muzutypqlwacwm9VAUh9wf0S8Ou+yVCPpJcDDETHttTZm\nzeaWjFntCvmLTNJvAt8F/nPeZTGbzC0ZMzPLjFsyZmaWGYeMmZllxiFjZmaZcciYmVlmHDJmZpYZ\nh4yZmWXm/wPKLYAzDTsCCwAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"speeches_df.plot(x='positivity', y='anger', kind='scatter')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading in the SOTU addresses\n",
"\n",
"This is pretty much the same thing we did with the Trump speeches!"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" content filename\n",
"0 Gentlemen of the Congress:\\n\\nIn pursuance of ... SOTU/1913.txt\n",
"1 GENTLEMEN OF THE CONGRESS:\\n\\nThe session upon... SOTU/1914.txt\n",
"2 GENTLEMEN OF THE CONGRESS:\\n\\nSince I last had... SOTU/1915.txt"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
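"# Get the filenames (sorted so the speeches stay in chronological order)\n",
"filenames = sorted(glob.glob(\"SOTU/*.txt\"))\n",
"\n",
"# Read each file, closing it once we have its text\n",
"contents = []\n",
"for filename in filenames:\n",
"    with open(filename) as f:\n",
"        contents.append(f.read())\n",
"\n",
"# Create a dataframe from the results\n",
"sotu_df = pd.DataFrame({\n",
"    'content': contents,\n",
"    'filename': filenames\n",
"})\n",
"sotu_df.head(3)"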
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add a column for the name\n",
"\n",
"These files don't come with speech titles, so we'll just use the filename as the name."
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" content filename \\\n",
"0 Gentlemen of the Congress:\\n\\nIn pursuance of ... SOTU/1913.txt \n",
"1 GENTLEMEN OF THE CONGRESS:\\n\\nThe session upon... SOTU/1914.txt \n",
"2 GENTLEMEN OF THE CONGRESS:\\n\\nSince I last had... SOTU/1915.txt \n",
"3 GENTLEMEN OF THE CONGRESS:\\n\\nIn fulfilling at... SOTU/1916.txt \n",
"4 Gentlemen of the Congress:\\n\\nEight months hav... SOTU/1917.txt \n",
"\n",
" name \n",
"0 SOTU/1913.txt \n",
"1 SOTU/1914.txt \n",
"2 SOTU/1915.txt \n",
"3 SOTU/1916.txt \n",
"4 SOTU/1917.txt "
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sotu_df['name'] = sotu_df['filename']\n",
"sotu_df.head()"
]
},
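{
"cell_type": "markdown",
"metadata": {},
"source": [
"The filenames look like `SOTU/1913.txt`, so the year is hiding in there if we ever want it as a number. Here's a minimal sketch, assuming every filename contains exactly one four-digit year. `demo_df` and the `year` column are made up for illustration; they aren't part of the notebook above.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for sotu_df with the same filename pattern\n",
"demo_df = pd.DataFrame({'filename': ['SOTU/1913.txt', 'SOTU/1914.txt', 'SOTU/1915.txt']})\n",
"\n",
"# Pull the four-digit year out of each filename and store it as an integer\n",
"demo_df['year'] = demo_df['filename'].str.extract('([0-9]{4})', expand=False).astype(int)\n",
"print(demo_df)\n",
"```\n",
"\n",
"If this works on the real data, the same line with `sotu_df` in place of `demo_df` would give a column you could sort or plot by."
]
},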
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How do State of the Union addresses sound?\n",
"\n",
"Let's analyze them by counting words from the emotional lexicon."
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" aback abacus abandon abandoned abandonment abate abatement abba \\\n",
"0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
"\n",
" abbot abbreviate ... zephyr zeppelin zest zip zodiac zone \\\n",
"0 0.0 0.0 ... 0.0 0.0 0.000000 0.0 0.0 0.0 \n",
"1 0.0 0.0 ... 0.0 0.0 0.000923 0.0 0.0 0.0 \n",
"2 0.0 0.0 ... 0.0 0.0 0.000000 0.0 0.0 0.0 \n",
"3 0.0 0.0 ... 0.0 0.0 0.000000 0.0 0.0 0.0 \n",
"4 0.0 0.0 ... 0.0 0.0 0.000000 0.0 0.0 0.0 \n",
"\n",
" zoo zoological zoology zoom \n",
"0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 \n",
"\n",
"[5 rows x 14182 columns]"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"# Only count words that appear in the emotional lexicon,\n",
"# because we don't have emotion labels for any other words\n",
"vec = TfidfVectorizer(vocabulary=emolex_df.word,\n",
"                      use_idf=False, \n",
"                      norm='l1') # ELL - ONE\n",
"matrix = vec.fit_transform(sotu_df['content'])\n",
"# get_feature_names() was removed in scikit-learn 1.2;\n",
"# on newer versions use vec.get_feature_names_out() instead\n",
"vocab = vec.get_feature_names()\n",
"sotu_wordcount_df = pd.DataFrame(matrix.toarray(), columns=vocab)\n",
"sotu_wordcount_df.head()"
]
},
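{
"cell_type": "markdown",
"metadata": {},
"source": [
"What do those numbers mean? With `use_idf=False` and `norm='l1'`, each row is divided by its own total, so every value is a word's share of the *lexicon words found* in that speech (words outside the vocabulary don't count toward the total). A tiny sketch with a made-up vocabulary and made-up sentences shows the idea; `toy_vocab`, `toy_docs`, `toy_vec` and `toy_matrix` are illustration-only names.\n",
"\n",
"```python\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"# Toy vocabulary and text, made up for illustration\n",
"toy_vocab = ['happy', 'angry', 'sad']\n",
"toy_docs = ['happy happy angry day', 'the sad sad sad news']\n",
"\n",
"toy_vec = TfidfVectorizer(vocabulary=toy_vocab, use_idf=False, norm='l1')\n",
"toy_matrix = toy_vec.fit_transform(toy_docs).toarray()\n",
"\n",
"# Words outside the vocabulary ('day', 'the', 'news') are ignored,\n",
"# so each row holds shares of the matched lexicon words only\n",
"print(toy_matrix)\n",
"```\n",
"\n",
"So a positivity score later on is best read as a share of the emotion-lexicon words in a speech, not a share of all the words in it."
]
},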
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" content filename \\\n",
"0 Gentlemen of the Congress:\\n\\nIn pursuance of ... SOTU/1913.txt \n",
"1 GENTLEMEN OF THE CONGRESS:\\n\\nThe session upon... SOTU/1914.txt \n",
"2 GENTLEMEN OF THE CONGRESS:\\n\\nSince I last had... SOTU/1915.txt \n",
"\n",
" name positivity anger \n",
"0 SOTU/1913.txt 0.272425 0.019934 \n",
"1 SOTU/1914.txt 0.277932 0.042475 \n",
"2 SOTU/1915.txt 0.267757 0.051099 "
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get your list of positive words\n",
"positive_words = emolex_df[emolex_df.positive == 1]['word']\n",
"\n",
"# Only give me the columns of positive words\n",
"sotu_df['positivity'] = sotu_wordcount_df[positive_words].sum(axis=1)\n",
"sotu_df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" content filename \\\n",
"0 Gentlemen of the Congress:\\n\\nIn pursuance of ... SOTU/1913.txt \n",
"1 GENTLEMEN OF THE CONGRESS:\\n\\nThe session upon... SOTU/1914.txt \n",
"2 GENTLEMEN OF THE CONGRESS:\\n\\nSince I last had... SOTU/1915.txt \n",
"\n",
" name positivity anger \n",
"0 SOTU/1913.txt 0.272425 0.019934 \n",
"1 SOTU/1914.txt 0.277932 0.042475 \n",
"2 SOTU/1915.txt 0.267757 0.051099 "
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get your list of angry words\n",
"angry_words = emolex_df[emolex_df.anger == 1]['word']\n",
"\n",
"# Only give me the columns of angry words\n",
"sotu_df['anger'] = sotu_wordcount_df[angry_words].sum(axis=1)\n",
"sotu_df.head(3)"
]
},
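{
"cell_type": "markdown",
"metadata": {},
"source": [
"Positivity and anger use the exact same recipe, so the other emotions could be scored with a loop. This is only a sketch on made-up data: `toy_emolex` and `toy_counts` are hypothetical stand-ins for `emolex_df` and `sotu_wordcount_df`, and the three emotion columns are just a sample of the lexicon's categories.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy stand-ins for emolex_df and sotu_wordcount_df (made-up values)\n",
"toy_emolex = pd.DataFrame({\n",
"    'word': ['happy', 'angry', 'sad'],\n",
"    'positive': [1, 0, 0],\n",
"    'anger': [0, 1, 0],\n",
"    'sadness': [0, 0, 1],\n",
"})\n",
"toy_counts = pd.DataFrame({'happy': [0.5, 0.0], 'angry': [0.25, 0.0], 'sad': [0.25, 1.0]})\n",
"\n",
"scores = pd.DataFrame(index=toy_counts.index)\n",
"for emotion in ['positive', 'anger', 'sadness']:\n",
"    # Words tagged with this emotion in the lexicon\n",
"    words = toy_emolex[toy_emolex[emotion] == 1]['word']\n",
"    # Add up the shares of those words in each speech\n",
"    scores[emotion] = toy_counts[words].sum(axis=1)\n",
"print(scores)\n",
"```\n",
"\n",
"Each column of `scores` is built the same way as the `positivity` and `anger` columns above, just without the copy and paste."
]
},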
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparing SOTU vs Trump"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZkAAAEPCAYAAACQmrmQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X+QHGd95/H319bOzki7q5WKTa4w9oyxwHKwhSUi8F1B\nkIyV+DjAHITYyhFM2LvEUQkIF6owXC62S+cKhksRKE6RxW0Qd7FWIiQBQgGLHa9IOH5IyMIyWQvb\nhF1sINkJB8KQ9Wkx3/tjelajUc9uz0z3dM/M51U15dme7unnmZGf7zzP9+mnzd0RERFJwgVpF0BE\nRHqXgoyIiCRGQUZERBKjICMiIolRkBERkcQoyIiISGJSDzJmdr2ZnTKzR8zsHSGvv83M/t7MvmZm\n95rZxTWv3Rwc9w0ze0NnSy4iIiuxNK+TMbMLgEeAlwHfBY4BN7n7qZp9Xgp8xd2fMrNbgG3ufpOZ\nrQO+CmwBDDgObHH3052uh4iIhEu7J/NC4FF3n3P3ReAQcEPtDu7+eXd/Kvjzy8BFwfNfAT7n7qfd\n/YfA54DrO1RuERGJIO0gcxHweM3fT3A2iIQZBz7T4NjvrHCsiIh02KqUz28h20LH78zs9cALgJc2\ne6yIiKQj7SDzBHBJzd/PopKbOYeZXQe8E/ilYFiteuy2umOnw05iZgo+IiItcPewH/SRpT1cdgzY\nYGZFM8sBNwGfrN3BzDYD+4BXufv3a16aAnaY2dpgEsCOYFsod+/Zx2233ZZ6GVQ31U/1671HHFLt\nybj702a2m0rS/gJgwt0fNrM7gGPu/ingPcAa4M/NzIA5d3+1u//AzPZQmWHmwB1emQAgIiIZkfZw\nGe7+WeDyum231TzfscyxB4ADSZVNRETak/ZwmcRg27ZtaRchMb1cN1D9ul2v1y8OqV6M2Slm5v1Q\nTxGROJkZ3uWJfxER6WEKMiIikhgFGRERSYyCjIiIJEZBRkREEqMgIyIiiVGQERGRxCjIiIhIYhRk\nREQkMQoyIiKSGAUZERFJjIKMiIgkRkFGREQSoyDTJ8rlMseOHaNcLqddFBHpIwoyfWBy8jDF4kZ2\n7LiFYnEjk5OH0y6SiPQJ3U+mx5XLZYrFjSwsTAObgJMUCtuZmzvF2NhY2sUTkQzT/WRkRbOzs+Ry\nJSoBBmATAwNFZmdn0yuUiPQNBZkeVyqVOHNmFjgZbDnJ4uIcpVIpvUKJSN9QkOlxY2NjTEzspVDY\nzsjIFgqF7UxM7NVQmYh0hHIyfaJcLjM7O0upVFKAEZFI4sjJKMiIiEgoJf5FRCTTFGRERCQxCjIi\nIpIYBRkREUmMgoyIiCRGQUZERBKjICMiIolRkBERkcQoyIiISGIUZEREJDEKMiIikhgFGRERSYyC\njIiIJCb1IGNm15vZKTN7xMzeEfL6S8zsuJktmtlr6l572sweMLMTZvbxzpVaRESiWJXmyc3sAuCD\nwMuA7wLHzOwT7n6qZrc54Gbg7SFv8RN335J8SUVEpBWpBhnghcCj7j4HYGaHgBuApSDj7t8OXgu7\nIUxb9zkQEZFkpT1cdhHweM3fTwTboho0s6Nm9kUzuyHeoomISLvS7smE9USauYXlJe7+j2Z2KXC/\nmZ1092+F7Xj77bcvPd+2bRvbtm1rppwiIj3vyJEjHDlyJNb3TPX2y2Z2DXC7u18f/H0r4O5+V8i+\nHwb+2t3/ssF7NXxdt18WEWleL9x++RiwwcyKZpYDbgI+ucz+S5U1s9HgGMzsGcC/AWaSLKyIiDQn\n1SDj7k8Du4HPAX8PHHL3h83sDjN7BYCZ/aKZPQ78KrDPzB4KDr8C+KqZnQD+BvjDullp0kXK5TLH\njh2jXC6nXRQRiVGqw2WdouGybJucPMz4+C5yuRJnzswyMbGXnTtvTLtYIn0vjuEyBRlJVblcpljc\nyMLCNLAJOEmhsJ25uVOMjY2lXTyRvtYLORnp
c7Ozs+RyJSoBBmATAwNFZmdn0yuUiMRGQUZSVSpV\nhsjgZLDlJIuLc5RKpfQKJSKxUZDpE1lNrI+NjTExsZdCYTsjI1soFLYzMbFXQ2UiPUI5mT7QDYn1\ncrnM7OwspVJJAUYkI5T4j6ifg4wS6yLSKiX+ZUVKrItImhRketxyifWs5mlEpHcoyHRIWg16o8T6\nfffdT7G4kR07bqFY3Mjk5OGOlqtTFEhF0qWcTAdkIfFem1gH+iJPk4XPXaSbKfEfUZpBJouJ92PH\njrFjxy2cPn18advIyBbuu+9utm7dmkqZ4pbFz12k2yjx3wWymHjvhwsgs/i5i/QjBZmEZbFB74cL\nILP4uYv0Iw2XdUA1NzAwUGRxcS4zuYFevwAyq5+7SLdQTiaitIMM9H6DnlX63EVapyATURaCjIhI\nt1HiX0REMk1BRkREEqMg0wN0VbuIZJWCTJebnDwcy/IwClQikgQl/rtYXFe1a/mVlWmWmvQjJf77\nXBxXtZfLZcbHd7GwMM3p08dZWJhmfHxXx3s0We5JHZ6cZGOxyC07drCxWOTw5GTaRRLpGgoyXSyO\nq9qzsPxKXEN+SSiXy+waH2d6YYHjp08zvbDArvHxTAZDkSxSkOlicSwPk/byK1npSTUyOztLKZer\nCcFQHBjQGmgiEa1KuwDSnp07b+S6665tOV9QDVTj49vPWX6lU3mHak9qYeH8nlQWch+lUonZM2c4\nSTXrBXOLi1oDTSQiJf4FiJ7YjjsBnsUl+evreHhykl3j4xQHBphbXGTvxAQ37tyZStlEOimOxD/u\n3vOPSjWl3vz8vB89etTn5+cj7X/w4CEvFNb72rVbvFBY7wcPHorlvNX3HRnZ3Nb7xqFRHZv9rER6\nQdB2ttf+tvsG3fBQkDlfswFjfn7eC4X1Dg86uMODXiisb7rRzXIjHlcdRXpFHEFGif8+1EqyPenp\n0mNjY2zdujXVPEwWZtqJ9BoFmT7USmPaK9Oll5P2TLskZPn6I+kPCjJ9qJXGdKXp0lEas1KpxMLC\nN88571NP/UNmGvE07hiaZBDQRaSSCe2Ot3XDg4zmZNLMQ7SabA8rc9T8zvz8vA8MDDmsc9jssM4H\nBoYyl/Po1PcS10SKMPPz876+UPAHK8klfxB8faGQuc9asg0l/rs3yCTZwEQVR2PaTLL86NGjvnbt\nFod5h6MO8z4ystmPHj3aTjUyMWmgWUlPMjh69KhvWbvWgzd3B988MtL2Zy39JY4go+GyFGTlKvc4\nku3N5FnODtN9D9gKfK/tnEeWl6RZTtL5qdqLSEEXkUp6FGRSkPUEeDOaye/EnfPISrBuRdKTDMbG\nxtg7McH2QoEtIyNsLxTYOzGRiVUUpM+02xVq9wFcD5wCHgHeEfL6S4DjwCLwmrrXbg6O+wbwhmXO\nEUfPMTbdej1Go2GpZvM7cQ1vnR1+OzsqFMfwW6d04iLUbhxKlOyg23MyVHpSjwFFYAD4GrCxbp9L\ngCuBA7VBBlgHfBNYC4xWnzc4T1yfeWyydJV7mEZX5TfKIaXRmHVrsK6lICBZ1gtB5hrgMzV/3xrW\nmwle+3BdkLkJ+JOav/8EuLHBse1/2gmIq4GJu6GqDyj79u3PbGOe9WAt0s3iCDJpr8J8EfB4zd9P\nAC9s8djvBNu6xtjYWNtj5HHf1bI2z1FZGfkkb33rS8jlLiMsh5T2GH+7q1D3At21U7Is7cR/2Oqe\nUZdLbufYnpBE4jt8UsIlnDnzLcKS1Fm4ojwLS9KkpVtn10n/SLsn8wSVnEvVs4DvNnHstrpjpxvt\nfPvtty8937ZtG9u2bWu0a9dI4l4s5856qvRknn76u7z//e/hbW87954z9913f6y9qHr6hd5YuVzm\nxIkTvOlNt/DUU59f6nWOj2/nuuuu1eclLTly5AhHjhyJ903bHW9r5wFcyNnEf45K4v+KBvt+GHht\nzd+1if/q
89EGx8YzQJkxSSW+G+U5anM/SSfdm1lFoN8S59XPZs2a5zusdjjUlbPrJPvo9sR/pQ5c\nT2UK8qPArcG2O4BXBM9/kUru5UmgDDxUc+wbg+MeoQumMMd1hX3Ue7HMzMz4gQMHfGZmpqXzTE1N\n+dTU1ApX73vsDVyUADY/P+937tnj6wsF37J2ra8vFPzQwYNtnzvrwj6byjI985makCG9oSeCTCce\nWQgycSwjU/8ee/bcudSrqA9eu3e/1aHg8FyHgu/e/ZZYy9tqTyZKoF0pgB08eMjz+VEvYH23NlfY\nZwOX+Zo1z9XsOomdgkyXBJk4hpZmZmZ8cHC07hfsas/nR89rWGZmZoIAU7tvIXKPJmp5m50+3MwQ\nWKPzn33tHr+M/lubq9Fn06jHKdIOBZkuCTLtDi0dPHjIBwdHgl5Jbbu62eGe8wLAgQMHQvZ9jh84\ncGDFc83Pz/uBAwd8ePiqSOWNOgTYbKBtFMBqF9nM0/2rDLcyhKprg6RTFGS6JMi005M5e+y0Q/1Y\n/HoPW8m4UU/m8OHDy56z2ngND28Ojr8rtqR+K4E2rAE+97M86HnyflkQYLotJ3Po4MGWc0r9OOFB\nOk9BpkuCjHvrvz7PbZwPBYFlg8Pa4O/wALB791uCQPEch4KbDS47TBWeUC740NCVsfxajnM22r59\n+31wcNSHh6/2fH50KTfVTXS/F+kGCjJdFGTcW/v1eX7jPO2rVq3xwcGRFQPWe9/7R8FNwi4NZiA1\nDkphPY3h4av9wIEDsU9LjhJoV1qMc3j4Kh8cHPF9+/bHUrZO0/1epBt0JMhQubL+4nZPlOYjK0Gm\nVWGN80oBK7xnEj681mj/JKbDRgm0jSYI9MKCmFVJ9mQ0lCZx6VhPBjje7onSfHR7kHFvvuEIn+oa\nPlGgKs6EcqsN3XKBpNuX9q9XzclsHhlpO6dU/Xz279vXd9cOSXI6GWT+B7C13ZOl9eiFINOs8J5M\n+JTn+uPa/RXczjVBywWSqD2ZTv6Sb/dccZR1KVgND3sB/C7leSQmnQwyM8DTwdItJ4GHgJPtnrxT\nj34MMu7n90w6kSBvd0hrpeNX6m21M2OrWbXnGs3n/c49ezreoIcOu4HPK88jMehkkCmGPdo9eace\n/Rpk3Ds/Ph/HkNZKgaRRnTo5Y6v2XIfA14FvSGEqddgEgk3gR9WTkRh0dHYZ8GLgN4PnY8Cl7Z68\nU49+DjKdFldyvpXgGNeMrahL32xZu9bng55DWlORwwJrAfzKoSHlZKRtnezJ3Ab8NfBI8Pczgf/T\n7sk79VCQ6awkr0hfLgDE0ZOJOtxWPdc94FvOnV2xYmCLu3dZP4Fg/759ml0msehkkPlaMJX5RM02\n5WSkoSQS4mETCur3a2fGVrNB6tDBgz6az/vqJnoySeWMNG1ZktDJIHM0+O8DwX/XKMhkWzc3OmEN\n8fz8vOfzow5/4jDlMO253FrP50dDr6dppe6tDLfNz5+95cBKgS0LV/l3878L6bxOBpm3A3cD/wD8\nJ+BLwJvbPXmnHv0WZOK4rUBaGjXEt976LodBr9ykq7qszs873BPbhZntBIFm8jjt5oxa1cmZd9Ib\nOp343wG8F/jvwI52T9zJRz8FmW6/Kj6sIb56eNhzuSGvLI1z7tpqMNP0LLZGC2/WXswYxwWSYedN\nqyeThV6UdJ84gswqInL3e4F7o+4v6ZidnSWXKwX3fAfYxMBAkdnZ2a6473upVGL2zBlOApW71sPs\nmTOYPZPKXbbP1qsy/+QEcAVwksXFOUql0rLvPzl5mPHxXeRyJc6cmWViYi8X8DN2jY9TyuWYPXOG\nd7/vfVy9ZQulUinWz2xsbIy9ExNsHx+nODDA3OIieycmOvK9zM7OUsrl2LSwAFQ+veLAQNf8u5Au\nFiUSUbn18Y/qHo8DfwU8u91Il/SDLurJxJEw7+aejPu5yfu1uZwPrCoEvZ
ZzezIDAyOez49GnsUW\n9tnk86OJ/MJfaRZcp/Mi6slIK+hgTuYO4LeBYWAE+C3gD4AbgSPtFiLpR7cEmbhyKb1wU6v5+Xmf\nmpoKkv0PemUF6TVBTuYyz+XWRlootFbYhaJr1jzXn79mjddujJInWe687eY+kgpCca6VJv2hk0Hm\nKyHbvhz898F2C5H0oxuCTNw9kF6YRXR+UJj31auf7R/4wAdaqldcPZnlgki7PYakk/O98O9COqeT\nQeZLwK8BFwSPX6sJMl9rtxBJP7ohyPTaCsNxSGLoL6yX18wv/JWCSDszyDSkJVkTR5CJmvj/D8D7\ngb2AA18GXm9mBWB39AyQNFIqVRLR1KS8oySye9nY2BgTE3sZH9/OwECRxcU5Jib2tpWo3rnzRq67\n7tpKIrwmsX/tddedty3MSgn0sIkLc4uLkb5HJeelJ7UbpbrhQRf0ZNx7I5eShCwN8azU26henDma\nz5/TM4pSB/VkJGvo4HDZGPAuYD/wp9VHuyfv1KNbgox7thpUCddoeK0+n1Jd+j9KnqUT1+kkSf9u\ne1McQcYq77M8M/si8HfAcSr3lan2gv4i1m5VQszMo9RTule5XI403JXU+crlMhuLRaYXFpaGybYX\nCnzh+HFe/IIXnLf91NzcUjkPT0525DqdpNSXf+/EBDfu3Jl2sSQGZoa7W1tvEiUS0QXJ/RXK30oQ\nly6RhWV0GiX8Dxw44JuHh8/ZvmnNGp+amnL37h8i65byq6fVGmLoyVwQMRZ9ysxe3lY0E0lAuVxm\nfHwXCwvTnD59nIWFacbHd1EulztajtqEP5xN+H+/XObUk0+es/2xn/yEG2+4gcOTk2eT/cHrtcn+\nbtAN5Z+cPEyxuJEdO26hWNzI5OThtIvUX6JEIipX/P8MWKBytf+TwI/ajXCdeqCeTGZUf1HOzMzE\n8ssy6tTvTvySDbuvy/pCwe+icmOzy6jcQfNQzS/+mZmZrugJNJL1nkwvrICRJjq8QOZ64EXAS6uP\ndk/eqYeCTDZUh7XWFEpeAL+qUIh0XcpywSFKI9LJ1Ydry1s7hDYFfjn4fMj1M4cOHvS1uZxfBr4a\nfGhgoGsS/u7ZXklA15+1p2NBBviPwEPAD4DpoEfzN+2evFMPBZn0nQ0G054n2i/fqLmW5aZ+Z2Hl\n4+kgyKwl/OZm8/PzPprP+z1BEMpabyCKrOY81JNpTyeDzENAnmACALARONzuyTv1UJBJ39lflEf9\nMla+Ir7ZxqFRI5f2PVzeunu358EvAc+Dr77wwvN+8addxl6n689aF0eQiXrF/1Pu/pSZYWaD7n7K\nzC5vLQsk/ejsigY/4TusfEV8s7csGBsbC93ezhX40N7U6HK5zJ/efTcF4BlUEpmLF1zAu//8z9m8\nefPS+7VbRlleo1UepDOizi57wsxGgY8D95rZJ4DvJlcs6TXVJWIKhddi+Z/jGuCKfJ6XDA7y7ve9\nb+l//HK5zLFjxxgaGqpZZgdaXWZn6R4uhQJbRkbYXihEvofLh+6+mw0XX8zN27ez4eKL+dDddzd1\n7hMnTvD04iJHqFxgdgT42eLiUrniKKNEMzY2xtatW/WZpqHZrg+VpP+rgFy73ahOPdBwWWZUh7X+\n6L3v9dHBQd88PLw0dFSfoH/z7jc3PczRaNhsubthhg3B7d+3zwvgJfAC+POC/+7fty9yXaempnxD\nbcY5mGFWvUYmatmTkNUcimQLnZxd1s0PBZlsaZSMH83nz9vWzFTnZi7KXGm5/tHBQZ8Oph7Xlml0\ncDBSAKtuX5vLLR0/Db5m1SqfmZlp8hOLVydn20l3U5BRkMm0+sa3+vfU1NR5ie5Na9b45S3cPKz2\nXFEnCkRZrn/z8LAfBd9S1xO5enj4nDKt1GBXXy/mcpGnbTfzmbZyfLOz7dTr6V89EWSA64FTwCPA\nO0JezwGHgEep3NfmkmB7EfgX4IHgsX
eZc8TziUtk9Y3vW3fvXvp7NJ8/5xf+cj2ZqI1fM9dDrDSb\nq3bqcX1Ppn7F5SgN9szMjI8ODjacvjw1NeVTU1MrNuJx9ECancmmXk9/6/ogQ2XiwWNBwBgAvgZs\nrNvnd6oBhMrtng/52SBzMuJ54vi8JaL6xnc6yGfUNrJDAwPnXcDXzEV99Y3f/n37VuzJVIPRF77w\nhYaNfv37lwYHKzmZfP68MkVtsBvtd+eePb42l/MNrHwRZlzX+zTzPr1w/Y60pxeCzDXAZ2r+vrW+\nNwN8FnhR8PxCoOxng8xDEc/T/qctkdU3qkfBnxuSAH/Xrbc2lYyv3Sesody3b3/DiQLVoPHsQsEL\n4JcMDHgB/Mplhq9WWgKnUTnq9w/bb10+7yN1vbl14KP5fGivpvqZzgef53yTw4m1ogbzO/fs8dXB\nkOF6Ksvh6Pqd/tILQea1wP6av18PfKBun4eAZ9b8/SiVJW6KVC49OE5lFYIXL3OeWD5wOWu5YBCl\nJ1NtUKP+Km60XEt9D6LRLLKw4a9p8JHBQZ+ZmWlq2Kr2HPUNdu2wYNi9ZjaPjPjaXM4LF154XuDd\nDH4R+OVr1oRORhgeGPB1QaO/Luj5tJObWW4WXtiaas1+Z9L9eiHI/GpIkHl/3T5frwsyjwHrglzN\numDbFuDbwFCD88TygUtFlHH6+sb3hle9ygfBr2zhV3HY0FgzQ0fVoBSWyK8OWw0PDPhq8A3ga3O5\npTrVN8ZhNyar9lxWWuyyGshG8/nQfM86KkvPhA1N1c9UezAoZ5wNfm3dRgYH/apC4ZzP6jLwO/fs\nie18kn29EGSuAT5b83fYcNln6obL5hu81zSwpcFrftttty09pqen2/7w+1WzY/q1d3vcEDSi+5sY\n3290vmbuILlcIn80n/eRwUEfDno2YedYLritDt7j0MGDkXI0tfscCspTzckMBtvCjm1n6ZlWhiDD\nep/Kx/S+6enpc9rKXggyF9Yk/nNB4v+Kun121ST+b6pJ/D8DuCB4/mzgcWC0wXli+gqk2cYuLEjU\nNsztnK+ZqbVLifx8/pxczE2ve52vppIzWl/TyF+1erWP1E0OGAkuHq0f4rqH6Mv2hzXmI4OD/qEP\nfWhpdt188J61Q1OtJv5Xuh5ouSHIUj5fuWA2g6srS2d0fZCp1IHrgW8EuZZbg213AK8Ing8CHw1e\n/zJQCra/JhhKOwF8FXj5MueI6zPve802dmGNV/2dIVdayj9Kcj1q2WsT+aFBIWj4RwcH/aq6gHLl\n0ND5s9I4Nwm/f98+Hx0c9KtrVjKo1yjxfujgwYbDdmHH7d+3b+kzWC4XFfZdRR2CjOu+P9KdeiLI\ndOKhIBOvZqYaN9vQhTVoUZLrrVwwGBYAN1C5Kr+67E2jYbqwG5BVX7tqeNhHBgeXXYKm2aBQf1zt\nUN7wwICvzeXO66006gVOTU21PQQp/UFBRkEmNa0MVdU2XvUN6l1BDmBzgx5ANWl++PDh8xrItbmc\nj+bzTV8wGNaojw4O+h+9972+vlDwS4PpzvXTnOfn5/3OPXt8NJ8/7y6Y7eQwog5F1pZ7Pgh2jS70\nDCtT2IoLrQxBSu9TkFGQ6Rr1jVdtgzrP8lfWu58NVJevWRO66OQ9ERr32mnK1WGg+l/v9cGimjMJ\nW28s6rTqKJ9HdVuUQFV7rkYz5qrnjRLgldSXRhRkFGS6Vm1DdxT8+cs0lCv9cl9N+G2Na9XnOwrg\nlwY9lNphuqmpKb98zZoV32+5+izXcFd7QY2S8VGGIqP2ZGr3rw9oWb5lsmSHgoyCTFerNnRXDg0t\nO122vpdwKAgsm4KLFocGBlZsZEfz+fMb46CnUp8j2sD5+ZZWhwXrXx/N5311XTnqr/KPMmRVe66h\nICezUsCof18NjclKFGQUZLpefSI7rKEMzZ3UNMwrNe5Hjx71y9esOW9YaVPQi2qUDG9mqnV9fRot\nQX
NP3fBWNWA+b3Cw6R5F7blWChha6FJaoSCjINNTlmsoVwokyx0bpSfT6PYDjW4w1qzatceq+aco\nQ11xUA5GWqUgoyDTV9oZ3jl08KAPBTmZy4KczMWDg0tTjZNuiGvf/1AQXC4KyhJ1skCrwiYlXDk0\n5AcOHFCgkWXFEWSs8j69zcy8H+opyyuXy5w4cQKAr588yZ7f/30uzeWY++lP2TsxAcCu8XGKAwPM\nLS6yd2KCG3fujO38hycn2TU+ziWrVnHqySd5ijeRZ5KvsMAm4CSwvVDg1NxcrPeiL5fLbCwWmV6o\nnOc9wO3AxuHhpbrHWc9GZZidnaVUKsVaN0mWmeHu1tabtBuluuGBejJSY7leS9LJ8Or7V29LUMgX\nV7zlQByiTrJI8tzKB3Uf1JOJRj2ZZGXpV2qUshw7doxbduzg+OnTS9u2jIxw9333sXXr1tjOE/U9\nhoaG+PGPf5z451cul/n0pz/N+9/8Zh548sml7c3Wvdlz1vaikuqtSTLUk1FPJnVZ+pUatSzt5l+y\nVOdmdXoSQDurR0v6UOJfQSZNWZq11GxZWr0YMUt1blUnL8Tshc+rn8URZFa13Z+SvjU7O0spl2PT\nwgIAm4DiwACzs7MdHwpptiw37tzJtddd19SQV3W4qbhqFZuCbWnWuVWt1L1VY2Nj7J2YYHvdhIpu\n+aykfcrJSMuyNN6edFmqM8MuWrWKx558ki9D6nXuJlnK20l0ceRk1JORlmXpV2qSZSmXy+waHz9n\nCvA1wOXDw3w7mAKshnN5Y2Nj+oz6lHoy0rYs/UpNoixhs9GuGhri7R/8IC9/+cs7UucsfcbSP+Lo\nySjIiKwg7WHB6lBdKZdj9syZjlw8KQIKMpEpyEi7qg19UqsBNJJ2gJP+ppyMSId0ckZWrSzN4OsG\nGlbMngvSLoBImsrlMseOHaNcLq+479jYGFu3bu1o41UqlZg9c4aTwd8ngbnFRUqlUsfK0C0mJw9T\nLG5kx45bKBY3Mjl5OO0iCQoy0scOT06ysVjklh072FgscnhyMu0inWdp1lyhwJaREbYXCprNFqJc\nLjM+vouFhWlOnz7OwsI04+O7Iv14kGQpJyN9qdtyHRoGWt6xY8fYseMWTp8+vrRtZGQL9913d0tr\nsunzrogjJ6OejPSlpVxH8HdtriOL0hiq6yalUokzZ2ahZmBxcXGupWFFDbvFS0FG+pJyHb1lbGyM\niYm9FArbGRnZQqGwnYmJvU0HZQ27xU+zy6QvZWm1AonHzp03ct1117Y1zDU7O0suV2Jh4Wwfd2Cg\nqNl8bVDXXwEZAAAKeUlEQVRORvqaxt6lVrlcpljcyMLCNNXV6QqF7czNnerLfx+6TkakTVpTS2pV\nh93Gx7czMFBkcXGupWE3OUs9GRGROurhVmhZmYgUZEREmqcpzCIikmkKMiIikhgFGRERSYyCjIiI\nJEZBRkREEqMgIyIiiVGQERGRxKQeZMzsejM7ZWaPmNk7Ql7PmdkhM3vUzL5kZpfUvPbOYPvDZvbL\nnS25iIisJNUgY2YXAB8EfgV4HrDTzDbW7TYO/F93fw7wx8B7gmN/Afg14Arg3wJ7zayti4ZERCRe\nafdkXgg86u5z7r4IHAJuqNvnBuAjwfOPAdcGz18FHHL3n7r7LPBo8H4iIpIRaQeZi4DHa/5+ItgW\nuo+7Pw2cNrP1Icd+J+RYERFJUdqrMIcNb9UvMtZonyjHLrn99tuXnm/bto1t27atXDoRkT5y5MgR\njhw5Eut7prpAppldA9zu7tcHf98KuLvfVbPPZ4J9vmJmFwLfc/efq9/XzD4L3ObuXwk5jxbIFBFp\nUi8skHkM2GBmRTPLATcBn6zb56+Bm4PnrwPuD55/ErgpmH12KbABONqBMouISESpDpe5+9Nmthv4\nHJWAN+HuD5vZHcAxd/8UMAH8bzN7FPg+lUCEu8+Y2UeBGWAR2KXu
iohItuh+MiIiEqoXhstERKSH\nKciIiEhiFGRERCQxCjIiIpIYBRkREUmMgoyIiCRGQUZERBKjICMiIolRkBERkcQoyIiISGIUZERE\nJDEKMiIikhgFGRERSYyCjIiIJEZBRkREEqMgIyIiiVGQERGRxCjIiIhIYhRkREQkMQoyIiKSGAUZ\nERFJjIKMiIgkRkFGREQSoyAjIiKJUZAREZHEKMiIiEhiFGRERCQxCjIiIpIYBRkREUmMgoyIiCRG\nQUZERBKjICMiIolRkBERkcQoyIiISGIUZEREJDGpBRkzW2dmnzOzb5jZlJmtbbDfzWb2SLDfG2q2\nT5vZKTM7YWYPmNkzOld6ERGJIs2ezK3Afe5+OXA/8M76HcxsHfAHwFbgRcBtdcFop7tvdvct7v7P\nnSh0Fh05ciTtIiSml+sGql+36/X6xSHNIHMD8JHg+UeAV4fs8yvA59z9tLv/EPgccH3N6xruo7f/\nofdy3UD163a9Xr84pNlI/5y7/xOAu/8jMBayz0XA4zV/fyfYVvWnwVDZ7ydXTBERadWqJN/czO4F\nfr52E+BA1KBgIds8+O+vu/v3zGwN8Jdm9np3/7PWSysiInEzd195ryRObPYwsM3d/8nM/hUw7e5X\n1O1zU7DPLcHf+4L9DtftdzPwAnd/S4NzpVNJEZEu5+5hP/YjS7Qns4JPAm8E7gJuBj4Rss8UcGeQ\n7L8A2AHcamYXAqPu/n0zGwBeAdzb6ETtfkgiItKaNHsy64GPAhcD3wZe5+4/NLMXAL/t7r8V7PdG\n4L9QGSb7b+7+v8xsNfC3VILkhcB9wH/2tCojIiKhUgsyIiLS+7p6CrCZXR9ckPmImb0j5PWXmNlx\nM1s0s9fUvfZ0MDPthJl9vHOlji5C/d5mZn9vZl8zs3vN7OKa10IvYs2SNuvXC9/fb5vZyaAOf2tm\nG2tee6eZPWpmD5vZL3e25CtrtW5mVjSzfwm+uwfMbG/nS7+ylepXs9+vmtnPzGxLzbZMf3fQev1a\n+v7cvSsfVALkY0ARGAC+Bmys2+cS4ErgAPCautd+lHYdYqjfS4F88PwW4FDwfB3wTWAtMFp9nnad\n4qpfD31/QzXPXwl8Jnj+C8AJKsPBpeB9LO06xVS3InAy7Tq0W79qHYHPA18EtgTbrsjydxdD/Zr+\n/rq5J/NC4FF3n3P3ReAQlQs8l7j7t93965yd9lwr65MBotTv8+7+VPDnlzl7DdFKF7FmQTv1g974\n/n5c8+cQ8LPg+auoBNSfuvss8GjwflnRTt2gB767wB4qE5f+X822G8j2dwft1Q+a/P66OcjUX6j5\nBOc2QisZNLOjZvZFMwv7gNPWbP3Ggc80OLb+ItYsaKd+0CPfn5ntMrPHgHcDb2lwbNa+v3bqBlAK\nhrGnzezFyRa1JSvWz8yuBp7l7p9e4disfXfQXv2gye8vzSnM7VruQs0oLnH3fzSzS4H7zeyku38r\nprLFIXL9zOz1wAuoDC81dWyK2qkf9Mj35+57gb3BNWH/lcq0/qx/f+3U7XtUvrsfBOP8HzezX6jr\n+aRt2fqZmQHvo3LpRVPHZkQr9ase0/T31809mSeo5FyqngV8N+rBXlnKhqBhOgJsjrNwMYhUPzO7\njsrioq8Mur6Rj01ZO/Xrme+vxmHOrt/3BJWp/VGP7bSW6+buZ9z9B8HzB6jkC5+bUDlbtVL9hoHn\nAUfM7FvANcAng0a3F/7fC6vfJ8xsS0vfX9pJqDaSVxdyNnmVo5K8uqLBvh8GXlvz9yiQC54/A/gG\nIYmvrNePSsP6GHBZ3fbaxH/1+WjadYqxfr3y/W2oef5K4GjwvJr4zwGXkrHkcZt1ewZwQfD82VSG\nbbru32bd/tPA5m747mKoX9PfX+oVbvPDuj5oYB4Fbg223QG8Inj+i8GH8CRQBh4Ktv9r4GTwj+FB\n4I1p16XF+t1Lpfv6QFCXj9cc
+8bguEeAN6Rdlzjr10Pf3x8DXw/q9ze1/6NT6b09BjwM/HLadYmr\nbsBrgu0ngK8CL0+7Lq3Ur27f+wlmX3XDd9dO/Vr5/nQxpoiIJKabczIiIpJxCjIiIpIYBRkREUmM\ngoyIiCRGQUZERBKjICMiIolRkBHpkGD5+9cHz2+2ym3Hq6/tr13qv8HxXwj+WzSzncmWViQeuk5G\nJAVmNg283d2Pt3DsNuD33P2VsRdMJGbqyYhEEPQeHjazA2b2oJl91MzyZvay4OZND5rZ/zSzgWD/\nd9fccO09wbbbzOz3zOy1VFaj+LPg2Hywou0WM7vFzO6qOe/NZvb+4PmTweY/BF4cHPu7wU3BNtUc\n8wUzu7JTn43IchRkRKK7HNjn7s8HfgT8HpV18V4XbBsAfsfM1gGvdvfnufvVwH+reQ9397+gsiTH\nr7v7Fj97zxyAj1FZuqPqRir3+4CzK+XeCvxdcOwfAx8CfhPAzJ5DZV23r8dXbZHWKciIRPdtd/9y\n8Pwe4GXAP7j7N4NtHwF+iUoAWjCzD5nZvwcWGrzfeUuuu/s/A980sxea2Xrgue7+pRXK9THg35nZ\nhcCbqNwJViQTFGREYubuT1O5++BfUFni/rNNvsVHqfRgXgv8VYTzLVBZTPTVwOuAg02eTyQxCjIi\n0V1iZi8Knu+k0rCXzOzZwbbfAD5vZqupLH/+WeBtwPND3utJYKTBef6SSsC4icq9WKqqPZ8nqdzz\no9YE8AEqS+r/MHqVRJKlICMS3cPAzWb2IJX79LyPSi7kY8G2p4F9VILHp4Jt08DvhrzXAWBfNfFP\nzZ0JgyAxQ+UOhF+tOaa6z0ngaTM7YWZvDY55gMow3YfjqqxIHDSFWSQCMysCn3L3q9IuSxgzeyZw\nv7sve62NSKepJyMSXSZ/kZnZbwBfAt6VdllE6qknIyIiiVFPRkREEqMgIyIiiVGQERGRxCjIiIhI\nYhRkREQkMQoyIiKSmP8PCkUFvbQkwdsAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = speeches_df.plot(x='positivity', y='anger', kind='scatter')\n",
"sotu_df.plot(x='positivity', y='anger', kind='scatter', c='red', ax=ax)"
]
},
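{
"cell_type": "markdown",
"metadata": {},
"source": [
"A scatterplot is nice for eyeballing, but it can help to put numbers next to it. One possible approach, sketched here on made-up scores, is to stack the two tables with a label and compare averages. `toy_speeches`, `toy_sotu` and the `source` column are hypothetical names, and the values are invented, not results from the real data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up scores standing in for speeches_df and sotu_df\n",
"toy_speeches = pd.DataFrame({'positivity': [0.20, 0.24], 'anger': [0.08, 0.06]})\n",
"toy_sotu = pd.DataFrame({'positivity': [0.27, 0.28], 'anger': [0.02, 0.04]})\n",
"\n",
"# Tag each row with where it came from, then stack the two tables\n",
"combined = pd.concat([\n",
"    toy_speeches.assign(source='campaign'),\n",
"    toy_sotu.assign(source='sotu'),\n",
"], ignore_index=True)\n",
"\n",
"# Average positivity and anger per source\n",
"summary = combined.groupby('source')[['positivity', 'anger']].mean()\n",
"print(summary)\n",
"```\n",
"\n",
"Swapping in the real `speeches_df` and `sotu_df` should give one row per source, assuming both already have their `positivity` and `anger` columns."
]
},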
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}