{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NRC Emotion Lexicon\n",
"\n",
"This is the [NRC Emotion Lexicon](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm): \"The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing.\"\n",
"\n",
"I don't trust it, but everyone uses it."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" word emotion association\n",
"0 aback anger 0\n",
"1 aback anticipation 0\n",
"2 aback disgust 0\n",
"3 aback fear 0\n",
"4 aback joy 0\n",
"5 aback negative 0\n",
"6 aback positive 0\n",
"7 aback sadness 0\n",
"8 aback surprise 0\n",
"9 aback trust 0\n",
"10 abacus anger 0\n",
"11 abacus anticipation 0"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filepath = \"NRC-Emotion-Lexicon-v0.92/NRC-emotion-lexicon-wordlevel-alphabetized-v0.92.txt\"\n",
"emolex_df = pd.read_csv(filepath, names=[\"word\", \"emotion\", \"association\"], skiprows=45, sep='\\t')\n",
"emolex_df.head(12)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Seems kind of simple. A column for a word, a column for an emotion, and a column for whether they're associated or not. You see \"aback aback aback aback\" because there's a row for every word-emotion pair."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What emotions are covered?\n",
"\n",
"Let's look at the 'emotion' column. What can we talk about?"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative',\n",
" 'positive', 'sadness', 'surprise', 'trust'], dtype=object)"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_df.emotion.unique()"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"fear 14182\n",
"anger 14182\n",
"trust 14182\n",
"anticipation 14182\n",
"sadness 14182\n",
"disgust 14182\n",
"surprise 14182\n",
"joy 14182\n",
"positive 14182\n",
"negative 14182\n",
"Name: emotion, dtype: int64"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_df.emotion.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How many words does each emotion have?\n",
"\n",
"Those counts don't mean each emotion has 14182 words associated with it, unfortunately! Every word gets a row for every emotion, so each emotion shows up 14182 times. `1` means \"is associated\" and `0` means \"is not associated.\"\n",
"\n",
"We're only going to care about \"is associated.\""
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"negative 3324\n",
"positive 2312\n",
"fear 1476\n",
"anger 1247\n",
"trust 1231\n",
"sadness 1191\n",
"disgust 1058\n",
"anticipation 839\n",
"joy 689\n",
"surprise 534\n",
"Name: emotion, dtype: int64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_df[emolex_df.association == 1].emotion.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In theory things could be *kind of* angry or *kind of* joyous, but this lexicon doesn't work like that: every association is either 0 or 1. If you want to spend a few hundred dollars on Mechanical Turk, though, *your own personal version can.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What if I just want the angry words?"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"30 abandoned\n",
"40 abandonment\n",
"170 abhor\n",
"180 abhorrent\n",
"270 abolish\n",
"300 abomination\n",
"630 abuse\n",
"1120 accursed\n",
"1130 accusation\n",
"1150 accused\n",
"1160 accuser\n",
"1170 accusing\n",
"1470 actionable\n",
"1650 adder\n",
"2390 adversary\n",
"2400 adverse\n",
"2410 adversity\n",
"2500 advocacy\n",
"2840 affront\n",
"2920 aftermath\n",
"3030 aggravated\n",
"3040 aggravating\n",
"3050 aggravation\n",
"3080 aggression\n",
"3090 aggressive\n",
"3100 aggressor\n",
"3140 agitated\n",
"3150 agitation\n",
"3190 agony\n",
"3570 alcoholism\n",
" ... \n",
"138470 warlike\n",
"138530 warp\n",
"138600 warrior\n",
"138680 wasted\n",
"138690 wasteful\n",
"139330 wench\n",
"139550 whip\n",
"139950 willful\n",
"140020 wimpy\n",
"140030 wince\n",
"140220 wireless\n",
"140290 witch\n",
"140300 witchcraft\n",
"140610 wop\n",
"140640 words\n",
"140870 worthless\n",
"140900 wound\n",
"140920 wrangling\n",
"140960 wrath\n",
"140970 wreak\n",
"140990 wreck\n",
"141000 wrecked\n",
"141060 wretch\n",
"141090 wring\n",
"141210 wrongdoing\n",
"141220 wrongful\n",
"141230 wrongly\n",
"141470 yell\n",
"141500 yelp\n",
"141640 youth\n",
"Name: word, Length: 1247, dtype: object"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_df[(emolex_df.association == 1) & (emolex_df.emotion == 'anger')].word"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reshaping\n",
"\n",
"You can also reshape the data to look at it in a slightly different way: one row per word, with a column for each emotion."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"emotion word anger anticipation disgust fear joy negative \\\n",
"0 aback 0 0 0 0 0 0 \n",
"1 abacus 0 0 0 0 0 0 \n",
"2 abandon 0 0 0 1 0 1 \n",
"3 abandoned 1 0 0 1 0 1 \n",
"4 abandonment 1 0 0 1 0 1 \n",
"\n",
"emotion positive sadness surprise trust \n",
"0 0 0 0 0 \n",
"1 0 0 0 1 \n",
"2 0 1 0 0 \n",
"3 0 1 0 0 \n",
"4 0 1 1 0 "
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_words = emolex_df.pivot(index='word', columns='emotion', values='association').reset_index()\n",
"emolex_words.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now pull out individual words..."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"emotion word anger anticipation disgust fear joy negative \\\n",
"2001 charitable 0 1 0 0 1 0 \n",
"\n",
"emotion positive sadness surprise trust \n",
"2001 1 0 0 1 "
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# If you didn't reset_index you could do this more easily\n",
"# by doing emolex_words.loc['charitable']\n",
"emolex_words[emolex_words.word == 'charitable']"
]
},
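{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comment in the cell above mentions an index-based lookup. Here's a minimal sketch of that alternative: skip `reset_index` so `word` stays as the index, then use `.loc`. The `toy_long` and `toy_words` names and the four-row table are made up for illustration (the values are copied from rows shown in this notebook) so the cell runs without the lexicon file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy stand-in for the long-format lexicon (hypothetical, not read from the file)\n",
"toy_long = pd.DataFrame({\n",
"    'word': ['abandon', 'abandon', 'charitable', 'charitable'],\n",
"    'emotion': ['anger', 'joy', 'anger', 'joy'],\n",
"    'association': [0, 0, 0, 1],\n",
"})\n",
"\n",
"# Pivot WITHOUT reset_index, so 'word' stays as the index\n",
"toy_words = toy_long.pivot(index='word', columns='emotion', values='association')\n",
"\n",
"# Look up one word by label; the result is a Series of its emotion flags\n",
"charitable = toy_words.loc['charitable']\n",
"charitable"
]
},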
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...or individual emotions..."
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"emotion word anger anticipation disgust fear joy negative \\\n",
"3 abandoned 1 0 0 1 0 1 \n",
"4 abandonment 1 0 0 1 0 1 \n",
"17 abhor 1 0 1 1 0 1 \n",
"18 abhorrent 1 0 1 1 0 1 \n",
"27 abolish 1 0 0 0 0 1 \n",
"\n",
"emotion positive sadness surprise trust \n",
"3 0 1 0 0 \n",
"4 0 1 1 0 \n",
"17 0 0 0 0 \n",
"18 0 0 0 0 \n",
"27 0 0 0 0 "
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_words[emolex_words.anger == 1].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...or multiple emotions!"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"emotion word anger anticipation disgust fear joy negative \\\n",
"61 abundance 0 1 1 0 1 1 \n",
"1018 balm 0 1 0 0 1 1 \n",
"1382 boisterous 1 1 0 0 1 1 \n",
"1916 celebrity 1 1 1 0 1 1 \n",
"2004 charmed 0 0 0 0 1 1 \n",
"\n",
"emotion positive sadness surprise trust \n",
"61 1 0 0 1 \n",
"1018 1 0 0 0 \n",
"1382 1 0 0 0 \n",
"1916 1 0 1 1 \n",
"2004 1 0 0 0 "
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emolex_words[(emolex_words.joy == 1) & (emolex_words.negative == 1)].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The useful part is going to be just getting words for a **single emotion.**"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3 abandoned\n",
"4 abandonment\n",
"17 abhor\n",
"18 abhorrent\n",
"27 abolish\n",
"30 abomination\n",
"63 abuse\n",
"112 accursed\n",
"113 accusation\n",
"115 accused\n",
"116 accuser\n",
"117 accusing\n",
"147 actionable\n",
"165 adder\n",
"239 adversary\n",
"240 adverse\n",
"241 adversity\n",
"250 advocacy\n",
"284 affront\n",
"292 aftermath\n",
"303 aggravated\n",
"304 aggravating\n",
"305 aggravation\n",
"308 aggression\n",
"309 aggressive\n",
"310 aggressor\n",
"314 agitated\n",
"315 agitation\n",
"319 agony\n",
"357 alcoholism\n",
" ... \n",
"13847 warlike\n",
"13853 warp\n",
"13860 warrior\n",
"13868 wasted\n",
"13869 wasteful\n",
"13933 wench\n",
"13955 whip\n",
"13995 willful\n",
"14002 wimpy\n",
"14003 wince\n",
"14022 wireless\n",
"14029 witch\n",
"14030 witchcraft\n",
"14061 wop\n",
"14064 words\n",
"14087 worthless\n",
"14090 wound\n",
"14092 wrangling\n",
"14096 wrath\n",
"14097 wreak\n",
"14099 wreck\n",
"14100 wrecked\n",
"14106 wretch\n",
"14109 wring\n",
"14121 wrongdoing\n",
"14122 wrongful\n",
"14123 wrongly\n",
"14147 yell\n",
"14150 yelp\n",
"14164 youth\n",
"Name: word, Length: 1247, dtype: object"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Angry words\n",
"emolex_words[emolex_words.anger == 1].word"
]
},
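{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to put a single-emotion word list to work is to count how many words in a piece of text appear on it. This is a rough sketch, not part of the lexicon or of pandas: the `count_emotion_words` helper, the lowercase letters-only tokenizing, and the short `toy_angry` list are all assumptions for illustration. With the real data you would probably pass `emolex_words[emolex_words.anger == 1].word` instead of the toy list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"def count_emotion_words(text, emotion_words):\n",
"    # Hypothetical helper: count tokens in `text` that are in `emotion_words`\n",
"    vocab = set(emotion_words)\n",
"    # Crude tokenizer (an assumption): lowercase, keep runs of letters only\n",
"    tokens = re.findall(r'[a-z]+', text.lower())\n",
"    return sum(1 for token in tokens if token in vocab)\n",
"\n",
"# Toy list: a few of the angry words shown above\n",
"toy_angry = ['abuse', 'wrath', 'yell', 'wreck']\n",
"n_angry = count_emotion_words(\"Don't yell. Wrath will wreck everything!\", toy_angry)\n",
"n_angry"
]
},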
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}