
Replacing with .str.replace and .replace

import pandas as pd

df = pd.DataFrame([
    { 'original': 'Potatoes', 'sentiment': -1 },
    { 'original': 'I hate bananas', 'sentiment': -1 },
    { 'original': 'I love potatoes', 'sentiment': 1 },
    { 'original': 'Potatoes are my favorite', 'sentiment': 1 },
    { 'original': 'I ate potatoes', 'sentiment': 0 }
])

Using .replace to replace exact values

When you use .replace, you're matching the exact value (aka the entire cell).

df['edited'] = df.sentiment.replace(-1, "negative")
df
original sentiment edited
0 Potatoes -1 negative
1 I hate bananas -1 negative
2 I love potatoes 1 1
3 Potatoes are my favorite 1 1
4 I ate potatoes 0 0

Using .replace to replace multiple exact values

You can also ask .replace to replace multiple exact values by passing it a dictionary.

df['edited'] = df.sentiment.replace({
    -1: "negative",
    0: "neutral",
    1: "positive"
})
df
original sentiment edited
0 Potatoes -1 negative
1 I hate bananas -1 negative
2 I love potatoes 1 positive
3 Potatoes are my favorite 1 positive
4 I ate potatoes 0 neutral
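One thing to watch with the dictionary form: any value that is not a key in the dictionary is left as it is. The sketch below uses a standalone `sentiment` Series (an illustrative stand-in for the column above) and deliberately leaves `0` out of the dictionary.

```python
import pandas as pd

# Stand-in for df.sentiment from the example above
sentiment = pd.Series([-1, -1, 1, 1, 0])

# Only -1 and 1 are listed, so the 0 in the last row is not replaced
labels = sentiment.replace({-1: "negative", 1: "positive"})
print(labels.tolist())
```

The result mixes strings and a leftover number in one column, which is usually a sign that the dictionary is missing a key.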

Comparing .replace and .str.replace

Both .replace and .str.replace replace things in your data. The difference is that .replace looks at the entire cell, while .str.replace looks for matches inside of the cell.

Let's see some examples.

df
original sentiment edited
0 Potatoes -1 negative
1 I hate bananas -1 negative
2 I love potatoes 1 positive
3 Potatoes are my favorite 1 positive
4 I ate potatoes 0 neutral
df['edited'] = df.original.replace("Potatoes", "Chocolates")
df
original sentiment edited
0 Potatoes -1 Chocolates
1 I hate bananas -1 I hate bananas
2 I love potatoes 1 I love potatoes
3 Potatoes are my favorite 1 Potatoes are my favorite
4 I ate potatoes 0 I ate potatoes

.replace will only replace "Potatoes" if it finds an exact match. Notice how "Potatoes are my favorite" is untouched, but the first row changed from "Potatoes" to "Chocolates".

df['edited'] = df.original.str.replace("Potatoes", "Chocolate")
df
original sentiment edited
0 Potatoes -1 Chocolate
1 I hate bananas -1 I hate bananas
2 I love potatoes 1 I love potatoes
3 Potatoes are my favorite 1 Chocolate are my favorite
4 I ate potatoes 0 I ate potatoes

.str.replace will replace "Potatoes" even inside of a sentence. Notice how row 3 is now "Chocolate are my favorite".

Making .str.replace not case sensitive

By default, both .replace and .str.replace are case sensitive. They need an exact match - uppercase and lowercase are treated differently.

df['edited'] = df.original.str.replace("Potatoes", "Chocolate")
df
original sentiment edited
0 Potatoes -1 Chocolate
1 I hate bananas -1 I hate bananas
2 I love potatoes 1 I love potatoes
3 Potatoes are my favorite 1 Chocolate are my favorite
4 I ate potatoes 0 I ate potatoes

Notice how "I love potatoes" is still about potatoes and not chocolate. If you want pandas to ignore case while replacing strings, use case=False.

df['edited'] = df.original.str.replace("Potatoes", "Chocolate", case=False)
df
original sentiment edited
0 Potatoes -1 Chocolate
1 I hate bananas -1 I hate bananas
2 I love potatoes 1 I love Chocolate
3 Potatoes are my favorite 1 Chocolate are my favorite
4 I ate potatoes 0 I ate Chocolate

You cannot make .replace case-insensitive (unless you work with regular expressions).
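If you do want to go the regular expression route, here is a minimal sketch. It assumes a small standalone Series `s` rather than the DataFrame above, and the pattern is just one way to write it: `(?i)` tells the regular expression to ignore case, while `^` and `$` anchor it to the whole cell so that .replace keeps its exact-match behavior.

```python
import pandas as pd

s = pd.Series(["Potatoes", "I love potatoes", "potatoes"])

# regex=True makes .replace treat the first argument as a regular expression.
# (?i) ignores case; ^ and $ mean only whole-cell matches are replaced.
result = s.replace(r"(?i)^potatoes$", "Chocolates", regex=True)
print(result.tolist())
```

Without the `^` and `$` anchors, the pattern would also match "potatoes" inside longer cells, which is closer to what .str.replace does.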

Removing parts of strings

If you want to remove something from a cell, use .str.replace to replace it with an empty string "".

df['edited'] = df.original.str.replace("I ", "")
df
original sentiment edited
0 Potatoes -1 Potatoes
1 I hate bananas -1 hate bananas
2 I love potatoes 1 love potatoes
3 Potatoes are my favorite 1 Potatoes are my favorite
4 I ate potatoes 0 ate potatoes

This is really useful for data cleaning, especially if you don't know regular expressions.

dirty = pd.DataFrame([
    {'phrase': 'Please call 555-1212 for assistance' },
    {'phrase': 'Please call 332-3456 for assistance' },
    {'phrase': 'Please call 123-4333 for assistance' },
])
dirty
phrase
0 Please call 555-1212 for assistance
1 Please call 332-3456 for assistance
2 Please call 123-4333 for assistance
dirty.phrase = dirty.phrase.str.replace("Please call ", "").str.replace(" for assistance", "")
dirty
phrase
0 555-1212
1 332-3456
2 123-4333
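One caveat when cleaning with plain strings: characters like `$` and `.` have special meanings in regular expressions, and pandas versions before 2.0 treated the .str.replace pattern as a regular expression by default. The sketch below uses a made-up `prices` Series and passes `regex=False` explicitly, so the pattern is matched literally whichever version you are running.

```python
import pandas as pd

# Hypothetical data for illustration
prices = pd.Series(["$1.50", "$20.00", "$3.25"])

# regex=False means "$" is treated as a plain dollar sign,
# not as the regular expression symbol for end-of-string
cleaned = prices.str.replace("$", "", regex=False)
print(cleaned.tolist())
```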

Don't confuse .str.replace and .replace

Even though they're very similar in many situations, sometimes you'll run into errors because you're treating one like the other.

Only .replace can use dictionaries

When you use .replace, you can replace multiple values at once.

df['edited'] = df.sentiment.replace({
    -1: "negative",
    0: "neutral",
    1: "positive"
})
df
original sentiment edited
0 Potatoes -1 negative
1 I hate bananas -1 negative
2 I love potatoes 1 positive
3 Potatoes are my favorite 1 positive
4 I ate potatoes 0 neutral

If you try to do that with .str.replace in the pandas version used here, you get an error: replace() missing 1 required positional argument: 'repl'. This means "You didn't tell me what to replace with what," even though it feels like you tried.

# This will not work
df['edited'] = df.original.str.replace({
    "potatoes": "chocolate",
    "love": "hate"
})
df
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/l0/h__2c37508b8pl19zp232ycr0000gn/T/ipykernel_44881/629204424.py in <module>
----> 1 df['edited'] = df.original.str.replace({
      2     "potatoes": "chocolate",
      3     "love": "hate"
      4 })
      5 df

~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/core/strings/accessor.py in wrapper(self, *args, **kwargs)
    114                 )
    115                 raise TypeError(msg)
--> 116             return func(self, *args, **kwargs)
    117 
    118         wrapper.__name__ = func_name

TypeError: replace() missing 1 required positional argument: 'repl'

The easiest fix is to just do your replacing one replacement at a time. Make sure each step starts from the result of the previous step, otherwise the second replacement overwrites the first.

df['edited'] = df.original.str.replace("potatoes", "chocolate")
df['edited'] = df.edited.str.replace("love", "hate")
df
original sentiment edited
0 Potatoes -1 Potatoes
1 I hate bananas -1 I hate bananas
2 I love potatoes 1 I hate chocolate
3 Potatoes are my favorite 1 Potatoes are my favorite
4 I ate potatoes 0 I ate chocolate
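If you have a lot of replacements and would rather keep them in a dictionary, one option is to loop over the dictionary and call .str.replace once per pair, feeding each result into the next step. This is only a sketch: `replacements` and `edited` are illustrative names, and the DataFrame is a cut-down copy of the one above.

```python
import pandas as pd

df = pd.DataFrame({
    "original": ["I love potatoes", "Potatoes are my favorite", "I ate potatoes"]
})

# Each key is the text to find, each value is what to put in its place
replacements = {"potatoes": "chocolate", "love": "hate"}

edited = df["original"]
for old, new in replacements.items():
    # Reuse `edited` every time so earlier replacements are not lost
    edited = edited.str.replace(old, new, regex=False)

df["edited"] = edited
print(df["edited"].tolist())
```

The replacements are still case sensitive, so "Potatoes" with a capital P is left alone.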
