Functions

Functions are like tiny, self-contained programs – they allow you to execute a large block of code by just calling its name! And, we’ve already seen some functions at work: print(), len(), sum(), etc., so we have some idea of how they work.

Methods are also a thing: they are similar to functions, but they are “attached” to an object and look like my_list.sort() (remember that .sort() changes our original list in place, rather than returning a new list object).
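
To see the function/method difference in action, here's a small sketch comparing the built-in sorted() function with the list method .sort(). The list values are made up for illustration.

```python
# sorted() is a function: it returns a NEW list and leaves the original alone
numbers = [3, 1, 2]
new_list = sorted(numbers)
print(new_list)   # [1, 2, 3]
print(numbers)    # [3, 1, 2]

# .sort() is a method: it changes the original list in place and returns None
result = numbers.sort()
print(numbers)    # [1, 2, 3]
print(result)     # None
```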

Basic structure of a function

Functions take in inputs that we call parameters and return an output, using a return statement. Sample function below:

def function_name(parameter):
    some_variable = 0
    # a block of code where you do something with/to your parameter
    # a block of code where you do something with/to your parameter
    return some_variable
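
Here's the skeleton above filled in with a concrete example. total_length is a made-up function (not a built-in) that adds up the lengths of all the strings in a list, starting its running total at 0 just like some_variable = 0 does.

```python
def total_length(words):
    total = 0                      # start a running total, like some_variable = 0
    for word in words:             # do something with the parameter
        total = total + len(word)
    return total                   # send the result back out of the function

print(total_length(["cat", "horse"]))  # 8
```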

REALLY IMPORTANT POINTS:

  • You can pretty much name your parameter ANYTHING you want! Just as it doesn’t matter what you call item in our for loop structure (for item in my_list:), it doesn’t matter what you call parameter, as long as you understand what it’s supposed to be. But you probably don’t want to give it the same name as another variable that exists somewhere in your code: this will get confusing for all sentient and non-sentient beings involved!
  • You DO want to name your parameter something that reminds you what type of thing you want to pass as an input to your function.
    • For example: the sorted() function takes in a list (or another collection) of numbers or strings. When you are writing your own function, you will want to think about what you want your function to accept as an input (how do you want to design your function?). If you were writing the sorted() function, you might do something like: def sorted(some_list):
  • If you define a variable within the body of your function, DO NOT use that variable name anywhere outside of that function (a variable created inside a function only exists inside that function).
  • Somewhat similarly, if you have a variable name you’ve used earlier in your code, DO NOT overwrite that variable within your function: just come up with a new name!
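
A quick sketch of why a variable defined inside a function shouldn't be used outside it: the hypothetical add_shipping() function below creates with_shipping inside its body, and once the function finishes, that name doesn't exist anywhere else in the program.

```python
def add_shipping(price):
    with_shipping = price + 5   # with_shipping only exists inside add_shipping
    return with_shipping

total = add_shipping(100)
print(total)  # 105

try:
    print(with_shipping)        # outside the function, Python has never heard of it
except NameError:
    print("with_shipping isn't defined out here!")
```

The fix is to capture the returned value in a new variable (total, here) rather than reaching for the name used inside the function.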

How to write a function

To start off your function, you have to tell Python that you’re about to define a function using def function_name(parameter):. Pay special attention to syntax like the colon (:) at the end of that line. The code indented below that first line is the code that will run when you call the function (so, as always, pay special attention to indentation).

In your indented code, you will probably do something to or with the parameter you’ve passed your function.

To get things out of your function, you have to use return.
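
A quick sketch of why return matters: the two hypothetical functions below both compute a doubled number, but only the one that uses return hands the value back to the code that called it. A function with no return statement gives back None.

```python
def double_with_return(number):
    return number * 2

def double_with_print(number):
    print(number * 2)   # shows the value on screen, but doesn't hand it back

a = double_with_return(5)
b = double_with_print(5)   # prints 10
print(a)  # 10
print(b)  # None
```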

Sample double() function

def double(number):
    bigger = number * 2
    return bigger

  • Note that the name number is not significant! We could have called it blahblahblah if we wanted, but we called it number just to make our code easier to read and understand.
  • Make sure that each function you define has a unique name.
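
To check both points, here's a sketch with the double() function from above next to a copy whose parameter is called blahblahblah. They behave identically. The copy gets its own name, double_again (a made-up name), because defining a second function called double would silently replace the first one.

```python
def double(number):
    bigger = number * 2
    return bigger

def double_again(blahblahblah):   # same logic, different (silly) parameter name
    bigger = blahblahblah * 2
    return bigger

print(double(3))        # 6
print(double_again(3))  # 6
```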

Debugging your function

Do you remember our #1 favorite hobby of all time? It’s PRINTING. But we have to get a little tricky with our functions.

First, it’s always, always, always a good idea to write code incrementally. Build one line, test it, then add the next line.

For functions, you want to write both a function definition (as seen above in our sample double() function) AND a call to that function. In other words, the only way you can check to see if a function works is to USE IT. For our double() function, we could use something like print(double(3)) to test our function.

If within the body of your function you are saving something to a variable, print that variable to make sure it actually holds what you think it holds.
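
Putting those debugging tips together, here's a sketch of a hypothetical average() function with temporary print() calls inside its body, plus a call to the function so the prints actually run. Once you trust the function, you'd delete the debugging prints.

```python
def average(numbers):
    total = sum(numbers)
    print("total is", total)      # debugging print: does total hold what we expect?
    count = len(numbers)
    print("count is", count)      # debugging print
    return total / count

# The call is what actually runs the function (and its debugging prints)
print(average([2, 4, 6]))
```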

Cron Jobs

Cron is a tool that allows us to automate tasks on our computer. By editing our crontab file, we can tell our computer how frequently we want it to run some command.

For example, in class, we edited our crontab files (in Terminal/Babun, run crontab -e to begin editing your crontab file) to tell our computers to announce every new minute. The code looked like this: */1 * * * * say "it's a new minute". The */1 * * * * portion of this code looks pretty weird, but this is what tells your computer how frequently it should run the command that follows. The five fields are, in order: minute, hour, day of month, month, and day of week, and * means “every”. Here’s more info about the format.
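
To make the five fields less mysterious, here's a simplified Python sketch of checking a schedule line against a moment in time. It's an illustration, not how cron itself is written: field_matches and schedule_matches are hypothetical helpers, and the sketch only handles *, */n, lists like 6,7, and ranges like 1-5. It skips month/day names, treats only 0 as Sunday (real cron also accepts 7), and ignores cron's special rule that when both day fields are restricted, either one matching is enough.

```python
def field_matches(field, value, lowest):
    """Return True if one cron field (e.g. '*', '*/2', '6,7', '1-5') matches value.

    lowest is the smallest allowed value for that field (0 for minutes, 1 for months).
    """
    for part in field.split(","):           # '6,7' means 6 OR 7
        if part == "*":                      # '*' means every value
            return True
        if part.startswith("*/"):            # '*/n' means every n-th value
            step = int(part[2:])
            if (value - lowest) % step == 0:
                return True
        elif "-" in part:                    # '1-5' means 1 through 5
            start, end = part.split("-")
            if int(start) <= value <= int(end):
                return True
        elif int(part) == value:             # a plain number
            return True
    return False


def schedule_matches(schedule, minute, hour, day, month, weekday):
    """Check all five fields: minute, hour, day of month, month, day of week (0 = Sunday)."""
    fields = schedule.split()
    values = [minute, hour, day, month, weekday]
    lowest = [0, 0, 1, 1, 0]
    return all(field_matches(f, v, lo) for f, v, lo in zip(fields, values, lowest))


# 9:30 on Monday, June 20 (weekday 1 = Monday)
print(schedule_matches("*/1 * * 6,7 1,3", 30, 9, 20, 6, 1))  # True
# 12:15 is an odd minute, so "every other minute during the 12 o'clock hour" skips it
print(schedule_matches("*/2 12 * * *", 15, 12, 1, 1, 5))     # False
```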

An aside: You can further customize how you want the say command to sound.

We can also do other cool, useful things (besides making our computers talk):

How to set up an automated task

  • We’re going to use a tool called Cron to schedule automated tasks
  • We can use the text editors called vi or nano to edit our crontab file.

Let’s get ready to edit our crontab file

  • export EDITOR=nano: sets your default text editor to nano, so crontab -e opens your crontab file in nano
  • crontab -e: lets us edit our configurations for tasks we want our computer to run automatically.

Possible lines we could add to our crontab file

  • */1 * * * * say "it's a new minute": for every minute divisible by 1, say it’s a new minute
  • */1 * * 6,7 1,3 say "it's a new minute": every minute on Mondays and Wednesdays in June and July, say it’s a new minute
  • Use the date command:
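    • To get something like “Monday June 20, 11:57 AM”: date +"%A %B %d, %I:%M %p"
    • echo `date`: prints the output of the date command (the backticks tell the shell to run date and put its output in that spot)
    • Every other minute between 12 and 1 PM: */2 12 * * * say `date +"\%A \%B \%d, \%I:\%M \%p"` (inside a crontab file, % has a special meaning, so each one needs a backslash in front of it)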
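
If you want to experiment with those date format codes from Python instead of the Terminal, the strftime() method in Python's datetime module understands the same codes used here (%A, %B, %d, %I, %M, %p). This sketch uses a fixed, made-up moment so the output is predictable; the day and month names assume the default English (C) locale.

```python
from datetime import datetime

moment = datetime(2016, 6, 20, 11, 57)            # a fixed example time
print(moment.strftime("%A %B %d, %I:%M %p"))      # Monday June 20, 11:57 AM

# Same format applied to the current time, like running date in the Terminal
print(datetime.now().strftime("%A %B %d, %I:%M %p"))
```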

IMPORTANT: Use # to comment stuff out in crontab

When a cron job runs and there’s an error, cron will send you an email about it. But not to your Gmail or something: it’s mail delivered to your computer account… you can pretty much ignore it.

Practical application: regularly scraping the NYTimes homepage

We mayyy want to do something besides periodically making our computers talk. What if we wanted to periodically download the NYTimes homepage? We could use a cronjob to accomplish that task!

There are a couple of ways to handle this task:

  • We could use something like: curl http://www.nytimes.com > ~/Desktop/nytimes.txt
    • This will grab the NYTimes homepage and save it to nytimes.txt on your Desktop.
    • But each time we run this command, it’ll overwrite the previous nytimes.txt file.
    • Ideally we’d want to put the date and time in our filename so that each time, it produces a unique filename.
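    • This downloads and saves the NYTimes homepage every 10 minutes with a datestamped filename prefixed with nytimes: */10 * * * * curl http://www.nytimes.com > ~/Desktop/nytimes-`date +"\%Y-\%m-\%d-\%H-\%M"`.txt (inside crontab each % needs a backslash, and %H uses the 24-hour clock so morning and afternoon runs don’t share a filename)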
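  • We can also just tell crontab to run a Python scraper script every 10 minutes! */10 * * * * python3 nytimes_scraper.py
    • Cron runs jobs from your home directory with a minimal PATH, so unless the script sits in your home folder, it’s safer to give full paths to both python3 (run which python3 to find it) and the script.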
    • Note that while we can write our scraper in a Jupyter notebook, we will want to download it as a .py file in order to run the scraper from the command line.
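
The unique-filename idea can be sketched in Python too. datestamped_filename is a hypothetical helper, and the two times are invented to show why the 24-hour %H code (which the scraper below also uses) is safer than the 12-hour %I code for filenames.

```python
from datetime import datetime

def datestamped_filename(prefix, when):
    # %H is the 24-hour clock, so 1 AM and 1 PM get different names
    return prefix + when.strftime("%Y-%m-%d-%H-%M") + ".txt"

morning = datetime(2016, 6, 20, 1, 10)
afternoon = datetime(2016, 6, 20, 13, 10)
print(datestamped_filename("nytimes-", morning))    # nytimes-2016-06-20-01-10.txt
print(datestamped_filename("nytimes-", afternoon))  # nytimes-2016-06-20-13-10.txt

# With %I (12-hour clock) both times produce "01", so the second
# download would overwrite the first
print(morning.strftime("%I"), afternoon.strftime("%I"))  # 01 01
```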

This is the scraper we wrote in class:

import requests
from bs4 import BeautifulSoup

# Grab the NYT homepage
response = requests.get("http://www.nytimes.com")

# Feed it into BeautifulSoup
doc = BeautifulSoup(response.text, 'html.parser')

stories = doc.find_all("article", { 'class': 'story' })

# Scrape to find headlines and bylines; save to a list of dictionaries
all_stories = []
# Grab their headlines and bylines
for story in stories:
    # Grab the first h2 with the class story-heading inside the story
    headline = story.find('h2', {'class': 'story-heading'})
    # If a headline exists, then process the rest!
    if headline:
        # They're COVERED in whitespace
        headline_text = headline.text.strip()
        # Make a dictionary with the headline
        this_story = { 'headline': headline_text }
        byline = story.find('p', {'class': 'byline'})
        # Not all of them have a byline
        if byline:
            byline_text = byline.text.strip()
            this_story['byline'] = byline_text
        all_stories.append(this_story)

import pandas as pd
# Save our list of dictionaries as a DataFrame
stories_df = pd.DataFrame(all_stories)

# We use this to save the current time to 'datestring'
import time
datestring = time.strftime("%Y-%m-%d-%H-%M")

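# We create a string that includes our timestamp
# We save the scraped data into a .csv with a unique, datestamped file name!
# Note: "../" is relative to the folder the script is run from
# (under cron, that's usually your home folder, so you may want a full path here)
filename = "../nyt-data-" + datestring + ".csv"
stories_df.to_csv(filename, index=False)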
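
Because not every story has a byline, some dictionaries in all_stories are missing the 'byline' key. As a pandas-free illustration (not what we used in class), here's how Python's built-in csv.DictWriter handles that case; the two stories are invented, and io.StringIO stands in for a real file so the sketch is self-contained. Like pandas' to_csv(), this leaves the missing byline as an empty cell.

```python
import csv
import io

# Hypothetical scraped results: the second story has no byline
all_stories = [
    {"headline": "First headline", "byline": "By A. Reporter"},
    {"headline": "Second headline"},
]

output = io.StringIO()   # pretend file, so nothing is written to disk
# restval="" fills in any missing keys with an empty string
writer = csv.DictWriter(output, fieldnames=["headline", "byline"], restval="")
writer.writeheader()
writer.writerows(all_stories)
print(output.getvalue())
```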