Analysis of movies watched so far (2020)

December 20, 2020

I’ve always enjoyed watching movies, and for no particular reason I’ve kept a rough list of the ones I’ve seen. Since discovering Criterion, I’ve also found more international and older movies to watch.

So, I thought it would be fun to do some analysis on the movies I’ve watched. The data is not exhaustive: I don’t remember when I started keeping track, and I’m sure I haven’t logged everything, but it’s the only personal data set I have, so that’s what I’m going to work with.

Some interesting results

I’ve spent at least 16714 minutes, or 278 hours, or 11 days watching movies.

Only 13 of the 155 movies I’ve watched were made before the year 2000.

I’ve watched movies from 22 countries.

Map of countries whose movies I've seen at least once

That’s pretty cool.
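The viewing-time figure above is just a unit conversion from the total runtime in minutes. As a quick sanity check, here is a minimal sketch of that arithmetic (the variable names are only for illustration):

```python
# Total runtime in minutes, as reported above
total_minutes = 16714

# Integer division drops the leftover minutes and hours
hours = total_minutes // 60
days = hours // 24

print(f"{total_minutes} minutes = {hours} hours = {days} days")
```

Both results are rounded down, which is why the totals are stated as "at least".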



My list of movies is logged in a text file. The first thing I did was check it for duplicates and clean up any typos I found.
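A duplicate check like this can also be automated. The following is only a rough sketch, assuming one title per line; the `find_duplicates` helper and the sample titles are made up for illustration and are not taken from my actual list:

```python
from collections import Counter

def find_duplicates(titles):
    """Return normalized titles that appear more than once, in first-seen order."""
    # Ignore blank lines, surrounding whitespace, and letter case when comparing
    normalized = [t.strip().lower() for t in titles if t.strip()]
    counts = Counter(normalized)
    return [title for title, n in counts.items() if n > 1]

# Illustrative sample; a real run would read the lines of the text file instead
sample = ["Parasite", "Stalker", "parasite ", "Ran", ""]
print(find_duplicates(sample))
```

Normalizing case and whitespace catches entries that differ only in how they were typed, but it will not catch real typos, so those still need a manual pass.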

I then ran that list of movies through a Python script that calls the TMDB API and saves the results in a CSV file.
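One thing to watch when calling a search API is that movie titles can contain spaces or characters such as `&`, which need to be percent-encoded in the URL. Here is a minimal sketch of building such a URL with the standard library. The endpoint is assumed to be TMDB's v3 movie search, and `API_KEY` and `build_search_url` are hypothetical names used only for this example:

```python
from urllib.parse import urlencode

# Assumption: TMDB's v3 movie-search endpoint; API_KEY is a placeholder value
BASE_URL = "https://api.themoviedb.org/3/search/movie"
API_KEY = "YOUR_API_KEY"

def build_search_url(movie_name):
    """Return a search URL with the movie title safely percent-encoded."""
    params = {
        "api_key": API_KEY,
        "query": movie_name,
        "page": 1,
        "include_adult": "false",
    }
    # urlencode escapes spaces, ampersands and other reserved characters
    return BASE_URL + "?" + urlencode(params)

print(build_search_url("Fast & Furious"))
```

Without the encoding step, the `&` in a title would be read as the start of a new query parameter and the search would be cut short.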

A detailed run-through is provided below.


import csv
import requests
import json
import time

def jdump(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    return text

def write(movieName, id, original_language, popularity, overview, title, vote_average, vote_count):
    # Append one row to the output CSV; newline='' avoids blank lines between rows on Windows
    with open('new-content.csv', 'a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow([movieName, id, original_language, popularity, overview, title, vote_average, vote_count])

with open('movies.csv', 'r') as file: # Open the CSV file containing the list of movies
    reader = csv.reader(file)
    for row in reader: # For each row
        requesturl = "" + str(row[0]) + "&page=1&include_adult=false" # Build the TMDB API search URL for this movie
        response = requests.get(requesturl) # Do a GET request to the TMDB API
        if (response.status_code == 200): # If the call is successful
            try:
                # Take the first search result and save each field as a JSON string
                result = response.json()["results"][0]
                id = jdump(result["id"])
                original_language = jdump(result["original_language"])
                popularity = jdump(result["popularity"])
                overview = jdump(result["overview"])
                title = jdump(result["title"])
                vote_average = jdump(result["vote_average"])
                vote_count = jdump(result["vote_count"])
                write(str(row[0]), id, original_language, popularity, overview, title, vote_average, vote_count) # And write them as a row in a new csv file
                print(str(row[0]) + " success") # A success message for me
            except IndexError: # If there's a typo in the movie name I provided, or my movie cannot be found by TMDB, just fill the csv row with 0s
                write(str(row[0]), 0, 0, 0, 0, 0, 0, 0)
        else: # print what happened