I’ve always enjoyed watching movies, and for no particular reason I’ve loosely kept track of the ones I’ve seen. Since discovering Criterion, I’ve also found more international and older movies to watch.
So I thought it would be fun to do some analysis on the movies I’ve watched. The data is not exhaustive: I don’t remember when I started keeping track, and I’m sure I haven’t logged everything, but it’s the only personal data set I have, so that’s what I’m going to work with.
Some interesting results
I’ve spent at least 16,714 minutes, or about 278 hours, or more than 11 days, watching movies.
Only 13 of the 155 movies I’ve watched were made before the year 2000.
I’ve watched movies from 22 countries.
That’s pretty cool.
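The minutes-to-days conversion in the first result is easy to double-check. The snippet below is just a quick arithmetic sketch; the only input is the 16714-minute total quoted above, and the variable names are made up for illustration.

```python
total_minutes = 16714  # total runtime quoted in the results above

# divmod returns (quotient, remainder), so nothing is lost to rounding
hours, leftover_minutes = divmod(total_minutes, 60)
days, leftover_hours = divmod(hours, 24)

print(f"{total_minutes} minutes = {hours} hours and {leftover_minutes} minutes")
print(f"{hours} hours = {days} days and {leftover_hours} hours")
```

In other words, the total works out to 11 days, 14 hours, and 34 minutes, so both the 278 hours and the 11 days are rounded down.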
Process
Overview
My list of movies is logged in a text file. The first thing I did was check it for duplicates and clean up any typos I found.
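A duplicate check like this can be sketched in a few lines of Python. This is only an illustration, not the code actually used here: the helper names `normalize` and `find_duplicates` and the sample titles are assumptions, and it treats two titles as the same if they differ only in letter case or spacing.

```python
from collections import Counter


def normalize(title):
    # Lowercase and collapse whitespace so "Ikiru " and "ikiru" compare equal
    return " ".join(title.lower().split())


def find_duplicates(titles):
    # Count each normalized title, ignoring blank lines,
    # and return the ones that appear more than once (in first-seen order)
    counts = Counter(normalize(t) for t in titles if t.strip())
    return [title for title, n in counts.items() if n > 1]


# Made-up sample list, standing in for the lines of the text file
sample = ["Parasite", "Ikiru", "parasite ", "Seven Samurai"]
print(find_duplicates(sample))
```

Typos are harder to catch automatically, so those would still need a manual read-through.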
I then ran that list through a Python script that calls the TMDB API and saves the results in a CSV file.
A detailed run-through is provided below.
Code
import csv
import json
import requests

API_KEY = "ed72c8e5b3a4abaa084fa80131ad55af"
SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def jdump(obj):
    # Create a formatted JSON string from a Python object
    return json.dumps(obj, sort_keys=True, indent=4)


def write(movieName, id, original_language, popularity, overview, title, vote_average, vote_count):
    # Append one row to the output csv (the csv module expects newline='')
    with open('new-content.csv', 'a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow([movieName, id, original_language, popularity, overview, title, vote_average, vote_count])


with open('movies.csv', 'r', newline='') as file:  # Open the movie list csv
    reader = csv.reader(file)
    for row in reader:  # For each row
        if not row:  # Skip blank lines
            continue
        movieName = str(row[0])
        # Do a GET request to the TMDB API; passing params lets requests URL-encode the movie name
        params = {
            "api_key": API_KEY,
            "language": "en-US",
            "query": movieName,
            "page": 1,
            "include_adult": "false",
        }
        response = requests.get(SEARCH_URL, params=params)
        if response.status_code == 200:  # If the call is successful
            try:
                # Take the first search result and save its fields as JSON strings
                result = response.json()["results"][0]
                id = jdump(result["id"])
                original_language = jdump(result["original_language"])
                popularity = jdump(result["popularity"])
                overview = jdump(result["overview"])
                title = jdump(result["title"])
                vote_average = jdump(result["vote_average"])
                vote_count = jdump(result["vote_count"])
                # And write them as a row in a new csv file
                write(movieName, id, original_language, popularity, overview, title, vote_average, vote_count)
                print(movieName + " success")  # A success message for me
            except IndexError:
                # If there's a typo in the movie name I provided, or my movie cannot be found by TMDB,
                # just fill the csv row with 0s
                write(movieName, 0, 0, 0, 0, 0, 0, 0)
        else:  # Print what happened
            print(response.status_code, response.text)
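
Since the script fills a row with 0s whenever TMDB returns no result, those rows are worth collecting afterwards so the names can be fixed and retried. The following is a minimal sketch of that follow-up step, not part of the original script: `find_unmatched` is a hypothetical helper, and it assumes the column order used by `write` above (name first, TMDB id second).

```python
import csv


def find_unmatched(csv_path):
    """Return the movie names whose row was filled with 0s."""
    unmatched = []
    with open(csv_path, newline="") as file:
        for row in csv.reader(file):
            # Column 0 is the name from the input list, column 1 is the TMDB id;
            # the placeholder 0 is stored in the csv as the string "0"
            if len(row) >= 2 and row[1] == "0":
                unmatched.append(row[0])
    return unmatched
```

Calling `find_unmatched("new-content.csv")` after a run would give the list of titles to check for typos.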