How To Use Google PageSpeed Insights API With Python

In this post, we’ll create a simple Python script that demonstrates how to use the Google PageSpeed Insights API. In case you’re not familiar with it yet, it’s a powerful SEO tool for analyzing the performance of webpages.

Furthermore, we can store the audit data and check how our work impacts the performance of our pages in bulk. We’re only going to touch on the Core Web Vitals metrics here, but the API returns many more metrics we could use to optimize our pages.

Making requests with PageSpeed Insights API

In order to use this API, we only need to make one request per audit, choosing either a mobile or a desktop strategy. Once we make the request, the API responds with measurements and scores for all available metrics.

However, if your page changes regularly, you should monitor its performance and call the API after each change.

Here is the URL template to use whenever making a request.

https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={URL}&strategy={STRATEGY}&locale=en&key={PAGESPEED_KEY}

As you can see, there are a number of query parameters we need to set to get the information we’re looking for. First, and most important, we need to define the URL of the webpage we wish to audit. Secondly, we need to define the strategy, which is a choice between mobile and desktop.

And finally, I encourage you to define the API key, which you can get from the Google Cloud Platform console or by following the instructions on web.dev. Mind you, this step is not strictly necessary to make it work, but it’s preferable, since you’ll avoid 403 response errors.
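As an aside, the query string is safer to assemble with urllib.parse.urlencode, which percent-encodes the audited URL (important when that URL contains its own query string). Here is a small sketch; build_api_url is a helper name of my own, not part of the API:

```python
from urllib.parse import urlencode

def build_api_url(url, strategy, key=None):
    """Assemble a PageSpeed Insights request URL with percent-encoded parameters."""
    base = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
    params = {'url': url, 'strategy': strategy, 'locale': 'en'}
    if key:  # the key is optional, but including it helps avoid 403 errors
        params['key'] = key
    return f'{base}?{urlencode(params)}'
```

For example, `build_api_url('https://example.com/?page=2', 'mobile')` encodes the nested `?page=2`, so it doesn’t get mixed up with the outer query string.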

Alright, let’s dive into the code. The first thing we need to do is import all the necessary modules for this demonstration.

import urllib.request
import json
import os
import argparse
from dotenv import load_dotenv

Next, we’ll create a .env file inside the project folder and add the PageSpeed Insights API key to it. In case you’re going to upload your project to a public GitHub repository, I recommend you also add this file to your .gitignore.
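The file itself is just key=value lines; assuming the variable name PAGESPEED_API_KEY that the script reads below, it would look like this (with your real key in place of the placeholder):

```shell
# .env — keep this file out of version control
PAGESPEED_API_KEY=your-api-key-here
```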

After we take care of that, we need to load the .env file inside our script, which lets us read the API key value without revealing it inside the script.

load_dotenv()
PAGESPEED_KEY = os.getenv('PAGESPEED_API_KEY')

Now, we’ll define a function that will make the request to the API and return the data in JSON format.

def get_data(url, strategy):
    api_url = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}&strategy={strategy}&locale=en&key={PAGESPEED_KEY}"

    response = urllib.request.urlopen(api_url)
    data = json.loads(response.read())

    return data
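One caveat: urlopen raises an exception on HTTP error responses (for example a 403 when the key is missing or the quota is exceeded), which would crash the script. Below is a hedged variant with basic error handling; safe_get_data is my own name, not part of the tutorial’s code:

```python
import json
import urllib.error
import urllib.parse
import urllib.request

def safe_get_data(url, strategy, key=None):
    """Like get_data, but returns None instead of raising on request failures."""
    base = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
    params = {'url': url, 'strategy': strategy, 'locale': 'en'}
    if key:  # omit the key parameter entirely when we don't have one
        params['key'] = key
    api_url = f'{base}?{urllib.parse.urlencode(params)}'
    try:
        with urllib.request.urlopen(api_url, timeout=30) as response:
            return json.loads(response.read())
    except (urllib.error.URLError, TimeoutError) as e:  # URLError covers HTTPError
        print(f'Request failed: {e}')
        return None
```

Returning None lets the caller decide whether to retry the audit or skip the page.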

Getting metrics information and scores

In this part of the tutorial, we’re going to focus on sifting through the data. Specifically, we’re going to fetch information for some of the Core Web Vitals of the webpage.

    data = get_data(args.url, args.strategy)

    # core web vitals
    vitals = {
        'FCP': data['lighthouseResult']['audits']['first-contentful-paint'],
        'LCP': data['lighthouseResult']['audits']['largest-contentful-paint'],
        'FID': data['lighthouseResult']['audits']['max-potential-fid'],  # Max Potential FID, a lab proxy for FID
        'TBT': data['lighthouseResult']['audits']['total-blocking-time'],
        'CLS': data['lighthouseResult']['audits']['cumulative-layout-shift']
    }

    for name, audit in vitals.items():
        # displayValue carries the right unit per metric (CLS is unitless, not seconds)
        print(f'{name} - value: {audit["displayValue"]}, score: {audit["score"] * 100:.0f}%')

    # overall performance score
    overall_score = data['lighthouseResult']['categories']['performance']['score'] * 100
    print(f'Overall score: {overall_score}%')

We’re also going to print information about long tasks. There are a couple of ways we can do this. One is to fetch the diagnostics audit and read how many tasks take longer than 50 ms. The other is to fetch the display value of the long-tasks audit. However, these two numbers sometimes don’t match.

    # long tasks report
    diagnostics = data['lighthouseResult']['audits']['diagnostics']['details']['items'][0]
    long_tasks_report = {
        'Total tasks': diagnostics['numTasks'],
        'Total tasks time': diagnostics['totalTaskTime'],
        'Long tasks': diagnostics['numTasksOver50ms']
    }

    for i in long_tasks_report:
        print(f'{i}: {long_tasks_report[i]}')

    # sometimes doesn't match the number of tasks over 50ms long
    long_tasks = data["lighthouseResult"]["audits"]["long-tasks"]["displayValue"]
    print(long_tasks)
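Since the introduction mentioned storing audit data to track pages in bulk, here is one minimal way to do it, appending each run as a line of JSON; save_audit and the audits.jsonl filename are my own choices, not from the original script:

```python
import json
import time

def save_audit(url, strategy, vitals, path='audits.jsonl'):
    """Append one audit record per line, so runs can be compared over time."""
    record = {
        'timestamp': time.strftime('%Y-%m-%dT%H:%M:%S'),
        'url': url,
        'strategy': strategy,
        # keep only the numeric values, not the full audit objects
        'metrics': {name: audit['numericValue'] for name, audit in vitals.items()},
    }
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(record) + '\n')

# example with hand-made values in place of a real API response
save_audit('https://example.com', 'mobile',
           {'LCP': {'numericValue': 2400.0}, 'CLS': {'numericValue': 0.05}})
```

Each call appends one line, so the file keeps a running history you can load later and compare across dates.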

And finally, we’ll define an argument parser at the beginning of the script, so we can use it directly from the command prompt.

    parser = argparse.ArgumentParser(description='Audit website with PageSpeed Insights API')
    parser.add_argument('-u', '--url', help='Provide the URL address of the page you want to audit.')
    parser.add_argument('-s', '--strategy', help='Choose between mobile or desktop.')
    args = parser.parse_args()
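argparse can also validate the inputs for us; the required, choices, and default flags below are my additions to the parser shown above:

```python
import argparse

parser = argparse.ArgumentParser(description='Audit website with PageSpeed Insights API')
parser.add_argument('-u', '--url', required=True,
                    help='Provide the URL address of the page you want to audit.')
parser.add_argument('-s', '--strategy', choices=['mobile', 'desktop'], default='mobile',
                    help='Choose between mobile or desktop (default: mobile).')

# parse_args accepts an explicit argument list, which is handy for trying it out
args = parser.parse_args(['-u', 'https://example.com', '-s', 'desktop'])
```

With choices set, passing anything other than mobile or desktop makes the script exit with a helpful error message instead of sending a bad request to the API.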

Full code of the PageSpeed Insights API demo project

import urllib.request
import json
import os
import argparse
from dotenv import load_dotenv

load_dotenv()
PAGESPEED_KEY = os.getenv('PAGESPEED_API_KEY')

def get_data(url, strategy):
    api_url = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}&strategy={strategy}&locale=en&key={PAGESPEED_KEY}"

    response = urllib.request.urlopen(api_url)
    data = json.loads(response.read())

    return data

if __name__ == '__main__':

    parser = argparse.ArgumentParser(description='Audit website with PageSpeed Insights API')
    parser.add_argument('-u', '--url', help='Provide the URL address of the page you want to audit.')
    parser.add_argument('-s', '--strategy', help='Choose between mobile or desktop.')
    args = parser.parse_args()

    data = get_data(args.url, args.strategy)

    # core web vitals
    vitals = {
        'FCP': data['lighthouseResult']['audits']['first-contentful-paint'],
        'LCP': data['lighthouseResult']['audits']['largest-contentful-paint'],
        'FID': data['lighthouseResult']['audits']['max-potential-fid'],  # Max Potential FID, a lab proxy for FID
        'TBT': data['lighthouseResult']['audits']['total-blocking-time'],
        'CLS': data['lighthouseResult']['audits']['cumulative-layout-shift']
    }

    for name, audit in vitals.items():
        # displayValue carries the right unit per metric (CLS is unitless, not seconds)
        print(f'{name} - value: {audit["displayValue"]}, score: {audit["score"] * 100:.0f}%')

    # overall performance score
    overall_score = data['lighthouseResult']['categories']['performance']['score'] * 100
    print(f'Overall score: {overall_score}%')

    # long tasks report
    diagnostics = data['lighthouseResult']['audits']['diagnostics']['details']['items'][0]
    long_tasks_report = {
        'Total tasks': diagnostics['numTasks'],
        'Total tasks time': diagnostics['totalTaskTime'],
        'Long tasks': diagnostics['numTasksOver50ms']
    }

    for i in long_tasks_report:
        print(f'{i}: {long_tasks_report[i]}')

    # sometimes doesn't match the number of tasks over 50ms long
    long_tasks = data["lighthouseResult"]["audits"]["long-tasks"]["displayValue"]
    print(long_tasks)

Conclusion

To conclude, we made a simple Python script for auditing webpages using the PageSpeed Insights API. I learned a lot while working on this project and I hope you find it useful as well.