On-page SEO Tool With Python

How To Make an SEO Tool For On-Page Optimization With Python

In this post, we’re going to make an SEO tool for analyzing on-page SEO using the Python programming language. The tool checks a few different things that play a role in your page’s SEO.

We’re going to be checking the following:

  • Meta title and description of the page
  • How many H1, H2 and paragraph elements are on the page
  • If the website has an SSL certificate
  • Whether images on the page contain the alt attribute

We’ll set the script up so it can be run directly from the command prompt. To do that, we’ll add an argument parser, which reads the URL we pass in on the command line.
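argparse turns command-line flags into attributes on a namespace object. As a minimal sketch, here is the script’s -u/--url flag wrapped in a hypothetical build_parser() helper and parsed from an explicit argument list instead of the real command line, which makes the parser easy to try out on its own:

```python
import argparse


def build_parser():
    """Build the command-line parser with the same -u/--url flag the script uses.

    build_parser is a hypothetical helper name used only for this illustration.
    """
    parser = argparse.ArgumentParser(description='Analyze on-page SEO.')
    parser.add_argument('-u', '--url', type=str, required=True,
                        help='URL address of page you want to analyze.')
    return parser


# parse_args() reads sys.argv by default; passing a list simulates a command line
args = build_parser().parse_args(['-u', 'https://ak-codes.com/'])
print(args.url)
```

Because the flag is marked required=True, running the script without -u makes argparse print a usage error and exit with status 2.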

Coding the project

First things first, like with every Python project, we need to import all the necessary modules. To get the webpage information, we’ll use the requests module, and to process that information further we’ll need a web scraping module, Beautiful Soup.

import argparse
import requests
from bs4 import BeautifulSoup
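Every check in this post works on the same response object, so the page only needs to be downloaded once. A minimal sketch of that step, assuming a hypothetical fetch_page() helper and a 10-second timeout (neither is part of the original script): without a timeout, requests can wait indefinitely on an unresponsive server, and raise_for_status() stops us from auditing an error page.

```python
import requests


def fetch_page(url, timeout=10):
    """Fetch `url` once so every SEO check can reuse the same response.

    fetch_page and its default timeout are illustrative choices, not the script's own API.
    """
    # requests does not time out by default, so pass an explicit timeout in seconds
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response
```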

Next, we’ll tackle each check individually, writing a function that fetches, processes and prints the relevant data.

Meta title and description check

Meta tags, although invisible on the page, are essential for on-page SEO, because search engines use them as a source of information to describe what appears on your page.

To read them, we’ll use a web scraping module, which in our case is Beautiful Soup. We look for the title and description meta tags (falling back to their Open Graph og: versions) to check whether they exist, and print their content if they do.

def analyze_metadata(response, meta_data):
    # Parse the page once and check every requested meta tag
    soup = BeautifulSoup(response.content, 'html.parser')
    for m in meta_data:
        get_meta(soup, m)

def get_meta(soup, meta):
    # Look for <meta name="..."> first, then fall back to Open Graph <meta property="og:...">
    tag = soup.find('meta', {'name': meta})
    if not tag:
        tag = soup.find('meta', {'property': f'og:{meta}'})
    # tag.get() avoids a KeyError when the tag exists but has no content attribute
    if tag and tag.get('content'):
        print(f'Meta {meta} found: {tag["content"]}')
    else:
        print(f'No meta {meta} found.')

Since we’ll add several other SEO checks, we can reuse the response from a single GET request in all of them: we request the URL of the page we’re auditing once and pass that response as an argument to each check function.

To put analyze_metadata into action, we call it with the list of meta tags we want to check:

meta_data = ['title', 'description']
analyze_metadata(response, meta_data)
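get_meta only looks at meta tags, but a page’s title is usually set with the <title> element, which many pages have even without a meta title or og:title tag. A minimal sketch that also reads <title>, using a hypothetical collect_metadata() that takes raw HTML and returns the values instead of printing them:

```python
from bs4 import BeautifulSoup


def collect_metadata(html):
    """Return the page title and meta description, using None for anything missing.

    collect_metadata is a hypothetical helper; it complements, not replaces, get_meta.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # The page title lives in the <title> element, not in a <meta> tag
    title = soup.title.get_text(strip=True) if soup.title else ''
    # Same lookup order as get_meta: <meta name="description">, then og:description
    desc_tag = (soup.find('meta', attrs={'name': 'description'})
                or soup.find('meta', attrs={'property': 'og:description'}))
    description = desc_tag.get('content', '') if desc_tag else ''
    return {'title': title or None, 'description': description.strip() or None}
```

BeautifulSoup accepts bytes as well as strings, so you can call it as collect_metadata(response.content).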

H1, H2 and p element occurrences check

The next function we’ll implement counts how many H1, H2 and paragraph elements appear on the page.

def lookup_elements(response):
    soup = BeautifulSoup(response.content, 'html.parser')
    h1 = soup.find_all('h1')
    h2 = soup.find_all('h2')
    p = soup.find_all('p')

    return {
        'h1': len(h1),
        'h2': len(h2),
        'p': len(p)
    }

The function above finds all elements of each tag and returns their counts in a dictionary.

To print this information, add the following snippet to the main block of the script (the if __name__ == '__main__': section).
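If you want to audit more tags than H1, H2 and p, one variation is to pass the tag names in and collect all of them with a single find_all call. A sketch, where count_elements and its tags parameter are hypothetical names:

```python
from collections import Counter

from bs4 import BeautifulSoup


def count_elements(html, tags=('h1', 'h2', 'p')):
    """Count how often each tag name in `tags` occurs in the document.

    count_elements is a hypothetical generalization of lookup_elements.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # find_all accepts a list of tag names and returns elements matching any of them
    counts = Counter(tag.name for tag in soup.find_all(list(tags)))
    # Include tags that never appear so every requested tag is reported
    return {name: counts[name] for name in tags}
```

It returns the same kind of dictionary as lookup_elements, so the same printing loop works on its result.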

elements = lookup_elements(response)
for e in elements:
    print(f"'{e}' elements found: {elements[e]}")

Checking for SSL certificate

This check tells us whether a website uses HTTPS. An SSL certificate means the website’s traffic is encrypted, and it confirms the website’s identity.

To check for this, we look at the final URL of the response (after any redirects). The following function checks whether the request succeeded and whether the URL starts with https://. Because requests verifies certificates by default, an invalid certificate raises requests.exceptions.SSLError before this function is ever reached, so an https:// URL here also means the certificate passed verification.

def check_ssl(response):
    if response.status_code != 200:
        print('Error: Could not access the website')
        return
    
    if response.url.startswith('https://'):
        print('The website has a valid SSL certificate')
    else:
        print('The website does NOT have a valid SSL certificate')
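check_ssl infers HTTPS from the URL but doesn’t tell you when the certificate expires. Python’s standard ssl module can fetch that too. A minimal sketch; both helper names are hypothetical, and get_cert_not_after needs network access:

```python
import socket
import ssl
import time


def get_cert_not_after(hostname, port=443, timeout=10):
    """Connect over TLS and return the certificate's 'notAfter' string (needs network)."""
    context = ssl.create_default_context()  # verifies the certificate chain and hostname
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()['notAfter']


def days_until_expiry(not_after, now=None):
    """Whole days left before a 'notAfter' timestamp such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)
```

Usage: days_until_expiry(get_cert_not_after('ak-codes.com')). A negative result means the certificate has already expired.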

Checking images for the alt attribute

This is the last check we’ll implement in this Python script: it goes through every image on the page and checks whether it has an alt attribute.

Search engines use the alt attributes of images to understand their context and relevance to the content of a page. Alt text also improves accessibility for visually impaired visitors, provided it describes what the image shows.

def check_images_alt_attr(response):
    soup = BeautifulSoup(response.content, 'html.parser')
    images = soup.find_all('img')
    for img in images:
        # img.get() avoids a KeyError for <img> tags without a src attribute (e.g. lazy-loaded images)
        src = img.get('src', '(no src)')
        if 'alt' in img.attrs:
            print(f"Image with src: {src} contains an 'alt' attribute")
        else:
            print(f"\n\tImage with src: {src} does NOT contain an 'alt' attribute\n")
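check_images_alt_attr treats alt="" the same as descriptive alt text. An empty alt is the recommended markup for purely decorative images, so it isn’t an error, but it’s worth reviewing separately. A sketch with a hypothetical audit_image_alts() that groups image sources instead of printing them:

```python
from bs4 import BeautifulSoup


def audit_image_alts(html):
    """Group image sources by alt status: missing, empty, or descriptive ('ok').

    audit_image_alts is a hypothetical helper, not part of the original script.
    """
    soup = BeautifulSoup(html, 'html.parser')
    report = {'missing': [], 'empty': [], 'ok': []}
    for img in soup.find_all('img'):
        src = img.get('src', '(no src)')
        alt = img.get('alt')  # None when the attribute is absent
        if alt is None:
            report['missing'].append(src)
        elif not alt.strip():
            # alt="" is valid for decorative images, but worth a second look
            report['empty'].append(src)
        else:
            report['ok'].append(src)
    return report
```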

Complete code of the Python on-page SEO tool project

Here is the complete script, which you can run directly from the command prompt. I’m also including a link to the GitHub repository with this code.

import argparse
import requests
from bs4 import BeautifulSoup

def get_meta(soup, meta):
    # Look for <meta name="..."> first, then fall back to Open Graph <meta property="og:...">
    tag = soup.find('meta', {'name': meta})
    if not tag:
        tag = soup.find('meta', {'property': f'og:{meta}'})
    # tag.get() avoids a KeyError when the tag exists but has no content attribute
    if tag and tag.get('content'):
        print(f'Meta {meta} found: {tag["content"]}')
    else:
        print(f'No meta {meta} found.')

def analyze_metadata(response, meta_data):
    # Parse the page once and check every requested meta tag
    soup = BeautifulSoup(response.content, 'html.parser')
    for m in meta_data:
        get_meta(soup, m)

def lookup_elements(response):
    soup = BeautifulSoup(response.content, 'html.parser')
    h1 = soup.find_all('h1')
    h2 = soup.find_all('h2')
    p = soup.find_all('p')

    return {
        'h1': len(h1),
        'h2': len(h2),
        'p': len(p)
    }

def check_ssl(response):
    if response.status_code != 200:
        print('Error: Could not access the website')
        return
    
    if response.url.startswith('https://'):
        print('The website has a valid SSL certificate')
    else:
        print('The website does NOT have a valid SSL certificate')

def check_images_alt_attr(response):
    soup = BeautifulSoup(response.content, 'html.parser')
    images = soup.find_all('img')
    for img in images:
        # img.get() avoids a KeyError for <img> tags without a src attribute (e.g. lazy-loaded images)
        src = img.get('src', '(no src)')
        if 'alt' in img.attrs:
            print(f"Image with src: {src} contains an 'alt' attribute")
        else:
            print(f"\n\tImage with src: {src} does NOT contain an 'alt' attribute\n")

if __name__ == '__main__':
    
    parser = argparse.ArgumentParser(
        description='Analyze on-page SEO.'
    )

    parser.add_argument(
        '-u',
        '--url',
        type=str,
        required=True,
        help='URL address of page you want to analyze.'
    )

    args = parser.parse_args()

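    # Fetch the page once (with a timeout so the script can't hang) and reuse the response in every check
    response = requests.get(args.url, timeout=10)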

    print('\n---- Meta data status ---------------\n')
    meta_data = ['title', 'description']
    analyze_metadata(response, meta_data)

    print('\n---- Element occurrences ------------\n')
    elements = lookup_elements(response)
    for e in elements:
        print(f"'{e}' elements found: {elements[e]}")

    print('\n---- SSL status ---------------------\n')
    check_ssl(response)

    print('\n---- Image alt attributes status ----\n')
    check_images_alt_attr(response)

You can run the script with the following command, which audits the homepage of this blog:

python analyze.py -u https://ak-codes.com/

Conclusion

To conclude, we made a simple Python SEO tool for auditing the on-page SEO of any webpage that returns a successful response to a GET request. I learned a lot while working on this project and I hope you’ll find it helpful as well.
