Google Auto Suggest Keyword Tool with Python

How To Make a Google Auto Suggest Keyword Tool With Python

In this post, we’re going to make a simple Google auto suggest keyword generator tool using Python. Our tool will also cluster the suggested keywords by their most common words and save them into a .csv file.

Keyword research can be quite a time-consuming process when you’re looking for the best keywords to rank for. However, we can leverage the power of programming to automate parts of it, like finding ideas for long-tail keywords.

The script we’re about to make will take your seed keywords and find hundreds of suggestions by using Google’s autosuggest mechanism.

Diving into code

The main part of this project is sending requests to a special URL, which returns suggestions along with a bunch of other data. For this project, we’ll only use the suggestions.

The following link is an example that shows exactly what the URL returns. When you follow it, you should see a new .txt file in your downloads folder.

http://suggestqueries.google.com/complete/search?client=firefox&hl=en&q=example keyword
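
As a rough illustration of the response format, the sketch below parses a hand-written sample shaped like what the endpoint returns with client=firefox: a JSON array whose first element is the query and whose second element is the list of suggestions. The sample payload and the helper name parse_suggestions are made up for illustration; they are not real output from Google.

```python
import json

# Hand-written sample in the shape the endpoint returns (not real Google output)
SAMPLE_RESPONSE = '["example keyword", ["example keyword research", "example keyword list"]]'

def parse_suggestions(raw):
    """Return the list of suggestions from a raw JSON response string."""
    data = json.loads(raw)
    # Index 0 holds the original query, index 1 holds the suggestions
    return data[1]

suggestions = parse_suggestions(SAMPLE_RESPONSE)
print(suggestions)
```

This is the same result[1] lookup the script relies on later.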

So, without further ado, let’s get to coding. Like any other Python project, we need to import all the necessary modules and tools.

import os
import string
import json
import requests
import argparse
import pandas as pd

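import nltk
nltk.download('stopwords')
# word_tokenize, used later, needs tokenizer data as well
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('punkt')

from nltk.corpus import stopwords

As you can see, we already called a function between the imports. Its purpose is to download the stop words and the tokenizer data, because the nltk module doesn’t come prepackaged with them. You can just leave these calls there, because each resource is downloaded only once and not every time you run the script.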

I’m also going to define a constant for the directory that contains the script, which we’ll use later for saving our suggested keywords.

ROOT = os.path.dirname(__file__)

Getting keyword suggestions from Google auto suggest

Here we’re going to define a function that makes the requests to get suggestions. The script searches for suggestions by appending each letter of the alphabet to the seed keyword, making one request per letter.

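def get_suggestions(keywords):
    # Query the suggest endpoint once per seed keyword and letter (a-z)
    url = "http://suggestqueries.google.com/complete/search"
    headers = {'User-agent': 'Mozilla/5.0'}

    keyword_suggestions = []
    for keyword in keywords:
        # Keep the seed keyword itself in the results
        keyword_suggestions.append(keyword)
        for char in string.ascii_lowercase:
            # Passing params lets requests URL-encode spaces and special characters
            params = {'client': 'firefox', 'hl': 'en', 'q': f"{keyword} {char}"}
            # The timeout keeps a stalled request from hanging the script
            response = requests.get(url, params=params, headers=headers, timeout=10)
            # The response is a JSON array: [query, [suggestion, ...], ...]
            result = json.loads(response.content.decode('utf-8'))

            for word in result[1]:
                if word != keyword:
                    keyword_suggestions.append(word)

    # Remove duplicates before returning
    return list(set(keyword_suggestions))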
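
To make the alphabet expansion concrete, here is a small sketch that only builds the request URLs without sending anything. The helper name build_urls and the seed keyword are hypothetical; the sketch uses urllib.parse.urlencode from the standard library so that spaces in the query are encoded.

```python
import string
from urllib.parse import urlencode

BASE_URL = "http://suggestqueries.google.com/complete/search"

def build_urls(keyword):
    """Build one suggest URL per letter of the alphabet for a seed keyword."""
    urls = []
    for char in string.ascii_lowercase:
        params = {'client': 'firefox', 'hl': 'en', 'q': f"{keyword} {char}"}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls

urls = build_urls("python seo")
print(len(urls))  # one URL per letter
print(urls[0])
```

Each seed keyword therefore costs 26 requests, which is worth keeping in mind when passing a long list of seeds.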

Clustering keywords by common words

In the following snippet, we’re going to define a function for finding the most common words in our suggested keywords. We’re also going to filter out stop words and words that already appear in the seed keywords.

After that, we’re going to pair each keyword suggestion with every common word that appears in it and save everything into a .csv file. A suggestion that contains several common words will show up in several clusters.

def cluster_keywords(keywords, seed_words):

    _words = []
    stop_words = list(set(stopwords.words('english')))
    
    for keyword in keywords:
        words = nltk.word_tokenize(str(keyword).lower())
        for word in words:
            if (word not in stop_words 
                and not any(word in s for s in seed_words) 
                and len(word) > 1):
                _words.append(word)
    
    top_common_words = [word for word, word_count in nltk.Counter(_words).most_common(200)]

    clusters = []

    for common_word in top_common_words:
        for keyword in keywords:
            if (common_word in str(keyword)):
                clusters.append([keyword, common_word])
    
    df = pd.DataFrame(clusters, columns=['Keyword', 'Cluster'])
    df.to_csv(os.path.join(ROOT, 'keywords.csv'), index=False)
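
The clustering idea can be tried on a handful of keywords without NLTK or pandas. The sketch below is a simplified stand-in, not the function above: it splits on whitespace instead of calling nltk.word_tokenize, uses a tiny hard-coded stop word set, and returns the pairs instead of writing a .csv file. The name cluster_sketch and the sample keywords are made up.

```python
from collections import Counter

# Tiny stand-in for NLTK's English stop word list (assumption for this sketch)
STOP_WORDS = {'for', 'the', 'a', 'in', 'to', 'with'}

def cluster_sketch(keywords, seed_words, top_n=200):
    """Pair each keyword with every frequent non-seed word it contains."""
    words = []
    for keyword in keywords:
        for word in keyword.lower().split():
            if (word not in STOP_WORDS
                    and not any(word in s for s in seed_words)
                    and len(word) > 1):
                words.append(word)

    # Most frequent words first
    top_words = [word for word, _ in Counter(words).most_common(top_n)]

    clusters = []
    for common_word in top_words:
        for keyword in keywords:
            # Substring match, mirroring the check in the function above
            if common_word in keyword:
                clusters.append((keyword, common_word))
    return clusters

sample = ['keyword tool free', 'keyword tool for youtube', 'keyword planner free']
for keyword, cluster in cluster_sketch(sample, ['keyword tool']):
    print(cluster, '->', keyword)
```

Because the match is a substring test, a cluster word such as "free" would also match a keyword containing "freeze". Comparing against the tokenized words instead would make the clusters stricter.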

Putting it all together

The last step in this project is to put both of the functions we defined above to use. We’re also going to add an argument parser, so we’ll be able to run the script directly from the command prompt.

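def list_of_items(arg):
    # Split a comma-separated command-line value into a list of seed keywords
    return arg.split(',')

if __name__ == '__main__':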

    parser = argparse.ArgumentParser(
        description='Returns suggested long tail keywords from Google.'
    )

    parser.add_argument(
        '-k',
        '--keywords',
        type=list_of_items,
        required=True,
        help='List of seed keywords'
    )

    args = parser.parse_args()
    seed_words = args.keywords
    
    suggestions = get_suggestions(seed_words)
    cluster_keywords(suggestions, seed_words)
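
To check how the -k argument is parsed without running the whole script, parse_args accepts an explicit argument list. This sketch repeats the list_of_items helper used by the script; the sample value is made up.

```python
import argparse

def list_of_items(arg):
    # Turn "a,b,c" into ['a', 'b', 'c']
    return arg.split(',')

parser = argparse.ArgumentParser(
    description='Returns suggested long tail keywords from Google.'
)
parser.add_argument('-k', '--keywords', type=list_of_items, required=True,
                    help='List of seed keywords')

# Pass the arguments explicitly instead of reading sys.argv
args = parser.parse_args(['-k', 'google keyword suggest,google suggest python'])
print(args.keywords)
```

argparse calls the type function on the raw string, so args.keywords is already a list by the time the script uses it.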

Code for Google Auto Suggest Keyword Tool

Here is the entire code of the project.

import os
import string
import json
import requests
import argparse
import pandas as pd

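import nltk
nltk.download('stopwords')
# word_tokenize needs tokenizer data as well
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('punkt')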

from nltk.corpus import stopwords

ROOT = os.path.dirname(__file__)

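def get_suggestions(keywords):
    # Query the suggest endpoint once per seed keyword and letter (a-z)
    url = "http://suggestqueries.google.com/complete/search"
    headers = {'User-agent': 'Mozilla/5.0'}

    keyword_suggestions = []
    for keyword in keywords:
        # Keep the seed keyword itself in the results
        keyword_suggestions.append(keyword)
        for char in string.ascii_lowercase:
            # Passing params lets requests URL-encode spaces and special characters
            params = {'client': 'firefox', 'hl': 'en', 'q': f"{keyword} {char}"}
            # The timeout keeps a stalled request from hanging the script
            response = requests.get(url, params=params, headers=headers, timeout=10)
            # The response is a JSON array: [query, [suggestion, ...], ...]
            result = json.loads(response.content.decode('utf-8'))

            for word in result[1]:
                if word != keyword:
                    keyword_suggestions.append(word)

    # Remove duplicates before returning
    return list(set(keyword_suggestions))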

def cluster_keywords(keywords, seed_words):

    _words = []
    stop_words = list(set(stopwords.words('english')))
    
    for keyword in keywords:
        words = nltk.word_tokenize(str(keyword).lower())
        for word in words:
            if (word not in stop_words 
                and not any(word in s for s in seed_words) 
                and len(word) > 1):
                _words.append(word)
    
    top_common_words = [word for word, word_count in nltk.Counter(_words).most_common(200)]

    clusters = []

    for common_word in top_common_words:
        for keyword in keywords:
            if (common_word in str(keyword)):
                clusters.append([keyword, common_word])
    
    df = pd.DataFrame(clusters, columns=['Keyword', 'Cluster'])
    df.to_csv(os.path.join(ROOT, 'keywords.csv'), index=False)

def list_of_items(arg):
    return arg.split(',')

if __name__ == '__main__':

    parser = argparse.ArgumentParser(
        description='Returns suggested long tail keywords from Google.'
    )

    parser.add_argument(
        '-k',
        '--keywords',
        type=list_of_items,
        required=True,
        help='List of seed keywords'
    )

    args = parser.parse_args()
    seed_words = args.keywords
    
    suggestions = get_suggestions(seed_words)
    cluster_keywords(suggestions, seed_words)

Great! Now all that is left is to put it to use. The following command demonstrates how we can do that.

python auto-suggest.py -k "google keyword suggest","google suggest python"
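
Note that there is no space around the comma, so a POSIX-style shell joins the two quoted strings into a single argument, which the script then splits on the comma. A quick way to see this, assuming POSIX quoting rules (Windows shells may behave differently), is shlex.split from the standard library:

```python
import shlex

command = 'python auto-suggest.py -k "google keyword suggest","google suggest python"'

# shlex.split follows POSIX shell quoting rules
argv = shlex.split(command)
print(argv)

# The script splits the last argument on commas to get the seed keywords
seed_words = argv[-1].split(',')
print(seed_words)
```

Adding a space after the comma would instead produce two separate arguments, and argparse would reject the second one as unrecognized.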

Conclusion

To conclude, we made a fairly simple but powerful keyword research tool by leveraging the Google auto suggest mechanism with Python. I learned a lot while working on this project and I hope you will find it helpful as well.

Thank you for your time, and I hope you will consider sharing this post with others.
