How To Make An Image-Generating Discord Bot Using Stability AI

In this post, we’ll build a Discord bot that generates images from a text prompt using Stability AI. Stability AI’s models expose a variety of settings for controlling the output, so this tutorial covers simple text-to-image generation along with a brief explanation of each parameter we can set.

When I started this project, I took inspiration from Midjourney’s image-generating bot. Our bot is not going to be as advanced, but you can certainly improve upon it.

Before we begin with the code, it’s important that you set up your stability.ai account, where you’ll get 25 credits for free. That is enough to generate a few images; exactly how many depends on the settings you use for your generations.

Once you’ve finished, you can copy the API key from the developer platform, under the API Keys tab in your account settings. We’ll need this key, so save it somewhere safe.

Coding the Discord bot

First of all, as with any Python project, we need to import all the necessary libraries and tools. These include discord.py of course, Pillow for image processing, and stability-sdk for accessing the Stable Diffusion model.

You can install these libraries with pip using the following commands:

pip install discord.py
pip install Pillow
pip install stability-sdk

import os
import io
import json
import discord
import asyncio
from discord.ext import commands
from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

Since we need to access two services via APIs, we’ll create a function that fetches our API keys (tokens) from a separate JSON file. This way the tokens never appear in our code, so you can share it with others without revealing them.

ROOT = os.path.dirname(__file__)

def get_token(token_name):
    with open(os.path.join(ROOT, 'auth.json'), 'r') as auth_file:
        auth_data = json.load(auth_file)
        token = auth_data[token_name]
        return token
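The auth.json file itself is not shown in this post. Here is a minimal sketch of what it might look like, assuming a flat JSON object whose two keys match the names we pass to get_token later on (stabilityai-token and discord-token). The values are placeholders, and the load_token helper and the temporary folder exist only for this illustration:

```python
import json
import os
import tempfile

# Placeholder values: replace them with your real keys and never commit the file.
EXAMPLE_AUTH = {
    "stabilityai-token": "your-stability-api-key",
    "discord-token": "your-discord-bot-token",
}


def load_token(auth_path, token_name):
    """Illustrative variant of get_token with a clearer error for a missing key."""
    with open(auth_path, "r") as auth_file:
        auth_data = json.load(auth_file)
    if token_name not in auth_data:
        raise KeyError(f"'{token_name}' is missing from {auth_path}")
    return auth_data[token_name]


# Write the example file to a temporary folder and read one token back.
with tempfile.TemporaryDirectory() as tmp_dir:
    auth_path = os.path.join(tmp_dir, "auth.json")
    with open(auth_path, "w") as auth_file:
        json.dump(EXAMPLE_AUTH, auth_file, indent=4)
    discord_token = load_token(auth_path, "discord-token")
```

If you keep the project in a Git repository, it’s a good idea to list auth.json in your .gitignore so the file is never pushed by accident.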

Making Stability AI powered functionality

Okay, now we’ll set up an instance of the Stability AI client, for which we need to define a host, an API key, and the name of the Stable Diffusion model.

os.environ['STABILITY_HOST'] = 'grpc.stability.ai:443'
os.environ['STABILITY_KEY'] = get_token('stabilityai-token')

stability_api = client.StabilityInference(
    key=os.environ['STABILITY_KEY'],
    verbose=True,
    engine='stable-diffusion-xl-1024-v1-0'
)

Great! With this, our Stable Diffusion model is ready to use. The next thing on our list is creating an image generation cog for our Discord bot. Inside it, we’ll define a function for a hybrid command that takes a prompt of arbitrary length.

We’ll also look at the settings for the image generation model and explain how each parameter influences the generation. The following code also saves the generated image and posts it in the Discord channel.

class ImageGenCog(commands.Cog):
    def __init__(self, bot):
        self.bot = bot
    
    @commands.hybrid_command(name='imagine')
    async def imagine(self, ctx, *, prompt):
        async with ctx.typing():
            answers = stability_api.generate(
                prompt=prompt,
                width=1024,
                height=768,
                seed=42,
                steps=50,
                cfg_scale=8.0,
                samples=1, # number of images to generate
                sampler=generation.SAMPLER_K_DPMPP_2M,
                style_preset='3d-model',
            )
    
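            img_path = os.path.join(ROOT, 'data', 'generated_images', 'generated.png')
            # create the output folder on first run so img.save() doesn't fail
            os.makedirs(os.path.dirname(img_path), exist_ok=True)
            for resp in answers:
                for artifact in resp.artifacts:
                    # keep only image artifacts and skip any other artifact types
                    if artifact.type == generation.ARTIFACT_IMAGE:
                        img = Image.open(io.BytesIO(artifact.binary))
                        img.save(img_path)

            await ctx.send(file=discord.File(img_path))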
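One thing to keep in mind: as far as I can tell, the generate call and the loop over its results run synchronously, so the bot can’t react to anything else while an image is being made. A common way around this is to move the blocking work into a worker thread with asyncio.to_thread (Python 3.9+). The sketch below shows the pattern only. slow_generate is a hypothetical stand-in for the real API call, and heartbeat stands in for whatever else the bot would be doing in the meantime:

```python
import asyncio
import time


def slow_generate(prompt):
    """Hypothetical stand-in for the blocking generate-and-collect step."""
    time.sleep(0.2)  # pretend the API call takes a while
    return f"image bytes for: {prompt}".encode()


async def heartbeat(ticks):
    """Stands in for everything else the bot does while it waits."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def imagine(prompt):
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free while we await the result.
    return await asyncio.to_thread(slow_generate, prompt)


async def demo():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    image_bytes = await imagine("a lighthouse at dusk")
    beat.cancel()
    return image_bytes, len(ticks)


image_bytes, tick_count = asyncio.run(demo())
```

The heartbeat keeps ticking while slow_generate sleeps, which is the behaviour we want from the bot. If you apply this to the cog, the function handed to asyncio.to_thread would need to contain both the generate call and the loop that reads the artifacts, since the results are only fetched as we iterate over them.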

When we call the generate function on the Stability client, we can pass only the prompt, in which case the model uses default values for the other parameters. However, if we want more control over what the model outputs, we have many options to choose from.

Stability AI Stable Diffusion model parameters

We can set the width and height of the image(s), but the values must be multiples of 64. If you leave these options empty, the function uses a default of 1024 for the model we specified.
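If you ever let users pick their own dimensions, you’d have to make sure the values respect that rule. A small sketch of how this could be done; snap_to_multiple is a hypothetical helper name, not part of the SDK:

```python
def snap_to_multiple(value, base=64):
    """Round value to the nearest multiple of base, never going below base."""
    snapped = base * round(value / base)
    return max(base, snapped)


width = snap_to_multiple(1000)   # 1024
height = snap_to_multiple(750)   # 768
```

The results could then be passed as the width and height arguments instead of hard-coded numbers.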

The next parameter on our list is seed. Setting it lets you generate the same image again from the same prompt, as long as the other settings stay the same.
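Because the cog in this post hard-codes the seed, repeating a prompt will keep returning the same picture. If you’d rather get a fresh image each time but still be able to reproduce a result you like, one option is to draw a random seed and remember it. This is only a sketch: pick_seed is a hypothetical helper, and the upper bound assumes seeds are treated as unsigned 32-bit integers, which you should verify against the API documentation:

```python
import random

MAX_SEED = 2**32 - 1  # assumption: seeds are unsigned 32-bit integers


def pick_seed(fixed_seed=None):
    """Return fixed_seed if one is given, otherwise draw a fresh random seed."""
    if fixed_seed is not None:
        return fixed_seed
    return random.randint(0, MAX_SEED)


seed = pick_seed()            # a different value on most runs
repeat_seed = pick_seed(42)   # reuses the value hard-coded in the cog
```

Posting the chosen seed alongside the image makes it easy to regenerate that exact result later.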

Further, the steps option sets the number of inference (denoising) steps, which influences the image quality. More steps generally mean more detail, but each generation takes longer.

Next, the cfg_scale parameter influences how closely the generated image follows the prompt you provide. The higher the value, the more closely the output image will match the prompt.

The samples parameter defines how many images you want the model to output at once, which by default is 1.
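Note that the cog in this post always saves to the same generated.png, so with samples set higher than 1 each image would overwrite the previous one. A minimal sketch of one way to keep them all, giving every image its own numbered file. save_images is a hypothetical helper, the byte strings are stand-ins for the artifact.binary values, and writing the bytes directly assumes they are already PNG-encoded:

```python
import os
import tempfile


def save_images(image_blobs, out_dir, stem="generated"):
    """Write each bytes blob to its own numbered PNG file and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, blob in enumerate(image_blobs, start=1):
        path = os.path.join(out_dir, f"{stem}_{index}.png")
        with open(path, "wb") as image_file:
            image_file.write(blob)
        paths.append(path)
    return paths


# Stand-in data: in the bot these would be the artifact.binary values.
fake_blobs = [b"first image", b"second image"]
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_images(fake_blobs, os.path.join(tmp_dir, "generated_images"))
    saved_names = [os.path.basename(path) for path in saved]
```

In the bot, the returned paths could each be wrapped in discord.File and sent together through the files argument of ctx.send.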

The sampler option allows you to choose which sampler is used for denoising the generation.

And lastly, the style_preset option allows you to set the style in which you want the model to generate.

There are a number of options you can choose from:

enhance
anime
photographic
digital-art
comic-book
fantasy-art
line-art
analog-film
neon-punk
isometric
low-poly
origami
modeling-compound
cinematic
3d-model
pixel-art
tile-texture
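If you later turn the style into a command argument, it helps to check user input against this list before sending it to the API. A sketch under that assumption; normalize_style is a hypothetical helper, and falling back to enhance is an arbitrary choice made for this example:

```python
# The presets listed above, collected into a set for quick lookups.
STYLE_PRESETS = frozenset({
    "enhance", "anime", "photographic", "digital-art", "comic-book",
    "fantasy-art", "line-art", "analog-film", "neon-punk", "isometric",
    "low-poly", "origami", "modeling-compound", "cinematic", "3d-model",
    "pixel-art", "tile-texture",
})


def normalize_style(name, default="enhance"):
    """Return a preset from the list above, falling back to default if unknown."""
    candidate = name.strip().lower().replace(" ", "-").replace("_", "-")
    return candidate if candidate in STYLE_PRESETS else default


style = normalize_style("Pixel Art")       # 'pixel-art'
fallback = normalize_style("watercolor")   # 'enhance'
```

This way a typo or an unsupported style results in a predictable default rather than a failed request.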

Define & run the Discord bot instance

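The last thing we need to do is define the Discord bot instance and all its necessary components. These include intents, a command prefix, and a description. To clarify, intents determine which events Discord sends to our bot (the message_content intent lets it read message text, which prefix commands such as !imagine rely on), the command prefix defines how we invoke its commands, and the description simply describes what our bot is for. Message content is a privileged intent, so it also has to be enabled for the bot in the Discord Developer Portal.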

We also need to define the on_ready function, where it’s important that we synchronize our bot’s commands if we want to use slash (/) commands.

intents = discord.Intents.default()
intents.message_content = True

bot = commands.Bot(
    command_prefix=commands.when_mentioned_or('!'),
    description='Image generator bot',
    intents=intents
)

@bot.event
async def on_ready():
    print(f'Logged in as {bot.user} (ID: {bot.user.id})')
    print('------')
    await bot.tree.sync()

async def main():
    async with bot:
        await bot.add_cog(ImageGenCog(bot))
        await bot.start(get_token('discord-token'))

asyncio.run(main())

And with that, our bot is finished.

Here is the complete code for this project.

import os
import io
import json
import discord
import asyncio
from discord.ext import commands
from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

ROOT = os.path.dirname(__file__)

def get_token(token_name):
    with open(os.path.join(ROOT, 'auth.json'), 'r') as auth_file:
        auth_data = json.load(auth_file)
        token = auth_data[token_name]
        return token
    
os.environ['STABILITY_HOST'] = 'grpc.stability.ai:443'
os.environ['STABILITY_KEY'] = get_token('stabilityai-token')

stability_api = client.StabilityInference(
    key=os.environ['STABILITY_KEY'],
    verbose=True,
    engine='stable-diffusion-xl-1024-v1-0'
)

class ImageGenCog(commands.Cog):
    def __init__(self, bot):
        self.bot = bot
    
    @commands.hybrid_command(name='imagine')
    async def imagine(self, ctx, *, prompt):
        async with ctx.typing():
            answers = stability_api.generate(
                prompt=prompt,
                width=1024,
                height=768,
                seed=42, # allows you to generate identical images by inputting the same prompt
                steps=50, # inference steps
                cfg_scale=8.0, # how strongly our generation is guided to match our prompt
                samples=1, # number of images to generate
                sampler=generation.SAMPLER_K_DPMPP_2M, # sampler to denoise the generation
                style_preset='3d-model', # select a style to create an image in: 
                                         # enhance, anime, photographic, digital-art,
                                         # comic-book, fantasy-art, line-art, analog-film,
                                         # neon-punk, isometric, low-poly, origami, modeling-compound,
                                         # cinematic, 3d-model, pixel-art, tile-texture
            )
    
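            img_path = os.path.join(ROOT, 'data', 'generated_images', 'generated.png')
            # create the output folder on first run so img.save() doesn't fail
            os.makedirs(os.path.dirname(img_path), exist_ok=True)
            for resp in answers:
                for artifact in resp.artifacts:
                    # keep only image artifacts and skip any other artifact types
                    if artifact.type == generation.ARTIFACT_IMAGE:
                        img = Image.open(io.BytesIO(artifact.binary))
                        img.save(img_path)

            await ctx.send(file=discord.File(img_path))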

            

intents = discord.Intents.default()
intents.message_content = True

bot = commands.Bot(
    command_prefix=commands.when_mentioned_or('!'),
    description='Image generator bot',
    intents=intents
)

@bot.event
async def on_ready():
    print(f'Logged in as {bot.user} (ID: {bot.user.id})')
    print('------')
    await bot.tree.sync()

async def main():
    async with bot:
        await bot.add_cog(ImageGenCog(bot))
        await bot.start(get_token('discord-token'))

asyncio.run(main())

Conclusion

To conclude, we made a simple Discord bot that generates images by using the Stability AI API to access their Stable Diffusion models. I learned a lot while working on this project, and I hope it proves helpful for you as well.