AI Image Generation with Stable Diffusion

Lately I’ve been enthralled with AI image generation, specifically Stable Diffusion. The idea is that you have a model that relates text to images: it starts from random noise and refines it over a series of denoising steps into an image that matches the prompt. I’d previously signed up for beta access to DALL·E 2 and got to try it out. You get a certain amount of free credits per month, which is nice, but AI image generation involves a lot of trial and error, so it’s easy to chew through credits while tweaking a prompt toward the image you have in mind. I also played around with Stability AI’s DreamStudio Lite, which has a similar credit model.
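To make that noise-to-image loop concrete, here’s a minimal text-to-image sketch using Hugging Face’s diffusers library – my choice for illustration, not what the hosted tools above run internally; the model ID and prompt are just examples:

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint; any Stable Diffusion checkpoint works here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The pipeline starts from random latent noise and denoises it over
# num_inference_steps steps, steered by the text prompt.
image = pipe(
    "a lighthouse on a cliff at sunset, oil painting",
    num_inference_steps=50,
    guidance_scale=7.5,  # how strongly the prompt steers each denoising step
).images[0]
image.save("lighthouse.png")
```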

On August 22, 2022, Stability AI publicly released their text-to-image model – a big move that opened up lots of options for developers. After trying a few different projects I landed on AUTOMATIC1111/stable-diffusion-webui, which was pretty easy to set up and has optimizations that make it very usable on my NVIDIA video card with 4GB of VRAM. At 512 x 512 pixels it usually takes 1-3 seconds per iteration, but I’m a patient person, and the tool allows for great batching, so you can queue up a bunch of variations and then check back when they’re done.
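The same queue-up-and-walk-away idea can be sketched with the pipeline above – a hedged example, not how the webui implements its batching; the seeds and prompt are arbitrary:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_attention_slicing()  # trades a little speed for lower VRAM use

prompt = "a clockwork owl, intricate brass gears, studio lighting"
for seed in [101, 102, 103, 104]:  # one variation per seed
    # A fixed seed makes a variation reproducible, so a promising one
    # can be revisited later with a tweaked prompt.
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, num_inference_steps=50, generator=generator).images[0]
    image.save(f"owl_seed{seed}.png")
```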

Below are some of the images I’ve generated. These are typically the result of a number of “img2img” iterations – taking the result from the initial text prompt, noising it up by a certain percentage, and using that plus a text prompt to nudge the result toward what I had in mind (a rough code sketch of this loop follows the captions below). Sure, it’s fun to just type something in and get results, but that’s not satisfying for me.

Stable Diffusion does detailed textures very well
It’s fun watching how mechanical things are generated, how parts meld together
In some variations, the rock on the right was textured like a tree instead
I edited an initial version to add blobs for face and hands and then ran it through further img2img to build the detail
Stable Diffusion struggles with positional prompts, like “parrot on shoulder”, so I edited in the parrot on a good version of the man and then Stable Diffusion was able to stitch the two together
I edited in the fire on the right and the vertical lines of the structure at the top left and then ran it through again to bring out the detail
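For the curious, here’s the img2img nudge sketched in code, again assuming diffusers rather than the webui itself; the filename, prompt, and strength value are all made up for illustration:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("draft_with_edited_blobs.png").convert("RGB")  # hypothetical input

# strength is the "noise it up a certain percentage" knob: 0.0 returns the
# input unchanged, 1.0 ignores it entirely; mid values nudge it toward the prompt.
result = pipe(
    prompt="portrait of a weathered sailor with a parrot on his shoulder, oil painting",
    image=init,
    strength=0.5,
    guidance_scale=7.5,
).images[0]
result.save("nudged.png")
```

In the webui, this same knob shows up as the img2img “Denoising strength” slider.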
