A few days ago, Adobe made an exciting announcement: there’s no longer a waitlist for trying out its generative AI, Firefly. If you haven’t tried it yet, you can check out the beta version on the web by visiting firefly.adobe.com.
One particular feature that caught my attention is Text Effects, which I find fascinating. So I decided to explore the concept myself using Stable Diffusion, an open-source text-to-image model. You can experiment with Stable Diffusion on the web, but for the purposes of this post I’ll be using Automatic1111’s Web UI for Stable Diffusion on my M1 Mac, since it gives you more control over the generations.
So let’s dive in!
Approach #1: txt2img
My initial instinct was to simply use a text prompt and see if I could generate something even remotely similar to Firefly effects.
Prompt: the letter A made with intricate gold ornaments
This was a complete fail! The effect itself looked quite nice, but unfortunately Stable Diffusion completely distorted the letter; it looks more like a ‘B’ than an ‘A’. This was somewhat expected, since diffusion models generally struggle when it comes to drawing text.
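For reference, here’s a minimal sketch of this prompt-only approach using the diffusers library. This is only illustrative: my actual runs were in Automatic1111’s Web UI, and the model repo id and settings here are assumptions, so results won’t match exactly.

```python
# Minimal prompt-only txt2img sketch with diffusers (illustrative, not the
# exact Automatic1111 setup used for the images in this post)
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("mps")  # "mps" on an M1 Mac; use "cuda" on an NVIDIA GPU

image = pipe(
    "the letter A made with intricate gold ornaments",
    num_inference_steps=30,
).images[0]
image.save("letter_A_txt2img.png")
```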
Approach #2: txt2img with ControlNet
To give Stable Diffusion a helping hand with the general shape of the letter, I decided to use ControlNet, a neural network structure that lets you control the output of diffusion models by adding extra conditions.
ControlNet lets you manipulate poses and expressions, turn scribbles into realistic images, and much more. If you’re curious and want to learn more about ControlNet, you can check it out on GitHub.
To get it working with Automatic1111, you can install the ControlNet extension.
ControlNet takes an input image (the control), a preprocessor, and a model. For the control image, I used Canva to create a black-text-on-white-background image of the letter I wanted and added it in ControlNet.
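If you’d rather script this step instead of using Canva, here’s a rough Pillow sketch that produces an equivalent black-on-white control image (the font path is just a placeholder; any thick font you have installed will do):

```python
# Rough sketch: render a single letter as black text on a white 512x512 canvas
# to use as the ControlNet control image ("Arial Bold.ttf" is a placeholder path)
from PIL import Image, ImageDraw, ImageFont

def letter_control(letter: str, size: int = 512, font_path: str = "Arial Bold.ttf") -> Image.Image:
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, int(size * 0.8))
    # Center the glyph using its bounding box
    left, top, right, bottom = draw.textbbox((0, 0), letter, font=font)
    x = (size - (right - left)) / 2 - left
    y = (size - (bottom - top)) / 2 - top
    draw.text((x, y), letter, fill="black", font=font)
    return img

letter_control("A").save("letter_A.png")
```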
Prompt: the letter A made with intricate gold ornaments
ControlNet Preprocessor: invert (from white bg & black line)
ControlNet Model: control_v11p_sd15_lineart
Perfect! Here are some more generations using ControlNet:
Notice that I used thicker fonts for these controls, which gives the model enough space to work with and produces more pronounced effects than a thin font would.
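If you want to reproduce this setup outside the Web UI, here’s a minimal sketch of roughly the same thing with the diffusers library. It assumes the Stable Diffusion 1.5 weights and the lineart ControlNet; diffusers has no built-in “invert” preprocessor, so the control image is inverted manually.

```python
# Minimal txt2img + ControlNet sketch with diffusers (illustrative; the actual
# generations in this post came from the Automatic1111 UI)
import torch
from PIL import Image, ImageOps
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("mps")  # "cuda" on an NVIDIA GPU

# The lineart model expects white lines on a black background, so invert the
# black-on-white control image (what the "invert" preprocessor does in the UI)
control = ImageOps.invert(Image.open("letter_A.png").convert("RGB"))

image = pipe(
    "the letter A made with intricate gold ornaments",
    image=control,
    num_inference_steps=30,
).images[0]
image.save("letter_A_controlnet.png")
```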
But does this approach work on full words as well? Let’s check it out!
Upon experimenting, I found that this approach starts giving unreliable results with full words.
Approach #3: Batching & Combining
What if I tried generating each letter of the word separately (using the same seed for consistent results) and put them together after removing their backgrounds? That should work, right?
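Here’s a rough Pillow sketch of the combining step I had in mind. It assumes each letter was generated separately with the ControlNet setup above and the same seed (for example, by passing generator=torch.Generator().manual_seed(42) to the pipeline), and the near-white-threshold masking is a naive stand-in for a proper background-removal tool. The word and file names are just placeholders.

```python
# Naive sketch: knock out the near-white background of each generated letter
# and paste the cutouts side by side to form a word. Assumes letter_B.png,
# letter_I.png, ... were generated with the same fixed seed.
from PIL import Image

def strip_background(img: Image.Image, threshold: int = 235) -> Image.Image:
    rgba = img.convert("RGBA")
    rgba.putdata([
        (r, g, b, 0) if min(r, g, b) > threshold else (r, g, b, a)
        for r, g, b, a in rgba.getdata()
    ])
    return rgba

letters = [strip_background(Image.open(f"letter_{ch}.png")) for ch in "BIRD"]
canvas = Image.new(
    "RGBA",
    (sum(l.width for l in letters), max(l.height for l in letters)),
    (0, 0, 0, 0),
)
x = 0
for letter in letters:
    canvas.paste(letter, (x, 0), letter)
    x += letter.width
canvas.save("word.png")
```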
This almost works, but again the generations are not perfect, and there’s excessive noise in the background that makes it hard to extract the letters. Look at these failed ‘B’, ‘R’, and ‘D’ attempts. I want something more reliable.
Approach #4: img2img with ControlNet
I’m contemplating adding another layer of control using img2img to help nudge Stable Diffusion in the right direction.
Step #1: Generate a letter using txt2img and ControlNet
Step #2: Create a text mask to extract the letter from the output of the previous generation
Step #3: Generate letter again using the text mask as img2img input and ControlNet
Step #4: Combine letters to form words
This technique has yielded the best results so far, and the letters come out on a clean white background, making it easier to combine them into complete words.
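Here’s a loose sketch of steps 1 through 3 for a single letter with diffusers. Assumptions: the SD 1.5 weights, the lineart ControlNet, and the control letter’s silhouette reused as the text mask (a simplification of deriving the mask from the first generation); the exact masking and strength values in my runs differed.

```python
# Loose sketch of Approach #4 for one letter: txt2img + ControlNet, mask the
# letter onto a white background, then refine with img2img + ControlNet.
import torch
from PIL import Image, ImageOps
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetImg2ImgPipeline,
    StableDiffusionControlNetPipeline,
)

prompt = "the letter A made with intricate gold ornaments"
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
txt2img = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("mps")

# Step 1: txt2img + ControlNet, as in Approach #2
control = ImageOps.invert(Image.open("letter_A.png").convert("RGB"))
first_pass = txt2img(prompt, image=control, num_inference_steps=30).images[0]

# Step 2: text mask. Here the control letter's silhouette is reused as the
# mask: keep the letter region, paint everything else white.
mask = control.convert("L").resize(first_pass.size).point(lambda p: 255 if p > 127 else 0)
white = Image.new("RGB", first_pass.size, "white")
masked = Image.composite(first_pass, white, mask)

# Step 3: img2img + ControlNet, starting from the masked letter and reusing
# the components of the txt2img pipeline
img2img = StableDiffusionControlNetImg2ImgPipeline(**txt2img.components).to("mps")
final = img2img(
    prompt,
    image=masked,           # img2img init image
    control_image=control,  # same ControlNet condition
    strength=0.75,
    num_inference_steps=30,
).images[0]
final.save("letter_A_final.png")
```

Step 4 is then the same compositing trick from Approach #3, which gets much easier now that every letter already sits on a white background.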
I’ve also been experimenting with these concepts in code, attempting to create my own Firefly-like app. If you’re interested, you can check it out on GitHub.
I hope you found this post intriguing! See you in the next one!