The Paradox of Guidelines

Can artificial intelligence replace static guidelines with a generative system that works at scale?

Written by Mike Creighton, Director of AI Research & Development

One of the primary things we do at Instrument is create guidelines. Brand guidelines, campaign guidelines, design system guidelines. There’s every kind of guideline! But how effectively do brands use these? How does someone who isn’t “a creative” or a brand strategist interpret those guidelines when selecting photography or illustration?

There are a lot of challenges when it comes to implementing and adhering to brand guidelines. But time-to-market schedules and the sheer quantity and quality of output expected aren’t easing up. If anything, they’re becoming more demanding.

Many companies — the big incumbents and startups alike — have been rising to these challenges by creating products and services to address them. And over the last year, we’ve started to see generative AI employed in ways that could make these types of solutions much more effective at scale.

But how would it actually work?

Time to Experiment

While I’m sure there’s a tremendous amount of research, sophistication, and proprietary technology behind those solution providers’ implementations, I was curious how far we could get with just off-the-shelf models.

Intuitively, I know that large language models (LLMs) can evaluate written language for adherence to things like brand voice and tone. Even without fine-tuning, sufficient prompting and examples can get most modern LLMs to evaluate copy against brand guidelines successfully.

But I wanted to test a hypothesis: that we can use a state-of-the-art multi-modal LLM to determine whether a visual asset adheres to brand guidelines.

In case you don’t know what “multi-modal LLMs” are, they’re basically language models that have other types of inputs — other modalities — besides text. Google’s Gemini Pro, OpenAI’s GPT-4 Turbo, and Anthropic’s Claude 3 are all multi-modal. They can “see.” Give them an image, and they understand what’s in it. It’s pretty wild. Want to read more? Check out this article.

So now that LLMs can see, can they help us solve the brand consistency problem?

The Approach

I realized over lunch that this was probably something that could be prototyped in about an hour, strictly with some simple Python and the OpenAI API. We’ll specifically use OpenAI’s most recent state-of-the-art model: GPT-4 Turbo.

In case you don’t know, working with an API for an LLM is very different from working with something like ChatGPT. When you use the API, you’re effectively scripting the conversation with the LLM, not chatting with it. And you’re “priming” it with what’s called a “system message.” This is where you describe its role, behavior, and what it’s meant to do.

So, we’d use the system message to tell GPT-4 Turbo (the assistant) that it’s basically the brand police and that it evaluates images against user-supplied brand guidelines. We’d add a user message providing those guidelines, add an assistant message acknowledging that it’s received them, and then add a final user message that has the image to be evaluated.
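As a rough sketch, here’s what that scripted conversation might look like with the OpenAI Python SDK. The SYSTEM_PROMPT and GUIDELINES_MD constants are placeholders for the prompts sketched in the sections below, and the file name and message wording are my own illustrative stand-ins, not the exact prototype:

```python
import base64

# Encode an image file as base64 so it can be passed inline as a data
# URL, which is how the chat completions API accepts local images.
def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

messages = [
    # System message: the assistant's role and behavior (the "brand police").
    {"role": "system", "content": SYSTEM_PROMPT},
    # User message carrying the brand guidelines.
    {"role": "user", "content": GUIDELINES_MD},
    # Scripted assistant acknowledgment of the guidelines.
    {"role": "assistant",
     "content": "Understood. Send me an image and I'll evaluate it against these guidelines."},
    # Final user message carrying the image to be evaluated.
    {"role": "user",
     "content": [
         {"type": "text",
          "text": "Evaluate this image against the brand color guidelines."},
         {"type": "image_url",
          "image_url": {"url": f"data:image/png;base64,{encode_image('candidate.png')}"}},
     ]},
]
```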

More simply, if this were turned into an app (or even just a single function in code), here’s what would happen conceptually:

  1. User supplies an image
  2. The AI tells you whether or not it’s on-brand (with rationale and a score)

Simple.
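In code, that conceptual app collapses to a single function along these lines. This is a minimal sketch that consumes the messages list from above; the model name follows the article’s choice of GPT-4 Turbo, while max_tokens and the free-text response handling are assumptions (in a real system you’d likely want structured output):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def evaluate_image(messages: list) -> str:
    """Send the scripted conversation to GPT-4 Turbo and return its evaluation."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        max_tokens=500,
    )
    # The reply contains the on-brand verdict, a score, and the rationale,
    # in whatever format the system prompt asked for.
    return response.choices[0].message.content

print(evaluate_image(messages))
```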

LLM Whispering

Most of the effort with this type of prototype really goes into “prompt engineering”: crafting the system prompt and user prompts that make up the fictional conversation. And believe it or not, prompt engineering is actually a thing (at least as of April 2024; as models advance, it will become less necessary). Understanding how LLMs work, along with all their quirks and limitations, is crucial for getting them to do what you want them to do. It’s only when you’re able to write effective prompts that the true power and magic of LLMs are exposed. But LLMs are weird, and effective prompts can come in the strangest of forms, though some best practices are emerging. Ethan Mollick covers both sides of that spectrum in a recent article.

The Prompts

I want to take a moment to focus on what these prompts look like, so you can get a sense of what it takes to get a large language model to do some pretty remarkable stuff. Here’s a quick look at the general format of the system prompt:

  • What role the AI is playing
  • What it’s supposed to do
  • What its inputs are
  • What its outputs are

Please note: This is what I came up with just to get the idea out of my head so I could test the hypothesis quickly. After re-reading it, I’d probably reorder a few things for the sake of clarity. I’d also be more precise in defining the format of its outputs.

First, we needed to prompt the LLM with the role it’s playing in this exercise and what it’s supposed to do for us.
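The original prompt isn’t reproduced here, but following the four-part format above, a hypothetical reconstruction might read something like this. Every line below is illustrative, not Instrument’s actual prompt:

```python
# Hypothetical system prompt following the role / task / inputs / outputs format.
SYSTEM_PROMPT = """\
# Role
You are a brand compliance reviewer: strict but constructive, the "brand police."

# Task
Evaluate images the user sends against the brand guidelines supplied
earlier in the conversation.

# Inputs
- Brand guidelines, written in Markdown.
- Example images that are known to be on-brand.
- A candidate image to evaluate.

# Outputs
For each candidate image, respond with:
1. A verdict: ON-BRAND or OFF-BRAND.
2. A score from 1 to 10 for adherence to the color guidelines.
3. A short rationale, plus concrete suggestions for improvement.
"""
```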

The First User Prompts

We’ve created a fictional SaaS brand (called Zephyr) with fictional brand color guidelines for this exercise. This is written in Markdown syntax since GPT-4 Turbo handles that type of formatting well.

We aimed to be as descriptive and accessible as possible with our brand color guidelines.
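The actual Zephyr guidelines aren’t shown here either, so as a stand-in, here’s the kind of Markdown document they describe, stored as the GUIDELINES_MD constant used earlier. The palette names, hex values, and rules are all invented for illustration:

```python
# Invented, illustrative color guidelines for the fictional Zephyr brand.
GUIDELINES_MD = """\
# Zephyr Brand Color Guidelines

## Primary Palette
- **Zephyr Blue** (#2B6CF6): the dominant color. It should anchor every composition.
- **Cloud White** (#F7F9FC): the primary background color.

## Accent Palette
- **Signal Coral** (#FF6B5E): used sparingly, for emphasis only.

## Rules
- Imagery should feel cool, airy, and light, dominated by blues and whites.
- Warm colors may appear only as small accents, never as the dominant tone.
- Avoid saturated greens, purples, and dark, moody palettes.
"""
```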

Here are the 4 images that we append to the user prompt for the LLM to reference as “good” examples.

(More examples would make this system more effective.)
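Mechanically, appending those reference images just means adding more image parts to a user message in the scripted conversation. A sketch, reusing encode_image() from above and assuming the four examples live in local files with hypothetical names:

```python
# Build a user message that carries the four "good" reference images.
example_paths = ["example1.png", "example2.png", "example3.png", "example4.png"]

example_message = {
    "role": "user",
    "content": [
        {"type": "text",
         "text": "Here are four images that are known to be on-brand. Use them as reference points."},
        # One image part per reference image.
        *[
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encode_image(p)}"}}
            for p in example_paths
        ],
    ],
}

# Insert the examples before the final user message that carries the
# candidate image (index 3 in the messages list built earlier).
messages.insert(3, example_message)
```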

The Results

I ran a number of images through this AI-powered image evaluator. All of these images were created with image generation models (like Midjourney and DALL-E), but some were art directed to be more “on brand” and some were intended to deviate.

One big thing to note: this is only intended to evaluate images against the COLOR guidelines, though the model does tend to comment on the content of the images as well. Further iteration on the prompts would likely minimize this.

The Learnings

I sort of thought this might work, but that it would take some poking and prodding to really know whether this hypothesis could be proven with “off-the-shelf” technology. Meaning: to figure out whether this could be done without access to fine-tuning (where you further train a model on custom data to get a good result).

Even with such a simple first-pass prototype, I can pretty confidently say that this is possible. More samples need to be tested, but the added discovery here is that the LLM can give you a detailed rationale for why an image might not be on-brand and how you might improve it.

We could iterate on the prompts further to give a user more specific and actionable advice for ensuring a given image is on-brand. Furthermore, we could tell the LLM to provide this advice in a way that’s intuitive and accessible for a “non-creative” person. The unlock here is that both agency partners and the brand’s employees can get up to speed and start executing effectively thanks to instant feedback. This empowers everyone to take the right actions to keep assets on-brand.

Looking Ahead

As we think about the guidelines we meticulously create for our clients, and about the challenges their organizations face in implementing them, this little experiment represents a possible evolution in how we approach guidelines. And in how we think about our own service offerings.

No longer do guidelines need to be a static Figma doc. Guidelines can be a holistic system — a system that’s supported and made enforceable by this very weird technology. Sufficient examples and supporting descriptions of the rules will be what activates these systems. Yes, that means creating guidelines will become a unique form of prompt engineering, suitable for LLM intake and interpretation. This doesn’t just serve AI; it aids every human reviewing the guidelines, because it leaves things less open to interpretation. And it still exists in plain language.

There are many ways we can extend this system to cover more than color guidelines. It’s a matter of some additional prompting and providing the LLM with more examples. It would mean more time to process each image, but a couple of minutes of processing time would yield so much more efficiency, autonomy, and consistency for a company at large.

This approach could be further extended to include copywriting guidelines, ensuring that voice, tone, and length are adhered to. As I said at the top, LLMs are really good at spotting those kinds of inconsistencies.

All in all, this approach of using off-the-shelf models is surprisingly promising. There’s certainly more experimentation to do, since my gut says that certain types of evaluations simply won’t be possible with this technology. That’s really where I expect the companies focused on this problem space to excel.

But the benefit of a system that gets you 80% of the way there — even without full evaluation coverage — is that it still brings more consistency to output, streamlines the review process, and gives everyone creating brand content more confidence. Moreover, I think there’s merit to a more bespoke system that can be customized to a given brand’s workflow and infrastructure rather than the other way around. This is an area that we’ll continue pushing into.


In Summary

Generative AI has a ton of potential to address the current challenges of brand guidelines. By prompting multi-modal large language models, we can turn static visual guidelines into holistic, AI-powered systems that evaluate content for adherence and provide detailed feedback. The result is a streamlined review process that’s customized to a brand, yielding improved consistency and efficiency that can outperform a one-size-fits-all solution.

Want to learn more about what AI can do for your business?
Get in touch.
