Bing does not understand the word "subtle"

Usernameblankface@lemmy.world · 9 months ago

j4k3@lemmy.world · 9 months ago

I never play with proprietary AI like this, so I don’t know this model, but I have many image diffusion models I run offline.

I don’t know how experienced you are with prompting, but making a few assumptions…

Shift how you think about prompting for an image. Think of the prompt as if you were addressing an entity, the way you do when roleplaying with an LLM. If you really get to know an LLM through roleplaying, you’ll learn that the model is trying to satisfy the fundamental needs of every character involved, including the one you play. It does all of this within the limits it has assumed (or that have been described) for each character.

Image diffusion works in much the same way. The prompt is talking to something akin to a roleplaying entity that can only respond by generating an image, but it is still a dynamic and emotional entity. When you say it “does not understand the word subtle,” that is likely not the case. There is a configuration setting (which may or may not be available to you) that tells the model how strongly to follow the prompt. If you set it too high, you’ll get terrible results. If you explore this in detail, you may notice these responses are like a vindictive little child retaliating after being punished unfairly. You must allow the entity its own sense of creative collaboration for its own satisfaction.
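In Stable Diffusion front ends, the “how strongly to follow the prompt” setting is usually called the CFG (classifier-free guidance) scale. Assuming that is the setting meant here, this is a minimal sketch of the arithmetic behind it: at each denoising step the model predicts the noise twice, once with the prompt and once without, and the scale extrapolates from the unprompted prediction toward the prompted one. The two-element “predictions” below are made-up toy numbers, not real model outputs, and `cfg_combine` is a hypothetical helper name.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond).

    scale = 0 ignores the prompt, scale = 1 uses the prompted prediction as-is,
    and scale > 1 pushes past it, exaggerating whatever the prompt changed.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy predictions: the prompt only changes the first component.
uncond = [0.0, 1.0]
cond = [1.0, 1.0]
for s in (0.0, 1.0, 7.5, 20.0):
    print(s, cfg_combine(uncond, cond, s))
# 0.0 [0.0, 1.0]
# 1.0 [1.0, 1.0]
# 7.5 [7.5, 1.0]
# 20.0 [20.0, 1.0]
```

Because large scales overshoot the prompted prediction by a wide margin, very high values are a common explanation for harsh, oversaturated, or distorted images.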

If you really want subtlety, the key is to describe what you really want with more passion and flair. There is a major emotional element to this, and it requires the user to explore their own emotions at levels of thought they may never have explored before, in order to communicate their ideas in more detail.

I only learned this because I connected a text roleplaying model to an image diffusion model in software someone else wrote and I modified. I monitored how the images were generated and noticed the prompts were simply long text. I started observing the effect in detail, and that led me here.

You can write a few keywords into an image prompt and it will try to create an emotional story to fill in the gaps, but if you really want specific detail, you need to describe how the image makes you feel and why. This is hard to do IMO, and it takes a lot of practice along with a willingness to explore things like why you like a “subtle Hawaiian shirt” or what subtle really means in less subjective terms.

Usernameblankface@lemmy.world · 9 months ago

Hmm. Using Bing, I definitely do not have access to settings, I can only change my input to be longer and more descriptive of my idea.

In Bing Image Creator, DALL-E 3

Prompt: a Hawaiian shirt with a normal palm trees, bright colored flower blooms, and white background design. Artfully and playfully hidden in the white spaces and among the loud colors are many subtle, small, hidden hotwheels logos, designed to catch the eye on closer observation, but hidden at first glance.

It seems to have taken Hot Wheels as meaning the cars rather than the logos. But it’s a lot better!

IvanOverdrive@lemm.ee · 9 months ago

What exactly is a subtle Hawaiian shirt design? Muted colors? Lots of negative space? Be very specific in what you want. I don’t know what “subtle” is supposed to mean. An LLM sure isn’t going to.

Usernameblankface@lemmy.world · 9 months ago

OK, I meant a regular Hawaiian shirt with subtle Hot Wheels logos… like Hidden Mickeys or something

IvanOverdrive@lemm.ee · 9 months ago

Try this: A Hawaiian shirt with hidden Hot Wheels logos incorporated into the leaves.

Huckledebuck@sh.itjust.works · 9 months ago

I know a lot of grown ass people that also don’t know what subtle means.

fruitycoder@sh.itjust.works · 9 months ago

Guy Fieri levels of subtle

Altima NEO@lemmy.zip · 9 months ago

I agree with the other guy. I’ve been using Stable Diffusion for about a year now, and I’ve been playing with Bing for a few months.

When it comes to describing stuff, words like “subtle” don’t really mean much. You gotta be real deliberate with your description and almost treat it like it’s dumb. Simple descriptions, but detailed with quantifiable words. You can sprinkle a few qualitative words here and there, but don’t rely on them to be the main driver of the composition. They can help turn a blah image into a much nicer one, but they don’t usually make as big a difference as more descriptive words.
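One way to put that advice into practice is to treat the prompt as structured data: counts, sizes, colors, and placements carry the composition, and a couple of qualitative words trail at the end as seasoning. This is a minimal sketch; the `Element` fields and the `build_prompt` helper are hypothetical, not part of Bing’s or Stable Diffusion’s interface, and image models may still miscount, so the numbers act as strong hints rather than guarantees.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """One concrete, countable thing to place in the image (hypothetical schema)."""
    count: int
    size: str
    color: str
    item: str
    placement: str


def build_prompt(subject: str, elements: List[Element], style_words: List[str]) -> str:
    # Quantifiable details come first and drive the composition.
    parts = [subject]
    for e in elements:
        parts.append(f"{e.count} {e.size} {e.color} {e.item} {e.placement}")
    prompt = ", ".join(parts)
    # Qualitative words are appended last and kept few.
    if style_words:
        prompt += ". Style: " + ", ".join(style_words)
    return prompt


prompt = build_prompt(
    "a short-sleeved Hawaiian shirt on a white background",
    [
        Element(6, "large", "green", "palm trees", "spread evenly across the fabric"),
        Element(10, "medium", "orange and pink", "hibiscus flowers", "between the palm trees"),
        Element(3, "thumbnail-sized", "light grey", "Hot Wheels logos", "tucked inside the palm leaves"),
    ],
    ["playful", "vintage"],
)
print(prompt)
```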

Now, trying to see if I can get Bing to do anything with the Hot Wheels logo makes me think it may be a bit overtrained, because it sure isn’t budging!

Usernameblankface@lemmy.world · 9 months ago

Ah, of course, quantifiable. It’s a computer, it has to have a quantifiable description to work with.

Yeah, Hot Wheels has a LOT of images online, and they’re never subtle or hiding anything.

Altima NEO@lemmy.zip · 9 months ago

Yeah. It’s too bad there’s not as much control as you’d have with Stable Diffusion. It’s a lot easier to tweak it to try to hide the logo within the image, for example.