I’m using Stable Diffusion. It’s open-source, and the software doesn’t cost anything…but you run it locally, so you need a graphics card capable of handling it, and it has a serious hunger for VRAM. I’m using an AMD Radeon RX 7900 XTX with 24GB of VRAM; that’s something like $1k. Your CPU doesn’t really matter, as the GPU does the heavy lifting. An 8GB card is probably going to run out of memory on the SDXL models that are current; I’d probably go with a 16GB+ card if I intended to run it.
Part of the “AI” here is the software, Stable Diffusion, and part is the model, the “memory” or “knowledge” of the AI; I use realmixXL 1.5; as with most other Stable Diffusion models, it’s downloadable from civitai.com, and costs nothing.
Relative to the other AIs people on here are likely to use…hmm. Well, the major commercial services tend to censor, and have various content filters. If you want to generate pornography, Stable Diffusion is probably where you want to be. Ditto for things like images of celebrities, which I understand the online services are also aiming to restrict (and maybe art styles of artists, not sure where the commercial services are going with that). And if you’re a programmer and want to write software that interacts directly with it, SD’s where you’ll want to be.
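To give a rough idea of what “interacting directly with it” can look like: the sketch below assumes a local Automatic1111 instance started with the `--api` flag and listening on its default address, and posts a prompt to its `/sdapi/v1/txt2img` endpoint. The helper names (`build_payload`, `decode_images`, `txt2img`) and the default step count and image size are my own picks for illustration, not anything the frontend requires.

```python
import base64
import json
import urllib.request

# Assumption: Automatic1111 running locally with --api on its default port.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_payload(prompt, negative_prompt="", steps=25, width=1024, height=1024):
    """Build the JSON body for a txt2img request (defaults are illustrative)."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "width": width,
        "height": height,
    }


def decode_images(response_body):
    """Turn the base64 strings in the response's "images" list into raw bytes."""
    return [base64.b64decode(img) for img in response_body.get("images", [])]


def txt2img(payload, url=API_URL, timeout=600):
    """POST the payload to the server and return the generated images as bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_images(json.load(response))
```

With the server up, something like `txt2img(build_payload("lighthouse, storm, oil painting"))` should hand back a list of image byte strings that you can write straight to `.png` files.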
Both Midjourney and DALL-E have a natural-language processing pass, so the idea is that one feeds them somewhat English-looking sentences, which are easier to write. You’ll generally get something not entirely unreasonable in Stable Diffusion if you do that, but normally one just feeds it a list of tokens, a list of keywords.
The commercial services will also data-mine what you’re doing; Stable Diffusion stuff stays on your local computer. That could matter to you, if you take issue with things like Bing or Google data-mining searches.
Stable Diffusion tends to provide for a lot of control, and there are a lot of new extensions being put out regularly for it by researchers.
On the other hand, you’re limited to the hardware that you have. You’re paying for that hardware, even if it’s not actually running; whatever premium pricing commercial online services have or will have, they’ll ultimately be able to spread out the cost of hardware across many users by keeping the compute cards more-or-less constantly in use. If you use your GPU 1% of its lifetime…well, if they can keep the compute cards in their datacenters at an average of 80% activity, they’re getting 80 times as much good out of their parallel compute hardware.
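Here’s that utilization arithmetic worked through as a quick sketch. The $1,000 price and the five-year service life are assumptions I’m plugging in for illustration, and I’m pretending the datacenter card costs the same as the home card, which it won’t; the point is only how the hardware cost per hour of actual use scales with how busy the card is.

```python
def cost_per_busy_hour(card_price, lifetime_hours, utilization):
    """Hardware cost attributed to each hour the card is actually doing work."""
    busy_hours = lifetime_hours * utilization
    return card_price / busy_hours


# Assumptions for illustration only: a $1,000 card and a 5-year service life.
CARD_PRICE = 1000.0
LIFETIME_HOURS = 5 * 365 * 24

home = cost_per_busy_hour(CARD_PRICE, LIFETIME_HOURS, 0.01)        # busy 1% of the time
datacenter = cost_per_busy_hour(CARD_PRICE, LIFETIME_HOURS, 0.80)  # busy 80% of the time

print(f"home:       ${home:.2f} per busy hour")
print(f"datacenter: ${datacenter:.3f} per busy hour")
print(f"ratio:      {home / datacenter:.0f}x")
```

The price and lifetime cancel out of the ratio, so the 80x figure depends only on the two utilization numbers.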
I don’t know what hardware Midjourney and DALL-E are running on, but I would wager that if they aren’t yet, they will be running on compute cards with more VRAM than most home users are going to be able to get ahold of, so they’ll probably have an advantage in terms of the potential model size. That’s a guess.
There may be other features added down the line in the natural-language-processing layer in the commercial services, and AFAICT, that’s not really a primary focus of Stable Diffusion development. My guess is that one of the big commercial services will probably wind up looking something like Instagram – it’ll become approachable enough for everyone and get a huge user base – and I suspect that it won’t be Stable Diffusion because of the lack of the NLP layer, the fact that it doesn’t aim to take in English-language-looking sentences.
Stable Diffusion also has multiple frontends. I generally use the (popular) Automatic1111 frontend, which is more-analogous to the Midjourney or DALL-E UIs. Another notable frontend is ComfyUI; this works more like image-processing software, where one builds up a directed graph of operations, and then as you make changes, the graph will recompute stuff as needed. That’s probably more-useful for compositing complex scenes, but it’s also slower to put together a scene; one isn’t just plonking in some search terms.
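To make the “recompute stuff as needed” idea concrete, here’s a toy sketch of a graph of operations where each node caches its result and only reruns when something upstream changes. This is not ComfyUI’s actual code or API; the `Node` class and the string-returning “operations” are made up for illustration.

```python
class Node:
    """One operation in the graph; caches its result until an input changes."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)
        self.dependents = []
        self.cache = None
        self.dirty = True
        self.runs = 0  # how many times this node has actually executed
        for node in self.inputs:
            node.dependents.append(self)

    def invalidate(self):
        """Mark this node and everything downstream of it as needing a rerun."""
        self.dirty = True
        for node in self.dependents:
            node.invalidate()

    def set_func(self, func):
        """Change what this node does, which invalidates it and its dependents."""
        self.func = func
        self.invalidate()

    def value(self):
        """Return the cached result, recomputing only if this node is dirty."""
        if self.dirty:
            args = [node.value() for node in self.inputs]
            self.cache = self.func(*args)
            self.dirty = False
            self.runs += 1
        return self.cache


# Stand-ins for real operations: each just returns a descriptive string.
prompt = Node("prompt", lambda: "castle, fog")
seed = Node("seed", lambda: 42)
image = Node("sample", lambda p, s: f"image({p}, seed={s})", [prompt, seed])
upscaled = Node("upscale", lambda img: f"upscaled({img})", [image])

upscaled.value()            # first evaluation runs every node once
seed.set_func(lambda: 7)    # change one input
result = upscaled.value()   # prompt stays cached; seed, sample, upscale rerun
```

After the second evaluation the `prompt` node has still only run once, which is the whole appeal when some of the steps in the graph are expensive.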
Adobe also has some kind of generative AI effort (“Firefly”) going on, which I assume they’re gonna integrate with their graphics processing software. I’ve got no experience with it, but if you’re a serious Photoshop user, you might want to look into it and see what it’s like, since I’d guess that whatever they come up with, they’ll probably do a reasonable job of integrating it with their traditional image-processing software.
There are a number of users of all of SD, Midjourney, and DALL-E/Bing on this community; you’ll get solid images out of all of them, as things stand.
I should also mention for completeness that one can “rent” a computer with a large GPU and use it remotely on places like vast.ai, if you just want to dabble a bit. But then you’re also kind of in the position of keeping the GPU idle a fair bit of the time, just as if you had it locally.
There may be someone running online Stable Diffusion-based services out there, but I haven’t gone looking to get an appraisal of what the state of affairs there is.
EDIT: I should also note that you can run Stable Diffusion on your CPU. It will be very, very, very slow, and unless you just want to take a look at the UI or something, you are probably going to go bonkers pretty quickly if you try doing any significant work on the CPU. Might work if you just want to occasionally upscale an image – something that it’s pretty good at.
What if I have quad 12-core Xeons with 196GB of RAM?
How slow are we talking? Would a prompt I can run on Mage.space in 3min take my system hours? or days?
I have a 24-core i9-13900 and 128GB of RAM and I briefly tried it and recall it being what I’d call unusably slow. That being said, I also just discovered that my water cooler’s pump has been broken and the poor CPU had been running with zero cooling for the past six months and throttling the bejesus out of itself, so maybe it’d be possible to improve on that a bit.
If you seriously want to try it, I’d just give it a spin. Won’t cost you more than the time to download and install it, and you’ll know how it performs. And you’ll get to try the UI.
I just don’t want to give the impression to people that they’re gonna be happy with on-CPU performance and then have them be disappointed, hence the qualifiers.
EDIT: Here’s a fork designed specifically for the CPU that uses a bunch of other optimizations (like the turbo “do a generation in only a couple iterations” thing, which I understand has some quality tradeoffs) that says that it can get down into practical times for a CPU, just a couple of seconds. It can’t do 1024x1024 images, though.
https://github.com/rupeshs/fastsdcpu
I haven’t used it, though. And I don’t think that that “turbo” approach lets you use arbitrary models.