StableDiffusionV2Model
Latent diffusion model for high-resolution text-to-image generation.
Wraps the Stable Diffusion 2.x family of checkpoints released by Stability AI. The pipeline uses a U-Net denoiser conditioned on OpenCLIP (ViT-H/14) text embeddings to iteratively denoise a latent representation, and a variational autoencoder (VAE) to decode the final latents into a high-resolution image.
Four checkpoints are supported:
- stable-diffusion-2 / stable-diffusion-2-1: trained at 768 px, produce sharper detail; '2-1' is further fine-tuned and generally outperforms '2'.
- stable-diffusion-2-base / stable-diffusion-2-1-base: trained at 512 px, faster and lower memory; best for rapid prototyping.
Models are served from the sd2-community HuggingFace organisation,
a community mirror of the original Stability AI weights (deprecated at
stabilityai).
References
- [1] Rombach et al., "High-Resolution Image Synthesis with Latent Diffusion Models", CVPR 2022. https://arxiv.org/abs/2112.10752
- [2] https://huggingface.co/sd2-community
Parameters
- model_name : string, default=sd2-community/stable-diffusion-2 - The specific Stable Diffusion 2.x checkpoint to load. The '-base' variants are trained at 512x512 px and are faster; the non-base variants target 768x768 px and produce sharper detail. The '2-1' variants are fine-tuned further and generally outperform '2'.
- negative_prompt : string - Text describing content to keep out of the generated image (e.g. 'blurry, low quality, watermark'). Leave empty to apply no negative guidance.
- num_inference_steps : integer, default=15 - Number of denoising steps to run. More steps refine the image but increase generation time. Typical range: 15-30 for fast results, 40-50 for higher quality. Values above 100 rarely improve output.
- guidance_scale : number, default=3.5 - Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. Low values (1-4) allow creative freedom; medium values (5-9) balance quality and adherence; high values (10+) enforce the prompt but may produce artifacts.
- device : string, default=CPU - Hardware device for inference. Select a GPU option for hardware acceleration, which is strongly recommended for diffusion models. Select 'CPU' on systems without a compatible GPU, but expect significantly longer generation times.
- seed : integer, default=-1 - Random seed for reproducible generation. A fixed positive integer will always produce the same image for identical settings. Use a negative value (e.g. -1) for a random seed on each run.
- width : integer, default=512 - Width of the output image in pixels. Must be a multiple of 8. Native resolution is 512 for '-base' variants and 768 for others. Using the native resolution produces the best quality results.
- height : integer, default=512 - Height of the output image in pixels. Must be a multiple of 8. Native resolution is 512 for '-base' variants and 768 for others. Using the native resolution produces the best quality results.
- num_images_per_prompt : integer, default=1 - How many images to generate from a single prompt in one batch. Increasing this value is more efficient than running multiple sessions, but requires proportionally more GPU memory.
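Taken together, the parameters above form the component's configuration. A minimal sketch of such a configuration as a plain Python dict, assuming keys are passed exactly as documented (the mechanism for handing this dict to the component is not shown and depends on the hosting framework):

```python
# Example configuration for a fast 512 px draft run.
# Keys mirror the parameters documented above; how this dict reaches
# the component is an assumption, not part of this documentation.
config = {
    "model_name": "sd2-community/stable-diffusion-2-base",  # 512 px base variant
    "negative_prompt": "blurry, low quality, watermark",
    "num_inference_steps": 20,   # fast-draft range (15-30)
    "guidance_scale": 7.0,       # medium adherence (5-9)
    "device": "GPU",
    "seed": 42,                  # fixed seed for reproducibility
    "width": 512,                # must be a multiple of 8
    "height": 512,               # must be a multiple of 8
    "num_images_per_prompt": 1,
}

# Sanity checks implied by the parameter descriptions above.
assert config["width"] % 8 == 0 and config["height"] % 8 == 0
assert config["num_inference_steps"] > 0
```

Using the '-base' checkpoint at its native 512 px keeps memory and generation time low during prototyping; switch to the non-base variants at 768 px for final renders.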
Methods
generate(self, input: str) -> List[Any]
Generate output from the generative model.
Parameters
- input : str
- The text prompt used to condition generation.
Returns
- List[Any]
- The generated image(s), returned as a list.
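Each denoising step inside generation combines an unconditional and a prompt-conditioned prediction using the guidance_scale parameter. The standard classifier-free guidance combination can be illustrated with a toy numeric sketch (real predictions are latent tensors produced by the U-Net; scalars stand in here):

```python
def cfg_combine(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional estimate, toward the text-conditioned one."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# guidance_scale = 1.0 reproduces the conditional prediction exactly;
# larger values extrapolate further toward the prompt.
print(cfg_combine(0.0, 1.0, 1.0))  # 1.0
print(cfg_combine(0.0, 1.0, 7.5))  # 7.5
```

This is why very high guidance_scale values can introduce artifacts: the combined prediction is extrapolated well outside the range of either individual prediction.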
get_schema(cls) -> dict
Inherited from ConfigObject. Generates the JSON Schema describing the component's configuration.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
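For orientation, the returned dictionary plausibly follows the usual JSON Schema shape. The sketch below is a hypothetical excerpt only: the property names and defaults come from the Parameters section above, but the exact schema produced by ConfigObject is not documented here:

```python
# Hypothetical excerpt of the kind of dict get_schema() might return.
# Property names and defaults are taken from the Parameters section;
# the overall structure is an assumption about the generated schema.
schema = {
    "type": "object",
    "properties": {
        "num_inference_steps": {"type": "integer", "default": 15},
        "guidance_scale": {"type": "number", "default": 3.5},
        "seed": {"type": "integer", "default": -1},
    },
}

assert schema["properties"]["guidance_scale"]["default"] == 3.5
```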
validate_and_transform(self, raw_data: dict) -> dict
Inherited from ConfigObject. Takes the data given by the user to initialize the model and returns it with all the objects that the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.
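A minimal sketch of the kind of checks validate_and_transform performs, assuming only the constraints documented under Parameters (multiple-of-8 dimensions, positive step count, negative seed meaning "random"). The real method also constructs the runtime objects the model needs, which is omitted here:

```python
import random

def validate_and_transform_sketch(raw_data: dict) -> dict:
    """Toy validation mirroring the documented parameter constraints.
    Not the real implementation, which also builds runtime objects."""
    data = dict(raw_data)  # do not mutate the caller's dict
    for key in ("width", "height"):
        if data.get(key, 512) % 8 != 0:
            raise ValueError(f"{key} must be a multiple of 8")
    if data.get("num_inference_steps", 15) < 1:
        raise ValueError("num_inference_steps must be positive")
    # A negative seed means: draw a fresh random seed for this run.
    if data.get("seed", -1) < 0:
        data["seed"] = random.randrange(2**32)
    return data

checked = validate_and_transform_sketch({"width": 768, "height": 768, "seed": -1})
```

After this step, downstream code can rely on a concrete non-negative seed and valid dimensions without rechecking them.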