StableDiffusionV2Model

GenerativeModel
DashAI.back.models.hugging_face.StableDiffusionV2Model

Latent diffusion model for high-resolution text-to-image generation.

Wraps the Stable Diffusion 2.x family of checkpoints released by Stability AI. The pipeline uses a U-Net denoiser conditioned on OpenCLIP text embeddings (ViT-H/14) and a variational autoencoder (VAE) to iteratively denoise a latent representation into a high-resolution image.

Four checkpoints are supported:

  • stable-diffusion-2 / stable-diffusion-2-1 — trained at 768 px, produce sharper detail; '2-1' is further fine-tuned and generally outperforms '2'.
  • stable-diffusion-2-base / stable-diffusion-2-1-base — trained at 512 px, faster and lower memory; best for rapid prototyping.

Models are served from the sd2-community Hugging Face organisation, a community mirror of the original Stability AI weights (the original stabilityai repositories are deprecated).
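Since the '-base' and non-base variants differ in training resolution, a caller may want to look up the native resolution for a given checkpoint before setting width/height. A minimal sketch (the helper and table below are illustrative, not part of DashAI):

```python
# Native training resolution (in pixels) per supported checkpoint.
NATIVE_RESOLUTION = {
    "sd2-community/stable-diffusion-2": 768,
    "sd2-community/stable-diffusion-2-1": 768,
    "sd2-community/stable-diffusion-2-base": 512,
    "sd2-community/stable-diffusion-2-1-base": 512,
}

def native_resolution(model_name: str) -> int:
    """Return the resolution a checkpoint was trained at."""
    return NATIVE_RESOLUTION[model_name]
```

Generating at the returned resolution gives the best quality, as noted in the width/height parameter descriptions below.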

Parameters

model_name : string, default=sd2-community/stable-diffusion-2
The specific Stable Diffusion 2.x checkpoint to load. The '-base' variants are trained at 512x512 px and are faster; the non-base variants target 768x768 px and produce sharper detail. The '2-1' variants are fine-tuned further and generally outperform '2'.
negative_prompt : string
Text describing concepts the model should steer away from (e.g. 'blurry, low quality'). Applied through classifier-free guidance; leave empty to disable.
num_inference_steps : integer, default=15
Number of denoising steps to run. More steps refine the image but increase generation time. Typical range: 15-30 for fast results, 40-50 for higher quality. Values above 100 rarely improve output.
guidance_scale : number, default=3.5
Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. Low values (1-4) allow creative freedom; medium values (5-9) balance quality and adherence; high values (10+) enforce the prompt but may produce artifacts.
device : string, default=CPU
Hardware device for inference. Select a GPU option for hardware acceleration, which is strongly recommended for diffusion models. Select 'CPU' on systems without a compatible GPU, but expect significantly longer generation times.
seed : integer, default=-1
Random seed for reproducible generation. A fixed positive integer will always produce the same image for identical settings. Use a negative value (e.g. -1) for a random seed on each run.
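The seed convention above (negative means "randomize each run") can be sketched as a small resolver; the function name is hypothetical, not DashAI's actual implementation:

```python
import random

def resolve_seed(seed: int) -> int:
    # Negative seed: draw a fresh random seed for this run.
    # Non-negative seed: use it as-is for reproducible output.
    if seed < 0:
        return random.randrange(2**32)
    return seed
```

The resolved value would then seed the pipeline's random generator, so identical settings plus a fixed non-negative seed reproduce the same image.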
width : integer, default=512
Width of the output image in pixels. Must be a multiple of 8. Native resolution is 512 for '-base' variants and 768 for others. Using the native resolution produces the best quality results.
height : integer, default=512
Height of the output image in pixels. Must be a multiple of 8. Native resolution is 512 for '-base' variants and 768 for others. Using the native resolution produces the best quality results.
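Because width and height must both be multiples of 8, arbitrary requested sizes need to be snapped before generation. A minimal sketch (the helper name is illustrative, not part of DashAI):

```python
def snap_to_multiple_of_8(value: int) -> int:
    # Round down to the nearest multiple of 8, with a floor of 8,
    # so the dimension is always valid for the VAE's latent grid.
    return max(8, (value // 8) * 8)
```

For best results, snap to the checkpoint's native resolution (512 for '-base' variants, 768 otherwise) rather than an arbitrary size.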
num_images_per_prompt : integer, default=1
How many images to generate from a single prompt in one batch. Increasing this value is more efficient than running multiple sessions, but requires proportionally more GPU memory.

Methods

generate(self, input: str) -> List[Any]

Defined on StableDiffusionV2Model

Generate images from a text prompt.

Parameters

input : str
The text prompt to condition the generation on.

Returns

List[Any]
The generated images, returned as a list.

get_schema(cls) -> dict

Defined on ConfigObject

Generates the component's JSON Schema.

Returns

dict
Dictionary representing the JSON Schema of the component.

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Takes the user-provided initialization data and returns it augmented with all the objects the model needs to run.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.

Compatible with