StableDiffusionV2Model
Latent diffusion model for high-resolution text-to-image generation.
Wraps the Stable Diffusion 2.x family of checkpoints released by Stability AI. The pipeline uses a U-Net denoiser conditioned on OpenCLIP (ViT-H/14) text embeddings to iteratively denoise a latent representation, and a variational autoencoder (VAE) to decode the final latents into a high-resolution image.
Four checkpoints are supported:
- stable-diffusion-2 / stable-diffusion-2-1: trained at 768 px, produce sharper detail; '2-1' is further fine-tuned and generally outperforms '2'.
- stable-diffusion-2-base / stable-diffusion-2-1-base: trained at 512 px, faster and lower memory; best for rapid prototyping.
Models are served from the sd2-community HuggingFace organisation,
a community mirror of the original Stability AI weights (deprecated at
stabilityai).
References
- [1] Rombach et al., "High-Resolution Image Synthesis with Latent Diffusion Models", CVPR 2022. https://arxiv.org/abs/2112.10752
- [2] https://huggingface.co/sd2-community
Parameters
- model_name : string, default=sd2-community/stable-diffusion-2 - The specific Stable Diffusion 2.x checkpoint to load. The '-base' variants are trained at 512x512 px and are faster; the non-base variants target 768x768 px and produce sharper detail. The '2-1' variants are fine-tuned further and generally outperform '2'.
- negative_prompt : string - Text describing content to keep out of the generated image (e.g. 'blurry, low quality, watermark'). Leave empty to apply no negative guidance.
- num_inference_steps : integer, default=15 - Number of denoising steps to run. More steps refine the image but increase generation time. Typical range: 15-30 for fast results, 40-50 for higher quality. Values above 100 rarely improve output.
- guidance_scale : number, default=3.5 - Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. Low values (1-4) allow creative freedom; medium values (5-9) balance quality and adherence; high values (10+) enforce the prompt but may produce artifacts.
- device : string, default=CPU - Hardware device for inference. Select a GPU option for hardware acceleration, which is strongly recommended for diffusion models. Select 'CPU' on systems without a compatible GPU, but expect significantly longer generation times.
- seed : integer, default=-1 - Random seed for reproducible generation. A fixed positive integer will always produce the same image for identical settings. Use a negative value (e.g. -1) for a random seed on each run.
- width : integer, default=512 - Width of the output image in pixels. Must be a multiple of 8. Native resolution is 512 for '-base' variants and 768 for others. Using the native resolution produces the best quality results.
- height : integer, default=512 - Height of the output image in pixels. Must be a multiple of 8. Native resolution is 512 for '-base' variants and 768 for others. Using the native resolution produces the best quality results.
- num_images_per_prompt : integer, default=1 - How many images to generate from a single prompt in one batch. Increasing this value is more efficient than running multiple sessions, but requires proportionally more GPU memory.
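Taken together, the parameters above form the component's configuration. A minimal sketch of such a configuration as a plain Python dict, assuming keys are passed exactly as documented (the mechanism for handing this dict to the component is not shown and depends on the hosting framework):

```python
# Example configuration for a fast 512 px draft run.
# Keys mirror the parameters documented above; how this dict reaches
# the component is an assumption, not part of this documentation.
config = {
    "model_name": "sd2-community/stable-diffusion-2-base",  # 512 px base variant
    "negative_prompt": "blurry, low quality, watermark",
    "num_inference_steps": 20,   # fast-draft range (15-30)
    "guidance_scale": 7.0,       # medium adherence (5-9)
    "device": "GPU",
    "seed": 42,                  # fixed seed for reproducibility
    "width": 512,                # must be a multiple of 8
    "height": 512,               # must be a multiple of 8
    "num_images_per_prompt": 1,
}

# Sanity checks implied by the parameter descriptions above.
assert config["width"] % 8 == 0 and config["height"] % 8 == 0
assert config["num_inference_steps"] > 0
```

Using the '-base' checkpoint at its native 512 px keeps memory and generation time low during prototyping; switch to the non-base variants at 768 px for final renders.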
Methods
generate(self, input: str) -> List[Any]
Generate output from the generative model.
Parameters
- input : str
- The text prompt used to condition generation.
Returns
- List[Any]
- The generated image(s), returned as a list.
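Each denoising step inside generation combines an unconditional and a prompt-conditioned prediction using the guidance_scale parameter. The standard classifier-free guidance combination can be illustrated with a toy numeric sketch (real predictions are latent tensors produced by the U-Net; scalars stand in here):

```python
def cfg_combine(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional estimate, toward the text-conditioned one."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# guidance_scale = 1.0 reproduces the conditional prediction exactly;
# larger values extrapolate further toward the prompt.
print(cfg_combine(0.0, 1.0, 1.0))  # 1.0
print(cfg_combine(0.0, 1.0, 7.5))  # 7.5
```

This is why very high guidance_scale values can introduce artifacts: the combined prediction is extrapolated well outside the range of either individual prediction.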
get_schema(cls) -> dict
Inherited from ConfigObject. Generates the JSON Schema describing the component's configuration.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
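For orientation, the returned dictionary plausibly follows the usual JSON Schema shape. The sketch below is a hypothetical excerpt only: the property names and defaults come from the Parameters section above, but the exact schema produced by ConfigObject is not documented here:

```python
# Hypothetical excerpt of the kind of dict get_schema() might return.
# Property names and defaults are taken from the Parameters section;
# the overall structure is an assumption about the generated schema.
schema = {
    "type": "object",
    "properties": {
        "num_inference_steps": {"type": "integer", "default": 15},
        "guidance_scale": {"type": "number", "default": 3.5},
        "seed": {"type": "integer", "default": -1},
    },
}

assert schema["properties"]["guidance_scale"]["default"] == 3.5
```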
validate_and_transform(self, raw_data: dict) -> dict
Inherited from ConfigObject. Takes the data given by the user to initialize the model and returns it with all the objects that the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.
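A minimal sketch of the kind of checks validate_and_transform performs, assuming only the constraints documented under Parameters (multiple-of-8 dimensions, positive step count, negative seed meaning "random"). The real method also constructs the runtime objects the model needs, which is omitted here:

```python
import random

def validate_and_transform_sketch(raw_data: dict) -> dict:
    """Toy validation mirroring the documented parameter constraints.
    Not the real implementation, which also builds runtime objects."""
    data = dict(raw_data)  # do not mutate the caller's dict
    for key in ("width", "height"):
        if data.get(key, 512) % 8 != 0:
            raise ValueError(f"{key} must be a multiple of 8")
    if data.get("num_inference_steps", 15) < 1:
        raise ValueError("num_inference_steps must be positive")
    # A negative seed means: draw a fresh random seed for this run.
    if data.get("seed", -1) < 0:
        data["seed"] = random.randrange(2**32)
    return data

checked = validate_and_transform_sketch({"width": 768, "height": 768, "seed": -1})
```

After this step, downstream code can rely on a concrete non-negative seed and valid dimensions without rechecking them.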