StableDiffusionV3Model
Multimodal Diffusion Transformer model for high-quality text-to-image generation.
Wraps the Stable Diffusion 3 and 3.5 family of checkpoints from Stability AI. These models use a Multimodal Diffusion Transformer (MMDiT) architecture that jointly processes text and image tokens, delivering significantly improved prompt adherence, typography, and overall image quality compared to U-Net-based predecessors.
Four variants are supported: SD3 Medium (2B), SD3.5 Medium (2B, improved), SD3.5 Large (8B, best quality), and SD3.5 Large Turbo (distilled, 4-8 steps). All produce images natively at 1024x1024 px. Access to these gated models requires a Hugging Face access token.
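The variant trade-offs above can be summarized in a small lookup table. This is an illustrative sketch only: the Hugging Face repository ids for the 3.5 variants are assumptions based on Stability AI's naming (only the SD3 Medium id appears in this document), and the recommended settings follow the parameter notes below.

```python
# Illustrative variant summary. Repo ids for the 3.5 checkpoints are
# ASSUMPTIONS based on Stability AI's Hugging Face naming conventions.
VARIANTS = {
    "sd-3-medium": {
        "repo": "stabilityai/stable-diffusion-3-medium-diffusers",
        "params": "2B", "steps": (20, 40), "guidance": 3.5,
    },
    "sd-3.5-medium": {
        "repo": "stabilityai/stable-diffusion-3.5-medium",  # assumed id
        "params": "2B", "steps": (20, 40), "guidance": 3.5,
    },
    "sd-3.5-large": {
        "repo": "stabilityai/stable-diffusion-3.5-large",  # assumed id
        "params": "8B", "steps": (20, 40), "guidance": 3.5,
    },
    "sd-3.5-large-turbo": {
        "repo": "stabilityai/stable-diffusion-3.5-large-turbo",  # assumed id
        "params": "8B (distilled)", "steps": (4, 8), "guidance": 1.0,  # no CFG
    },
}

def recommended_settings(variant: str) -> dict:
    """Return a recommended step count and guidance scale for a variant."""
    v = VARIANTS[variant]
    return {"num_inference_steps": v["steps"][1], "guidance_scale": v["guidance"]}
```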
References
- [1] Esser et al., "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis", 2024. https://arxiv.org/abs/2403.03206
- [2] https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers
Parameters
- model_name : string, default=stabilityai/stable-diffusion-3-medium-diffusers
  - The SD3/SD3.5 checkpoint to load. 'sd-3-medium' is the baseline 2B-parameter model. 'sd-3.5-medium' improves quality at similar speed. 'sd-3.5-large' (8B) delivers the highest quality but needs more VRAM. 'sd-3.5-large-turbo' is a distilled large model that requires far fewer steps (4-8) for fast, high-quality generation. All variants target 1024x1024 px natively.
- huggingface_key : string, default=''
  - Hugging Face read-access token required to download these gated models. To obtain one: accept the model license on huggingface.co/stabilityai, then go to Settings → Access Tokens and generate a token with 'Read' scope.
- negative_prompt : string, default=''
  - Text describing content to avoid in the generated image (e.g. 'blurry, watermark, low quality'). Leave empty to apply no negative guidance.
- num_inference_steps : integer, default=15
  - Number of denoising steps to run. More steps refine the image but increase generation time. Typical range: 20-40 for standard models; use only 4-8 steps with 'large-turbo'. Values above 50 rarely improve output for SD3/SD3.5.
- guidance_scale : number, default=3.5
  - Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. SD3.5 works well at 3.5-4.5. The 'large-turbo' variant is designed for guidance_scale=1 (no CFG). Higher values enforce the prompt but may introduce oversaturation or artifacts.
- device : string, default=CPU
  - Hardware device for inference. Select a GPU option for hardware acceleration, which is strongly recommended for diffusion models. Select 'CPU' on systems without a compatible GPU, but expect significantly longer generation times.
- seed : integer, default=-1
  - Random seed for reproducible generation. A fixed non-negative integer will always produce the same image for identical settings. Use a negative value (e.g. -1) for a random seed on each run.
- width : integer, default=512
  - Width of the output image in pixels. Must be a multiple of 8. SD3/SD3.5 models are natively trained at 1024x1024 px; using that resolution yields the best quality.
- height : integer, default=512
  - Height of the output image in pixels. Must be a multiple of 8. SD3/SD3.5 models are natively trained at 1024x1024 px; using that resolution yields the best quality.
- num_images_per_prompt : integer, default=1
  - How many images to generate from a single prompt in one batch. Increasing this value is more efficient than running multiple sessions, but requires proportionally more GPU memory.
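A minimal sketch of how the seed and resolution constraints above might be enforced. These are hypothetical helpers for illustration, not part of the component's API.

```python
import random

def resolve_seed(seed: int) -> int:
    """Negative seed -> fresh random seed per run; non-negative -> reproducible."""
    if seed < 0:
        return random.randrange(2**32)
    return seed

def check_resolution(width: int, height: int) -> None:
    """Width and height must each be a multiple of 8 (native size is 1024x1024)."""
    for name, value in (("width", width), ("height", height)):
        if value % 8 != 0:
            raise ValueError(f"{name} must be a multiple of 8, got {value}")
```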
Methods
generate(self, input: str) -> List[Any]
Generate images from the model given a text prompt.
Parameters
- input : str
- Text prompt describing the image(s) to generate.
Returns
- List[Any]
- Generated images, returned as a list.
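For orientation, roughly what a generation call corresponds to when expressed directly with the Hugging Face diffusers library. This is a sketch under stated assumptions: `run_sd3` is a hypothetical helper, and actually running it requires the gated weights, a valid access token, and a CUDA GPU.

```python
def run_sd3(prompt, hf_token, negative_prompt="", steps=28, guidance=3.5,
            width=1024, height=1024, seed=42, n_images=1):
    """Generate images with diffusers' StableDiffusion3Pipeline.

    Returns a list of PIL images. Imports are local so the sketch can be
    defined without diffusers installed.
    """
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        torch_dtype=torch.float16,
        token=hf_token,  # gated model: accepted license + 'Read' token required
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)  # reproducible output
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        guidance_scale=guidance,
        width=width,
        height=height,
        num_images_per_prompt=n_images,
        generator=generator,
    )
    return result.images
```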
get_schema(cls) -> dict
Generates the component's JSON Schema. Inherited from ConfigObject.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
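An illustrative fragment of what the returned schema might contain for two of the parameters above. The exact structure is implementation-defined; the field layout here is an assumption, not the component's actual output.

```python
# Hypothetical JSON Schema fragment; field names and layout are assumptions,
# with defaults taken from the parameter list above.
schema_fragment = {
    "type": "object",
    "properties": {
        "num_inference_steps": {"type": "integer", "default": 15},
        "guidance_scale": {"type": "number", "default": 3.5},
    },
}
```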
validate_and_transform(self, raw_data: dict) -> dict
Validates the data provided by the user to initialize the model and returns it augmented with all the objects the model needs to work. Inherited from ConfigObject.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.