
StableDiffusionV3Model

GenerativeModel
DashAI.back.models.hugging_face.StableDiffusionV3Model

Multimodal Diffusion Transformer model for high-quality text-to-image generation.

Wraps the Stable Diffusion 3 and 3.5 family of checkpoints from Stability AI. These models use a Multimodal Diffusion Transformer (MMDiT) architecture that jointly processes text and image tokens, delivering significantly improved prompt adherence, typography, and overall image quality compared to U-Net-based predecessors.

Four variants are supported: SD3 Medium (2B), SD3.5 Medium (2B, improved), SD3.5 Large (8B, best quality), and SD3.5 Large Turbo (distilled, 4-8 steps). All produce images natively at 1024 x 1024 px. Access to these gated models requires a HuggingFace API key.


Parameters

model_name : string, default=stabilityai/stable-diffusion-3-medium-diffusers
The SD3/SD3.5 checkpoint to load. 'sd-3-medium' is the baseline 2B-parameter model. 'sd-3.5-medium' improves quality at similar speed. 'sd-3.5-large' (8B) delivers the highest quality but needs more VRAM. 'sd-3.5-large-turbo' is a distilled large model that requires far fewer steps (4-8) for fast high-quality generation. All variants target 1024x1024 px natively.
huggingface_key : string, default=
Hugging Face read-access token required to download these gated models. To obtain one: accept the model license on huggingface.co/stabilityai, then go to Settings → Access Tokens and generate a token with 'Read' scope.
negative_prompt : string
Text describing content to avoid in the generated image (for example, unwanted styles or artifacts). Leave empty to disable negative prompting.
num_inference_steps : integer, default=15
Number of denoising steps to run. More steps refine the image but increase generation time. Typical range: 20-40 for standard models; use only 4-8 steps with 'large-turbo'. Values above 50 rarely improve output for SD3/SD3.5.
guidance_scale : number, default=3.5
Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. SD3.5 works well at 3.5-4.5. The 'large-turbo' variant is designed for guidance_scale=1 (no CFG). Higher values enforce the prompt but may introduce oversaturation or artifacts.
device : string, default=CPU
Hardware device for inference. Select a GPU option for hardware acceleration, which is strongly recommended for diffusion models. Select 'CPU' on systems without a compatible GPU, but expect significantly longer generation times.
seed : integer, default=-1
Random seed for reproducible generation. A fixed positive integer will always produce the same image for identical settings. Use a negative value (e.g. -1) for a random seed on each run.
width : integer, default=512
Width of the output image in pixels. Must be a multiple of 8. SD3/SD3.5 models are natively trained at 1024x1024 px; using that resolution yields the best quality.
height : integer, default=512
Height of the output image in pixels. Must be a multiple of 8. SD3/SD3.5 models are natively trained at 1024x1024 px; using that resolution yields the best quality.
num_images_per_prompt : integer, default=1
How many images to generate from a single prompt in one batch. Increasing this value is more efficient than running multiple sessions, but requires proportionally more GPU memory.
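The parameters above ultimately become keyword arguments for a text-to-image pipeline call. The sketch below is a minimal, hypothetical illustration of how they could be assembled and validated before generation; `build_generation_kwargs` is not part of DashAI, and the actual wiring to the diffusers pipeline may differ.

```python
import random

# Hypothetical helper showing how the documented parameters could be
# collected into one kwargs dict. This is a sketch, not DashAI's code.
def build_generation_kwargs(
    prompt: str,
    negative_prompt: str = "",
    num_inference_steps: int = 15,
    guidance_scale: float = 3.5,
    seed: int = -1,
    width: int = 512,
    height: int = 512,
    num_images_per_prompt: int = 1,
) -> dict:
    # SD3/SD3.5 require output dimensions that are multiples of 8.
    if width % 8 or height % 8:
        raise ValueError("width and height must be multiples of 8")
    # A negative seed means "pick a fresh random seed on each run".
    if seed < 0:
        seed = random.randrange(2**32)
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "width": width,
        "height": height,
        "num_images_per_prompt": num_images_per_prompt,
        # In a real pipeline call this seed would feed a torch.Generator.
        "seed": seed,
    }

kwargs = build_generation_kwargs("a watercolor fox", seed=42, width=1024, height=1024)
print(kwargs["seed"], kwargs["width"])
```

Note that for the 'large-turbo' variant the same dict would carry guidance_scale=1 and only 4-8 inference steps, per the parameter notes above.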

Methods

generate(self, input: str) -> List[Any]

Defined on StableDiffusionV3Model

Generate images from a text prompt.

Parameters

input : str
The text prompt that conditions image generation.

Returns

List[Any]
A list of the generated images.

get_schema(cls) -> dict

Defined on ConfigObject

Generates the JSON Schema associated with the component.

Returns

dict
Dictionary representing the JSON Schema of the component.
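To make the return value concrete, the snippet below sketches what a JSON Schema fragment for one of the parameters above might look like, together with a minimal check against it. The exact structure and keywords DashAI emits are an assumption here and may differ.

```python
# Illustrative JSON Schema fragment for num_inference_steps.
# The shape is assumed for illustration; DashAI's real schema may differ.
schema_fragment = {
    "type": "object",
    "properties": {
        "num_inference_steps": {
            "type": "integer",
            "default": 15,
            "minimum": 1,
            "description": "Number of denoising steps to run.",
        }
    },
}

def check_types(raw: dict, schema: dict) -> bool:
    """Minimal type/default check against the fragment above.

    Not a full JSON Schema validator; it only applies defaults and
    verifies the integer type for demonstration purposes.
    """
    for key, spec in schema["properties"].items():
        value = raw.get(key, spec.get("default"))
        if spec["type"] == "integer" and not isinstance(value, int):
            return False
    return True

print(check_types({"num_inference_steps": 30}, schema_fragment))
```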

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Takes the user-provided initialization data and returns it with all the objects the model needs to run.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.
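The validate-and-transform pattern described above can be sketched as follows. The defaults and checks here are illustrative assumptions, not DashAI's actual implementation.

```python
import random

# Illustrative defaults mirroring the Parameters section; the real
# ConfigObject machinery in DashAI works differently.
DEFAULTS = {
    "num_inference_steps": 15,
    "guidance_scale": 3.5,
    "seed": -1,
}

def validate_and_transform(raw_data: dict) -> dict:
    # Fill in defaults for any parameter the user did not provide.
    data = {**DEFAULTS, **raw_data}
    if data["num_inference_steps"] < 1:
        raise ValueError("num_inference_steps must be >= 1")
    # Resolve the sentinel seed so downstream code always receives a
    # concrete, usable value.
    if data["seed"] < 0:
        data["seed"] = random.randrange(2**32)
    return data

print(validate_and_transform({"guidance_scale": 4.0}))
```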
