StableDiffusionXLModel
Latent diffusion model for high-resolution 1024x1024 px text-to-image generation.
Wraps Stable Diffusion XL (SDXL) checkpoints. SDXL scales the standard SD architecture with a larger U-Net backbone and a two-text-encoder conditioning stack (OpenCLIP-ViT/G + CLIP-ViT/L), enabling significantly better prompt following and photorealism at 1024 x 1024 px compared to SD 1.x/2.x.
Two checkpoints are supported: the official
stabilityai/stable-diffusion-xl-base-1.0 and
SG161222/RealVisXL_V4.0, a popular community fine-tune optimized for
realistic portraits and photography.
References
- [1] Podell et al., "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis", 2023. https://arxiv.org/abs/2307.01952
- [2] https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
Parameters
- model_name : string, default=stabilityai/stable-diffusion-xl-base-1.0 - The Stable Diffusion XL checkpoint to load. 'stable-diffusion-xl-base-1.0' is the official base model trained at 1024x1024 px for high-quality photorealistic generation. 'RealVisXL_V4.0' is a popular community fine-tune of SDXL optimized for realistic portraits and photography.
- negative_prompt : string - Text describing what the generated image should not contain; concepts listed here (e.g. 'blurry, low quality') are steered away from during generation.
- num_inference_steps : integer, default=25 - Number of denoising steps to run. More steps refine the image but increase generation time. Typical values: 20-30 for fast results, 40-50 for higher quality; SDXL generally achieves good results with 25-40 steps.
- guidance_scale : number, default=7.0 - Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. Low values (1-4) allow creative freedom; medium values (5-9) balance quality and adherence; high values (10+) enforce the prompt but may produce artifacts. SDXL works well with values between 5 and 9.
- device : string, default=CPU - Hardware device for inference. Select a GPU option for hardware acceleration, which is strongly recommended for SDXL; CPU inference is very slow for this large model (expect 10-30 minutes per image).
- seed : integer, default=-1 - Random seed for reproducible generation. A fixed positive integer will always produce the same image for identical settings. Use a negative value (e.g. -1) for a random seed on each run.
- width : integer, default=1024 - Width of the output image in pixels. Must be a multiple of 8. SDXL's native resolution is 1024x1024 px; non-native resolutions may reduce quality.
- height : integer, default=1024 - Height of the output image in pixels. Must be a multiple of 8. SDXL's native resolution is 1024x1024 px.
- num_images_per_prompt : integer, default=1 - How many images to generate from a single prompt in one batch. Increasing this value is more efficient than running multiple sessions, but requires proportionally more GPU memory.
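The constraints above (width/height as multiples of 8, negative seed meaning "random each run") can be sketched as small helpers. These are illustrative functions with hypothetical names, not part of the StableDiffusionXLModel API:

```python
import random

def check_dimensions(width: int, height: int) -> None:
    # SDXL's VAE downsamples by a factor of 8, so both sides must be multiples of 8.
    for name, value in (("width", width), ("height", height)):
        if value % 8 != 0:
            raise ValueError(f"{name} must be a multiple of 8, got {value}")

def resolve_seed(seed: int) -> int:
    # A negative seed means "pick a fresh random seed on every run";
    # a fixed non-negative seed is used as-is for reproducible generation.
    return seed if seed >= 0 else random.randrange(2**32)
```

For example, `check_dimensions(1024, 1024)` passes silently, while an odd width such as 1023 raises a `ValueError`; `resolve_seed(42)` always returns 42, whereas `resolve_seed(-1)` yields a different seed each call.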
Methods
generate(self, input: str) -> List[Any]
Generate images from a text prompt.
Parameters
- input : str
- Text prompt to generate an image from.
Returns
- List[Any]
- Generated output images in a list.
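The list-valued return contract pairs with the num_images_per_prompt parameter: one call returns one image per requested sample. A minimal stand-in sketch (the real method returns actual images, not strings; the stub below only illustrates the shape of the result):

```python
from typing import Any, List

def generate_stub(prompt: str, num_images_per_prompt: int = 1) -> List[Any]:
    # Stand-in for the diffusion call: returns one placeholder per requested image,
    # mirroring the List[Any] contract of generate().
    return [f"<image {i} for {prompt!r}>" for i in range(num_images_per_prompt)]
```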
get_schema(cls) -> dict
Inherited from ConfigObject. Generates the JSON Schema describing the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
validate_and_transform(self, raw_data: dict) -> dict
Inherited from ConfigObject. Takes the data given by the user to initialize the model and returns it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.
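A rough sketch of the validate-and-transform flow implied by the documentation: merge user input over the documented defaults, then enforce the stated constraints. Names and the empty-string default for negative_prompt are assumptions, and the real method additionally attaches loaded runtime objects (pipeline, scheduler) that this sketch omits:

```python
import random

# Defaults taken from the Parameters section above; negative_prompt's "" is assumed.
SDXL_DEFAULTS = {
    "model_name": "stabilityai/stable-diffusion-xl-base-1.0",
    "negative_prompt": "",
    "num_inference_steps": 25,
    "guidance_scale": 7.0,
    "device": "CPU",
    "seed": -1,
    "width": 1024,
    "height": 1024,
    "num_images_per_prompt": 1,
}

def validate_and_transform_sketch(raw_data: dict) -> dict:
    # Fill missing keys with documented defaults.
    data = {**SDXL_DEFAULTS, **raw_data}
    # Enforce the multiple-of-8 constraint on output dimensions.
    for key in ("width", "height"):
        if data[key] % 8 != 0:
            raise ValueError(f"{key} must be a multiple of 8, got {data[key]}")
    # Resolve a negative seed into a concrete random seed for this run.
    if data["seed"] < 0:
        data["seed"] = random.randrange(2**32)
    # The real method would also build and attach the objects the model needs here.
    return data
```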