StableDiffusionXLModel
Latent diffusion model for high-resolution 1024x1024 px text-to-image generation.
Wraps Stable Diffusion XL (SDXL) checkpoints. SDXL scales the standard SD architecture with a larger U-Net backbone and a two-text-encoder conditioning stack (OpenCLIP-ViT/G + CLIP-ViT/L), enabling significantly better prompt following and photorealism at 1024 x 1024 px compared to SD 1.x/2.x.
Two checkpoints are supported: the official
stabilityai/stable-diffusion-xl-base-1.0 and
SG161222/RealVisXL_V4.0, a popular community fine-tune optimized for
realistic portraits and photography.
References
- [1] Podell et al., "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis", 2023. https://arxiv.org/abs/2307.01952
- [2] https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
Parameters
- model_name : string, default=stabilityai/stable-diffusion-xl-base-1.0 - The Stable Diffusion XL checkpoint to load. 'stable-diffusion-xl-base-1.0' is the official base model trained at 1024x1024 px for high-quality photorealistic generation. 'RealVisXL_V4.0' is a popular community fine-tune of SDXL optimized for realistic portraits and photography.
- negative_prompt : string - Text describing what the generated image should not contain; concepts listed here (e.g. 'blurry, low quality') are steered away from during generation.
- num_inference_steps : integer, default=25 - Number of denoising steps to run. More steps refine the image but increase generation time. Typical values: 20-30 for fast results, 40-50 for higher quality; SDXL generally achieves good results with 25-40 steps.
- guidance_scale : number, default=7.0 - Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. Low values (1-4) allow creative freedom; medium values (5-9) balance quality and adherence; high values (10+) enforce the prompt but may produce artifacts. SDXL works well with values between 5 and 9.
- device : string, default=CPU - Hardware device for inference. Select a GPU option for hardware acceleration, which is strongly recommended for SDXL; CPU inference is very slow for this large model (expect 10-30 minutes per image).
- seed : integer, default=-1 - Random seed for reproducible generation. A fixed positive integer will always produce the same image for identical settings. Use a negative value (e.g. -1) for a random seed on each run.
- width : integer, default=1024 - Width of the output image in pixels. Must be a multiple of 8. SDXL's native resolution is 1024x1024 px; non-native resolutions may reduce quality.
- height : integer, default=1024 - Height of the output image in pixels. Must be a multiple of 8. SDXL's native resolution is 1024x1024 px.
- num_images_per_prompt : integer, default=1 - How many images to generate from a single prompt in one batch. Increasing this value is more efficient than running multiple sessions, but requires proportionally more GPU memory.
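The constraints above (width/height as multiples of 8, negative seed meaning "random each run") can be sketched as small helpers. These are illustrative functions with hypothetical names, not part of the StableDiffusionXLModel API:

```python
import random

def check_dimensions(width: int, height: int) -> None:
    # SDXL's VAE downsamples by a factor of 8, so both sides must be multiples of 8.
    for name, value in (("width", width), ("height", height)):
        if value % 8 != 0:
            raise ValueError(f"{name} must be a multiple of 8, got {value}")

def resolve_seed(seed: int) -> int:
    # A negative seed means "pick a fresh random seed on every run";
    # a fixed non-negative seed is used as-is for reproducible generation.
    return seed if seed >= 0 else random.randrange(2**32)
```

For example, `check_dimensions(1024, 1024)` passes silently, while an odd width such as 1023 raises a `ValueError`; `resolve_seed(42)` always returns 42, whereas `resolve_seed(-1)` yields a different seed each call.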
Methods
generate(self, input: str) -> List[Any]
Generate images from a text prompt.
Parameters
- input : str
- Text prompt to generate an image from.
Returns
- List[Any]
- Generated output images in a list.
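The list-valued return contract pairs with the num_images_per_prompt parameter: one call returns one image per requested sample. A minimal stand-in sketch (the real method returns actual images, not strings; the stub below only illustrates the shape of the result):

```python
from typing import Any, List

def generate_stub(prompt: str, num_images_per_prompt: int = 1) -> List[Any]:
    # Stand-in for the diffusion call: returns one placeholder per requested image,
    # mirroring the List[Any] contract of generate().
    return [f"<image {i} for {prompt!r}>" for i in range(num_images_per_prompt)]
```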
get_schema(cls) -> dict
Inherited from ConfigObject. Generates the JSON Schema describing the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
validate_and_transform(self, raw_data: dict) -> dict
Inherited from ConfigObject. Takes the data given by the user to initialize the model and returns it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.
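A rough sketch of the validate-and-transform flow implied by the documentation: merge user input over the documented defaults, then enforce the stated constraints. Names and the empty-string default for negative_prompt are assumptions, and the real method additionally attaches loaded runtime objects (pipeline, scheduler) that this sketch omits:

```python
import random

# Defaults taken from the Parameters section above; negative_prompt's "" is assumed.
SDXL_DEFAULTS = {
    "model_name": "stabilityai/stable-diffusion-xl-base-1.0",
    "negative_prompt": "",
    "num_inference_steps": 25,
    "guidance_scale": 7.0,
    "device": "CPU",
    "seed": -1,
    "width": 1024,
    "height": 1024,
    "num_images_per_prompt": 1,
}

def validate_and_transform_sketch(raw_data: dict) -> dict:
    # Fill missing keys with documented defaults.
    data = {**SDXL_DEFAULTS, **raw_data}
    # Enforce the multiple-of-8 constraint on output dimensions.
    for key in ("width", "height"):
        if data[key] % 8 != 0:
            raise ValueError(f"{key} must be a multiple of 8, got {data[key]}")
    # Resolve a negative seed into a concrete random seed for this run.
    if data["seed"] < 0:
        data["seed"] = random.randrange(2**32)
    # The real method would also build and attach the objects the model needs here.
    return data
```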