PixArtSigmaModel

GenerativeModel

DashAI.back.models.hugging_face.PixArtSigmaModel

Diffusion Transformer model for high-efficiency text-to-image generation.

Wraps the PixArt-Sigma pipeline, which replaces the U-Net backbone used in Stable Diffusion with a scalable Diffusion Transformer (DiT) architecture. Text conditioning is provided by a T5-XXL encoder, enabling richer semantic understanding than CLIP-based models.

PixArt-Sigma [1] achieves state-of-the-art image quality with 14-25 denoising steps (compared with 20-50 for comparable U-Net models) and supports flexible multi-scale resolutions up to 2048 px. Two checkpoint sizes are available: 512 px (lighter and faster) and 1024 px [2] (best quality).
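To make the resolution trade-off concrete, the sketch below estimates how many image tokens the transformer processes per image. It assumes an 8x-downsampling VAE and a 2x2 patch size (the "2" in "XL-2"); both are assumptions about the checkpoints, not values read from DashAI, and the function name is hypothetical.

```python
def dit_token_count(height: int, width: int, vae_factor: int = 8, patch_size: int = 2) -> int:
    """Estimate how many patch tokens the transformer processes for one image.

    Assumptions (not taken from DashAI): the VAE downsamples by 8 and the
    transformer uses 2x2 patches. This sketch requires multiples of 16 so the
    patch grid divides evenly; the documented constraint is a multiple of 8.
    """
    step = vae_factor * patch_size
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step} in this sketch")
    latent_h, latent_w = height // vae_factor, width // vae_factor
    return (latent_h // patch_size) * (latent_w // patch_size)


for side in (512, 1024, 2048):
    print(f"{side}x{side} px -> {dit_token_count(side, side)} tokens")
```

Under these assumptions the loop reports 1024, 4096 and 16384 tokens. Each doubling of the side length quadruples the token count, and self-attention cost grows quadratically with the number of tokens, which is one reason the 512 px checkpoint is the lighter option.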

References

[1] Chen et al., "PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation", 2024. https://arxiv.org/abs/2403.04692
[2] PixArt-Sigma-XL-2-1024-MS model card, Hugging Face. https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS

Parameters

model_name : string, default=PixArt-alpha/PixArt-Sigma-XL-2-1024-MS: The PixArt-Sigma checkpoint to load. 'PixArt-Sigma-XL-2-1024-MS' is the high-resolution variant, trained at 1024 px with multi-scale support, and delivers the best image quality. 'PixArt-Sigma-XL-2-512-MS' is the 512 px variant: faster and lighter, while still producing sharp results.
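negative_prompt : string: Text describing what should not appear in the generated image, such as unwanted objects, styles, or artifacts.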
num_inference_steps : integer, default=20: Number of denoising steps. PixArt-Sigma achieves good quality with 14-25 steps due to its efficient transformer architecture. More steps refine details but increase generation time.
guidance_scale : number, default=4.5: Classifier-Free Guidance (CFG) scale. PixArt-Sigma works best with lower values (3.5-5.5) compared to U-Net models. Higher values enforce the prompt more strictly but may saturate colors. The default of 4.5 is recommended.
device : string, default=CPU: Hardware device used for inference. A GPU is strongly recommended; on GPU, PixArt-Sigma's DiT (Diffusion Transformer) architecture with T5 text encoding runs faster than U-Net-based models.
seed : integer, default=-1: Random seed for reproducible generation. A fixed positive integer produces the same image for the same prompt and settings. Use -1 to draw a random seed on each run.
width : integer, default=1024: Width of the output image in pixels. Must be a multiple of 8. PixArt-Sigma supports flexible resolutions up to 2048px.
height : integer, default=1024: Height of the output image in pixels. Must be a multiple of 8. PixArt-Sigma supports flexible resolutions up to 2048px.
num_images_per_prompt : integer, default=1: Number of images generated from a single prompt in one batch. GPU memory usage grows roughly in proportion to this value.
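The rules stated for `seed`, `width`/`height` and `guidance_scale` can be sketched in plain Python as follows. The helper names are hypothetical and not part of DashAI. `apply_cfg` shows the standard classifier-free guidance combination, applied to lists of numbers instead of real noise-prediction tensors.

```python
import random

MAX_SIDE = 2048  # documented upper bound for PixArt-Sigma resolutions


def resolve_seed(seed: int) -> int:
    """Turn the documented convention into a concrete seed: -1 means 'pick one at random'."""
    if seed == -1:
        return random.randint(0, 2**32 - 1)
    if seed < 0:
        # Assumption: other negative values are rejected; 0 is accepted here.
        raise ValueError("seed must be -1 or a non-negative integer")
    return seed


def check_dimensions(width: int, height: int) -> None:
    """Reject sizes that are not multiples of 8 or exceed the documented maximum."""
    for name, value in (("width", width), ("height", height)):
        if value % 8 != 0:
            raise ValueError(f"{name}={value} is not a multiple of 8")
        if value <= 0 or value > MAX_SIDE:
            raise ValueError(f"{name}={value} is outside 1..{MAX_SIDE}")


def apply_cfg(uncond: list, cond: list, guidance_scale: float) -> list:
    """Classifier-free guidance on flat noise predictions:
    guided = uncond + guidance_scale * (cond - uncond).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


print(apply_cfg([0.0, 1.0], [1.0, 1.0], 4.5))  # [4.5, 1.0]
```

The guidance formula explains the advice on `guidance_scale`: the scale multiplies the difference between the prompt-conditioned and unconditional predictions, so large values push the result far from the unconditional prediction, which can show up as oversaturated colors.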

Methods

generate(self, input: str) -> List[Any]

Defined on PixArtSigmaModel

Generate images from a text prompt.

Parameters

input : str: Text prompt to generate an image from.

Returns

List[Any]: Generated output images in a list.
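One plausible way for `generate` to use the configured parameters is to forward them as keyword arguments to the diffusers `PixArtSigmaPipeline` call, which accepts `prompt`, `negative_prompt`, `num_inference_steps`, `guidance_scale`, `height`, `width`, `num_images_per_prompt` and `generator`. The helper below is hypothetical: it only builds the argument dictionary, so it runs without diffusers installed, and the actual DashAI implementation may differ.

```python
from typing import Any, Dict


def build_pipeline_kwargs(prompt: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping from PixArtSigmaModel parameters to pipeline arguments.

    Missing values fall back to the documented defaults. Seed handling is left
    out so the sketch runs without torch; a real wrapper would pass a seeded
    torch.Generator as `generator`.
    """
    kwargs: Dict[str, Any] = {
        "prompt": prompt,
        "num_inference_steps": config.get("num_inference_steps", 20),
        "guidance_scale": config.get("guidance_scale", 4.5),
        "width": config.get("width", 1024),
        "height": config.get("height", 1024),
        "num_images_per_prompt": config.get("num_images_per_prompt", 1),
    }
    # Only forward a negative prompt when the user actually supplied one.
    negative_prompt = config.get("negative_prompt")
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


kwargs = build_pipeline_kwargs("a lighthouse at dusk", {"width": 512, "height": 512})
print(kwargs)
```

With diffusers installed and a loaded `PixArtSigmaPipeline` named `pipe`, `pipe(**kwargs).images` would return a list of PIL images, which is consistent with the `List[Any]` return type above.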

get_schema(cls) -> dict

Defined on ConfigObject

Generates the JSON Schema associated with the component.

Returns

dict: Dictionary representing the JSON Schema of the component.
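For illustration, a JSON Schema fragment covering three of the documented parameters might look like the dictionary below. Its exact shape (the `enum` on `model_name`, the `minimum` bounds, the variable name) is an assumption and not the actual output of DashAI's `get_schema`; only the defaults come from the parameter list.

```python
import json

# Hypothetical JSON Schema fragment for three documented parameters.
# Defaults come from the parameter list; bounds are illustrative assumptions.
PIXART_SCHEMA_SKETCH = {
    "type": "object",
    "properties": {
        "model_name": {
            "type": "string",
            "default": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
            "enum": [
                "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
                "PixArt-alpha/PixArt-Sigma-XL-2-512-MS",
            ],
        },
        "num_inference_steps": {"type": "integer", "default": 20, "minimum": 1},
        "guidance_scale": {"type": "number", "default": 4.5, "minimum": 0},
    },
}

print(json.dumps(PIXART_SCHEMA_SKETCH, indent=2))
```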

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Validates the data provided by the user to initialize the model and returns it together with all the objects the model needs to work.

Parameters

raw_data : dict: A dictionary with the data provided by the user to initialize the model.

Returns

dict: A validated dictionary with the necessary objects.
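A minimal sketch of the validate-then-transform pattern, assuming validation means rejecting unknown keys, filling documented defaults, and checking basic types. The function and `DEFAULTS` names are hypothetical, the empty default for `negative_prompt` is an assumption (the docs list none), and the real DashAI method may build richer objects than the plain values shown here.

```python
from typing import Any, Dict

# Documented defaults, used to fill in values the user did not provide.
DEFAULTS: Dict[str, Any] = {
    "model_name": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    "negative_prompt": "",  # assumption: the docs give no default
    "num_inference_steps": 20,
    "guidance_scale": 4.5,
    "device": "CPU",
    "seed": -1,
    "width": 1024,
    "height": 1024,
    "num_images_per_prompt": 1,
}

INTEGER_KEYS = ("num_inference_steps", "seed", "width", "height", "num_images_per_prompt")


def validate_and_transform_sketch(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for ConfigObject.validate_and_transform."""
    unknown = set(raw_data) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    data = {**DEFAULTS, **raw_data}
    for key in INTEGER_KEYS:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(data[key], int) or isinstance(data[key], bool):
            raise TypeError(f"{key} must be an integer")
    # "Transform" step: normalize the guidance scale to a float.
    data["guidance_scale"] = float(data["guidance_scale"])
    return data


print(validate_and_transform_sketch({"width": 512, "guidance_scale": 5}))
```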

Compatible with

TextToImageGenerationTask
