Skip to main content

PixArtSigmaModel

GenerativeModel
DashAI.back.models.hugging_face.PixArtSigmaModel

Diffusion Transformer model for high-efficiency text-to-image generation.

Wraps the PixArt-Sigma pipeline, which replaces the U-Net backbone used in Stable Diffusion with a scalable Diffusion Transformer (DiT) architecture. Text conditioning is provided by a T5-XXL encoder, enabling richer semantic understanding than CLIP-based models.

PixArt-Sigma achieves state-of-the-art image quality with 14-25 denoising steps (compared to 20-50 for comparable U-Net models) and supports flexible multi-scale resolutions up to 2048 px. Two checkpoint sizes are available: 512 px (lighter) and 1024 px (best quality).

References

Parameters

model_name : string, default=PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
The PixArt-Sigma checkpoint to load. 'PixArt-Sigma-XL-2-1024-MS' is the high-resolution variant trained at 1024px with multi-scale support, delivering the best image quality. 'PixArt-Sigma-XL-2-512-MS' is the 512px variant, faster and lighter while still producing sharp results.
negative_prompt
num_inference_steps : integer, default=20
Number of denoising steps. PixArt-Sigma achieves good quality with 14-25 steps due to its efficient transformer architecture. More steps refine details but increase generation time.
guidance_scale : number, default=4.5
Classifier-Free Guidance (CFG) scale. PixArt-Sigma works best with lower values (3.5-5.5) compared to U-Net models. Higher values enforce the prompt more strictly but may saturate colors. The default of 4.5 is recommended.
device : string, default=CPU
Hardware device for inference. GPU is strongly recommended. PixArt-Sigma uses a DiT (Diffusion Transformer) architecture with T5 text encoding, which is faster than U-Net on GPU.
seed : integer, default=-1
Random seed for reproducible generation. A fixed positive integer always produces the same image. Use -1 for a random seed.
width : integer, default=1024
Width of the output image in pixels. Must be a multiple of 8. PixArt-Sigma supports flexible resolutions up to 2048px.
height : integer, default=1024
Height of the output image in pixels. Must be a multiple of 8. PixArt-Sigma supports flexible resolutions up to 2048px.
num_images_per_prompt : integer, default=1
How many images to generate from a single prompt in one batch. Requires proportionally more GPU memory per additional image.

Methods

generate(self, input: str) -> List[Any]

Defined on PixArtSigmaModel

Generate images from a text prompt.

Parameters

input : str
Text prompt to generate an image from.

Returns

List[Any]
Generated output images in a list.

get_schema(cls) -> dict

Defined on ConfigObject

Generates the component related Json Schema.

Returns

dict
Dictionary representing the Json Schema of the component.

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

It takes the data given by the user to initialize the model and returns it with all the objects that the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.

Compatible with