TongyiZImageModel
Tongyi Z-Image S3-DiT model for high-quality text-to-image generation.
Wraps Alibaba's 6B-parameter Tongyi Z-Image pipeline. The model uses a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture, in which text and image tokens are concatenated into a single input sequence and processed by one shared transformer. It is reported to outperform previous open-source state-of-the-art models despite a smaller parameter count, and it excels at photorealism, diverse artistic styles, and accurate text rendering.
Parameters
- model_name : string, default=Tongyi-MAI/Z-Image - The Hugging Face checkpoint to load. The default is Alibaba's 6B-parameter Tongyi Z-Image text-to-image model (S3-DiT architecture), one of the most downloaded models on Hugging Face.
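- negative_prompt - Text describing content the generated image should avoid; it steers generation away from that content through classifier-free guidance.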
- num_inference_steps : integer, default=20 - Number of denoising steps. Tongyi Z-Image achieves high-quality results with 20-30 steps; more steps refine detail at the cost of generation time.
- guidance_scale : number, default=5.0 - Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt; values of 4-7 work well for Tongyi Z-Image.
- device : string, default=CPU - Hardware device for inference. A GPU is strongly recommended for this 6B-parameter model; CPU inference is possible but very slow.
- seed : integer, default=-1 - Random seed for reproducible generation. A fixed seed reproduces the same image as long as all other settings are unchanged; use -1 for a random seed.
- width : integer, default=1024 - Width of the output image in pixels. Must be a multiple of 8. Tongyi Z-Image natively targets 1024x1024 px.
- height : integer, default=1024 - Height of the output image in pixels. Must be a multiple of 8. Tongyi Z-Image natively targets 1024x1024 px.
- num_images_per_prompt : integer, default=1 - Number of images to generate from a single prompt in one batch. GPU memory use grows roughly in proportion to the number of images.
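The guidance_scale and negative_prompt parameters act through classifier-free guidance (CFG). The sketch below shows the standard CFG update used by diffusion pipelines in general, on plain Python lists standing in for noise-prediction tensors. It is not taken from this component's source; whether the Tongyi Z-Image pipeline applies exactly this form, or a variant of it, is an assumption. In typical pipelines the "unconditional" prediction comes from an empty prompt, or from negative_prompt when one is given.

```python
from typing import List


def classifier_free_guidance(
    uncond: List[float], cond: List[float], guidance_scale: float
) -> List[float]:
    """Combine two noise predictions with the standard CFG formula.

    uncond: prediction for the empty (or negative) prompt.
    cond:   prediction for the text prompt.
    Returns uncond + guidance_scale * (cond - uncond), element-wise.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values standing in for per-element noise predictions.
uncond_pred = [0.0, 1.0, -0.5]
cond_pred = [0.5, 0.5, 0.5]

print(classifier_free_guidance(uncond_pred, cond_pred, 1.0))  # [0.5, 0.5, 0.5]
print(classifier_free_guidance(uncond_pred, cond_pred, 5.0))  # [2.5, -1.5, 4.5]
```

At a scale of 1.0 the result is just the conditional prediction, so guidance has no effect; larger values push the prediction further away from the unconditional (or negative-prompt) branch, which is why the image follows the prompt more strictly.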
Methods
generate(self, input: str) -> List[Any]
Defined in TongyiZImageModel. Generates images from a text prompt.
Parameters
- input : str - Text prompt to generate images from.
Returns
- List[Any] - The generated images, returned as a list.
get_schema(cls) -> dict
Defined in ConfigObject. Generates the JSON Schema describing the component's configuration.
Returns
- dict - Dictionary representing the component's JSON Schema.
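As an illustration of the kind of dictionary get_schema returns, the sketch below writes the parameters documented above as a JSON Schema object. It is hypothetical: the structure is assembled by hand from the documented types and defaults, the type of negative_prompt is assumed because this page does not state it, and the real schema may carry additional metadata (titles, descriptions, enums).

```python
import json

# Hypothetical JSON Schema mirroring the documented parameters;
# the real get_schema() output may differ in keys and metadata.
TONGYI_Z_IMAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "model_name": {"type": "string", "default": "Tongyi-MAI/Z-Image"},
        # Assumption: type not documented, and no default is given.
        "negative_prompt": {"type": "string"},
        "num_inference_steps": {"type": "integer", "default": 20},
        "guidance_scale": {"type": "number", "default": 5.0},
        "device": {"type": "string", "default": "CPU"},
        "seed": {"type": "integer", "default": -1},
        # "multipleOf" encodes the documented multiple-of-8 constraint.
        "width": {"type": "integer", "default": 1024, "multipleOf": 8},
        "height": {"type": "integer", "default": 1024, "multipleOf": 8},
        "num_images_per_prompt": {"type": "integer", "default": 1},
    },
}

# The schema is plain JSON-serializable data.
print(json.dumps(TONGYI_Z_IMAGE_SCHEMA["properties"]["width"]))
# {"type": "integer", "default": 1024, "multipleOf": 8}
```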
validate_and_transform(self, raw_data: dict) -> dict
Defined in ConfigObject. Validates the data supplied by the user to initialize the model and returns it together with all the objects the model needs to run.
Parameters
- raw_data : dict - A dictionary with the data provided by the user to initialize the model.
Returns
- dict - A validated dictionary with the necessary objects.
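To make the validate-then-transform idea concrete, the sketch below fills in the documented defaults, checks types and the multiple-of-8 size constraint, and resolves seed=-1 to a concrete random seed so a run can be reproduced later. The function name validate_and_transform_sketch is hypothetical; it is a stand-in written for illustration, not the real ConfigObject method, which may build other objects (for example the loaded pipeline) that are not shown here. The seed range 0 to 2**32 - 1 is an arbitrary choice.

```python
import random
from typing import Any, Dict

# Defaults copied from the documented parameter list.
DEFAULTS: Dict[str, Any] = {
    "model_name": "Tongyi-MAI/Z-Image",
    "num_inference_steps": 20,
    "guidance_scale": 5.0,
    "device": "CPU",
    "seed": -1,
    "width": 1024,
    "height": 1024,
    "num_images_per_prompt": 1,
}


def validate_and_transform_sketch(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for validate_and_transform (illustration only)."""
    unknown = set(raw_data) - set(DEFAULTS) - {"negative_prompt"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    data = {**DEFAULTS, **raw_data}  # user values override defaults

    for key in ("num_inference_steps", "seed", "width", "height", "num_images_per_prompt"):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(data[key], int) or isinstance(data[key], bool):
            raise TypeError(f"{key} must be an integer")
    for key in ("width", "height"):
        if data[key] % 8 != 0:
            raise ValueError(f"{key} must be a multiple of 8, got {data[key]}")

    data["guidance_scale"] = float(data["guidance_scale"])
    # -1 means "pick a random seed"; resolve it once so the value can be logged and reused.
    if data["seed"] == -1:
        data["seed"] = random.randint(0, 2**32 - 1)
    return data


config = validate_and_transform_sketch({"width": 768, "seed": 42})
print(config["width"], config["height"], config["seed"])  # 768 1024 42
```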