TongyiZImageModel
Tongyi Z-Image S3-DiT model for high-quality text-to-image generation.
Wraps Alibaba's 6B-parameter Tongyi Z-Image pipeline. The model uses a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture, in which text tokens and image tokens are concatenated into one sequence and processed by a single transformer stream for efficient, high-fidelity generation. It is reported to outperform previous open-source state-of-the-art models while being more parameter-efficient, and it excels at photorealism, diverse artistic styles, and accurate text rendering.
Parameters
- model_name : string, default=Tongyi-MAI/Z-Image - The Tongyi Z-Image checkpoint to load. Tongyi-MAI/Z-Image is Alibaba's 6B-parameter text-to-image model built on the S3-DiT (Scalable Single-Stream Diffusion Transformer) architecture, and one of the most downloaded models on Hugging Face. It is reported to outperform previous open-source state-of-the-art models at a fraction of their parameter count.
- negative_prompt
- num_inference_steps : integer, default=20 - Number of denoising steps. Tongyi Z-Image achieves high-quality results with 20-30 steps. More steps refine detail at the cost of generation time.
- guidance_scale : number, default=5.0 - Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. Values of 4-7 work well for Tongyi Z-Image.
- device : string, default=CPU - Hardware device for inference. A GPU is strongly recommended for this 6B-parameter model. CPU inference is possible but very slow.
- seed : integer, default=-1 - Random seed for reproducible generation. A fixed positive integer produces the same image when the prompt and all other settings are unchanged. Use -1 for a random seed.
- width : integer, default=1024 - Width of the output image in pixels. Must be a multiple of 8. Tongyi Z-Image natively targets 1024x1024 px.
- height : integer, default=1024 - Height of the output image in pixels. Must be a multiple of 8. Tongyi Z-Image natively targets 1024x1024 px.
- num_images_per_prompt : integer, default=1 - How many images to generate from a single prompt in one batch. Each additional image requires proportionally more GPU memory.
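The constraints in the parameter list can be checked before a model is built. The following is a minimal sketch, not part of the component's API: `DEFAULTS`, `build_config`, and `resolve_seed` are hypothetical names introduced for illustration, and the checks only encode what the list above states (documented defaults, dimensions that are multiples of 8, and -1 meaning a random seed). `negative_prompt` is left out because the list gives no type or default for it.

```python
import random
from typing import Any, Dict

# Defaults copied from the documented parameter list.
DEFAULTS: Dict[str, Any] = {
    "model_name": "Tongyi-MAI/Z-Image",
    "num_inference_steps": 20,
    "guidance_scale": 5.0,
    "device": "CPU",
    "seed": -1,
    "width": 1024,
    "height": 1024,
    "num_images_per_prompt": 1,
}


def build_config(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the defaults and check them."""
    config = {**DEFAULTS, **overrides}
    # The documentation requires width and height to be multiples of 8.
    for key in ("width", "height"):
        if config[key] <= 0 or config[key] % 8 != 0:
            raise ValueError(f"{key} must be a positive multiple of 8, got {config[key]}")
    # At least one denoising step and one image are assumed to be required.
    for key in ("num_inference_steps", "num_images_per_prompt"):
        if config[key] < 1:
            raise ValueError(f"{key} must be at least 1, got {config[key]}")
    return config


def resolve_seed(seed: int) -> int:
    """Return the seed unchanged, or draw a random one when seed is -1."""
    return random.randrange(2**32) if seed == -1 else seed
```

For example, `build_config(width=768, seed=42)` keeps every other default, while `build_config(width=1020)` raises `ValueError` because 1020 is not a multiple of 8.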
Methods
generate(self, input: str) -> List[Any]
TongyiZImageModel: Generate images from a text prompt.
Parameters
- input : str - Text prompt to generate an image from.
Returns
- List[Any] - Generated output images in a list.
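A call to `generate` can be wrapped in a small helper that saves whatever comes back. This is a sketch under stated assumptions: `TextToImageModel` and `generate_and_save` are hypothetical names, and because the return type is documented only as `List[Any]`, the helper assumes each item behaves like a PIL image and exposes `save(path)`.

```python
from pathlib import Path
from typing import Any, List, Protocol


class TextToImageModel(Protocol):
    """Structural type for any object exposing the documented generate method."""

    def generate(self, input: str) -> List[Any]: ...


def generate_and_save(model: TextToImageModel, prompt: str, out_dir: str) -> List[Path]:
    """Hypothetical helper: run generate and save each returned image as PNG."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, image in enumerate(model.generate(prompt)):
        path = target / f"image_{index:02d}.png"
        # Assumption: items behave like PIL.Image.Image and expose save().
        image.save(path)
        paths.append(path)
    return paths
```

Typing the argument as a protocol keeps the helper independent of how the model instance is constructed, which this page does not describe.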
get_schema(cls) -> dict
ConfigObject: Generates the JSON Schema for the component.
Returns
- dict - Dictionary representing the JSON Schema of the component.
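To give a sense of how such a schema dictionary can be consumed, here is a hedged illustration. `EXAMPLE_SCHEMA` is a hand-written fragment built from the parameter types and defaults documented above; it is not the actual output of `get_schema`, whose exact layout this page does not show. `defaults_from_schema` is a hypothetical helper.

```python
from typing import Any, Dict

# Illustrative fragment only: the real get_schema() output may differ.
EXAMPLE_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "num_inference_steps": {"type": "integer", "default": 20},
        "guidance_scale": {"type": "number", "default": 5.0},
        "device": {"type": "string", "default": "CPU"},
        "seed": {"type": "integer", "default": -1},
    },
}


def defaults_from_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the 'default' value of every property that declares one."""
    return {
        name: spec["default"]
        for name, spec in schema.get("properties", {}).items()
        if "default" in spec
    }
```

The same traversal would work on any JSON Schema object that lists its fields under `properties`.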
validate_and_transform(self, raw_data: dict) -> dict
ConfigObject: Takes the data provided by the user to initialize the model and returns it together with all the objects the model needs to work.
Parameters
- raw_data : dict - A dictionary with the data provided by the user to initialize the model.
Returns
- dict - A validated dictionary with the necessary objects.