TongyiZImageModel

GenerativeModel
DashAI.back.models.hugging_face.TongyiZImageModel

Tongyi Z-Image S3-DiT model for high-quality text-to-image generation.

Wraps Alibaba's 6B-parameter Tongyi Z-Image pipeline. The model uses a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture, in which text tokens and image tokens are concatenated into a single input sequence for parameter-efficient, high-fidelity generation. It outperforms previous open-source state-of-the-art models while using fewer parameters, and excels at photorealism, diverse artistic styles, and accurate text rendering.

Parameters

model_name : string, default=Tongyi-MAI/Z-Image
The Tongyi Z-Image checkpoint to load. The default, 'Tongyi-MAI/Z-Image', is Alibaba's 6B-parameter text-to-image model built on the S3-DiT (Scalable Single-Stream Diffusion Transformer) architecture, and one of the most downloaded models on Hugging Face. It outperforms previous open-source state-of-the-art models at a fraction of their parameter count.
negative_prompt
num_inference_steps : integer, default=20
Number of denoising steps. Tongyi Z-Image achieves high-quality results with 20-30 steps. More steps refine detail at the cost of generation time.
guidance_scale : number, default=5.0
Classifier-Free Guidance (CFG) scale. Controls how strictly the image follows the text prompt. Values 4-7 work well for Tongyi Z-Image.
device : string, default=CPU
Hardware device for inference. GPU is strongly recommended for this 6B-parameter model. CPU inference is possible but very slow.
seed : integer, default=-1
Random seed for reproducible generation. A fixed positive integer produces the same image as long as the prompt and all other settings are unchanged. Use -1 for a random seed.
width : integer, default=1024
Width of the output image in pixels. Must be a multiple of 8. Tongyi Z-Image natively targets 1024x1024 px.
height : integer, default=1024
Height of the output image in pixels. Must be a multiple of 8. Tongyi Z-Image natively targets 1024x1024 px.
num_images_per_prompt : integer, default=1
How many images to generate from a single prompt in one batch. Requires proportionally more GPU memory per additional image.
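
The constraints listed above can be checked before a model is built. The following is a minimal sketch, not part of DashAI: check_params is a hypothetical helper, the dictionary keys simply mirror the documented parameter names, and the 32-bit range used for random seeds is an assumption.

```python
import random


def check_params(params: dict) -> dict:
    """Hypothetical helper (not a DashAI API): checks the documented constraints."""
    out = dict(params)  # work on a copy so the caller's dict is untouched
    for key in ("width", "height"):
        # The documentation requires both dimensions to be multiples of 8.
        if out[key] <= 0 or out[key] % 8 != 0:
            raise ValueError(f"{key} must be a positive multiple of 8, got {out[key]}")
    for key in ("num_inference_steps", "num_images_per_prompt"):
        if out[key] < 1:
            raise ValueError(f"{key} must be at least 1, got {out[key]}")
    # -1 is documented as "use a random seed"; the 32-bit range is an assumption.
    if out["seed"] == -1:
        out["seed"] = random.randint(0, 2**32 - 1)
    return out


# Documented defaults, written out as a plain dictionary.
params = {
    "model_name": "Tongyi-MAI/Z-Image",
    "num_inference_steps": 20,
    "guidance_scale": 5.0,
    "device": "CPU",
    "seed": -1,
    "width": 1024,
    "height": 1024,
    "num_images_per_prompt": 1,
}
resolved = check_params(params)
```

Resolving the seed up front means the concrete value can be logged and reused later to reproduce a result.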

Methods

generate(self, input: str) -> List[Any]

Defined on TongyiZImageModel

Generate images from a text prompt.

Parameters

input : str
Text prompt to generate an image from.

Returns

List[Any]
Generated output images in a list.
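
The return type is List[Any], so the element type is not specified here. The sketch below assumes each element exposes a save(path) method, as PIL.Image.Image does; save_outputs is a hypothetical helper and not part of DashAI.

```python
from pathlib import Path
from typing import Any, List


def save_outputs(images: List[Any], out_dir: str, stem: str = "image") -> List[Path]:
    """Hypothetical helper: write each generated image to out_dir as a PNG file.

    Assumes every element has a save(path) method (true for PIL images).
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(images):
        path = target / f"{stem}_{index:02d}.png"
        image.save(path)
        paths.append(path)
    return paths
```

With num_images_per_prompt greater than 1, the returned list would hold one path per generated image.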

get_schema(cls) -> dict

Defined on ConfigObject

Generates the JSON Schema for the component.

Returns

dict
Dictionary representing the JSON Schema of the component.
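
One use of such a dictionary is to recover default values. The sketch below assumes the schema follows the usual JSON Schema layout, with a top-level "properties" mapping whose entries may carry a "default" key. The example_schema shown is hand-written for illustration and is not the actual output of get_schema; schema_defaults is a hypothetical helper.

```python
from typing import Any, Dict


def schema_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: collect the default of every property that declares one."""
    properties = schema.get("properties", {})
    return {
        name: spec["default"]
        for name, spec in properties.items()
        if "default" in spec
    }


# Hand-written illustration only, not the real schema of TongyiZImageModel.
example_schema = {
    "type": "object",
    "properties": {
        "num_inference_steps": {"type": "integer", "default": 20},
        "guidance_scale": {"type": "number", "default": 5.0},
        "negative_prompt": {"type": "string"},
    },
}
defaults = schema_defaults(example_schema)
```

Properties without a declared default, such as negative_prompt in this illustration, are simply left out of the result.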

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Takes the data the user provided to initialize the model and returns it with every object the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.

Compatible with