MistralModel
Mistral Instruct model for open-ended text generation via llama.cpp.
Mistral is a 7B-parameter transformer language model developed by Mistral AI, designed to deliver high performance with efficient inference. It uses grouped-query attention (GQA) for faster decoding and sliding-window attention (SWA) to handle long contexts. The 12B Mistral-Nemo variant, developed jointly with NVIDIA, extends the context window to 128K tokens and improves multilingual capability.
Models are loaded as GGUF quantized checkpoints via llama-cpp-python,
allowing CPU and GPU inference without requiring a full PyTorch stack.
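As a rough sketch of how such a loader is configured, the helper below maps this component's settings onto llama-cpp-python's `Llama` constructor arguments. `model_path`, `n_ctx`, and `n_gpu_layers` are real `Llama` parameters; the helper function itself and the device-string mapping are illustrative assumptions, not this component's actual code.

```python
# Sketch (assumptions): translate component config into llama.cpp loader
# arguments. The helper name and device -> n_gpu_layers mapping are
# illustrative; model_path, n_ctx, and n_gpu_layers are real llama-cpp-python
# Llama constructor parameters.
def build_llama_kwargs(model_path: str, context_window: int, device: str) -> dict:
    """Build keyword arguments for llama_cpp.Llama from component settings."""
    return {
        "model_path": model_path,  # local path to the GGUF checkpoint
        "n_ctx": context_window,   # total token budget (prompt + response)
        # 0 keeps all layers on CPU; -1 offloads every layer to the GPU
        "n_gpu_layers": -1 if device == "GPU" else 0,
    }

kwargs = build_llama_kwargs("mistral-7b-instruct-v0.3.Q4_K_M.gguf", 512, "CPU")
```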
References
- [1] Jiang et al. (2023) "Mistral 7B" https://arxiv.org/abs/2310.06825
- [2] https://huggingface.co/mistralai
Parameters
- model_name : string, default='bartowski/Mistral-7B-Instruct-v0.3-GGUF' - The Mistral Instruct checkpoint to load in GGUF format. 'Mistral-7B-Instruct-v0.3' is a 7B-parameter instruction model that delivers strong performance for its size. 'Mistral-Nemo-Instruct-2407' is a 12B-parameter model jointly developed with NVIDIA, featuring a 128K context window and improved multilingual capabilities.
- max_tokens : integer, default=100 - Maximum number of new tokens the model will generate per response. Roughly 1 token ≈ 0.75 English words. Set to 100-200 for short answers, 500-1000 for detailed explanations or code.
- temperature : number, default=0.7 - Sampling temperature controlling output randomness (range 0.0-1.0). At 0.0 the model picks the most likely token (deterministic). Around 0.7 balances quality and creativity. At 1.0 outputs are maximally varied.
- frequency_penalty : number, default=0.1 - Penalizes tokens that have already appeared in the output based on frequency (range 0.0-2.0). Higher values discourage repetition.
- context_window : integer, default=512 - Total token budget for a single forward pass, including prompt and response. Mistral-7B supports up to 32K tokens; Mistral-Nemo supports up to 128K tokens.
- device : string, default='CPU' - Hardware device for llama.cpp inference. 'CPU' runs the model fully in RAM. A GPU option offloads all layers for faster inference.
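Because `context_window` must hold the prompt and the response together, `max_tokens` effectively trades off against prompt length. The hypothetical helper below applies the rough 1 token ≈ 0.75 English words rule from above to check whether a prompt plus the requested response fits the budget; both function names are illustrative, not part of this component.

```python
# Sketch (hypothetical helpers): check that prompt + response fit the
# context window, using the rough rule 1 token ≈ 0.75 English words.
def estimate_tokens(text: str) -> int:
    # word count / 0.75 ≈ token count (crude English-only heuristic)
    return int(len(text.split()) / 0.75)

def fits_context(prompt: str, max_tokens: int, context_window: int = 512) -> bool:
    # the window must hold the prompt AND the max_tokens response budget
    return estimate_tokens(prompt) + max_tokens <= context_window

fits_context("Explain sliding-window attention in two sentences.", max_tokens=100)
```

With the default 512-token window, a long prompt leaves correspondingly less room for the reply, so raising `max_tokens` may also require raising `context_window`.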
Methods
generate(self, prompt: list[dict[str, str]]) -> list[str]
Generate a reply for the given chat prompt.
Parameters
- prompt : list of dict
- Conversation history in OpenAI chat format. Each dict must contain at least
"role" ("system", "user", or "assistant") and "content" (the message text).
Returns
- list of str
- A single-element list containing the model's reply text, extracted from
choices[0]["message"]["content"].
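The shapes above can be sketched end to end: the prompt is a plain list of role/content dicts, and the reply is read from a `create_chat_completion`-style response. The response dict below is a hand-written stand-in to show the structure, not real model output.

```python
# The OpenAI-style chat prompt generate() expects: a list of dicts,
# each with at least "role" and "content".
prompt = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is grouped-query attention?"},
]

# Stand-in for the dict llama-cpp-python returns from create_chat_completion;
# the "content" text here is fabricated for illustration.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "GQA shares key/value heads across query heads."}}
    ]
}

# generate() returns a single-element list holding the reply text
replies = [response["choices"][0]["message"]["content"]]
```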
get_schema(cls) -> dict
Generates the component's JSON Schema.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
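The dict below sketches the kind of schema this could return for the parameters documented above. The field names and defaults mirror the Parameters section, but the exact schema layout is an assumption, not the component's actual output.

```python
# Sketch (assumed structure): a JSON Schema covering the parameters above.
# Ranges and defaults come from the Parameters section; the overall layout
# is an assumption about what get_schema() returns.
schema = {
    "type": "object",
    "properties": {
        "model_name": {"type": "string", "default": "bartowski/Mistral-7B-Instruct-v0.3-GGUF"},
        "max_tokens": {"type": "integer", "default": 100},
        "temperature": {"type": "number", "minimum": 0.0, "maximum": 1.0, "default": 0.7},
        "frequency_penalty": {"type": "number", "minimum": 0.0, "maximum": 2.0, "default": 0.1},
        "context_window": {"type": "integer", "default": 512},
        "device": {"type": "string", "enum": ["CPU", "GPU"], "default": "CPU"},
    },
}
```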
validate_and_transform(self, raw_data: dict) -> dict
Validates the data given by the user to initialize the model and returns it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.
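A minimal sketch of what such validation might do is shown below: reject unknown keys and fill in the documented defaults for anything the user omitted. This is a hypothetical implementation; the real method would also type-check values and construct the llama.cpp objects the model needs.

```python
# Sketch (hypothetical implementation): merge user-provided data with the
# documented defaults and reject unknown keys. The real method also builds
# the llama.cpp objects the model needs.
DEFAULTS = {
    "model_name": "bartowski/Mistral-7B-Instruct-v0.3-GGUF",
    "max_tokens": 100,
    "temperature": 0.7,
    "frequency_penalty": 0.1,
    "context_window": 512,
    "device": "CPU",
}

def validate_and_transform(raw_data: dict) -> dict:
    unknown = set(raw_data) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **raw_data}  # user values override the defaults

config = validate_and_transform({"temperature": 0.2, "device": "GPU"})
```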