modelPath: Path to the model on the filesystem. Required.
batchSize: Prompt processing batch size. Optional.
cache: Optional.
callbackManager: Deprecated; use callbacks instead. This feature will be removed in the future and is not recommended for use. Optional.
callbacks: Optional.
concurrency: Deprecated; use maxConcurrency instead. Optional.
contextSize: Text context size. Optional.
embedding: Embedding mode only. Optional.
f16Kv: Use fp16 for the KV cache. Optional.
gpuLayers: Number of layers to store in VRAM. Optional.
logitsAll: The llama_eval() call computes all logits, not just the last one. Optional.
maxConcurrency: The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit. Optional.
maxRetries: The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6. Optional.
maxTokens: Optional.
metadata: Optional.
onFailedAttempt: Custom handler for failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable. Optional.
prependBos: Add the beginning-of-sentence token. Optional.
seed: If null, a random seed will be used. Optional.
tags: Optional.
temperature: The randomness of the responses, e.g. 0.1 deterministic, 1.5 creative, 0.8 balanced; 0 disables. Optional.
threads: Number of threads to use to evaluate tokens. Optional.
topK: Consider the n most likely tokens, where n is 1 to vocabulary size; 0 disables (uses the full vocabulary). Note: only applies when temperature > 0. Optional.
topP: Selects the smallest token set whose probability exceeds P, where P is between 0 and 1; 1 disables. Note: only applies when temperature > 0. Optional.
trimWhitespaceSuffix: Trim whitespace from the end of the generated text. Disabled by default. Optional.
useMlock: Force the system to keep the model in RAM. Optional.
useMmap: Use mmap if possible. Optional.
verbose: Optional.
vocabOnly: Only load the vocabulary, no weights. Optional.
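As a rough sketch of how these fields fit together, the interface and object below mirror the option names in the list above; the interface itself and the fallback model path are illustrative assumptions, not the library's actual exported types or defaults:

```typescript
// Illustrative sketch only: this interface mirrors a subset of the option
// names documented above; it is not the library's actual type declaration.
interface LlamaCppFields {
  modelPath: string;              // the only required field
  batchSize?: number;             // prompt processing batch size
  contextSize?: number;           // text context size
  gpuLayers?: number;             // layers to store in VRAM
  maxRetries?: number;            // defaults to 6
  temperature?: number;           // 0 disables sampling randomness
  topK?: number;                  // 0 disables (full vocabulary)
  topP?: number;                  // 1 disables
  trimWhitespaceSuffix?: boolean; // disabled by default
}

// For testing, the model path can come from the LLAMA_PATH environment
// variable; the fallback path here is purely hypothetical.
const fields: LlamaCppFields = {
  modelPath: process.env.LLAMA_PATH ?? "/models/llama.gguf",
  temperature: 0.8, // balanced between deterministic and creative
  topK: 40,
  topP: 0.9,
};
```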
Note that modelPath is the only required parameter. For testing, you can set it in the LLAMA_PATH environment variable.
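The retry behaviour described for maxRetries and onFailedAttempt above can be sketched roughly as follows; the helper name, base delay, and backoff schedule are assumptions for illustration, not the library's actual implementation:

```typescript
// Rough sketch of retry-with-exponential-backoff, as described for
// maxRetries; the real library's delay base, jitter, and retryable-error
// detection may differ.
async function withRetries<T>(
  call: () => Promise<T>,
  maxRetries = 6,
  onFailedAttempt: (err: unknown) => void = () => {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      onFailedAttempt(err); // should rethrow if the error is not retryable
      if (attempt >= maxRetries) throw err;
      // Exponential backoff: 100 ms, 200 ms, 400 ms, ... (assumed base delay)
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
}
```

A caller would wrap a model invocation in withRetries so that transient failures are retried up to maxRetries times before the last error is rethrown.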