This lesson will teach you about the different settings you can adjust when interacting with an LLM in a playground interface. This is a general overview, so some of these settings may not be available for every LLM. Learning about these concepts will give you a deeper understanding of both how these systems work and how prompting works. It is also vital to experiment with all of these settings on your own to build an intuitive understanding of how they behave! Think of these settings as dials you can turn to shape how creative, focused, or structured the AI’s responses are.
1. System Prompt
Also called the “system message,” this is probably the most important part of the entire playground interface. This is where you will craft your chatbot’s entire behavior, personality, and more by applying prompt engineering techniques, which you will learn about in the following lessons!
2. User Prompt
Also called the “user message” or “query,” this isn’t really a setting, but rather the text that you send to the LLM so that it can do something. Sometimes this area also has a button to upload an image or file, which you can use in your prompt to improve results (we will learn more about this later).
3. Temperature
Temperature regulates the unpredictability of a language model’s output. Higher temperature settings make outputs more creative and less predictable, because they amplify the likelihood of less probable tokens while reducing it for more probable ones. Lower temperatures produce more conservative and predictable (we also call this “deterministic”) results.
Important: the technical term for this and the following settings is “hyperparameters,” so you may see this word whenever you read the official documentation for a model.
When to use a lower temperature
- Generating programming code
- Summarizing text
- Translating text
When to use a higher temperature
- Creative writing
- Inventing new recipes
- Brainstorming new metaphors
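To make this more concrete, here is a minimal sketch of how temperature is typically applied under the hood: the model’s raw scores (logits) are divided by the temperature before being turned into probabilities. The logit values below are made up for illustration; real models work with vocabularies of tens of thousands of tokens.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw model scores (logits) into probabilities, scaled by
    temperature. Lower temperature sharpens the distribution (more
    deterministic); higher temperature flattens it (more creative)."""
    scaled = [logit / temperature for logit in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.5]

low = apply_temperature(logits, 0.5)   # top token dominates even more
high = apply_temperature(logits, 2.0)  # probabilities even out

print([round(p, 2) for p in low])
print([round(p, 2) for p in high])
```

Running this, you will see that at low temperature the most likely token takes a much larger share of the probability, while at high temperature the candidates become closer to each other, which is exactly why higher temperatures feel more “creative.”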
4. Top-P
Top-P (also known as Nucleus Sampling) is a setting in language models that helps manage the randomness of their output. It works by establishing a probability threshold and then selecting tokens whose combined likelihood surpasses this limit.
For instance, let’s consider an example where the model predicts the next word in The boy likes the ____. The top five words it might be considering could be:
- cake (probability 0.50)
- ball (probability 0.25)
- lesson (probability 0.15)
- world (probability 0.07)
- alligator (probability 0.03)
If we set Top-P to 0.90, the LLM will only consider the smallest set of tokens whose probabilities cumulatively add up to at least 90%. In our case:
- Adding cake → total so far is 50%.
- Then adding ball → total becomes 75%.
- Next comes lesson → our sum reaches 90%.
So, when generating output, the AI will sample from just these three options (cake, ball, and lesson), weighted by their probabilities, because together they account for 90 percent of the total likelihood.
Use lower Top-P for precise answers (e.g., Q&A).
Use higher Top-P for diverse ideas (e.g., naming a product).
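The selection step described above can be sketched in a few lines of Python, reusing the made-up probabilities from our example:

```python
def top_p_filter(candidates, p):
    """Keep the smallest set of tokens (most probable first) whose
    cumulative probability reaches the threshold p; everything else
    is excluded from sampling."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        total += prob
        if total >= p:
            break
    return kept

candidates = {"cake": 0.50, "ball": 0.25, "lesson": 0.15,
              "world": 0.07, "alligator": 0.03}

print(top_p_filter(candidates, 0.90))  # ['cake', 'ball', 'lesson']
```

Note that a real sampler would then renormalize the surviving probabilities and draw a token from them; this sketch only shows the filtering step.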
5. Maximum Length
This setting is quite straightforward: it is simply the maximum number of tokens the LLM is allowed to generate. It is useful because it lets you manage the length of the model’s response, preventing overly long or irrelevant output.
The maximum length should be short when you want to constrain the output for some reason, but keep in mind that the output might get cut off in the middle of a sentence. So this usually requires a little bit of experimentation.
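The mid-sentence cut-off is easy to see with a toy example. Here we pretend that whitespace-separated words are tokens (real tokenizers split text differently, often into sub-word pieces) and simply truncate at the limit:

```python
def truncate_to_max_tokens(text, max_tokens):
    """Naive illustration of a token limit: treat whitespace-separated
    words as tokens and cut the output off once the limit is reached."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

reply = "The Eiffel Tower was completed in 1889 and stands about 330 meters tall."

# A limit of 8 "tokens" cuts the sentence off mid-thought:
print(truncate_to_max_tokens(reply, 8))
# The Eiffel Tower was completed in 1889 and
```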
6. Stop Sequences
Stop sequences tell the model when to stop generating tokens, which allows you to control content length and structure. If you are prompting the LLM to write an email, setting “Best regards,” or “Sincerely,” as the stop sequence ensures the model stops before the closing salutation, which keeps the email short and to the point. Stop sequences are useful for output that you expect to come out in a structured format such as an email, a numbered list, or dialogue.
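Conceptually, a stop sequence works like cutting the text at the first place any of the configured sequences appears. The email below is an invented example; in a real playground the model stops generating at that point rather than trimming afterwards:

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut generated text at the earliest occurrence of any stop
    sequence. The stop sequence itself is not included in the output."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

draft = ("Hi Ana,\n\nThe report is attached. Let me know if anything "
         "is missing.\n\nBest regards,\nSam")

# Everything from "Best regards," onward is dropped:
print(apply_stop_sequences(draft, ["Best regards,", "Sincerely,"]))
```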
7. Frequency Penalty
A frequency penalty is a setting that discourages repetition in the generated text by penalizing tokens proportionally to how frequently they appear. In other words, the more often a token is used in the text, the more the LLM tries to avoid using it again.
Example: If you’re writing a review and say “amazing” three times, the AI might switch to “incredible” or “fantastic.”
8. Presence Penalty
The presence penalty is similar to the frequency penalty, but it applies a flat penalty to any token that has already appeared at least once, regardless of how many times, rather than scaling with how often the token was used.
Example: If you mention “dragons” in a story, the AI might avoid mentioning them again unless absolutely necessary.
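The difference between the two penalties can be shown with a small sketch. This uses one common formulation (similar to how some APIs document it, though exact formulas vary by provider): the frequency penalty is subtracted once per occurrence, while the presence penalty is subtracted once in total if the token has appeared at all. The logit values are made up for illustration.

```python
from collections import Counter

def penalize(logits, generated_tokens, frequency_penalty=0.0,
             presence_penalty=0.0):
    """Lower the scores of tokens that already appeared in the output:
    - frequency penalty scales with how often the token was used;
    - presence penalty is a flat, one-time deduction per token."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        adjusted[token] = (logit
                           - count * frequency_penalty
                           - (1.0 if count > 0 else 0.0) * presence_penalty)
    return adjusted

logits = {"amazing": 2.0, "incredible": 1.8, "fantastic": 1.7}
history = ["amazing", "amazing", "amazing"]  # "amazing" used three times

# With a frequency penalty, "amazing" drops below its synonyms:
print(penalize(logits, history, frequency_penalty=0.5))
# With only a presence penalty, it is docked once, no matter the count:
print(penalize(logits, history, presence_penalty=0.5))
```

This is why the frequency penalty is the better dial against a word being overused again and again, while the presence penalty nudges the model toward introducing new topics at all.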
Conclusion
In conclusion, mastering settings like temperature, Top-P, and maximum length is essential when working with language models. These parameters allow for precise control of the model’s output to suit specific tasks or applications. They manage aspects such as randomness in responses, response length, and repetition, all of which contribute to improving your interaction with the LLM. There is no secret recipe for setting all of these values; it is very individual and closely related to the quality of your prompts.
It’s important to note that LLMs rely on probabilities, and GPU (graphics processing unit) calculations introduce some randomness of their own, so it is practically impossible to consistently get the exact same results/output.