Prompt Engineering Guide
Prompt Engineering is the discipline of working out how to make a language model provide the answers you want. For example, when we ask Siri a question or enter a search term on Google, we carefully choose our words and how we phrase the question to get the desired information. That carefully worded input is essentially a prompt, and the study of how to create and optimize such prompts is what we call 'Prompt Engineering'.
Prompt Engineering is a relatively new field that focuses on developing and optimizing prompts to efficiently utilize language models (LMs) in various applications and research topics. The techniques in prompt engineering can help us better understand the capabilities and limitations of large language models (LLMs).
Researchers often use prompt engineering to enhance the capabilities of LLMs in tasks such as question answering and arithmetic reasoning, while developers use prompt engineering techniques to design robust and effective prompts that interface with LLMs and other tools.
But prompt engineering isn't just about designing and developing prompts. It also encompasses various skills and techniques that are useful when interacting with LLMs. Prompt engineering is a critical skill when it comes to understanding, interfacing with, and building upon the capabilities of LLMs. By mastering prompt engineering, you can improve the safety of LLMs, create new features, and supplement LLMs with domain knowledge and external tools.
Setting Up the LLM
When working with prompts, you interact with the LLM directly or via an API. A few parameters can be adjusted to get different results from the same prompt.
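As a minimal sketch of such a call (assuming the OpenAI Python SDK, v1.x, and an illustrative model name; adapt both to your own setup), the parameters covered below are passed as keyword arguments:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; substitute your own
    messages=[{"role": "user", "content": "What is prompt engineering?"}],
    max_tokens=100,   # covered under 'Max Tokens' below
    temperature=0.7,  # covered under 'Temperature' below
    top_p=1.0,        # covered under 'Top_p' below
)
print(response.choices[0].message.content)
```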
Max Tokens
This parameter sets an upper limit on how many tokens the GPT model will generate in a single response. A 'token' is the basic unit the model reads and writes when generating text; a token may be a whole word, part of a word, a single character, or a punctuation mark.
For instance, if the 'max tokens' value is set to 50, the model will generate at most 50 tokens for its response. The larger this value, the longer the text the model can generate, but the computational cost also increases. Conversely, the smaller this value, the shorter the generated text, but the model can return responses more quickly.
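To get a feel for what counts as a token, you can inspect a tokenizer directly. A small sketch, assuming the tiktoken library (which implements the encodings used by recent OpenAI models):

```python
import tiktoken

# 'cl100k_base' is the encoding used by recent OpenAI chat models
enc = tiktoken.get_encoding("cl100k_base")

text = "Prompt engineering is surprisingly practical!"
tokens = enc.encode(text)

print(len(tokens), "tokens")              # how much of the max-tokens budget this would use
print([enc.decode([t]) for t in tokens])  # the text fragment each token represents
```

Running this shows that common words map to single tokens while rarer words split into several pieces, which is why token counts rarely match word counts.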
Temperature
In ChatGPT and other GPT family models, "temperature" is a hyperparameter that controls the variability of the model's output. It is a non-negative number; the OpenAI API, for example, accepts values from 0 to 2.
Temperature works as follows:
At higher values of temperature (for example, 1.0 or higher), the model produces more varied and randomized answers. This has the effect of increasing the entropy of the output.
On the other hand, for lower temperature values (e.g., 0.1), the model produces more predictable and consistent answers. This has the effect of lowering the entropy of the output.
In other words, high temperature values cause the model to take more risks and provide more varied answers, while low temperature values cause the model to provide safer, more consistent answers.
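Under the hood, temperature divides the model's raw scores (logits) before they are converted into probabilities. The toy sketch below, using made-up logits for four candidate tokens, shows how a low temperature sharpens the distribution and a high temperature flattens it:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into a probability distribution, scaled by temperature."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {np.round(probs, 3)}")

# Low temperature concentrates almost all probability on the top token;
# high temperature spreads it more evenly, increasing the entropy of the output.
```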
This makes temperature a useful knob for regulating the diversity of GPT model output across applications. For example, a high temperature value might be appropriate for creative writing tasks, while a low temperature value might be appropriate for business reports or formal communications.
Simply put, a higher temperature setting increases randomness, which can lead to more diverse and creative results. In essence, you're increasing the weight given to less likely tokens.
In terms of application, we recommend using lower temperature values for tasks like fact-based QA to encourage more factual and concise responses. For poetry writing or other creative tasks, it may be advantageous to use higher temperature values.
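In practice this just means passing a different temperature per task. A short sketch, again assuming the OpenAI Python SDK and an illustrative model name:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt, temperature):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Low temperature for fact-based QA, high temperature for creative writing.
print(ask("In what year was the Eiffel Tower completed?", temperature=0.0))
print(ask("Write a two-line poem about the Eiffel Tower.", temperature=1.2))
```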
Top_p
Top_p is another hyperparameter that controls the diversity of the output. The technique behind it is known as "nucleus sampling" or "top-p sampling".
In top-p sampling, the model considers only the most likely words in its probability distribution when selecting the next word. The value of top_p lies between 0 and 1 and defines a subset of the vocabulary: the smallest set of most likely words whose cumulative probability reaches p. If top_p is close to 0, for example, the model considers only the single most likely word at each step.
If the value of top_p is 0.9, the model considers at each step only the subset of likely words whose probabilities cumulatively add up to 0.9. A higher top_p makes the model's output more diverse, but increases the risk of including irrelevant or meaningless words.
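To make this concrete, here is a toy sketch of nucleus sampling over a made-up four-word distribution (plain NumPy; a real model applies this over tens of thousands of tokens at every generation step):

```python
import numpy as np

def nucleus_sample(tokens, probs, top_p, rng):
    """Sample one token from the smallest set whose cumulative probability reaches top_p."""
    probs = np.array(probs)
    order = np.argsort(probs)[::-1]                  # most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # size of the nucleus
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()     # renormalize over the nucleus
    return tokens[rng.choice(kept, p=kept_probs)]

rng = np.random.default_rng(0)
tokens = ["the", "a", "cat", "xylophone"]
probs = [0.5, 0.3, 0.15, 0.05]

# With top_p=0.9, the nucleus is {"the", "a", "cat"} (0.5 + 0.3 + 0.15 >= 0.9),
# so "xylophone" can never be sampled.
print([nucleus_sample(tokens, probs, 0.9, rng) for _ in range(5)])
```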
Top-p sampling can be used in conjunction with temperature to control both the diversity and consistency of the output. The difference between the two is that temperature "smoothes" or "sharpens" the entire probability distribution, while top_p truncates it by limiting the set of candidate words.
As a sampling technique, top_p allows you to control how deterministic your model is when generating responses. If you're looking for accurate, factual answers, keep this value low. If you want a more varied response, increase it to a higher value.
As a general rule, we recommend changing one, not both.
Before we get started with some basic examples, keep in mind that your results may vary depending on the version of LLM you use.