Hi,
Currently, I'm testing the Gemini model in both Vertex AI and Google AI Studio, but the answers are not stable. I think this is related to the hyperparameter settings.
So, I want to know the details of the temperature, top-p, and top-k parameters and how they work in the text generation process. Thank you!
Hello,
for all the details I recommend referring to the official Vertex AI documentation.
Quoting the documentation, the big picture is: "For each token selection step (by the model), the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P, with the final token selected using temperature sampling."
As a very simplified example:
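Here is a minimal Python sketch of that top-K → top-P → temperature pipeline. The tokens and probabilities below are invented purely for illustration, not taken from any real model:

```python
import random

# Invented next-token probabilities, for illustration only.
probs = {"in": 0.50, "on": 0.30, "under": 0.15, "beside": 0.05}

def sample_next_token(probs, top_k=3, top_p=0.9, temperature=0.8):
    # 1. top-K: keep only the K most probable candidate tokens.
    candidates = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # 2. top-P: keep the smallest prefix whose cumulative probability
    #    reaches top_p (nucleus sampling).
    kept, cumulative = [], 0.0
    for token, p in candidates:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 3. temperature: rescale the survivors and sample.
    #    As temperature -> 0 this approaches greedy selection (always the
    #    top token); at temperature = 1 the probabilities are used as-is.
    if temperature == 0:
        return kept[0][0]
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return random.choices([token for token, _ in kept], weights)[0]

print(sample_next_token(probs))
```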
Hope it's useful.
Ciao
Thank you so much! But I haven't fully understood yet. Since I need to explain this to my co-workers in detail, I need more deep-dive examples.
I will explore the link you mentioned. Thank you.
LLMs are not deterministic, so even with the same hyperparameter settings it is normal to get different answers to the same question.
In practice, these hyperparameters shape the probabilities the model assigns to each candidate next word, given the text that precedes it. For example, think of a simple completion of the sentence "I slept", with four candidates ordered by probability: "I slept in bed" is the most probable completion, "I slept on a cloud" the least probable.
If you set the temperature to 0, the model will almost always answer "I slept in bed". If you set it to the opposite extreme, i.e. to 1, the model will pick more randomly among the four candidates. Therefore a temperature from 0.7 to 1 is more suitable for creative contexts, while a temperature between 0 and 0.3 is preferable if you want more technically precise answers.
top_k and top_p also control randomness, but through a different mechanism: instead of rescaling the probabilities, they truncate the list of candidate tokens (top_k keeps only the k most probable tokens; top_p keeps the smallest set whose cumulative probability reaches p). The same rule of thumb applies to both: a lower value for less random responses, a higher value for more random ones.
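To make the temperature effect concrete, here is a small sketch that rescales a distribution over those completions at different temperatures. The fourth completion and all the probabilities are invented for illustration:

```python
import math

# Invented probabilities (and an invented fourth completion) for "I slept ...".
completions = {"in bed": 0.70, "on the couch": 0.20,
               "on the floor": 0.07, "on a cloud": 0.03}

def apply_temperature(probs, temperature):
    # Dividing the log-probabilities by the temperature sharpens the
    # distribution when temperature < 1 and flattens it when temperature > 1.
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: v / total for tok, v in scaled.items()}

for t in (0.1, 0.5, 1.0):
    print(t, {tok: round(p, 3) for tok, p in apply_temperature(completions, t).items()})
# At 0.1 almost all the probability mass sits on "in bed";
# at 1.0 the original distribution is returned unchanged.
```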
I'll try to give you a real example performed with Gemini 1.0 Pro. I set top_k = 40 and top_p = 1 (their maximum values) and kept them fixed, then ran three tests at each of several temperature values. The task is to generate a title for an article.
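For reference, this is roughly how those settings are passed with the Vertex AI Python SDK. Treat it as a sketch: the SDK surface changes between versions, and the project id below is a placeholder:

```python
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.0-pro")
prompt = "Create 1 title for the following article.\n\narticle: <article text from below>"
response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(temperature=0.0, top_p=1.0, top_k=40),
)
print(response.text)
```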
PROMPT
Create 1 title for the following article.
article:
As many businesses figure out new ways to go digital, one thing is clear: talent continues to be one of the key ways to enable an inclusive digital economy. Employers in Asia Pacific list technology as the leading in-demand skill, with digital marketing and e-commerce following close behind. Simultaneously, many people are looking to learn new skills that will help them meet the requirements of the evolving job market. So we must create new ways to help businesses and job seekers alike.
RESULTS
- temperature = 0.0 [top_k = 40, top_p = 1]
1. Bridging the Digital Divide: Talent as the Key to an Inclusive Digital Economy
2. Bridging the Digital Divide: Talent as the Key to an Inclusive Digital Economy
3. Bridging the Digital Divide: Talent as the Key to an Inclusive Digital Economy
- temperature = 0.3 [top_k = 40, top_p = 1]
1. Unlocking Digital Inclusion: Talent as the Key to an Equitable Economy
2. Bridging the Digital Divide: Talent and Skills for an Inclusive Digital Economy
3. Bridging the Digital Divide: Talent and Skills for an Inclusive Digital Economy
- temperature = 0.6 [top_k = 40, top_p = 1]
1. Bridging the Digital Divide: Talent as a Catalyst for an Inclusive Economy
2. Unlocking Digital Inclusion: The Role of Talent Acquisition and Reskilling
3. Unlocking an Inclusive Digital Economy: The Essential Role of Talent
- temperature = 1.0 [top_k = 40, top_p = 1]
1. Bridging the Digital Talent Gap: Empowering Businesses and Job Seekers in a Connected Future
2. Unlocking the Future of Digital Success: Talent as the Cornerstone of an Inclusive Economy
3. Bridging the Digital Divide: Talent as the Gateway to an Inclusive Digital Economy
Ciao
My deepest thanks for your consideration.
It's really useful. I've got the answer.
Phew!