
Gemini - Long-running generations cutting off

Hey All,

I have an agent with a "smart format" skill that basically tells the agent to format arbitrary data into markdown. The data comes from sources like PDFs and OCR, so the ordering and formatting are sometimes so weird that I can't really do this programmatically.

Here are the skill instructions:

---
You are a bot that formats run-on text into readable markdown.

Do not use headers like #, ##, ###, etc. Only use bolding and italics.
DO NOT SUMMARIZE, CHANGE, REMOVE OR ADD ANY TEXT OR PUNCTUATION. JUST FORMAT IT.
Maintain the wording, spellings, and punctuation of the original text.
However, the text was collected using OCR.
In some cases the text may be malformed or have incorrect newlines and missing spaces or indentation.
There may also be cases where excessive newlines, spaces, and tabs are used.
Detect and correct such issues.
The text is standard operating procedures and most of the time the text contains sections, subsections, and subsubsections.
Try to detect these sections and subsections and format them appropriately, but do not add or remove numberings or letterings.
 
Here is the input text:
  {input_text}

Below, provide the reformatted markdown output of the ENTIRE text:
---
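
For context, the agent just substitutes the raw text into that template before sending it, roughly like this (a minimal sketch; `SKILL_INSTRUCTIONS` and `build_prompt` are made-up names, since the agent framework does the substitution for me):

```python
# Hypothetical sketch of the prompt assembly the agent framework handles.
SKILL_INSTRUCTIONS = """You are a bot that formats run-on text into readable markdown.
... (rest of the skill instructions above) ...
Here is the input text:
{input_text}

Below, provide the reformatted markdown output of the ENTIRE text:"""

def build_prompt(raw_ocr_text: str) -> str:
    # Straight string substitution of the whole document into the template.
    return SKILL_INSTRUCTIONS.format(input_text=raw_ocr_text)
```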
 
Here is the {input_text}:
 
---
SELF-TAUGHT OPTIMIZER (STOP):
RECURSIVELY SELF-IMPROVING CODE GENERATION
Eric Zelikman1,2
, Eliana Lorch, Lester Mackey1
, Adam Tauman Kalai1
1Microsoft Research, 2Stanford University
ABSTRACT
Several recent advances in AI systems (e.g., Tree-of-Thoughts and Program-Aided
Language Models) solve problems by providing a “scaffolding” program that structures multiple calls to language models to generate better outputs. A scaffolding
program is written in a programming language such as Python. In this work, we
use a language-model-infused scaffolding program to improve itself. We start
with a seed “improver” that improves an input program according to a given utility
function by querying a language model several times and returning the best solution.
We then run this seed improver to improve itself. Across a small set of downstream
tasks, the resulting improved improver generates programs with significantly better
performance than its seed improver. A variety of self-improvement strategies
are proposed by the language model, including beam search, genetic algorithms,
and simulated annealing. Since the language models themselves are not altered,
this is not full recursive self-improvement. Nonetheless, it demonstrates that a
modern language model, GPT-4 in our proof-of-concept experiments, is capable
of writing code that can call itself to improve itself. We consider concerns around
the development of self-improving technologies and evaluate the frequency with
which the generated code bypasses a sandbox.
1 INTRODUCTION
A language model can be queried to optimize virtually any objective describable in natural language.
However, a program that makes multiple, structured calls to a language model can often produce
outputs with higher objective values (Yao et al., 2022; 2023; Zelikman et al., 2023; Chen et al.,
2022). We refer to these as “scaffolding” programs, typically written (by humans) in a programming
language such as Python. Our key observation is that, for any distribution over optimization problems
and any fixed language model, the design of a scaffolding program is itself an optimization problem.
In this work, we introduce the Self-Taught Optimizer (STOP), a method in which code that applies a
language model to improve arbitrary solutions is applied recursively to improve itself. Our approach
begins with an initial seed ‘improver’ scaffolding program that uses the language model to improve a
solution to some downstream task. As the system iterates, the model refines this improver program.
We use a small set of downstream algorithmic tasks to quantify the performance of our self-optimizing
framework. Our results demonstrate improvement when the model applies its self-improvement
strategies over increasing iterations. Thus, STOP shows how language models can act as their own
meta-optimizers. We additionally investigate the kinds of self-improvement strategies that the model
proposes (see Figure 1), the transferability of the proposed strategies across downstream tasks, and
explore the model’s susceptibility to unsafe self-improvement strategies.
Genetic
Algorithm
Beam Search /
Tree Search
Multi-Armed
Prompt Bandit
Vary Temperature
to Explore
Simulated-annealing
Based Search
Decomposing and
Improving Parts
?
Figure 1: Example self-improvement strategies proposed and implemented by GPT-4. Each
strategy is then used as scaffolding to revise arbitrary code, including the scaffolding code itself.
---
 
Here is the output. It's very consistent and always stops at the same length:
 
---
**SELF-TAUGHT OPTIMIZER (STOP):**

**RECURSIVELY SELF-IMPROVING CODE GENERATION**

Eric Zelikman1,2, Eliana Lorch, Lester Mackey1, Adam Tauman Kalai1
1Microsoft Research, 2Stanford University

**ABSTRACT**

Several recent advances in AI systems (e.g., Tree-of-Thoughts and Program-Aided
Language Models) solve problems by providing a “scaffolding” program that structures multiple calls to language models to generate better outputs. A scaffolding
program is written in a programming language such as Python. In this work, we
use a language-model-infused scaffolding program to improve itself. We start
with a seed “improver” that improves an input program according to a given utility
function by querying a language model several times and returning the best solution.
We then run this seed improver to improve itself. Across a small set of downstream
tasks, the resulting improved improver generates programs with significantly better
performance than its seed improver. A variety of self-improvement strategies
are proposed by the language model, including beam search, genetic algorithms,
and simulated annealing. Since the language models themselves are not altered,
this
---

Even though I have the max output tokens maxed out (8192), it still only generates around 1,200 characters, then quits generating and returns the text (I have streaming off). I'm actually not that familiar with GCP, so I'm wondering if there's a hidden response timeout somewhere that I'm missing.
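
In case it helps, here's a minimal standalone repro of the kind of call I think my agent is making under the hood (sketch only; the google-generativeai Python SDK and the model name are assumptions on my part, since the agent framework hides the actual call). The `finish_reason` on the response should at least say why it stopped:

```python
# Standalone sketch using the google-generativeai SDK (assumption: the
# agent skill boils down to a single non-streaming generate_content call).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

SKILL_INSTRUCTIONS = """You are a bot that formats run-on text into readable markdown.
... (the full skill instructions from above) ...
Here is the input text:
{input_text}

Below, provide the reformatted markdown output of the ENTIRE text:"""

raw_ocr_text = "SELF-TAUGHT OPTIMIZER (STOP): ..."  # the OCR text from above

model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name

response = model.generate_content(
    SKILL_INSTRUCTIONS.format(input_text=raw_ocr_text),
    generation_config=genai.GenerationConfig(
        max_output_tokens=8192,  # same cap I have configured
        temperature=0,
    ),
    stream=False,  # streaming off, matching my setup
)

# finish_reason reports why generation stopped:
#   STOP (1)       -> the model ended on its own
#   MAX_TOKENS (2) -> hit the output-token cap
#   SAFETY (3) / RECITATION (4) -> the response was cut by a filter
print(response.candidates[0].finish_reason)
print(len(response.text), "chars")
```

If `finish_reason` comes back as MAX_TOKENS, the cap is actually being hit somewhere upstream despite my setting; if it's STOP, the model is just deciding it's done early.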

Just wondering if someone has run across this behavior or knows what might be happening. Thanks.