Best practice for training on very large documents

I am creating a chatbot using Dialogflow CX. A few of the documents in my datastore are very large, at a few hundred pages each. Some are in PDF format and some are in HTML format.

I am wondering how Dialogflow (DF) handles such large documents, and whether I can help it perform better.

Does anyone know how DF breaks down such documents? Is it by page, paragraph, chapter, subchapter, or something else?

What if I instead break each large document down into a few smaller documents, e.g. one per chapter? Would that improve the bot's answers?
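To make the per-chapter idea concrete, here is a minimal sketch of how I might split one of the HTML documents before uploading it. It assumes every chapter starts with an `<h1>` heading, which may not hold for my files, and `split_by_chapter` is just a hypothetical helper name. It says nothing about how DF chunks documents internally; it only prepares the smaller files for the experiment.

```python
import re

# Assumption: each chapter begins with an <h1> tag. Change the tag if your
# documents use a different heading level for chapters.
CHAPTER_START = re.compile(r"(?=<h1\b)", re.IGNORECASE)


def split_by_chapter(html: str) -> list[str]:
    """Split an HTML string into chunks, each starting at an <h1> heading.

    Any text before the first <h1> is kept as its own chunk.
    """
    # Zero-width lookahead split: the heading stays with the chunk it opens.
    parts = CHAPTER_START.split(html)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "<h1>Intro</h1><p>First.</p><h1>Setup</h1><p>Second.</p>"
    for number, chunk in enumerate(split_by_chapter(sample), start=1):
        print(number, chunk)
```

Each chunk could then be saved as a separate file and added to the datastore in place of the original, so the two setups can be compared on the same questions.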
