We want to add an "AI-supported search feature" to our application: users ask questions in natural language, and the feature returns information (plus some kind of reference, but that is not important here) from a manual (or from a set of web pages, such as Confluence pages).
I followed the example, created a Cloud Storage bucket, and uploaded a manual as a PDF; the text is mostly in German. The results are very bad (both in the preview and when using the integration code):
The manual has a page which looks like this:
Der CPC gibt bei Paid-Media-Kampagnen an, wie viel im Durchschnitt für einen Klick auf ein Werbemittel, also einen Besuch auf der Landingpage, ausgegeben werden musste.
(Manually translated: For paid media campaigns, the CPC indicates how much had to be spent on average for one click on an ad, i.e. one visit to the landing page.)
Is there a way to improve the results, e.g. by specifying somewhere the language of the Data Store or of the underlying data (here: a PDF)? Is there a way to hint the script to stay in a certain language, e.g. German?
Test examples:
Query: Welche Abkürzung beschreibt die Kosten pro Klick? (English: Which abbreviation describes the cost per click?)
Answer: CPÜ steht für Cost per Überleitung
Query: What is a CPC
Answer: CPC stands for Cost per Click. It is a metric that measures the average amount of money spent per click on a paid media campaign
Query: Paid-Media-Kampagnen
Answer (returned in Danish, not German): CPC (Cost per Click) og CPA (Cost per Action) er to begreber, der bruges til at beskrive Paid-Media-kampagner. CPC angiver, hvor meget der i gennemsnit skal bruges for at få en person til at klikke på et reklamemiddel.
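To make test runs like the ones above repeatable, answers that drift out of the expected language could be flagged automatically. The following is only a rough sketch: the stopword lists are tiny and hand-picked for this illustration, and `guess_language` and `answer_in_language` are hypothetical helper names, not part of any Google Cloud API. A real setup would more likely use a proper language-detection library.

```python
import re
from typing import Optional

# Tiny, hand-picked stopword lists (an assumption for illustration only,
# not a vetted linguistic resource).
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "für", "steht", "wie",
           "ein", "eine", "auf", "bei", "im", "nicht"},
    "en": {"the", "is", "a", "of", "and", "that", "it", "on",
           "stands", "what", "for"},
    "da": {"og", "er", "til", "at", "der", "for", "hvor", "på",
           "en", "et", "skal", "meget", "ikke"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language whose stopwords occur most often, or None if none match."""
    # Split into lowercase runs of letters (Unicode-aware, so umlauts survive).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stop for word in words)
        for lang, stop in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def answer_in_language(answer: str, expected: str = "de") -> bool:
    """True if the heuristic guess for the answer matches the expected language."""
    return guess_language(answer) == expected
```

Running `answer_in_language(...)` over each collected answer would at least surface cases where a German query gets a non-German reply; it says nothing about whether the answer is factually correct.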
What I found in my tests with knowledge bases is that HTML files in a bucket produce better results than PDF versions of the same content.
We tried both and settled on the HTML versions.
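For anyone who wants to try the HTML route, here is a minimal sketch of wrapping already-extracted manual text in a small HTML page before uploading it to the bucket. It assumes the text has been pulled out of the PDF by some other means; `text_to_html` is a hypothetical helper name. The `lang` attribute is standard HTML, but whether the Data Store actually takes it into account is an assumption I have not verified.

```python
from html import escape
from typing import Iterable


def text_to_html(title: str, paragraphs: Iterable[str], lang: str = "de") -> str:
    """Wrap plain-text paragraphs in a minimal HTML5 page.

    The lang attribute declares the document language; whether the search
    backend uses it is an unverified assumption.
    """
    # Escape the text so characters like & and < do not break the markup.
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang, quote=True)}">\n'
        "  <head>\n"
        '    <meta charset="utf-8">\n'
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )


page = text_to_html(
    "CPC",
    [
        "Der CPC gibt bei Paid-Media-Kampagnen an, wie viel im Durchschnitt "
        "für einen Klick auf ein Werbemittel, also einen Besuch auf der "
        "Landingpage, ausgegeben werden musste."
    ],
)
```

The resulting string can be written to a file with `encoding="utf-8"` and uploaded to the bucket in place of the PDF.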