Re: Improving response times on Gemini API

Dinith_Kumudika · 07-03-2024 01:16 AM

Hello everyone, I've been using Microsoft's Semantic Kernel to develop a RAG pipeline for some time and have been making improvements to it. I am using Gemini-1.0-pro, and on average it takes 6.246 seconds per prompt-response cycle when invoking Gemini from my application. If I run the same prompt with the exact same parameters in Google AI Studio, it takes only 4.414 seconds on average. What could be the reason for this? Is there any way I can improve on that 4 seconds?

RoopeshKr

Hey @Dinith_Kumudika, may I know how you measured the response time? Is there a function or parameter to call for this on Vertex AI?

Dinith_Kumudika

I am using Semantic Kernel on .NET. I used simple timing functions to log the time taken to execute the prompt and get the response, so it might not be 100% accurate.
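For comparison, here is a minimal Python sketch of the same client-side timing approach (the setup in this thread is .NET, so this is only an illustration of the idea). It assumes a hypothetical `call` function that sends one prompt and waits for the complete response; `time_calls` is likewise a made-up helper name. It uses a monotonic clock, discards a warm-up call, and reports mean and median over several runs so that one slow request does not skew the comparison.

```python
import statistics
import time
from typing import Callable


def time_calls(call: Callable[[], object], runs: int = 5, warmup: int = 1) -> dict:
    """Time repeated calls with a monotonic clock and summarize the results.

    `call` is a hypothetical stand-in for whatever sends one prompt and
    waits for the full response (e.g. an SDK or Semantic Kernel invocation).
    """
    # Warm-up calls are not recorded: the first request often pays one-off
    # costs (such as connection setup) that later requests may reuse.
    for _ in range(warmup):
        call()

    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        call()
        durations.append(time.perf_counter() - start)

    return {
        "runs": runs,
        "mean_s": statistics.mean(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Dummy workload standing in for a real model call.
    stats = time_calls(lambda: time.sleep(0.05), runs=3)
    print(f"mean {stats['mean_s']:.3f}s, median {stats['median_s']:.3f}s")
```

Timing the same prompt this way from both environments, with identical parameters and several runs each, makes the averages easier to compare than single measurements.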

OrangiaNebula

The servers include an HTTP header that reports the timing in milliseconds: the dur= value in the Server-Timing header. Example:

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Vary: X-Origin
Vary: Referer
Date: Wed, 24 Apr 2024 00:05:42 GMT
Server: scaffolding on HTTPServer2
Cache-Control: private
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Server-Timing: gfet4t7; dur=3756
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Accept-Ranges: none
Vary: Origin,Accept-Encoding
Connection: close
Transfer-Encoding: chunked
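A minimal Python sketch for pulling the dur= value out of that header, assuming you already have the raw header string (for example from the response headers of whatever HTTP client you use; no specific endpoint or SDK call is shown here). The helper names `parse_server_timing` and `client_overhead_ms` are hypothetical. The metric name (here `gfet4t7`) is whatever the server sends, and its exact meaning is not documented in this thread. Subtracting the server-reported duration from your own client-side measurement gives a rough estimate of time spent outside the server (network, connection setup, client library).

```python
from typing import Dict


def parse_server_timing(header_value: str) -> Dict[str, float]:
    """Return {metric_name: dur_ms} for each metric in a Server-Timing header.

    Assumes the general shape `name;param=value;...` with metrics separated
    by commas. Quoted desc values containing commas are not handled.
    """
    metrics: Dict[str, float] = {}
    for entry in header_value.split(","):
        parts = [p.strip() for p in entry.split(";")]
        name = parts[0]
        if not name:
            continue
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "dur":
                try:
                    metrics[name] = float(value.strip())
                except ValueError:
                    pass  # ignore malformed dur values
    return metrics


def client_overhead_ms(client_elapsed_s: float, server_dur_ms: float) -> float:
    """Rough time spent outside the server-reported duration, in ms."""
    return client_elapsed_s * 1000.0 - server_dur_ms
```

For the example response above, `parse_server_timing("gfet4t7; dur=3756")` returns `{"gfet4t7": 3756.0}`, i.e. roughly 3.76 seconds reported on the server side for that request.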