Hi, searching the internet I see that many users have problems setting the correct resources for Llama 2. Is there anybody who could share their working setup?
Currently I'm trying to deploy the 13B model on an n1-standard-4 with one Tesla V100 accelerator, without luck (I still get a timeout and no info in the logs). The 7B model worked fine on this setup.
Vertex's recommended approach is to use an A100 40GB. I don't think a V100 GPU would be sufficient for the 13B Llama 2 model.
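A rough back-of-the-envelope check supports this: at fp16, model weights alone take about 2 bytes per parameter, so 13B parameters need roughly 26 GB, which already exceeds the V100's 16 GB of memory, while 7B needs about 14 GB and just barely fits. (This sketch ignores KV cache and activation overhead, which only make things worse.)

```python
# Rough fp16 memory estimate for model weights only.
# Assumption: 2 bytes per parameter; KV cache / activations not included.
def weight_memory_gb(num_params_billion: float, bytes_per_param: int = 2) -> float:
    return num_params_billion * bytes_per_param  # billions of params * bytes = GB

V100_GB = 16   # Tesla V100 memory
A100_GB = 40   # A100 40GB memory

for size in (7, 13):
    need = weight_memory_gb(size)
    print(f"Llama 2 {size}B weights: ~{need:.0f} GB "
          f"(fits V100? {need < V100_GB}, fits A100 40GB? {need < A100_GB})")
```

This matches what you observed: 7B fits on the V100 (tightly), 13B does not, but it should fit on an A100 40GB.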