Hello,
I'm trying to run a test Jupyter notebook of an LSTM model in TensorFlow. I have tried setting the GPU memory limit as suggested here, but I still get the error mentioned above. I cannot find anything related to Google Cloud Vertex AI, and everyone suggests setting the GPU memory limit for such errors.
For reference, I have also tried running this in my Vertex AI JupyterLab instance, and it crashes there too. The only thing I added was this:
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=12288)]
    )
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
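For comparison, a commonly suggested alternative to a fixed memory limit is to let TensorFlow allocate GPU memory on demand. The following is only a minimal sketch, assuming TensorFlow 2.x; the helper name `enable_memory_growth` is hypothetical, and nothing in this thread confirms that it resolves the crash described here.

```python
def enable_memory_growth(tf):
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    `tf` is the imported tensorflow module (passed in so the helper has no
    import side effects). Returns the number of GPUs that were configured.
    This helper is a hypothetical illustration, not part of TensorFlow.
    """
    gpus = tf.config.list_physical_devices('GPU')
    configured = 0
    for gpu in gpus:
        try:
            # Must be called before the GPU has been initialized.
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the device was already initialized; report and move on.
            print("Could not set memory growth:", err)
    return configured

# Usage (assumes TensorFlow 2.x is installed), at the top of the notebook:
#   import tensorflow as tf
#   enable_memory_growth(tf)
```

Memory growth and a logical device memory limit are two separate settings, so a notebook would normally use one or the other on a given GPU, and either has to run before any model or tensor is created.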
On my personal computer it runs just fine, but it would take 13 hours to train, which is not an option for me at the moment.
Any help would be appreciated.
Barnabas.
Hi, can you share the error you encountered?
I get this when I reach the training process part.
Thanks for sharing, @HBarnabas. Since this seems to be an issue specific to your project, you may raise a 1:1 GCP support case. This kind of support has access to your internal resources and can check your issue more comprehensively.
I personally don't think it's fair to ask a user to pay for support when this issue only occurs on GCP.
I have investigated this issue further and can safely conclude that it is not an issue with my project. I ran the TensorFlow 2 tutorials that are included on the Vertex AI platform. The entire notebook (06_rnns) ran just fine up to the point where it reached the model with LSTM layers (Exercise 3). I have attached a screenshot of this.
Please help me figure this out.