Model Selection and Labeling for Video Data

I am working on building an AI model to analyze video data from underwater baited cameras. My end goal is to have a model that detects when an animal swims through the video. The vast majority of the footage looks like photo 1 (open, blue ocean), but occasionally a shark or fish swims through (photo 2).

I have trained each of the video model types on the Vertex AI platform: Action Recognition, Video Classification, and Video Object Tracking. The only model that appears to be somewhat working is object tracking, but the average precision remains low (~0.35). The largest, most recent training data set I have used has 1,124 labels for 'Animal' and 534 labels for 'Bait Can' across 25 videos. Despite this I am still only getting low precision, and when I run a batch prediction on known video it misses some pretty obvious sharks swimming through the frame. This is my third iteration of the annotation/training data set, each time adding more videos and labels, yet model performance is not improving, so I am getting a little discouraged with this approach.
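For context, this is roughly how I run the batch prediction, using the Vertex AI Python SDK (google-cloud-aiplatform). It is a minimal sketch; the project ID, region, model ID, and bucket paths are placeholders, not my real values.

# Minimal sketch: batch prediction against the trained video object
# tracking model via the Vertex AI Python SDK. Project, region,
# model ID, and GCS paths below are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project-id", location="us-central1")

# The AutoML video object tracking model trained in Vertex AI.
model = aiplatform.Model(model_name="1234567890")  # placeholder model ID

# gcs_source points to a JSONL file listing the videos to score;
# prediction results are written under gcs_destination_prefix.
batch_job = model.batch_predict(
    job_display_name="bruv-animal-detection-batch",
    gcs_source="gs://my-bucket/batch_input.jsonl",
    gcs_destination_prefix="gs://my-bucket/batch_output/",
    sync=True,
)

print(batch_job.state)
print(batch_job.output_info)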

My questions for the community are:

Does this seem like the right approach, or is there another model I should be looking into?

Training/labeling methods: is it better to give the Video Object Tracking AutoML model frame-by-frame labels of an animal swimming through the water, or lots of separate instances of animals throughout a video?
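To make the two options concrete, here is a purely illustrative sketch of what I mean by the two labeling densities. These dictionaries are not Vertex AI's actual import schema, just a way to compare how many boxes each strategy produces for the same footage; all times and coordinates are made up.

# Purely illustrative: two labeling densities for the same shark pass.
# These dicts are NOT Vertex AI's import schema.

# Option A: frame-by-frame (dense) labels for one animal track,
# one bounding box per sampled frame while the shark is visible.
dense_track = [
    {"label": "Animal", "time_offset_s": 12.0 + 0.1 * i,
     "box": (0.42, 0.30, 0.58, 0.45)}  # (x_min, y_min, x_max, y_max), normalized
    for i in range(50)  # ~5 seconds sampled at 10 fps
]

# Option B: sparse labels, a handful of separate instances spread
# across the video rather than every frame of one pass.
sparse_instances = [
    {"label": "Animal", "time_offset_s": 12.0, "box": (0.42, 0.30, 0.58, 0.45)},
    {"label": "Animal", "time_offset_s": 14.5, "box": (0.55, 0.28, 0.70, 0.44)},
    {"label": "Animal", "time_offset_s": 97.2, "box": (0.10, 0.60, 0.25, 0.75)},
]

print(len(dense_track), "dense boxes vs", len(sparse_instances), "sparse boxes")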

Thanks ahead of time for any feedback or ideas. I am happy to elaborate more or share models/videos, etc.

[Photo 1: Open ocean]
[Photo 2: Shark and fish]

1 REPLY

Yes, it is the right approach.
