
Exploring AutoML video intelligence

I want to explore AutoML Video Intelligence to extract metadata after analyzing videos with my own AI video agents.

I'd like to do video analysis that returns a description of the video's contents, such as which landmark is in the background, which celebrity appears in the scene, and so on.

Essentially, I'd like to see whether I can use Google AutoML to detect and describe scenes, with or without my current OpenAI-based process.

So, to conclude:
1. What are the key points of integrating AutoML Video Intelligence into my own systems?

2. What are its parameters, capabilities, and limitations?

3. How can I test it to see whether it meets my expectations?

 P.S. I've checked the Google Cloud documentation, but I didn't quite follow it. Could you help, please?


Hi @PhuuPwint,

Welcome to Google Cloud Community!

Let's break down how you can use Google Cloud's AutoML Video Intelligence (now largely superseded by Vertex AI Video Intelligence) to achieve your video analysis goals. 

  1. Key Points of Applying AutoML Video Intelligence (Vertex AI Video Intelligence):
  • Pre-trained Models: Vertex AI Video Intelligence offers pre-trained models for tasks like object detection, label detection, explicit content detection, and shot change detection. These are readily available and require minimal setup.
  • Custom Models (More Advanced): While not directly "AutoML" in the old sense, you can train custom models on your own data for more specialized needs (e.g., detecting specific objects relevant to your videos). This requires significantly more data and expertise.
  • Scalability: Google Cloud's infrastructure allows you to process large volumes of video data efficiently.
  • API Integration: You'll interact with the service primarily through REST APIs or client libraries (Python, Node.js, etc.). This allows for seamless integration into your own systems.
  • Metadata Extraction: The core output is metadata. The API returns JSON responses containing information about detected objects, labels, and other features within your videos.
  2. Parameters, Capabilities, and Limitations:
  • Parameters: Key parameters include the video file itself (URI or uploaded file), features you want to extract (e.g., LABEL_DETECTION, OBJECT_TRACKING, EXPLICIT_CONTENT_DETECTION), and optionally, configuration settings for those features (e.g., confidence thresholds).
  • Capabilities: The Cloud Video Intelligence API documentation lists the supported features, which include label detection, shot change detection, object tracking, explicit content detection, text detection, and speech transcription.
  • Limitations:
    • Accuracy: Pre-trained models' accuracy depends on the quality of the training data and the diversity of your input videos. They might not be perfect in all scenarios.
    • Contextual Understanding: The system excels at recognizing individual objects and labels but struggles with complex scene understanding or narrative interpretation. 
    • Cost: Processing video is computationally expensive, so costs can add up depending on your video volume and the features you use.
  3. Testing and Getting Your Expected Results:
  1. Set up a Google Cloud Project: Create a project and enable the Vertex AI Video Intelligence API.
  2. Prepare your Videos: Gather a sample set of videos that represent the types of content you'll be analyzing. Aim for diversity to test the model's robustness.
  3. Use the API: Utilize the Vertex AI Video Intelligence API (or client library) to analyze your sample videos. Experiment with different features and parameters. The response will be JSON containing the detected objects and labels. 
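To make step 3 concrete, here is a minimal sketch of what an annotate request body and the resulting JSON look like. The bucket URI, the confidence values, and the `labels_above` helper are all illustrative placeholders, not part of the official client library; the request/response field names follow the Video Intelligence v1 REST API.

```python
import json

# Request body for the videos:annotate REST method: pick the features to
# extract and, optionally, per-feature configuration.
request = {
    "inputUri": "gs://my-bucket/sample.mp4",   # placeholder URI
    "features": ["LABEL_DETECTION", "SHOT_CHANGE_DETECTION"],
    "videoContext": {
        "labelDetectionConfig": {"labelDetectionMode": "SHOT_AND_FRAME_MODE"},
    },
}

# A trimmed example of the JSON the long-running operation eventually
# returns (entity names and confidences invented for illustration).
response_json = """
{
  "annotationResults": [{
    "segmentLabelAnnotations": [
      {"entity": {"description": "landmark"},
       "segments": [{"confidence": 0.91}]},
      {"entity": {"description": "concert"},
       "segments": [{"confidence": 0.42}]}
    ]
  }]
}
"""

def labels_above(response: dict, threshold: float) -> list[str]:
    """Collect label descriptions whose best segment confidence clears threshold."""
    out = []
    for result in response["annotationResults"]:
        for ann in result.get("segmentLabelAnnotations", []):
            best = max(seg["confidence"] for seg in ann["segments"])
            if best >= threshold:
                out.append(ann["entity"]["description"])
    return out

print(labels_above(json.loads(response_json), 0.5))  # ['landmark']
```

Filtering on a confidence threshold like this is usually how you turn the raw metadata into a scene description for your own agents; tune the threshold against your sample videos.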

I hope the above information is helpful.

Hello Ruthseki,

Thank you for your reply and answers. One thing: I'd like to know about the capabilities of AutoML Video Intelligence specifically, rather than the Video Intelligence API. I believe the API is more expensive than AutoML.

Could you tell me only about things relevant to AutoML Video Intelligence, including how it works internally, for example frame-level analysis or landmark detection?

Thank you.