
Get Cloud Vision API as good as Google Lens

As part of a student team, I am building a system to classify used shoes.

I know that Google Lens does a really good job here.

I came across the Google Cloud Vision API (which should be a similar thing) and implemented it in Python.
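For reference, a web detection request of this kind might look roughly like the sketch below. It assumes the `google-cloud-vision` client library is installed and credentials are configured; the function names `fetch_web_entities` and `format_web_entities` are made up for illustration, and this is not necessarily the exact script behind the results in this post.

```python
import sys


def fetch_web_entities(image_path):
    """Return (score, description) pairs from Vision API web detection."""
    # Imported here so the formatting helper below works without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())

    response = client.web_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return [(e.score, e.description) for e in response.web_detection.web_entities]


def format_web_entities(entities):
    """Format (score, description) pairs as a plain-text report."""
    lines = [f"{len(entities)} Web entities found: ", ""]
    for score, description in entities:
        lines.append(f"    Score      : {score}")
        lines.append(f"    Description: {description}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python web_entities.py shoe.jpg
    print(format_web_entities(fetch_web_entities(sys.argv[1])))
```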

For clean, well-angled images like this Air Force One: 

[Image: Nike Air Force 1]

I am getting really promising results:

10 Web entities found: 

    Score      : 0.9957345128059387
    Description: Nike Air Force 1 07 LV8 EMB Raiders Mens

    Score      : 0.7279999852180481
    Description: Nike

    Score      : 0.7279999852180481
    Description: 

    Score      : 0.7130167484283447
    Description: Nike Mens Air Force 1 '07 LV8 'Metallic Swoosh Pack

    Score      : 0.7052000164985657
    Description: Sneakers

    Score      : 0.7049999833106995
    Description: Shoe

    Score      : 0.6831490993499756
    Description: Nike Mens Air Force 1 Low

    Score      : 0.6559000015258789
    Description: Nike

    Score      : 0.6399800181388855
    Description: Nike Air Max

    Score      : 0.6158000230789185
    Description: Men's Shoe

However, if I input real-world images like this old, used Nike Tanjun:

[Image: used Nike Tanjun]

Things fall apart:

8 Web entities found: 

    Score      : 0.5776046514511108
    Description: Shoe

    Score      : 0.4444863796234131
    Description: Product design

    Score      : 0.42980000376701355
    Description: Design

    Score      : 0.4197726845741272
    Description: Product

    Score      : 0.39287227392196655
    Description: Activewear

    Score      : 0.384799987077713
    Description: Walking

    Score      : 0.35569998621940613
    Description: Walking Shoe

    Score      : 0.3215000033378601
    Description: outdoor

But if I upload the image to Google Lens, I can still figure out the right label:

[Image: Google Lens result]

Logo detection (Nike) almost always works. Using this, I could, for example, look for the word that most often follows the brand name in the results (here, Tanjun) to figure out the model.
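The "most frequent word after the brand" idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: `guess_model` is a hypothetical helper, and the input is any list of text snippets (for example result titles or web entity descriptions), wherever they come from.

```python
import re
from collections import Counter


def guess_model(brand, texts):
    """Return the word that most often directly follows `brand`, or None."""
    brand = brand.lower()
    followers = Counter()
    for text in texts:
        # Lowercase and split into simple word tokens.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Count every word that comes immediately after the brand name.
        for current, following in zip(words, words[1:]):
            if current == brand:
                followers[following] += 1
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up example titles, not real API or Lens output.
titles = [
    "Nike Tanjun Mens Running Shoe",
    "Used Nike Tanjun black",
    "nike tanjun size 10",
    "Nike Air Max",
]
print(guess_model("Nike", titles))  # prints "tanjun"
```

A word count like this is fragile (for example, "Nike Mens ..." would yield "mens"), so a stop list of generic words would probably be needed in practice.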

It should be mentioned that the data in our system will be better than this: there will be multiple images taken from different angles under very good lighting conditions.

Now I am trying to figure out how to

EITHER: Get Vision API working in the same way as Google Lens

OR: Access Google Lens data in a reasonably convenient way (ideally running from a Raspberry Pi)

 

 

 

3 REPLIES

Push. I urgently need help.

I'm in the same shoes. The real answer is, the API does not work so well.

Hi, I'm facing the same problem. Did you find a solution?