
What's the training corpus of models behind GCP Natural Language APIs?

Hi, where can I find some information about which datasets are used for training models that power the natural language APIs for sentiment analysis, entity extraction, etc.? Thanks!

1 REPLY

comaro
Former Googler

The Natural Language API is trained on several types of datasets:

  • Public datasets. Examples: five CrowdFlower sentiment benchmarks.
  • EAP customer datasets. Examples: the Feefo sentiment dataset.
  • Academic datasets. Examples: the Stanford Rotten Tomatoes sentences and the UCI Sentiment Labeled Sentences Data Set.
  • Google datasets. Examples: Shopping, Play.
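
For anyone who landed here while evaluating the features the question mentions (sentiment analysis and entity extraction), here is a minimal sketch of calling them with the google-cloud-language Python client (v1). The sample text, function name, and project setup are assumptions for illustration, not part of the original answer:

# Minimal sketch using the google-cloud-language Python client (v1).
# Assumes `pip install google-cloud-language` and application default
# credentials for a project with the Natural Language API enabled.
from google.cloud import language_v1

def analyze(text: str) -> None:
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )

    # Document-level sentiment: score in [-1, 1], magnitude >= 0.
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    print(f"sentiment score={sentiment.score:.2f} magnitude={sentiment.magnitude:.2f}")

    # Entity extraction: named entities with their type and salience.
    entities = client.analyze_entities(request={"document": document}).entities
    for entity in entities:
        print(f"entity={entity.name} type={entity.type_.name} salience={entity.salience:.2f}")

if __name__ == "__main__":
    analyze("The pizza at Luigi's in Rome was fantastic.")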