
What's the training corpus of models behind GCP Natural Language APIs?

Hi, where can I find some information about which datasets are used for training models that power the natural language APIs for sentiment analysis, entity extraction, etc.? Thanks!

1 REPLY

comaro
Former Googler

The Natural Language API is trained on several types of datasets:

  • Public datasets. Examples: five CrowdFlower sentiment benchmarks.
  • EAP customer datasets. Examples: the Feefo sentiment dataset.
  • Academic datasets. Examples: the Stanford Rotten Tomatoes sentences and the UCI Sentiment Labeled Sentences Data Set.
  • Google datasets. Examples: Shopping, Play.
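
For anyone who landed here while evaluating the features the question mentions (sentiment analysis and entity extraction), here is a minimal sketch of calling them with the google-cloud-language Python client (v1). The sample text, function name, and project setup are assumptions for illustration, not part of the original answer:

# Minimal sketch using the google-cloud-language Python client (v1).
# Assumes `pip install google-cloud-language` and application default
# credentials for a project with the Natural Language API enabled.
from google.cloud import language_v1

def analyze(text: str) -> None:
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )

    # Document-level sentiment: score in [-1, 1], magnitude >= 0.
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    print(f"sentiment score={sentiment.score:.2f} magnitude={sentiment.magnitude:.2f}")

    # Entity extraction: named entities with their type and salience.
    entities = client.analyze_entities(request={"document": document}).entities
    for entity in entities:
        print(f"entity={entity.name} type={entity.type_.name} salience={entity.salience:.2f}")

if __name__ == "__main__":
    analyze("The pizza at Luigi's in Rome was fantastic.")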