I want to run Google Cloud Vision scene text detection on 500,000+ Google Street View (GSV) images covering several large geographic study areas over an eleven-year period. Is there a way to run this analysis on selected areas of GSV imagery without first downloading the images from the GSV API and then uploading them to a Cloud Storage bucket for Vision? In other words, is there a direct pipeline that lets me select all GSV imagery in a geographic area, run Vision text detection on it, and output the results as a CSV?
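
To make the desired pipeline concrete, here is a minimal sketch of its shape. It assumes that a Street View Static API URL can be passed to the Vision `images:annotate` REST method as a remote `imageUri`, so no image is downloaded or staged in a bucket. Whether Vision can actually fetch those URLs at this scale, and whether the Street View terms of service permit it, is not verified here. The helper names (`street_view_url`, `vision_request`, `annotations_to_rows`, `rows_to_csv`) and the `point_id` column are hypothetical. The sketch makes no network calls; it only builds the request payload and flattens a response into CSV.

```python
import csv
import io
from urllib.parse import urlencode

# Street View Static API endpoint (one image per lat/lng sample point).
STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def street_view_url(lat, lng, api_key, size="640x640"):
    """Build a Street View Static API URL for one sample point."""
    query = urlencode({"size": size, "location": f"{lat},{lng}", "key": api_key})
    return f"{STREET_VIEW_ENDPOINT}?{query}"


def vision_request(image_url):
    """Build one entry of the `requests` list for Vision `images:annotate`.

    Assumption: Vision fetches the image from the remote URL itself.
    """
    return {
        "image": {"source": {"imageUri": image_url}},
        "features": [{"type": "TEXT_DETECTION"}],
    }


def annotations_to_rows(point_id, response):
    """Flatten one Vision response into (point_id, text) rows.

    The first entry of `textAnnotations` holds the full detected text,
    so it is skipped and only the individual pieces are kept.
    """
    annotations = response.get("textAnnotations", [])
    return [(point_id, a["description"]) for a in annotations[1:]]


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["point_id", "text"])
    writer.writerows(rows)
    return buf.getvalue()
```

In use, each sample point in a study area would yield one `vision_request(street_view_url(lat, lng, key))` entry, the entries would be posted in batches to `images:annotate`, and each response would be passed through `annotations_to_rows` before being written out with `rows_to_csv`. Enumerating "all imagery in an area" would still require generating the sample points separately.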