
Best data science (DS) solution for the use case below

We are looking to set up a pipeline with two layers. We are thinking of creating a DAG that combines the two layers below, but we are still not sure which tools to use for the data science layer.

  • Data generation layer: generation of a BigQuery table containing hundreds of thousands of rows (products) and around 26 fields, based on SQL logic run in BigQuery, bearing in mind that the number of rows might increase to millions in the near future.
  • Data science layer: prediction of the possible outcomes of each product by a data science model written in Python (for each product we need to predict its next possible stages). The model is computationally heavy: it requires 20 types of Gaussian mixture fittings, and its performance will depend on the number of input products and output outcomes.
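To make the data science layer more concrete, here is a minimal sketch of the kind of per-product sampling we mean. It is not our real model: the mixture parameters in `EXAMPLE_MIXTURE`, the one-dimensional outcome, and the function names `sample_outcomes` and `predict_products` are all assumptions made up for illustration, and it uses only the Python standard library.

```python
import random

# Hypothetical mixture for one product: (weight, mean, std_dev) per component.
# These numbers are illustrative only, not taken from the real model.
EXAMPLE_MIXTURE = [(0.6, 0.0, 1.0), (0.4, 5.0, 0.5)]


def sample_outcomes(mixture, n_samples, rng):
    """Draw n_samples values from a 1-D Gaussian mixture."""
    weights = [w for w, _, _ in mixture]
    samples = []
    for _ in range(n_samples):
        # Pick a component according to its weight, then sample from it.
        _, mean, std = rng.choices(mixture, weights=weights, k=1)[0]
        samples.append(rng.gauss(mean, std))
    return samples


def predict_products(product_ids, n_samples, seed=0):
    """Return {product_id: [n_samples simulated outcomes]}."""
    rng = random.Random(seed)
    return {
        pid: sample_outcomes(EXAMPLE_MIXTURE, n_samples, rng)
        for pid in product_ids
    }
```

In this sketch the work grows with both the number of products and the number of samples per product, which are the two dimensions we would like to be able to scale.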

The solution would run hourly, every day. Our criteria for the data science layer of the pipeline, in order of priority, are as follows:

  1. Inference: the possibility of scaling the model horizontally (increasing the number of samples x) or vertically (increasing the number of products n) in order to produce x possible outcomes for each product. The scaling will be in the hands of the data scientist.
  2. Costs
  3. Possibility of having a model registry (similar to an image registry, keeping a history of the model artifacts that can be deployed)
  4. Training whilst doing inference
  5. Possibility of giving the end user a choice of input and output (one or more specific product IDs as input, and the number of samples to produce as output for those products)
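As a rough illustration of priorities 1 and 5, the sketch below shows one way a request could carry the user's chosen product IDs and sample count, and how the selected products could be split across workers. The `InferenceRequest` shape and the helper names `select_products` and `make_shards` are hypothetical, and the round-robin split is only one possible strategy, not something tied to a particular Google Cloud product.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class InferenceRequest:
    # Hypothetical request shape; None means "score every product in the table".
    product_ids: Optional[List[str]] = None
    n_samples: int = 100


def select_products(all_ids: Sequence[str], request: InferenceRequest) -> List[str]:
    """Keep only the requested product ids, preserving table order."""
    if request.product_ids is None:
        return list(all_ids)
    wanted = set(request.product_ids)
    return [pid for pid in all_ids if pid in wanted]


def make_shards(ids: Sequence[str], n_workers: int) -> List[List[str]]:
    """Split ids round-robin into at most n_workers non-empty shards."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    shards = [list(ids[i::n_workers]) for i in range(n_workers)]
    return [shard for shard in shards if shard]
```

For example, `make_shards(select_products(ids, request), n_workers)` would give each worker its own list of products, and every worker would then draw `request.n_samples` outcomes per product.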

 
