Hi all,
When compiling our artifacts to a YAML file, we expect the runtime parameter values to follow this schema: https://github.com/kubeflow/pipelines/blob/master/api/v2alpha1/pipeline_spec.proto#L677
When I first uploaded my YAML file with the schema above and tried to run the pipeline template, the runtime treated every value as a string instead of converting it to its appropriate data type.
Here is the yaml file I tried to upload and run:
But what I'm expecting is something like this (this run was submitted through the AI Platform Python library):
As an alternative solution to get the expected result in the screenshot above, I changed the YAML file as a proof of concept to the following:
It ran successfully in this use case, but other components expect an `int` to stay an `int`, not become a `double`. For example, when running the Chicago Taxi pipeline as a template, we got the following error:
```
The replica workerpool0-0 exited with a non-zero status of 1. To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=636071587074&resource=ml_job%2Fjob_id%2F1886953...
```
My question here: is there a way in Vertex, when uploading our YAML file as a pipeline template, to also specify the parameter types? If so, how? If not, what are our alternative approaches? (Note: we are also using TFX to generate these artifacts.)
Hi @hobb,
Welcome back to Google Cloud Community.
The issue you are facing is that the runtime treats every parameter value as a string instead of converting it to its appropriate data type. One possible solution is to modify your YAML file to use YAML tags, which specify the data type of each runtime value. For example, to mark a value as an integer, use the `!!int` tag, and to mark a value as a string, use the `!!str` tag.
Here is an example of how to modify your YAML file to use YAML tags:
```
root:
  dag:
    tasks:
      input_data:
        inputs:
          parameters:
            allow_large_results_flag:
              runtimeValue:
                constantValue: !!int 1
            allow_pre_computation_flag:
              runtimeValue:
                constantValue: !!int 0
            create_disposition:
              runtimeValue:
                constantValue: !!str CREATE_IF_NEEDED
            custom_config:
              componentInputParameter: custom-config
            labels:
              runtimeValue:
                constantValue: !!str 'null'
            non_artifact_input_table:
              componentInputParameter: input-table
            union_bq_shards_flag:
              runtimeValue:
                constantValue: !!int 0
            write_disposition:
              runtimeValue:
                constantValue: !!str WRITE_EMPTY
```
In this modified YAML file, YAML tags specify the data type of each runtime value: the `!!int` tag marks `allow_large_results_flag` and `allow_pre_computation_flag` as integers, and the `!!str` tag marks `create_disposition`, `labels`, and `write_disposition` as strings.
With these tags in place, the Kubeflow Pipelines runtime should be able to interpret the data types of your runtime values correctly, instead of treating everything as a string.
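To see how these tags change what a YAML loader actually produces, here is a small sketch using PyYAML (assuming the `pyyaml` package is installed; the key names below are illustrative, not part of the pipeline spec):

```python
import yaml  # PyYAML, assumed available

# Contrast plain scalars, quoting, and explicit !!int / !!str tags.
doc = """
plain_int: 1            # untagged, unquoted -> int
quoted_int: "1"         # quoting forces a string
forced_str: !!str 1     # tag forces a string
forced_int: !!int "0"   # tag forces an int despite the quotes
bare_null: null         # untagged null -> None
forced_null_str: !!str null
"""
parsed = yaml.safe_load(doc)
for key, value in parsed.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Running this shows, for example, that `forced_str` comes back as the string `'1'` while `forced_int` comes back as the integer `0`, which is the same mechanism the modified pipeline YAML relies on.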