I have used set-column to create a counter and assigned it a name. Then I used that named variable with set-column to create a primary key (distinct). In the Wrangler preview it shows distinct values, but when I deployed and ran the pipeline the results were a little different: the same value is repeated throughout the column.
The following directives are provided for reference.
```
set-column :counter ''
copy :DATE :DATE_copy true
increment-variable count1 1 true
set-column :PK_DTL_WEB counter+count1
drop :counter
drop :DATE_copy
```
There seems to be a discrepancy between the results in your Wrangler preview and the actual pipeline execution. The likely issue involves how variables are scoped and handled within Cloud Data Fusion, particularly in concurrent execution environments.
Here's an approach to reliably create your primary key:
Consistent Variable Scoping:
- Define the counter variable at the pipeline level to ensure it is accessible across all transformations and maintained throughout the entire pipeline execution.
- Use the counter variable directly in your transformation without resetting or re-initializing it within the transformation itself.

Transformation Logic:
```
// In the Cloud Data Fusion Pipeline definition
set-pipeline-variable :counter 1

// Inside your Wrangler transformation
copy :DATE :DATE_copy true // Optional, if you need this copy for other reasons
set-column :PK_DTL_WEB pipeline(:counter) + :count1 // Use pipeline-scoped counter
increment-pipeline-variable :count1 1 true // Increment after using
drop :DATE_copy // Optional, if no longer needed
```
Explanation:
- Initializing counter at the pipeline level ensures it starts at 1 when the pipeline begins.
- pipeline(:counter): this syntax accesses the pipeline-level counter within the transformation.
- By adding :count1 to the pipeline's counter value, each row receives a unique primary key.
- increment-pipeline-variable :count1 1 true: ensures that :count1 increments after each row, maintaining unique key values.
- Drop :counter and :DATE_copy if they are not needed downstream.

Why This Approach Works: Because the counter is scoped at the pipeline level rather than re-initialized inside the transformation, the same variable state applies to the whole run, which avoids the preview-versus-execution discrepancy described above.

Additional Tips: Monitor the counter variable's values throughout the pipeline run, verifying correct incrementation and primary key generation.
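As a plain-Python illustration of the logic described above (a fixed, pipeline-level base value combined with a per-row counter), here is a minimal sketch. The base value, column names, and sample rows are made up for the example; this only mirrors the intended logic and is not Wrangler or Data Fusion syntax.
```
# Minimal sketch: combine a pipeline-level base value with a per-row counter
# to derive a distinct key per row. Plain Python, not a Data Fusion plugin.

def add_primary_keys(rows, base_counter):
    """Return copies of rows with a PK_DTL_WEB key derived from base + row position."""
    keyed = []
    for offset, row in enumerate(rows, start=1):
        out = dict(row)                           # do not mutate the caller's data
        out["PK_DTL_WEB"] = base_counter + offset
        keyed.append(out)
    return keyed

if __name__ == "__main__":
    sample = [{"DATE": "2024-01-01"}, {"DATE": "2024-01-02"}, {"DATE": "2024-01-03"}]
    for record in add_primary_keys(sample, base_counter=1):
        print(record)                             # keys 2, 3, 4 -- distinct per row
```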
```
// In the Cloud Data Fusion Pipeline definition
set-pipeline-variable :counter 1
```
Where exactly should I give this command? I tried to use it in one of the Wranglers, but surely that's not the way.
How can I define a pipeline-level variable?
In Data Fusion, setting a pipeline-level variable isn't as straightforward as using a single command in the Wrangler or any other plugin's configuration directly. Instead, you typically manage such variables through the pipeline's configuration or by passing runtime arguments when the pipeline is deployed or triggered.
Here's how you can manage and use pipeline-level variables effectively in Cloud Data Fusion:
Steps to set runtime arguments: when you deploy or trigger the pipeline, pass the variable as a runtime argument (for example, counter=1).

Using Runtime Arguments in Transformations: Once you've set a runtime argument, you can use it in your transformations by referencing it in the plugin properties where variables are supported. For example, if you're using a plugin that supports macro syntax, you can reference the runtime argument like ${counter}. Note that direct increment operations on this runtime argument within transformations (like adding directly within a Wrangler directive) are not supported. Instead, you'd need to handle such logic in a more stateful component or script.
Scripting Plugins: For advanced use cases, consider using scripting plugins such as Python or JavaScript, which can execute more complex logic. You can read runtime arguments, perform calculations, manage state, and pass the output to subsequent steps.
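To make that pattern concrete, here is a self-contained Python sketch in the general shape of a per-record transform: it reads a counter that would normally arrive as a runtime argument, keeps state across records, and emits each record with a generated key. The Emitter class, the runtime_args dictionary, and the column names are stand-ins defined for this example; they are not the actual Data Fusion plugin API.
```
# Stand-in sketch of a scripting-plugin-style transform. The Emitter class and
# runtime_args dict are placeholders so the example runs on its own; they are
# not the Data Fusion plugin API.

class Emitter:
    """Collects transformed records, standing in for the plugin's output hook."""
    def __init__(self):
        self.records = []

    def emit(self, record):
        self.records.append(record)

def make_transform(runtime_args):
    """Build a per-record transform that assigns keys starting from a counter argument."""
    state = {"next": int(runtime_args.get("counter", "1"))}

    def transform(record, emitter):
        out = dict(record)
        out["PK_DTL_WEB"] = state["next"]   # unique, monotonically increasing key
        state["next"] += 1                  # state is kept across records
        emitter.emit(out)

    return transform

if __name__ == "__main__":
    runtime_args = {"counter": "100"}       # e.g. passed as counter=100 at run time
    emitter = Emitter()
    transform = make_transform(runtime_args)
    for rec in [{"DATE": "2024-01-01"}, {"DATE": "2024-01-02"}]:
        transform(rec, emitter)
    print(emitter.records)                  # keys 100 and 101
```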
Custom Plugins: If your needs exceed the capabilities of the available transformations and runtime configurations, developing a custom plugin might be necessary. Custom plugins can manage internal state, perform complex transformations, and utilize runtime arguments as needed.
Example Using Python Executor Plugin: If you need to generate a unique sequence number, you might use a Python executor with logic similar to this:
```
# Assuming 'counter' is passed as a runtime argument and made available to the
# script, and that emit() is the output hook provided by the executor plugin.
import datetime

# Generate a timestamp-based ID and append the counter for uniqueness
ts = datetime.datetime.now().strftime("%Y%m%d%H%M%S%f")
output = f"{ts}_{counter}"
emit(output)
```
Considerations
Setting and managing global or pipeline-level variables in Data Fusion requires a good understanding of the tool's capabilities and limitations. For unique key generation, using a combination of timestamping and passed-in initial values (like a counter) in a scripted or custom plugin often provides the best balance of uniqueness and simplicity.