Hello, I am new to Data Fusion, and I am having an issue getting a pipeline up and running. I am trying to move data from a Cloud SQL MySQL instance to BigQuery. In the pipeline I have tried using the MySQL, CloudSQL MySQL and Database sources, but I get the same error each time:
Database Source:
Spark program 'phase-1' failed with error: Plugin with id Database:source.jdbc.mysql does not exist in program phase-1 of application gs_test_two.. Please check the system logs for more details.
MySQL Source:
Spark program 'phase-1' failed with error: Plugin with id MySQL2:source.jdbc.mysql does not exist in program phase-1 of application gs_test_two.. Please check the system logs for more details.
CloudSQLMySQL Source:
Spark program 'phase-1' failed with error: Plugin with id CloudSQL MySQL:source.jdbc.cloudsql-mysql does not exist in program phase-1 of application gs_test_two.. Please check the system logs for more details.
So as you can see, basically the same error each time.
I know the connections work, as I can look up the MySQL Databases and see table schemas and data through them. What could I be doing wrong here? It's like the pipeline isn't talking to the connections properly.
The instance is on a dedicated VPC with private IP; we have a VM running the Cloud SQL Proxy, private IP is enabled on the database, and it is peered to the same VPC.
Thanks.
The error message "Plugin with id Database:source.jdbc.mysql does not exist" indicates that Cloud Data Fusion cannot find the necessary plugin to connect to your MySQL database. This is usually caused by:
Troubleshooting Steps
1. Verify Plugin Installation and Compatibility
2. Resolve Network Issues (if plugins are correctly installed)
3. Review Connection Details
4. Check System Logs
Additional Tips
Thank you - I will go through these tomorrow 🙂
@ms4446 Please can you advise where I can find the 'Database' plugin. This does not appear to be in the Hub. Thanks.
The 'Database' plugin serves as the foundational framework for database connections in Cloud Data Fusion. It leverages JDBC drivers to facilitate communication with a wide range of databases, including MySQL, PostgreSQL, and others.
How to Use the 'Database' Plugin
Obtain the JDBC Driver: Download the JDBC driver for your database (e.g., MySQL Connector/J for MySQL databases) directly from the database vendor's official website or a trusted source.
Upload the Driver:
Navigate to the Administration section within your Cloud Data Fusion instance.
Go to the Configuration tab or look for a section dedicated to managing external libraries or drivers.
Upload the JDBC driver JAR file. This step is crucial for enabling your Cloud Data Fusion instance to establish connections with the database.
Create a Pipeline:
When configuring a source node in your pipeline, choose the 'Database' plugin as the source type.
Configure Connection Details:
JDBC Driver Class Name: Specify the driver class name (e.g., com.mysql.cj.jdbc.Driver for MySQL).
Connection URL: Enter the JDBC connection URL, which includes the hostname, port, and database name.
Username/Password: Provide the credentials for your database.
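The connection details above can be sketched as a small helper that assembles the pieces the 'Database' plugin asks for. This is a minimal illustration; the host, port, database name, and credentials are placeholder values, not your actual instance details.

```python
# Minimal sketch of the connection details the 'Database' plugin expects.
# All values below are placeholders -- substitute your own.

DRIVER_CLASS = "com.mysql.cj.jdbc.Driver"  # MySQL Connector/J driver class

def jdbc_url(host: str, port: int, database: str) -> str:
    """Build a MySQL JDBC connection URL from its parts."""
    return f"jdbc:mysql://{host}:{port}/{database}"

# Example with placeholder values:
url = jdbc_url("10.0.0.5", 3306, "my_database")
print(url)  # jdbc:mysql://10.0.0.5:3306/my_database
```

The same URL, driver class name, and credentials go into the corresponding fields of the source node's configuration.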
Important Notes
Private IPs: For databases configured with a private IP, ensure that you have set up Cloud SQL Proxy and VPC peering according to the guidelines provided in the Cloud Data Fusion documentation.
Driver Compatibility: Verify that the JDBC driver version is compatible with both your database version and Cloud Data Fusion. Incompatibilities can lead to connection issues or unpredictable behavior.
Refer to Documentation: The Cloud Data Fusion interface and features can evolve, so it's advisable to consult the latest documentation for the most current instructions on uploading JDBC drivers and configuring database connections.
Additional Considerations
Flexibility: While the 'Database' plugin offers versatility in connecting to various databases, specific configurations may require you to explicitly select the database type to ensure accurate data interpretation and handling.
Thank you @ms4446 .
Under the Administration -> Configuration section, I only have 3 options:
Namespaces
System Compute Profiles
System Preferences
Am I missing something obvious?
Steps to Create a Connection in Google Cloud Data Fusion
1. Navigate to Namespace
2. Connections Section
3. Add New Connection
4. Select Connection Type
5. Configure Properties
6. Test Connection
7. Save Connection
Additional Tips:
Hi @ms4446 - I understand how to create a connection and upload a driver, but the pipeline says the driver cannot be found after it has been uploaded, and it seems impossible to find any support for this.
Hi @gsweet87 ,
I understand the frustration you're experiencing with the JDBC driver not being recognized by your pipeline in Cloud Data Fusion, even after uploading it. Here's a troubleshooting checklist that might help resolve the problem:
1. Verify Driver Upload Location
Emphasize Namespace: JDBC drivers are specific to namespaces. Please confirm that the driver has been uploaded to the exact namespace where your pipeline is running.
Documentation: For more details, you can refer to the Cloud Data Fusion documentation on managing artifacts.
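One way to confirm the driver actually landed in the right namespace is to list that namespace's artifacts through the CDAP REST API that Cloud Data Fusion is built on. This is a hedged sketch: the instance endpoint is a placeholder, and authentication (a Bearer access token for your Data Fusion instance) is shown but not obtained here.

```python
# Sketch: list artifacts in one namespace via the CDAP REST API, to check
# whether the uploaded JDBC driver artifact exists there. The endpoint value
# is a placeholder; token acquisition is out of scope for this snippet.
import json
from urllib.request import Request, urlopen

def artifacts_url(endpoint: str, namespace: str) -> str:
    """Build the CDAP REST URL that lists artifacts in a namespace."""
    return f"{endpoint}/v3/namespaces/{namespace}/artifacts"

def list_artifact_names(endpoint: str, namespace: str, token: str) -> list:
    """Return the names of all artifacts visible in the namespace."""
    req = Request(artifacts_url(endpoint, namespace),
                  headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return [artifact["name"] for artifact in json.load(resp)]
```

If the driver's name is missing from the returned list for the namespace your pipeline runs in, it was uploaded somewhere else.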
2. Ensure Correct Usage in Pipeline Configuration
Testing Tip: As a preliminary check, test your connection string with an external SQL client to ensure it's correctly configured, ruling out any configuration issues.
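Alongside testing with an external SQL client, a plain TCP reachability check can rule out basic network problems before you blame the driver. The sketch below does no MySQL handshake; it only verifies that something is listening at the host and port from your connection URL, demonstrated here against a local stand-in listener.

```python
# Check that the database host/port from the connection URL is reachable at
# the TCP level (no MySQL handshake; purely a network-path sanity check).
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demonstration against a local listener standing in for the database host:
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
print(port_reachable("127.0.0.1", port))  # True
listener.close()
```

If this returns False for your real database host, the problem is network routing (VPC peering, proxy, firewall), not the JDBC driver.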
3. Consider Restarting Components
Pipeline Restart: After making any changes to the configuration, always restart your pipeline to apply those changes effectively.
4. Dive into Logs for Detailed Errors
System Logs: Look into the Cloud Data Fusion system logs for any specific JDBC driver error messages. These logs are accessible through the Google Cloud Console's Logging service.