Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Custom stopwords in Postgres in Cloud SQL

I am trying to use a custom stopwords file in our Postgres DB hosted in Cloud SQL. We need access to the $SHAREDIR/tsearch_data/ directory, but this is not accessible with the hosted Cloud solution. Is there a workaround for this? Surely there are others who want a custom stopwords file in their database.

0 3 177
3 REPLIES 3

Accessing the $SHAREDIR/tsearch_data/ directory in Cloud SQL for PostgreSQL is not possible due to the managed nature of the service, which restricts file system access. However, a table-based approach, combined with custom text search configurations, offers a robust and flexible solution for implementing custom stopwords within the Cloud SQL environment. Here are some steps you can take:

  1. Create a Stopword Table

    Create a table to store your custom stopwords:

     
    CREATE TABLE custom_stopwords (
        word TEXT PRIMARY KEY
    );
    
  2. Populate the Stopword Table

    Add your custom stopwords:

     
    INSERT INTO custom_stopwords (word) VALUES
        ('example1'),
        ('example2'),
        ('example3');
    
  3. Create a Custom Text Search Dictionary

    Create a custom dictionary based on the pg_catalog.simple template:

     
    CREATE TEXT SEARCH DICTIONARY custom_stopword_dict (
        TEMPLATE = pg_catalog.simple,
        STOPWORDS = custom_stopwords
    );
    
  4. Create a Custom Text Search Configuration

    Modify a copy of the pg_catalog.english configuration to use the custom dictionary:

     
    CREATE TEXT SEARCH CONFIGURATION custom_config (COPY = pg_catalog.english);
    ALTER TEXT SEARCH CONFIGURATION custom_config
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
        WITH custom_stopword_dict, english_stem;
    
  5. Use the Custom Configuration in Queries

    Apply the custom configuration in full-text search queries:

     
    SELECT *
    FROM your_table
    WHERE to_tsvector('custom_config', your_column) @@ to_tsquery('custom_config', 'search term');

The table-based approach, combined with PostgreSQL's custom text search configurations, is an efficient, secure, and Cloud SQL-compatible solution for managing custom stopwords. It provides flexibility and optimal performance while fully leveraging PostgreSQL's powerful text search features, making it an ideal choice for managed database environments.

Thanks for the response, but step 3 does not work. I see this error:

`ERROR: could not open stop-word file "/share/tsearch_data/custom_stopwords.stop": No such file or directory SQL state: F0000`

Sorry for the confusion. Yes, in Postgres, the STOPWORDS parameter in a TEXT SEARCH DICTIONARY does requires a file in the $SHAREDIR/tsearch_data/ directory which as mentioned is not possible in Cloud SQL

To implement custom stopwords in Cloud SQL, the best solution is to preprocess the text to remove stopwords dynamically using a custom_stopwords table before creating a tsvector. This ensures compatibility with Cloud SQL and leverages PostgreSQL's full-text search capabilities.

1. Create a Stopword Table

Start by creating a table to store your custom stopwords:

 
CREATE TABLE custom_stopwords (
    word TEXT PRIMARY KEY
);

2. Populate the Stopword Table

Add your custom stopwords:

 
INSERT INTO custom_stopwords (word) VALUES
    ('example1'),
    ('example2'),
    ('example3');

3. Preprocess Text and Filter Stopwords

Filter out stopwords dynamically when generating the tsvector. Use an UPDATE statement to populate a search_vector column:

 
ALTER TABLE your_table ADD COLUMN search_vector tsvector;

UPDATE your_table
SET search_vector = to_tsvector('pg_catalog.english', (
    SELECT string_agg(word, ' ')
    FROM (
        SELECT word
        FROM unnest(string_to_array(your_column, ' ')) word  -- Tokenize the text
        WHERE word NOT IN (SELECT word FROM custom_stopwords) -- Filter out stopwords
    ) AS filtered_words
));

4. Create a GIN Index

Index the search_vector column for efficient full-text search:

 
CREATE INDEX idx_search_vector ON your_table USING gin(search_vector);

5. Perform Full-Text Searches

Use the search_vector column in your queries:

 
SELECT *
FROM your_table
WHERE search_vector @@ to_tsquery('pg_catalog.english', 'search term');

This approach combines the flexibility of table-based stopword management with the performance and functionality of PostgreSQL's full-text search engine. It ensures efficient, maintainable, and scalable text search within the constraints of a managed database environment.