Re: The query is doing full table scan even if an ...

omarahmed · 10-04-2022 08:04 AM

it looks like the optimizer is doing table scan instead of index scan if I select column that are not part of the index!.

I created the following table and index

CREATE TABLE customer_lastseen_products (

customer_ref_value STRING(50) NOT NULL,

customer_ref_type STRING(20) NOT NULL,

sku_config STRING(40) NOT NULL,

mp_code STRING(20) NOT NULL,

is_added_to_cart BOOL NOT NULL,

is_purchased BOOL NOT NULL,

l ast_visit_time TIMESTAMP OPTIONS( allow_commit_timestamp = true)

)PRIMARY KEY(customer_ref_value, sku_config),

ROW DELETION POLICY (OLDER_THAN(last_visit_time, INTERVAL 30 DAY))

CREATE INDEX customercodeIndex3 ON customer_lastseen_products(customer_ref_value, customer_ref_type, mp_code, last_visit_time DESC);

But this query is doing full table scan

SELECT

sku_config , is_added_to_cart, is_purchased

FROM customer_lastseen_products

WHERE (customer_ref_value, customer_ref_type) in (('0f2e9ed9-2d5e-4c78-b03f-0c6dd3f65598', 'customer_code'), ('', 'visitor_id'))

AND mp_code = "mp"

AND last_visit_time between '2020-10-03T12:35:59' and '2022-10-03T12:35:59'

order by last_visit_time desc

JGerman23

As it was already mentioned in your Stackoverflow question, internal documentation recommends using the directive FORCE INDEX to force the index you want to use, which will increase the consistency of your searches. The query optimizer for Cloud Spanner may take up to 3 days to begin using an index after it has been created since it needs that time to gather the database's information.

Here is the direct link to review more information on Cloud Spanner - Secondary Indexes.

aayushagar

In my case it has already been more than 3 days, instead it's been weeks. But why is index still not being used ?

dglick

If the amount of data in the table is small, the optimizer may estimate that a full table scan is faster, especially if the query is not an index-only scan.

A quick experiment would be to change the SELECT to only read indexed fields, and/or to only read one prefix instead of two. E.g.

SELECT customer_ref_value, customer_ref_type, last_visit_time
FROM customer_lastseen_products
WHERE customer_ref_value= '0f2e9ed9-2d5e-4c78-b03f-0c6dd3f65598' and customer_ref_type = 'customer_code' AND mp_code = "mp"
AND last_visit_time between '2020-10-03T12:35:59' and '2022-10-03T12:35:59'
order by last_visit_time desc

If the optimizer selects the index for that query, then it's highly likely that with more data the optimizer would also select the index for the original query.

The query is doing full table scan even if an index is already defined