Hi,
I am currently completing the Data Analytics Google Career Certificate. As part of it, they are exercises with walk-through. I always try completing it on my own before reading the walk-through. The question was to find the bike had the most usage hours and from which station said bike departed the most often. I found the answer to this question through my method: bike 370 from station 3798.
However, out of curiosity I wanted to add an extra question: how many trips of that most used bike started from this departs most often from station.
With my method, I get 167 but the real answer is 177.
I cannot figure out why I have this discrepancy. Please help me understand/fix the issue in my script. Lots of thanks!
Here my script:
##Find the most used bike and create a temporary table to store the information for later##
WITH
longest_used_bike AS
(
SELECT
bike_id,
SUM(duration_minutes) AS total_use_time
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY bike_id
ORDER BY total_use_time DESC
LIMIT 1
)
##Find the station the most used bike departs from the most often##
SELECT
original_set.start_station_name,
original_set.start_station_id,
original_set.bike_id,
COUNT (start_station_id) AS unique_starts,
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS original_set
INNER JOIN longest_used_bike ON original_set.bike_id = longest_used_bike.bike_id
GROUP BY start_station_name,start_station_id, bike_id
ORDER BY unique_starts DESC
LIMIT 1
Here Google Data Analytics Course' script:
WITH
longest_used_bike AS
(
SELECT
bike_id,
SUM(duration_minutes) AS total_use_time
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY bike_id
ORDER BY total_use_time DESC
LIMIT 1
)
##Find the station##
SELECT
original_set.start_station_id,
COUNT (*) AS unique_starts
FROM longest_used_bike
INNER JOIN `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS original_set
ON longest_used_bike.bike_id = original_set.bike_id
GROUP BY original_set.start_station_id
ORDER BY unique_starts DESC
Given the previous queries, the following SQL query should accomplish what you're asking for:
WITH
longest_used_bike AS
(
SELECT
bike_id,
SUM(duration_minutes) AS total_use_time
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY bike_id
ORDER BY total_use_time DESC
LIMIT 1
),
most_used_station AS
(
SELECT
original_set.start_station_id,
COUNT (*) AS unique_starts
FROM longest_used_bike
INNER JOIN `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS original_set
ON longest_used_bike.bike_id = original_set.bike_id
GROUP BY original_set.start_station_id
ORDER BY unique_starts DESC
LIMIT 1
)
SELECT
COUNT (*) AS trips_from_most_used_station
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
WHERE bike_id = (SELECT bike_id FROM longest_used_bike)
AND start_station_id = (SELECT start_station_id FROM most_used_station)