When a tweak becomes a refactor

Posted on: 18 April 2023

I recently wanted to make a change to the fitness page on my site. To hide football activities from appearing in run data, in my code there is a rudimentary check that rejects any run activities logged on a Monday (when I regularly play football).

This worked OK as a temporary solution, but is beginning to miss some genuine run activities. Occasionally I won’t play football on a Monday, and I'll instead go for a run. I've also started to sometimes do a short warm-up run around the park before playing football. In both these cases, these runs are mistakenly hidden.

I wanted to do a small refinement of the “Monday” logic - a simple tweak. But as all coders know, sometimes a simple tweak is not a simple tweak. And sometimes, it’s an opportunity for a good old-fashioned refactor.

The tweak

Currently, I have logic where I check the day of the activity, if it was a Monday, zero out the duration when summing the run data for a given year and week:

hash[year].running[week] += date.getDay() == 1 ? 0 : run / 1000;

My idea was to refine this logic to check a) that the day was Monday and b) that the hour was 9pm. Yes, I could potentially start a run between 9 and 10pm, but this was much less likely in my schedule than running at any point on a Monday.

New code:

hash[year].running[week] += date.getDay() == 1 && date.getHours() == 21 ? 0 : run / 1000;

Sorted. Except, for one reason or another, I didn’t have the full timestamp from my Strava data. I was converting the activity timestamp to a zero-timed date (midnight). This was happening in my rather clumsily-written method getStravaData, which retrieves Strava activity data from a variety of different Google Sheets:

// _data/strava.js

async function getStravaData() {
const data = [
await getSpreadsheetData('1MM6EkFB0*************ZqseynEQhg'), // walk
await getSpreadsheetData('1xSMO-Eeb*************_GGsEl9lnQ'), // run
await getSpreadsheetData('1wa3f5U4C*************nMzX-Oajd8'), // workouts
await getSpreadsheetData('1ygT11IY7*************YCIm-IZn7U'), // ride
await getSpreadsheetData('1E18DfjVV*************Sp1heCmq1A'), // swim
]

const [walkDataByDate, runDataByDate, footballDataByDate, rideDataByDate, swimDataByDate] = data.map(data => {
return data.reduce((hash, row) => {
const d = new Date(row.date.split(' at')[0]);
hash[d] = hash[d] || [];
hash[d].push(new Activity(row.date, row.distance));
return hash;
}, {});
});

return { walkDataByDate, runDataByDate, footballDataByDate, rideDataByDate, swimDataByDate };
}

This parses the date/time string format “Feb 9, 2021 at 9:06AM”, removes the time portion and constructs a new Date object. It then groups by this, grouping activities on the same day together.

This made doing lookups and summing durations of activities on a particular day (or week) fairly trivial. If I modify this logic to include the time as well, I’ll lose the day-grouping functionality I’d put in place. The solution? A good old fashioned refactor.

Smelling the code

Sometimes the need for a refactor is obvious - you want to achieve something which you can’t do in your current code without reworking it. Sometimes it’s just a whiff of something in your code. Like this:

const data = [
await getSpreadsheetData('1MM6EkFB0*************ZqseynEQhg'), // walk
await getSpreadsheetData('1xSMO-Eeb*************_GGsEl9lnQ'), // run
await getSpreadsheetData('1wa3f5U4C*************nMzX-Oajd8'), // workouts
await getSpreadsheetData('1ygT11IY7*************YCIm-IZn7U'), // ride
await getSpreadsheetData('1E18DfjVV*************Sp1heCmq1A'), // swim
];

Why was I waiting for each spreadsheet to download sequentially using await? Node has asynchronous capabilities but right in and I wasn’t utilising it.

A tweak to use await Promise.all([..]) instead of separate awaits gave a nice speed boost to the code and was a nice quick, early win to gather momentum on the refactor. Promise.all will wait for all promises in the array to complete, but will preserve the order of the original array (a key feature as will be clearer later). The new code:

const data = await Promise.all([
getSpreadsheetData('1MM6EkFB0*************ZqseynEQhg'), // walk
getSpreadsheetData('1xSMO-Eeb*************_GGsEl9lnQ'), // run
getSpreadsheetData('1wa3f5U4C*************nMzX-Oajd8'), // workouts
getSpreadsheetData('1ygT11IY7*************YCIm-IZn7U'), // ride
getSpreadsheetData('1E18DfjVV*************Sp1heCmq1A'), // swim
]);

The next smell was figuring out exactly why the data needed to be grouped by date - something I needed to change. The original use for the fitness data was on my feed page. I’m displaying the total distance run and walked for each week.

When constructing my feed, before grouping by week, I collate all the daily data. I had the following code:

const { walkDataByDate, runDataByDate } = await getStravaData();

rows.forEach(day => {
[day.walkDistance, day.runDistance] = [walkDataByDate[day.date], runDataByDate[day.date]].
map(data => (data || []).reduce((total, activity) => total += activity.distance, 0));
});

Far from being the most obvious code to read, the code relies on being able to retrieve all the walk and run activities for a given day, and sums together the duration (before later being grouped by week and summed again). If the meaning of code isn’t immediately obvious, it’s a good candidate for refactoring.

The second use of getStravaData in my codebase was on the fitness page, populated by the fitness data file. It was here I noticed I was doing something really funky:

// _data/fitness.js

const { walkDataByDate, runDataByDate, footballDataByDate, rideDataByDate, swimDataByDate } = await getStravaData();

const dates = new Set(Object.keys(walkDataByDate).concat(Object.keys(runDataByDate)));

const weeklyDataByYear = Array.from(dates).map(d => new Date(d)).reduce((hash, date) => {
...
return hash;
}, {});

Notice the dates array I’m building up. To process all the data stored in the walkDataByDate, runDataByDate variables, I was calculating an approximate list of all the dates by concatenating the keys from the walk and run data objects and passing them into a Set to get unique values only. I then iterated over this dates array with reduce to build up a hash of activity totals grouped by year, week and activity.

It’s unclear, unnecessary and somewhat butchering the point of using an object over a simple array. And here is the full reduce callback logic:

(hash, date) => {
const { week, year } = weekNumberYear(date);
const [walk, run, football, ride, swim] = [walkDataByDate, runDataByDate, footballDataByDate, rideDataByDate, swimDataByDate].map(data => {
return (data[date] || []).reduce((total, activity) => total += activity.distance, 0) || 0;
});
hash[year] = hash[year] || { walking: yearTemplate(year), running: yearTemplate(year), other: yearTemplate(year) };
hash[year].running[week] += date.getDay() == 1 ? 0 : run / 1000;
hash[year].walking[week] += walk / 1000;
hash[year].other[week] += (football + ride + swim) / 1000;
return hash;
}

This could be simplified.

Rework and simplify

First, we don’t need to work with objects from our Strava data. It’s much easier to use arrays we can reason with and more easily loop over.

Removing the grouping and simplifying the variable names changes our Strava data file above to this:

// _data/strava.js

const data = await Promise.all([..]);

const [walkData, runData, footballData, rideData, swimData] = data.map(activities => {
return activities.map(row => {
const [d, t] = row.date.split(' at ');
const time = convertTo24Hours(t);
const date = new Date(`${d} ${time}`);
return new Activity(date, row.distance);
});
});

return { walkData, runData, footballData, rideData, swimData };

I was able to remove the the repetitive byDate suffix from the keys in the return data which gives the code more room to breathe. Next, I was able to replace the clunky reduce with a simple map. It takes the raw data and converts it into an Activity object encapsulating a date and a distance - the only 2 metrics we’re concerned about.

Finally, we’ll ensure we’re using the full date and time recorded in the activity. To calculate the time, I extracted a new method: convertTo24Hours. This converts a time like “5:47PM” into the 24-hour format “17:47”:

function convertTo24Hours(time) {
const [hours, mins] = time.slice(0, -2).split(':').map(n => parseInt(n));
const period = time.slice(-2);
const hours24 = period == 'PM' && hours < 12 ? hours + 12 : (period == 'AM' && hours == 12 ? 0 : hours;

return `${hours24}:${mins}`;
}

Now that we’re working with a simple array of activities, in our fitness data file we can retrieve the new arrays and refactor the reduce behemoth to something hopefully a little more readable:

// _data/fitness.js

const { walkData, runData, footballData, rideData, swimData } = await getStravaData();

const fitnessData = {
walking: walkData,
running: runData.filter(({date}) => !isFootball(date)),
other: [footballData, rideData, swimData].flat(),
};

const weeklyDataByYear = Object.entries(fitnessData).reduce((hash, [type, activities]) => {
activities.filter(({date}) => date < cutoff).forEach(({ date, distance }) => {
const { week, year } = weekNumberYear(date);
hash[year] = hash[year] || { walking: yearTemplate(year), running: yearTemplate(year), other: yearTemplate(year) };
hash[year][type][week] += distance / 1000;
});
return hash;
}, {});

First we make it clear what data we want to display (walking, running & other) and what sources of activities it’s comprised of. We store this in an an easy-to-read object called fitnessData. We then convert this into key-value pairs using Object.entries so we can reduce it down to an object.

For the reduce logic, we already have the different fitness types filtered down (running, walking, other), so it can be simplified considerably, essentially into 3 main parts:

  • Get the year of the activity
  • Get the week of the activity
  • Sum the activity’s duration within that year/week

There’s a little extracted bit of logic in the form of the isFootball function. This uses the new logic I posted above:

function isFootball(date) {
return date.getDay() == 1 && date.getHours() == 21;
}

Fixing the feed

Since reworking the format of the getStravaData method in the Strava data file (from returning an object to an array), the feed is now broken. However, as is often the case with a good refactor, this was a blessing in disguise.

It provided an opportunity to tidy up and remove some duplicate logic, simplifying the file.

I made a tweak to the fitness data file above to export the weeklyDataByYear object, in addition to the totalsByYear method used by the charts on the fitness page:

// _data/fitness.js

module.exports = async () => {
..

const weeklyDataByYear = ..

const totalsByYear = ..

return { weeklyDataByYear, totalsByYear };
};

The data contained in this object is tailor-made for the feed. It replaces the need to import individual run and walk data. So instead of importing the Strava data file, we can import the fitness data file:

  // _data/feed.js

- const { getStravaData } = require('./strava');
+ const fitness = require('./fitness');

...

- const { walkDataByDate, runDataByDate } = await getStravaData();
+ const { weeklyDataByYear } = await fitness();

With this data, I was able to remove the reduce statement which summed the data per day:

- rows.forEach(day => {
- [day.walkDistance, day.runDistance] = [walkDataByDate[day.date], runDataByDate[day.date]].
- map(data => (data || []).reduce((total, activity) => total += activity.distance, 0));
- });

And replace the weekly calculation of total distance walked and run with a simple retrieval:

- const totalDistance = (activities) => {
- return Math.round(activities.reduce((total, dist) => total + dist, 0) / 100) / 10;
- }

- const walks = entries.filter(e => e.walkDistance).map(e => e.walkDistance);
+ const walks = weeklyDataByYear[year].walking[week];
- const runs = entries.filter(e => e.date.getDay() != 1).filter(e => e.runDistance).map(e => e.runDistance);
+ const runs = weeklyDataByYear[year].running[week];

return {
..
- walk_distance: walks.length ? totalDistance(walks) : null,
+ walk_distance: Math.round(walks * 10) / 10,
- run_distance: runs.length ? totalDistance(runs) : null
+ run_distance: Math.round(runs * 10) / 10
}

Code and logic simplified. Duplication removed. Fewer places for bugs to hide.

In summary

Here’s what my refactoring achieved:

  • Fix the bug that was causing some runs to be hidden
  • Simplify the fitness logic in my feed script
  • Simplify the logic in my Strava script
  • Improve the readability of my fitness script

If you can fix a bug quickly and succinctly, often that’s the best thing to do. But sometimes, especially when the solution is intertwined with other areas of the codebase, it’s best to take a step back and review the code.

Could other code be rewritten and simplified in order to fix the bug. As well-renowned software engineer Kent Beck famously once said, "for each desired change, make the change easy (warning: this may be hard), then make the easy change”.