Outsourcing the ranking algorithm

Yakread is architected in such a way that it may not be all that difficult to let users supply their own ranking algorithm. Specifically, Yakread’s ranking algo is divided into two separate bits:

  • the “main” algorithm takes all the items from a specific user’s content sources (e.g. their subscriptions, bookmarks, etc.) and selects 25 of those items, storing them in a queue. The algorithm takes that user’s usage history (which links they clicked on previously, any thumbs up/thumbs down they gave) into account. Whenever the user clicks on one of the reading recommendations, the next 5 items are popped off the queue. Whenever the queue gets low, the algorithm generates a new queue of 25 items. Importantly, this is a fairly resource-light operation because we’re only dealing with data from a single user.
  • the “discovery” algorithm is structured similarly: there is a separate queue maintained for each user. The difference is that these items don’t come from the current user’s content sources: instead, they’re items that were liked by other Yakread users. The main algorithm uses this “discovery queue” as one of its content sources. From the main algorithm’s point of view, the discovery queue could be replaced with, say, an RSS feed, at least if the RSS feed had a way to add additional items on demand.
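To make the queue mechanics above concrete, here is a minimal Python sketch. It is not Yakread's actual code: the names `MainQueue`, `Ranker`, and `thumbs_first` are hypothetical, the usage history is simplified to a dict of thumbs scores, and "the queue gets low" is assumed to mean "fewer items than one batch of 5". Only the numbers 25 and 5 come from the description.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

QUEUE_SIZE = 25  # items selected per refill (from the text)
BATCH_SIZE = 5   # items popped per click (from the text)

# Hypothetical ranker signature: (candidate item IDs, usage history) -> ranked IDs.
Ranker = Callable[[List[str], Dict[str, int]], List[str]]


class MainQueue:
    """Per-user queue fed by a pluggable ranking function."""

    def __init__(self, rank: Ranker, candidates: List[str], history: Dict[str, int]):
        self.rank = rank
        self.candidates = candidates
        self.history = history
        self.queue: Deque[str] = deque()

    def refill(self) -> None:
        # Generate a fresh queue using only this one user's data.
        ranked = self.rank(self.candidates, self.history)
        self.queue = deque(ranked[:QUEUE_SIZE])

    def next_batch(self) -> List[str]:
        # Assumption: "low" means fewer items than one full batch.
        if len(self.queue) < BATCH_SIZE:
            self.refill()
        count = min(BATCH_SIZE, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


def thumbs_first(items: List[str], history: Dict[str, int]) -> List[str]:
    # Toy ranker: thumbs-up items (+1) first, thumbs-down items (-1) last.
    return sorted(items, key=lambda item: -history.get(item, 0))


q = MainQueue(thumbs_first, [f"item{i}" for i in range(40)], {"item7": 1, "item3": -1})
first = q.next_batch()
```

The point of passing `rank` in as a plain function is that swapping in a user-supplied algorithm would not change anything else about how the queue is consumed.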

We could allow users to supply their own implementations/providers for either of these. The main algorithm would be particularly fun to tinker with, I think. Users could upload code and we could run it in a sandbox whenever we need to update their queue (I’ve been investigating Fly Machines as a potential sandbox for another application).
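Whatever sandbox ends up running the uploaded code, its output probably shouldn't be trusted as-is. Below is a rough Python sketch of one way to validate it. The functions `sanitize_ranking` and `run_user_ranker` are hypothetical names, and the direct function call merely stands in for the sandboxed invocation, which is not shown.

```python
from typing import Callable, Dict, List, Set

Ranker = Callable[[List[str], Dict[str, int]], List[str]]


def sanitize_ranking(output: object, allowed: Set[str], limit: int = 25) -> List[str]:
    """Keep only known item IDs, drop duplicates, and cap the length."""
    if not isinstance(output, (list, tuple)):
        return []
    seen: Set[str] = set()
    clean: List[str] = []
    for item in output:
        if isinstance(item, str) and item in allowed and item not in seen:
            seen.add(item)
            clean.append(item)
            if len(clean) == limit:
                break
    return clean


def run_user_ranker(
    ranker: Ranker,
    fallback: Ranker,
    candidates: List[str],
    history: Dict[str, int],
    limit: int = 25,
) -> List[str]:
    """Call the user's ranker; use the built-in one if it fails or returns nothing usable."""
    try:
        # Stand-in for the sandboxed call. Copies keep the ranker from mutating our data.
        raw = ranker(list(candidates), dict(history))
    except Exception:
        raw = None
    clean = sanitize_ranking(raw, set(candidates), limit)
    return clean if clean else fallback(candidates, history)[:limit]


def default_ranker(items: List[str], history: Dict[str, int]) -> List[str]:
    return list(items)


def messy_ranker(items, history):
    return ["b", "unknown", "b", 42, "a"]  # unknown ID, duplicate, wrong type


def crashing_ranker(items, history):
    raise RuntimeError("bug in user code")


ok = run_user_ranker(messy_ranker, default_ranker, ["a", "b", "c"], {})
recovered = run_user_ranker(crashing_ranker, default_ranker, ["a", "b", "c"], {})
```

Falling back to the default ranking means a buggy upload degrades one user's recommendations rather than leaving their queue empty.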

The discovery algorithm may be more difficult to run that way, since it has to crunch data from everyone, and that might not be as easy to scale. However, the discovery algorithm could be replaced with an external source. If users want to develop their own external API for generating recommendations, we could call that whenever we need to top up the discovery queue.
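A minimal sketch of what "top up the discovery queue from an external source" could look like, in Python. Everything here is an assumption for illustration: the `fetch(user_id, count)` contract, the target size of 25, and the function name `top_up_discovery`. The `fetch` callable would wrap an HTTP request to the user's API in practice; it is injected here so the logic can be shown without a network call.

```python
from collections import deque
from typing import Callable, Deque, List

DISCOVERY_TARGET = 25  # assumption: the text gives no size for the discovery queue

# Hypothetical contract: fetch(user_id, count) returns up to `count` item IDs.
Fetch = Callable[[str, int], List[str]]


def top_up_discovery(
    queue: Deque[str], user_id: str, fetch: Fetch, target: int = DISCOVERY_TARGET
) -> int:
    """Ask the external source for just enough items to refill the queue.

    Returns the number of items actually added.
    """
    needed = target - len(queue)
    if needed <= 0:
        return 0
    try:
        items = fetch(user_id, needed)
    except Exception:
        return 0  # external source failed; leave the queue as it is
    queued = set(queue)
    added = 0
    for item in items[:needed]:  # never accept more than we asked for
        if item not in queued:
            queue.append(item)
            queued.add(item)
            added += 1
    return added


def fake_fetch(user_id: str, count: int) -> List[str]:
    # Test double that deliberately returns more than requested.
    return [f"rec{i}" for i in range(count + 3)]


discovery = deque(["rec0", "x"])
added = top_up_discovery(discovery, "user-1", fake_fetch, target=5)
```

Because the main algorithm only sees a queue of item IDs, it would not need to know whether those IDs came from the built-in discovery algorithm or from someone's external API.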

This would definitely be fun. I’ve been unsure about whether it would have any business value. But this morning I realized it might: if we reach the point where we have a bunch of technical users (especially data science people) who would enjoy tinkering with the algorithms and I’m still operating as a one-person company, then I’m guessing Yakread’s users would be able to come up with better algorithm implementations than me. That would be extremely helpful for me. (And if it works well for the main algorithm, maybe it’d be worth figuring out how to let people hack on the discovery algorithm.)

Maybe I could pay the authors of any algorithm implementations that I end up adopting for everyone/more people. E.g. run split tests across the user base with all the algorithms that people supply. The best ones get run more often, and I pay some portion of Yakread’s revenue to algorithm authors based on how often I use their algorithms.
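As a back-of-the-envelope illustration of the payout idea, here is a small Python sketch that splits a fixed share of revenue in proportion to how often each author's algorithm was run. The function name `author_payouts`, the 25% share, and the figures are all made up for the example.

```python
from typing import Dict


def author_payouts(
    runs_by_author: Dict[str, int], revenue: float, author_share: float
) -> Dict[str, float]:
    """Split `author_share` of `revenue` in proportion to run counts."""
    total_runs = sum(runs_by_author.values())
    if total_runs == 0:
        return {author: 0.0 for author in runs_by_author}
    pool = revenue * author_share
    return {
        author: pool * runs / total_runs
        for author, runs in runs_by_author.items()
    }


# Hypothetical numbers: two authors, $1000 of revenue, 25% set aside for authors.
payouts = author_payouts({"alice": 300, "bob": 100}, revenue=1000.0, author_share=0.25)
```

With these numbers the pool is $250, so an author whose algorithm handled three quarters of the runs would get three quarters of that pool.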