What's the most efficient way to apply a function to all objects in a collection in MongoDB?

Suppose I want to calculate the "popularity" field for all objects in my collection. It depends on the difference between the current time and the "submitTime" field, and on the number in the "votes" field. This operation will run every hour. What's the most efficient way to run a function on all objects? This is just an example; it could be any function:

function(){
    this.popularity = this.votes / (Date.now() - this.submitTime);
}

If you want to run a function on all objects and save a popularity score in the original collection, the best approach is to iterate over all the documents, calculating and saving the new score for each one. If you wanted to save the results to a different collection, you could use MapReduce instead.

If you are open to other ideas on how to calculate popularity, there are more options :).

Improving efficiency

To improve efficiency for your current approach you could:

  • Limit your update criteria to documents that have more than 0 votes (with your example formula, a document with 0 votes always scores 0, so there is nothing to recalculate)
  • Only retrieve the fields you need to calculate popularity, and update the popularity field with a $set rather than re-saving the full document.
  • Update the popularity score when you add an individual vote (avoiding a full recalculation of all scores every hour) and then do a less frequent (e.g. nightly) recalculation of all scores
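
The first two points above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the helper names computePopularity and buildPopularityUpdates and the collection name posts are assumptions for illustration, and the formula is simply the one from the question.

```javascript
// Hypothetical helper: the popularity formula from the question.
// `now` is passed in so every document in one run uses the same timestamp.
function computePopularity(doc, now) {
  const ageMs = now - doc.submitTime; // works for a Date or a numeric timestamp
  // Guard against a zero or negative age, which would otherwise divide by zero.
  return ageMs > 0 ? doc.votes / ageMs : 0;
}

// Hypothetical helper: turn a batch of documents into bulkWrite operations
// that $set only the popularity field instead of re-saving each document.
function buildPopularityUpdates(docs, now) {
  return docs.map((doc) => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { popularity: computePopularity(doc, now) } },
    },
  }));
}

// In the mongo shell, against an assumed `posts` collection:
//
//   const now = Date.now();
//   const docs = db.posts.find(
//     { votes: { $gt: 0 } },        // skip documents that would score 0
//     { votes: 1, submitTime: 1 }   // projection: only the fields needed
//   ).toArray();
//   if (docs.length > 0) {
//     db.posts.bulkWrite(buildPopularityUpdates(docs, now), { ordered: false });
//   }
```

For a large collection you would probably process the cursor in batches rather than calling toArray() on the whole result, so that the working set stays small.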

Alternative approaches

  • Use a popularity metric that can be determined by sorting rather than a calculation. For example: { votes: -1, lastVotedTime: -1, submitTime: -1 }. This may not meet your requirements for ageing out the popularity for old documents, though.

  • Use a numeric popularity metric where events and user actions (e.g. article published, user views, votes, etc.) add different amounts of popularity. Over time the popularity decays. The Radioactivity module for Drupal implements this with a rules-based approach.

To implement the latter approach in MongoDB, you could:

  • Add an integer popularity field, where new objects start at a certain value (e.g. 1000)
  • Have different user actions (new votes, views, etc.) increase the popularity counter using $inc by appropriate amounts (for example 50 for a new vote)
  • Use a regularly scheduled job to decrement popularity over time.
  • Since all popularity starts with a positive score and decays to 0 or less, you can limit your update query to documents with >0 popularity.
  • You could also (ab)use the popularity score to ensure important documents stay popular longer.
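
The steps above might look something like the following sketch. The starting value of 1000 and the vote boost of 50 are the examples from the list; the decay amount, the helper names voteUpdate and decayOperation, and the posts collection are assumptions made up for illustration.

```javascript
// Example values from the list above, plus an assumed decay amount.
const INITIAL_POPULARITY = 1000;
const VOTE_BOOST = 50;
const DECAY_PER_RUN = 10; // hypothetical: tune to how fast scores should age

// Hypothetical helper: update document to apply when a user votes.
function voteUpdate() {
  return { $inc: { votes: 1, popularity: VOTE_BOOST } };
}

// Hypothetical helper: filter and update for the scheduled decay job.
// Only documents that still have a positive score are touched.
function decayOperation() {
  return {
    filter: { popularity: { $gt: 0 } },
    update: { $inc: { popularity: -DECAY_PER_RUN } },
  };
}

// In the mongo shell, against an assumed `posts` collection:
//
//   db.posts.insertOne({
//     title: "Example post",
//     votes: 0,
//     submitTime: new Date(),
//     popularity: INITIAL_POPULARITY
//   });
//   db.posts.updateOne({ _id: postId }, voteUpdate());   // on each vote
//   const decay = decayOperation();                      // in the scheduled job
//   db.posts.updateMany(decay.filter, decay.update);
```

Because the decay job is a single updateMany with $inc, the server applies it without the client having to read any documents first, which is the main efficiency gain over recalculating every score.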

There are more nuances to "what is a good popularity metric", and plenty of previous questions on Stack Overflow (e.g. What formula should be used to determine “hot” questions?).