Mongo: Storing user-specific data for a document

I want to store user-specific data for a document.

There is a collection Task that contains general information but should also contain user-specific information. When querying the API, clients should only get the general information plus the specific information for the requesting user.

I thought about different approaches and finally ended up with the following two. Please share your opinions and suggestions for this kind of problem because I think it's a common one.

Thank you.

A - Embedded Array

  • Tasks contain an array of subdocuments that contain settings for each user
  • Before returning a task the array will be replaced by the user-specific object or the object will be merged with the task itself (I prefer the first one so everybody can see what fields are user-specific)

Example

// Task
{
  "title": "brush your teeth",
   ...
  "user_based": [
    {
      "user": "54182b021a944f9c18897642",
      "context": "Work",
      "remind": "2014-09-16T12:41:16.412Z",
      "last_access": "2014-09-16T12:41:16.412Z"
       ...
    }
  ]
}

Problems

  • Calculations (comparisons) and manipulation are needed
  • Mongoose supports a way for manipulating objects before they are sent but it can't be used here since the user is unknown at that point
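
To make the manipulation concrete, here is a minimal sketch of what approach A would need before a task is returned. It assumes the task is already a plain object (a Mongoose document would have to be converted first), and the helper name taskForUser and the second user id are made up for illustration.

```javascript
// Hypothetical helper: returns a copy of the task in which the `user_based`
// array is replaced by the single entry belonging to `userId` (or null).
function taskForUser(task, userId) {
  const entries = Array.isArray(task.user_based) ? task.user_based : [];
  const own = entries.find(function (entry) {
    // Compare as strings so ObjectId-like values and plain strings both work.
    return String(entry.user) === String(userId);
  });
  // Copy the general fields, then overwrite user_based with the match only.
  return Object.assign({}, task, { user_based: own || null });
}

const task = {
  title: 'brush your teeth',
  user_based: [
    { user: '54182b021a944f9c18897642', context: 'Work' },
    { user: '54182b021a944f9c18897643', context: 'Home' } // made-up second user
  ]
};

const view = taskForUser(task, '54182b021a944f9c18897643');
console.log(view.user_based.context); // Home
```

The original task object is left untouched, so settings of other users never leak into the response by accident.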

B - UserTask-Collection

  • Tasks contain general information
  • UserTasks contain user-specific information and refer to a Task
  • UserTasks are queried and resolve the Task as a subdocument

Example

// Task
{
  "title": "brush your teeth",
  ...
}

// UserTask
{
  "task": "54282b021a944f9c18897642",
  "context": "Work",
  "remind": "2014-09-16T12:41:16.412Z",
  "last_access": "2014-09-16T12:41:16.412Z"
   ...
}

Problems

  • Requires resolving references
  • Possible inconsistency if UserTasks exist without a Task
  • A Task cannot exist without a UserTask being created, which causes other problems
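
The inconsistency problem could at least be detected with a simple check. The following is only a sketch over plain objects, not a query against a live database. The function name findOrphanedUserTasks and the dangling id are assumptions for illustration.

```javascript
// Hypothetical check: returns every UserTask whose `task` reference does not
// match the `_id` of any existing Task. Works on plain objects only.
function findOrphanedUserTasks(tasks, userTasks) {
  const knownIds = new Set(tasks.map(function (t) { return String(t._id); }));
  return userTasks.filter(function (ut) {
    return !knownIds.has(String(ut.task));
  });
}

const tasks = [{ _id: '54282b021a944f9c18897642', title: 'brush your teeth' }];
const userTasks = [
  { task: '54282b021a944f9c18897642', context: 'Work' },
  { task: '000000000000000000000000', context: 'Home' } // made-up dangling reference
];

const orphans = findOrphanedUserTasks(tasks, userTasks);
console.log(orphans.length); // 1
```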

In my opinion, approach B fits better with what you want to accomplish.

Since a task request should only return the general information and the requesting user's specific data, it will be easier to query for that data in a separate collection than to extract it from a nested array. The problem of inconsistent data can be handled with Mongoose pre-save and pre-remove hooks (middleware) that make sure a task has at least one user task and that a user task is always connected to a task.

Approach A will need more code to extract the user-specific data from the nested array and return it to the user. With approach B, this is pretty simple and actually rather elegant. Let me give you an example:

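var getTask = function(req, res, next) {
    Task.findById(req.params.id, function(err, task) {
        if (err) return next(err);
        // No task with that id: answer 404 instead of failing on task._id below.
        if (!task) return res.status(404).end();
        UserTask.findOne({
            user: req.user._id,
            task: task._id
        }, function(err, userTask) {
            if (err) return next(err);
            // Convert to a plain object first: a property that is not part of
            // the schema is dropped when a Mongoose document is serialized.
            var result = task.toObject();
            result.userTask = userTask;
            res.json(result);
        });
    });
};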

Let me know if this makes sense.

In general I always aim to make retrieving data as simple and fast as possible.

I tend to prefer fewer collections (solution A) as it will reduce the need for resolving references. However, I wouldn't want to manipulate the data before sending it to the client. That opens the door for bugs and maybe security holes.

In this case I would opt for solution B. You can use Mongoose's populate feature to reduce the pain of resolving references in MongoDB.

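// loading all tasks for a user
// (assumes UserTask has `user` and `task` reference fields, as in approach B)
router.get('/users/:id/tasks', function(req, res, next) {
  UserTask.find({user: req.params.id})
    .populate('task')
    .exec(function(err, userTasks) {
      if (err) return next(err);
      // Each UserTask now carries its Task document in the `task` field.
      res.json(userTasks);
    });
});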