Mongo: Storing user-specific data for a document

Question

I want to store user-specific data for a document.

There is a collection Task that contains general information but should also contain user-specific information. When querying the API, clients should get only the general information plus the specific information for the requesting user.

I thought about different approaches and finally ended up with the following two. Please share your opinions and suggestions for this kind of problem because I think it's a common one.

Thank you.

A - Embedded Array

Tasks contain an array of subdocuments that contain settings for each user
Before returning a task the array will be replaced by the user-specific object or the object will be merged with the task itself (I prefer the first one so everybody can see what fields are user-specific)

Example

// Task
{
  "title": "brush your teeth",
   ...
  "user_based": [
    {
      "user": "54182b021a944f9c18897642",
      "context": "Work",
      "remind": "2014-09-16T12:41:16.412Z",
      "last_access": "2014-09-16T12:41:16.412Z"
       ...
    }
  ]
}

Problems

Calculations (comparisons) and manipulation are needed on every request
Mongoose supports a way of manipulating objects before they are sent, but it can't be used here since the user is unknown at that point
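
For illustration, the replacement step could look roughly like the plain function below. This is only a sketch: forUser is a hypothetical helper name, the second user id in the sample data is made up, and it assumes the task has already been converted to a plain object and that user ids can be compared as strings.

```javascript
// Hypothetical helper: replace the `user_based` array with the single
// entry belonging to the requesting user (or null if there is none).
function forUser(task, userId) {
  var entries = task.user_based || [];
  var match = null;
  for (var i = 0; i < entries.length; i++) {
    // String() so that ObjectId-like values and plain strings compare equal
    if (String(entries[i].user) === String(userId)) {
      match = entries[i];
      break;
    }
  }
  // Copy the general fields into a new object; the input task is not mutated
  var result = {};
  Object.keys(task).forEach(function (key) {
    if (key !== 'user_based') result[key] = task[key];
  });
  result.user_based = match;
  return result;
}

// Sample data (the second user id is invented for the example)
var task = {
  title: 'brush your teeth',
  user_based: [
    { user: '54182b021a944f9c18897642', context: 'Work' },
    { user: '54182b021a944f9c18897643', context: 'Home' }
  ]
};

console.log(forUser(task, '54182b021a944f9c18897643'));
```

This is the comparison and manipulation work mentioned above: it has to run for every task on every request, and other users' entries are still loaded from the database first.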

B - UserTask-Collection

Tasks contain general information
UserTasks contain user-specific information and refer to a Task
UserTasks are queried and resolve the Task as a subdocument

Example

// Task
{
  "title": "brush your teeth",
  ...
}

// UserTask
{
  "task": "54282b021a944f9c18897642",
  "context": "Work",
  "remind": "2014-09-16T12:41:16.412Z",
  "last_access": "2014-09-16T12:41:16.412Z"
   ...
}

Problems

Requires resolving references
Possible inconsistency if UserTasks exist without a Task
A Task cannot exist without a UserTask being created, which causes other problems

node.js
mongodb
mongoose

Answer 1

In my opinion, approach B fits better with what you want to accomplish.

Since a task request should only return the general information and the requesting user's specific data, it will be easier to query a separate collection than to pick the data out of a nested array. The problem with inconsistent data can be solved with Mongoose pre-save and pre-remove hooks (middleware) that make sure a task has at least one user task and that a user task is always connected to a task.

Approach A needs more code to extract the user-specific data from the nested array and return it to the user. With approach B this is pretty simple and actually rather elegant. Let me give you an example:

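var getTask = function(req, res, next) {
    Task.findById(req.params.id, function(err, task) {
        if (err) return next(err);
        // No task with this id
        if (!task) return res.status(404).end();
        UserTask.findOne({
            user: req.user._id,
            task: task._id
        }, function(err, userTask) {
            if (err) return next(err);
            // Convert to a plain object first: properties that are not in the
            // schema are dropped when a Mongoose document is serialized
            var result = task.toObject();
            result.userTask = userTask;
            res.json(result);
        });
    });
};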
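
If the requesting user has no UserTask for the task yet, findOne yields null. Instead of forcing a UserTask to be created together with every Task, the response could fall back to defaults. A minimal sketch, assuming plain objects; buildResponse and DEFAULT_USER_TASK are hypothetical names of my own, not part of Mongoose, and the default values are placeholders:

```javascript
// Hypothetical defaults used when the user has no UserTask for this task yet
var DEFAULT_USER_TASK = { context: null, remind: null, last_access: null };

// Hypothetical helper: combine a plain Task object with an optional UserTask
function buildResponse(task, userTask) {
  var result = {};
  Object.keys(task).forEach(function (key) {
    result[key] = task[key];
  });
  // Keep user-specific fields under their own key so clients can tell them apart
  result.userTask = {};
  Object.keys(DEFAULT_USER_TASK).forEach(function (key) {
    var hasValue = userTask && userTask[key] !== undefined;
    result.userTask[key] = hasValue ? userTask[key] : DEFAULT_USER_TASK[key];
  });
  return result;
}

var task = { title: 'brush your teeth' };
console.log(buildResponse(task, null));
console.log(buildResponse(task, { context: 'Work' }));
```

With something like this, a Task can exist on its own and a UserTask only needs to be written once the user actually changes a setting.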

Let me know if this makes sense.

Answer 2

In general I always aim to make retrieving data as simple and fast as possible.

I tend to prefer fewer collections (solution A) as it will reduce the need for resolving references. However, I wouldn't want to manipulate the data before sending it to the client. That opens the door for bugs and maybe security holes.

In this case I would opt for solution B. You can use Mongoose's populate feature to reduce the pain of resolving references in MongoDB.

// loading all tasks for a user
// (_user and _tasks must match the reference field names in your UserTask schema)
router.get('/users/:id/tasks', function(req, res, next) {
  UserTask.find({_user: req.params.id})
    .populate('_tasks')
    .exec(function(err, userTasks) {
      if (err) return next(err);
      res.json(userTasks);
    });
});