What is the best practice in MongoDB for handling 1-n and n-n relationships?

In a relational database, 1-n and n-n relationships require two or more tables. But in MongoDB, it is possible to store those things directly in one model, like this:

Article{
  content: String, 
  uid: String,
  comments:[Comment]
}

I am getting confused about how to manage those relations. For example, in the article-comments model, should I store all the comments directly in the article model and then read the entire article object out as JSON every time? But what if the comments grow really large? If there are 1,000 comments in an article object, will such a strategy make every GET very slow?

I am by no means an expert on this; however, I've worked through similar situations before.

From the few demos I've seen, yes, you should store all the comments directly inline. This is going to give you the best performance (unless you're expecting a ridiculous number of comments). This way you have everything in your document.

In the future, if things take off and you do notice things slowing down, you could do a few things. You could store only the latest (insert arbitrary number) comments along with a reference to where the older comments are stored, then map-reduce the old comments out into a "bucket" to keep loading times quick.

However, initially I'd store it all in one document.

So you would have a model that looks something like this:

Article{
    content: String,
    uid: String,
    comments: [
        {"comment": "hi", "user": "jack"},
        {"comment": "hi", "user": "jack"}
    ],
    oldCommentsIdentifier: 12345
}

Then only populate oldCommentsIdentifier if you have actually moved comments out of the comments array; however, I really wouldn't do this for fewer than 1,000 comments, and maybe even more. It would take a bit of testing to see where the "sweet spot" is.
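
A rough sketch of what that bucketing could look like in the mongo shell, using a plain update rather than an actual map-reduce, just to show the idea (the commentBuckets collection name, the uid value, and the 1,000/50 thresholds are placeholders for illustration):

// Hypothetical sketch: once an article has more than 1,000 comments, move
// everything but the newest 50 into a separate bucket collection and record
// the bucket's _id on the article.
var article = db.articles.findOne({uid: "abc123"});
if (article.comments.length > 1000) {
    var bucketId = ObjectId();
    db.commentBuckets.insert({
        _id: bucketId,
        articleUid: article.uid,
        comments: article.comments.slice(0, article.comments.length - 50)
    });
    db.articles.update(
        {_id: article._id},
        {$set: {
            comments: article.comments.slice(-50),
            oldCommentsIdentifier: bucketId
        }}
    );
}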

I think a large part of the answer depends on how many comments you are expecting. Having a document that contains an array that could grow to an arbitrarily large size is a bad idea, for a couple of reasons. First, the $push operator tends to be slow because it often increases the size of the document, forcing it to be moved. Second, there is a maximum BSON size of 16MB, so eventually you will not be able to grow the array any more.
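
For reference, the pattern being warned against here is an unbounded $push into the embedded array, roughly like this (the articles collection and the articleId variable are assumed for illustration):

// Every new comment is pushed onto the embedded array, so the article
// document keeps growing without bound and may have to be moved on disk.
db.articles.update(
    {_id: articleId},
    {$push: {comments: {comment: "hi", user: "jack", created: new Date()}}}
);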

If you expect each article to have a large number of comments, you could create a separate "comments" collection, where each document has an "article_id" field that contains the _id of the article that it is tied to (or the uid, or some other field unique to the article). This would make retrieving all comments for a specific article easy, by querying the "comments" collection for any documents whose "article_id" field matches the article's _id. Indexing this field would make the query very fast.
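
A minimal sketch of that layout in the mongo shell (collection and field names are only illustrative):

// One document per comment, linked back to its parent article.
db.comments.insert({
    article_id: articleId,   // the _id (or uid) of the parent article
    user: "jack",
    comment: "hi",
    created: new Date()
});

// Index the link field so fetching an article's comments stays fast.
db.comments.createIndex({article_id: 1});

// All comments for one article, newest first.
db.comments.find({article_id: articleId}).sort({created: -1});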

The link that limelights posted as a comment on your question is also a great reference for general tips about schema design.

But if I solve this problem by linking articles and comments with an _id, won't that kind of go back to relational database design, and somehow lose the essence of being NoSQL?

Not really; NoSQL isn't all about embedding models. In fact, embedding should be considered carefully for your scenario.

It is true that the aggregation framework solves quite a few of the problems you can get from embedding objects that you need to use as documents themselves. I define subdocuments that need to be used as documents as:

  • Documents that need to be paged in the interface
  • Documents that might exist across multiple root documents
  • Documents that require advanced sorting within their group
  • Documents that, when in a group, will exceed the root document's 16MB limit

As I said, the aggregation framework does solve this a little; however, you're still looking at performing a query that, in real time or close to it, would be much like performing the equivalent query in SQL on the same number of documents.

This effect is not always desirable.
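
For concreteness, paging embedded comments through the aggregation framework would look roughly like this (the created field on each comment and the articleId variable are assumptions):

// Unwind the embedded array, sort it, then skip/limit much as the
// equivalent relational query would.
db.articles.aggregate([
    {$match: {_id: articleId}},
    {$unwind: "$comments"},
    {$sort: {"comments.created": -1}},
    {$skip: 20},
    {$limit: 10}
]);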

You can achieve paging (sort of) of subdocuments with normal querying using the $slice operator, but that carries pretty much the same problems as using skip() and limit() over large result sets, which again is undesirable since you cannot fix it so easily with a range query (the aggregation framework would be required again). Even with 1,000 subdocuments I have seen speed problems, and not just for me but for other people too.
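
A quick sketch of that $slice-based paging (the numbers are arbitrary):

// Skip the first 20 embedded comments and return the next 10; the rest of
// the array is never sent back to the client.
db.articles.find(
    {_id: articleId},
    {comments: {$slice: [20, 10]}}
);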

So let's get back to the original question: how to manage the schema.

Now the answer, which you're not going to like, is: it all depends.

Do your comments meet the criteria above for being separated out? If so, then that is probably a good bet.

There is no single best way to do this. In MongoDB you should design your collections according to the application that is going to use them.

If your application needs to display the comments together with the article, then it is better to embed those comments in the article collection. Otherwise, you will end up with several round trips to your database.
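
To make the trade-off concrete, a rough comparison in the mongo shell (collection names assumed):

// Embedded: one round trip returns the article and its comments together.
var article = db.articles.findOne({_id: articleId});

// Referenced: a second round trip is needed to fetch the comments.
var article  = db.articles.findOne({_id: articleId});
var comments = db.comments.find({article_id: articleId}).toArray();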

There is one scenario where embedding does not work. As far as I know, document size is limited to 16 MB in MongoDB, which is actually quite large. However, if you think your document size can exceed this limit, it is better to have a separate collection.