I'm confused about how to design MongoDB databases (or collection-based databases in general).

When designing SQL databases, there are certain clear rules (normalization rules).

So, given this model:

  • There are places
  • There are events
  • Places have name, latitude, longitude
  • Events have name, startDate, endDate
  • Each event is hosted in one place (so a place has many events)

It is easy to design it this way in SQL, with two tables:

Places(placeId, name, lat, lng)
Events(eventId, placeId, name, startDate, endDate)

It is hard to argue that this is not correct; there are not many alternatives.

With this design, I have some benefits:

  • If I update a place name, I don't have to care about events: the join does the job
  • I can forget about places and work with events (for example: show events sorted by start date)
  • I can forget about events and work with places (for example: show places sorted by distance to some user)

Now I'm trying to design this same model in MongoDB, and I'm not sure about it, because there are many alternative ways to do it:

  • Two collections: places not containing events, and events not containing places (but keeping a placeId field)

    var placeSchema = new mongoose.Schema({
        name: String,
        lat: Number,
        lng: Number,
    });
    
    var eventSchema = new mongoose.Schema({
        name: String,
        startDate: Date,
        endDate: Date,
        placeId: mongoose.Schema.Types.ObjectId,
    });
    
  • Two collections: places containing events, and events not containing places (but keeping a placeId field)

    var placeSchema = new mongoose.Schema({
        name: String,
        lat: Number,
        lng: Number,
        events: [eventSchema]
    });
    
    var eventSchema = ...
    
  • Two collections: places containing events and events containing places (a lot of duplicated data in different documents)

  • One collection of places containing events

    var placeSchema = new mongoose.Schema({
        name: String,
        lat: Number,
        lng: Number,
        events: [eventSchema]
    });
    

I can imagine trouble with all of the alternatives above. Where data is duplicated, updating a name (a place name or an event name) means updating it in more than one document. And in the "one collection of places" approach, an event doesn't contain its place, so if I pass an event object alone to my view, the view won't know anything about the place that hosts the event!

So I have two questions:

  1. What is a good design for this model?

  2. I simplified the model for the sake of this post, but in reality events have tags, and both events and places have pictures. I could describe the entire model here, but that would be too much. I want to know the rules for designing MongoDB databases (document-based databases). Normalization rules are clear about how to apply them and what their benefits are (mostly maintenance, e.g. clean code when updating). Are there clear rules here? Also, normalization rules are very intuitive: even without reading the literature, one could get the design right anyway.

Well, it really depends on what you want from your schema.

One thing first: the code that you wrote for "Two collections: places containing events, and events not containing places (but keeping a placeId field)" should look like this:

    var placeSchema = new mongoose.Schema({
        name: String,
        lat: Number,
        lng: Number,
        events: [mongoose.Schema.Types.ObjectId]
    });

    var eventSchema = ...

Because MongoDB has no joins in the relational sense, you usually denormalize unless you have a good reason to normalize.

For what you want to do: if a place may have lots of events, it's better to fetch the events page by page (or something similar) to keep each query small. In that situation I prefer "Two collections: places containing events, and events not containing places". But if a place has at most a few events, I'd keep everything in one collection. Also, if you want to access events directly, it could be better to split them into two collections and, if you need it, store the name and _id of the place in each event document (don't worry too much about duplication; most of the time, speed and small queries are what you have to care about).
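
To make the paging idea concrete, here is a minimal sketch. It assumes `Event` is a mongoose model built from an event schema with a `placeId` field, as in the question; the helper name `findEventsPage` and its parameters are made up for illustration.

```javascript
// Sketch only: `Event` is assumed to be a mongoose model whose documents
// have a `placeId` field. The helper name and parameters are hypothetical.
function findEventsPage(Event, placeId, page, pageSize) {
    return Event.find({ placeId: placeId })   // only events hosted at this place
        .sort({ startDate: 1, _id: 1 })       // fixed order so pages don't overlap
        .skip(page * pageSize)                // `page` is zero-based
        .limit(pageSize)                      // keep each query small
        .exec();                              // returns a promise
}
```

Something like `findEventsPage(Event, place._id, 0, 20)` would then fetch the first 20 events of a place. For very large collections, skipping many documents can get slow, so treat this as a starting point rather than a final design.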

As for updates, MongoDB provides good tools for them. So, for example, if a value changes rarely (such as a place name), don't normalize it.
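
As a rough illustration of such a rare update, the sketch below renames a place and refreshes the duplicated name in its events. It assumes `Place` and `Event` are mongoose models and that each event stores a copy of its place as `place: { _id, name }`; that field layout and the helper name `renamePlace` are assumptions, not something from the question.

```javascript
// Sketch only: `Place` and `Event` are assumed to be mongoose models, and
// each event is assumed to carry a copy of its place as `place: { _id, name }`.
async function renamePlace(Place, Event, placeId, newName) {
    // Update the authoritative place document.
    await Place.updateOne({ _id: placeId }, { $set: { name: newName } });
    // Refresh the duplicated name in every event hosted at that place.
    await Event.updateMany(
        { 'place._id': placeId },
        { $set: { 'place.name': newName } }
    );
}
```

The two writes are not atomic, so for a short moment events may still show the old name. For data that changes rarely, that is often an acceptable trade-off.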

One more thing: mongoose provides good tools such as population.
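
A minimal sketch of population, under the assumption that the event schema declares its reference as `placeId: { type: mongoose.Schema.Types.ObjectId, ref: 'Place' }` and that a model named 'Place' is registered (the question's schema has no `ref`, so this is an addition). The helper name `findEventsWithPlace` is hypothetical.

```javascript
// Sketch only: `Event` is assumed to be a mongoose model whose `placeId`
// path was declared with `ref: 'Place'`. The helper name is hypothetical.
function findEventsWithPlace(Event) {
    return Event.find()
        .sort({ startDate: 1 })   // events ordered by start date
        .populate('placeId')      // replace each placeId with its place document
        .exec();                  // returns a promise
}
```

After population, `event.placeId` holds the place document instead of a bare ObjectId, so a view that receives a single event can still read `event.placeId.name`.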