Is there any way other than just nesting a tag array in MongoDB for a Blog-Tag system?

Question

I am trying to write a blog engine for myself with node.js/express/mongodb (also a start to learn node.js). To go a little further than the tutorials on the Internet, I want to add tags support to the blog engine.

I want to do the following things with tags:

Viewers could see all the tags as a tag cloud on a "tag cloud page"
Viewers could see the tags that an article has on article list page and single article page
Viewers are able to click on a single tag to show the article list
What's more, viewers are able to search for articles with particular tags the SO way: [tag1][tag2] --> /tags/tag1+tag2 --> list of articles that have both tag1 and tag2

In a relational database, a post_tag table would be used for this. But how should I design this in MongoDB?

I have checked the question "MongoDB design - tags", but as efdee comments there, the design

db.movies.insert({
  name: "The Godfather",
  director: "Francis Ford Coppola",
  tags: [ "mafia", "wedding", "violence" ]
})

has a problem:

This doesn't seem to actually answer his question. How would you go about getting a distinct list of tags used in the entire movie collection?

That's also my concern: in my design, I need to show a list of all the tags; I also need to know how many articles each tag has. So is there a better way than the design shown above?

My concern with the design above is: if I want to show a list of the tags, the query will go over all the article items in the database. Is there a more efficient way?

node.js
mongodb
nosql

Answer 1

You'd need to create a multikey index on tags to start with.

Then you will be able to find documents matching the tags using this syntax:

db.movies.find({ "tags": { $all : [ /^this/, /^that/ ] }})

Because each regex is anchored with ^ (start of string), MongoDB can still use the index.
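
For the /tags/tag1+tag2 URLs from the question, the filter could be built from the request path. This is only a rough sketch: buildTagFilter is a made-up helper name, lowercasing the tags is my assumption about how they are stored, and it uses exact strings in $all instead of the prefix regexes above (so "this" would not match "thistle").

```javascript
// Hypothetical helper: turns "/tags/tag1+tag2" into a MongoDB filter document.
function buildTagFilter(path) {
  const prefix = "/tags/";
  if (!path.startsWith(prefix)) {
    throw new Error("not a tag path: " + path);
  }
  const tags = path
    .slice(prefix.length)
    .split("+")
    // Assumption: tags are stored lowercase, so normalise the input the same way.
    .map((t) => decodeURIComponent(t).trim().toLowerCase())
    .filter((t) => t.length > 0);
  // $all requires every listed tag to be present in the document's tags array.
  // The Set drops duplicates such as "/tags/mafia+mafia".
  return { tags: { $all: [...new Set(tags)] } };
}

const filter = buildTagFilter("/tags/mafia+wedding");
console.log(JSON.stringify(filter));
// With a connected collection this could be passed on as: db.movies.find(filter)
```

The same multikey index on tags should serve this query as well.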

To get keyword densities using the aggregation framework, you could simply get a count.

db.movies.aggregate([
    { $project: { _id: 0, tags: 1 } },                 // keep only the tags array
    { $unwind: "$tags" },                              // one document per tag
    { $group: { _id: "$tags", occur: { $sum: 1 } } }   // count occurrences of each tag
])

Sorry formatting difficult from iPad.

You would end up with collection of docs looking like:

{
   _id: "mytag",
   occur: 383
},
{
   _id: "anothertag",
   occur: 23
},
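
If it helps to see what $unwind and $group are doing, here is roughly the same computation in plain JavaScript over an in-memory array. The sample documents and the countTags name are made up for illustration, and the sort at the end is an extra I added for a tag cloud; it is not part of the pipeline above.

```javascript
// In-memory stand-in for the movies collection (sample data, not from a real database).
const movies = [
  { name: "The Godfather", tags: ["mafia", "wedding", "violence"] },
  { name: "Sample movie B", tags: ["mafia", "violence"] },
  { name: "Sample movie C", tags: ["comedy", "wedding"] },
];

function countTags(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Equivalent of $unwind: visit each tag of each document separately.
    for (const tag of doc.tags || []) {
      // Equivalent of $group with { $sum: 1 }.
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  // Same shape as the aggregation output, sorted by count (descending), then by name.
  return [...counts.entries()]
    .map(([tag, occur]) => ({ _id: tag, occur }))
    .sort((a, b) => b.occur - a.occur || a._id.localeCompare(b._id));
}

console.log(countTags(movies));
```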

The aggregate command returns its result inline, so it is down to the client app (or server) to serialise or cache the result if it's frequently used.
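
One possible way to do that caching on the Node side is a small time-based wrapper. This is a sketch under my own assumptions: cached, loadTagCounts and the 60 second lifetime are invented for the example, and the stand-in loader just returns a fixed array where a real app would run the aggregation.

```javascript
// Hypothetical helper: caches the result of an async loader for ttlMs milliseconds.
// The clock is injectable (now) so the behaviour can be checked without waiting.
function cached(loader, ttlMs, now = Date.now) {
  let value;
  let expiresAt = -Infinity;
  return async function get() {
    if (now() >= expiresAt) {
      // Cache is empty or stale: reload and remember when it expires.
      value = await loader();
      expiresAt = now() + ttlMs;
    }
    return value;
  };
}

// Stand-in loader; in a real app this is where the aggregation would run.
async function loadTagCounts() {
  return [{ _id: "mytag", occur: 383 }];
}

// Tag counts are reloaded at most once per minute.
const getTagCounts = cached(loadTagCounts, 60000);
getTagCounts().then((counts) => console.log(counts));
```

Note that two requests arriving at the same moment on a cold cache would both run the loader; that is probably acceptable for a personal blog.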

Let me know how you get on with that.

Hth

Sam

Answer 2

How would you go about getting a distinct list of tags used in the entire movie collection?

db.movies.distinct("tags")

For efficient queries, I'd probably duplicate data. Tags are very unlikely to ever be edited, so I'd put the tags array in the article object and also keep each tag in a separate tags collection, where each tag document holds either a count of the articles containing that tag or an array of article ids.

db.movies.insert({
  id: 1,
  name: "The Godfather",
  director: "Francis Ford Coppola",
  tags: [ "mafia", "wedding", "violence" ]
});

db.tags.insert([
   {name: "mafia", movie_count: 1},
   {name: "wedding", movie_count: 1},
   {name: "violence", movie_count: 1}
]);
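
The catch with duplicating is keeping movie_count in sync when an article's tags do change. A rough sketch of one way to do it: work out which tags were added or removed and turn that into bulkWrite operations. tagCountOps is a hypothetical helper, not something MongoDB provides, and the two writes (article and tags) are not atomic here.

```javascript
// Hypothetical helper: computes bulkWrite operations for the tags collection
// when an article's tags change from oldTags to newTags.
function tagCountOps(oldTags, newTags) {
  const before = new Set(oldTags);
  const after = new Set(newTags);
  const ops = [];
  for (const name of after) {
    if (!before.has(name)) {
      // Tag was added: increment, creating the tag document if it does not exist.
      ops.push({
        updateOne: {
          filter: { name },
          update: { $inc: { movie_count: 1 } },
          upsert: true,
        },
      });
    }
  }
  for (const name of before) {
    if (!after.has(name)) {
      // Tag was removed: decrement the existing counter.
      ops.push({
        updateOne: { filter: { name }, update: { $inc: { movie_count: -1 } } },
      });
    }
  }
  return ops;
}

// A brand-new article counts as a change from no tags to its tags.
const ops = tagCountOps([], ["mafia", "wedding", "violence"]);
console.log(JSON.stringify(ops, null, 2));
// With a connected collection this could be applied as: db.tags.bulkWrite(ops)
```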

Answer 3

You could perform your four tasks using MapReduce functions. For example, for the list of all tags you'd emit each tag as the key, and then in the reduce function you'd count them all up and return the count. That would be the route I'd go down. It may require a little more thought, but it's definitely powerful.

http://cookbook.mongodb.org/patterns/count_tags/
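
As a rough illustration of the map and reduce functions described above, here is a sketch that runs without a database. In MongoDB itself the map function reads the document from `this` and calls a global emit; here the document and emit are passed in as arguments, and runMapReduce is a tiny stand-in runner I made up for the example. Newer MongoDB releases favour the aggregation pipeline for this kind of count.

```javascript
// Map step: emit (tag, 1) once for every tag on a document.
function mapTags(doc, emit) {
  for (const tag of doc.tags || []) {
    emit(tag, 1);
  }
}

// Reduce step: sum the partial counts collected for one tag.
function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical in-memory runner standing in for the database.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // One output document per key, in the { _id, value } shape mapReduce produces.
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const result = runMapReduce(
  [{ tags: ["mafia", "wedding"] }, { tags: ["mafia"] }],
  mapTags,
  reduceCounts
);
console.log(result);
```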