Being relatively new to the NoSQL paradigm, I've been debating which of two approaches would be more efficient and performant, especially at scale.
So in essence I want to generate a summary table, e.g.:
Schema1 = {
  field1: String,
  specialVal: String
}

TotalTable = {
  specialValName: String,
  count: Number
}
What I want to do is keep track of each value that specialVal can take, together with a count of how many documents in Schema1 (or across other collections) hold that value.
I had two approaches in mind. The first is to keep pretty much the schema structure above and run a map-reduce job periodically to compute and update the statistics.
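To make the first approach concrete, here is a minimal sketch of the map and reduce functions that could be handed to mapReduce; the collection and field names are assumptions based on the schemas above:

```javascript
// map: emit each document's specialVal with a count of 1
function map() {
  emit(this.specialVal, 1);
}

// reduce: sum the counts emitted for a given specialVal key
// (mapReduce may also re-reduce partial sums, which this handles the same way)
function reduce(key, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// In the mongo shell this would be run periodically, e.g.:
// db.schema1.mapReduce(map, reduce, { out: 'totaltable' });
```

The out collection would then play the role of TotalTable, with counts that are only as fresh as the last map-reduce run.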
The second: since I'm using Mongoose as an ODM layer, I could use ref ObjectIds to reference another schema that describes all the specialVal values, and on each insert into Schema1 retrieve the matching SpecialVals document and increment its count. E.g.:
Schema1 = {
  field1: String,
  specialVal: {
    type: Schema.ObjectId,
    ref: 'SpecialVals'
  }
}

SpecialVals = {
  name: String,
  count: { type: Number, default: 0 }
}
So I am debating this and wanted to get the views of people on Stack Overflow. Would an insert that requires at least two queries (one to insert, one to retrieve the SpecialVals document and increment its count) be more efficient than a map-reduce job that tallies this up?
Having everything embedded is nice because the values are available up front. Using ObjectId references is neat in that all statistics are updated immediately, but there is an overhead to populate the referenced document on every Schema1 retrieval, plus extra work on every insert.
I know this is probably something minor, but I'm sure somebody has debated the same thing, right?