I have a collection of millions of GPS events and a collection of a few user-defined geofences. The geofences are GeoJSON polygons.
My goal is to let users select a geofence and a time range, then generate a report showing when their vehicle entered the geofence and when it left.
The first approach I've taken is quite simple and doesn't take advantage of any of the geospatial features of MongoDB:
This is probably the least efficient way to do it, and sure enough, once I increase the date range beyond a couple of days, the report takes too long to generate.
There are 2 other approaches I'm thinking about:
1. Rather than piping all events through the above process, first limit the list of GPS events to those that are near the geofence. For example, if I can work out the bounding box of the geofence polygon, I can derive a maxDistance for a $geoNear query. This should significantly reduce the number of events that need to be processed using the same approach as above.
2. Use a $geoWithin query to select only the events that are within the geofence. The problem is that this makes it harder to determine when the vehicle entered or exited. I'd have to rely on some time threshold to split the matching events into sessions during which the vehicle was inside the geofence.
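To make the two ideas concrete, here is a minimal Python sketch of both, written as plain functions that only build a pipeline stage and post-process timestamps, so no database connection is needed. Everything named here is an assumption for illustration: the `timestamp` field, the helper names `geonear_stage` and `sessions_from_hits`, the 100 m safety margin, and the gap threshold. It also assumes the geofence is small, does not cross the antimeridian, and that the events collection has a single 2dsphere index (with several geospatial indexes, $geoNear also needs a `key` option).

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, good enough for a pre-filter


def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in metres between two lng/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geonear_stage(polygon, start, end, margin_m=100.0):
    """Approach 1: build a $geoNear stage covering the polygon's bounding box.

    The centre is the middle of the bounding box of the outer ring, and
    maxDistance is the distance from that centre to the farthest vertex,
    plus a hypothetical safety margin.
    """
    ring = polygon["coordinates"][0]  # outer ring of a GeoJSON Polygon
    lngs = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    center = [(min(lngs) + max(lngs)) / 2, (min(lats) + max(lats)) / 2]
    radius = max(haversine_m(center[0], center[1], lng, lat) for lng, lat in ring)
    return {
        "$geoNear": {
            "near": {"type": "Point", "coordinates": center},
            "distanceField": "distance_m",
            "maxDistance": radius + margin_m,  # metres for GeoJSON points
            "spherical": True,
            # "timestamp" is an assumed field name for the event time
            "query": {"timestamp": {"$gte": start, "$lt": end}},
        }
    }


def sessions_from_hits(timestamps, max_gap):
    """Approach 2: group timestamps of in-fence events into (enter, exit) pairs.

    A new session starts whenever the gap to the previous hit exceeds max_gap.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= max_gap:
            sessions[-1][1] = ts  # still inside: extend the current session
        else:
            sessions.append([ts, ts])  # gap too large: treat as a new entry
    return [(enter, leave) for enter, leave in sessions]
```

The stage returned by `geonear_stage` would have to be the first stage of the aggregation pipeline, since $geoNear is only allowed there. For the second approach, the timestamps passed to `sessions_from_hits` would come from a $geoWithin query against the geofence geometry, and the choice of `max_gap` depends on how often the vehicles report their position.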
At this stage, I'm leaning towards approach #1, but I'd really like to hear how other people would go about solving this problem using MongoDB, in particular with the aggregation pipeline or map-reduce.