Problem
I have a CSV file with data like the following:
PK,title,year,length,budget,rating,votes,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,mpaa,Action,Animation,Comedy,Drama,Documentary,Romance,Short
1,$,1971,121,NA,6.4,348,4.5,4.5,4.5,4.5,14.5,24.5,24.5,14.5,4.5,4.5,,0,0,1,1,0,0,0
2,$1000 a Touchdown,1939,71,NA,6,20,0,14.5,4.5,24.5,14.5,14.5,14.5,4.5,4.5,14.5,,0,0,1,0,0,0,0
The CSV file is around 5 MB in size and contains roughly 58,000 rows like the sample above.
Current Scenario
Currently I parse the data above into objects and save them to MongoDB as a single document containing an array of those objects, something like this:
{ PK: '1',
title: '$',
year: '1971',
length: '121',
budget: 'NA',
rating: '6.4',
votes: '348',
r1: '4.5',
r2: '4.5',
r3: '4.5',
r4: '4.5',
r5: '14.5',
r6: '24.5',
r7: '24.5',
r8: '14.5',
r9: '4.5',
r10: '4.5',
mpaa: '',
Action: '0',
Animation: '0',
Comedy: '1',
Drama: '1',
Documentary: '0',
Romance: '0',
Short: '0' }
{ PK: '2',
title: '$1000 a Touchdown',
year: '1939',
length: '71',
budget: 'NA',
rating: '6',
votes: '20',
r1: '0',
r2: '14.5',
r3: '4.5',
r4: '24.5',
r5: '14.5',
r6: '14.5',
r7: '14.5',
r8: '4.5',
r9: '4.5',
r10: '14.5',
mpaa: '',
Action: '0',
Animation: '0',
Comedy: '1',
Drama: '0',
Documentary: '0',
Romance: '0',
Short: '0' }
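For reference, the ingestion step looks roughly like this (a minimal sketch using Python's standard csv module; my actual code may differ, and the MongoDB insert itself is only indicated in a comment):

```python
import csv
import io

# Sample of the CSV shown above; in practice this is read from the ~5 MB file.
CSV_TEXT = """PK,title,year,length,budget,rating,votes,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,mpaa,Action,Animation,Comedy,Drama,Documentary,Romance,Short
1,$,1971,121,NA,6.4,348,4.5,4.5,4.5,4.5,14.5,24.5,24.5,14.5,4.5,4.5,,0,0,1,1,0,0,0
2,$1000 a Touchdown,1939,71,NA,6,20,0,14.5,4.5,24.5,14.5,14.5,14.5,4.5,4.5,14.5,,0,0,1,0,0,0,0
"""

# DictReader turns each CSV row into an object keyed by the header fields,
# matching the documents shown above (all values arrive as strings).
movies = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# The problematic step: the whole list is stored as ONE MongoDB document,
# e.g. collection.insert_one({"movies": movies})  # hypothetical collection name
print(len(movies), movies[0]["title"], movies[1]["title"])
```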
However, when I upload the file I receive the error: Document exceeds maximum allowed bson size of 16777216 bytes.
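The error is consistent with the numbers: BSON repeats every field name inside every embedded object, so a 5 MB CSV can easily inflate past the 16 MB document limit once 58,000 rows are packed into one array. A rough back-of-the-envelope check (using JSON length as a stand-in for BSON size, which is only an approximation):

```python
import json

# One row from the sample above, as it is currently stored (all values are strings).
doc = {
    "PK": "1", "title": "$", "year": "1971", "length": "121",
    "budget": "NA", "rating": "6.4", "votes": "348",
    "r1": "4.5", "r2": "4.5", "r3": "4.5", "r4": "4.5", "r5": "14.5",
    "r6": "24.5", "r7": "24.5", "r8": "14.5", "r9": "4.5", "r10": "4.5",
    "mpaa": "", "Action": "0", "Animation": "0", "Comedy": "1",
    "Drama": "1", "Documentary": "0", "Romance": "0", "Short": "0",
}

per_row = len(json.dumps(doc))   # serialized size, with field names repeated per row
estimate = per_row * 58_000      # one array holding 58,000 such objects
print(per_row, estimate, estimate > 16 * 1024 * 1024)
```

With the field names repeated in every row, the estimate already lands above 16 MB even though the raw CSV is only about 5 MB.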
I also tried GridFS. The file uploads and the chunks are created correctly, but I am not sure whether I can retrieve the data back as an array of objects.
I need to retrieve all of the data to crunch it and produce some analysis.
The CSV file is one source of data. Another source is a web service on a proprietary system, where the same process is followed.
Question
I suspect there is a flaw in my data model and in the way I am saving the data to MongoDB. If so, what is the optimal way to handle this amount of data?
I would really appreciate any help.