I’m building an app, and this is my first time working with databases. I went with MongoDB because originally I thought my data structure would be fitting for it. After more research, I’ve become a bit lost in all the possible ways I could structure my data, and which of those would be best for performance vs best for my DB type (currently MongoDB, but could change to PostgreSQL). Here are all my data structures and iterations:
Note: I understand that the “Payrolls” collection is somewhat redundant in the below example. It’s just there as something to represent the data hierarchy in this hypothetical.
Original Data Structure
The structure here is consistent with what NoSQL is good at, quickly fetching everything in a single document. However, I intend for my employee object to hold lots of data, and I don’t want that to encroach on the document size limit as a user continues to add employees and data to those employees, so I split them into a separate collection and tied them together using reference (object) IDs:
Second Data Structure
It wasn’t long after that I wanted to be able to manipulate the clients, locations, departments, and employees all independent of one another, but still maintain their relationships, and I arrived at this iteration of my data structure:
Third and Current Data Structure
It was at this point that I began to realize I had been shifting away from the NoSQL philosophy. Now, instead of executing one query against one collection in a database (1st iteration), or executing one query with a follow-up population (2nd iteration), I was now doing 4 queries in parallel when grabbing my data, despite all the data being related tied to each other.
Is my first data structure suitable to continue with MongoDB? If so, how do I compensate for the document size limit in the event the employees field grows too large?
Is my second data structure more suitable to continue with MongoDB? If so, how can I manipulate the fields independently? Do I create document schemas/models for each field and query them by model?
Is my third data structure still suitable for MongoDB, or should I consider a move to a relational database with this level of decentralized structure? Does this structure allow me any more freedom or ease of access to manipulate my data than the others?
Your question is a bit broad, but I will answer by saying that MongoDB should be able to handle your current data structure without too much trouble. The maximum document size for a BSON Mongo document is 16MB (q.v. the documentation). This is quite a lot of text, and it is probably unlikely that, e.g., an employee would need 16MB of storage.
In the event that you need a single transaction per object to occupy more than the 16MB BSON maximum, you may use GridFS. GridFS uses special collections (
chunks) which do not have any storage limit (other than the limit of maximum database size). With GridFS, you may write objects of any size, and MongoDB will accommodate the operations.