Introduction to MongoDB and NoSQL databases (Part I)
MongoDB is a modern document database that has gained significant popularity due to its ability to address the challenges posed by big data and agile software development practices. In today's data-driven world, organizations face an unprecedented volume, velocity, and variety of data, commonly referred to as big data. This surge in data requires advanced database technology that can efficiently handle and scale with the ever-increasing demands.
Traditional relational databases have been the go-to solution for many years, consolidating data into a unified view. However, the reality is that modern applications often require multiple databases per application or task, leading to data inconsistency, different formats, and duplications. Recognizing this need for flexibility, MongoDB embraces the concept of having a database per application, allowing developers to tailor their databases to specific needs within reasonable limits.
Why do we need a document database?
MongoDB was born out of the founders’ experience in building a large-scale, multi-server database for an internet-based advertising platform. They realized that existing database technologies were not well-suited for their requirements, particularly in terms of scalability and high availability. Determined to address these limitations, they set out to create a new database that would be developer-friendly, horizontally scalable, and capable of ensuring continuous availability, even during maintenance.
One of MongoDB’s key features is its horizontal scalability, which enables distributing data across multiple servers. By adding more hardware resources, organizations can handle big data scenarios where data no longer fits on a single machine. This scalability allows MongoDB to adapt to the ever-growing data demands of modern applications.
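To make the idea of distributing data across servers concrete, here is a minimal sketch of one common approach: hashing each record's key to pick a server. The server names and the `shard_for` helper are hypothetical, and this deliberately naive scheme is an illustration only, not a description of how MongoDB itself places data.

```python
import hashlib

# Hypothetical server names, used only for illustration
SHARDS = ["server-a", "server-b", "server-c"]

def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a server for a record by hashing its key (naive modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# Spread 1000 made-up customer ids across the servers
placement = {}
for customer_id in (f"customer-{n}" for n in range(1000)):
    placement.setdefault(shard_for(customer_id), []).append(customer_id)

for server in SHARDS:
    print(server, len(placement.get(server, [])))
```

Each record lives on exactly one server, so total capacity grows as servers are added. A real system also has to handle rebalancing when the number of servers changes, which this sketch ignores.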
Another crucial aspect is MongoDB’s support for agile software development practices. Today, the development process is iterative and fast-paced, with frequent releases and shorter development cycles. MongoDB understands the need for databases to keep up with this rapid pace of change. It provides a flexible schema design, allowing developers to evolve their data model as the application evolves, without the need for complex migrations or downtime.
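As a rough illustration of a flexible schema, the sketch below keeps an older and a newer document shape side by side. A plain Python list stands in for a collection, and the field names (such as `loyalty_tier`) are made up for the example.

```python
# A plain list stands in for a collection; no MongoDB server is involved.
customers = [
    # Document written by an early version of the application
    {"first_name": "Ada", "last_name": "Lovelace"},
    # Document written after a new (hypothetical) field was introduced
    {"first_name": "Alan", "last_name": "Turing", "loyalty_tier": "gold"},
]

def loyalty_tier(doc: dict) -> str:
    # Older documents simply lack the field, so fall back to a default
    return doc.get("loyalty_tier", "none")

tiers = [loyalty_tier(doc) for doc in customers]
print(tiers)
```

The older document did not have to be rewritten when the new field appeared. The application code is what absorbs the difference between the two shapes.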
Can MongoDB be considered a highly available database system?
In terms of availability, MongoDB excels by providing continuous uptime, even during maintenance operations. Traditional databases often required significant investment in specialized hardware to achieve high availability. However, MongoDB is designed to be inherently always on and can function on a global scale. It enables organizations to operate 24/7, 365 days a year, regardless of location, ensuring constant accessibility to the database.
By leveraging the accumulated knowledge and best practices from relational databases, MongoDB incorporates familiar concepts and principles while enhancing them to meet the requirements of scalability and high availability. However, it’s important to note that there are some fundamental differences in how MongoDB handles certain operations compared to traditional relational databases. For example, performing joins efficiently in a distributed system can be challenging, and it’s recommended to minimize their use when aiming for horizontal scalability.
To make the most of MongoDB, it’s essential to understand its features, capabilities, and terminology. Additionally, it’s beneficial to be aware of common pitfalls and mistakes to avoid when working with MongoDB. By grasping these aspects and applying best practices, developers can harness the power of MongoDB to build robust and scalable applications that thrive in the world of big data and agile development.
MongoDB vs Relational databases
Because joins and multi-record transactions are hard to scale across servers, the ideal application is one that does not rely heavily on either. In relational terms, this means simplifying the database structure to a single table and updating individual rows. Let’s consider a straightforward example: a customer database. Each entry contains fields like first name, last name, address, and cell phone number. Working with this data involves modifying or retrieving one person’s information at a time, without impacting others. By maintaining a single table, we eliminate the complexities of data distribution and ensure that all data relevant to one customer is co-located.
Documents in document databases, such as MongoDB, are typically represented in JSON (JavaScript Object Notation). JSON provides a human-readable format for visualizing and writing data in document databases, much as CSV does for tabular data. Within the JSON representation, we see the field names and their corresponding values for a specific record. For example, a customer record might hold the customer’s name, location, and cars. It’s important to note that the actual data stored in MongoDB is not the JSON text itself but typed values, such as integers or decimals. Although JSON lacks comprehensive data type support for databases, it serves as a practical visualization tool.
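Here is a small sketch of what such a customer document could look like. The specific values and the nested field names are invented for illustration. Python's standard `json` module is used only to print the human-readable form.

```python
import json

# A hypothetical customer document with a nested location and a list of cars
customer = {
    "name": "Devang",
    "location": {"city": "Pune", "country": "India"},
    "cars": [
        {"make": "Honda", "model": "Civic", "year": 2018},
        {"make": "Tata", "model": "Nexon", "year": 2021},
    ],
}

# JSON is the readable view of the document
text = json.dumps(customer, indent=2)
print(text)

# Reading the text back yields the same structure, with numbers still numbers
restored = json.loads(text)
```

Notice that the cars sit inside the customer document itself rather than in a separate table, so no join is needed to read them.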
To summarize, document databases like MongoDB offer an alternative to traditional tabular databases. Leveraging nested structures and a document-oriented format like JSON, we can manage complex data relationships without heavy reliance on joins or transactions. This approach enhances flexibility and scalability in data management.
MongoDB Technical Terms
Both the SQL (relational) world and the non-relational world (e.g., MongoDB) have the concept of a “Database” at the top level. One level down, SQL has “Tables,” whereas MongoDB has “Collections.”
In SQL, each row within a table corresponds to a BSON document in MongoDB. BSON stands for Binary JSON and serves as a storage format. At this point, it’s worth noting that JSON (JavaScript Object Notation) and BSON are closely related, but there are slight differences between them. We will explore these distinctions shortly.
In the SQL world, a column maps to a field within a JSON document. This means that each field within a JSON document can be seen as analogous to a column in a SQL table.
To better understand MongoDB’s technical terminology, let’s explore the following four terms:
- Collection: A collection in MongoDB is equivalent to a table in SQL. It acts as a container for storing related documents.
- Document: In MongoDB, a document represents a record or a data entity. It is similar to a row in a SQL table. Documents in MongoDB are structured using the BSON format, allowing for flexibility in terms of schema and fields.
- JSON (JavaScript Object Notation): JSON is a lightweight data interchange format that is human-readable and easy to parse. It is commonly used for representing structured data in a key-value format. JSON is widely supported and serves as the foundation for BSON.
- BSON (Binary JSON): BSON is a binary representation of JSON documents. It extends the capabilities of JSON by adding additional data types and features optimized for storage and querying. BSON provides enhanced performance and efficient storage for MongoDB’s non-relational data model.
By understanding these technical terms, we can gain a clearer understanding of the differences between SQL and non-relational databases like MongoDB.
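The mapping between the two vocabularies can be sketched with plain Python data structures. The column names and sample rows are hypothetical, and nested dictionaries and lists merely stand in for a database, a collection, and its documents.

```python
# SQL view: one table with named columns and rows of values
columns = ("first_name", "last_name", "cell_phone")
rows = [
    ("Ada", "Lovelace", "555-0100"),
    ("Alan", "Turing", "555-0101"),
]

# Document view: each row becomes a document, each column name becomes a field
collection = [dict(zip(columns, row)) for row in rows]

# A database is the top-level container in both worlds
database = {"customers": collection}

print(database["customers"][0])
```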
BSON Documents: Introduction to MongoDB’s Binary JSON Format
To store our data effectively, MongoDB uses a format called BSON (Binary JSON), which offers advantages over text formats like JSON. BSON lets us convert objects from any programming language into a stream of bytes that can easily be converted back into an object, regardless of the programming language.
The binary representation begins with the total length of the stored document, held in four bytes. Each field then starts with its data type, represented as a one-byte number (e.g., 02 for a string), followed by the field name and a zero byte to mark the end of the field name. After that comes the binary encoding of the field value.
We continue this process for each field in the object and finish the document with a final zero byte. For example, if there is a field named “name” with the value “Devang,” we store its data type (02 for string), the field name and its terminating zero byte, the length of the string (four bytes, counting the string’s trailing zero byte), the text of the string, and a zero byte to mark its end.
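The layout described above can be reproduced by hand with Python's standard `struct` module. This is a minimal sketch that handles string fields only. The helper names are my own, and a real application would use a driver's BSON library instead.

```python
import struct

def encode_string_field(name: str, value: str) -> bytes:
    """Encode one string field: type byte, field name, length, text, zero byte."""
    raw = value.encode("utf-8") + b"\x00"          # text plus its terminating zero
    return (
        b"\x02"                                    # data type 02 = string
        + name.encode("utf-8") + b"\x00"           # field name, zero-terminated
        + struct.pack("<i", len(raw))              # four-byte little-endian length
        + raw
    )

def encode_document(fields: dict) -> bytes:
    """Encode a dict of string fields as a complete document."""
    body = b"".join(encode_string_field(k, v) for k, v in fields.items())
    body += b"\x00"                                # zero byte that ends the document
    return struct.pack("<i", 4 + len(body)) + body  # total length includes itself

encoded = encode_document({"name": "Devang"})
print(" ".join(f"{b:02x}" for b in encoded))
# prints: 16 00 00 00 02 6e 61 6d 65 00 07 00 00 00 44 65 76 61 6e 67 00 00
```

The first four bytes (16 00 00 00) give the total length of 22 bytes, and the 07 00 00 00 before the text is the string length: six characters plus the trailing zero byte.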
The Advantages of BSON
BSON is designed to be a compact and easily traversable binary format. Unlike text formats, which must be parsed character by character, BSON’s length-prefixed, computer-friendly layout allows for efficient storage and retrieval of data.
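To show what “easily traversable” means in practice, the sketch below walks a hand-encoded document and uses each value's length prefix to jump over values it does not need. It understands string fields only. The helper names and the sample values are assumptions made for the example.

```python
import struct

def _string_element(name: str, value: str) -> bytes:
    # Helper that builds one string field so the sample is self-contained
    raw = value.encode("utf-8") + b"\x00"
    return b"\x02" + name.encode("utf-8") + b"\x00" + struct.pack("<i", len(raw)) + raw

# Sample document with two string fields (values are made up)
_body = _string_element("name", "Devang") + _string_element("city", "Pune") + b"\x00"
sample = struct.pack("<i", 4 + len(_body)) + _body

def find_string_field(doc: bytes, wanted: str):
    """Return the value of a string field, skipping other values by length."""
    (total,) = struct.unpack_from("<i", doc, 0)
    pos = 4
    end = total - 1                       # position of the final zero byte
    while pos < end:
        type_code = doc[pos]
        name_end = doc.index(b"\x00", pos + 1)
        name = doc[pos + 1:name_end].decode("utf-8")
        if type_code != 0x02:
            raise ValueError("this sketch only understands string fields")
        (length,) = struct.unpack_from("<i", doc, name_end + 1)
        value_start = name_end + 5        # skip the zero byte and 4-byte length
        if name == wanted:
            return doc[value_start:value_start + length - 1].decode("utf-8")
        pos = value_start + length        # jump over the value without reading it
    return None

print(find_string_field(sample, "city"))
```

When looking up “city”, the text of the “name” value is never inspected: the reader learns its length from the prefix and moves straight to the next field.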
While storing individual field names in each record increases the storage requirement, MongoDB offers compression capabilities to mitigate this issue. Compression helps reduce storage consumption when the same text is repeated. Additionally, each record in MongoDB is self-describing, eliminating the need for separate definitions for each table or collection. Every record stands alone and contains all the necessary information about its field names, values, and data types.
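The effect of compression on repeated field names is easy to observe with a quick experiment. In this sketch, JSON text stands in for the stored records and Python's `zlib` stands in for the database's compressor. Both are stand-ins chosen for convenience, and the records are generated, not real data.

```python
import json
import zlib

# 1000 made-up records that all repeat the same three field names
records = [
    {"first_name": f"First{n}", "last_name": f"Last{n}", "cell_phone": f"555-{n:04d}"}
    for n in range(1000)
]

raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
compressed = zlib.compress(raw)

print("uncompressed bytes:", len(raw))
print("compressed bytes:  ", len(compressed))
```

The repeated field names are exactly the kind of redundancy a general-purpose compressor removes well, which is why self-describing records cost less on disk than their raw size suggests.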
One advantage of MongoDB is the flexibility it provides regarding record length. Unlike rows in a relational table, which all share the same set of columns, MongoDB documents can vary in length. If a field is not needed or has a null or empty value, it doesn’t need to be stored at all. This is particularly beneficial for sparse tables where numerous null values would otherwise be present: in MongoDB, we can simply omit the field. Conversely, a field that is absent from a document is treated as if it held a null value when the data is read or queried. In effect, every record behaves as though it contained all possible fields, with any field that is not explicitly specified set to null.
This implicit handling of null values and the ability to add new fields to records without affecting old ones offer convenience and flexibility. Retrieving data from MongoDB is straightforward since fields that are not stored are considered to have null values by default.
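The “missing means null” behaviour can be mimicked in a few lines of Python. This is a sketch of the idea using plain dictionaries, not MongoDB's query engine. The `field_is_null` helper and the sample people are hypothetical.

```python
def field_is_null(doc: dict, field: str) -> bool:
    # A field that was never stored reads the same as an explicit null (None)
    return doc.get(field) is None

people = [
    {"name": "Ada", "cell_phone": "555-0100"},   # field present with a value
    {"name": "Alan", "cell_phone": None},        # field present, explicitly null
    {"name": "Grace"},                           # field omitted entirely
]

without_phone = [p["name"] for p in people if field_is_null(p, "cell_phone")]
print(without_phone)
```

Both the explicit null and the omitted field end up in the result, which is why old records keep working after a new field is introduced.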