Data acts as a currency, helping businesses make decisions that reduce operational costs and improve revenue and productivity. In 2020, the data generated daily amounted to 1.134 trillion MB, and research estimates that this figure will exceed 463 ZB in 2025. Organizations collect this data in numerous ways, such as surveys on social media and websites, and from sources like Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems. They therefore need a solution that can store the data and give them access to it when needed. Hence the need for databases.
Databases help store, organize, and manage business data for easier and faster access while providing insights that inform business strategies. Various database systems exist, and businesses choose one depending on the type, format, and volume of the data they handle.
Database systems include relational, non-relational, document, and graph databases, among others. Databases are used to create data warehouses, data marts, and other data solutions for transactional and analytical processing, and adopting them usually involves a migration strategy.
This article discusses database systems and their implementations for various business use cases.
Relational Database: Managing Customer Data
Relational database systems like MySQL are among the most widely adopted database systems today. As the name suggests, relational databases store data in tables made up of rows and columns and use foreign keys to form relationships between multiple tables.
The image below represents a simple relational database with customer and shipping tables.
The customer ID, the primary key in the customer table, also appears in the shipping table as a foreign key, and analysts can use this customer ID to query both tables and create reports from them. Banks, e-commerce companies, and schools store and manage customer records using relational databases.
For instance, an e-commerce shop selling women’s accessories may have a relational database containing shipping, customer, inventory, and website cart information. An analyst can use the foreign key connecting the customer contact and cart tables to create an email campaign informing customers of discounts on their cart items.
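The e-commerce scenario above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production schema: the table names, column names, and sample rows are all assumptions made for the example.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        email       TEXT NOT NULL
    );
    CREATE TABLE cart (
        cart_item_id INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        item         TEXT NOT NULL,
        discounted   INTEGER NOT NULL DEFAULT 0
    );
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (1, "Anna", "anna@example.com"),
    (2, "Bea", "bea@example.com"),
])
conn.executemany("INSERT INTO cart VALUES (?, ?, ?, ?)", [
    (10, 1, "silk scarf", 1),
    (11, 1, "leather belt", 0),
    (12, 2, "sun hat", 0),
])

def discount_campaign(connection):
    """Return (email, item) pairs for customers with discounted cart items."""
    return connection.execute("""
        SELECT c.email, k.item
        FROM customer AS c
        JOIN cart AS k ON k.customer_id = c.customer_id
        WHERE k.discounted = 1
        ORDER BY c.email, k.item
    """).fetchall()

campaign = discount_campaign(conn)
print(campaign)
```

The JOIN follows the foreign key from the cart table back to the customer table, so the campaign list contains only customers who have a discounted item in their cart.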
NoSQL Databases: Social Media Platforms
Unlike relational databases, NoSQL databases manage and store data in flexible, non-tabular formats such as key-value pairs, JSON documents, graphs, and in-memory stores. Examples include Amazon DynamoDB, the NoSQL database offered by AWS that stores data in key-value pairs; MongoDB, which stores data as documents in collections; and Redis, the in-memory data store. In addition, NoSQL databases have a distributed architecture that offers easy scaling, making them ideal for building high-throughput, high-performing, low-latency applications.
Social media applications use NoSQL databases for storing their massive workloads. For instance, the Snapchat Stories feature has an enormous storage write workload and employs DynamoDB because of the database’s ability to store vast amounts of semi-structured and structured data.
NoSQL databases allow you to connect to and integrate with other NoSQL databases to create a robust, flexible storage option for your needs. This flexibility lets you combine the benefits offered by multiple databases, which can improve performance. For example, you can integrate your MongoDB data with data from other sources to create your own data warehouse or data lake using a data integration platform like StreamSets, which allows you to configure MongoDB as a data origin and destination for your ETL, data integration, and other data management processes.
Object-Oriented Database: CAD Software
Relational databases are often insufficient for the complex objects used in object-oriented programming (OOP) languages like Java, Kotlin, C#, and JavaScript (Node.js). One reason for this limitation is that relational databases lack native object persistence. Object-oriented programming languages work with object-oriented databases (OODs) to ensure that all objects and their properties persist in storage after a program stops executing. In contrast, for relational databases, objects are transient. In addition, OODs utilize OOP concepts like attributes, methods, classes, and pointers. However, one disadvantage of an OOD is its dependency on the syntax of the language it is coupled with, which may stifle adoption.
Nevertheless, OODs play a role in applications requiring fast performance and calculations, such as engineering and architectural systems like computer-aided design (CAD), molecular science, and telecommunications. Besides being a relational database system, PostgreSQL is also an object-relational database management system (ORDBMS) and supports storing data as objects. PostgreSQL also extends its data types by allowing developers to create their own custom data types and allowing these custom types to inherit the properties of the objects they relate to. This inheritance enables developers to build complex applications from these custom objects.
For example, a CAD system is a collection of complex objects with different representations at different abstraction levels. An object-oriented database helps record each object’s state as it evolves, making it ideal for storing CAD models.
Graph Database: IoT Systems
Graph databases help create, navigate, and manipulate relationships between existing data points. They use nodes as data points and edges to show the relationships between them. This makes them ideal for recommendation engines, social networking, and fraud detection in financial institutions, where we need to establish relationships between nodes and query these relationships immediately. Popular graph systems include Amazon Neptune, Neo4j, and Giraph. For instance, companies can use Amazon Neptune to identify relationships between data points that may point toward a fraudulent pattern. By loading data from Amazon S3 into Neptune, companies can observe in real time as users update their information, which creates a cluster of data around each user. Algorithmic models can then run on these clusters to identify fraudulent patterns and, if any are found, block or flag the user.
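The node-and-edge idea can be sketched in a few lines of plain Python. This is only an illustration of the traversal concept, not how Neptune or any other graph database works internally; the node names and the "shared card or device" heuristic are assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical edges: each pair links a user node to an attribute node
# (a payment card or a device) that the user has been seen with.
edges = [
    ("user:anna", "card:1111"),
    ("user:ben", "card:1111"),
    ("user:ben", "device:A7"),
    ("user:cara", "device:A7"),
    ("user:dan", "card:2222"),
]

def build_graph(edge_list):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_users(graph, start):
    """Breadth-first traversal returning every user node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(n for n in seen if n.startswith("user:"))

graph = build_graph(edges)
cluster = connected_users(graph, "user:anna")
print(cluster)
```

Here three accounts end up in one cluster because they share a card or a device, which is the kind of relationship a fraud model might inspect further. A graph database answers this sort of question with a query rather than hand-written traversal code.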
Spatial Databases: Navigation Applications
Spatial databases help store and access spatial or geometric data such as coordinates, points, lines, polygons, and topology that define a space or location. They use a spatial index to improve query performance because a traditional database index is not well suited to multidimensional data. Organizations can leverage spatial databases to design and consolidate weather information, drone imagery, complex 3D scenes, and indoor information for their applications. Spatial database systems include the Esri Geodatabase, the PostGIS extension for PostgreSQL, and the spatial features of Microsoft SQL Server.
One widespread use case for spatial databases is designing maps and routing systems for navigation and ride-hailing apps. These navigation apps can consolidate all geometric and traffic data to optimize routes and enable riders to access efficient travel routes to reach their destination faster.
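To give a feel for what a spatial index does, the sketch below buckets points into a uniform grid so that a "who is near me?" query only inspects neighbouring cells instead of every point. It is a deliberately simplified stand-in: production spatial databases typically rely on more sophisticated structures such as R-trees, and the class name, cell size, and coordinates here are assumptions made for illustration.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.01  # grid cell size in degrees; an arbitrary illustrative choice

def cell_of(lat, lon):
    """Map a coordinate to the (row, column) of the grid cell containing it."""
    return (math.floor(lat / CELL_SIZE), math.floor(lon / CELL_SIZE))

class GridIndex:
    """Hypothetical uniform-grid index over named points."""

    def __init__(self):
        self.cells = defaultdict(list)

    def insert(self, name, lat, lon):
        self.cells[cell_of(lat, lon)].append((name, lat, lon))

    def nearby(self, lat, lon):
        """Return names stored in the query cell and its eight neighbours."""
        row, col = cell_of(lat, lon)
        found = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                for name, _, _ in self.cells.get((row + dr, col + dc), []):
                    found.append(name)
        return sorted(found)

index = GridIndex()
index.insert("driver_a", 40.7130, -74.0060)
index.insert("driver_b", 40.7185, -74.0012)
index.insert("driver_c", 40.8050, -73.9550)  # several kilometres away

nearby_drivers = index.nearby(40.7150, -74.0040)  # rider's position
print(nearby_drivers)
```

A ride-hailing app could use a lookup like this to shortlist candidate drivers before computing exact distances or routes for the few that remain.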
Document Database: CRM Systems
Document databases are non-relational databases that store data as JSON-like documents made up of key-value pairs and offer a flexible way to store and query semi-structured data. A popular document database is MongoDB. Document database systems are intuitive, easy to work with, and provide a flexible schema that allows your data models to evolve as business needs change. The flexible model offered by document databases means documents in the same collection can have different fields and data types. However, MongoDB also offers schema validation to ensure no unintended data types are present in your collection.
Document databases are efficient at storing and accessing complex customer records due to their flexible data model. Analysts and business intelligence teams can easily query these records to segment customers and personalize customer experiences. Querying MongoDB records is easy because all information about a user can be stored in a single document rather than across multiple tables. For example, a document containing user information can hold values of various types, like the user record below:
- The unique user ID, an integer
- The user’s first name, a string
- The user’s email, also a string
- The social security number, an integer
- The user’s credit card info, a list housing two objects
{
  "_id": 5672,
  "first_name": "Anna",
  "email": "anna@example.com",
  "SSN": 6483028726782,
  "credit_card": [
    {
      "Provider": "CJK bank",
      "status": "Active",
      "date_issued": { "$date": "2019-11-17T04:00:00Z" }
    },
    {
      "Provider": "KJC bank",
      "status": "Active",
      "date_issued": { "$date": "2021-03-11T04:00:00Z" }
    }
  ]
}
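The benefit of keeping a whole user in one document can be illustrated with plain Python dictionaries standing in for a collection. This is a rough sketch rather than MongoDB code: the second user, the helper function name, and the simplified date strings are assumptions made for the example. In MongoDB itself, a comparable filter would probably be expressed with an operator such as `$elemMatch`.

```python
import json

# A tiny stand-in for a collection. The first document mirrors the user record
# above (dates simplified to plain ISO strings); the second is hypothetical and
# has no credit_card field at all, which a flexible schema permits.
users_json = """
[
  {"_id": 5672, "first_name": "Anna", "email": "anna@example.com",
   "credit_card": [
     {"Provider": "CJK bank", "status": "Active", "date_issued": "2019-11-17T04:00:00Z"},
     {"Provider": "KJC bank", "status": "Active", "date_issued": "2021-03-11T04:00:00Z"}
   ]},
  {"_id": 5673, "first_name": "Ben", "email": "ben@example.com"}
]
"""
users = json.loads(users_json)

def users_with_active_card(documents, provider):
    """Return first names of users holding an active card from the given provider."""
    matches = []
    for doc in documents:
        cards = doc.get("credit_card", [])  # the field may be absent
        if any(c["Provider"] == provider and c["status"] == "Active" for c in cards):
            matches.append(doc["first_name"])
    return matches

kjc_holders = users_with_active_card(users, "KJC bank")
print(kjc_holders)
```

No join is needed because the cards are nested inside the user document, and the query tolerates documents that lack the field entirely.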
Time-Series Databases: Sports Analytics
Time-series databases store time-series data, that is, time-stamped data collected, measured, and aggregated over time. This kind of data needs more than a simple database: time-series database infrastructure must incorporate data lifecycle management, summarization, time-stamped data storage and compression, and the ability to handle large volumes of time-stamped data, which accumulates quickly.
Prometheus and TimescaleDB are two popular time-series databases. Prometheus has a functional query language called PromQL that lets you query and aggregate time-series data, with results shown in tabular or graphical form or consumed via an API.
Time-stamped data is crucial for performing analytics over a period of time. For example, a football tracking system that records data only at set times (say, every 10 minutes) during a match may miss significant insights about players at other times. However, with a time-series database, every second gets recorded, and coaches can use the analytics from these time-stamped records to analyze player performance, take note of playing styles, and develop new strategies.
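The difference between sparse sampling and summarizing dense data can be shown with a short Python sketch. The sample values, function names, and the 600-second window are hypothetical; a real time-series database would perform this kind of aggregation in its own query language.

```python
from collections import defaultdict

# Hypothetical (seconds_since_kickoff, speed_in_m_per_s) samples for one player.
samples = [(0, 2.0), (1, 2.5), (2, 8.0), (3, 7.5), (600, 3.0), (601, 3.5)]

def downsample(points, window_seconds):
    """Average values per fixed-width time bucket, keyed by bucket start time."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = (timestamp // window_seconds) * window_seconds
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

def sample_every(points, interval_seconds):
    """Keep only the points that fall exactly on the sampling interval."""
    return [(t, v) for t, v in points if t % interval_seconds == 0]

per_ten_minutes = downsample(samples, 600)   # summary built from every sample
sparse = sample_every(samples, 600)          # what a 10-minute recorder would keep

peak_full = max(v for _, v in samples)
peak_sparse = max(v for _, v in sparse)
print(per_ten_minutes, peak_full, peak_sparse)
```

The sprint at seconds 2 and 3 shows up in the per-second data and still influences the 10-minute average, but it disappears completely when only one reading is kept every 10 minutes.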
Examples of Popular Database Software
Depending on your organizational needs, various database systems are available for adoption. These options include open-source and commercial database systems, each with its own strengths and weaknesses. Let’s look at some of the most popular database systems:
- MySQL: MySQL is a relational database system built on SQL that stores data in tables, with each table consisting of rows and columns. MySQL is often referred to as the most popular database due to its open-source nature, reliable ACID transactions, stability, rich features, and extensive support from Oracle. Popular social media applications like Twitter, YouTube, and Facebook use MySQL for their backend services.
- PostgreSQL: This is another powerful open-source object-relational database system that extends the SQL language and is known for its reliability, numerous features, high performance, support, and extensibility. One of its most powerful add-ons is PostGIS, a geospatial extension for storing, indexing, and querying geographical data. PostgreSQL is similar to MySQL in numerous ways, such as its ACID compliance and supported operating systems, but the two differ in performance and architecture.
- Oracle: Oracle is a relational database management software that provides a commercial all-in-one database management solution for housing your data marts, warehouses, and data lakes according to your application requirements. Oracle has numerous editions, is highly available, scalable, and offers failure recovery.
- MongoDB: MongoDB is a well-known, flexible, non-relational document database that stores data as JSON-like documents, represented internally in the binary BSON format. Developers use APIs or query languages to execute CRUD operations in MongoDB. MongoDB utilizes a highly distributed architecture, which eases scaling and offers resiliency by replicating its nodes.
- Redis: Redis is an in-memory, non-relational open-source data store for creating databases, caches, and streaming engines. It provides horizontal scalability, availability, and data persistence to permanent storage in case of system reboots.
How StreamSets Makes You Database-Agnostic
Your organizational database needs vary and may change from time to time. StreamSets is data source-agnostic, meaning you can easily swap data sources and destinations in and out of data pipelines. With StreamSets, you don’t have to be married to one particular data architecture, database, or cloud provider. You are free to design and select your data architecture according to your business needs rather than be influenced by rigid platform requirements or the preferences of third-party advertisers. Your pipelines, your databases, your choice. To see how this flexibility works, take a look at this example showing how StreamSets can help optimize a migration pattern from a MySQL database to Snowflake. Explore how StreamSets can fit into your organization’s data strategy with StreamSets Academy or our community.
The post Seven Real-Life Database Examples appeared first on StreamSets.