MemSQL is a relational
database management system with a SQL interface and, it stores data in memory
and runs in a cluster.
1.
What is MemSQL?
MemSQL is a relational
database management system with a SQL interface. It has the following
additional properties:
·
Distributed architecture: MemSQL typically runs on a
“cluster” of servers. However, users address a single interface to get data in
or out of MemSQL. MemSQL handles details like which servers store certain
volumes of data, and distributed query execution logic. Users benefit from the
resources of many servers without needing distributed systems expertise.
·
Memory-optimized: MemSQL gives users the option
to read and write data directly to and from main memory. Accessing data in DRAM
is orders of magnitude faster than disk or even flash/SSD, and this provides
extreme performance benefits for high-throughput transaction processing and
real-time analytics. However, MemSQL does not require that all data fit in
memory, and includes the option to store data, in a compressed format, on
flash/SSD/disk.
·
Software only: MemSQL is distributed as
software, and can run in variety of Linux environments including on physical
hardware (“bare metal”), in virtual machines, or in software containers. It can
be run on premises, in your data center, or in a public cloud. Unlike legacy
database vendors that require the use of proprietary hardware, MemSQL can run
on commodity servers.
2.
Where does MemSQL store its data?
When creating a table in
MemSQL, you specify whether the data for that table will reside in memory or on
disk.
MemSQL has two table types:
rowstore tables and columnstore tables.
Rowstore tables are
primarily used for operational or transactional workloads, particularly
workloads that require rapid updates. MemSQL rowstore tables store data
entirely in memory, with logs and full database snapshots written to disk for
durability.
Columnstore tables, on the
other hand, are typically used for analytical workloads and data warehousing,
as the format naturally lends itself to compression, efficient scanning, and
rapidly appending data. MemSQL columnstore tables store data primarily on disk,
but cache data in memory when possible.
3.
How does MemSQL ensure durability for data stored in memory?
MemSQL writes logs and full
database snapshots to disk, which can be used to recover state in the event
that a machine turns off. MemSQL writes logs as data changes, and periodically
triggers a full backup of the data in memory (snapshot). Users can configure
the frequency of full database snapshots.
MemSQL supports transaction
processing, and exposes parameters for the user to tune performance. The most
common question we get is how do we ensure durability when processing
transactions in memory. In MemSQL, a transaction is “committed” when it has
been written to the in-memory transaction buffer. MemSQL keeps a thread running
in the background that is constantly writing blocks of data from the in-memory
transaction buffer to disk as logs. The size of the transaction buffer is
configurable. For instance, setting the size of the transaction buffer to 0 MB
means that a transaction will not be committed until it has been logged to
disk. In practice, it is not necessary to set the transaction buffer to 0 MB
since using a buffer allows MemSQL to write large chunks of data to disk all at
once, and because doing so requires only sequential I/O.
4. How is MemSQL
architected?
A MemSQL cluster consists
of two types of nodes: aggregator and leaf nodes. Client applications connect
to an aggregator, which serves as the query router. Aggregators are aware of
the entire cluster and know where specific data reside. Leaf nodes handle data
storage and most of the computation during query execution. When the client
sends a query, the aggregator splits it into several queries which are sent to
each leaf node. The leaf computes its query and sends the result back to the
aggregator. The aggregator consolidates the results from each leaf and sends
the final result back to the client.
Every cluster has a special
aggregator called the Master Aggregator (non-Master aggregators are called
child aggregators). Any aggregator, master or child, can process data
manipulation language (DML) queries including SELECT, INSERT, UPDATE, and
DELETE. Data definition language (DDL) commands, such as CREATE, DROP, or ALTER
TABLE, must run on the Master Aggregator. In the event that the Master
Aggregator machine fails, an administrator can “promote” a child aggregator to
Master.
Database administrators can
add (and remove) nodes to the cluster at any time while keeping the cluster
online, even while running a workload. Simply provision additional nodes, then
add them to the MemSQL cluster. The easiest way to do this is through MemSQL
Ops, but it can also be accomplished by sending commands to the Master
Aggregator.
5. How does MemSQL
licensing work? How do I know how many machines I will need in my cluster?
MemSQL is licensed based on
the total RAM capacity of a cluster (the sum of the RAM in each
server/VM/container in the cluster). Unlike some vendors in the database space,
we don’t license based on number of CPU cores. This allows our customers to
maximize CPU resources without licensing overhead.
When planning, note that
the cluster must have enough RAM for the following:
·
Rowstore data storage in memory
·
Rowstore and columnstore query execution
·
The operating system (kernel)
MemSQL enables several types of workloads ranging
from stream processing and real-time analytics, to transaction processing, to
data warehousing. Each of these workloads has different performance
characteristics and latency requirements.
No comments:
Post a Comment