Database Server Role Design Considerations (Take Two) - server - Fedora mailing-lists

15 Dec 2014


      Intentionally breaking the thread here. I realize that my initial email
made a lot of assumptions, so I'm going to try to take this back to the
drawing board and have us identify the goals of the Database Server
before we attempt to define an implementation.
So first let's agree on the intention here: the purpose of a Database
Server Role is to provide a place for end-users to store data in an
SQL-compliant format. We (the Server SIG) consider this to be core
infrastructure functionality that will be needed by many (nee. most)
applications deployed in a user's environment.
Requirements from a user perspective:
      * I must be able to create one or more databases for my
        applications
      * I must be able to set up users with some level of privilege for
        the individual databases
      * My database must be able to speak some dialect of SQL
      * I want to be able to use the same Fedora machine/VM for running
        multiple databases
      * I want to be able to replicate my databases between Fedora
        machines
      * I want to have regular, automated backups of my databases.
By its very nature, PostgreSQL (and MariaDB and many others) can support
multiple databases running inside of a single service on a system. One
common approach to reducing resources is to host several such databases
on one physical or virtual machine (or cluster of machines), rather than
assuming the overhead of running an additional operating system to host
the database server.
On the other side of this, there are those who prefer to isolate their
databases entirely from one another, so that a flaw found in one cannot
escalate to revealing data about the others. There are generally two
approaches that people take to this these days. The most secure one is
to run the database on separate physical or virtual hardware. The
second, more recent approach is to run on the same node but inside a
different container environment (such as Docker or rocket). This
approach has its own advantages and disadvantages.
When we designed rolekit, we designed it to have the concept of role
"instances". Our first role, the Domain Controller, allows only a single
instance at present (because it really doesn't make sense to run more
than one Domain Controller on a single machine). However, as we see
above, it *may* in fact make sense to run multiple instances of the
Database Server Role through rolekit. There are several potential
approaches we could take:
     1. We run a single database service on the local machine, using
        PostgreSQL's built-in functionality. The first time a Database
        Server Role is deployed on the system, it sets up the service
        and then creates a rolekit instance associated with the specific
        database on the server. This rolekit settings associated with
        this instance will apply to that database only. The difficulty
        of this approach is that rolekit will not be able to start or
        stop the instances of the database separately. We will need to
        start or stop them all together.
     2. We run a single database service on the local machine with
        multiple databases as above, but instead of using rolekit's
        instancing technology, we treat it as a single role, with more
        complicated deploy/redeploy settings that controls all databases
        running on the system. This approach works around the start/stop
        issue above, but it makes it more difficult to apply settings
        (and creating a new database results in a redeploy call, which
        would seem like an odd user experience).
     3. We run multiple database services in individual containers on
        the system. Each of these services is provided by a rolekit
        instance and is a full, isolated copy of the PostgreSQL
        processes. Each of these databases will need to operate on a
        separate port (or the same port on separate interfaces, etc.)
     4. We run multiple database services in individual Fedora Cloud VMs
        that we spin up and kickstart into being and manage through
        rolekit (talking to libvirt and OpenLMI on the VM). This is a
        very complicated approach and will not be feasible in the F22
        timeframe.
Option 4) above is actually kind of a special-case of 1) and 2) except
that we're managing the creation of a new VM (and need to add the
OpenLMI remoting capabilities). This might be an interesting project
down the road, but I think we can rule it out for Fedora 22 Server,
particularly with the slightly-shortened schedule.
Option 2) provides a workaround to certain peculiarities of how rolekit
works, but I personally don't think that the user-experience trade-off
is worthwhile. I'm open to arguments, of course.
So, from my perspective, Option 1) and Option 3) are the most
interesting. Option 1) is definitely the more traditional approach (and
probably easier to get up and running quickly), but Option 3) has a bit
of a security advantage at the cost of some additional management
difficulty. So this is the part where I end this email and ask for
feedback. Thank you for reading this far.