As some of you know, some of us (mostly smooge and I) have been working on statistics gathering for Fedora web infra as of late.
We are working on setting up a piwik instance to be used on our websites. However, this instance lives in the cloud and thus can't hit the VPN.
A question I ran into, and was told to email the list about and/or bring up at the meeting: what should our backup story look like for such cases? Specifically, we have a MySQL database on this node that will get very big fairly quickly (right now /var/lib/mysql is about 1.1GB, piwik only has one site added, and it's only been there for a few weeks).
Are other cloud instances backed up in any way, and if so, how? If not, how do we go about coming up with a plan for this instance and for future instances with similar requirements?
(Smooge can weigh in if I'm forgetting any important details here).
-Ricky
On 9 May 2016 at 23:12, Ricky Elrod <codeblock@elrod.me> wrote:
> [snip]
> Are other cloud instances backed up in any way, and if so, how? If not,
> how do we go about coming up with a plan for this instance and for
> future instances with similar requirements?
I believe in the past we would do a mysqldump of the schemas into a 'dump' directory, and have an rsync target that allowed only a small set of boxes (bapp01? backup02?) to pull from it. The server would do a regular dump of the data, bapp01 (and some other box) would rsync that tree, and it would then be part of bapp01's daily backups.
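Roughly, the moving pieces were something like this (hostnames, paths, and options here are illustrative reconstructions from memory, not actual config):

    # nightly cron job on the db host: dump into the exported directory
    mysqldump --all-databases | xz > /srv/dump/mysql-$(date +%F).sql.xz

    # /etc/rsyncd.conf on the db host: export only the dump directory,
    # read-only, to the backup boxes
    [dump]
        path = /srv/dump
        read only = yes
        hosts allow = bapp01.example.org backup02.example.org

    # on bapp01: pull the dumps into the tree the daily backups cover
    rsync -a db-host.example.org::dump/ /srv/backups/db-host/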
I am not sure how workable that is these days.
On Mon, 9 May 2016 23:12:46 -0400 Ricky Elrod <codeblock@elrod.me> wrote:
> [snip]
> Are other cloud instances backed up in any way, and if so, how? If not,
> how do we go about coming up with a plan for this instance and for
> future instances with similar requirements?
Yes, we back up several things in the cloud, and it's done the same way as all our other backups. ;)
Basically backup01 does a git checkout of the ansible repo, looks at inventory/backups for the list of backup_clients, then runs rdiff-backup over /etc, /home, and any additional dirs specified in the host's vars.
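Per client it boils down to something like this (a minimal sketch; the real runs are driven from the ansible repo, and the hostname, paths, and retention here are made up):

    # run on backup01 for each host in backup_clients, once per directory
    rdiff-backup root@piwik.example.org::/etc \
        /srv/backups/piwik.example.org/etc
    # separately, prune old increments on the destination
    # (retention period is a guess)
    rdiff-backup --remove-older-than 30D /srv/backups/piwik.example.org/etc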
For databases, we have a script that does a daily db dump to /backups and xz-compresses it, and we back that up with rdiff-backup.
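The dump side is basically a small cron job along these lines (a sketch only; the real script is in ansible and the exact flags and paths may differ):

    #!/bin/bash
    # daily database dump, xz-compressed, written into the directory
    # that rdiff-backup already covers
    dumpfile=/backups/mysql-$(date +%F).sql
    mysqldump --single-transaction --all-databases > "$dumpfile"
    xz -f "$dumpfile"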
How hard is the data to regenerate? i.e., if we are keeping the logs that the data comes from, could we in theory regenerate it, or would that be too difficult?
kevin
On 10 May 2016 at 14:41, Kevin Fenzi <kevin@scrye.com> wrote:
> [snip]
> How hard is the data to regenerate? i.e., if we are keeping the logs
> that the data comes from, could we in theory regenerate it, or would
> that be too difficult?
For the outside piwik there are no logs to replay. It is live analytics for various Fedora websites like fedoramagazine.org: a person's browser connects directly to the piwik server as they move through the pages.