Greetings,
Fedora Infrastructure currently has the majority of its hardware in a datacenter in Arizona, USA. Red Hat leases this space for use by a number of teams, including Fedora. However, they've been seeking a more modern and cost-effective location for some time and have decided on one, so we will be migrating to a new datacenter located in Ashburn, Virginia, in 2020.
FESCo has approved a 2-week window for the actual move to take place ( https://pagure.io/fesco/issue/2221 ): 2020-06-01 to 2020-06-15. This window is after Fedora 32 is released, but before any major Fedora 33 milestones.
At a high level, our current plan is:

* Set up the new datacenter with networking/storage/management.
* Populate the new datacenter with new hardware to replace old hardware that either wouldn't survive the shipping or is due to be refreshed.
* Ship a small amount of hardware from the old datacenter to the new one for things that are not easily duplicated, like signing hardware, alternative-arch builders, etc.
* Set up, and have ready by the early part of the outage window, a Minimum Viable Fedora Infrastructure (see below) using new hardware and some old.
* Function in this minimal state while all the rest of the hardware is shipped to the new datacenter.
* Re-add hardware to return to the normal state.
We want to maintain continuity of service as best we can, so we have defined a Minimum Viable Fedora Infrastructure which will move in advance of the main hardware. Our intention is to reroute traffic to this setup before moving the bulk of our hardware.
Our current list of what a Minimum Viable Fedora Infrastructure is:
* Mirroring fully functional: users get metalinks, mirrors are crawled, etc. (a quick reachability sketch follows below)
* The complete package lifecycle must work, from commit to update installed on users' machines. We need this to push security and important bugfixes, as well as to allow maintainers to work toward Fedora 33.
* Our production OpenShift cluster must be up and running normally. (This cluster has fas, bodhi and other important items in it.)
* Builders will likely be constrained, i.e. fewer of most arches. Capacity will be re-added as soon as the hardware for it arrives.
* Rawhide composes take place as normal.
* Nameservers functional.
* rabbitmq/fedora-messaging should be up and functional.
* Internal proxies must be functional (used by builders and other internal items).
* Mailing lists must be functional.
* Backups must be functional.
* OpenQA must be available to test updates/Rawhide composes.
* Wiki must be available for common bugs / QA.
Other services not listed may or may not be up depending on capacity and issues with more important services.
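As a rough illustration of what "fully functional" means for the user-facing pieces above, here is a minimal sketch of a reachability check. It is not an official tool, and the URL list (the metalink repo/arch parameters, the bodhi and wiki endpoints) is just an example set:

#!/usr/bin/env python3
# Rough reachability check for a few of the "minimum viable" endpoints.
# The URLs below are illustrative examples, not an official list.
import urllib.request

CHECKS = {
    "metalink (mirroring)": "https://mirrors.fedoraproject.org/metalink?repo=fedora-31&arch=x86_64",
    "bodhi (updates)": "https://bodhi.fedoraproject.org/",
    "wiki (common bugs / QA)": "https://fedoraproject.org/wiki/",
}

def check(name, url, timeout=15):
    """Return True if the endpoint answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            print("OK   %s: HTTP %s" % (name, resp.status))
            return True
    except Exception as exc:  # DNS failures, timeouts, HTTP errors, ...
        print("FAIL %s: %s" % (name, exc))
        return False

if __name__ == "__main__":
    results = [check(name, url) for name, url in CHECKS.items()]
    raise SystemExit(0 if all(results) else 1)

Something along these lines could be run from outside the datacenter during the window to catch regressions in the rerouted setup early.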
And, explicitly, some things will NOT be available during that window:
* Staging. There will be no staging, so no rolling out new services.
* Full capacity/number of builders.
* External proxies in the new datacenter.
* HA for some services.
We are sending this announcement not only to let you all be aware of this move, but also to help us plan. If you see some service that you think is critical to Fedora, cannot be down for 2 weeks, and isn't listed above, please let us know so we can adjust our plans.
We want to make sure things that are critical keep running smoothly for the Fedora community.
Feedback by next Friday (2019-10-04) would be welcome.
Thanks,
Kevin for CPE and the Fedora Infrastructure team.
On 27. 09. 19 at 23:55, Kevin Fenzi wrote:
- Populate the new datacenter with new hardware to replace old hardware that
either wouldn’t survive the shipping or is due to be refreshed
In the past, I was in touch with people working in datacenters who moved several companies from one datacenter to another. They claimed that during transport a non-trivial number of servers did not survive the shipping. I do not recall the number; IIRC it was a "small percentage" (~1-2%). It may be worth asking people who have done similar migrations in the past and getting those numbers. And get pre-approved CAPEX for that, plus some spare servers as temporary replacements if something crucial dies.
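To put rough numbers on that, a back-of-the-envelope calculation like the sketch below may help when sizing the spare pool. The server count is a made-up example and the 1-2% figure is only the rate recalled above, not a measured number for this move:

# Back-of-the-envelope estimate of shipping losses and spares to budget for.
# The server count below is a hypothetical example; the 1-2% loss rate is the
# rough figure recalled above, not a real number for this migration.
from math import comb

def prob_at_most(k, n, p):
    """P(at most k of n servers fail), assuming independent failures with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def spares_needed(n, p, confidence=0.95):
    """Smallest spare count s such that P(failures <= s) >= confidence."""
    s = 0
    while prob_at_most(s, n, p) < confidence:
        s += 1
    return s

if __name__ == "__main__":
    n = 120  # assumed number of servers being shipped (hypothetical)
    for p in (0.01, 0.02):
        print("loss rate %.0f%%: expect ~%.1f dead servers, keep %d spares for 95%% coverage"
              % (p * 100, n * p, spares_needed(n, p)))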
Where does Communishift fit in here? I didn't see it mentioned.
Hi Ben,
Where does Communishift fit in here? I didn't see it mentioned.
Could you explain the meaning of the word "Communishift" in plain English? As I am not a native English speaker, I do not understand the actual meaning. Does it maybe refer to a kind of behavior?
On Mon, Sep 30, 2019 at 9:29 AM Jun Aruga jaruga@redhat.com wrote:
Could you explain the meaning of the word "Communishift" in plain English?
Communishift is the OpenShift cluster that Infra runs for community-run applications[1]. Its name is a portmanteau of "Community" and "OpenShift".
[1] https://fedoraproject.org/wiki/Infrastructure/Communishift
Could you explain the meaning of the word "Communishift" in plain English?
Communishift is the OpenShift cluster that Infra runs for community-run applications[1]. Its name is a portmanteau of "Community" and "OpenShift".
[1] https://fedoraproject.org/wiki/Infrastructure/Communishift
Thanks for the explanation. I understand it.
On Mon, Sep 30, 2019 at 10:07:24AM +0200, Miroslav Suchý wrote:
On 27. 09. 19 at 23:55, Kevin Fenzi wrote:
- Populate the new datacenter with new hardware to replace old hardware that
either wouldn’t survive the shipping or is due to be refreshed
In the past, I was in touch with people working in datacenters who moved several companies from one datacenter to another. They claimed that during transport a non-trivial number of servers did not survive the shipping. I do not recall the number; IIRC it was a "small percentage" (~1-2%). It may be worth asking people who have done similar migrations in the past and getting those numbers. And get pre-approved CAPEX for that, plus some spare servers as temporary replacements if something crucial dies.
An excellent idea, we will definitely do that.
I imagine we need to make sure that stuff is warrantied to its value, so if it breaks in shipping we can replace it. Or have some plan for this.
Thanks for the good feedback!
kevin
On Mon, Sep 30, 2019 at 09:02:48AM -0400, Ben Cotton wrote:
Where does Communishift fit in here? I didn't see it mentioned.
An excellent question!
Basically the answer is that it's still to be determined. We may just ship it before the main move, so it would be down for the shipping time plus racking/deracking, etc. It might be that we could just move it to another cluster in the cloud... in which case it would be unaffected by the move except for a small migration window at some point to move all the existing apps to the new cluster.
We will definitely share plans for that as they are finalized.
kevin
On Mon, Sep 30, 2019 at 7:55 PM Kevin Fenzi kevin@scrye.com wrote:
On Mon, Sep 30, 2019 at 10:07:24AM +0200, Miroslav Suchý wrote:
On 27. 09. 19 at 23:55, Kevin Fenzi wrote:
- Populate the new datacenter with new hardware to replace old hardware that
either wouldn’t survive the shipping or is due to be refreshed
In the past, I was in touch with people working in datacenters who moved several companies from one datacenter to another. They claimed that during transport a non-trivial number of servers did not survive the shipping. I do not recall the number; IIRC it was a "small percentage" (~1-2%). It may be worth asking people who have done similar migrations in the past and getting those numbers. And get pre-approved CAPEX for that, plus some spare servers as temporary replacements if something crucial dies.
An excellent idea, we will definitely do that.
I imagine we need to make sure that stuff is warrantied to its value, so if it breaks in shipping we can replace it. Or have some plan for this.
Warranty generally doesn't cover moves, but the DC shipping company should have insurance to cover breakages they cause, as mistakes happen.
On 01. 10. 19 at 9:53, Peter Robinson wrote:
Warranty generally doesn't cover moves, but the DC shipping company should have insurance to cover breakages they cause, as mistakes happen.
It is hard to file a claim if the package does not have any visible damage, and I guess you will not put an accelerometer in every package. It does not need to be the fault of the shipping company. E.g., it happens very often with a rotating disk that has worked for years: it operates fine as long as it keeps spinning, but the bearings become dry, and once you stop it, the motor does not have enough power to start the rotation again. Or - just recently - I had my hands on a computer where the bracket holding the cooler on the CPU had broken. Once I took the computer off the rack and started moving it, the cooler moved freely in the case. Previously this was not detected, as the cooler just sat on the CPU (without any pressure) and the CPU did not overheat because of the many fans in the case and the cool air in the datacenter.
So these are the cases which may - and at this scale will - happen.
On Tue, 1 Oct 2019 at 04:35, Miroslav Suchý msuchy@redhat.com wrote:
On 01. 10. 19 at 9:53, Peter Robinson wrote:
Warranty generally doesn't cover moves, but the DC shipping company should have insurance to cover breakages they cause, as mistakes happen.
It is hard to file a claim if the package does not have any visible damage, and I guess you will not put an accelerometer in every package. It does not need to be the fault of the shipping company. E.g., it happens very often with a rotating disk that has worked for years: it operates fine as long as it keeps spinning, but the bearings become dry, and once you stop it, the motor does not have enough power to start the rotation again. Or - just recently - I had my hands on a computer where the bracket holding the cooler on the CPU had broken. Once I took the computer off the rack and started moving it, the cooler moved freely in the case. Previously this was not detected, as the cooler just sat on the CPU (without any pressure) and the CPU did not overheat because of the many fans in the case and the cool air in the datacenter.
So these are the cases which may - and at this scale will - happen.
So parts of this are going to be outside of anything our team can deal with. The packing of the equipment, the shipping, the insurance, the dates in transit, etc. can be influenced by us asking (which we have), but in the end they will be owned by a different organization in Red Hat. Because of that, I would prefer we don't spend a lot of time coming up with all the scenarios which could cause problems, because there is nothing we can do about them. What we can do is work out what resources are needed for a minimal viable Fedora for 2 weeks, what services we can turn off for those 2 weeks or move to read-only in some form, and then focus on what services we could place elsewhere and how we could do that.