2020 Datacenter Move: Request for comments - infrastructure - Fedora mailing-lists

27 Sep 2019


      Greetings,
Fedora Infrastructure currently has the majority of its hardware in a 
datacenter in Arizona, USA.  Red Hat leases this space for use by a number of 
teams, including Fedora. However, they've been seeking a more modern and cost
effective location for some time and have decided on one: 
So, we will be migrating to a new datacenter located in Ashburn, Virginia
in 2020.
FESCo has approved a 2 week window for the actual move to take place 
( https://pagure.io/fesco/issue/2221 ): 2020-06-01 to 2020-06-15. 
This window is after Fedora 32 is released, but before any major
Fedora 33 Milestones.
At a high level, our current plan is:
* Setup the new datacenter with networking/storage/management
* Populate the new datacenter with new hardware to replace old hardware that 
either wouldn’t survive the shipping or is due to be refreshed
* Ship some small shipment of hardware from the old datacenter to the new
that are not easily duplicated like signing hardware,
alternative arch builders, etc.
* Setup and have by the early part of the outage window a 
Minimum Viable Fedora Infrastructure (see below) using new hardware
and some old.
* Function in this minimal state as all the rest of the hardware is
shipped to the new datacenter.
* Re-add hardware to return to normal state.
We want to maintain continuity of service as best we can,
so we have defined a Minimal Viable Fedora which will move in advance
of the main hardware. Our intention is to reroute traffic to this setup
before moving the bulk of our hardware.
Our current list of what a Minimum Viable Fedora Infrastructure is:
* Mirroring fully functional. Users get metalinks, mirrors are crawled, etc
* The complete package lifecycle must work.
From commit to update installed on users machines.
We need this to push security and important bugfixes as well as to allow
maintainers to work toward Fedora 33.
* Our production openshift cluster must be up and running normally.
(This cluster has fas, bodhi and other important items in it)
* Builders will likely be constrained.
Ie, less of most arches.
Capacity will be re-added as soon as the hardware for it arrives.
* Rawhide composes take place as normal.
* Nameservers functional
* rabbitmq/fedora-messaging should be up and functional.
* Internal proxies must be functional (used by builders and other internal items)
* Mailing lists must be functional
* Backups must be functional
* OpenQA must be available to test updates/rawhide composes
* Wiki must be available for common bugs / qa
Other services not listed may or may not be up depending on capacity
and issues with more important services.
And explicitly some things will NOT be available during that window:
* Staging. There will be no staging, so no rolling out new services.
* Full capacity/number of builders
* External proxies in the new datacenter
* HA for some services.
We are sending this announcement not only to let you all be aware of this move, 
but to help us plan. If you see some service that you think is critical
to Fedora and cannot be down for 2 weeks, and isn't listed above
please let us know so we can adjust our plans.
We want to make sure things that are critical keep running
smoothly for the Fedora community.
Feedback by next friday (2019-10-04) would be welcome.
Thanks,
Kevin for CPE and the Fedora Infrastructure team.