New subject: InstantMirror Redesign / current best /simplest way to achieve

10 Mar 2008


      Hi all!
InstantMirror currently works fine but involves a lot of configuration and
that might actually stop someone from use it. I came across this project
recently and found it really interesting. After reading the discussions on
fedora-devel, warren's journal, InstantMirror wiki, I thought that I would
come up with a redesign that would be more yum friendly and will involve
minimal configuration for setup. Below is my point of view of new
InstantMirror design.
Note: You can access the design proposal here
(http://iyum.saini.co.in/index.php/InstantMirror) also.
Redesign Proposal
*****************
    Basically the most obvious use case of InstantMirror is to be used by Yum
for updates. I propose to develop InstantMirror keeping in mind that it
should integrate with Yum.  In the new design I am using not multiple
programs to get the job done as proposed by warren here
(https://fedorahosted.org/InstantMirror/wiki/InstantMirrorDaemon). There
will be only one daemon that will continuously listen to requests and if
necessary it will fork itself. Below is the method of operation for the
daemon.
Method of Operation
*******************
1. InstantMirror gets a client request for a URL.
2. Check: if URL is not in (RPM, metadata file)
    * Then its none of our business.
    * Let proxy handle it the normal way.
    * Done and exit.
3. Error Check: if remote host is not reachable
        * Check: if RPM/metadata is available in cache
            1. Stream the RPM/metadata from cache.
            2. Done and exit.
        * else
            1. Throw a "No route to host" error.
            2. Done and exit.
4. Check: if RPM/metadata is available in cache
        * Check: if RPM/metadata in cache is older than upstream
            1. Delete RPM/metadata from cache.
            2. Download and stream.
            3. Done and exit.
        * Check: if RPM/metadata matches upstream or newer than upstream
            1. Stream the RPM/metadata from cache.
            2. Done and exit.
        * Check: if RPM/metadata does not exsit upstream
            1. Delete RPM/metadata from cache.
            2. Throw a "Not found" error.
            3. Done and exit.
5. Check: if RPM/metadata is not available in cache
        * Download and stream.
        * Done and exit.
Download Process
****************
    In the above operation everything is clear except the download process. If
a file is already being downloaded from upstream and another client
request come in for the same file, then we have two options to continue
downloading
    1. Download via only first instance (master) and let the other instances
(slaves) copy the partial content to the client. The disadvantage is that
the slaves will be throttled by the master's download speed.
    2. The other instance also starts downloading from upstream and append
data to the local file. Stream the data to clients when download is
finished. This is quite complicated and as of now I don't even know how to
do it or even if its feasible to do it.
I am currently trying to get the hang of downloading process.
The above design is more or less same as the previous design with may be a
bit of improvements. Now, we have decided to have two types of
InstantMirror.
1. InstantMirror to be used by a small group of people. In this case we
have to get rid of all the dependencies like squid, apache because for a
small setup nobody is going to configure squid and apache. So, we use a
proxy server implemented in python for this kind of setup and integrate
InstantMirror with it in caching mode. So that it becomes easy to setup
and don't require squid, apache or whatever else.
    2. InstantMirror to be used by an organization. As almost all the
organization ( i am focusing more on institutes/universities here) use a
common proxy server to access the Internet, we will have the InstantMirror
which can be integrated with squid. There will be no difficulty in setup
as the people already use squid (assuming squid is widely used in
Unix/Linux world) and know how to configure it. We can't use proxy server
implemented in python here because no organization would ever agree to use
a stripped down version of proxy server instead of squid.
If the above sounds interesting, then I also propose to build a Yum plugin
(say Yum Client) which will interact with InstantMirror sitting near the
proxy server. Yum Client will periodically update the InstantMirror with
the information like which packages are very frequently updated by user
(kernel, kde, yum), which packages are never updated or updated rarely
(vim). Using this information InstantMirror will optimize the download
queue and using some very simple techniques for queuing a download etc.,
we can optimize the bandwidth usage.
Imagine a university with thousands of Linux users and everyone is
updating their system weekly. GBs of bandwidth is being wasted every week
due to subsequent downloads of the same package.
If you have any suggestions for improvements, comments on the current
design or you want to criticize the design, please reply back. They would
really help me to improve.
InstantMirror - https://fedorahosted.org/InstantMirror/wiki
InstantMirrorDaemon -
https://fedorahosted.org/InstantMirror/wiki/InstantMirrorDaemon
InstantMirror needs a rethink -
http://www.redhat.com/archives/rhl-devel-list/2008-January/msg02341.html
Warren Togami's Journal - http://wtogami.livejournal.com/20536.html
PS : This is my first RFC, if i wrote it badly please forgive me :)
-------------------------------------------------------
Thank you,
Kulbir Saini,
Computer Science and Engineering,
International Institute of Information Technology,
Hyderbad, India - 500032.
My Home-Page: http://saini.co.in/
My Institute: http://www.iiit.ac.in/
My Linux-Blog: http://linux.saini.co.in/
IRC nick : generalBordeaux
Channels : #fedora, #fedora-devel, #yum on freenode
-------------------------------------------------------

[RFC]InstantMirror Redesign