Re: Future of fedora-packages

28 Feb 2019

On Thu, 28 Feb 2019 at 00:06, Stephen John Smoogen smooge@gmail.com wrote:
...
On Wed, 27 Feb 2019 at 16:05, Jim Perrin jperrin@redhat.com wrote:
...
How much heresy is involved in us using Amazon's elasticsearch service
for this, so that we don't have yet-another-thing to maintain?
I was wondering how much data are we looking to shove there, does that
data need to be 'protected', and how fast do we need it to be for us
to talk back and forth to the cloud. The heresy side I don't have any
say in..
For fedora-packages we want to store documents that contain package
information (see the current structure used
https://github.com/fedora-infra/fedora-packages/blob/master/fedoracommunity/...).
Currently in production we have 23849 documents in the xapian database
so I honestly don't think that will be much trouble for elasticsearch.
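As a rough illustration of what loading such documents could look like, here is a minimal sketch that builds the newline-delimited payload accepted by the Elasticsearch bulk endpoint. The field names, the `packages` index name and the sample packages are assumptions made for the example; they are not taken from the fedora-packages structure linked above. It also assumes an Elasticsearch version where mapping types are not required in the action line.

```python
import json

# Hypothetical package documents; the field names are illustrative only and
# are not taken from the fedora-packages schema.
packages = [
    {"name": "guake", "summary": "Drop-down terminal for GNOME"},
    {"name": "vim-enhanced", "summary": "A version of the VIM editor"},
]


def bulk_body(docs, index="packages"):
    """Build the newline-delimited JSON payload for the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index this document, using the package name as its ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["name"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(packages)
```

With roughly 24k small documents, a payload like this could be sent in a handful of batches rather than one request per package.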
Writing to the cluster should be restricted, and I think the search
service should be public. Elasticsearch provides Security Privileges
(https://www.elastic.co/guide/en/x-pack/current/security-privileges.html)
that seem to fit that idea.
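To make the restricted-write / public-read split concrete, here is a small sketch of two role definitions expressed as plain Python dictionaries, in the shape the Elasticsearch security role API accepts. The role names and the `packages` index name are hypothetical; `read`, `write` and `create_index` are index privileges described in the documentation linked above. How the roles would be attached to users or to anonymous access is left out and would depend on the deployment.

```python
import json


def read_only_role(index_pattern):
    """Role body granting only read (search/get) access to matching indices."""
    return {
        "indices": [
            {
                "names": [index_pattern],
                # "read" allows searching and fetching, but not indexing.
                "privileges": ["read"],
            }
        ]
    }


def indexer_role(index_pattern):
    """Role body for the restricted account that writes to the cluster."""
    return {
        "indices": [
            {
                "names": [index_pattern],
                # "write" covers index/update/delete and bulk operations.
                "privileges": ["create_index", "write"],
            }
        ]
    }


# Hypothetical role names, for illustration only.
roles = {
    "packages_search": read_only_role("packages"),
    "packages_indexer": indexer_role("packages"),
}
roles_json = json.dumps(roles, indent=2)
```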
Indexing does not have to be crazy fast. For example, fedora-packages
indexing currently takes between 2 and 3 hours, so I don't think
network latency will matter much here. Searching is a bit more
sensitive, since users usually don't want to wait more than a second
or so to get search results, but if we use the elasticsearch
javascript library
(https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/...)
and handle the search in the frontend then it does not have to go via
our infrastructure.
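For reference, the search request itself is just a JSON body, so it looks the same whichever client sends it. Below is a minimal sketch (in Python, only to build and inspect the body) of a `multi_match` query a frontend could submit. The field names and the boost on `name` are assumptions for illustration, not the actual fedora-packages fields.

```python
import json


def package_search_query(text, size=10):
    """Build a full-text search body over hypothetical package fields."""
    return {
        # Limit the number of hits returned to keep responses small.
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # "name^3" boosts matches on the package name over the
                # other (assumed) fields.
                "fields": ["name^3", "summary", "description"],
            }
        },
    }


query = package_search_query("terminal emulator")
query_json = json.dumps(query)
```

The same body could be passed to the search call of the javascript client, or posted to the index's `_search` endpoint.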
...
...
On 2/27/19 4:19 AM, Stephen John Smoogen wrote:
...
On Tue, 26 Feb 2019 at 14:39, Clement Verna cverna@fedoraproject.org wrote:
...
Hi all,
The fedora-packages [0] code base is showing its age. The code base and
the technology stack (Turbogears2 [1] web framework and the Moksha
[2] middleware) are currently not ready for Python3 and I am not
planning to do the work required to make them Python3 compatible, so the
application will stop working when Fedora 29 is EOL.
In order to keep the service running, I have started a Proof Of
Concept (fedora-search [3]) to replace the backend of the application.
Fedora-search would be a REST API service offering a full-text search
API. Such a service would then be available for other applications to
use, and fedora-packages would become a frontend-only application
using the service provided by fedora-search.
While the POC shows that this is a viable solution, I don't think that
we should be proceeding that way, for the simple reason that this adds
yet another code base to maintain. I think we should use this
opportunity to consider using Elasticsearch instead of maintaining our
own "search engine".
The main issues with getting elasticsearch working in the past were the following:
1. The number of systems needed to make it work. There is a large
difference between their 'proof-of-concept see how great this is' and 'ok
you want to do anything with load' setups in everything from storage
to number of search nodes to network speeds. [The amount of hardware
for the data we have was to start with 5-8 dedicated Dell systems,
some amount of shared fast storage, and N virtual machines with a
10-40GB backbone.. or throwing all of Fedora Infrastructure at once
into the cloud.. because feeding it from PHX2 to the cloud is
expensive.]
2. Packaging of elasticsearch was a mess. At the time we had rules
that all packages needed to be packaged in Fedora and follow Fedora
packaging rules. [This one has been relaxed.]
3. Running elasticsearch was a large service in itself. It doesn't
take care of itself and we would need one or more people who know it
well to keep it running. [This goes down the ladder.. the logstash
backends are also full services..] Most of that was written in Java,
which no one on the team at the time had good experience with.
4. A kibana/elasticsearch query expert. Just like any database, most
of the queries you can make are the worst kind. They will take a lot
more CPU/memory/time than they should, making just grepping for data
faster.
However, that was 3-5 years ago, so a lot has changed since then.
...
I think that Elasticsearch offers quite a few advantages:

Powerful Query language
Python bindings
Javascript bindings
Can be deployed in our infrastructure or used as a service
Can be useful for other applications (docs.fp.o, pagure, ??)

So what is the general feeling about using Elasticsearch in our
infrastructure? Should we look at deploying a cluster in our infra, or
should we approach the Council to see if we can get funding to have
this service hosted by Elastic?
Thanks
Clément
[0] - https://apps.fedoraproject.org/packages/
[1] - http://www.turbogears.org/
[2] - https://mokshaproject.github.io/mokshaproject.net/
[3] - https://github.com/fedora-infra/fedora-search
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...

--
Stephen J Smoogen.
