On Thu, 28 Feb 2019 at 00:06, Stephen John Smoogen <smooge@gmail.com> wrote:
On Wed, 27 Feb 2019 at 16:05, Jim Perrin <jperrin@redhat.com> wrote:
How much heresy is involved in us using Amazon's Elasticsearch service for this, so that we don't have yet-another-thing to maintain?
I was wondering how much data we are looking to shove there, whether that data needs to be 'protected', and how fast we need it to be for us to talk back and forth to the cloud. The heresy side I don't have any say in.
For fedora-packages we want to store documents that contain package information (see the current structure used in https://github.com/fedora-infra/fedora-packages/blob/master/fedoracommunity/...). Currently in production we have 23849 documents in the xapian database, so I honestly don't think that will be much trouble for Elasticsearch. Writing to the cluster should be restricted and I think the search service should be public; Elasticsearch provides Security Privileges (https://www.elastic.co/guide/en/x-pack/current/security-privileges.html) that seem to fit that idea.
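As a rough illustration of the two points above (bulk-loading the package documents, and separating restricted write access from public read access), here is a minimal Python sketch. It only builds request bodies and does not talk to a cluster. The index name `packages`, the document fields, and the role contents are hypothetical placeholders, not the actual fedora-packages schema, and the Bulk API action lines assume a typeless (7.x-style) index.

```python
import json
from typing import Iterable, Iterator

# Hypothetical index name: the thread does not specify one.
INDEX = "packages"


def bulk_lines(docs: Iterable[dict]) -> Iterator[str]:
    """Yield Bulk API NDJSON lines: an action line, then a source line, per document."""
    for doc in docs:
        # Using the package name as the document _id means a re-index run
        # overwrites existing documents instead of duplicating them.
        yield json.dumps({"index": {"_index": INDEX, "_id": doc["name"]}})
        yield json.dumps(doc)


def bulk_body(docs: Iterable[dict]) -> str:
    """Join the lines; the Bulk API requires a newline after the last line too."""
    return "".join(line + "\n" for line in bulk_lines(docs))


# Hypothetical role bodies for the Elasticsearch security role API
# (the endpoint path differs between Elasticsearch versions).
# Indexer: may create the index and write documents into it.
INDEXER_ROLE = {
    "indices": [{"names": [INDEX], "privileges": ["create_index", "write"]}],
}

# Public search: read-only access to the same index.
SEARCH_ROLE = {
    "indices": [{"names": [INDEX], "privileges": ["read"]}],
}


if __name__ == "__main__":
    sample = [
        {"name": "python3", "summary": "Version 3 of the Python interpreter"},
        {"name": "bash", "summary": "The GNU Bourne Again shell"},
    ]
    print(bulk_body(sample), end="")
```

In this sketch the indexing job would POST the output of `bulk_body()` to the `_bulk` endpoint using credentials with the indexer role, while anything public-facing would only hold the read-only role.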
Indexing does not have to be crazy fast: for example, fedora-packages indexing currently takes between 2 and 3 hours, so I don't think network latency will matter much here. Searching is a bit more sensitive, since users usually don't want to wait more than a second or so for search results, but if we use the Elasticsearch JavaScript library (https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/...) and handle the search in the frontend, then it does not have to go via our infrastructure.
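For the frontend search path, what the JavaScript client sends is essentially a JSON query body. A minimal Python sketch of such a body, assuming a `multi_match` full-text query over hypothetical `name`, `summary` and `description` fields with made-up boosts (the real fields would come from the existing document structure):

```python
import json

# Hypothetical field names and boosts ("^3" multiplies that field's relevance score).
SEARCH_FIELDS = ["name^3", "summary^2", "description"]


def search_body(text: str, size: int = 10) -> dict:
    """Build a _search request body for a free-text package search."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": SEARCH_FIELDS,
            }
        },
        # Return only the fields a results list needs, keeping responses small.
        "_source": ["name", "summary"],
    }


if __name__ == "__main__":
    # A browser-side client would send this same JSON as its search request body.
    print(json.dumps(search_body("python"), indent=2))
```

Because the body is plain JSON, the same query could be built in the browser or in a Python backend; the only thing that changes is which credentials (ideally read-only) are used to send it.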
On 2/27/19 4:19 AM, Stephen John Smoogen wrote:
On Tue, 26 Feb 2019 at 14:39, Clement Verna <cverna@fedoraproject.org> wrote:
Hi all,
The fedora-packages [0] code base is showing its age. The code base and the technology stack (the Turbogears2 [1] web framework and the Moksha [2] middleware) are currently not ready for Python 3, and I am not planning to do the work required to make them Python 3 compatible, so the application will stop working when Fedora 29 is EOL.
In order to keep the service running, I have started a proof of concept (POC), fedora-search [3], to replace the backend of the application. fedora-search would be a REST API service offering a full text search API. Such a service would then be available for other applications to use, and fedora-packages would become a frontend-only application using the service provided by fedora-search.
While the POC shows that this is a viable solution, I don't think that we should proceed that way, for the simple reason that it adds yet another code base to maintain. I think we should use this opportunity to consider using Elasticsearch instead of maintaining our own "search engine".
The main issues with getting Elasticsearch working in the past were the following:
- The number of systems needed to make it work. There is a large difference between their 'proof-of-concept, see how great this is' setups and 'OK, you want to do anything with load' setups, in everything from storage to number of search nodes to network speeds. [The hardware needed for the data we have was, to start with, 5-8 dedicated Dell systems, some amount of shared fast storage, and N virtual machines with a 10-40GB backbone... or throwing all of Fedora Infrastructure into the cloud at once, because feeding it from PHX2 to the cloud is expensive.]
- Packaging of Elasticsearch was a mess. At the time we had rules that all packages needed to be packaged in Fedora and follow Fedora packaging rules. [This one has been relaxed.]
- Running Elasticsearch was a large service in itself. It doesn't take care of itself, and we would need one or more people who know it well to keep it running. [This goes down the ladder: the logstash backends are also full services.] Most of that was written in Java, which no one on the team at the time had good experiences with.
- The need for a Kibana/Elasticsearch query expert. Just like with any database, most of the queries you can make are the worst kind. They take a lot more CPU/memory/time than they should, making plain grepping for the data faster.
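To make the "worst kind of query" point concrete: Elasticsearch's own documentation warns that wildcard patterns beginning with `*` or `?` can slow searches down, because they increase the iterations needed to find matching terms. The sketch below contrasts such a query with a `match` query and adds a tiny check that flags leading wildcards. Field names are hypothetical (`name` assumed to be a keyword field, `summary` an analyzed text field), and the two queries do not return the same results; the point is only the cost difference.

```python
def substring_query(text: str) -> dict:
    # Leading "*": matching terms cannot be found by a direct lookup,
    # which the Elasticsearch docs warn can slow searches down.
    return {"query": {"wildcard": {"name": {"value": f"*{text}*"}}}}


def match_query(text: str) -> dict:
    # Match query: the input is analyzed into terms that are looked up directly.
    return {"query": {"match": {"summary": text}}}


def _pattern(spec) -> str:
    # Wildcard queries accept either {"field": "pat"} or {"field": {"value": "pat"}}.
    return spec.get("value", "") if isinstance(spec, dict) else str(spec)


def has_leading_wildcard(body: dict) -> bool:
    """Flag a top-level wildcard query whose pattern starts with * or ? (simple lint)."""
    wildcard = body.get("query", {}).get("wildcard", {})
    return any(_pattern(spec).startswith(("*", "?")) for spec in wildcard.values())


if __name__ == "__main__":
    for body in (substring_query("py"), match_query("py")):
        print(body, "-> leading wildcard:", has_leading_wildcard(body))
```

A check like this only looks at the top-level query; a real review of expensive queries would need to walk nested `bool` clauses as well.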
However, that was 3-5 years ago, so a lot has changed since then.
I think that Elasticsearch offers quite a few advantages:
- Powerful Query language
- Python bindings
- Javascript bindings
- Can be deployed in our infrastructure or used as a service
- Can be useful for other applications ( docs.fp.o, pagure, ??)
So what is the general feeling about using Elasticsearch in our infrastructure? Should we look at deploying a cluster in our infra, or should we approach the Council to see if we can get funding to have this service hosted by Elastic?
Thanks,
Clément
[0] - https://apps.fedoraproject.org/packages/
[1] - http://www.turbogears.org/
[2] - https://mokshaproject.github.io/mokshaproject.net/
[3] - https://github.com/fedora-infra/fedora-search
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...
-- Stephen J Smoogen.