I've had multiple friends & colleagues who have fallen prey to our occupational hazard of RSI. Some ten years ago or more, I worked on integrating IBM's ViaVoice for Linux with XEmacs for the benefit of a couple of XEmacs developers who were suffering from that problem at the time. Unfortunately, IBM dropped support for its Linux product while I was still working on the integration.
I'm thinking about working on something like that again, but need a voice recognition product to work with. I've been looking at CMU Sphinx [1] as a possibility, and have made test RPMs out of a few pieces of that project [2,3,4].
Questions for the list:
Is anybody already working on packaging up some voice recognition product?
Does anybody know of anything more accessible than CMU Sphinx (it appears to be very powerful, but is definitely not for newbies; also, not all pieces of it have been released yet)?
Is anybody interested in working with me on getting some voice recognition product packaged up in usable form on Fedora?
References:
[1] http://www.cmusphinx.org/
[2] http://jjames.fedorapeople.org/sphinxbase/
[3] http://jjames.fedorapeople.org/pocketsphinx/
[4] http://jjames.fedorapeople.org/SphinxTrain/
--
Jerry James
http://loganjerry.googlepages.com/
Jerry James wrote:
Is anybody interested in working with me on getting some voice recognition product packaged up in usable form on Fedora?
I'm happy to help - we have someone here in a similar position. But I have no leads on good software.
On Wed, Mar 25, 2009 at 04:44:20PM -0600, Jerry James wrote:
Is anybody interested in working with me on getting some voice recognition product packaged up in usable form on Fedora?
For speech recognition, software is only part of the problem and, fundamentally, the easiest one (take the algorithms, implement them, optimize/debug at will). The real problem is the data needed to build the models to feed the algorithms. There isn't, as far as I know, any reasonable set of corpora available under an open source license usable to build a decent speech recognizer. Which makes open source speech recognition something not doable yet.
OG.
On Wed, Mar 25, 2009 at 6:17 PM, Olivier Galibert galibert@pobox.com wrote:
For speech recognition, software is only part of the problem and, fundamentally, the easiest one (take the algorithms, implement them, optimize/debug at will). The real problem is the data needed to build the models to feed the algorithms. There isn't, as far as I know, any reasonable set of corpora available under an open source license usable to build a decent speech recognizer. Which makes open source speech recognition something not doable yet.
There are some small databases available [1], although admittedly too small for accurate general purpose use. There are some models available [2], built from databases which are not themselves redistributable. There are also a number of model-building tools available [3-5], which may be sufficient for small command-and-control tasks.
But you are right. For general-purpose voice recognition, we don't have the data we need. Still, I think it may be worth putting the software in place so that those who wish to purchase licenses to commercial data have everything else they need, and to encourage the production of better quality free data [6].
References:
[1] http://www.speech.cs.cmu.edu/databases/
[2] http://www.speech.cs.cmu.edu/sphinx/models/
[3] http://www.speech.sri.com/projects/srilm/
[4] http://cmusphinx.sourceforge.net/html/download.php#SphinxTrain
[5] http://cmusphinx.sourceforge.net/html/download.php/#cmulclmtk
[6] http://www.voxforge.org/
Hi all,
Jerry James loganjerry@gmail.com wrote on 09/03/26 07:16:47:
On Wed, Mar 25, 2009 at 6:17 PM, Olivier Galibert galibert@pobox.com wrote:
For speech recognition, software is only part of the problem and, fundamentally, the easiest one (take the algorithms, implement them, optimize/debug at will). The real problem is the data needed to build the models to feed the algorithms. There isn't, as far as I know, any reasonable set of corpora available under an open source license usable to build a decent speech recognizer. Which makes open source speech recognition something not doable yet.
There are some small databases available [1], although admittedly too small for accurate general purpose use. There are some models available [2], built from databases which are not themselves redistributable. There are also a number of model-building tools available [3-5], which may be sufficient for small command-and-control tasks.
But you are right. For general-purpose voice recognition, we don't have the data we need. Still, I think it may be worth putting the software in place so that those who wish to purchase licenses to commercial data have everything else they need, and to encourage the production of better quality free data [6].
I also think that making the software available has a considerable effect in encouraging people to generate free data. When speech recognition software is available, the community will be encouraged to generate data to make it more robust. But people will not generate the required data unless they can already use the software!
I don't have much free time, but I'm interested in helping you in this direction if I can. :)
Good luck, Hedayat
On Wed, Apr 8, 2009 at 8:24 AM, Hedayat Vatankhah hedayat@grad.com wrote:
I also think that making the software available has a considerable effect in encouraging people to generate free data. When speech recognition software is available, the community will be encouraged to generate data to make it more robust. But people will not generate the required data unless they can already use the software!
I don't have much free time, but I'm interested in helping you in this direction if I can. :)
Good luck, Hedayat
Thanks! To be honest, the biggest problem I am having right now is sorting through the licenses attached (or not!) to the various pieces of software I'm considering. I'm finding code with no clear license, code with open source licenses that depends on commercial libraries, code with licenses that include "no commercial use" clauses, etc. I'm still hopeful I can come up with a usable set of software by the time I'm done....
I'm also firing off emails as I encounter license problems, to let people know why I will NOT be using their software. Whether that will accomplish anything remains to be seen.
Olivier Galibert wrote:
On Wed, Mar 25, 2009 at 04:44:20PM -0600, Jerry James wrote:
Is anybody interested in working with me on getting some voice recognition product packaged up in usable form on Fedora?
For speech recognition, software is only part of the problem and, fundamentally, the easiest one (take the algorithms, implement them, optimize/debug at will). The real problem is the data needed to build the models to feed the algorithms. There isn't, as far as I know, any reasonable set of corpora available under an open source license usable to build a decent speech recognizer. Which makes open source speech recognition something not doable yet.
OG.
(I'm sorry for cross-posting to fedora-legal)
Well, the most interesting question here for me is what about licensing such language models -- could they be considered to be firmware (redistributable, not modifiable)?
This is also important because of their size (shipping 1G+ corpora, even compressed, is probably not the right way to go).
Regards, Milos
Milos Jakubicek wrote:
Olivier Galibert wrote:
On Wed, Mar 25, 2009 at 04:44:20PM -0600, Jerry James wrote:
Is anybody interested in working with me on getting some voice recognition product packaged up in usable form on Fedora?
For speech recognition, software is only part of the problem and, fundamentally, the easiest one (take the algorithms, implement them, optimize/debug at will). The real problem is the data needed to build the models to feed the algorithms. There isn't, as far as I know, any reasonable set of corpora available under an open source license usable to build a decent speech recognizer. Which makes open source speech recognition something not doable yet.
OG.
(I'm sorry for cross-posting to fedora-legal)
Well, the most interesting question here for me is what about licensing such language models -- could they be considered to be firmware (redistributable, not modifiable)?
No. They are not firmware and cannot be considered as one.
Rahul
Rahul Sundaram wrote:
No. They are not firmware and cannot be considered as one.
They are not firmware, but are they "content"? Non-code "content", e.g. game data, is allowed under the same rules as firmware. On the other hand, this does not apply for things like fonts or documentation.
Kevin Kofler
On Fri, Mar 27, 2009 at 7:03 PM, Kevin Kofler kevin.kofler@chello.at wrote:
Rahul Sundaram wrote:
No. They are not firmware and cannot be considered as one.
They are not firmware, but are they "content"? Non-code "content", e.g. game data, is allowed under the same rules as firmware. On the other hand, this does not apply for things like fonts or documentation.
Kevin Kofler
On the other hand, speech recognition requires a database of content to help it deal with voice and translate it into text or commands to execute. This is a similar enough dependency to games that I figure it would fall under the same jurisdiction. (IANAL)
Kevin Kofler wrote:
Rahul Sundaram wrote:
No. They are not firmware and cannot be considered as one.
They are not firmware, but are they "content"? Non-code "content", e.g. game data, is allowed under the same rules as firmware. On the other hand, this does not apply for things like fonts or documentation.
Yes, yes, that's what I meant by "considering to be firmware"... thank you, Kevin, for the clarification.
What's the legal difference between game data and speech recognition data (or, in general, any scientific application's data)?
This is a common situation in many scientific apps (especially natural language processing): the code is not important (and is thus released under the GPL or whatever else); what matters is the data (which, if released at all, is released only in binary form).
Regards, Milos
Kevin Kofler wrote:
Rahul Sundaram wrote:
No. They are not firmware and cannot be considered as one.
They are not firmware, but are they "content"? Non-code "content", e.g. game data, is allowed under the same rules as firmware. On the other hand, this does not apply for things like fonts or documentation.
I am not sure any games are carrying non-modifiable content. Which ones are you talking about?
Rahul
Kevin Kofler wrote:
Rahul Sundaram wrote:
I am not sure any games are carrying non-modifiable content. Which ones are you talking about?
Plenty of them. I forgot which ones exactly. Just check a few of them and I'm sure you'll find some. Or ask the Games SIG.
Ah yes, we do have some content which is merely redistributable.
Rahul
Hi,
Does anybody know of anything more accessible than CMU Sphinx (it appears to be very powerful, but is definitely not for newbies; also, not all pieces of it have been released yet)?
I've also been working with Julius, which also uses acoustic models from VoxForge. http://julius.sourceforge.jp/en_index.php
- fabiand
Fabian Deutsch wrote:
I've also been working with Julius, which also uses acoustic models from VoxForge. http://julius.sourceforge.jp/en_index.php
Unfortunately, they're using a custom license whose freeness is unclear. In particular, this clause worries me:
- When you publish or present any results by using the Software, you must explicitly mention your use of "Large Vocabulary Continuous Speech Recognition Engine Julius".
Kevin Kofler
On 03/27/2009 08:14 PM, Kevin Kofler wrote:
Fabian Deutsch wrote:
I've also been working with Julius, which also uses acoustic models from VoxForge. http://julius.sourceforge.jp/en_index.php
Unfortunately, they're using a custom license whose freeness is unclear. In particular, this clause worries me:
- When you publish or present any results by using the Software, you must explicitly mention your use of "Large Vocabulary Continuous Speech Recognition Engine Julius".
I'm pretty sure this is fine. At worst, it would make it GPL incompatible.
~spot