Olivier Galibert wrote:
On Wed, Mar 25, 2009 at 04:44:20PM -0600, Jerry James wrote:
Is anybody interested in working with me on getting some voice recognition product packaged up in usable form on Fedora?
For speech recognition, software is only part of the problem and, fundamentally, the easiest part (take the algorithms, implement them, optimize/debug at will). The real problem is the data needed to build the models that feed the algorithms. As far as I know, there is no reasonable set of corpora available under an open source license that could be used to build a decent speech recognizer, which makes open source speech recognition not doable yet.
OG.
(I'm sorry for cross-posting to fedora-legal)
Well, the most interesting question here for me is what about licensing such language models -- could they be considered to be firmware (redistributable, not modifiable)?
This is also important because of their size (shipping 1G+ corpora, even compressed, is probably not the right way to go).
Regards, Milos
Milos Jakubicek wrote:
Well, the most interesting question here for me is what about licensing such language models -- could they be considered to be firmware (redistributable, not modifiable)?
No. They are not firmware and cannot be considered as one.
Rahul
Rahul Sundaram wrote:
No. They are not firmware and cannot be considered as one.
They are not firmware, but are they "content"? Non-code "content", e.g. game data, is allowed under the same rules as firmware. On the other hand, this does not apply to things like fonts or documentation.
Kevin Kofler
Kevin Kofler wrote:
Rahul Sundaram wrote:
No. They are not firmware and cannot be considered as one.
They are not firmware, but are they "content"? Non-code "content", e.g. game data, is allowed under the same rules as firmware. On the other hand, this does not apply to things like fonts or documentation.
Yes, that's what I meant by "considered to be firmware". Thank you, Kevin, for the clarification.
What's the (legal) difference between game data and speech recognition data (or, more generally, the data of any scientific application)?
This is a common situation in many scientific apps (especially in natural language processing): the code is not what matters (and is thus released under the GPL or some other license), while the data is (and is thus released, if at all, only in binary form).
Regards, Milos
Kevin Kofler wrote:
Rahul Sundaram wrote:
No. They are not firmware and cannot be considered as one.
They are not firmware, but are they "content"? Non-code "content", e.g. game data, is allowed under the same rules as firmware. On the other hand, this does not apply to things like fonts or documentation.
I am not sure any games are carrying non-modifiable content. Which ones are you talking about?
Rahul
Rahul Sundaram wrote:
I am not sure any games are carrying non-modifiable content. Which ones are you talking about?
Plenty of them. I forgot which ones exactly. Just check a few of them and I'm sure you'll find some. Or ask the Games SIG.
Kevin Kofler
Kevin Kofler wrote:
Rahul Sundaram wrote:
I am not sure any games are carrying non-modifiable content. Which ones are you talking about?
Plenty of them. I forgot which ones exactly. Just check a few of them and I'm sure you'll find some. Or ask the Games SIG.
Ah yes, we do have some content that is redistributable only.
Rahul