Are there any free/open-source TTS options out there that are on the same level as Google Cloud’s? I tried a lot of free ones, but they are absolutely awful and still sound like my Amiga did 30 years ago. With LLMs being available as open source, I am hoping there’s also a good TTS offering I just haven’t found yet.

  • NarrativeBear@lemmy.world
    1 year ago

    Balabolka was/is my go-to for TTS. It can also save the output as audio files for later if you need that. I’ve used it to make plenty of audiobooks in the past.

  • tal@lemmy.today
    1 year ago

    Festival is not cutting edge, but it will definitely be better than your Amiga, and it can handle long text. Last time I set it up, IIRC, I wanted some voices produced by Tokyo University or something, which took some extra work. It’s probably packaged in your Linux distro.

    You can listen to a demo here.

    https://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html

    It’s not LLM-based.

    For short snippets, offline, one can use Tortoise TTS, which is LLM-based. But it’s slow and can only generate clips of a limited length, so whether it’s reasonable for you will depend a lot on your application. It can also clone a voice, or at least produce one that sounds more or less similar, from some sound samples of the person speaking.

    https://github.com/neonbjb/tortoise-tts

    Examples at:

    https://nonint.com/static/tortoise_v2_examples.html
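One common way to work around a per-clip length limit like this is to split long text at sentence boundaries and synthesize each piece separately. The following is only a minimal sketch of that idea: the `chunk_text` helper and the 200-character default are assumptions for illustration, not anything taken from Tortoise itself, and the actual TTS call is left out.

```python
import re


def chunk_text(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper: max_chars=200 is an arbitrary illustrative limit,
    not a documented Tortoise value. A single sentence longer than
    max_chars is kept as its own (oversized) chunk rather than cut mid-word.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            # Adding this sentence would overflow: close the chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be passed to whichever TTS engine is in use, and the resulting clips concatenated in order.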

    I haven’t used Google’s, but I’d assume, given that Google is paying people to work on it full time, that whatever they’ve done probably sounds nicer. But then, it’s not open source, so… shrugs.

    • state_electrician@discuss.tchncs.de (OP)
      1 year ago

      Ah, I looked at Tortoise, but I do not have an Nvidia GPU, so I couldn’t try it. I did try Festival, and the results were bad, not so much the voice itself as the intonation and pronunciation.

      • tal@lemmy.today
        1 year ago

        > Ah, I looked at Tortoise, but I do not have an nVidia GPU, so I couldn’t try it.

        I use it on an AMD GPU.

        EDIT: Wait, let me make sure. I was using an Nvidia GPU for a while and switched to AMD.

        EDIT2: Oh, yeah, it uses transformers, and that doesn’t work on ROCm presently, IIRC.

  • filister@lemmy.world
    1 year ago

    I would say ElevenLabs is the best, but unfortunately it’s not free.

    If you need it for a short while it might be worth it.

    I tried Piper with different models, and a couple of other FOSS alternatives, but the output quality was definitely subpar.

    I would say soon we will have good FOSS models, but for the time being that’s not the case.

  • observantTrapezium@lemmy.ca
    1 year ago

    Piper is my choice. It’s very easy to use from the command line and has fairly good-sounding voices. Prior to that, for years (decades?), I used espeak-ng. It had a very robotic voice but articulated almost everything very clearly, and I got used to it, so I didn’t actually mind.
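For anyone who wants to script either tool, here is a rough Python sketch of calling them via `subprocess`. It assumes the `piper` and `espeak-ng` binaries are on your `PATH`, that Piper reads text on stdin and accepts `--model` and `--output_file` as shown in its README, and that espeak-ng writes a WAV file with `-w`. The function names and any model filename you pass in are placeholders, so check `piper --help` and `espeak-ng --help` for your installed version.

```python
import shutil
import subprocess


def piper_command(model_path, output_path):
    """Build the Piper invocation (flags assumed from the Piper README)."""
    return ["piper", "--model", model_path, "--output_file", output_path]


def espeak_command(text, output_path):
    """Build the espeak-ng invocation; -w writes a WAV file instead of playing."""
    return ["espeak-ng", "-w", output_path, text]


def synthesize_with_piper(text, model_path, output_path):
    """Pipe text into Piper on stdin and write the speech to output_path."""
    if shutil.which("piper") is None:
        raise RuntimeError("piper executable not found on PATH")
    subprocess.run(
        piper_command(model_path, output_path),
        input=text.encode("utf-8"),
        check=True,  # raise if Piper exits with an error
    )


def synthesize_with_espeak(text, output_path):
    """Run espeak-ng on the given text and write the speech to output_path."""
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng executable not found on PATH")
    subprocess.run(espeak_command(text, output_path), check=True)
```

Usage would look something like `synthesize_with_piper("Hello there.", "voice.onnx", "hello.wav")`, where `voice.onnx` stands in for whichever Piper voice model you downloaded.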