I want to make anonymous sign language videos without having to go through the effort of manually animating a cartoon. Likewise, I don't want to pay anything for equipment or have to use proprietary software. I'd prefer not to use anything too difficult to set up, but if that's how it has to be then I'll resign myself to it.

So, what sort of software could I use where I could just input a pre-recorded video of myself and have the machine spit out a new video of a different person or character making the same movements? Any names, pros and cons? How do they capture fine movements of the fingers and face?

Linux Mint Xfce on a “gaming laptop” from a few years ago with an Nvidia GPU, by the way.

Posted at 5:30 AM (no sleep)

  • ComradeOohAah [he/him, they/them]@hexbear.net
    25 days ago

    Do you have VR? VRChat might be an option. I know you're not looking to spend money, but if you're going to do a bunch of projects like this, it might be worth it: it's potentially much cheaper than a full mocap setup. I recently snagged a Quest Pro off Facebook Marketplace for a couple hundred bucks, and it's got eye tracking, face tracking, and hand tracking. Of course, that's assuming your gaming laptop can handle it; if it has a 1050 Ti or newer you might be okay.

    I don't have any personal experience, but just googling "webcam face mocap" will give you plenty of options, because AR-filter-type stuff is so plentiful. You might have to roll your own, but it could probably be vibe coded if you've got any experience in a specific game engine or something.
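    If you do roll your own webcam tracking (with something like MediaPipe), the raw per-frame landmarks are usually jittery, and a common trick is exponential smoothing before driving an avatar. A minimal sketch in Python; the landmark format here (a list of (x, y) tuples per frame) is a simplifying assumption, not any specific library's output:

```python
# Minimal exponential-moving-average smoother for per-frame landmarks.
# Assumes landmarks arrive as a list of (x, y) tuples each frame; a real
# tracker's output (e.g. MediaPipe's) would be adapted to this shape.

class LandmarkSmoother:
    def __init__(self, alpha=0.5):
        self.alpha = alpha      # 1.0 = no smoothing, small values = heavy lag
        self.state = None       # last smoothed frame

    def update(self, landmarks):
        if self.state is None:
            self.state = [tuple(p) for p in landmarks]   # first frame passes through
        else:
            self.state = [
                (self.alpha * x + (1 - self.alpha) * sx,
                 self.alpha * y + (1 - self.alpha) * sy)
                for (x, y), (sx, sy) in zip(landmarks, self.state)
            ]
        return self.state

smoother = LandmarkSmoother(alpha=0.5)
smoother.update([(0.0, 0.0)])
smoothed = smoother.update([(1.0, 1.0)])
print(smoothed)  # halfway between the two frames: [(0.5, 0.5)]
```

    The trade-off is responsiveness: for fast fingerspelling you'd want a higher alpha (or a fancier filter) so quick handshape changes don't get blurred away.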

    • Erika3sis [she/her, xe/xem]@hexbear.netOP
      25 days ago

      I guess if push comes to shove I could get a VR headset and see if I can figure out VRChat, although I do worry a bit about e.g. bumping into things or feeling nauseous. VRChat is a trade-off for signing because of the limited handshapes, but I could probably work around that.

      • ComradeOohAah [he/him, they/them]@hexbear.net
        25 days ago

        although I do worry a bit about e.g. bumping into things or feeling nauseous

        There's a feature called Chaperone that you set up to define your play space's boundaries; a little glowing wall pops up as you approach them, which can be super helpful. Most folks don't have nausea issues until they start doing things like stick-based movement or a rollercoaster. If you're just moving around in your own space you'll hopefully be okay.

        As much as I dislike Meta, I do recommend buying their headsets used: at the prices you can usually find them for, you're not going to beat their feature set. Plus, you can use them with Steam Link for PCVR.

        I know you’re not in the US, but I’ll include the prices I would recommend holding out for in my area so you’ll have a feel for value.

        • Quest 2 (2020, discontinued Dec 2024; feature updates until Dec 2026, security updates until Dec 2027) is the oldest of the range I would consider still usable. It has hand tracking only and very limited passthrough. (Passthrough is the ability to see through the headset's cameras, with AR placing virtual objects in your real space.) People try to sell them around me for $200 to $300, but they're not worth that at all and eventually show up for less than $100. $50–$75 is not out of the question if I'm checking once a day, and I've even seen them go for as little as $25, but those go very fast.
        • Quest Pro (2022, also discontinued Dec 2024, with the same update timeline) has way more features: better passthrough cameras, better (supposedly the best) hand tracking, plus eye tracking and face tracking. This is what I would recommend if facial expressions are important to you. It was released more as a developer kit / enterprise unit and became popular with VRChat folks because of how much it brought to the table. People around me typically try to sell them for $600 to $800 on the high end, but more realistically around $400 or so. A good deal would be $200 to $300; anything below that would be someone strapped for cash or with no idea what they have, and those go quick.
        • Quest 3 (2023) has no face or eye tracking, better hand tracking than the Quest 2, supposedly the best passthrough cameras of the bunch, and a better screen than the Quest Pro. I have not tried this one or the next yet. On the high end, $400–$600; realistically, around $200 is a good deal, but I've seen it as low as $150 on occasion.
        • Quest 3S (2024) has no face or eye tracking and better hand tracking than the Quest 2. It's a much cheaper version of the Quest 3 because they removed a depth sensor and changed up the lenses. $300 is high; I've seen it as low as $100, and I wouldn't pay more than $150.

        End info dump.

  • stupid_asshole69 [none/use name]@hexbear.net
    25 days ago

    So, you're not considering the most important identifying feature: your own body language.

    If you're okay with it eventually becoming public knowledge that you're the one who made those videos, then you can use something like ComfyUI to extract poses out of the video and then feed the pose data into a weeb slop drawer and get something pretty passable.

    Comfy is proprietary, I'm pretty sure. You'll be building a workflow for video-to-character, so start looking at Hugging Face for that. A lot of what you'll see there will be proprietary even though you can have at it for free.
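    For the finger detail the OP asked about: pose extractors in this vein typically emit per-frame keypoints with confidence scores, e.g. the OpenPose-style JSON that many pose nodes consume stores 21 keypoints per hand as a flat [x, y, confidence, ...] list. A minimal sketch of sanity-checking that data before using it to drive a signing character; the 0.3 confidence threshold is an arbitrary assumption, not a standard value:

```python
# Sketch: check OpenPose-style hand keypoints (21 points per hand, stored
# as a flat [x, y, confidence, x, y, confidence, ...] list) before
# letting them drive an avatar. Threshold is an arbitrary assumption.

def hand_ok(flat_keypoints, min_conf=0.3):
    """True if all 21 hand keypoints were detected with decent confidence."""
    confs = flat_keypoints[2::3]     # every third value is a confidence score
    return len(confs) == 21 and min(confs) >= min_conf

good_hand = [0.5, 0.5, 0.9] * 21     # fully tracked hand
bad_hand  = [0.5, 0.5, 0.1] * 21     # detected, but too noisy to trust

print(hand_ok(good_hand))   # True
print(hand_ok(bad_hand))    # False
```

    Frames that fail a check like this are where fingerspelling would come out garbled, so they're the ones you'd re-shoot or interpolate over.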

    It will be slow.

    • Erika3sis [she/her, xe/xem]@hexbear.netOP
      26 days ago

      Wearing a mask is only an option if I’m willing to turn WANT, SEX, GAY, and PEDOPHILE into homophones of each other… which I am not.

      Edit: On top of this, if the eyes, brows, hands etc are still visible, then that would still feel a bit revealing. There’s a lot you can be identified from.

      • Kefla [she/her, they/them]@hexbear.net
        25 days ago

        So you want to make videos which necessarily require you to show parts of your body you refuse to show. NGL this just sounds like a completely incompatible set of goals.

        I would just make the videos and not worry that much about being identified by your eyebrows. People more radical than you have made more controversial content than this showing greater portions of their body without any problems. Unless your actual goal here is to make a ransom video for the hostage’s deaf family, I really don’t see what could possibly require this amount of dedication to complete anonymity.

        • Erika3sis [she/her, xe/xem]@hexbear.netOP
          25 days ago

          My goal is, truth be told, to make a signed conlang and then post a translation quiz in said conlang on this site, like I've done with my spoken conlang. Self-doxxing material gets removed from Hexbear, but the border of what constitutes it is fuzzy: rationally I know that nobody's actually going to identify me from my hands or my eyebrows, and even so, they'd probably identify me from my body language, or the peculiarities of how I write, or any number of other things, before they'd ever have to resort to cross-referencing the creases of my hands… But that doesn't stop me from being irrationally bothered by the idea of showing myself on camera, and it doesn't stop me from finding it bothersome and unfair that there are 300 or so natural languages in the world that apparently just can't be used anonymously online like all the other languages can. People have a right to privacy and a right to use their first languages, so why should millions of people around the world have to choose between the two? This is why I was excited to hear about VRChat's sign language communities, and about Zhaoyang Xia et al.'s articles on sign language video anonymization through machine learning, and figured I could probably use something like that for my own project.

          The other solutions in my case are to go through the effort of manually animating a cartoon after all; to spend however much on setting up a proper mocap system; or to use or devise some sort of transcription system or orthography… which would be a pain for me to learn or make, and a bigger pain to make other people learn just to play a translation quiz. So video really is the best option. If I'm resigning myself to wearing a mask to cover my face, then I'm sure I could find a way to add the lost information back in.