Updated: Mar 7, 2021
Before delving into iClone's animation lip-sync/facial mocap features, I wanted to play around with some other publicly accessible tools/apps that have piqued my interest. These apps use single images as the basis for facial animation. As a test, I wanted to bring in a couple of screenshots of my 3D animated avatar and see if these tools would work on a non-photorealistic face.
First and foremost, my professor Finlay Braithwaite challenged me to use the *buzzworthy* Deep Nostalgia app to animate my avatar. The deep learning feature, from the genealogy website MyHeritage, was created as a means of bringing old family photos of your relatives "to life". It's an interesting, but certainly uncanny, concept that has caught the attention of numerous publications such as People Magazine, The Verge, The National Post and USA Today. Deep Nostalgia is a tool that any consumer is free to use (up to 5 photos). Because of its accessibility, Deep Nostalgia is part of a growing trend of democratization and normalization of machine learning techniques (deep learning/deepfake) in the public landscape. Though machine learning is oftentimes difficult, complicated and expensive, apps like Deep Nostalgia are pre-built for public usage. So although they are not fully customizable and only work in limited capacities with strict confines, they're still contributing to the ubiquity of machine learning, which is only bound to become more and more normalized. I also appreciate the fact that not every conversation about "deepfakes" needs to induce such a sinister and dystopian discourse.
So, rant about machine learning aside, I took Finlay up on this challenge to use Deep Nostalgia on my 3D animated avatar. The confines present in this app meant that I didn't get to choose the animation that was thrust upon my image - it is a pre-built set of motions recorded by an actor that mainly involves looking around, tilting its head and, sometimes, smiling slightly. You know, all of the tolerable, subtle motions you would want to see whilst virtually reviving your dead great-grandmother.
I decided to try out the tool on a few different image selections - three different 3D animated avatars and one photorealistic (real) image of myself for comparison. The results showed varying degrees of success.
I actually really love the blinking effect on this attempt. When facing forward, this avatar shows an emotional depth and I find that very exciting. As soon as she turns her head, however, the distortion is quite apparent and very distracting.
Again, the eye movements add a really nice, humanistic depth. But I pray I do not smile like that and if I do... no one tell me.
I also made an avatar of my good friend Brittany in iClone and tried her out in Deep Nostalgia as well.
Is she okay?! Why is she twitching so much? She looks very disoriented. Arguably slightly better than the Avatar attempts but not a significant difference in my opinion.
Like many other deepfake attempts I've seen in the past, most of the distortion comes from the image/avatar/animation attempting to turn its head to the side. The face shape starts to warp slightly, as if the machine learning doesn't fully comprehend the dynamics and 360-degree scope of a human head (even on the photorealistic footage). But I must admit that I was quite impressed by a few fleeting moments in this exercise. Seeing my avatar blink and show even a moment of emotional depth was really exciting. It's hard to put into words the difference between a quick glimpse of humanity in animated eyes and the standard, dead appearance an avatar typically has. Similar to the concept of the "uncanny valley", it's less of a quantifiable distinction and more of a phenomenological confusion.
I think these are examples of what the kids call "cursed images".
That being said, although playing around with Deep Nostalgia was an interesting exercise, it did not fulfill my desire to add speech, facial mocap and lip-sync animation to my avatars. So I ventured elsewhere to find these features I could test out. Again, I wanted to find an external application before delving back into iClone.
In recent months, I've joined a few Facebook groups for iClone/3D animation creatives. The other day, one member of the group "Iclone free/ trade / sell / hire" posted what appeared to be an animation of an iClone character lip-syncing to the viral hit of days past "Numa Numa". At first I was purely confused as to how Numa Numa was still being referenced in 2021, but moreover I thought to myself "That's not a terrible lip-sync! How was this done?". The singing avatar video had a watermark at the edge of the clip reading "Avatarify". That's an extremely vague and unoriginal name for an avatar app, but I decided to look into it anyways. The Russian-developed app Avatarify, on the Apple App Store, seemed to be similar to some other nascent user-friendly deepfake/FaceSwap tools like ReFace.
Typically, these apps are similar to Deep Nostalgia in the sense that they operate with a "plug & play" interface. You upload an image and then the app animates it for you with a pre-built recording or a selection of clips to choose from. But Avatarify goes one step further - it lets you record the reference animation yourself; the video that "drives" the deepfake. In short, it can do very rudimentary motion capture on a single image. This was exciting. Even though I've come across similar apps in the past, 99% of them require Apple's TrueDepth camera for facial recognition (iPhone X or higher / iPad Pro 12 or higher). Avatarify works with any iPhone camera (I currently own an iPhone 8). I opted to enroll in a 7-day free trial of the app (which, after the fact, costs $46.99 CAD/year or $3.49 CAD/week). So I put it to the test by uploading my 3D animated avatar photos and recording facial capture of myself (just rambling, as you will see)...
First attempt with Avatarify. I didn't realize you had to keep your head completely still, hence the neck distortion. But to be honest, some of the facial expression captures are not horrible!
Second attempt with new avatar and I tried to keep my head still this time. For context, when you're recording live, you can barely see the results - hence my confusion. It renders a lot cleaner than it captures.
Contrary to what I said while recording, I think this one is actually the best attempt. It's obviously still uncanny that the head moves and the body doesn't, but there is a good range of eye/facial expressions.
Overall, although I think this app does a pretty good job at facial motion capture (especially considering the fact it's doing so without a TrueDepth camera), the quality of the final render leaves something to be desired. Additionally, the head/neck distortion is quite distracting. But for a (somewhat) free app, I was impressed with the live recording results. It gives me hope that, yes indeed, 3D animated avatars are compatible with motion capture - and that eye/expression/lip-sync capture does add a realistic depth to the final outputs. Will I be using Avatarify in my MRP? No, absolutely not. But was it a worthwhile exercise in exploring different possibilities for animation outside of iClone? Yes. As long as I can cancel my 7-day free trial and the app didn't steal all of my personal data and credit card information. We shall see...
That moment when you find out your credit card information was stolen by the strange, Russian app you downloaded