Design Considerations for Creating Avatars for a 3D Virtual Environment
Stasia McGehee  9/22/2001


I.  The Creation of Avatars for OnLive! Technologies' Traveler - http://www.onlive.com

Briefly, I will discuss the decision-making process that resulted in a rather idiosyncratic approach to avatar design, and how our team at OnLive! adopted a "form follows function" approach, designing avatars and environments that were compelling while meeting the strict requirements of the technology.  Because they abandon the literal paradigms of reality, such as full bodies and texture maps, the avatars function not as real-life representations of self, but rather as totems, powerful spirit beings that we can identify with, gaining strength and power from the association.  Yet an avatar may also function as a mask, obscuring one's identity as well as augmenting it.  As very stylized representations of self, the 200+ polygon avatars in Traveler, circa 1996, were also informed by the notion of totems, archetypes, alter egos, and ceremonial masks, relying upon the simultaneous magic of audio, real-time lip sync, and autonomous gestures to provide the suspension of disbelief.

User Customization
I will also talk about the importance of user customization.  Within a rather limited color palette, we were able to offer a high degree of customization simply by allowing users to color their avatars according to color regions specified by the designer.  Users can also squash and stretch their avatar; choose from a palette of four transitory emotions that can vary from avatar to avatar; and alter their voice.  Although we thought that the ability to disguise one's voice would be a necessity, a feature that obscured or hindered communication proved to undermine the whole purpose of the application.  Ultimately, voice disguising was limited to altering one's pitch, and even this was viewed as a novelty.
 

II. Design by Necessity - The Decision-Making Process that Led to the OnLive! Severed Heads

I joined OnLive! Technologies in February 1994, responsible for avatar design and development. Our team began with the notion that we would create huge virtual communities with fully articulated walking and talking avatars. Feverishly we worked over Christmas, so that by January '95 we had an actual prototype of a textured avatar, walking, talking, and blinking through a heavily texture-mapped cyberspace, ready for CES (the Consumer Electronics Show) in Las Vegas, NV.  From our hotel suite we listened excitedly through the walls as Geffen, Katzenberg, and Spielberg of DreamWorks fame ogled our lone prototype in the adjacent room.

But the real test of our abilities occurred a few months later, when we realized that our voice-enabled creation consumed the entire processing capacity of a Pentium 60. We had a community of one.  Clearly our well-wrought plan was undeployable.  The day before an important board meeting, one of our founders called a mandatory meeting with the entire product development team, including about 15 engineers and designers.  He admitted his worst fear: that the idea upon which our tiny company was based was not feasible.  After an unproductive interlude, he left us, admonishing, "When I come back in an hour, I want you to give me a proposal that I can present to the board."  Left to our own conjectures, one engineer flippantly advised, "Let's just lop off the heads. Who needs bodies anyway?"  When the founder returned he was none too enamored with the idea, but no other suggestions surfaced, so we asked the board to give us a few days to come up with a streamlined version of our current prototype.  This was accomplished by eliminating all texture maps and stripping out the now extraneous body parts, so that, before long, we could accommodate up to five simultaneous users in a single room, each sporting a streamlined, untextured, Gouraud-shaded head.

And that's how, after 14 months of R&D, we settled on an idea offered in jest, the glib by-product of a harried brainstorming session.  But it was actually a wise decision.  The elimination of the bodies also eliminated many opportunities for feature creep, allowing us to focus on our core technology: a 3D spatialized-audio product with avatars that boasted real-time lip sync and emotions.  By avoiding extraneous detail, like body parts, we never set up false expectations of being able to manipulate objects in the universe, a feature we were not prepared to implement. Nor did we need to worry about integrating the emotional content of facial expressions with the ongoing gestures of the body.

Internal Debates
Needless to say, there was much dissent throughout the company, and we lost a few key folks over a decision that seemed absurd. There was already a schism in the company between those who wanted strictly to develop a chat application, a harmonious environment that would facilitate communication, and those who dreamed of creating a vast multi-user gaming environment where one could blow people up.  By eliminating bodies, and the ability for users to manipulate their environments, we were indefinitely postponing the ability to tailor our application to game developers, a major source of disappointment for some.  By emphasizing face-to-face contact, we were positioning ourselves almost exclusively as a 3D chat product, competing with the likes of AOL, or even with the phone company.  But, despite internal opposition, a prototype was created; and a few company meetings later, our CEO spoke to us virtually from the comfort of her office, her silver head addressing us from a monitor set at the head of the table in the boardroom.

The Role of Gesture in a Disembodied Universe
Despite our intention to focus entirely on lip sync and facial expressions, gesture and body language play a crucial role within OnLive!'s 3D environments. Even in the absence of bodies, users developed a lexicon of motions to express levels of excitement and enthusiasm. Users quickly noted that certain combinations of keys allow one's avatar to spin, roll, and twirl through space. Collision detection, augmented by the clonking sound of two skulls colliding, provides a game-like element, and is often employed by unruly members, to the annoyance of those more inclined to converse. Users also take care to position more engaging users within the center of their 90-degree field of view. Thus, body language and gesture seem to be an inherent aspect of 3D immersive environments, whether you provide for them or not.
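
To make the game-like element concrete, here is a minimal Python sketch, not OnLive!'s actual code, of the kind of sphere-sphere test that could trigger the clonking sound; the head radius and the sound hook are assumptions for illustration.

    import math

    HEAD_RADIUS = 0.5  # hypothetical bounding-sphere radius per head

    def heads_collide(p1, p2, radius=HEAD_RADIUS):
        """Sphere-sphere test: two avatar heads collide when their
        centers are closer than twice the head radius."""
        dx, dy, dz = (a - b for a, b in zip(p1, p2))
        return math.sqrt(dx*dx + dy*dy + dz*dz) < 2.0 * radius

    # On a positive test, a client would play the "clonk" sound and
    # push the two heads apart before the next frame.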
 

III. Aesthetic Influences

As an avatar designer, my most important task was to provide the world with attractive representations of self within a 200-polygon limit. In my effort to tackle the dilemma of providing widely appealing content within a low-poly budget, I drew upon various sources. In terms of content, I strove to provide a range of entities, from the realistic (see Bruce Damer's unintentional likeness) to the more archetypal and mythological, like the Pharaoh, Vampire, Cyclops, and Pirate.  I also noticed that animals tended to be popular, so I created a variety of them.  Avatar creation became an iterative process, as the initial avatars generated user input which influenced further development.
In addressing the low-poly problem, I looked at the geometric patterns of African, Native American, and other tribal art, with their stylized representations.  Although their work was intended for a different audience and purpose, these artists faced similar design constraints, representing complex facial features through woodcarving techniques that did not afford the luxury of minute refinements.  In essence, they too created highly expressive low-poly characters, using clean lines and flat shades of color.

For reference, the National Museum of African Art at the Smithsonian in Washington, D.C. proved to be an invaluable resource.  In addition, the Smithsonian sells inexpensive cut-outs and coloring books of masks.  I also studied the flat planes of Cubist and Italian Futurist art, which had their antecedents in indigenous art forms.  The more I observed the design solutions offered by other forms of sculpture and painting, the easier it was to arrive at my own unique solutions.

Constructing Morph Targets for Speech
Most of the vertices cluster around the mouth region to accommodate lip-syncing. As the artist, I had to provide the extreme mouth positions, and the speech engine interpolated between these positions. Later I noticed that slight head movements could be incorporated into the speech positions, so that the avatar's head could bob around as it spoke. An emotional palette was provided as well: happy, sad, angry, and mad. While idle, avatars randomly cycle through the emotional palette at about 1-2% intensity, giving the impression that the avatar is breathing.
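
The interpolation scheme is easy to illustrate. The following Python sketch shows how a renderer might blend a neutral base mesh toward the artist-authored extreme positions, and how a random emotion at 1-2% weight produces the idle "breathing"; the function names and data layout are my assumptions, not OnLive!'s code.

    import random

    def blend_morph_targets(base_verts, targets, weights):
        """Linearly interpolate vertex positions from a neutral base
        mesh toward extreme morph targets (e.g. mouth positions).
        base_verts: list of (x, y, z); targets: dict name -> vertex
        list; weights: dict name -> 0.0..1.0 blend amount."""
        result = []
        for i, (bx, by, bz) in enumerate(base_verts):
            dx = dy = dz = 0.0
            for name, w in weights.items():
                tx, ty, tz = targets[name][i]
                dx += w * (tx - bx)
                dy += w * (ty - by)
                dz += w * (tz - bz)
            result.append((bx + dx, by + dy, bz + dz))
        return result

    def idle_emotion_weights(emotions=("happy", "sad", "angry", "mad")):
        """Pick a random emotion at a very low (1-2%) weight so an
        idle avatar appears to breathe."""
        return {random.choice(emotions): random.uniform(0.01, 0.02)}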

User Customization
One of the most compelling aspects of the OnLive! avatars is the users' ability to customize them. Although we had grand schemes of creating fully customizable parameterized heads, for the first pass we merely allowed users to alter their colors and to perform a global squash and stretch. In designing avatars with customizable color regions, I paid close attention to the orientation of each and every facet, conscious of how faces could be tailored into customizable patterns. Using these regions in conjunction with a global squash and stretch, users came up with creations that I barely recognized.  Thus, even this small degree of customization keeps users occupied for hours, and allows ample variation within a fairly limited range of choices. Originally we also offered several variations of voice disguising, but this ended up not being nearly as important as we had surmised; ultimately we only allowed users to alter their pitch.
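
For illustration, a global squash and stretch amounts to a non-uniform scale of every vertex about the head's center. A minimal Python sketch, with the centering and per-axis factors as assumptions:

    def squash_and_stretch(verts, sx, sy, sz, center=(0.0, 0.0, 0.0)):
        """Globally squash or stretch a head by scaling every vertex
        about a fixed center; sx, sy, sz are per-axis scale factors
        (e.g. sy > 1 stretches vertically, sy < 1 squashes)."""
        cx, cy, cz = center
        return [(cx + (x - cx) * sx,
                 cy + (y - cy) * sy,
                 cz + (z - cz) * sz) for (x, y, z) in verts]

    # e.g. a tall, narrow head: squash_and_stretch(verts, 0.8, 1.3, 0.8)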
 

IV. Future Directions for Avatar Design

Higher Poly Avatars
Future directions for avatar design include creating higher-poly avatars and adding textures for realism, possibly appealing to business applications or one-on-one contexts.  This was accomplished with some degree of success when Traveler was revived at Communities.com in 2000, with the addition of several new 800-poly avatars.  Although more difficult to author, these higher-poly avatars were accommodated with ease by the more powerful CPUs.

More Abstract Notions of Avatars
In our use of a 3D file format, we provide a very generalized way of representing the self that is not constrained to the anthropomorphic.  In fact, an OnLive! avatar can be an entirely abstract representation, whose autonomous eye-blink and lip-sync behaviors, normally useful for communication, can instead be mapped to more abstract behaviors for an entirely aesthetic effect.
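
As a sketch of what such a remapping might look like, the Python fragment below redirects the speech engine's communicative signals to purely aesthetic parameters; the specific mappings (pulse amount, hue flicker, shimmer) are invented for illustration, not features of Traveler.

    import math

    def abstract_behavior(mouth_open, blink, t):
        """Reuse the communicative channels aesthetically: the
        mouth-open weight (0..1) drives a pulsing scale, the blink
        signal (0 or 1) flickers the hue, and time t adds an idle
        shimmer.  Returns (scale_factor, hue in 0..1)."""
        scale = 1.0 + 0.25 * mouth_open       # pulse while speaking
        scale += 0.05 * math.sin(t * 2.0)     # gentle idle shimmer
        hue = (0.6 + 0.2 * blink) % 1.0       # flicker on each blink
        return scale, hue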

User Customization
High on the list is user customization.  Currently users can color their avatar, but they are limited to a 24-color palette.  Giving them a 32-bit color picker could be a next step.  We have always discussed the need for parameterized heads.  This would presume a single head with a fixed topology that could be tweaked in an infinite number of ways.  This could compensate for the fact that 3D authoring is a difficult skill; very few of our users have been able to make compelling 3D avatars on their own.  But if users were given a head with selectable regions controlled by various slider bars, they could author an anthropomorphic avatar without having to master the medium.  In 2001, the parameterized facial system under consideration at OnLive! resurfaced under the guise of www.appeerance.com, after further development by Creative Director Steve DiPaola and 3D Graphics Engineer Dave Collins.
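
A parameterized head of this kind could be as simple as a set of named per-vertex offsets over a fixed topology, each scaled by a slider value. A minimal Python sketch, with the region names, slider range, and data layout assumed for illustration:

    def parameterized_head(base_verts, region_deltas, sliders):
        """Deform a fixed-topology head from slider values: each
        named region (e.g. a hypothetical 'nose_length') stores one
        per-vertex offset list, and a slider in -1..1 scales that
        offset before it is summed into the base position."""
        out = [list(v) for v in base_verts]
        for name, value in sliders.items():
            for i, (dx, dy, dz) in enumerate(region_deltas[name]):
                out[i][0] += value * dx
                out[i][1] += value * dy
                out[i][2] += value * dz
        return [tuple(v) for v in out]

Because the topology never changes, every slider setting yields a valid head, which is what lets novices author faces without mastering 3D modeling.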

Increase the Emotional Range
We could also increase the emotional range by providing attitudes.  Unlike our current emotions, attitudes would not be transitory, but would color the avatar's whole personality.  In addition to the ability to select a fixed attitude, we could also offer a palette of transitory emotions more complex than our current selection of four.
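
One simple way to realize this layering, sketched below in Python, is to treat the attitude as a standing baseline morph weight with transitory emotions added on top; the names and weights are assumptions for illustration.

    def facial_weights(attitude, emotion, emotion_strength,
                       attitude_strength=0.3):
        """Layer a persistent attitude (a fixed baseline expression)
        under a transitory emotion: both become morph weights, with
        the attitude always present and the emotion summed on top
        when it happens to use the same target."""
        weights = {attitude: attitude_strength}
        weights[emotion] = weights.get(emotion, 0.0) + emotion_strength
        return weights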

More Convincing Lip Sync
The speech engine that drives the real-time lip sync was developed by Shankar Narayan over a period of several years while at Apple Computer.  Shankar's goal was to create a speech engine so accurate that one could read an avatar's lips.  However, the OnLive! implementation of this speech engine was not so granular.  For the short term, we could add another parameter to accommodate the "oo" or "w" sound.  This could be calculated as a negative value of the widest lip position, so that a trained 3D artist would not have to be conscripted to re-author this morph target for existing content.
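
The trick can be shown in a few lines: with linear morphing, a negative weight extrapolates past the neutral pose in the direction opposite the "widest" target, pursing the lips. A Python sketch, with the phoneme table and pucker amount assumed:

    WIDE_WEIGHTS = {"ee": 1.0, "eh": 0.5}  # hypothetical phoneme table

    def wide_weight(phoneme):
        """Weight applied to the 'widest lip' morph target, where
        each vertex blends as: base + w * (wide - base).  A negative
        w pushes the mouth corners inward past neutral, approximating
        the pursed "oo"/"w" shape without a new hand-authored
        target."""
        if phoneme in ("oo", "w"):
            return -0.6  # hypothetical pucker amount
        return WIDE_WEIGHTS.get(phoneme, 0.0)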

end


This page last updated on September 22, 2001
Copyright © 2001 Stasia McGehee.