// transmission.log

Data Feed

> Intercepted signals from across the network — tech, engineering, and dispatches from the void.

1688 transmissions indexed — page 84 of 85

[ 2015 ]

11 entries
1661|blog.unity.com

IL2CPP internals: P/Invoke Wrappers

I’ve written a good bit of managed-to-native interop code in my day, but getting p/invoke declarations right in C# is still difficult, to say the least. Understanding what the runtime is doing to marshal my objects is even more of a mystery. Since IL2CPP does most of its marshaling in generated C++ code, we can see (and even debug!) its behavior, which provides much better insight for troubleshooting and performance analysis.

This post does not aim to provide general information about marshaling and native interop. That is a wide topic, too large for one post. The Unity documentation discusses how native plugins interact with Unity. Both Mono and Microsoft provide plenty of excellent information about p/invoke in general.

As with all of the posts in this series, we will be exploring code that is subject to change and, in fact, is likely to change in a newer version of Unity. However, the concepts should remain the same. Please take everything discussed in this series as implementation details. We like to expose and discuss details like this when it is possible though!

For this post, I’m using Unity 5.0.2p4 on OSX. I’ll build for the iOS platform, using an “Architecture” value of “Universal”. I’ve built my native code for this example in Xcode 6.3.2 as a static library for both ARMv7 and ARM64.

The native code looks like this:

The scripting code in Unity is again in the HelloWorld.cs file. It looks like this:

Each of the method calls in this code is made into the native code shown above. We will look at the managed method declaration for each method as we see it later in the post.

Since IL2CPP is already generating C++ code, why do we need marshaling from C# to C++ code at all? Although the generated C++ code is native code, the representation of types in C# differs from that in C++ in a number of cases, so the IL2CPP runtime must be able to convert back and forth between the representations on both sides.
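The original native-code listing did not survive in this copy of the post. As a rough sketch, using the function and type names discussed later in the post (the signatures and bodies here are assumptions for illustration, not the original listing), the static library side might look something like this:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <cmath>

// Plain C linkage so the managed side can resolve these functions by name.
extern "C" {

// Blittable argument and return value: no conversion needed.
int32_t Increment(int32_t value) {
    return value + 1;
}

// Strings arrive as char* after the runtime marshals them.
bool StringsMatch(const char* l, const char* r) {
    return std::strcmp(l, r) == 0;
}

// A blittable user-defined type: all fields are blittable.
struct Vector {
    float x, y, z;
};

float ComputeLength(Vector v) {            // passed by value
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

void SetX(Vector* v, float value) {        // passed by reference
    v->x = value;
}

// A non-blittable user-defined type: it contains a string field.
struct Boss {
    const char* name;
    int32_t health;
};

bool IsBossDead(Boss b) {
    return b.health <= 0;
}

int32_t SumArrayElements(int32_t* elements, int32_t size) {
    int32_t sum = 0;
    for (int32_t i = 0; i < size; ++i) sum += elements[i];
    return sum;
}

int32_t SumBossHealth(Boss* bosses, int32_t size) {
    int32_t sum = 0;
    for (int32_t i = 0; i < size; ++i) sum += bosses[i].health;
    return sum;
}

}  // extern "C"
```

The distinction that matters for the rest of the post is visible in these signatures: some take only blittable data (Increment, ComputeLength), and some take strings or structs containing strings (StringsMatch, IsBossDead), which will need conversion.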
The il2cpp.exe utility does this for both types and methods.

In managed code, all types can be categorized as either blittable or non-blittable. Blittable types have the same representation in managed and native code (e.g. byte, int, float). Non-blittable types have a different representation in managed and native code (e.g. bool, string, array types). As such, blittable types can be passed to native code directly, but non-blittable types require some conversion before they can be passed to native code. Often this conversion involves new memory allocation.

In order to tell the managed code compiler that a given method is implemented in native code, the extern keyword is used in C#. This keyword, along with a DllImport attribute, allows the managed code runtime to find the native method definition and call it. The il2cpp.exe utility generates a wrapper C++ method for each extern method. This wrapper performs a few important tasks:

- It defines a typedef for the native method which is used to invoke the method via a function pointer.
- It resolves the native method by name, getting a function pointer to that method.
- It converts the arguments from their managed representation to their native representation (if necessary).
- It calls the native method.
- It converts the return value of the method from its native representation to its managed representation (if necessary).
- It converts any out or ref arguments from their native representation to their managed representation (if necessary).

We’ll take a look at the generated wrapper methods for some extern method declarations next.

The simplest kind of extern wrapper only deals with blittable types. First, note the typedef for the native function signature:

Something similar will show up in each of the wrapper functions.
This native function accepts a single int32_t and returns an int32_t.

Next, the wrapper finds the proper function pointer and stores it in a static variable:

Here the Increment function actually comes from an extern statement (in the C++ code):

On iOS, native methods are statically linked into a single binary (indicated by the “__Internal” string in the DllImport attribute), so the IL2CPP runtime does nothing to look up the function pointer. Instead, this extern statement informs the linker to find the proper function at link time. On other platforms, the IL2CPP runtime may perform a lookup (if necessary) using a platform-specific API method to obtain this function pointer.

Practically, this means that on iOS, an incorrect p/invoke signature in managed code will show up as a linker error in the generated code. The error will not occur at runtime. So all p/invoke signatures need to be correct, even when they are not used at runtime.

Finally, the native method is called via the function pointer, and the return value is returned. Notice that the argument is passed to the native function by value, so any changes to its value in the native code will not be available in the managed code, as we would expect.

Things get a little more exciting with a non-blittable type, like string. Recall from an earlier post that strings in IL2CPP are represented as an array of two-byte characters encoded via UTF-16, prefixed by a 4-byte length value. This representation does not match either the char* or wchar_t* representations of strings in C on iOS, so we have to do some conversion. If we look at the StringsMatch method (HelloWorld_StringsMatch_m4 in the generated code):

We can see that each string argument will be converted to a char* (due to the UnmanagedType.LPStr directive). The conversion looks like this (for the first argument):

A new char buffer of the proper length is allocated, and the contents of the string are copied into the new buffer.
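The conversion snippet itself is missing from this copy of the post. A simplified, self-contained sketch of the idea (the struct layout and helper names here are illustrative stand-ins, not the real IL2CPP runtime API) could look like:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the managed string layout described above:
// a 4-byte length prefix followed by UTF-16 characters.
struct ManagedString {
    int32_t length;
    char16_t chars[64];  // fixed-size buffer to keep the sketch simple
};

// Sketch of the LPStr conversion: allocate a char buffer of the proper
// length and copy the characters in. (This narrows naively; the real
// marshaling code handles encoding properly.)
char* MarshalStringToLPStr(const ManagedString* s) {
    char* buffer = new char[s->length + 1];
    for (int32_t i = 0; i < s->length; ++i)
        buffer[i] = static_cast<char>(s->chars[i]);
    buffer[s->length] = '\0';
    return buffer;
}

// Matching cleanup, mirroring the cleanup step in the generated wrappers.
void MarshalCleanupString(char* buffer) {
    delete[] buffer;
}
```

Note the allocation and copy per call: this is the cost that makes marshaling string arguments noticeably more expensive than passing blittable values.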
Of course, after the native method is called, we need to clean up those allocated buffers:

So marshaling a non-blittable type like string can be costly.

Simple types like int and string are nice, but what about a more complex, user-defined type? Suppose we want to marshal the Vector structure above, which contains three float values. It turns out that a user-defined type is blittable if and only if all of its fields are blittable. So we can call ComputeLength (HelloWorld_ComputeLength_m5 in the generated code) without any need to convert the argument:

Notice that the argument is passed by value, just as it was for the initial example when the argument type was int. If we want to modify the instance of Vector and see those changes in managed code, we need to pass it by reference, as in the SetX method (HelloWorld_SetX_m6):

Here the Vector argument is passed as a pointer to native code. The generated code goes through a bit of a rigmarole, but it is basically creating a local variable of the same type, copying the value of the argument to the local, then calling the native method with a pointer to that local variable. After the native function returns, the value in the local variable is copied back into the argument, and that value is then available in the managed code.

A non-blittable user-defined type, like the Boss type defined above, can also be marshaled, but with a little more work. Each field of this type must be marshaled to its native representation. Also, the generated C++ code needs a representation of the managed type that matches the representation in the native code.

Let’s take a look at the IsBossDead extern declaration:

The wrapper for this method is named HelloWorld_IsBossDead_m7:

The argument is passed to the wrapper function as type Boss_t2, which is the generated type for the Boss struct. Notice that it is passed to the native function with a different type: Boss_t2_marshaled.
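To make the two representations concrete, here is a hypothetical, self-contained sketch of what such a struct pair and its marshal/cleanup functions amount to (the layouts and helper bodies are illustrative assumptions; only the pattern matches the generated code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified stand-ins for the managed and marshaled representations.
struct ManagedBossString { int32_t length; const char16_t* chars; };
struct Boss_t2           { ManagedBossString* name; int32_t health; };
struct Boss_t2_marshaled { char* name; int32_t health; };  // matches native Boss

// Marshal each field to its native representation.
Boss_t2_marshaled Boss_t2_marshal(const Boss_t2& boss) {
    Boss_t2_marshaled m;
    m.name = new char[boss.name->length + 1];
    for (int32_t i = 0; i < boss.name->length; ++i)
        m.name[i] = static_cast<char>(boss.name->chars[i]);  // naive narrowing
    m.name[boss.name->length] = '\0';
    m.health = boss.health;  // blittable field: copied directly
    return m;
}

// Free the memory allocated while marshaling.
void Boss_t2_marshal_cleanup(Boss_t2_marshaled& m) {
    delete[] m.name;
    m.name = nullptr;
}
```

For an array of such structs, a wrapper has to repeat this per element into a freshly allocated native array, which is why non-blittable arrays are so much more expensive than blittable ones, where a pointer to the existing memory suffices.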
If we jump to the definition of this type, we can see that it matches the definition of the Boss struct in our C++ static library code:

We again used the UnmanagedType.LPStr directive in C# to indicate that the string field should be marshaled as a char*. If you find yourself debugging a problem with a non-blittable user-defined type, it is very helpful to look at this _marshaled struct in the generated code. If the field layout does not match the native side, then a marshaling directive in managed code might be incorrect.

The Boss_t2_marshal function is a generated function which marshals each field, and the Boss_t2_marshal_cleanup function frees any memory allocated during that marshaling process.

Finally, we will explore how arrays of blittable and non-blittable types are marshaled. The SumArrayElements method is passed an array of integers:

This array is marshaled, but since the element type of the array (int) is blittable, the cost to marshal it is very small:

The il2cpp_codegen_marshal_array function simply returns a pointer to the existing managed array memory, that’s it!

However, marshaling an array of non-blittable types is much more expensive. The SumBossHealth method passes an array of Boss instances:

Its wrapper has to allocate a new array, then marshal each element individually:

Of course, all of these allocations are cleaned up after the native method call is completed as well.

The IL2CPP scripting backend supports the same marshaling behaviors as the Mono scripting backend. Because IL2CPP produces generated wrappers for extern methods and types, it is possible to see the cost of managed-to-native interop calls. For blittable types, this cost is often not too bad, but non-blittable types can quickly make interop very expensive. As usual, we’ve just scratched the surface of marshaling in this post.
Please explore the generated code more to see how marshaling is done for return values and out parameters, native function pointers and managed delegates, and user-defined reference types.

Next time, we will explore how IL2CPP integrates with the garbage collector.

>access_file_
1662|blog.unity.com

IL2CPP Internals: Generic sharing implementation

This is the fifth post in the IL2CPP Internals series. In the last post, we looked at how methods are called in the C++ code generated for the IL2CPP scripting backend. In this post, we will explore how they are implemented. Specifically, we will try to better understand one of the most important features of code generated with IL2CPP: generic sharing. Generic sharing allows many generic methods to share one common implementation. This leads to significant decreases in executable size for the IL2CPP scripting backend.

Note that generic sharing is not a new idea; both the Mono and .NET runtimes use generic sharing as well. Initially, IL2CPP did not perform generic sharing; recent improvements have made it both robust and beneficial. Since il2cpp.exe generates C++ code, we can see where the method implementations are shared.

We will explore how generic method implementations are shared (or not) for reference types and value types. We will also investigate how generic parameter constraints affect generic sharing.

Keep in mind that everything discussed in this series is an implementation detail. The topics and code discussed here are likely to change in the future. We like to expose and discuss details like this when it is possible though!

What is generic sharing?

Imagine you are writing the implementation for the List<T> class in C#. Would that implementation depend on what the type T is? Could you use the same implementation of the Add method for List<string> and List<object>? How about List<int>?

In fact, the power of generics is precisely that these C# implementations can be shared, and the generic class List<T> will work for any T. But what happens when List<T> is translated from C# to something executable, like assembly code (as Mono does) or C++ code (as IL2CPP does)? Can we still share the implementation of the Add method?

Yes, we can share it most of the time. As we’ll discover in this post, the ability to share the implementation of a generic method depends almost entirely on the size of the type T.
If T is any reference type (like string or object), then it will always be the size of a pointer. If T is a value type (like int or DateTime), its size may vary, and things get a bit more complex. The more method implementations that can be shared, the smaller the resulting executable code is.

Mark Probst, the developer who implemented generic sharing in Mono, has an excellent series of posts on how Mono performs generic sharing. We won’t go into that much depth about generic sharing here. Instead, we will see how and when IL2CPP performs generic sharing. Hopefully this information will help you better analyze and understand the executable size of your project.

What is shared by IL2CPP?

Currently, IL2CPP shares generic method implementations for a generic type SomeGenericType<T> when T is:

- Any reference type (e.g. string, object, or any user-defined class)
- Any integer or enum type

IL2CPP does not share generic method implementations when T is any other value type, because the size of each value type will differ (based on the size of its fields).

Practically, this means that adding a new usage of SomeGenericType<T>, where T is a reference type, will have a minimal impact on the executable size. However, if T is a value type, the executable size will be impacted. This behavior is the same for both the Mono and IL2CPP scripting backends. If you want to know more, read on; it’s time to dig into some implementation details!

The setup

I’ll be using Unity 5.0.2p1 on Windows, and building for the WebGL platform. I’ve enabled the “Development Player” option in the build settings, and the “Enable Exceptions” option is set to a value of “None”.
The script code for this post starts with a driver method to create instances of the generic types we will investigate:

Next, we define the types used in this method:

All of this code is nested in a class named HelloWorld, derived from MonoBehaviour.

If you view the command line for il2cpp.exe, note that it does not contain the --enable-generic-sharing option, as described in the first post in this series. However, generic sharing is still occurring. It is no longer optional, and happens in all cases now.

Generic sharing for reference types

We’ll start by looking at the most common generic sharing case: reference types. Since all reference types in managed code derive from System.Object, all reference types in the generated C++ code derive from the Object_t type. All reference types can then be represented in C++ code using the type Object_t* as a placeholder. We’ll see why this is important in a moment.

Let’s search for the generated version of the DemonstrateGenericSharing method. In my project it is named HelloWorld_DemonstrateGenericSharing_m4. We’re looking for the method definitions for the four methods in the GenericType<T> class. Using Ctags, we can jump to the method declaration for the constructor of the first instantiation of GenericType<T>, GenericType_1__ctor_m8. Note that this method declaration is actually a #define statement, mapping the method to another method, GenericType_1__ctor_m10447_gshared.

Let’s jump back, and then find the method declarations for the second instantiation of GenericType<T>. If we jump to the declaration of the constructor, GenericType_1__ctor_m9, we can see that it is also a #define statement, mapped to the same function, GenericType_1__ctor_m10447_gshared!

If we jump to the definition of GenericType_1__ctor_m10447_gshared, we can see from the code comment on the method definition that this method corresponds to the managed method name HelloWorld/GenericType`1::.ctor(). This is the constructor for the GenericType<object> type.
This type is called the fully shared type, meaning that given a type GenericType<T>, for any T that is a reference type, the implementation of all methods will use this version, where T is object.

Look just below the constructor in the generated code, and you should see the C++ code for the UsesGenericParameter method:

In both places where the generic parameter T is used (the return type and the type of the single managed argument), the generated code uses the Object_t* type. Since all reference types can be represented in the generated code by Object_t*, we can call this single method implementation for any T that is a reference type.

In the second blog post in this series (about generated code), we mentioned that all method definitions are free functions in C++. The il2cpp.exe utility does not generate overridden methods in C# using C++ inheritance. However, il2cpp.exe does use C++ inheritance for types. If we search the generated code for the string “AnyClass_t”, we can find the C++ representation of the C# type AnyClass:

Since AnyClass_t1 derives from Object_t, we can pass a pointer to AnyClass_t1 as the argument to the GenericType_1_UsesGenericParameter_m10449_gshared function without problems.

What about the return value, though? We can’t return a pointer to a base class where a pointer to a derived class is expected, right? Take a look at the declaration of the UsesGenericParameter method on the GenericType<T> instantiation using AnyClass:

The generated code is actually casting the return value (type Object_t*) to the derived type AnyClass_t1*. So here IL2CPP is lying to the C++ compiler to avoid the C++ type system. Since the C# compiler has already enforced that no code in UsesGenericParameter does anything unreasonable with the type T, IL2CPP is safe to lie to the C++ compiler here.

Generic sharing with constraints

Suppose that we want to allow some methods to be called on an object of type T. Won’t the use of Object_t* prevent that, since we don’t have many methods on System.Object?
Yes, this is correct. But we first need to express this idea to the C# compiler using generic constraints.

Take a look again in the script code for this post at the type named InterfaceConstrainedGenericType. This generic type uses a where clause to require that its type T be derived from a given interface, AnswerFinderInterface. This allows the ComputeAnswer method to be called. Recall from the previous blog post about method invocation that calls on interface methods require a lookup in a vtable structure. Since the FindTheAnswer method will make an interface method call on the constrained instance of type T, the C++ code can still use the fully shared method implementation, with the type T represented by Object_t*.

If we start at the implementation of the HelloWorld_DemonstrateGenericSharing_m4 function, then jump to the definition of the InterfaceConstrainedGenericType_1__ctor_m11 function, we can see that this method is again a #define, mapping to the InterfaceConstrainedGenericType_1__ctor_m10456_gshared function. If we look just below that function for the implementation of the InterfaceConstrainedGenericType_1_FindTheAnswer_m10458_gshared function, we can see that indeed, this is the fully shared version of the function, taking an Object_t* argument. It calls the InterfaceFuncInvoker0::Invoke function to actually make the call to the managed ComputeAnswer method.

This all hangs together in the generated C++ code because IL2CPP treats all managed interfaces like System.Object. This is a useful rule of thumb to help understand the code generated by il2cpp.exe in other cases as well.

Constraints with a base class

In addition to interface constraints, C# allows a constraint to be a base class. IL2CPP does not treat all base classes like System.Object, so how does generic sharing work for base class constraints?

Since base classes are always reference types, IL2CPP uses the fully shared version of the generic methods for these types.
Any code which needs to use a field or call a method on the constrained type performs a cast in C++ to the proper type. Again, here we rely on the C# compiler to correctly enforce the generic constraint, and we lie to the C++ compiler about the type.

Generic sharing with value types

Let’s jump back now to the HelloWorld_DemonstrateGenericSharing_m4 function and look at the implementation for GenericType<DateTime>. The DateTime type is a value type, so GenericType<DateTime> is not shared. We can jump to the declaration of the constructor for this type, GenericType_1__ctor_m10. There we see a #define, as in the other cases, but the #define maps to the GenericType_1__ctor_m10_gshared function, which is specific to the GenericType<DateTime> class and is not used by any other class.

Thinking about generic sharing conceptually

The implementation of generic sharing can be difficult to understand and follow. The problem space itself is fraught with pathological cases (e.g. the curiously recurring template pattern). It can help to think about a few concepts:

- Every method implementation on a generic type is shared.
- Some generic types only share method implementations with themselves (e.g. generic types with a value type generic parameter, like GenericType<DateTime> above).
- Generic types with a reference type generic parameter are fully shared - they always use the implementation with System.Object for all type parameters.
- Generic types with two or more type parameters can be partially shared if at least one of those type parameters is a reference type.

The il2cpp.exe utility always generates the fully shared method implementation for any generic type. It generates other method implementations only when they are used.

Sharing of generic methods

Just as method implementations on generic types can be shared, so can method implementations for generic methods. In the original script code, notice that the UsesDifferentGenericParameter method uses a different type parameter than the GenericType<T> class.
When we looked at the shared method implementations for the GenericType<T> class, we did not see the UsesDifferentGenericParameter method. If I search the generated code for “UsesDifferentGenericParameter”, I see that the implementation of this method is in the GenericMethods0.cpp file:

Notice that this is the fully shared version of the method implementation, accepting the type Object_t*. Although this method is in a generic type, the behavior would be the same for a generic method in a non-generic type as well. Effectively, il2cpp.exe always attempts to generate the least code possible for method implementations involving generic parameters.

Conclusion

Generic sharing has been one of the most important improvements to the IL2CPP scripting backend since its initial release. It allows the generated C++ code to be as small as possible, sharing method implementations where they do not differ in behavior. As we look to continue to decrease binary size, we will work to take advantage of more opportunities to share method implementations.

In the next post, we will explore how p/invoke wrappers are generated, and how types are marshaled from managed to native code. We will be able to see the cost of marshaling various types, and debug problems with marshaling code.
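The mechanics described in this post can be recapped in one self-contained illustration. Everything below is an illustrative sketch, not the real generated code (which uses IL2CPP's own metadata types): a single _gshared implementation with T represented as Object_t*, per-instantiation names mapped onto it with #define, and a cast at the call site that "lies" to the C++ type system.

```cpp
#include <cassert>

// All reference types derive from a common base, as in the generated code.
struct Object_t { virtual ~Object_t() {} };
struct AnyClass_t1 : Object_t { int value = 0; };
struct OtherClass_t2 : Object_t {};

// One fully shared implementation, with the generic parameter T
// represented as Object_t*.
Object_t* GenericType_1_UsesGenericParameter_gshared(Object_t* __this, Object_t* t) {
    (void)__this;
    return t;  // the shared body assumes nothing about T beyond Object_t
}

// Per-instantiation names are #define'd onto the same shared function,
// so new reference-type instantiations add no new code.
#define GenericType_1_UsesGenericParameter_AnyClass GenericType_1_UsesGenericParameter_gshared
#define GenericType_1_UsesGenericParameter_Other    GenericType_1_UsesGenericParameter_gshared

// At the call site, the generated code casts the Object_t* result back to
// the derived type. This bypasses the C++ type system, safely, because the
// C# compiler has already enforced the generic code's correctness.
AnyClass_t1* CallSite(AnyClass_t1* arg) {
    return (AnyClass_t1*)GenericType_1_UsesGenericParameter_AnyClass(nullptr, arg);
}
```

A value-type instantiation cannot join this scheme because its size differs from a pointer's, which is why each one gets its own private _gshared function instead.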

>access_file_
1663|blog.unity.com

Unity and Anime Studio Pro: The Making of Monster Mingle

When Chris O’Shea and the team at Cowly Owl had the idea for their recent game Monster Mingle, they realized that their vision required really powerful animation tools. The team decided to use Anime Studio Pro on this project, because the new FBX export feature allowed them to bring the animations into Unity using rigged, skinned meshes, rather than sprite-based frames. Using animated files, Cowly Owl added expressive, fluid movement to the game. Chris shared with us his team’s Anime Studio and Unity workflow.

Monster Mingle is Cowly Owl’s digital toy that lets children create their own monster, exploring a magical world full of creatures and surprises. It was created in part by Chris O’Shea, with illustration & character design by Nick Stoney, animation by Wip Vernooij and sound by Resonate.

The first step of Cowly Owl’s workflow was rigging and animating the characters in Anime Studio Pro. Then they imported the FBX files into Unity. After that, they trimmed the animations in the import settings and added scripts to the animation timeline to control events and sounds. Where they could, they used Mesh Baker to convert the multiple meshes per model into one mesh and sprite sheet to cut down draw calls. Chris and Wip also used a double-sided unlit shader on the models, so that they could be flipped in the game.

Screenshot of a monster in Anime Studio side by side with Unity:

For the main character, the type of legs on the monster affects the movement animation of the body. By using Anime Studio, Wip was able to animate all of the leg walk cycles with all of the bodies attached. In the game, you can change legs and bodies, so the character build controller code switches meshes on and off depending on the part chosen. Custom attachment code added further parts and animation to the body bones, attaching eyes, mouths, wings and horns.
He also used Mecanim to create a state machine for controlling all of the animations.

Both Chris and Wip said that Anime Studio Pro helped them to achieve the look and feel that they were going for with Monster Mingle, and that it was an invaluable tool in their game development pipeline. Because of the flexible integration with Unity, they’d recommend it for any game developer’s toolkit.

Learn more about how Monster Mingle was made in this ‘making of’ video:

Want to try Monster Mingle? Get it here.

About Anime Studio Pro

Anime Studio Pro is a powerful animation tool. Aside from game development, it has been used in animated shorts, TV commercials and full-length films. It was recently used in the Oscar-nominated animated feature film Song of the Sea, created by Cartoon Saloon™.

>access_file_
1664|blog.unity.com

IL2CPP internals: Method calls

This is the fourth blog post in the IL2CPP Internals series. In this post, we will look at how il2cpp.exe generates C++ code for method calls in managed code. Specifically, we will investigate six different types of method calls:

- Direct calls on instance and static methods
- Calls via a compile-time delegate
- Calls via a virtual method
- Calls via an interface method
- Calls via a run-time delegate
- Calls via reflection

In each case, we will focus on what the generated C++ code is doing and, specifically, on how much those instructions will cost.

As with all of the posts in this series, we will be exploring code that is subject to change and, in fact, is likely to change in a newer version of Unity. However, the concepts should remain the same. Please take everything discussed in this series as implementation details. We like to expose and discuss details like this when it is possible though!

Setup

I’ll be using Unity version 5.0.1p4. I’ll run the editor on Windows, and build for the WebGL platform. I’m building with the “Development Player” option enabled, and the “Enable Exceptions” option set to a value of “Full”.

I’ll build with a single script file, modified from the last post so that we can see the different types of method calls. The script starts with an interface and class definition:

Then we have a constant field and a delegate type, both used later in the code:

Finally, these are the methods we are interested in exploring (plus the obligatory Start method, which has no content here):

With all that defined, let’s get started. Recall that the generated C++ code will be located in the Temp\StagingArea\Data\il2cppOutput directory in the project (as long as the editor remains open). And don’t forget to generate Ctags on the generated code, to help navigate it.

Calling a method directly

The simplest (and fastest, as we will see) way to call a method is to call it directly. Here is the generated code for the CallDirectly method:

The last line is the actual method call.
Note that it does nothing special; it just calls a free function defined in the C++ code. Recall from the earlier post about generated code that il2cpp.exe generates all methods as C++ free functions. The IL2CPP scripting backend does not use C++ member functions or virtual functions for any generated code.

It follows, then, that calling a static method directly should be similar. Here is the generated code from the CallStaticMethodDirectly method:

We could say there is less overhead in calling a static method, since we don’t need to create and initialize an object instance. However, the method call itself is exactly the same: a call to a C++ free function. The only difference here is that the first argument is always passed with a value of NULL.

Since the difference between calls to static and instance methods is so minimal, we’ll focus on instance methods only for the rest of this post, but the information applies to static methods as well.

Calling a method via a compile-time delegate

What happens with a slightly more exotic method call, like an indirect call via a delegate? We’ll first look at what I’ll call a compile-time delegate, meaning that we know at compile time which method will be called on which object instance. The code for this type of call is in the CallViaDelegate method. It looks like this in the generated code:

I’ve added a few comments to indicate the different parts of the generated code.

Note that the actual method called here is not part of the generated code. The method VirtFuncInvoker1::Invoke is located in the GeneratedVirtualInvokers.h file. This file is generated by il2cpp.exe, but it doesn’t come from any IL code.
Instead, il2cpp.exe creates this file based on the usage of virtual functions that return a value (VirtFuncInvokerN) and those that don’t (VirtActionInvokerN), where N is the number of arguments to the method.

The Invoke method here looks like this:

The call into libil2cpp’s GetVirtualInvokeData looks up a virtual method in the vtable struct generated based on the managed code, then makes a call to that method.

Why don’t we use C++11 variadic templates to implement these VirtFuncInvokerN methods? This looks like a situation begging for variadic templates, and indeed it is. However, the C++ code generated by il2cpp.exe has to work with some C++ compilers which don’t yet support all C++11 features, including variadic templates. In this case at least, we did not think that forking the generated code for C++11 compilers was worth the additional complexity.

But why is this a virtual method call? Aren’t we calling an instance method in the C# code? Recall that we are calling the instance method via a C# delegate. Look again at the generated code above. The actual method we are going to call is passed in via the MethodInfo* (method metadata) argument: ImportantMethodDelegate_Invoke_m5_MethodInfo. If we search for the method named "ImportantMethodDelegate_Invoke_m5" in the generated code, we see that the call is actually to the managed Invoke method on the ImportantMethodDelegate type. This is a virtual method, so we need to make a virtual call. It is this ImportantMethodDelegate_Invoke_m5 function which will actually make the call to the method named Method in the C# code.

Wow, that was certainly a mouthful. By making what looks like a simple change to the C# code, we’ve now gone from a single call to a C++ free function to multiple function calls, plus a table lookup.
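That table lookup can be approximated in a self-contained sketch. The struct layouts and names below are simplified stand-ins for the real libil2cpp types (which also carry MethodInfo* metadata and resolve the slot from it), but the shape of the dispatch is the same: find the entry in the receiver's per-type vtable, then call through the stored function pointer.

```cpp
#include <cassert>

// Each vtable entry pairs a raw function pointer with (omitted) metadata.
struct VirtualInvokeData {
    void* methodPtr;
};

struct TypeInfo {
    VirtualInvokeData vtable[4];  // fixed size to keep the sketch simple
};

struct Object_t {
    TypeInfo* klass;  // every managed object points at its type's vtable
};

// Sketch of VirtFuncInvoker1<R, T1>::Invoke: look the method up by its
// vtable slot on the receiver's type, then call through the pointer.
template <typename R, typename T1>
struct VirtFuncInvoker1 {
    typedef R (*Func)(Object_t*, T1);
    static R Invoke(int slot, Object_t* obj, T1 p1) {
        VirtualInvokeData& data = obj->klass->vtable[slot];
        return ((Func)data.methodPtr)(obj, p1);
    }
};

// A sample "managed" method placed in a vtable slot for the test below.
static int AddOne(Object_t*, int x) { return x + 1; }
```

Compared with a direct call, every dispatch through this path pays for a pointer chase into the vtable plus an indirect call, which is the extra cost the post describes.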
Calling a method via a delegate is significantly more costly than calling the same method directly. Note that in the process of looking at a delegate method call, we’ve also seen how a call via a virtual method works.

Calling a method via an interface

It’s also possible to call a method in C# via an interface. This call is implemented by il2cpp.exe similarly to a virtual method call:

Note that the actual method call here is done via the InterfaceFuncInvoker1::Invoke function, which is in the GeneratedInterfaceInvokers.h file. Like the VirtFuncInvoker1 class, the InterfaceFuncInvoker1 class does a lookup in a vtable via the il2cpp::vm::Runtime::GetInterfaceInvokeData function in libil2cpp.

Why does an interface method call need to use a different API in libil2cpp than a virtual method call? Note that the call to InterfaceFuncInvoker1::Invoke passes not only the method to call and its arguments, but also the interface to call that method on (L_1 in this case). The vtable for each type is laid out so that interface methods are written at a specific offset. Therefore, il2cpp.exe needs to provide the interface in order to determine which method to call.

The bottom line here is that calling a virtual method and calling a method via an interface have effectively the same overhead in IL2CPP.

Calling a method via a run-time delegate

Another way to use a delegate is to create it at runtime via the Delegate.CreateDelegate method. This approach is similar to a compile-time delegate, except that the delegate can be modified at runtime in a few more ways. We pay for that flexibility with an additional function call. Here is the generated code:

This delegate requires a good bit of code for creation and initialization. But the method call itself has even more overhead, too. First we need to create an array to hold the method arguments, then call the DynamicInvoke method on the Delegate instance.
If we follow that method in the generated code, we can see that it calls the VirtFuncInvoker1::Invoke function, just as the compile-time delegate does. So this delegate requires one more function call than the compile-time delegate does, plus two lookups in a vtable instead of just one.

Calling a method via reflection

The most costly way to call a method is, not surprisingly, via reflection. Let’s look at the generated code for the CallViaReflection method:

As in the case of the runtime delegate, we need to spend some time creating an array for the arguments to the method. Then we make a virtual method call to MethodBase::Invoke (the MethodBase_Invoke_m24 function). This function in turn invokes another virtual function before we finally get to the actual method call!

Conclusion

While this is no substitute for actual profiling and measurement, we can get some insight into the overhead of any given method invocation by looking at the C++ code generated for the different types of method calls. Specifically, it is clear that we want to avoid calls via run-time delegates and reflection, if at all possible. As always, the best advice about making performance improvements is to measure early and often with profiling tools.

We’re always looking for ways to optimize the code generated by il2cpp.exe, so it is likely that these method calls will look different in a later version of Unity.

Next time we’ll delve deeper into method implementations and see how we share the implementation of generic methods to minimize generated code and executable size.

>access_file_
1665|blog.unity.com

Atmospheric Scattering in The Blacksmith

Early in the planning phase of The Blacksmith, we knew we wanted an atmospheric scattering solution that would give us a little bit more detail and control than the built-in fog options. In particular, we wanted to emphasize the aerial perspective effect in some of the more expansive shots in the movie.

As we started working towards a scattering solution for the project, we initially implemented and played around with the simulation models presented in several papers from Tomoyuki Nishita [1]. After some experimentation and prototyping of different shots, we eventually decided that we would be better off aiming for a model that allowed extensive artistic control for each of the shots in the short film. We wanted a solution that would allow us to get close to the primary elements of the physical models, but that also allowed us to break any and all rules when required. We also needed the solution to have only a small impact on the runtime performance of the short film, so we aimed to do most of the calculations per-vertex as opposed to per-pixel.

We set a goal of trying to emulate the combined effects of Rayleigh and Mie scattering from the physical models. We also added a third element representing various types of low-altitude scattering effects, collectively named height scattering. Another key divergence from the physics-based models was that we decided to keep using HDR sky textures, as opposed to procedurally generating the sky and clouds.
The obvious downside to this is that setting up something like dynamic time of day (which we didn’t need for The Blacksmith) becomes a bit more complicated, whereas the primary advantage is retaining full artistic control over the sky.

Rayleigh Scattering

Rayleigh scattering of sunlight in the atmosphere is the reason for the bright blue hue of the daytime sky, and the reddening of the sun and horizon at sunrise and sunset.

In our emulation, we omit the sun itself, and focus just on modelling the colors and extinction produced by the sunlight’s in- and out-scattering. A visual representation of the sun can be added either in the sky texture, as part of Mie scattering, as a sun flare sprite, or any combination of these. At its core, the density of our Rayleigh scattering boils down to a glorified exponential function modulated by the Rayleigh phase function. However, we have some additional control over the data that goes into it and the data we extract from it. Since we don’t model light of different wavelengths travelling through the atmosphere, the densities we calculate are scalar values. We use an HDR color ramp to allow for different hues of in-scattered light at the horizon and towards zenith, and use a distance-aware function for composing the final hue.

Mie Scattering

Mie scattering of sunlight in the atmosphere contributes to the bright halo around the sun, the grey-white appearance of clouds, and the haze that can be seen over polluted cities. As opposed to Rayleigh scattering, which scatters light in an almost uniform shape, Mie scattering is strongly forward directional.

In our emulation, we let Mie scattering primarily represent the haze and halo around the sun. As such, we almost always tint it to compensate for the fact that our Rayleigh emulation ignores the sun. Technically, our Rayleigh and Mie functions are very similar, with the significant difference being the phase function that is applied to the output.
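For reference, the two classic phase functions involved here, Rayleigh’s and the Henyey-Greenstein form commonly used for Mie, can be sketched as follows. This is a generic sketch of the textbook formulas, not The Blacksmith’s actual shader code:

```cpp
#include <cmath>
#include <cassert>

// cosTheta is the cosine of the angle between the view direction and the
// direction to the light. These are the standard textbook forms.
static const double kPi = 3.14159265358979323846;

// Rayleigh phase: nearly uniform, slightly favoring forward/backward angles.
double RayleighPhase(double cosTheta) {
    return (3.0 / (16.0 * kPi)) * (1.0 + cosTheta * cosTheta);
}

// Henyey-Greenstein phase: g in (-1, 1) controls anisotropy; g close to 1
// produces the strong forward lobe used for the Mie "sun halo".
double HenyeyGreensteinPhase(double cosTheta, double g) {
    double denom = 1.0 + g * g - 2.0 * g * cosTheta;
    return (1.0 - g * g) / (4.0 * kPi * std::pow(denom, 1.5));
}
```

With g = 0 the Henyey-Greenstein function degenerates to a uniform 1/(4π) distribution, which is why a single g parameter is enough to slide between haze-like and halo-like looks.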
Like many other implementations, we use the Henyey-Greenstein scattering function for controlling the anisotropy – or forward directionality – of the Mie scattering.

People who have read the research papers might scoff at our choice of names, given that we take certain ‘liberties’ in what we include in each of the emulations. We found early on that people generally used the name Rayleigh when describing ‘sky scattering’ and Mie when describing ‘sun haze’, so we decided to keep rolling with those names even after the implementation models were simplified from the physical models.

Height Scattering

The height scattering element represents a mish-mash of various low-altitude scattering effects, including radiation fog, ground haze, and low-lying clouds.

Our implementation of height scattering is fairly straightforward; height density is calculated from a defined sea level and height falloff. This then scales the distance-based exponential density, and the whole thing is tinted to the desired color.

Scatter Occlusion

Since our scattering contribution is primarily caused by sunlight scattering towards the observer, scattering away from the observer, or being absorbed by particles on its way to the observer, it makes sense that something should happen when objects block the sun’s light.

To handle such cases, we ray-march through the directional light’s cascaded shadow map and accumulate the amount of occlusion along the ray in a downscaled, off-screen buffer. When applying the scattering to the output pixel, we upsample this occlusion map with an edge-aware filter, and use it in composing the final color for the pixel. This combining stage is where we get into a little bit of trouble; since our solution is single-scattering only, we can’t simply mask out all in-scattered light, as that would leave us with a very dark and unnatural image. We also didn’t want to expand the solution to handle the more complex and expensive multiple-scattering.
In the end, our solution was to invent an ‘indirect factor’ that lets you explicitly designate a certain percentage of scattering to be treated as if it were indirect instead of direct.

Putting it all together

All that remains now is to combine the different elements to compose the final image. Adding together the Rayleigh, Mie and Height elements gets us started with a nice composition of the different scattering colors.

Next, we need to make sure we put that occlusion buffer to good use. We use different strength parameters for tweaking the amount of occlusion applied to direct, indirect, cloud and sky scattering.

Finally, the only thing that remains is to mix the scattering with the rendered image. We darken the transmitted image by the total accumulated extinction, and lighten it by the total accumulated in-scattering. This yields the final composition for our example scenes.

We’ve extracted the atmospheric scattering to a separate project which you can get from the Asset Store. In addition to all the code and shaders making up the solution, the project also contains all configuration presets used to generate the images in this post. Don’t forget to check the included readme for details about what the different configuration options mean.

References:
[1]: Display of The Earth Taking into account Atmospheric Scattering https://nishitalab.org/user/nis/cdrom/sig93_nis.pdf
[2]: Display Method of the Sky Color Taking into Account Multiple Scattering https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.5595&rep=rep1&type=pdf
[3]: Display of Clouds Taking into Account Multiple Anisotropic Scattering and Sky Light https://www.researchgate.net/publication/220720838_Display_of_Clouds_Taking_into_Account_Multiple_Anisotropic_Scattering_and_Sky_Light

The HDR sky in the package is from NoEmotionHDRs (Peter Sanitra) / CC BY-ND 4.0. Used without modification.

>access_file_
1666|blog.unity.com

Wrinkle Maps in The Blacksmith

When planning The Blacksmith short film, we never really prioritized a custom skin shader high enough for it to have any realistic chance of being picked up as a task. However, we still wanted to see if there was something simple we could do to add a little extra life to the Challenger’s expressions. After a quick brainstorming session, we decided to have a go at adding blendshape-driven wrinkle maps to the project.

To add detail and depth to the expressions, we decided that the Standard shader would give us the best bang for our buck if we let the wrinkles affect both normals and occlusion. We also wanted a method of restricting the influence of certain expressions to specific parts of the face.

Enter Wrinkle Maps Driver

We created a component that allowed the animator to define the wrinkle layers, one layer per blendshape in the mesh. The layer definitions contained texture mappings and strength modifiers, as well as a set of masking weights that would be matched against a face part masking texture. Using the masking weights, specific wrinkle layers could affect one to four of the masked face parts, each with a different influence.

Since we wanted to be able to blend up to four different expressions at any given time, the blending alone required 11 texture samplers with all bells and whistles enabled (two base textures, eight detail textures and one masking texture). The only realistic option for this was to compose the blended wrinkle maps in an off-screen pre-render pass. We found that the ARGB2101010 render texture format was perfect for us, as it would allow us to pack normals into two of the 10-bit channels, with the remaining one receiving the occlusion.
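The packing arithmetic behind that 10-10-10 layout can be sketched like this. This is a plain C++ illustration of the idea with made-up function names; in practice the packing happens on the GPU when rendering into the ARGB2101010 target:

```cpp
#include <cstdint>
#include <cassert>

// Quantize a [0,1] value to 10 bits (0..1023).
static uint32_t Quantize10(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return static_cast<uint32_t>(v * 1023.0f + 0.5f);
}

// Pack normal.xy (remapped from [-1,1] to [0,1]) and occlusion into three
// 10-bit channels. The normal's z component can be reconstructed later from
// x and y, since a unit normal satisfies z = sqrt(1 - x*x - y*y).
uint32_t PackWrinkleData(float nx, float ny, float occlusion) {
    uint32_t r = Quantize10(nx * 0.5f + 0.5f);
    uint32_t g = Quantize10(ny * 0.5f + 0.5f);
    uint32_t b = Quantize10(occlusion);
    return (r << 20) | (g << 10) | b;
}
```

Ten bits per channel gives noticeably better normal precision than an 8-bit target, which matters for subtle wrinkle detail under glancing light.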
Each frame, the wrinkle map component would find the four most influential blendshapes, and assign layer rendering weights accordingly.

Once we had all the wrinkle data composed in screen space, the only remaining thing to do was to redirect the normal and occlusion data inputs in the Standard shader we were using for face rendering. In practice, this just meant adding a handful of lines to the surface shader main function.

Final Results

Comparing the base head to the – exaggerated – angry blendshape at full weight illustrates the additional detail added by the blended wrinkle maps:

We also added various debug output modes that allowed us to easily visualize the fully blended occlusion and normal maps. These were quite useful in figuring out exactly which component contributed to what in the final result.

We’ve broken this feature out into an example project which you can get from the Asset Store. It’s basically just the Challenger’s head with a couple of the expressions we used in The Blacksmith, but it should serve as a useful starting point for getting this system running in your own projects.

>access_file_
1667|blog.unity.com

IL2CPP internals: Debugging tips for generated code

This is the third blog post in the IL2CPP Internals series. In this post, we will explore some tips which make debugging C++ code generated by IL2CPP a little bit easier. We will see how to set breakpoints, view the contents of strings and user-defined types, and determine where exceptions occur.

As we get into this, consider that we are debugging generated C++ code created from .NET IL code, so debugging it will likely not be the most pleasant experience. However, with a few of these tips, it is possible to gain meaningful insight into how the code for a Unity project executes on the actual target device (we’ll talk a little bit about debugging managed code at the end of the post).

Also, be prepared for the generated code in your project to differ from this code. With each new version of Unity, we are looking for ways to make the generated code better, faster and smaller.

The setup

For this post, I’m using Unity 5.0.1p3 on OSX. I’ll use the same example project as in the post about generated code, but this time I’ll build for the iOS target using the IL2CPP scripting backend. As I did in the previous post, I’ll build with the “Development Player” option selected, so that il2cpp.exe will generate C++ code with type and method names based on the names in the IL code.

After Unity is finished generating the Xcode project, I can open it in Xcode (I have version 6.3.1, but any recent version should work), choose my target device (an iPad Mini 3, but any iOS device should work) and build the project in Xcode.

Setting breakpoints

Before running the project, I’ll first set a breakpoint at the top of the Start method in the HelloWorld class. As we saw in the previous post, the name of this method in the generated C++ code is HelloWorld_Start_m3.
We can use Cmd+Shift+O and start typing the name of this method to find it in Xcode, then set a breakpoint in it. We can also choose Debug > Breakpoints > Create Symbolic Breakpoint in Xcode, and set it to break at this method.

Now when I run the Xcode project, I immediately see it break at the start of the method.

We can set breakpoints on other methods in the generated code like this if we know the name of the method. We can also set breakpoints in Xcode at a specific line in one of the generated code files. In fact, all of the generated files are part of the Xcode project. You will find them in the Project Navigator in the Classes/Native directory.

Viewing strings

There are two ways to view the representation of an IL2CPP string in Xcode. We can view the memory of a string directly, or we can call one of the string utilities in libil2cpp to convert the string to a std::string, which Xcode can display. Let’s look at the value of the string named _stringLiteral1 (spoiler alert: its contents are "Hello, IL2CPP!").

In the generated code with Ctags built (or using Cmd+Ctrl+J in Xcode), we can jump to the definition of _stringLiteral1 and see that its type is Il2CppString_14:

In fact, all strings in IL2CPP are represented like this. You can find the definition of Il2CppString in the object-internals.h header file. These strings include the standard header part of any managed type in IL2CPP, Il2CppObject (which is accessed via the Il2CppDataSegmentString typedef), followed by a four-byte length, then an array of two-byte characters. Strings defined at compile time, like _stringLiteral1, end up with a fixed-length chars array, whereas strings created at runtime have an allocated array.
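Based on that description — a managed object header, a four-byte length, then the characters — a compile-time string literal can be pictured roughly like this. The struct and field names here are hypothetical stand-ins; the real Il2CppString definition lives in object-internals.h and differs in detail:

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>

// Hypothetical stand-in for the managed object header (class metadata
// pointer plus monitor pointer), 16 bytes on a 64-bit platform.
struct FakeIl2CppObject {
    void* klass;
    void* monitor;
};

// Sketch of a fixed-length literal like _stringLiteral1 ("Hello, IL2CPP!").
struct FakeIl2CppString_14 {
    FakeIl2CppObject object;  // standard managed object header
    int32_t length;           // character count: 14 here
    uint16_t chars[14];       // UTF-16 code units, fixed-length for literals
};
```

Laying it out this way explains what the memory viewer shows: skip the header, read a four-byte length of 0x000E, and the UTF-16 characters follow immediately.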
The characters in the string are encoded as UTF-16.

If we add _stringLiteral1 to the watch window in Xcode, we can select the View Memory of “_stringLiteral1” option to see the layout of the string in memory. Then in the memory viewer, we can see this:

The header member of the string is 16 bytes, so after we skip past that, we can see that the four bytes for the size have a value of 0x000E (14). The two bytes after the length hold the first character of the string, 0x0048 (‘H’). Since each character is two bytes wide, but in this string all of the characters fit in only one byte, Xcode displays them on the right with dots in between each character. Still, the content of the string is clearly visible. This method of viewing strings does work, but it is a bit difficult for more complex strings.

We can also view the content of a string from the lldb prompt in Xcode. The utils/StringUtils.h header gives us the interface for some string utilities in libil2cpp that we can use. Specifically, let’s call the Utf16ToUtf8 method from the lldb prompt. Its interface looks like this:

We can pass the chars member of the C++ structure to this method, and it will return a UTF-8 encoded std::string. Then, at the lldb prompt, if we use the p command, we can print the content of the string.

Viewing user-defined types

We can also view the contents of a user-defined type. In the simple script code in this project, we have created a C# type named Important with a field named InstanceIdentifier. If I set a breakpoint just after we create the second instance of the Important type in the script, I can see that the generated code has set InstanceIdentifier to a value of 1, as expected. So viewing the contents of user-defined types in generated code is done the same way as you normally would in C++ code in Xcode.

Breaking on exceptions in generated code

Often I find myself debugging generated code to try to track down the cause of a bug. In many cases these bugs manifest as managed exceptions.
As we discussed in the last post, IL2CPP uses C++ exceptions to implement managed exceptions, so we can break when a managed exception occurs in Xcode in a few ways.

The easiest way to break when a managed exception is thrown is to set a breakpoint on the il2cpp_codegen_raise_exception function, which is used by il2cpp.exe any place where a managed exception is explicitly thrown.

If I then let the project run, Xcode will break when the code in Start throws an InvalidOperationException. This is a place where viewing string content can be very useful. If I dig into the members of the ex argument, I can see that it has a ___message_2 member, which is a string representing the message of the exception. With a little bit of fiddling, we can print the value of this string and see what the problem is.

Note that the string here has the same layout as above, but the names of the generated fields are slightly different. The chars field is named ___start_char_1 and its type is uint16_t, not uint16_t[]. It is still the first character of an array though, so we can pass its address to the conversion function, and we find that the message in this exception is rather comforting.

But not all managed exceptions are explicitly thrown by generated code. The libil2cpp runtime code will throw managed exceptions in some cases, and it does not call il2cpp_codegen_raise_exception to do so. How can we catch these exceptions?

If we use Debug > Breakpoints > Create Exception Breakpoint in Xcode, then edit the breakpoint, we can choose C++ exceptions and break when an exception of type Il2CppExceptionWrapper is thrown. Since this C++ type is used to wrap all managed exceptions, it will allow us to catch all managed exceptions.

Let’s prove this works by adding the following two lines of code to the top of the Start method in our script:

The second line here will cause a NullReferenceException to be thrown.
If we run this code in Xcode with the exception breakpoint set, we’ll see that Xcode does indeed break when the exception is thrown. However, the breakpoint is in code in libil2cpp, so all we see is assembly code. If we take a look at the call stack, we can see that we need to move up a few frames to the NullCheck method, which is injected by il2cpp.exe into the generated code. From there, we can move back up one more frame, and see that our instance of the Important type does indeed have a value of NULL.

Conclusion

After discussing a few tips for debugging generated code, I hope that you have a better understanding of how to track down possible problems using the C++ code generated by IL2CPP. I encourage you to investigate the layout of other types used by IL2CPP to learn more about how to debug the generated code.

Where is the IL2CPP managed code debugger, though? Shouldn’t we be able to debug managed code running via the IL2CPP scripting backend on a device? In fact, this is possible. We have an internal, alpha-quality managed code debugger for IL2CPP now. It’s not ready for release yet, but it is on our roadmap, so stay tuned.

The next post in this series will investigate the different ways the IL2CPP scripting backend implements the various types of method invocations present in managed code. We will look at the runtime cost of each type of method invocation.

>access_file_
1668|blog.unity.com

IL2CPP internals: A tour of generated code

This is the second blog post in the IL2CPP Internals series. In this post, we will investigate the C++ code generated by il2cpp.exe. Along the way, we will see how managed types are represented in native code, take a look at runtime checks used to support the .NET virtual machine, see how loops are generated, and more!

We will get into some very version-specific code that is certainly going to change in later versions of Unity. Still, the concepts will remain the same.

Example project

I’ll use the latest version of Unity available, 5.0.1p1, for this example. As in the first post in this series, I’ll start with an empty project and add one script file. This time, it has the following contents:

I’ll build this project for WebGL, running the Unity editor on Windows. I’ve selected the Development Player option in the Build Settings, so that we can get relatively nice names in the generated C++ code. I’ve also set the Enable Exceptions option in the WebGL Player Settings to Full.

Overview of the generated code

After the WebGL build is complete, the generated C++ code is available in the Temp\StagingArea\Data\il2cppOutput directory in my project directory. Once the editor is closed, this directory will be deleted. As long as the editor is open, though, this directory will remain unchanged, so we can inspect it.

The il2cpp.exe utility generated a number of files, even for this small project. I see 4625 header files and 89 C++ source code files. To get a handle on all of this code, I like to use a text editor which works with Exuberant Ctags. Ctags will usually generate a tags file quickly for this code, which makes it easier to navigate.

Initially, you can see that many of the generated C++ files are not from the simple script code, but instead are the converted version of the code in the standard libraries, like mscorlib.dll. As mentioned in the first post in this series, the IL2CPP scripting backend uses the same standard library code as the Mono scripting backend.
Note that we convert the code in mscorlib.dll and other standard library assemblies each time il2cpp.exe runs. This might seem unnecessary, since that code does not change. However, the IL2CPP scripting backend always uses bytecode stripping to decrease the executable size. So even small changes in the script code can cause many different parts of the standard library code to be used or not, depending on the situation. Therefore, we need to convert the mscorlib.dll assembly each time. We are researching better ways to do incremental builds, but we don’t have any good solutions yet.

How managed code maps to generated C++ code

For each type in the managed code, il2cpp.exe will generate one header file for the C++ definition of the type and another header file for the method declarations for the type. For example, let’s look at the contents of the converted UnityEngine.Vector3 type. The header file for the type is named UnityEngine_UnityEngine_Vector3.h. The name is created from the name of the assembly, UnityEngine.dll, followed by the namespace and name of the type. The code looks like this:

The il2cpp.exe utility has converted each of the three instance fields, and done a little bit of name mangling to avoid conflicts and reserved words. By using leading underscores, we are using some reserved names in C++, but so far we’ve not seen any conflicts with C++ standard library code.

The UnityEngine_UnityEngine_Vector3MethodDeclarations.h file contains the method declarations for all of the methods in Vector3. For example, Vector3 overrides the Object.ToString method:

Note the comment, which indicates the managed method this native declaration represents. I often find it useful to search the files in the output for the name of the managed method in this format, especially for methods with common names, like ToString.

Notice a few interesting things about all methods converted by il2cpp.exe:

- These are not member functions in C++. All methods are free functions, where the first argument is the "this" pointer. For static functions in managed code, IL2CPP always passes a value of NULL for this first argument. By always declaring methods with the "this" pointer as the first argument, we simplify the method generation code in il2cpp.exe and we make invoking methods via other methods (like delegates) simpler for generated code.
- Every method has an additional argument of type MethodInfo* which includes the metadata about the method that is used for things like virtual method invocation. The Mono scripting backend uses platform-specific trampolines to pass this metadata. For IL2CPP, we’ve decided to avoid the use of trampolines to aid in portability.
- All methods are declared extern “C” so that il2cpp.exe can sometimes lie to the C++ compiler and treat all methods as if they had the same type.
- Types are named with a “_t” suffix. Methods are named with a “_m” suffix. Naming conflicts are resolved by appending a unique number to each name. These numbers will change if anything in the user script code changes, so you cannot depend on them from build to build.

The first two points imply that every method has at least two parameters, the "this" pointer and the MethodInfo pointer. Do these extra parameters cause unnecessary overhead? While they clearly add overhead, we haven’t seen anything so far which suggests that those extra arguments cause performance problems. Although it may seem that they would, profiling has shown that the difference in performance is not measurable.

We can jump to the definition of this ToString method using Ctags. It is in the Bulk_UnityEngine_0.cpp file. The code in that method definition doesn’t look too much like the C# code in the Vector3::ToString() method.
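Put together, those conventions mean a managed instance method ends up declared roughly like this. This is an illustrative sketch with made-up type names and numeric suffixes, not copied from actual generated output:

```cpp
#include <cassert>

// Hypothetical stand-ins for generated types; real names and the numeric
// suffixes vary from build to build.
struct MethodInfo;   // method metadata, passed to every call
struct String_t;     // managed System.String

struct Vector3_t78   // "_t" suffix plus a generated unique number
{
    float ___x_1;    // leading underscores from the name-mangling scheme
    float ___y_2;
    float ___z_3;
};

// System.String UnityEngine.Vector3::ToString()
// A free function, not a member function: the "this" pointer is the explicit
// first argument (NULL for static methods), and method metadata comes last.
// extern "C" lets il2cpp.exe treat all methods as having the same type.
extern "C" String_t* Vector3_ToString_m91(Vector3_t78* __this, const MethodInfo* method);
```

Because the struct holds only the three float fields, the converted Vector3 keeps the same in-memory size and layout as the managed value type it mirrors.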
However, if you use a tool like ILSpy to reflect the code for the Vector3::ToString() method, you’ll see that the generated C++ code looks very similar to the IL code.

Why doesn’t il2cpp.exe generate a separate C++ file for the method definitions for each type, as it does for the method declarations? This Bulk_UnityEngine_0.cpp file is pretty large, 20,481 lines actually! We found that the C++ compilers we were using had trouble with a large number of source files. Compiling four thousand .cpp files took much longer than compiling the same source code in 80 .cpp files. So il2cpp.exe batches the method definitions for types into groups and generates one C++ file per group.

Now jump back to the method declarations header file and notice this line near the top of the file:

The il2cpp-codegen.h file contains the interface which generated code uses to access the libil2cpp runtime services. We’ll discuss some ways that the runtime is used by generated code later.

Method prologues

Let’s take a look at the definition of the Vector3::ToString() method. Specifically, it has a common prologue that il2cpp.exe emits in all methods.

The first line of this prologue creates a local variable of type StackTraceSentry. This variable is used to track the managed call stack, so that IL2CPP can report it in calls like Environment.StackTrace. Code generation of this entry is actually optional, and is enabled in this case by the --enable-stacktrace option passed to il2cpp.exe (since I set the Enable Exceptions option in the WebGL Player Settings to Full). For small functions, we found that the overhead of this variable has a negative impact on performance. So for iOS and other platforms where we can use platform-specific stack trace information, we never emit this line into generated code.
For WebGL, we don’t have platform-specific stack trace support, so this variable is necessary to allow managed code exceptions to work properly.

The second part of the prologue does lazy initialization of type metadata for any array or generic types used in the method body. So the name ObjectU5BU5D_t4 is the name of the type System.Object[]. This part of the prologue is only executed once, and often does nothing if the type was already initialized elsewhere, so we have not seen any adverse performance implications from this generated code.

Is this code thread safe, though? What if two threads call Vector3::ToString() at the same time? Actually, this code is not problematic, since all of the code in the libil2cpp runtime used for type initialization is safe to call from multiple threads. It is possible (maybe even likely) that the il2cpp_codegen_class_from_type function will be called more than once, but the actual work it does will only occur once, on one thread. Method execution won’t continue until that initialization is complete. So this method prologue is thread safe.

Runtime checks

The next part of the method creates an object array, stores the value of the x field of Vector3 in a local, then boxes the local and adds it to the array at index zero. Here is the generated C++ code (with some annotations):

The three runtime checks are not present in the IL code, but are instead injected by il2cpp.exe.

- The NullCheck code will throw a NullReferenceException if the value of the array is null.
- The IL2CPP_ARRAY_BOUNDS_CHECK code will throw an IndexOutOfRangeException if the array index is not correct.
- The ArrayElementTypeCheck code will throw an ArrayTypeMismatchException if the type of the element being added to the array is not correct.

These three runtime checks are all guarantees provided by the .NET virtual machine. Rather than injecting code, the Mono scripting backend uses a platform-specific signaling mechanism to handle these same runtime checks.
For IL2CPP, we wanted to be more platform agnostic and support platforms like WebGL, where there is no platform-specific signaling mechanism, so il2cpp.exe injects these checks.

Do these runtime checks cause performance problems, though? In most cases, we’ve not seen any adverse impact on performance, and they provide the benefits and safety which are required by the .NET virtual machine. In a few specific cases, though, we are seeing these checks lead to degraded performance, especially in tight loops. We’re working on a way to allow managed code to be annotated to remove these runtime checks when il2cpp.exe generates C++ code. Stay tuned on this one.

Static Fields

Now that we’ve seen how instance fields look (in the Vector3 type), let’s see how static fields are converted and accessed. Find the definition of the HelloWorld_Start_m3 method, which is in the Bulk_Assembly-CSharp_0.cpp file in my build. From there, jump to the Important_t1 type (in the AssemblyU2DCSharp_HelloWorld_Important.h file):

Notice that il2cpp.exe has generated a separate C++ struct to hold the static field for this type, since the static field is shared between all instances of this type. So at runtime, there will be one instance of the Important_t1_StaticFields type created, and all of the instances of the Important_t1 type will share that instance of the static fields type. In generated code, the static field is accessed like this:

The type metadata for Important_t1 holds a pointer to the single instance of the Important_t1_StaticFields type, and that instance is used to obtain the value of the static field.

Exceptions

Managed exceptions are converted by il2cpp.exe to C++ exceptions. We have chosen this path to again avoid platform-specific solutions.
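The overall shape of that conversion can be sketched as follows. These are hypothetical stand-ins: the real Il2CppExceptionWrapper and il2cpp_codegen_raise_exception in libil2cpp have different definitions, but the throw-through-a-wrapper idea is the same.

```cpp
#include <cassert>

// Stand-in for a generated managed exception representation.
struct Exception_t8 {
    const char* ___message_2;  // the exception message, simplified to a C string
};

// Every managed exception travels through C++ stack unwinding inside a
// wrapper type, so one catch clause can intercept all managed exceptions.
struct Il2CppExceptionWrapper {
    Exception_t8* ex;
    explicit Il2CppExceptionWrapper(Exception_t8* e) : ex(e) {}
};

// Raising a managed exception becomes a C++ throw of the wrapper.
static void il2cpp_codegen_raise_exception(Exception_t8* e) {
    throw Il2CppExceptionWrapper(e);
}
```

A generated catch handler then catches Il2CppExceptionWrapper, inspects the type of the wrapped exception, and rethrows a copy of the wrapper if it is not the type the handler is looking for.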
When il2cpp.exe needs to emit code to raise a managed exception, it calls the il2cpp_codegen_raise_exception function. The code in our HelloWorld_Start_m3 method to throw and catch a managed exception looks like this:

All managed exceptions are wrapped in the C++ Il2CppExceptionWrapper type. When the generated code catches an exception of that type, it unpacks the C++ representation of the managed exception (which has type Exception_t8). In this case, we’re looking only for an InvalidOperationException, so if we don’t find an exception of that type, a copy of the C++ exception is thrown again. If we do find the correct type, the code jumps to the implementation of the catch handler, and writes out the exception message.

Goto!?!

This code brings up an interesting point: what are those labels and goto statements doing in there? These constructs are not necessary in structured programming! However, IL does not have structured programming concepts like loops and if/then statements. Since it is lower-level, il2cpp.exe follows lower-level concepts in generated code.

For example, let’s look at the for loop in the HelloWorld_Start_m3 method:

Here the V_2 variable is the loop index. It starts off with a value of 0, then is incremented at the bottom of the loop in this line:

The ending condition in the loop is then checked here:

As long as V_2 is less than 3, the goto statement jumps to the IL_00af label, which is the top of the loop body. You might be able to guess that il2cpp.exe is currently generating C++ code directly from IL, without using an intermediate abstract syntax tree representation. If you guessed this, you are correct. You may have also noticed in the Runtime checks section above that some of the generated code looks like this:

Clearly, the L_2 variable is not necessary here. Most C++ compilers can optimize away this additional assignment, but we would like to avoid emitting it at all.
We’re currently researching the possibility of using an AST to better understand the IL code and generate better C++ code for cases involving local variables and for loops, among others.

Conclusion

We’ve just scratched the surface of the C++ code generated by the IL2CPP scripting backend for a very simple project. If you haven’t done so already, I encourage you to dig into the generated code in your project. As you explore, keep in mind that the generated C++ code will look different in future versions of Unity, as we are constantly working to improve the build and runtime performance of the IL2CPP scripting backend.

By converting IL code to C++, we’ve been able to strike a nice balance between portable and performant code. We get many of the developer-friendly features of managed code, while still getting the benefits of the quality machine code that a C++ compiler provides for various platforms.

In future posts, we’ll explore more generated code, including method calls, sharing of method implementations, and wrappers for calls to native libraries. But next time we will debug some of the generated code for an iOS 64-bit build using Xcode.

>access_file_
1669|blog.unity.com

An introduction to IL2CPP internals

Almost a year ago now, we started to talk about the future of scripting in Unity. The new IL2CPP scripting backend promised to bring a highly-performant, highly-portable virtual machine to Unity. In January, we shipped our first platform using IL2CPP, iOS 64-bit. The Unity 5 release brought another platform, WebGL. Thanks to the input from our tremendous community of users, we have shipped many patch release updates for IL2CPP, steadily improving its compiler and runtime. We have no plans to stop improving IL2CPP, but we thought it might be a good idea to take a step back and tell you a little bit about how IL2CPP works from the inside out.

Over the next few months, we’re planning to write about the following topics (and maybe others) in this IL2CPP Internals series of posts:

1. The basics - toolchain and command line arguments (this post)
2. A tour of generated code
3. Debugging tips for generated code
4. Method calls (normal methods, virtual methods, etc.)
5. Generic sharing implementation
6. P/invoke wrappers for types and methods
7. Garbage collector integration
8. Testing frameworks and usage

In order to make this series of posts possible, we’re going to discuss some details about the IL2CPP implementation that will surely change in the future. Hopefully we can still provide some useful and interesting information.

The technology that we refer to as IL2CPP has two distinct parts: an ahead-of-time (AOT) compiler, and a runtime library to support the virtual machine. The AOT compiler translates Intermediate Language (IL), the low-level output from .NET compilers, to C++ source code. The runtime library provides services and abstractions like a garbage collector, platform-independent access to threads and files, and implementations of internal calls (native code which modifies managed data structures directly).

The IL2CPP AOT compiler is named il2cpp.exe. On Windows you can find it in the Editor\Data\il2cpp directory.
On OSX it is in the Contents/Frameworks/il2cpp/build directory in the Unity installation. The il2cpp.exe utility is a managed executable, written entirely in C#. We compile it with both .NET and Mono compilers during our development of IL2CPP. The il2cpp.exe utility accepts managed assemblies compiled with the Mono compiler that ships with Unity and generates C++ code which we pass on to a platform-specific C++ compiler. You can think about the IL2CPP toolchain like this:

The other part of the IL2CPP technology is a runtime library to support the virtual machine. We have implemented this library using almost entirely C++ code (it has a little bit of platform-specific assembly code, but let’s keep that between the two of us). We call the runtime library libil2cpp, and it is shipped as a static library linked into the player executable. One of the key benefits of the IL2CPP technology is this simple and portable runtime library.

You can find some clues about how the libil2cpp code is organized by looking at the header files for libil2cpp that we ship with Unity (you’ll find them in the Editor\Data\PlaybackEngines\webglsupport\BuildTools\Libraries\libil2cpp\include directory on Windows, or the Contents/Frameworks/il2cpp/libil2cpp directory on OSX). For example, the interface between the C++ code generated by il2cpp.exe and the libil2cpp runtime is located in the codegen/il2cpp-codegen.h header file.

One key part of the runtime is the garbage collector. We’re shipping Unity 5 with libgc, the Boehm-Demers-Weiser garbage collector. However, libil2cpp has been designed to allow us to use other garbage collectors. For example, we are researching an integration of the Microsoft GC which was open-sourced as part of CoreCLR. We’ll have more to say about this in our post about garbage collector integration later in the series.

Let’s take a look at an example. I’ll be using Unity 5.0.1 on Windows, and I’ll start with a new, empty project.
So that we have at least one user script to convert, I’ll add this simple MonoBehaviour component to the Main Camera game object:

When I build for the WebGL platform, I can use Process Explorer to see the command line Unity used to run il2cpp.exe:

That command line is pretty long and horrible, so let’s unpack it. First, Unity is running this executable:

The next argument on the command line is the il2cpp.exe utility itself. The remaining command line arguments are passed to il2cpp.exe, not mono.exe. Let’s look at them. First, Unity passes five flags to il2cpp.exe:

--copy-level=None
Specify that il2cpp.exe should not perform any special file copies of the generated C++ code.

--enable-generic-sharing
This is a code and binary size reduction feature. IL2CPP will share the implementation of generic methods when it can.

--enable-unity-event-support
Special support to ensure that code for Unity events, which are accessed via reflection, is correctly generated.

--output-format=Compact
Generate C++ code in a format that requires fewer characters for type and method names. This code is difficult to debug, since the names in the IL code are not preserved, but it often compiles faster, since there is less code for the C++ compiler to parse.

--extra-types.file="C:\Program Files\Unity\Editor\Data\il2cpp\il2cpp_default_extra_types.txt"
Use the default (and empty) extra types file. This file can be added in a Unity project to let il2cpp.exe know which generic or array types will be created at runtime, but are not present in the IL code.

It is important to note that these command line arguments can and will be changed in later releases. We’re not at a point yet where we have a stable and supported set of command line arguments for il2cpp.exe.
Finally, we have a list of two files and one directory on the command line:

"C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\Managed\Assembly-CSharp.dll"
"C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\Managed\UnityEngine.UI.dll"
"C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\il2cppOutput"

The il2cpp.exe utility accepts a list of all of the IL assemblies it should convert. In this case they are the assembly containing my simple MonoBehaviour, Assembly-CSharp.dll, and the GUI assembly, UnityEngine.UI.dll. Note that there are a few conspicuously missing assemblies here. Clearly, my script references UnityEngine.dll, and that references at least mscorlib.dll, and maybe other assemblies. Where are they? Actually, il2cpp.exe resolves those assemblies internally. They can be mentioned on the command line, but they are not necessary. Unity only needs to mention the root assemblies (those which are not referenced by any other assembly) explicitly.

The last argument on the il2cpp.exe command line is the directory where the output C++ files should be created. If you are curious, have a look at the generated files in that directory; they will be the subject of the next post in this series. Before you do though, you might want to choose the “Development Player” option in the WebGL build settings. That will remove the --output-format=Compact command line argument and give you better type and method names in the generated C++ code.

Try changing various options in the WebGL or iOS Player Settings. You should be able to see different command line options passed to il2cpp.exe to enable different code generation steps.
For example, changing the “Enable Exceptions” setting in the WebGL Player Settings to a value of “Full” adds the --emit-null-checks, --enable-stacktrace, and --enable-array-bounds-check arguments to the il2cpp.exe command line.

I’d like to point out one of the challenges that we did not take on with IL2CPP, and we could not be happier that we ignored it. We did not attempt to re-write the C# standard library with IL2CPP. When you build a Unity project which uses the IL2CPP scripting backend, all of the C# standard library code in mscorlib.dll, System.dll, etc. is the exact same code used for the Mono scripting backend. We rely on C# standard library code that is already well-known by users and well-tested in Unity projects. So when we investigate a bug related to IL2CPP, we can be fairly confident that the bug is in either the AOT compiler or the runtime library, and nowhere else.

Since the initial public release of IL2CPP at version 4.6.1p5 in January, we’ve shipped 6 full releases and 7 patch releases (across versions 4.6 and 5.0 of Unity). We have corrected more than 100 bugs mentioned in the release notes.

In order to make this continuous improvement happen, we develop against only one version of the IL2CPP code internally, which sits on the bleeding edge of the trunk branch in Unity used to ship alpha and beta releases. Just before each release, we port the IL2CPP changes to the specific release branch, run our tests, and verify that all of the bugs we fixed are corrected in that version. Our QA and Sustained Engineering teams have done incredible work to make delivery at this rate possible. This means that our users are never more than about one week away from the latest fixes for IL2CPP bugs.

Our user community has proven invaluable by submitting many high quality bug reports. We appreciate all the feedback from our users to help continually improve IL2CPP, and we look forward to more of it.

The development team working on IL2CPP has a strong test-first mentality.
We often employ Test Driven Development practices, and seldom merge a pull request without good tests. This strategy works well for a technology like IL2CPP, where we have clear inputs and outputs. It means that the vast majority of the bugs we see are not unexpected behavior, but rather unexpected cases (e.g. it is possible to use a 64-bit IntPtr as a 32-bit array index, causing clang to fail with a C++ compiler error, and real code actually does this!). That difference allows us to fix bugs quickly with a high degree of confidence.

With the help of our community, we’re working hard to make IL2CPP as stable and fast as possible. By the way, if any of this excites you, we’re hiring (just sayin’).

I fear that I’ve spent too much time here teasing future blog posts. We have a lot to say, and it simply won’t all fit in one post. Next time, we’ll dig into the code generated by il2cpp.exe to see how your project actually looks to the C++ compiler.

>access_file_
1670|blog.unity.com

Working with physically based shading: A practical approach

Throughout the development of Unity 5, we’ve used our Viking Village project internally as a testing ground for shading and lighting workflows. If you’re using the Unity 5 beta, you can download the Viking Village package from the Asset Store to get insights into how you can assemble and illuminate a scene in Unity 5. We also present some of our learnings below.

In order to ensure that your texturing and shader configuration is behaving appropriately, we recommend that you use a simple scene with a variety of lighting setups. This could mean differing skyboxes, lights, etc. - anything that contributes to illuminating your model. When you open Unity 5, you’ll notice that any new empty scene has a procedural sky as well as default ambient and reflection settings. This provides a suitable starting point.

For our template environment we used:

• HDR camera rendering
• A few scattered reflection probes (for localized reflections on objects)
• A group of light probes
• A set of HDR sky textures and materials, as well as procedural skies. The sky which ships with this project was custom-made for Unity by Bob Groothuis, author of Dutch Skies 360.
• Off-white directional lights with matched intensity and HDR sky color

Adjusting sky texture panoramas

Most sky textures include the sun (along with flares etc.), and thus light from the sun gets reflected by surfaces.
This has the potential to cause three issues:

1) The Directional light you use to represent the sun must match the exact direction of the sun painted onto the skybox, or there will be multiple specular hotspots on the material.
2) The reflected sun and the specular hotspot overlap, causing intense specular highlights.
3) The baked-in sun reflection is not occluded when the surface is in shadow, so it becomes overly shiny in darkness.

As a result, the sun highlight, flares, sunrays and HDR values need to be edited out of the sky texture and reapplied using Directional Lights.

Authoring physically-based shading materials

To avoid the guesswork involved in emulating real world materials, it is useful to follow a reliable known reference. The Standard Shader supports both a Specular Color and a Metallic workflow. Both define the color of the reflections leaving the surface. In the Specular workflow, that color is specified directly, whilst in the Metallic workflow, it is derived from a combination of the diffuse color and the metallic value set in the Standard Shader controls.

For the Viking Village project, we used the Standard Shader’s Specular Color workflow. Our calibration scene, which you can download from the Asset Store, includes some handy calibration charts. We referenced the charts regularly when designing our materials.

Each workflow comes with its own set of values and a reference chart. In the Specular workflow you choose the color of the specularly reflected light directly; in the Metallic workflow you choose whether the material behaves like a metal when it is illuminated. Choosing between the Specular and Metallic workflows is largely a matter of personal preference; you can usually get the same result whichever workflow you choose to use.

Aside from charts and values, gathering samples of real world surfaces is highly valuable.
It is of great help to find the surface type you are trying to imitate and try to get an understanding of how it reacts to light.

Setting up the material

When starting out, it’s often useful to create a plain but tweakable representation of the materials using colors, values and sliders derived from the calibration charts. Then, you can apply textures while keeping the original material as a reference to confirm that its characteristics are preserved.

Textures in the Viking Village have been authored using both manual-traditional methods (photos plus tweaking) and scanned diffuse/albedo, specular, gloss and normal map images, which were provided to us by Quixel.

Be careful when adding detail in the texture channels of the material. For example, it usually pays to avoid placing lighting (ambient occlusion, shadows etc.) in your textures: remember that the physically based rendering approach provides all the lighting you should need.

Naturally, retouching photographs is more demanding than using scanned data, especially when it comes to PBS-friendly values. There are tools that assist with the process, such as Quixel Suite and Allegorithmic Substance Painter.

Scanned data

PBS-calibrated scanned textures alleviate the need for editing, since the data is already separated into channels and contains values for albedo, specular and smoothness. It is best if the software that provides the PBS-calibrated data contains a Unity profile for export. You can always use the reference charts as a sanity check, and as a guide if you need to calibrate the values using Photoshop or a related tool.

Material examples

The Viking Village scene features a large amount of content while trying to stay within reasonable texture memory consumption.
Let's take a look at how we set up a 10-meter-high wooden crane as an example. Notice that many textures, especially specular and diffuse textures, are fairly homogeneous and can therefore use different resolutions.

• Albedo: In the Specular workflow this texture represents the color of diffuse light bounced off the surface. It does not necessarily need to be highly detailed, as seen in the left image (crane), whereas the right texture (shield) includes significant unique detail.

• Specular: Non-metals (insulators) are comparatively dark and in grayscale, while metal values are bright and can be colored (remember that rust, oil and dirt on a metal are not metallic). The wood surface did not benefit extensively from a specular texture, so a value was used instead of a map.

• Smoothness: A key element in PBS materials. It contributes variation, imperfections and detail to surfaces and helps represent their state and age. For the crane, smoothness happened to be fairly constant across the surface and was therefore substituted by a value. This delivered a reasonable texture memory gain.

• Occlusion: Indicates how exposed different points of the surface are to the light of the surrounding environment. Ambient occlusion brings out surface detail and depth by muting ambient and reflection in areas with little indirect light. Keep in mind that there’s also the option of using SSAO (Screen Space Ambient Occlusion) in your scene. Using SSAO and AO together could result in double darkening of certain areas, in which case you may want to treat the AO map as a cavity map. An AO map that emphasizes deep cracks and creases may be the best option if the game uses SSAO and/or light-mapped ambient occlusion.

Secondary Textures and resolution

Secondary Textures can be used to increase the level of detail or provide variation within the material.
They can be masked using the Detail Mask property. Due to the low resolution of the primary diffuse wood texture in the crane example, the secondary texture set is crucial: it adds the fine detail to the final surface. In this instance, the detail maps are tiled and at a reasonably low resolution. They are repeated on many other wooden surfaces, thus delivering a major texture memory saving.

These workflows certainly helped us when designing the Viking Village project. We hope you also find them useful, and look forward to reading your comments!

Acknowledgements

The Viking Village project was launched in partnership with the creative team at Quixel, developer of HDR surface capture technology and the Quixel Megascans library of PBS-ready textures. Big thanks to the very talented Emmanuel “Manu” Tavares and Plamen “Paco” Tamnev for bringing this scene to life.

Go and download the project at the Asset Store. Be aware that it's optimised for Unity 5.0.0 RC2.

>access_file_
1671|blog.unity.com

A primer on repeatable random numbers

In this article we'll use level/world generation in games as example use cases, but the lessons are applicable to many other things, such as procedural textures, models, music, etc. They are however not meant for applications with very strict requirements, such as cryptography.

Why would you want to repeat the same result more than once?

• Ability to revisit the same level/world. For example, a certain level/world can be created from a specific seed. If the same seed is used again, you will get the same level/world again. You can for example do this in Minecraft.

• Persistent world that's generated on the fly. If you have a world that's generated on the fly as the player moves around in it, you may want locations to remain the same the first and subsequent times the player visits them (like in Minecraft, the upcoming game No Man's Sky, and others), rather than being different each time as if driven by dream logic.

• Same world for everyone. Maybe you want your game world to be the same for everyone who plays it, exactly as if it wasn't procedurally generated. This is for example the case in No Man's Sky. It is essentially the same as the ability to revisit the same level/world mentioned above, except that the same seed is always used.

We've mentioned the word seed a few times. A seed can be a number, text string, or other data that's used as input in order to get a random output. The defining trait of a seed is that the same seed will always produce the same output, but even the slightest change in the seed can produce a completely different output.

In this article we'll look into two different ways to produce random numbers - random number generators and random hash functions - and the reasons for using one or the other.
The things I know about this are hard earned and don't seem to be readily available elsewhere, so I thought it would be in order to write them down and share them.

The most common way to produce random numbers is using a random number generator (or RNG for short). Many programming languages have RNG classes or methods included, and they have the word "random" in their name, so it's the obvious go-to approach to get started with random numbers.

A random number generator produces a sequence of random numbers based on an initial seed. In object-oriented languages, a random number generator is typically an object that is initialized with a seed. A method on that object can then be repeatedly called to produce random numbers. The code in C# could look like this:

In this case we're getting random integer values between 0 and the maximum possible integer value (2147483647), but it's trivial to convert this to a random integer in a specific range, or a random floating point number between 0 and 1, or similar. Often methods are provided that do this out of the box.

Here's an image with the first 65536 numbers generated by the Random class in C# from the seed 0. Each random number is represented as a pixel with a brightness between 0 (black) and 1 (white).

It's important to understand here that you cannot get the third random number without first getting the first and second ones. This is not just an oversight in the implementation. In its very nature, an RNG generates each random number using the previous random number as part of the calculation. Hence we talk about a random sequence.

This means that RNGs are great if you need a bunch of random numbers one after the other, but if you need to be able to get a specific random number (say, the 26th random number from the sequence), then you're out of luck.
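To make the sequential nature concrete, here is a minimal self-contained sketch of the pattern described above, using C#'s System.Random. This is my own illustration; the article's original listing is not preserved in this archive, and the variable names are invented:

```csharp
using System;

class RngSequenceDemo
{
    static void Main()
    {
        // An RNG is initialized with a seed...
        var rng = new Random(0);

        // ...and then produces one number after the other. Each call
        // advances the internal state, so there is no way to ask for
        // "the 26th number" without generating the 25 before it.
        int first = rng.Next();   // integer in [0, 2147483647)
        int second = rng.Next();

        // Converting to other ranges is provided out of the box:
        int die = rng.Next(1, 7);      // integer in [1, 6]
        double t = rng.NextDouble();   // floating point in [0, 1)

        // The same seed always reproduces the same sequence.
        var replay = new Random(0);
        Console.WriteLine(replay.Next() == first);  // True
        Console.WriteLine(replay.Next() == second); // True
        Console.WriteLine(die >= 1 && die <= 6 && t >= 0.0 && t < 1.0); // True
    }
}
```

Because each Next() call consumes state, reproducing a number deep in the sequence means replaying every call before it.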
Well, you could call Next() 26 times and use the last number, but this is a very bad idea. If you generate everything at once, you probably don't need specific random numbers from a sequence, or at least I can't think of a reason. However, if you generate things bit by bit on the fly, then you do.

For example, say you have three sections in your world: A, B, and C. The player starts in section A, so section A is generated using 100 random numbers. Then the player proceeds to section B, which is generated using 100 different numbers. The generated section A is destroyed at the same time to free up memory. The player proceeds to section C, which uses yet another 100 different numbers, and section B is destroyed. However, if the player now goes back to section B again, it should be generated with the same 100 random numbers as the first time, in order for the section to look the same.

Couldn't you just use a separate RNG with a different seed for each section? No! This is a very common misconception about RNGs. The fact is that while the different numbers in the same sequence are random in relation to each other, the same-indexed numbers from different sequences are not random in relation to each other, even if it may look like it at first glance. So if you have 100 sequences and take the first number from each, those numbers will not be random in relation to each other. And it won't be any better if you take the 10th, 100th, or 1000th number from each sequence.

At this point some people will be skeptical, and that's fine. You can also look at this Stack Overflow question about RNG for procedural content if that's more trustworthy. But for something a bit more fun and informative, let's do some experiments and look at the results.

Let's look at the numbers generated from the same sequence for reference, and compare them with numbers created by getting the first number of each of 65536 sequences created from the seeds 0 to 65535. Though the pattern is rather uniformly distributed, it isn't quite random.
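The "first number from each seed" experiment described above is easy to reproduce. This is a sketch of my own, not the article's test framework; scale the loop up to 65536 seeds and plot the values as pixels to recreate the article's image:

```csharp
using System;

class FirstNumberPerSeed
{
    static void Main()
    {
        // Take only the FIRST number from each consecutively seeded
        // sequence. Each value is deterministic for its seed, but the
        // values are NOT random in relation to each other: plotted
        // over many seeds they form a nearly linear pattern rather
        // than uniform noise.
        for (int seed = 0; seed < 8; seed++)
        {
            int first = new Random(seed).Next();
            Console.WriteLine($"seed {seed}: first number {first}");
        }
    }
}
```

Within a single sequence the numbers look fine; it is only the cross-seed relationship that breaks down.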
In fact, I've shown the output of a purely linear function for comparison, and it's apparent that using numbers from subsequent seeds is barely any better than just using a linear function. Still, is it almost random though? Is it good enough?

At this point it can be a good idea to introduce better ways to measure randomness, since the naked eye is not very reliable. Why not? Isn't it enough that the output looks random enough? Well yes, in the end our goal is simply that things look sufficiently random. But the random number output can look very different depending on how it's used. Your generation algorithms may transform the random values in all kinds of ways that will reveal clear patterns that are hidden when just inspecting the values listed in a simple sequence.

An alternative way to inspect the random output is to create 2D coordinates from pairs of the random numbers and plot those coordinates into an image. The more coordinates land on the same pixel, the brighter that pixel gets. Let's take a look at such a coordinate plot, both for random numbers in the same sequence and for random numbers created from individual sequences with different seeds. Oh, and let's throw in the linear function too.

Perhaps surprisingly, when creating coordinates from random numbers from different seeds, the coordinates all fall along thin lines rather than being anywhere near uniformly distributed. This is again just like a linear function. Imagine you created coordinates from random numbers in order to plant trees on a terrain. Now all your trees would be planted in a straight line, with the remaining terrain being empty!

We can conclude that random number generators are only useful if you don't need to access the numbers in a specific order.
If you do, then you might want to look into random hash functions.

In general, a hash function is any function that can be used to map data of arbitrary size to data of fixed size, with slight differences in input data producing very big differences in output data. For procedural generation, the typical use case is to provide one or more integer numbers as input and get a random number as output. For example, for large worlds where only parts are generated at a time, a typical need is to get a random number associated with an input vector (such as a location in the world), and this random number should always be the same given the same input. Unlike random number generators (RNGs), there is no sequence - you can get the random numbers in any order you like.

The code in C# could look like this - note that you can get the numbers in any order you like:

The hash function may also take multiple inputs, which means you can get a random number for a given 2D or 3D coordinate:

Procedural generation is not the typical use of hash functions, and not all hash functions are well suited for procedural generation, as they may either not have a sufficiently random distribution, or be unnecessarily expensive.

One use of hash functions is as part of the implementation of data structures such as hash tables and dictionaries. These are often fast but not sufficiently random, since they are not meant for randomness but just for making algorithms perform efficiently. In theory that should make them reasonably random as well, but in practice I haven't found resources that compare the randomness properties of these, and the ones I've tested have turned out to have fairly bad randomness properties (see Appendix C for details).

Another use of hash functions is for cryptography.
These are often very random, but are also slow, since the requirements for cryptographically strong hash functions are much higher than for values that just look random.

Our goal for procedural generation purposes is a random hash function that looks random but is also efficient, meaning that it's not slower than it needs to be. Chances are there's not a suitable one built into your programming language of choice, and you'll need to find one to include in your project. I've tested a few different hash functions based on recommendations and information from various corners of the Internet. I've selected three of those for comparison here.

PcgHash: I got this hash function from Adam Smith in a discussion on the Google Groups forum on Procedural Content Generation. Adam proposed that with a little skill it's not too hard to create your own random hash function, and offered his PcgHash code snippet as an example.

MD5: This may be one of the most well-known hash functions. It's also of cryptographic strength and more expensive than it needs to be for our purposes. On top of that, we typically just need a 32-bit int as the return value, while MD5 returns a much larger hash value, most of which we'd just be throwing away. Nevertheless it's worth including for comparison.

xxHash: This is a high-performing modern non-cryptographic hash function that has both very nice random properties and great performance.

Apart from generating the noise sequence images and coordinate plots, I've also tested with a randomness testing suite called ENT - A Pseudorandom Number Sequence Test Program. I've included select ENT stats in the images, as well as a stat I devised myself which I call the Diagonals Deviation.
The latter looks at sums of diagonal lines of pixels from the coordinate plot and measures the standard deviation of these sums.

Here are the results from the three hash functions:

PcgHash stands out in that while it appears very random in the noise images of sequential random values, the coordinate plot reveals clear patterns, which means it doesn't hold up well to simple transformations. I conclude from this that rolling your own random hash function is hard and should probably be left to the experts. MD5 and xxHash seem to have very comparable random properties, and of those, xxHash is around 50 times faster.

xxHash also has the advantage that although it's not an RNG, it still has the concept of a seed, which is not the case for all hash functions. Being able to specify a seed has clear advantages for procedural generation, since you can use different seeds for different random properties of entities, grid cells, or similar, and then just use the entity index / cell coordinate as input for the hash function as-is. Crucially, with xxHash, the numbers from differently seeded sequences are random in relation to each other (see Appendix 2 for more details).

In my investigations of hash functions it has become clear that while it's good to choose a hash function that's high-performing in general-purpose hash benchmarks, it's crucial for performance to optimize it for procedural generation needs rather than just using the hash function as-is. There are two important optimizations:

Avoid conversions between integers and bytes. Most general-purpose hash functions take a byte array as input and return an integer or some bytes as the hash value. However, some of the high-performing ones convert the input bytes to integers since they operate on integers internally. Since it's most common for procedural generation to get a hash based on integer input values, the conversion to bytes is completely pointless.
Refactoring the reliance on bytes away can triple the performance while leaving the output 100% identical.

Implement no-loop methods that take just one or a few inputs. Most general-purpose hash functions take input data of variable length in the form of an array. This is useful for procedural generation too, but the most common uses are probably to get a hash based on just 1, 2 or 3 input integers. Creating optimized methods that take a fixed number of integers rather than an array can eliminate the need for a loop in the hash function, and this can dramatically improve the performance (by around 4x-5x in my tests). I'm not an expert on low-level optimization, but the dramatic difference could be caused either by implicit branching in the for loop or by the need to allocate an array.

My current recommendation for a hash function is to use an implementation of xxHash that's optimized for procedural generation. See Appendix C for details on why.

You can get my implementations of xxHash and other hash functions in C# on BitBucket. This is maintained privately by me in my free time, not by Unity Technologies.

Besides the optimizations, I also added extra methods to get the output as an integer number in a specified range or as a floating point number in a specified range, which are typical needs in procedural generation.

Note: At the time of writing I have only added a single-integer-input optimization to xxHash and MurmurHash3. I'll add optimized overloads for two and three integer inputs too when I get time.

Random hash functions and random number generators can also be combined. A sensible approach is to use random number generators with different seeds, but where the seeds have been passed through a random hash function rather than being used directly.

Imagine you have a large maze world, possibly nearly infinite. The world has a large-scale grid where each grid cell is a maze. 
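The maze-world set-up just introduced — hash the grid-cell coordinate into a seed, then run an ordinary RNG off that seed — can be sketched like this (in Python rather than the article's C#; the xor-shift/multiply mixer with ad-hoc constants stands in for a real xxHash implementation):

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned overflow

def cell_seed(cell_x, cell_y, world_seed=0):
    # Combine the inputs, then run a xor-shift/multiply finisher.
    # The constants are ad-hoc stand-ins, not actual xxHash internals.
    h = (world_seed * 0x9E3779B1 + cell_x * 0x85EBCA6B + cell_y * 0xC2B2AE35) & MASK
    h ^= h >> 16
    h = (h * 0x7FEB352D) & MASK
    h ^= h >> 15
    h = (h * 0x846CA68B) & MASK
    h ^= h >> 16
    return h

def maze_rng(cell_x, cell_y, world_seed=0):
    # Seed an ordinary RNG with the hashed cell coordinate, so each
    # maze regenerates identically every time its cell is visited.
    return random.Random(cell_seed(cell_x, cell_y, world_seed))
```

The mixer stage is a bijection on 32-bit values (xor-shifts and odd multiplications are each invertible), so distinct combined inputs keep distinct seeds.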
As the player moves around in the world, mazes are generated in the grid cells surrounding the player. In this case you'll want each maze to always be generated the same way every time it's visited, so the random numbers needed for that need to be able to be produced independently of previously generated numbers. However, mazes are always generated one whole maze at a time, so there's no need to have control over the order of the individual random numbers used for one maze.

The ideal approach here is to use a random hash function to create a seed for a maze based on the coordinate of the grid cell of the maze, and then use this seed for a random number generator sequence of random numbers. The C# code could look like this:

If you need control over the order of querying random numbers, use a suitable random hash function (such as xxHash) in an implementation that's optimized for procedural generation.

If you just need to get a bunch of random numbers and the order doesn't matter, the simplest way is to use a random number generator such as the System.Random class in C#. In order for all the numbers to be random in relation to each other, either only a single sequence (initialized with one seed) should be used, or, if multiple seeds are used, they should be passed through a random hash function (such as xxHash) first.

The source code for the random numbers testing framework referred to in this article, as well as a variety of RNGs and hash functions, is available on BitBucket. This is maintained privately by me in my free time, not by Unity Technologies.

This article originally appeared on the runevision blog, which is dedicated to game development and research I do in my free time.

For certain things you'll want to be able to query noise values that are continuous, meaning that input values near each other produce output values that are also near each other. 
Typical uses are for terrains or textures. These requirements are completely different from the ones discussed in this article. For continuous noise, look into Perlin Noise - or better - Simplex Noise. However, be aware that these are only suitable for continuous noise. Querying continuous noise functions just to get random numbers unrelated to other random numbers will produce poor results, since that's not what these algorithms are optimized for. For example, I've found that querying a Simplex Noise function at integer positions will return 0 for every third input! Additionally, continuous noise functions usually use floating point numbers in their calculations, which have worse stability and precision the further you get from the origin.

I've heard various misconceptions over the years, and I'll try to address a few more of them here.

No, I haven't seen anything that indicates that. If you look at the test images throughout this article, there's no difference between the results for low or high seed values.

No. Again, if you look at the test images, you can see that the sequences of random values follow the same pattern from start (upper left corner and proceeding one line after the other) to end. In the image below I've tested the 0th number in 65535 sequences as well as the 100th number in those same sequences. As can be seen, there's no apparent significant difference in the (lack of) quality of the randomness.

Maybe a tiny bit better, but not nearly enough. Unlike the Random class in C#, the Random class in Java doesn't use the provided seed as-is, but shuffles the bits a bit before storing the seed. The resulting numbers from different sequences may be a tiny bit more random looking, and we can see from the test stats that the Serial Correlation is much better. 
However, it's clear in the coordinate plot that the numbers still collapse to a single line when used for coordinates.

That said, there's no reason why an RNG couldn't apply a high-quality random hash function to the seed before using it. In fact it seems like a very good idea to do so, with no downsides I can think of. It's just that popular RNG implementations that I'm aware of don't do that, so you'll have to do it yourself as described previously.

There's no intrinsic reason, but hash functions such as xxHash and MurmurHash3 treat the seed value similarly to the inputs, meaning that they essentially apply a high-quality random hash function to the seed, so to speak. Because it's implemented that way, it's safe to use the Nth number from differently seeded random hash objects.

In the original version of this article I compared PcgHash, MD5, and MurmurHash3 and recommended using MurmurHash3. MurmurHash3 has excellent randomness properties and very decent speed. The author implemented it in parallel with a framework for testing hash functions called SMHasher, which has become a widely used framework for that purpose. There is also good information in this Stack Overflow question about good hash functions for uniqueness and speed, which compares a lot more hash functions and seems to paint an equally favorable picture of MurmurHash.

After publishing the article I got recommendations from Aras Pranckevičius to look into xxHash and from Nathan Reed to look into Wang Hash, which he's written about here.

xxHash is a hash function which apparently beats MurmurHash on its own turf by scoring as high on quality in the SMHasher testing framework while having significantly better performance. Read more about xxHash on its Google Code page. In my initial implementation of it, after I had removed byte conversions, it was slightly faster than MurmurHash3, though not nearly as much faster as shown in the SMHasher results.

I also implemented WangHash. 
The quality proved to be insufficient, since it showed clear patterns in the coordinate plot, but it was over five times faster than xxHash. I tried implementing a "WangDoubleHash" where the result is fed to itself, and that had fine quality in my tests while still being over three times faster than xxHash.

However, since WangHash (and WangDoubleHash) takes only a single integer as input, I opted to also implement single-input versions of xxHash and MurmurHash3 to see what effect that would have on the performance. It turned out to improve performance dramatically (around 4-5 times faster) - so much, in fact, that xxHash was now faster than WangDoubleHash.

As for quality, my own test framework only reveals fairly obvious flaws and is not nearly as sophisticated as the SMHasher test framework, so a hash function scoring high there carries a better seal of quality for randomness properties than one merely looking fine in my own tests. In general I would say that passing the tests in my test framework may be sufficient for procedural generation purposes, but since xxHash (in its optimized version) is the fastest hash function passing my own tests anyway, it's a no-brainer to just use that.

There are a great many different hash functions out there, and it would always be possible to include even more for comparison. However, I've focused primarily on some of the widely recognized best-performing ones in terms of both randomness quality and performance, and then optimized them further for procedural generation. I think the results produced with this version of xxHash are fairly close to optimal, and further gains from finding and using something even better are likely to be small. That said, feel free to extend the test framework with more implementations.
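For reference, WangHash in the 32-bit form popularized by Nathan Reed's write-up, plus the article's WangDoubleHash idea, look like this as a Python sketch (the & 0xFFFFFFFF masks emulate C#'s uint overflow):

```python
MASK = 0xFFFFFFFF

def wang_hash(x):
    # Thomas Wang's 32-bit integer hash, in the commonly quoted form
    # from Nathan Reed's write-up.
    x &= MASK
    x = ((x ^ 61) ^ (x >> 16)) & MASK
    x = (x * 9) & MASK
    x = (x ^ (x >> 4)) & MASK
    x = (x * 0x27D4EB2D) & MASK
    x = (x ^ (x >> 15)) & MASK
    return x

def wang_double_hash(x):
    # The "WangDoubleHash" idea from the article: feed the result to itself.
    return wang_hash(wang_hash(x))
```

The same trick works for pre-hashing RNG seeds as discussed earlier: seed your generator with wang_hash(seed) rather than the raw seed.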

>access_file_

[ 2014 ]

8 entries
1672|blog.unity.com

High-performance physics in Unity 5

We’ve been using PhysX 2.8.3 for a while now. We’ve not just used the plain stock PhysX, but extended it with numerous patches made by Unity engineers over the years. It’s been a long run, so: so long, PhysX 2, and thanks for all the fish. As announced at GDC’14, Unity 5.0 features an upgrade to PhysX 3.3. Let’s give it a closer look.

PhysX SDK 3 is a radical redesign of the good old PhysX SDK 2.x. Basically, the PhysX team have taken the best ideas and best approaches from 2.x and rewritten the whole SDK from scratch. That means the entire codebase is different, all the interfaces are different and most of the functionality is different.

Now let’s give you a taste of how it feels to use Unity 5.0 physics.

To start with something simple, we’ve made the adaptive force switchable and off by default. Adaptive force is a special technique in PhysX to compensate for numerical errors when simulating stacks. However, feedback from Unity developers tells us that adaptive force contributes a lot to overall instability in game content. Expect your stack-like things to behave better in the future.

Moving Static Colliders will be a lot less expensive. A Static Collider is just a gameobject with a collider component on it, but without a Rigidbody component. Previously, since the SDK assumed that Static Colliders aren’t moved, moving a Static Collider would trigger an expensive AABB tree rebuild that affected overall performance badly. In Unity 5, we’ll use the same data structure to handle the movement of both dynamic and static colliders. Unfortunately, we’ll lose the benefit of Static Colliders consuming less memory than dynamic ones. However, right now, the cost associated with moving Static Colliders is one of the top 3 causes of performance issues in Unity games. We wanted to change that.

Continuous collision detection has been improved by an order of magnitude. 
Continuous collision detection is used to prevent fast-moving bodies from going through other bodies without any collisions being detected. Imagine a bullet shot at a piece of paper, or a game of pool where some balls will move faster than others. In Unity 5.0, the SDK generates all the data required to handle fast movement. You just enable continuous collision detection and it works. PhysX3 features an algorithm that detects whether the expensive CCD simulation is actually needed given the current body velocity, or whether the default discrete simulation would do just fine. It’s activated once you enable CCD.

PhysX3 supports more Rigidbodies on the broadphase. In fact, you’ll be able to have several hundred thousand bodies in a single frame on desktop and desktop-like platforms. Previously, there was a fixed 64k limit on bodies. That wasn’t a constant that we could easily increase – it was a consequence of saving a lot on bits all over the SDK. Some console targets, such as PlayStation 3, will still have this limitation. There’s also a limit on physics materials: at the time of writing, you can’t have more than 64k materials on any platform.

In Unity 5.0 we’ve reduced the cost of scaling Mesh Colliders. Previously, when scaling a Mesh Collider, you would have to create a new mesh with the scale baked into the vertices, and that required valuable time and memory. With PhysX3 we are able to support non-uniform scaling without any baking. It’s basically free.

Next, let’s take a look at two subsystems that differ so much from the Unity 4.x version that you can think of them as new: the cloth and vehicle modules.

Let’s start with cloth. In Unity 4, cloth simulation is supported via the InteractiveCloth and SkinnedCloth components. InteractiveCloth has cloth-like mesh behaviour, i.e. “physical cloth” that interacts with other physics bodies, can apply forces and so on. 
InteractiveCloth is computationally expensive, so Unity 4 has another component, SkinnedCloth, for character clothing. Since SkinnedCloth is decoupled from the main simulation pipeline, it is able to perform better than InteractiveCloth. The main problem was that both cloth components were quite unstable and cost a lot. With PhysX3 integration coming, we have decided to drop support for InteractiveCloth and have only one cloth component, called simply Cloth, designed with character clothing in mind.

In Unity 5.0, Cloth no longer reacts to all colliders in a scene, nor does it apply forces back to the world. Instead, we have a faster, multithreaded, more stable character clothing solution. When you first add it, the new Cloth component doesn’t react to any bodies at all. Thus, Cloth and the world do not recognise or see each other until you manually add colliders from the world to the Cloth component. Even after that, the simulation is still one-way: cloth reacts to those bodies but doesn’t apply forces back. Additionally, you can only use three types of colliders with cloth: sphere colliders, capsule colliders, and conical capsule colliders, the latter constructed using two sphere colliders. These changes have all been introduced to boost performance.

The authoring interface in Unity 5.0 is similar to the SkinnedCloth one, and we are working hard on improving it in 5.x. Expect to see things like integration with Mecanim avatars added during the 5.x cycle.

The new Cloth component supports the GPU via CUDA internally, but we’ve decided to release this later in the 5.x cycle for several reasons. Firstly, CUDA only works on Windows on NVIDIA hardware, and we have a big presence on Mac & Linux. Secondly, we really want to focus our bug-fixing efforts on core stuff first and move on to integrating fancy things after that.

Now, a few words about vehicles. PhysX3 has a shiny new Vehicle SDK which we’ve used to implement our WheelCollider component. 
The new component delivers much more realistic suspension and tire friction, and fixes a number of other long-standing issues.

In Unity 5.0, the new component can be used out of the box to get a simple behaviour going. I only expect developers to go for vehicle packages on the Asset Store when they want something that is already fine-tuned, realistic or advanced, like Edy’s Vehicle Package. Look at what I’ve been able to set up in a couple of hours using a free mesh downloaded from the web (most of that time was spent in Blender preparing the model):

And here is one of the SUVs from Edy’s Vehicle Package:

Edy is currently working on the new version of his package, which will bring more amazing stuff. Contact him directly for more details.

There are a lot of fantastic technical details about vehicles I’d like to share with you but, for now, let’s just take a look at the new WheelCollider gizmos. This will give you an idea of how suspension is going to work as well. In the picture above, the wheel circle and wheel diameter are marked in green, the suspension travel segment in orange and the force application point sphere in green. On the suspension travel segment, there are marks for the maximum compression position, maximum droop position and target position. As you’d expect, the wheel can only travel between the max compression and max droop positions. The target position (also referred to as the rest position in technical talks) is located exactly where the sprung weight is balanced by the spring force; i.e. the position where the wheel sits when the vehicle is just standing on a flat surface. It might seem tricky to tune but, actually, the max compression position is where your wheel is located originally in the mesh. Next, you specify the suspension distance and the target position as a fraction of the suspension distance. Just two floats to rule them all, not a big deal! 
Have I mentioned that the new wheel gizmo updates its rotation and position from the simulation data out of the box? You don’t even have to add actual wheel geometry and write wheel-positioning code to preview your settings. It’s all just built in.

PhysX3 is now prepared to run on multicores, as the internal computation model is organised in tasks that can be executed on different cores. The SDK does all the multithreading, taking care of all the dependencies itself and ensuring optimal job decomposition. From what we’ve seen so far, it’s reasonable to expect a doubling in performance generally, just as a result of having a better code base and improved multithreading. In some instances the improvement is dramatic, with up to tenfold speedups.

Performance ninjas interested in more data should visit Pierre Terdiman’s blog. He’s the core developer behind the PhysX SDK.

The new functions look and feel Unity-like, but there are still cases where the behaviour is different, parameters mean different things or, in some cases, old behaviours are no longer supported. Thus, Unity 5.0 physics is not 100% compatible with Unity 4.x. Be prepared to retune your old projects and rewrite some of your physics code when migrating from previous Unity releases.

There are many more details about physics in Unity 5 than I can share in this post. Please feel free to ask questions in the comments, or if you’re visiting our Unite 2014 developer conference this year, catch my in-depth talk on physics in Unity 5.0 and come say hi and have a chat.

More about Unity 5

>access_file_
1673|blog.unity.com

Serialization in Unity

In the spirit of sharing more of the tech behind the scenes, and the reasons why some things are the way they are, this post contains an overview of Unity's serialization system. Understanding this system well can have a big impact on the effectiveness of your development and the performance of the things you make. Here we go.

Serialization of “things” is at the very core of Unity. Many of our features build on top of the serialization system:

Storing data in your scripts. This is the one most people are probably somewhat familiar with.

Inspector window. The inspector window doesn’t talk to the C# API to figure out what the values of the properties of whatever it is inspecting are. It asks the object to serialize itself, and then displays the serialized data.

Prefabs. Internally, a prefab is the serialized data stream of one (or more) game objects and components. A prefab instance is a list of modifications that should be made on the serialized data for this instance. The prefab concept actually only exists at editor time. The prefab modifications get baked into a normal serialization stream when Unity makes a build, and when that gets instantiated, the instantiated gameobjects have no idea they were a prefab when they lived in the editor.

Instantiation. When you call Instantiate() on either a prefab, or a gameobject that lives in the scene, or on anything else for that matter (everything that derives from UnityEngine.Object can be serialized), we serialize the object, then create a new object, and then we “deserialize” the data onto the new object. (We then run the same serialization code again in a different variant, where we use it to report which other UnityEngine.Objects are being referenced. We then check, for each referenced UnityEngine.Object, whether it is part of the data being Instantiated(). 
If the reference is pointing to something “external” (like a texture), we keep that reference as it is; if it is pointing to something "internal" (like a child gameobject), we patch the reference to the corresponding copy.)

Saving. If you open a .unity scene file with a text editor, and have set Unity to “force text serialization”, we run the serializer with a YAML backend.

Loading. This might not seem surprising, but backwards-compatible loading is a system that is built on top of serialization as well. In-editor YAML loading uses the serialization system, as does the runtime loading of scenes and assets. Assetbundles also make use of the serialization system.

Hot reloading of editor code. When you change an editor script, we serialize all editor windows (they derive from UnityEngine.Object!), then destroy all the windows, unload the old C# code, load the new C# code, recreate the windows, and finally deserialize the datastreams of the windows back onto the new windows.

Resource.GarbageCollectSharedAssets(). This is our native garbage collector, and is different to the C# garbage collector. It is the thing we run after you load a scene to figure out which things from the previous scene are no longer referenced, so we can unload them. The native garbage collector runs the serializer in a mode where we use it to have objects report all references to external UnityEngine.Objects. This is what makes textures that were used by scene1 get unloaded when you load scene2.

The serialization system is written in C++, and we use it for all our internal object types (Textures, AnimationClip, Camera, etc.). Serialization happens at the UnityEngine.Object level; each UnityEngine.Object is always serialized as a whole. Objects can contain references to other UnityEngine.Objects, and those references get serialized properly.

Now you may say that none of this concerns you very much; you’re just happy that it works and want to get on with actually creating some content. 
However, this will concern you, because we use this same serializer to serialize MonoBehaviour components, which are backed by your scripts. Because of the very high performance requirements of the serializer, it does not in all cases behave exactly like what a C# developer would expect from a serializer. Here we’ll describe how the serializer works and some best practices for making the best use of it.

What does a field of my script need to be in order to be serialized?

- Be public, or have the [SerializeField] attribute
- Not be static
- Not be const
- Not be readonly
- The fieldtype needs to be of a type that we can serialize.

Which fieldtypes can we serialize?

- Custom non-abstract classes with the [Serializable] attribute.
- Custom structs with the [Serializable] attribute. (New in Unity 4.5.)
- References to objects that derive from UnityEngine.Object
- Primitive data types (int, float, double, bool, string, etc.)
- Arrays of a fieldtype we can serialize
- Lists of a fieldtype we can serialize

So far so good. So what are these situations where the serializer behaves differently from what I expect?

Custom classes behave like structs

[Serializable] class Animal { public string name; } class MyScript : MonoBehaviour { public Animal[] animals; }

If you populate the animals array with three references to a single Animal object, in the serialization stream you will find three objects. When it’s deserialized, there are now three different objects. If you need to serialize a complex object graph with references, you cannot rely on Unity’s serializer doing that all automagically for you, and have to do some work to get that object graph serialized yourself. See the example below on how to serialize things Unity doesn't serialize by itself.

Note that this is only true for custom classes, as they are serialized “inline”: their data becomes part of the complete serialization data for the MonoBehaviour they are used in. 
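This inline duplication isn't unique to Unity; any serializer that writes custom classes inline by value behaves the same way. A quick sketch (in Python, with JSON standing in for Unity's serializer and made-up data): three references to one object go in, three independent objects come out.

```python
import json

animal = {"name": "Rex"}
animals = [animal, animal, animal]  # three references to ONE object

# Inline (by-value) serialization flattens the shared references...
restored = json.loads(json.dumps(animals))

# ...so after the round trip there are three independent objects:
# mutating one restored entry no longer affects the others, while in
# the original list every slot still aliases the same dict.
restored[0]["name"] = "Max"
```

This mirrors what Unity's serializer does to custom classes serialized inline; only UnityEngine.Object references escape the duplication.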
When you have a field holding a reference to a UnityEngine.Object derived class, like a “public Camera myCamera”, the data from that camera is not serialized inline; an actual reference to the camera UnityEngine.Object is serialized instead.

No support for null for custom classes

Pop quiz. How many allocations are made when deserializing a MonoBehaviour that uses this script:

class Test : MonoBehaviour { public Trouble t; } [Serializable] class Trouble { public Trouble t1; public Trouble t2; public Trouble t3; }

It wouldn’t be strange to expect 1 allocation, that of the Test object. It also wouldn’t be strange to expect 2 allocations, one for the Test object and one for a Trouble object. The correct answer is 729. The serializer does not support null. If it serializes an object and a field is null, we just instantiate a new object of that type and serialize that. Obviously this could lead to infinite cycles, so we have a relatively magical depth limit of 7 levels. At that point we just stop serializing fields that have types of custom classes/structs and lists and arrays. [1]

Since so many of our subsystems build on top of the serialization system, this unexpectedly big serialization stream for the Test monobehaviour will cause all these subsystems to perform more slowly than necessary. When we investigate performance problems in customer projects, we almost always find this problem, so we added a warning for this situation in Unity 4.5. We actually messed up the warning implementation in such a way that it gives you so many warnings that you have no option but to fix them right away. We'll soon ship a fix for this in a patch release; the warning is not gone, but you will only get one per "entering playmode", so you don't get spammed like crazy. 
You'd still want to fix your code, but you should be able to do it at a time that suits you.

No support for polymorphism

If you have a public Animal[] animals field and you put in an instance of a dog, a cat and a giraffe, after serialization you will have three instances of Animal.

One way to deal with this limitation is to realize that it only applies to “custom classes”, which get serialized inline. References to other UnityEngine.Objects get serialized as actual references, and for those, polymorphism does actually work. You’d make a ScriptableObject derived class or another MonoBehaviour derived class, and reference that. The downside is that you need to store that monobehaviour or scriptable object somewhere, and cannot serialize it inline nicely.

The reason for these limitations is that one of the core foundations of the serialization system is that the layout of the datastream for an object is known ahead of time; it depends on the types of the fields of the class, not on what happens to be stored inside the fields.

I want to serialize something that Unity's serializer doesn't support. What do I do?

In many cases the best approach is to use serialization callbacks. They allow you to be notified before the serializer reads data from your fields and after it is done writing to them. You can use this to give your hard-to-serialize data a different representation at runtime than in its serialized form. You'd use these callbacks to transform your data into something Unity understands right before Unity serializes it, and to transform the serialized form back into the form you'd like to have your data in at runtime, right after Unity has written the data to your fields.

Let’s say you want to have a tree datastructure. 
If you let Unity directly serialize the data structure, the “no support for null” limitation would cause your datastream to become very big, leading to performance degradation in many systems:

using UnityEngine;
using System.Collections.Generic;
using System;

public class VerySlowBehaviourDoNotDoThis : MonoBehaviour {
    [Serializable]
    public class Node {
        public string interestingValue = "value";

        //The field below is what makes the serialization data become huge, because
        //it introduces a 'class cycle'.
        public List<Node> children = new List<Node>();
    }

    //this gets serialized
    public Node root = new Node();

    void OnGUI() {
        Display (root);
    }

    void Display(Node node) {
        GUILayout.Label ("Value: ");
        node.interestingValue = GUILayout.TextField(node.interestingValue, GUILayout.Width(200));
        GUILayout.BeginHorizontal ();
        GUILayout.Space (20);
        GUILayout.BeginVertical ();
        foreach (var child in node.children)
            Display (child);
        if (GUILayout.Button ("Add child"))
            node.children.Add (new Node ());
        GUILayout.EndVertical ();
        GUILayout.EndHorizontal ();
    }
}

Instead, you tell Unity not to serialize the tree directly, and you make a separate field to store the tree in a serialized format suited to Unity’s serializer:

using UnityEngine;
using System.Collections.Generic;
using System;

public class BehaviourWithTree : MonoBehaviour, ISerializationCallbackReceiver {
    //node class that is used at runtime
    public class Node {
        public string interestingValue = "value";
        public List<Node> children = new List<Node>();
    }

    //node class that we will use for serialization
    [Serializable]
    public struct SerializableNode {
        public string interestingValue;
        public int childCount;
        public int indexOfFirstChild;
    }

    //the root of what we use at runtime. not serialized.
    Node root = new Node();

    //the field we give unity to serialize.
    public List<SerializableNode> serializedNodes;

    public void OnBeforeSerialize() {
        //unity is about to read the serializedNodes field's contents. lets make sure
        //we write out the correct data into that field "just in time".
        serializedNodes.Clear();
        AddNodeToSerializedNodes(root);
    }

    void AddNodeToSerializedNodes(Node n) {
        var serializedNode = new SerializableNode () {
            interestingValue = n.interestingValue,
            childCount = n.children.Count,
            indexOfFirstChild = serializedNodes.Count + 1
        };
        serializedNodes.Add (serializedNode);
        foreach (var child in n.children)
            AddNodeToSerializedNodes (child);
    }

    public void OnAfterDeserialize() {
        //Unity has just written new data into the serializedNodes field.
        //let's populate our actual runtime data with those new values.
        if (serializedNodes.Count > 0)
            root = ReadNodeFromSerializedNodes (0);
        else
            root = new Node ();
    }

    Node ReadNodeFromSerializedNodes(int index) {
        var serializedNode = serializedNodes [index];
        var children = new List<Node> ();
        for (int i = 0; i != serializedNode.childCount; i++)
            children.Add (ReadNodeFromSerializedNodes (serializedNode.indexOfFirstChild + i));
        return new Node () {
            interestingValue = serializedNode.interestingValue,
            children = children
        };
    }

    void OnGUI() {
        Display (root);
    }

    void Display(Node node) {
        GUILayout.Label ("Value: ");
        node.interestingValue = GUILayout.TextField(node.interestingValue, GUILayout.Width(200));
        GUILayout.BeginHorizontal ();
        GUILayout.Space (20);
        GUILayout.BeginVertical ();
        foreach (var child in node.children)
            Display (child);
        if (GUILayout.Button ("Add child"))
            node.children.Add (new Node ());
        GUILayout.EndVertical ();
        GUILayout.EndHorizontal ();
    }
}

Beware that the serializer, including these callbacks coming from the serializer, usually does not run on the main thread, so you are very limited in what you can do in terms of invoking the Unity API. (Serialization happening as part of loading a scene happens on a loading thread. Serialization happening as part of you invoking Instantiate() from script happens on the main thread.) 
You can, however, do the necessary data transformations to get your data from a non-unity-serializer-friendly format to a unity-serializer-friendly format.

You made it to the end! Thanks for reading this far; I hope you can put some of this information to good use in your projects.

Bye, Lucas. (@lucasmeijer)

PS: We'll add all this information to the documentation as well.

[1] I lied, the correct answer isn't actually 729. This is because in the very very old days, before we had this 7-level depth limit, Unity would just loop endlessly and then run out of memory if you created a script like the Trouble one I just wrote. Our very first fix for that, 5 years ago, was to just not serialize fieldtypes that were of the same type as the class itself. Obviously this was not the most robust fix, as it's easy to create a cycle using a Trouble1->Trouble2->Trouble1->Trouble2 chain of classes. So shortly afterwards we implemented the 7-level depth limit to catch those cases too. For the point I'm trying to make it doesn't matter, though; what matters is that you realize that if there is a cycle, you are in trouble.
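As a footnote of my own on the tree example: the flatten/rebuild round trip can be sanity-checked with a compact sketch (in Python, standing in for the C# above, with dicts as nodes). One caveat I noticed: reading the i-th child at indexOfFirstChild + i, as the C# sample does, only lines up when earlier siblings are leaves; the cursor-based rebuild below walks the pre-order list using childCount instead, which handles arbitrary trees.

```python
def flatten(node, out):
    # Pre-order flatten; each entry mirrors SerializableNode:
    # (interestingValue, childCount, indexOfFirstChild).
    out.append((node["value"], len(node["children"]), len(out) + 1))
    for child in node["children"]:
        flatten(child, out)
    return out

def unflatten(entries):
    # Rebuild by walking the pre-order list with a cursor: after reading
    # a node's whole subtree, the cursor sits on the next sibling.
    def read(i):
        value, child_count, _ = entries[i]
        children = []
        i += 1
        for _ in range(child_count):
            child, i = read(i)
            children.append(child)
        return {"value": value, "children": children}, i

    root, _ = read(0)
    return root
```

A round trip over any nested tree returns an equal tree, confirming that childCount plus pre-order ordering is enough to serialize the structure.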

>access_file_
1674|blog.unity.com

Unit testing part 2 - Unit testing MonoBehaviours

As promised in my previous blog post, Unit testing part 1 – Unit tests by the book, this one is dedicated to designing MonoBehaviours with testability in mind. MonoBehaviour is a special class that is handled by Unity in a special way. Every time you try to instantiate a MonoBehaviour derivative, you will get a warning saying that it's not allowed. Being a good boy scout and not ignoring the warning (ignoring a warning is bad in the long term!), you might have asked yourself: how can I mock MonoBehaviour then? The good news is that you don't have to! Let me introduce you to...

If you've already tried to write tests, you've probably stumbled upon some of the natural enemies of unit testing: UI, legacy code, bad design with no source-code access, or areas with a high degree of concurrency. What makes these parts hard to test? Achieving isolation: separating what is being tested from its context. There are tools out there that can help with legacy code, but for new code a very simple pattern can be used: the Humble Object Pattern.

The idea behind this pattern is very simple. Whenever you want to test a component that has any dependencies that are hard to test, extract all the logic from the component into a separate, decoupled (and thus testable) class, and then reference it. In other words, the problematic component (with a dependency that makes test authors' lives miserable) becomes a very thin layer of code with as little logic as possible, with all logic operations delegated to the newly created class.

From a state where the test has an indirect dependency on the untestable component, we get to a state where the test is not even aware of the bad (well, just untestable) code. That's pretty much it. It's a no-brainer, to be honest.

What makes games so special in terms of code and testability? How is testing games different from testing other software? Personally, I consider games pretty sophisticated pieces of software.
It would be naive to say games aren't that much different from the software you use every day. In games (with exceptions, of course) you will find shiny, polished graphics, background music and other well-engineered sound samples. Games often need to handle realtime input, potentially from a variety of sources, as well as a range of output devices (read: resolutions). Non-functional requirements can also be stricter for games. Multiplayer games will require you to have a reliable, synchronized network connection while, at the same time, keeping the performance needed to maintain a constant frame rate. This can make for a complex system that touches on many different kinds of media and technologies. For me, games were always masterpieces of software as an end product, with some of them aspiring to be recognized as pieces of art (in the classical, visual way as well as on the technical, behind-the-scenes side).

All this complexity has consequences for the code architecture. To our misfortune, high-performance architectures usually work against good code design, a restriction you may also encounter in Unity. One of the core mechanisms that had to be designed in a special way is the MonoBehaviour mechanism. If you ever wondered why the callbacks in MonoBehaviours aren't implemented with interfaces or inheritance (as common sense perhaps suggests), it is for performance reasons (see Lucas Meijer's clarification in the comments). Without going into detail, this also works against the testability of MonoBehaviours. The fact that you can't instantiate a MonoBehaviour with the new operator pretty much prohibits you from using any of the mocking frameworks out there. It probably wouldn't be a good idea anyway, with all the things that go on behind the scenes every time a MonoBehaviour is used; intercepting this behaviour would generate lots of problems.

In the end it's all about you, and how motivated you are to write testable code.
Many approaches can solve the same problem, but only a few will work well for test automation. If you want to write testable code, sometimes you will need to write more code than you would think necessary. If you are still learning (shouldn't we be learning our whole lives, anyway?) or have just set out on the test automation adventure, you may see some of these code pieces or design assumptions as unnecessary overhead. These, however, become a habit so quickly that you will not even notice when you start using pro-automation designs without thinking about it.

In this blog post, I promised to show you a way to design MonoBehaviours so that you can test them afterwards. That wasn't completely true, because we won't be testing MonoBehaviours themselves. You probably already have an idea of how to apply the Humble Object Pattern to your design to make it more testable but, nevertheless, let me show you the idea implemented in a real project.

Let's create a use case for the purpose of this example. Imagine a simple player controller that is responsible for steering a spaceship. To simplify the example, let's put it in a 2D worldspace. We want the spaceship to be able to fly around in every direction. It has a gun that can shoot straight with bullets (space rockets?) but no more frequently than a given firing rate. The number of bullets is also limited by the capacity of the bullet holder, so once you shoot all of them you need to reload. To make it more interesting, let's make the movement speed dependent on the spaceship's health.

A MonoBehaviour that serves as a controller for our spaceship could look like this: in the FixedUpdate callback we read the input and perform the action depending on which buttons were pressed by the user. To move the spaceship around, we need to translate the spaceship's position by the speed constant along the direction of the axes.
As you can see in the code, the deltaX and deltaY variables are products of Time.fixedDeltaTime, the value from the input axis, and the speed constant, which itself depends on the health level.

On the Fire1 event (e.g. a left mouse button click) we want to check whether it is possible to shoot a bullet. In the first place, we need at least one bullet left in the bullet holder. Secondly, we only want to allow the spaceship to shoot at a certain rate (once every half a second in this case), so we check how much time has passed since the last bullet was fired. If we're good to go, we spawn the bullet. The Fire2 event simply reloads the bullet holder.

To write unit tests for this logic, we need to overcome two problems. The first one, as previously mentioned, is the non-mockable MonoBehaviour class on which we depend via inheritance. The second problem is more general to real-time software: our logic depends on time (the firing rate), which makes it impossible to unit test since we can't intercept the static Time class from Unity. The good news is that all of this can be solved.

Let's refactor our code a bit by applying some simple method extraction, keeping in mind that the logic methods should not reference the Unity API (input handling and bullet instantiation in this case). The time dependency in the if statement should be extracted to a separate method as well. The final result should look more or less like this: the FixedUpdate method now does nothing more than pass the user's input on to the method that does the logic. The firing rate check was extracted to a CanFire method that returns true if the specified amount of time has passed. This extraction is important, as it will allow us to write unit tests later. If we were able to mock the SpaceshipMotor class right now, we would simply intercept the CanFire method and make it return true or false whenever we intended to.
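The interception idea can be sketched without any mocking framework: if the time check lives in one virtual method, a hand-written test double can neutralize it. The names here (GunLogic, TryFire) are illustrative, not from the project:

```csharp
using System;

// A hypothetical logic class with the time check isolated in one
// overridable method, mirroring the extraction described above.
public class GunLogic
{
    const float FiringRate = 0.5f;   // minimum seconds between shots
    float lastFireTime = float.NegativeInfinity;

    public int Bullets { get; set; } = 10;
    public int ShotsFired { get; private set; }

    // The only time-dependent piece; a test double can override it.
    public virtual bool CanFire(float now)
    {
        return now - lastFireTime >= FiringRate;
    }

    public void TryFire(float now)
    {
        if (Bullets > 0 && CanFire(now))
        {
            Bullets--;
            ShotsFired++;
            lastFireTime = now;
        }
    }
}

// Test double: removes the time dependency entirely.
public class AlwaysReadyGunLogic : GunLogic
{
    public override bool CanFire(float now) => true;
}
```

With the override in place, a test can fire twice in a hundredth of a second and still exercise the ammo logic deterministically.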
That would make the test time-independent. But since we can't mock SpaceshipMotor, because it inherits from MonoBehaviour, we need to apply the Humble Object Pattern.

How do we do that? We simply extract all the logic code that doesn't use the Unity API into a separate class, and introduce a reference to it in the SpaceshipMotor. Let's look at the class again and see what to extract. TranformPosition and InstanciateBullet use the Unity API, but everything else can be extracted. I know there is also the static Time class, but let me get back to that later.

The last thing left to explain, before we do the actual extraction, is how the extracted logic communicates with the Unity API without depending on it. This is where interfaces come in. The class with the logic will hold references to interfaces and will not care about the actual implementations. To keep things simple, we can implement those interfaces directly in the MonoBehaviour itself! Let's take a look at the following two classes.

Let's start with the SpaceshipMotor class. It implements the interfaces responsible for transforming the position of our spaceship and instantiating the bullet, respectively. The class itself has a field that refers to the SpaceshipController, which now implements all the logic. The SpaceshipController class knows nothing about the SpaceshipMotor; the only thing it can do is invoke methods on the interfaces it references.

Unity won't serialize references to the interfaces. If you don't care about serialization, simply pass the interface references while constructing the SpaceshipController. Otherwise, you can set the references in the OnEnable callback, which is called every time after serialization happens. Just for the record, the whole SpaceshipMotor class will be serialized in the usual way; it's just the interface references that will be lost.

You must have noticed the Time class reference in SpaceshipMotor.
I know I said there should be no Unity API references in here, but I left it there to demonstrate a different approach to handling time-dependent dependencies. Ideally, we could simply pass the Time.time value as an argument to the methods. For UML fans, this is the end result as a (simplified) UML diagram.

With the decoupled SpaceshipMotor class there's nothing preventing us from writing some unit tests. Take a look at one of the tests: it validates that you can't fire if you have no bullets left. The test itself is structured according to the Arrange-Act-Assert pattern. In the arrange part we create object mocks with the GetGunMock and GetControllerMock methods. GetControllerMock, besides creating a mock, overrides the behaviour of the CanFire method to always return true. This removes the time dependency from the controller object. Next, we set the current bullet count to 0. After that, we apply fire to the controller class, and we assert that Fire has not been called on the gun controller interface.

There are a few more tests in the project. You can grab it from here and play with it a bit. I used NSubstitute for the mock objects; we also ship a version of it with the Unity Test Tools. All three versions of the controller we discussed here are included in the project.

That's it from me today. I hope you enjoyed the read, and happy testing!

Tomek
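The shape of that Arrange-Act-Assert test can be sketched in plain C#. The interface and class names below are illustrative stand-ins for the project's types, and a hand-rolled fake replaces NSubstitute so the sketch stays self-contained:

```csharp
using System;

// Hypothetical humble-object seam: the logic class talks to the
// Unity side only through this interface.
public interface IGun
{
    void Fire();
}

public class SpaceshipLogic
{
    readonly IGun gun;
    public int Bullets { get; set; }

    public SpaceshipLogic(IGun gun) { this.gun = gun; }

    // In the real controller this also checks the firing rate;
    // only the ammo check is kept here for brevity.
    public void ApplyFire()
    {
        if (Bullets > 0)
        {
            Bullets--;
            gun.Fire();
        }
    }
}

// Hand-rolled fake standing in for an NSubstitute mock: it records
// calls so the test can assert on them.
public class FakeGun : IGun
{
    public int FireCalls { get; private set; }
    public void Fire() => FireCalls++;
}
```

A test then arranges a FakeGun and a controller with zero bullets, acts by calling ApplyFire, and asserts that Fire was never invoked on the gun interface.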

>access_file_
1675|blog.unity.com

Mecanim humanoids

This post explains the technology behind Mecanim Humanoids: how it works, its strengths and limitations, why some choices were made and, hopefully, some hints on how to get the best out of it. Please refer to the Unity Documentation for general setup and instructions.

The Mecanim Humanoid Rig and Muscle Space are an alternative to the standard skeleton node hierarchies and geometric transforms for representing a humanoid body and its animations.

The Humanoid Rig is a description on top of a skeleton node hierarchy. It identifies a set of human bones and creates a Muscle Referential for each of them. A Muscle Referential is essentially a pre and post rotation with a range and a sign for each axis.

A Muscle is a normalized value [-1,1] that moves a bone along one axis between a range [min,max]. Note that the Muscle normalized value can go below or above [-1,1] to overshoot the range: the range is not a hard limit, it defines the normal motion span for a Muscle. A specific Humanoid Rig can augment or reduce the range of a Muscle Referential to augment or reduce its motion span.

The Muscle Space is the set of all Muscle normalized values for the Humanoid Rig. It is a normalized humanoid pose. A range of zero (min = max) for a bone axis means that there is no Muscle for it. For example, the elbow does not have a muscle for its Y axis, as it only stretches in and out (Z axis) and rolls in and out (X axis). In the end, the Muscle Space is composed of at most 47 Muscle values that completely describe a humanoid body pose.

One beautiful thing about Muscle Space is that it is completely abstracted from its original skeleton rig, or any skeleton rig. It can be applied directly to any Humanoid Rig and always creates a believable pose. Another beautiful thing is how well Muscle Space interpolates.
Compared to a standard skeleton pose, Muscle Space will always interpolate naturally between animation keyframes, during state machine transitions or when mixed in a blend tree. Computation-wise it also performs well, as the Muscle Space can be treated as a vector of scalars that you can linearly interpolate, as opposed to quaternions or Euler angles.

An approximation of human body and human motion

Every new skeleton rig built for a humanoid character, and any animation captured, will be an approximation of the human body and human motion. No matter how many bones you have or how good your MOCAP hardware is, the result will be an approximation of the real thing. Riggers, game companies, schools and software firms will propose their own version of what they think best represents the human body and motion and what will best fit their production needs. The elaboration of the Mecanim Humanoid Rig and Muscle Space confronted us with some hard choices. We had to find a compromise between fast runtime and animation quality, or openness and standard definition.

2 Spine Bones

This is a tough one. Why 2, not 3? Or an arbitrary number of spine bones? Let's discard the last option; this is not about biomedical research. (Note that you can always use a Generic Rig if you absolutely need this level of precision.) One spine bone is clearly under-defined. Adding a second one brings you most of the way; a third or even a fourth one will only make a small contribution to the final human pose. Why is this? When looking at how a human spine bends, you will notice that the part of the spine that is on the rib cage is almost rigid. What remains is a main flexion point at the base of the spine and another at the base of the rib cage, so there are two main flexion points. Looking at a contortionist, even in extreme poses, clearly shows this. Considering all of this, we decided to have 2 spine bones for the Humanoid Rig.

1 Neck Bone

This one is easier than the spine.
Note that many game skeleton rigs don't even have a neck bone and manage to do the job with only a head bone.

Rotation DoF

As with most skeleton rigs (even more often the case for games), the Mecanim Humanoid Rig only supports rotation animation. The bones are not allowed to change their local translation relative to their parent. Some 3D packages induce a certain amount of translation on bones to simulate elasticity of articulations or squash-and-stretch animation. We are currently looking at adding translation DoF, as it is a relatively cheap way, in terms of computation performance, to compensate for animation quality on less detailed skeleton rigs. It would also allow users to create retargetable squash-and-stretch animation.

No twist bones

Twist bones are often added to skeleton rigs to prevent skin deformation problems on arms and legs when they are in extreme twist configurations; they help distribute the deformation induced by twist from the start to the end of the limb. In the Muscle Space, the amount of twist is represented by a Muscle and is always associated with the parent bone of a limb: the twist on the forearm, for example, happens at the elbow and not at the wrist. Humanoid Rigs don't support twist bones, but the Mecanim solver lets you specify a percentage of twist to be taken out of the parent and put onto the child of the limb. It defaults to 50% and greatly helps to prevent skin deformation problems.

Humanoid Root and Center of Mass

Now, what would be the best way to represent the position and orientation of a human body in world space? The topmost bone in the hierarchy (usually hips, pelvis or whatever it is called) is where the world space position and orientation curves live in a standard skeleton rig. While this works fine for a specific character, it becomes inappropriate when doing retargeting, since from one skeleton rig to another the topmost bone usually has a different position and rotation relative to the rest of the skeleton.
The Muscle Space uses the humanoid center of mass to represent its position in world space. The center of mass is approximated using an average distribution of human body part masses. We make the assumption that, after scale adjustments, the center of mass for a humanoid pose is the same for any humanoid character. It is a big assumption, but it has been shown to work very well for a wide set of animations and humanoid characters. It is true that for standing or walking animations the center of mass lies around the hips, but for more dynamic motion, like a back flip, you can see how the body moves away from the center of mass and how the center of mass feels like the most stable point over the animation.

Body orientation

Similar to what the center of mass does for the Muscle Space world space position, we use an average body orientation for the world space orientation. The average body orientation's up vector is computed from the hips and shoulders middle points. The front vector is then the cross product of the up vector and the average left/right hips/shoulders vectors. It is also assumed that this average body orientation for a humanoid pose is the same for all humanoid rigs. As with the center of mass, the average body orientation tends to be a stable referential, as lower and upper body orientations naturally compensate when walking, running, etc.

Root Motion

A more in-depth paper about root motion will follow but, as an introduction, the projection of the center of mass and average body orientation is used to automatically create root motion. The fact that the center of mass and average body orientation are stable properties of humanoid animation leads to a stable root motion that can be used for navigation or motion prediction.

The scale

One thing is still missing for Muscle Space to be a completely normalized humanoid pose… its overall size.
Again, we are looking for a way to describe the size of a humanoid that does not rely on a specific point like the head bone position, since that is not consistent from rig to rig. The center of mass height for a humanoid character in T-Stance is used directly as its scale. The center of mass position of the Muscle Space is divided by this scale to produce the final normalized humanoid pose. Put another way, the Muscle Space is normalized for a humanoid whose center of mass height is 1 when in T-Stance. All the positions in the Muscle Space are said to be in normalized meters.

Original hands and feet position

When applying a Muscle Space to a Humanoid Rig, hands and feet may end up in positions and orientations different from the original animation, due to differences in proportions between Humanoid Rigs. This may result in feet sliding or hands not reaching properly. This is why the Muscle Space optionally contains the original positions and orientations of hands and feet, normalized relative to the Humanoid Root (center of mass, average body rotation and humanoid scale). Those original positions and orientations can be used to fix the retargeted skeleton pose to match the original world space position using an IK pass. The main goal of the IK solver on arms and legs is to reach the original hands and feet positions and orientations optionally found in the Muscle Space. This is what happens under the hood for feet when the "Foot IK" toggle is enabled in a Mecanim Controller State. In these cases, the retargeted skeleton pose is never very far from the original IK goals. The IK error to fix is small, since it is only induced by differences in proportions between humanoid rigs.
The IK solver will only slightly modify the retargeted skeleton pose to produce the final pose that matches the original positions and orientations. Since the IK only slightly modifies the retargeted skeleton pose, it will rarely induce animation artefacts like knee or elbow popping. Even then, there is a Squash and Stretch solver, part of the IK solver, that is there to prevent popping when arms or legs come close to maximum extension. By default, the amount of squash and stretch allowed is limited to 5% of the total length of the arm or leg. An elbow or knee popping is more noticeable (and ugly) than a 5% or less stretch on an arm or leg. Note that the squash and stretch solver can be turned off by setting it to 0%. A more in-depth paper about IK rigs will follow. It will explain how to handle props, use multiple IK passes, handle interaction with the environment or between humanoid characters, etc.

Optional Bones

The Humanoid Rig has some bones that are optional. This is the case for Chest, Neck, Left Shoulder, Right Shoulder, Left Toes and Right Toes. Many existing skeleton rigs don't have some of the optional bones, but we still wanted to create valid humanoids from those. The Humanoid Rig also supports LeftEye and RightEye optional bones. Eye bones have two Muscles each: one that goes up and down and one that moves in and out. The Eye bones also work with the Humanoid Rig LookAt solver, which can distribute look-at adjustments over Spine, Chest, Neck, Head and Eyes. There will be more about the LookAt solver in the upcoming Humanoid IK rig paper.

Fingers

Finally, the Humanoid Rig supports fingers. Each finger may have 0 to 3 digits; 0 digits simply means that the finger is not defined. There are two Muscles (Stretch and Spread) for the first digit and one Muscle (Stretch) for the 2nd and last digits.
Note that there is no solver overhead for fingers when no fingers are defined for a hand.

Skeleton rig requirements

In-between bones

In many cases, skeleton rigs will have more bones than the ones defined by the Humanoid Rig. In-between bones are bones that sit between humanoid-defined bones. For example, a 3rd spine bone in a 3DSMAX Biped will be treated as an in-between bone. These are supported by the Humanoid Rig, but keep in mind that in-between bones won't get animated. They will stay at their default position and orientation relative to their parent as defined in the Humanoid Rig.

Standard Hierarchy

The skeleton rig must respect a standard hierarchy to be compatible with our Humanoid Rig. The skeleton may have any number of in-between bones between humanoid bones, but it must respect the following pattern:

Hips - Upper Leg - Lower Leg - Foot - Toes
Hips - Spine - Chest - Neck - Head
Chest - Shoulder - Arm - Forearm - Hand
Hand - Proximal - Intermediate - Distal

The T-Stance

The T-Stance is the most important step of Humanoid Rig creation, since the muscle setup is based on it. The T-Stance pose was chosen as the reference pose since it is easy to conceptualize and there is not much room for interpretation of what it should be:

- Standing straight, facing the z axis
- Head and eyes facing the z axis
- Feet on the ground, parallel to the z axis
- Arms open, parallel to the ground, along the x axis
- Hands flat, palm down, parallel to the ground along the x axis
- Fingers straight, parallel to the ground along the x axis
- Thumbs straight, parallel to the ground, halfway (45 degrees) between the x and z axes

When we say "straight", it does not mean bones necessarily need to be perfectly aligned; it depends on how the skin attaches to the skeleton. Some rigs may have skin that looks straight while the skeleton underneath is not. So it is important that the T-Stance be set for the final skinned character.
If you are creating a Humanoid Rig to retarget MOCAP data, it is good practice to capture at least a few frames of a T-Stance done by the actor in the MOCAP suit.

Muscle range adjustments

By default, muscle ranges are set to values that best represent human muscle ranges. Most of the time, they should not be modified. For a more cartoony character you may want to reduce a range to prevent arms entering the body, or augment it to exaggerate leg motion. If you are creating a Humanoid Rig to retarget MOCAP data, you should not modify the ranges, since the produced animation clip would then not respect the defaults.

Retargeting and Animation Clips

Mecanim retargeting is split into two phases. The first phase converts a standard skeleton transform animation into a normalized humanoid animation clip (or Muscle Clip). This phase happens in the editor when the animation file is imported; it is internally called "RetargetFrom". The second phase happens in play mode, when the Muscle Clip is evaluated and applied to the skeleton bones of a Humanoid Rig; it is internally called "RetargetTo".

There are two big advantages to splitting retargeting into two phases. The first is solving speed: half of the retargeting process is done offline, and only the other half is done at runtime. The other advantage is scene complexity and memory usage: since the Muscle Clip is completely abstracted from its original skeleton, the source skeleton does not need to be included at runtime to perform the retargeting.

The second phase is straightforward. Once you have a valid Humanoid Rig, you simply apply a Muscle Clip to it with the RetargetTo solver. This is done automatically under the hood. The first phase, converting a skeleton animation to a Muscle Clip, may be a bit trickier. The skeleton animation clip is sampled at a fixed rate; for each sample, the skeleton pose is converted to a muscle space pose and a key is added to the Muscle Clip.
Not all skeleton rigs will fit; there are many different ways a skeleton rig can be built and animated. Some skeleton rigs will produce a valid output, but with a possible loss of information. We will now review what is needed to create a lossless normalized humanoid animation… the Muscle Clip.

Note: by lossless we mean that retargeting from a skeleton rig to a Muscle Clip and then retargeting back to the same skeleton rig will preserve the animation intact. In fact, it will be almost intact: the original twist on arms and legs will be lost and replaced by what the Twist solver computes. As explained earlier in this document, there is no representation of twist repartition in Muscle Space.

- The local position of bones must be the same in the Humanoid Rig and in the animation file. It can happen that the skeleton used to create the Humanoid Rig differs from the one in the animation file; be sure to use exactly the same skeleton. Warnings will be sent to the console at import if this is not the case.

- In-between bones must have no animation. This often happens with a 3DSMAX skeleton where the 3rd spine bone has both translation and rotation animation on it. It also happens when Bip001 is used as Hips and Pelvis has some animation on it. Warnings will be sent to the console at import if this is not the case.

- The local orientation of in-between bones must be the same in the Humanoid Rig and in the animation file. A mismatch may happen when using Humanoid Auto Configure, which relies on the Skin Bind Pose to create the T-Stance. Make sure that the Skin Bind Pose rotation for in-between bones is the same as the one found in the animation file. Warnings will be sent to the console at import if this is not the case.

- Except for Hips, translation animation is not supported on bones. 3DSMAX Biped sometimes puts translation animation on spine bones. Warnings will be sent to the console at import if this is not the case.

The 3DSMAX Biped is singled out as a problematic rig here.
It is probably because of its popularity and the fact that we had to support many cases of it being used with Mecanim. Note that if you are going to create new animations to be used with the Mecanim Humanoid Rig, you should follow the rules stated above from the start. If you want to use already existing animations that break some of the rules, that is still possible, as the Mecanim retarget solver is robust and will produce valid output, but the lossless conversion can't be guaranteed.
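The interpolation property described near the top of this post — each muscle is a plain scalar, so blending two poses is a componentwise lerp with no quaternion handling — can be sketched as follows. The 47-value layout is simplified to a short array for illustration:

```csharp
using System;

public static class MusclePose
{
    // Blend two normalized humanoid poses. Because each muscle is a
    // scalar in roughly [-1, 1], interpolation is a simple per-component
    // lerp, as opposed to interpolating quaternions or Euler angles.
    public static float[] Blend(float[] a, float[] b, float t)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("pose size mismatch");
        var result = new float[a.Length];
        for (int i = 0; i < a.Length; i++)
            result[i] = a[i] + (b[i] - a[i]) * t;
        return result;
    }
}
```

This is also why muscle-space blends behave well in state machine transitions and blend trees: any weighted combination of valid poses is itself a valid (believable) pose.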

>access_file_
1676|blog.unity.com

Teleporter demo

We'd like to share with you a project that was built during the R&D period of the physically based shader and reflection probes. This benchmark project is one among several that helped us identify what improvements in functionality were necessary from an artist's production perspective. We compared offline and realtime rendering methods and output, with the aim of achieving both an increase in visual quality and a better streamlined, smoother production workflow for artists, which will open playful possibilities for graphics to be extended beyond realism to stylism.

The demo uses the Standard PBR shader and displays a range of shiny and rough metallic, plastic and ceramic materials, which naturally use the new native cubemap reflections (HDR reflection probes). The material output in the movie is at a prototype stage and the shader is still evolving. The textures changed continually throughout the process as the shader evolved. In total, the demo is composed of 30 or so texture sets, both manually authored and procedurally generated; at this point, scanned textures were not used at all. Typically, a texture set consists of albedo, specular, gloss, occlusion and a normal map, and the sizes range from 256px to 4k. Background surfaces demanded less surface detail and fewer textures. In some cases, we casually created materials by pushing sliders to adjust color and float values until they matched the references. The secondary (detail-map) slots add a layer of dust, cracks and crevices to the surfaces, which can be spotted in the close-up camera shots. The heated-up revolving core is achieved by simply animating emissive values and combining the result with HDR bloom to give a glowing-hot impression.

The cave is a large-scale environment, and the 100-meter-tall machine itself was used intentionally to challenge performance and to serve as a lighting benchmark.
This called for a variety of convolved HDR reflection probes/cubemaps to be placed along its body, able to adapt as the light gradually diminishes towards the bottom of the cave and when the heated core lights up. Certain elements use real-time reflections, while many are kept to static reflections. The application of the HDR reflection probes remains true to Unity's ideology of keeping workflows simple: they are nearly effortless to apply and use. The background scene uses directional lightmaps, while the machine is composed of partly skinned and dynamic meshes that are hooked up to light probes and use image-based lighting and a variety of light sources. To be able to see the output of the shader during production, it is crucial to have HDR rendering represented in the Scene view.

We are most excited to share this short film with you and are impatient to see what our talented community can produce with the new set of tools that is coming. We are looking forward to seeing artists amaze us with their limitless creativity.

>access_file_
1677|blog.unity.com

Custom == operator, should we keep it?

When you compare an object to null in Unity, Unity does something special with the == operator. Instead of what most people would expect, we have a special implementation of the == operator. This serves two purposes:

1) When a MonoBehaviour has fields, in the editor only[1], we do not set those fields to "real null", but to a "fake null" object. Our custom == operator is able to check if something is one of these fake null objects, and behaves accordingly. While this is an exotic setup, it allows us to store information in the fake null object that gives you more contextual information when you invoke a method on it, or when you ask the object for a property. Without this trick, you would only get a NullReferenceException and a stack trace, but you would have no idea which GameObject had the MonoBehaviour that had the field that was null. With this trick, we can highlight the GameObject in the inspector, and can also give you more direction: "looks like you are accessing a non-initialised field in this MonoBehaviour over here; use the inspector to make the field point to something".

Purpose two is a little bit more complicated.

2) When you get a C# object of type "GameObject"[2], it contains almost nothing. This is because Unity is a C/C++ engine. All the actual information about this GameObject (its name, the list of components it has, its HideFlags, etc.) lives on the C++ side. The only thing the C# object has is a pointer to the native object. We call these C# objects "wrapper objects". The lifetime of these C++ objects, like GameObject and everything else that derives from UnityEngine.Object, is explicitly managed. These objects get destroyed when you load a new scene, or when you call Object.Destroy(myObject) on them. The lifetime of C# objects gets managed the C# way, with a garbage collector. This means that it's possible to have a C# wrapper object that still exists, wrapping a C++ object that has already been destroyed. If you compare this object to null, our custom == operator will return "true" in this case, even though the actual C# variable is in reality not really null.

While these two use cases are pretty reasonable, the custom null check also comes with a bunch of downsides:

- It is counterintuitive.
- Comparing two UnityEngine.Objects to each other or to null is slower than you'd expect.
- The custom == operator is not thread safe, so you cannot compare objects off the main thread. (This one we could fix.)
- It behaves inconsistently with the ?? operator, which also does a null check, but that one does a pure C# null check, and cannot be bypassed to call our custom null check.

Going over all these upsides and downsides, if we were building our API from scratch, we would have chosen not to do a custom null check, but instead have a myObject.destroyed property you can use to check if the object is dead or not, and just live with the fact that we can no longer give better error messages in case you do invoke a function on a field that is null.

What we're considering is whether or not we should change this, which is a step in our never-ending quest to find the right balance between "fix and clean up old things" and "do not break old projects". In this case we're wondering what you think. For Unity 5 we have been working on the ability for Unity to automatically update your scripts (more on this in a subsequent blog post). Unfortunately, we would be unable to automatically upgrade your scripts for this case, because we cannot distinguish between "this is an old script that actually wants the old behaviour" and "this is a new script that actually wants the new behaviour".

We're leaning towards "remove the custom == operator", but are hesitant, because it would change the meaning of all the null checks your projects currently do. And for cases where the object is not "really null" but a destroyed object, a null check used to return true, and if we change this it will return false.
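To make the two behaviours concrete, here is a small sketch (our own illustration, not from the original post) of how the custom operator, a pure C# reference comparison, and ?? can disagree about a destroyed object:

```csharp
using UnityEngine;

public class NullCheckDemo : MonoBehaviour
{
    void Start()
    {
        GameObject go = new GameObject("victim");
        Object.DestroyImmediate(go);

        // Custom Unity == operator: true, the native object is gone.
        Debug.Log(go == null);            // True

        // Pure C# reference check: false, the wrapper object still exists.
        Debug.Log((object)go == null);    // False

        // ?? uses the pure C# null check, so it does NOT fall back here:
        // it hands you back the dead wrapper instead.
        GameObject result = go ?? new GameObject("fallback");
        Debug.Log((object)result == (object)go); // True
    }
}
```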
If you wanted to check if your variable was pointing to a destroyed object, you'd need to change the code to check "if (myObject.destroyed) {}" instead. We're a bit nervous about that: if you haven't read this blog post, and most likely even if you have, it's very easy to not realise this behaviour changed, especially since most people do not realise that this custom null check exists at all.[3]

If we change it, we should do it for Unity 5 though, as the threshold for how much upgrade pain we're willing to have users deal with is even lower for non-major releases.

What would you prefer us to do? Give you a cleaner experience, at the expense of you having to change null checks in your project, or keep things the way they are?

Bye, Lucas (@lucasmeijer)

[1] We do this in the editor only. This is why, when you call GetComponent() to query for a component that doesn't exist, you see a C# memory allocation happening: we are generating this custom warning string inside the newly allocated fake null object. This memory allocation does not happen in built games. This is a very good example of why, if you are profiling your game, you should always profile the actual standalone player or mobile player, and not the editor, since we do a lot of extra security / safety / usage checks in the editor to make your life easier, at the expense of some performance. When profiling for performance and memory allocations, never profile the editor, always profile the built game.

[2] This is true not only for GameObject, but for everything that derives from UnityEngine.Object.

[3] Fun story: I ran into this while optimising GetComponent() performance; while implementing some caching for the transform component I wasn't seeing any performance benefits. Then @jonasechterhoff looked at the problem, and came to the same conclusion. The caching code looks like this: Turns out two of our own engineers missed that the null check was more expensive than expected, and that it was the cause of not seeing any speed benefit from the caching. This led to "well, if even we missed it, how many of our users will miss it?", which resulted in this blog post :)
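The caching snippet referenced in footnote [3] is not reproduced above. A plausible reconstruction of the pattern (our sketch, not the original source) looks something like this; note that the `m_CachedTransform == null` comparison goes through the custom operator, which is exactly the hidden cost the footnote describes:

```csharp
using UnityEngine;

// Sketch of a transform-caching pattern like the one in footnote [3].
// The null check below invokes Unity's custom == operator, so it is
// slower than a plain C# reference comparison.
public class CachedTransformExample : MonoBehaviour
{
    Transform m_CachedTransform;

    public Transform CachedTransform
    {
        get
        {
            if (m_CachedTransform == null)       // custom ==, surprisingly costly
                m_CachedTransform = transform;   // native call we tried to avoid
            return m_CachedTransform;
        }
    }
}
```

A pure reference check such as `(object)m_CachedTransform == null` would sidestep the custom operator, at the price of not detecting destroyed objects.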

>access_file_
1678|blog.unity.com

On Hunting the Uncommon Elephant

Hunting bugs is fun. And every now and then you get away alive with a story to bore your grandkids with ("In my days, we still hunted bugs with sticks and stones" and all).

GDC 2014 had another such trophy-worthy hunting safari in store for us. We were five days away from presenting Unity 5 to the world when we "spotted" (well, it was kinda hard to miss) an ugly little elephant of a bug: our shiny new 64-bit editor was randomly crashing on OSX to the point of being completely unusable. There's just nothing like being up on stage to showcase how awesome your bug reporter is every couple of minutes.

So, Levi, Jonathan and I dropped all the awesome stuff we're working on (more stories we want to bore our grandkids with) and went stalking. All we knew at that point was that it crashed somewhere in the native code that Mono generates at run-time.

As every programmer knows, when you're faced with a bug that isn't obvious, you simply start by gathering evidence. Once you've learned enough about the bug's behavioral patterns, you'll eventually get a shot at it. And with the clock ticking, we were ready to shoot at pretty much anything. But we were stumped. For an elephant, the bug turned out to be surprisingly agile and sneaky.

It seemed to happen only on OSX 10.9, although Kim saw something that looked markedly similar on Windows with his heavy-duty memory debugger branch. And if you enabled Guard Malloc on earlier versions of OSX, you got what looked fairly similar as well. However, as it was crashing in random script code at arbitrary depths in the call hierarchy, it was difficult to say with certainty what was the same crash and what wasn't. And the crash could be consistent for ten consecutive runs only to be totally different for the next five.

So while Kim and I waded knee-high through memory and thigh-high through assembly code, Levi ran an extensive trace on all of Mono's secret and not-so-secret activities, which generated a gigabyte log and an editor that ran at the speed of my grandma. This yielded the first interesting insight: apparently we were always compiling the method we crashed in right before things got ugly.

But what made it crash? The immediate cause was that we were trying to execute code from an invalid address. How did we get there? A bug in Mono's signal handling where we don't resume properly? A bug in Mono's JIT compiler that won't jump back properly to the compiled code? A different thread corrupting stack memory on the main thread? Fairies and grumkins? (For a bit, the latter seemed the most likely.) After two days of hunting, the elephant was still well alive and out and about.

So, Saturday night I equipped myself with a notebook, four different colored pens and an ample supply of beer from our trademark Unity fridge (carefully making sure I didn't touch the awful canned Christmas beer we still have stuck in its crevices). Then I spun up Unity instances until I had four different crashes frozen in the debugger, labeled them "Red Crash", "Blue Crash", "Green Crash" and "Black Crash", and went to work with my respectively colored pens to take notes and draw some not-so-pretty diagrams of everything I found. Here are my notes for Blue Crash:

And that's when I made my first discovery: in every case, the stack was 16 bytes larger than it should be! That led to the next discovery: for all crashes, looking at those extra 16 bytes turned up a return address back into the function we crashed in. From a trace it was clear that in all cases we had already executed some calls from the same method, and at first I thought the address was from the last call we had traced. However, closer inspection revealed that it was actually the return address for a call whose method had not been compiled yet!

This puzzled me for a moment, as in some cases there were several calls in between the last traced method and this call that hadn't been compiled yet either. Looking closer, however, revealed that we always had jumped around them. So, then I looked at that function we apparently were supposed to return from… And there we have it (highlighted in blue): we were jumping in the wrong direction!

What Mono does here is create little "trampoline" functions that contain only a call to the JIT compiler and some data encoded into the instruction stream after the call (used by the JIT compiler to know which method to compile). Once the JIT compiler has done its work, it will delete those trampolines and erase every trace of having hooked into the method call. However, the call instruction you see there is what is called a "near call", which incidentally uses a signed 32-bit offset to jump relative to the next instruction. And since a signed 32-bit number can reach only 2GB up and down, and we're running 64-bit here, we suddenly knew why heap memory layout played such a crucial role in reproducing the bug: once Mono's trampolines were further than 2GB away from the JIT compiler, offsets wouldn't fit into 32 bits anymore and would get truncated when emitting the call instruction.

At that point, Jonathan quickly pinpointed the right fix, and by the time his Sunday was over, we had a stable working build ready in time for GDC. You all know the history from there. We successfully demoed Unity 5 at GDC 2014 to rave reviews, and after launch, it quickly became the most beloved piece of software ever. Oh wait, that bit is yet to come… Before that launch, there's a whole lot more black and blue crashes to fix :).
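The root cause above — a 64-bit displacement squeezed into a near call's signed 32-bit offset — can be sketched with a few lines of arithmetic (our illustration, not Mono's actual emitter; the addresses are made up):

```csharp
using System;

static class NearCallDemo
{
    // An x86-64 near call encodes (target - nextInstruction) as a signed
    // 32-bit displacement, so it can only reach about 2GB in each direction.
    static bool Reachable(long target, long nextInstruction)
    {
        long disp = target - nextInstruction;
        return disp >= int.MinValue && disp <= int.MaxValue;
    }

    static void Main()
    {
        long jit = 0x0000_0001_0000_0000;        // hypothetical JIT entry point
        long nearTrampoline = jit + (1L << 30);  // 1GB away: fine
        long farTrampoline  = jit + (3L << 30);  // 3GB away: out of range

        Console.WriteLine(Reachable(jit, nearTrampoline)); // True
        Console.WriteLine(Reachable(jit, farTrampoline));  // False: emitting
        // a near call here silently truncates the offset, and the CPU ends up
        // jumping to a garbage address — exactly the crash described above.
    }
}
```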

>access_file_
1679|blog.unity.com

Occlusion culling in Unity 4.3: Troubleshooting

The following blog post was written by Jasin Bushnaief of Umbra Software to explain the updates to occlusion culling in Unity Pro 4.3.

This is the last post in a three-post series. In the first one, I described the new occlusion culling system in Unity 4.3 and went through the basic usage and parameters. In the second one, I gave a list of best practices and general recommendations for getting the most out of Umbra. This last post deals with troubleshooting some common issues people tend to encounter when using Umbra.

Unity offers a couple of helpers for figuring out what's going on in occlusion culling. These visualizations may help you figure out why occlusion culling isn't behaving quite as you'd expect. They can be found by enabling the Visualization pane in the Occlusion window and selecting the camera. The individual visualizations can then be enabled and disabled in the Scene view, in the Occlusion Culling dialog. Let's take a look at what the different visualizations do.

The Camera Volumes visualization simply shows you, as a grey box, which cell the camera is located in. For more information on what the cells are, take a look at the first post. This is one way of seeing how the value of smallest occluder changes the output resolution of the data, for instance. Also, if it looks like the cell bounds don't make sense, for example when the cell incorrectly extends to the other side of what should be an occluding wall, something may be amiss.

The purpose of the Visibility Lines visualization is to show you the line of sight that Umbra sees. The way it works is that Umbra will project its depth buffer back into the scene and draw lines to the furthermost non-occluded points in the camera's view. This may help you figure out, for instance, which holes or gaps cause "leaks" in occlusion, ultimately causing some objects to become visible. It may also reveal some dubious situations where an object that clearly should be a good occluder doesn't occlude anything because of, say, forgetting to enable the static occluder flag for the object.

The Portals visualization will draw all the traversed portals as semi-transparent axis-aligned quads. Not only will this help you get an idea of how many portals Umbra traverses, and thus help you with occlusion culling performance tweaking, but it also provides another way of looking at what's in Umbra's line of sight. So you can see if there are some spots in the scene that don't really cause occlusion, and how the portals get placed into the scene in general.

While occlusion culling should just work in Unity, sometimes things don't go quite as you'd expect. I'll go over the most common issues people tend to run into, and how to solve them in order to make your game run smoothly.

Sometimes people wonder why some objects are reported visible by Umbra when in reality they seem to be occluded. There can be many reasons for this. The most important thing to understand is that Umbra is always conservative. This means that it always opts for objects being visible rather than invisible whenever there's any uncertainty in the air. This applies to all tie-breaking situations as well. Another thing to note is that the occlusion data represents a simplified version of the scene's occluders. More specifically, it represents a conservatively simplified version, meaning some of the occlusion erodes and loses detail. The level of detail that gets retained in the data is controlled by smallest occluder. Decreasing the value will produce higher-resolution data that should be less conservative, but at the same time, culling will lose some speed and the data will get larger.

Probably the most puzzling problematic scenario is when something gets reported by Umbra as occluded even though it shouldn't be. After all the promises of always being conservative and never returning false negatives, how can this happen?

Well, there can be a couple of things going on. The first, and by far the most common, case is that you're looking at something through a hole, gap or crack which gets solidified by Umbra's voxelization. So typically the first thing you should try is to reduce the value of smallest hole and see if that fixes the issue. You can temporarily tune it down quite a bit just to test whether that's the problem. There are situations where this may not be completely obvious. For instance, if you have a bookshelf in your scene where individual books are marked as occluders, too large a smallest hole may cause some of the books to be occluded either by the shelf or by the other books. So again, decreasing the value of smallest hole is probably the first thing you should try.

Another case where objects may disappear is when your backface limit has been set to something less than 100 and your camera is in the vicinity of back-facing triangles. Note that the camera doesn't have to actually be looking at the triangles, nor do the triangles have to be facing away from the camera at that particular spot. It is enough that there is a topologically connected place (i.e. not behind a wall or anything) close to the camera from which some back-facing triangles can be seen. The first thing to do to remedy this is obviously to try with a backface limit of 100 and see if that fixes the issue. If it does, it may make sense to modify the geometry, either by re-modeling some of the assets so that they're two-sided or solid, or by removing the static occluder flag from the problematic objects.
Or, if you don't care about the occlusion data size or don't get a huge benefit out of the backface optimization, just disabling the backface test by setting the value to 100 is of course also an option.

Culling may behave strangely if your camera goes inside an occluder, or infinitesimally close to one. Typically this may occur in a game with a 3rd-person camera. Because Umbra considers occluders to be solid objects, culling from inside one will typically mean that most of the stuff in your scene gets culled. On the other hand, if the backface test has been enabled, many of the locations inside occluders will have been removed from the data altogether, yielding undefined results. So you should not let the camera go inside occluders!

To be more specific, in general Umbra will be able to guarantee correct culling when the camera is further away from an occluder than the value of smallest hole. In most cases, going even closer will still work, but sometimes, because of the limitations the voxel resolution imposes on the accuracy of the occlusion data, going super close to an occluder may result in the camera being incorrectly assigned to a location inside an occluder. Hint: use the Camera Volumes visualization to see which cell the camera is located in and what it looks like.

Generally, when the backface test is enabled (i.e. when backface threshold is something smaller than 100), Umbra will do a better job near occluders, because it is able to detect the insides of occluders and correspondingly dilate all valid locations slightly towards them, so that you'll get correct results even if you go arbitrarily close to an occluder. So if you cannot prevent your camera from going very close to (or even slightly inside) an occluder, the first thing you may wish to try is to set backface threshold to something smaller than 100. This will help with dilation and may fix the issue. If tweaking backface threshold does not help, or if your camera goes very deep inside an occluder, the only thing left to do is to simply remove the occluder flag from the object.

The reason for slow culling is typically very simple: Umbra traverses too many portals, and thus the visibility query takes a long time. The parameter that controls the portal resolution in the occlusion data is smallest occluder. A larger value will produce a lower-resolution portal graph, which is generally faster to traverse, up to a point. There are some situations, however, where this is not the case. Specifically, when the occluder data has to be simplified conservatively, the increased conservativity of a lower-resolution graph may sometimes cause the view distances to increase, and the total number of traversed portals to increase with them. But this is not the most typical of situations. In general, a large smallest occluder value will produce data that is faster to process at runtime, at the cost of reduced accuracy of the occlusion.

Another, but obviously a bit more arduous, way of making sure that the number of traversed portals doesn't get out of hand is to modify the geometry of the scene so that the view distances don't get too long in the problematic areas. Manually inserting occlusion into open areas will of course cause the traversal to terminate sooner, reducing the number of processed portals and thus making occlusion culling faster.

The speed of baking largely depends on one thing: the number of voxels that need to be processed. In turn, the number of processed voxels is defined by two factors: the dimensions of the scene and the voxel size. Assuming you can't do much about the former, the latter you can easily control with the smallest hole parameter. A larger value will of course speed up baking. So it may make sense to start with a relatively large value and then tune it down if your objects are incorrectly disappearing because of too-aggressive occluder generation. A microscopic smallest hole may cause baking to take forever and/or to consume ridiculous amounts of memory.

If baking your scene produces too much occlusion data, there are a couple of things you can try. First, changing the value of backface limit to something smaller than 100, for instance 99, 50 or even 30, may be a good start. If you do this, make sure that culling works correctly in all areas your camera may be in. See the previous post for more information. If changing backface limit is not an option, produces unpredictable results or doesn't reduce the data size enough, you can try increasing the value of smallest occluder, which determines the resolution of the occlusion data and thus has a very significant impact on its size. Note that increasing smallest occluder also increases the conservativity of the results. Finally, it's worth noting that huge scenes will naturally generate more occlusion data than small ones. The size of the occlusion data is displayed at the bottom of the Occlusion window.

In some rare cases, where the scene is vast in size and the smallest occluder parameter has been set to a super small value, baking may fail with the error "Failure in split phase". This occurs because the initial step of the bake tries to subdivide the scene into computation tiles. The subdivision is based on the smallest occluder parameter, and when the scene is humongous in size (like, dozens of kilometers in each direction) too many computation tiles may be created, resulting in an out-of-memory error. This, in turn, manifests as "Failure in split phase" to the user. Increasing the value of smallest occluder and/or splitting up the scene into smaller chunks will get rid of this error.

This concludes our three-post series on occlusion culling in Unity 4.3. For more information about Umbra, visit www.umbrasoftware.com.

>access_file_

[ 2013 ]

1 entry
1680|blog.unity.com

Occlusion culling in Unity 4.3: Best practices

The following blog post was written by Jasin Bushnaief of Umbra Software to explain the updates to occlusion culling in Unity Pro 4.3.

This is the second post in a three-post series. In the previous post, I discussed how the new occlusion culling system works in Unity 4.3, and went over the basic usage and parameters that you need to know in order to get the best out of occlusion culling. In case you missed it, check it out here. This post gives you a list of general recommendations and tips that should help you get the best results out of occlusion culling.

It may seem obvious, but the first thing to make sure of is that your scene actually contains meaningful occlusion. Moreover, the occlusion should preferably consist of good, large occluders if possible, as opposed to fine details that only accumulate into occluders when looked at from a certain angle. Umbra will generally not be able to perform occluder fusion, so even if your lush forest with lots of foliage will occlude something behind it, it will do so only once the individual occluders are "accumulated". In this sense, trees and forests in general will be pretty bad occluders from Umbra's point of view. On the other hand, a mountain is a good occluder, and Umbra will certainly be able to capture it into the occlusion data as expected.

There are two main types of objects Umbra cares about: occluders and occludees. The former are just geometry, and Umbra treats them basically as a single, solid model. The latter are the ones whose visibility Umbra actually tests using the occlusion data. Occluders consist of pretty much all geometry that has the "Occluder Static" flag set, and unsurprisingly, occludees are those that have the "Occludee Static" flag, respectively.

As a rule of thumb and by default, you can and should set most if not all of your renderers as occludees in order for Umbra to cull them. Also by default, most of your static renderers can be occluders as well. Just make sure that if your renderer is non-opaque, it isn't an occluder. (Unity will actually issue a warning if this is the case.) This naturally includes transparent objects and such. But also, if your object contains very small holes (consider e.g. a cheese grater or a bushy plant) that you wish to see through, and reducing the value of smallest hole globally doesn't make sense (see the previous post as to why), simply removing the occluder flag from the renderer is the correct thing to do. Furthermore, because occluders are considered solid, correct culling can typically be guaranteed only if the camera doesn't intersect an occluder. This means that if e.g. the collision system cannot prevent the camera from flying inside an occluder, you should probably remove the occluder flag in order to get meaningful results.

Given that Umbra does object-level occlusion culling, it doesn't make a whole lot of sense to have objects several kilometers in size. Such massive objects are very hard to cull, as some part of the object is almost always visible, especially combined with Umbra's conservative culling. So, splitting up e.g. the terrain into multiple patches is typically a good idea, unless you want the entire terrain to always be visible.

In terms of occlusion culling, the best object subdivision is typically a natural one, meaning that naturally distinct objects should probably be kept separate in culling as well. Chunking objects together too aggressively typically doesn't help; one should group only objects that are similar in terms of visibility. On the other hand, too fine-grained a subdivision may introduce some unnecessary per-object overhead. In reality, this becomes a problem only once there are tens of thousands of occludees in the scene. Maybe it should be emphasized that only the object subdivision of occludees matters. Occluders are considered to be a single big bowl of polygon soup anyway.

In the previous post, I briefly described how Umbra first voxelizes the occluding geometry, groups these voxels into cells, and then connects the cells with portals. In the process, Umbra is always conservative, meaning that in various cases Umbra considers the occluders slightly smaller than they are in reality, or conversely, the empty areas slightly larger. This means that if there happens to be an unintentional hole in your occluding geometry, one which is retained by voxelization rather than getting patched up, there's a good chance it'll become somewhat magnified in the final output data. This may lead to surprising "leaks" in occlusion: the camera may be looking at a seemingly solid wall, but things behind the wall don't get occluded because there's some unnoticeably small crack somewhere. So, while voxelization does patch a lot of unintentional cracks and gaps in the occluding geometry, it's still highly important to try to model the geometry as water-tight as possible. In the next post, I'll describe the Visibility Lines visualization, which may help you debug these kinds of issues.

Admittedly, the hardest part of using Umbra is finding the right parameter values. The default values in Unity do a good job as a starting point, assuming that one Unity unit maps to one meter in your game and the game's scale is "human-like" (e.g. not modeled on a molecular level, nor is your typical object a planet or a gigantic killer-mech-robot). A good rule of thumb is to start with relatively large values and work your way down. In the case of smallest hole, for instance, the larger the value you can use, the swifter the bake process. Thus, you should tune it down only if/when you start experiencing culling artifacts, i.e. false negatives. Similarly, starting with a relatively large value for smallest occluder typically makes sense. Then you can start adjusting it downward and see how Umbra culls better. Stop when culling starts taking up too much time and/or the occlusion data becomes too large.

As for backface threshold, start with 100. If your occlusion data is too large, or if you happen to get weird results when the camera is very, very close to or possibly even intersects an occluder, try using 90 or an even smaller value. More on this in the next post.

In the next and final post, I'll go over some common issues people tend to run into when using Umbra. In the meantime, go check out www.umbrasoftware.com for more information!
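The "Occluder Static" / "Occludee Static" flags discussed above can also be set from editor code rather than clicked per object in the inspector. A minimal sketch (the menu-item name and opaque-queue heuristic are our own; `GameObjectUtility` and `StaticEditorFlags` are the editor API involved):

```csharp
using UnityEditor;
using UnityEngine;

// Sketch: mark the selected objects as occludees, and as occluders too,
// unless their material is non-opaque (transparent occluders make no sense).
public static class OcclusionFlagTool
{
    [MenuItem("Tools/Mark Selection For Occlusion Culling")]
    static void MarkSelection()
    {
        foreach (GameObject go in Selection.gameObjects)
        {
            var flags = GameObjectUtility.GetStaticEditorFlags(go)
                        | StaticEditorFlags.OccludeeStatic;

            var renderer = go.GetComponent<Renderer>();
            bool opaque = renderer != null
                && renderer.sharedMaterial != null
                && renderer.sharedMaterial.renderQueue <= 2500; // opaque queues

            if (opaque)
                flags |= StaticEditorFlags.OccluderStatic;

            GameObjectUtility.SetStaticEditorFlags(go, flags);
        }
    }
}
```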

>access_file_