Thursday, 30 April 2009

Heading Towards the End

Most of my recent work has gone into writing up the dissertation itself. While I'm getting there in terms of quantity, I really need to add diagrams, results tables and more detail in general to clear things up. I've tried to finish working on the application, but I continually find myself toying with little things.

Most changes have been minor optimisations, such as streamlining the shader and vertex structures and reducing the number of unnecessary temporary variables. I rethought the indices to reduce the Draw Indexed Primitive calls from 128 to 65. Ideally this could be reduced to 2, but that would involve finding a way to pass the random number to the geometry shader for each strip.

I spent some time trying to arrange the hair strips in a more sensible fashion, though without much success. The blockiness in particular is hard to shake.



The collision detection has been improved. Most of the time the forces being calculated were far too big because the distances weren't normalised, so the hair would thrash wildly rather than settle towards equilibrium. Generally, though, all the collision detection manages to do is give the hair a bit of volume. Intersections are still very likely to occur, and when they do the polygons passing through each other create visual artefacts.
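
As a rough illustration of the normalisation fix, here is a minimal sketch of a repulsive force between two equal-sized collision spheres. The names `Vec3`, `RepulsionForce`, `fRadius` and `fStrength` are assumptions made for this example rather than the project's actual code. The point is that the direction is divided by its length, so the size of the force depends only on how far the spheres overlap.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
inline float Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Force pushing sphere A away from sphere B (both of radius fRadius).
// Hypothetical helper: the force grows linearly with the overlap depth.
Vec3 RepulsionForce(Vec3 vCentreA, Vec3 vCentreB, float fRadius, float fStrength)
{
    assert(fRadius > 0.0f);
    const Vec3 vDelta = vCentreA - vCentreB;
    const float fDistance = Length(vDelta);
    const float fOverlap = 2.0f * fRadius - fDistance;

    // No contact, or coincident centres with no usable direction: no force.
    if (fOverlap <= 0.0f || fDistance == 0.0f)
        return {0.0f, 0.0f, 0.0f};

    // Normalise so the un-normalised separation cannot inflate the force.
    const Vec3 vDirection = vDelta * (1.0f / fDistance);
    return vDirection * (fStrength * fOverlap);
}
```

Without the division by `fDistance`, the force would scale with the separation as well as the overlap, which is one way to end up with the oversized forces described above.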

I toyed with a few features of DirectX 10, but a lack of clarity in the available documentation left me lost as to how to actually use features such as Texture2DMS, which could potentially have been useful in the shadow mapping.

I'm not sure whether I'm pleased or disappointed with the application as it stands. Performance-wise it reaches acceptable standards, but it falls short in many areas of appearance and behaviour.

Thursday, 9 April 2009

Shadows and Optimisation



This update is a little overdue, as some things got in the way briefly. I've managed to implement some basic shadow mapping, which took about 3 days in all. Shadow mapping is a fairly simple concept. A depth map is rendered from the light's point of view using the light's world, view and orthographic/perspective projection transforms. The same transform is then used to project the depth map onto the hair in the pixel shader. As much as I would have liked to try Deep Opacity mapping for the shadows, I suspect time is now far too short to implement it. Performance is also becoming an issue, as it is beginning to fall further below acceptable levels.
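
The depth comparison at the heart of shadow mapping can be sketched on the CPU as follows. This is only an illustration, assuming the point being shaded has already been transformed by the light's transform into texture coordinates `fU`, `fV` and a depth `fDepth`. `DepthMap`, `InShadow` and the bias parameter are names invented for this example. In the application the equivalent test would live in the pixel shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side stand-in for the depth texture rendered from the light.
struct DepthMap {
    int iWidth;
    int iHeight;
    std::vector<float> depths;   // row-major, one depth per texel
};

// Returns true if a point at light-space depth fDepth lies behind the
// nearest surface the light recorded at (fU, fV). fBias offsets the test
// to reduce self-shadowing caused by limited depth precision.
bool InShadow(const DepthMap& map, float fU, float fV, float fDepth, float fBias)
{
    assert(map.depths.size() == static_cast<std::size_t>(map.iWidth * map.iHeight));

    // Points outside the light's view are treated as lit (an assumption).
    if (fU < 0.0f || fU > 1.0f || fV < 0.0f || fV > 1.0f)
        return false;

    const int iX = std::min(static_cast<int>(fU * map.iWidth), map.iWidth - 1);
    const int iY = std::min(static_cast<int>(fV * map.iHeight), map.iHeight - 1);
    const float fStored = map.depths[iY * map.iWidth + iX];

    return fDepth - fBias > fStored;
}
```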

Other than shadow mapping I've made a few minor optimisations. I managed to move the calculation of part of the diffuse lighting term to the vertex shader. Also I changed the way I was doing the WVP transforms for the hair strips. I suspect the major bottlenecks in the application are focussed around the vertex buffer.

Also I've been trying to work on arranging the hairs better though that's proving awkward without making a mess.

Tuesday, 31 March 2009

Some Collisions, Multi-Threading & New Lighting

Got a few things done this week. The first being a basic collision detection setup.

The collision system had to be fast, so the most basic starting point is to use spheres to check for collisions. The system I opted for was somewhat inspired by Nvidia's "Pearls" method for the Nalu demo and the theory behind the Adaptive Wisp Tree. In basic terms, collision spheres are arranged along the key hairs, spaced according to how many spheres are intended in total. When these spheres collide, repulsive forces to keep them apart are added to the physics calculation. On top of this, for speed, the hairs are checked for collisions only against the hairs nearest to them. This is done by using a 3x3 filter on an array. The problem that remains is that hairs only react to collisions rather than trying to prevent them. This leads to a lack of equilibrium if the collision spheres come into contact in the hair's rest position.
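
The 3x3 neighbour check might look something like the sketch below. It assumes the key hairs are stored in a row-major grid of `iRows` by `iCols`. That layout, and the function name `NeighbourHairs`, are assumptions for illustration only.

```cpp
#include <cassert>
#include <vector>

// Returns the array indices of the key hairs in the 3x3 window centred on
// (iRow, iCol), leaving out the hair itself and anything off the grid edge.
std::vector<int> NeighbourHairs(int iRow, int iCol, int iRows, int iCols)
{
    assert(iRow >= 0 && iRow < iRows && iCol >= 0 && iCol < iCols);

    std::vector<int> neighbours;
    for (int r = iRow - 1; r <= iRow + 1; ++r) {
        for (int c = iCol - 1; c <= iCol + 1; ++c) {
            if (r == iRow && c == iCol)
                continue;                               // skip the hair itself
            if (r < 0 || r >= iRows || c < 0 || c >= iCols)
                continue;                               // skip cells off the grid
            neighbours.push_back(r * iCols + c);
        }
    }
    return neighbours;
}
```

Each hair is then tested against at most eight others, however many key hairs there are in total.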

As the application is beginning to struggle for speed, it seemed about time to implement a large optimisation. This quite simply was to move the physics and related function calls to a separate thread and leave only the rendering in the main program. As the only resource shared between these tasks is the vertex buffer, which is only used in one function, a single mutex wrapped around these sections seems to suffice. The performance gain was quite substantial. Obviously this optimisation would not help on single-core CPUs.
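
A minimal sketch of that arrangement using the C++11 standard library is shown below. It is an illustration, not the project's code: `SharedHair`, `PhysicsStep`, `PhysicsLoop` and `SnapshotForRender` are hypothetical names, and a plain `std::vector<float>` stands in for the vertex data.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the data shared between the physics and render threads.
struct SharedHair {
    std::mutex lock;
    std::vector<float> vertices;
};

// One physics update. The lock is held only while the shared data is touched.
void PhysicsStep(SharedHair& hair, float fDeltaSeconds)
{
    assert(fDeltaSeconds > 0.0f);
    std::lock_guard<std::mutex> guard(hair.lock);
    for (float& v : hair.vertices)
        v += fDeltaSeconds;          // placeholder for the real simulation
}

// Body of the physics thread: step, then sleep until the next update is due.
void PhysicsLoop(SharedHair& hair, const std::atomic<bool>& running)
{
    while (running.load()) {
        PhysicsStep(hair, 0.01f);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

// Render side: copy the vertices under the same lock, then draw the copy.
std::vector<float> SnapshotForRender(SharedHair& hair)
{
    std::lock_guard<std::mutex> guard(hair.lock);
    return hair.vertices;
}
```

The main program would start the worker with something like `std::thread physics([&] { PhysicsLoop(hair, running); });`, clear `running` on shutdown and then call `physics.join()`.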

Another improvement is the implementation of the Kajiya and Kay lighting model. This lighting model works on tangents, treating the hair as an infinitely thin cylinder. The end result is a much more distributed specular highlight and better lighting in general.
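
For reference, the tangent-based terms can be sketched as below. This is an illustration under stated assumptions rather than the shader used in the application: `T` is the unit hair tangent, while `L` and `E` are unit vectors pointing from the hair towards the light and the eye. Sign conventions for the specular term differ between write-ups depending on which way the light vector points, so the sign here follows from that assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct HairLighting { float diffuse; float specular; };

// Kajiya-Kay style terms for a thin cylinder with unit tangent T.
// L and E are unit vectors from the hair towards the light and the eye.
HairLighting KajiyaKay(Vec3 T, Vec3 L, Vec3 E, float fShininess)
{
    assert(fShininess > 0.0f);
    const float fTdotL = Dot(T, L);
    const float fTdotE = Dot(T, E);
    const float fSinTL = std::sqrt(std::max(0.0f, 1.0f - fTdotL * fTdotL));
    const float fSinTE = std::sqrt(std::max(0.0f, 1.0f - fTdotE * fTdotE));

    // Diffuse: strongest when the light is perpendicular to the strand.
    const float fDiffuse = fSinTL;

    // Specular: cosine of the angle between E and the cone of directions
    // in which light reflects off the cylinder, raised to a power.
    const float fBase = std::max(0.0f, fSinTL * fSinTE - fTdotL * fTdotE);
    return {fDiffuse, std::pow(fBase, fShininess)};
}
```

Because only the tangent is used, the highlight runs along the strand instead of collapsing to a single point, which is where the more distributed specular comes from.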

Thursday, 26 March 2009

Transparency

Finally starting to get things moving in the area of effects for the hair. It may not look like much progress, but the problem in implementing it was probably due to the early z-test in the graphics pipeline. What seemed to be happening was akin to the problems seen before Alpha to Coverage (basically a method that turns a pixel's alpha into a multisample coverage mask, so that the resolved samples give the final blended colour) was introduced: the colours of most back pixels were not being alpha blended into the final result. After looking around and picking up some hints, it seemed that even with Alpha to Coverage implemented it remains wise to implement a pre-z pass as well, to prevent geometry being removed by the early z-test.

Because the back objects must still be drawn, the objects had to be sorted and drawn in back-to-front order. I chose the STL sort, as it is unlikely that I could write a faster algorithm, particularly in the time available. The pre-z pass was too slow at first, so slow you could see the objects being arranged. This was mainly because it was rearranging whole class instances, which of course are large sections of memory. A slight rethink and rearrange later, the objects were being stored as pointers, which makes rearranging them with a compare function handed to the STL sort almost as fast as sorting a list of integers.
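
The pointer-based sort can be sketched like this. `HairStrip` and its `fViewDepth` member are hypothetical stand-ins for the real classes. The point is that `std::sort` only swaps pointers, never the large objects they refer to.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical drawable. In the application this would carry much more data.
struct HairStrip {
    float fViewDepth;    // distance from the camera, updated each frame
};

// Orders the pointers so the farthest strip comes first (back to front).
void SortBackToFront(std::vector<HairStrip*>& strips)
{
    std::sort(strips.begin(), strips.end(),
              [](const HairStrip* a, const HairStrip* b) {
                  assert(a != nullptr && b != nullptr);
                  return a->fViewDepth > b->fViewDepth;
              });
}
```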

Thursday, 19 March 2009

LoD, Interpolated Hairs and Length Constraints

A large step in making the application more viable for use in an actual game is the Level of Detail filtering. As the hairs are built of polygonal strips, a relatively simple algorithm can be implemented using an Index Buffer. Strips are currently built of 64 vertices, 2x8 for each section of the cubic splines. This number is also handy in that it is a power of 2, which allows recursive halving of the number of vertices making up a strip, i.e. 64, 32, 16, 8. This means that instead of creating new vertex data for each detail level, only different sets of indices are needed to render with the same data in the vertex buffer. These sets of indices iterate through every pair of vertices, every second pair, every fourth pair and every eighth pair. While the actual implementation is not as smooth as this theory suggests, the addition has become critical to performance now that a geometry shader has been added.
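
One way to generate those index sets is sketched below. It assumes the strip's vertices are stored as consecutive left/right pairs along its length. That layout and the name `BuildLodIndices` are assumptions for this example. Level 0 uses every pair, level 1 every second pair, and so on.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds triangle-strip indices for one detail level of a single hair strip.
// iLodLevel 0 keeps every vertex pair, 1 keeps every 2nd, 2 every 4th, etc.
// Pairs that fall between the sampled ones, including the tip, are skipped.
std::vector<std::uint16_t> BuildLodIndices(int iVertexCount, int iLodLevel)
{
    assert(iVertexCount > 0 && iVertexCount % 2 == 0 && iLodLevel >= 0);

    const int iPairStep = 1 << iLodLevel;
    const int iPairCount = iVertexCount / 2;

    std::vector<std::uint16_t> indices;
    for (int iPair = 0; iPair < iPairCount; iPair += iPairStep) {
        indices.push_back(static_cast<std::uint16_t>(2 * iPair));       // one edge of the strip
        indices.push_back(static_cast<std::uint16_t>(2 * iPair + 1));   // the opposite edge
    }
    return indices;
}
```

For a 64-vertex strip this yields 64, 32, 16 and 8 indices for levels 0 to 3, all referring to the same vertex buffer.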


The interpolated hairs are added to the application using a geometry shader. Geometry shaders are a relatively new addition to the shader architecture but of most significance is their ability to produce more primitives than what they take in. The basis for duplicating the hairs using the geometry shader is simple:
  1. Take in three vertices
  2. Add those vertices to the current triangle stream
  3. Restart the strip
  4. Add those same vertices again but slightly displaced
  5. Repeat 3 and 4 as needed
The main problem to note when using the geometry shader is that things can quickly become expensive as all these strips must still pass through the pixel shader after being created.


Length constraints proved elusive at first, and the problems caused by slight inconsistencies came in an interesting range. One of the first mistakes when implementing them was using the old vertex data as opposed to the new, which for a while had vertices shooting off into infinity. A good deal of head scratching, debugging and class rearranging solved this problem, only to present another. The next problem in the constraints is probably best described as imploding/exploding hair, as the very rigid constraints and the mass spring system seemed to have conflicting aims. At this point it must be pointed out that the method of constraint went something like this:

vMoving = (vFixed) + Normalize((vMoving) - (vFixed)) * (fIntendedDistance);

Where vMoving is the vector to be moved and vFixed is static. It turns out that this method is only viable for some of the constraints. To prevent the imploding/exploding effect, most of the points for the hair must be constrained using a slightly different approach, which is:

vector3 vDelta = (v2) - (v1);
float fDistance = vDelta.x*vDelta.x + vDelta.y*vDelta.y + vDelta.z*vDelta.z;
fDistance = sqrt(fDistance);
vDelta *= 0.5f * (((fIntendedDistance) - fDistance) / fDistance);
(v1) -= vDelta;
(v2) += vDelta;
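
For completeness, here is a self-contained sketch of both constraint styles. The small `Vec3` type and the function names `ConstrainToFixed` and `ConstrainPair` are assumptions made so the example compiles on its own. The arithmetic follows the two snippets above.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
inline float Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// One-sided: vFixed stays put and the returned point sits exactly
// fIntendedDistance away from it, in the direction of vMoving.
Vec3 ConstrainToFixed(Vec3 vFixed, Vec3 vMoving, float fIntendedDistance)
{
    const Vec3 vDelta = vMoving - vFixed;
    const float fDistance = Length(vDelta);
    assert(fDistance > 0.0f);
    return vFixed + vDelta * (fIntendedDistance / fDistance);
}

// Two-sided: both points move, each taking half of the length error.
void ConstrainPair(Vec3& v1, Vec3& v2, float fIntendedDistance)
{
    const Vec3 vDelta = v2 - v1;
    const float fDistance = Length(vDelta);
    assert(fDistance > 0.0f);
    const Vec3 vCorrection = vDelta * (0.5f * (fIntendedDistance - fDistance) / fDistance);
    v1 = v1 - vCorrection;
    v2 = v2 + vCorrection;
}
```

A point pair that starts 4 units apart with an intended distance of 2 ends up with each point moved 1 unit towards the other.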


In all it now looks something like this:

Tuesday, 10 March 2009

Progress Report

Application Progress

Generally close to intended time scale at current but still slightly behind where things were hoped to be.

Framework

Most of the framework is complete though further changes may be made, especially when optimisation comes into the main focus. Original framework based on DirectX 9 experience but changes had to be made for DirectX 10 to make work easier and generally improve the tidiness and accessibility of the objects in the application.

B-Spline generating and rendering proved fairly easy, though the maths may need rechecking. These form the basis for the keyhairs, which are then rendered later on.

Rendering

Keyhairs are currently drawn as Polygon strips based on the previously mentioned B-Splines. Could probably just as easily have used Beziers but the higher level of continuity keeps the hair constrained to a fairly smooth degree.

Basic Phong shading to give the hairs ambient, diffuse and specular lighting. Has also given an awareness of the DirectX 10 and HLSL 10 systems.

Physics

Implemented Verlet Integration to handle hair physics. Updates are constrained to a 10 millisecond frequency, both because variable time steps were causing errors and to reduce the performance impact. The mass spring system has also been implemented in the hair physics. It is based on the simple calculations of Hooke's law. Spring strength is set by dividing a base strength by a quadratic equation (based on Henrik Halen's work), the parameters of which are based on the distance from the “root” of the hair.
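
A basic position Verlet update with the fixed 10 millisecond step might look like the sketch below. The `Particle` layout and the names `VerletStep` and `FIXED_DELTA_TIME` are assumptions for this example and not the application's own classes.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

const float FIXED_DELTA_TIME = 0.01f;   // 10 milliseconds per physics update

// Hypothetical point mass on a hair strand.
struct Particle {
    Vec3 vPosition;
    Vec3 vOldPosition;
    Vec3 vForce;        // forces accumulated since the last step
    float fMass;
};

// Position Verlet: velocity is implied by the difference between the
// current and previous positions, so it is never stored explicitly.
void VerletStep(Particle& p)
{
    assert(p.fMass > 0.0f);
    const Vec3 vAcceleration = p.vForce * (1.0f / p.fMass);
    const Vec3 vNewPosition = p.vPosition + (p.vPosition - p.vOldPosition)
                            + vAcceleration * (FIXED_DELTA_TIME * FIXED_DELTA_TIME);
    p.vOldPosition = p.vPosition;
    p.vPosition = vNewPosition;
    p.vForce = {0.0f, 0.0f, 0.0f};      // clear the accumulator for the next step
}
```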

Documentation Progress

Probably lagging a little in keeping up with documentation. Definite need to work on the Blog and Dissertation more frequently.

Dissertation

Started writing the introduction sections and bullet pointing the methods and implementation sections. The main introduction is mostly based on the proposal introduction, which is already written and covers most of the important information. Approx 1000 words at current.

Estimated Progress




Friday, 6 March 2009

Verlet Integration and the Start of Hair Physics

I finally have a good start on the physics for the hair. The Verlet integration was fairly simple to integrate into the application. The first force I applied to it, which is the only force that'll be almost constant, was gravity, and the result was infinitely falling hair, which was amusing. There were some odd issues, though, to do with variable time stepping. I narrowed it down to the time in the equation for drag/air resistance:

m_vecForceAccumulator[iMember] -=
m_fDrag * (vPosition - m_vecOldPositions[iMember]) / APPROX_DELTA_TIME;

and found that, as has been previously noted, fixed time stepping prevents the problem occurring in the first place. There was another serious problem to note as well, caused by the need to recalculate normals and re-Map the vertex buffer to update all the vertex information. In short, the frame rate dropped from approx 2000 fps to 50 fps. Locking APPROX_DELTA_TIME at 10 milliseconds (0.01 seconds), only running the physics when APPROX_DELTA_TIME has passed, and only doing the normals and vertex buffer work if the physics has been recalculated in the current loop seems to have solved some problems.
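
The gating described above can be sketched with a small time accumulator. `FixedStepClock` is a hypothetical helper, and letting it return more than one step after a slow frame is an addition for illustration that the post does not describe. The normals and vertex buffer would only be refreshed when the returned step count is greater than zero.

```cpp
#include <cassert>

const float APPROX_DELTA_TIME = 0.01f;   // 10 milliseconds

// Accumulates real frame time and reports how many fixed physics steps are due.
class FixedStepClock {
public:
    int Advance(float fFrameSeconds)
    {
        assert(fFrameSeconds >= 0.0f);
        m_fAccumulated += fFrameSeconds;

        int iSteps = 0;
        while (m_fAccumulated >= APPROX_DELTA_TIME) {
            m_fAccumulated -= APPROX_DELTA_TIME;
            ++iSteps;
        }
        return iSteps;   // 0 means: skip physics, normals and the buffer update
    }

private:
    float m_fAccumulated = 0.0f;
};
```

At 2000 fps each frame adds only 0.5 milliseconds, so most frames return 0 and go straight to rendering.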

Besides adding the Verlet Integration and decreasing the number of simulated hair strands (a.k.a. the Keyhairs that will do the work for the rest of the hair), I also added in the Mass Spring Model, mainly because infinitely falling objects aren't of much use to anyone. This is a simple system based on each strip's starting position: it suspends the hairs from that position using springs rather than letting them fall. Also important is that the stiffness of the spring decreases the further away it is from the "root" of the hair.
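
A sketch of a Hooke's law spring whose stiffness falls off with distance from the root is given below. The quadratic coefficients `fA`, `fB` and `fC` and both function names are placeholders for illustration. The actual parameters used in the application are not given in these posts.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// Stiffness = base strength divided by a quadratic in the distance from
// the root. The coefficients are hypothetical tuning values.
float SpringStiffness(float fBaseStrength, float fRootDistance,
                      float fA, float fB, float fC)
{
    const float fDivisor = fA * fRootDistance * fRootDistance
                         + fB * fRootDistance + fC;
    assert(fDivisor > 0.0f);
    return fBaseStrength / fDivisor;
}

// Hooke's law: the force pulls the point back towards its rest position
// in proportion to how far it has been displaced.
Vec3 SpringForce(Vec3 vPosition, Vec3 vRestPosition, float fStiffness)
{
    return (vRestPosition - vPosition) * fStiffness;
}
```

With `fA = 1`, `fB = 0` and `fC = 1`, a base strength of 100 gives a stiffness of 100 at the root and 10 at a distance of 3, so the tips hang more loosely than the roots.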

No Gravity...
With Gravity

Monday, 23 February 2009

Progress thus far

As can be seen from the images further down the page a fair amount of progress has been made since the last update.

After some persuasion from others I decided to focus on programming the application in DirectX 10. Backwards compatibility can wait until a possible later date.





The layout at current can either be the relatively simple circular layout...











or based on a Bezier Surface.






The lighting that can be seen is just a basic Phong Shader at current. Working on it has allowed me to get to grips with the DirectX 10 Input Assembler, its shader system and HLSL10.

Also implemented so far is the basis of the physics system and the Verlet Integration. While working on the physics I experimented with variable time stepping just to see the results. As has been documented, there seem to be some strange errors with variable time stepping. As the physics calculations are likely to become very intensive, fixed time stepping at a rate of approximately 10 milliseconds per update seems the sensible update frequency to use at current.

Tuesday, 27 January 2009

Long Overdue Update - Program Structuring and Initial Implementation

Having to hand in other coursework has meant that progress on the project has been slow until just recently, so part of the reason for the lack of blog updates was that there just wasn't much worth reporting. The possibility of being unable to start implementation was planned for in the predicted timetable, though, so besides the lack of updates the schedule is not far off the mark.

One advantage of having this other coursework is that it has given me the opportunity to learn more about optimisation and program structuring through practice. For one thing, the framework I have built up for DirectX is now almost all Object Oriented. The application currently has to be thoroughly planned before progressing much further in order to avoid massive restructuring later on. Not only that, but bad structuring could also hinder the application at run time. One issue when creating Objects is whether to create a new Object and pass data to it from the other Objects, or whether to make the Object part of a hierarchy so that its members and functions are inherited.

One thing that I have been adding to the framework that was not originally intended is the ability for the application to use either DirectX 9 or DirectX 10. There are large differences between the two API versions, but greater application portability could be helpful. It seems sensible to try a similar approach when it comes to implementing CUDA, though that is far off yet.

Using a similar approach to one of the aforementioned pieces of coursework, a system for plotting and positioning Basis Splines (B Splines) has now been included in the application. The ends of the Basis Spline are clamped to avoid unwanted behaviour at those points. Positioning the Splines in space at current uses Rotation and Scaling matrices to produce a semi-spherical pattern. This pattern is basic but will serve for now. At current the rotation of the individual Splines is not quite what was intended. The matrix multiplication probably requires changes in order to rectify the mistake.
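
As an illustration of a clamped spline, the sketch below evaluates a uniform cubic B-Spline and clamps the ends by repeating the first and last control points. Repeating end points is one common way to clamp and is an assumption here, as the post does not say which method the application uses. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Evaluates one uniform cubic B-Spline segment at t in [0, 1].
Vec3 EvaluateSegment(const Vec3& p0, const Vec3& p1, const Vec3& p2, const Vec3& p3, float t)
{
    const float t2 = t * t;
    const float t3 = t2 * t;
    const float b0 = (1.0f - t) * (1.0f - t) * (1.0f - t) / 6.0f;
    const float b1 = (3.0f * t3 - 6.0f * t2 + 4.0f) / 6.0f;
    const float b2 = (-3.0f * t3 + 3.0f * t2 + 3.0f * t + 1.0f) / 6.0f;
    const float b3 = t3 / 6.0f;
    return {b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
            b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y,
            b0 * p0.z + b1 * p1.z + b2 * p2.z + b3 * p3.z};
}

// Clamps the spline by repeating each end point until it appears three times,
// which makes the curve start and finish exactly on those points.
std::vector<Vec3> ClampControlPoints(const std::vector<Vec3>& points)
{
    assert(!points.empty());
    std::vector<Vec3> clamped;
    clamped.push_back(points.front());
    clamped.push_back(points.front());
    clamped.insert(clamped.end(), points.begin(), points.end());
    clamped.push_back(points.back());
    clamped.push_back(points.back());
    return clamped;
}

// Evaluates the whole spline at u in [0, 1] across all of its segments.
Vec3 EvaluateSpline(const std::vector<Vec3>& points, float u)
{
    assert(points.size() >= 4 && u >= 0.0f && u <= 1.0f);
    const int iSegments = static_cast<int>(points.size()) - 3;
    const float fScaled = u * iSegments;
    const int iSegment = std::min(static_cast<int>(fScaled), iSegments - 1);
    const float t = fScaled - iSegment;
    return EvaluateSegment(points[iSegment], points[iSegment + 1],
                           points[iSegment + 2], points[iSegment + 3], t);
}
```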



Optimised rendering has quickly become the main issue to consider for the work at hand. At current the application is rendering keyhairs as green lines and each line has its own vertex buffer. This is a grossly inefficient manner of using vertex buffers and the framerate suffers as a result. Luckily the keyhairs don't need to be rendered in the final version. A single Vertex buffer and Index buffer would probably be the best approach to improving rendering efficiency.