**Contents**

## Implementing a Simple Sphere-Tracer

In this chapter, we will implement the sphere-tracing algorithm. We will learn:

- How to render simple shapes such as spheres and planes using sphere-tracing.
- How to shade these objects (including some lighting) using sphere-tracing, a topic we haven't talked about yet. As you know, shading objects requires normals and if we have learned in the previous chapter how to find the intersection point between a ray and an object using sphere-tracing, we haven't learned yet how to compute the normal of the object at that intersection point.
- How to transform these implicit shapes. If you remember what we said in the previous chapter, so far, we have only studied the distance estimators of "canonical" shapes centred about the origin. We will also lean how to scale, move, rotate these shapes to create more interesting compositions.

Let the real fun begin.

The section of the code that generates the primary rays is exactly the same than for ray-tracers. So there is no mystery there. We need to generate a ray origin and a ray direction. This process is explained in details in the lesson Ray-Tracing: Generating Camera Rays.

The magic happens in the function sphereTrace() as you may have guessed. Though before we look at this function, note that the parameter scene is a list of implicit objects. All implicit shapes that our program will support are derived from the base class ImplicitShape. The important method from this base class that all derived classes need to implement is the getDistance(const Vec3f& from). This is the function that will return the distance of a point to the nearest point on the surface of the object (if the distance estimator is a DIF or the underestimated distance to the nearest point if we can only use a DUF). As you know the equation we used to compute that distance is specific to each supported shape (sphere, plane, cone, torus, etc.). In this chapter we will render planes and sphere, but we will show how to render more shapes in the next chapter. Here is the source code:

Let's now write the function sphereTrace(...) where we will compute whether a given ray intersects one of the implicit surfaces making up the scene (in our example, three spheres and a plane). This function takes as arguments the ray origin and direction and a list of implicit surfaces. It will return 0 if the ray doesn't intersect anything and the color of the intersected object otherwise. However because we haven't learned how to shade implicit surfaces yet and because returning a constant color won't produce a very interesting image, instead, we will let the returned color depend on the ray step count. This will allow us to observe how many steps were required for the visible part of the scene in the image and thus which part of the scene are the most expensive to compute.

The while loop is where things happen. We will "step" along the ray, starting from the ray origin up to a maximum distance. If the ray doesn't intersect any shape, we have to stop the process at some point, so setting up a maximum distance seems like a reasonable thing to do. Some people prefer to limit then number of steps instead. At each step we then need to compute what the minimum distance between our position on the ray and all the shapes in the scene is. This is simple:

- First we need to compute our current position along the ray (line 14).
- We then loop over all the shapes in the scenes (lines 15-21) and compute the distance from our current position to the current shape, using the shape's getDistance(...) method.

Once we get the minimum distance to all shapes in the scene, we can then test if this distance is lower that our threshold (line 24). If we pass this test successfully then we have an intersection (lines 24-26). If not, then we can safely move along the ray by a distance that is at least equal to minDistance (line 28) and increment numSteps (we use this number to render our image with a false color scheme).

Note that when we test whether minDistance is lower than our threshold we actually multiply the threshold by t, the parametric distance along the ray (line 24). In world space, the threshold value increases linearly with distance as showed in figure 1. It is easy to see from the figure that at distance \(2a\) the height of the opposite side of the triangle is \(2b\) or that the segment \(c\) at distance \(a\) is length \(2c\) at distance \(2a\). This is the reason why we can multiply the threshold value by \(t\) when we do the comparaison test. This keeps the error consistent in screen space while hastening convergence. The following image is the output of our program:

By making the pixel color a function of the number of steps, we can more easily see which which parts of the scene are the most expensive to compute. If you refer to the scale on the left, you can see that the number of steps increases around the object's edges and of course as the distance to the intersected object increases. The next images shows the same scene without the plane and with shading (a simple facing ratio).

Let's see the algorithm in action for the ray passing through the pixel marked in red in the image. We picked this ray because it gets really close to an object without intersecting it before intersecting another one further back.

As you can see, the steps get smaller as the ray approaches the first object, before getting larger again as we "move" away from it. This is why pixels close the edges of objects see their step count increase.

## Shading

To shade the objects which is the next step in the rendering process we need at least the point \(P\) where the ray intersects the shape and the normal \(N\) at that point. If you need an introduction to shading please check the lessons from Volume 1 devoted to that topic (for example Introduction to Shading. Computing \(P\) is simple:

Computing the normal at \(P\) is more complicated. it requires to compute what we call the **gradient** of the function at \(P\).

Before we get into explaining what the gradient is, let's for now explain why we need it. In the first chapter of this lesson, we introduced the concept of isosurface of level surface. We said that a level surface in three-dimensional space is defined by an equation of the form \(f(x, y, z) = c\). In this lesson we always rendered the surface for which \(f(x,y,z)=0\), in other words for which \(c=0\).

So let's now explain what the gradient is. The concept of gradient is to 2D or 3D functions what the concept of derivative is to 1D function. This is a crude simplification but hopefully it will set your mind on the right track. Mathematically the concept of gradient is not limited to 2D or 3D functions but in computer graphics, these are the cases that are the most relevant. To be more specific, let's say that if you had a function \(f(x)\) you would speak of derivative, but if you had a function \(f(x,y)\) or \(f(x,y,z)\), you would speak of gradient. If the latter case, we say that that we have a function of several variables.

Now remember that when we compute the derivative of a 1D function, what the derivative gives us is the function **rate-of-change** within at the point on the function where the derivate is calculated. If the function represents the speed of an object, then the derivate at a point on that function would represent the acceleration (or rate-of-change), of the object at that point. The gradient represents the same concept but applied to function of several variables. In other words it represents the rate-of- change of the function over space. They are two important things you need to know about the gradient:

- First the gradient is a vector. For 2D functions it is a 2D vector, and for 3D functions it is a 3D vector. And that vector points in the direction of the greatest rate of increase of the function.
- Second, the magnitude (or length) of that vector also indicates how "strongly" the function changes in that direction.

You can see an illustration of these two concepts in figure 2. Imagine figure 2 as a vertical slice in the density fields whose values over space are given by an implicit function \(f(x,y,z) = c\). The isocontours from figure 2 show what the density fields looks like for five different values of \(c\). As can you see the gradients vectors are pointing out in the direction of greatest rate of change and their magnitude is also proportional to this rate (a larger gap between two adjacent lines indicates a greater rate of change). You can also visually check that these gradient vectors are perpendicular to the tangents to the curves.

The gradient of a function \(f(x,y,z)\) is denoted \(\nabla f\). The name of the symbol \(\nabla\) is *nabla* and mathematician will refer to it as the *del* operator (or vector differential operator).

The next question is: how do we compute it? In fact the answer to this question will help you to both understand why the gradient is somehow related to the concept of derivative and why it is a vector. Let's say we have an implicit equation \(f(x,y,z)=c\). Now we would like to compute the rate of change of that function at a given point \(P\) in space. How we do that? The solution is simple: you compute the gradient by essentially decomposing this problem into simpler ones. We are going to compute the rate of change of the function along the x-axis, then along the y-axis and finally along the z-axis. Why? Because we know how to do that easily. Let's for example show how it works for the x-axis:

$$\nabla f_x = \dfrac{f(x+\delta, y, z) - f(x-\delta, y, z)} {2\delta}$$The term \(\delta\) here is just a very small number like 0.00001 for example. This should hopefully remind you of something we have already come across in previous lessons, in fact even in the previous chapter. The equation \(f(x+b) - f(x + a)\) in mathematics is called a **finite difference** and if you divide the result of this equation by \((b+a)\) then you get a **difference quotient** nothing less than the derivative of the function \(f\) as \(b+a\) approaches 0:

If you want to visualise the process, you can just look at figure 3. The idea is to take a small step towards the left of \(P\) and evaluate the function there, then do the same thing to the right, subtract the two numbers and normalize (which is done by dividing the finite difference by \(h = (a+b)\). The normalisation step is not necessary unless you need the result to be normalized of course. And that results, gives us the slope of that tangent at \(P\) along the x-axis. If we repeat the process for the other two axes, then we get three numbers which as you maybe guess, put together form a three-dimensional vectors. Gradients are computed by taking the derivative of the function (we call these derivative partial derivatives) of the function \(f(x, y, z)\) in each one of the directions \(\vec i = (1,0,0)\), \(\vec j = (0,1,1)\) and \(\vec k = (0,0,1)\) and then combine the results to form a three-dimensional vector. The x-, y- and z-coordinate of that vector indicates the rate-of-change of the function \(f\) along the x-, y- and z-axis respectively.

$$ \begin{array}{l} \nabla f_x = \dfrac{f(x+h, y, z) - f(x-h, y, z)}{2h},\\ \nabla f_y = \dfrac{f(x, y+h, z) - f(x, y-h, z)}{2h},\\ \nabla f_z = \dfrac{f(x, y, z+h) - f(x, y, z-h)}{2h},\\ \nabla f = \left< \nabla f_x, \nabla f_y, \nabla f_z \right>. \end{array} $$Translating these equations into code is straightforward:

It is possible to proof that the gradient is the normal we are looking for. Let's say that we have a point \(P = (x_0, y_0, z_0)\) on the surface of the object. Since this point is on the object isorsurface we have \(f(x_0, y_0, z_0) = c\). We wish to prove that \(\nabla f|_P\) (the gradient at \(P\)) is perpendicular to the surface. Naturally, if this is true, then this gradient is also perpendicular to any curve that lies on the surface and that passes through \(P\) (figure 4). The equation of this curve will be defined as:

$$ r(t) = \left< x(t), y(t), z(t) \right>. $$In other words, each value of \(t\) defines a 3D point that we can plot. One way of looking at the above equation is to think of three parametric equations, one for each coordinate of the point. Now, let's define \(t_0\) such that \( r(t_0) = \left< x_0, y_0, z_0 \right> \). if you plug \(t_0\) in the curve parametric equation we get the coordinates of a point which is similar to \(P\). Since \(r(t)\) lies on the surface of the object we create a third function (let's call it \(g(t)\)) defined as:

$$g(t) = f(x(t),y(t),z(t)) = c,$$In other words, we plug the coordinates of the point that we computed using the curve's equation \(r\) into \(f\). The next step in the demonstrations requires to compute the first-order derivative of this equation. Understanding how you do that -is- complicated and requires to dive deep into the world of differential geometry. If we explain every step of it in this lesson, it will be twice as long, so for now let's just get straight to the result (however if you are interested in the full proof you can fist read the note that comes right after the equation. The note is not complete but it can at least provide you with some pointers. Additionally, we promise to write a lesson on differential geometry in the future -- promise):

$\dfrac {dg} {dt} = \dfrac {\partial f} {\partial x} \bigg\rvert_{P_0} \dfrac {d x} {d t} \bigg\rvert_{t_0} + \dfrac {\partial f} {\partial y} \bigg\rvert_{P_0}\dfrac {d y} {d t} \bigg\rvert_{t_0} + \dfrac {\partial f} {\partial z} \bigg\rvert_{P_0}\dfrac {d z} {d t} \bigg\rvert_{t_0} = 0.$**chain rule**. It says that the derivative of the a composite function (such as \(g(t)\)) is given by: $$g'(t) = r'(t) \cdot f'(g(t)).$$ In this particular case we have to deal with the chain rule in higher dimensions.

- If you wonder about why the terms are added up. This comes from the idea of total derivative versus partial derivative. In words, the total change in a function is equal to the sum of the partial changes. In general, this is related to the Jacobian of the function defined by $J_{ij}=\frac{\partial f_i}{\partial x_j}$ which satisfies $f(x+\delta x)-f(x)=J\delta x,$ for $\delta x$ very small. This is a local approximation for the function $f$.
- if you wonder: why we use the notation $\dfrac {\partial f} {\partial z}$ and $\dfrac {d x} {d t}$? Why using $\partial$ in one case and $d$ in the other? Why not $\partial$ or $d$ in both cases? The $\partial$ terminology means taking the derivative with all other variables fixed, whereas the $d$ terminology means the total change in the function with respect to a given change in the underlying variables. Because x,y and z are functions of one variable (t), you can use the total derivative, whereas g depends on multiple variables, so you need to have the full equation written down.

If you rewrite the terms in vector form you get:

$ \begin{array}{l} \dfrac {dg} {dt} = \left< \dfrac {\partial f} {\partial x} \bigg\rvert_{P_0}, \dfrac {\partial f} {\partial y} \bigg\rvert_{P_0}, \dfrac {\partial f} {\partial z} \bigg\rvert_{P_0} \right> \cdot \left< \dfrac {d x} {d t} \bigg\rvert_{t_0} \dfrac {d y} {d t} \bigg\rvert_{t_0} \dfrac {d z} {d t} \bigg\rvert_{t_0} \right> = 0\\ \leftrightarrow \nabla f|_P \cdot r'(t_0) = 0. \end{array} $And we know that the dot product of two vectors is 0 if they are perpendicular to each other. Thus the gradient, is perpendicular to the surface at \(P\).

## Lighting

The code for adding lighting is straightforward. We will use a technique similar to the one we studied in the lesson devoted to the ray-tracing algorithm (please check the lesson shading from Volume 1 if you need a refresher on this topic). We will loop over all the lights in the scene and compute their contribution to the shaded point's color (for the lack of a better term as we haven't touch on the topic of radiometry yet). In this example, we only support point light sources (note the square falloff):

For shadows, we will need to cast a ray from the shaded point in the direction of the light and check whether the ray intersects an object along the way to the light. If it does, the point is in the shadow of the light and the light's contribution is 0. The sphereTraceShadow() is very similar to the our sphereTrace() method. Both function use the sphere-tracing algorithm to find if the ray intersects an object but for shadows we don't need to shade the intersected point so we return from the function as soon as an intersection is found. The function returns a boolean (true if the ray intersected an object and false otherwise) rather than a color (Vec3f).

Figure 5 shows a simple scene illuminated by two point lights.

## Anti-Aliasing

We haven't touched yet on the topic of aliasing and anti-aliasing either. If you don't know what this is, then don't worry about it for now too much. If you have an idea of what these two terms mean then you can keep reading. In short though (if you are just curious) anti-aliasing helps reducing the staircase effect you can see around the edge of objects, an effect due to the discrete nature of digital image (digital images breaks a continuous space down into pixels).

The most common method to fight aliasing is to cast more than just one ray for each pixel in the image and average the contribution of these rays, a technique called (to say things quickly) supersampling. Hart though proposed another approach that is rather interesting because it takes advantage of the very nature of the sphere-tracing algorithm itself. Let's talk about this method briefly.

For anti-aliasing Harts proposes to use a technique known as **cone-tracing**. We won't detail this technique here, but the underlying idea is to replace the rays with cones (check the lesson on anti-aliasing). Computing the intersection of cones with objects is a rather difficult thing to do, but Hart noticed that the sphere-tracing algorithm offers a simple solution to this problem (even if his solution is not entirely accurate). The cone here is a crude representation of the pixel's volumetric extension into the scene (figure 8). Remember that in the sphere-tracing algorithm we move along the ray by computing the distance from our current position on the ray to closest object in the scene. Let's call this object \(O\) and the distance \(d\). The minimum distance \(d\) can be seen as the radius of a sphere (Hart calls it an *unbounding sphere*), centred around our current position \(t\) on the ray. We can observe that if the unbounding sphere's radius (\(d\)) is equal or smaller than the radius of the cone's section, then the cone intersects \(O\) as well. This idea is illustrated in figure 6 and 7.

The algorithm to render to a pixel in the image now looks as follows:

- Initialisation step: initialize the pixel's color to 0, \(t\) to 0.
- Step 1: find the nearest object in the scene from our current position \(t\) on the ray.
- Step 2: if this object is at a distance \(d\) lower than the threshold \(\epsilon\) then the ray intersects the object. Shade the object at the intersection point, add the result to the pixel's color and break from the loop.
- Step 3: if the object is at a distance lower than the cone radius then the cone intersects the object. Shade the object at the intersection point and add the result to the pixel's color (using the blending algorithm described below).
- Step 4: else (if the object is at a distance greater than the threshold and the cone radius), then move along the ray by distance \(d\) and repeat the entire loop (from step 1 to step 4) unless \(t\) is greater than the ray maximum distance. If this is the case, then break from the loop.
- Return the pixel's color.

A good estimation for the radius of the cone at distance \(t\) can be computed as follows:

$$r = 2 \dfrac {\tan(fov / 2)}{N_H} t.$$Where \(N_H\) is the image height and \(fov\) is the vertical camera field-of-view (in radians). We assume as usual, that the image plane is 1 unit away from the camera origin. If you are don't understand this construction, check the lesson Ray-Tracing: Generating Camera Rays. Then if you divide \(2\tan(\alpha/2)t\) by the number of rows in the image (you will need to use \(N_W\) instead of \(N_H\) if you use a horizontal field of view) you will get what is technically the pixel size at distance \(t\), but this value can also be used as an estimation for the radius of the cone at that distance (figure 8). In summary, when:

$$ d \le r,$$then the cone intersects the object \(O\). Each time this happens as we progress along the ray, we will compute the partial coverage of the intersected object over the entire area of the pixel, and accumulate the contribution of this object to the pixel color using alpha blending. Hart proposes to use the following metrics to compute the object's percentage coverage, the ratio object over pixel area which is lower than 1 (figure 9):

$$\alpha = \dfrac {1}{2} - \dfrac {d \sqrt{r^2 - d^2}} {\pi r^2} - \dfrac{1}{\pi}\arcsin \dfrac {d}{r},$$and use this value as the opacity value to blend the object's color \(c_1\) to the current's pixel color \(c_2\) using the classic alpha-blending formula (where \(c_1\) and \(c_2\) here are already multiplied by their respective opacity values):

$$c=c_1 + c_2 (1 - \alpha_1).$$While the principle is rather simple, implementing this algorithm is not as simple as one might expect. If you look at figure 6, you will note that we don't have one by three unbounding spheres for what is essentially a single intersection of the cone with the red sphere. So what do you do in this case? do you accumulate the contribution of the red sphere to the pixel's color three times? This would obviously be wrong. One of the solutions consists of storing the position of these unbounding spheres in a list and compute the contribution of the red sphere to the pixel's color only once, using the unbounding sphere for which the distance to the red sphere is the smallest. In our example this would be the first of the three unbounding spheres. The cone might intersect several objects along the way, generating a sequence of unbounding spheres each time, so you will need to be careful about that as well. The final implementation is left as an exercise for the reader. Will provide our own solution in a further revision of this lesson.

## Transformations

The distance estimators we developed for each type of surface were developed on the assumption that objects were centred about the origin and somehow symmetrical. In other words, the sphere can't be turned into some sort of ellipsoid shape, and the cube can't be turned into a rectangular box. So how do we transform these objects? How do we scale, translate and/or rotate them? The solution is somehow simple: we don't. Rather than transforming the object by a matrix \(M\) what we will do instead is transform the ray's origin and direction by the object inverse transform \(M^{-1}\). This method was described in detail in the lesson Transforming Objects using Matrices so we won't describe it here again. But in short the idea is to transform the ray back into the object's "object-space" (the space in which the object is before it gets transformed).

Though there is a problem when this method is used in the context of sphere-tracing. To say things quickly, you can divide transformations into two categories: isometric and non-isometric transforms. Isometric transforms are transforms that preserve distances. Rotation and translation belong to this category. Scale on the other hand doesn't preserve distances. If you scale a vertex whose position is originally at <3,3,3> by a factor of <2,2,2>, then the position of the vertex will be <6,6,6>. If you do the same thing for a point whose initial position is <0,0,0> then the position of this point after transformation will be <0,0,0>. If you now compute the distance between the first and second point before and after transformation you will get roughly 5.2 and 10.4. Cleary this transformation didn't preserve distances. If you do the same exercise with a translation (translating the first and second vertex by the same amount) you will see that distances are preserved (and with rotation too).

That distances are not preserved (when scale is applied) is a problem in the context of sphere-tracing because all our calculations are based on the assumption that we can compare distances between a point on the ray and points on the surface of the objects making up the scene (it is not in the context of ray-tracing for a reason we explain in the lesson Transforming Objects using Matrices but for sphere-tracing it is a problem). If these objects are scaled up or down in different ways, then the distances will be getting from our distance estimators won't be strictly comparable to each other. This will lead to incorrect results.

One possible solution is to separate isometric transformations such as translation and rotation from non-isometric transformations such as scale. This is tricky when you deal with 4x4 matrix: these matrices are precisely used because they combine in a single structure scale, rotation and translation. It is possible to decompose or break down a 4x4 matrix into a singular scale, rotation and translation matrix using something like a *Jacobi transformation*. Explaining this method would take a while and goes beyond the scope of this lesson (this lesson contains enough maths already right?) but hopefully there is an easier solution.

To keep things simple, it is best to keep rotation and translation separate from scale, so that you can process the two types of transformations separately. You can represent each type of transformation the way you like best, like translation as a vector and rotation as a 3x3 matrix or as Euler angles. For scale, dealing with non-uniform scale is also rather difficult. So it is best to start with a singular scale value.

Applying the inverse of a translation or rotation to a point is rather simple. Keep in mind that the inverse of an orthogonal matrix is the matrix transpose. So for 3x3 rotation matrices, you don't need to go through a complex process to get the matrix inverse. You just need to transpose it, which is simple. For scale, we will also apply the inverse of the scale \(S^{-1}\) to the ray's origin and direction, but then once you have computed the distance from the point in object space to the nearest point on the surface of the object, you will need to scale that distance back by the scale value \(S\). Your code could look as follows:

The program provided with this lesson doesn't really support transformations. We thought it would be great though to show you a practical example of the method that consists of transforming the position along the ray into the object's "object-space". If you are interested in this particular feature, check the getDistance() method of the cube's class. In this example we actually use a 4x4 matrix. We were careful to use a matrix that doesn't scale the object. It only translates and rotates the object as you can see in the image on the right though as mentioned before, if you really want to support transformations properly it is best to avoid 4x4 matrices and keep rotation, scale and translation separate. Keep in mind that this code doesn't handle non-uniform scale. Furthermore the code below is not optimised at all since we perform the matrix inverse calculation each time the getDistance() mehthod is called. This is of course not how it should be done if you write production code. You should compute the inverse matrix once in the constructor of the class and store all the transformation data (rotation, scale, translation) as member variables.