Ray Tracing: Homework 4

Even though I devoted about 85% of my daily time to this homework, I could not complete it. However incomplete it is, I will still explain my work.

I had not been able to add transformation support in the previous homework, so I decided to prioritize it among the things I planned to do before this homework's deadline. Until this homework, I was storing all the triangles of all meshes inside a single global container, which anonymized them. I decided to first implement transformation of objects rather than rays. I implemented this and checked the results, but objects were missing and so on. I then tried to apply custom transformations to the bigger objects in the previous scenes; however, they were taking a long time to render, since my acceleration structures did not work with this preliminary transformation implementation and I had to disable them. Since shading is not the main concern when debugging transformations, a faster method to quickly preview a scene would be useful. With this in mind, I decided to write a debug viewer to preview scenes before shading them. The viewer traverses every global triangle in the scene and, for the vertex1 field of each, computes where the ray from that vertex towards the camera hits the image plane. It then finds the pixel closest to that position and colors it green. The result is shown below.

This scene would previously have taken too long to render, but thanks to the debug viewer we could guess what it looked like. Another advantage is that when a new scene does not show correctly, you can very quickly get an idea of the reason why, or rule one out. In fact, this scene was one of those that worked before I implemented transformations but somehow turned into a blank screen after one attempt.
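Since the viewer is essentially a projection of vertices onto the image plane, here is a minimal sketch of that step, assuming a pinhole camera with an orthonormal basis. `Camera`, `Vec3` and the member names are simplified placeholders for this write-up, not my actual code.

```cpp
// Sketch: project a triangle's vertex1 onto the image plane and find the
// nearest pixel, which the debug viewer then colors green.
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3  operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b)       { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Camera {
    Vec3  position, gaze, up, right;    // orthonormal basis, gaze points into the scene
    float near_distance;                // distance to the image plane
    float left, right_edge, bottom, top;
    int   width, height;                // resolution in pixels
};

// Returns false if the vertex is behind the camera or falls outside the image.
bool projectToPixel(const Camera& cam, Vec3 v, int& px, int& py) {
    Vec3  d     = v - cam.position;
    float depth = dot(d, cam.gaze);
    if (depth <= 0.0f) return false;                 // behind the camera
    float t = cam.near_distance / depth;             // scale onto the image plane
    float u = dot(d, cam.right) * t;                 // image-plane coordinates
    float w = dot(d, cam.up)    * t;
    px = (int)std::lround((u - cam.left) / (cam.right_edge - cam.left) * cam.width  - 0.5f);
    py = (int)std::lround((cam.top - w)  / (cam.top - cam.bottom)      * cam.height - 0.5f);
    return px >= 0 && px < cam.width && py >= 0 && py < cam.height;
}
```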

My next step was to correct the transformations. In my first attempt I forgot to re-transform the bounding boxes of the triangles in the scene, and this was the result:

I had long realized that triangles of the same mesh should be stored together with the information that they belong to that mesh in order to fully implement transformations in the ray tracer. This was even more necessary for mesh instancing, where copying every triangle of a mesh for each instance on the scene would be too redundant. So far, my abstract Object class had three children: Triangle, Sphere and AlignedBox. As a first step I tried some designs where meshes would be represented as bounding boxes and the like; however, I later decided to stop giving chances to crazy ideas that were uncertain to work, and I added another child class called Mesh. After creating an interface for Mesh and some of its implementation, I realized that the scenes that do not include sampling (the scenes of HW1 and HW2) might look much better if they were sampled, since sampling would solve the aliasing of edges. However, the vast majority of a scene, where no abrupt transitions occur, would benefit much less from sampling, which means most of that computation would be wasted.

At that point I was unaware that this homework would not be finished on time, so I decided to spend some of my time on this concept of adaptive sampling. Thinking about the topic with all my knowledge and experience of related areas, I realized that there are a LOT of ways to approach adaptive sampling. I had to make, and did make, some decisions to pick an idea of a system to try.

Let me go over my methodology aspect by aspect. I would first divide the screen into n*n chunks of pixels, inside which I would apply my adaptive sampling algorithm. The decision to divide the screen had a few reasons. The first is that my early algorithms would probably be unoptimized, and serializing the generation of the whole image would both take longer and make its intermediate steps harder to visualize.

Designing a novel algorithm or system from scratch is nowadays considered a hilariously long process that should not even be attempted whenever any kind of alternative exists (or might exist); work that requires that kind of creativity is usually thought to be possible only after a PhD. This means that attempting such a thing without an intuitive and fast visual feedback loop would be a mistake for me. That is one of the reasons I chose to divide the screen/image into chunks.

Actually, after seeing that sampling affects even the most basic scenes that do not seem to need it at first glance, the natural meaning of rendering by ray tracing comes up: we know that we can increase the number of rays in the scene to approach a perfect image. In the scenes where a sampling rate greater than 1 is requested, distributing the rays homogeneously among the pixels is just one choice. It is neither optimal (it is never optimal) nor random: it is intuitive, a heuristic. We cannot, of course, deny that pixel boundaries, and even the fact that they are rectangular, play a role in the theoretically optimal algorithm (and in the notional, optimal fitness function that would score a candidate); however, measuring this effect would be really hard work, and a better distribution than the homogeneous one exists.

Another concept that comes up here is perfectness. I, as a person, have hardware that is limited in several ways by many factors. And since ray tracing is just a highly parametrizable way of rendering a scene, the choice of how much computing power and time to spend on a scene represents a trade-off. Precisely counting how much computing power is spent on a scene is really hard, and having precise control over that power may be prohibitively challenging. However, the number of rays sent into the scene is a good estimate of it, and the minimal computing budget is not one ray per pixel. One can see that a single ray is nearly enough to estimate the real colors of all the pixels of a full-HD image where the camera is aggressively zoomed into a small triangle whose size is much smaller than its distance to the point light; such a scene would not need millions of rays to render. The Chinese dragon scene, on the other hand, would need a lot of rays. My point here is not about deciding or estimating the computing power an image needs before rendering it, but rather about changing the reader's perspective on ray tracing (and rendering in general).

Long story short, in our homeworks we can think of the implicitly or explicitly given sample_count variables in the scenes as a limit on the relative computing power to spend before deciding the image tensor's final values. This is how I perceive sampling.

After this long detour of ideas, I shall continue with my methodology. I sampled the space of applicable approaches and decided to divide the image into 4096 (64x64) chunks. I divided the rays equally among these chunks; however, inside each chunk the allocated rays could be sent wherever the algorithm chose. I first tried sampling random positions on the screen, and for each sample I would update the pixel closest to it. This resulted in such a bunny:
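To make the method concrete, here is a minimal sketch of the per-chunk random sampling just described. `shade` stands in for the actual ray tracing of a screen position, and the chunk bookkeeping is simplified; these names are placeholders, not my real code.

```cpp
#include <random>

struct Color { float r, g, b; };

// Placeholder: in the real renderer this traces a ray through screen point (sx, sy).
Color shade(float sx, float sy) { return { sx, sy, 0.0f }; }

// Distribute a per-chunk ray budget at random positions inside one chunk of a
// 64x64 grid and write each sample's color to the pixel containing it.
void renderChunk(Color* image, int imageWidth,
                 int chunkX, int chunkY, int chunkSize, int raysPerChunk,
                 std::mt19937& rng) {
    std::uniform_real_distribution<float> jitter(0.0f, (float)chunkSize);
    for (int i = 0; i < raysPerChunk; ++i) {
        float sx = chunkX * chunkSize + jitter(rng);   // random position in pixel units
        float sy = chunkY * chunkSize + jitter(rng);
        Color c = shade(sx, sy);
        int px = (int)sx;                              // pixel whose center is closest
        int py = (int)sy;
        image[py * imageWidth + px] = c;               // last sample wins; no averaging yet
    }
}
```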


The resulting shift of pixels was caused by the method: the same memory buffer was being reused for each chunk, so if a pixel was not the closest to any sample, it kept its old color. I changed this, and the real colors were as follows:

Then I decided to keep records of the calculated colors within each segment and, for each pixel in the segment, compute the weighted average of the samples, with weights of 1/distance.

This was another approach, where I picked the closest sample:

Then I thought this was increasing the complexity too much, so I decided to optimize. For each pixel, the samples were traversed and their values summed up until a sample inside the pixel was reached. If that happened, the sum was discarded and the color of the pixel became that sample; if not, the color of the pixel was the weighted average of the samples in that chunk. This was my first result, where I forgot to discard the summed samples, resulting in clamped bright spots:
Then I fixed the issue and the result was satisfying:

The difference between this image and the closest-sample method is that this one looks a bit blurrier, but also less noisy.
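A minimal sketch of that fixed per-pixel reconstruction, under the chunk setup described above; `Sample` and the epsilon guard are illustrative details, not the exact code.

```cpp
// Use the sample that falls inside the pixel if there is one, otherwise a
// 1/distance weighted average of all samples in the chunk.
#include <vector>
#include <cmath>

struct Sample { float x, y; float r, g, b; };

void reconstructPixel(const std::vector<Sample>& chunkSamples,
                      int px, int py, float& r, float& g, float& b) {
    float sr = 0, sg = 0, sb = 0, wsum = 0;
    float cx = px + 0.5f, cy = py + 0.5f;        // pixel center
    for (const Sample& s : chunkSamples) {
        if ((int)s.x == px && (int)s.y == py) {  // sample lands inside this pixel
            r = s.r; g = s.g; b = s.b;           // discard the running sum
            return;
        }
        float d = std::sqrt((s.x - cx) * (s.x - cx) + (s.y - cy) * (s.y - cy));
        float w = 1.0f / (d + 1e-6f);            // 1/distance weight, guarded against zero
        sr += w * s.r; sg += w * s.g; sb += w * s.b; wsum += w;
    }
    if (wsum > 0) { r = sr / wsum; g = sg / wsum; b = sb / wsum; }
}
```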

After this, I decided to try my executable on another scene, and this weird glitch happened:


I still do not know the reason behind this bug. However, seeing this output made me realize that I was getting off track: the topic was too big to add as a side feature, and there was a chance I would be unable to finish the homework in time. So I decided to save this work for the project and started implementing and testing the higher-priority features.

Mesh Instancing

That was one of the things my last homework lacked, so I decided to take care of it first. Remember that I created the Mesh object last? After trying different designs, I settled on this approach: each Mesh has an AlignedBox*, which points to a valid box object. The box object has a hierarchical structure, where each box can contain multiple boxes and triangles, so a box holds a mesh's triangle data. Each Mesh also has a transformation matrix.

Whenever a ray is sent into the scene, the meshes in the scene are traversed. For each mesh, the ray is transformed by the inverse of the transformation matrix, then the intersection of the transformed ray with the mesh is calculated (possibly no intersection), and the result is transformed back into global coordinates. So whenever there is a mesh instance, the instance's AlignedBox* field points to the same AlignedBox object; the only differences are the material, texture, transformation and other lightweight data, which are copied from the parent Mesh or read from the file during parsing of mesh instances. My implementation went OK and worked without errors, but my program files (which I had split into 4-5 header and implementation files) were getting bigger and bigger and harder to maintain. Even the most lightweight implementation files took too long to compile despite containing only a few functions, declarations and variables.
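Going back to the instancing scheme itself, here is a rough sketch of the shared layout and the inverse-transform intersection, using Eigen (which the project still used at this point). `Hit`, `BoxNode`, `intersectBox` and the member names are simplified placeholders; the real classes have more fields.

```cpp
#include <vector>
#include <Eigen/Dense>

struct Triangle;                                    // leaf geometry

struct BoxNode {                                    // my own hierarchical box class
    Eigen::AlignedBox3f    bounds;
    std::vector<BoxNode*>  children;
    std::vector<Triangle*> triangles;
};

struct Mesh {
    BoxNode*        box;                            // shared by every instance of the same mesh
    Eigen::Matrix4f transform;                      // lightweight per-instance data
    int             material_id;
};

struct Ray { Eigen::Vector3f origin, direction; };
struct Hit { bool found = false; Eigen::Vector3f point, normal; };

Hit intersectBox(const BoxNode&, const Ray&) { return {}; }   // stub for the real traversal

Hit intersectMesh(const Mesh& mesh, const Ray& worldRay) {
    Eigen::Matrix4f inv = mesh.transform.inverse();           // world -> local
    Ray localRay;
    localRay.origin    = (inv * worldRay.origin.homogeneous()).hnormalized();
    localRay.direction = inv.topLeftCorner<3, 3>() * worldRay.direction;

    Hit hit = intersectBox(*mesh.box, localRay);               // intersect in local space
    if (hit.found) {
        hit.point  = (mesh.transform * hit.point.homogeneous()).hnormalized();
        hit.normal = (mesh.transform.topLeftCorner<3, 3>()
                          .inverse().transpose() * hit.normal).normalized();  // inverse transpose for normals
    }
    return hit;
}
```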

I did not mind the size and maintenance issues and decided to bear with them until this homework ended.

Here are two images that show my work on mesh instancing. The first one was problematic, but the problems were not that important and I solved them in the second image.


This second image was rendered with the sample count reduced to 1 so that I could render and see my results.

HW4

After finishing the instancing feature, I started to try the scenes of homework 4. I planned to go step by step and not hurry things, because I felt I might not have enough time to finish all aspects of the homework, and gradual implementation would be easier to maintain and to see the results of. I also did not want to randomly pick the hardest task in the homework, fail at it, and leave the easier ones untouched.

I decided to give the perlin scene a first try. Here, I assigned random k_diffuse values to the pixels if they had perlin textures. This was the result:


Then I took a step towards perlin noise. I changed the hash function to hash(float a) = fmodf(a, 1). Then I made it so that red is hashed by the x coordinate of the intersection, green by y, and blue by z. The effects of the dimensions can be seen here:
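For reference, the placeholder coloring just described boils down to something like this sketch; `shadePerlinDebug` and `hashCoord` are names I am using here for illustration, not the real functions.

```cpp
#include <cmath>

// Placeholder "hash" used at this stage: the fractional part of a coordinate.
static float hashCoord(float a) { return fmodf(a, 1.0f); }

// Debug coloring: each channel is driven by one coordinate of the hit point,
// so the effect of each dimension is visible separately.
void shadePerlinDebug(float px, float py, float pz, float& r, float& g, float& b) {
    r = hashCoord(px);
    g = hashCoord(py);
    b = hashCoord(pz);
}
```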
I then tried the diffuse image textures. First of all, I want to mention that a relatively big portion of my time was spent on parsing: the share of this homework spent on parsing and on debugging problems originating from the parsing process is approximately 35%.

If a diffuse texture had an image, I would pick a random position on that image and use it as k_diffuse:

Then I decided to go on and implement the first really important part of this homework: getting u,v values from a position on a texture-mapped object. I tried to implement this as a lambda function. That did not work on the first try, since it was the first time I wrote a lambda and this was a complex task. The modifications I had to make to the parser were so big that I lost track of my work. Even though I was using git, since the files were approaching 1000 lines and the indentation was getting crazy, maintenance was getting really hard. And the compilation times were not helping: the smallest implementation files took nearly half a minute to compile. That was killing my time and wasting the brain power I needed to handle the complexity of the source files. The compile times became so annoyingly unbearable that I got curious about why they were so high. Searching for it, I realized that Eigen, the linear algebra library I used, is a header-only library, so every time I include it in an implementation file, I add to the compilation time. This effectively means that separating my source code into more implementation files makes compilation much slower. Normally this is not the case, since not having to recompile unchanged implementations reduces compilation time. So using Eigen implicitly pushes you to collect all your implementation inside one single .cpp file, which is not the industry standard nowadays. One could divide the source code into only header files to make it more manageable, but then comes the next problem with Eigen: the library uses expression templates to compute matrix multiplications, so all your compile-time-known matrix (or vector) products are expanded at compile time. If you write something like M*M*M*...*M, the template expansions take a long time and compilation slows down further.
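Coming back to the u,v computation itself, this is roughly the kind of lambda I was aiming for, sketched here for a sphere; the spherical mapping convention below is an assumption for illustration, not necessarily the one the homework expects.

```cpp
#include <cmath>

const float PI = 3.14159265358979f;

// Sketch: spherical u,v from a local-space hit point, written as a lambda.
auto sphereUV = [](float x, float y, float z, float radius, float& u, float& v) {
    float theta = std::acos(y / radius);   // polar angle measured from +y
    float phi   = std::atan2(z, x);        // azimuth in the xz plane
    u = (-phi + PI) / (2.0f * PI);         // mapped into [0, 1]
    v = theta / PI;                        // mapped into [0, 1]
};
```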

oneMKL

The unmaintainability of my current source code, the slow compilation times, the inefficiencies of the Eigen library, my wrong initial design choices that were not feasible to fix (because the time to fix them alone would not be worth it and the homeworks would not get finished if I attempted it), and knowing the industry standards of source code organization all pushed me towards a critical decision: rewriting the entire ray tracer from scratch and switching to a new linear algebra library. I decided to go for it, mainly because of the lack of maintainability of my code.

I picked Intel's oneAPI Math Kernel Library (oneMKL). The library is very big and I probably still do not know 95% of its APIs. Its APIs are C APIs, so it is not object oriented. The library focuses on optimized, precompiled (.so) code that can run on CPUs, GPUs and distributed multi-machine systems. Of course, switching to this library solely for that reason would be overkill; however, I thought that if not now, I would not get to switch to it this semester. I created custom classes called vector3, vector4 and matrix (actually a matrix4, but there is no need to specify) that hold their data in a layout suitable for the oneMKL functions. I defined the needed operators and member functions along the way and prepared my CMakeLists.txt to handle the compilation. I had to redefine all my objects and edit all the computations to use my new vector and matrix types.
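As an example of what those wrappers can look like, here is a minimal sketch of a matrix4 * vector4 product going through oneMKL's standard CBLAS interface (`cblas_sgemv`); the member layout and names are simplified from my actual classes.

```cpp
#include <mkl.h>   // oneMKL's CBLAS interface

// Thin wrappers whose data layout is directly usable by CBLAS calls.
struct vector4 { float data[4]; };
struct matrix4 { float data[16]; };   // row-major 4x4

// y = M * x using the single-precision GEMV routine.
vector4 multiply(const matrix4& M, const vector4& x) {
    vector4 y{};
    cblas_sgemv(CblasRowMajor, CblasNoTrans,
                4, 4,                 // rows, cols
                1.0f, M.data, 4,      // alpha, A, leading dimension
                x.data, 1,            // x, stride
                0.0f, y.data, 1);     // beta, y, stride
    return y;
}
```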

I redefined my Ray as an object that holds a vector3 for the starting point, a float xz_angle and another float y_angle; the two floats are enough to define a direction. I did not do this for the space saving, although I was aware it might also reduce time. I did it to be able to define a line segment as a ray with a limited length: the ray also has a length field which is INFINITE by default. My previous line segment implementation was another class called LineSegment, which was making things complicated.
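A sketch of that representation and the angle-to-direction conversion it implies; the exact convention below (xz_angle as azimuth around y, y_angle as elevation) is my assumption for illustration.

```cpp
#include <cmath>
#include <limits>

struct vector3 { float x, y, z; };

// A ray whose direction is stored as two angles, plus a length so that a
// line segment is just a ray with a finite length.
struct Ray {
    vector3 origin;
    float   xz_angle;   // rotation around the y axis (azimuth)
    float   y_angle;    // elevation from the xz plane
    float   length = std::numeric_limits<float>::infinity();

    vector3 direction() const {
        float c = std::cos(y_angle);
        return { c * std::cos(xz_angle), std::sin(y_angle), c * std::sin(xz_angle) };
    }
};
```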

I also had to define my own box architecture: Eigen had an AlignedBox3f class, but oneMKL is a C library without any classes; it handles vectors as float arrays.

After countless days of porting the project into its new form, on the last day of the homework (today), I was able to make it work on some of the previous homeworks' scenes. I then focused on HW4's scenes to implement at least something. However, that was not possible.

Conclusion

One thing I have not mentioned is that I renewed my BVH generation logic to stop early even if max_depth is not reached (when a box contains only a couple of triangles), so I no longer need to edit that manually. However, my bounding box algorithm still has problems I need to solve, since I still have to keep max_depth small in order not to wait too long.
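The termination rule amounts to something like this sketch; `MIN_TRIANGLES` and `MAX_DEPTH` are illustrative values, not my real thresholds.

```cpp
#include <vector>

struct Tri;  // placeholder triangle type

struct BVHNode {
    std::vector<Tri*>     triangles;
    std::vector<BVHNode*> children;
};

// Stop splitting early when a node has only a few triangles,
// even if the depth limit has not been reached yet.
void buildBVH(BVHNode& node, int depth) {
    const int MIN_TRIANGLES = 4;    // illustrative threshold
    const int MAX_DEPTH     = 16;   // illustrative limit
    if (depth >= MAX_DEPTH || (int)node.triangles.size() <= MIN_TRIANGLES)
        return;                     // leaf: keep the triangles here
    // ... split node.triangles into children and recurse with depth + 1 ...
}
```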

One of the good sides of using oneMKL is that it can do computations in place, so results do not need to be copied redundantly, as in vector3 v1 = v2 + v3;. This optimization can also be achieved with operator-overloading wrappers. Because I did not have much time, I wrote really unoptimized code; I plan to improve it gradually over time.
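For example, the CBLAS level-1 routine `cblas_saxpy` that oneMKL ships updates a vector in place (y = a*x + y), so an addition does not have to produce a temporary; a minimal sketch:

```cpp
#include <mkl.h>

// In-place vector addition with a BLAS level-1 call.
// Computes y = 1.0f * x + y, overwriting y instead of producing a temporary.
void addInPlace(float* y, const float* x, int n = 3) {
    cblas_saxpy(n, 1.0f, x, 1, y, 1);
}
```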

Transforming rays became different, because now the ray direction is defined by angles. I looked on the internet for a way to transform the angles directly with transformation matrices, but I could find none, so I now convert the direction angles into a direction vector, transform the vector, and then re-compute the angles. I guess transformation matrices are just one way of representing transformations, a position-oriented one; I think rotation-oriented representations exist too. As an example, consider a transformation that consists only of a rotation around the y axis. Transforming a ray through it involves two conversions of the direction, angles -> vector3 -> angles, but it could be done simply by adding the rotation angle to the xz_angle component of the ray. So I guess there are special cases that can be handled differently.
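A sketch of that round trip and the y-rotation shortcut, reusing the angle convention assumed in the Ray sketch above; the sign of the shortcut depends on the handedness convention.

```cpp
#include <cmath>

// General path: angles -> vector -> (rotate by 3x3 matrix R, row-major) -> angles.
// Assumes R is a pure rotation, so the rotated direction stays unit length.
void rotateDirection(const float R[9], float& xz_angle, float& y_angle) {
    float c = std::cos(y_angle);
    float d[3] = { c * std::cos(xz_angle), std::sin(y_angle), c * std::sin(xz_angle) };
    float r[3] = { R[0]*d[0] + R[1]*d[1] + R[2]*d[2],
                   R[3]*d[0] + R[4]*d[1] + R[5]*d[2],
                   R[6]*d[0] + R[7]*d[1] + R[8]*d[2] };
    xz_angle = std::atan2(r[2], r[0]);
    y_angle  = std::asin(r[1]);
}

// Special case: a pure rotation around the y axis only shifts the azimuth
// (up to the sign convention of the rotation).
void rotateAroundY(float angle, float& xz_angle) {
    xz_angle += angle;
}
```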

Lazy computation would benefit my current implementation a lot: I recompute the inverse every time, and I can reduce that kind of repeated work. I plan to do some of it if I have time.
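The kind of lazy caching I have in mind looks roughly like this, using oneMKL's LAPACKE routines to invert the 4x4 matrix; the member names and the (absent) error handling are simplifications.

```cpp
#include <mkl.h>
#include <cstring>

// Cache the inverse of a 4x4 transform and recompute it lazily,
// only when the matrix has changed.
struct TransformCache {
    float matrix[16];            // row-major 4x4
    float inverse[16];
    bool  inverse_valid = false;

    void setMatrix(const float* m) {
        std::memcpy(matrix, m, sizeof(matrix));
        inverse_valid = false;   // invalidate; recomputed on next use
    }

    const float* inverseMatrix() {
        if (!inverse_valid) {
            std::memcpy(inverse, matrix, sizeof(inverse));
            MKL_INT ipiv[4];
            LAPACKE_sgetrf(LAPACK_ROW_MAJOR, 4, 4, inverse, 4, ipiv);  // LU factorization
            LAPACKE_sgetri(LAPACK_ROW_MAJOR, 4, inverse, 4, ipiv);     // invert from LU
            inverse_valid = true;
        }
        return inverse;
    }
};
```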
