Blog Post #12: The Outro

Hey everyone! Welcome to the final blog post of my Senior Project Experience! This week I’ll be reviewing everything that has happened over the past 12 weeks.

At the very least, I would definitely say the senior project was a journey with its highs and lows. Academically, this project packed in an extremely dense amount of work: I had to cover a large amount of content in a short amount of time (12 weeks), including learning some linear algebra and machine learning concepts, both very technical fields. Thankfully I got through it! However, I believe I overestimated my ability to understand these topics, because even after the experience I still don’t completely understand some of the concepts I had to deal with, primarily principal component analysis. I may have been too ambitious with the scope of my project; I think I should have worked with only one type of classifier instead of two with completely different architectures, as understanding how each one works took time I would rather have spent studying principal component analysis.

At first, the project went off without a hitch; I was moving at a faster pace than the schedule on my syllabus. Once I reached the very dense topics, though, primarily generating adversarial examples and implementing PCA, my progress slowed as I had to study up on the mathematics and coding they required. The major obstacle, however, was that implementing PCA in the convolutional neural network turned out to be extremely difficult, because I didn’t know how to reflect the algorithm’s processing in the classifier. I lost the most time trying to fix this issue, and I had to cut the portion of the project involving implementing PCA in the adversarial image generator.

Overall, though, I did enjoy my Senior Project Experience, and it provided me a great learning experience, both personally and academically. My final product will be an article detailing my findings from this project. I will also be presenting my Senior Project Experience on May 23rd at the DoubleTree Hotel near San Jose International Airport. Hope to see some of you there!

Special thanks to Ms. Jefferson, my BASIS Advisor, Ms. Belcher, my Senior Project Coordinator, Dr. Mitra, my external advisor, and my friends and family!

Until next time!

-Rishab

Blog Post #11: Wrapping it Up

Greetings! Welcome to the 11th blog post of my senior project! This post is going to be a short one, as this week was just about testing the last piece of the project: the convolutional neural network outfitted with PCA.

After the CNN was implemented with PCA, its overall classification accuracy was about 94.6%, which is great because it is very close to the original classifier’s accuracy of about 95.2%. However, when tested with the adversarial images, the PCA-implemented classifier’s accuracy was about 81.9%, while the classifier without PCA scored 82.6% on the same images. Thus both had similar accuracies on the adversarial images, and therefore similar dropoff rates, so PCA seemed to have a minimal effect, harmful or beneficial, on the classifier’s accuracy. This may be because the input processing of the convolutional neural network already does something very similar to the PCA algorithm, making the PCA step redundant information for the classifier, which would explain why the classification accuracies were so similar.

In terms of the significance of this finding, I cannot conclude much about the effectiveness of PCA in defending against adversarial attacks in general; rather, PCA seems effective only on a case-by-case basis. The algorithm was very effective on the logistic regression classifier, and since that classifier is more rudimentary, this may indicate that PCA is more effective on basic classifiers, though other classifiers of the same ilk would need to be tested. PCA was not as effective on the more advanced convolutional classifier, which may hint at what to expect from other advanced classifiers. Overall, though, PCA cannot be considered a universal defense.

Next week, I will be recapping and reviewing my senior project experience! I will be telling you about the highlights, the lowlights, any future work I will do, and whether I want to pursue a similar topic in the future. Thank you guys for bearing with me on this wild ride!

Blog Post #10: The Home Stretch

Hey everyone! Welcome to the first-ever Post #10 for the Attacks on Self-Driving Cars blog!

In the last blog post I detailed how I was a couple of weeks behind schedule due to the large obstacle I faced with the convolutional neural network’s accuracy. Originally, the plan was to use the principal component analysis algorithm on the neural network that generates adversarial attacks and produce a new batch of images to be used on the classifiers implemented with PCA. Thus, early in the week I started implementing the algorithm in the adversarial image generator. However, I hit another roadblock this week, involving this image I displayed back in Blog Post #4:

[Image: the fast gradient sign method formula, x_adv = x + ε · sign(∇x J(θ, x, y_true))]

With the implementation of the PCA algorithm, the data will be reduced and reformatted. However, J, the classification loss function that is minimized to reduce the error in the neural network, must also change: the data will have fewer dimensions, and with a change in dimensionality comes a change in how the error is calculated, so using the same loss function may produce an adversarial image that looks nothing like its original counterpart.
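To make the problem concrete, here it is in symbols. This is just my restatement of the issue, with W standing for a hypothetical 1024 × k matrix of principal components; it is not something I actually implemented:

```latex
% Original fast gradient sign method (see Blog Post #4):
\[ x_{adv} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y_{true})\big) \]
% With PCA, the classifier sees the reduced input z = W^{\top}x, so the
% loss becomes J(\theta, W^{\top}x, y_{true}), and by the chain rule
\[ \nabla_x J(\theta, W^{\top}x, y_{true}) = W\,\nabla_z J(\theta, z, y_{true}), \qquad z = W^{\top}x \]
% The open question is whether a perturbation built from this gradient
% still looks like the original sign once added back in pixel space.
```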

Thus, the question is: how must the loss function change? Unfortunately, I’m not sure I can answer this question properly. Due to the time constraints of the project, my external advisor suggested I make an executive decision with the attack generator: rather than create entirely new adversarial images, he suggested reusing the same adversarial images, because my testing from Blog Post #6 showed that they transfer effectively from one architecture to another, so the same could hold here.

Thus, I headed into the next stage of the project, testing, leaving the question above for the future work I do in adversarial machine learning. I started with the logistic regression model, training it with the original traffic sign images and testing it with the adversarial images. The classification accuracy for this model turned out to be about 71.2%, which is higher than the 65.6% from Blog Post #6. This is definitely a great sign!
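For the curious, here is a minimal sketch of what this testing step boils down to. I’m using scikit-learn for brevity (an assumption on my part; my actual classifiers are written in low-level Tensorflow), and the random arrays are stand-ins for the real flattened sign images:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_train = rng.random((1000, 1024))                  # stand-in for flattened 32x32 signs
y_train = rng.integers(0, 43, size=1000)            # 43 traffic sign classes
x_adv = x_train + rng.normal(scale=0.05, size=x_train.shape)  # stand-in for adversarial images

pca = PCA(n_components=10).fit(x_train)             # fit PCA on the training data only
clf = LogisticRegression(max_iter=500).fit(pca.transform(x_train), y_train)

# Test the PCA-fitted model on the adversarial versions of the same signs
print("adversarial accuracy:", clf.score(pca.transform(x_adv), y_train))
```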

Next week I will be testing the adversarial images on the convolutional neural network implemented with PCA. If the results are similar to the logistic regression model’s, it will show that PCA has a positive effect when acting as a defense against adversarial attacks, which would be a great finding.

Strap in, everyone! We’re in the homestretch of the project!

-Rishab

Blog Post #9: An Update on Last Week’s Events

Hey everyone! I’m going to pick right back up from where I left off last week, with the problem I was having with the convolutional neural network and logistic regression fitted with principal component analysis: their classification accuracies were only about 19% and 16% respectively, which was definitely not where I wanted to be. I wanted to be in the range of 80% or above.

Well, that obstacle took me this entire week, most likely because I am not a great debugger; instead of facing the problem head on, I kept trying to find detours around it. I’ll go over in detail what I did to fix the problem and the various approaches I tried, many of which failed.

Originally, my mentor and I thought there was a problem with the PCA function I used, as I had coded the function myself, referencing the mathematics of the paper I linked in Blog Post #7. Therefore, I searched the Tensorflow software library for any functions that would let me do the same matrix math as my hand-coded PCA function. Thankfully, I found just the right one: singular value decomposition, or tf.svd, which can also find the principal components of a matrix. With my advisor’s assistance, we replaced my PCA code with the built-in tf.svd function. However, the results were still the same, with classification accuracies of 19.6% and 17%.
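If you’re wondering what “the same matrix math” means here: PCA via SVD boils down to centering the data, decomposing it, and projecting onto the top rows of Vᵀ. A minimal numpy sketch of the idea (numpy instead of my actual Tensorflow code, just to keep it readable):

```python
import numpy as np

def pca_via_svd(x, k):
    """Project the rows of x onto their top-k principal components."""
    centered = x - x.mean(axis=0)                      # PCA assumes mean-centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T                         # rows of vt are the principal directions

x = np.random.rand(200, 1024)                          # stand-in for flattened sign images
print(pca_via_svd(x, 10).shape)                        # -> (200, 10)
```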

My next attempt to fix the problem was to port the neural networks to another language by myself, which was admittedly not a very smart decision. Because I was programming in a low-level Tensorflow API, I did not have much experience debugging at that level. Thus, I turned to MATLAB’s neural network toolbox, a high-level software library I have more experience programming and debugging in, and decided to port my convolutional neural network and logistic regression classifiers over to that software. Importing these classifiers into the MATLAB toolbox was surprisingly easy; there was already a function called importKerasNetwork that loaded my convolutional neural network and logistic regression classifier into the software. The toolbox also had a built-in PCA function, so I used it to process the data and break it down into principal components. Sadly, the classification accuracy was still the same. Looking back, this detour did not make much sense: I made virtually no progress in debugging and instead changed the PCA function again, which, as my advisor and I had already learned, was not the problem to begin with. Essentially, I was just going in circles and wasting precious time.

With no other options, my advisor suggested I find a fresh set of eyes to look at the convolutional neural network code and see if they could fix the problem. Thus, I asked an acquaintance of mine, a graduate student at UC Berkeley with Tensorflow experience, to look at my code. Thankfully, he found the problem, which was in the preprocessing stage where PCA was applied: without realizing it, I was using the test data to find the key principal components, and since the test split captures only a sliver of the variance in the entire dataset, the components explained far less of the data than I needed. With that fixed, the classification accuracies are much more respectable: the logistic regression classifier now has an accuracy of 80.7%, and the convolutional neural network an accuracy of 88.2%.
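In scikit-learn terms (again an assumption on my part; my real code is raw Tensorflow, and x_train/x_test here are hypothetical names for the two splits), the bug and the fix look like this:

```python
from sklearn.decomposition import PCA

# The bug: fitting PCA on the test split bases the principal components
# on only a sliver of the dataset's variance.
# pca = PCA(n_components=10).fit(x_test)

# The fix: fit on the full training set, then apply the same projection
# to both splits so they share one reduced coordinate system.
pca = PCA(n_components=10).fit(x_train)
x_train_reduced = pca.transform(x_train)
x_test_reduced = pca.transform(x_test)
```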

With all of these efforts to fix the problem, I am now a week behind in my project, as I should have finished this stage last Friday. Luckily, I have all of Spring Break next week, so I will definitely make up for lost time.

Until next time!

-Rishab

Blog Post #8: Running Through PCA

Hey everyone, welcome to my 8th blog post! I can’t believe I’ve written 8 posts already, time has been flying by!

This week was primarily dedicated to processing the traffic sign images through the principal component analysis algorithm. I will spare you the details of how the mathematics of the algorithm works, as it involves terminology such as eigenvalues and eigenvectors, but I will link this paper that I’ve used as a reference for the mathematics this entire week.

To show what the image data looks like when PCA is applied, I’ve taken pictures of scatter plots that represent the variance of the dataset through a program called TensorBoard. Here are some of the images below:

With only one component, capturing about 9.8% of the data’s variance:

[Screenshot: TensorBoard scatter plot, one principal component]

With two components, capturing about 17.9% of the dataset’s variance:

[Screenshot: TensorBoard scatter plot, two principal components]

Finally, with three components, capturing about 29.9% of the dataset’s variance (I tried to make this scatter plot into a .gif to capture the three-dimensional sensibility, but unfortunately I’m not a great .gif creator):

[Screenshot: TensorBoard scatter plot, three principal components]

Sadly, the TensorBoard visualizer only allows graphing in three dimensions, although it is very understandable why the software cannot visualize beyond the third dimension. Ultimately, I settled on using ten principal components, which capture about 81.2% of the dataset’s variance. That passes the 80/20 rule I discussed in last week’s blog post with flying colors: ten principal components capture over 80% of the variance while being far less than 20% of the image’s 1024 dimensions.
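If you want to replicate the component count selection, the check is essentially a cumulative sum over the explained variance ratios. A sketch of how one could compute it with scikit-learn (an assumption; this is not the code I actually used, and the random array is a stand-in for the flattened sign images):

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.random.rand(500, 1024)            # stand-in for the flattened sign images

pca = PCA().fit(x)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components that captures at least 80% of the variance
k = int(np.searchsorted(cumulative, 0.80)) + 1
print(k, cumulative[k - 1])
```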

I applied the principal component analysis algorithm to the preprocessing stages of the logistic regression and convolutional neural network classifiers, where the data is processed to optimize the accuracy of the training algorithm. However, with PCA applied, every training run gave me a consistent classification accuracy of 19% for the convolutional neural network and 16% for the logistic regression. Those are clearly unacceptable classification accuracies, and when I take the PCA function out, I get the original classification accuracies back. Therefore there must be something wrong with the PCA function, but I just can’t seem to figure it out. My advisor and I have been trying to debug the program for the past couple of days, and we haven’t made much progress in fixing the issue. Therefore I am spending some time over the weekend on it, because I have no idea how long it will take to fix, and if I spend all of next week trying to figure it out, I will be behind on the project by a week. I will update this blog post if I reach an acceptable classification accuracy for the convolutional neural network of about 80% and explain what caused the issue and how I fixed it.

Next week, if all goes well, I will reuse the adversarial images I generated a while ago, test the classifiers on them, and see if there is any drop in accuracy. Keep your fingers crossed!

Until next time,

Rishab

Blog Post #7: A Primer on PCA

Hey everyone! Welcome to the first week of the second half of the project, where I will be working to create a defense against adversarial attacks on self-driving cars! The defense that I will be working with for this project is called principal component analysis, or PCA for short.

This week, my external advisor was on a business trip to Croatia, so I did not get much done in terms of implementing PCA in the neural networks I have been working with, as he usually helps me understand and code the algorithms in Python. However, he did assign me some readings for the week that detail how PCA works and how it is to be implemented as a function. Therefore, this week’s blog post is dedicated to my thoughts on the readings on principal component analysis that I perused all week.

Essentially, the main purpose of principal component analysis is to reduce the dimensionality of a dataset consisting of many variables that may or may not be correlated with each other while retaining the essence of the dataset’s variance. In other words, it is a method of summarizing data while retaining the core qualities of the data, thus reducing the complexity of the data.

Since the pure definition of principal component analysis is so abstract, I feel it would be best to give a visual example of the PCA process, which is detailed very helpfully in this video, from which I will take the following images.

[Image: scatter plot of a dataset with two variables, cell 1 and cell 2]

The image above shows a dataset that has two variables, cell 1 and cell 2. The first principal component is the line along which the dataset has the highest amount of variance; visually, it must be a line that splits the dataset diagonally, as that direction provides the most variance.

[Image: the same scatter plot with the first principal component drawn through the data]

Now, it is possible to reorient the data points along the black line, or the first principal component, by projecting all of the points onto it, thus turning the two-dimensional graph into a one-dimensional graph, but that alone would lead to quite a large loss of data. Thus, we should also consider the direction with the second-largest amount of variance, which must be orthogonal, or perpendicular, to the first line. We therefore find the line with the second-most variance as the following:

[Image: the scatter plot with the second principal component, PC2, drawn orthogonal to PC1]

Essentially, with the second axis, called PC2, the data is reoriented along the PC1 and PC2 axes, so overall the number of dimensions has not changed; we have simply rotated the two-dimensional data into another two-dimensional form, one that better explains the variation among the data points. The payoff is that because PC1 now carries most of the variance, dropping the lower-variance axes later costs only a minor loss of data. Hopefully this leads to a more accurate and more robust neural network!

In terms of how to implement PCA in the self-driving car classifier, it ultimately boils down to the size of the images. Since each image is 32 by 32 pixels, it is a 32 × 32 matrix. If we flatten each matrix, it becomes a 1024-dimensional vector, with each component of the vector describing the color intensity of a pixel. So the dataset as a whole is a collection of 1024-dimensional vectors.
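Flattening is the only real reshaping needed before PCA can be applied; a quick sketch of that step (with a random array standing in for the real images):

```python
import numpy as np

images = np.random.rand(100, 32, 32)       # stand-in batch of 32x32 sign images

# Flatten each 32x32 matrix into a single 1024-dimensional row vector
flattened = images.reshape(len(images), -1)
print(flattened.shape)                     # -> (100, 1024)
```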

I am still not entirely sure how many principal components I will need for the dataset, but I am going to abide by the Pareto, or 80/20, rule: the dataset, when PCA is applied, should retain 80% of its variance and data with only 20% of the dimensions, or principal components. Next week, since my mentor will be back from Croatia, I will take a stab at coding the PCA algorithm into the traffic sign classifiers and report my experiences in doing so!

Stay tuned!

-Rishab

Blog Post #6: Testing 1 2 3

Hey everyone! This week was light in terms of studying and coding, but very significant in terms of determining the direction of the project. This was the week I tested the two traffic sign image classifiers with the adversarial images I generated last week. Generating 40,000 adversarial images was actually not that hard, albeit time-consuming, taking about 90 minutes. I then pickled the data, or in other words, packaged it so that I can use it in another script, namely the two image classifiers I created in Week #2. Below, I discuss the changes in classification rate for both the logistic regression model and the neural network model.
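For those unfamiliar with pickling, the whole trick is two calls from Python’s standard library. A sketch of the idea, with small random arrays standing in for the real generated batch:

```python
import pickle
import numpy as np

# Stand-ins for the generated batch (the real arrays come from the generator)
adv_images = np.random.rand(100, 32, 32, 3)
adv_labels = np.random.randint(0, 43, size=100)

# Package ("pickle") the batch so the classifier scripts can reload it
with open("adversarial_images.p", "wb") as f:
    pickle.dump({"images": adv_images, "labels": adv_labels}, f)

# Later, inside a classifier script:
with open("adversarial_images.p", "rb") as f:
    batch = pickle.load(f)
```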

In Week #2, the logistic regression model had a classification accuracy of 81.3%, which is definitely not a great accuracy, but a serviceable one. I tested the model again with the adversarial images, and its classification accuracy fell to 65.6%. For the convolutional neural network, the original classification accuracy was about 94.7%, much better than the logistic regression model’s. I also tested this model with the adversarial images, and its new classification accuracy was about 82.6%. While I was certainly expecting drops in classification accuracy, I was not expecting such significant ones of about 16% and 12%. However, the relative drops were what I expected: the drop in the classification rate of the logistic regression model was greater than that of the convolutional neural network, as it is a more basic, streamlined model.

Nevertheless, I am glad that I confirmed that the adversarial images are effective in their task of causing the neural networks to malfunction. This marks the end of the first half of the project. Even though there have been some obstacles in my way, I’m satisfied with my progress in this project. Now I will be proceeding with the second half – defense with principal component analysis. I imagine this will be a very dense topic, but I can’t wait to dive headfirst into it next week!

Until next time,

Rishab

Blog Post #5: Easier Said Than Done

Hey everyone! This week’s blog post is going to be relatively short, as I covered the mechanics and mathematics of how I was going to generate the adversarial examples of traffic sign images last week. This week was about creating a program that implements that method. Thankfully, I was able to generate some adversarial examples! I have included an image of them below:

[Screenshot: an original traffic sign image next to its adversarial counterpart]

The two images, put side by side like this, definitely show how similar adversarial examples are to their original counterparts. However, it is easy to see clear differences between the two, with the adversarial image looking more pixelated and adding more noise outside the red outline. This could be because I had some difficulty creating and implementing the program. For a long while, all of my adversarial images were coming out as greyscale, which was definitely not what I was looking for. This was because the gradient descent model I was using to create the adversarial images would first preprocess the images by converting every image to greyscale to increase the program’s accuracy, and the algorithm would then be trained on these greyscale images, thus outputting greyscale adversarial examples. I was not completely sure how to convert a greyscale image back to RGB values, so I had to remove the preprocessing step and instead work with the original colored images. This may have led to the adversarial image being more pixelated and slightly different from the original. If I have extra time on the project, I will definitely try to optimize the adversarial generation program further so that the original and adversarial examples look like carbon copies of each other; it could be as simple as changing how much influence the original image carries in generating an adversarial example. Otherwise, I am very happy with how the adversarial examples turned out.
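For context, the greyscale preprocessing I ended up removing is usually just a weighted sum over the color channels. A sketch of what that step looks like (the standard luminosity weights are my assumption; I don’t know exactly which weights the original code used):

```python
import numpy as np

def to_greyscale(images):
    """Collapse an (N, 32, 32, 3) RGB batch to (N, 32, 32) greyscale."""
    return images @ np.array([0.299, 0.587, 0.114])    # weight the R, G, B channels

rgb_batch = np.random.rand(10, 32, 32, 3)              # stand-in for the real images
print(to_greyscale(rgb_batch).shape)                   # -> (10, 32, 32)
```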

The next challenge for me is to generate about 30 images for each type of traffic sign, so about 1,200 images in total, and compile them into a single folder of adversarial examples. This was supposed to happen sometime this past week, but because I had some difficulty getting the adversarial image program to behave the way I wanted, I had to leave this task for next week. It most likely will not take long, though, as I am sure Python has functions that will make the process quick and painless.
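It should indeed be quick; here is a sketch of the kind of loop I have in mind, using Pillow (the filenames and the small random stand-in arrays are my own invention, not the project’s actual code):

```python
import os
import numpy as np
from PIL import Image

# Stand-ins for the real adversarial batch (0-1 floats and class IDs)
adv_images = np.random.rand(5, 32, 32, 3)
adv_labels = np.random.randint(0, 43, size=5)

os.makedirs("adversarial_examples", exist_ok=True)

for i, (img, label) in enumerate(zip(adv_images, adv_labels)):
    # Convert the 0-1 float array to 8-bit pixels and save one file per image
    png = Image.fromarray((img * 255).astype(np.uint8))
    png.save(os.path.join("adversarial_examples", f"class{label:02d}_{i:04d}.png"))
```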

Stay tuned for next week, where I will be testing my machine learning algorithms on these adversarial images to see if there is a change in their classification accuracy! I’m very excited!

Until next time,

Rishab

Blog Post #4: Neural Network Trickery

Greetings! It’s been a couple of weeks since the last blog post, but rest assured that I am still plowing through my project! Sadly, I have not yet been able to generate any adversarial examples from the traffic sign dataset. Instead, I’ve been focusing on learning the mathematics of how to generate these adversarial samples, which is most likely the most complex and dense topic I will have to understand during the course of this project. In this blog post I will try to explain the mathematics behind the adversarial attacks. Next week I hope to implement this in my code and show you what an altered traffic sign image looks like, to provide further context on what an adversarial attack looks like.

Essentially, there are two types of adversarial attacks – targeted and non-targeted. A targeted attack adds a small amount of meticulously crafted noise to an image, causing a neural network to misclassify it even though the image looks extremely similar, or even identical, to a human. Non-targeted examples are more rudimentary, in the sense that a non-targeted example is any input that causes the neural network to malfunction; it does not have to look like a composed image, and can even be static or white noise. In the case of self-driving car neural networks, my adversarial attack will clearly be a targeted one, as it is highly unlikely that white noise will appear anywhere on the streets.

There are many methods and functions used to generate adversarial samples. The most well-known is the fast gradient sign method, which is what I will be using to create these examples. The function for the fast gradient sign method is the following:

x_adv = x + ε · sign(∇x J(θ, x, y_true))
(Equation image taken from Onfido Tech)

This one line took me about a week to comprehend, but now that I understand it, I am ready to implement it in the two machine learning models I composed earlier. What this function boils down to is a concept called regularization, which is a way to use information already given, called a prior, to influence the results of a neural network model. Essentially, the x and the y_true in the function act as priors, meaning I want the adversarial image to stay close to those terms. The gradient in the function finds the best direction in which to modify the image so that it tricks the neural network, primarily through repeated iterations of the function. This perturbation is then added to the original image, which creates the adversarial image.
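Here is a short sketch of the method in code. I’m writing it in modern, eager-style Tensorflow for readability (an assumption on my part; my own implementation uses the older low-level API), with `model` as a hypothetical classifier that maps a batch of images to class probabilities:

```python
import tensorflow as tf

def fgsm(model, x, y_true, eps=0.05):
    """One step of the fast gradient sign method."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)        # direction in pixel space that increases the loss
    return x + eps * tf.sign(grad)       # nudge every pixel by +/- eps in that direction
```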

Now that I’ve explained the method through which I will generate these adversarial images, I am going to attempt to implement it in code next week, and hopefully I will have something to show you in terms of what an adversarial image looks like! Until next time!

Blog Post #3: Hit the Ground Running

Hi everyone! This was the first official week of my senior project, and I am very excited to recap the week’s events with you.

Essentially, this week was primarily dedicated to laying the foundation for the work I will be doing generating adversarial images. Thus, my goal for this week was to train and test two traffic sign classifiers of different types on a German Traffic Signs (GTSRB) dataset – a logistic, or softmax, regression model, a basic algorithm used in machine learning, and a convolutional neural network, a much more complex type with many layers.

However, creating these neural networks from scratch is not an objective of mine in this project, as I do not have enough time to do so and would rather spend that time generating adversarial images. Thus, I looked into existing traffic sign classifiers online that fit the types I was looking for.

Before I get into the machine learning aspect of this week, I want to speak a little bit about the dataset I will be using for the project. The dataset has about 52,000 images of 32 × 32 pixel size, organized into 43 classes. To give a better illustration, I created two graphics in Python that help explain how the dataset is structured. The first image details the label for each of the 43 classes (apologies for the overlapping text – some of the labels were quite long and I wasn’t able to reformat them), while the second image shows the distribution of images across the total dataset.

[Screenshot: labels for each of the 43 traffic sign classes]

[Screenshot: distribution of images across the 43 classes]
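In case you want to recreate the distribution graphic, it is essentially a bar chart of label counts. A rough sketch using matplotlib (an assumption about tooling, with a random array standing in for the real class labels):

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in: the real `labels` array holds the class ID of each of ~52,000 images
labels = np.random.randint(0, 43, size=52000)

counts = np.bincount(labels, minlength=43)   # number of images per class

plt.bar(np.arange(43), counts)
plt.xlabel("Traffic sign class")
plt.ylabel("Number of images")
plt.title("GTSRB class distribution")
plt.show()
```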

While I was able to find a classifier with convolution layers through this Github repository, I wasn’t as lucky finding a logistic regression model. However, with my external advisor’s help I was able to create a logistic regression classifier by adapting and making some changes to the code from this Tensorflow tutorial involving a dataset of handwritten digits. After training the convolutional neural network on the traffic sign dataset, it had a 94.7% classification accuracy, the same as the accuracy from the Github project. The changes I made to the Tensorflow tutorial code may have caused a drop in classification accuracy, from the tutorial’s 91.7% to my algorithm’s 81.3%, but that is understandable, as the traffic sign dataset has more classes and larger images than the handwritten digit dataset, whose images are 28 × 28 pixels across only 10 classes. Still, I would like to work on the logistic regression model over the weekend and ski break to try to increase its accuracy to about 90%.
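For the curious, the heart of the adaptation was just resizing the tutorial’s softmax layer. A sketch in the Tensorflow 1.x style the tutorial used (the real code has the training loop and data feeding around it):

```python
import tensorflow as tf  # TF 1.x-style graph code, matching the tutorial I adapted

x = tf.placeholder(tf.float32, [None, 1024])   # flattened 32x32 signs, not 28x28 digits
W = tf.Variable(tf.zeros([1024, 43]))          # 43 sign classes instead of 10 digits
b = tf.Variable(tf.zeros([43]))
y = tf.nn.softmax(tf.matmul(x, W) + b)         # class probabilities for each image
```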

Next week I will be diving into the field of creating adversarial images, which will definitely be expansive in both research and programming. Stay tuned!