Evolutionary Approach to Placing Circles Homogenously in a Given Area

There’re many problems in this world that you have no idea how to solve in a perfect way, but you can at least give it an okay solution. Like, say, how to cook, or how to run. As long as it’s a computational problem, and possible to quantitatively define a score to the ideal state, that’s probably where an evolutionary algorithm comes in.

The key idea of an evolutionary algorithm, often the case with a genetic algorithm as well, is to evaluate the loss of parameter(s) and iterate doing it. And between each iteration, you want to tweak your parameters a little bit. Sometimes you can add random factors, too. Repeating iterations will hopefully lead you to a heuristic solution.

You’re strongly recommend that you plot the loss of your computation so that you can find where to stop it, otherwise your computer will be working for days or weeks.

Here’s how I code to a test problem that is placing circles homogenously in a given area. The dependencies of libraries are not so many because it’s simple enough to write the code from scratch.

And below is the plot of the progress of finding the solution. Where to stop is subject to the project’s resources such as your time or your computational capability.

from PIL import Image, ImageOps, ImageDraw, ImageGrab
import numpy as np
import random, copy, math, time
import matplotlib.pyplot as plt

# consts
fn_image = 'test.png'
c_scatter = 0.005
gen = 1200
radius = 14
c_overlap = 0.675
fn_mask = 'mask64.png'
n_ga_unit = 40
step = 4
n_move = 5

# bounding box around the flood-filled area
def bbox(im):
    im = np.array(im)
    sum_row = im.sum(axis=0)
    sum_col = im.sum(axis=1)
    idx_row = np.where(sum_row>0)
    idx_col = np.where(sum_col>0)
    left = idx_row[0].min()
    right = idx_row[0].max()
    top = idx_col[0].min()
    bottom = idx_col[0].max()
    return (left, top, right, bottom)

def crossover(a, b, idx):
    return (a[:idx]+b[idx:], b[:idx]+a[idx:])

im = Image.open(fn_image)
im = im.convert('L')
objective_size = np.asarray(im).sum()
n_iter = int(c_scatter * objective_size)
bb = bbox(im)

mask = Image.open(fn_mask).convert('L') # greyscale
score_perfect = np.asarray(mask).sum()
score_threshold = int(score_perfect * c_overlap)
mask = ImageOps.invert(mask)
black = Image.fromarray(np.reshape(np.zeros(mask.width * mask.height), [mask.width, -1]))

def score(render, x, y, r):
    im_crop = render.crop((x-r, y-r, x+r, y+r))
    output = ImageOps.fit(im_crop, mask.size, centering=(0.5, 0.5))
    output.paste(black, (0, 0), mask)
    ar = np.asarray(output)
    return (output, ar.sum())

class GAUnit:
    def __init__(self, size):
        self.mod = [None] * size
        for i in range(n_move):
            idx = random.randrange(size)
            self.mod[idx] = random.random()*2*math.pi
    def evolve_rand(self):
        size = len(self.mod)
        self.mod = [None] * size
        for i in range(n_move):
            idx = random.randrange(size)
            self.mod[idx] = random.random()*2*math.pi
    def evolve(self):
        if random.random() < 0.5:
            for r in self.mod:
                if r is not None:
                    r += random.random() - 0.5 #-0.5 <-> 0.5
    def calc_score(self, render, x, y):
        draw = ImageDraw.Draw(render)
        for i in range(len(x_plot)):
            draw.ellipse((x[i]-radius, y[i]-radius, x[i]+radius, y[i]+radius), fill='black')
        self.score = np.asarray(render).sum()
        return render

def iteration(parent, second, third):
    n_crossover = 5
    crossovers = [copy.copy(parent)] * n_crossover
    crossovers[1].mod, crossovers[2].mod = crossover(parent.mod, second.mod, random.randrange(len(parent.mod)))
    crossovers[3].mod, crossovers[4].mod = crossover(parent.mod, third.mod, random.randrange(len(parent.mod)))

    units = []
    for i in range(n_ga_unit-n_crossover):
        u = copy.copy(parent)
    units = crossovers + units

    for u in units:
        xx = copy.copy(x_plot)
        yy = copy.copy(y_plot)
        for j,deg in enumerate(u.mod):
            if deg is not None:
                xx[j] += step * math.cos(deg)
                yy[j] += step * math.sin(deg)
        u.calc_score(im.copy(), xx, yy)
    units.sort(key=lambda x: x.score)
    best = units[0]
    second = units[1]
    third = units[2]
    if best.score < parent.score:
        for j, deg in enumerate(best.mod):
            if deg is not None:
                x_plot[j] += step * math.cos(deg)
                y_plot[j] += step * math.sin(deg)
    return (best, second, third)

# initial plot positions by random
x_plot = []
y_plot = []
render = im.copy()
draw = ImageDraw.Draw(render)
for i in range(n_iter):
    w_bbox = bb[2] - bb[0]
    h_bbox = bb[3] - bb[1]
    x_try = bb[0] + random.randrange(w_bbox)
    y_try = bb[1] + random.randrange(h_bbox)
    (output, s) = score(render, x_try, y_try, radius)
    if s > score_threshold:
        draw.ellipse((x_try-radius, y_try-radius, x_try+radius, y_try+radius), fill='black')

# initial parent
unit_size = len(x_plot)
parent = GAUnit(unit_size)
render = parent.calc_score(im.copy(), x_plot, y_plot)
base_score = int(parent.score/10000)
last_score = base_score

scores = None
last_score = base_score
def update_scores(s):
    global last_score, scores
    score_delta = last_score - s
    if not scores:
        scores = [0] * 19
    last_score = s

x = []
start = time.time()
best = parent
second = parent
third = parent
for i in range(gen): 
    print('\r{}/{}'.format(i+1, gen), end='')
    best, second, third = iteration(best, second, third)
    if sum(scores) == 0:
        step -= 1
        if step < 1:
            step = 1
        n_move -= 1
        if n_move < 1:
            n_move = 1

elapsed_time = time.time() - start
print ("elapsed_time:{0}".format(elapsed_time) + "[sec]")

# result renderring
render = im.copy()
draw = ImageDraw.Draw(render)
for j in range(len(x_plot)):
    draw.ellipse((x_plot[j]-radius, y_plot[j]-radius, x_plot[j]+radius, y_plot[j]+radius), fill='black')

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(np.arange(len(x)), np.array(x))

And below is the plot of the progress of finding the solution. Where to stop is subject to the project’s resources such as your time or your computational capability.

Calibrate Camera Position using Simplified Genetic Algorithm

Speaking of robotics, in order for a robot to move precisely in the real world, it is essential yet very difficult to set the position and the angle of sensing devices, such as cameras. For this problem, calibration is commonly used, setting the camera’s position and the angle as parameters in a program. But now we have another problem. The difficulty in setting up the camera precisely also means that knowing the camera’s position is also difficult. And the parameter you should know is not only the position but also the angle and the optical features such as FOV(Field Of View) which sometimes isn’t provided by the manufacturer of the sensing device.

Calibration using genetic algorithm

One way to tackle this problem is to use the captured image to assume parameters. And calibration using the genetic algorithm can be one of the easiest solutions.

Let’s say we have a sheet of paper in which a grid pattern is printed, and the camera is seeing it.

We’re reading y-coordinates which the bottom, middle, and top pixel in the image refer to respectively. These values will be used as ground truth for the genetic algorithm.

Just for a sidenote, a simplified diagram of this setup is shown as bellow, which is mathematically easy to describe using tangent equation. The constant we want to know is theta_start, theta_fov, and z.

The greatest advantage of the genetic algorithm is that all you have to do is just to put down the mathematical equation and to give the program the ground truth and then run the program to get the optimal values. Moreover, if the loss of each iteration can be calculated
somehow, the genetic algorithm is likely to work for that problem. So it’s flexible way to search the solutions. Each iteration will evaluate and quantify the fitness of the constants. This is done with the loss function, and least-squares are proper in many cases.

Because the mathematical model, in this case, is continuous, the optimization goes pretty straightforward. If the model complex, you may have to deal with local minima issues.
Otherwise technique required for this optimazation is tweaking the parameters, not using crossover operations.

The code is shown below.

import math, random

def calc_loss(theta_start_in, theta_camera_in, z_in, y_start_in, fac):
    global params

    rand0 = fac * (random.random() - 0.5)
    rand1 = fac * (random.random() - 0.5)
    rand2 = fac * (random.random() - 0.5)
    rand3 = fac * (random.random() - 0.5)
    theta_start = theta_start_in + rand0
    theta_camera = theta_camera_in + rand1
    z = z_in + rand2
    y_start = y_start_in + rand3

    y1 = y_start + z * math.tan(math.radians(theta_start + theta_camera))
    y2 = y_start + z * math.tan(math.radians(theta_start + 0.5 * theta_camera))
    y3 = y_start + z * math.tan(math.radians(theta_start))
    y1_true = params['y1_true']
    y2_true = params['y2_true']
    y3_true = params['y3_true']
    loss = math.pow(y1 - y1_true, 2) + math.pow(y2 - y2_true, 2) + math.pow(y3 - y3_true, 2)
    return loss, theta_start, theta_camera, z, y_start

class Param():
    def __init__(self, loss, theta_start, theta_camera, z, y_start):
        self.loss = loss
        self.theta_start = theta_start
        self.theta_camera = theta_camera
        self.z = z
        self.y_start = y_start

    def __lt__(self, other):
        # self < other
        return self.loss < other.loss

# starter params
theta_start = 1.0
theta_camera = 1.0
z = 1.0
y_start = 1.0

# start learning
n_gen = 500
n_epoch = 2000
for i in range(n_epoch):

    children = []
    for j in range(n_gen):
        factor = 0.01 if j > 0 else 0

        loss_ret, theta_start_ret, theta_camera_ret, z_ret, y_start_ret = calc_loss(theta_start, theta_camera, z, y_start, factor)
        child = Param(loss_ret, theta_start_ret, theta_camera_ret, z_ret, y_start_ret)

    # 0th is the best
    children = sorted(children)
    theta_start, theta_camera, z, y_start = children[0].theta_start, children[0].theta_camera, children[0].z, children[0].y_start
    if i == 0:
        print ''
        print 'worst child: loss: {}'.format(children[len(children)-1].loss)
    if i % 500 == 0:
        print 'epoch: {}, loss: {}'.format(i, children[0].loss)

# results
best = children[0]
params['theta_start'] = best.theta_start
params['theta_camera'] = best.theta_camera
params['z'] = best.z
params['y_start'] = best.y_start
data = json.dumps(params, indent=4)
fn = 'camera_params.json'
with open(fn, 'w') as f:

print '\nlearned params:'
for k,v in params.items():
    print '{}: {}'.format(k, v)

In each iteration, the best individual, which of cource marked the least loss, is to be picked and used as a parent of the next iteration. The loss value after 2,000 epochs with 500 individuals eventually turned out to be nearly zero. And we got the optimal solutions for this problem.

Maker Faire Taipei 2019

The project PlasticAI was exhibited at Maker Faire Taipei 2019.

The faire itself was full of people and tech-enthusiasts. And exhibitors were so creative that I’ve given additional inspiration from them. Especially it was so encouraging that there were a few people who were very interested in the project PlasticAI in terms of technologies and the marine environment. I greatly appreciate it.

The main takeaway from the show was “the AI does work”. The precision of detecting plastic bottle-caps was so good in spite of the fact that no training on any negative samples was done. But it also turned out that the AI is not enough if I want to pick something up in the real world because the object detection system tells no other information but the bounding box in the input image, which means there is no way to determine the actual distance between the actuator and the target object. Some extra sensors should definitely be added to do this job better.

To demonstrate PlasticAI in the exhibition, I have built a delta robot on which the AI to be put. The robot has 3 parallel link arms and is actuated with 3 servo motors. The main computer for the detection that I used is NVIDIA Jetson nano, which can perform full YOLO with approx 3.5 FPS. I will go into the details about the robot itself in the later article.

The strength of 3d-printed parts is acceptable for this demonstration. But I should try metal parts, too.

The Faire gives me a push. PlasticAI continues…

Plastic Wastes Detection using YOLO

I’m now working on the project “PlasticAI” which is aiming for detecting plastic wastes on the beach. As an experiment for that, I have trained object detection system with the custom dataset that I collected in Expedition 1 in June. The result of this experiment greatly demonstrates the power of the object detection system. Seeing is believing, I’ll show the outcomes first.

the resulted images with predicted bounding boxes

The trained model precisely predicted the bounding box of a plastic bottle cap.

Training Dataset

Up until I conducted a model training, I haven’t been sure about whether the amount of training dataset is sufficient. Because plastic wastes are very diverse in shapes and colors. But, as a result, as far as the shape of objects are similar, this amount of dataset has been proved to be enough.

In the last expedition to Makuhari Beach in June, I’ve shot a lot of images. I had no difficulty finding bottle caps on the beach. That’s sad, but I winded up with 484 picture files of bottle caps which can be a generous amount of training data for 1 class.

The demanding part of preparing training data is annotating bounding boxes on each image files. I used customised BBox-Label-Tool [1].

a sample of bounding-box annotation

Just for convenience, I open-sourced the dataset on GitHub[2] so that other engineers can use it freely.



Training was done on AWS’s P2 instance, taking about 20 minutes. Over the course of the training process, the validation loss drops rapidly.

One thing that I want to note is that there is a spike in the middle of training, which probably means that the network escaped from local minima and continued learning.

The average validation loss eventually dropped to about 0.04, although this score doesn’t simply tell me that the model is good enough to perform intended detection.


In the test run, I used a couple of images that I have put aside from the training dataset. It means that these images are unknown for the trained neural network. The prediction was done so quickly and I’ve got the resulted images.

the resulted images with predicted bounding boxes

It’s impressive that the predicted bounding-boxes are so precise.


The main takeaway of this experiment is that detecting plastic bottle caps just works. And it encourages further experiments.


[1] BBox-Label-Tool

[2] marine_plastics_dataset

Train Object Detection System with 3 Classes

Deep-learning-based object detection is a state-of-the-art yet powerful algorithm. And over the course of the last couple of years, a lot of progress has been made in this field. It has momentum and huge potential for the future, I think.

Now is the high time for actual implementation to solve problems. The project “Microplastic AI” is aiming for building the AI that can detect plastic debris on the beach. And object detection is going to be a core technology of the project.

As in the last article, training YOLO with 1 class was a good success (Train Object Detection System with 1 Class). But in order to delve into this system even deeper, I extended the training dataset and ended up to have 3 classes. Just for your convenience, I open sourced the training data as follows.

GitHub repository:

The training data has 524 images in total. In addition to that, the dataset has text files in which bounding boxes are annotated. And some config files also come with. At first, I had no idea if this amount of data has enough feature information to detect objects, but the end result was pleasant.

Here’s one of the results.

An interesting takeaway is the comparison between the model trained with 1 class and the one with 3 classes. Prediction accuracy of the model with 3 classes obviously outperforms. I think this is because a 1 yen coin and a 100 yen coin have similar color, and having been both classes trained, the neural network seems to have learned a subtle difference between those classes. This means that if you’re likely to have similar objects

Let’s say you have an image that has an object that you’re going to detect, and visually similar objects may be in the adjacent space. In a situation like that, you should train not only your target object but also similar objects. Because that would allow you a better detection accuracy.

With this experiment done successfully, the microplastic AI project has been one step closer to reality.

Darknet official project site:
Github repository

Expedition #1

In Makuhari beach, 28th June 2019.

It’s imaginable that learning plastic fragments is challenging for the AI in many ways because plastic wastes, in general, are very diverse in shapes or colors, that makes harder to obtain the ability to generalize what plastic waste should look like. So it’ll be a good approach to split up the problem into several stages. For a starter, I’ll be focusing on detecting plastic bottle caps.

Both luckily and sadly, the beach that day was full of plastic bottle caps. And I took pictures of them with the digital camera and my smartphone, which winded up with about 500 images in total, which can be a generous amount of training data for 1 class of object.

It’s still hard for me though to tell if this works until I train the neural network. Let’s give it a shot.

Train Object Detection System with 1 Class

the predicted bounding box by YOLOv2 after training 1000 epochs

YOLO and Darknet

YOLO is a state-of-the-art object detection system, which I believe it has a significant potential for applying AI to many problem-solving. Darknet is an open source neural net framework written in C language, on which YOLO is built.

The official repository of YOLO can be found here[1], which you should read through its README for better understanding of how to use it. And the paper can be found here[2] just in case you want to delve into the concept of YOLO in depth.

This article is aiming for showing you the actual steps and commands for training YOLO. Putting the ground algorithm aside, do run YOLO by your own hands because it’s a lot easier for you to understand how it works. I believe it’ll help you with implementing your own object detection system.

Get your EC2 booted

In this tutorial, everything is going to be done on AWS using AWS’s Deep Learning AMI, which allows you to kickstart. Therefore whether your local machine is Windows or Mac doesn’t matter at all.

just hit a button like this

I strongly recommend that you should train your YOLO on Linux OS(whatever Ubuntu or Amazon Linux you choose) because compiling Darknet on Linux is way easier. I tell you this because I actually tried it both on Linux and Windows.

By the way, DLAMI(s) has been constantly updated, and the latest version will work fine.

Training YOLO definitely needs the GPU computation capability. Well, with CPU(s), it would never be going to get it done before you give it up. I chose an EC2’s P2 instance booted with DLAMI(Amazon Linux version). (P3 instances will work even better.)

Login to your EC2 console from your local machine(it can differ a bit according to your vm’s region).

$ ssh -i “<your ssh key>.pem” root@ec2-<your vm’s ip>.ap-northeast-1.compute.amazonaws.com

Switch the CUDA version to 10.

$ sudo rm /usr/local/cuda
$ sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda

Copy the codebase of YOLO from the AlexeyAB’s repository.

$ cd
$ git clone https://github.com/AlexeyAB/darknet.git
$ cd darknet
$ vim Makefile

Here’s some configuration for using GPU,
on the line 1 and 2, make them like so.


Compile Darknet.

$ make

Download an initial weights file.

$ wget https://pjreddie.com/media/files/darknet19_448.conv.23

Clone training dataset and config files.

$ cd
$ git clone https://github.com/sudamasahiko/dataset100jpy
$ cp -r dataset100jpy/* darknet

Start training.

$ cd darknet
$ ./darknet detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23

The output will be something like below.

layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64 0.044 BFLOPs
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024 3.987 BFLOPs
30 conv 30 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 30 0.010 BFLOPs
31 detection
mask_scale: Using default ‘1.000000’
Loading weights from darknet19_448.conv.23…Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Loaded: 0.000044 seconds
Region Avg IOU: 0.201640, Class: 1.000000, Obj: 0.187609, No Obj: 0.530925, Avg Recall: 0.090909, count: 11
Region Avg IOU: 0.117323, Class: 1.000000, Obj: 0.381525, No Obj: 0.531642, Avg Recall: 0.000000, count: 10
Region Avg IOU: 0.156779, Class: 1.000000, Obj: 0.301009, No Obj: 0.530801, Avg Recall: 0.000000, count: 12
Region Avg IOU: 0.083861, Class: 1.000000, Obj: 0.239799, No Obj: 0.530281, Avg Recall: 0.000000, count: 10
Region Avg IOU: 0.126977, Class: 1.000000, Obj: 0.426366, No Obj: 0.531593, Avg Recall: 0.000000, count: 8
Region Avg IOU: 0.156623, Class: 1.000000, Obj: 0.337786, No Obj: 0.529291, Avg Recall: 0.000000, count: 13
Region Avg IOU: 0.134743, Class: 1.000000, Obj: 0.368207, No Obj: 0.529858, Avg Recall: 0.000000, count: 9
Region Avg IOU: 0.105239, Class: 1.000000, Obj: 0.337773, No Obj: 0.529503, Avg Recall: 0.000000, count: 11
1: 510.735443, 510.735443 avg, 0.000000 rate, 7.901008 seconds, 64 images

Sit tight until several hundreds of iteration are completed, and then hit Ctrl + c to halt training. With that be done, you can test your trained YOLO.

$ ./darknet detector test cfg/obj.data cfg/yolo-obj.cfg backup/yolo-obj_last.weights test_image.jpg

If everything is done as expected, you’ll get predictions.jpg with detected bounding box(es). Voila!



Did you get your network trained as expected? Because this technology has been frequently revised, you might have some mismatch due to the framework’s version or CUDA version. So please do update surrounding information yourself and let me know if there is any of those. And YOLO is designed to detect up to 9000 classes, so you’re greatly encouraged to try out training it for multiple classes. Application of this technology is endless, I believe.


[1] YOLO: Real-Time Object Detection

[2] You Only Look Once: Unified, Real-Time Object Detection

Nvidia GeForce MX130 Test Out

I recently replaced my office laptop with a DELL Inspiron 15 Notebook(2019), which is a standard home use laptop. This PC has an Nvidia’s entry-class laptop GPU. More precisely, the model is MX130. Just for my curiosity, I looked up this GPU and got to know that MX130 supports CUDA, Nvidia’s GPGPU platform, which means that I can use this GPU for AI training.

I have barely expected that this GPU supports CUDA. Then I wanted to find out how suitable this GPU is for training the AI.

So here is a brief comparison on training a neural network(LeNet with MNIST dataset) with other GPU and CPU.

CPU: Intel Core i5-8265U 4050.4 sec
GPU: Nvidia GeForce MX130 1266.6 sec
GPU: Nvidia Tesla K80 629.2 sec

As the chart shows, MX130 performs good enough. Maybe a good choice for training a network with a small size of dataset or natural language processing.

AI-Boosted Microplastics Detector Ep.3

building a neural network using transfer learning

If you were to build a basic image classifier, you don’t want to reinvent the wheel. That is to say, there is almost no need to train your neural network from scratch. Instead, you can “transfer learning”, in which you’re going to use the trained network weights. Otherwise, you would have to prepare tens of thousands of training images. In terms of transfer learning, there are two major approaches out there, which are pre-training and fine-tuning. In this project, there is no significant difference between those from the accuracy’s standpoint. A fine-tuning method marked 89.2% of accuracy while pre-trained with 88.5%.

The network I chose is ResNet. I chose this because it’s memory efficient and very accurate in many cases. But other good models[1] such as GoogLenet or Inception etc are also available with the deep learning library.

Of course, there are always many rooms in order to get the better end-results because tweaking a neural network requires many parameters, therefore, pushing the last 1% of the inference accuracy sometimes needs a lot of effort. At this time, putting those tweaking parameters aside, I want to compare rather more basic strategy, which is, building a network from scratch vs pre-training vs fine-tuning.

commands and programs

The script I used for the training is basically based on the official tutorial[2] code by PyTorch. On top of that, I just modified a few parameters as follows.

learning rate: 0.001
epochs: 10
batch size: 4

# usage:
# python transfer_learning.py [data directory] [logfile]

# Modified from the licensed codes below
# License: BSD
# Author: Sasank Chilamkurthy

from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision
from torchvision import datasets, models, transforms
import time, sys, os, copy
import numpy as np

log = open(sys.argv[2], 'w')

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(256, (0.5, 1.0), (1.0, 1.0)),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    'val': transforms.Compose([
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    'test': transforms.Compose([
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

data_dir = sys.argv[1]

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                  for x in ['train', 'val', 'test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val', 'test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val', 'test']}
class_names = image_datasets['train'].classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        log.write('Epoch {}/{}\n'.format(epoch, num_epochs - 1))
        log.write('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            log.write('{} Loss: {:.4f} Acc: {:.4f}\n'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())


    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    log.write('Training complete in {:.0f}m {:.0f}s\n'.format(time_elapsed // 60, time_elapsed % 60))
    log.write('Best val Acc: {:4f}\n'.format(best_acc))

    # load best model weights
    return model

model_conv = torchvision.models.resnet18(pretrained=True)
#for param in model_conv.parameters():
#    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
model_conv = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10)

One thing that I want to note is that the fully-connected layer at the end of the network is project specific. In other words, you should set the right number of nodes according to your classes of the data.

num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

training with GPU

To enjoy the power of the on-demand GPU instances, cloud platforms such as AWS, GCP, or Azure will be the first choice. I chose AWS because there is machine learning OS image in which a lot of handy tools are pre-installed, which is so easy to use. And the hardware that I used is an AWS P2 GPU instance (1 K80 GPU, 4 vCPU, 61 GiB RAM). With the AWS’s deep learning AMI, smooth and quick start is possible.

It takes a few minutes to train the network.

Epoch 0/9
train Loss: 0.5825 Acc: 0.7469
val Loss: 0.3196 Acc: 0.8824
test Loss: 0.3056 Acc: 0.8883

Epoch 1/9
train Loss: 0.5767 Acc: 0.7627
val Loss: 0.4760 Acc: 0.8626
test Loss: 0.4525 Acc: 0.8755

Epoch 2/9
train Loss: 0.5955 Acc: 0.7631
val Loss: 0.3651 Acc: 0.8597
test Loss: 0.3174 Acc: 0.8715

Epoch 3/9
train Loss: 0.5804 Acc: 0.7671
val Loss: 0.3384 Acc: 0.8824
test Loss: 0.2855 Acc: 0.8874

Epoch 4/9
train Loss: 0.5861 Acc: 0.7681
val Loss: 0.3921 Acc: 0.8824
test Loss: 0.3655 Acc: 0.8834

Epoch 5/9
train Loss: 0.6172 Acc: 0.7598
val Loss: 1.0636 Acc: 0.6927
test Loss: 1.0862 Acc: 0.7026

Epoch 6/9
train Loss: 0.6108 Acc: 0.7581
val Loss: 0.6338 Acc: 0.8014
test Loss: 0.6197 Acc: 0.7955

Epoch 7/9
train Loss: 0.4601 Acc: 0.8027
val Loss: 0.3158 Acc: 0.8854
test Loss: 0.2824 Acc: 0.8903

Epoch 8/9
train Loss: 0.4466 Acc: 0.8002
val Loss: 0.3944 Acc: 0.8439
test Loss: 0.3604 Acc: 0.8508

Epoch 9/9
train Loss: 0.4320 Acc: 0.8059
val Loss: 0.3080 Acc: 0.8923
test Loss: 0.2920 Acc: 0.8903

Training complete in 25m 31s
Best val Acc: 0.892292

the achieved accuracy

The best accuracy has been marked by fine-tuning, topping 89.2%. While the one with pre-training followed with 88.5%. On the other hand, the non-pretrained network marked just 70% at best.

the accuracy and the data size

In my opinion, the appropriate amount of training data really depends on the quantity of the latent features of the training data. Hense, it’s most likely impossible to calculate the required size of your training data with one simple formula. In this case, it turned out that I have prepared too much data. More precisely, in this case, most of the features of my training data are redundant. I assume that this is because I shot each plastic samples from various angles and I have inflated the data size by a factor of 10. Regarding this point, shooting a sample from 10 different angles is way too much. Judging from the chart below, I conclude that, in this case, the optimal size of training data is 2000 and that more data don’t add essential features to the network. That means, when I have 1000 physical samples, shooting from 2 angles is very efficient.


I built a simple AI that can see if a fragment from the beach debris is whether plastic or not. With that neural network trained with my home-made training dataset, the accuracy marked 89.2%.

The pre-trained ResNet on the deep learning library PyTorch and AWS’s deep learning AMI enabled me to skip all the tasks for the setting up the work environment. This advantage allows me to focus on the training itself.

In terms of my training data, 10,000 image data for 1,000 physical samples turned out to be highly redundant. However, shooting a sample from 2 angles is a good way to enrich the latent features of the training data.

There’s a long way to go through. What I really need for this project is object detection, not a simple image classifier. The project continues.



AI-Boosted Microplastics Detector Ep.2

Previously I brought a bag of plastic debris back to home. Taking a closer look at the flotsam, in most of the cases, they’re natural fragments such as twigs or shells, which is no problem being on the beach. Therefore, the key part of this project is going to be finding and picking plastic fragments from the whole mess.

As the first step to tackle this problem, I’m trying to build something basic, a simple image classifier. It seems to be relatively straightforward because the basic algorithm for image recognition has been fully established in the last ten years. A lot of useful deep learning libraries are available out there such as Tensorflow or PyTorch. On top of that, pre-trained neural networks are also available with those libraries. These are pretty handy for rapid prototyping. However, without my own training data, it would never be able to build something specific to this project. This means that I need to make my own training dataset of plastics.

The dataset I need to prepare is a whole bunch of image data, which is categorized into two classes. Firstly I randomly selected 500 plastics and 500 natural fragments from the whole samples.


To shoot these samples, I built a custom device which has 2 web cames and a flashlight and a rotating platform so that I can shoot objects from various angles. By the way, my 3D printer(Flashforge Adventurer3) worked great to put all parts together. Although I’m not sure about how much additional information to be given by multi-angled shooting, it won’t do any harm anyway.

The simplest image classification task is to predict a label to an unknown image, such as famously known as MNIST image classification. Prediction for an image that has only one object in the center position on a clear background will assumably be the easiest. So I made dataset like this.

I ended up having 10,000 jpeg images with 1,000 objects. Although the dataset contains some abnormal data because my webcams occasionally fail. With that being done, let’s move on to the training part. Continues to the Ep.3