Getting started with machine learning!

 

Dervis Mansuroglu

Board games

Three in a row (tic-tac-toe) and Gomoku

How a machine makes moves

For a given state B and all possible moves, find the best move M given by function C.

C = B \rightarrow M

For a given state B, calculate a score R given by the target function V.

V = B \rightarrow R

public Move getMove(Board b) {
    List<Move> moves = generateMoves(b);
    Move bestMove = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Move move : moves) {
        Board next = b.doMove(move);        // apply the move to get the next board state
        double score = calcScore(next);     // score the resulting state
        if (score > bestScore) { bestScore = score; bestMove = move; }
    }
    return bestMove;
}

How a machine learns

\hat V(B) = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n

Identify properties that represent state:

public double calcScore(Board board) {
    // features x1..x4 describe the board state for both players
    double x1 = evaluator.evaluateConnectedPieces(board, Board.player1);
    double x2 = evaluator.evaluateConnectedPieces(board, Board.player2);
    double x3 = evaluator.evaluateThreats(board, N_IN_A_ROW, Board.player1);
    double x4 = evaluator.evaluateThreats(board, N_IN_A_ROW, Board.player2);

    // weighted sum: \hat V(B) = w0 + w1*x1 + ... + w4*x4
    return w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4;
}

Weights decide the next move

\hat V(B) = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n

The weights w_0 ... w_n give the importance of each property x_1 ... x_n.

The target function \hat V gives you a score for the next state of the board (with a new move applied).

public Move getMove(Board b) {
    List<Move> moves = generateMoves(b);
    Move bestMove = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Move move : moves) {
        Board nesteBrett = b.doMove(move);      // nesteBrett = the next board state
        double score = calcScore(nesteBrett);   // \hat V evaluates the next state
        if (score > bestScore) { bestScore = score; bestMove = move; }
    }
    return bestMove;
}

LMS Weight Update

Described by Tom Mitchell in Machine Learning (1997).

An algorithm that "adjusts" the weights after each move.

By playing many games, the error E is reduced and the weights are changed so that the machine learns to play better.

E = Score(nextMove) - Score(previousMove)
w_i \leftarrow w_i + 0.1 \cdot E \cdot x_i

E is the deviation between the training value (taken from the next state) and the estimate for the current state.
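
A minimal sketch of the update step (not code from the talk), assuming the features are collected in an array x with x[0] = 1 so that w[0] plays the role of the bias weight w_0, and using the 0.1 learning rate from the formula above:

public void updateWeights(double[] w, double[] x, double nextScore, double currentScore) {
    // E = Score(nextMove) - Score(previousMove)
    double error = nextScore - currentScore;
    for (int i = 0; i < w.length; i++) {
        // w_i <- w_i + 0.1 * E * x_i: nudge each weight in proportion to its feature value
        w[i] += 0.1 * error * x[i];
    }
}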

Genetic Algorithms (GA)

Weight values reside in an extremely large search space

GAs often find good values very fast

My example is inspired by the Watchmaker Framework and JGAP
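
The sketch below only illustrates the idea and does not use the Watchmaker or JGAP APIs: the genomes are weight vectors for the evaluation function, and playGames is a hypothetical fitness function (for example, the number of games won with those weights).

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class WeightEvolver {
    static final Random rnd = new Random();

    // Evolve weight vectors for the board evaluation function (illustrative only).
    static double[] evolve(int generations, int populationSize, int nWeights) {
        double[][] population = new double[populationSize][nWeights];
        for (double[] w : population) {
            for (int i = 0; i < nWeights; i++) w[i] = rnd.nextGaussian();  // random initial weights
        }
        Comparator<double[]> byFitness = Comparator.comparingDouble(WeightEvolver::playGames);
        for (int g = 0; g < generations; g++) {
            Arrays.sort(population, byFitness.reversed());                 // best genomes first
            // keep the best half, refill the rest with mutated crossovers of two parents
            for (int i = populationSize / 2; i < populationSize; i++) {
                double[] a = population[rnd.nextInt(populationSize / 2)];
                double[] b = population[rnd.nextInt(populationSize / 2)];
                double[] child = new double[nWeights];
                for (int j = 0; j < nWeights; j++) {
                    child[j] = (rnd.nextBoolean() ? a[j] : b[j])           // crossover
                             + 0.1 * rnd.nextGaussian();                   // mutation
                }
                population[i] = child;
            }
        }
        Arrays.sort(population, byFitness.reversed());
        return population[0];                                              // best weights found
    }

    // Hypothetical fitness function: plug in self-play against a reference opponent here.
    static double playGames(double[] weights) {
        return 0.0;
    }
}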

Traveling Salesman

Permutations of 15 towns: 15! = 1 307 674 368 000 (3-4 days to search exhaustively)

With a genetic algorithm, after just 13 seconds:

Generation 1: distance 588, population size 140
Generation 2: distance 519, population size 196
Generation 3: distance 459, population size 275
.......
Generation 11: distance 200, population size 4143
Generation 14: distance 195, population size 11463
Generation 25: distance 168, population size 478947
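
The GA only ever evaluates a tiny fraction of those 15! routes. A minimal sketch of the fitness it minimizes, assuming a hypothetical distance matrix dist:

// Total length of one tour (a permutation of town indices); dist is a
// hypothetical symmetric distance matrix. The GA evolves permutations
// that minimize this value.
static int tourLength(int[] tour, int[][] dist) {
    int total = 0;
    for (int i = 0; i < tour.length; i++) {
        int from = tour[i];
        int to = tour[(i + 1) % tour.length];  // wrap around back to the starting town
        total += dist[from][to];
    }
    return total;
}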

Neural Networks

Create and mutate neural networks with GAs (a small sketch follows the results below)

INFO  Generation 0:
INFO  2549 vs Gomoku Random: 1-89-10
INFO  2549 vs Gomoku Better: 2-90-8
INFO  Generation 99:
INFO  76238 vs Gomoku Random: 184-15-1
INFO  76238 vs Gomoku Better: 158-41-1
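
As with the weight vectors earlier, the genome here is simply the network's weights. The sketch below is not the Neuroph API and all names are illustrative; it shows a tiny fixed-topology evaluator whose two weight arrays a GA could mutate in the same way:

// A small fixed-topology network used as the board evaluator.
// The GA treats hiddenW and outputW as the genome and mutates them
// just like the plain weight vectors above.
static double evaluate(double[] features, double[][] hiddenW, double[] outputW) {
    double[] hidden = new double[hiddenW.length];
    for (int i = 0; i < hiddenW.length; i++) {
        double sum = 0;
        for (int j = 0; j < features.length; j++) {
            sum += hiddenW[i][j] * features[j];   // weighted sum of input features
        }
        hidden[i] = Math.tanh(sum);               // non-linear activation
    }
    double out = 0;
    for (int i = 0; i < hidden.length; i++) {
        out += outputW[i] * hidden[i];            // linear output layer
    }
    return out;                                   // score for the board state
}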

Neuroph Studio

@dervis_m

dervis.no

Thanks!

https://github.com/dervism