Rasin in Tsukuba

The happiness of your life depends upon the quality of your thoughts.

A Survey of Monte Carlo Tree Search Methods -- Part 1

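Introduction In a given domain, Monte Carlo Tree Search (MCTS) takes random samples in the decision space and builds a search tree from the results in order to find optimal decisions. MCTS has many attractions: it is a statistical anytime algorithm, for which more computing power generally leads to better performance. It needs little or no domain knowledge, yet it can be applied to many difficult problems. Overview The basic idea of MCTS is very simple. The search tree grows asymmetrically. In each iteration of the algorithm, a tree policy is used to find the node that currently most needs to be explored. Tree po...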

GVGAI Book Chapter 3 - Planning in GVGAI

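Book website Original chapter Chapter exercises Introduction Planning means devising a plan of action to solve a given problem. Given the current state and the action the player is about to take, a model of the environment can be used to simulate possible future states; this model is called the Forward Model. Monte Carlo Tree Search (MCTS) and Rolling Horizon Evolutionary Algorithms (RHEA) are used to build most Pla...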

VGDL Examples Explained

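A Video Game Description Language for Model-based or Interactive Learning Aliens VGDL VGDL paper Introduction pyVGDL is a video game description language intended to facilitate the generation of large or diverse portfolios of games, which can also be used to evaluate general-purpose architectures and algorithms such as reinforcement learning and evolutionary search. To make this goal more feasible, the author restricts the game-generation domain...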

GVGAI Book Chapter 2 - VGDL and the GVGAI Framework -- Exercises

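Book website Original chapter Chapter exercises Setting up the GVGAI environment The GVGAI framework is available on GitHub. The exercises here use version 2.3, which matches the book. Download and install IntelliJ: IntelliJ download link and JDK download link. Once both are installed, import the project as shown in the figure: Running the project First, let's try running a game with GVGAI. You can play it directly with the keyboard, or use a sample agent to test it automatically. Playing the game as a player...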

Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation

In chemistry

Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundam...

Algorithm for Reinforcement Learning 3

Control

A catalog of learning problems The first criterion that the space of problems is split upon is whether the learner can actively influence the observations. In case it can, then we talk about int...

Algorithm for Reinforcement Learning 2

Value Prediction

Value prediction problems arise in a number of ways: Estimating the probability of some future event The expected time until some event occurs The (action-) value function underlying some p...

Algorithm for Reinforcement Learning 1

Markov Decision Processes

Overview There are two key ideas that allow RL algorithms to achieve this goal: Use samples to compactly represent the dynamics of the control problem it allows one to deal with lea...

David Silver - Reinforcement Learning Note 4

Model-Free Prediction - MC and TD Learning

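Model-Free Prediction Introduction The previous lecture covered how to solve a Markov decision process (MDP) when the model is known. The approach is to evaluate a given policy with dynamic programming and to iterate until the optimal value function is obtained. There are two concrete methods: policy iteration and value iteration. An unknown model means that quantities such as the state transition probabilities \(P_{ss'}^a\) are not known to us. So we cannot directly...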

David Silver - Reinforcement Learning Note 3

Policy and Value Iteration using Dynamic Programming

Lecture: Planning by Dynamic Programming Learning Goals Understand the difference between Policy Evaluation and Policy Improvement and how these processes interact Understand the Policy ...