
Towards Black-box Iterative Machine Teaching (full-text PDF download included)

Published: 2018-07-16 16:12:04  Source: SCI论文网


      This is the sixth paper in the series of ICML papers contributed by researchers from Ant Financial's AI department. Below are excerpts from the paper; the full English version can be downloaded at the bottom of this page for further study.

       Towards Black-box Iterative Machine Teaching
Weiyang Liu, Bo Dai, Xingguo Li, Zhen Liu, James M. Rehg, Le Song

      In this paper, we make an important step towards black-box machine teaching by considering cross-space machine teaching, where the teacher and the learner use different feature representations and the teacher cannot fully observe the learner's model. In such a scenario, we study how the teacher is still able to teach the learner to achieve a faster convergence rate than traditional passive learning. We propose an active teacher model that can actively query the learner (i.e., make the learner take exams) to estimate the learner's status and provably guide the learner to faster convergence. The sample complexities for both teaching and querying are provided. In the experiments, we compare the proposed active teacher with the omniscient teacher and verify the effectiveness of the active teacher model.


       1. Introduction
       Machine teaching (Zhu, 2015; 2013; Zhu et al., 2018) is the problem of constructing a minimal dataset for a target concept such that a student model (i.e., learner) can learn the target concept based on this minimal dataset. Recently, machine teaching has been shown to be very useful in applications ranging from human computer interaction (Suh et al., 2016) and crowdsourcing (Singla et al., 2014; 2013) to cyber security (Alfeld et al., 2016; 2017). Besides various applications, machine teaching also has nice connections with curriculum learning (Bengio et al., 2009; Hinton et al., 2015). In traditional machine learning, a teacher usually constructs a batch set of training samples and provides them to a student in one shot, without further interaction. The student then keeps learning from this batch dataset and tries to learn the target concept. The previous machine teaching paradigm (Zhu, 2013; 2015; Liu et al., 2016) usually focuses on constructing the smallest such dataset, and on characterizing the size of such a dataset, called the teaching dimension of the student model.
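As a toy illustration of this batch, one-shot paradigm (our own sketch, not an example from the paper; the exact teaching dimension of linear learners is characterized in (Liu et al., 2016)): for a noise-free least-squares learner in d dimensions, a teacher who knows the target parameter w* can hand over d linearly independent labeled examples, and the learner recovers w* exactly from that single batch with no further interaction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_star = rng.normal(size=d)   # target concept, known to the teacher

# Teacher constructs a one-shot batch: d linearly independent inputs,
# labeled by the target concept itself (noise-free).
X = np.eye(d)                 # simplest choice of independent inputs
y = X @ w_star

# A least-squares student learns from this batch with no further interaction.
w_student, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_student, w_star))  # True: the batch pins down w* exactly
```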
 Figure 1: Comparison between iterative machine teaching and cross-space machine teaching by active teacher.

        To make machine teaching effective in practical scenarios, (Liu et al., 2017a) propose an iterative teaching framework which takes into account the fact that the learner usually uses an iterative algorithm (e.g., gradient descent) to update its model. Different from the traditional machine teaching framework, where the teacher interacts with the student only once, iterative machine teaching allows the teacher to interact with the student in every iteration. It hence shifts the teaching focus from models to algorithms: the objective of teaching is no longer to construct a minimal dataset in one shot, but to search for samples so that the student learns the target concept in a minimal number of iterations (i.e., with the fastest convergence of the student algorithm). Such a minimal number of iterations is called the iterative teaching dimension of the student algorithm.

        (Liu et al., 2017a) mostly consider the simplest iterative case, where the teacher can fully observe the student. This case is interesting in theory but too restrictive in practice.
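The contrast between passive learning and omniscient iterative teaching can be sketched as follows (a minimal simulation of ours, assuming a least-squares learner and a fixed candidate pool; the omniscient teacher reads the student's current parameter and greedily picks the example whose SGD update lands nearest the target):

```python
import numpy as np

rng = np.random.default_rng(1)
d, pool_size, eta, T = 10, 200, 0.1, 60

w_star = rng.normal(size=d)            # target concept
X = rng.normal(size=(pool_size, d))    # fixed candidate teaching pool
y = X @ w_star                         # noise-free labels from the target

def sgd_step(w, x, y_true):
    # one gradient step on the squared loss 0.5 * (<w, x> - y)^2
    return w - eta * (w @ x - y_true) * x

def run(teacher):
    w = np.zeros(d)
    for _ in range(T):
        if teacher == "omniscient":
            # teacher sees w and picks the example whose update lands nearest w*
            candidates = [sgd_step(w, X[i], y[i]) for i in range(pool_size)]
            w = min(candidates, key=lambda v: np.linalg.norm(v - w_star))
        else:  # passive learning: one uniformly random sample per iteration
            i = rng.integers(pool_size)
            w = sgd_step(w, X[i], y[i])
    return np.linalg.norm(w - w_star)

err_active, err_passive = run("omniscient"), run("passive")
print(err_active < err_passive)  # the guided student ends much closer to w*
```

Under this greedy rule the guided student needs far fewer iterations than uniform sampling to reach a given accuracy, which is exactly what the iterative teaching dimension measures.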

        Human teaching is arguably the most realistic teaching scenario, in which the learner is completely a black box to the teacher. Analogously, the ultimate problem for machine teaching is how to teach a black-box learner. We call this problem black-box machine teaching. Inspired by the fact that the teacher and the student typically represent the same concept but in different ways, we present a step towards black-box machine teaching: cross-space machine teaching, where the teacher i) does not share the same feature representation with the student, and ii) cannot observe the student model. This setting is interesting in the sense that it both relaxes the assumptions of iterative machine teaching and improves our understanding of human learning.

        Inspired by the real-life fact that a teacher regularly examines the student to learn how well the student has mastered the concept, we propose an active teacher model to address the cross-space teaching problem. The active teacher is allowed to actively query the student with a few (limited) samples every certain number of iterations, and the student can only return the corresponding prediction results to the teacher. For example, if the student uses a linear regression model, it will return to the teacher its prediction ⟨w^t, x̃⟩, where w^t is the student parameter at the t-th iteration and x̃ is the representation of the query example in the student's feature space. Under suitable conditions, we show that the active teacher can always achieve a faster rate of improvement than a random teacher that feeds samples randomly. In other words, the student model guided by the active teacher can provably achieve faster convergence than stochastic gradient descent (SGD). Additionally, we discuss extensions of the active teacher to deal with a learner with forgetting behavior, and a learner guided by multiple teachers.
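The querying ("exam") idea can be sketched as follows, under the simplifying assumption, ours and not the paper's, that the unknown feature map is a linear matrix G: probing the student on d inputs and observing only its predictions ⟨w^t, x̃⟩ is enough to reconstruct the student's effective parameter in the teacher's own space, even though w^t itself stays hidden.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
# Hypothetical stand-in for the unknown feature map: x~ = G x, G linear.
G = np.linalg.qr(rng.normal(size=(d, d)))[0]
w = rng.normal(size=d)        # the student's parameter, hidden from the teacher

def student_answer(x):
    # all the teacher ever observes: the prediction <w, x~> on a query x
    return w @ (G @ x)

# "Exam": query the student on d probe inputs (the standard basis) and read
# off beta with <beta, x> = <w, G x> for all x, i.e. beta = G^T w.
probes = np.eye(d)
answers = np.array([student_answer(p) for p in probes])
beta_hat = np.linalg.solve(probes, answers)  # trivial here; any independent probes work

print(np.allclose(beta_hat, G.T @ w))  # True: the exam reveals the student's state
```

With this estimate in hand, the teacher can reason about the student entirely in its own feature space; the paper's active teacher makes this idea rigorous without assuming the mapping is linear.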

         To validate our theoretical findings, we conduct extensive experiments on both synthetic data and real image data. The results show the effectiveness of the active teacher.

       2. Related Work
        Machine teaching defines a task where we need to find an optimal training set given a learner and a target concept. (Zhu, 2015) describes a general teaching framework which has nice connections to curriculum learning (Bengio et al., 2009) and knowledge distillation (Hinton et al., 2015). (Zhu, 2013) considers Bayesian learners in the exponential family and formulates machine teaching as an optimization problem over teaching examples that balances the future loss of the learner and the effort of the teacher. (Liu et al., 2016) give the teaching dimension of linear learners. Machine teaching has been found useful in cyber security (Mei & Zhu, 2015), human computer interaction (Meek et al., 2016), and human education (Khan et al., 2011). (Johns et al., 2015) extend machine teaching to human-in-the-loop settings. (Doliwa et al., 2014; Gao et al., 2015; Zilles et al., 2008; Samei et al., 2014; Chen et al., 2018) study the machine teaching problem from a theoretical perspective.
Previous machine teaching works usually ignore the fact that a student model is typically optimized by an iterative algorithm (e.g., SGD), and in practice we focus more on how fast a student can learn from the teacher. (Liu et al., 2017a) propose the iterative teaching paradigm and an omniscient teaching model where the teacher knows almost everything about the learner and provides training examples based on the learner’s status. Our cross-space teaching serves as a stepping stone towards the black-box iterative teaching.

        3. Cross-Space Iterative Machine Teaching
        The cross-space iterative teaching paradigm differs from standard iterative machine teaching in two major aspects: i) the teacher does not share the feature representation with the student; ii) the teacher cannot observe the student's current model parameter in each iteration. Specifically, we consider the following teaching settings:

        Teacher. The teacher model observes a sample A (e.g., image, text, etc.) and represents it as a feature vector x_A ∈ ℝ^d with a label y ∈ ℝ. The teacher knows the model (e.g., loss function) and the optimization algorithm (including the learning rate) of the learner, and the teacher preserves an optimal parameter v* of this model in its own feature space. We denote the prediction of the teacher as ŷ_{v*} = ⟨v*, x⟩.

        Learner. The learner observes the same sample A and represents it as a vectorized feature x̃_A ∈ ℝ^s with a label ỹ ∈ ℝ. The learner uses a linear model ⟨w, x̃⟩, where w is its model parameter, and updates it with SGD (if guided by a passive teacher). We denote the prediction of the student model at the t-th iteration as ŷ_w^t = ⟨w^t, x̃⟩.

        Representation. Although the teacher and learner do not share the feature representation, we still assume their representations have an intrinsic relationship. For simplicity, we assume there exists an unknown one-to-one mapping G from the teacher's feature space to the student's feature space such that x̃ = G(x). However, the conclusions in this paper are also applicable to injective mappings. Unless specified otherwise, we assume y = ỹ by default.

        Interaction. In each iteration, the teacher provides a training example to the learner, and the learner updates its model using this example. The teacher cannot directly observe the model parameter w of the student. In this paper, the active teacher is allowed to query the learner with a few examples every certain number of iterations. The learner can only return to the teacher its prediction ⟨w^t, x̃⟩ in the regression scenario, or its predicted label sign(⟨w^t, x̃⟩) or confidence score S(⟨w^t, x̃⟩) in the classification scenario, where w^t is the student's model parameter at the t-th iteration and S(·) is some nonlinear function. Note that the teacher and student preserve the same loss function ℓ(·, ·).
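The full protocol above can be simulated end to end under simplifying assumptions of our own (s = d, a linear orthogonal stand-in for G, noise-free regression labels, exams on a fixed probe set); this is a sketch, not the paper's algorithm. The teacher never reads w directly: it periodically examines the student, mirrors the student's updates on its own estimate, and selects teaching examples in its own feature space.

```python
import numpy as np

rng = np.random.default_rng(3)
d, eta, T, k = 6, 0.2, 100, 10     # s = d for simplicity; exam every k iterations

G = np.linalg.qr(rng.normal(size=(d, d)))[0]  # stand-in for the unknown map (orthogonal)
v_star = rng.normal(size=d)        # teacher's optimal parameter in its own space
w_star = G @ v_star                # implied student optimum: <w*, G x> = <v*, x>
w = np.zeros(d)                    # hidden student state

pool = rng.normal(size=(100, d))   # teaching pool, represented in the teacher's space
labels = pool @ v_star             # shared labels (y = y~)

beta = np.zeros(d)                 # teacher's running estimate of G^T w
for t in range(T):
    if t % k == 0:
        # exam: probe the student on the standard basis, observe predictions only
        beta = np.array([w @ (G @ e) for e in np.eye(d)])
    # select the pool example whose update, predicted via beta, lands nearest v*
    scores = [np.linalg.norm(beta - eta * (beta @ x - yx) * x - v_star)
              for x, yx in zip(pool, labels)]
    i = int(np.argmin(scores))
    x, yx = pool[i], labels[i]
    # the student receives the sample in its own representation, takes one SGD step
    x_s = G @ x
    w = w - eta * (w @ x_s - yx) * x_s
    beta = beta - eta * (beta @ x - yx) * x   # teacher mirrors the update

print(np.linalg.norm(w - w_star))  # tiny: the guided student has converged
```

With an orthogonal G the teacher's mirrored update coincides exactly with the student's, so exams only need to correct for drift; the paper's analysis handles the general case where the two do not coincide.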


     Download link for the full-text PDF of "Towards Black-box Iterative Machine Teaching":



