About Data Centric

date: Nov 12, 2022
Last edited time: Mar 27, 2023 08:42 AM
status: Published
slug: About_Data_Centric
tags: DL
summary:
type: Post
Field: Plat

Data-Centric Approach vs Model-Centric Approach in Machine Learning

Model-centric approach

The model-centric approach focuses on experimental work to improve the performance of the ML model. This involves selecting the best model architecture and training process from a wide range of possibilities.
  • In this approach, you keep the data fixed and improve the code or model architecture.
  • Working on code is the central objective of this approach.
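
As a minimal sketch of the model-centric workflow, the snippet below holds a toy dataset fixed and varies only the model configuration. The dataset (`make_data`), the choice of a k-nearest-neighbour classifier, and the candidate values of `k` are all illustrative assumptions, not something taken from this post.

```python
import random

def make_data(n, seed):
    """Toy 2-D binary dataset (hypothetical): label is 1 when x + y > 1."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [1 if x + y > 1.0 else 0 for x, y in points]
    return points, labels

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: (train_x[i][0] - query[0]) ** 2
        + (train_x[i][1] - query[1]) ** 2,
    )[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points that the k-NN model classifies correctly."""
    hits = sum(
        knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y)
    )
    return hits / len(test_x)

# The data stays fixed; only the model configuration (k) changes.
train_x, train_y = make_data(200, seed=0)
val_x, val_y = make_data(100, seed=1)

scores = {k: accuracy(train_x, train_y, val_x, val_y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

In a real project the candidates would more likely be different architectures or training recipes, but the shape of the loop is the same: one dataset, many models, keep the best.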

Data-centric approach

In an age where data is at the core of every decision-making process, a data-centric company can better align its strategy with the interests of its stakeholders by using information generated from its operations. As a result, its decisions can be more accurate, organized, and transparent, which helps the organization run more smoothly.
  • This approach involves systematically altering/improving datasets in order to increase the accuracy of your ML applications.
  • Working on data is the central objective of this approach.
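
For contrast, here is a small sketch of the data-centric workflow: the model is held fixed (a 1-nearest-neighbour classifier) and only the training labels are improved. The toy dataset, the 20% label-noise rate, and the neighbour-vote relabelling step are assumptions chosen for illustration; relabelling by neighbour vote is just one simple stand-in for a data-cleaning step.

```python
import random

def make_data(n, seed, noise=0.0):
    """Toy 2-D dataset (hypothetical): label is 1 when x + y > 1.
    Each label is flipped with probability `noise`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = []
    for x, y in points:
        label = 1 if x + y > 1.0 else 0
        if rng.random() < noise:
            label = 1 - label
        labels.append(label)
    return points, labels

def knn_vote(xs, ys, query, k, skip=None):
    """Majority label among the k nearest points, optionally skipping one index."""
    nearest = sorted(
        (i for i in range(len(xs)) if i != skip),
        key=lambda i: (xs[i][0] - query[0]) ** 2 + (xs[i][1] - query[1]) ** 2,
    )[:k]
    return 1 if 2 * sum(ys[i] for i in nearest) > k else 0

def accuracy(train_x, train_y, test_x, test_y):
    """The fixed model: plain 1-nearest-neighbour classification."""
    hits = sum(
        knn_vote(train_x, train_y, q, 1) == y for q, y in zip(test_x, test_y)
    )
    return hits / len(test_x)

train_x, noisy_y = make_data(300, seed=0, noise=0.2)  # noisy training labels
test_x, test_y = make_data(200, seed=1)               # clean evaluation set

# Data improvement step: relabel every training point by the vote of its
# 7 nearest *other* training points. The model itself is never changed.
cleaned_y = [
    knn_vote(train_x, noisy_y, p, 7, skip=i) for i, p in enumerate(train_x)
]

noisy_acc = accuracy(train_x, noisy_y, test_x, test_y)
cleaned_acc = accuracy(train_x, cleaned_y, test_x, test_y)
n_changed = sum(a != b for a, b in zip(noisy_y, cleaned_y))
print("labels changed:", n_changed)
print("accuracy with noisy labels:", noisy_acc)
print("accuracy with cleaned labels:", cleaned_acc)
```

Comparing the two printed accuracies shows how much of the error came from the labels rather than from the model.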

Active Learning

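Introduction to Active Learning

In active learning, the algorithm selects the most informative points from a set of unlabeled data, such as points with high information gain or points on which the model is uncertain, and only then sends them for human labeling. The benefit is that far less data is needed than in standard supervised learning. Put simply, active learning picks a batch of samples that are easily misclassified, has a human label them, and then trains the machine learning model on them.

A binary classification example makes it easy to understand and visualize why only a few specific samples are selected for labeling. In the accompanying figure, (a) shows a large pool of unlabeled data. If we randomly select some samples to label and train on, the algorithm predicts the decision boundary shown in (b). If we instead use an active learning sampling strategy to select the samples for labeling and training, the algorithm selects only the samples closest to the actual decision boundary and ignores irrelevant ones, as shown in (c).

Machine learning modeling usually consists of sample selection, model training, model prediction, and model updating. Active learning adds two steps to this workflow, candidate extraction and human labeling, which gives:
  • Machine learning model: covers both training and prediction.
  • Candidate extraction for labeling: relies on the query function of active learning.
  • Human labeling: distills expert or business experience.
  • Labeled candidate set: yields more valuable sample data.
  • Model update: the model is updated by incremental learning or retraining, so that the human-labeled data is incorporated into it and its performance improves.

Repeating this cycle lets human labeling progressively tune the model.

Membership Query Synthesis: the active learning algorithm generates a new unlabeled instance within the input space and queries the oracle for its label.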
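
The train, query, label, update cycle can be sketched on a deliberately tiny problem: learning a 1-D threshold from a pool of unlabeled points. Everything here is a hypothetical illustration: `TRUE_THRESHOLD`, the `oracle` function standing in for the human annotator, the midpoint "model", and the budget of 15 labels are assumptions, and the query function is plain uncertainty sampling (pick the pool point nearest the current boundary).

```python
import random

TRUE_THRESHOLD = 0.6  # hypothetical ground truth, known only to the oracle

def oracle(x):
    """Stands in for the human annotator."""
    return 1 if x >= TRUE_THRESHOLD else 0

def fit_threshold(labeled):
    """Model training: place the boundary midway between the classes seen so far."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    lo = max(neg) if neg else 0.0
    hi = min(pos) if pos else 1.0
    return (lo + hi) / 2

def query(pool, threshold):
    """Query function: the pool point nearest the boundary is the most uncertain."""
    return min(pool, key=lambda x: abs(x - threshold))

rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]  # large unlabeled pool
labeled = []                                # grows by one label per round

for _ in range(15):
    threshold = fit_threshold(labeled)      # train / update the model
    x = query(pool, threshold)              # extract the candidate to label
    pool.remove(x)
    labeled.append((x, oracle(x)))          # human labeling step

threshold = fit_threshold(labeled)
print("labels used:", len(labeled))
print("estimated threshold:", threshold)
```

Only 15 of the 1000 pool points are ever labeled, because each query lands near the current boundary and roughly halves the remaining uncertainty. Under the same assumptions, membership query synthesis would differ only in `query`: it would return a newly generated point (for example the midpoint itself) instead of an existing pool member.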

© Lazurite 2021 - 2024