Functional Risk Minimization

May 8, 2025

< Summary (English) >

This paper proposes Functional Risk Minimization (FRM), a general framework that compares functions rather than outputs in losses.
FRM improves performance in supervised, unsupervised, and reinforcement learning experiments. It subsumes ERM for many common loss functions and captures more realistic noise processes.
The underlying Functional Generative Models (FGMs) assign each data point (x_i, y_i) its own function f_θi that fits it exactly, y_i = f_θi(x_i); the FRM objective derived from them is measured in function space rather than output space.
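
To make the output-space versus function-space distinction concrete, here is a minimal sketch for a linear model f_θ(x) = a·x + b. It is not the paper's FRM objective: as a simplifying assumption, the function-space loss is replaced by the squared distance between θ and the nearest parameter vector θ_i that fits the point exactly. The helper names (`features`, `output_loss`, `per_point_parameters`, `parameter_distance_loss`) and the toy data are hypothetical.

```python
import numpy as np

def features(x):
    # Feature map for f_theta(x) = a * x + b, with theta = [a, b].
    return np.stack([x, np.ones_like(x)], axis=-1)

def output_loss(theta, x, y):
    # ERM-style loss: squared error between the prediction and the target.
    residual = y - features(x) @ theta
    return residual ** 2

def per_point_parameters(theta, x, y):
    # For each point, the closest theta_i (in Euclidean norm) to theta
    # that satisfies y_i = f_theta_i(x_i). Closed form for a linear model.
    phi = features(x)
    residual = y - phi @ theta
    delta = (residual / np.sum(phi ** 2, axis=-1))[:, None] * phi
    return theta + delta

def parameter_distance_loss(theta, x, y):
    # Illustrative stand-in for a functional loss (an assumption, not the
    # paper's definition): squared distance between theta and theta_i.
    theta_i = per_point_parameters(theta, x, y)
    return np.sum((theta_i - theta) ** 2, axis=-1)

theta = np.array([2.0, 0.0])   # shared model: f(x) = 2x
x = np.array([0.0, 3.0])
y = np.array([1.0, 7.0])       # both targets are off by exactly 1

print(output_loss(theta, x, y))              # identical output errors
print(parameter_distance_loss(theta, x, y))  # different parameter distances
```

Both points have the same squared output error, yet the point at x = 3 needs a much smaller parameter change to be fit exactly than the point at x = 0. Under this simplification, equal output-level losses can correspond to quite different losses between functions.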

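< Summary (translated from Korean) >

This paper proposes Functional Risk Minimization (FRM).
Instead of comparing outputs in the loss, FRM compares functions, so the training objective is measured in function space.
FRM works more effectively when modeling the difference between observed data and random data, and it captures a variety of realistic noise processes.
The proposed Functional Generative Models (FGMs) assign a function to each data point, and FRM is the framework derived from them, with its objective measured in function space.
Experimental results show that FRM performs well across a variety of experiments.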

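< Glossary of Technical Terms >

* Function space: the set of functions the model can represent. Each function f_θ is indexed by a parameter vector θ, but FRM compares the functions themselves rather than their outputs.
* Loss function: the function that is minimized during training.
* Random data: additional data distinct from the training data.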

< Excerpt (English) >

Functional Risk Minimization

Ferran Alet∗ (MIT), Clement Gehring (MIT), Tomás Lozano-Pérez (MIT), Kenji Kawaguchi (National University of Singapore (NUS)), Joshua B. Tenenbaum (MIT), Leslie Pack Kaelbling (MIT)

∗Work done at MIT (Alet, 2022); author is now at Google DeepMind. Contact: ferran@google.com.

arXiv:2412.21149v1 [cs.LG] 30 Dec 2024

Abstract

The field of Machine Learning has changed significantly since the 1970s. However, its most basic principle, Empirical Risk Minimization (ERM), remains unchanged. We propose Functional Risk Minimization (FRM), a general framework where losses compare functions rather than outputs. This results in better performance in supervised, unsupervised, and RL experiments. In the FRM paradigm, for each data point (x_i, y_i) there is function f_θi that fits it: y_i = f_θi(x_i). This allows FRM to subsume ERM for many common loss functions and to capture more realistic noise processes. We also show that FRM provides an avenue towards understanding generalization in the modern over-parameterized regime, as its objective can be framed as finding the simplest model that fits the training data.

1 Introduction

Although machine learning has changed significantly since the 1970s, its most basic principle, Empirical Risk Minimization (ERM), remains unchanged. ERM (Vapnik & Chervonenkis, 1969) states that we can minimize a loss for unseen data by instead minimizing the same loss on a training set. When models were simple and small, we could often prove that good training performance would guarantee good test performance. However, with the huge capacity of current neural networks, this is no longer true (Zhang et al., 2017).

This paper proposes an alternative framework to ERM designed for modern ML, where models are large and datasets are diverse. There are three motivations for searching for an alternative: 1) ERM-based deep learning can be inefficient by orders of magnitude (Frankle & Carbin, 2018), 2) generalization in deep learning is not well understood (Belkin et al., 2019), and 3) improvements over ERM would apply to the entire field.

Datasets have increased massively in diversity: we used to train on small, standardized datasets like MNIST (LeCun, 1998) and Shakespeare books, now we train on images and text from the entire internet (Radford et al., 2021). For example, we used to train a language model on Wikipedia and then fine-tune it on Shakespeare, using two different functions f_θShakespeare ≈ f_θwiki to model their distributions. In contrast, we now train a single function f_θinternet on general Internet data, which contains both Wikipedia and books. Since Shakespeare and a Wikipedia writer don’t have the same style, this single model cannot simultaneously […]

(a) Old datasets had little variability (LeCun, 1998; Samaria, 1994). Modern datasets are diverse, but with structured variability (Jahanian et al., 2019).

(b) Difference between ERM and FRM when predicting the edges of the image on the left using a simple CNN. For a fixed ERM pixel loss, we can find images with very different FRM losses. Since neural networks often capture natural variability (Ulyanov et al., 2018), images with low functional loss retain most of the structure despite having high errors in…
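
As a notational aid, the contrast drawn in the excerpt can be written out. The first expression is the standard ERM objective, and the second is the per-point fitting condition quoted in the abstract. The symbols n (training-set size), \mathcal{L} (an output-level loss), and \hat{\theta}_{\mathrm{ERM}} are notation assumed here rather than taken from the excerpt, and the paper's actual FRM objective is not reproduced.

```latex
\hat{\theta}_{\mathrm{ERM}}
  = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n}
    \mathcal{L}\bigl(f_{\theta}(x_i),\, y_i\bigr)
\qquad \text{versus} \qquad
y_i = f_{\theta_i}(x_i) \quad \text{for each } i = 1, \dots, n .
```

In the ERM view a single θ is scored by how far its outputs fall from the targets. In the FRM view each point has its own θ_i that reproduces the target exactly, so the loss can instead compare the function f_{θ_i} with the shared function f_θ.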

Source: arXiv