Recursively updating the MLE as new observations stream in


General Question



Say we have iid data $x_1, x_2, \ldots \sim f(x \,|\, \boldsymbol{\theta})$ streaming in. We want to recursively compute the maximum likelihood estimate of $\boldsymbol{\theta}$. That is, having computed
$$\hat{\boldsymbol{\theta}}_{n-1} = \underset{\boldsymbol{\theta} \in \mathbb{R}^p}{\operatorname{argmax}} \prod_{i=1}^{n-1} f(x_i \,|\, \boldsymbol{\theta}),$$
we observe a new $x_n$, and wish to somehow incrementally update our estimate
$$\hat{\boldsymbol{\theta}}_{n-1},\; x_n \;\to\; \hat{\boldsymbol{\theta}}_n$$
without having to start from scratch. Are there generic algorithms for this?



Toy Example



If $x_1, x_2, \ldots \sim N(x \,|\, \mu, 1)$, then
$$\hat{\mu}_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n-1} x_i \quad\text{and}\quad \hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} x_i,$$
so
$$\hat{\mu}_n = \frac{1}{n}\left[(n-1)\hat{\mu}_{n-1} + x_n\right].$$
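This toy update translates directly into a streaming computation. Here is a minimal sketch of it in Python; the class name and interface are illustrative, not part of the original post:

    # Minimal sketch of the toy example: streaming MLE of mu for N(mu, 1) data.
    # (Names are illustrative, not from the original post.)
    class RunningMeanMLE:
        def __init__(self):
            self.n = 0
            self.mu_hat = 0.0

        def update(self, x_n: float) -> float:
            """Apply mu_n = [(n - 1) * mu_{n-1} + x_n] / n and return mu_n."""
            self.n += 1
            self.mu_hat = ((self.n - 1) * self.mu_hat + x_n) / self.n
            return self.mu_hat

    mle = RunningMeanMLE()
    for x in [1.2, 0.7, 1.9, 1.1]:
        print(mle.update(x))  # matches the batch mean of the data seen so far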










maximum-likelihood online

asked by bamts











  • Awesome question! – dlnB






  • Don't forget the inverse of this problem: updating the estimator as old observations are deleted. – Hong Ooi
















2 Answers



















See the concept of sufficiency and, in particular, minimal sufficient statistics. In many cases you need the whole sample to compute the estimate at a given sample size, with no trivial way to update from a sample one size smaller (i.e., there's no convenient general result).



If the distribution is in the exponential family (and in some other cases besides; the uniform is a neat example), there's a nice sufficient statistic that can in many cases be updated in the manner you seek; that is, for a number of commonly used distributions there is a fast update.
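For concreteness, here is a hedged sketch of such a sufficient-statistic update, using a Normal($\mu, \sigma^2$) model as the exponential-family example; the choice of model and the names are mine, for illustration only:

    # Sketch: online exact MLE via sufficient statistics for a Normal(mu, sigma^2)
    # model, one exponential-family example (the model choice is illustrative).
    class NormalSuffStats:
        def __init__(self):
            self.n, self.sum_x, self.sum_x2 = 0, 0.0, 0.0

        def update(self, x: float) -> tuple[float, float]:
            # (n, sum x, sum x^2) is sufficient; the MLE is a function of it alone,
            # so each new observation costs O(1) regardless of sample size.
            self.n += 1
            self.sum_x += x
            self.sum_x2 += x * x
            mu_hat = self.sum_x / self.n
            sigma2_hat = self.sum_x2 / self.n - mu_hat ** 2  # MLE (biased) variance
            return mu_hat, sigma2_hat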



One example for which I'm not aware of any direct way to either calculate or update the estimate is the location of the Cauchy distribution (e.g., with unit scale, to make it a simple one-parameter problem). There may be a faster update that I simply haven't noticed, however; I can't say I've done more than glance at it with the updating case in mind.



On the other hand, with MLEs that are obtained via numerical optimization methods, the previous estimate would in many cases be a great starting point, since typically the previous estimate would be very close to the updated estimate; in that sense at least, rapid updating should often be possible. Even this isn't the general case, though -- with multimodal likelihood functions (again, see the Cauchy for an example), a new observation might lead to the highest mode being some distance from the previous one (even if the locations of each of the biggest few modes didn't shift much, which one is highest could well change).
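To see the mode-switching behaviour concretely, here is a small sketch (mine, not part of the answer) that scans the unit-scale Cauchy location log-likelihood over a grid, using the data Glen_b gives in a comment below; per that comment, appending the observation 10 should flip which of the two peaks is higher:

    # Scan the unit-scale Cauchy location log-likelihood on a grid, using the
    # data from Glen_b's comment below. Additive constants (-n*log(pi)) are
    # dropped since they don't affect the argmax.
    import numpy as np

    def cauchy_loglik(grid, data):
        d = grid[:, None] - np.asarray(data)[None, :]
        return -np.log1p(d ** 2).sum(axis=1)

    grid = np.linspace(-2.0, 6.0, 2001)
    data = [0.1, 0.11, 0.12, 2.91, 2.921, 2.933]
    for extra in ([], [10.0]):
        ll = cauchy_loglik(grid, data + extra)
        print("extra obs:", extra, "-> MLE near", round(float(grid[np.argmax(ll)]), 2))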






answered by Glen_b








  • Thanks! The point about the MLE possibly switching modes midstream is particularly helpful for understanding why this would be hard in general. – bamts











  • You can see this for yourself with the above unit-scale Cauchy model and the data (0.1, 0.11, 0.12, 2.91, 2.921, 2.933). The log-likelihood for the location has peaks near 0.5 and 2.5, and the (slightly) higher peak is the one near 0.5. Now make the next observation 10: the mode of each of the two peaks barely moves, but the second peak is now substantially higher. Gradient descent won't help you when that happens; it's almost like starting again. If your population is a mixture of two similar-size subgroups with different locations, such circumstances could occur, even in large samples. ... ctd – Glen_b











  • ctd... In the right situation, mode switching may occur fairly often. – Glen_b



















In machine learning, this is referred to as online learning.



As @Glen_b pointed out, there are special cases in which the MLE can be updated without needing to access all of the previous data; as he also notes, I don't believe there's a generic solution for finding the MLE.



A fairly generic approach for finding an approximate solution is to use something like stochastic gradient descent: as each observation comes in, we compute the gradient with respect to that individual observation and move the parameter values a small amount in that direction. Under certain conditions, we can show that this converges to a neighborhood of the MLE with high probability; the neighborhood tightens as we reduce the step size, but more data is then required for convergence. In general, though, these stochastic methods require much more fiddling to obtain good performance than, say, closed-form updates.
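A minimal sketch of this idea (mine, not this answer's code), using the question's N($\mu$, 1) toy model, where the per-observation gradient of the log-likelihood in $\mu$ is $x - \mu$:

    # Stochastic-gradient sketch for online approximate MLE in the N(mu, 1)
    # toy model; grad_mu log f(x | mu) = x - mu. With the 1/n step size used
    # here, the recursion happens to reproduce the exact running-mean MLE.
    import numpy as np

    def sgd_mle(stream, step0=1.0):
        mu_hat = 0.0
        for n, x in enumerate(stream, start=1):
            mu_hat += (step0 / n) * (x - mu_hat)  # one gradient step per observation
        return mu_hat

    rng = np.random.default_rng(0)
    print(sgd_mle(rng.normal(loc=2.0, size=1000)))  # close to the true mean, 2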






answered by Cliff AB











