Show HN: An easy-to-use online curve fitting tool
byx2000.github.io

This is a powerful online curve fitting tool that supports fitting dozens of commonly used functions and implicit functions. It features a clean interface and simple operation. If you need to perform curve fitting but don't want to learn professional software like Matlab or Origin, you can try this tool.
It'd be nice if there were some demo data: I might want to play with it to see how it works, but I don't have any data of my own to use it on.
I've now tried it by adding a bunch of random points, and I find it very cool! It can make curves fit very snugly. Maybe enhance it with a mode that runs all the models and shows you which one has the smallest error / best fit.
Not to cast shade, but it looks like you've essentially built a front-end for Desmos. It definitely makes things faster than trying to do it directly in Desmos.
Suggestion: Most of the fits that you've done assume that the errors are normally distributed. It would be worthwhile adding some graphical or numerical checks on that, rather than having the goodness-of-fit statistic or visual inspection be the only indication that this assumption is faulty.
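Something like this, for example (a rough Python sketch; the data are made up, and the straight-line fit just stands in for whatever curve the tool fitted):

    # Illustrative only: check whether the residuals of a fit look normally distributed.
    import numpy as np
    from scipy import stats

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.8])

    # Straight-line fit standing in for whatever model the tool reported.
    a, b = np.polyfit(x, y, 1)
    residuals = y - (a * x + b)

    # Shapiro-Wilk test: a small p-value suggests the normal-error assumption is doubtful.
    stat, p = stats.shapiro(residuals)
    print(f"Shapiro-Wilk p-value: {p:.3f}")

    # The usual graphical companion is a Q-Q plot, e.g.
    # stats.probplot(residuals, dist="norm", plot=ax) with a matplotlib axis.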
It made for a good quick check on some data I had.
The fit diagnostics at the top of the plot are inadequate. This needs at a minimum error estimates on the estimated parameters (probably bootstrap) and ideally some kind of "error envelope" on the plot.
I don’t think you can do anything sensible here without making much stronger modelling assumptions. A vanilla non-parametric bootstrap is only valid under a very specific generative story: IID sampling from a population. Many (most?) curve-fitting problems won't satisfy that.
For example, suppose you measure the decay of a radioactive source at fixed times t = 0,1,2,... and fit y = A e^{-kt}. The only randomness is small measurement error with, say, SD = 0.5. The bootstrap sees the huge spread in the y-values that comes from the deterministic decay curve itself, not from noise. It interprets that structural variation as sampling variability and you end up with absurdly wide bootstrap confidence intervals that have nothing to do with the actual uncertainty in the experiment.
These are all big topics, but any "parametric curve fitting" like this tool uses is parameter estimation (the parameters of the various curves). That already makes strong modeling assumptions (usually including IID sampling, Gaussian noise, etc.) to get the parameter estimates in the first place. I agree it would be even better to have ways to input measurement errors (in both x and y!) per your example, and to have non-bootstrap options (I only said "probably"), residual diagnostics, etc.
Maybe a residuals plot and IID tests of the residuals (i.e. tests of some of those strong assumptions!) would be a better next step for the author than error estimates, but I stand by my original feedback. Right now even the simplest case of a straight-line fit is reported with only "exact" slope & intercept (well, not exact, but to an almost surely meaningless 16 decimals!), though I guess he thought to truncate the goodness-of-fit measures at ~4 digits.
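To sketch what I mean by that next step (Python, made-up data; again a straight-line fit stands in for whatever the tool fitted):

    # Illustrative only: residuals plot plus a crude check on residual independence.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 5.0, 30)
    y = 2.0 * x + 1.0 + rng.normal(0, 0.3, size=x.size)

    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)

    # Residuals vs. x: curvature or funnel shapes indicate a bad model or non-constant noise.
    plt.scatter(x, residuals)
    plt.axhline(0, color="gray")
    plt.xlabel("x")
    plt.ylabel("residual")
    plt.show()

    # Lag-1 autocorrelation of residuals: values far from 0 cast doubt on the IID assumption.
    r1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
    print(f"lag-1 residual autocorrelation: {r1:.2f}")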
I think we are just coming at this from different angles. I do understand and agree that we are estimating the parameters of the fit curves.
> That already makes strong modeling assumptions (usually including IID, Gaussian noise, etc.,) to get the parameter estimates in the first place
You lose me here - I don't agree with "usually". I guess you're thinking of examples where you are sampling from a population and estimating features of that population. There's nothing wrong with that, but that is a much smaller domain than curve fitting in general.
If you give me a set of x and y values, I can fit a parametric curve that tries to minimise the average squared distance between the fitted and observed values of y without making any assumptions whatsoever. This is a purely mechanical, non-stochastic procedure.
For example, if you give me the points {(0,0), (1,1), (2,4), (3,9)} and the curve y = a x^b, then I'm going to fit a=1, b=2, and I certainly don't need to assume anything about the data generating process to do so. However there is no concept of a confidence interval in this example - the estimates are the estimates, the residual error is 0, and that is pretty much all that can be said.
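In code, that purely mechanical fit is just a squared-error minimisation with no probability anywhere (a Python sketch of the example above):

    # Illustrative only: fit y = a*x^b to four points by minimising the sum of squared residuals.
    import numpy as np
    from scipy.optimize import minimize

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([0.0, 1.0, 4.0, 9.0])

    def sse(params):
        a, b = params
        return np.sum((y - a * np.power(x, b)) ** 2)

    res = minimize(sse, x0=[0.5, 1.5], method="Nelder-Mead")
    print(res.x)  # converges to roughly a=1, b=2, with residual error ~0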
If you go further and tell me that each of these pairs (x,y) is randomly sampled, or maybe the x is fixed and the y is sampled, then I can do more. But that is often not the case.
What methods can you use to estimate the standard error in this case?
The radioactive decay example specifically? Fit A and k (e.g. by nonlinear least squares) and then use the Jacobian to obtain the approximate covariance matrix of the estimates. The square roots of the diagonal elements of that matrix give you the standard error estimates.
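For concreteness, roughly like this with scipy (simulated data; the true values and noise level are made up for illustration):

    # Illustrative only: nonlinear least squares fit of y = A*exp(-k*t),
    # with standard errors from the Jacobian-based covariance matrix.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    t = np.arange(0.0, 10.0)                       # fixed measurement times
    y = 100.0 * np.exp(-0.5 * t) + rng.normal(0, 0.5, size=t.size)

    def decay(t, A, k):
        return A * np.exp(-k * t)

    # curve_fit's pcov is built from the Jacobian at the solution, roughly s^2 * (J^T J)^{-1}.
    popt, pcov = curve_fit(decay, t, y, p0=[50.0, 0.1])
    se = np.sqrt(np.diag(pcov))                    # square roots of the diagonal = standard errors
    print(f"A = {popt[0]:.2f} +/- {se[0]:.2f},  k = {popt[1]:.3f} +/- {se[1]:.3f}")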
Pretty sure this could replace one of my junior quants!
Very nice. I will use this at school to quickly produce fits. File import does not seem to work though...
Would be very nice if I could copy-paste data straight from a spreadsheet software.
Copy-paste worked perfectly for me. I just copied 16 data points I had in an open G Sheets tab and used the "Batch Add" button.