Random Numbers and Linear Regression
The Test Case y = rand + x
The goal is to see how well we can fit a line to the points generated by the equation:
y = rand + x
where rand is a randomly generated number between 0 and 1. Plotting this equation shows that the points of y = rand + x are bounded between y = x and y = x + 1 (see screen shot below):
In each of the three test cases presented, let x range over the integers from 0 to 14:
X = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
In the cases presented today, all random amounts are rounded to three decimal places.
Case 1
Y = {0.735, 1.931, 2.007, 3.029, 4.499, 5.578, 6.794, 7.235, 8.847, 9.485, 10.326, 11.873, 12.287, 13.795, 14.101}
Intercept ≈ 0.530
Slope ≈ 0.996
Correlation ≈ 0.997
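As a worked check on Case 1, here are the standard least-squares formulas behind LinReg(ax+b), with a the slope and b the intercept. The S notation for the centered sums is introduced here for convenience and is not something the calculator displays:

```latex
\begin{aligned}
\bar{x} &= 7, \qquad \bar{y} = \tfrac{112.522}{15} \approx 7.5015,\\
S_{xx} &= \textstyle\sum_i (x_i-\bar{x})^2 = 280, \qquad
S_{xy} = \textstyle\sum_i (x_i-\bar{x})(y_i-\bar{y}) = 278.870, \qquad
S_{yy} = \textstyle\sum_i (y_i-\bar{y})^2 \approx 279.190,\\
a &= \frac{S_{xy}}{S_{xx}} = \frac{278.870}{280} \approx 0.996, \qquad
b = \bar{y} - a\bar{x} \approx 7.5015 - 0.99596 \cdot 7 \approx 0.530,\\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} \approx \frac{278.870}{\sqrt{280 \cdot 279.190}} \approx 0.997.
\end{aligned}
```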
Case 2
Y = {0.900, 1.792, 2.462, 3.639, 4.083, 5.869, 6.088, 7.225, 8.458, 9.394, 10.401, 11.329, 12.283, 13.441, 14.478}
Intercept ≈ 0.627
Slope ≈ 0.976
Correlation ≈ 0.999
Case 3
Y = {0.066, 1.687, 2.654, 3.166, 4.879, 5.098, 6.891, 7.235, 8.504, 9.615, 10.217, 11.040, 12.823, 13.725, 14.397}
Intercept ≈ 0.428
Slope ≈ 1.006
Correlation ≈ 0.998
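If you want to double-check the three cases off the calculator, here is a minimal Python sketch of an ordinary least-squares fit. The helper name linreg is hypothetical; it computes the same slope, intercept, and correlation coefficient that LinReg(ax+b) stores in a, b, and r, and it should reproduce the three-decimal values listed above for each case.

```python
import math


def linreg(xs, ys):
    """Least-squares fit of y = a*x + b, returning (a, b, r).

    Hypothetical helper that computes the same quantities
    LinReg(ax+b) stores in the statistics variables a, b, and r.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx                   # slope
    b = my - a * mx                 # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


X = list(range(15))  # the integers 0 through 14

CASES = {
    1: [0.735, 1.931, 2.007, 3.029, 4.499, 5.578, 6.794, 7.235,
        8.847, 9.485, 10.326, 11.873, 12.287, 13.795, 14.101],
    2: [0.900, 1.792, 2.462, 3.639, 4.083, 5.869, 6.088, 7.225,
        8.458, 9.394, 10.401, 11.329, 12.283, 13.441, 14.478],
    3: [0.066, 1.687, 2.654, 3.166, 4.879, 5.098, 6.891, 7.235,
        8.504, 9.615, 10.217, 11.040, 12.823, 13.725, 14.397],
}

results = {}  # case number -> (intercept, slope, correlation)
for case, ys in CASES.items():
    a, b, r = linreg(X, ys)
    results[case] = (b, a, r)
    print(f"Case {case}: intercept {b:.3f}, slope {a:.3f}, correlation {r:.3f}")
```

The centered-sum form (subtracting the means first) is used rather than the raw sum-of-products form because it is less prone to round-off, although for data this small either form works.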
Testing 100 Curve Fits
The program RANDLR runs 100 test cases of 50 data points each. X is the list of integers from -24 to 25, while Y is calculated as X + rand, rounded to three decimal places. The intercept, slope, and correlation shown are the arithmetic averages over the 100 test cases. I used various random seeds; here are the results from five test runs:
Test 1:
Intercept ≈ 0.4977
Slope ≈ 1.0007
Correlation ≈ 0.9998
Test 2:
Intercept ≈ 0.4976
Slope ≈ 1.0007
Correlation ≈ 0.9998
Test 3:
Intercept ≈ 0.5012
Slope ≈ 1.0002
Correlation ≈ 0.9998
Test 4:
Intercept ≈ 0.5046
Slope ≈ 0.9997
Correlation ≈ 0.9998
Test 5:
Intercept ≈ 0.5012
Slope ≈ 1.0002
Correlation ≈ 0.9998
Based on these runs, a good rough estimate for a line fitting y = x + rand is y = x + 0.5, which makes sense since rand averages 0.5.
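The correlations close to 1 can also be anticipated. Here is a rough estimate, assuming rand is uniform on (0, 1), so its variance is 1/12, and ignoring the tiny extra variance from rounding to three decimals. Write S_xx = Σ(x_i − x̄)² and S_yy = Σ(y_i − ȳ)². With a true slope of 1, the expected value of S_yy is S_xx + (n − 1)/12, and for n consecutive integers S_xx = n(n² − 1)/12, which gives approximately:

```latex
r^2 \approx \frac{S_{xx}}{S_{xx} + (n-1)/12}, \qquad S_{xx} = \frac{n(n^2-1)}{12}

\begin{aligned}
x = 0,\dots,14:&\quad n = 15,\ S_{xx} = 280,\ r^2 \approx \frac{280}{280 + 14/12} \approx 0.9959,\ r \approx 0.998\\
x = -24,\dots,25:&\quad n = 50,\ S_{xx} = 10412.5,\ r^2 \approx \frac{10412.5}{10412.5 + 49/12} \approx 0.9996,\ r \approx 0.9998
\end{aligned}
```

These estimates agree with the three single cases (0.997 to 0.999) and with the averaged runs (0.9998). The wider range of x in RANDLR makes the spread of rand small relative to the spread of the line, which pushes r closer to 1.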
TI-84 Plus CE Program: RANDLR
Size: 322 bytes
"2020-07-26 EWS"
ClrHome
Disp "TEST: Y=AX+B","AGAINST Y=X+rand","PLEASE WAIT."
Fix 4
seq(X,X,-24,25,1) → L1
0 → A
0 → B
0 → R
For(I,1,100)
Output(5,10,"      ")
Output(5,1,"PROGRESS:")
Output(5,12,I)
seq(round(X+rand,3),X,-24,25,1) → L2
LinReg(ax+b) L1, L2
A+a → A
B+b → B
R+r → R
End
.01 A → A
.01 B → B
.01 R → R
ClrHome
Output(3,1,"AVERAGE")
Output(4,1,"ITC:")
Output(4,6,B)
Output(5,1,"SLP:")
Output(5,6,A)
Output(6,1,"COR:")
Output(6,6,R)
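For readers without a calculator, here is a rough Python port of RANDLR's loop. It is a sketch under a few assumptions: Python's random.random() (which returns values in [0, 1)) stands in for the calculator's rand, Python's round() stands in for TI's round( (the two may differ on exact ties), and the seed 2020 is arbitrary. The digits will therefore not match the five test runs above, but they should land near an intercept of 0.5, a slope of 1, and a correlation of 0.9998.

```python
import math
import random


def linreg(xs, ys):
    """Least-squares fit of y = a*x + b, returning (a, b, r)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx, sxy / math.sqrt(sxx * syy)


def randlr(trials=100, seed=None):
    """Average intercept, slope, and correlation over `trials` fits of y = x + rand."""
    rng = random.Random(seed)
    xs = list(range(-24, 26))  # like seq(X,X,-24,25,1) -> L1
    total_a = total_b = total_r = 0.0
    for _ in range(trials):
        # like seq(round(X+rand,3),X,-24,25,1) -> L2
        ys = [round(x + rng.random(), 3) for x in xs]
        a, b, r = linreg(xs, ys)
        total_a += a
        total_b += b
        total_r += r
    # RANDLR multiplies the sums by .01; dividing by trials is the same for 100 runs
    return total_b / trials, total_a / trials, total_r / trials


itc, slp, cor = randlr(seed=2020)  # arbitrary seed, for repeatability
print(f"ITC: {itc:.4f}")
print(f"SLP: {slp:.4f}")
print(f"COR: {cor:.4f}")
```

Passing a different seed (or none) plays the same role as changing the random seed on the calculator.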
Note:
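The lines
Output(5,10,"      ")
Output(5,1,"PROGRESS:")
Output(5,12,I)
generate an "in progress" counter so that the user can see how much longer the program will take. The string in the first line is six spaces, which erases the old counter value before the new one is written.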
L1 represents list 1, typed by pressing [ 2nd ] [ 1 ]. Likewise, L2 stands for list 2, typed by pressing [ 2nd ] [ 2 ].
The statistics variables a, b, and r are found in the VARS, Statistics, EQ menu.
The code above should work on any calculator in the TI-83 Plus/TI-84 Plus family, but I have only tested it on the current TI-84 Plus CE.
Until next time,
Eddie
All original content copyright, © 2011-2020. Edward Shore. Unauthorized use and/or unauthorized distribution for commercial purposes without express and written permission from the author is strictly prohibited. This blog entry may be distributed for noncommercial purposes, provided that full credit is given to the author.