Statsmodels OLS
Statsmodels allows you to explore data, estimate statistical models, and perform statistical tests.
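The example below simulates a student dataset, fits an ordinary least squares (OLS) regression with sm.OLS, prints the main fit diagnostics, and predicts the score for a new student.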
Code Example:
import statsmodels.api as sm
import numpy as np
import pandas as pd

np.random.seed(42)

# 1. Generate a realistic dataset: predict a pynfinity score
n = 200
study_hours = np.random.uniform(1, 12, n)
prev_score = np.random.uniform(40, 90, n)
is_premium = np.random.choice([0, 1], n, p=[0.7, 0.3])

# True relationship: score = 4*hours + 0.4*prev + 8*premium + noise
score = (4.0 * study_hours
         + 0.4 * prev_score
         + 8.0 * is_premium
         + np.random.normal(0, 5, n))

df = pd.DataFrame({
    "score": score,
    "study_hours": study_hours,
    "prev_score": prev_score,
    "premium": is_premium,
})

# 2. Fit OLS (add_constant prepends the intercept column "const")
X = sm.add_constant(df[["study_hours", "prev_score", "premium"]])
model = sm.OLS(df["score"], X).fit()
print(model.summary())

# 3. Key diagnostics
print(f"\nR²         : {model.rsquared:.4f}")
print(f"Adj. R²    : {model.rsquared_adj:.4f}")
print(f"F-statistic: {model.fvalue:.2f} (p-value: {model.f_pvalue:.6f})")
print("\nCoefficients:")
print(model.params.round(4))

# 4. Predict for a new student (columns must match X, including "const")
new_student = pd.DataFrame({
    "const": [1.0],
    "study_hours": [8.0],
    "prev_score": [75.0],
    "premium": [1],
})
prediction = model.predict(new_student)[0]
# conf_int() here is the confidence interval for the mean predicted score
conf_int = model.get_prediction(new_student).conf_int()[0]
print(f"\nPredicted score: {prediction:.1f} (95% CI: {conf_int[0]:.1f} - {conf_int[1]:.1f})")
Keep exploring and happy coding!