A multi-objective evolutionary computing framework for optimizing Major League Baseball starting lineups using real MLB data and advanced sabermetric principles.
This project uses evolutionary algorithms to find optimal MLB starting lineups by balancing multiple competing objectives. Rather than simply sorting players by individual statistics, the system considers strategic lineup construction principles like run production cascades, proper leadoff characteristics, player synergies, and position-specific optimization.
- Multi-Objective Optimization: Uses Pareto-optimal solutions to reveal tradeoffs between different lineup objectives
- Real MLB Data Integration: Pulls current roster data from MLB's official Stats API and FanGraphs statistics
- Advanced Sabermetrics: Incorporates modern baseball analytics including wOBA, wRC+, OPS, and contact rates
- Evolutionary Framework: Employs genetic algorithm-inspired agents to evolve lineups over time
- Bench Integration: Considers full roster depth with bench substitution optimization
- Performance Profiling: Built-in performance monitoring to optimize computation efficiency
- Progress Tracking: Real-time evolution progress visualization
- Results Documentation: Comprehensive scoring analysis and improvement tracking
Based on sabermetric research and game theory, the optimal lineup construction follows this strategic framework:
1st Position - High OBP Leadoff
- Primary objective: Get on base to create scoring opportunities (OBP ≥ 0.310)
- Key metrics: On-Base Percentage (OBP)
- Strategy: Sets the table for power hitters behind them
2nd Position - Best Overall Hitter
- Primary objective: Maximum offensive production in high plate appearance slot
- Key metrics: Combined score of wOBA + wRC+ ≥ 140
- Strategy: Your best hitter gets close to the most plate appearances here (only the leadoff spot gets more)
3rd Position - Balanced Contact/Power
- Primary objective: Solid production with balanced OBP and slugging
- Key metrics: OBP ≥ 0.330, SLG ≥ 0.400, difference ≤ 0.070
- Strategy: Versatile hitter who can drive in runs or set up the cleanup spot
4th Position - Pure Power (Cleanup)
- Primary objective: Maximum run production potential
- Key metrics: Top-3 SLG on team or SLG ≥ 0.450
- Strategy: Traditional cleanup hitter for driving in baserunners
5th Position - Contact/Low Strikeouts
- Primary objective: Put ball in play, avoid rally-killing strikeouts
- Key metrics: K% ≤ 20%, Contact% ≥ 78%
- Strategy: Keep the line moving, protect runners
6th-9th Positions - Depth and Defense
- Primary objective: Minimize offensive black holes while maintaining defensive integrity
- Strategy: Utilize bench depth and defensive specialists effectively
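The thresholds listed for the first five slots can be expressed as simple predicates. The following is a minimal sketch, not the project's own code: the `HitterStats` class, its field names, and the `fits_*` helpers are hypothetical, and the "difference ≤ 0.070" rule for the 3rd spot is assumed to mean the absolute gap between SLG and OBP.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HitterStats:
    """Hypothetical per-player stat line; field names are illustrative only."""
    obp: float          # on-base percentage
    slg: float          # slugging percentage
    k_pct: float        # strikeout rate, in percent
    contact_pct: float  # contact rate, in percent


def fits_leadoff(s: HitterStats) -> bool:
    # 1st spot: OBP of at least 0.310
    return s.obp >= 0.310


def fits_third(s: HitterStats) -> bool:
    # 3rd spot: OBP >= 0.330, SLG >= 0.400, and the two within 0.070
    # (assumption: "difference" means the absolute SLG - OBP gap)
    return s.obp >= 0.330 and s.slg >= 0.400 and abs(s.slg - s.obp) <= 0.070


def fits_cleanup(s: HitterStats, team_slg: Sequence[float]) -> bool:
    # 4th spot: SLG >= 0.450, or among the three best SLG values on the team
    top3 = sorted(team_slg, reverse=True)[:3]
    return s.slg >= 0.450 or (bool(top3) and s.slg >= top3[-1])


def fits_fifth(s: HitterStats) -> bool:
    # 5th spot: K% <= 20 and Contact% >= 78
    return s.k_pct <= 20.0 and s.contact_pct >= 78.0
```

Keeping each slot rule in its own small function makes the thresholds easy to tune without touching the evolutionary loop.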
The system evaluates lineups using seven distinct objectives:
- Proper Leadoff (proper_leadoff): Ensures leadoff hitter has sufficient OBP (≥0.310)
- Run Production Cascade (run_production_cascade): Measures OBP×SLG synergies between consecutive hitters
- Best Nine (best_nine): Position-aware comparison ensuring strongest players are in lineup
- Proper Best Hitter (proper_best_hitter_second): Validates 2nd spot has elite combined metrics
- Proper Third Hitter (proper_third_hitter): Ensures balanced contact/power in 3-hole
- Proper Cleanup Hitter (proper_cleanup_hitter): Validates power production in 4th spot
- Proper Fifth Hitter (proper_fifth_hitter): Ensures contact-oriented approach in 5th spot
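To make the run production cascade idea concrete, here is one possible reading of "OBP×SLG synergies between consecutive hitters": multiply each hitter's OBP by the next hitter's SLG and sum over the order. This is a sketch under assumptions, not the project's actual objective. The `cascade_score` name and the `stats` lookup (player_id mapped to an OBP/SLG pair) are hypothetical, and whether the framework maximizes or minimizes its objectives is not stated in the source.

```python
from typing import Dict, List, Tuple

# Hypothetical stat lookup: player_id -> (OBP, SLG)
Stats = Dict[int, Tuple[float, float]]


def cascade_score(lineup: List[dict], stats: Stats) -> float:
    """Sum OBP(hitter i) * SLG(hitter i+1) over consecutive batting slots."""
    total = 0.0
    for lead, follow in zip(lineup, lineup[1:]):
        obp_lead = stats[lead["player_id"]][0]      # chance the first hitter reaches base
        slg_follow = stats[follow["player_id"]][1]  # power of the hitter behind him
        total += obp_lead * slg_follow
    return total
```

Under this reading, the same nine players can score differently depending on their order, which is what lets the objective reward pairing table-setters with power hitters.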
Six specialized agents modify lineups during evolution:
- Swapper (swapper): Randomly exchanges two players in batting order
- Better Bench Agent (better_bench_agent): Substitutes lineup players with superior bench options at same position
- Wasted OBP Agent (wasted_obp_agent): Fixes high-OBP players followed by low-SLG hitters
- Wasted SLG Agent (wasted_slg_agent): Addresses high-SLG players preceded by poor table-setters
- Leadoff Agent (leadoff_agent): Optimizes leadoff position for maximum OBP
- Best Hitter Agent (best_hitter_agent): Ensures elite offensive production in 2nd spot
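As an illustration of the simplest agent, the swap operation might look like the sketch below. The calling convention is an assumption: the agent is taken to receive a list of solution dictionaries (one of them when registered with k=1) and to return a new solution. The `swapper_sketch` name and the optional `rng` parameter are hypothetical and exist only to keep the example testable.

```python
import copy
import random


def swapper_sketch(solutions, rng=random):
    """Return a copy of the first solution with two batting slots exchanged.

    Assumes each solution is a dict whose "lineup" key holds the batting
    order as a list of player dicts.
    """
    new_sol = copy.deepcopy(solutions[0])  # leave the parent solution untouched
    lineup = new_sol["lineup"]
    i, j = rng.sample(range(len(lineup)), 2)  # two distinct slots
    lineup[i], lineup[j] = lineup[j], lineup[i]
    return new_sol
```

Copying before mutating matters here: the parent solution stays in the population, so both lineups can be compared on the objectives.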
Each lineup solution is represented as a comprehensive dictionary:
{
"lineup": [
{
"name": "Aaron Judge",
"position_code": "9",
"jersey_number": "99",
"player_id": 592450,
"batting_side": "R",
"defensive_position": "Right Field"
}
// ... 8 more players
],
"available_roster": [
{
"name": "Giancarlo Stanton",
"position_code": "10",
"jersey_number": "27",
"player_id": 519317,
"batting_side": "R",
"defensive_position": "Bench"
}
// ... remaining bench players
],
"opposing_pitcher": {
"name": "Zack Wheeler",
"throws": "R"
},
"game_context": {
"ballpark": "Citi Field",
"weather": "Clear",
"inning": 1,
"situation": "standard"
},
"team_info": {
"name": "New York Yankees",
"team_id": 147
}
}
- Language: Python 3.8+
- Key Libraries:
  - requests for MLB API integration
  - pandas for statistical data management
  - numpy for numerical computations
  - pybaseball for additional baseball statistics
  - matplotlib for visualization
- Optimization Method: Non-dominated sorting with Pareto frontier analysis
- Data Sources: MLB Stats API, FanGraphs CSV exports
- Caching System: Intelligent roster and statistics caching to minimize API calls
- Profiling: Built-in function-level performance monitoring with the @profile decorator
- Efficient Player Lookup: Uses pybaseball.playerid_lookup() for fast player ID resolution
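The non-dominated sorting step can be illustrated with a small Pareto filter. This is a generic sketch rather than the framework's implementation: it assumes every objective is a penalty to be minimized (the source does not state the direction), and the `dominates` and `non_dominated` names are hypothetical.

```python
from typing import Dict, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and better on at least one.

    Assumption: lower scores are better.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(population: Dict[str, Scores]) -> Dict[str, Scores]:
    """Keep only the solutions that no other solution dominates."""
    return {
        name: scores
        for name, scores in population.items()
        if not any(dominates(other, scores) for other in population.values())
    }
```

For example, with two objectives and the population `{"A": (1, 5), "B": (2, 2), "C": (3, 3), "D": (5, 1)}`, solution C is dropped because B is better on both objectives, while A, B, and D remain as the tradeoff frontier.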
from main import main
from api import MLBStatsAPI
from evo import Evo
# Run complete optimization workflow
main()

# Initialize the evolutionary framework
E = Evo()
# Add objectives (customize as needed); the objective and agent functions
# used below must already be imported or defined in scope
E.add_objective("proper_leadoff", proper_leadoff)
E.add_objective("run_production_cascade", run_production_cascade)
E.add_objective("best_nine", best_nine)
# Register agents
E.add_agent("swapper", swapper, k=1)
E.add_agent("better_bench_agent", better_bench_agent, k=1)
E.add_agent("wasted_obp_agent", wasted_obp_agent, k=1)
# Generate initial solution and evolve
api = MLBStatsAPI(update=True)
sol = api.init_sol('New York Yankees', 'Zack Wheeler', 'R', 'Citi Field', 'Clear')
E.add_solution(sol)
E.evolve(time_limit=60) # 60 second evolution
# Get results
best_solution = E.get_best_solution()
E.summarize()
E.get_scores_chart()

The system generates comprehensive results:
- best_solution.json: Optimized lineup with full player details
- evolution_scores.csv: Complete scoring history for all generated solutions
- score_differences.csv: Detailed improvement analysis comparing initial vs. final lineups
- objective_scores_chart.png: Visualization of objective function performance
- Console Output: Real-time progress bar and evolution statistics
Based on recent optimization runs, the system consistently achieves:
- Convergence Time: 60 seconds for stable solutions
- Solution Quality: Multiple Pareto-optimal solutions revealing strategic tradeoffs
- Improvement Metrics: Measurable enhancement across all objective functions
- Processing Speed: More than 1,000 solution evaluations per evolution cycle
- Platoon Advantages: Left/right-handed matchup optimization against specific pitchers
- Situational Context: Late-inning, high-leverage, and postseason lineup adjustments
- Advanced Metrics: Integration of Statcast data (exit velocity, launch angle, barrel rate)
- Historical Validation: Back-testing optimized lineups against actual game results
- Real-time Integration: Live game data integration for in-game optimization decisions
- Machine Learning Enhancement: Neural network-based objective function learning
- Multi-team Analysis: Comparative optimization across different team compositions
This project demonstrates the application of evolutionary computing to real-world optimization problems in sports analytics. The work bridges multiple disciplines:
- Computer Science: Multi-objective optimization, genetic algorithms, performance profiling
- Statistics: Sabermetric analysis, regression modeling, data validation
- Operations Research: Resource allocation, constraint satisfaction, decision optimization
- Sports Science: Athletic performance modeling, strategic decision-making
The evolutionary approach reveals that optimal lineup construction involves complex tradeoffs that cannot be captured by simple statistical ranking, making this a compelling case study for multi-objective optimization techniques.
This project represents ongoing research into evolutionary computing applications in sports analytics and multi-objective decision-making frameworks.