  • What We Talkin’ ‘Bout? Changes?

    New Memphis Grizzlies point guard, Allen Iverson, thought there would be a huge bidding war surrounding him during this offseason. Instead, things changed, and he was lucky to find himself on an NBA squad at all.

    And, just as Iverson must now be, the NBA Sim Tournament is nothing if not flexible during the research phase (or NBA Sim offseason, if you will). By that, I of course mean I’m shifting gears. Pretty much consider the last blog post null and void.

    The original plan had me going through 82games.com and basketballvalue.com and copying down (or downloading) data. Then I would go through and pick out the individual pieces I needed.

    Unfortunately, I ran into a lot of issues with both sites that caused huge slowdowns. 82games.com didn’t have any downloadable versions of its data, so I found myself going to every single player’s page over 6 seasons and keeping the stats in a spreadsheet. Basketballvalue.com had downloadable datasets, but they were in an unreadable (for my purposes) form, so I had to perform some major surgery on them.

    Well, I’ve had it. There’s no super pressing reason why I need to have all this data actually collected. I’ve created a technique called Cloning where I can locate a player’s closest basketball relative (playing-wise) to swap in missing stats. Using this technique I can just go to the cloned player’s pages on these sites, a process about 1,000,000 percent easier than copying down everyone’s data. Alright, here is the updated pre-Sim itinerary.

    # Task Completed
    1 Revisit pre-1978 Pace regression formulas
    I use some linear regression models to come up with team pace for pre-1978 teams. These are based on data from 1980-2008 (NBA) and 1968-1976 (ABA), however. The larger the sample size, the more comfortable I am with the results, so I will be adding last year’s data (the 2009 NBA season) to the sample and recalculating the regression formulas. I will also be adding the 1974-1979 NBA seasons (in fact, I’m not sure why I hadn’t added them before). It will be interesting to see if anything really changes.
    9/22/09
    2 Create Year-Team-Pace database
    Following that train of thought, I will be keeping track of all the team paces that are determined using these regression models by year. So, in the future, I can see how, say, the 1966 Boston Celtics’ estimated team pace has changed as more and more seasons have been entered into the regression sample pool. I’m hoping there isn’t much change, but I’d like to know the truth.
    9/22/09
    3 Enter data into Year-Team-Pace database
    I will “backup” my estimated team pace results into this new database.
    9/22/09
    4 Calculate new pre-1978 Pace for each team
    Using the regression model, I will then calculate the team pace for every pre-1978 team in the NBA, ABA, and BAA.
    9/22/09
    5 Null out Pace values in REAL_team_misc table
    A housekeeping item: I want to keep only real pace values in the “real” database. Any pace values generated by my regression model will go in the new pace database.
    9/23/09
    6 Re-calculate APP in REAL_player_seasons for players on teams with newly generated pace values
    More housekeeping! With the new team pace values, player APP values will change slightly, too, so I’ll need to go ahead and make this update as well.
    9/25/09
    7 Recreate work_82_player_season DB
    Before applying the new regression model, make sure the 82-game-standardized DB is correct. The 82-game season wasn’t adopted until 1968 in the NBA and was never adopted in the BAA or ABA. This step levels all the stats so that any changes to them going forward will be applied without bias toward length of schedule.
    9/25/09
    8 Recreate work_apped_player-season DB
    Now that I have a foundation to work with, pace-standardize the already schedule-standardized stats (I standardize to the nice round pace number of 100 possessions per game). This means players on slower-tempo teams will not be penalized for fewer stats and vice versa for those on faster-paced teams. This step will use the new regression model generated team pace values.
    9/25/09
    9 Recreate work_flat_stats DB
    I pick out only a few specific stats from the schedule- and pace-standardized database to use in finding player clones (field goal percentage, field goal attempts, three-point percentage, three-point attempts, free throw percentage, free throw attempts, offensive rebounds, defensive rebounds, total rebounds, assists, steals, blocks, turnovers, personal fouls and points). All counting stats are season totals, not per-game averages.
    9/25/09
    10 Recreate work_flat_max DB
    I do a quick run through each flat stat (for example, total assists) and find the highest instance. This is probably (ok, most definitely) going to be John Stockton’s 1991 season. But, who knows, perhaps with the two waves of standardization and added 2009 season of data, maybe some of the other stats’ maxes have changed. But I doubt it. These max values are used for comparison to determine a player’s clone.
    9/25/09
    11 Redo player clones
    I run scripts I have created that determine every single player’s clone for every single season he played. That is (as of 2009) over 22,000 clones that must be created per type. And I have 5 clone types: Most Similar of All Time, Most Similar from Last Season (2009), Most Similar from 1980-Present, Most Similar from 2003-Present, and Most Similar from 2007-Present. Each has its specific uses, but I won’t go into that here.
    9/27/09
    12 Recreate work_cloned_player_seasons DB
    Next, I run another script that populates the missing stats of players with their clone’s stats. In 1949, the BAA did not record minutes played, three-pointers made, three-pointers attempted, offensive rebounds, defensive rebounds, total rebounds, steals, blocks, and turnovers (whew!). If 1949 George Mikan’s clone is selected as 1991 Karl Malone, however, all of those missing Mikan stats are filled in with Malone’s. (NOTE: Stats are all schedule- and pace-standardized)
    9/27/09
    13 Rename work_ DBs to include a number showing what step in the process they are
    This is merely a housekeeping task, but one that needs to happen: I have so many different tables in my database that they all tend to run together. To make matters worse, I’ve forgotten in what order this process should be done, so some sort of number in the table names will help me out immensely.
    9/27/09
    14 Come up with team rosters
    Once I have all missing stats, I can find the top 12 players from each franchise. As usual, I will then make sure each roster is unique (a player can only be on one team, so the team that has his best seasons, statistically, gets him).
    10/7/09
    15 Calculate player Attributes/Tendencies
    I already have all of these formulas figured out. I just need the team rosters set and can then burn through this process. I’ll probably eyeball them as I go to make sure things are turning out right (for example, we don’t want everyone’s speed to be maxed out at 99 … I don’t want Shaq tying Iverson in a footrace).
    16 Calculate Team Sliders
    I have these formulas created as well, but they are dependent on player Attributes and Tendencies, so they must be done second.
    17 Sim!
    Self-explanatory.
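
    For anyone who wants to see the shape of steps 7 through 12, here is a rough Python sketch. It is not the actual scripts behind this project, just an illustration under some loud assumptions: the schedule and pace adjustments are treated as simple linear scaling, the clone is picked by Euclidean distance over stats divided by their maxes (the post never says which comparison is used), every function name is hypothetical, and the player numbers are made up.

```python
import math

SEASON_GAMES = 82      # schedule standard from step 7
TARGET_PACE = 100.0    # possessions per game, from step 8


def standardize(total, schedule_games, team_pace):
    """Scale a season total to 82 games at pace 100 (assumed linear scaling)."""
    return total * (SEASON_GAMES / schedule_games) * (TARGET_PACE / team_pace)


def stat_maxes(seasons):
    """Step 10: highest recorded value of each stat across all seasons."""
    maxes = {}
    for season in seasons:
        for stat, value in season["stats"].items():
            if value is not None:
                maxes[stat] = max(maxes.get(stat, 0.0), value)
    return maxes


def find_clone(player, candidates, maxes):
    """Step 11: closest candidate over the stats the player actually has.

    Dividing by the max keeps big-number stats (points) from drowning out
    small ones (steals). Euclidean distance is an assumption.
    """
    known = [s for s, v in player["stats"].items() if v is not None]

    def distance(other):
        return math.sqrt(sum(
            ((player["stats"][s] - other["stats"][s]) / maxes[s]) ** 2
            for s in known
        ))

    return min(candidates, key=distance)


def fill_from_clone(player, clone):
    """Step 12: copy the clone's value into every stat the player is missing."""
    filled = dict(player["stats"])
    for stat, value in filled.items():
        if value is None:
            filled[stat] = clone["stats"][stat]
    return filled


# Made-up, already-standardized seasons (not real player data).
old_center = {"name": "Old Center",
              "stats": {"pts": 1800.0, "ast": 200.0, "stl": None}}
pool = [
    {"name": "Modern Big",
     "stats": {"pts": 1900.0, "ast": 250.0, "stl": 90.0}},
    {"name": "Modern Guard",
     "stats": {"pts": 1200.0, "ast": 900.0, "stl": 180.0}},
]

maxes = stat_maxes(pool + [old_center])
clone = find_clone(old_center, pool, maxes)
filled = fill_from_clone(old_center, clone)
print(clone["name"], filled)
```

    In this toy example the old center has no recorded steals, so the search only compares points and assists, lands on the big man, and borrows his steals total. A real version would also need to guard against a stat whose max is zero before dividing.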

    I’ll continue updating this checklist with the dates I complete these tasks. I’m doing pretty well, so far, but we’ll see how well this efficiency lasts.

    Tuesday, September 22nd, 2009 at 17:22