Your PHP website - how to factor and refactor to reduce growing pains
Archive - Originally posted on "The Horse's Mouth" - 2011-09-24 11:21:06 - Graham Ellis
As your project grows ... what do you change? In an ideal world, you would know exactly what you were coding before you started, and write the full job to spec to last for many years. This isn't an ideal world, though. Our web site has changed over the years - we now have "version 8" (see [here] for some older versions), and although there are elements that do not change, it's now really a different site from the one it was at the start.
Of course, it can never be an ideal world - we simply couldn't do in 2001 what we can do in 2011. And had we waited, we would have done nothing - and we would now be planning a site that was fit to be unaltered up to 2021. So work has to be done and then updated - much of the trick is in thinking far enough ahead to reduce that updating, and coding in a way that makes your work conducive to such changes.
This is great stuff to say, but you don't necessarily know all the tricks from Day 1, nor is it practical to apply them all (for some will cost you development time for later return) in the early days. And I'm also an advocate of spike solutions - writing some experimental code to see if something will work, in the full knowledge that you'll simply learn from it and then regress and rewrite it from the strength of that knowledge. The "emailing your server" example I recently wrote - [here] - is a classic example of a spike solution; the proof of concept and exploration of initial algorithms is good, but it now needs to take on consideration of other MIME types, pick up attachment names and have a good layer of security added to it to stop people sending me emails that
So - what SHOULD you be looking at and considering as your project grows?
Refactoring to Objects
Separate out the code that relates to particular data types. That way it can have its own test harness, it can be re-used across multiple related programs, you can deal with multiple instances easily, and calls to your data type can easily be sanitised at the caller interface (API). You can also have a whole series of similar types of data (e.g. bank accounts, stocks, bonds, futures ...) based on a common set of logic, with separate classes defining what changes in each case from the common logic.
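As a minimal sketch of that idea (every class name, method name and figure here is made up for illustration), the common logic and the input checking sit once in a base class, and each subclass states only what differs:

```php
<?php
// Hypothetical names throughout - a sketch, not code from our site.

// Common logic lives once, in the base class.
abstract class Account
{
    protected $owner;
    protected $balance = 0;   // held in pence to avoid float rounding

    public function __construct($owner, $openingBalance = 0)
    {
        $this->owner = $owner;
        $this->deposit($openingBalance);
    }

    // Sanitise at the caller interface: bad input is rejected here, once.
    public function deposit($pence)
    {
        if (!is_int($pence) || $pence < 0) {
            throw new InvalidArgumentException('Deposit must be a whole number of pence, zero or more');
        }
        $this->balance += $pence;
    }

    public function getBalance()
    {
        return $this->balance;
    }

    // Each kind of account says only what is different about it.
    abstract public function monthlyCharge();
}

class CurrentAccount extends Account
{
    public function monthlyCharge()
    {
        return 500;   // flat fee in pence (made-up figure)
    }
}

class SavingsAccount extends Account
{
    public function monthlyCharge()
    {
        return 0;     // no fee on this type
    }
}

// Multiple instances of related types, handled uniformly.
$accounts = array(
    new CurrentAccount('Alice', 10000),
    new SavingsAccount('Bob', 25000),
);
foreach ($accounts as $account) {
    echo get_class($account), ': monthly charge ', $account->monthlyCharge(), "p\n";
}
```

Because the classes stand alone, a test harness can exercise them without a browser or a web server in sight.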
Detaching your (My)SQL

If you're using MySQL - it's excellent for most jobs - stick with it. But separate out the calls into a separate layer - change your calls to mysql_summat or mysqli_summat into calls to sql_summat and then you can write a simple intermediate wrapper so that all your MySQL calls are in one place. You then have just one (wrapper) file to deal with if you want to change your code to another broadly equivalent database. More extreme alternatives include adding in a degree of functionality to the wrapper so that calls to it from the main code can be much more generic, limiting the number of different types of call to aid portability, and indeed using an intermediate code level (a database abstraction library / level) such as ODBC, AdoDB, OpenDBX, DBI:: and DBIx. Systems such as Django and Rails include code to detach the database from the application, and indeed to provide the additional validation of relationships and data integrity that you want but that doesn't always come with a database and direct calls.
Using approaches such as these, you can make a far better tuned decision as to whether your data should be stored via a database server (such as MySQL, Oracle, PostgreSQL, MSSQL ...) or via a lighter structure within your code (SQLite, CSV files, Access), or indeed in different ways on different installations of your system.
Image - is the client / server structure of MySQL always going to be right for you?
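A thin wrapper of the sql_summat kind might look something like the sketch below. Note what's been swapped in: rather than wrapping mysql_ / mysqli_ calls directly, this version sits on top of PHP's PDO layer, so that the same three functions work against MySQL or SQLite just by changing the connection string. The function names, table and DSN are illustrative assumptions, and the demonstration assumes the pdo_sqlite driver is installed.

```php
<?php
// Hypothetical wrapper functions - the main code calls these and nothing else.

function sql_connect($dsn, $user = null, $password = null)
{
    $db = new PDO($dsn, $user, $password);
    // Report failures as exceptions rather than silent false returns.
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $db;
}

function sql_query(PDO $db, $sql, array $params = array())
{
    // Prepared statements keep user data out of the SQL text.
    $statement = $db->prepare($sql);
    $statement->execute($params);
    return $statement;
}

function sql_fetch_all(PDO $db, $sql, array $params = array())
{
    return sql_query($db, $sql, $params)->fetchAll(PDO::FETCH_ASSOC);
}

// In-memory SQLite for the demonstration; on a MySQL box the DSN would be
// something like 'mysql:host=localhost;dbname=example' (an assumed name).
$db = sql_connect('sqlite::memory:');
sql_query($db, 'CREATE TABLE pages (name TEXT, title TEXT)');
sql_query($db, 'INSERT INTO pages (name, title) VALUES (?, ?)',
    array('riverside', "Melksham's Riverside Walk"));

$rows = sql_fetch_all($db, 'SELECT title FROM pages WHERE name = ?', array('riverside'));
foreach ($rows as $row) {
    echo $row['title'], "\n";
}
```

Only the connection string knows which database is in use; if the SQL itself needs to vary between databases, that too can be hidden behind further functions in the same file.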
Status Variables
Have you ever come across a conditional like this: "If there's enough petrol in the car for a 20 mile journey and we would like to go somewhere, or if there's no petrol but we need to go somewhere and we have the money for fuel and there is no public transport that will do OR if we have an urgent journey and either we have fuel or we have a way of buying it and we can't get a lift or it's not so close that we can walk anyway ....". Not easy to follow, is it?
Code that starts off simple develops complex conditions over time. That turns it into a spider's web to test, and would produce some quite horrid logic tables to check you've got everything right ... only for you to find at a later date that some obscure set of conditions isn't met correctly. And then a single change will spread, like a ripple across a pond, and introduce new issues.
As you see your code moving towards this complexity of logic, start to divide the tests out and use status variables. You'll find that much of our code has
$error = 0;
or $aok = 1;
at the top of the main logic, and then these variables get flipped whenever a test is made that will affect the validity of the final page results. It saves the need for a single truly horrid (and unmaintainable) check at the point at which a decision is made based on the presence of an error, and it also means that you don't have to store all the variables needed to help with that decision right through to the final decision point.
Exceptionally, I'm reasonably happy with the use of a global variable for a few error statuses - though an error object class is a much cleaner approach. And once you're going down the error status / error object route, you can use it to gather up error messages so that you can easily explain to your user which test it was that failed, rather than just giving him a "blue screen of death".
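Here is one way that might look, as a sketch under assumptions (the form fields and messages are invented for the example). Each test is made on its own and records a message if it fails, and the $aok flag is derived once, at the end:

```php
<?php
// Hypothetical form fields; the point is the pattern, not the particular tests.
function check_booking(array $input)
{
    $errors = array();   // empty means "all OK so far"

    if (!isset($input['name']) || trim($input['name']) === '') {
        $errors[] = 'Please tell us your name.';
    }
    if (!isset($input['email'])
        || filter_var($input['email'], FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'That email address does not look valid.';
    }
    if (!isset($input['places']) || (int) $input['places'] < 1) {
        $errors[] = 'Please book at least one place.';
    }
    return $errors;
}

$errors = check_booking(array(
    'name'   => 'A. Delegate',
    'email'  => 'not-an-address',
    'places' => '2',
));
$aok = (count($errors) === 0);   // the single status flag, set in one place

if ($aok) {
    echo "Booking accepted\n";
} else {
    // Tell the user which tests failed, not merely that something did.
    foreach ($errors as $message) {
        echo "Problem: $message\n";
    }
}
```

Adding a new test later means adding one more short "if" - the final decision point does not change at all.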
Commonality Tables
Have you ever seen code like this:
$_SESSION['name'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['name']) : $_REQUEST['name'];
$_SESSION['email'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['email']) : $_REQUEST['email'];
$_SESSION['phone'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['phone_no']) : $_REQUEST['phone_no'];
$_SESSION['course'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['course']) : $_REQUEST['course'];
etc
Oh - for goodness sake - write an array and a loop. The code will end up much shorter, and you'll be able to spot any breaks from the pattern very easily in the maintenance phase of the project. (Did you note that "phone_no" became "phone" in my example?)
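The array-and-loop version could be sketched as follows. The helper name copy_fields is my own invention for the example, and the magic quotes test is guarded with function_exists because later PHP releases dropped get_magic_quotes_gpc altogether. A sample array stands in for $_REQUEST so that the sketch runs from the command line.

```php
<?php
// Session key => request field. The one break from the pattern
// ("phone" comes from "phone_no") is now visible at a glance.
$fields = array(
    'name'   => 'name',
    'email'  => 'email',
    'phone'  => 'phone_no',
    'course' => 'course',
);

// Hypothetical helper: copy the listed fields, stripping slashes if asked.
function copy_fields(array $fields, array $request, $strip)
{
    $copied = array();
    foreach ($fields as $sessionKey => $requestKey) {
        if (!isset($request[$requestKey])) {
            continue;   // field not submitted - leave it out
        }
        $value = $request[$requestKey];
        $copied[$sessionKey] = $strip ? stripslashes($value) : $value;
    }
    return $copied;
}

// Only strip where the server has added the slashes in the first place.
$strip = function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();

// Sample data standing in for $_REQUEST.
$request = array('name' => 'A. Delegate', 'phone_no' => '01234 567890');
$session = copy_fields($fields, $request, $strip);
print_r($session);

// In a live page this would be along the lines of:
//   $_SESSION = array_merge($_SESSION, copy_fields($fields, $_REQUEST, $strip));
```

A new form field is now a one line change to the table, not another copy of the logic.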
Pagination
I wrote a system in 199x for worldwide time card completion by a team of specialists out in "the field". The team comprised a few dozen people, and there were 74 project / task codes. The system was a miracle solution for the company who used it, allowing bills to be sent out from HQ within a couple of days of each month end. In 200x (where "x" has the same value as previously - i.e. 10 years later), I received a complaint that the updating of the project / task codes - a monthly job for one of their admin staff - was getting very slow; it turned out they were up to 1035 project / task codes, giving the lie to the "it will only grow a little" that I had been told when the system was written. And updating was done by downloading the whole project / task file into an editor window on the browser and re-uploading.
There comes a point where a pagination system needs to be built into code - and it's far better to anticipate that up front than later on. Later on, the amount of work can seem quite disproportionate. With a "page 1, page 2, page 3" type approach you can do OK ... but when you get up to dozens of pages, you need to have sorting, filtering and searching options too. You then start looking at user configurability and favourites. See [here] for a paginated example of our blog ... with a search box to let you search through titles too. Far better than having everything on one page, but still plenty of scope for improvement due to data size. The blog started off as a few weeks' experiment. And it has been running since 2004 ...
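The core of a "page 1, page 2, page 3" scheme is small. This sketch (function name and page size are assumptions) slices an array that has already been loaded; with a database behind it, a LIMIT ... OFFSET clause in the query would do the same slicing on the server instead.

```php
<?php
// Hypothetical helper: return one page of rows plus the totals needed
// to draw "previous / next" links.
function paginate(array $rows, $page, $perPage = 25)
{
    $total = count($rows);
    $pages = max(1, (int) ceil($total / $perPage));
    // Clamp out-of-range requests (page 0, page 9999) to a real page.
    $page  = min(max(1, (int) $page), $pages);

    return array(
        'page'  => $page,
        'pages' => $pages,
        'total' => $total,
        'rows'  => array_slice($rows, ($page - 1) * $perPage, $perPage),
    );
}

// 1035 codes, as in the time card story, numbered 1 upwards for the demo.
$codes  = range(1, 1035);
$result = paginate($codes, 2);

echo 'Page ', $result['page'], ' of ', $result['pages'],
     ' - showing ', count($result['rows']), ' of ', $result['total'], " codes\n";
```

The requested page number would normally arrive from the URL, which is why it is clamped here and not trusted.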
Towards more Formalised MVC
Are your staff excellent programmers? Are they superb graphic designers? Do they understand the structure of the data really well? And are they well versed in the layout of website URLs, mod_rewrite and the like? If you can honestly answer "yes" to all those questions, for every member of your team, you truly have a team that's second to none. And probably a team that requires a really high salary, with gaps being very hard to fill if any one person were to leave. But chances are that even with your fantastic team, they'll be losing time as they work on your web site if all the HTML is mixed in with the PHP code, the MySQL, the CSS, the JavaScript and the natural language (English?) content.
The "four layer model" - or "MVC technology" - can help you resolve this issue by uncoupling each of the elements. MVC is really just a fancy name for a design approach in which each of the elements is written in its own area, with each area having its own expert / group of experts and a clearly defined interface between them. It means that you can have the world's best designers working on the look and feel of your pages, without them needing to understand any programming, without them having to 'lock' the code away from the programmers because they're working on files that include programs, and without affecting the functionality by making changes. And it means that your programmers can be sorting out users' issues and developing new functions and algorithms without having to push through a forest of style and HTML tags to do so. By separating out the "business logic" into its own area of the code, you can even use it for offline processing (batch / overnight stuff, to provide "web2" services, etc), and by separating out the look and feel into templates, you can change the template and give your website a new lick of paint - or a personalised paint job for different clients - very easily.
P.S. MVC = Model, View, Controller. Or ... how that data is structured and held, how the screen presentation is done, and how that data is moved from the model to the view.
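Stripped to its bones, and without any framework, the separation might be sketched like this. All the function names and the sample course are invented for the example, and an array stands in for the database:

```php
<?php
// MODEL - how the data is structured and held.
function model_find_course($code)
{
    $courses = array(
        'XX' => array('title' => 'Tips & Techniques', 'days' => 4),   // made-up sample row
    );
    return isset($courses[$code]) ? $courses[$code] : null;
}

// VIEW - presentation only; the designer's territory.
function view_course(array $course)
{
    return '<h1>' . htmlspecialchars($course['title']) . '</h1>'
         . '<p>Duration: ' . (int) $course['days'] . ' days</p>';
}

function view_not_found()
{
    return '<h1>Sorry - no such course</h1>';
}

// CONTROLLER - fetches from the model and hands the result to a view.
function controller_course($code)
{
    $course = model_find_course($code);
    return $course === null ? view_not_found() : view_course($course);
}

echo controller_course('XX'), "\n";
```

The model function knows nothing about HTML, so a batch job could call it directly; the view functions know nothing about where the data came from, so a new look means touching those alone.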
Single Encompassing script



http://www.wellho.net/share/potomaccrossing.html - Bridges over the Potomac
http://www.wellho.net/share/courseterms.html - Terms and conditions for training courses
http://www.wellho.net/share/riverside.html - Melksham's Riverside Walk
Three different URLs ... but internally all are routed to a single script, with "potomaccrossing", "courseterms" and "riverside" becoming a parameter as if it had been filled in on a form. The single script - in this case - then goes and checks with our databases for the record that matches the page name in one of its columns.
Images - three views, three different URLs, all the same script
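A sketch of the idea, under assumptions: the regular expression, the function names and the array standing in for the database lookup are mine, and on a live Apache server a mod_rewrite rule would typically hand the page name to the script as a parameter, with no need for the script to parse the path itself.

```php
<?php
// Pull the page name out of a URL path such as /share/riverside.html.
// Only letters, digits, "_" and "-" are accepted, so nothing odd can
// reach the lookup that follows.
function page_name_from_path($path)
{
    if (preg_match('#^/share/([a-z0-9_-]+)\.html$#i', $path, $match)) {
        return strtolower($match[1]);
    }
    return null;
}

// An array standing in for the database table of pages.
$pages = array(
    'potomaccrossing' => 'Bridges over the Potomac',
    'courseterms'     => 'Terms and conditions for training courses',
    'riverside'       => "Melksham's Riverside Walk",
);

// Hypothetical single script: one routine serves every /share/ URL.
function serve_page($path, array $pages)
{
    $name = page_name_from_path($path);
    if ($name === null || !isset($pages[$name])) {
        return 'Page not found';
    }
    return $pages[$name];
}

foreach (array('/share/riverside.html', '/share/courseterms.html') as $path) {
    echo $path, ' => ', serve_page($path, $pages), "\n";
}
```

Adding a new page is then a matter of adding a record, not writing another script.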
Using Modules

By using standard modules, you're piggy-backing on other people's expertise. And these other people are typically the enthusiasts and experts. Modules distributed by Open Source tend to be updated from time to time, and indeed feedback to the originators helps them do so. And if some new technology / standard level comes along which needs support, chances are that they'll add it. And they'll do so quicker than your team could, and probably at no cost to you. All your team has to do is to drop in the new version of the component which - if the job's been done right - will be "plug and play" with extra features switched on by an extra option on a method call, or incoming data arriving in a new format.
Some modules ship with products ... others (typically those which are more niche / specialised) need to be downloaded separately. You'll find libraries for most languages, and install tools too: RubyGems, Perl's CPAN, Python's PyPI, PEAR and PECL for PHP. As part of the investigation for the previous article I wrote, I installed the imap module on our PHP server; that's far better than writing my own email decoder (though I do have the tools to do so). And we also use MagpieRSS and MaxMind - to pick up news feeds, and to trace IP addresses back to the country and town from which our users are browsing.
Image - using MaxMind to show where visitors to our web site arrive from. This is an analysis of all of yesterday's UK visitors.
And What do you NOT want to change?
* Ability to use old data
* URLs - especially ones that have lots of external inbound links
* Anything in a rush / for the sake of it
And in summary
This is a long article, inspired by a recent customer / tailored training course. It would be easy - far too easy - for me to come across as critical as I make all these points that concern existing code. But code goes through an evolution as do systems, and it's impossible to know early on WHICH are the areas that will affect any particular project.
The inspiration behind this article is web site and PHP based.
If you're a newcomer to programming, we can teach you how to program in PHP from scratch - Learning to Program in PHP; if you've programmed before, but in another language (and perhaps not for the web), then PHP Programming would suit you better.
For delegates who already know the basics of PHP, our PHP Techniques course is an excellent second level course - it covers many of the aspects I have mentioned above, together with web2, searching, graphics and geographics and much more. A PHP revision start at the beginning of the course helps us dot "i"s and cross "t"s for our delegates, and during the course the tutor can also fill in any gaps which are revealed; many PHP programmers are self-taught and inevitably they'll have missed out on some tips and techniques.
Object Orientation is a big subject in its own right, and is worthy of a single day course on its own - Object Oriented PHP. For delegates who are learning PHP, and likely to be going straight into medium sized to larger projects, "OO" is a vital subject, so we run the OO day as an optional extra directly following the regular PHP beginner courses. But it can also be taken as a single day by delegates who have started on smaller projects, or perhaps projects which are large enough to benefit from the OO model but that hadn't been realised.
Finally, if you feel that a day or two of external code review could help, please ask me! Such work may provide you with some valuable thoughts and pointers. "This has paid for itself already" said a participant in such a session the week before last ... and that was before 10 a.m. on the first of two days!